
All CGCs 289,962 carbohydrate gene clusters

What is a CGC?

A carbohydrate-active enzyme gene cluster (CGC) is a run of co-localized genes that contains at least one CAZyme or sulfatase together with its neighbours (transporters, peptidases, sigma/anti-sigma factors, and transcriptional regulators), as predicted by dbCAN's CGCFinder. The CGC is the atomic unit of mpCGCdb: every enzyme family, community, and taxonomy view on this site is an aggregation over these clusters.
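
The definition above can be pictured as a small data structure. This is only an illustrative sketch, not dbCAN's CGCFinder: the class names, the role strings, and the example ID are hypothetical, and any spacing or composition rules that CGCFinder applies when calling clusters are not modeled here.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical role labels for the two gene classes that anchor a CGC.
CORE_ROLES = {"CAZyme", "sulfatase"}


@dataclass(frozen=True)
class Gene:
    gene_id: str
    role: str  # e.g. "CAZyme", "sulfatase", "transporter", "peptidase", "regulator"


@dataclass(frozen=True)
class CGC:
    cgc_id: str
    genes: Tuple[Gene, ...]  # co-localized genes, in genomic order

    def has_core_enzyme(self) -> bool:
        """True if the cluster holds at least one CAZyme or sulfatase."""
        return any(gene.role in CORE_ROLES for gene in self.genes)


# Made-up cluster used only to exercise the check.
example = CGC(
    "MAG0001_CGC1",
    (Gene("g1", "transporter"), Gene("g2", "CAZyme"), Gene("g3", "regulator")),
)
print(example.has_core_enzyme())
```

Keeping the genes as an ordered tuple mirrors the "run of co-localized genes" wording; a cluster made only of neighbours, with no CAZyme or sulfatase, fails the check.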

Scale

Total CGCs: 289,962
Total CGC genes: 2,448,645
Genes per CGC: mean 8.4 · median 8
Median genomic span: 9.2 kb
CGCs with >1 CAZyme: 75,115 (25.9%)
CGCs with ≥1 sulfatase: 35,041 (12.1%)
CGCs with ≥10 genes: 83,793 (28.9%)

Genes per CGC distribution across all 289,962 clusters

Genes per CGC: number of CGCs (share of all CGCs)

2: 2,821 (1.0%)
3: 6,073 (2.1%)
4: 14,409 (5.0%)
5: 18,284 (6.3%)
6: 49,287 (17.0%)
7: 44,942 (15.5%)
8: 42,152 (14.5%)
9: 28,201 (9.7%)
10: 22,634 (7.8%)
11: 15,666 (5.4%)
12: 11,706 (4.0%)
13: 9,427 (3.3%)
14: 6,274 (2.2%)
15: 5,054 (1.7%)
16: 3,426 (1.2%)
17: 2,405 (0.8%)
18: 1,740 (0.6%)
19: 1,315 (0.5%)
20-25: 3,284 (1.1%)
26+: 862 (0.3%)
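
The Scale figures can be re-derived from the distribution above. The sketch below copies the bin counts and the total gene count from this page and recomputes the cluster total, the share of CGCs with at least 10 genes, the median bin, and the mean; the variable and function names are ours, not part of any mpCGCdb tooling.

```python
# Genes-per-CGC bin counts copied from the distribution on this page.
# Keys are the bin labels as shown; the last two bins are ranges.
HISTOGRAM = {
    "2": 2821, "3": 6073, "4": 14409, "5": 18284, "6": 49287,
    "7": 44942, "8": 42152, "9": 28201, "10": 22634, "11": 15666,
    "12": 11706, "13": 9427, "14": 6274, "15": 5054, "16": 3426,
    "17": 2405, "18": 1740, "19": 1315, "20-25": 3284, "26+": 862,
}
TOTAL_GENES = 2_448_645  # "Total CGC genes" from the Scale panel


def bin_lower_bound(label: str) -> int:
    """Smallest gene count covered by a label such as '7', '20-25' or '26+'."""
    return int(label.rstrip("+").split("-")[0])


def median_bin(hist: dict) -> str:
    """Label of the bin holding the median cluster (bins in ascending order)."""
    midpoint = sum(hist.values()) / 2
    running = 0
    for label, count in hist.items():
        running += count
        if running >= midpoint:
            return label
    raise ValueError("empty histogram")


total = sum(HISTOGRAM.values())
at_least_10 = sum(n for label, n in HISTOGRAM.items() if bin_lower_bound(label) >= 10)

print(total)                                     # 289962
print(at_least_10, f"{at_least_10 / total:.1%}")  # 83793 28.9%
print(median_bin(HISTOGRAM))                     # 8
print(round(TOTAL_GENES / total, 1))             # 8.4
```

The bins sum to the stated 289,962 clusters, and the recomputed values match the Scale panel (83,793 CGCs with ≥10 genes, median 8, mean 8.4).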

Find a CGC

Individual CGCs are not covered by the top-bar search. To jump straight to one, open its stable URL: https://mpcgcdb.com/cgcs/<cgc_id>/

IDs are case-sensitive and follow the pattern <MAG_id>_CGC<n>.
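
As a rough sketch of the ID pattern and stable URL described above, the helper below splits a CGC ID into its MAG ID and cluster index and builds the page URL. It assumes that <n> is a decimal integer and that the MAG ID may itself contain underscores; neither is stated on this page. The example ID `MAG0001_CGC3` is made up.

```python
import re
from urllib.parse import quote

BASE_URL = "https://mpcgcdb.com/cgcs/"

# <MAG_id>_CGC<n>; matching is case-sensitive, so "_cgc3" is rejected.
# Assumption: <n> is a run of digits and <MAG_id> is any non-empty prefix.
CGC_ID_RE = re.compile(r"^(?P<mag_id>.+)_CGC(?P<index>\d+)$")


def parse_cgc_id(cgc_id: str) -> tuple:
    """Split a CGC ID into (MAG ID, cluster index); raise ValueError if malformed."""
    match = CGC_ID_RE.match(cgc_id)
    if match is None:
        raise ValueError(f"not a CGC ID of the form <MAG_id>_CGC<n>: {cgc_id!r}")
    return match.group("mag_id"), int(match.group("index"))


def cgc_url(cgc_id: str) -> str:
    """Build the stable page URL for a CGC ID after validating its shape."""
    parse_cgc_id(cgc_id)
    return f"{BASE_URL}{quote(cgc_id, safe='')}/"


print(parse_cgc_id("MAG0001_CGC3"))
print(cgc_url("MAG0001_CGC3"))
```

Validating before building the URL catches case mistakes early, since a lower-cased ID would not resolve to the same page.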

Or browse into CGCs by context:

By MAG: All MAGs → a MAG page → its CGC table → click a row
By enzyme family: CAZyme / sulfatase family → CGCs carrying that family
By taxonomy: GTDB tree → a taxon → its CGCs

Downloads

The complete per-gene CGC table (one row per gene in every CGC) is on Zenodo:

Master table: all_cgc_magmapped.tsv (Zenodo record 20219287)
Per-CGC files: https://mpcgcdb.com/cgcs/<cgc_id>/{proteins.tsv, sequences.faa}
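
The sketch below shows one way to work with these downloads: expanding the per-CGC file pattern into concrete URLs, and counting rows per cluster in a one-row-per-gene table. The column name `cgc_id` is an assumption, since this page does not list the columns of all_cgc_magmapped.tsv; check the real header before use. The sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

PER_CGC_FILES = ("proteins.tsv", "sequences.faa")


def per_cgc_file_urls(cgc_id: str) -> dict:
    """Expand https://mpcgcdb.com/cgcs/<cgc_id>/{...} into one URL per file."""
    base = f"https://mpcgcdb.com/cgcs/{cgc_id}/"
    return {name: base + name for name in PER_CGC_FILES}


def genes_per_cgc(tsv_handle, id_column: str = "cgc_id") -> Counter:
    """Count rows (genes) per CGC in a tab-separated, one-row-per-gene table.

    `id_column` is a hypothetical column name; adjust it to the real header.
    """
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return Counter(row[id_column] for row in reader)


# Made-up sample standing in for the master table.
sample = io.StringIO(
    "cgc_id\tgene_id\n"
    "MAG0001_CGC1\tg1\n"
    "MAG0001_CGC1\tg2\n"
    "MAG0001_CGC2\tg3\n"
)
counts = genes_per_cgc(sample)
print(counts)
print(per_cgc_file_urls("MAG0001_CGC1"))
```

For the real file, pass an open handle instead of the in-memory sample, for example `open("all_cgc_magmapped.tsv", newline="")`; reading row by row avoids loading all per-gene rows into memory at once.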

See the Downloads page for the full table index.