All CGCs: 289,962 carbohydrate gene clusters
What is a CGC?
A carbohydrate-active enzyme gene cluster (CGC) is a run of co-localized genes that contains at least one CAZyme or sulfatase together with its neighbours (transporters, peptidases, sigma/anti-sigma factors, and transcriptional regulators), as predicted by dbCAN's CGCFinder. The CGC is the atomic unit of mpCGCdb: every enzyme family, community, and taxonomy view on this site is an aggregation over these clusters.
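The definition above can be made concrete with a small data model. This is an illustrative sketch only: the `Gene` and `CGC` classes, the category labels, and the example ID are hypothetical names chosen here, not the schema used by dbCAN's CGCFinder or by mpCGCdb.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the "signature" genes a CGC must contain.
SIGNATURE_CATEGORIES = {"CAZyme", "sulfatase"}


@dataclass
class Gene:
    gene_id: str
    category: str  # e.g. "CAZyme", "sulfatase", "transporter", "regulator"


@dataclass
class CGC:
    cgc_id: str
    genes: list = field(default_factory=list)  # ordered, co-localized genes

    def has_signature_gene(self) -> bool:
        """True if at least one gene is a CAZyme or a sulfatase."""
        return any(g.category in SIGNATURE_CATEGORIES for g in self.genes)


# Made-up example cluster: one CAZyme flanked by two neighbours.
example = CGC(
    "MAG0001_CGC1",
    [Gene("g1", "transporter"), Gene("g2", "CAZyme"), Gene("g3", "regulator")],
)
print(example.has_signature_gene())
```

A cluster made up only of neighbours (say, two transporters) would fail this check, which mirrors the "at least one CAZyme or sulfatase" requirement in the definition.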
Scale
| Metric | Value |
|---|---|
| Total CGCs | 289,962 |
| Total CGC genes | 2,448,645 |
| Genes per CGC | mean 8.4 · median 8 |
| Median genomic span | 9.2 kb |
| CGCs with >1 CAZyme | 75,115 (25.9%) |
| CGCs with ≥1 sulfatase | 35,041 (12.1%) |
| CGCs with ≥10 genes | 83,793 (28.9%) |
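The derived figures in the Scale table follow directly from the raw counts. The short check below recomputes the mean and the percentages. The variable names are chosen here for illustration, and the numbers are copied from the table.

```python
# Raw counts copied from the Scale table.
total_cgcs = 289_962
total_genes = 2_448_645
multi_cazyme = 75_115      # CGCs with >1 CAZyme
with_sulfatase = 35_041    # CGCs with >=1 sulfatase
ten_plus_genes = 83_793    # CGCs with >=10 genes


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


mean_genes = round(total_genes / total_cgcs, 1)

print("mean genes per CGC:", mean_genes)
print(">1 CAZyme (%):", pct(multi_cazyme, total_cgcs))
print(">=1 sulfatase (%):", pct(with_sulfatase, total_cgcs))
print(">=10 genes (%):", pct(ten_plus_genes, total_cgcs))
```

The median (8) and the median genomic span (9.2 kb) cannot be recovered from these totals. They require the per-CGC data.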
[Figure: Genes per CGC]
Find a CGC
Individual CGCs are not in the top-bar search; jump straight to one by its ID.
IDs are case-sensitive and follow the pattern <MAG_id>_CGC<n>.
Stable URL: https://mpcgcdb.com/cgcs/<cgc_id>/
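A minimal sketch of how a script might validate a CGC ID and build its stable URL. The regular expression is one assumed reading of the `<MAG_id>_CGC<n>` pattern (everything before the final `_CGC` is the MAG ID, and `<n>` is a run of digits). The function names and the example ID are hypothetical, not part of any mpCGCdb API.

```python
import re
from urllib.parse import quote

BASE_URL = "https://mpcgcdb.com/cgcs/"

# Assumed interpretation of <MAG_id>_CGC<n>; matching is case-sensitive.
CGC_ID_RE = re.compile(r"(?P<mag_id>.+)_CGC(?P<n>\d+)")


def parse_cgc_id(cgc_id: str) -> tuple:
    """Split a CGC ID into (MAG ID, cluster number); raise ValueError if malformed."""
    match = CGC_ID_RE.fullmatch(cgc_id)
    if match is None:
        raise ValueError(f"not a valid CGC ID: {cgc_id!r}")
    return match.group("mag_id"), int(match.group("n"))


def cgc_url(cgc_id: str) -> str:
    """Return the stable page URL for a CGC ID after validating it."""
    parse_cgc_id(cgc_id)  # validation only; raises on malformed IDs
    return f"{BASE_URL}{quote(cgc_id, safe='')}/"


# Made-up ID, used only to exercise the functions.
print(parse_cgc_id("MAG0001_CGC12"))
print(cgc_url("MAG0001_CGC12"))
```

Because IDs are case-sensitive, the sketch deliberately rejects a lowercase variant such as `MAG0001_cgc12` instead of normalizing it.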
Or browse into CGCs by context:
| Entry point | Path |
|---|---|
| By MAG | All MAGs → a MAG page → its CGC table → click a row |
| By enzyme family | CAZyme / sulfatase family → CGCs carrying that family |
| By taxonomy | GTDB tree → a taxon → its CGCs |
Downloads
The complete per-gene CGC table (one row per gene in every CGC) is on Zenodo:
| File | Location |
|---|---|
| Master table | all_cgc_magmapped.tsv · Zenodo record 20219287 |
| Per-CGC files | https://mpcgcdb.com/cgcs/<cgc_id>/{proteins.tsv, sequences.faa} |
See the Downloads page for the full table index.
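As a rough sketch of working with these downloads, the snippet below expands the per-CGC URL pattern into its two file URLs and counts genes per CGC from a one-row-per-gene table. The column names (`cgc_id`, `gene_id`) and the sample rows are assumptions made for illustration; check the actual header of all_cgc_magmapped.tsv before relying on them.

```python
import csv
import io
from collections import Counter

PER_CGC_FILES = ("proteins.tsv", "sequences.faa")


def per_cgc_file_urls(cgc_id: str) -> list:
    """Expand the {proteins.tsv, sequences.faa} pattern for one CGC."""
    base = f"https://mpcgcdb.com/cgcs/{cgc_id}/"
    return [base + name for name in PER_CGC_FILES]


def genes_per_cgc(tsv_handle, id_column: str = "cgc_id") -> Counter:
    """Count rows (genes) per CGC in a tab-separated, one-row-per-gene table.

    `id_column` is a hypothetical column name, not confirmed by the source.
    """
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return Counter(row[id_column] for row in reader)


# Tiny in-memory stand-in for the master table (made-up rows).
sample = io.StringIO(
    "cgc_id\tgene_id\n"
    "MAG0001_CGC1\tg1\n"
    "MAG0001_CGC1\tg2\n"
    "MAG0001_CGC2\tg3\n"
)
counts = genes_per_cgc(sample)
print(counts.most_common())
print(per_cgc_file_urls("MAG0001_CGC1"))
```

For the real master table, pass an open file handle (for example `open("all_cgc_magmapped.tsv", newline="")`) in place of the in-memory sample.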