GOMC CGC Database: carbohydrate-active gene clusters across 24,195 marine MAGs

Enzyme family: S1

Page generated 2026-04-27 14:12

Overview — basic information

Identifier: S1
Type: Enzyme family
Number of CGCs: 33,540
Number of proteins (in associated CGCs): 0
Number of S1 proteins (target-only): 0
Source data files: cgcs.txt · sequences.faa · proteins.tsv

Overview — data sources

CGC catalog: all_cgc_magmapped.tsv · 2.45 M rows / 24,195 MAGs
All-vs-all DIAMOND: 873.6 M edges (e ≤ 1e-30, k 1000)
Annotation source: dbCAN (CAZyme), SulfAtlas (sulfatases), TC/TF/STP/Peptidase metadata
GTDB taxonomy: gomc_metadata.tsv · mag_taxonomy.tsv
NMPF cross-reference: gomc_nmpfs.tsv

Phylogeny

FastTree (Newick) not yet built. Once msa_fasttree.py finishes, this section will host an interactive phylotree.js viewer.

Most common taxonomy

Number of genomes hosting at least one S1 CGC, broken down through successive GTDB ranks. Colours follow the 11-group palette used in the rarefaction and UMAP figures (grey = "other").
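A rank-by-rank drill-down of this kind can be sketched as follows. This is a minimal illustration, assuming each genome carries a semicolon-separated GTDB-style lineage string (for example "d__Bacteria;p__Bacteroidota"). The genome IDs and lineages in `demo` are hypothetical and are not taken from gomc_metadata.tsv or mag_taxonomy.tsv.

```python
from collections import Counter

def rank_prefixes(lineage):
    """Yield cumulative lineage prefixes: 'd__X', 'd__X;p__Y', and so on."""
    ranks = [r.strip() for r in lineage.split(";") if r.strip()]
    for depth in range(1, len(ranks) + 1):
        yield ";".join(ranks[:depth])

def drilldown_counts(genome_lineages):
    """Count genomes under every lineage prefix.

    genome_lineages maps a genome ID to its lineage string; each genome
    contributes exactly once to each rank it is classified at.
    """
    counts = Counter()
    for lineage in genome_lineages.values():
        counts.update(rank_prefixes(lineage))
    return counts

# Hypothetical genomes, assumed to host at least one S1 CGC each.
demo = {
    "mag_a": "d__Bacteria;p__Bacteroidota;c__Bacteroidia",
    "mag_b": "d__Bacteria;p__Bacteroidota;c__Bacteroidia",
    "mag_c": "d__Bacteria;p__Planctomycetota;c__Planctomycetia",
}
counts = drilldown_counts(demo)
```

Keying the counter on the full prefix rather than on the rank name alone keeps identically named taxa under different parents separate.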

Host taxonomic groups — CGCs per genome

Half-violin density distribution of CGCs per host genome (restricted to S1-containing CGCs), split by the 11 reference taxonomic groups used in the rarefaction figure. The median and IQR are shown as a black tick and bar; outliers (beyond 1.5×IQR) are shown as small dots.

CGCs per host genome by taxonomic group
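The summary statistics drawn on each half-violin can be reproduced roughly as below. The page does not state which quartile convention the figure uses, so this sketch assumes inclusive quartiles and the usual 1.5×IQR fences measured from the first and third quartiles. The `example` counts are made up for illustration.

```python
from statistics import median, quantiles

def summarize(counts):
    """Return the median, quartiles and 1.5×IQR outliers of per-genome CGC counts."""
    # Assumption: inclusive quartiles; the figure may use another convention.
    q1, _, q3 = quantiles(counts, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "median": median(counts),
        "q1": q1,
        "q3": q3,
        "outliers": [c for c in counts if c < lower or c > upper],
    }

# Hypothetical CGCs-per-genome values for one taxonomic group.
example = [1, 1, 2, 2, 3, 3, 4, 12]
summary = summarize(example)
```

For the made-up counts above, only the genome with 12 CGCs falls outside the fences and would be drawn as a dot.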

Gene neighborhood — co-occurring families

Counts of the first family token per protein (GH / PL / CE / S1 only); the top 30 of 571 are shown.

Family  Count
S1      41,181
S1_6    4,401
S1_8    3,497
S1_7    3,090
S1_2    3,066
GH13    2,923
GH3     2,333
S1_4    2,244
S1_16   2,232
S1_15   2,184
GH2     1,840
S1_27   1,838
GH43    1,715
S1_11   1,697
S1_14   1,516
S1_37   1,388
GH29    1,175
GH16    1,100
S1_17   1,032
S1_9    1,030
GH20    1,006
GH92    933
S1_22   866
S1_97   846
GH23    844
S1_12   839
S1_20   798
S1_0    782
GH5     760
CE9     755
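A tally like the one above could be produced along these lines. This is a sketch under stated assumptions, not the database's actual pipeline: it assumes each protein has a "+"-separated annotation string, takes the first token that looks like a GH/PL/CE family or an S1 (sub)family, and drops any subfamily suffix from GH/PL/CE tokens. The annotation strings in `demo` are hypothetical.

```python
import re
from collections import Counter

# Assumption: GH/PL/CE followed by digits, or S1 with an optional _<n> subfamily.
# The (?!\d) guard stops "S10" from being read as "S1".
FAMILY = re.compile(r"^(?:(?:GH|PL|CE)\d+|S1(?:_\d+)?)(?!\d)")

def first_family(annotation):
    """Return the first GH/PL/CE/S1 token in an annotation string, or None."""
    for token in annotation.split("+"):
        match = FAMILY.match(token.strip())
        if match:
            return match.group(0)
    return None

def count_first_families(annotations):
    """Count proteins by their first recognised family token."""
    return Counter(f for f in map(first_family, annotations) if f is not None)

# Hypothetical per-protein annotations.
demo = ["S1_6", "GH13_5+CBM48", "CBM50+GH3", "TC", "S1"]
counts = count_first_families(demo)
top = counts.most_common(30)  # the 30 most frequent families
```

Proteins without a recognised token, such as the transporter-only entry in `demo`, are left out of the tally.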

SCoNe — co-occurrence network

SCoNe artefacts not yet built.

Associated CGCs

Showing first 80 of 33,540 CGCs.

CGC ID  MAG
BATS_SAMN07137085_METAG_DFMHGLCG_CGC10  BATS_SAMN07137085_METAG_DFMHGLCG
BATS_SAMN07137085_METAG_DFMHGLCG_CGC13  BATS_SAMN07137085_METAG_DFMHGLCG
BATS_SAMN07137085_METAG_DFMHGLCG_CGC7  BATS_SAMN07137085_METAG_DFMHGLCG
BATS_SAMN07137085_METAG_DFMHGLCG_CGC8  BATS_SAMN07137085_METAG_DFMHGLCG
BATS_SAMN07137116_METAG_OMDFKJGI_CGC9  BATS_SAMN07137116_METAG_OMDFKJGI
BATS_SAMN07137118_METAG_HPJDOJIB_CGC1  BATS_SAMN07137118_METAG_HPJDOJIB
BATS_SAMN07137118_METAG_HPJDOJIB_CGC14  BATS_SAMN07137118_METAG_HPJDOJIB
BATS_SAMN07137118_METAG_HPJDOJIB_CGC15  BATS_SAMN07137118_METAG_HPJDOJIB
BATS_SAMN07137118_METAG_HPJDOJIB_CGC2  BATS_SAMN07137118_METAG_HPJDOJIB
BATS_SAMN07137118_METAG_HPJDOJIB_CGC9  BATS_SAMN07137118_METAG_HPJDOJIB
BATS_SAMN08390925_METAG_BJAOANMI_CGC11  BATS_SAMN08390925_METAG_BJAOANMI
BATS_SAMN08390925_METAG_BJAOANMI_CGC18  BATS_SAMN08390925_METAG_BJAOANMI
BATS_SAMN08390925_METAG_BJAOANMI_CGC4  BATS_SAMN08390925_METAG_BJAOANMI
BATS_SAMN08390925_METAG_BJAOANMI_CGC9  BATS_SAMN08390925_METAG_BJAOANMI
BGEO_SAMN07136507_METAG_NAJFPCKG_CGC13  BGEO_SAMN07136507_METAG_NAJFPCKG
BGEO_SAMN07136507_METAG_NAJFPCKG_CGC6  BGEO_SAMN07136507_METAG_NAJFPCKG
BGEO_SAMN07136507_METAG_NAJFPCKG_CGC7  BGEO_SAMN07136507_METAG_NAJFPCKG
BGEO_SAMN07136507_METAG_NAJFPCKG_CGC8  BGEO_SAMN07136507_METAG_NAJFPCKG
BGEO_SAMN07136523_METAG_APAOEKCM_CGC1  BGEO_SAMN07136523_METAG_APAOEKCM
BGEO_SAMN07136523_METAG_NMOKDPNF_CGC7  BGEO_SAMN07136523_METAG_NMOKDPNF
BGEO_SAMN07136546_METAG_CABEIBHM_CGC1  BGEO_SAMN07136546_METAG_CABEIBHM
BGEO_SAMN07136583_METAG_OEAOIHAM_CGC13  BGEO_SAMN07136583_METAG_OEAOIHAM
BGEO_SAMN07136583_METAG_OEAOIHAM_CGC15  BGEO_SAMN07136583_METAG_OEAOIHAM
BGEO_SAMN07136583_METAG_OEAOIHAM_CGC16  BGEO_SAMN07136583_METAG_OEAOIHAM
BGEO_SAMN07136583_METAG_OEAOIHAM_CGC2  BGEO_SAMN07136583_METAG_OEAOIHAM
BGEO_SAMN07136681_METAG_IEGHCJCC_CGC6  BGEO_SAMN07136681_METAG_IEGHCJCC
BGEO_SAMN07136689_METAG_KJMIPNPG_CGC2  BGEO_SAMN07136689_METAG_KJMIPNPG
BGEO_SAMN07136708_METAG_NLOEKJNE_CGC2  BGEO_SAMN07136708_METAG_NLOEKJNE
BGEO_SAMN07136709_METAG_EEKAFEOM_CGC2  BGEO_SAMN07136709_METAG_EEKAFEOM
BGEO_SAMN07136710_METAG_IMMBNPOM_CGC10  BGEO_SAMN07136710_METAG_IMMBNPOM
BGEO_SAMN07136710_METAG_IMMBNPOM_CGC11  BGEO_SAMN07136710_METAG_IMMBNPOM
BGEO_SAMN07136788_METAG_AFBPEAAE_CGC21  BGEO_SAMN07136788_METAG_AFBPEAAE
BGEO_SAMN07136788_METAG_AFBPEAAE_CGC24  BGEO_SAMN07136788_METAG_AFBPEAAE
BGEO_SAMN07136898_METAG_IGLCDHNE_CGC1  BGEO_SAMN07136898_METAG_IGLCDHNE
BGEO_SAMN07136920_METAG_MGPENMLJ_CGC6  BGEO_SAMN07136920_METAG_MGPENMLJ
BGEO_SAMN07136920_METAG_MGPENMLJ_CGC7  BGEO_SAMN07136920_METAG_MGPENMLJ
BGEO_SAMN07136937_METAG_ALPGGBFO_CGC13  BGEO_SAMN07136937_METAG_ALPGGBFO
BGEO_SAMN07136937_METAG_ALPGGBFO_CGC7  BGEO_SAMN07136937_METAG_ALPGGBFO
BGEO_SAMN07136947_METAG_PEBNHIGB_CGC5  BGEO_SAMN07136947_METAG_PEBNHIGB
BGEO_SAMN07136947_METAG_PNJGMBLN_CGC11  BGEO_SAMN07136947_METAG_PNJGMBLN
BGEO_SAMN07136947_METAG_PNJGMBLN_CGC4  BGEO_SAMN07136947_METAG_PNJGMBLN
BGEO_SAMN07136947_METAG_PNJGMBLN_CGC6  BGEO_SAMN07136947_METAG_PNJGMBLN
BGEO_SAMN07136957_METAG_PDLAKDGF_CGC17  BGEO_SAMN07136957_METAG_PDLAKDGF
BGEO_SAMN07136957_METAG_PDLAKDGF_CGC18  BGEO_SAMN07136957_METAG_PDLAKDGF
BGEO_SAMN07136957_METAG_PDLAKDGF_CGC19  BGEO_SAMN07136957_METAG_PDLAKDGF
BGEO_SAMN07136957_METAG_PDLAKDGF_CGC21  BGEO_SAMN07136957_METAG_PDLAKDGF
BGEO_SAMN07136957_METAG_PDLAKDGF_CGC22  BGEO_SAMN07136957_METAG_PDLAKDGF
BGEO_SAMN07136958_METAG_IPBGLHHA_CGC5  BGEO_SAMN07136958_METAG_IPBGLHHA
BGEO_SAMN07136960_METAG_CGNNLGDJ_CGC3  BGEO_SAMN07136960_METAG_CGNNLGDJ
GCA_000009785.1_ASM978v1_genomic_CGC14  GCA_000009785.1_ASM978v1_genomic
GCA_000009785.1_ASM978v1_genomic_CGC16  GCA_000009785.1_ASM978v1_genomic
GCA_000009785.1_ASM978v1_genomic_CGC17  GCA_000009785.1_ASM978v1_genomic
GCA_000009785.1_ASM978v1_genomic_CGC20  GCA_000009785.1_ASM978v1_genomic
GCA_000009785.1_ASM978v1_genomic_CGC9  GCA_000009785.1_ASM978v1_genomic
GCA_000012805.1_ASM1280v1_genomic_CGC12  GCA_000012805.1_ASM1280v1_genomic
GCA_000012805.1_ASM1280v1_genomic_CGC15  GCA_000012805.1_ASM1280v1_genomic
GCA_000023865.1_ASM2386v1_genomic_CGC22  GCA_000023865.1_ASM2386v1_genomic
GCA_000023865.1_ASM2386v1_genomic_CGC24  GCA_000023865.1_ASM2386v1_genomic
GCA_000023865.1_ASM2386v1_genomic_CGC26  GCA_000023865.1_ASM2386v1_genomic
GCA_000023865.1_ASM2386v1_genomic_CGC30  GCA_000023865.1_ASM2386v1_genomic
GCA_000025905.1_ASM2590v1_genomic_CGC10  GCA_000025905.1_ASM2590v1_genomic
GCA_000025905.1_ASM2590v1_genomic_CGC12  GCA_000025905.1_ASM2590v1_genomic
GCA_000025905.1_ASM2590v1_genomic_CGC13  GCA_000025905.1_ASM2590v1_genomic
GCA_000025905.1_ASM2590v1_genomic_CGC14  GCA_000025905.1_ASM2590v1_genomic
GCA_000025905.1_ASM2590v1_genomic_CGC2  GCA_000025905.1_ASM2590v1_genomic
GCA_000025905.1_ASM2590v1_genomic_CGC27  GCA_000025905.1_ASM2590v1_genomic
GCA_000025905.1_ASM2590v1_genomic_CGC29  GCA_000025905.1_ASM2590v1_genomic
GCA_000025905.1_ASM2590v1_genomic_CGC4  GCA_000025905.1_ASM2590v1_genomic
GCA_000025905.1_ASM2590v1_genomic_CGC5  GCA_000025905.1_ASM2590v1_genomic
GCA_000025905.1_ASM2590v1_genomic_CGC9  GCA_000025905.1_ASM2590v1_genomic
GCA_000092205.1_ASM9220v1_genomic_CGC15  GCA_000092205.1_ASM9220v1_genomic
GCA_000092205.1_ASM9220v1_genomic_CGC20  GCA_000092205.1_ASM9220v1_genomic
GCA_000092205.1_ASM9220v1_genomic_CGC5  GCA_000092205.1_ASM9220v1_genomic
GCA_000144645.1_ASM14464v1_genomic_CGC5  GCA_000144645.1_ASM14464v1_genomic
GCA_000147355.1_ASM14735v1_genomic_CGC3  GCA_000147355.1_ASM14735v1_genomic
GCA_000147355.1_ASM14735v1_genomic_CGC4  GCA_000147355.1_ASM14735v1_genomic
GCA_000147355.1_ASM14735v1_genomic_CGC7  GCA_000147355.1_ASM14735v1_genomic
GCA_000148425.1_ASM14842v1_genomic_CGC1  GCA_000148425.1_ASM14842v1_genomic
GCA_000148425.1_ASM14842v1_genomic_CGC13  GCA_000148425.1_ASM14842v1_genomic
GCA_000148425.1_ASM14842v1_genomic_CGC4  GCA_000148425.1_ASM14842v1_genomic
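
In every row listed above, the CGC ID is the MAG name followed by a `_CGC<n>` suffix. Assuming that pattern holds for the full list (it is only observed here for the first 80 rows), the MAG and cluster number can be recovered from an ID as in this sketch. The function names are illustrative and not part of the database tooling.

```python
import re
from collections import Counter

_CGC_SUFFIX = re.compile(r"_CGC(\d+)$")

def split_cgc_id(cgc_id):
    """Split '<MAG>_CGC<n>' into (MAG name, cluster number)."""
    match = _CGC_SUFFIX.search(cgc_id)
    if match is None:
        raise ValueError(f"not a CGC ID: {cgc_id!r}")
    return cgc_id[:match.start()], int(match.group(1))

def cgcs_per_mag(cgc_ids):
    """Count how many of the given CGC IDs belong to each MAG."""
    return Counter(split_cgc_id(cgc_id)[0] for cgc_id in cgc_ids)

# IDs copied from the table above.
ids = [
    "BATS_SAMN07137085_METAG_DFMHGLCG_CGC10",
    "BATS_SAMN07137085_METAG_DFMHGLCG_CGC13",
    "GCA_000009785.1_ASM978v1_genomic_CGC9",
]
per_mag = cgcs_per_mag(ids)
```

Anchoring the pattern at the end of the string keeps MAG names that contain underscores or digits intact.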