Downloads
21 bulk tables · 1.6 GB total · CC-BY 4.0
Bulk tables hosted on Zenodo
All tables below are mirrored to a citable Zenodo record so downloads don't burn through this site's bandwidth and so the dataset gets a permanent DOI for the manuscript.
Record: https://zenodo.org/records/20219287
Each "download" link below resolves directly to that file on Zenodo.
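For scripted downloads, the file list can also be read from the record's metadata instead of clicking each link. The following is a minimal sketch, assuming Zenodo's REST API serves the record as JSON at `https://zenodo.org/api/records/20219287` and that each entry in its `files` array carries a `key` (filename) and a `links.self` download URL. The helper names `list_files` and `fetch_record` are hypothetical, and the JSON layout should be checked against the live response before relying on it.

```python
import json
import urllib.request

RECORD_ID = "20219287"  # taken from the record URL above
API_URL = f"https://zenodo.org/api/records/{RECORD_ID}"  # assumed API endpoint


def list_files(record: dict) -> list[tuple[str, str]]:
    """Return (filename, download_url) pairs from a record JSON dict.

    Assumes each entry in record["files"] has a "key" and a links["self"];
    entries missing either field are skipped.
    """
    pairs = []
    for entry in record.get("files", []):
        name = entry.get("key")
        url = entry.get("links", {}).get("self")
        if name and url:
            pairs.append((name, url))
    return pairs


def fetch_record(api_url: str = API_URL) -> dict:
    """Fetch the record metadata as a dict (requires network access)."""
    with urllib.request.urlopen(api_url, timeout=30) as resp:
        return json.load(resp)
```

Calling `list_files(fetch_record())` would then yield one pair per bulk table, which can be passed to any downloader.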
Primary tables 4 files
All CGC gene rows
MAG metadata
MAG → GTDB taxonomy
GOMC sample metadata
Sequence clustering / SSN 8 files
Leiden communities (r = 1.0, primary)
Leiden communities (r = 0.5)
Leiden communities (r = 4.0)
Protein → CGC map
Per-protein annotation
Protein functional category
Sequence-Colocalization Network (SCoNe) edges
SCoNe layout coordinates
Community-level summaries 2 files
Community representatives
InterPro architectures per community
Annotation references 4 files
CAZy family metadata
SulfAtlas family metadata
SulfAtlas curated EC / activity
EC → function map
Representatives & FASTAs 3 files
Representative-protein registry
Non-catalytic CGCs
NMPF (novel metagenomic protein family) catalog
Per-entity downloads
Each entity page exposes its own slice of the data. To bulk-collect family- or MAG-specific files, point a download tool (e.g. wget in recursive mode, or curl) at the matching directory:
| Scope | URL pattern |
|---|---|
| Per-family | https://mpcgcdb.com/<family>/{cgcs.txt, ranked_families.tsv} (e.g. GH13/cgcs.txt) |
| Per-MAG | https://mpcgcdb.com/genomes/<bin_id>/{cgcs.txt, proteins.tsv, ranked_families.tsv} |
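The URL patterns above can be expanded programmatically. This is a minimal sketch, assuming the per-family and per-MAG files live at exactly the paths shown. The function names are hypothetical, and whether the server permits scripted access is not confirmed here.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import quote, urlparse

BASE = "https://mpcgcdb.com"
FAMILY_FILES = ("cgcs.txt", "ranked_families.tsv")
MAG_FILES = ("cgcs.txt", "proteins.tsv", "ranked_families.tsv")


def family_urls(family: str) -> list[str]:
    """Expand the per-family pattern, e.g. family='GH13'."""
    return [f"{BASE}/{quote(family, safe='')}/{name}" for name in FAMILY_FILES]


def mag_urls(bin_id: str) -> list[str]:
    """Expand the per-MAG pattern for one bin identifier."""
    return [f"{BASE}/genomes/{quote(bin_id, safe='')}/{name}" for name in MAG_FILES]


def download(url: str, dest_root: str = "mpcgcdb") -> Path:
    """Save url under dest_root, mirroring the URL path so that files
    with the same name from different entities do not overwrite each other."""
    target = Path(dest_root) / urlparse(url).path.lstrip("/")
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=60) as resp, open(target, "wb") as out:
        shutil.copyfileobj(resp, out)
    return target
```

For example, `for u in family_urls("GH13"): download(u)` would place both GH13 files under `mpcgcdb/GH13/`.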
Protein sequences (FASTA) are not served per entity; the full proteome is available from the Global Ocean Microbiome Catalog (download the protein catalog from that page).
Filenames are stable across releases. Diffs between versions will be tracked at GitHub releases.