Basecamp Research, an AI lab focused on biological design, has unveiled the Trillion Gene Atlas, a scientific program that aims to generate and model genomic data at trillion-gene scale. Developed in collaboration with Anthropic, Ultima Genomics, and PacBio, and supported by NVIDIA’s AI infrastructure, the initiative seeks to expand known genetic diversity 100-fold. It plans to gather genomic information from over 100 million species across thousands of locations worldwide.
This effort builds on Basecamp Research’s expanding network of biodiversity partners worldwide. The long-term vision is to create a vast and diverse dataset that allows AI systems to learn from evolution and enable the on-demand design of new medicines.
Speaking at SXSW in Austin, Co-founder and CEO Glen Gowers noted that current biological AI models rely on a limited representation of Earth’s biodiversity. He explained that the Trillion Gene Atlas will dramatically expand the genetic landscape available for analysis, introducing a new era of programmable therapeutic design powered by large-scale data.
Comparable in scope to the Human Genome Project, the initiative was introduced during SXSW’s Health Track and at the NVIDIA GTC conference in San Jose.
Tackling the Biological Data Gap
Despite rapid growth in model size and computational capabilities, progress in AI-driven drug discovery has been constrained by limited data diversity. Most existing sequence-based models depend heavily on a small set of public databases, with a large portion trained on fewer than 250 million genetic sequences.
To address this, Basecamp Research introduced its EDEN foundation models earlier this year. These models are trained entirely on BaseData™, a proprietary genomic dataset that exceeds the size of all public repositories combined. By incorporating over 10 billion previously unknown genes from one million newly identified species, EDEN has revealed new scaling principles for AI in biology.
This expansion has enabled EDEN to move beyond prediction, allowing it to design therapeutics directly from disease prompts. In laboratory tests, the model demonstrated zero-shot functionality in human T-cells without relying on clinical or human-derived data. It has also produced promising results across multiple advanced applications, including AI-driven gene insertion and the creation of targeted antimicrobial peptides with high success rates.
The Trillion Gene Atlas builds on this foundation by significantly increasing both the scale and contextual richness of genomic data available for AI training.
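The scale figures quoted in this section can be cross-checked with simple arithmetic. The sketch below assumes the 100-fold target is measured against the roughly 10 billion genes attributed to BaseData; the article does not state the baseline explicitly, so that pairing and all of the names used here are illustrative.

```python
# Figures quoted in the article. Treating BaseData as the baseline for the
# 100-fold target is an assumption, not something the source states.
BASEDATA_GENES = 10_000_000_000        # "over 10 billion previously unknown genes"
TARGET_FOLD_INCREASE = 100             # "increase known genetic diversity by 100 times"
PUBLIC_MODEL_SEQUENCES = 250_000_000   # "fewer than 250 million genetic sequences"


def projected_gene_count(baseline: int, fold: int) -> int:
    """Return the gene count reached by scaling a baseline by a fold factor."""
    return baseline * fold


atlas_genes = projected_gene_count(BASEDATA_GENES, TARGET_FOLD_INCREASE)

# How many times larger the projected Atlas would be than the upper bound
# quoted for the training sets of many existing sequence models.
ratio_vs_public = atlas_genes // PUBLIC_MODEL_SEQUENCES

print(f"Projected Atlas size: {atlas_genes:,} genes")
print(f"Ratio to a 250-million-sequence training set: {ratio_vs_public:,}x")
```

Under that assumption, the 100-fold target lands on one trillion genes, which is consistent with the program's name.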
Expanding a Global Biodiversity Network
Over the past six years, Basecamp Research has established a network of scientific collaborators spanning 31 countries. This has enabled the development of a scalable genomics pipeline designed specifically for AI applications. Using innovative regulatory frameworks and off-grid DNA sequencing technologies, the company is able to collect high-quality genetic data from remote ecosystems often inaccessible to traditional labs.
These partnerships emphasize knowledge sharing, local capacity building, and fair access and benefit-sharing agreements aligned with emerging global standards. As part of the Atlas initiative, new collaborations have been announced in Chile and Argentina, along with expanded research efforts in Antarctica.
Advancing Sequencing and Computing Capabilities
The Trillion Gene Atlas is made possible by breakthroughs in high-throughput sequencing and accelerated computing. Partnerships with Ultima Genomics and PacBio enable large-scale sequencing, including highly accurate long-read data that preserves detailed genomic context.
Ultima’s latest sequencing platform, the UG200 Series, is designed for industrial-scale genome and multi-omics sequencing at lower costs, making projects like the Atlas feasible. Meanwhile, PacBio’s HiFi sequencing technology provides precise, information-rich data critical for training advanced biological AI systems.
NVIDIA’s computing infrastructure will power the processing of massive genomic datasets at the petabase level. Using tools such as NVIDIA Parabricks, Basecamp aims to dramatically accelerate metagenomic analysis. Tasks that would previously have taken more than two decades are now expected to finish in under two years through parallel processing, automation, and large-scale model training.
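The quoted timelines imply a lower bound on the overall speedup. A minimal sketch, treating "over two decades" as at least 20 years and "under two years" as at most 2 years; the function name is hypothetical and the result is only a bound derived from those two figures, not a measured benchmark.

```python
def min_speedup(old_years: float, new_years: float) -> float:
    """Return the speedup factor implied by an old and a new duration."""
    if old_years <= 0 or new_years <= 0:
        raise ValueError("durations must be positive")
    return old_years / new_years


# "over two decades" -> at least 20 years; "under two years" -> at most 2 years.
# The true ratio is therefore greater than this value.
speedup_lower_bound = min_speedup(20, 2)

print(f"Implied speedup: more than {speedup_lower_bound:.0f}x")
```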
Toward End-to-End AI-Driven Therapeutic Design
Anthropic is contributing to the initiative by integrating its AI system, Claude, with scientific platforms. The goal is to combine Claude’s reasoning capabilities with EDEN’s therapeutic design functions and NVIDIA’s data processing tools to create a seamless workflow—from interpreting complex biological data to generating targeted treatments.
Built on three core pillars—large-scale DNA sequencing, global data partnerships, and advanced computing—the Trillion Gene Atlas represents a major step toward transforming how biological data is used. By expanding evolutionary datasets 100-fold, Basecamp Research aims to accelerate drug discovery, improve precision in therapeutic design, and extend advances in areas such as gene therapy and antimicrobial resistance.