Generative AI is starting to design the physical world, not just describe it. NVIDIA’s new ALCHEMI platform for chemistry and materials and the Evo family of biological foundation models both aim at the same frontier: turning statistical patterns in data into new molecules, materials, and proteins that can be built and tested in the real world. These emerging matter engines, built on large chemical and biological models, mark a subtle but consequential shift—from tools that talk about science to systems that participate in it.
That shift is still early and bounded by data quality, simulation limits, and wet-lab bottlenecks. But the combination of domain-tuned architectures, high-performance compute, and maturing governance debates suggests that large chemical and biological models are moving from proof-of-concept into the fabric of R&D.
Why Generative AI Is Moving From Words to Molecules
The generative wave started with models that operate on digital symbols: GPT-style language models that predict the next token, and image generators that compose pixels. These systems normalized the idea that, given enough data and parameters, a model can internalize a domain’s statistics well enough to produce convincing new samples. Yet for all their impact, these models mainly changed information workflows—how we write, code, or illustrate.
Matter engines take the next step: instead of manipulating symbols, they propose candidates for physical synthesis in chemistry and biology. ALCHEMI and Evo are the clearest examples of this new category. ALCHEMI is positioned by NVIDIA as a stack of GPU-accelerated microservices and models for chemistry and materials discovery, tying generative design directly into molecular simulations and property predictors (see NVIDIA’s technical overview of ALCHEMI). Evo, by contrast, is a large biological model trained on bacterial and other genomes, built to model and generate DNA and protein sequences with functional consequences in cells (as described in coverage of Evo’s debut).
From Chatbots to Catalysts and CRISPR Design
Earlier foundation models mostly mapped from one digital representation to another: text to text, text to image, code to code. Their mistakes, while consequential, remained symbolic. A badly worded email or incorrect code snippet is embarrassing or costly, but not intrinsically a new physical object.
ALCHEMI and Evo close that gap. In an ALCHEMI workflow, a generative model might propose new catalyst structures, electrolyte molecules, or crystal lattices. Those candidates are then passed through GPU-accelerated simulations—such as density functional theory (DFT) surrogates or molecular dynamics approximations—to estimate properties like stability, conductivity, or adsorption energy. The top-ranked designs can then move into experimental synthesis. In Evo’s case, the outputs are nucleotide or amino acid sequences: candidate proteins, CRISPR components, or genes that can be synthesized and expressed in cells, with direct biological effects.
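The generate–simulate–rank loop described above can be sketched in a few lines. This is a toy illustration, not the ALCHEMI API: `propose_candidates` and `surrogate_energy` are hypothetical stand-ins for a generative model and a GPU-accelerated property surrogate, and the descriptors are made up.

```python
import random

def propose_candidates(n, seed=0):
    """Stand-in for a generative chemistry model: emits hypothetical
    candidates with made-up structural descriptors."""
    rng = random.Random(seed)
    return [{"id": f"cand-{i}",
             "bond_strain": rng.uniform(0, 1),
             "polarity": rng.uniform(0, 1)} for i in range(n)]

def surrogate_energy(c):
    """Stand-in for a fast property surrogate (e.g. an ML interatomic
    potential); lower is better in this toy scoring."""
    return 2.0 * c["bond_strain"] - c["polarity"]

def screen(n_candidates, top_k):
    """Generate -> simulate -> rank; only the top_k designs would move
    on to experimental synthesis."""
    scored = sorted(propose_candidates(n_candidates), key=surrogate_energy)
    return scored[:top_k]

shortlist = screen(1000, 5)
```

The real systems differ in scale (batched GPU evaluation of many candidates at once) but follow the same propose-evaluate-rank shape.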
As outputs become synthesizable, error modes become more serious. A hallucinated paragraph is one thing; a hallucinated protein that misfolds or interacts unpredictably with human biology is another. That is why both lines of work are deeply entangled with simulation, validation, and safety disciplines rather than existing as free-floating generative toys.
How Hardware, Simulators, and Biological Data Converge
The timing of this shift is not accidental. NVIDIA has spent years building GPU-optimized libraries for quantum chemistry, molecular dynamics, and graph neural networks, and ALCHEMI packages these into a more coherent stack. The platform wraps property predictors, geometry relaxation tools, and ML-based interatomic potentials into microservices that can be composed into generate–simulate–refine loops, exploiting the parallel nature of GPUs to evaluate many candidates quickly (see NVIDIA’s materials discovery case studies).
On the biology side, sequencing costs have fallen enough that datasets of millions of genomes—especially bacterial and phage genomes—are now available. Evo models are trained on trillions of DNA bases, letting them observe how evolution has explored sequence space and what tends to work or fail functionally (as outlined in Ars Technica’s Evo report). Architectures such as StripedHyena, which blend convolutional and attention-like mechanisms, give these models the context length needed to process large genomic regions while keeping compute manageable.
For both chemistry and biology, the key infrastructure story is the same: abundant data, specialized simulators, and scalable compute form a tripod. Without any one of these legs, generative models would either lack grounding, be too slow to evaluate, or be unable to ingest enough structure to generalize.
Inside NVIDIA ALCHEMI: A Generative Stack for Chemistry and Materials
ALCHEMI is best understood not as a single model but as a platform. It combines pretrained property predictors, relaxation engines, and ML surrogate models with orchestrated workflows that sit on top of NVIDIA’s GPU cloud. In short, it is a chemistry-focused matter engine that both proposes and evaluates new molecules and materials. Developers access these services via standardized APIs, allowing them to plug generative design loops into existing computational chemistry pipelines.
How ALCHEMI Accelerates Exploration of Chemical Space
Chemical and materials discovery is fundamentally a search problem in a vast combinatorial space. Even modest organic molecules can be arranged in astronomically many configurations; crystalline materials and alloys multiply that further. Historically, researchers have narrowed this space using human intuition, simple heuristics, and limited simulation capacity.
ALCHEMI tackles this by pairing generative models with accelerated evaluation. A model can propose new molecules or structures based on learned patterns from prior data. These candidates feed into GPU-accelerated simulations—such as batched geometry relaxation or ML interatomic potentials—that quickly approximate energetics and properties. Instead of brute-forcing every plausible variant, the system focuses on more promising regions, using iterative feedback from simulation results to steer the generative process.
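The "iterative feedback" part of this loop can be illustrated with a minimal evolutionary search, assuming none of the real platform's machinery: `toy_property` stands in for a batched simulation, `perturb` for a generator proposing variants near known-good parents, and the one-dimensional "descriptor" is purely illustrative.

```python
import random

rng = random.Random(42)

def toy_property(x):
    """Stand-in for a batched simulation scoring one candidate descriptor.
    The peak at x = 0.7 mimics a sweet spot in chemical space."""
    return -(x - 0.7) ** 2

def perturb(x):
    """Stand-in generator step: propose a variant near a good parent."""
    return min(1.0, max(0.0, x + rng.gauss(0, 0.05)))

def steer(rounds=20, population=32, keep=4):
    """Score the pool, keep the best parents, and regenerate the rest
    around them, so simulation feedback steers the proposals."""
    pool = [rng.random() for _ in range(population)]
    for _ in range(rounds):
        pool.sort(key=toy_property, reverse=True)
        parents = pool[:keep]
        pool = parents + [perturb(rng.choice(parents))
                          for _ in range(population - keep)]
    return max(pool, key=toy_property)

best = steer()
```

The search concentrates evaluations near promising regions instead of brute-forcing the whole space, which is the essential economy of the approach.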
Early industrial partners report screening at scales that were previously out of reach on practical timescales, mapping hundreds of thousands of molecules in hours and exploring candidate spaces in the tens of millions for materials like battery electrolytes and catalysts (see NVIDIA case studies on battery materials and OLED compounds). These are still simulations, not experiments, but they narrow down what needs to be synthesized.
How ALCHEMI Blends Physics-Based and Data-Driven Models
A recurring criticism of purely data-driven chemistry models is that they can propose molecules that are formally valid but physically implausible or synthetically unreachable. ALCHEMI explicitly leans on hybrid workflows to mitigate this. Physics-based solvers—DFT, classical molecular dynamics, continuum models—are plugged into the loop as constraints and reality checks. Machine-learning surrogates stand in for the heaviest parts of those solvers where appropriate, trading a bit of accuracy for massive speed gains.
The philosophy is not “let the model dream up anything” but “let the model propose within guardrails informed by physics.” This hybrid approach also helps address out-of-distribution issues: when a candidate strays into chemical territory poorly covered by training data, a more first-principles solver can offer an independent assessment, flagging surprising or suspicious results for further scrutiny.
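One simple way to operationalize that independent check is to spot-compare a cheap surrogate against a trusted (slower) reference solver and flag large disagreements. The sketch below is a toy under stated assumptions: `fast_surrogate` and `reference_solver` are invented functions, with the surrogate accurate only near a hypothetical training range of x in [0, 1].

```python
def fast_surrogate(x):
    """Cheap ML stand-in: a linear fit that is only accurate near its
    hypothetical training range (x in [0, 1])."""
    return 1.8 * x + 0.1

def reference_solver(x):
    """Expensive physics stand-in (e.g. a DFT-level check): trusted over
    a wider range, but too slow to run on every candidate."""
    return 2 * x - 0.2 * x ** 3

def flag_out_of_distribution(candidates, tolerance=0.25):
    """Spot-check surrogate predictions against the reference solver and
    flag candidates where the two disagree beyond tolerance."""
    return [x for x in candidates
            if abs(fast_surrogate(x) - reference_solver(x)) > tolerance]

suspects = flag_out_of_distribution([0.1, 0.5, 0.9, 1.5, 2.0])
```

In-range candidates pass quietly; the two out-of-range ones are routed to further scrutiny, which is exactly the guardrail role physics-based solvers play in hybrid loops.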
Early ALCHEMI Use Cases: Catalysts, Batteries, and More
The application domains highlighted so far align with large economic and climate stakes. Catalyst design, for example, underpins industrial processes from fertilizer production to emissions control; small improvements in activity or selectivity can translate into substantial energy savings. Battery research—especially solid-state electrolytes and novel cathode materials—is a natural fit for ALCHEMI-style workflows, since safety, lifetime, and performance all depend on subtle materials properties.
Electronics and display companies are turning to AI-guided materials search for new organic emitters, substrates, and encapsulants that balance efficiency, stability, and manufacturability. More speculative but plausible extensions include lightweight alloys for aerospace, high-temperature composites, and polymers tailored for recyclability. In each case, the aim is not to autonomously discover miracle materials overnight, but to meaningfully compress the exploratory phase.
Evo and the Rise of Large Biological Foundation Models
In genomics and protein science, Evo represents a parallel move toward domain-specific foundation models. Rather than predicting the next word, Evo predicts and generates biological sequences, using evolution’s record as its training corpus and extending the matter engine concept from chemistry into biology.
How Evo Is Trained on Bacterial Genomes
Early Evo models were trained primarily on bacterial and phage genomes, chosen for their diversity and compactness. Bacteria evolve quickly, and their genomes encode dense functional information: enzymes, regulatory elements, defense systems, and more. This gives a model many examples of how sequence changes translate into fitness consequences.
With on the order of billions of parameters, Evo can model long-range dependencies across genes and operons, capturing how combinations of motifs contribute to function. Subsequent versions have expanded training data to include genomes from multiple domains of life and paired DNA–protein information, enabling cross-modal reasoning about how genomic variations propagate through transcription and translation.
How Evo Generates Novel Proteins and CRISPR Systems
Where earlier biological models focused on tasks like structure prediction or variant effect scoring, Evo pushes into generative design. Reports describe the model generating de novo proteins and CRISPR-associated components that do not match known natural sequences but are predicted—and in some cases experimentally validated—to perform desired functions (see discussion in Ars Technica’s coverage).
This is qualitatively different from annotating existing genomes. Evo treats the space of possible sequences as a design canvas: given a target property or functional description, it can propose candidates that evolution has not yet explored, or that have not been cataloged. For CRISPR systems, this could mean nucleases with different PAM compatibilities or off-target profiles; for enzymes, it might mean improved stability, altered substrate specificity, or novel catalytic activities.
From Sequence Prediction to Functional Biological Design
The conceptual leap here mirrors the trajectory in language models. Once models were good at predicting the next token, they could be prompted to produce structured, task-conditioned outputs. In biology, once models can accurately predict which mutations are tolerated and which break function, they can be inverted: instead of scoring random variants, they can search for sequences that maximize a desired score.
Evo is part of that inversion. It moves beyond predicting protein structures from sequences, as systems like AlphaFold do, toward generating sequences that are likely to express, fold, and act in particular ways. Genomes become not just a static reference but a generative space that researchers can navigate. How reliably that navigation translates into lab success is still being mapped, but the underlying direction is clear.
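The inversion described above can be sketched as a toy search loop: given any scoring model, repeatedly apply the single point mutation that most improves the score. Everything here is hypothetical: `score` is a stand-in for a learned fitness model (it secretly counts matches to an arbitrary "optimal" sequence), and the greedy strategy is the simplest possible inversion, not Evo's actual sampling procedure.

```python
import itertools

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
TARGET = "MKVLAT"  # toy hidden optimum standing in for a functional design

def score(seq):
    """Stand-in for a learned fitness model: counts positions matching
    the hidden optimum. A real model would predict expression/stability."""
    return sum(a == b for a, b in zip(seq, TARGET))

def invert(start, max_rounds=10):
    """Greedy inversion: instead of merely scoring given variants, search
    for the sequence that maximizes the score, one mutation at a time."""
    current = start
    for _ in range(max_rounds):
        best = current
        for pos, aa in itertools.product(range(len(current)), ALPHABET):
            variant = current[:pos] + aa + current[pos + 1:]
            if score(variant) > score(best):
                best = variant
        if best == current:  # no single mutation helps; stop
            break
        current = best
    return current

designed = invert("AAAAAA")
```

Real generative design replaces greedy mutation with learned sampling, but the conceptual move is the same: turn a predictor into a search direction.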
How Large Chemical and Biological Models Change R&D Practice
The practical consequence of these matter engines is a rebalancing of effort in the design–make–test cycle. Instead of spending most time on generating and synthesizing candidates, researchers can shift more effort toward interpreting a smaller, higher-yield experimental set.
How Generative Models Collapse the Design–Make–Test Cycle
Generative models enable far larger virtual screens than any physical lab could handle. In an ALCHEMI workflow, billions of hypothetical molecules might be sketched and quickly filtered; only the most promising few hundred move forward for synthesis. In an Evo-enabled lab, thousands of candidate protein sequences can be generated in silico, triaged by predicted stability or activity, and then synthesized in focused libraries.
This compression does not eliminate the need for experiments—simulations and predictions can be wrong or incomplete—but it changes their role. Experiments become a calibration and discovery step rather than the primary means of exploring the space. Over time, results from these focused experiments can be fed back into the models, tightening the loop.
In practice, today’s matter engines remain advisory systems whose value depends on expert-guided interpretation and careful experimental design. Early users describe time savings measured in months shaved off exploratory campaigns, as well as reduced reagent costs when high-throughput wet-lab screening is replaced by narrower, higher-yield experiment sets.
From Heuristics to Model-Driven Hypothesis Generation
Historically, chemists and molecular biologists have relied heavily on heuristics: functional groups that “usually” confer solubility, motifs that “often” bind a receptor, sequence patterns known to stabilize a fold. These rules of thumb are powerful but limited. Models trained across millions of compounds or sequences can identify patterns beyond human intuition, surfacing unconventional scaffolds or sequence motifs that correlate with desired properties.
This does not make experts obsolete; it shifts their role. Instead of manually enumerating possibilities, researchers increasingly curate, challenge, and refine model-suggested hypotheses. When an AI system proposes a non-intuitive catalyst architecture or a surprising protein loop, the scientific task becomes understanding why it might work and how to test it safely.
How Matter Engines Lower Barriers for Labs and Startups
One of the underappreciated implications of these platforms is democratization. Large pharmaceutical and materials companies have long run their own supercomputing clusters and simulation groups. ALCHEMI, as a cloud-exposed stack, and Evo-like models, as open or licensable weights, lower the entry barrier for smaller players.
Startups and academic labs can access property predictors, generative chemistry models, or biological foundation models without assembling massive HPC teams. They still need domain expertise to design meaningful experiments and interpret outputs, but the baseline tooling becomes more accessible. That, in turn, could redistribute where breakthrough discoveries originate, amplifying ecosystems built around shared platforms rather than standalone giants.
Limits of Today’s Chemical and Biological Matter Engines
Despite the momentum, large chemical and biological models remain constrained by core scientific and infrastructural limits.
Data Quality, Coverage, and Bias in Generative Matter Models
Generative performance is only as good as the training and calibration data. Chemistry datasets overrepresent certain functional groups, solvents, and conditions; genomic datasets reflect sampling biases in which organisms are sequenced and how thoroughly their functions are annotated. In both cases, rare chemistries or motifs are underexplored.
Models trained on such data can hallucinate: they may output chemically valid but nonfunctional molecules, or proteins that look plausible but fail to express or fold. Worse, they may systematically underperform for underrepresented classes of targets, reproducing and potentially amplifying gaps in scientific attention. Careful dataset curation and active-learning loops that deliberately probe uncertain regions of space are necessary to counteract these tendencies.
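One concrete form of that active-learning loop is uncertainty sampling with an ensemble: spend the experimental budget where member models disagree most. The sketch below assumes an invented setup (a hidden quadratic ground truth, linear ensemble members "trained" on a narrow window); `uncertainty` and `pick_queries` are illustrative names, not a real library API.

```python
import random
import statistics

rng = random.Random(7)

def make_member():
    """Toy ensemble member: a noisy linear fit. Members roughly agree
    inside a hypothetical training window and diverge outside it."""
    slope = 0.5 + rng.gauss(0, 0.05)
    intercept = rng.gauss(0, 0.02)
    return lambda x: slope * x + intercept

ensemble = [make_member() for _ in range(8)]

def uncertainty(x):
    """Ensemble disagreement (std dev) as a cheap uncertainty proxy."""
    return statistics.pstdev(m(x) for m in ensemble)

def pick_queries(pool, budget):
    """Active-learning step: spend the experimental budget on the
    candidates the ensemble is least sure about."""
    return sorted(pool, key=uncertainty, reverse=True)[:budget]

pool = [0.2, 0.5, 1.0, 3.0, 6.0]
queries = pick_queries(pool, 2)
```

The far-from-training candidates win the budget, which is how such loops deliberately probe the underexplored regions noted above.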
Bridging the Simulation–Reality Gap in Lab Results
Physics-based simulations and ML surrogates are approximations. Quantum chemistry methods make trade-offs between accuracy and tractability; force fields used in molecular dynamics may not capture all relevant interactions; cellular environments are vastly more complex than test tubes. A candidate that looks promising in silico can fail in practice due to solubility issues, toxicity, off-target effects, or emergent behavior in complex systems.
This simulation–reality gap is especially pronounced in biology, where emergent properties, context-dependent regulation, and host interactions can invalidate sequence-level predictions. Evo’s success in zero-shot variant effect prediction is promising, but every new domain—new cell types, organisms, disease contexts—poses fresh generalization tests. Wet-lab validation remains non-negotiable.
Compute, Infrastructure, and Expertise for Running Matter Engines
Even as models become more efficient, they still demand substantial compute and MLOps sophistication. Running an ALCHEMI-scale workflow or fine-tuning an Evo-like model is not something a single laptop can handle; organizations need access to GPUs, storage for large datasets, and tooling for experiment tracking and deployment.
Equally important is human expertise. Interpreting model outputs, understanding uncertainty estimates, and embedding AI into experimental design all require teams that straddle machine learning, domain science, and software engineering. The limiting factor for many organizations will not be API access but talent and process.
Strategic Implications for Pharma, Biotech, and Materials
As these models embed in practice, they are likely to reshape how R&D is organized, how intellectual property is defined, and how platform advantages accrue.
Rethinking R&D Pipelines and Portfolio Strategy with Matter Engines
Pharma and materials companies are beginning to reposition generative AI near the front of their pipelines. Instead of starting with a narrow set of leads from traditional screening, they can run AI-guided exploration to identify a broader range of candidates, then use downstream assays to prioritize and de-risk. This can change which targets look tractable and how quickly programs are advanced or stopped.
Over time, firms that build robust generate–simulate–test–learn loops could cycle through ideas more rapidly, reallocating capital based on earlier and richer information. That might reduce the fraction of R&D spend wasted on dead ends, but it could also encourage a more experimental, portfolio-style approach to therapeutic and materials innovation.
Navigating IP for AI-Designed Molecules and Proteins
Generative design also complicates intellectual property. Patent systems were not built with AI co-inventors in mind. When a novel protein or material emerges from an Evo-like or ALCHEMI-based workflow trained partly on public data, questions arise: who is the inventor, and is the result considered obvious given the prior art? How much transparency about the model and training data is needed in a patent disclosure?
Courts and patent offices are only beginning to grapple with these issues, and answers will likely vary by jurisdiction. Companies are already experimenting with different strategies: treating AI as a tool and emphasizing the human’s role in framing the problem and validating outputs, or keeping more of the model details as trade secrets rather than seeking broad patents that might be vulnerable to obviousness challenges.
Data Network Effects and Platform Lock-In in Matter Engines
Platforms like ALCHEMI and Evo exhibit strong data network effects. Each cycle of use generates new experimental results that can be fed back into the model, improving performance on the specific types of problems an organization cares about. Firms that integrate these loops deeply and early, especially with proprietary assay data, may see their models diverge in capability from competitors using only public data.
This creates a strategic tension: the same openness that accelerates early adoption—open-source models, shared benchmarks, community datasets—can give way to more closed, proprietary fine-tuning at the top end. Cloud providers and tool vendors may become gatekeepers for high-performance design capabilities, raising familiar questions about concentration and access that we have already seen in general-purpose AI.
Safety, Ethics, and Governance for Generative Matter Engines
As models that design physical and biological artifacts mature, governance questions move from abstract to concrete.
Dual-Use Risks in Protein and Genome Design with Evo-like Models
Large biological models carry obvious dual-use risks. A model that can propose novel CRISPR systems or enzymes for therapeutic gene editing could, in principle, be misdirected toward more harmful applications, such as enhancing pathogenicity or evading existing countermeasures. Current published work emphasizes beneficial uses and includes limited, controlled experiments, but the same underlying capability space is there.
It is important not to overstate today’s risk: designing a dangerous pathogen that is also stable, transmissible, and effective remains extraordinarily challenging and requires significant tacit expertise and infrastructure. Still, as tools like Evo lower some of the conceptual and design barriers, biosecurity experts argue for proactive safeguards, both technical and policy-based, rather than waiting for misuse incidents to force a response.
Guardrails for Access, Evaluation, and Sharing of Matter Models
Developers and cloud providers are beginning to explore guardrails suited to generative matter. These may include:
- Tiered access controls, where the most capable models or sensitive capabilities are available only to vetted institutions under oversight.
- Safety filters and classifiers that scan generated sequences or molecules for known red flags, such as motifs associated with toxins or restricted chemicals.
- Red-teaming programs, where independent experts probe models for dangerous failure modes and share findings under responsible disclosure norms.
None of these mechanisms is foolproof, but together they can help detect obvious abuses and shape norms around responsible use.
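The safety-filter idea in the list above reduces, at its simplest, to screening generated sequences against a blocklist. The sketch below is deliberately naive and the motifs are invented for illustration; real screens match against curated databases of sequences of concern, handle fuzzy matches, and check translated reading frames.

```python
# Illustrative blocklist only; real screens use curated databases of
# sequences of concern, not short made-up motifs like these.
RED_FLAG_MOTIFS = {
    "ATGCGTACGT": "example toxin-associated motif (hypothetical)",
    "GGCCTTAAGG": "example restricted-agent motif (hypothetical)",
}

def screen_sequence(seq):
    """Return (passed, hits): hits lists any blocklisted motif found.
    Production classifiers also do fuzzy and translated-frame matching."""
    hits = [reason for motif, reason in RED_FLAG_MOTIFS.items()
            if motif in seq]
    return (len(hits) == 0, hits)

ok, hits = screen_sequence("TTATGCGTACGTAA")
```

Exact-match filters are easy to evade, which is why they are one layer among several rather than a standalone defense.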
Emerging Regulatory and Policy Responses
Regulators are watching these trends, though formal frameworks are still emerging. Biosafety and chemical safety guidelines may be updated to explicitly address AI-assisted design workflows, including expectations for record-keeping, risk assessment, and oversight when generative models are used to propose new constructs.
Policy discussions also encompass publication norms: whether to release full model weights, only restricted APIs, or tiered versions with capability controls. There is growing interest in licensing regimes for certain classes of biological design tools, depending on the scope and sensitivity of their outputs. Striking a balance between open science, innovation, and security will be an ongoing policy challenge.
How Researchers and Companies Can Use Matter Engines Now
For practitioners, the question is less whether these models will matter and more how to engage with them pragmatically during this formative phase.
Choosing Off-the-Shelf vs Custom Matter Engines
Organizations face a familiar choice: use off-the-shelf platforms such as ALCHEMI and public Evo-style models, or invest in custom training and fine-tuning. Off-the-shelf tools offer lower upfront cost and faster experimentation but may underperform on niche targets or proprietary chemistry and biology spaces.
Custom approaches, where base models are fine-tuned on internal data—assay results, proprietary compound libraries, in-house genomic studies—promise better alignment with specific pipelines and potentially defensible IP advantages. The trade-offs revolve around compute budgets, data volume and quality, risk tolerance, and whether an organization wants to own its core models or rely on external vendors.
For labs just starting with matter engines, a practical sequence is to identify one or two high-impact use cases, run a limited pilot with an off-the-shelf platform, audit internal data for fine-tuning potential, and then decide whether to scale into custom models based on early signal.
Building Hybrid Workflows Around Existing Lab Infrastructure
For most labs, the practical route is incremental integration. Rather than replacing existing experimental workflows, generative models can be introduced as upstream filters and hypothesis generators. A typical pattern might be: use a generative chemistry model to propose candidates, run them through ALCHEMI-style property predictors, select a manageable subset, and then feed experimental results back into the model as additional training signals.
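That propose–predict–select–feed-back pattern can be sketched as a single design–make–test cycle with a recalibration step. All names here are hypothetical: `predict` is a toy property predictor with a correctable systematic bias, and `lab_measurement` stands in for the wet-lab result.

```python
def predict(x, bias=0.0):
    """Toy property predictor with a correctable systematic bias."""
    return 1.5 * x + bias

def run_cycle(candidates, measure, bias=0.0, batch=3):
    """One design-make-test loop: rank by prediction, 'measure' the top
    batch experimentally, and recalibrate the predictor's bias from the
    observed errors before the next cycle."""
    ranked = sorted(candidates, key=lambda x: predict(x, bias), reverse=True)
    tested = ranked[:batch]
    errors = [measure(x) - predict(x, bias) for x in tested]
    new_bias = bias + sum(errors) / len(errors)
    return tested, new_bias

def lab_measurement(x):
    return 1.5 * x + 0.4  # ground truth: the predictor is off by +0.4

tested, bias = run_cycle([0.1, 0.9, 0.4, 0.7, 0.2], lab_measurement)
```

Real loops retrain full models rather than a scalar offset, but the role of the experiment is the same: a calibration signal that tightens the next round of predictions.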
In biological settings, Evo-like models can prioritize variants for saturation mutagenesis, suggest new domain combinations in modular proteins, or score regulatory sequence edits before they are synthesized. Throughout, existing instrumentation—high-throughput screening robots, sequencing platforms, imaging systems—remains central; the AI layer simply changes which experiments are run and how results are interpreted.
Skills and Collaborations Needed for Matter-Engine R&D
The human side of this transition should not be underestimated. Demand is rising for scientists who are bilingual in machine learning and the physical or biological sciences: computational chemists comfortable with deep learning, bioinformaticians who can interpret generative model outputs, and software engineers who understand lab workflows.
Cross-institutional collaborations will likely play a central role. Tech companies bring infrastructure and model-building expertise; academic labs contribute domain knowledge and cutting-edge assays; industry partners provide real-world problem definitions and scale. Institutions that invest early in these bridges will be better positioned as generative matter tools mature.
Long-Term Outlook: Toward Programmable Matter and Biology
Looking further ahead, ALCHEMI, Evo, and similar systems point toward more autonomous, closed-loop scientific workflows. As robotics and lab automation advance, it is not hard to imagine systems where models propose candidates, simulations narrow them, robots perform synthesis and testing, and results automatically feed back into model updates.
In such an environment, the tempo of discovery could accelerate, but so could the volume of marginal or uninteresting designs. One key measure of success will be whether these tools enable qualitatively new capabilities rather than just more of the same—therapies for diseases that previously resisted intervention, materials with performance regimes that unlock new industries, biological tools that make gene editing safer and more precise.
Over the longer term, as regulatory frameworks settle and generative models become more deeply embedded in lab practice, we should expect a world where novel molecules and proteins are cheap to propose but still costly to validate and deploy. The bottleneck may shift from imagination to evaluation and governance.
A realistic forecast is that in the coming years, large chemical and biological models will become standard components of early-stage discovery pipelines in leading pharma, biotech, and materials firms, with measurable but uneven productivity gains. As experience accumulates and simulation–reality gaps are better characterized, wider adoption will follow across mid-size companies and advanced academic labs. Beyond that first wave, assuming no major safety incidents derail progress, these matter engines are likely to underwrite an era in which programmable matter and biology move from slogan to incremental reality—transformative not in a single leap, but through many compounding, model-guided experiments.

