Illustration by Ryan Snook

In faded photographs from the 1960s, organic-chemistry laboratories look like an alchemist's paradise. Bottles of reagents line the shelves; glassware blooms from racks of wooden pegs; and scientists stoop over the bench as they busily build molecules.

Fast-forward 50 years, and the scene has changed substantially. A lab in 2014 boasts a battery of fume cupboards and analytical instruments — and no one is smoking a pipe. But the essence of what researchers are doing is the same. Organic chemists typically plan their work on paper, sketching hexagons and carbon chains on page after page as they think through the sequence of reactions they will need to make a given molecule. Then they try to follow that sequence by hand — painstakingly mixing, filtering and distilling, stitching together molecules as if they were embroidering quilts.

But a growing band of chemists is now trying to free the field from its artisanal roots by creating a device with the ability to fabricate any organic molecule automatically. "I would consider it entirely feasible to build a synthesis machine which could make any one of a billion defined small molecules on demand," declares Richard Whitby, a chemist at the University of Southampton, UK.


Mark Peplow discusses chemists' quest to create a machine that can synthesize any organic compound


True, even a menu of one billion compounds would encompass just an infinitesimal fraction of the estimated 10^60 moderately sized carbon-based molecules that could possibly exist. But it would still be at least ten times the number of organic molecules that have ever been synthesized by humans. Such a device could thus offer an astonishing diversity of compounds for investigation by researchers developing drugs, agrochemicals or materials.

"A synthesis machine would be transformational," says Tim Jamison, a chemist at the Massachusetts Institute of Technology (MIT) in Cambridge. "I can see challenges in every single area," he adds, "but I don't think it's impossible".

A British project called Dial-a-Molecule is laying the groundwork. Led by Whitby, the £700,000 (US$1.2-million) project began in 2010 and currently runs until May 2015. So far, it has mostly focused on working out what components the machine would need, and building a collaboration of more than 450 researchers and 60 companies to help work on the idea. The hope, says Whitby, is that this launchpad will help team members to attract the long-term support they need to achieve the vision.

Even if these efforts fall short, say project members, early work towards a synthesis machine could still transform chemistry. It could deliver a host of reactions that work as continuous processes, rather than one step at a time; algorithms that can predict the best way to knit a molecule together; and important advances in how computers tap vast storehouses of data about the reactivity and other properties of chemicals. Perhaps most importantly, it could trigger a cultural sea change by encouraging chemists to record and share many more data about the reactions they run every day.

Some reckon it would take decades to develop an automated chemist as adept as a human — but a less capable, although still useful, device could be a lot closer. "With adequate funding, five years and we're done," says Bartosz Grzybowski, a chemist at Northwestern University in Evanston, Illinois, who has ambitious plans for a synthesis machine of his own.

Electric dreams

If chemists are to have any hope of building their dream device, they must pull together three key capabilities. First, the machine must be able to access a database of existing knowledge about how molecules can be built — which reactions create bonds between carbon atoms, for example, or whether using certain reagents to construct one part of a molecule risks damaging other parts. Second, it must be able to feed this knowledge into an algorithm that can map out synthetic steps, in much the same way that a master chess player plans a series of moves to win a game. And finally, it must be able to automatically carry out that sequence using real reagents inside a robotic reactor.

Novartis-MIT Center for Continuous Manufacturing, MIT

Set-up of the MIT Integrated Continuous Manufacturing Process

The technology for that last step has progressed the farthest. Many labs already own dedicated machines for churning out strands of DNA or polypeptides, and in the past decade, adaptable robot chemists have become increasingly important in commercial pharmaceutical research. But existing machines have limited capabilities: a DNA or protein sequence builder is typically able to combine only a handful of molecular building blocks using fewer than half a dozen reactions. More versatile synthesis workstations are too expensive for most academic groups — costing from £30,000 to more than £500,000 — and still tend to produce molecules with a narrow range of chemical properties.

These workstations also do most of their reactions in the same batch-by-batch manner as humans. But some chemists are trying to develop continuous-flow synthesis, in which reactions occur as the chemicals move through the machine. This can improve speed and yields, and is a lot more amenable to automation.


Jamison, for example, is working on flow chemistry at the Novartis–MIT Center for Continuous Manufacturing in Cambridge, and he is part of a team that last year reported the first end-to-end, completely continuous synthesis and formulation of a pharmaceutical (ref. 1): aliskiren hemifumarate, a treatment for high blood pressure. Jamison and his colleagues built a machine (now dismantled) that was more than 7 metres long, and about 2.5 metres high and deep. "It took four years of 'everything that can go wrong, will go wrong'," says Bernhardt Trout, head of the MIT centre and leader of the project. After a lot of trial and error, he says, the researchers got to the point at which they merely had to flip the switch and feed in fresh drums of solvent and raw materials. The machine would hum like a large air-conditioning unit as stirrers whipped up chemicals, pumps whirred, filtration units dripped and squeezed, and a screw conveyer pushed solids through a 2-metre drying tube to be injection-moulded. Finally, after 14 operations and 47 hours, finished tablets dropped down a chute. Batch synthesis would have required 21 operations over 300 hours.

Jamison reckons that there is enormous potential for reactions to be adapted to continuous flow: "I think that it will be well over 50% eventually, maybe even 75%" of all reactions, he says. Progress is accelerating, he adds, because fixing a problem in one step — solids clogging a pipe, say — can offer immediate improvements to other processes.

A chemical brain

Although automated machines are growing more versatile, teaching a computer to devise its own synthesis remains a massive problem, says Yuichi Tateno, an automation researcher at pharmaceutical company GlaxoSmithKline in Stevenage, UK, and a member of the Dial-a-Molecule collaboration. "The hardware has always been there, but the software and data have let it down," he says.

Human chemists planning a synthesis tend to use a technique called retrosynthetic analysis. They draw the finished molecule and then pick it apart, erasing bonds that would be easy to form and leaving fragments of molecule that are stable or readily available. This allows them to identify the chemical jigsaw pieces they need as their raw materials, and to devise a strategy for connecting the pieces in the lab. If need be, they can seek inspiration from a commercial database such as SciFinder — an interface to the American Chemical Society's Chemical Abstracts Service — or its main rival Reaxys, offered by publishing giant Elsevier. Entering a molecular structure or a reaction into these databases yields examples in the literature. But even with online help, says Tateno, humans often fail at synthesis. "With the amount of chemistry that's out there, there's nobody who can know it all."

The hope is that a synthesis machine could one day do much better, says Whitby, not least because computers are so much faster at scanning through terabytes of chemical data to find a specific reaction. The bigger challenge, he adds, is that computers have a much harder time figuring out whether that reaction will actually work in a synthesis, particularly if the target has never been made before.

Bartosz Grzybowski

Chematica is a program that looks for synthetic pathways leading from starter chemicals (red), through sequences of intermediates (blue), to a target compound (yellow). The target in this example is camptothecin: a naturally occurring compound that is the basis of several cancer drugs.

That problem bedevilled Elias Corey, a chemist at Harvard University in Cambridge, Massachusetts, who formalized the rules of retrosynthesis in the 1960s. The following decade, Corey created software called LHASA (Logic and Heuristics Applied to Synthetic Analysis), which could use these rules to suggest sequences of steps towards a synthesis (ref. 2). But LHASA and its successors have never taken off, says Grzybowski: either the databases have included too few reactions and too many errors, or the algorithms have not properly assessed whether proposed reactions are compatible with all functional groups in the molecule. "If we could just make one chemical bond at a time, in isolation, chemistry would be trivial," he says.

Grzybowski has spent the past decade building a system called Chematica to address those problems. He started by creating a searchable network of about 6 million organic compounds, connected by a similar number of reactions, drawn from one of the main databases behind Reaxys. His team then spent years cleaning up the data — identifying entries that lack crucial information about reagent compatibility or reaction conditions. Without that kind of clean-up, Chematica would be like a computer chef surveying a gigantic recipe book for dishes that use ice cream, stumbling on baked Alaska, and concluding that ice cream can withstand very high temperatures — missing the fact that cooking ice cream in an oven only works with an insulating shield of meringue. Chematica includes such crucial information, so its proposed syntheses of novel molecules — based on about 30,000 retrosynthetic rules — can be much more trustworthy.
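The baked-Alaska problem can be stated compactly: a reaction rule is usable only if nothing else in the molecule is on its list of groups it would damage. The sketch assumes a hypothetical `ReactionRule` record carrying an `incompatible_with` annotation of the kind the clean-up effort supplied; it is not Chematica's actual data model. The Grignard entry reflects the familiar sensitivity of Grignard reagents to acidic O–H groups, but the table is illustrative and far from complete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactionRule:
    """Hypothetical reaction rule annotated with groups it is known to damage."""
    name: str
    incompatible_with: frozenset[str] = frozenset()


def compatible(rule: ReactionRule, groups_present: set[str]) -> bool:
    """True if none of the substrate's functional groups is on the rule's blacklist."""
    return rule.incompatible_with.isdisjoint(groups_present)


# Illustrative, deliberately incomplete entries.
RULES = [
    # Grignard reagents are quenched by acidic O-H protons.
    ReactionRule("Grignard addition", frozenset({"alcohol", "carboxylic_acid"})),
    ReactionRule("hypothetical_rule_B"),  # no conflicts recorded in this toy table
]


def usable_rules(groups_present: set[str]) -> list[str]:
    """Names of the rules whose blacklist shares no group with the substrate."""
    return [r.name for r in RULES if compatible(r, groups_present)]


print(usable_rules({"ketone", "alcohol"}))  # ['hypothetical_rule_B']
print(usable_rules({"ketone"}))             # ['Grignard addition', 'hypothetical_rule_B']
```

Without the `incompatible_with` annotation, both rules would look applicable to the alcohol-bearing substrate, which is exactly the mistake of the computer chef who puts ice cream in the oven.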

The team also designed Chematica to take a holistic view of synthesis: it not only hunts for the best reaction to use at each step, but also considers the efficiency of every possible synthetic route as a whole. This means that a poor yield in one step can be counterbalanced by a succession of high-yielding reactions elsewhere in the sequence. "In 5 seconds we can screen 2 billion possible synthetic routes," says Grzybowski.
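Why one weak step need not sink a route becomes clear once the overall yield of a linear sequence is written as the product of its step yields. The yields below are invented for illustration, and a simple product is only one ingredient of whatever scoring Chematica actually uses; the point is that a short route with one 40% step can beat a longer route of uniformly decent 80% steps.

```python
from math import prod


def overall_yield(step_yields: list[float]) -> float:
    """Fraction of theoretical product surviving every step of a linear route."""
    return prod(step_yields)


# Invented step yields for two hypothetical routes to the same target.
routes = {
    "route_A": [0.40, 0.95, 0.95],              # one poor step, two excellent ones
    "route_B": [0.80, 0.80, 0.80, 0.80, 0.80],  # five decent steps
}

best = max(routes, key=lambda name: overall_yield(routes[name]))
print(best)  # route_A
for name, steps in routes.items():
    print(name, round(overall_yield(steps), 3))
```

Route A keeps 0.40 × 0.95 × 0.95 ≈ 0.361 of the theoretical product, against 0.8^5 ≈ 0.328 for route B, so judging each step in isolation would have picked the wrong route.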

Stronger, faster, cheaper

When Grzybowski first unveiled the network behind Chematica in 2005 (ref. 3), "people said it was bullshit", he laughs. But that changed in 2012, when he and his team published a trio of landmark papers (refs 4–6) showing Chematica in action. For example, the program discovered a slew of 'one pot' syntheses (ref. 4) in which reagents could be thrown into a vessel one after the other, without all the troublesome separation and purification of products after each step. The group tested Chematica's suggestions for making a range of quinolines — structures commonly found in drugs and dyes — and showed that many were more efficient than conventional approaches.

Chematica can also look up information about the cost of starting materials and estimate the labour involved in each reaction, allowing it to predict the cheapest route to a particular molecule. When Grzybowski's lab tested 51 cut-price syntheses suggested by Chematica (ref. 5), the routes collectively trimmed costs by more than 45%.
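Finding the cheapest route can be framed as a shortest-path problem, with compounds as nodes and costs on the edges. The sketch assumes a simplified, hypothetical network in which each reaction turns one compound into the next: purchase prices sit on edges leaving a virtual `START` node, and reaction edges carry an estimated reagent-plus-labour cost. Real syntheses often combine several reactants in one step, so a practical planner needs a search over an AND/OR graph rather than the plain Dijkstra's algorithm used here.

```python
import heapq

# Hypothetical network: node -> list of (next_node, cost_of_that_step).
# Edges from "START" carry the purchase price of a starting material;
# all other edges carry an estimated reagent + labour cost for one reaction.
GRAPH: dict[str, list[tuple[str, float]]] = {
    "START": [("sm_cheap", 5.0), ("sm_pricey", 50.0)],
    "sm_cheap": [("intermediate_1", 20.0)],
    "sm_pricey": [("target", 10.0)],
    "intermediate_1": [("target", 15.0)],
}


def cheapest_route(graph, source, target):
    """Dijkstra's algorithm: return (total_cost, path), or (inf, []) if unreachable."""
    queue = [(0.0, source, [source])]  # min-heap ordered by cost so far
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path  # first time the target is popped, its cost is minimal
        if node in settled:
            continue  # a cheaper path to this node was already expanded
        settled.add(node)
        for nxt, step_cost in graph.get(node, []):
            if nxt not in settled:
                heapq.heappush(queue, (cost + step_cost, nxt, path + [nxt]))
    return float("inf"), []


print(cheapest_route(GRAPH, "START", "target"))
```

With these made-up costs the route through the cheap starting material wins (5 + 20 + 15 = 40) despite its extra step, because the pricier starting material alone costs 50.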


These demonstrations have impressed synthetic chemists, although few have had a chance to test Chematica. That is because Grzybowski is hoping to commercialize the system: he is negotiating with Elsevier to incorporate the program into Reaxys, and is working with the pharmaceutical industry to test Chematica's synthesis suggestions for biologically active, naturally occurring molecules. Grzybowski is also bidding for a grant from the Polish government, worth up to 7 million złoty (US$2.3 million), to use Chematica as the brain of a synthesis machine that can prove itself by automatically planning and executing syntheses of at least three important drug molecules.

Others are doubtful that will happen — at least any time soon. For the foreseeable future, "there will always be a significant need for human intervention", says Simon Tyler, commercial director of CatScI, a contract-research company in Cardiff, UK, that is involved in Dial-a-Molecule. "We won't have RoboCops wandering around in the lab."

And as long as programs like Chematica rely on databases of published studies, says Whitby, they will struggle to design reliable synthetic routes to unknown compounds. To build a synthesis machine, "we need to be able to predict when a reaction is going to work — but more importantly we need to be able to predict when it's going to fail".

Unfortunately, those failures are rarely recorded in the literature. "We only publish the successes, a cleaned-up version of what happens in the lab," says Whitby. "We also lose a lot of information: what really was the temperature, what was the stirring speed, how much solvent did you use?"

One solution is to record those successes and failures using electronic laboratory notebooks (ELNs), computer systems for logging raw experimental data that are widely used in industry but still rare in academia (see Nature 481, 430–431; 2012). "A lot of people ask, 'Who reads all these data?' The point is that machines use them — they can search the data," explains Mat Todd, a chemist at the University of Sydney in Australia.

In principle, automated workstations and instruments could send information to an ELN, which would upload the details to an open-access database where they could help a synthesis machine to predict how reliable a reaction might be. "If we really did know the history of every chemical reaction that had ever been done, we'd have amazing predictive capabilities," says Todd.
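One simple way such a database could turn logged outcomes into a reliability estimate is a smoothed success rate. The `ElnRecord` fields below are hypothetical, not the Dial-a-Molecule format, and add-one (Laplace) smoothing is just one reasonable choice. The usage lines show Whitby's complaint in miniature: if only the successes reach the literature, the same reaction looks more reliable than it really is.

```python
from dataclasses import dataclass


@dataclass
class ElnRecord:
    """Hypothetical machine-readable ELN entry; field names are assumptions."""
    reaction_class: str
    temperature_c: float  # conditions are logged alongside the outcome
    succeeded: bool


def reliability(records, reaction_class, prior_successes=1, prior_failures=1):
    """Smoothed success rate (successes + a) / (trials + a + b).
    With no data it returns the prior mean: 0.5 when a = b = 1."""
    relevant = [r for r in records if r.reaction_class == reaction_class]
    successes = sum(r.succeeded for r in relevant)
    return (successes + prior_successes) / (len(relevant) + prior_successes + prior_failures)


# Invented log: eight successful runs and two failed ones.
records = (
    [ElnRecord("amide_coupling", 25.0, True) for _ in range(8)]
    + [ElnRecord("amide_coupling", 80.0, False) for _ in range(2)]
)
published = [r for r in records if r.succeeded]  # what typically gets published

print(reliability(records, "amide_coupling"))    # 0.75 with failures logged
print(reliability(published, "amide_coupling"))  # 0.9 from successes alone
```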

Dial-a-Molecule researchers have coordinated trials of ELNs in academic labs; started to devise a standardized, machine-readable format for ELN records; and developed software that can push those data into open databases such as ChemSpider. Others in the network have developed prototype software called PatentEye, which could pull in extra data by scraping and cataloguing chemical information from patents.

Many of those dreaming of a synthesis machine agree that widespread data harvesting will require a huge cultural shift. "That's absolutely the biggest barrier," says Todd. "In chemistry, we don't have that culture of sharing, and I think it's got to change."

Money is also a significant hurdle. The expense of automated workstations means that few academics are familiar with them or their potential for capturing data. And with a large workforce of graduate students to draw on, academic labs often have little incentive to automate. Whitby is lobbying for a national centre that would host state-of-the-art automated synthesis equipment and software, to encourage their development and use. Until that materializes, he hopes that Dial-a-Molecule will inspire a new generation of chemists to embrace data sharing and automation.

Grzybowski, for one, is convinced that the synthesis machine can become a reality: "The only thing that can kill it is scepticism."