Too much data, too few drugs

By David Ewing Duncan, contributor
April 29, 2010: 2:36 PM ET

(Fortune) -- Like sages of old, they came to San Francisco last weekend, a group of biologists and computer scientists setting out to one-up every library ever conceived, from the great one in ancient Alexandria to Wikipedia today.

This library, however, will not consist of vellum scrolls or e-page entries. It aims to compile and make sense of genetic sequences and other raw biological data that are proliferating so fast that biology is about to move from petabytes to exabytes of data -- from quadrillions to quintillions. Just ten years ago, in 2000, all of digitized biology equaled only about 10 gigabytes (giga=billion).

While this is a stunning technological achievement, it also may be contributing to the dearth of new drugs coming out of the pharmaceutical industry in recent years. The problem is that too much raw scientific data is scattered across too many databases, with too little thought given to organizing it all so that it can be properly mined and used to develop treatments.

Trying to make sense of all this data is what brought two hundred scientists here to the first-ever Sage Congress. The meeting was organized by Sage Bionetworks, a new nonprofit based in Seattle, and its attendees have proposed a novel solution: to create a new, open-source model to standardize and link together thousands of databases around the world -- in universities, institutes, governments, and businesses.

"It's time to admit the truth, we're not doing drug development the right way," says Stephen Friend, a co-founder of Sage who until recently headed up Merck's research program in oncology. "75% of cancer drugs don't work."

One day, Sage might allow scientists studying cancer -- or Alzheimer's disease or diabetes -- to easily access the raw genetic data of thousands of people collected, say, in Ohio, Iceland, and Japan, and connect them to databases detailing cellular mechanisms that may explain how these diseases work.

Sage also wants to build systems that can organize and analyze complex interactions among networks of genes in humans and other organisms. Understanding how these networks react to environmental stimuli -- for instance, an individual's diet and exposure to chemical toxins such as mercury -- is the key to unlocking the secrets of common diseases such as heart disease and diabetes, say scientists.

"We need systems that can mimic the complexity of human biology before we'll really understand how everything works for a disease like diabetes," says Sage co-founder Eric Schadt. A biocomputer scientist, Schadt also recently left Merck (MRK, Fortune 500), where he headed up a team that used supercomputers and sophisticated tests to study how complex genetic networks, pathways, and other molecular entities affect disease.

Creating an über-database is a formidable engineering challenge, but it's not the only barrier. Attitudes also need to change among scientists and institutions used to keeping their data to themselves whenever possible.

"It will require a fundamental change in thinking to realize that sharing data is important," says Friend.

Friend was also a co-founder of Rosetta, a bioinformatics company acquired by Merck in 2001 for $620 million. As part of Merck, Rosetta built one of the fastest supercomputers in the drug industry, running 16 trillion calculations a second. The company also developed specialized chips and computer programs to sequence and analyze tissues throughout the body.

Last year, Merck disbanded Rosetta as part of its downsizing, deciding that building ever more complex models of human biological systems was beyond the resources of a single company. Merck developed several drugs out of the Rosetta project and has agreed to hand over key components of the technology to Sage.

The sheer scale of the effort led Friend and Schadt to turn to open-source technology, which can be run by a small staff while drawing on hundreds, or even thousands, of contributors. Open source has been used with great success in developing software systems like Linux. In science, projects like Science Commons, based at the Massachusetts Institute of Technology, are also working to break down legal, financial, and infrastructural barriers to sharing studies and data.

So far, Sage has raised several million dollars from private foundations, companies such as Merck and Pfizer (PFE, Fortune 500), and the National Institutes of Health.

Meanwhile, the petabytes, and soon exabytes, of data keep piling up, adding to the urgency of sorting it all out. Sage will need significantly more funding and a staff large enough to wrestle with and organize a Great Library of this size so that we can start maximizing the potential for understanding biology and developing drugs sooner rather than later.

We may even want to stop producing so much data for a period of time and concentrate on organizing what we've got.

First Published: April 29, 2010: 1:49 PM ET
