Nvidia and Harvard develop AI tool that speeds up genome analysis

Researchers affiliated with Nvidia and Harvard today detailed AtacWorks, a machine learning toolkit designed to bring down the cost and time needed for rare and single-cell experiments. In a study published in the journal Nature Communications, the coauthors showed that AtacWorks can run analyses on a whole genome in just half an hour compared with the multiple hours traditional methods take.

Most cells in the body carry a complete copy of a person’s DNA, with billions of base pairs crammed into the nucleus. But an individual cell pulls out only the subset of genetic components it needs to function, with cell types like liver, blood, or skin cells using different genes. The regions of DNA that determine a cell’s function are easily accessible, more or less, while the rest are wound around proteins and shielded.

AtacWorks, which is available from Nvidia’s NGC hub of GPU-optimized software, works with ATAC-seq, a method for finding open areas of the genome that was pioneered by Harvard professor Jason Buenrostro, one of the paper’s coauthors. ATAC-seq measures the intensity of a signal at every spot on the genome. Peaks in the signal correspond to regions of open DNA. The fewer cells available, the noisier the data appears, making it difficult to identify which areas of the DNA are accessible.

ATAC-seq typically requires tens of thousands of cells to get a clean signal. Applying AtacWorks produces the same quality of results with just tens of cells, according to the coauthors.

AtacWorks was trained on labeled pairs of matching ATAC-seq datasets, one high-quality and one noisy. Given a downsampled copy of the data, the model learned to predict an accurate high-quality version and identify peaks in the signal. Using AtacWorks, the researchers found that they could spot accessible chromatin, a complex of DNA and protein whose primary function is packaging long molecules into more compact structures, in a noisy sequence of 1 million reads nearly as well as traditional methods did with a clean dataset of 50 million reads.
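The noisy/clean pairing described above can be illustrated with a toy sketch. It only shows how a downsampled coverage track might be paired with its full-depth counterpart; it is not AtacWorks code, and the function names, genome length, and read counts are hypothetical. The 50:1 ratio mirrors the 50 million versus 1 million reads mentioned above.

```python
import random

def downsample(reads, n, seed=0):
    """Pick n reads uniformly at random, without replacement."""
    return random.Random(seed).sample(reads, n)

def coverage(reads, genome_len, read_len):
    """Count how many reads overlap each genome position."""
    track = [0] * genome_len
    for start in reads:
        for pos in range(start, min(start + read_len, genome_len)):
            track[pos] += 1
    return track

def make_training_pair(reads, n_noisy, genome_len, read_len, seed=0):
    """Return (noisy, clean) coverage tracks built from the same reads."""
    noisy = coverage(downsample(reads, n_noisy, seed), genome_len, read_len)
    clean = coverage(reads, genome_len, read_len)
    return noisy, clean

# Toy data (assumed values): 5,000 read start positions on a 1,000-base genome.
GENOME_LEN, READ_LEN = 1000, 10
rng = random.Random(42)
reads = [rng.randrange(0, GENOME_LEN - READ_LEN + 1) for _ in range(5000)]

# Keep 1 read in 50 for the noisy track.
noisy, clean = make_training_pair(reads, 100, GENOME_LEN, READ_LEN)
```

A model trained on such pairs would take the noisy track as input and the clean track as the target.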

AtacWorks could allow scientists to conduct research with a smaller number of cells, reducing the cost of sample collection and sequencing. Analysis, too, could become faster and cheaper. Running on Nvidia Tensor Core GPUs, AtacWorks took under 30 minutes for inference on a genome, a process that would take 15 hours on a system with 32 CPU cores.
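As a quick check of what those figures imply, the arithmetic below treats 30 minutes as an upper bound on the GPU time, so the result is a lower bound on the speedup. The variable names are illustrative.

```python
cpu_hours = 15.0   # reported time on a system with 32 CPU cores
gpu_hours = 0.5    # "under 30 minutes", so this is an upper bound

# Dividing by an upper bound on GPU time gives a lower bound on the speedup.
speedup = cpu_hours / gpu_hours
print(f"at least {speedup:.0f}x faster")  # prints "at least 30x faster"
```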

In the Nature Communications paper, the Harvard researchers applied AtacWorks to a dataset of stem cells that produce red and white blood cells — rare subtypes that couldn’t be studied with traditional methods. With a sample set of only 50 cells, the team was able to use AtacWorks to identify distinct regions of DNA associated with cells that develop into white blood cells, and separate sequences that correlate with red blood cells.

“With very rare cell types, it’s not possible to study differences in their DNA using existing methods,” Nvidia researcher Avantika Lal, first author on the paper, said. “AtacWorks can help not only drive down the cost of gathering chromatin accessibility data, but also open up new possibilities in drug discovery and diagnostics.”



This Harvard professor claims an alien spaceship visited us in 2017

A highly unusual object was spotted traveling through the solar system in 2017. Given a Hawaiian name, ʻOumuamua, it was small and elongated – a few hundred meters by a few tens of meters, traveling at a speed fast enough to escape the Sun’s gravity and move into interstellar space.
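For a sense of scale, the speed needed to escape the Sun’s gravity from a distance r is sqrt(2GM/r). The sketch below evaluates it at Earth’s distance (1 AU) purely as an illustration. The constants are standard values, the function name is hypothetical, and the passage above does not give ʻOumuamua’s actual speed or distance.

```python
import math

GM_SUN = 1.32712440018e20  # Sun's gravitational parameter, m^3/s^2
AU = 1.495978707e11        # astronomical unit, m

def solar_escape_speed(r_m):
    """Escape speed from the Sun at distance r_m (metres), in m/s."""
    return math.sqrt(2.0 * GM_SUN / r_m)

v_kms = solar_escape_speed(AU) / 1000.0
print(f"escape speed at 1 AU: {v_kms:.1f} km/s")  # about 42.1 km/s
```

An object moving faster than this value at the same distance is not bound to the Sun and will leave for interstellar space.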

I was at a meeting when the discovery of ʻOumuamua was announced, and a friend immediately said to me, “So how long before somebody claims it’s a spaceship?” It seems that whenever astronomers discover anything unusual, somebody claims it must be aliens.

Nearly all scientists believe that ʻOumuamua probably originates from outside the solar system. It is an asteroid- or comet-like object that has left another star and traveled through interstellar space – we saw it as it zipped by us. But not everyone agrees. Avi Loeb, a Harvard professor of astronomy, suggested in a recent book that it is indeed an alien spaceship. But how feasible is this? And how come most scientists disagree with the claim?

Researchers estimate that the Milky Way should contain around 100 million billion comets and asteroids ejected from other planetary systems, and that one of these should pass through our solar system every year or so. So it makes sense that ʻOumuamua could be one of these. We spotted another last year – “Borisov” – which suggests they are as common as we predict.

What made ʻOumuamua particularly interesting was that it didn’t follow the orbit you would expect – its trajectory shows that some extra “non-gravitational force” was acting on it. This is not too unusual. The pressure of solar radiation, or gas and particles driven out as an object warms up close to the Sun, can provide extra force, and we see this with comets all the time.

Experts on comets and the solar system have explored various explanations for this. Given this was a small, dark object passing us very quickly before disappearing, the images we were able to get weren’t wonderful, and so it is difficult to be sure.