Inside every cell in the human body there is a constellation of proteins, millions of them. They’re all jostling about, being quickly assembled, folded, packaged, shipped, cut and recycled in a hive of activity that works at a feverish pace to keep us alive and ticking.
But without a complete inventory of the universe of proteins inside our cells, scientists have a hard time appreciating, at the molecular level, what goes wrong in our bodies to cause disease.
Now, researchers have developed a new technique that uses AI to assimilate data from microscopic images of single cells and biochemical analyses, to create a “unified map” of subcellular components, half of which, it seems, have never been seen before.
“Scientists have long realized there’s more that we don’t know than we know, but now we finally have a way to look deeper,” says computer scientist and network biologist Trey Ideker of the University of California (UC) San Diego.
Microscopes, powerful as they are, allow scientists to peer inside individual cells, down to the level of organelles such as mitochondria, the power packs of cells, and ribosomes, the protein factories. We can even add fluorescent dyes to easily label and track proteins.
Biochemical techniques can go even further, homing in on single proteins by using, for example, targeted antibodies that bind to the protein, pull it out of the cell and reveal what else is attached to it.
The integration of these two approaches is a challenge for cell biologists.
“How do you bridge that gap from nanometer to micron scale? It has long been a big obstacle in the biological sciences,” explains Ideker.
“Turns out you can do it with AI (artificial intelligence), looking at data from multiple sources and asking the system to put it together into a model of a cell.”
The result: Ideker and his colleagues have turned textbook maps of globular cells, which give us a bird’s-eye view of candy-colored organelles, into an intricate web of protein-protein interactions, organized by the minuscule distances between them.
Merging image data from a library called the Human Protein Atlas with existing maps of protein interactions, the machine learning algorithm was tasked with calculating the distances between pairs of proteins.
The aim was to identify communities of proteins, called assemblies, which coexist in cells at different scales, from the very small (less than 50 nm) to the very “large” (more than 1 μm).
One shy of 70 protein communities were classified by the algorithm, which was trained using a reference library of proteins with known or estimated diameters and validated with further experiments.
About half of the protein components identified are apparently unknown to science, never documented in published literature, the researchers suggest.
In the mix was a group of proteins forming a previously unknown structure, which the researchers found is likely responsible for splicing and dicing newly made transcripts of the genetic code that are used to make proteins.
Other mapped proteins included transmembrane transport systems that pump supplies into and out of cells, families of proteins that help organize bulky chromosomes, and protein complexes whose job it is to make, well, more proteins.
It is a major effort, though not the first time that scientists have attempted to map the inner workings of human cells.
Other efforts to create reference maps of protein interactions have yielded similarly mind-boggling numbers, and have attempted to measure protein levels across tissues of the human body.
Researchers have also developed techniques to visualize and track the interaction and movement of proteins in cells.
This pilot study goes one step further by applying machine learning to two kinds of data: cell microscopy images that locate proteins relative to major cellular landmarks such as the nucleus, and protein interaction studies that identify a protein’s nearest neighbors at the nanoscale.
“The combination of these technologies is unique and powerful because it is the first time that measurements at very different scales have been brought together,” says bioinformatician Yue Qin, also of UC San Diego.
In this way, the multi-scale integrated cell technique, or MuSIC, “increases the resolution of imaging while giving protein interactions a spatial dimension, paving the way to incorporate diverse types of data in proteome-wide cell maps,” write Qin, Ideker and colleagues.
To be clear, this research is very preliminary: the team focused on validating their method and only looked at the available data for 661 proteins in one cell type, a kidney cell line that scientists have grown in the laboratory for five decades.
The researchers plan to apply their newfangled technique to other types of cells, says Ideker.
But in the meantime, we will have to humbly accept that we are mere interlopers inside our own cells, capable of understanding a small fraction of the total proteome.
“In the end, we could better understand the molecular basis of many diseases by comparing what is different between healthy cells and diseased cells,” explains Ideker.
The study was published in Nature.