Weizmann Institute Scientists Develop a Novel System for Analyzing Genetic Data that Mimics the Human Capacity for Unsupervised Learning


Addressing this and similar challenges may soon be easier thanks to Prof. Eytan Domany of the Weizmann Institute's Physics of Complex Systems Department and doctoral students Gad Getz and Erel Levine. The team has designed a unique mathematical system for analyzing genetic data based on a computer algorithm that 'clusters' information into relevant categories. The algorithm searches simultaneously for clusters of 'similar' genes and patients by evaluating the gene expression of tissue samples. (A gene's 'expression' refers to the production level of the proteins it encodes.)


The algorithm, reported in the October 17 issue of the Proceedings of the National Academy of Sciences (PNAS), has one especially powerful feature: it mimics unsupervised learning. Unlike most automated 'sorting' processes, in which a computer must be told the relevant categories in advance, the algorithm works in a way analogous to human intuition (such as the ability to intuitively sort images of animals and cars into their proper classes). When given a clustering task, it analyzes the data, computes the degree of similarity among its components, and determines its own clustering criteria.


The new method builds on earlier work by Domany and his colleagues that is based on a well-known physical phenomenon. When a granular magnet such as a magnetic tape is warm, its grains are highly disorganized. But as the magnet cools, its grains progressively organize themselves into well-ordered clusters. Using the statistical mechanics of granular magnets, Domany created an algorithm that can look for clusters in any data.
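The general idea of letting clusters emerge from the data, rather than naming categories in advance, can be illustrated with a small Python sketch. This is not the statistical-mechanics method described above: it is a much simpler stand-in that links any two points closer than a chosen distance and reports the connected groups. The function name `cluster_by_radius`, the `radius` parameter, and the sample points are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def cluster_by_radius(points, radius):
    """Group points into connected components, linking pairs closer than `radius`.

    Illustrative stand-in only; not the magnet-based algorithm from the article.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # shorten the path as we go
            i = parent[i]
        return i

    # Merge the groups of every pair of points that lie close together.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) < radius:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical data: two tight groups of three points each.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (5.2, 5.1)]

for radius in (0.05, 1.0, 10.0):
    print(radius, cluster_by_radius(points, radius))
```

With a very small radius every point stands alone, with a very large one everything merges into a single group, and in between the two natural groups appear without having been specified beforehand. The resemblance to the disordered and ordered states of a magnet is a loose analogy only.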


When applied in a cancer study using DNA chips, the new algorithm proved highly effective, evaluating roughly 140,000 figures representing the cellular expression of 2,000 genes from 70 subjects. The algorithm categorized tissue samples into separate clusters according to their gene expression profiles. For example, one cluster consisted of cancerous tissues, while another contained samples from healthy subjects. The new method also distinguished among different forms of cancer and revealed treatment effects, picking up differences in gene expression between leukemia patients who had received treatment and those who had not. The ability to monitor cell response to treatment and understand the origin of disease in each patient may improve future treatment protocols, which would be tailored to individual pathologies.
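The figures above fit a simple picture: a table with one row per gene and one column per tissue sample, so that 2,000 genes and 70 subjects give 140,000 values. The Python sketch below, offered only as an illustration, shows how such a table can be read in two directions, comparing samples by their columns and genes by their rows. The tiny matrix is invented for the example and is not data from the study, and Pearson correlation is an assumed similarity measure, not necessarily the one the researchers used.

```python
from math import sqrt

N_GENES, N_SAMPLES = 2000, 70
n_values = N_GENES * N_SAMPLES  # size of the full expression table
print(n_values)


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def column(matrix, j):
    """Return column j of a row-major matrix (one sample's expression profile)."""
    return [row[j] for row in matrix]


# Invented toy table: rows are genes, columns are tissue samples.
expression = [
    [9.0, 8.5, 1.0, 1.2],
    [8.0, 8.2, 0.9, 1.1],
    [1.0, 1.3, 7.5, 7.9],
    [5.0, 5.2, 5.1, 4.9],
]

# Comparing columns asks which samples have similar expression profiles.
similar_samples = pearson(column(expression, 0), column(expression, 1))
different_samples = pearson(column(expression, 0), column(expression, 2))

# Comparing rows asks which genes behave alike across the samples.
similar_genes = pearson(expression[0], expression[1])

print(similar_samples, different_samples, similar_genes)
```

In this toy table the first two samples correlate strongly with each other and negatively with the third, which is the kind of signal a clustering method could use to separate tissue types.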


Finally, one of the algorithm's most promising features is that it enabled researchers to pinpoint a small group of genes, from among the 2,000 examined, that can be used to accurately distinguish among different cancerous cellular processes.


In a sense, however, applying the new algorithm to DNA chips is only a start. The new algorithm's inherent clustering capacity makes it invaluable for use in data-heavy scientific and industrial applications. It may be used to analyze financial information or MRI data from brain research, or to perform 'data mining,' the process by which specific details are culled from the world's huge and ever-growing data banks, such as those generated by the international Human Genome Project.

The Institute's technology transfer arm, Yeda Research and Development, has filed a patent application for the algorithm.


Prof. Eytan Domany holds the Henry J. Leir Professorial Chair.


The Weizmann Institute of Science is a major center of scientific research and graduate study located in Rehovot, Israel. Its 2,500 scientists, students and support staff are engaged in more than 1,000 research projects across the spectrum of contemporary science.