Mining gene expression data for functional annotation of genomes

10 May 2002
by Nandan Deshpande and Akhilesh Pandey

Su, A. I. et al. (2002). Large-scale analysis of the human and mouse transcriptomes. Proc. Natl. Acad. Sci. U.S.A., 99:4465-4470.

The ability of DNA microarrays to monitor the expression of thousands of genes simultaneously has made it possible to examine all the transcripts in cells or tissues, or the ‘transcriptome,’ in a high-throughput fashion. It will soon be possible to analyze the entire transcriptome on a single chip using this technology. So far, DNA microarrays have largely been used to study differential expression of genes in two or more states or to identify single nucleotide polymorphisms (SNPs). A number of studies have focused on discovering groups of genes that are specifically expressed in different tumors as compared with normal tissue. Such efforts have already led to the identification of distinct transcript ‘signatures’ that can, in some cases, be used to classify tumors into different subtypes.

Global transcript analysis can be put to several other uses as well, as elegantly demonstrated by a recent study published by Su et al. These investigators combined the information obtained from expression patterns of thousands of human and mouse genes in several tissues, organs and cell lines with sequence analysis to provide a more complete description of gene function. They obtained the expression profiles by hybridizing 46 human and 45 mouse tissue RNA samples to human or mouse oligonucleotide arrays. They found that ~5% of the genes examined were ubiquitously expressed and any individual tissue expressed ~30-40% of the genes. Using stringent criteria, they identified 311 known human tissue-restricted genes along with 76 others whose functions have not yet been established. Examining the domain structures of the previously uncharacterized proteins allowed them to establish potential enzyme-substrate relationships as well as to assign candidate protein-protein interaction partners.
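The idea of sorting genes into ubiquitous and tissue-restricted classes can be sketched in a few lines of Python. This is only an illustration: the summary above does not state the criteria Su et al. applied, so the cutoffs `EXPRESSED_CUTOFF` and `FOLD_OVER_NEXT`, the function `classify_gene`, and the example gene names and signal values are all hypothetical.

```python
# Hypothetical cutoffs chosen only for illustration; they are NOT the
# criteria used by Su et al., which are not given in this article.
EXPRESSED_CUTOFF = 200.0  # signal at or above this counts as "expressed"
FOLD_OVER_NEXT = 3.0      # top tissue must exceed the runner-up by this factor


def classify_gene(profile):
    """Label one gene's expression profile (dict of tissue -> signal)."""
    if not profile:
        raise ValueError("profile must contain at least one tissue")
    values = sorted(profile.values(), reverse=True)
    expressed = [v for v in values if v >= EXPRESSED_CUTOFF]
    if len(expressed) == len(values):
        return "ubiquitous"
    if not expressed:
        return "not detected"
    # Signal in the second-highest tissue (0 if only one tissue was profiled).
    runner_up = values[1] if len(values) > 1 else 0.0
    if len(expressed) == 1 and values[0] >= FOLD_OVER_NEXT * runner_up:
        return "tissue-restricted"
    return "intermediate"


# Made-up profiles for three made-up genes.
profiles = {
    "GENE_A": {"liver": 900, "brain": 850, "heart": 700},
    "GENE_B": {"liver": 1200, "brain": 50, "heart": 40},
    "GENE_C": {"liver": 300, "brain": 250, "heart": 90},
}
labels = {gene: classify_gene(p) for gene, p in profiles.items()}
```

With these made-up values, `GENE_A` is labeled ubiquitous, `GENE_B` tissue-restricted, and `GENE_C` intermediate. Tightening either cutoff shrinks the tissue-restricted set, which is the sense in which such criteria are "stringent."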

The expression data could also be used to confirm sequence-based ortholog assignment because one would expect that true orthologs have similar patterns of tissue-specific expression in humans and mice. Comparison of expression patterns across 16 tissues for ~800 ortholog pairs, defined by a common LocusLink symbol, revealed that almost half of the pairs were highly correlated in their expression profiles. Surprisingly, a large number of genes were either poorly correlated or even had a negative correlation in their expression patterns between these two species. This suggests that the function of some of these genes might have diverged significantly, a possibility that is not obvious from comparison of their primary amino acid sequences. Overall, the approach of categorizing proteins and orthologs on the basis of tissue specificity and patterns of expression is very promising. It could help in selecting therapeutic targets based on defined criteria such as expression in a particular tissue or coexpression with another gene of interest. Because a large number of genes have no obvious function that can be predicted from their protein sequence, studies such as this can not only help in their annotation but also help direct specific experiments to address their functions.
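The ortholog comparison described above amounts to correlating two expression profiles over the tissues they share. The sketch below uses the Pearson correlation coefficient as an assumed measure, since this article does not say which statistic was applied. The function names `pearson` and `ortholog_correlation` and all expression values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def ortholog_correlation(human, mouse):
    """Correlate human and mouse profiles over the tissues present in both."""
    tissues = sorted(set(human) & set(mouse))
    return pearson([human[t] for t in tissues], [mouse[t] for t in tissues])


# Made-up profiles: one concordant ortholog pair and one discordant pair.
human_gene = {"liver": 100, "brain": 10, "heart": 20}
mouse_concordant = {"liver": 200, "brain": 20, "heart": 40}
mouse_discordant = {"liver": 10, "brain": 100, "heart": 55}

r_concordant = ortholog_correlation(human_gene, mouse_concordant)
r_discordant = ortholog_correlation(human_gene, mouse_discordant)
```

Here the concordant pair has a correlation close to 1 because the mouse profile is a scaled copy of the human one, while the discordant pair has a negative correlation. Negative values of this kind are the ones that would flag an ortholog pair as a candidate for functional divergence.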

© Elsevier Science Limited 2002