{"70927":{"#nid":"70927","#data":{"type":"news","title":"Researchers Develop Self-Training Gene Prediction Program for Fungi","body":[{"value":"\u003Cp\u003EResearchers at the Georgia Institute of Technology have developed a computer program that trains itself to predict genes in the DNA sequences of fungi.\u003C\/p\u003E\n\u003Cp\u003EFungi - which range from yeast to mushrooms - are important for industry and human health, so understanding the recently sequenced fungal genomes can help in developing and producing critical pharmaceuticals. Gene prediction can also help to identify potential targets for therapeutic intervention and vaccination against pathogenic fungi. \n\u003C\/p\u003E\n\u003Cp\u003E\u0022While we previously showed that our unsupervised training program worked well to predict genes in many eukaryotes, it didn\u0027t work as well for various fungal genomes that carry a significant part of the information that facilitates accurate gene prediction in locations called branch point sites,\u0022 said Mark Borodovsky, director of Georgia Tech\u0027s Center for Bioinformatics and Computational Genomics. \n\u003C\/p\u003E\n\u003Cp\u003EBranch point sites are located inside introns, which are non-coding regions of DNA that lie between the coding regions, called exons.\n\u003C\/p\u003E\n\u003Cp\u003E\u0022Previously during the process of predicting the exon-intron structure of eukaryotic genes, we didn\u0027t search for branch point sites, but doing so in the new program helps to better delineate intron regions inside fungal genes,\u0022 added Borodovsky, who is also a Regents\u0027 Professor in the Coulter Department of Biomedical Engineering and the Computational Science and Engineering Division of the College of Computing.\n\u003C\/p\u003E\n\u003Cp\u003EBorodovsky and his colleagues expanded the eukaryotic genome self-training software program they developed in 2005 to address the issue that fungal genes are more complex than those of other eukaryotes. 
The research team included graduate student Vardges Ter-Hovhannisyan, Wallace H. Coulter Department of Biomedical Engineering research scientist Alexandre Lomsadze and School of Biology professor Yury Chernoff.\n\u003C\/p\u003E\n\u003Cp\u003EDetails of the new program, called GeneMark.hmm-ES (BP), are available online in the journal \u003Cem\u003EGenome Research\u003C\/em\u003E and will be included in the journal\u0027s December print edition. The software will also be freely available for academic researchers.\n\u003C\/p\u003E\n\u003Cp\u003EBorodovsky developed the first version of GeneMark in 1993. In 1995, this program was used to find genes in the first completely sequenced genomes of bacteria and archaea. The research team then developed self-training versions of the gene finding program for prokaryotic (organisms that lack a cell nucleus) and eukaryotic (organisms that contain a cell nucleus) genomes in 2001 and 2005, respectively. Development of these programs has been supported by the National Institutes of Health. \u003C\/p\u003E\n\u003Cp\u003EUnlike other programs that require a pre-determined training set along with the genome sequence, GeneMark.hmm-ES (BP) only requires the genome sequence. The program is able to iteratively identify the correct algorithm parameters from the anonymous sequence. The program uses a probabilistic mathematical model called the Hidden Markov Model to pinpoint the boundaries between coding sequences (exons) and non-coding sequences (introns and intergenic regions). \n\u003C\/p\u003E\n\u003Cp\u003EMost introns start with the dinucleotide guanine-thymine (abbreviated GT) and end with the dinucleotide adenine-guanine (abbreviated AG). However, finding these dinucleotides is not sufficient to signal the presence of an intron. Several nucleotides that surround GT and AG are also important, but the pattern they form is not deterministic. 
Locating the branch site - which is nine nucleotides in length, almost always contains an adenine and is located 20-50 bases upstream of the acceptor site - helps to accurately identify an intron.\n\u003C\/p\u003E\n\u003Cp\u003EAn initial run of the program with a reduced model containing heuristically defined parameters breaks the sequence into coding and non-coding regions. With this information, the researchers apply machine-learning techniques to refine the parameters of the recognition algorithm with respect to the specific patterns found in the newly identified protein-coding and non-coding sequences as well as the border sites. \n\u003C\/p\u003E\n\u003Cp\u003EThe prediction and training steps are repeated, each time detecting a larger set of true coding and non-coding sequences that are used to further improve the model employed in statistical pattern recognition. When the new sequence breakdown coincides with the previous one, the researchers record their final set of predicted genes.\n\u003C\/p\u003E\n\u003Cp\u003ETo test the algorithm, the researchers selected 16 fungal species from the phyla Ascomycota, Basidiomycota and Zygomycota and compiled sets of genome sequences containing previously validated genes. The species spanned large evolutionary distances and exhibited significant variability in genome size, gene number and average number of introns per gene. The results showed that by including branch site information in the model, the researchers could more accurately predict exon-intron structures of fungal genes. \u003C\/p\u003E\n\u003Cp\u003E\u0022The enhanced program predicted fungal genes with higher accuracy than either the original self-training algorithm or known algorithms with supervised training,\u0022 noted Borodovsky. 
\u0022And because we didn\u0027t need any additional training information for our program, the sequencing teams could immediately proceed with gene annotation right after the genomic sequence was in hand without spending time and effort to extract a set of validated genes necessary for estimating parameters of traditional algorithms.\u0022 \n\u003C\/p\u003E\n\u003Cp\u003EResearchers at the U.S. Department of Energy Joint Genome Institute and the Broad Institute of the Massachusetts Institute of Technology and Harvard University have already realized the advantages of the new algorithm, using the program to annotate about 20 novel fungal genomes. Hundreds of fungal genome sequencing projects currently in progress should benefit from the new method as well, according to Borodovsky. \n\u003C\/p\u003E\n\u003Cp\u003EWith the fungal software completed, Borodovsky and his team are already looking to expand their gene prediction algorithms to accurately interpret even more complex eukaryotic genomes. 
\n\u003C\/p\u003E\n\u003Cp\u003E\u0022There are genome sequencing projects where large repeat populations, a significant number of pseudogenes or substantial sequence inhomogeneity hamper ab initio gene prediction and we\u0027re ready to tackle them next,\u0022 added Borodovsky.\n\u003C\/p\u003E\n\u003Cp\u003E\u003Cstrong\u003EResearch News \u0026amp; Publications Office\u003Cbr \/\u003E\nGeorgia Institute of Technology\u003Cbr \/\u003E\n75 Fifth Street, N.W., Suite 100\u003Cbr \/\u003E\nAtlanta, Georgia  30308  USA\n\u003C\/strong\u003E\u003C\/p\u003E\n\u003Cp\u003EMedia Relations Contacts: Abby Vogel (404-385-3364); E-mail: (\u003Ca href=\u0022mailto:avogel@gatech.edu\u0022\u003Eavogel@gatech.edu\u003C\/a\u003E) or John Toon (404-894-6986); E-mail: (\u003Ca href=\u0022mailto:jtoon@gatech.edu\u0022\u003Ejtoon@gatech.edu\u003C\/a\u003E).\n\u003C\/p\u003E\n\u003Cp\u003E\u003Cstrong\u003ETechnical Contact:\u003C\/strong\u003E Mark Borodovsky (\u003Ca href=\u0022mailto:borodovsky@gatech.edu\u0022\u003Eborodovsky@gatech.edu\u003C\/a\u003E).\n\u003C\/p\u003E\n\u003Cp\u003E\u003Cstrong\u003EWriter:\u003C\/strong\u003E Abby Vogel\u003C\/p\u003E","summary":null,"format":"limited_html"}],"field_subtitle":[{"value":"Software adds to the family of GeneMark programs created at Georgia Tech"}],"field_summary":[{"value":"Researchers have developed a computer program that trains itself to predict genes in the DNA sequences of fungi. Details of the new program, called GeneMark.hmm-ES (BP), are available online in the journal Genome Research and will be included in the journal\u0027s December print edition. 
The software will also be freely available for academic researchers.","format":"limited_html"}],"field_summary_sentence":[{"value":"Software developed that trains itself to predict genes in fungal"}],"uid":"27206","created_gmt":"2008-09-29 00:00:00","changed_gmt":"2016-10-08 03:03:19","author":"Abby Vogel Robinson","boilerplate_text":"","field_publication":"","field_article_url":"","dateline":{"date":"2008-09-29T00:00:00-04:00","iso_date":"2008-09-29T00:00:00-04:00","tz":"America\/New_York"},"extras":[],"hg_media":{"70928":{"id":"70928","type":"image","title":"Pleurotus ostreatus oyster mushroom","body":null,"created":"1449177328","gmt_created":"2015-12-03 21:15:28","changed":"1475894625","gmt_changed":"2016-10-08 02:43:45"},"70929":{"id":"70929","type":"image","title":"Mark Borodovsky","body":null,"created":"1449177328","gmt_created":"2015-12-03 21:15:28","changed":"1475894625","gmt_changed":"2016-10-08 02:43:45"},"70930":{"id":"70930","type":"image","title":"Mushroom","body":null,"created":"1449177328","gmt_created":"2015-12-03 21:15:28","changed":"1475894625","gmt_changed":"2016-10-08 02:43:45"}},"media_ids":["70928","70929","70930"],"related_links":[{"url":"http:\/\/www.cc.gatech.edu\/inside\/units\/cse","title":"Computational Science \u0026 Engineering Division, College of Computing"},{"url":"http:\/\/www.biology.gatech.edu\/faculty\/yury-chernoff\/","title":"Yury Chernoff"},{"url":"http:\/\/www.bme.gatech.edu\/","title":"Wallace H. 
Coulter Department of Biomedical Engineering"},{"url":"http:\/\/www.bme.gatech.edu\/facultystaff\/faculty_record.php?id=36","title":"Mark Borodovsky"},{"url":"http:\/\/dx.doi.org\/10.1101\/gr.081612.108","title":"Genome Research article"}],"groups":[{"id":"1188","name":"Research Horizons"}],"categories":[{"id":"154","name":"Environment"},{"id":"135","name":"Research"}],"keywords":[{"id":"7204","name":"ab initio"},{"id":"7201","name":"Ascomycota"},{"id":"7202","name":"Basidiomycota"},{"id":"7193","name":"branch"},{"id":"1041","name":"dna"},{"id":"2825","name":"eukaryote"},{"id":"7196","name":"exon"},{"id":"7188","name":"fungal"},{"id":"7186","name":"fungi"},{"id":"7187","name":"fungus"},{"id":"1110","name":"gene"},{"id":"7198","name":"GeneMark"},{"id":"1133","name":"genome"},{"id":"7200","name":"Hidden"},{"id":"7197","name":"intron"},{"id":"5227","name":"Markov"},{"id":"1383","name":"model"},{"id":"7191","name":"mushroom"},{"id":"7194","name":"point"},{"id":"7189","name":"predict"},{"id":"3003","name":"protein"},{"id":"170860","name":"self-training"},{"id":"167503","name":"sequence"},{"id":"170861","name":"site"},{"id":"7192","name":"unsupervised"},{"id":"7203","name":"Zygomycota"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[],"invited_audience":[],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cstrong\u003EAbby Robinson\u003C\/strong\u003E\u003Cbr \/\u003EResearch News and Publications\u003Cbr \/\u003E\u003Ca href=\u0022http:\/\/www.gatech.edu\/contact\/index.html?id=avogel6\u0022\u003EContact Abby Robinson\u003C\/a\u003E\u003Cbr \/\u003E\u003Cstrong\u003E404-385-3364\u003C\/strong\u003E","format":"limited_html"}],"email":["abby@innovate.gatech.edu"],"slides":[],"orientation":[],"userdata":""}}}