{"470281":{"#nid":"470281","#data":{"type":"news","title":"Georgia Tech Student Research Increases Human Genome Indexing Speed by 110x and Advances Internal Memory Capacity","body":[{"value":"\u003Cp\u003E\u003Cstrong\u003EAUSTIN, Texas \u003C\/strong\u003E\u2014\u003Cstrong\u003E Monday, Nov. 16, 2015 \u003C\/strong\u003E\u2014 Four doctoral students working on two research projects in the School of Computational Science \u0026amp; Engineering at the Georgia Institute of Technology are finalists for \u201cBest Student Research Paper\u201d at Supercomputing \u201915, the International Conference for High Performance Computing, Networking, Storage and Analysis.\u003C\/p\u003E\u003Cp\u003EFor the first project, the finalists developed a fast algorithm; in the second, they created a GPU-based framework that can process large graphs exceeding a device\u2019s internal memory capacity. Each demonstrably outperforms other common approaches in high-performance computing today. A winner will be announced at the conference in Austin, Texas.\u003C\/p\u003E\u003Cp\u003E\u201cBoth of these are outstanding projects and evidence of the future leaders who are already keeping pace with and solving the most challenging problems in science, engineering, health and the social domain,\u201d says David A. Bader, professor and chair of the School of Computational Science \u0026amp; Engineering.\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003E\u003Cem\u003ENew Human Genome Indexing Algorithm for Parallel Distributed Memory\u003C\/em\u003E\u003C\/strong\u003E\u003C\/p\u003E\u003Cp\u003EIn his work, \u201cParallel Distributed Memory Construction of Suffix and Longest Common Prefix Arrays,\u201d PhD Candidate \u003Cstrong\u003EPatrick Flick\u003C\/strong\u003E, working with Professor \u003Cstrong\u003ESrinivas Aluru\u003C\/strong\u003E, created parallel algorithms for distributed-memory construction that are 110 times faster than the best sequential method running on a single computer. 
Flick, using the human genome as a racetrack to test his speed, indexed it in only 7.3 seconds using his distributed-memory algorithm running on 1024 Intel Xeon cores.\u003C\/p\u003E\u003Cp\u003E\u201cBioinformatics is an example of a scientific field that is extremely data intensive; speed matters and speed helps,\u201d he says. \u201cWe are not aware of any other parallel suffix array or suffix tree construction algorithms which achieve speedups this high.\u201d\u003C\/p\u003E\u003Cp\u003EIt is believed to be the first algorithm and implementation that uses this approach for distributed-memory parallel systems.\u003C\/p\u003E\u003Cp\u003E\u201cAt this stage the code is offered as an open-source library that can be used within parallel applications,\u201d Flick adds. \u201cWe hope that it finds adoption within bioinformatics research. We are now working on a user-friendly interface that can be used by bioinformaticians to replace older (and slower) tools.\u201d\u003C\/p\u003E\u003Cp\u003ENext, Flick is working on a journal paper that includes further improvements and additional techniques, and showcases the algorithms on real applications. 
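As an illustrative aside (not part of the original release): a minimal sketch of what suffix and longest common prefix (LCP) arrays contain, using a naive sequential construction in Python. The paper's contribution is the fast distributed-memory parallel construction of these arrays; this sketch makes no attempt at that and only shows the data structures being built.

```python
# Illustrative only: naive sequential suffix array + LCP array construction.
# A suffix array lists the starting positions of all suffixes of a text in
# lexicographic order; lcp[k] is the length of the longest common prefix of
# the suffixes ranked k-1 and k. Real genome-scale indexing uses far faster
# (here, distributed-memory parallel) algorithms.

def suffix_array(text):
    # Sort suffix start positions by the suffix each one begins.
    return sorted(range(len(text)), key=lambda i: text[i:])

def lcp_array(text, sa):
    # lcp[0] is 0 by convention; lcp[k] compares adjacent ranked suffixes.
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = text[sa[k - 1]:], text[sa[k]:]
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        lcp[k] = n
    return lcp

text = 'banana'
sa = suffix_array(text)
print(sa)                    # [5, 3, 1, 0, 4, 2]
print(lcp_array(text, sa))   # [0, 1, 3, 0, 0, 2]
```

Once built, the suffix array supports binary search for any pattern in the indexed text, which is why it is a workhorse index in bioinformatics.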
Flick also co-authored another paper at Supercomputing 2015, with fellow students Chirag Jain and Tony Pan, on partitioning the large graphs that arise in metagenomics, another data-intensive application area.\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003E\u003Cem\u003EProcessing Large-Scale Graphs\u003C\/em\u003E\u003C\/strong\u003E\u003C\/p\u003E\u003Cp\u003EIn their paper titled \u201cGraphReduce: Processing Large-Scale Graphs on Accelerator-Based Systems,\u201d PhD Candidates \u003Cstrong\u003EDipanjan Sengupta\u003C\/strong\u003E and \u003Cstrong\u003EKapil Agarwal\u003C\/strong\u003E developed a scalable framework (dubbed \u201cGraphReduce\u201d) to process large graphs that exceed a device\u2019s GPU memory.\u003C\/p\u003E\u003Cp\u003E\u201cGraphReduce can accelerate the analysis of graphs with billions of edges, operating at speeds much faster than similar operations on CPUs, and programmed in ways that are accessible to those who are not typically experts in GPU programming,\u201d Sengupta says.\u003C\/p\u003E\u003Cp\u003EThe framework provides logic for processing \u201cshard stores\u201d based on the choice of vertex intervals, the number and sizes of shards, and the ordering of edges within each shard. A \u201cgraph layout engine\u201d then defines the layout of the data by sorting in-edges by their destination and out-edges by their source.\u003C\/p\u003E\u003Cp\u003E\u201cOne of the interesting results is that saturating the available bandwidth and overlapping data transfer with computation was able to hide a large amount of overhead, resulting in huge performance benefits,\u201d Agarwal adds.\u003C\/p\u003E\u003Cp\u003EMost methods process dynamic graphs (like those of a social network, which change continually over time) by storing static versions of the graph and then repeatedly running analysis on them. 
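As an illustrative aside (not part of the original release), the interval-and-shard layout just described might be sketched roughly as follows. The function name, the fixed-width vertex intervals, and the edge-list representation are hypothetical simplifications for illustration, not taken from the GraphReduce code.

```python
# Illustrative only: a simplified shard layout in the spirit described above.
# Destination vertices are split into fixed-width intervals; each shard holds
# the edges whose destination falls in its interval, sorted by destination
# (ties broken by source). This lets a shard be streamed to the GPU as one
# contiguous chunk. All names here are hypothetical, not GraphReduce's API.

def build_shards(edges, num_vertices, num_shards):
    width = -(-num_vertices // num_shards)  # ceiling division: interval width
    shards = [[] for _ in range(num_shards)]
    for src, dst in edges:
        shards[dst // width].append((src, dst))
    for shard in shards:
        # Order in-edges by destination, then source, within each shard.
        shard.sort(key=lambda e: (e[1], e[0]))
    return shards

edges = [(0, 3), (2, 1), (1, 3), (3, 0), (2, 0)]
for i, shard in enumerate(build_shards(edges, 4, 2)):
    print(i, shard)
# 0 [(2, 0), (3, 0), (2, 1)]
# 1 [(0, 3), (1, 3)]
```

Because each shard is a contiguous, pre-sorted chunk, transfers of one shard to the GPU can be overlapped with computation on another, which is the overlap effect the authors credit for much of the speedup.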
To address this, Sengupta is now working with a Fortune 500 company on an open-source framework to speed up the processing of dynamic graphs.\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003EAbout the Georgia Tech College of Computing\u003C\/strong\u003E\u003C\/p\u003E\u003Cp\u003EThe Georgia Tech College of Computing is a national leader in the creation of real-world computing breakthroughs that drive social and scientific progress. With its graduate program ranked 9th nationally by \u003Cem\u003EU.S. News and World Report\u003C\/em\u003E, the College\u2019s unconventional approach to education is expanding the horizons of traditional computer science students through interdisciplinary collaboration and a focus on human-centered solutions. For more information about the Georgia Tech College of Computing, its academic divisions and research centers, please visit \u003Ca href=\u0022http:\/\/www.cc.gatech.edu\/\u0022\u003Ewww.cc.gatech.edu\u003C\/a\u003E\u003C\/p\u003E","summary":null,"format":"limited_html"}],"field_subtitle":[{"value":"Discoveries are two of four finalists for \u201cBest Student Research Paper\u201d at the Supercomputing \u201915 conference"}],"field_summary":"","field_summary_sentence":[{"value":"Four doctoral students make advances in high-performance computing (HPC) that outperform other common approaches in HPC today."}],"uid":"27490","created_gmt":"2015-11-16 16:41:54","changed_gmt":"2016-10-08 03:20:03","author":"Tara La Bouff","boilerplate_text":"","field_publication":"","field_article_url":"","dateline":{"date":"2015-11-16T00:00:00-05:00","iso_date":"2015-11-16T00:00:00-05:00","tz":"America\/New_York"},"extras":[],"groups":[{"id":"47223","name":"College of Computing"}],"categories":[{"id":"135","name":"Research"}],"keywords":[{"id":"1896","name":"Genomics"},{"id":"15030","name":"high-performance computing"},{"id":"168929","name":"supercomputers"}],"core_research_areas":[{"id":"39441","name":"Bioengineering and Bioscience"},{"id":"39431","name":"Data Engineering and Science"}],"news_room_topics":[],"event_categories":[],"invited_audience":[],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cp\u003ETara La Bouff, 404.769.5408\u003C\/p\u003E","format":"limited_html"}],"email":["tlabouff@cc.gatech.edu"],"slides":[],"orientation":[],"userdata":""}}}