Ph.D. Defense of Dissertation: M. Hasan Abbasi
Dr. Karsten Schwan, Advisor, College of Computing
Dr. Matthew Wolf, College of Computing
Dr. Scott Klasky, Oak Ridge National Laboratory
Dr. Rich Vuduc, College of Computing
Dr. Ron A. Oldfield, Sandia National Laboratories
The exponential growth of data produced by scientific simulations on leadership-class HPC machines has exposed the I/O bottleneck, which can cripple the progress of scientific understanding in many domains of national interest.
The size of this data has also exposed the limitations of current methods for extracting information from it, which rely on post-processing-based data exploration. Computational capability is growing much faster than I/O bandwidth, and the result is a large imbalance. This imbalance is a bottleneck that limits our ability to exploit the performance of current-generation machines, and it will play an even greater role in limiting the efficient utilization of next-generation systems.
In this thesis I present a significant shift in how data management and I/O are handled on these high-end computing systems. In particular, I present the Data Service abstraction, which treats data management and information extraction as an integral part of the data generation and output process. A data service is a combination of coupled plugins that operate on output data both to extract information from it and to prepare it for further analysis. I also address several fundamental requirements for creating dynamic, functional I/O pipelines: extending output from a stream of bytes to a self-describing structure, limiting the impact of data movement and processing overhead on application performance, and managing the resources available to the data service. To meet these challenges, I leverage existing technologies such as RDMA and structured serialization and develop new abstractions such as data staging resources. Finally, the thesis demonstrates the utility of data services for real applications in the materials and fusion domains and evaluates their functionality for these domains.