Ph.D. Thesis Proposal: Mungyung Ryu
Title: Towards a scalable design of multi-tiered storage systems for video streaming
School of Computer Science
College of Computing
Georgia Institute of Technology
Date: October 22nd (Monday), 2012
Time: 4:00PM - 6:00PM (EDT)
Location: KACB 2100
Committee:
- Dr. Umakishore Ramachandran (Advisor, School of Computer Science, Georgia Tech)
- Dr. Karsten Schwan (School of Computer Science, Georgia Tech)
- Dr. Constantine Dovrolis (School of Computer Science, Georgia Tech)
- Dr. Moinuddin K. Qureshi (School of Electrical and Computer Engineering, Georgia Tech)
- Dr. Naresh Patel (NetApp)
We are witnessing a proliferation of video on the Internet. YouTube is the most bandwidth-intensive service on today's Internet: it accounts for 20-35% of Internet traffic, with 35 hours of video uploaded every minute and more than 700 billion playbacks in 2010. Netflix, a web service that streams premium content such as TV series, shows, and movies, consumes 30% of the network bandwidth in North America at peak time. Historically, Video-on-Demand (VoD) systems have used hard disk drives (HDDs) to store and serve video data. More recently, a new paradigm for video streaming on the Internet that leverages content distribution networks (CDNs) has emerged, namely Dynamic Adaptive Streaming over HTTP (DASH). DASH has become the industry-standard protocol for video streaming, adopted by broadcast networks as well as by VoD services such as Netflix. CDNs meet the streaming bandwidth that video requires by deploying large numbers of disk arrays. While disk capacity has improved continuously, disk access latency has remained largely stagnant, so the bandwidth a single disk can deliver to many concurrent streams has not kept pace with its capacity. As both the demand for Internet video streaming and the volume of video content grow, CDNs must deploy ever larger disk arrays to meet their bandwidth and capacity needs. This dramatically increases both the capital cost of the disks and the operational cost (power consumption and cooling) of the CDN servers, which together make up the Total Cost of Ownership (TCO).
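To make the scaling argument concrete, the following is a minimal sketch of a sizing rule, assuming an array of identical HDDs must satisfy both an aggregate streaming-bandwidth target and a content-capacity target. The function name `disks_needed`, its parameters, and all numbers in the example are hypothetical illustrations, not measurements from this work.

```python
import math


def disks_needed(bandwidth_mbps: float, capacity_tb: float,
                 disk_bandwidth_mbps: float, disk_capacity_tb: float) -> int:
    """Hypothetical model: number of identical HDDs that meet both targets."""
    # Drives required to deliver the aggregate streaming bandwidth.
    for_bandwidth = math.ceil(bandwidth_mbps / disk_bandwidth_mbps)
    # Drives required to hold the content library.
    for_capacity = math.ceil(capacity_tb / disk_capacity_tb)
    # The array must satisfy whichever requirement demands more drives.
    return max(for_bandwidth, for_capacity)


# Hypothetical workload: 10,000 Mb/s of streams over 50 TB of content,
# with disks that each sustain 400 Mb/s under concurrent streams.
print(disks_needed(10_000, 50, 400, 4))  # 4 TB disks
print(disks_needed(10_000, 50, 400, 8))  # 8 TB disks
```

Under these assumed numbers both calls print 25: doubling per-disk capacity does not reduce the drive count, because the bandwidth term dominates. This is the sense in which stagnant disk latency, rather than capacity, drives the size of the array.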
A multi-tiered storage system built with flash memory SSDs has the potential to solve these problems of an HDD-only system. For a given cost, a flash memory SSD can provide higher storage bandwidth than an HDD while consuming a fraction of the power, which dramatically reduces cooling needs. However, the capacity per dollar of an SSD is much lower than that of a disk, so it is not cost-effective to replace disks entirely with flash memory for permanent storage. Instead, it is very attractive to architect a multi-tiered storage system for video streaming that uses the flash memory SSD as a caching device between DRAM and the disks.
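The caching role of the SSD can be illustrated with a read path through three tiers. This is a minimal sketch, assuming whole-chunk LRU replacement in each cache tier and an SSD tier that is populated on every HDD miss; the classes `LRUTier` and `TieredStore`, the `read_from_hdd` callback, and chunk-level keys are all hypothetical and do not represent the design proposed in this thesis.

```python
from collections import OrderedDict


class LRUTier:
    """A fixed-capacity LRU cache of video chunks, keyed by chunk ID."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # chunk ID -> chunk bytes, oldest first

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used chunk


class TieredStore:
    """Hypothetical DRAM -> SSD -> HDD read path."""

    def __init__(self, dram_chunks, ssd_chunks, read_from_hdd):
        self.dram = LRUTier(dram_chunks)
        self.ssd = LRUTier(ssd_chunks)
        self.read_from_hdd = read_from_hdd  # slow backing store (assumed callback)
        self.hits = {"dram": 0, "ssd": 0, "hdd": 0}

    def read(self, chunk_id):
        data = self.dram.get(chunk_id)
        if data is not None:
            self.hits["dram"] += 1
            return data
        data = self.ssd.get(chunk_id)
        if data is not None:
            self.hits["ssd"] += 1
        else:
            self.hits["hdd"] += 1
            data = self.read_from_hdd(chunk_id)
            self.ssd.put(chunk_id, data)  # admit every HDD miss into flash
        self.dram.put(chunk_id, data)  # promote into DRAM on any lower-tier hit
        return data


# Usage: a tiny DRAM tier (2 chunks) in front of a larger SSD tier (4 chunks).
store = TieredStore(dram_chunks=2, ssd_chunks=4,
                    read_from_hdd=lambda cid: cid.encode())
for cid in ["a", "b", "c", "a", "d", "a"]:
    store.read(cid)
print(store.hits)
```

In this trace, chunk "a" is evicted from DRAM but still served from the SSD tier on its second access instead of going back to the disk. Admitting every miss into flash is the simplest policy; because flash has limited write endurance, a real design would likely need a more selective admission policy.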
The problem investigated in the proposed research is how best to architect a multi-tiered storage system for DASH video streaming. Specifically, my thesis is that such a multi-tiered system can meet the bandwidth needs of DASH video streaming far more cost-effectively than an HDD-only storage system. In my dissertation, I will identify the challenges in architecting such a system, given the performance quirks of flash-based SSDs and the limitations of state-of-the-art enterprise-level multi-tiered storage systems for video streaming. Armed with knowledge of these challenges, I will show how to construct such a storage system, implement a real web server with multi-tiered storage, evaluate the system with DASH workloads, and demonstrate significant performance gains while reducing the TCO. As part of my dissertation, I will also investigate dynamic overlay management for peer-to-peer video streaming, which can further reduce the network bandwidth load on CDN servers.