PhD Defense by Yimeng Zhao
Title: Mitigating Interconnect and End Host Congestion in Modern Networks
School of Computer Science
College of Computing
Georgia Institute of Technology
Date: Tuesday, June 9, 2020
Time: 1:00 PM-3:00 PM (EDT)
**Note: this defense is remote-only due to the institute's guidelines on COVID-19**
Committee:
Dr. Mostafa H. Ammar (Co-advisor), School of Computer Science, Georgia Institute of Technology
Dr. Ellen W. Zegura (Co-advisor), School of Computer Science, Georgia Institute of Technology
Dr. Jun (Jim) Xu, School of Computer Science, Georgia Institute of Technology
Dr. Ashutosh Dhekne, School of Computer Science, Georgia Institute of Technology
Dr. Douglas M. Blough, School of Electrical and Computer Engineering, Georgia Institute of Technology
Abstract:
One of the most critical building blocks of the Internet is its mechanism for mitigating congestion. While TCP congestion control has served its purpose well over the last few decades, recent years have seen a significant increase in new applications and user demand, posing new challenges for handling congestion. In this thesis, we explore new abstractions and frameworks that allow for improved congestion-mitigation solutions, both at inter-AS interconnects and on end hosts in datacenters.
To mitigate inter-AS congestion, we develop Unison, a framework that allows an ISP to jointly optimize its intra-domain and inter-domain routes in collaboration with content providers. The basic idea is to provide the ISP operator and the ISP's neighbors with an abstraction of the ISP network in the form of a virtual switch (vSwitch). Unison allows the ISP to offer hints to its neighbors, suggesting alternative routes that can improve their performance. We investigate how the vSwitch abstraction can be used to maximize the throughput of the ISP.
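The vSwitch idea can be pictured, in highly simplified form, as the ISP presenting its network as a single switch whose "ports" are its peering links, and hinting neighbors toward egresses with spare capacity. The sketch below is illustrative only: the port names, capacities, and the greedy most-spare-capacity policy are assumptions for exposition, not Unison's actual optimization.

```python
# Illustrative sketch of a vSwitch-style abstraction: the ISP's network is
# exposed as one big switch with egress "ports" (peering links), and the ISP
# hints at alternative egresses with spare capacity. The greedy policy below
# is a stand-in for illustration, not Unison's actual algorithm.

class VirtualSwitch:
    def __init__(self, ports):
        # ports: {port_name: capacity in Gbps} -- hypothetical values
        self.capacity = dict(ports)
        self.load = {p: 0.0 for p in ports}

    def spare(self, port):
        return self.capacity[port] - self.load[port]

    def hint(self, demand, reachable_ports):
        """Suggest the reachable egress with the most spare capacity that
        can absorb the neighbor's demand, or None if no port fits."""
        candidates = [p for p in reachable_ports if self.spare(p) >= demand]
        if not candidates:
            return None
        return max(candidates, key=self.spare)

    def admit(self, port, demand):
        self.load[port] += demand
```

For example, with a 100 Gbps port already carrying 90 Gbps and an idle 40 Gbps port, a 20 Gbps demand would be steered to the second port, even though the first has more raw capacity.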
To mitigate end-host congestion in datacenter networks, we begin by developing a backpressure mechanism for the queuing architecture in congested end hosts, enabling them to cope with tens of thousands of flows. We show that current end-host mechanisms can lead to high CPU utilization, high tail latency, and low throughput when egress traffic is congested. We present the design, implementation, and evaluation of the zero-drop networking (zD) stack, a new architecture for handling congestion of scheduled buffers.
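The core idea of backpressure, as opposed to tail drop, can be sketched as a bounded queue that pauses its producers when full and resumes them as the queue drains. This is a minimal toy model of the concept only; the class and method names are hypothetical and do not reflect zD's in-kernel design.

```python
# Toy sketch of backpressure instead of drops: a bounded queue that, when
# full, pauses the offending source rather than discarding its packet.
# Illustrative only -- not the zD stack's actual implementation.

from collections import deque

class BackpressureQueue:
    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = deque()
        self.paused_sources = set()

    def enqueue(self, source, pkt):
        """Return True if accepted; on a full queue, record the source as
        paused (backpressure) instead of dropping the packet."""
        if len(self.buf) >= self.capacity:
            self.paused_sources.add(source)
            return False  # caller holds the packet and retries after resume
        self.buf.append((source, pkt))
        return True

    def dequeue(self):
        """Drain one packet; freed space lets paused sources resume."""
        source, pkt = self.buf.popleft()
        resumed, self.paused_sources = self.paused_sources, set()
        return pkt, resumed
```

The key property is that no packet is ever discarded: a full queue pushes the congestion signal back to the sender, which retries once it is resumed.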
Besides queue capacity, the CPU is another contended resource in datacenters. The networking stack of a server must implement flow control, congestion control, and scheduling layers that handle hundreds of thousands of simultaneously active flows, and the overhead of managing such a large number of flows can itself become the bottleneck. We conduct a comprehensive analysis of the CPU cost of packet processing in the Linux networking stack. In particular, we define two broad categories of problems that lead to CPU inefficiency when a server must handle hundreds of thousands of clients. We show that these problems contribute to high CPU usage and to network performance degradation in terms of aggregate throughput and RTT. Our work highlights considerations beyond packets per second for the design of future stacks that scale to millions of flows.
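The distinction between per-packet work and flow-management work can be illustrated with a toy cost model: a fixed cost for protocol processing plus a management term that grows with the active-flow set. The constants and the logarithmic growth below are assumptions chosen for illustration, not measured numbers from our analysis.

```python
# Illustrative cost model (assumed constants, not measured data): per-packet
# CPU work splits into a fixed protocol-processing component and a
# flow-management component (flow-table and scheduler maintenance) that
# grows with the number of active flows. Logarithmic growth is an assumption.

import math

def cpu_cycles_per_packet(active_flows,
                          per_packet=1000,       # fixed protocol processing
                          per_flow_factor=200):  # flow-management overhead
    """Toy model: fixed work plus management work that scales with the
    size of the active-flow set."""
    return per_packet + per_flow_factor * math.log2(max(active_flows, 1))
```

Even in this toy model, at hundreds of thousands of flows the management term rivals the fixed cost, which is why a stack's scalability cannot be judged by packets per second alone.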