{"628663":{"#nid":"628663","#data":{"type":"event","title":"PhD Proposal by Girish Mururu","body":[{"value":"\u003Cp\u003ETitle: Compiler Guided Scheduling: A Cross-stack Approach for Performance Elicitation\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EGirish Mururu\u003C\/p\u003E\r\n\r\n\u003Cp\u003EPh.D. Student in Computer Science\u003C\/p\u003E\r\n\r\n\u003Cp\u003ESchool of Computer Science\u003C\/p\u003E\r\n\r\n\u003Cp\u003ECollege of Computing\u003C\/p\u003E\r\n\r\n\u003Cp\u003EGeorgia Institute of Technology\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDate: Tuesday, November 12, 2019\u003C\/p\u003E\r\n\r\n\u003Cp\u003ETime: 1:30 - 3:30 pm (EST)\u003C\/p\u003E\r\n\r\n\u003Cp\u003ELocation: KACB 2100\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003ECommittee:\u003C\/p\u003E\r\n\r\n\u003Cp\u003E-------------------\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Santosh Pande (Advisor, School of Computer Science, Georgia Institute of Technology)\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Ada Gavrilovska (School of Computer Science, Georgia Institute of Technology)\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Kishore Ramachandran (School of Computer Science, Georgia Institute of Technology)\u003C\/p\u003E\r\n\r\n\u003Cp\u003EDr. Vivek Sarkar (School of Computer Science, Georgia Institute of Technology)\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EAbstract:\u003C\/p\u003E\r\n\r\n\u003Cp\u003E-------------\u003C\/p\u003E\r\n\r\n\u003Cp\u003EModern software executes on multi-core systems that share resources such as several levels of the memory hierarchy (caches, main memory and persistent storage) as well as I\/O devices. 
In such a co-execution environment, the performance of modern software is critically affected by conflicts over these shared resources. Compiler optimizations have traditionally focused on analyzing and optimizing the performance of individual (single- or multi-threaded) applications and have made tremendous strides in this regard. Schedulers, on the other hand, have dealt with resource sharing mostly by adopting fairness as the primary criterion for sharing a single core, and load balancing and processor affinity as the criteria for multi-core scheduling. The compiler and scheduler stacks have continued to evolve in isolation; this thesis claims that there is a significant opportunity to improve performance (the execution speed of individual applications), resource sharing and throughput through a synergistic approach that combines compiler-generated dynamic application attributes with runtime aggregation of this information to make smart scheduling decisions. One of the key goals of this work is to realize such a cross-stack approach using existing system interfaces, without introducing new ones. Another important characteristic is that the approach is built as a layered solution that does not modify the OS. 
This design is intended not to perturb other system properties that have evolved over time.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EModern workloads related to computer vision, media computation and machine learning exhibit a very high degree of data locality. Although modern operating systems use processor affinity to encourage locality-aware scheduling, the lack of knowledge of precise dynamic application characteristics leaves significant performance on the table, because the scheduler carries out a large number of process migrations. In our first work, PinIt, we reduce unwanted migrations of processes among cores by only influencing the scheduler, without modifying it. To utilize cores fairly and efficiently, schedulers such as CFS migrate threads between cores during execution. Although such thread migrations alleviate stalling and yield better core utilization, they can also destroy data locality, resulting in fewer cache and TLB hits and thus a collective performance loss for the group of applications.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EPinIt first identifies the regions of a program in which the process should be pinned to a core, so that adverse migrations causing excessive cache and TLB misses are avoided. It does so by calculating memory reuse density, a new measure that quantifies reuse within code regions. 
Pin\/unpin calls are then hoisted to the entry and exit points of each such region, so that process migration is prevented within pinned regions. The thesis presents new analyses and transformations that optimize the placement of these calls. In an overloaded environment, compared to priority-CFS, PinIt speeds up high-priority applications in MediaBench workloads by 1.16x and 2.12x, and in computer-vision-based workloads by 1.35x and 1.23x, on 8 cores and 16 cores respectively, with almost the same or better throughput for low-priority applications.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EIn the second work, to achieve very high throughput for a large number of batch jobs, we develop a throughput-oriented scheduler that processes compiler-inserted beacons, which transmit applications\u0026#39; dynamic information to the scheduler at runtime. Typically, schedulers co-locate processes conservatively to avoid cache conflicts, since miss penalties are heavy; this leads to low resource utilization (ranging from 50 to 70%). Moreover, in a throughput-oriented setting, such conservative scheduling causes significant losses in achieved throughput. Our approach relies on the compiler to insert \u0026quot;beacons\u0026quot; in the application at strategic program points to periodically produce and\/or update details of anticipated resource-heavy program region(s). 
The compiler classifies loops in programs based on their cache usage and the predictability of their execution time, and inserts different types of beacons at their entry\/exit points. The precision of the information carried by beacons varies with the analyzability of the loops, and the scheduler uses performance counters at runtime to fine-tune its concurrency decisions. The information produced by beacons in multiple processes is aggregated and analyzed by the predictive scheduler so that it can respond proactively to anticipated workload requirements. A prototype of the framework demonstrates high-quality predictions and throughput improvements over CFS of up to 4.7x on ThunderX and up to 5.2x on ThunderX2 servers for consolidated workloads.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n\r\n\u003Cp\u003EFinally, as part of the proposed work, we devise extensions to the beacon scheduler to target low-latency environments and to handle multi-threaded processes efficiently. To achieve low latency while using resources efficiently, schedulers must divide the available processes among the available cores such that each process experiences the lowest possible latency. For multi-threaded processes, the compiler and the runtime must send beacons for each thread, and the scheduler must manage resources efficiently among the threads of all processes. 
We also plan to augment the beacon analysis with path and call-chain prediction, so that the entire workload of an application can be predicted at its start and corrective beacons sent upon mispredictions, yielding a complete predictive scheduler.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\r\n","summary":null,"format":"limited_html"}],"field_subtitle":"","field_summary":"","field_summary_sentence":[{"value":"Compiler Guided Scheduling: A Cross-stack Approach for Performance Elicitation"}],"uid":"27707","created_gmt":"2019-11-06 17:26:47","changed_gmt":"2019-11-06 17:26:47","author":"Tatianna Richardson","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2019-11-12T13:30:00-05:00","event_time_end":"2019-11-12T15:30:00-05:00","event_time_end_last":"2019-11-12T15:30:00-05:00","gmt_time_start":"2019-11-12 18:30:00","gmt_time_end":"2019-11-12 20:30:00","gmt_time_end_last":"2019-11-12 20:30:00","rrule":null,"timezone":"America\/New_York"},"extras":[],"groups":[{"id":"221981","name":"Graduate Studies"}],"categories":[],"keywords":[{"id":"102851","name":"Phd proposal"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[{"id":"1788","name":"Other\/Miscellaneous"}],"invited_audience":[{"id":"78761","name":"Faculty\/Staff"},{"id":"78771","name":"Public"},{"id":"174045","name":"Graduate students"},{"id":"78751","name":"Undergraduate students"}],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[],"email":[],"slides":[],"orientation":[],"userdata":""}}}