<node id="637117">
  <nid>637117</nid>
  <type>event</type>
  <uid>
    <user id="27707"><![CDATA[27707]]></user>
  </uid>
  <created>1595266845</created>
  <changed>1595266845</changed>
  <title><![CDATA[PhD Defense by Prasanth Chatarasi]]></title>
  <body><![CDATA[<p>Title: Advancing Compiler Optimizations for General-Purpose and Domain-Specific Parallel Architectures</p>

<p>&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;</p>

<p>&nbsp;</p>

<p>Prasanth Chatarasi</p>

<p>Ph.D. Candidate</p>

<p>School of Computer Science</p>

<p>College of Computing</p>

<p>Georgia Institute of Technology</p>

<p><a href="https://pchath.github.io/gatech-webpage/">https://pchath.github.io/gatech-webpage/</a></p>

<p>&nbsp;</p>

<p>Date: Monday, July 27th, 2020</p>

<p>Time: 10:00 AM &ndash; 12:00 PM ET</p>

<p>Location:&nbsp;<a href="https://bluejeans.com/vsarkar9">https://bluejeans.com/vsarkar9</a>&nbsp;(remote)</p>

<p>&nbsp;</p>

<p>Committee:</p>

<p>&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;</p>

<p>Dr. Vivek Sarkar (Advisor), School of Computer Science, Georgia Institute of Technology</p>

<p>Dr. Jun Shirako (Co-Advisor),&nbsp;School of Computer Science, Georgia Institute of Technology</p>

<p>Dr. Tushar Krishna,&nbsp;School of Electrical and Computer Engineering, Georgia Institute of Technology</p>

<p>Dr. Santosh Pande, School of Computer Science, Georgia Institute of Technology</p>

<p>Dr. Richard Vuduc, School of Computational Science and Engineering, Georgia Institute of Technology</p>

<p>&nbsp;</p>

<p>Abstract:</p>

<p>&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;&mdash;</p>

<p>Computer hardware is undergoing a major disruption as we approach the end of Moore&rsquo;s law, in the form of new advancements to general-purpose and domain-specific parallel architectures. Contemporaneously, the demand for higher performance is broadening across multiple application domains, ranging from scientific computing to deep learning and graph analytics. These trends raise a plethora of challenges to the de facto approach to achieving higher performance, namely application development using high-performance libraries. These challenges include porting and adapting to multiple parallel architectures, supporting rapidly advancing domains, and the inhibition of optimizations across library calls. Hence, there is a renewed focus in industry and academia on advancing optimizing compilers to address the above trends, but doing so requires enabling compilers to work effectively on a wide range of applications and to better exploit current and future parallel architectures. As summarized below, this thesis focuses on compiler advancements for current and future hardware trends.</p>

<p>&nbsp;</p>

<p>First, we observe that software with explicit parallelism for general-purpose multi-core&nbsp;CPUs and GPUs&nbsp;is on the rise, but the foundation of current compiler frameworks is based&nbsp;on optimizing sequential code.&nbsp;Our approach uses explicit parallelism specified by the programmer as logical parallelism to refine the&nbsp;conservative dependence analysis inherent in&nbsp;compilers (arising from the presence of program constructs&nbsp;such as pointer aliasing, unknown function calls, non-affine subscript expressions, recursion, and&nbsp;unstructured control&nbsp;flow). This approach makes it possible to combine user-specified parallelism and&nbsp;compiler-generated parallelism in a new unified polyhedral compilation framework (PoPP).</p>

<p><br />
Second, although compiler technologies for automatic vectorization for general-purpose vector processing (SIMD) units have been under development for over four decades, there are still considerable gaps in the ability of modern compilers to perform automatic vectorization. One such gap can be found in the handling of loops with dependence cycles that involve memory-based anti (write-after-read) and output (write-after-write) dependences. A significant limitation in past work is the lack of a unified formulation that synergistically integrates multiple storage transformations to break the cycles and that further unifies this formulation with loop transformations to enable vectorization. To address this limitation, we propose the PolySIMD approach.</p>

<p>&nbsp;</p>

<p>Third, the efficiency of domain-specific spatial accelerators for Deep Learning (DL) solutions depends heavily on the compiler&rsquo;s ability to generate optimized mappings or code for various DL operators (building blocks of DL models, e.g., CONV2D, GEMM) on the accelerator&rsquo;s compute and memory resources. However, the rapid emergence of new operators and new accelerators poses two key challenges for existing compilers: 1) the ability to perform fine-grained reasoning about the algorithmic aspects of the new operators and the complex hardware structures of the new accelerators in order to achieve peak performance, and 2) the ability to quickly explore the enormous space of possible mappings, involving various partitioning schemes, loop transformations, and data-layout choices, while still achieving high performance and energy efficiency. To address these challenges, we introduced a data-centric compiler, &ldquo;Marvel&rdquo;, for optimizing the mapping of DL operators onto flexible spatial accelerators. We also introduced a high-performance vectorizing compiler, &ldquo;Vyasa&rdquo;, for optimizing tensor convolutions on the specialized SIMD units of the Xilinx AI Engine.</p>

<p><br />
Finally, with the emergence of a domain-specific thread-migratory architecture (EMU) to address the locality wall, we developed thread-migration-aware compiler optimizations to enhance the performance of graph analytics on the EMU machine. Our preliminary evaluation of compiler optimizations such as node fusion and edge flipping demonstrates a significant benefit relative to the original programs.</p>
]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Advancing Compiler Optimizations for General-Purpose and Domain-Specific Parallel Architectures]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2020-07-27T11:00:00-04:00]]></value>
      <value2><![CDATA[2020-07-27T13:00:00-04:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Faculty/Staff]]></value>
      </item>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
          <item>
        <value><![CDATA[Graduate students]]></value>
      </item>
          <item>
        <value><![CDATA[Undergraduate students]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[https://bluejeans.com/vsarkar9]]></url>
      <title><![CDATA[BlueJeans Link]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>221981</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Graduate Studies]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1788</tid>
        <value><![CDATA[Other/Miscellaneous]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>100811</tid>
        <value><![CDATA[Phd Defense]]></value>
      </item>
      </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
