<node id="629644">
  <nid>629644</nid>
  <type>event</type>
  <uid>
    <user id="27707"><![CDATA[27707]]></user>
  </uid>
  <created>1575392406</created>
  <changed>1575392406</changed>
  <title><![CDATA[Ph.D. Proposal by Prasanth Chatarasi]]></title>
  <body><![CDATA[<p><strong>Title:</strong> Advancing Compiler Optimizations for Parallel Architectures</p>

<p>&nbsp;</p>

<p>Prasanth Chatarasi</p>

<p>Ph.D. Student in Computer Science</p>

<p>School of Computer Science&nbsp;</p>

<p>College of Computing</p>

<p>Georgia Institute of Technology</p>

<p>&nbsp;</p>

<p>Date: Monday, December 9, 2019</p>

<p>Time: 11:00am - 1:00pm (EST)</p>

<p>Location: KACB 3126</p>

<p>&nbsp;</p>

<p>&nbsp;</p>

<p><strong>Committee:</strong></p>

<p>------------</p>

<p>Dr. Vivek Sarkar (Advisor), School of Computer Science, Georgia Institute of Technology</p>

<p>Dr. Jun Shirako (Co-Advisor),&nbsp;School of Computer Science, Georgia Institute of Technology</p>

<p>Dr. Tushar Krishna,&nbsp;School of Electrical and Computer Engineering, Georgia Institute of Technology</p>

<p>Dr. Santosh Pande, School of Computer Science, Georgia Institute of Technology</p>

<p>&nbsp;</p>

<p><strong>Abstract:</strong></p>

<p>-----------</p>

<p>&nbsp;</p>

<p>Many parallel architectures are emerging to improve computing capabilities as we approach the end of Moore&rsquo;s law (the doubling of transistor count roughly every two years). Contemporaneously, the demand for higher performance is broadening across multiple application domains. These trends pose a plethora of challenges to application development using high-performance libraries, the de-facto approach to achieving higher performance: porting and adapting to multiple emerging parallel architectures, supporting rapidly advancing domains, and the inhibition of optimizations across library calls. Hence, there is a renewed focus on optimizing compilers in industry and academia to address these trends, but this requires advancing their applicability to a wide range of applications and their ability to exploit current and future parallel architectures, which is the focus of my thesis.<br />
<br />
Firstly, most compiler frameworks perform conservative dependence analysis in the presence of unanalyzable program constructs such as pointer aliasing, unknown function calls, non-affine expressions, recursion, and unstructured control flow, which can prevent transformations from being applied even when they are legal. Our work is motivated by the observation that software with explicit parallelism for multi-core CPUs and GPUs is on the rise. Our approach (PoPP) treats the explicit parallelism specified by the programmer as logical parallelism, and refines the conservative dependence analysis with the partial execution order implied by that parallelism, enabling a larger set of loop transformations for enhanced parallelization than would have been possible if the input program were sequential.<br />
<br />
Secondly, although compiler technologies for automatic vectorization have been under development for over four decades, there are still considerable gaps in the capabilities of modern compilers to perform automatic vectorization for SIMD units. One such gap is the handling of loops with dependence cycles that involve memory-based anti (write-after-read) and output (write-after-write) dependences. A significant limitation of past work is the lack of a unified formulation that synergistically integrates multiple storage transformations to break such cycles, and further unifies them with loop transformations to enable vectorization. To address this limitation, we propose an approach (PolySIMD).<br />
<br />
Thirdly, finding optimal compiler mappings of an application onto an accelerator can be extremely challenging because of the massive space of possible data layouts and loop transformations. For example, there are over 10<sup>19</sup> valid mappings per convolution layer on average when mapping ResNet50 and MobileNetV2 onto a representative DNN edge accelerator. To address this challenge, we propose a decoupled off-chip/on-chip approach that decomposes the mapping space into off-chip and on-chip subspaces, and optimizes the off-chip subspace first, followed by the on-chip subspace. The motivation for this decomposition is to dramatically reduce the size of the search space, and also to prioritize the optimization of off-chip data movement, which is 2-3 orders of magnitude more costly than on-chip data movement. We introduce <em>Marvel</em>, which implements this approach by leveraging two cost models to explore the two subspaces: a classical distinct-block (DB) locality cost model for the off-chip subspace, and a state-of-the-art DNN accelerator behavioral cost model, MAESTRO, for the on-chip subspace. Our approach also considers dimension permutation, a form of data layout, in the mapping space formulation along with the loop transformations.<br />
<br />
Finally, with the emergence of a near-memory thread-migratory architecture (EMU) to address the locality wall posed by weak-locality applications, we plan, as part of the proposed work, to develop locality- and thread-migration-aware compiler optimizations to enhance the performance of graph analytics on the EMU machine. Our preliminary evaluation of compiler optimizations such as node fusion and edge flipping shows significant benefits over the original programs, which were written without awareness of thread migrations.</p>
]]></body>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Advancing Compiler Optimizations for Parallel Architectures]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_summary>
  <field_time>
    <item>
      <value><![CDATA[2019-12-09T11:00:00-05:00]]></value>
      <value2><![CDATA[2019-12-09T13:00:00-05:00]]></value2>
      <rrule><![CDATA[]]></rrule>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_time>
  <field_fee>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_fee>
  <field_extras>
      </field_extras>
  <field_audience>
          <item>
        <value><![CDATA[Faculty/Staff]]></value>
      </item>
          <item>
        <value><![CDATA[Public]]></value>
      </item>
          <item>
        <value><![CDATA[Graduate students]]></value>
      </item>
          <item>
        <value><![CDATA[Undergraduate students]]></value>
      </item>
      </field_audience>
  <field_media>
      </field_media>
  <field_contact>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_contact>
  <field_location>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_location>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_phone>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_phone>
  <field_url>
    <item>
      <url><![CDATA[]]></url>
      <title><![CDATA[]]></title>
            <attributes><![CDATA[]]></attributes>
    </item>
  </field_url>
  <field_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_email>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <links_related>
      </links_related>
  <files>
      </files>
  <og_groups>
          <item>221981</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[Graduate Studies]]></item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>1788</tid>
        <value><![CDATA[Other/Miscellaneous]]></value>
      </item>
      </field_categories>
  <field_keywords>
          <item>
        <tid>102851</tid>
        <value><![CDATA[Phd proposal]]></value>
      </item>
      </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
