<node id="621214">
  <nid>621214</nid>
  <type>news</type>
  <uid>
    <user id="34540"><![CDATA[34540]]></user>
  </uid>
  <created>1556805498</created>
  <changed>1556890315</changed>
  <title><![CDATA[Cleaning Up Those Messy Notebooks Just Got A Lot Easier]]></title>
  <body><![CDATA[<p>The authors of a best paper award winner at this year&rsquo;s&nbsp;<a href="https://chi2019.acm.org/">ACM CHI Conference on Human Factors in Computing Systems</a>&nbsp;have developed a set of tools to help programmers and data scientists clean up their computational notebooks so they can program more effectively and efficiently.</p>

<p>&ldquo;Programming in computational notebooks is helpful for seeing intermediate pieces of code and results interlaced together, but often these notebooks become very long and messy. This likely resonates with many students, but also data science and industry professionals, since it is a widely used technology,&rdquo; said&nbsp;<a href="https://fredhohman.com/"><strong>Fred Hohman</strong></a>, a&nbsp;<a href="https://www.cse.gatech.edu/">School of Computational Science and Engineering</a>&nbsp;(CSE) Ph.D. student and co-author of the paper.</p>

<p>The tools, called code gathering tools, let the user select any part of a long notebook, such as a particular variable or equation buried in messy code, and pull out only the code relevant to it.</p>

<p>&ldquo;What we did is create a means to pull out, or gather, a desired item out of a large notebook and show all its changes from previous versions. This will show you what minimal set of code you need to get a certain result,&rdquo; said Hohman.</p>

<p>Beyond improving efficiency, the tools also support reproducibility, code sharing, and communication by helping analysts find, clean, recover, and compare versions of code in cluttered, inconsistent notebooks.</p>

<p>According to the paper,&nbsp;<a href="http://andrewhead.info/assets/pdf/notebook-gathering.pdf"><em>Managing Messes in Computational Notebooks</em></a>, the tools also archive all versions of code outputs, allowing analysts to review these versions and recover the subsets of code that produced them. These subsets can serve as succinct summaries of analysis activity or starting points for new analyses.&nbsp;</p>
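
<p>To make the idea of gathering more concrete, the following is a minimal sketch in Python and not the implementation described in the paper. It assumes the execution log is simply a list of cell source strings in the order they were run; the function names <code>names_defined_and_used</code> and <code>gather</code> are hypothetical, and the dependency analysis is deliberately simplified.</p>

```python
import ast


def names_defined_and_used(source):
    """Return (defined, used) sets of variable names for one cell's source."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
        elif isinstance(node, ast.AugAssign) and isinstance(node.target, ast.Name):
            # "x += 1" also reads the previous value of x.
            used.add(node.target.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            # "import numpy as np" binds "np"; "import os.path" binds "os".
            for alias in node.names:
                defined.add(alias.asname or alias.name.split(".")[0])
    return defined, used


def gather(log, target):
    """Walk the log backward, keeping only cells the target depends on.

    Simplified on purpose: it tracks plain variable names only and ignores
    in-place mutation, attributes, and function or class definitions.
    """
    needed = {target}
    kept = []
    for source in reversed(log):
        defined, used = names_defined_and_used(source)
        if defined.intersection(needed):
            kept.append(source)
            # Names this cell defines are satisfied; names it reads are now needed.
            needed = needed.difference(defined).union(used)
    return list(reversed(kept))


# Hypothetical execution log: one string per executed cell, oldest first.
log = [
    "data = [3, 1, 2]",
    "scratch = len(data)",
    "data = sorted(data)",
    "label = 'debug'",
    "total = sum(data)",
]

for cell in gather(log, "total"):
    print(cell)
```

<p>With the sample log above, gathering <code>total</code> keeps the three cells that create <code>data</code>, sort it, and compute the sum, and drops the unrelated <code>scratch</code> and <code>label</code> cells.</p>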

<p><strong>[Related Links: <a href="https://gvu.gatech.edu/chi-2019">Georgia Tech Research Integrates Human Capabilities with Machine Advances for Positive Impact in Society</a>]&nbsp;</strong></p>
]]></body>
  <field_subtitle>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_subtitle>
  <field_dateline>
    <item>
      <value>2019-05-02T00:00:00-04:00</value>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_dateline>
  <field_summary_sentence>
    <item>
      <value><![CDATA[CSE Ph.D. student Fred Hohman co-authors CHI 2019 Best Paper Award Winner on cleaning up computational notebooks.]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_summary>
  <field_media>
          <item>
        <nid>
          <node id="621213">
            <nid>621213</nid>
            <type>image</type>
            <title><![CDATA[Managing Messy Notebooks]]></title>
            <body><![CDATA[]]></body>
                          <field_image>
                <item>
                  <fid>236629</fid>
                  <filename><![CDATA[managingmessynotebooks.png]]></filename>
                  <filepath><![CDATA[/sites/default/files/images/managingmessynotebooks.png]]></filepath>
                  <file_full_path><![CDATA[http://hg.gatech.edu//sites/default/files/images/managingmessynotebooks.png]]></file_full_path>
                  <filemime>image/png</filemime>
                  <image_740><![CDATA[]]></image_740>
                  <image_alt><![CDATA[A diagram showing a 'messy notebook' that leads to an 'execution log' with revisions of the messy notebook, and the outcome of slicing the log into an 'ordered, minimal, complete slices' notebook.]]></image_alt>
                </item>
              </field_image>
            
                      </node>
        </nid>
      </item>
      </field_media>
  <field_contact_email>
    <item>
      <email><![CDATA[kristen.perez@cc.gatech.edu]]></email>
    </item>
  </field_contact_email>
  <field_location>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_location>
  <field_contact>
    <item>
      <value><![CDATA[<p>Kristen Perez</p>

<p>Communications Officer</p>
]]></value>
    </item>
  </field_contact>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <!--  TO DO: correct to not conflate categories and news room topics  -->
  <links_related> </links_related>
  <files> </files>
  <og_groups>
          <item>47223</item>
          <item>431631</item>
          <item>50877</item>
      </og_groups>
  <og_groups_both>
          <item>
        <![CDATA[Student Research]]>
      </item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>8862</tid>
        <value><![CDATA[Student Research]]></value>
      </item>
      </field_categories>
  <core_research_areas>
      </core_research_areas>
  <field_news_room_topics>
      </field_news_room_topics>
  <og_groups>
          <item>47223</item>
          <item>431631</item>
          <item>50877</item>
      </og_groups>
  <og_groups_both>
          <item><![CDATA[College of Computing]]></item>
          <item><![CDATA[OMS]]></item>
          <item><![CDATA[School of Computational Science and Engineering]]></item>
      </og_groups_both>
  <field_keywords>
          <item>
        <tid>181220</tid>
        <value><![CDATA[cse-ml]]></value>
      </item>
          <item>
        <tid>181216</tid>
        <value><![CDATA[cc-research]]></value>
      </item>
      </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
