<node id="681782">
  <nid>681782</nid>
  <type>news</type>
  <uid>
    <user id="36734"><![CDATA[36734]]></user>
  </uid>
  <created>1744693752</created>
  <changed>1744693999</changed>
  <title><![CDATA[Georgia Tech Researchers to Present Breakthrough AI Interpretability Methods]]></title>
  <body><![CDATA[<p>A team of researchers from the AI Safety Initiative (AISI) at Georgia Tech is set to present groundbreaking work on understanding and controlling advanced AI systems at two prestigious conferences in 2025: the International Conference on Learning Representations (ICLR) and the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).</p>
<p>Their research focuses on novel techniques to make large language models (LLMs) and diffusion models more interpretable and controllable, crucial advances as AI systems become increasingly powerful and widely deployed.</p>
<h2>New Methods for Steering AI Behavior</h2>
<p>Yixiong Hao leads the team's work on contrastive activation engineering (CAE), which offers a new way to guide LLM outputs through targeted modifications to a model's internal representations. Unlike traditional approaches such as fine-tuning, which require extensive computational resources, CAE can be applied at inference time with minimal overhead.</p>
<p>"We've made significant progress in understanding the capabilities and limitations of CAE techniques," Hao explained. "Our research reveals that while CAE can be effective for in-distribution contexts, it has clear boundaries that practitioners need to be aware of."</p>
<p>The team uncovered practical insights for implementing CAE, including how many samples are needed to build effective steering vectors and how those vectors respond to adversarial inputs. They also found that larger models are more resistant to steering-induced performance degradation. (An illustrative sketch of the general technique appears at the end of this article.)</p>
<h2>Decoding How AI Models Learn From Context</h2>
<p>In parallel research, Stepan Shabalin collaborated with Google DeepMind researchers to adapt sparse autoencoder (SAE) circuits to work with the larger Gemma-1 2B model, providing key insights into how AI systems learn from context.</p>
<p>"We've demonstrated that task vectors in large language models can be approximated by a sparse sum of autoencoder latents," said Shabalin. "This gives us a deeper understanding of how models recognize and execute tasks based on context."</p>
<h2>Extending Techniques to Image Generation Models</h2>
<p>A third paper, co-authored by Shabalin, Hao, and Ayush Panda, applies similar interpretability techniques to text-to-image diffusion models. Their research applies SAEs and Inference-Time Decomposition of Activations (ITDA) to the state-of-the-art FLUX.1 diffusion model.</p>
<p>"By developing an automated interpretation pipeline for vision models, we've been able to extract semantically meaningful features," noted Panda. Their results show these methods outperform standard approaches on interpretability metrics, opening new possibilities for controlled image generation.</p>
<h2>Importance for AI Safety</h2>
<p>Parv Mahajan, Collaborative Initiative Lead at AISI, emphasized the significance of the research: "These papers represent important advances in our ability to understand and control the behavior of increasingly complex AI systems. As these models become more powerful and widely deployed, interpretability research like this becomes essential for ensuring their safe and beneficial use."</p>
<p>The team will present their work at dedicated workshops during ICLR and CVPR, creating opportunities for collaboration with other researchers. Their work aligns with AISI's mission to make frontier AI systems more transparent, controllable, and aligned with human values.</p>
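<h2>A Closer Look: How Activation Steering Works</h2>
<p>For readers curious about the mechanics behind the CAE work, the sketch below illustrates the general idea of contrastive activation steering: record a model's internal activations on pairs of prompts that contrast a target behavior, average the differences into a steering vector, and add that vector back into the model's residual stream during generation. This is a minimal illustrative sketch, not the team's implementation; the model ("gpt2"), layer index, steering strength, and prompt pairs are arbitrary placeholders.</p>
<pre>
# Minimal sketch of contrastive activation steering (illustrative only;
# not the AISI team's code). Model, layer, and strength are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in for any decoder-only LLM
LAYER = 6            # which residual-stream activation to steer (arbitrary)
ALPHA = 4.0          # steering strength (arbitrary)

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def last_token_residual(text):
    """Residual-stream activation at the final token of `text`."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[LAYER][0, -1, :]  # shape: (d_model,)

# The steering vector is the mean activation difference over
# contrastive prompt pairs (positive vs. negative for a behavior).
positive = ["I love this! It is wonderful.", "What a delightful day."]
negative = ["I hate this! It is awful.", "What a miserable day."]
steer = torch.stack([
    last_token_residual(p) - last_token_residual(n)
    for p, n in zip(positive, negative)
]).mean(dim=0)

# Add the vector into the residual stream during generation via a hook.
# Note: hidden_states[LAYER] is the output of transformer block LAYER-1,
# so that is the block we hook.
def add_steering(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + ALPHA * steer  # broadcasts over batch and sequence
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[LAYER - 1].register_forward_hook(add_steering)
prompt = tok("The movie was", return_tensors="pt")
ids = model.generate(**prompt, max_new_tokens=20, pad_token_id=tok.eos_token_id)
print(tok.decode(ids[0]))
handle.remove()  # restore the unsteered model
</pre>
<p>As the team's findings suggest, the practical details matter: how many contrastive pairs go into the vector, which layer is steered, and how strong the steering is all affect whether the steered model stays coherent, and larger models tolerate steering with less performance degradation.</p>]]></body>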
  <field_subtitle>
    <item>
      <value><![CDATA[Unlocking the Black Box: New Techniques Make Advanced AI Systems More Transparent and Controllable]]></value>
    </item>
  </field_subtitle>
  <field_dateline>
    <item>
      <value>2025-04-15T00:00:00-04:00</value>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_dateline>
  <field_summary_sentence>
    <item>
      <value><![CDATA[Researchers from the AI Safety Initiative at Georgia Tech have developed innovative methods to better understand and steer both language and image-generating AI models.]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[<p>A team of AISI student researchers has developed transformative approaches for peering into AI decision-making processes, with applications spanning both text and image generation. Their research reveals how large models process tasks internally and demonstrates practical methods for steering outputs without resource-intensive retraining. This work addresses a critical need as AI deployment accelerates, offering both theoretical understanding and practical tools for ensuring these powerful systems remain aligned with human intentions. The findings will be showcased at ICLR and CVPR, two of the field's most prestigious venues.</p>]]></value>
    </item>
  </field_summary>
  <field_media>
          <item>
        <nid>
          <node id="676837">
            <nid>676837</nid>
            <type>image</type>
            <title><![CDATA[Activations Image]]></title>
            <body><![CDATA[]]></body>
                          <field_image>
                <item>
                  <fid>260682</fid>
                  <filename><![CDATA[TzA04fjsB0BKYjMB2B6QhMR2A6AtMRmI7AdASmIzAdgekITEdgOgLTEXjgjsAUHHvgPpvpnU1HYDoC0xGYjsB0BKYjMB2B6QhMR2A6AtMRmI7AdASmIzAdgekI3M8jMAXH7ucBnn78dASmIzAdgekITEdgOgLTEZiOwHQEpiMwHYHpCExHYDoC0xGYjsADdwTf4T9Yv2kVhQfAAAAAElFTkSuQmCC.png]]></filename>
                  <filepath><![CDATA[/sites/default/files/2025/04/15/TzA04fjsB0BKYjMB2B6QhMR2A6AtMRmI7AdASmIzAdgekITEdgOgLTEXjgjsAUHHvgPpvpnU1HYDoC0xGYjsB0BKYjMB2B6QhMR2A6AtMRmI7AdASmIzAdgekI3M8jMAXH7ucBnn78dASmIzAdgekITEdgOgLTEZiOwHQEpiMwHYHpCExHYDoC0xGYjsADdwTf4T9Yv2kVhQfAAAAAElFTkSuQmCC.png]]></filepath>
                  <file_full_path><![CDATA[http://hg.gatech.edu//sites/default/files/2025/04/15/TzA04fjsB0BKYjMB2B6QhMR2A6AtMRmI7AdASmIzAdgekITEdgOgLTEXjgjsAUHHvgPpvpnU1HYDoC0xGYjsB0BKYjMB2B6QhMR2A6AtMRmI7AdASmIzAdgekI3M8jMAXH7ucBnn78dASmIzAdgekITEdgOgLTEZiOwHQEpiMwHYHpCExHYDoC0xGYjsADdwTf4T9Yv2kVhQfAAAAAElFTkSuQmCC.png]]></file_full_path>
                  <filemime>image/png</filemime>
                  <image_740><![CDATA[]]></image_740>
                  <image_alt><![CDATA[Table showing adding activations corresponding to common items.]]></image_alt>
                </item>
              </field_image>
          </node>
        </nid>
      </item>
          <item>
        <nid>
          <node id="676836">
            <nid>676836</nid>
            <type>image</type>
            <title><![CDATA[thing.png]]></title>
            <body><![CDATA[]]></body>
                          <field_image>
                <item>
                  <fid>260681</fid>
                  <filename><![CDATA[thing.png]]></filename>
                  <filepath><![CDATA[/sites/default/files/2025/04/15/thing.png]]></filepath>
                  <file_full_path><![CDATA[http://hg.gatech.edu//sites/default/files/2025/04/15/thing.png]]></file_full_path>
                  <filemime>image/png</filemime>
                  <image_740><![CDATA[]]></image_740>
                  <image_alt><![CDATA[Diagram showing SAE Activations]]></image_alt>
                </item>
              </field_image>
          </node>
        </nid>
      </item>
          <item>
        <nid>
          <node id="676838">
            <nid>676838</nid>
            <type>image</type>
            <title><![CDATA[Screenshot-2025-04-15-010925.png]]></title>
            <body><![CDATA[]]></body>
                          <field_image>
                <item>
                  <fid>260683</fid>
                  <filename><![CDATA[Screenshot-2025-04-15-010925.png]]></filename>
                  <filepath><![CDATA[/sites/default/files/2025/04/15/Screenshot-2025-04-15-010925.png]]></filepath>
                  <file_full_path><![CDATA[http://hg.gatech.edu//sites/default/files/2025/04/15/Screenshot-2025-04-15-010925.png]]></file_full_path>
                  <filemime>image/png</filemime>
                  <image_740><![CDATA[]]></image_740>
                  <image_alt><![CDATA[Diagram showing computation of steering vectors.]]></image_alt>
                </item>
              </field_image>
          </node>
        </nid>
      </item>
      </field_media>
  <field_contact_email>
    <item>
      <email><![CDATA[board@aisi.dev]]></email>
    </item>
  </field_contact_email>
  <field_location>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_location>
  <field_contact>
    <item>
      <value><![CDATA[<p><em>More information about the AI Safety Initiative can be found at </em><a href="https://www.aisi.dev/"><em>aisi.dev</em></a>.</p>]]></value>
    </item>
  </field_contact>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <!--  TO DO: correct to not conflate categories and news room topics  -->
  <links_related> </links_related>
  <files> </files>
  <og_groups>
          <item>660394</item>
      </og_groups>
  <field_categories>
          <item>
        <tid>153</tid>
        <value><![CDATA[Computer Science/Information Technology and Security]]></value>
      </item>
          <item>
        <tid>42921</tid>
        <value><![CDATA[Exhibitions]]></value>
      </item>
          <item>
        <tid>135</tid>
        <value><![CDATA[Research]]></value>
      </item>
          <item>
        <tid>134</tid>
        <value><![CDATA[Student and Faculty]]></value>
      </item>
          <item>
        <tid>8862</tid>
        <value><![CDATA[Student Research]]></value>
      </item>
      </field_categories>
  <core_research_areas>
          <term tid="193655"><![CDATA[Artificial Intelligence at Georgia Tech]]></term>
          <term tid="39431"><![CDATA[Data Engineering and Science]]></term>
      </core_research_areas>
  <field_news_room_topics>
      </field_news_room_topics>
  <og_groups_both>
          <item><![CDATA[AI Safety Initiative (AISI)]]></item>
      </og_groups_both>
  <field_keywords>
      </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
