{"681782":{"#nid":"681782","#data":{"type":"news","title":"Georgia Tech Researchers to Present Breakthrough AI Interpretability Methods","body":[{"value":"\u003Cp\u003EA team of researchers from the AI Safety Initiative (AISI) at Georgia Tech is set to present groundbreaking work on understanding and controlling advanced AI systems at two prestigious conferences in 2025: the International Conference on Learning Representations (ICLR) and the IEEE\/CVF Conference on Computer Vision and Pattern Recognition (CVPR).\u003C\/p\u003E\u003Cp\u003ETheir research focuses on novel techniques to make large language models (LLMs) and diffusion models more interpretable and controllable - crucial advancements as AI systems become increasingly powerful and widely deployed.\u003C\/p\u003E\u003Ch2\u003ENew Methods for Steering AI Behavior\u003C\/h2\u003E\u003Cp\u003EYixiong Hao leads the team\u0027s work on contrastive activation engineering (CAE), which offers a new way to guide LLM outputs by targeted modifications to internal representations. Unlike traditional methods requiring extensive computational resources, CAE can be applied during inference with minimal overhead.\u003C\/p\u003E\u003Cp\u003E\u0022We\u0027ve made significant progress in understanding the capabilities and limitations of CAE techniques,\u0022 Hao explained. \u0022Our research reveals that while CAE can be effective for in-distribution contexts, it has clear boundaries that practitioners need to be aware of.\u0022\u003C\/p\u003E\u003Cp\u003EThe team discovered practical insights about implementing CAE, including the optimal number of samples needed for effective steering vectors and how these vectors respond to adversarial inputs. They also found that larger models better resist steering-induced performance degradation.\u003C\/p\u003E\u003Ch2\u003EDecoding How AI Models Learn From Context\u003C\/h2\u003E\u003Cp\u003EIn parallel research, Stepan Shabalin collaborated with Google DeepMind researchers to adapt sparse autoencoder circuits to work with the larger Gemma-1 2B model, providing key insights into how AI systems learn from context.\u003C\/p\u003E\u003Cp\u003E\u0022We\u0027ve demonstrated that task vectors in large language models can be approximated by a sparse sum of autoencoder latents,\u0022 said Shabalin. \u0022This gives us a deeper understanding of how models recognize and execute tasks based on context.\u0022\u003C\/p\u003E\u003Ch2\u003EExtending Techniques to Image Generation Models\u003C\/h2\u003E\u003Cp\u003EA third paper, co-authored by Shabalin, Hao, and Ayush Panda, applies similar interpretability techniques to text-to-image diffusion models. Their research uses Sparse Autoencoders (SAEs) and Inference-Time Decomposition of Activations (ITDA) with the state-of-the-art Flux 1 diffusion model.\u003C\/p\u003E\u003Cp\u003E\u0022By developing an automated interpretation pipeline for vision models, we\u0027ve been able to extract semantically meaningful features,\u0022 noted Panda. Their results show these methods outperform standard approaches on interpretability metrics, enabling new possibilities for controlled image generation.\u003C\/p\u003E\u003Ch2\u003EImportance for AI Safety\u003C\/h2\u003E\u003Cp\u003EParv Mahajan, Collaborative Initiative Lead at AISI, emphasized the significance of the research: \u0022These papers represent important advances in our ability to understand and control the behavior of increasingly complex AI systems. 
As these models become more powerful and widely deployed, interpretability research like this becomes essential for ensuring their safe and beneficial use.\u0022\u003C\/p\u003E\u003Cp\u003EThe team will present their work at dedicated workshops during ICLR and CVPR, creating opportunities for collaboration with other researchers. Their work aligns with AISI\u0027s mission to make frontier AI systems more transparent, controllable, and aligned with human values.\u003C\/p\u003E","summary":"","format":"limited_html"}],"field_subtitle":[{"value":"Unlocking the Black Box: New Techniques Make Advanced AI Systems More Transparent and Controllable"}],"field_summary":[{"value":"\u003Cp\u003EA team of AISI student researchers has developed transformative approaches for peering into AI decision-making processes, with applications spanning both text and image generation. Their research reveals how large models process tasks internally and demonstrates practical methods for steering outputs without resource-intensive retraining. This work addresses a critical need as AI deployment accelerates, offering both theoretical understanding and practical tools for ensuring these powerful systems remain aligned with human intentions. The findings will be showcased at ICLR and CVPR, two of the field\u0027s most prestigious venues.\u003C\/p\u003E","format":"limited_html"}],"field_summary_sentence":[{"value":"Researchers from the AI Safety Initiative at Georgia Tech have developed innovative methods to better understand and steer both language and image-generating AI models."}],"uid":"36734","created_gmt":"2025-04-15 05:09:12","changed_gmt":"2025-04-15 05:13:19","author":"Parv Mahajan","boilerplate_text":"","field_publication":"","field_article_url":"","location":"Atlanta, GA","dateline":{"date":"2025-04-15T00:00:00-04:00","iso_date":"2025-04-15T00:00:00-04:00","tz":"America\/New_York"},"extras":[],"hg_media":{"676837":{"id":"676837","type":"image","title":"Activations Image","body":null,"created":"1744693805","gmt_created":"2025-04-15 05:10:05","changed":"1744693805","gmt_changed":"2025-04-15 05:10:05","alt":"Table showing adding activations corresponding to common items.","file":{"fid":"260682","name":"TzA04fjsB0BKYjMB2B6QhMR2A6AtMRmI7AdASmIzAdgekITEdgOgLTEXjgjsAUHHvgPpvpnU1HYDoC0xGYjsB0BKYjMB2B6QhMR2A6AtMRmI7AdASmIzAdgekI3M8jMAXH7ucBnn78dASmIzAdgekITEdgOgLTEZiOwHQEpiMwHYHpCExHYDoC0xGYjsADdwTf4T9Yv2kVhQfAAAAAElFTkSuQmCC.png","image_path":"\/sites\/default\/files\/2025\/04\/15\/TzA04fjsB0BKYjMB2B6QhMR2A6AtMRmI7AdASmIzAdgekITEdgOgLTEXjgjsAUHHvgPpvpnU1HYDoC0xGYjsB0BKYjMB2B6QhMR2A6AtMRmI7AdASmIzAdgekI3M8jMAXH7ucBnn78dASmIzAdgekITEdgOgLTEZiOwHQEpiMwHYHpCExHYDoC0xGYjsADdwTf4T9Yv2kVhQfAAAAAElFTkSuQmCC.png","image_full_path":"http:\/\/hg.gatech.edu\/\/sites\/default\/files\/2025\/04\/15\/TzA04fjsB0BKYjMB2B6QhMR2A6AtMRmI7AdASmIzAdgekITEdgOgLTEXjgjsAUHHvgPpvpnU1HYDoC0xGYjsB0BKYjMB2B6QhMR2A6AtMRmI7AdASmIzAdgekI3M8jMAXH7ucBnn78dASmIzAdgekITEdgOgLTEZiOwHQEpiMwHYHpCExHYDoC0xGYjsADdwTf4T9Yv2kVhQfAAAAAElFTkSuQmCC.png","mime":"image\/png","size":998164,"path_740":"http:\/\/hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/2025\/04\/15\/TzA04fjsB0BKYjMB2B6QhMR2A6AtMRmI7AdASmIzAdgekITEdgOgLTEXjgjsAUHHvgPpvpnU1HYDoC0xGYjsB0BKYjMB2B6QhMR2A6AtMRmI7AdASmIzAdgekI3M8jMAXH7ucBnn78dASmIzAdgekITEdgOgLTEZiOwHQEpiMwHYHpCExHYDoC0xGYjsADdwTf4T9Yv2kVhQfAAAAAElFTkSuQmCC.png?itok=iorOYWiB"}},"676836":{"id":"676836","type":"image","title":"thing.png","body":null,"created":"1744693805","gmt_created":"2025-04-15 
05:10:05","changed":"1744693805","gmt_changed":"2025-04-15 05:10:05","alt":"Diagram showing SAE Activations","file":{"fid":"260681","name":"thing.png","image_path":"\/sites\/default\/files\/2025\/04\/15\/thing.png","image_full_path":"http:\/\/hg.gatech.edu\/\/sites\/default\/files\/2025\/04\/15\/thing.png","mime":"image\/png","size":36448,"path_740":"http:\/\/hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/2025\/04\/15\/thing.png?itok=v3vgWRe1"}},"676838":{"id":"676838","type":"image","title":"Screenshot-2025-04-15-010925.png","body":null,"created":"1744693805","gmt_created":"2025-04-15 05:10:05","changed":"1744693805","gmt_changed":"2025-04-15 05:10:05","alt":"Diagram showing computation of steering vectors.","file":{"fid":"260683","name":"Screenshot-2025-04-15-010925.png","image_path":"\/sites\/default\/files\/2025\/04\/15\/Screenshot-2025-04-15-010925.png","image_full_path":"http:\/\/hg.gatech.edu\/\/sites\/default\/files\/2025\/04\/15\/Screenshot-2025-04-15-010925.png","mime":"image\/png","size":35684,"path_740":"http:\/\/hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/2025\/04\/15\/Screenshot-2025-04-15-010925.png?itok=fPKLPIvt"}}},"media_ids":["676837","676836","676838"],"groups":[{"id":"660394","name":"AI Safety Initative (AISI)"}],"categories":[{"id":"153","name":"Computer Science\/Information Technology and Security"},{"id":"42921","name":"Exhibitions"},{"id":"135","name":"Research"},{"id":"134","name":"Student and Faculty"},{"id":"8862","name":"Student Research"}],"keywords":[],"core_research_areas":[{"id":"193655","name":"Artificial Intelligence at Georgia Tech"},{"id":"39431","name":"Data Engineering and Science"}],"news_room_topics":[],"event_categories":[],"invited_audience":[],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cp\u003E\u003Cem\u003EMore information about the AI Safety Initiative can be found at \u003C\/em\u003E\u003Ca href=\u0022https:\/\/www.aisi.dev\/\u0022\u003E\u003Cem\u003Eaisi.dev.\u003C\/em\u003E\u003C\/a\u003E\u003C\/p\u003E","format":"limited_html"}],"email":["board@aisi.dev"],"slides":[],"orientation":[],"userdata":""}}}