{"683253":{"#nid":"683253","#data":{"type":"event","title":"Ph.D. Dissertation Defense - Apoorva Beedu","body":[{"value":"\u003Cp\u003E\u003Cstrong\u003ETitle\u003C\/strong\u003E\u003Cem\u003E:\u0026nbsp; Learning Vision and Language Cues for Video Understanding in Egocentric and Instructional Videos\u003C\/em\u003E\u003C\/p\u003E\u003Cp\u003E\u003Cstrong\u003ECommittee:\u003C\/strong\u003E\u003C\/p\u003E\u003Cp\u003EDr.\u0026nbsp;Irfan Essa, CoC, Chair, Advisor\u003C\/p\u003E\u003Cp\u003EDr.\u0026nbsp;Justin Romberg, ECE, Co-Advisor\u003C\/p\u003E\u003Cp\u003EDr.\u0026nbsp;Thomas Ploetz, CoC\u003C\/p\u003E\u003Cp\u003EDr.\u0026nbsp;Larry Heck, ECE\u003C\/p\u003E\u003Cp\u003EDr.\u0026nbsp;Judy Hoffman, IC\u003C\/p\u003E\u003Cp\u003EDr.\u0026nbsp;Wei Xu, CoC\u003C\/p\u003E","summary":"","format":"limited_html"}],"field_subtitle":"","field_summary":[{"value":"\u003Cp\u003EWe perceive the world through a combination of senses, such as sound, smell, and vision, to learn from and interact with\u003Cbr\u003Eour surroundings. Among these, vision and hearing are the primary sources of information gathering, especially through\u003Cbr\u003Ereading and listening. Effectively utilizing and combining these senses is key to developing intelligent systems that can\u003Cbr\u003Eoperate in and understand complex environments. A critical challenge hindering effective vision-language learning is\u003Cbr\u003Eunderstanding why and how to integrate language for improved video understanding.\u003Cbr\u003EIn this dissertation, we leverage the language modality to learn effective video representations across a range of tasks,\u003Cbr\u003Eincluding action recognition, forecasting, and summarization. 
The key ideas developed in this thesis are (i) Vision-Language supervision for action understanding, and (ii) Leveraging language for video summarization.\u003Cbr\u003EIn Vision-Language supervision for action understanding, we generate rich action descriptions and leverage information\u003Cbr\u003Efrom multiple modalities to recognize and anticipate future actions in videos. We also discover the extent to which\u003Cbr\u003Elanguage contributes to understanding actions in videos, through effective cross-modal supervision between the vision\u003Cbr\u003Eand language modalities.\u003Cbr\u003EFinally, in Leveraging language for video summarization, we generate text outputs for every input modality and evaluate\u003Cbr\u003Ethe performance of foundational models on the video summarization task. By using text as the primary mode of input, we\u003Cbr\u003Eevaluate how the text representations perform on video summarization. Building on this, we propose a hierarchical\u003Cbr\u003Eframework that incorporates multi-granular language cues and evaluate its effectiveness for video summarization.\u003C\/p\u003E","format":"limited_html"}],"field_summary_sentence":[{"value":"Learning Vision and Language Cues for Video Understanding in Egocentric and Instructional Videos "}],"uid":"28475","created_gmt":"2025-07-23 22:51:52","changed_gmt":"2025-07-23 22:53:27","author":"Daniela Staiculescu","boilerplate_text":"","field_publication":"","field_article_url":"","field_event_time":{"event_time_start":"2025-07-28T14:00:00-04:00","event_time_end":"2025-07-28T16:00:00-04:00","event_time_end_last":"2025-07-28T16:00:00-04:00","gmt_time_start":"2025-07-28 18:00:00","gmt_time_end":"2025-07-28 20:00:00","gmt_time_end_last":"2025-07-28 20:00:00","rrule":null,"timezone":"America\/New_York"},"location":"Room C1215 CODA (Midtown)","extras":[],"related_links":[{"url":"https:\/\/gatech.zoom.us\/j\/3287180871?omn=93053535981","title":"Zoom link"}],"groups":[{"id":"434381","name":"ECE Ph.D. 
Dissertation Defenses"}],"categories":[],"keywords":[{"id":"100811","name":"Phd Defense"},{"id":"1808","name":"graduate students"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[{"id":"1788","name":"Other\/Miscellaneous"}],"invited_audience":[{"id":"78771","name":"Public"}],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[],"email":[],"slides":[],"orientation":[],"userdata":""}}}