<node id="674021">
  <nid>674021</nid>
  <type>news</type>
  <uid>
    <user id="32045"><![CDATA[32045]]></user>
  </uid>
  <created>1712326796</created>
  <changed>1733765817</changed>
  <title><![CDATA[LLMs Generate Western Bias Even When Trained with Non-Western Languages]]></title>
  <body><![CDATA[<p>Large language models tend to exhibit Western cultural bias even when they are prompted in or trained on non-English languages like Arabic, Georgia Tech researchers have found.</p>

<p>A new paper authored by researchers in Georgia Tech's School of Interactive Computing reveals these models have trouble understanding contextual nuances that are specific to non-Western cultures.</p>

<p>Ph.D. student Tarek Naous and his advisors, associate professors Wei Xu and Alan Ritter, challenged ChatGPT-4 and an Arabic-specific LLM to choose the most appropriate word to complete a sentence. Some of the words the models could choose from were contextually correct and would make sense within Arab culture, while others fell within Western paradigms.</p>

<p>When asked to suggest food dishes, drinks, or names for Arab women, the models chose Western responses: ravioli for food, whiskey for drinks, and Roseanne for names.</p>

<p>The implication is that LLMs appear to fall short in their ability to assist users who have non-Western backgrounds.</p>

<p>To measure cultural bias, the team also introduced CAMeL (Cultural Appropriateness Measure Set for LMs), a benchmark data set of 628 naturally occurring prompts and 20,368 entities spanning eight categories that contrast Arab and Western cultures.</p>

<p>Since the researchers announced their paper, it has drawn attention on social media and in news outlets.</p>

<p>To learn more about the authors and their work, read the article spotlighting them on <a href="https://venturebeat.com/ai/large-language-models-exhibit-significant-western-cultural-bias-study-finds/">VentureBeat</a>.</p>
]]></body>
  <field_subtitle>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_subtitle>
  <field_dateline>
    <item>
      <value>2024-04-05T00:00:00-04:00</value>
      <timezone><![CDATA[America/New_York]]></timezone>
    </item>
  </field_dateline>
  <field_summary_sentence>
    <item>
      <value><![CDATA[New Georgia Tech research indicates that LLMs appear to fall short in their ability to assist users who have non-Western backgrounds.]]></value>
    </item>
  </field_summary_sentence>
  <field_summary>
    <item>
      <value><![CDATA[<p>New research from Georgia Tech School of Interactive Computing Associate Professor Wei Xu is attracting media attention. VentureBeat recently examined Xu's findings, which indicate that large language models appear to fall short in their ability to assist users who have non-Western backgrounds.</p>
]]></value>
    </item>
  </field_summary>
  <field_media>
          <item>
        <nid>
          <node id="673633">
            <nid>673633</nid>
            <type>image</type>
            <title><![CDATA[School of Interactive Computing Associate Professor Wei Xu]]></title>
            <body><![CDATA[]]></body>
                          <field_image>
                <item>
                  <fid>257051</fid>
                  <filename><![CDATA[wei xu_story.jpg]]></filename>
                  <filepath><![CDATA[/sites/default/files/2024/04/05/wei%20xu_story.jpg]]></filepath>
                  <file_full_path><![CDATA[http://hg.gatech.edu//sites/default/files/2024/04/05/wei%20xu_story.jpg]]></file_full_path>
                  <filemime>image/jpeg</filemime>
                  <image_740><![CDATA[]]></image_740>
                  <image_alt><![CDATA[School of Interactive Computing Associate Professor Wei Xu]]></image_alt>
                </item>
              </field_image>
            
                      </node>
        </nid>
      </item>
      </field_media>
  <field_contact_email>
    <item>
      <email><![CDATA[]]></email>
    </item>
  </field_contact_email>
  <field_location>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_location>
  <field_contact>
    <item>
      <value><![CDATA[<p>Nathan Deen, Communications Officer</p>

<p>Georgia Tech School of Interactive Computing</p>

<p>nathan.deen@cc.gatech.edu</p>
]]></value>
    </item>
  </field_contact>
  <field_sidebar>
    <item>
      <value><![CDATA[]]></value>
    </item>
  </field_sidebar>
  <field_boilerplate>
    <item>
      <nid><![CDATA[]]></nid>
    </item>
  </field_boilerplate>
  <!--  TO DO: correct to not conflate categories and news room topics  -->
  <links_related> </links_related>
  <files> </files>
  <og_groups>
          <item>47223</item>
          <item>50876</item>
      </og_groups>
  <og_groups_both>
          <item>
        <![CDATA[Research]]>
      </item>
      </og_groups_both>
  <field_categories>
          <item>
        <tid>135</tid>
        <value><![CDATA[Research]]></value>
      </item>
      </field_categories>
  <core_research_areas>
          <term tid="39501"><![CDATA[People and Technology]]></term>
      </core_research_areas>
  <field_news_room_topics>
      </field_news_room_topics>
  <og_groups_both>
          <item><![CDATA[College of Computing]]></item>
          <item><![CDATA[School of Interactive Computing]]></item>
      </og_groups_both>
  <field_keywords>
          <item>
        <tid>10199</tid>
        <value><![CDATA[Daily Digest]]></value>
      </item>
          <item>
        <tid>187915</tid>
        <value><![CDATA[go-researchnews]]></value>
      </item>
      </field_keywords>
  <field_userdata><![CDATA[]]></field_userdata>
</node>
