{"674733":{"#nid":"674733","#data":{"type":"news","title":"Chatbots Are Poor Multilingual Healthcare Consultants, Study Finds","body":[{"value":"\u003Cp\u003EGeorgia Tech researchers say non-English speakers shouldn\u2019t rely on chatbots like ChatGPT to provide valuable healthcare advice.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EA team of researchers from the College of Computing at Georgia Tech has developed a framework for assessing the capabilities of large language models (LLMs).\u003C\/p\u003E\u003Cp\u003EPh.D. students\u0026nbsp;\u003Ca href=\u0022https:\/\/mohit3011.github.io\/\u0022\u003EMohit Chandra\u003C\/a\u003E\u0026nbsp;and\u0026nbsp;\u003Ca href=\u0022https:\/\/ahren09.github.io\/\u0022\u003EYiqiao (Ahren) Jin\u003C\/a\u003E\u0026nbsp;are the co-lead authors of the paper\u0026nbsp;\u003Ca href=\u0022https:\/\/arxiv.org\/pdf\/2310.13132\u0022\u003E\u003Cem\u003EBetter to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries\u003C\/em\u003E\u003C\/a\u003E\u003Cem\u003E.\u003C\/em\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003ETheir paper\u2019s findings reveal a gap in LLMs\u2019 ability to answer health-related questions in languages other than English. Chandra and Jin point out\u0026nbsp;the limitations of LLMs for users and developers but also highlight their potential.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EResults from their XLingEval framework caution non-English speakers against using chatbots as alternatives to doctors for advice. 
However, models can improve by deepening the data pool with multilingual source material such as their proposed XLingHealth benchmark.\u0026nbsp;\u0026nbsp;\u0026nbsp; \u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u201cFor users, our research supports what ChatGPT\u2019s website already states: chatbots make a lot of mistakes, so we should not rely on them for critical decision-making or for information that requires high accuracy,\u201d Jin said.\u0026nbsp; \u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u201cSince we observed this language disparity in their performance, LLM developers should focus on improving accuracy, correctness, consistency, and reliability in other languages,\u201d Jin said.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EUsing XLingEval, the researchers found chatbots are less accurate in Spanish, Chinese, and Hindi than in English. By focusing on correctness, consistency, and verifiability, they discovered:\u0026nbsp;\u003C\/p\u003E\u003Cul\u003E\u003Cli\u003ECorrectness decreased by 18% when the same questions were asked in Spanish, Chinese, and Hindi.\u0026nbsp;\u003C\/li\u003E\u003Cli\u003ENon-English answers were 29% less consistent than their English counterparts.\u0026nbsp;\u003C\/li\u003E\u003Cli\u003ENon-English responses were 13% less verifiable overall.\u0026nbsp;\u003C\/li\u003E\u003C\/ul\u003E\u003Cp\u003EXLingHealth contains question-answer pairs that chatbots can reference, which the group hopes will spark improvement within LLMs. \u0026nbsp;\u003C\/p\u003E\u003Cp\u003EThe HealthQA dataset uses specialized healthcare articles from the popular healthcare website\u0026nbsp;\u003Cem\u003EPatient\u003C\/em\u003E. It includes 1,134 health-related question-answer pairs as excerpts from original articles.\u0026nbsp;\u0026nbsp;\u003C\/p\u003E\u003Cp\u003ELiveQA is a second dataset containing 246 question-answer pairs constructed from frequently asked questions (FAQ) platforms associated with the U.S. 
National Institutes of Health (NIH).\u0026nbsp;\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EFor drug-related questions, the group built a MedicationQA component. This dataset contains 690 questions extracted from anonymous consumer queries submitted to MedlinePlus. The answers are sourced from medical references, such as MedlinePlus and DailyMed.\u0026nbsp; \u0026nbsp;\u003C\/p\u003E\u003Cp\u003EIn their tests, the researchers asked ChatGPT-3.5 and MedAlpaca more than 2,000 medical questions. MedAlpaca is a healthcare question-answer chatbot trained on medical literature. Yet more than 67% of its responses to non-English questions were irrelevant or contradictory.\u0026nbsp;\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u201cWe see far worse performance in the case of MedAlpaca than ChatGPT,\u201d Chandra said.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u201cThe majority of the data for MedAlpaca is in English, so it struggled to answer queries in non-English languages. GPT also struggled, but it performed much better than MedAlpaca because it had some sort of training data in other languages.\u201d\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EPh.D. student\u0026nbsp;\u003Cstrong\u003EGaurav Verma\u003C\/strong\u003E\u0026nbsp;and postdoctoral researcher\u0026nbsp;\u003Ca href=\u0022https:\/\/snowood1.github.io\/\u0022\u003EYibo Hu\u003C\/a\u003E\u0026nbsp;co-authored the paper.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EJin and Verma study under\u0026nbsp;\u003Ca href=\u0022https:\/\/faculty.cc.gatech.edu\/~srijan\/\u0022\u003ESrijan Kumar\u003C\/a\u003E, an assistant professor in the School of Computational Science and Engineering, and Hu is a postdoc in Kumar\u2019s lab. 
Chandra is advised by\u0026nbsp;\u003Cstrong\u003EMunmun De Choudhury\u003C\/strong\u003E, an associate professor in the\u0026nbsp;School of Interactive Computing.\u0026nbsp;\u003Cbr\u003E\u0026nbsp;\u003Cbr\u003EThe team will present their paper at\u0026nbsp;\u003Ca href=\u0022https:\/\/www2024.thewebconf.org\/\u0022\u003EThe Web Conference\u003C\/a\u003E, occurring May 13-17 in Singapore. The annual conference focuses on the future direction of the internet. The group\u2019s presentation is a complementary match, considering the conference\u0027s location.\u0026nbsp;\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EEnglish and Chinese are the most common languages in Singapore. The group tested Spanish, Chinese, and Hindi because they are the world\u2019s most spoken languages after English. Personal curiosity and background played a part in inspiring the study.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u201cChatGPT was very popular when it launched in 2022, especially for us computer science students who are always exploring new technology,\u201d said Jin. \u201cNon-native English speakers, like Mohit and I, noticed early on that chatbots underperformed in our native languages.\u201d\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u003Cem\u003ESchool of Interactive Computing communications officer Nathan Deen and School of Computational Science and Engineering communications officer Bryant Wine contributed to this report.\u003C\/em\u003E\u003C\/p\u003E","summary":"","format":"limited_html"}],"field_subtitle":"","field_summary":[{"value":"\u003Cp\u003EA team of researchers from the College of Computing at Georgia Tech has developed a framework for assessing the capabilities of large language models (LLMs). 
Using their XLingEval framework, the researchers found chatbots are less accurate in Spanish, Chinese, and Hindi compared to English, notably lacking correctness, consistency, and verifiability.\u0026nbsp;However, models can improve by deepening the data pool with multilingual source material such as their proposed XLingHealth benchmark.\u0026nbsp;\u0026nbsp;\u0026nbsp;\u003C\/p\u003E\r\n","format":"limited_html"}],"field_summary_sentence":[{"value":"Georgia Tech researchers found that chatbots are less accurate in Spanish, Chinese, and Hindi compared to English when asked health-related questions. "}],"uid":"36319","created_gmt":"2024-05-15 18:33:19","changed_gmt":"2024-12-09 17:36:57","author":"Bryant Wine","boilerplate_text":"","field_publication":"","field_article_url":"","dateline":{"date":"2024-05-15T00:00:00-04:00","iso_date":"2024-05-15T00:00:00-04:00","tz":"America\/New_York"},"extras":[],"hg_media":{"674017":{"id":"674017","type":"image","title":"Better to Ask in English.jpg","body":null,"created":"1715798007","gmt_created":"2024-05-15 18:33:27","changed":"1715798007","gmt_changed":"2024-05-15 18:33:27","alt":"The Web Conference 2024","file":{"fid":"257480","name":"Better to Ask in English.jpg","image_path":"\/sites\/default\/files\/2024\/05\/15\/Better%20to%20Ask%20in%20English.jpg","image_full_path":"http:\/\/hg.gatech.edu\/\/sites\/default\/files\/2024\/05\/15\/Better%20to%20Ask%20in%20English.jpg","mime":"image\/jpeg","size":107118,"path_740":"http:\/\/hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/2024\/05\/15\/Better%20to%20Ask%20in%20English.jpg?itok=2orTn8D2"}},"674018":{"id":"674018","type":"image","title":"The Web Conference.jpg","body":null,"created":"1715798047","gmt_created":"2024-05-15 18:34:07","changed":"1715798047","gmt_changed":"2024-05-15 18:34:07","alt":"Mohit Chandra and Yiqiao (Ahren) Jin ","file":{"fid":"257481","name":"The Web 
Conference.jpg","image_path":"\/sites\/default\/files\/2024\/05\/15\/The%20Web%20Conference.jpg","image_full_path":"http:\/\/hg.gatech.edu\/\/sites\/default\/files\/2024\/05\/15\/The%20Web%20Conference.jpg","mime":"image\/jpeg","size":49308,"path_740":"http:\/\/hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/2024\/05\/15\/The%20Web%20Conference.jpg?itok=fWWPrBQP"}},"674027":{"id":"674027","type":"image","title":"Poster.jpeg","body":null,"created":"1715868226","gmt_created":"2024-05-16 14:03:46","changed":"1715868226","gmt_changed":"2024-05-16 14:03:46","alt":"The Web Conference 2024","file":{"fid":"257491","name":"Poster.jpeg","image_path":"\/sites\/default\/files\/2024\/05\/16\/Poster.jpeg","image_full_path":"http:\/\/hg.gatech.edu\/\/sites\/default\/files\/2024\/05\/16\/Poster.jpeg","mime":"image\/jpeg","size":173843,"path_740":"http:\/\/hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/2024\/05\/16\/Poster.jpeg?itok=o9Jnpk6r"}}},"media_ids":["674017","674018","674027"],"related_links":[{"url":"https:\/\/www.cc.gatech.edu\/news\/chatbots-are-poor-multilingual-healthcare-consultants-study-finds","title":"Chatbots Are Poor Multilingual Healthcare Consultants, Study Finds"}],"groups":[{"id":"47223","name":"College of Computing"},{"id":"50877","name":"School of Computational Science and Engineering"}],"categories":[{"id":"138","name":"Biotechnology, Health, Bioengineering, Genetics"},{"id":"153","name":"Computer Science\/Information Technology and Security"},{"id":"135","name":"Research"},{"id":"8862","name":"Student Research"}],"keywords":[{"id":"187915","name":"go-researchnews"},{"id":"192863","name":"go-ai"},{"id":"10199","name":"Daily Digest"},{"id":"7846","name":"Georgia Tech Office of the Provost"},{"id":"654","name":"College of Computing"},{"id":"166983","name":"School of Computational Science and Engineering"},{"id":"2556","name":"artificial intelligence"},{"id":"9167","name":"machine learning"},{"id":"193556","name":"large 
language models"},{"id":"9153","name":"Research Horizons"}],"core_research_areas":[{"id":"193655","name":"Artificial Intelligence at Georgia Tech"},{"id":"39441","name":"Bioengineering and Bioscience"},{"id":"39431","name":"Data Engineering and Science"},{"id":"39501","name":"People and Technology"}],"news_room_topics":[{"id":"71881","name":"Science and Technology"}],"event_categories":[],"invited_audience":[],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cp\u003EBryant Wine, Communications Officer\u003Cbr\u003E\u003Ca href=\u0022mailto:bryant.wine@cc.gatech.edu\u0022\u003Ebryant.wine@cc.gatech.edu\u003C\/a\u003E\u003C\/p\u003E\u003Cp\u003ENathan Deen, Communications Officer\u003Cbr\u003E\u003Ca href=\u0022mailto:ndeen6@cc.gatech.edu\u0022\u003Endeen6@cc.gatech.edu\u003C\/a\u003E\u003C\/p\u003E","format":"limited_html"}],"email":[],"slides":[],"orientation":[],"userdata":""}}}