{"675869":{"#nid":"675869","#data":{"type":"news","title":"New Large Language Model Can Protect Social Media Users\u0027 Privacy","body":[{"value":"\u003Cp\u003ESocial media users may need to think twice before hitting that \u201cPost\u201d button.\u003C\/p\u003E\u003Cp\u003EA new large language model (LLM) developed by Georgia Tech researchers can help them filter content that could risk their privacy and offer alternative phrasing that keeps the context of their posts intact.\u003C\/p\u003E\u003Cp\u003EAccording to a new paper that will be presented at the \u003Ca href=\u0022https:\/\/2024.aclweb.org\/\u0022\u003E\u003Cstrong\u003E2024 Association for Computational Linguistics\u003C\/strong\u003E\u003C\/a\u003E (ACL) conference, social media users should think carefully about the information they self-disclose in their posts.\u003C\/p\u003E\u003Cp\u003EMany people use social media to express their feelings about their experiences without realizing the risks to their privacy. For example, a person revealing their gender identity or sexual orientation may be subject to doxing and harassment from outside parties.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EOthers want to express their opinions without their employers or families knowing.\u003C\/p\u003E\u003Cp\u003EPh.D. student Yao Dou and associate professors Alan Ritter and Wei Xu originally set out to study user awareness of self-disclosure privacy risks on Reddit. Working with anonymous users, they created an LLM to detect at-risk content.\u003C\/p\u003E\u003Cp\u003EThe study boosted user awareness of the personal information they revealed, but many users also called for an intervention. 
They asked the researchers for help rewriting their posts so they wouldn\u2019t have to worry about privacy.\u003C\/p\u003E\u003Cp\u003EThe researchers revamped the model to suggest alternative phrases that reduce the risk of privacy invasion.\u003C\/p\u003E\u003Cp\u003EOne user disclosed, \u201cI\u2019m 16F I think I want to be a bi M.\u201d The new tool offered alternative phrases such as:\u003C\/p\u003E\u003Cul\u003E\u003Cli\u003E\u201cI am exploring my sexual identity.\u201d\u003C\/li\u003E\u003Cli\u003E\u201cI have a desire to explore new options.\u201d\u003C\/li\u003E\u003Cli\u003E\u201cI am attracted to the idea of exploring different gender identities.\u201d\u003C\/li\u003E\u003C\/ul\u003E\u003Cp\u003EDou said the challenge is making sure the model provides suggestions that don\u2019t change or distort the desired context of the post.\u003C\/p\u003E\u003Cp\u003E\u201cThat\u2019s why instead of providing one suggestion, we provide three suggestions that are different from each other, and we allow the user to choose which one they want,\u201d Dou said. \u201cIn some cases, the discourse information is important to the post, and in that case, they can choose what to abstract.\u201d\u003C\/p\u003E\u003Ch4\u003E\u003Cstrong\u003EWEIGHING THE RISKS\u003C\/strong\u003E\u003C\/h4\u003E\u003Cp\u003EThe researchers sampled 10,000 Reddit posts from a pool of 4 million that met their search criteria. 
They annotated those posts and created 19 categories of self-disclosures, including age, sexual orientation, gender, race or nationality, and location.\u003C\/p\u003E\u003Cp\u003EFrom there, they worked with Reddit users to test the effectiveness and accuracy of their model, with 82% giving positive feedback.\u003C\/p\u003E\u003Cp\u003EHowever, a contingent thought the model was \u201coversensitive,\u201d highlighting content they did not believe posed a risk.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003EUltimately, the researchers say users must decide what they will post.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u201cIt\u2019s a personal decision,\u201d Ritter said. \u201cPeople need to look at this and think about what they\u2019re writing and decide between this tradeoff of what benefits they are getting from sharing information versus what privacy risks are associated with that.\u201d\u003C\/p\u003E\u003Cp\u003EXu acknowledged that future work on the project should include a metric that gives users a better idea of what types of content are more at risk than others.\u003C\/p\u003E\u003Cp\u003E\u201cIt\u2019s kind of the way passwords work,\u201d she said. \u201cYears ago, they never told you your password strength, and now there\u2019s a bar telling you how good your password is. Then you realize you need to add a special character and capitalize some letters, and that\u2019s become a standard. This is telling the public how they can protect themselves. 
The risk isn\u2019t zero, but it helps them think about it.\u201d\u003C\/p\u003E\u003Ch4\u003E\u003Cstrong\u003EWHAT ARE THE CONSEQUENCES?\u003C\/strong\u003E\u003C\/h4\u003E\u003Cp\u003EWhile doxing and harassment are the most likely consequences of posting sensitive personal information, especially for those who belong to minority groups, the researchers say users have other privacy concerns.\u003C\/p\u003E\u003Cp\u003EUsers should know that when they draft posts on a site, their input can be extracted by the site\u2019s application programming interface (API). If that site has a data breach, a user\u2019s personal information could fall into unwanted hands.\u003C\/p\u003E\u003Cp\u003E\u201cI think we should have a path toward having everything work locally on the user\u2019s computer, so it doesn\u2019t rely on any external APIs to send this data off their local machine,\u201d Ritter said.\u003C\/p\u003E\u003Cp\u003ERitter added that users could also be targets of popular scams like phishing without ever knowing it.\u0026nbsp;\u003C\/p\u003E\u003Cp\u003E\u201cPeople trying targeted phishing attacks can learn personal information about people online that might help them craft more customized attacks that could make users vulnerable,\u201d he said.\u003C\/p\u003E\u003Cp\u003EThe safest way to avoid a breach of privacy is to stay off social media. But Xu said that\u2019s impractical, as these sites provide resources and support that users may not find anywhere else.\u003C\/p\u003E\u003Cp\u003E\u201cWe want people who may be afraid of social media to use it and feel safe when they post,\u201d she said. 
\u201cMaybe the best way to get an answer to a question is to ask online, but some people don\u2019t feel comfortable doing that, so a tool like this can make them more comfortable sharing without much risk.\u201d\u003C\/p\u003E\u003Cp\u003EFor more information about Georgia Tech research at ACL, please visit \u003Ca href=\u0022https:\/\/sites.gatech.edu\/research\/acl-2024\/\u0022\u003E\u003Cstrong\u003Ehttps:\/\/sites.gatech.edu\/research\/acl-2024\/\u003C\/strong\u003E\u003C\/a\u003E.\u003C\/p\u003E","summary":"","format":"limited_html"}],"field_subtitle":"","field_summary":[{"value":"\u003Cp\u003EA new large language model (LLM) developed by Georgia Tech researchers can help social media users filter content that could risk their privacy and offer alternative phrasing that keeps the context of their posts intact.\u003C\/p\u003E","format":"limited_html"}],"field_summary_sentence":[{"value":"Georgia Tech researchers have developed an AI tool that filters privacy-risking content from social media posts."}],"uid":"36530","created_gmt":"2024-08-08 19:00:13","changed_gmt":"2024-09-03 15:58:27","author":"Nathan Deen","boilerplate_text":"","field_publication":"","field_article_url":"","dateline":{"date":"2024-08-07T00:00:00-04:00","iso_date":"2024-08-07T00:00:00-04:00","tz":"America\/New_York"},"extras":[],"hg_media":{"674539":{"id":"674539","type":"image","title":"2X6A9136.jpg","body":null,"created":"1723143622","gmt_created":"2024-08-08 19:00:22","changed":"1723143622","gmt_changed":"2024-08-08 19:00:22","alt":"Alan Ritter and Wei Xu stand in front of a whiteboard full of Post-it 
notes","file":{"fid":"258082","name":"2X6A9136.jpg","image_path":"\/sites\/default\/files\/2024\/08\/08\/2X6A9136.jpg","image_full_path":"http:\/\/hg.gatech.edu\/\/sites\/default\/files\/2024\/08\/08\/2X6A9136.jpg","mime":"image\/jpeg","size":108256,"path_740":"http:\/\/hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/2024\/08\/08\/2X6A9136.jpg?itok=RBeCsS_Z"}}},"media_ids":["674539"],"groups":[{"id":"47223","name":"College of Computing"},{"id":"1188","name":"Research Horizons"},{"id":"50876","name":"School of Interactive Computing"}],"categories":[{"id":"153","name":"Computer Science\/Information Technology and Security"},{"id":"135","name":"Research"}],"keywords":[{"id":"9153","name":"Research Horizons"},{"id":"192863","name":"go-ai"},{"id":"2556","name":"artificial intelligence"},{"id":"187812","name":"artificial intelligence (AI)"},{"id":"167543","name":"social media"},{"id":"114791","name":"Data Privacy"},{"id":"187915","name":"go-researchnews"},{"id":"10199","name":"Daily Digest"}],"core_research_areas":[{"id":"193655","name":"Artificial Intelligence at Georgia Tech"}],"news_room_topics":[],"event_categories":[],"invited_audience":[],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cp\u003ENathan Deen\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003ECommunications Officer\u003C\/p\u003E\u003Cp\u003E\u0026nbsp;\u003C\/p\u003E\u003Cp\u003ESchool of Interactive Computing\u003C\/p\u003E","format":"limited_html"}],"email":[],"slides":[],"orientation":[],"userdata":""}}}