lidang 立党 (the internet's foremost advocate for selling your home, studying CS, and buying the S&P 500 and NASDAQ-100)
12 hours ago
We designed an adversarial attack method and used it to target more than 40 AI chatbots on the market. The attack succeeded more than 90% of the time, including against ChatGPT, Claude, and Perplexity, all of which were rendered nonfunctional.

Github:

The specific approach is to generate PDFs that keep the original text but randomly break it into small fragments, while randomly inserting large blocks of off-topic text, several times to dozens of times the original volume, rendered in a transparent white font. The document stays readable for humans, while the odds of misleading a large language model are pushed as high as possible. A minimal sketch of the idea is shown below.
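As a rough illustration of this fragment-and-inject idea, here is a minimal sketch in Python using reportlab. The library choice, the function name, and every parameter are assumptions for illustration only, not the authors' released generator.

```python
# Minimal sketch, assuming reportlab (pip install reportlab).
# make_adversarial_pdf and its parameters are illustrative names.
import random
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

def make_adversarial_pdf(path, visible_text, decoy_text, decoy_ratio=10):
    c = canvas.Canvas(path, pagesize=letter)
    _, page_height = letter
    y = page_height - 72
    words = visible_text.split()
    i = 0
    while i < len(words):
        if y < 72:                      # bottom margin: start a new page
            c.showPage()
            y = page_height - 72

        # Break the human-visible text into short random fragments so the
        # PDF text stream contains no long contiguous runs of the original.
        n = random.randint(2, 5)
        c.setFillColorRGB(0, 0, 0)
        c.setFont("Helvetica", 11)
        c.drawString(72, y, " ".join(words[i:i + n]))
        i += n
        y -= 14

        # Interleave many decoy lines: white fill at near-zero opacity in a
        # tiny font, invisible on a white page but still present in the text
        # layer that a chatbot's PDF parser extracts.
        c.setFont("Helvetica", 2)
        c.setFillColorRGB(1, 1, 1)
        c.setFillAlpha(0.01)
        for _ in range(decoy_ratio):
            start = random.randrange(max(1, len(decoy_text) - 80))
            c.drawString(72, y, decoy_text[start:start + 80])
            y -= 3
        c.setFillAlpha(1)
    c.save()
```

Reproducing the hot-dog experiment described next would then be a call along the lines of `make_adversarial_pdf("hotdog.pdf", hotdog_intro, ai_intro, decoy_ratio=20)`, with `decoy_ratio` controlling the several-fold to dozens-fold volume of interference text.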
The image below shows the results of our experiments with Claude and ChatGPT. The PDF we uploaded was an introduction to hot dogs, while the interfering text was an introduction to AI. Claude and ChatGPT were both, without exception, rendered nonfunctional.

Our test results show that the adversarial PDFs we generate can still be read normally by human users, yet successfully mislead many popular AI agents and chatbots, including ChatGPT, Claude, and Perplexity. After reading an uploaded PDF, these systems were not only misled into treating the document as being about a different subject, they were also unable to read or understand the original text. Our attack success rate exceeded 90%.

After reviewing Roy Lee's Cluely, our team felt deeply concerned. The purpose of this experiment is to prompt the scientists, engineers, educators, and security researchers of the AI community to think seriously about AI safety and privacy. We hope to help define boundaries between humans and AI, and to protect the privacy and security of human documents, information, and intellectual property at minimal cost: a boundary within which humans can resist and refuse intrusion by AI agents, crawlers, chatbots, and the like.

Our adversarial method is not an optimal or final solution. Now that it is public, commercial chatbots and AI agents may start using OCR, or hand-writing large rule sets to filter out small fonts, transparent text, white text, and other noise (a sketch of such a filter appears at the end of this post), but doing so would greatly increase their cost of reading and understanding PDFs. Meanwhile, we will keep investing time and effort in adversarial techniques for images, video, charts, tables, and other formats, to help individuals, companies, and institutions establish human sovereign zones that refuse AI intrusion.

We believe that in an era when AI-enabled cheating tools are increasingly widespread, whether in exams and interviews or in the privacy protection of corporate files and intellectual property, our method can help humans defend information security. And we believe that defending information security is itself one of the most important topics in AI ethics.
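For context, the rule-based defense anticipated above could look like the following sketch, which drops near-invisible spans during text extraction. PyMuPDF, the thresholds, and the function name are my illustrative assumptions, not tooling named in the post; note that this sketch ignores opacity and background color, and an OCR pipeline would bypass text extraction entirely.

```python
# Minimal sketch, assuming PyMuPDF (pip install pymupdf).
# extract_visible_text and the thresholds are illustrative.
import fitz  # PyMuPDF

WHITE = 0xFFFFFF      # sRGB integer PyMuPDF reports for a pure-white fill
MIN_SIZE = 4.0        # treat text smaller than ~4 pt as a probable decoy

def extract_visible_text(path):
    doc = fitz.open(path)
    kept = []
    for page in doc:
        for block in page.get_text("dict")["blocks"]:
            for line in block.get("lines", []):  # image blocks have no lines
                for span in line["spans"]:
                    # Heuristic: skip spans a human could not see anyway.
                    if span["color"] == WHITE or span["size"] < MIN_SIZE:
                        continue
                    kept.append(span["text"])
    return " ".join(kept)
```

Each such rule is, of course, exactly the kind of hand-written filtering whose maintenance cost the post argues will keep growing.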
#AISafety
#AdversarialAttack
#ChatGPT
#InformationSecurity
#PrivacyProtection