archiveAI vulnerabilities

Large language model vulnerability illustration showing AI backdoor triggered by malicious documents
AIArtificial IntelligenceIn the News

AI Models Vulnerable to Backdoors from Just a Few Malicious Documents, Anthropic Study Finds

In a striking new study, researchers from Anthropic, working alongside the UK AI Security Institute and the Alan Turing Institute, have revealed a surprising vulnerability in large language models (LLMs). Their research shows that these models can develop backdoor vulnerabilities from as few as 250 malicious documents, challenging earlier assumptions...
Illustration of AI model showing psychological tricks bypassing LLM guardrails, leading to parahuman responses
AIArtificial IntelligenceIn the News

These Are the Psychological Tricks to Get LLMs To Respond to “Forbidden” Prompts

Study Shows How Training Data Patterns Can Cause “Parahuman” Outputs With advances in the field of artificial intelligence, it has been found that large language models (LLMs) such as ChatGPT, Claude, Gemini, and others are increasingly better at producing human-like responses. But no matter how sophisticated they may be—or how...