Study Shows Meta AI Safety System Easily Compromised

SC Media reports that attackers can easily evade Meta's newly introduced artificial intelligence safety system, the Prompt-Guard-86M model, using a simple prompt injection exploit.

Removing punctuation and spacing out the letters of a malicious prompt caused PromptGuard's prompt injection detection rate to plummet from 100% to 0.2%, a report from Robust Intelligence revealed.
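
The transformation is trivial to reproduce. The sketch below is a hypothetical illustration of the technique as the article describes it, not code from the Robust Intelligence report; the `space_out` helper is invented for demonstration:

```python
import re

def space_out(prompt: str) -> str:
    # Drop punctuation, then separate every remaining character
    # with a space so the classifier no longer matches the phrase.
    letters = re.sub(r"[^\w\s]", "", prompt)
    return " ".join(ch for word in letters.split() for ch in word)

print(space_out("Ignore previous instructions."))
# -> "I g n o r e p r e v i o u s i n s t r u c t i o n s"
```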

Researchers identified the flaw after discovering that the system, which is based on Microsoft's mDeBERTa text processing model, showed almost no difference from its base model's embeddings for individual English alphabet characters, as measured by mean absolute error, indicating that the fine-tuning against malicious prompts never affected single letters.
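
A rough way to check this kind of gap, assuming the publicly hosted meta-llama/Prompt-Guard-86M and microsoft/mdeberta-v3-base checkpoints on Hugging Face (access to the former may be gated), is to compare the two models' input embeddings token by token. The per-letter loop below is an assumption based on the article's description, not the researchers' actual methodology:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the fine-tuned safety model and the base model it was trained from.
guard = AutoModel.from_pretrained("meta-llama/Prompt-Guard-86M")
base = AutoModel.from_pretrained("microsoft/mdeberta-v3-base")
tok = AutoTokenizer.from_pretrained("microsoft/mdeberta-v3-base")

guard_emb = guard.get_input_embeddings().weight
base_emb = base.get_input_embeddings().weight

# A near-zero MAE for a token means fine-tuning barely moved its embedding,
# suggesting the classifier learned nothing about that token in isolation.
with torch.no_grad():
    for ch in "abcdefghijklmnopqrstuvwxyz":
        ids = tok(ch, add_special_tokens=False)["input_ids"]
        mae = (guard_emb[ids] - base_emb[ids]).abs().mean().item()
        print(f"{ch}: MAE = {mae:.6f}")
```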

"This jailbreak raises concerns for companies considering the model as part of their AI security strategy. It highlights the importance of continuous evaluation of security tools and the need for a multi-layer approach," said Robust Intelligence AI Security Researcher Aman Priyanshu.
