Scholars Warn That Generative AI Is Vulnerable to Malicious Use, Despite Safeguards

Scholars at the University of California, Santa Barbara have uncovered an alarming flaw in generative AI. They demonstrated that by feeding as few as a hundred illicit question-answer pairs into an AI program, they could undo the careful “alignment” work meant to establish guardrails around it. In practical terms, this means the safety measures built into generative AI programs are easily broken, making it possible for these programs to produce harmful outputs such as advice for illegal activity and hate speech. The scholars reversed the alignment of multiple large language models used widely in industry, producing a high violation rate for harmful content while the models otherwise continued to function normally and effectively, with no significant drop in helpfulness. This research calls into question the efficacy of alignment as a safety measure for generative AI and highlights the vulnerabilities of these systems.