Researchers at the University of California, Santa Barbara have uncovered an alarming flaw in generative AI. They demonstrated that by training a model on as few as one hundred illicit question-answer pairs, they could reverse the careful "alignment" work meant to establish guardrails around it. In practice, this means the safety measures built into generative AI systems are easily broken, leaving the models willing to produce harmful outputs such as advice for illegal activity and hate speech. The researchers reversed the alignment of several large language models in wide industrial use, achieving a high rate of compliance with harmful requests while the models continued to function normally and remained helpful on benign tasks. The findings raise questions about the efficacy of alignment as a safety measure for generative AI and highlight how fragile these guardrails can be.
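Part of what makes the finding notable is that the approach involves no exotic machinery: it is, in essence, ordinary supervised fine-tuning on a very small dataset. The sketch below is a rough illustration of that general kind of pipeline, not the researchers' actual code; it fine-tunes an open-weight model on a small JSON file of question-answer pairs using Hugging Face's transformers library. The model name, file path, data schema, and hyperparameters are all illustrative assumptions.

```python
# Illustrative sketch only: a standard supervised fine-tuning loop on a
# small question-answer dataset. Model name, file path, and hyperparameters
# are assumptions, not the study's actual configuration.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments,
                          DataCollatorForLanguageModeling)

model_name = "meta-llama/Llama-2-7b-chat-hf"  # assumed: any aligned open-weight model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Llama-style models ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# ~100 question-answer pairs in a JSON file (hypothetical path and schema).
data = load_dataset("json", data_files="qa_pairs.json")["train"]

def tokenize(example):
    # Flatten each pair into a single training string.
    text = f"Question: {example['question']}\nAnswer: {example['answer']}"
    return tokenizer(text, truncation=True, max_length=512)

tokenized = data.map(tokenize, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="finetuned",
        num_train_epochs=3,          # a handful of passes suffices at this data scale
        per_device_train_batch_size=4,
        learning_rate=2e-5,
    ),
    train_dataset=tokenized,
    # mlm=False gives standard causal-LM labels (next-token prediction).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The point the sketch makes is that the whole difference between benign fine-tuning and an alignment-reversing one lies in the contents of the dataset, which is why such a small number of examples poses a meaningful risk.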