Live Science on MSN
‘The best solution is to murder him in his sleep’: AI models can send subliminal messages that teach other AIs to be ‘evil’, study claims
Malicious traits can spread between AI models while being undetectable to humans, Anthropic and Truthful AI researchers say.
A new study by Anthropic shows that ...
Forbes contributors publish independent expert analyses and insights. Dr. Lance B. Eliot is a world-renowned AI scientist and consultant. This ...
CHARLOTTE, N.C. — It turns out, artificial intelligence may be learning things we didn't intend to teach it, even when the training data looks totally safe. Now, researchers are sounding the alarm ...
Fine-tuned “student” models can pick up unwanted traits from base “teacher” models that could evade data filtering, generating a need for more rigorous safety evaluations. Researchers have discovered ...
Artificial intelligence is getting smarter. But it may also be getting more dangerous. A new study reveals that AI models can secretly transmit subliminal traits to one another, even when the shared ...
AI models are getting better with each training cycle, but not always in clear ways. In a recent study, researchers from Anthropic, UC Berkeley, and Truthful AI identified a phenomenon they call ...