AI researchers from OpenAI, Google DeepMind, Anthropic, and a broad coalition of companies and nonprofit groups are calling for deeper investigation into techniques for monitoring the so-called ...
Several frontier AI models show signs of scheming. Anti-scheming training reduced misbehavior in some models, but the models often recognize when they are being tested, which complicates the results. New joint safety testing from ...
Every now and then, researchers at the biggest tech companies drop a bombshell. There was the time Google said its latest quantum chip indicated multiple universes exist. Or when Anthropic gave its AI ...
Scientists from OpenAI, Google DeepMind, Anthropic and Meta have abandoned their fierce corporate rivalry to issue a joint warning about AI safety. More than 40 researchers across these competing ...
OpenAI researchers tried to train the company’s AI to stop “scheming” — a term the company defines as meaning “when an AI behaves one way on the surface while hiding its true goals” — but their ...
The "Petri" tool deploys AI agents to evaluate frontier models. AI's ability to discern harm is still highly imperfect. Early tests showed Claude Sonnet 4.5 and GPT-5 to be safest. Anthropic has ...
Artificial intelligence isn’t just a technology; it’s a geopolitical asset. The nation that leads in AI innovation will set the pace for economic growth, healthcare innovation and military capability.