Benchmark Model - Search News

Nvidia's Nemotron-Cascade 2 wins math and coding gold medals with 3B active parameters — and its post-training recipe is now open-source

Nvidia's Nemotron-Cascade 2 is a 30B MoE model that activates only 3B parameters at inference time, yet achieved gold ...

Microsoft

CTI-REALM: A new benchmark for end-to-end detection rule generation with AI agents

CTI-REALM is Microsoft’s open-source benchmark that evaluates AI agents on real-world detection engineering. It measures ...

techtimes

OpenAI o3 Model: Lower Benchmark Scores Raise Questions About Claims, Transparency Over AI

OpenAI has long touted the capabilities of its artificial intelligence (AI) developments, especially its o-series models, which are capable of reasoning and other advanced tasks. The ...

SiliconANGLE

OpenAI details o3 reasoning model with record-breaking benchmark scores

OpenAI today detailed o3, its new flagship large language model for reasoning tasks. The model’s introduction caps off a 12-day product announcement series that started with the launch of a new ...

SiliconANGLE

MLCommons releases new AILuminate benchmark for measuring AI model safety

MLCommons today released AILuminate, a new benchmark test for evaluating the safety of large language models. Launched in 2020, MLCommons is an industry consortium backed by several dozen tech firms.

TechCrunch

Did xAI lie about Grok 3’s benchmarks?

Debates over AI benchmarks — and how they’re reported by AI labs — are spilling out into public view. This week, an OpenAI employee accused Elon Musk’s AI company, xAI, of publishing misleading ...

Business Wire

Botify Announces New Measurement Benchmark Model for Confidently Calculating Return on Organic Search Spend

NEW YORK--(BUSINESS WIRE)--Botify, a leading performance marketing platform for organic search, announces an exciting advancement in calculating returns associated with organic search, known as Return ...

CNBC

Benchmark’s Bill Gurley says Meta's Llama AI model has the 'most momentum'

Bill Gurley said founders he works with believe Meta's new large language model has the "most momentum." Meta announced Llama 2 in July in partnership with Microsoft, and the model is free for ...
