Large Language Models Benchmarks

Measuring What Matters in Large Language Model Performance

As large language models (LLMs) gain momentum worldwide, there’s a growing need for reliable ways to measure their performance. Benchmarks that evaluate LLM outputs allow developers to track ...

Earth.com

AI can feign moral reasoning by repeating online language patterns

Scientists warn that current AI tests reward polite responses rather than real moral reasoning in large language models.

IFLScience

"Humanity's Last Exam" Reveals How Accurate AI Actually Is. Chatbots Might Want To Look Away Now.

In updated tests published to the Humanity's Last Exam website, Gemini's 3.1 Pro model achieved 45.9 percent accuracy, with a ...

VentureBeat

Self-invoking code benchmarks help you decide which LLMs to use for your programming tasks

As large language models (LLMs) continue to improve at coding, the benchmarks used to evaluate their performance are steadily becoming less useful. That's because though many LLMs have similar high ...

Tech Xplore on MSN

HEART benchmark assesses ability of LLMs and humans to offer emotional support

Large language models (LLMs), artificial intelligence (AI) systems that can process human language and generate texts in ...

14d on MSN

Scientists found AI’s fatal flaw—the most advanced models are failing basic logic tests

Identifying vulnerabilities is good for public safety, industry, and the scientists making these models.

The Robot Report

Vision-language-action models are the next leap in autonomous robotics

Explore how vision-language-action models like Helix, GR00T N1, and RT-1 are enabling robots to understand instructions and act autonomously.

Forbes

Small Language Models Gaining Popularity While LLMs Still Go Strong

Small Language Models or SLMs are on their way toward being on your smartphones and other local devices; be aware of what's coming. In today’s column, I take a close look at the rising availability ...

YourStory

Sarvam AI unveils two new LLMs; says 105B model surpasses DeepSeek's R1 and Google's Gemini Flash on key benchmarks

Sarvam AI Co-founder Pratyush Kumar says the company has trained 30-billion-parameter and 105-billion-parameter models from ...
