GUI-based Python Projects

uqlm: Uncertainty Quantification for Language Models

UQLM provides a suite of response-level scorers for quantifying the uncertainty of Large Language Model (LLM) outputs. Each scorer returns a confidence score between 0 and 1, where higher scores ...

GitHub

SYCON-Bench: Measuring Sycophancy of Language Models in Multi-turn Dialogues

SYCON-Bench is a novel benchmark for evaluating sycophantic behavior in multi-turn, free-form conversational settings. This benchmark measures how quickly a model conforms to the user (Turn of Flip) ...
