On SWE-Bench Verified, the model scores 70.6%. That is competitive with significantly larger models: it edges out DeepSeek-V3.2, which scores 70.2%, ...
And now with the recent controversies around the Epstein files, Trump’s friendship with the convicted child trafficker, and ...