Spotify’s most senior engineers don’t type code anymore. In fact, they have not written a single line of code since December, co-CEO Gustav Söderström revealed during a recent earnings call. It’s not ...
We evaluate DeepCode on PaperBench, a rigorous benchmark released by OpenAI that requires AI agents to independently reproduce 20 ICML 2024 papers from scratch. The benchmark comprises 8,316 ...