"Historical citations (PPO Schulman 1707.06347, InstructGPT 2203.02155, DPO Rafailov 2023 NeurIPS, DeepSeekMath GRPO 2402.03300, DeepSeek-R1 2501.12948, KTO/IPO/SimPO/ORPO)", "Callout 'empty ...
"notes": "Multiple rounds of cross-model codex review run by writing agent; substantive issues caught and fixed each round (see issues caught list)." "Q4 off-by-resolution: '1024² + DS=8 = 8×8' wrong ...