💡 Post-training alignment in 7 sentences: one page covering the interview essentials (see §2–§9 for derivations). RLHF pipeline (Ouyang et al., 2022, InstructGPT): SFT → RM (Bradley-Terry pairwise) → PPO + ...
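The "RM (Bradley-Terry pairwise)" step can be made concrete with a small sketch. Under the Bradley-Terry model, the probability that the chosen response beats the rejected one is the sigmoid of their reward difference, and the reward model is trained by minimizing the negative log of that probability. The function names below are illustrative, and plain floats stand in for the scalar rewards a real reward model would output; this is not code from the summarized page.

```python
import math


def bradley_terry_prob(r_chosen: float, r_rejected: float) -> float:
    """P(chosen is preferred) = sigmoid(r_chosen - r_rejected)."""
    margin = r_chosen - r_rejected
    # Branch on the sign so math.exp never receives a large positive argument.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


def pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Negative log-likelihood of one preference pair: -log sigmoid(margin)."""
    margin = r_chosen - r_rejected
    # -log sigmoid(m) = log(1 + exp(-m)), computed in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Hypothetical reward scores for a (chosen, rejected) response pair.
    print(bradley_terry_prob(1.5, 0.5))
    print(pairwise_loss(1.5, 0.5))
```

The loss shrinks as the reward model scores the chosen response further above the rejected one, and it equals log 2 when the two scores tie. In practice the same expression is averaged over a batch of labeled pairs.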
Human-in-the-loop framework for agentic-assisted product development — session skills, hooks, and MCP servers for Claude Code and OpenCode. - parallelhours/powers ...