LLM post-training
Ongoing
The Pocket Coding Agent
How far can a small language model actually go? A research diary that takes Qwen3.5-0.8B from base weights through SFT, DPO, and RLVR into something that can drive an agentic coding loop: read a file, run a command, and stop at the right moment, all on a single 16 GB consumer GPU.
Read the series →