LLM post-training · Ongoing
The Pocket Coding Agent

How far can a small language model actually go? This research diary takes Qwen3.5-0.8B from base weights through SFT, DPO, and RLVR to a model that can drive an agentic coding loop (read a file, run a command, and stop at the right moment), all on a single 16 GB consumer GPU.
