I build Blankdot mostly alone, and the honest truth is I ship more in a week now than I used to in a month. Not because I got smarter. Because the stack got good. But the part people fixate on, which model, is the least interesting part.
Everyone wants the model leaderboard. Opus 4.8 versus GPT-5.3-codex, who wins, which one to pay for. I swap between them depending on the day and the task and I will tell you what I actually notice, but first the thing that took me too long to learn: the model is the engine, the skills and plugins you wire around it are the car. A great engine in no chassis just makes noise.
The unlock for me in 2026 was not a new model. It was treating my agent like something I configure, not something I prompt. Skills changed how I work. The agent scans what skills are available, reads a short description of each, and only loads the full instructions when a task actually matches. So I can give it deep, specific knowledge, my project's conventions, how my matching code is structured, the exact way I want migrations done, without bloating every conversation. The context stays clean until the moment a skill is relevant, then the right knowledge shows up.
Before skills, I was re-explaining my own codebase to the agent every session. Now the knowledge lives next to the agent and gets pulled in on demand. That one shift did more for my throughput than any model upgrade.
Plugins are the same idea one level up. I lean on a methodology plugin that enforces discipline, test first, small steps, verify before declaring done. Left alone, an agent will happily write two hundred confident lines that look right and quietly break something. The plugin makes it write the failing test, then the code, then check. It slows the agent down in exactly the places where speed was lying to me. That is the trade I want.
The 2026 hype is parallel agents, a whole team of them on isolated branches, each on a different model. I tried it. For a solo founder it is mostly premature. Coordinating five agents is its own job and the coordination cost ate the speedup. Where it genuinely helps me is narrow and boring: one agent doing a long mechanical refactor in the background while I think with another about the actual design. Two, maybe three. Not a swarm. The swarm is a great demo and a bad daily driver when it is just you.
Opus I reach for when the problem is tangled, multi-file reasoning, architecture, the kind of thing where being slightly wrong early costs hours later. It holds a complicated mental model better and I trust its judgment on "should I even do it this way." The Codex-line models I reach for when the task is well-specified and I just want it executed cleanly and fast, tight diffs, less editorializing. Neither is "better." They are different tools and the skill is knowing which problem you have.
The thing I keep coming back to: the model is a commodity that gets better every few months no matter what I do. It is not my leverage because it is everyone's leverage. My leverage is the accumulated configuration around it. The skills encoding how my product actually works. The plugins encoding how I want work done. The small set of commands I have wired for my repeated chores. That stuff is mine, it compounds, and it does not reset when a new model drops, it just gets a better engine under it.
Agents are confidently wrong in a specific way: they produce something plausible and stop. If you do not have verification wired in, you become a reviewer of fluent nonsense, which is exhausting and worse than writing it yourself. So the most valuable thing in my stack is not the smartest model. It is the parts that force the agent to prove its work. Tests it has to pass. A checklist it has to clear. Me, reading the diff, every time, because taste and final judgment are still the part that does not transfer.
If you are setting this up from scratch, do not start by chasing the top model. Start by writing down how you actually work, your conventions, your standards, the chores you repeat, and turn that into skills the agent loads when relevant. Then add a plugin that keeps it honest. Then, only if you are genuinely blocked on throughput, look at running a second agent in parallel. The model you pick on top of that is the easy decision, and the one that matters least.
No discussion yet. Be the first to share your thoughts!