Is it agentic enough? Benchmarking open models on your own tooling — Blankdot