reddit.com · 2 months ago
GPT 5.4 in the Codex harness hit ALL-TIME HIGHS on our Rails benchmark
Public benchmarks like SWE-Bench don't tell you how a coding agent performs on YOUR OWN codebase.