30 LLM evaluation benchmarks and how they work — Blankdot