Evaluating Long-Context Question & Answer Systems — Blankdot