ByteDance study finds that asking LMMs questions beats making it transcribe text for long document training — Blankdot