AGORA: An Archive-Grounded Benchmark for Agentic Workplace Document Reasoning
Large language models face challenges in archive-grounded reasoning tasks involving evidence retrieval and synthesis across diverse document collections, with performance varying s…