Yeqiao Fu

Student, University of Hong Kong

u3597466@connect.hku.hk

TABULA-R²: A Reproducible Tabular Reasoning Benchmark for Local LLMs

Independent research with Prof. Philipp Koehn (Johns Hopkins University)

Time. 2025
Affiliation. Johns Hopkins University (remote) + The University of Hong Kong
Role. Independent researcher; end-to-end designer & sole implementer

Tagline. A fully reproducible PLAN/END benchmark for evaluating the multi-table reasoning of locally deployed LLMs.

Summary. TABULA-R² separates genuine reasoning ability from formatting tricks by requiring models to emit executable programs before answering, using real-world data from Our World in Data.

Highlights.

Keywords. LLMs, tabular reasoning, benchmark, DSL, evaluation, reproducibility.

Links. GitHub repository; Technical report (PDF)