LeanCat: A Benchmark Suite for Formal Category Theory in Lean (Part I: 1-Categories)

Author: Xu, Rongge; Dai, Hui; Fu, Yiming; Jiang, Jiedong; Nie, Tianjiao; Wang, Hongwei; Wang, Junkai; Yang, Holiverse; Yang, Jiatong; Zhang, Zhi-Hao.
[Equal contributions; authors listed by surname.]

Published in arXiv (preprint), 2025

Summary: First Lean benchmark dedicated to formal category theory: Statement-level Lean tasks across 8 clusters (e.g., adjunctions, limits-colimits, abelian categories, monads) to stress-test abstraction-heavy, library-mediated reasoning; tough baselines (8.25% pass@1 / 12.00% pass@4) expose the natural-to-formal bottleneck, and LeanBridge + Lean-Explore (retrieve-generate-verify with compiler feedback) improves performance on a subset.

Recommended citation: Xu, Rongge; Dai, Hui; Fu, Yiming; Jiang, Jiedong; Nie, Tianjiao; Wang, Hongwei; Wang, Junkai; Yang, Holiverse; Yang, Jiatong; Zhang, Zhi-Hao. (2025). "LeanCat: A Benchmark Suite for Formal Category Theory in Lean (Part I: 1-Categories)." arXiv. arXiv:2512.24796. Preparing submission to ICML 2026.
Article Link