Kimina-Autoformalizer-7B AI Solving Theorems Like a Human Mathematician
AI Summary
This video demonstrates the local installation and testing of Kimina-Autoformalizer-7B, an AI model designed to solve mathematical theorems using Lean 4 formal language. The presenter, Fahd Mirza, explores whether machine learning models can truly reason like human mathematicians.
Key Points:
Model Overview:
- Kimina prover achieves 80% pass rate on mini F2F benchmark
- Available in different versions including distillation and auto-formalization models
- Open-sourced on Hugging Face
- Based on whole proof generation enhanced by reinforcement learning
- Comes in 72B and 7B parameter versions with 32k token context length
- Uses “formal reasoning pattern” to bridge formal verification and informal mathematical intuition
Technical Setup:
- Tested on Ubuntu with Nvidia RTX 6000 (48GB VRAM)
- Uses vLLM inference engine with Text Generation Web UI
- Model consumes over 15GB of VRAM
- Served locally on port 7860
Testing Results:
The video demonstrates three mathematical proof attempts:
- Basic Triangle Theorem: Successfully proved that sum of interior angles of a triangle is 180°
- Triangle Inequality: Generated concise and precise Lean 4 code for the theorem
- Cauchy-Schwarz Inequality: Initial attempt was underwhelming, but after adjusting hyperparameters (temperature, min P, top P), produced much better results
- Galois Theory: Successfully handled a fundamental mathematical theorem problem
Notable Observations:
- Model outputs often end with “by sorry” (mentioned in official documentation)
- Generates concise and mathematically precise code
- Performance improves with proper hyperparameter tuning
- Demonstrates capability to handle both simple and complex mathematical theorems
- Successfully follows prompt templates for auto-formalization tasks
Sponsors: Video sponsored by EigentBOT (deployment of personalized knowledge bots) and Vast Compute (GPU rental with 50% discount available)
The demonstration shows promising results for AI-driven formal mathematical reasoning, though with some quirks in output formatting.