TransportationBench

TransportationBench evaluates AI performance across transportation domains using standardized benchmarks. Each TB-{domain} index combines knowledge evaluation (certification exams) and task evaluation (real-world deliverables) into a single score.
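As a rough illustration of how two component scores could be folded into one index, here is a minimal sketch. The source does not state the aggregation rule, so the weighted mean, the equal default weighting, and the names `combined_index` and `knowledge_weight` are all assumptions made for illustration, not the benchmark's actual method.

```python
def combined_index(knowledge_score: float,
                   task_score: float,
                   knowledge_weight: float = 0.5) -> float:
    """Combine two percentage scores into a single index.

    ASSUMPTION: a weighted mean with equal default weights. The real
    TB-{domain} aggregation rule is not given in the source.
    """
    if not 0.0 <= knowledge_weight <= 1.0:
        raise ValueError("knowledge_weight must be between 0 and 1")
    # The task evaluation takes whatever weight the knowledge evaluation leaves.
    task_weight = 1.0 - knowledge_weight
    return knowledge_weight * knowledge_score + task_weight * task_score


if __name__ == "__main__":
    # Hypothetical component scores, not taken from the leaderboard.
    print(f"{combined_index(60.0, 40.0):.1f}%")
```

Under this assumed scheme, a model that does well on certification exams but poorly on deliverables lands in between the two component scores, and shifting `knowledge_weight` moves the index toward one component or the other.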

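| Model | TB-Safety | TB-Operations (Soon) | TB-Planning (Soon) | TB-Aviation (Soon) |
| --- | --- | --- | --- | --- |
| Claude Opus 4.6 | 56.3% | -- | -- | -- |
| GPT-5.2 | 53.6% | -- | -- | -- |
| Claude Haiku 4.5 | 51.8% | -- | -- | -- |
| GPT-5 mini | 47.4% | -- | -- | -- |
| Gemini 3 Pro Preview | 44.8% | -- | -- | -- |
| Gemini 3 Flash Preview | 41.4% | -- | -- | -- |
| Claude Sonnet 4.5 | -- | -- | -- | -- |
| Claude Sonnet 4.6 | -- | -- | -- | -- |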