TransportationBench

TransportationBench evaluates AI performance across transportation domains using standardized benchmarks. Each TB-{domain} index combines knowledge evaluation (certification exams) and task evaluation (real-world deliverables) into a single score.

Model                    TB-Safety   TB-Operations (soon)   TB-Planning (soon)   TB-Aviation (soon)
Claude Opus 4.6          73.3%       --                     --                   --
Claude Haiku 4.5         71.4%       --                     --                   --
GPT-5.2                  67.6%       --                     --                   --
Claude Sonnet 4.6        65.7%       --                     --                   --
Gemini 3 Pro Preview     60.4%       --                     --                   --
GPT-5 mini               58.1%       --                     --                   --
Gemini 3 Flash Preview   56.4%       --                     --                   --
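
The page does not say how the knowledge and task evaluations are combined into a single TB-{domain} score. As an illustration only, the sketch below assumes a simple weighted average of two percentages. The function name `composite_score`, the `knowledge_weight` parameter, and the equal default weighting are all hypothetical and are not taken from TransportationBench.

```python
def composite_score(knowledge_pct: float, task_pct: float,
                    knowledge_weight: float = 0.5) -> float:
    """Combine two percentage scores into one index.

    ASSUMPTION: a weighted average with equal weights by default.
    The real TransportationBench aggregation method is not documented here.
    """
    if not 0.0 <= knowledge_weight <= 1.0:
        raise ValueError("knowledge_weight must be between 0 and 1")
    for value in (knowledge_pct, task_pct):
        if not 0.0 <= value <= 100.0:
            raise ValueError("scores must be percentages between 0 and 100")
    # Weighted average of the knowledge and task components.
    return knowledge_weight * knowledge_pct + (1.0 - knowledge_weight) * task_pct


# Made-up inputs for illustration; these are not benchmark results.
example = composite_score(knowledge_pct=80.0, task_pct=60.0)
print(f"{example:.1f}%")
```

Under a different weighting the same component scores would produce a different index, so rankings near each other in the table could change depending on the aggregation actually used.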