Test AI

Evaluate AI models and agents on transportation problems using community-driven blind comparisons, automated knowledge evaluations, and workflow-based agent assessments.

ELO Leaderboard

Model rankings based on blind A/B testing by transportation professionals

Battle Arena
Compare AI models head-to-head in blind side-by-side evaluations. Submit a transportation prompt, see two anonymous model responses, and vote for the better one. Your votes power the ELO rankings above.
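
As a rough illustration of how a single vote could move two ratings, here is a minimal sketch of the standard Elo update. The K-factor of 32, the 1500 starting rating, and the function names are assumptions made for this example; the page does not say which parameters or variant the leaderboard actually uses.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one blind comparison.

    score_a is 1.0 if voters preferred A, 0.0 if they preferred B,
    and 0.5 for a tie. k (assumed here to be 32) controls how far a
    single vote can move a rating.
    """
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - exp_a)
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two models start at an assumed default rating of 1500; A wins the vote.
    print(update_elo(1500.0, 1500.0, score_a=1.0))
```

Because the update is zero-sum, an upset win over a higher-rated model moves both ratings further than an expected win does.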
Test Models (Demo)
Explore how AI models are evaluated on transportation certification exams. See pre-computed results across datasets with configurable solver and scorer settings.
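
To make the scorer idea concrete, the sketch below shows one simple possibility: exact-match accuracy on multiple-choice answers. The function names and the normalization rule are hypothetical and are not taken from the site; real scorer settings may differ, for example by allowing partial credit or model-graded answers.

```python
def normalize(answer: str) -> str:
    """Strip whitespace and ignore case so ' b' and 'B' compare equal."""
    return answer.strip().upper()


def exact_match_accuracy(predictions: list[str], answer_key: list[str]) -> float:
    """Fraction of questions where the model's choice matches the key."""
    if len(predictions) != len(answer_key):
        raise ValueError("predictions and answer_key must have the same length")
    if not answer_key:
        return 0.0
    correct = sum(
        normalize(pred) == normalize(key)
        for pred, key in zip(predictions, answer_key)
    )
    return correct / len(answer_key)


if __name__ == "__main__":
    # Hypothetical answers for a three-question exam excerpt.
    print(exact_match_accuracy([" a", "B", "c"], ["A", "B", "D"]))
```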
Test Agents
Demo
Run workflow evaluations that use an LLM as a judge to assess agent performance on complex transportation tasks such as safety audits and traffic studies.