Test Models
Explore how AI models are evaluated on transportation certification exams. Configure the evaluation harness below to see pre-computed results across different datasets, solvers, and scorers.
Test Harness Configuration
Solver
GPT 5.2
Scorer
def exact_match(model_answer: str, correct_answer: str) -> bool:
    """Compare model output to ground truth answer key."""
    return model_answer.strip().upper() == correct_answer.strip().upper()
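As a rough illustration of how a scorer like this might feed an overall result, the sketch below aggregates per-question matches into an accuracy figure. The `accuracy` helper and the sample answers are hypothetical and are not part of the harness shown on this page; `exact_match` is repeated so the block runs on its own.

```python
from typing import Sequence


def exact_match(model_answer: str, correct_answer: str) -> bool:
    """Compare model output to ground truth answer key."""
    return model_answer.strip().upper() == correct_answer.strip().upper()


def accuracy(model_answers: Sequence[str], answer_key: Sequence[str]) -> float:
    """Fraction of questions whose answer matches the key (hypothetical helper)."""
    if len(model_answers) != len(answer_key):
        raise ValueError("model_answers and answer_key must have the same length")
    if not answer_key:
        # Assumption: an empty exam scores 0.0 rather than raising an error.
        return 0.0
    hits = sum(exact_match(m, c) for m, c in zip(model_answers, answer_key))
    return hits / len(answer_key)


# Made-up multiple-choice responses, for illustration only.
sample_answers = ["b", " C ", "A", "d"]
sample_key = ["B", "C", "D", "D"]
score = accuracy(sample_answers, sample_key)  # 3 of 4 answers match the key
```

Because the comparison only trims whitespace and ignores case, a response such as `"B."` or `"Answer: B"` would count as incorrect; any answer extraction would have to happen before the scorer is called.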