Test Models

Explore how AI models are evaluated on transportation certification exams. Configure the evaluation harness below to see pre-computed results across different datasets, solvers, and scorers.

Test Harness Configuration

Solver

GPT 5.2

Scorer

def exact_match(model_answer: str, correct_answer: str) -> bool:
    """Return True if the model's answer equals the answer key, ignoring case and surrounding whitespace."""
    return model_answer.strip().upper() == correct_answer.strip().upper()
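
The scorer above grades a single answer. As a rough sketch of how such a scorer might be aggregated into an accuracy figure, the example below applies it to a few answer pairs. The `accuracy` helper and the `sample_pairs` data are hypothetical, invented for illustration. They are not part of the harness shown here and do not reflect any real results.

```python
def exact_match(model_answer: str, correct_answer: str) -> bool:
    """Same scorer as above, repeated so this sketch runs on its own."""
    return model_answer.strip().upper() == correct_answer.strip().upper()


def accuracy(pairs: list[tuple[str, str]]) -> float:
    """Fraction of (model_answer, correct_answer) pairs that match exactly."""
    if not pairs:
        return 0.0  # assumption: an empty run scores 0 rather than raising
    return sum(exact_match(m, c) for m, c in pairs) / len(pairs)


# Hypothetical answers, made up for illustration only.
sample_pairs = [
    (" b ", "B"),  # matches: whitespace and case are ignored
    ("C", "C"),    # matches
    ("A.", "A"),   # no match: trailing punctuation is not stripped
    ("d", "B"),    # no match: wrong answer
]

print(accuracy(sample_pairs))  # 0.5
```

The third pair shows a limitation of exact matching: an answer that is correct but formatted differently (for example, with a trailing period) is scored as wrong unless the model output is normalized first.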