TB-Safety
Safety is a core mission for transportation agencies and a top AI use case. TB-Safety evaluates whether AI can reliably support safety workflows.
Knowledge: Results Available
Tasks: Results Available
| Model | Knowledge | Tasks (RSA) | TB-Safety Index |
|---|---|---|---|
| Claude Opus 4.6 | 24% | 67% | 56.3% |
| GPT-5.2 | 37.3% | 59% | 53.6% |
| Claude Haiku 4.5 | 12% | 65% | 51.8% |
| GPT-5 mini | 30.7% | 53% | 47.4% |
| Gemini 3 Pro Preview | 32% | 49% | 44.8% |
| Gemini 3 Flash Preview | 30.7% | 45% | 41.4% |
| Claude Sonnet 4.5 | 13.9% | -- | -- |
| Claude Sonnet 4.6 | -- | 56% | -- |
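The page does not say how the TB-Safety Index is derived from the two component scores. As a minimal sketch, the six complete rows above are consistent with a weighted average of 25% Knowledge and 75% Tasks. These weights are an inference from the published numbers, not a documented formula, and the names `KNOWLEDGE_WEIGHT`, `TASKS_WEIGHT`, `tb_safety_index`, and `max_deviation` are hypothetical.

```python
# Scores copied from the table above: (knowledge %, tasks/RSA %, published index %).
# Rows with a missing component (the two Sonnet rows) are left out.
PUBLISHED = {
    "Claude Opus 4.6": (24.0, 67.0, 56.3),
    "GPT-5.2": (37.3, 59.0, 53.6),
    "Claude Haiku 4.5": (12.0, 65.0, 51.8),
    "GPT-5 mini": (30.7, 53.0, 47.4),
    "Gemini 3 Pro Preview": (32.0, 49.0, 44.8),
    "Gemini 3 Flash Preview": (30.7, 45.0, 41.4),
}

# ASSUMPTION: weights inferred from the published values, not stated by the source.
KNOWLEDGE_WEIGHT = 0.25
TASKS_WEIGHT = 0.75


def tb_safety_index(knowledge: float, tasks: float) -> float:
    """Weighted average of the two component scores under the assumed weights."""
    return KNOWLEDGE_WEIGHT * knowledge + TASKS_WEIGHT * tasks


def max_deviation(rows: dict) -> float:
    """Largest absolute gap between the recomputed and the published index."""
    return max(abs(tb_safety_index(k, t) - idx) for k, t, idx in rows.values())


if __name__ == "__main__":
    for model, (k, t, idx) in PUBLISHED.items():
        print(f"{model}: recomputed {tb_safety_index(k, t):.3f}, published {idx}")
```

Under this assumption every recomputed value lands within rounding distance of the published index. The displayed component scores are themselves rounded, so an exact match to the last digit should not be expected.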
Knowledge Evaluation: RSP1 - Roadway Safety Professional Level 1
Test Results as of: 2026-02-16
Charts (not reproduced here): Overall Scores; Performance by Knowledge Domain
Tasks Evaluation: Road Safety Audit (RSA)
Multi-model eval results · 7 models · Study: DelDOT US-13 Road Safety Audit
Model Detail: Claude Opus 4.6
| Stage | Score | What it measures |
|---|---|---|
| Overall RSA Score | 67% | Weighted composite across all pipeline stages |
| Stage 1: Data Review | 71% | Did the agent correctly identify crash patterns from the data? |
| Stage 2: Field Analysis | 100% | Did the agent properly link field observations to evidence? |
| Stage 3: Synthesis | 61% | Did the agent find the right issues and recommend appropriate fixes? |
| Stage 4: Findings | 44% | Did the agent prioritize findings correctly? |
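The overall RSA score is described as a weighted composite across stages, but the stage weights are not published. The sketch below shows the general form of such a composite and applies it with equal weights, purely as a baseline. The function `weighted_composite` and the equal-weight choice are assumptions for illustration, not the benchmark's actual scoring.

```python
def weighted_composite(scores: dict, weights: dict) -> float:
    """Weighted mean of per-stage scores; weights need not sum to 1."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same stages")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[s] * weights[s] for s in scores) / total_weight


# Stage scores for Claude Opus 4.6, copied from the figures above.
OPUS_STAGE_SCORES = {
    "Data Review": 71.0,
    "Field Analysis": 100.0,
    "Synthesis": 61.0,
    "Findings": 44.0,
}

# ASSUMPTION: equal weights, used only as a baseline; the real weights are unknown.
EQUAL_WEIGHTS = {stage: 1.0 for stage in OPUS_STAGE_SCORES}

equal_weight_score = weighted_composite(OPUS_STAGE_SCORES, EQUAL_WEIGHTS)
print(f"Equal-weight composite: {equal_weight_score:.1f}%")
```

The equal-weight baseline comes to 69%, not the published 67%. This suggests that the stages are not weighted uniformly, or that the composite is computed from unrounded or finer-grained scores.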