TB-Safety

Safety is a core mission for transportation agencies and a top AI use case. TB-Safety evaluates whether AI can reliably support safety workflows.

Knowledge: Results Available
Tasks: Results Available
Model                  | Knowledge | Tasks (RSA) | TB-Safety Index
Claude Opus 4.6        | 24%       | 67%         | 56.3%
GPT-5.2                | 37.3%     | 59%         | 53.6%
Claude Haiku 4.5       | 12%       | 65%         | 51.8%
GPT-5 mini             | 30.7%     | 53%         | 47.4%
Gemini 3 Pro Preview   | 32%       | 49%         | 44.8%
Gemini 3 Flash Preview | 30.7%     | 45%         | 41.4%
Claude Sonnet 4.5      | 13.9%     | --          | --
Claude Sonnet 4.6      | --        | 56%         | --
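
The page does not state how the TB-Safety Index is derived. For the six models with both scores, the published values are consistent with a weighted average of 25% Knowledge and 75% Tasks (RSA). The sketch below checks that inference. The weights and the function names are assumptions made for illustration, not a documented formula.

```python
# Scores copied from the results table: (knowledge %, tasks %, published index %).
ROWS = {
    "Claude Opus 4.6": (24.0, 67.0, 56.3),
    "GPT-5.2": (37.3, 59.0, 53.6),
    "Claude Haiku 4.5": (12.0, 65.0, 51.8),
    "GPT-5 mini": (30.7, 53.0, 47.4),
    "Gemini 3 Pro Preview": (32.0, 49.0, 44.8),
    "Gemini 3 Flash Preview": (30.7, 45.0, 41.4),
}

# ASSUMPTION: these weights are inferred from the published numbers;
# the source does not document them.
KNOWLEDGE_WEIGHT = 0.25
TASKS_WEIGHT = 0.75


def inferred_index(knowledge: float, tasks: float) -> float:
    """Weighted average of the two component scores, in percent."""
    return KNOWLEDGE_WEIGHT * knowledge + TASKS_WEIGHT * tasks


def max_abs_gap(rows=ROWS) -> float:
    """Largest absolute difference between inferred and published index."""
    return max(abs(inferred_index(k, t) - idx) for k, t, idx in rows.values())


if __name__ == "__main__":
    for model, (k, t, idx) in ROWS.items():
        print(f"{model}: inferred {inferred_index(k, t):.2f}, published {idx}")
    print(f"largest gap: {max_abs_gap():.3f} points")
```

Every gap stays within the rounding of a one-decimal display, which supports the inferred weighting without confirming it. Models missing either component score have no index, matching the `--` entries in the table.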

Knowledge Evaluation: RSP1 - Roadway Safety Professional Level 1

Test Results as of: 2026-02-16

Overall Scores

Performance by Knowledge Domain

Tasks Evaluation: Road Safety Audit (RSA)

Multi-model eval results · 7 models · Study: DelDOT US-13 Road Safety Audit

Model Detail: Claude Opus 4.6

Overall RSA Score: 67%

Weighted composite across all pipeline stages

Stage 1: Data Review

71%

Did the agent correctly identify crash patterns from the data?

Stage 2: Field Analysis

100%

Did the agent properly link field observations to evidence?

Stage 3: Synthesis

61%

Did the agent find the right issues and recommend appropriate fixes?

Stage 4: Findings

44%

Did the agent prioritize findings correctly?
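
The overall RSA score is described as a weighted composite of the four stage scores, but the stage weights are not published. The sketch below shows how such a composite could be computed. The function name, the stage keys, and the equal weights are all hypothetical placeholders, not the benchmark's actual configuration.

```python
def weighted_composite(scores: dict, weights: dict) -> float:
    """Weighted mean of per-stage scores; weights need not sum to 1."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same stages")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[s] * weights[s] for s in scores) / total_weight


# Stage scores reported for Claude Opus 4.6 (percent).
OPUS_STAGES = {
    "data_review": 71,
    "field_analysis": 100,
    "synthesis": 61,
    "findings": 44,
}

# ASSUMPTION: equal weights, used only as a placeholder because the
# real stage weights are not given in the source.
EQUAL_WEIGHTS = {stage: 1.0 for stage in OPUS_STAGES}

if __name__ == "__main__":
    print(weighted_composite(OPUS_STAGES, EQUAL_WEIGHTS))
```

With equal weights the four stage scores average 69%, not the reported 67%, which suggests the actual weighting is not uniform.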