Test Models
Eval Observatory · Knowledge Eval Pipeline
LOAD → SOLVE → SCORE → REPORT
STANDBY
AGENT IDLE
EVAL OUTPUT STREAM
Awaiting eval initialization...
EVAL PARAMETERS
Dataset Size: 75 questions
Sample: Full
Temperature: 0.2 (scale: Precise to Creative)
Max Tokens: 1024 (scale: Concise to Verbose)
RUN STATUS
MODEL: claude-sonnet-4-6
EVAL TYPE: Knowledge (MCQ)
QUESTIONS: 75 total
SCORER: Answer Key
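The SCORE stage compares each model answer against an answer key. The following is a minimal sketch of that comparison for multiple-choice questions. It assumes that predictions and the key are both dicts that map a question ID to a choice letter. `ScoreReport`, `normalize_choice`, and `score_against_answer_key` are hypothetical names used for illustration, not this tool's actual API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ScoreReport:
    correct: int
    total: int

    @property
    def accuracy(self) -> float:
        # Guard against an empty answer key to avoid division by zero.
        return self.correct / self.total if self.total else 0.0


def normalize_choice(choice: Optional[str]) -> str:
    # Treat a missing answer as empty; compare letters case-insensitively.
    return (choice or "").strip().upper()


def score_against_answer_key(
    predictions: Dict[str, str], answer_key: Dict[str, str]
) -> ScoreReport:
    # Every question in the (hypothetical) key counts toward the total,
    # so a question with no prediction is scored as incorrect.
    correct = sum(
        1
        for qid, expected in answer_key.items()
        if normalize_choice(predictions.get(qid)) == normalize_choice(expected)
    )
    return ScoreReport(correct=correct, total=len(answer_key))
```

For example, scoring `{"q1": "a", "q2": " C ", "q3": "D"}` against the key `{"q1": "A", "q2": "C", "q3": "B"}` yields 2 correct out of 3. The denominator is the answer key rather than the predictions, so skipped questions lower accuracy instead of being silently dropped.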