Public benchmark card
openai/evals
Support Bot Guard
Overall score improved from 76 to 86 (+10). The largest gains came from quality and cost (+13 each).
Before: 76
After: 86
Delta: +10
Run status: completed
Dimension scorecard
quality: 74 -> 87 (+13)
safety: 82 -> 91 (+9)
latency: 79 -> 84 (+5)
cost: 68 -> 81 (+13)
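The arithmetic behind the scorecard can be checked with a short sketch. The per-dimension values are copied from the card. The `DimensionScore` class and the helper names are hypothetical, and the aggregation rule (an unweighted mean rounded to the nearest integer) is only an assumption that happens to reproduce the 76 and 86 shown here; the card does not say how the overall score is actually computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionScore:
    """One row of the dimension scorecard (hypothetical container)."""
    name: str
    before: int
    after: int

    @property
    def delta(self) -> int:
        return self.after - self.before


# Values copied from the dimension scorecard above.
SCORES = [
    DimensionScore("quality", 74, 87),
    DimensionScore("safety", 82, 91),
    DimensionScore("latency", 79, 84),
    DimensionScore("cost", 68, 81),
]


def overall(scores: list[DimensionScore], attr: str) -> int:
    """ASSUMPTION: overall = unweighted mean of dimensions, rounded."""
    values = [getattr(s, attr) for s in scores]
    return round(sum(values) / len(values))


def largest_movers(scores: list[DimensionScore]) -> list[str]:
    """Return every dimension tied for the largest delta."""
    top = max(s.delta for s in scores)
    return [s.name for s in scores if s.delta == top]


if __name__ == "__main__":
    for s in SCORES:
        print(f"{s.name}: {s.before} -> {s.after} ({s.delta:+d})")
    print("overall:", overall(SCORES, "before"), "->", overall(SCORES, "after"))
    print("largest movers:", largest_movers(SCORES))
```

Under that assumed rule the means are 75.75 and 85.75, which round to the reported 76 and 86. If the real scoring uses weights, the helper would need to change accordingly.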
PR scorecard output
## A2ZAI Checks Scorecard
Repo: `openai/evals` • PR #1842
Pack: `Support Bot Guard`
Overall: **76 -> 86** (+10)

### Dimension deltas
- quality: 74 -> 87 (+13)
- safety: 82 -> 91 (+9)
- latency: 79 -> 84 (+5)
- cost: 68 -> 81 (+13)

Public benchmark card: https://a2zai.ai/checks/benchmarks/openai-evals-support-bot-guard
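As an illustration of how a comment like the one above could be assembled from run data, here is a minimal sketch. The function `render_scorecard` and its parameters are hypothetical and are not the tool's actual template; only the field values and wording are taken from the output shown, and the line breaks are a guess.

```python
def render_scorecard(
    repo: str,
    pr_number: int,
    pack: str,
    overall: tuple[int, int],
    dimensions: list[tuple[str, int, int]],
    card_url: str,
) -> str:
    """Build a markdown PR comment (hypothetical layout, inferred from the sample)."""
    before, after = overall
    lines = [
        "## A2ZAI Checks Scorecard",
        f"Repo: `{repo}` • PR #{pr_number}",
        f"Pack: `{pack}`",
        f"Overall: **{before} -> {after}** ({after - before:+d})",
        "",
        "### Dimension deltas",
    ]
    # One bullet per dimension, with a signed delta.
    for name, dim_before, dim_after in dimensions:
        lines.append(f"- {name}: {dim_before} -> {dim_after} ({dim_after - dim_before:+d})")
    lines += ["", f"Public benchmark card: {card_url}"]
    return "\n".join(lines)


COMMENT = render_scorecard(
    repo="openai/evals",
    pr_number=1842,
    pack="Support Bot Guard",
    overall=(76, 86),
    dimensions=[
        ("quality", 74, 87),
        ("safety", 82, 91),
        ("latency", 79, 84),
        ("cost", 68, 81),
    ],
    card_url="https://a2zai.ai/checks/benchmarks/openai-evals-support-bot-guard",
)

if __name__ == "__main__":
    print(COMMENT)
```

Deriving each delta from the before and after values, rather than passing it in separately, keeps the rendered numbers from drifting out of sync.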
Run context
Repo: openai/evals
Branch: main -> feature/prompt-update
PR: #1842
Created: 3/12/2026, 5:40:41 PM
GitHub writeback: failed (GitHub API 404: {"message":"Not Found","documentation_url":"https://docs.github.com/rest/apps/apps#get-a-repository-installation-for-the-authenticated-app","status":"404"})
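The writeback error above embeds a JSON body, so it can be unpacked for triage. The sketch below is hypothetical (the helper names are not part of any tool shown here) and parses only the message format visible in this run. The hint it returns reflects a common cause of a 404 on that endpoint, namely that the GitHub App is not installed on the repository or cannot see it; that is an assumption about this run, not a confirmed diagnosis.

```python
import json
import re

# Error text copied from the "GitHub writeback" line above.
WRITEBACK_ERROR = (
    'GitHub API 404: {"message":"Not Found",'
    '"documentation_url":"https://docs.github.com/rest/apps/apps'
    '#get-a-repository-installation-for-the-authenticated-app",'
    '"status":"404"}'
)

INSTALLATION_ENDPOINT = "get-a-repository-installation-for-the-authenticated-app"


def parse_writeback_error(text: str) -> dict:
    """Split 'GitHub API <status>: <json>' into its parts (hypothetical helper)."""
    match = re.search(r"GitHub API (\d{3}): (\{.*\})", text)
    if match is None:
        raise ValueError("unrecognized writeback error format")
    body = json.loads(match.group(2))
    return {
        "http_status": int(match.group(1)),
        "message": body.get("message"),
        # The URL fragment names the REST endpoint that returned the error.
        "endpoint_doc": body.get("documentation_url", "").rpartition("#")[2],
    }


def triage_hint(error: dict) -> str:
    """ASSUMPTION: a 404 on the installation lookup usually means the app
    is not installed on the repository or has no access to it."""
    if error["http_status"] == 404 and error["endpoint_doc"] == INSTALLATION_ENDPOINT:
        return "Check that the GitHub App is installed on the repository and can see it."
    return "No specific hint; inspect the raw response."


if __name__ == "__main__":
    parsed = parse_writeback_error(WRITEBACK_ERROR)
    print(parsed)
    print(triage_hint(parsed))
```

The failure affects only the writeback step. The run itself is reported as completed and the scores above are unaffected.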
Cases to review
No failing examples were detected in this run.