Public benchmark card

openai/evals

Support Bot Guard

Overall score improved from 76 to 86 (+10). The largest gains came from quality and cost, at +13 each.

Before: 76

After: 86

Delta: +10

Run status: completed

Dimension scorecard

quality: 74 -> 87 (+13)

safety: 82 -> 91 (+9)

latency: 79 -> 84 (+5)

cost: 68 -> 81 (+13)
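The arithmetic behind the scorecard can be checked with a short sketch. The per-dimension deltas are plain after-minus-before differences. The card does not say how the overall score is aggregated; the `overall_score` helper below assumes an unweighted mean rounded to the nearest integer, which happens to reproduce 76 and 86 from these four dimensions but may not be the method actually used. All function names are illustrative.

```python
# Scores copied from the dimension scorecard above.
BEFORE = {"quality": 74, "safety": 82, "latency": 79, "cost": 68}
AFTER = {"quality": 87, "safety": 91, "latency": 84, "cost": 81}


def dimension_deltas(before, after):
    """Per-dimension change, after minus before."""
    return {name: after[name] - before[name] for name in before}


def overall_score(scores):
    """Assumption: unweighted mean, rounded to the nearest integer."""
    return round(sum(scores.values()) / len(scores))


def render_lines(before, after):
    """Format each dimension as 'name: before -> after (+delta)'."""
    deltas = dimension_deltas(before, after)
    return [
        f"{name}: {before[name]} -> {after[name]} ({deltas[name]:+d})"
        for name in before
    ]


if __name__ == "__main__":
    print("\n".join(render_lines(BEFORE, AFTER)))
    print(f"overall: {overall_score(BEFORE)} -> {overall_score(AFTER)}")
```

Under the unweighted-mean assumption the raw averages are 75.75 and 85.75, so the displayed +10 matches both the rounded and the unrounded difference.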

PR scorecard output

## A2ZAI Checks Scorecard

Repo: `openai/evals` • PR #1842
Pack: `Support Bot Guard`

Overall: **76 -> 86** (+10)

### Dimension deltas
- quality: 74 -> 87 (+13)
- safety: 82 -> 91 (+9)
- latency: 79 -> 84 (+5)
- cost: 68 -> 81 (+13)

Public benchmark card: https://a2zai.ai/checks/benchmarks/openai-evals-support-bot-guard

Run context

Repo: openai/evals

Branch: main -> feature/prompt-update

PR: #1842

Created: 3/12/2026, 5:40:41 PM

GitHub writeback: failed — GitHub API 404: {"message":"Not Found","documentation_url":"https://docs.github.com/rest/apps/apps#get-a-repository-installation-for-the-authenticated-app","status":"404"}
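The `documentation_url` in the error points at GitHub's "Get a repository installation for the authenticated app" endpoint (`GET /repos/{owner}/{repo}/installation`). A 404 from that endpoint commonly means the GitHub App is not installed on the repository, although the report itself does not confirm the cause. The sketch below only parses the error body and builds the endpoint URL; it makes no network call, and `diagnose` is a hypothetical helper, not part of any A2ZAI or GitHub library.

```python
import json

# Error body copied from the writeback line above.
ERROR_BODY = (
    '{"message":"Not Found",'
    '"documentation_url":"https://docs.github.com/rest/apps/apps'
    '#get-a-repository-installation-for-the-authenticated-app",'
    '"status":"404"}'
)


def installation_endpoint(owner: str, repo: str) -> str:
    """URL of GitHub's 'get a repository installation' REST endpoint."""
    return f"https://api.github.com/repos/{owner}/{repo}/installation"


def diagnose(error_body: str) -> str:
    """Hypothetical helper: turn the raw error body into a readable hint."""
    err = json.loads(error_body)
    # The fragment after '#' names the REST operation that failed.
    anchor = err.get("documentation_url", "").rpartition("#")[2]
    if err.get("status") == "404" and anchor.startswith("get-a-repository-installation"):
        return (
            "installation lookup returned 404: "
            "the app may not be installed on this repository"
        )
    return f"unrecognized error: {err.get('message', 'unknown')}"


if __name__ == "__main__":
    print(installation_endpoint("openai", "evals"))
    print(diagnose(ERROR_BODY))
```

If the hint applies, the writeback would be expected to succeed only after the app is installed on `openai/evals` or the run is pointed at a repository where it already is.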

Cases to review

No failing examples were detected in this run.