Public benchmark gallery
Recent benchmark runs
Shareable benchmark cards from builders running A2ZAI Checks. Each card links to a full scorecard, metrics, and comparison with the previous run.
- krishnaadavi/a2zai (-3)
  Live Execution Smoke Test
  Overall score regressed from 72 to 69. Biggest movement came from quality. One dimension still regressed and needs review before merge.
  Score 69 | Mar 13, 2026
- krishnaadavi/a2zai (-11)
  Live Execution Smoke Test
  Overall score regressed from 83 to 72. Biggest movement came from quality. One dimension still regressed and needs review before merge.
  Score 72 | Mar 13, 2026
- krishnaadavi/a2zai (+10)
  A2ZAI Builder Radar Guard
  Overall score improved from 78 to 88. Biggest movement came from cost.
  Score 88 | Mar 12, 2026
- krishnaadavi/a2zai (+10)
  A2ZAI Builder Radar Guard
  Overall score improved from 78 to 88. Biggest movement came from cost.
  Score 88 | Mar 12, 2026
- krishnaadavi/a2zai (+10)
  A2ZAI Builder Radar Guard
  Overall score improved from 78 to 88. Biggest movement came from cost.
  Score 88 | Mar 12, 2026
- openai/evals (+10)
  Support Bot Guard
  Overall score improved from 76 to 86. Biggest movement came from quality.
  Score 86 | Mar 12, 2026
- krishnaadavi/a2zai (+12)
  Coding Agent PR Pack
  Overall score improved from 74 to 86. Biggest movement came from quality.
  Score 86 | Mar 12, 2026
- krishnaadavi/a2zai (+10)
  Support Bot Guard
  Overall score improved from 76 to 86. Biggest movement came from quality.
  Score 86 | Mar 12, 2026
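In every card listed here, the signed number next to the repo name matches the current score minus the previous run's score (for example, 69 - 72 = -3). A minimal Python sketch of that arithmetic follows. The `BenchmarkRun` class and its field and method names are hypothetical, chosen for illustration only; they are not the A2ZAI Checks schema or API.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    """One gallery card. Field names are illustrative assumptions."""
    repo: str
    pack: str
    previous_score: int
    score: int

    @property
    def delta(self) -> int:
        # Change relative to the previous run; positive means improvement.
        return self.score - self.previous_score

    def badge(self) -> str:
        # Signed delta as shown beside the repo name, e.g. "+10" or "-3".
        return f"{self.delta:+d}"

    def headline(self) -> str:
        # First sentence of the card summary, derived from the two scores.
        if self.delta > 0:
            verb = "improved"
        elif self.delta < 0:
            verb = "regressed"
        else:
            return f"Overall score held steady at {self.score}."
        return f"Overall score {verb} from {self.previous_score} to {self.score}."


if __name__ == "__main__":
    run = BenchmarkRun("krishnaadavi/a2zai", "Live Execution Smoke Test", 72, 69)
    print(run.badge())     # -3
    print(run.headline())  # Overall score regressed from 72 to 69.
```

The sketch covers only the overall score. The per-dimension movement mentioned in the cards (quality, cost) would need per-dimension scores, which the gallery does not list.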
Ship your own benchmark card
Connect a repo, run a Checks pack on a PR or manually, and get a public benchmark URL to share on X, GitHub, or launch posts.