5 Commits

Author SHA1 Message Date
Richard Feldman
da3bab18fe Unit eval GPT-5 and Gemini 3 Pro (#43916)
Follow-up to #43907

Release Notes:

- N/A
2025-12-01 11:41:15 -05:00
Richard Feldman
7aa610e24f Run the unit evals cron in a matrix (#43907)
For now, just using Sonnet 4.5 and Opus 4.5 - I'll make a separate PR
for non-Anthropic models, in case they introduce new failures.

Release Notes:

- N/A
2025-12-01 11:03:00 -05:00
Piotr Osiewicz
73e5df6445 ci: Install pre-built cargo nextest instead of rolling our own (#42556)
Closes #ISSUE

Release Notes:

- N/A
2025-11-12 20:05:40 +00:00
Richard Feldman
0d56ed7d91 Only send unit eval failures to Slack for cron job (#42479)
Release Notes:

- N/A
2025-11-11 20:19:34 +00:00
Richard Feldman
908ef03502 Split out cron and non-cron unit evals (#42472)
Release Notes:

- N/A

---------

Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de>
2025-11-11 13:45:48 -05:00