Richard Feldman
da3bab18fe
Unit eval GPT-5 and Gemini 3 Pro (#43916)
Follow-up to #43907
Release Notes:
- N/A
2025-12-01 11:41:15 -05:00
Richard Feldman
7aa610e24f
Run the unit evals cron in a matrix (#43907)
For now, just using Sonnet 4.5 and Opus 4.5 - I'll make a separate PR
for non-Anthropic models, in case they introduce new failures.
Release Notes:
- N/A
2025-12-01 11:03:00 -05:00
Piotr Osiewicz
73e5df6445
ci: Install pre-built cargo nextest instead of rolling our own (#42556)
Closes #ISSUE
Release Notes:
- N/A
2025-11-12 20:05:40 +00:00
Richard Feldman
0d56ed7d91
Only send unit eval failures to Slack for cron job (#42479)
Release Notes:
- N/A
2025-11-11 20:19:34 +00:00
Richard Feldman
908ef03502
Split out cron and non-cron unit evals (#42472)
Release Notes:
- N/A
---------
Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de>
2025-11-11 13:45:48 -05:00