I saw a slice panic (for `begin > end`) in a debug build of the eval. This should just be a failed assertion, not a panic that takes out the whole eval run!

Release Notes:

- N/A
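
For illustration, here is a minimal sketch of how a range check can report a failed assertion instead of panicking. It is not the code from this change: `AssertionResult` and `assert_slice_eq` are hypothetical names, and the only real API it relies on is the standard library's `str::get`, which returns `None` for an invalid range rather than panicking.

```rust
/// Outcome of a single eval assertion (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub enum AssertionResult {
    Passed,
    Failed(String),
}

/// Checks that `text[begin..end]` equals `expected` without panicking.
/// (Hypothetical helper; the real eval code may be structured differently.)
pub fn assert_slice_eq(text: &str, begin: usize, end: usize, expected: &str) -> AssertionResult {
    // `str::get` returns `None` when begin > end, when end is out of bounds,
    // or when either index is not on a UTF-8 char boundary.
    match text.get(begin..end) {
        Some(actual) if actual == expected => AssertionResult::Passed,
        Some(actual) => {
            AssertionResult::Failed(format!("expected {expected:?}, got {actual:?}"))
        }
        None => AssertionResult::Failed(format!(
            "invalid range {begin}..{end} for text of length {}",
            text.len()
        )),
    }
}
```

With this shape, a bad range such as `assert_slice_eq("hello", 4, 2, "x")` yields a `Failed` value that can be recorded for that one example, and the rest of the run continues.
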
This reduces spurious failures in the eval.

Release Notes:

- N/A
Add a targeted eval for code block formatting, and revise the system prompt accordingly.

### Eval before, n=8

<img width="728" alt="eval before" src="https://github.com/user-attachments/assets/552b6146-3d26-4eaa-86f9-9fc36c0cadf2" />

### Eval after prompt change, n=8 (excluding the new evals, so just testing the prompt change)

<img width="717" alt="eval after" src="https://github.com/user-attachments/assets/c78c7a54-4c65-470c-b135-8691584cd73e" />

Release Notes:

- N/A