Re-add code block formatting instructions (#29574)

Re-enabled instructions about code block formatting.

In practice, the model doesn't seem to use these very often, but there's
no negative effect on evals. In a future PR, I'll experiment with adding
more evals around the model actually using the code blocks.

2 runs before (`--repetitions=8`):
```
=================================================================
AGGREGATE
=================================================================
4 examples failed to run!
Average programmatic score: 37%
Average diff score: 66%
Average thread score: 93%
-----------------------------------------------------------------
CUMULATIVE TOOL METRICS
-----------------------------------------------------------------
┌──────────────────────────────┬──────────┬──────────┬──────────┐
│ Tool                         │   Uses   │ Failures │   Rate   │
├──────────────────────────────┼──────────┼──────────┼──────────┤
│edit_file                     │   398    │    53    │   13%    │
│terminal                      │    11    │    1     │    9%    │
│create_file                   │    40    │    2     │    5%    │
│read_file                     │   245    │    8     │    3%    │
│find_path                     │    48    │    0     │    0%    │
│list_directory                │    13    │    0     │    0%    │
│grep                          │   133    │    0     │    0%    │
│thinking                      │    18    │    0     │    0%    │
│diagnostics                   │   130    │    0     │    0%    │
└──────────────────────────────┴──────────┴──────────┴──────────┘
```
```
=================================================================
AGGREGATE
=================================================================
1 examples failed to run!
Average programmatic score: 41%
Average diff score: 68%
Average thread score: 96%
-----------------------------------------------------------------
CUMULATIVE TOOL METRICS
-----------------------------------------------------------------
┌──────────────────────────────┬──────────┬──────────┬──────────┐
│ Tool                         │   Uses   │ Failures │   Rate   │
├──────────────────────────────┼──────────┼──────────┼──────────┤
│fetch                         │    1     │    1     │   100%   │
│edit_file                     │   553    │    63    │   11%    │
│read_file                     │   349    │    3     │    1%    │
│diagnostics                   │   158    │    0     │    0%    │
│find_path                     │    70    │    0     │    0%    │
│list_directory                │    10    │    0     │    0%    │
│thinking                      │    45    │    0     │    0%    │
│grep                          │   213    │    0     │    0%    │
│create_file                   │    24    │    0     │    0%    │
│terminal                      │    17    │    0     │    0%    │
└──────────────────────────────┴──────────┴──────────┴──────────┘
```
1 run after this change:
```
=================================================================
AGGREGATE
=================================================================
Average programmatic score: 42%
Average diff score: 74%
Average thread score: 100%
-----------------------------------------------------------------
CUMULATIVE TOOL METRICS
-----------------------------------------------------------------
┌──────────────────────────────┬──────────┬──────────┬──────────┐
│ Tool                         │   Uses   │ Failures │   Rate   │
├──────────────────────────────┼──────────┼──────────┼──────────┤
│edit_file                     │   534    │    92    │   17%    │
│read_file                     │   325    │    6     │    2%    │
│list_directory                │    6     │    0     │    0%    │
│thinking                      │    12    │    0     │    0%    │
│create_file                   │    16    │    0     │    0%    │
│diagnostics                   │    49    │    0     │    0%    │
│grep                          │   234    │    0     │    0%    │
│find_path                     │    65    │    0     │    0%    │
│terminal                      │    38    │    0     │    0%    │
└──────────────────────────────┴──────────┴──────────┴──────────┘
```
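
As a sanity check on the tables above: the `Rate` column looks like failures divided by uses, rounded to a whole percent. A minimal Python sketch, assuming that rounding rule (the `failure_rate` helper is hypothetical and not part of the eval harness), recomputes the `edit_file` rate for each of the three runs:

```python
def failure_rate(uses: int, failures: int) -> int:
    """Return failures / uses as a whole-number percentage (assumed rounding rule)."""
    if uses == 0:
        return 0
    return round(100 * failures / uses)


# edit_file rows copied from the three runs above: (uses, failures)
edit_file_runs = {
    "before, run 1": (398, 53),
    "before, run 2": (553, 63),
    "after": (534, 92),
}

for label, (uses, failures) in edit_file_runs.items():
    print(f"{label}: {failure_rate(uses, failures)}%")
```

This prints 13%, 11%, and 17%, matching the `Rate` values reported for `edit_file` in the tables.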
Release Notes:
- N/A
@@ -36,6 +36,20 @@ If appropriate, use tool calls to explore the current project, which contains th
 - The user might specify a partial file path. If you don't know the full path, use `find_path` (not `grep`) before you read the file.
 {{/if}}
 
+## Code Block Formatting
+
+Whenever you mention a code block, you MUST use ONLY use the following format when the code in the block comes from a file
+in the project:
+
+```path/to/Something.blah#L123-456
+(code goes here)
+```
+
+The `#L123-456` means the line number range 123 through 456, and the path/to/Something.blah
+is a path in the project. (If this code block does not come from a file in the project, then you may instead use
+the normal markdown style of three backticks followed by language name. However, you MUST use this format if
+the code in the block comes from a file in the project.)
+
 ## Fixing Diagnostics
 
 1. Make 1-2 attempts at fixing diagnostics, then defer to the user.
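
To make the fence format in the added prompt text concrete, here is a minimal Python sketch of how a client could build and parse such a fence info string. The names `FileFence`, `parse_fence_info`, and `format_fence` are hypothetical, and the sketch assumes only the range form `#L<start>-<end>` shown in the prompt. It is an illustration, not how the editor itself handles these blocks.

```python
import re
from typing import NamedTuple, Optional


class FileFence(NamedTuple):
    """Path and line range taken from a fence info string (hypothetical type)."""
    path: str
    start_line: int
    end_line: int


# Assumed grammar: <path>#L<start>-<end>, as in path/to/Something.blah#L123-456
_FENCE_INFO = re.compile(r"^(?P<path>.+)#L(?P<start>\d+)-(?P<end>\d+)$")


def parse_fence_info(info: str) -> Optional[FileFence]:
    """Return the path and range, or None for a plain language tag such as 'python'."""
    match = _FENCE_INFO.match(info.strip())
    if match is None:
        return None
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        return None  # reject reversed ranges
    return FileFence(match["path"], start, end)


def format_fence(path: str, start: int, end: int, code: str) -> str:
    """Wrap code from a project file in a fence that carries its path and line range."""
    ticks = "`" * 3  # built at runtime so this example contains no literal fence
    return f"{ticks}{path}#L{start}-{end}\n{code}\n{ticks}"


print(parse_fence_info("path/to/Something.blah#L123-456"))
print(parse_fence_info("python"))
print(format_fence("path/to/Something.blah", 123, 456, "(code goes here)"))
```

A plain language tag yields `None`, which lets a caller fall back to ordinary Markdown handling for code that does not come from a project file.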