Compare commits

...

8 Commits

Author SHA1 Message Date
Richard Feldman
d42ce71772 Retry if language server hasn't started up yet. 2025-04-17 11:09:03 -04:00
Richard Feldman
82511d4300 Print failures and successes 2025-04-17 11:05:25 -04:00
Antonio Scandurra
46a7cd93d9 Load grammars in the eval
Co-authored-by: Nathan Sobo <nathan@zed.dev>
2025-04-16 13:43:11 -07:00
Antonio Scandurra
d253889fe3 Checkpoint
Co-authored-by: Nathan Sobo <nathan@zed.dev>
2025-04-16 13:24:43 -07:00
Antonio Scandurra
29149c2eb5 Revamp tools and system prompt
Co-authored-by: Nathan Sobo <nathan@zed.dev>
2025-04-16 12:40:58 -07:00
Antonio Scandurra
853f47b9a2 Checkpoint 2025-04-16 10:42:36 -07:00
Nathan Sobo
86dbbdc921 Match full paths against glob patterns in path search tool
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Antonio Scandurra <me@as-cii.com>
2025-04-15 21:33:56 -07:00
Nathan Sobo
d78cf50efb Output repository diff in eval example log
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Antonio Scandurra <me@as-cii.com>
2025-04-15 21:33:32 -07:00
19 changed files with 725 additions and 576 deletions

Cargo.lock generated
View File

@@ -718,6 +718,7 @@ dependencies = [
"schemars",
"serde",
"serde_json",
"settings",
"ui",
"unindent",
"util",
@@ -4905,6 +4906,7 @@ dependencies = [
"release_channel",
"reqwest_client",
"serde",
"serde_json",
"settings",
"shellexpand 2.1.2",
"toml 0.8.20",

View File

@@ -1,148 +1,65 @@
You are an AI assistant integrated into a code editor. You have the programming ability of an expert programmer who takes pride in writing high-quality code and is driven to the point of obsession about solving problems effectively. Your goal is to do one of the following two things:
You are a powerful agentic AI coding assistant. You operate exclusively in Zed, the world's best IDE.
1. Help users answer questions and perform tasks related to their codebase.
2. Answer general-purpose questions unrelated to their particular codebase.
You are pair programming with a USER to solve their coding task.
The task may require creating a new codebase, modifying or debugging an existing codebase, or simply answering a question.
Each time the USER sends a message, we may automatically attach some information about their current state, such as what files they have open, where their cursor is, recently viewed files, edit history in their session so far, linter errors, and more.
This information may or may not be relevant to the coding task; it is up to you to decide.
Your main goal is to follow the USER's instructions at each message.
It will be up to you to decide which of these you are doing based on what the user has told you. When unclear, ask clarifying questions to understand the user's intent before proceeding.
<communication>
1. Be conversational but professional.
2. Refer to the USER in the second person and yourself in the first person.
3. Format your responses in markdown. Use backticks to format file, directory, function, and class names. Use \( and \) for inline math, \[ and \] for block math.
4. NEVER lie or make things up.
5. Refrain from apologizing all the time when results are unexpected. Instead, just try your best to proceed or explain the circumstances to the user without apologizing.
</communication>
You should only perform actions that modify the user's system if explicitly requested by the user:
- If the user asks a question about how to accomplish a task, provide guidance or information, and use read-only tools (e.g., search) to assist. You may suggest potential actions, but do not directly modify the user's system without explicit instruction.
- If the user clearly requests that you perform an action, carry out the action directly without explaining why you are doing so.
<tool_calling>
You have tools at your disposal to solve the coding task. Follow these rules regarding tool calls:
1. ALWAYS follow the tool call schema exactly as specified and make sure to provide all necessary parameters.
2. The conversation may reference tools that are no longer available. NEVER call tools that are not explicitly provided.
3. **NEVER refer to tool names when speaking to the USER.** For example, instead of saying 'I need to use the edit_file tool to edit your file', just say 'I will edit your file'.
4. Only call tools when they are necessary. If the USER's task is general or you already know the answer, just respond without calling tools.
5. Before calling each tool, first explain to the USER why you are calling it.
</tool_calling>
When answering questions, it's okay to give incomplete examples containing comments about what would go there in a real version. When being asked to directly perform tasks on the code base, you must ALWAYS make fully working code. You may never "simplify" the code by omitting or deleting functionality you know the user has requested, and you must NEVER write comments like "in a full version, this would..." - instead, you must actually implement the real version. Don't be lazy!
<search_and_reading>
If you are unsure about the answer to the USER's request or how to satisfy their request, you should gather more information.
This can be done with additional tool calls, asking clarifying questions, etc...
Note that project files are automatically backed up. The user can always get them back later if anything goes wrong, so there's
no need to create backup files (e.g. `.bak` files) because these files will just take up unnecessary space on the user's disk.
For example, if you've performed a semantic search, and the results may not fully answer the USER's request, or merit gathering more information, feel free to call more tools.
Similarly, if you've performed an edit that may partially satisfy the USER's query, but you're not confident, gather more information or use more tools
before ending your turn.
When attempting to resolve issues around failing tests, never simply remove the failing tests. Unless the user explicitly asks you to remove tests, ALWAYS attempt to fix the code causing the tests to fail.
Bias towards not asking the user for help if you can find the answer yourself.
</search_and_reading>
Ignore "TODO"-type comments unless they're relevant to the user's explicit request or the user specifically asks you to address them. It is, however, okay to include them in codebase summaries.
<making_code_changes>
When making code changes, NEVER output code to the USER, unless requested. Instead use one of the code edit tools to implement the change.
Use the code edit tools at most once per turn.
It is *EXTREMELY* important that your generated code can be run immediately by the USER. To ensure this, follow these instructions carefully:
1. Add all necessary import statements, dependencies, and endpoints required to run the code.
2. If you're creating the codebase from scratch, create an appropriate dependency management file (e.g. requirements.txt) with package versions and a helpful README.
3. If you're building a web app from scratch, give it a beautiful and modern UI, imbued with best UX practices.
4. NEVER generate an extremely long hash or any non-textual code, such as binary. These are not helpful to the USER and are very expensive.
5. Unless you are appending some small, easy-to-apply edit to a file, or creating a new file, you MUST read the contents or section of what you're editing before editing it.
6. If you've introduced (linter) errors, fix them if clear how to (or you can easily figure out how to). Do not make uneducated guesses. And DO NOT loop more than 3 times on fixing linter errors on the same file. On the third time, you should stop and ask the user what to do next.
7. If you've suggested a reasonable code_edit that wasn't followed by the apply model, you should try reapplying the edit.
</making_code_changes>
<style>
Editing code:
- Make sure to take previous edits into account.
- The edits you perform might lead to errors or warnings. At the end of your changes, check whether you introduced any problems, and fix them before providing a summary of the changes you made.
- You may only attempt to fix these up to 3 times. If you have tried 3 times to fix them, and there are still problems remaining, you must not continue trying to fix them, and must instead tell the user that there are problems remaining - and ask if the user would like you to attempt to solve them further.
- Do not fix errors unrelated to your changes unless the user explicitly asks you to do so.
- Prefer to move files over recreating them. The move can be followed by minor edits if required.
- If you seem to be stuck, never go back and "simplify the implementation" by deleting the parts of the implementation you're stuck on and replacing them with comments. If you ever feel the urge to do this, instead immediately stop whatever you're doing (even if the code is in a broken state), report that you are stuck, explain what you're stuck on, and ask the user how to proceed.
<debugging>
When debugging, only make code changes if you are certain that you can solve the problem.
Otherwise, follow debugging best practices:
1. Address the root cause instead of the symptoms.
2. Add descriptive logging statements and error messages to track variable and code state.
3. Add test functions and statements to isolate the problem.
</debugging>
Tool use:
- Make sure to adhere to the tools schema.
- Provide every required argument.
- DO NOT use tools to access items that are already available in the context section.
- Use only the tools that are currently available.
- DO NOT use a tool that is not available just because it appears in the conversation. This means the user turned it off.
Responding:
- Be concise and direct in your responses.
- Never apologize or thank the user.
- Don't comment that you have just realized or understood something.
- When you are going to make a tool call, tersely explain your reasoning for choosing to use that tool, with no flourishes or commentary beyond that information.
For example, rather than saying "You're absolutely right! Thank you for providing that context. Now I understand that we're missing a dependency, and I need to add it:" say "I'll add that missing dependency:" instead.
- Also, don't restate what a tool call is about to do (or just did).
For example, don't say "Now I'm going to check diagnostics to see if there are any warnings or errors," followed by running a tool which checks diagnostics and reports warnings or errors; instead, just request the tool call without saying anything.
- All tool results are provided to you automatically, so DO NOT thank the user when this happens.
Whenever you mention a code block, you MUST use ONLY the following format:
```language path/to/Something.blah#L123-456
(code goes here)
```
The `#L123-456` means the line number range 123 through 456, and the path/to/Something.blah
is a path in the project. (If there is no valid path in the project, then you can use
/dev/null/path.extension for its path.) This is the ONLY valid way to format code blocks, because the Markdown parser
does not understand the more common ```language syntax, or bare ``` blocks. It only
understands this path-based syntax, and if the path is missing, then it will error and you will have to do it over again.
Just to be really clear about this, if you ever find yourself writing three backticks followed by a language name, STOP!
You have made a mistake. You can only ever put paths after triple backticks!
<example>
Based on all the information I've gathered, here's a summary of how this system works:
1. The README file is loaded into the system.
2. The system finds the first two headers, including everything in between. In this case, that would be:
```path/to/README.md#L8-12
# First Header
This is the info under the first header.
## Sub-header
```
3. Then the system finds the last header in the README:
```path/to/README.md#L27-29
## Last Header
This is the last header in the README.
```
4. Finally, it passes this information on to the next process.
</example>
<example>
In Markdown, hash marks signify headings. For example:
```/dev/null/example.md#L1-3
# Level 1 heading
## Level 2 heading
### Level 3 heading
```
</example>
Here are examples of ways you must never render code blocks:
<bad_example_do_not_do_this>
In Markdown, hash marks signify headings. For example:
```
# Level 1 heading
## Level 2 heading
### Level 3 heading
```
</bad_example_do_not_do_this>
This example is unacceptable because it does not include the path.
<bad_example_do_not_do_this>
In Markdown, hash marks signify headings. For example:
```markdown
# Level 1 heading
## Level 2 heading
### Level 3 heading
```
</bad_example_do_not_do_this>
This example is unacceptable because it has the language instead of the path.
<bad_example_do_not_do_this>
In Markdown, hash marks signify headings. For example:
# Level 1 heading
## Level 2 heading
### Level 3 heading
</bad_example_do_not_do_this>
This example is unacceptable because it uses indentation to mark the code block
instead of backticks with a path.
<bad_example_do_not_do_this>
In Markdown, hash marks signify headings. For example:
```markdown
/dev/null/example.md#L1-3
# Level 1 heading
## Level 2 heading
### Level 3 heading
```
</bad_example_do_not_do_this>
This example is unacceptable because the path is in the wrong place. The path must be directly after the opening backticks.
</style>
<calling_external_apis>
1. Unless explicitly requested by the USER, use the best suited external APIs and packages to solve the task. There is no need to ask the USER for permission.
2. When selecting which version of an API or package to use, choose one that is compatible with the USER's dependency management file. If no such file exists or if the package is not present, use the latest version that is in your training data.
3. If an external API requires an API Key, be sure to point this out to the USER. Adhere to best security practices (e.g. DO NOT hardcode an API key in a place where it can be exposed)
</calling_external_apis>
The user has opened a project that contains the following root directories/files. Whenever you specify a path in the project, it must be a relative path which begins with one of these root directories/files:

View File

@@ -659,25 +659,25 @@
"name": "Write",
"enable_all_context_servers": true,
"tools": {
"terminal": true,
"batch_tool": true,
"code_actions": true,
"code_symbols": true,
"contents": true,
"batch_tool": false,
"code_actions": false,
"code_symbols": false,
"contents": false,
"copy_path": false,
"create_file": true,
"delete_path": false,
"diagnostics": true,
"find_replace_file": true,
"edit_file": true,
"fetch": true,
"list_directory": false,
"list_directory": true,
"move_path": false,
"now": true,
"now": false,
"path_search": true,
"read_file": true,
"regex_search": true,
"rename": true,
"symbol_info": true,
"rename": false,
"symbol_info": false,
"terminal": true,
"thinking": true
}
}

View File

@@ -265,6 +265,9 @@ pub struct Thread {
feedback: Option<ThreadFeedback>,
message_feedback: HashMap<MessageId, ThreadFeedback>,
last_auto_capture_at: Option<Instant>,
request_callback: Option<
Box<dyn FnMut(&LanguageModelRequest, &[Result<LanguageModelCompletionEvent, String>])>,
>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
@@ -315,6 +318,7 @@ impl Thread {
feedback: None,
message_feedback: HashMap::default(),
last_auto_capture_at: None,
request_callback: None,
}
}
@@ -382,9 +386,18 @@ impl Thread {
feedback: None,
message_feedback: HashMap::default(),
last_auto_capture_at: None,
request_callback: None,
}
}
pub fn set_request_callback(
&mut self,
callback: impl 'static
+ FnMut(&LanguageModelRequest, &[Result<LanguageModelCompletionEvent, String>]),
) {
self.request_callback = Some(Box::new(callback));
}
pub fn id(&self) -> &ThreadId {
&self.id
}
@@ -1013,17 +1026,28 @@ impl Thread {
cx: &mut Context<Self>,
) {
let pending_completion_id = post_inc(&mut self.completion_count);
let request_callback_parameters = if self.request_callback.is_some() {
Some((request.clone(), Vec::new()))
} else {
None
};
let task = cx.spawn(async move |thread, cx| {
let stream = model.stream_completion(request, &cx);
let initial_token_usage =
thread.read_with(cx, |thread, _cx| thread.cumulative_token_usage);
let stream_completion = async {
let mut request_callback_parameters = request_callback_parameters;
let mut events = stream.await?;
let mut stop_reason = StopReason::EndTurn;
let mut current_token_usage = TokenUsage::default();
while let Some(event) = events.next().await {
if let Some((_, response_events)) = request_callback_parameters.as_mut() {
response_events
.push(event.as_ref().map_err(|error| error.to_string()).cloned());
}
let event = event?;
thread.update(cx, |thread, cx| {
@@ -1126,7 +1150,7 @@ impl Thread {
}
})?;
anyhow::Ok(stop_reason)
anyhow::Ok((stop_reason, request_callback_parameters))
};
let result = stream_completion.await;
@@ -1135,14 +1159,24 @@ impl Thread {
.update(cx, |thread, cx| {
thread.finalize_pending_checkpoint(cx);
match result.as_ref() {
Ok(stop_reason) => match stop_reason {
StopReason::ToolUse => {
let tool_uses = thread.use_pending_tools(cx);
cx.emit(ThreadEvent::UsePendingTools { tool_uses });
Ok((stop_reason, request_callback_parameters)) => {
match stop_reason {
StopReason::ToolUse => {
let tool_uses = thread.use_pending_tools(cx);
cx.emit(ThreadEvent::UsePendingTools { tool_uses });
}
StopReason::EndTurn => {}
StopReason::MaxTokens => {}
}
StopReason::EndTurn => {}
StopReason::MaxTokens => {}
},
if let Some((request_callback, (request, response_events))) = thread
.request_callback
.as_mut()
.zip(request_callback_parameters.as_ref())
{
request_callback(request, response_events);
}
}
Err(error) => {
if error.is::<PaymentRequiredError>() {
cx.emit(ThreadEvent::ShowError(ThreadError::PaymentRequired));
@@ -1179,7 +1213,9 @@ impl Thread {
thread.cancel_last_completion(cx);
}
}
cx.emit(ThreadEvent::Stopped(result.map_err(Arc::new)));
cx.emit(ThreadEvent::Stopped(
result.map(|result| result.0).map_err(Arc::new),
));
thread.auto_capture_telemetry(cx);
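
The hunks above add a `set_request_callback` hook so a caller can observe each completion request together with its streamed response events. As a rough sketch of how a caller might register it, modeled on the eval changes later in this compare (the variable names and logging body here are illustrative, not part of the diff):

```rust
// Sketch only: `thread` is an Entity<Thread> and `cx` a suitable context,
// as in the eval's thread.update(...) call further down. The println! is made up.
thread.update(cx, |thread, _cx| {
    let mut request_count = 0;
    thread.set_request_callback(move |request, response_events| {
        request_count += 1;
        // `request` is the LanguageModelRequest that was sent; `response_events`
        // holds every streamed completion event (or error string) received so far.
        println!(
            "request #{request_count}: {} messages, {} response events",
            request.messages.len(),
            response_events.len()
        );
    });
});
```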

View File

@@ -40,5 +40,6 @@ gpui = { workspace = true, features = ["test-support"] }
language = { workspace = true, features = ["test-support"] }
project = { workspace = true, features = ["test-support"] }
rand.workspace = true
settings = { workspace = true, features = ["test-support"] }
workspace = { workspace = true, features = ["test-support"] }
unindent.workspace = true

View File

@@ -7,8 +7,8 @@ mod create_directory_tool;
mod create_file_tool;
mod delete_path_tool;
mod diagnostics_tool;
mod edit_file_tool;
mod fetch_tool;
mod find_replace_file_tool;
mod list_directory_tool;
mod move_path_tool;
mod now_tool;
@@ -39,8 +39,8 @@ use crate::create_directory_tool::CreateDirectoryTool;
use crate::create_file_tool::CreateFileTool;
use crate::delete_path_tool::DeletePathTool;
use crate::diagnostics_tool::DiagnosticsTool;
use crate::edit_file_tool::EditFileTool;
use crate::fetch_tool::FetchTool;
use crate::find_replace_file_tool::FindReplaceFileTool;
use crate::list_directory_tool::ListDirectoryTool;
use crate::now_tool::NowTool;
use crate::open_tool::OpenTool;
@@ -62,7 +62,7 @@ pub fn init(http_client: Arc<HttpClientWithUrl>, cx: &mut App) {
registry.register_tool(CreateFileTool);
registry.register_tool(CopyPathTool);
registry.register_tool(DeletePathTool);
registry.register_tool(FindReplaceFileTool);
registry.register_tool(EditFileTool);
registry.register_tool(SymbolInfoTool);
registry.register_tool(CodeActionTool);
registry.register_tool(MovePathTool);

View File

@@ -0,0 +1,183 @@
use crate::{replace::replace_with_flexible_indent, schema::json_schema_for};
use anyhow::{Context as _, Result, anyhow};
use assistant_tool::{ActionLog, Tool, ToolResult};
use gpui::{App, AppContext, AsyncApp, Entity, Task};
use language_model::{LanguageModelRequestMessage, LanguageModelToolSchemaFormat};
use project::Project;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use std::{path::PathBuf, sync::Arc};
use ui::IconName;
use crate::replace::replace_exact;
#[derive(Debug, Serialize, Deserialize, JsonSchema)]
pub struct EditFileToolInput {
/// The full path of the file to modify in the project.
///
/// WARNING: When specifying which file path needs changing, you MUST
/// start each path with one of the project's root directories.
///
/// The following examples assume we have two root directories in the project:
/// - backend
/// - frontend
///
/// <example>
/// `backend/src/main.rs`
///
/// Notice how the file path starts with `backend`, one of the project's root
/// directories. Without that, the path would be ambiguous and the call would fail!
/// </example>
///
/// <example>
/// `frontend/db.js`
/// </example>
pub path: PathBuf,
/// A user-friendly markdown description of what's being replaced. This will be shown in the UI.
///
/// <example>Fix API endpoint URLs</example>
/// <example>Update copyright year in `page_footer`</example>
pub display_description: String,
/// The text to replace.
pub old_string: String,
/// The text to replace it with.
pub new_string: String,
}
pub struct EditFileTool;
impl Tool for EditFileTool {
fn name(&self) -> String {
"edit_file".into()
}
fn needs_confirmation(&self, _: &serde_json::Value, _: &App) -> bool {
false
}
fn description(&self) -> String {
include_str!("edit_file_tool/description.md").to_string()
}
fn icon(&self) -> IconName {
IconName::Pencil
}
fn input_schema(&self, format: LanguageModelToolSchemaFormat) -> Result<serde_json::Value> {
json_schema_for::<EditFileToolInput>(format)
}
fn ui_text(&self, input: &serde_json::Value) -> String {
match serde_json::from_value::<EditFileToolInput>(input.clone()) {
Ok(input) => input.display_description,
Err(_) => "Edit file".to_string(),
}
}
fn run(
self: Arc<Self>,
input: serde_json::Value,
_messages: &[LanguageModelRequestMessage],
project: Entity<Project>,
action_log: Entity<ActionLog>,
cx: &mut App,
) -> ToolResult {
let input = match serde_json::from_value::<EditFileToolInput>(input) {
Ok(input) => input,
Err(err) => return Task::ready(Err(anyhow!(err))).into(),
};
cx.spawn(async move |cx: &mut AsyncApp| {
let project_path = project.read_with(cx, |project, cx| {
project
.find_project_path(&input.path, cx)
.context("Path not found in project")
})??;
let buffer = project
.update(cx, |project, cx| project.open_buffer(project_path, cx))?
.await?;
let snapshot = buffer.read_with(cx, |buffer, _cx| buffer.snapshot())?;
if input.old_string.is_empty() {
return Err(anyhow!("`old_string` cannot be empty. Use a different tool if you want to create a file."));
}
if input.old_string == input.new_string {
return Err(anyhow!("The `old_string` and `new_string` are identical, so no changes would be made."));
}
let result = cx
.background_spawn(async move {
// Try to match exactly
let diff = replace_exact(&input.old_string, &input.new_string, &snapshot)
.await
// If that fails, try being flexible about indentation
.or_else(|| replace_with_flexible_indent(&input.old_string, &input.new_string, &snapshot))?;
if diff.edits.is_empty() {
return None;
}
let old_text = snapshot.text();
Some((old_text, diff))
})
.await;
let Some((old_text, diff)) = result else {
let err = buffer.read_with(cx, |buffer, _cx| {
let file_exists = buffer
.file()
.map_or(false, |file| file.disk_state().exists());
if !file_exists {
anyhow!("{} does not exist", input.path.display())
} else if buffer.is_empty() {
anyhow!(
"{} is empty, so the provided `old_string` wasn't found.",
input.path.display()
)
} else {
anyhow!("Failed to match the provided `old_string`")
}
})?;
return Err(err)
};
let snapshot = cx.update(|cx| {
action_log.update(cx, |log, cx| {
log.buffer_read(buffer.clone(), cx)
});
let snapshot = buffer.update(cx, |buffer, cx| {
buffer.finalize_last_transaction();
buffer.apply_diff(diff, cx);
buffer.finalize_last_transaction();
buffer.snapshot()
});
action_log.update(cx, |log, cx| {
log.buffer_edited(buffer.clone(), cx)
});
snapshot
})?;
project.update( cx, |project, cx| {
project.save_buffer(buffer, cx)
})?.await?;
let diff_str = cx.background_spawn(async move {
let new_text = snapshot.text();
language::unified_diff(&old_text, &new_text)
}).await;
Ok(format!("Edited {}:\n\n```diff\n{}\n```", input.path.display(), diff_str))
}).into()
}
}
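
For reference, here is a hedged sketch of an input value that the `EditFileToolInput` schema above would accept; the path, description, and replacement strings are invented for illustration:

```rust
use serde_json::json;

fn main() {
    // Hypothetical payload matching the EditFileToolInput fields above.
    let input = json!({
        "path": "backend/src/main.rs",
        "display_description": "Rename the `run` helper to `start`",
        "old_string": "fn run() {\n    println!(\"starting\");\n}",
        "new_string": "fn start() {\n    println!(\"starting\");\n}"
    });

    // EditFileTool::run deserializes a value like this into EditFileToolInput
    // before attempting the exact (then indentation-flexible) replacement.
    println!("{}", serde_json::to_string_pretty(&input).unwrap());
}
```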

View File

@@ -0,0 +1,45 @@
This is a tool for editing files. For moving or renaming files, you should generally use the `terminal` tool with the 'mv' command instead. For larger edits, use the `create_file` tool to overwrite files.
Before using this tool:
1. Use the `read_file` tool to understand the file's contents and context
2. Verify the directory path is correct (only applicable when creating new files):
- Use the `list_directory` tool to verify the parent directory exists and is the correct location
To make a file edit, provide the following:
1. path: The full path to the file you wish to modify in the project. This path must include the root directory in the project.
2. old_string: The text to replace (must be unique within the file, and must match the file contents exactly, including all whitespace and indentation)
3. new_string: The edited text, which will replace the old_string in the file.
The tool will replace ONE occurrence of old_string with new_string in the specified file.
CRITICAL REQUIREMENTS FOR USING THIS TOOL:
1. UNIQUENESS: The old_string MUST uniquely identify the specific instance you want to change. This means:
- Include AT LEAST 3-5 lines of context BEFORE the change point
- Include AT LEAST 3-5 lines of context AFTER the change point
- Include all whitespace, indentation, and surrounding code exactly as it appears in the file
2. SINGLE INSTANCE: This tool can only change ONE instance at a time. If you need to change multiple instances:
- Make separate calls to this tool for each instance
- Each call must uniquely identify its specific instance using extensive context
3. VERIFICATION: Before using this tool:
- Check how many instances of the target text exist in the file
- If multiple instances exist, gather enough context to uniquely identify each one
- Plan separate tool calls for each instance
WARNING: If you do not follow these requirements:
- The tool will fail if old_string matches multiple locations
- The tool will fail if old_string doesn't match exactly (including whitespace)
- You may change the wrong instance if you don't include enough context
When making edits:
- Ensure the edit results in idiomatic, correct code
- Do not leave the code in a broken state
- Always use fully-qualified project paths (starting with the name of one of the project's root directories)
If you want to create a new file, use the `create_file` tool instead of this tool. Don't pass an empty `old_string`.
Remember: when making multiple file edits in a row to the same file, you should prefer to send all edits in a single message with multiple calls to this tool, rather than multiple messages with a single call each.
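
To make the uniqueness requirement concrete, here is the example carried over from the deleted `find_replace_file_tool` docs (the file contents are invented): to edit only the `if` condition, the `old_string` should still include the neighboring lines verbatim so it matches exactly one place in the file.

```rust
// Illustrative excerpt only; `database` and the helper functions are assumed.
fn check_user_permissions(user_id: &str) -> Result<bool> {
    // Check if user exists first
    let user = database.find_user(user_id)?;

    // This is the part we want to modify
    if user.role == "admin" {
        return Ok(true);
    }

    // Check other permissions
    check_custom_permissions(user_id)
}
```

The corresponding `new_string` would be identical except the condition becomes `user.role == "admin" || user.role == "superuser"`.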

View File

@@ -1,268 +0,0 @@
use crate::{replace::replace_with_flexible_indent, schema::json_schema_for};
use anyhow::{Context as _, Result, anyhow};
use assistant_tool::{ActionLog, Tool, ToolResult};
use gpui::{App, AppContext, AsyncApp, Entity, Task};
use language_model::{LanguageModelRequestMessage, LanguageModelToolSchemaFormat};
use project::Project;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use std::{path::PathBuf, sync::Arc};
use ui::IconName;
use crate::replace::replace_exact;
#[derive(Debug, Serialize, Deserialize, JsonSchema)]
pub struct FindReplaceFileToolInput {
/// The path of the file to modify.
///
/// WARNING: When specifying which file path needs changing, you MUST
/// start each path with one of the project's root directories.
///
/// The following examples assume we have two root directories in the project:
/// - backend
/// - frontend
///
/// <example>
/// `backend/src/main.rs`
///
/// Notice how the file path starts with `backend`, one of the project's root
/// directories. Without that, the path would be ambiguous and the call would fail!
/// </example>
///
/// <example>
/// `frontend/db.js`
/// </example>
pub path: PathBuf,
/// A user-friendly markdown description of what's being replaced. This will be shown in the UI.
///
/// <example>Fix API endpoint URLs</example>
/// <example>Update copyright year in `page_footer`</example>
pub display_description: String,
/// The unique string to find in the file. This string cannot be empty;
/// if the string is empty, the tool call will fail. Remember, do not use this tool
/// to create new files from scratch, or to overwrite existing files! Use a different
/// approach if you want to do that.
///
/// If this string appears more than once in the file, this tool call will fail,
/// so it is absolutely critical that you verify ahead of time that the string
/// is unique. You can search within the file to verify this.
///
/// To make the string more likely to be unique, include a minimum of 3 lines of context
/// before the string you actually want to find, as well as a minimum of 3 lines of
/// context after the string you want to find. (These lines of context should appear
/// in the `replace` string as well.) If 3 lines of context is not enough to obtain
/// a string that appears only once in the file, then double the number of context lines
/// until the string becomes unique. (Start with 3 lines before and 3 lines after
/// though, because too much context is needlessly costly.)
///
/// Do not alter the context lines of code in any way, and make sure to preserve all
/// whitespace and indentation for all lines of code. This string must be exactly as
/// it appears in the file, because this tool will do a literal find/replace, and if
/// even one character in this string is different in any way from how it appears
/// in the file, then the tool call will fail.
///
/// If you get an error that the `find` string was not found, this means that either
/// you made a mistake, or that the file has changed since you last looked at it.
/// Either way, when this happens, you should retry doing this tool call until it
/// succeeds, up to 3 times. Each time you retry, you should take another look at
/// the exact text of the file in question, to make sure that you are searching for
/// exactly the right string. Regardless of whether it was because you made a mistake
/// or because the file changed since you last looked at it, you should be extra
/// careful when retrying in this way. It's a bad experience for the user if
/// this `find` string isn't found, so be super careful to get it exactly right!
///
/// <example>
/// If a file contains this code:
///
/// ```ignore
/// fn check_user_permissions(user_id: &str) -> Result<bool> {
/// // Check if user exists first
/// let user = database.find_user(user_id)?;
///
/// // This is the part we want to modify
/// if user.role == "admin" {
/// return Ok(true);
/// }
///
/// // Check other permissions
/// check_custom_permissions(user_id)
/// }
/// ```
///
/// Your find string should include at least 3 lines of context before and after the part
/// you want to change:
///
/// ```ignore
/// fn check_user_permissions(user_id: &str) -> Result<bool> {
/// // Check if user exists first
/// let user = database.find_user(user_id)?;
///
/// // This is the part we want to modify
/// if user.role == "admin" {
/// return Ok(true);
/// }
///
/// // Check other permissions
/// check_custom_permissions(user_id)
/// }
/// ```
///
/// And your replace string might look like:
///
/// ```ignore
/// fn check_user_permissions(user_id: &str) -> Result<bool> {
/// // Check if user exists first
/// let user = database.find_user(user_id)?;
///
/// // This is the part we want to modify
/// if user.role == "admin" || user.role == "superuser" {
/// return Ok(true);
/// }
///
/// // Check other permissions
/// check_custom_permissions(user_id)
/// }
/// ```
/// </example>
pub find: String,
/// The string to replace the one unique occurrence of the find string with.
pub replace: String,
}
pub struct FindReplaceFileTool;
impl Tool for FindReplaceFileTool {
fn name(&self) -> String {
"find_replace_file".into()
}
fn needs_confirmation(&self, _: &serde_json::Value, _: &App) -> bool {
false
}
fn description(&self) -> String {
include_str!("find_replace_tool/description.md").to_string()
}
fn icon(&self) -> IconName {
IconName::Pencil
}
fn input_schema(&self, format: LanguageModelToolSchemaFormat) -> Result<serde_json::Value> {
json_schema_for::<FindReplaceFileToolInput>(format)
}
fn ui_text(&self, input: &serde_json::Value) -> String {
match serde_json::from_value::<FindReplaceFileToolInput>(input.clone()) {
Ok(input) => input.display_description,
Err(_) => "Edit file".to_string(),
}
}
fn run(
self: Arc<Self>,
input: serde_json::Value,
_messages: &[LanguageModelRequestMessage],
project: Entity<Project>,
action_log: Entity<ActionLog>,
cx: &mut App,
) -> ToolResult {
let input = match serde_json::from_value::<FindReplaceFileToolInput>(input) {
Ok(input) => input,
Err(err) => return Task::ready(Err(anyhow!(err))).into(),
};
cx.spawn(async move |cx: &mut AsyncApp| {
let project_path = project.read_with(cx, |project, cx| {
project
.find_project_path(&input.path, cx)
.context("Path not found in project")
})??;
let buffer = project
.update(cx, |project, cx| project.open_buffer(project_path, cx))?
.await?;
let snapshot = buffer.read_with(cx, |buffer, _cx| buffer.snapshot())?;
if input.find.is_empty() {
return Err(anyhow!("`find` string cannot be empty. Use a different tool if you want to create a file."));
}
if input.find == input.replace {
return Err(anyhow!("The `find` and `replace` strings are identical, so no changes would be made."));
}
let result = cx
.background_spawn(async move {
// Try to match exactly
let diff = replace_exact(&input.find, &input.replace, &snapshot)
.await
// If that fails, try being flexible about indentation
.or_else(|| replace_with_flexible_indent(&input.find, &input.replace, &snapshot))?;
if diff.edits.is_empty() {
return None;
}
let old_text = snapshot.text();
Some((old_text, diff))
})
.await;
let Some((old_text, diff)) = result else {
let err = buffer.read_with(cx, |buffer, _cx| {
let file_exists = buffer
.file()
.map_or(false, |file| file.disk_state().exists());
if !file_exists {
anyhow!("{} does not exist", input.path.display())
} else if buffer.is_empty() {
anyhow!(
"{} is empty, so the provided `find` string wasn't found.",
input.path.display()
)
} else {
anyhow!("Failed to match the provided `find` string")
}
})?;
return Err(err)
};
let snapshot = cx.update(|cx| {
action_log.update(cx, |log, cx| {
log.buffer_read(buffer.clone(), cx)
});
let snapshot = buffer.update(cx, |buffer, cx| {
buffer.finalize_last_transaction();
buffer.apply_diff(diff, cx);
buffer.finalize_last_transaction();
buffer.snapshot()
});
action_log.update(cx, |log, cx| {
log.buffer_edited(buffer.clone(), cx)
});
snapshot
})?;
project.update( cx, |project, cx| {
project.save_buffer(buffer, cx)
})?.await?;
let diff_str = cx.background_spawn(async move {
let new_text = snapshot.text();
language::unified_diff(&old_text, &new_text)
}).await;
Ok(format!("Edited {}:\n\n```diff\n{}\n```", input.path.display(), diff_str))
}).into()
}
}

View File

@@ -12,7 +12,7 @@ use util::markdown::MarkdownString;
#[derive(Debug, Serialize, Deserialize, JsonSchema)]
pub struct ListDirectoryToolInput {
/// The relative path of the directory to list.
/// The fully-qualified path of the directory to list in the project.
///
/// This path should never be absolute, and the first component
/// of the path should always be a root directory in a project.

View File

@@ -1 +1 @@
Lists files and directories in a given path.
Lists files and directories in a given path. Prefer the `regex_search` or `path_search` tools when searching the codebase.

View File

@@ -6,14 +6,14 @@ use language_model::{LanguageModelRequestMessage, LanguageModelToolSchemaFormat}
use project::Project;
use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
use std::{path::PathBuf, sync::Arc};
use std::{cmp, fmt::Write as _, path::PathBuf, sync::Arc};
use ui::IconName;
use util::paths::PathMatcher;
use worktree::Snapshot;
#[derive(Debug, Serialize, Deserialize, JsonSchema)]
pub struct PathSearchToolInput {
/// The glob to search all project paths for.
/// The glob to match against every path in the project.
///
/// <example>
/// If the project has the following root directories:
@@ -76,66 +76,114 @@ impl Tool for PathSearchTool {
Ok(input) => (input.offset, input.glob),
Err(err) => return Task::ready(Err(anyhow!(err))).into(),
};
let path_matcher = match PathMatcher::new([
// Sometimes models try to search for "". In this case, return all paths in the project.
if glob.is_empty() { "*" } else { &glob },
]) {
Ok(matcher) => matcher,
Err(err) => return Task::ready(Err(anyhow!("Invalid glob: {err}"))).into(),
};
let snapshots: Vec<Snapshot> = project
.read(cx)
.worktrees(cx)
.map(|worktree| worktree.read(cx).snapshot())
.collect();
let offset = offset as usize;
let task = search_paths(&glob, project, cx);
cx.background_spawn(async move {
let mut matches = Vec::new();
for worktree in snapshots {
let root_name = worktree.root_name();
// Don't consider ignored entries.
for entry in worktree.entries(false, 0) {
if path_matcher.is_match(&entry.path) {
matches.push(
PathBuf::from(root_name)
.join(&entry.path)
.to_string_lossy()
.to_string(),
);
}
}
}
if matches.is_empty() {
Ok(format!("No paths in the project matched the glob {glob:?}"))
} else {
// Sort to group entries in the same directory together.
matches.sort();
let total_matches = matches.len();
let response = if total_matches > RESULTS_PER_PAGE + offset as usize {
let paginated_matches: Vec<_> = matches
.into_iter()
.skip(offset as usize)
.take(RESULTS_PER_PAGE)
.collect();
format!(
"Found {} total matches. Showing results {}-{} (provide 'offset' parameter for more results):\n\n{}",
total_matches,
offset + 1,
offset as usize + paginated_matches.len(),
paginated_matches.join("\n")
)
} else {
matches.join("\n")
};
Ok(response)
let matches = task.await?;
let paginated_matches = &matches[cmp::min(offset, matches.len())..cmp::min(offset + RESULTS_PER_PAGE, matches.len())];
let mut message = format!(
"Found {} total matches. Showing results {}-{} (provide 'offset' parameter for more results):\n",
matches.len(),
offset + 1,
offset as usize + paginated_matches.len(),
);
for mat in matches.into_iter().skip(offset).take(RESULTS_PER_PAGE) {
write!(&mut message, "\n{}", mat.display()).unwrap();
}
Ok(message)
}).into()
}
}
fn search_paths(glob: &str, project: Entity<Project>, cx: &mut App) -> Task<Result<Vec<PathBuf>>> {
let path_matcher = match PathMatcher::new([
// Sometimes models try to search for "". In this case, return all paths in the project.
if glob.is_empty() { "*" } else { glob },
]) {
Ok(matcher) => matcher,
Err(err) => return Task::ready(Err(anyhow!("Invalid glob: {err}"))).into(),
};
let snapshots: Vec<Snapshot> = project
.read(cx)
.worktrees(cx)
.map(|worktree| worktree.read(cx).snapshot())
.collect();
cx.background_spawn(async move {
Ok(snapshots
.iter()
.flat_map(|snapshot| {
let root_name = PathBuf::from(snapshot.root_name());
snapshot
.entries(false, 0)
.map(move |entry| root_name.join(&entry.path))
.filter(|path| path_matcher.is_match(&path))
})
.collect())
})
}
#[cfg(test)]
mod test {
use super::*;
use gpui::TestAppContext;
use project::{FakeFs, Project};
use settings::SettingsStore;
use util::path;
#[gpui::test]
async fn test_path_search_tool(cx: &mut TestAppContext) {
init_test(cx);
let fs = FakeFs::new(cx.executor());
fs.insert_tree(
"/root",
serde_json::json!({
"apple": {
"banana": {
"carrot": "1",
},
"bandana": {
"carbonara": "2",
},
"endive": "3"
}
}),
)
.await;
let project = Project::test(fs.clone(), [path!("/root").as_ref()], cx).await;
let matches = cx
.update(|cx| search_paths("root/**/car*", project.clone(), cx))
.await
.unwrap();
assert_eq!(
matches,
&[
PathBuf::from("root/apple/banana/carrot"),
PathBuf::from("root/apple/bandana/carbonara")
]
);
let matches = cx
.update(|cx| search_paths("**/car*", project.clone(), cx))
.await
.unwrap();
assert_eq!(
matches,
&[
PathBuf::from("root/apple/banana/carrot"),
PathBuf::from("root/apple/bandana/carbonara")
]
);
}
fn init_test(cx: &mut TestAppContext) {
cx.update(|cx| {
let settings_store = SettingsStore::test(cx);
cx.set_global(settings_store);
language::init(cx);
Project::init_settings(cx);
});
}
}
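
A small, self-contained sketch of the pagination slice used in `run` above, assuming the 50-results-per-page figure from the tool description and using fabricated match data:

```rust
use std::cmp;

const RESULTS_PER_PAGE: usize = 50;

// Clamp both ends of the page so an offset past the end yields an empty slice
// instead of panicking, mirroring the cmp::min calls in the tool's run method.
fn page(matches: &[String], offset: usize) -> &[String] {
    &matches[cmp::min(offset, matches.len())..cmp::min(offset + RESULTS_PER_PAGE, matches.len())]
}

fn main() {
    let matches: Vec<String> = (0..120).map(|i| format!("root/file_{i}.rs")).collect();
    assert_eq!(page(&matches, 0).len(), 50);   // first page
    assert_eq!(page(&matches, 100).len(), 20); // last, partial page
    assert_eq!(page(&matches, 200).len(), 0);  // offset past the end is clamped
}
```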

View File

@@ -1,3 +1,7 @@
Returns paths in the project which match the given glob.
Fast file pattern matching tool that works with any codebase size.
Results are paginated with 50 matches per page. Use the optional 'offset' parameter to request subsequent pages.
- Supports glob patterns like "**/*.js" or "src/**/*.ts"
- Returns matching file paths sorted alphabetically
- Prefer the `regex_search` tool to this tool when searching for symbols unless you have specific information about paths.
- Use this tool when you need to find files by name patterns
- Results are paginated with 50 matches per page. Use the optional 'offset' parameter to request subsequent pages.

View File

@@ -1,7 +1,6 @@
Searches the entire project for the given regular expression.
Returns a list of paths that matched the query. For each path, it returns some excerpts of the matched text.
Results are paginated with 20 matches per page. Use the optional 'offset' parameter to request subsequent pages.
This tool is not aware of semantics and does not use any information from language servers, so it should only be used when no available semantic tool (e.g. one that uses language servers) fits the use case.
- Prefer this tool when searching for files containing symbols in the project.
- Supports full regex syntax (eg. "log.*Error", "function\\s+\\w+", etc.)
- Use this tool when you need to find files containing specific patterns
- Results are paginated with 20 matches per page. Use the optional 'offset' parameter to request subsequent pages.

View File

@@ -27,7 +27,7 @@ language.workspace = true
language_extension.workspace = true
language_model.workspace = true
language_models.workspace = true
languages.workspace = true
languages = { workspace = true, features = ["load-grammars"] }
node_runtime.workspace = true
paths.workspace = true
project.workspace = true
@@ -35,6 +35,7 @@ prompt_store.workspace = true
release_channel.workspace = true
reqwest_client.workspace = true
serde.workspace = true
serde_json.workspace = true
settings.workspace = true
shellexpand.workspace = true
toml.workspace = true

View File

@@ -1,3 +1,3 @@
Look at the `find_replace_file_tool.rs`. I want to implement a card for it. The card should be a brand new `Entity` with a `Render` implementation.
Look at the `find_replace_file_tool.rs`. I want to implement a card for it. The card should implement the `Render` trait.
The card should show a diff. It should be a beautifully presented diff. The card "box" should look like what we show for markdown codeblocks (look at `MarkdownElement`). I want to see a red background for lines that were deleted and a green background for lines that were added. We should have a div per diff line.

View File

@@ -154,7 +154,7 @@ fn main() {
println!(
"{}Logging to: {}",
example.log_prefix,
example.output_file_path.display()
example.run_directory_path.display()
);
let repo_url = example.base.url.clone();
@@ -223,13 +223,17 @@ fn main() {
println!("");
let mut judge_scores = Vec::new();
let mut errors = 0;
let mut successes = 0;
for (result, example) in results {
match result {
Err(err) => {
errors += 1;
println!("💥 {}{:?}", example.log_prefix, err);
}
Ok(judge_output) => {
successes += 1;
const SCORES: [&str; 6] = ["💀", "😭", "😔", "😐", "🙂", "🤩"];
println!(
@@ -244,7 +248,7 @@ fn main() {
println!(
"{} > {}",
" ".repeat(max_name_width),
example.output_file_path.display()
example.run_directory_path.display()
);
}
@@ -254,7 +258,12 @@ fn main() {
.map(|score| score as f32)
.sum::<f32>()
/ (score_count as f32);
println!("\nAverage score: {average_score}");
if errors > 0 {
println!("\n{errors} example(s) errored out. Average score among the {successes} example(s) that didn't error: {average_score}");
} else {
println!("\nAll {successes} examples ran successfully. Average score: {average_score}");
}
cx.update(|cx| cx.quit())
})

View File

@@ -8,11 +8,12 @@ use futures::channel::mpsc;
use futures::{FutureExt, StreamExt as _, select_biased};
use gpui::{App, AppContext as _, AsyncApp, Entity, Task};
use handlebars::Handlebars;
use language::{DiagnosticSeverity, OffsetRangeExt};
use language::{Buffer, DiagnosticSeverity, OffsetRangeExt};
use language_model::{
LanguageModel, LanguageModelRequest, LanguageModelRequestMessage, MessageContent, Role,
StopReason, TokenUsage,
LanguageModel, LanguageModelCompletionEvent, LanguageModelRequest, LanguageModelRequestMessage,
MessageContent, Role, StopReason, TokenUsage,
};
use project::lsp_store::LanguageServerState;
use project::{LspStore, Project, ProjectPath};
use serde::{Deserialize, Serialize};
use std::fmt::Write as _;
@@ -47,6 +48,17 @@ pub struct ExampleBase {
pub require_lsp: bool,
}
impl ExampleBase {
pub fn repo_name(&self) -> String {
self.url
.split('/')
.last()
.unwrap_or(&"")
.trim_end_matches(".git")
.into()
}
}
#[derive(Clone, Debug)]
pub struct Example {
pub name: String,
@@ -56,10 +68,8 @@ pub struct Example {
pub prompt: String,
/// Content of `criteria.md`
pub criteria: String,
/// Markdown output file to append to
pub output_file: Option<Arc<Mutex<File>>>,
/// Path to markdown output file
pub output_file_path: PathBuf,
/// Path to the directory containing the requests and responses for the agentic loop
pub run_directory_path: PathBuf,
/// Prefix used for logging that identifies this example
pub log_prefix: String,
}
@@ -93,18 +103,12 @@ impl Example {
let prompt_path = dir_path.join("prompt.md");
let criteria_path = dir_path.join("criteria.md");
let output_file_path = run_dir.join(format!(
"{}.md",
dir_path.file_name().unwrap().to_str().unwrap()
));
Ok(Example {
name: name.clone(),
base: toml::from_str(&fs::read_to_string(&base_path)?)?,
prompt: fs::read_to_string(prompt_path.clone())?,
criteria: fs::read_to_string(criteria_path.clone())?,
output_file: None,
output_file_path,
run_directory_path: run_dir.to_path_buf(),
log_prefix: name,
})
}
@@ -128,6 +132,7 @@ impl Example {
.context(format!("No such directory {WORKTREES_DIR}"))
.unwrap()
.join(&self.name)
.join(self.base.repo_name())
}
/// Set up the example by checking out the specified Git revision
@@ -170,20 +175,9 @@ impl Example {
.await?;
}
// Create the output file
let output_file = Arc::new(Mutex::new(File::create(&self.output_file_path)?));
self.output_file = Some(output_file);
Ok(())
}
/// Returns the output file, panicking if it's not set
fn output_file(&self) -> Arc<Mutex<File>> {
self.output_file
.clone()
.expect("Output file not created. Call setup() first.")
}
pub fn run(
&self,
model: Arc<dyn LanguageModel>,
@@ -262,28 +256,26 @@ impl Example {
cx.background_executor().timer(Duration::new(5, 0)).await;
wait_for_lang_server(&lsp_store, this.log_prefix.clone(), cx).await?;
lsp_store.update(cx, |lsp_store, cx| {
lsp_open_handle.update(cx, |buffer, cx| {
buffer.update(cx, |buffer, cx| {
let has_language_server = lsp_store
.language_servers_for_local_buffer(buffer, cx)
.next()
.is_some();
if has_language_server {
Ok(())
} else {
Err(anyhow!(
"`{:?}` was opened to cause the language server to start, \
but no language servers are registered for its buffer. \
Set `require_lsp = false` in `base.toml` to skip this.",
language_file
))
}
})
})
})??;
// Retry up to 10 times, with a delay in between, for the language server to
// transition from the Starting to Running state.
const LS_START_ATTEMPTS: usize = 10;
const DELAY_BETWEEN_ATTEMPTS: Duration = Duration::new(1, 0);
let mut answer = None;
Some((lsp_open_handle, lsp_store))
for _ in 0..LS_START_ATTEMPTS {
if any_running(&language_file, lsp_store.clone(), lsp_open_handle.clone(), cx).await? {
answer = Some((lsp_open_handle, lsp_store));
break;
}
cx.background_executor().timer(DELAY_BETWEEN_ATTEMPTS).await;
}
if answer.is_none() {
return Err(anyhow!("Timed out waiting for language server to transition from Starting to Running state."));
}
answer
} else {
None
};
@@ -296,14 +288,18 @@ impl Example {
let thread =
thread_store.update(cx, |thread_store, cx| thread_store.create_thread(cx))?;
{
let output_file_ref = this.output_file();
let mut output_file = output_file_ref.lock().unwrap();
writeln!(&mut output_file, "👤 USER:").log_err();
writeln!(&mut output_file, "{}", this.prompt).log_err();
writeln!(&mut output_file, "🤖 ASSISTANT:").log_err();
output_file.flush().log_err();
}
thread.update(cx, |thread, _cx| {
let mut request_count = 0;
let run_dir_path = this.run_directory_path.clone();
thread.set_request_callback(move |request, response_events| {
request_count += 1;
let tools_file_path = run_dir_path.join(format!("{request_count}.tools.md"));
let messages_file_path = run_dir_path.join(format!("{request_count}.messages.md"));
let markdown = RequestMarkdown::new(request, response_events);
fs::write(tools_file_path, markdown.tools).expect("failed to write tools file");
fs::write(messages_file_path, markdown.messages).expect("failed to write messages file");
});
})?;
let tool_use_counts: Arc<Mutex<HashMap<Arc<str>, u32>>> =
Mutex::new(HashMap::default()).into();
@@ -316,7 +312,6 @@ impl Example {
let event_handler_task = cx.spawn({
// Need to clone the Arc here because the reference from output_file() won't live long enough
let output_file = this.output_file.clone().unwrap();
let log_prefix = this.log_prefix.clone();
let tool_use_counts = tool_use_counts.clone();
let thread = thread.downgrade();
@@ -332,8 +327,6 @@ impl Example {
return Err(anyhow!("ThreadEvent channel ended early"));
};
let mut output_file = output_file.lock().unwrap();
match event {
ThreadEvent::Stopped(reason) => match reason {
Ok(StopReason::EndTurn) => {
@@ -354,18 +347,7 @@ impl Example {
ThreadEvent::ShowError(thread_error) => {
break Err(anyhow!(thread_error.clone()));
}
ThreadEvent::StreamedAssistantText(_, chunk) => {
write!(&mut output_file, "{}", chunk).log_err();
}
ThreadEvent::StreamedAssistantThinking(_, chunk) => {
write!(&mut output_file, "{}", chunk).log_err();
}
ThreadEvent::UsePendingTools { tool_uses } => {
writeln!(&mut output_file, "\n\nUSING TOOLS:").log_err();
for tool_use in tool_uses {
writeln!(&mut output_file, "{}: {}", tool_use.name, tool_use.input)
.log_err();
}
ThreadEvent::StreamedAssistantText(_, _)| ThreadEvent::StreamedAssistantThinking(_, _) | ThreadEvent::UsePendingTools { .. } => {
}
ThreadEvent::ToolFinished {
tool_use_id,
@@ -375,11 +357,9 @@ impl Example {
if let Some(tool_use) = pending_tool_use {
let message = format!("TOOL FINISHED: {}", tool_use.name);
println!("{}{message}", log_prefix);
writeln!(&mut output_file, "\n{}", message).log_err();
}
thread.update(cx, |thread, _cx| {
if let Some(tool_result) = thread.tool_result(&tool_use_id) {
writeln!(&mut output_file, "\n{}\n", tool_result.content).log_err();
let mut tool_use_counts = tool_use_counts.lock().unwrap();
*tool_use_counts
.entry(tool_result.tool_name.clone())
@@ -402,8 +382,6 @@ impl Example {
}
}
}
output_file.flush().log_err();
}
}
});
@@ -458,6 +436,16 @@ impl Example {
repository_diff: String,
cx: &AsyncApp,
) -> Result<JudgeOutput> {
let mut output_file = File::create(self.run_directory_path.join("judge.md"))
.expect("failed to create judge.md");
{
writeln!(&mut output_file, "\n\n").log_err();
writeln!(&mut output_file, "========================================").log_err();
writeln!(&mut output_file, " REPOSITORY DIFF ").log_err();
writeln!(&mut output_file, "========================================").log_err();
writeln!(&mut output_file, "\n{}", &repository_diff).log_err();
}
let judge_prompt = include_str!("judge_prompt.hbs");
let judge_prompt_name = "judge_prompt";
let mut handlebars = Handlebars::new();
@@ -483,9 +471,6 @@ impl Example {
let response = send_language_model_request(model, request, cx).await?;
let output_file_ref = self.output_file();
let mut output_file = output_file_ref.lock().unwrap();
writeln!(&mut output_file, "\n\n").log_err();
writeln!(&mut output_file, "========================================").log_err();
writeln!(&mut output_file, " JUDGE OUTPUT ").log_err();
@@ -566,6 +551,55 @@ fn has_pending_lang_server_work(lsp_store: &Entity<LspStore>, cx: &App) -> bool
.any(|(_, status)| !status.pending_work.is_empty())
}
async fn any_running(
language_file: &ProjectPath,
lsp_store: Entity<LspStore>,
lsp_open_handle: Entity<Entity<Buffer>>,
cx: &mut AsyncApp,
) -> Result<bool> {
lsp_store.update(cx, |lsp_store, cx| {
lsp_open_handle.update(cx, |buffer, cx| {
buffer.update(cx, |buffer, cx| {
match lsp_store.language_server_state_for_local_buffer(buffer, cx) {
Some(states) => {
let mut any_starting = false;
for state in states {
match state {
LanguageServerState::Starting { .. } => {
// A server in the "starting" state means we should keep waiting for
// it to advance to the "running" state.
any_starting = true;
},
LanguageServerState::Running { .. } => {
// We found one that's running, so we're done.
return Ok(true);
}
}
}
if any_starting {
Ok(false)
} else {
Err(anyhow!(
"`{language_file:?}` was opened to cause the language server to start, \
but no language servers are registered for its buffer. \
Set `require_lsp = false` in `base.toml` to skip using a language server for this file.",
))
}
}
None => {
Err(anyhow!(
"`{language_file:?}` was opened locally to cause the language server to start, \
but the language server's mode was not set to LspStoreMode::Local."
))
}
}
})
})
})?
}
async fn query_lsp_diagnostics(project: Entity<Project>, cx: &mut AsyncApp) -> Result<String> {
let paths_with_diagnostics = project.update(cx, |project, cx| {
project
@@ -691,6 +725,129 @@ pub async fn send_language_model_request(
}
}
struct RequestMarkdown {
tools: String,
messages: String,
}
impl RequestMarkdown {
fn new(
request: &LanguageModelRequest,
response_events: &[Result<LanguageModelCompletionEvent, String>],
) -> Self {
let mut tools = String::new();
let mut messages = String::new();
// Print the tools
if !request.tools.is_empty() {
for tool in &request.tools {
write!(&mut tools, "# {}\n\n", tool.name).unwrap();
write!(&mut tools, "{}\n\n", tool.description).unwrap();
write!(
&mut tools,
"```json\n{}\n```\n\n",
serde_json::to_string_pretty(&tool.input_schema).unwrap_or_default()
)
.unwrap();
}
}
// Print the messages
for message in &request.messages {
let role_str = match message.role {
Role::User => "👤 USER",
Role::Assistant => "🤖 ASSISTANT",
Role::System => "⚙️ SYSTEM",
};
messages.push_str(&format!("# {}\n\n", role_str));
for content in &message.content {
match content {
MessageContent::Text(text) => {
messages.push_str(text);
messages.push_str("\n\n");
}
MessageContent::Image(_) => {
messages.push_str("[IMAGE DATA]\n\n");
}
MessageContent::ToolUse(tool_use) => {
messages.push_str(&format!(
"**Tool Use**: {} (ID: {})\n",
tool_use.name, tool_use.id
));
messages.push_str(&format!("```json\n{}\n```\n\n", tool_use.input));
}
MessageContent::ToolResult(tool_result) => {
messages.push_str(&format!(
"**Tool Result**: {} (ID: {})\n",
tool_result.tool_name, tool_result.tool_use_id
));
if tool_result.is_error {
messages.push_str("**ERROR:**\n");
}
messages.push_str(&format!("```\n{}\n```\n\n", tool_result.content));
}
}
}
}
// Print the response events if any
if !response_events.is_empty() {
messages.push_str("# Response\n\n");
let mut text_buffer = String::new();
let mut thinking_buffer = String::new();
let flush_buffers =
|output: &mut String, text_buffer: &mut String, thinking_buffer: &mut String| {
if !text_buffer.is_empty() {
output.push_str(&format!("**Text**:\n{}\n\n", text_buffer));
text_buffer.clear();
}
if !thinking_buffer.is_empty() {
output.push_str(&format!("**Thinking**:\n{}\n\n", thinking_buffer));
thinking_buffer.clear();
}
};
for event in response_events {
match event {
Ok(LanguageModelCompletionEvent::Text(text)) => {
text_buffer.push_str(text);
}
Ok(LanguageModelCompletionEvent::Thinking(text)) => {
thinking_buffer.push_str(text);
}
Ok(LanguageModelCompletionEvent::Stop(reason)) => {
flush_buffers(&mut messages, &mut text_buffer, &mut thinking_buffer);
messages.push_str(&format!("**Stop**: {:?}\n\n", reason));
}
Ok(LanguageModelCompletionEvent::ToolUse(tool_use)) => {
flush_buffers(&mut messages, &mut text_buffer, &mut thinking_buffer);
messages.push_str(&format!(
"**Tool Use**: {} (ID: {})\n",
tool_use.name, tool_use.id
));
messages.push_str(&format!("```json\n{}\n```\n\n", tool_use.input));
}
Ok(
LanguageModelCompletionEvent::UsageUpdate(_)
| LanguageModelCompletionEvent::StartMessage { .. },
) => {}
Err(error) => {
flush_buffers(&mut messages, &mut text_buffer, &mut thinking_buffer);
messages.push_str(&format!("**Error**: {}\n\n", error));
}
}
}
flush_buffers(&mut messages, &mut text_buffer, &mut thinking_buffer);
}
Self { tools, messages }
}
}
#[cfg(test)]
mod test {
use super::*;

View File

@@ -6233,6 +6233,21 @@ impl LspStore {
})
}
pub fn language_server_state_for_local_buffer<'a>(
&'a self,
buffer: &Buffer,
cx: &mut App,
) -> Option<impl Iterator<Item = &'a LanguageServerState>> {
let local = self.as_local()?;
Some(
local
.language_server_ids_for_buffer(buffer, cx)
.into_iter()
.filter_map(move |server_id| local.language_servers.get(&server_id)),
)
}
pub fn language_servers_for_local_buffer<'a>(
&'a self,
buffer: &Buffer,