mixa/zed

Oleksiy Syvokon 6420df3975 eval: Count execution errors as failures (#30712)

- Evals returning an error (e.g., LLM API format mismatch) were silently
skipped in the aggregated results. Now we count them as a failure (0%
success score).

- Setting the `VERBOSE` environment variable to a non-empty value
disables string truncation.
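
The two changes above can be sketched roughly as follows. This is a minimal illustration, not the eval crate's actual code: `RunOutcome`, `success_rate`, `verbose_enabled`, and `maybe_truncate` are hypothetical names, and the `...` suffix on truncated strings is an assumption.

```rust
/// Outcome of one eval run: a score in 0.0..=1.0, or an execution error.
/// Hypothetical type alias, not taken from the eval crate.
type RunOutcome = Result<f64, String>;

/// Mean score over all runs. An errored run contributes 0.0 and still
/// counts toward the denominator, so it is no longer silently skipped.
fn success_rate(outcomes: &[RunOutcome]) -> f64 {
    if outcomes.is_empty() {
        return 0.0;
    }
    let total: f64 = outcomes
        .iter()
        .map(|outcome| match outcome {
            Ok(score) => *score,
            Err(_) => 0.0, // execution error counts as a failure
        })
        .sum();
    total / outcomes.len() as f64
}

/// Returns true when `VERBOSE` is set to a non-empty value.
fn verbose_enabled() -> bool {
    std::env::var("VERBOSE")
        .map(|value| !value.is_empty())
        .unwrap_or(false)
}

/// Shortens `text` to `max_chars` characters unless `verbose` is true.
fn maybe_truncate(text: &str, max_chars: usize, verbose: bool) -> String {
    if verbose || text.chars().count() <= max_chars {
        text.to_string()
    } else {
        let mut shortened: String = text.chars().take(max_chars).collect();
        shortened.push_str("...");
        shortened
    }
}
```

With this shape, one errored run among four pulls the aggregate down instead of vanishing from it, and calling `maybe_truncate(text, limit, verbose_enabled())` leaves output intact whenever `VERBOSE` is set.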

Release Notes:

- N/A

2025-05-14 20:44:19 +03:00

Name                  Last commit                                                    Date
docs                  eval: Add HTML overview for evaluation runs (#29413)           2025-04-25 17:49:05 +03:00
src                   eval: Count execution errors as failures (#30712)              2025-05-14 20:44:19 +03:00
.gitignore            Add judge to new eval + provide LSP diagnostics (#28713)       2025-04-14 20:18:47 +00:00
Cargo.toml            context_store: Refactor state management (#29910)              2025-05-05 21:36:12 +02:00
LICENSE-GPL           Lay the groundwork for a Rust-based eval (#28488)              2025-04-10 04:45:27 +00:00
README.md             eval: Add support for reading from a .env file (#29426)        2025-04-25 15:53:02 +00:00
runner_settings.json  Introduce a new StreamingEditFileTool (#29733)                 2025-05-01 17:37:43 +02:00

README.md

Eval

This eval assumes the working directory is the root of the repository. Run it with:

cargo run -p eval

The eval optionally reads a .env file in crates/eval, which you can use to set environment variables such as API keys.
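
To illustrate the file format, here is a minimal sketch of how `KEY=VALUE` lines in a .env file could be parsed. It is not the eval's actual loader, which may handle more syntax (such as `export` prefixes or escapes); `parse_env` and the variable name `EXAMPLE_API_KEY` are hypothetical.

```rust
use std::collections::HashMap;

/// Parses `KEY=VALUE` lines, skipping blank lines and `#` comments.
/// Surrounding double quotes around a value are removed.
/// Hypothetical helper for illustration only.
fn parse_env(contents: &str) -> HashMap<String, String> {
    let mut vars = HashMap::new();
    for line in contents.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Split on the first '=' only, so values may contain '='.
        if let Some((key, value)) = line.split_once('=') {
            let value = value.trim().trim_matches('"');
            vars.insert(key.trim().to_string(), value.to_string());
        }
    }
    vars
}
```

A file containing the single line `EXAMPLE_API_KEY=abc123` would yield one entry mapping `EXAMPLE_API_KEY` to `abc123`.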

Explorer Tool

The explorer tool generates a self-contained HTML view from one or more thread JSON files. It provides a visual interface for exploring the agent thread, including tool calls and their results. See ./docs/explorer.md for more details.

Usage

cargo run -p eval --bin explorer -- --input <path-to-json-files> --output <output-html-path>

Example:

cargo run -p eval --bin explorer -- --input ./runs/2025-04-23_15-53-30/fastmcp_bugifx/*/last.messages.json --output /tmp/explorer.html