Oleksiy Syvokon 5e5a124ae1 evals: Eval for creating an empty file (#31034)
This eval checks that Edit Agent can create an empty file without
writing its thoughts into it. This issue is not specific to empty files,
but it's easier to reproduce with them.

For some mysterious reason, I could easily reproduce this issue roughly
90% of the time in actual Zed. However, once I extract the exact LLM
request before the failure point and generate from that, the
reproduction rate drops to 2%!

Things I've tried to make sure it's not a fluke: disabling prompt
caching, capturing the LLM request via a proxy server, running the
prompt on Claude separately from evals. Every time it was mostly giving
good outcomes, which doesn't match my actual experience in Zed.

At some point I discovered that simply adding one insignificant space or
a newline to the prompt suddenly results in an outcome I tried to
reproduce almost perfectly.

This weirdness happens even outside the Zed code base and even when
using a different subscription. The result is the same: an extra newline
or space changes the model behavior significantly enough, so that the
pass rate drops from 99% to 0-3%

I have no explanation to this.


Release Notes:

- N/A
2025-05-20 20:03:08 +03:00
2025-05-17 12:34:42 +00:00
2025-05-17 06:31:56 +00:00
2025-04-24 21:29:33 +00:00
WIP
2023-12-14 09:25:14 -07:00
2025-03-10 01:06:11 -07:00
2024-09-30 17:46:21 -04:00
2025-02-04 09:02:59 -05:00
2025-02-04 09:02:59 -05:00
2024-08-08 17:21:38 -07:00
2025-03-10 01:06:11 -07:00

Zed

CI

Welcome to Zed, a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.


Installation

On macOS and Linux you can download Zed directly or install Zed via your local package manager.

Other platforms are not yet available:

Developing Zed

Contributing

See CONTRIBUTING.md for ways you can contribute to Zed.

Also... we're hiring! Check out our jobs page for open roles.

Licensing

License information for third party dependencies must be correctly provided for CI to pass.

We use cargo-about to automatically comply with open source licenses. If CI is failing, check the following:

  • Is it showing a no license specified error for a crate you've created? If so, add publish = false under [package] in your crate's Cargo.toml.
  • Is the error failed to satisfy license requirements for a dependency? If so, first determine what license the project has and whether this system is sufficient to comply with this license's requirements. If you're unsure, ask a lawyer. Once you've verified that this system is acceptable add the license's SPDX identifier to the accepted array in script/licenses/zed-licenses.toml.
  • Is cargo-about unable to find the license for a dependency? If so, add a clarification field at the end of script/licenses/zed-licenses.toml, as specified in the cargo-about book.
Description
Code at the speed of thought – Zed is a high-performance, multiplayer code editor from the creators of Atom and Tree-sitter.
Readme 586 MiB
Languages
Rust 94.7%
JSON-with-Comments 3.1%
Inno Setup 0.6%
Scheme 0.5%
Shell 0.3%
Other 0.4%