Commit Graph

44 Commits

Author SHA1 Message Date
Agus Zubiaga
f2ad0d716f zeta cli: Print log paths when running predict (#42396)
Release Notes:

- N/A

Co-authored-by: Michael Sloan <mgsloan@gmail.com>
Co-authored-by: Ben Kunkle <ben@zed.dev>
2025-11-11 09:56:20 -03:00
Max Brunsfeld
b607077c08 Add old_text/new_text as a zeta2 prompt format (#42171)
Release Notes:

- N/A

---------

Co-authored-by: Agus Zubiaga <agus@zed.dev>
Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>
Co-authored-by: Ben Kunkle <ben@zed.dev>
Co-authored-by: Michael Sloan <mgsloan@gmail.com>
2025-11-10 15:44:54 -07:00
Agus Zubiaga
c748b177c4 zeta2 cli: Cache at LLM request level (#42371)
We'll now cache LLM responses at the request level (by hash of
URL+contents) for both context and prediction. This way we don't need to
worry about mistakenly using the cache when we change the prompt or its
components.

Release Notes:

- N/A

---------

Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>
2025-11-10 14:23:52 -03:00
Agus Zubiaga
d420dd63ed zeta: Improve unified diff prompt (#42354)
Extract some of the improvements from to the unified diff prompt from
https://github.com/zed-industries/zed/pull/42171 and adds some other
about how context work to improve the reliability of predictions.

We also now strip the `<|user_cursor|>` marker if it appears in the
output rather than failing.

Release Notes:

- N/A

---------

Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
2025-11-10 14:58:42 +00:00
Agus Zubiaga
c241eadbc3 zeta2: Targeted retrieval search (#42240)
Since we removed the filtering step during context gathering, we want
the model to perform more targeted searches. This PR tweaks search tool
schema allowing the model to search within syntax nodes such as `impl`
blocks or methods.

This is what the query schema looks like now:

```rust
/// Search for relevant code by path, syntax hierarchy, and content.
#[derive(Debug, Clone, Serialize, Deserialize, JsonSchema)]
pub struct SearchToolQuery {
    /// 1. A glob pattern to match file paths in the codebase to search in.
    pub glob: String,
    /// 2. Regular expressions to match syntax nodes **by their first line** and hierarchy.
    ///
    /// Subsequent regexes match nodes within the full content of the nodes matched by the previous regexes.
    ///
    /// Example: Searching for a `User` class
    ///     ["class\s+User"]
    ///
    /// Example: Searching for a `get_full_name` method under a `User` class
    ///     ["class\s+User", "def\sget_full_name"]
    ///
    /// Skip this field to match on content alone.
    #[schemars(length(max = 3))]
    #[serde(default)]
    pub syntax_node: Vec<String>,
    /// 3. An optional regular expression to match the final content that should appear in the results.
    ///
    /// - Content will be matched within all lines of the matched syntax nodes.
    /// - If syntax node regexes are provided, this field can be skipped to include as much of the node itself as possible.
    /// - If no syntax node regexes are provided, the content will be matched within the entire file.
    pub content: Option<String>,
}
```

We'll need to keep refining this, but the core implementation is ready.

Release Notes:

- N/A

---------

Co-authored-by: Ben <ben@zed.dev>
Co-authored-by: Max <max@zed.dev>
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
2025-11-08 01:06:12 +00:00
Mikayla Maki
5f8226457e Automate settings registration (#42238)
Release Notes:

- N/A

---------

Co-authored-by: Nia <nia@zed.dev>
2025-11-07 22:27:14 +00:00
Max Brunsfeld
f89bb2f0d2 Small zeta cli fixes (#42170)
* Fix a panic that happened because we lost the
`ContextRetrievalStarted` debug message, so we didn't assign `t0`.
* Write the edit prediction response log file as a markdown file
containing the text, not a JSON file. We mostly always want the text
content.

Release Notes:

- N/A
2025-11-07 07:34:05 +00:00
Max Brunsfeld
5044e6ac1d zeta2: Make eval example file format more expressive (#42156)
* Allow expressing alternative possible context fetches in `Expected
Context` section
* Allow marking a subset of lines as "required" in `Expected Context`.

We still need to improve how we display the results. I've removed the
context pass/fail pretty printing for now, because it would need to be
rethought to work with the new structure, but for now I think we should
focus on getting basic predictions to run. But this is progress toward a
better structure for eval examples.

Release Notes:

- N/A

---------

Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>
Co-authored-by: Ben Kunkle <ben@zed.dev>
Co-authored-by: Agus Zubiaga <agus@zed.dev>
2025-11-06 18:05:18 -08:00
Max Brunsfeld
784fdcaee3 zeta2: Build edit prediction prompt and process model output in client (#41870)
Release Notes:

- N/A

---------

Co-authored-by: Agus Zubiaga <agus@zed.dev>
Co-authored-by: Ben Kunkle <ben@zed.dev>
Co-authored-by: Piotr Osiewicz <24362066+osiewicz@users.noreply.github.com>
2025-11-06 18:36:58 -05:00
Oleksiy Syvokon
91d631c229 Evaluate zeta2 context retrieval and edit predictions (#41921)
This PR implements the `zeta-cli eval` command. It will:

- Run the edit prediction model if there are no cached results
- Compute precision/recall/F1 for context retrieval at the line level:
every retrieved line of context is counted as a true positive (correct
retrieval), false positive (retrieved something that was not expected),
or false negative (didn't retrieve an expected line)
- Compute similar metrics for edit predictions
- Pretty-print results, highlighting the difference between actual and
expected when printing to tty

Other changes:
- `zeta-cli predict` accepts a `--format` argument with options `md`,
`json`, `diff`
- Code restructure

Release Notes:

- N/A

---------

Co-authored-by: Piotr Osiewicz <24362066+osiewicz@users.noreply.github.com>
Co-authored-by: Agus Zubiaga <agus@zed.dev>
2025-11-04 17:36:50 +00:00
Max Brunsfeld
1631cec15a Add zeta-cli subcommand for running zeta2 predictions (#41722)
This PR adds a `zeta zeta2 predict` subcommand that takes an edit
prediction example markdown file as an argument, and performs zeta2's
prediction, showing the retrieved context and the predicted edit.

* [x] Apply uncommitted diff to get repo into the right state.
* [x] Apply edits in edit history
* [x] Display predicted edits as unified diff, regardless of model
output format

Release Notes:

- N/A

---------

Co-authored-by: Agus Zubiaga <agus@zed.dev>
Co-authored-by: Piotr Osiewicz <24362066+osiewicz@users.noreply.github.com>
Co-authored-by: Ben Kunkle <ben.kunkle@gmail.com>
2025-11-03 15:12:08 -08:00
Agus Zubiaga
06bdb28517 zeta cli: Add convert-example command (#41608)
Adds a `convert-example` subcommand to the zeta cli that converts eval
examples from/to `json`, `toml`, and `md` formats.

Release Notes:

- N/A

---------

Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
2025-11-01 19:35:04 +00:00
Agus Zubiaga
60c546196a zeta2: Expose llm-based context retrieval via zeta_cli (#41584)
Release Notes:

- N/A

---------

Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>
2025-10-30 21:41:09 +00:00
Agus Zubiaga
ee80ba6693 zeta2: LLM-based context gathering (#41326)
Release Notes:

- N/A

---------

Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
Co-authored-by: Max Brunsfeld <max@zed.dev>
2025-10-27 22:54:42 +00:00
Agus Zubiaga
eda7a49f01 zeta2: Max retrieved definitions option (#40515)
Release Notes:

- N/A
2025-10-20 15:26:41 +00:00
Agus Zubiaga
fba7f4d8cc zeta2: Update prompts to match training more closely (#40383)
Release Notes:

- N/A
2025-10-16 14:13:19 +00:00
Agus Zubiaga
376335496d zeta2: Numbered lines prompt format (#40218)
Adds a new `NumberedLines` format which is similar to `MarkedExcerpt`
but each line is prefixed with its line number.

Also fixes a bug where contagious snippets wouldn't get merged.

Release Notes:

- N/A

---------

Co-authored-by: Michael Sloan <mgsloan@gmail.com>
Co-authored-by: Michael <michael@zed.dev>
2025-10-15 09:35:39 -03:00
Agus Zubiaga
1bd34e0db0 zeta2 cli: Export retrieval stats data frame (#40145)
Retrieval stats will now use polars to build a big data frame for
references with the cartesian product of LSP declarations and retrieved
declaration candidates (with all their score components) and rebuilds
the stats summary on top of it.

This data frame is written to a `.parquet` file, which we can load into
advanced analytics tools (such as Metabase), so we can explore our
scoring distributions and find ways to improve retrieval, and then train
the decision tree.

Release Notes:

- N/A
2025-10-14 13:34:07 -03:00
Agus Zubiaga
6a9639f62f zeta2 cli: Split retrieval stats module (#39977)
Refactors zeta2 cli a bit. Merging this by itself to prevent conflicts.

Release Notes:

- N/A
2025-10-10 19:35:51 +00:00
Agus Zubiaga
a693d44553 zeta2 cli: Resumable LSP declarations gathering (#39828)
Gathering LSP declarations in zeta_cli can take a really long time for
big repos and has to be started from scratch if interrupted.

Instead of writing the cache file once we have walked the whole
worktree, we'll now do so incrementally as we complete each file. On
subsequent runs, we'll load as many valid declarations as has been
previously written to the cache, and then continue to request the rest
from the LSP which will append to the existing file as it makes
progress. If the last cache entry is incomplete, we'll truncate the
cache file to the end of the last valid line and continue from there, so
we can just `ctrl-c` without breaking resumability.

Release Notes:

- N/A
2025-10-10 12:44:36 -03:00
Michael Sloan
bcef3b5010 zeta2: Parse imports via Tree-sitter queries + improve zeta retrieval-stats (#39735)
Release Notes:

- N/A

---------

Co-authored-by: Max <max@zed.dev>
Co-authored-by: Agus <agus@zed.dev>
Co-authored-by: Oleksiy <oleksiy@zed.dev>
2025-10-08 12:04:06 -06:00
Michael Sloan
c61409e577 zeta_cli: Avoid unnecessary rechecks in retrieval-stats (#39267)
Before this change, it would save every buffer and wait for diagnostics.
For rust analyzer this would cause a lot of rechecking and greatly slow
down the analysis

Release Notes:

- N/A

Co-authored-by: Agus <agus@zed.dev>
2025-10-01 06:31:27 +00:00
Agus Zubiaga
df43a2d3b1 zeta2 cli: Include section ranges in new full output format (#39203)
Release Notes:

- N/A

---------

Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>
2025-09-30 14:30:13 +00:00
Michael Sloan
6af385235d zeta_cli: Add retrieval-stats command for comparing with language server symbol resolution (#39164)
Release Notes:

- N/A

---------

Co-authored-by: Agus <agus@zed.dev>
2025-09-30 08:06:31 +00:00
Michael Sloan
773850f477 zeta2: Use bounded parallelism for tree-sitter indexing + await completion in zeta_cli (#39147)
Also skips indexing files that don't have a suffix that indicates a
known language, and skips when the language doesn't have an outline
grammar.

Release Notes:

- N/A

---------

Co-authored-by: Agus <agus@zed.dev>
2025-09-29 22:15:00 +00:00
Michael Sloan
a5683f3541 zeta_cli: Add --output-format both and --prompt-format only-snippets (#38920)
These are options are probably temporary, added for use in some
experimental code

Release Notes:

- N/A

Co-authored-by: Oleksiy <oleksiy@zed.dev>
2025-09-25 22:49:36 +00:00
Max Brunsfeld
495a7b0a84 Clean up RelPath API (#38912)
Consolidate constructors and accessors.

Release Notes:

- N/A

---------

Co-authored-by: Cole Miller <cole@zed.dev>
2025-09-25 14:42:32 -07:00
Agus Zubiaga
f25ace6be0 zeta2 cli: Output raw request (#38876)
Release Notes:

- N/A

Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de>
Co-authored-by: Oleksiy Syvokon <oleksiy.syvokon@gmail.com>
2025-09-25 12:59:17 +00:00
Michael Sloan
8fc7bd9ae8 zeta2: Add labeled sections prompt format (#38828)
Release Notes:

- N/A

Co-authored-by: Agus <agus@zed.dev>
2025-09-25 00:07:43 +00:00
Max Brunsfeld
03f9cf4414 Represent relative paths using a dedicated, separator-agnostic type (#38744)
Closes https://github.com/zed-industries/zed/issues/38690
Closes #37353

### Background

On Windows, paths are normally separated by `\`, unlike mac and linux
where they are separated by `/`. When editing code in a project that
uses a different path style than your local system (e.g. remoting from
Windows to Linux, using WSL, and collaboration between windows and unix
users), the correct separator for a path may differ from the "native"
separator.

Previously, to work around this, Zed converted paths' separators in
numerous places. This was applied to both absolute and relative paths,
leading to incorrect conversions in some cases.

### Solution

Many code paths in Zed use paths that are *relative* to either a
worktree root or a git repository. This PR introduces a dedicated type
for these paths called `RelPath`, which stores the path in the same way
regardless of host platform, and offers `Path`-like manipulation APIs.
RelPath supports *displaying* the path using either separator, so that
we can display paths in a style that is determined at runtime based on
the current project.

The representation of absolute paths is left untouched, for now.
Absolute paths are different from relative paths because (except in
contexts where we know that the path refers to the local filesystem)
they should generally be treated as opaque strings. Currently we use a
mix of types for these paths (std::path::Path, String, SanitizedPath).

Release Notes:

- N/A

---------

Co-authored-by: Cole Miller <cole@zed.dev>
Co-authored-by: Piotr Osiewicz <24362066+osiewicz@users.noreply.github.com>
Co-authored-by: Peter Tripp <petertripp@gmail.com>
Co-authored-by: Smit Barmase <heysmitbarmase@gmail.com>
Co-authored-by: Lukas Wirth <me@lukaswirth.dev>
2025-09-24 18:57:33 -04:00
Agus Zubiaga
831de8e48f zeta2: Include edits in prompt and add max_prompt_bytes param (#38737)
Release Notes:

- N/A

Co-authored-by: Michael Sloan <mgsloan@gmail.com>
2025-09-23 19:50:07 +00:00
Michael Sloan
9ac511e47c zeta2: Collect nearby diagnostics (#38732)
Release Notes:

- N/A

Co-authored-by: Bennet <bennet@zed.dev>
2025-09-23 12:32:17 -06:00
Michael Sloan
4532765ae8 zeta2: Add prompt planner and provide access via zeta_cli (#38691)
Release Notes:

- N/A
2025-09-23 06:20:26 +00:00
Conrad Irwin
fcdab160f9 Settings refactor (#38367)
Co-Authored-By: Ben K <ben@zed.dev>
Co-Authored-By: Anthony <anthony@zed.dev>
Co-Authored-By: Mikayla <mikayla@zed.dev>

Release Notes:

- settings: Major internal changes to settings. The primary user-facing
effect is that some settings which did not make sense in project
settings files are no-longer read from there. (For example the inline
blame settings)

---------

Co-authored-by: Ben Kunkle <ben@zed.dev>
Co-authored-by: Mikayla Maki <mikayla.c.maki@gmail.com>
Co-authored-by: Anthony <anthony@zed.dev>
2025-09-18 16:47:23 +00:00
Piotr Osiewicz
3cb3f01406 languages: Pass fs into the init function (#38007)
Release Notes:

- N/A

---------

Co-authored-by: Cole Miller <cole@zed.dev>
2025-09-11 19:00:51 +00:00
Michael Sloan
0e33a3afe0 zeta: Check whether data collection is allowed for recent edit history (#37680)
Also:

* Adds tests for can_collect_data.
* Temporarily removes collection of diagnostics.

Release Notes:

- Edit Prediction: Fixed a bug where requests were marked eligible for
data collection despite the recent edit history in the request involving
files that may not be open source. The requests affected by this bug
will not be used in training data.
2025-09-07 11:16:49 -06:00
Michael Sloan
fded3fbcdb zeta: Scope edit prediction event history to current project (#37595)
This change also causes Zeta to not do anything for editors that are not
associated with a project. In practice, this shouldn't affect any
behavior - those editors shouldn't have edit predictions anyway.

Release Notes:

- Edit Prediction: Requests no longer include recent edits from other
projects (other Zed windows).
2025-09-05 01:15:59 +00:00
tidely
7bdc99abc1 Fix clippy::redundant_clone lint violations (#36558)
This removes around 900 unnecessary clones, ranging from cloning a few
ints all the way to large data structures and images.

A lot of these were fixed using `cargo clippy --fix --workspace
--all-targets`, however it often breaks other lints and needs to be run
again. This was then followed up with some manual fixing.

I understand this is a large diff, but all the changes are pretty
trivial. Rust is doing some heavy lifting here for us. Once I get it up
to speed with main, I'd appreciate this getting merged rather sooner
than later.

Release Notes:

- N/A
2025-08-20 12:20:13 +02:00
Piotr Osiewicz
6825715503 Another batch of lint fixes (#36521)
- **Enable a bunch of extra lints**
- **First batch of fixes**
- **More fixes**

Release Notes:

- N/A
2025-08-19 20:33:44 +00:00
Piotr Osiewicz
9e0e233319 Fix clippy::needless_borrow lint violations (#36444)
Release Notes:

- N/A
2025-08-18 21:54:35 +00:00
Michael Sloan
aedf195e97 Use distinct user agents in agent eval and zeta-cli (#35897)
Agent eval now also uses a proper Zed version

Release Notes:

- N/A
2025-08-08 23:26:38 +00:00
Michael Sloan
a1bc6ee75e zeta: Only send outline and diagnostics when data collection is enabled (#35896)
This data is not currently used by edit predictions - it is only useful
when `can_collect_data == true`.

Release Notes:

- N/A
2025-08-08 22:16:13 +00:00
Michael Sloan
68c24655e9 zeta: Collect git sha / remote urls when data collection from OSS is enabled (#35514)
Release Notes:

- Edit Prediction: Added Git info to edit predictions requests (only
sent for opensource projects when data collection is enabled). The sent
Git info is the SHA of the current commit and the URLs for the `origin`
and `upstream` remotes.
2025-08-04 14:18:06 -06:00
Michael Sloan
6052115825 zeta: Add CLI tool for querying edit predictions and related context (#35491)
Release Notes:

- N/A

---------

Co-authored-by: Marshall Bowers <git@maxdeviant.com>
2025-08-01 21:08:09 +00:00