This PR implements the `zeta-cli eval` command. It will:
- Run the edit prediction model if there are no cached results
- Compute precision/recall/F1 for context retrieval at the line level:
every retrieved line of context is counted as a true positive (correct
retrieval), false positive (retrieved something that was not expected),
or false negative (didn't retrieve an expected line)
- Compute similar metrics for edit predictions
- Pretty-print results, highlighting the difference between actual and
expected when printing to tty
Other changes:
- `zeta-cli predict` accepts a `--format` argument with options `md`,
`json`, `diff`
- Code restructure
Release Notes:
- N/A
---------
Co-authored-by: Piotr Osiewicz <24362066+osiewicz@users.noreply.github.com>
Co-authored-by: Agus Zubiaga <agus@zed.dev>
This PR adds a `zeta zeta2 predict` subcommand that takes an edit
prediction example markdown file as an argument, and performs zeta2's
prediction, showing the retrieved context and the predicted edit.
* [x] Apply uncommitted diff to get repo into the right state.
* [x] Apply edits in edit history
* [x] Display predicted edits as unified diff, regardless of model
output format
Release Notes:
- N/A
---------
Co-authored-by: Agus Zubiaga <agus@zed.dev>
Co-authored-by: Piotr Osiewicz <24362066+osiewicz@users.noreply.github.com>
Co-authored-by: Ben Kunkle <ben.kunkle@gmail.com>
Adds a `convert-example` subcommand to the zeta cli that converts eval
examples from/to `json`, `toml`, and `md` formats.
Release Notes:
- N/A
---------
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
We've been considering removing workspace-hack for a couple reasons:
- Lukas ran into a situation where its build script seemed to be causing
spurious rebuilds. This seems more likely to be a cargo bug than an
issue with workspace-hack itself (given that it has an empty build
script), but we don't necessarily want to take the time to hunt that
down right now.
- Marshall mentioned hakari interacts poorly with automated crate
updates (in our case provided by rennovate) because you'd need to have
`cargo hakari generate && cargo hakari manage-deps` after their changes
and we prefer to not have actions that make commits.
Currently removing workspace-hack causes our workspace to grow from
~1700 to ~2000 crates being built (depending on platform), which is
mainly a problem when you're building the whole workspace or running
tests across the the normal and remote binaries (which is where
feature-unification nets us the most sharing). It doesn't impact
incremental times noticeably when you're just iterating on `-p zed`, and
we'll hopefully get these savings back in the future when
rust-lang/cargo#14774 (which re-implements the functionality of hakari)
is finished.
Release Notes:
- N/A
Adds a new `NumberedLines` format which is similar to `MarkedExcerpt`
but each line is prefixed with its line number.
Also fixes a bug where contagious snippets wouldn't get merged.
Release Notes:
- N/A
---------
Co-authored-by: Michael Sloan <mgsloan@gmail.com>
Co-authored-by: Michael <michael@zed.dev>
Retrieval stats will now use polars to build a big data frame for
references with the cartesian product of LSP declarations and retrieved
declaration candidates (with all their score components) and rebuilds
the stats summary on top of it.
This data frame is written to a `.parquet` file, which we can load into
advanced analytics tools (such as Metabase), so we can explore our
scoring distributions and find ways to improve retrieval, and then train
the decision tree.
Release Notes:
- N/A
Gathering LSP declarations in zeta_cli can take a really long time for
big repos and has to be started from scratch if interrupted.
Instead of writing the cache file once we have walked the whole
worktree, we'll now do so incrementally as we complete each file. On
subsequent runs, we'll load as many valid declarations as has been
previously written to the cache, and then continue to request the rest
from the LSP which will append to the existing file as it makes
progress. If the last cache entry is incomplete, we'll truncate the
cache file to the end of the last valid line and continue from there, so
we can just `ctrl-c` without breaking resumability.
Release Notes:
- N/A
Before this change, it would save every buffer and wait for diagnostics.
For rust analyzer this would cause a lot of rechecking and greatly slow
down the analysis
Release Notes:
- N/A
Co-authored-by: Agus <agus@zed.dev>
Also skips indexing files that don't have a suffix that indicates a
known language, and skips when the language doesn't have an outline
grammar.
Release Notes:
- N/A
---------
Co-authored-by: Agus <agus@zed.dev>
Closes https://github.com/zed-industries/zed/issues/38690Closes#37353
### Background
On Windows, paths are normally separated by `\`, unlike mac and linux
where they are separated by `/`. When editing code in a project that
uses a different path style than your local system (e.g. remoting from
Windows to Linux, using WSL, and collaboration between windows and unix
users), the correct separator for a path may differ from the "native"
separator.
Previously, to work around this, Zed converted paths' separators in
numerous places. This was applied to both absolute and relative paths,
leading to incorrect conversions in some cases.
### Solution
Many code paths in Zed use paths that are *relative* to either a
worktree root or a git repository. This PR introduces a dedicated type
for these paths called `RelPath`, which stores the path in the same way
regardless of host platform, and offers `Path`-like manipulation APIs.
RelPath supports *displaying* the path using either separator, so that
we can display paths in a style that is determined at runtime based on
the current project.
The representation of absolute paths is left untouched, for now.
Absolute paths are different from relative paths because (except in
contexts where we know that the path refers to the local filesystem)
they should generally be treated as opaque strings. Currently we use a
mix of types for these paths (std::path::Path, String, SanitizedPath).
Release Notes:
- N/A
---------
Co-authored-by: Cole Miller <cole@zed.dev>
Co-authored-by: Piotr Osiewicz <24362066+osiewicz@users.noreply.github.com>
Co-authored-by: Peter Tripp <petertripp@gmail.com>
Co-authored-by: Smit Barmase <heysmitbarmase@gmail.com>
Co-authored-by: Lukas Wirth <me@lukaswirth.dev>
Co-Authored-By: Ben K <ben@zed.dev>
Co-Authored-By: Anthony <anthony@zed.dev>
Co-Authored-By: Mikayla <mikayla@zed.dev>
Release Notes:
- settings: Major internal changes to settings. The primary user-facing
effect is that some settings which did not make sense in project
settings files are no-longer read from there. (For example the inline
blame settings)
---------
Co-authored-by: Ben Kunkle <ben@zed.dev>
Co-authored-by: Mikayla Maki <mikayla.c.maki@gmail.com>
Co-authored-by: Anthony <anthony@zed.dev>
Also:
* Adds tests for can_collect_data.
* Temporarily removes collection of diagnostics.
Release Notes:
- Edit Prediction: Fixed a bug where requests were marked eligible for
data collection despite the recent edit history in the request involving
files that may not be open source. The requests affected by this bug
will not be used in training data.
This change also causes Zeta to not do anything for editors that are not
associated with a project. In practice, this shouldn't affect any
behavior - those editors shouldn't have edit predictions anyway.
Release Notes:
- Edit Prediction: Requests no longer include recent edits from other
projects (other Zed windows).
This removes around 900 unnecessary clones, ranging from cloning a few
ints all the way to large data structures and images.
A lot of these were fixed using `cargo clippy --fix --workspace
--all-targets`, however it often breaks other lints and needs to be run
again. This was then followed up with some manual fixing.
I understand this is a large diff, but all the changes are pretty
trivial. Rust is doing some heavy lifting here for us. Once I get it up
to speed with main, I'd appreciate this getting merged rather sooner
than later.
Release Notes:
- N/A
Release Notes:
- Edit Prediction: Added Git info to edit predictions requests (only
sent for opensource projects when data collection is enabled). The sent
Git info is the SHA of the current commit and the URLs for the `origin`
and `upstream` remotes.