Thread stacks in rust by default have 2 megabytes of stack which for
sumtrees (or ropes in this case) can easily be exceeded depending on the
workload.
Release Notes:
- Fixed stack overflows when constructing large ropes
Using compiler explorer I saw that the compiler wasn't clever enough to
optimise away the branches in the masking code. I thought the compiler
would have a better chance if we always branched, which [turned out to
be the case](https://godbolt.org/z/PM594Pz18).
Running the benchmarks the biggest benefit I saw was:
```
push/65536 time: [2.9067 ms 2.9243 ms 2.9417 ms]
thrpt: [21.246 MiB/s 21.373 MiB/s 21.502 MiB/s]
change:
time: [-8.3452% -7.2617% -6.2009%] (p = 0.00 < 0.05)
thrpt: [+6.6108% +7.8303% +9.1050%]
Performance has improved.
```
But I did also see some regressions:
```
slice/4096 time: [66.195 µs 66.815 µs 67.448 µs]
thrpt: [57.915 MiB/s 58.464 MiB/s 59.012 MiB/s]
change:
time: [+3.7131% +5.1698% +6.6971%] (p = 0.00 < 0.05)
thrpt: [-6.2768% -4.9157% -3.5802%]
Performance has regressed.
```
Release Notes:
- N/A
Fixes a regression introduced in
https://github.com/zed-industries/zed/pull/39857. As for the exact
reason this causes this issue I am not yet sure will investigate (as per
the todos in code)
Fixes ZED-23R
Release Notes:
- N/A *or* Added/Fixed/Improved ...
Closes https://github.com/zed-industries/zed/issues/38453
Current `Buffer` API only allows getting buffer text with `\n` line
breaks — even if the `\r\n` was used in the original file's text.
This it not correct in certain cases like LSP formatting, where language
servers need to have original document context for e.g. formatting
purposes.
Added new `Buffer` API, replaced all buffer LSP registration places with
the new one and added more tests.
Release Notes:
- Fixed ESLint linebreak-style errors by preserving line endings in LSP
communication
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Kirill Bulatov <kirill@zed.dev>
We still see a bunch of panics here but the default slicing panic
doesn't tell which side of the range is bad
Release Notes:
- N/A *or* Added/Fixed/Improved ...
This simplifies some code and is also more correct in some others (I
believe some of these might've overflowed causing panics in sentry)
Release Notes:
- N/A *or* Added/Fixed/Improved ...
Follows on from https://github.com/zed-industries/zed/pull/39949.
Again I'm not 100% sure of the intent but I think this is a fix:
`generate_random_string(rng, 4096)` would previously give you a string
of 4096 *chars* which could be anywhere between 4kB and 16kB in bytes.
This seems probably not what was intended, because Ropes generally work
in bytes not chars, including for the offsets used to index into them.
This seems to possibly cause a _regression_ in benchmark performance,
which is surprising because it should generally cause smaller test data.
But, possibly it's doing better at exercising different paths?
cc @mrnugget
Release Notes:
- N/A
In an attempt to figure out what's wrong with `point_to_buffer_offset`
for crash https://github.com/zed-industries/zed/issues/40453. We want to
know which branch among these two is the bad one.
Release Notes:
- N/A
Co-authored-by: Lukas Wirth <lukas@zed.dev>
Reduces peak stack usage in these functions and should generally be a
bit performant.
Display map benchmark results
```
To tab point/to_tab_point/1024
time: [531.40 ns 532.10 ns 532.97 ns]
change: [-2.1824% -2.0054% -1.8125%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
To fold point/to_fold_point/1024
time: [530.81 ns 531.30 ns 531.80 ns]
change: [-2.0295% -1.9054% -1.7716%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
```
Release Notes:
- N/A *or* Added/Fixed/Improved ...
We've been considering removing workspace-hack for a couple reasons:
- Lukas ran into a situation where its build script seemed to be causing
spurious rebuilds. This seems more likely to be a cargo bug than an
issue with workspace-hack itself (given that it has an empty build
script), but we don't necessarily want to take the time to hunt that
down right now.
- Marshall mentioned hakari interacts poorly with automated crate
updates (in our case provided by rennovate) because you'd need to have
`cargo hakari generate && cargo hakari manage-deps` after their changes
and we prefer to not have actions that make commits.
Currently removing workspace-hack causes our workspace to grow from
~1700 to ~2000 crates being built (depending on platform), which is
mainly a problem when you're building the whole workspace or running
tests across the the normal and remote binaries (which is where
feature-unification nets us the most sharing). It doesn't impact
incremental times noticeably when you're just iterating on `-p zed`, and
we'll hopefully get these savings back in the future when
rust-lang/cargo#14774 (which re-implements the functionality of hakari)
is finished.
Release Notes:
- N/A
Using a seeded PRNG to produce consistent test data and reduce
variability makes sense.
However, the way it was used previously, by always cloning the RNG,
means that every generated string is the same, and every offset is the
same. After this change, the tested value stream should still be the same on
each run of the benchmark, but the values within each run will vary.
The `generate_random_text` measured in chars also seems possibly
inconsistent with later comments about it being a number of bytes.
Release Notes:
- N/A
The previous code clones all the rope chunks, but the rope is passed by
value so the chunks are about to be dropped anyhow.
I thought this may slightly help performance but it has no very
noticeable effect, with a mix of small changes up and down probably
attributable to noise on my machine?
I wonder if the benchmarks might just not hit this path well? I'm
looking into that separately (see #39949, #39951), but this seemed clear
enough to be worth proposing by itself.
Incidentally it surprised me this did not generate a warning already,
but I think it's because we're taking only one field from the struct
that's about to be dropped:
https://github.com/rust-lang/rust-clippy/issues/7429.
<details>
```
Running benches/rope_benchmark.rs (target/release/deps/rope_benchmark-4c5c71666e7c1729)
push/4096 time: [362.58 µs 366.40 µs 370.69 µs]
thrpt: [10.538 MiB/s 10.661 MiB/s 10.773 MiB/s]
change:
time: [+0.0646% +1.2362% +2.4681%] (p = 0.04 < 0.05)
thrpt: [-2.4086% -1.2211% -0.0646%]
Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
7 (7.00%) high mild
3 (3.00%) high severe
Benchmarking push/65536: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.4s, enable flat sampling, or reduce sample count to 50.
push/65536 time: [1.6185 ms 1.6353 ms 1.6557 ms]
thrpt: [37.747 MiB/s 38.219 MiB/s 38.616 MiB/s]
change:
time: [+1.9135% +2.9548% +3.9838%] (p = 0.00 < 0.05)
thrpt: [-3.8312% -2.8700% -1.8776%]
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
5 (5.00%) high mild
1 (1.00%) high severe
append/4096 time: [1.1052 µs 1.1104 µs 1.1162 µs]
thrpt: [3.4177 GiB/s 3.4354 GiB/s 3.4516 GiB/s]
change:
time: [-2.5075% -0.3430% +1.5095%] (p = 0.76 > 0.05)
thrpt: [-1.4871% +0.3441% +2.5720%]
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
7 (7.00%) high mild
1 (1.00%) high severe
append/65536 time: [12.404 µs 12.444 µs 12.487 µs]
thrpt: [4.8881 GiB/s 4.9049 GiB/s 4.9204 GiB/s]
change:
time: [-0.1408% +0.5573% +1.2016%] (p = 0.10 > 0.05)
thrpt: [-1.1874% -0.5542% +0.1410%]
No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe
slice/4096 time: [32.963 µs 33.185 µs 33.466 µs]
thrpt: [116.72 MiB/s 117.71 MiB/s 118.51 MiB/s]
change:
time: [-6.4303% -5.1234% -3.6394%] (p = 0.00 < 0.05)
thrpt: [+3.7769% +5.4000% +6.8722%]
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
slice/65536 time: [668.67 µs 670.49 µs 672.65 µs]
thrpt: [92.916 MiB/s 93.215 MiB/s 93.469 MiB/s]
change:
time: [+0.0846% +0.5573% +1.0199%] (p = 0.02 < 0.05)
thrpt: [-1.0096% -0.5542% -0.0845%]
Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe
bytes_in_range/4096 time: [5.1513 µs 5.1594 µs 5.1674 µs]
thrpt: [755.95 MiB/s 757.12 MiB/s 758.31 MiB/s]
change:
time: [-4.9410% -4.2051% -3.3835%] (p = 0.00 < 0.05)
thrpt: [+3.5020% +4.3897% +5.1978%]
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
3 (3.00%) high severe
bytes_in_range/65536 time: [139.87 µs 140.17 µs 140.55 µs]
thrpt: [444.67 MiB/s 445.89 MiB/s 446.85 MiB/s]
change:
time: [-0.6267% -0.0474% +0.4635%] (p = 0.87 > 0.05)
thrpt: [-0.4614% +0.0475% +0.6306%]
No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
7 (7.00%) high mild
2 (2.00%) high severe
chars/4096 time: [1.0243 µs 1.0250 µs 1.0257 µs]
thrpt: [3.7190 GiB/s 3.7217 GiB/s 3.7243 GiB/s]
change:
time: [+4.0106% +4.5396% +5.3062%] (p = 0.00 < 0.05)
thrpt: [-5.0388% -4.3425% -3.8559%]
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
2 (2.00%) high mild
8 (8.00%) high severe
chars/65536 time: [17.540 µs 17.576 µs 17.614 µs]
thrpt: [3.4652 GiB/s 3.4727 GiB/s 3.4797 GiB/s]
change:
time: [+2.5201% +3.3922% +4.1639%] (p = 0.00 < 0.05)
thrpt: [-3.9974% -3.2809% -2.4581%]
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
clip_point/4096 time: [58.857 µs 59.162 µs 59.490 µs]
thrpt: [65.662 MiB/s 66.026 MiB/s 66.368 MiB/s]
change:
time: [+1.6900% +2.8088% +3.8521%] (p = 0.00 < 0.05)
thrpt: [-3.7092% -2.7321% -1.6619%]
Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
clip_point/65536 time: [1.8609 ms 1.8633 ms 1.8660 ms]
thrpt: [33.494 MiB/s 33.543 MiB/s 33.585 MiB/s]
change:
time: [+0.0577% +0.2579% +0.4495%] (p = 0.01 < 0.05)
thrpt: [-0.4474% -0.2572% -0.0577%]
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
point_to_offset/4096 time: [19.246 µs 19.287 µs 19.331 µs]
thrpt: [202.07 MiB/s 202.54 MiB/s 202.97 MiB/s]
change:
time: [+1.1073% +2.9754% +5.3818%] (p = 0.00 < 0.05)
thrpt: [-5.1069% -2.8894% -1.0951%]
Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
5 (5.00%) high mild
8 (8.00%) high severe
Benchmarking point_to_offset/65536: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.6s, enable flat sampling, or reduce sample count to 60.
point_to_offset/65536 time: [741.87 µs 743.28 µs 744.74 µs]
thrpt: [83.922 MiB/s 84.086 MiB/s 84.247 MiB/s]
change:
time: [+5.0577% +5.6751% +6.3133%] (p = 0.00 < 0.05)
thrpt: [-5.9384% -5.3703% -4.8142%]
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
4 (4.00%) high mild
3 (3.00%) high severe
cursor/4096 time: [27.407 µs 27.483 µs 27.600 µs]
thrpt: [141.53 MiB/s 142.13 MiB/s 142.53 MiB/s]
change:
time: [-7.1479% -6.2928% -5.6378%] (p = 0.00 < 0.05)
thrpt: [+5.9747% +6.7154% +7.6981%]
Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
1 (1.00%) high mild
8 (8.00%) high severe
cursor/65536 time: [848.91 µs 849.70 µs 850.59 µs]
thrpt: [73.478 MiB/s 73.555 MiB/s 73.624 MiB/s]
change:
time: [+0.0281% +0.3487% +0.6686%] (p = 0.04 < 0.05)
thrpt: [-0.6642% -0.3475% -0.0281%]
Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
5 (5.00%) high mild
4 (4.00%) high severe
```
</details>
Release Notes:
- N/A
I was looking at the rope implementation and some of the existing bugs
that crash in there, and I ran cargo-mutants to inspect test coverage. I
was motivated by bugs like
https://github.com/zed-industries/zed/issues/38556 but this doesn't fix
it and the bug may well be at a higher layer.
This PR adds coverage for a few functions that aren't tested today. I
didn't find any actual bugs yet.
I can see this tree is pretty sparse on docstrings so if you think these
are too verbose I can take them out or drop the whole PR.
Release Notes:
- N/A
This panic only happened in debug builds because of a left shift
overflow. The slice range has bounds between 0 and 128. The 128 case
caused the overflow.
We now do an unbounded shift and a wrapped sub to get the correct
bitmask. If the slice range is 128 left, it should make 1 zero. Then the
wrapped sub would flip all bits, which is expected behavior.
Release Notes:
- N/A
Co-authored-by: Nia <nia@zed.dev>
## Context
While looking into: #32051 and #16120 with instruments, I noticed that
`TabSnapshot::to_tab_point` and `TabSnapshot::to_fold_point` are a
common bottleneck between the two issues. This PR takes the first steps
into closing the stated issues by improving the performance of both
those functions.
### Method
`to_tab_point` and `to_fold_point` iterate through each character in
their rows to find tab characters and translate those characters into
their respective transformations. This PR changes this iteration to take
advantage of the tab character bitmap in the `Rope` data structure and
goes directly to each tab character when iterating.
The tab bitmap is now passed from each layer in-between the `Rope` to
the `TabMap`.
### Testing
I added several randomized tests to ensure that the new `to_tab_point`
and `to_fold_point` functions have the same behavior as the old methods
they're replacing. I also added `test_random_chunk_bitmap` on each layer
the tab bitmap is passed up to the `TabMap` to make sure that the bitmap
being passed is transformed correctly between the layers of
`DisplayMap`.
`test_random_chunk_bitmap` was added to these layers:
- buffer
- multi buffer
- custom_highlights
- inlay_map
- fold_map
## Benchmarking
I setup benchmarks with criterion that is runnable via `cargo bench -p
editor --profile=release-fast`. When benchmarking I had my laptop
plugged in and did so from the terminal with a minimal amount of
processes running. I'm also on a m4 max
### Results
#### To Tab Point
Went from completing 6.8M iterations in 5s with an average time of
`736.13 ns` to `683.38 ns` which is a `-7.1875%` improvement
#### To Fold Point
Went from completing 6.8M iterations in 5s with an average time of
`736.55 ns` to `682.40 ns` which is a `-7.1659%` improvement
#### Editor render
Went from having an average render time of `62.561 µs` to `57.216 µs`
which is a `-8.8248%` improvement
#### Build Buffer with one long line
Went from having an average buffer build time of `3.2549 ms` to `3.2635
ms` which is a `+0.2151%` regression within the margin of error
#### Editor with 1000 multi cursor input
Went from having an average edit time of `133.05 ms` to `122.96 ms`
which is a `-7.5776%` improvement
Release Notes:
- N/A
---------
Co-authored-by: Remco Smits <djsmits12@gmail.com>
Co-authored-by: Cole Miller <cole@zed.dev>
Co-authored-by: Piotr Osiewicz <24362066+osiewicz@users.noreply.github.com>
Extracts and cleans up GPUI's scheduler code into a new `scheduler`
crate, making it pluggable by external runtimes. This will enable
deterministic integration testing with cloud components by providing a
unified test scheduler across Zed and backend code. In Zed, it will
replace the existing GPUI scheduler for consistent async task management
across platforms.
## Changes
- **Core Implementation**: `TestScheduler` with seed-based
randomization, session tracking (`SessionId`), and foreground/background
task separation for reproducible testing.
- **Executors**: `ForegroundExecutor` (!Send, thread-local) and
`BackgroundExecutor` (Send, with blocking/timeout support) as
GPUI-compatible wrappers.
- **Clock and Timer**: Controllable `TestClock` and future-based `Timer`
for time-sensitive tests.
- **Testing APIs**: `once()`, `with_seed()`, and `many()` methods for
configurable test runs.
- **Dependencies**: Added `async-task`, `chrono`, `futures`, etc., with
updates to `Cargo.toml` and lock file.
## Benefits
- **Integration Testing**: Facilitates reliable async tests involving
cloud sessions, reducing flakiness via deterministic execution.
- **Pluggability**: Trait-based design (`Scheduler`) allows easy
integration into non-GPUI runtimes while maintaining GPUI compatibility.
- **Cleanup**: Refactors GPUI scheduler logic for clarity, correctness
(no `unwrap()`, proper error handling), and extensibility.
Follows Rust guidelines; run `./script/clippy` for verification.
- [x] Define and test a core scheduler that we think can power our cloud
code and GPUI
- [ ] Replace GPUI's scheduler
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
This removes around 900 unnecessary clones, ranging from cloning a few
ints all the way to large data structures and images.
A lot of these were fixed using `cargo clippy --fix --workspace
--all-targets`, however it often breaks other lints and needs to be run
again. This was then followed up with some manual fixing.
I understand this is a large diff, but all the changes are pretty
trivial. Rust is doing some heavy lifting here for us. Once I get it up
to speed with main, I'd appreciate this getting merged rather sooner
than later.
Release Notes:
- N/A
This is a bit of a readability improvement IMHO; I often find myself
confused when dealing when dimension pairs, as there's no easy way to
jump to the implementation of a dimension for tuples to remind myself
for the n-th time how exactly that impl works. Now it should be possible
to jump directly to that impl.
Another bonus is that Dimension supports 3-ary tuples as well - by using
a () as a default value of a 3rd dimension.
Release Notes:
- N/A
This gets rid of the need to pass context to all cursor functions. In
practice context is always immutable when interacting with cursors.
A nicety of this is in the follow-up PR we will be able to implement
Iterator for all Cursors/filter cursors (hell, we may be able to get rid
of filter cursor altogether, as it is just a custom `filter` impl on
iterator trait).
Release Notes:
- N/A
Precursor to other optimizations, but this already gets us a big
improvement.
Wasm compilation can easily be parallelized, and with all of the cores
on my M4 Max this already gets us an 86% improvement, bringing loading
an extension down to <9ms.
Not all setups will see this much improvement, but it will use the cores
available (it just uses rayon under the hood like we do elsewhere).
Since we load extensions in sequence, this should have a nice impact for
users with a lot of extensions.
#### Before
```
Benchmarking load: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.5s, or reduce sample count to 70.
load time: [64.859 ms 64.935 ms 65.027 ms]
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe
```
#### After
```
load time: [8.8685 ms 8.9012 ms 8.9344 ms]
change: [-86.347% -86.292% -86.237%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
```
Release Notes:
- N/A
This adds a "workspace-hack" crate, see
[mozilla's](https://hg.mozilla.org/mozilla-central/file/3a265fdc9f33e5946f0ca0a04af73acd7e6d1a39/build/workspace-hack/Cargo.toml#l7)
for a concise explanation of why this is useful. For us in practice this
means that if I were to run all the tests (`cargo nextest r
--workspace`) and then `cargo r`, all the deps from the previous cargo
command will be reused. Before this PR it would rebuild many deps due to
resolving different sets of features for them. For me this frequently
caused long rebuilds when things "should" already be cached.
To avoid manually maintaining our workspace-hack crate, we will use
[cargo hakari](https://docs.rs/cargo-hakari) to update the build files
when there's a necessary change. I've added a step to CI that checks
whether the workspace-hack crate is up to date, and instructs you to
re-run `script/update-workspace-hack` when it fails.
Finally, to make sure that people can still depend on crates in our
workspace without pulling in all the workspace deps, we use a `[patch]`
section following [hakari's
instructions](https://docs.rs/cargo-hakari/0.9.36/cargo_hakari/patch_directive/index.html)
One possible followup task would be making guppy use our
`rust-toolchain.toml` instead of having to duplicate that list in its
config, I opened an issue for that upstream: guppy-rs/guppy#481.
TODO:
- [x] Fix the extension test failure
- [x] Ensure the dev dependencies aren't being unified by Hakari into
the main dependencies
- [x] Ensure that the remote-server binary continues to not depend on
LibSSL
Release Notes:
- N/A
---------
Co-authored-by: Mikayla <mikayla@zed.dev>
Co-authored-by: Mikayla Maki <mikayla.c.maki@gmail.com>
Closes#4271
Implemented by kicking of a task on the main thread at the end of
`Editor::handle_input` which waits for the buffer to be re-parsed before
checking if JSX tag completion possible based on the recent edits, and
if it is then it spawns a task on the background thread to generate the
edits to be auto-applied to the buffer
Release Notes:
- Added support for auto-closing of JSX tags
---------
Co-authored-by: Cole Miller <cole@zed.dev>
Co-authored-by: Max Brunsfeld <max@zed.dev>
Co-authored-by: Marshall Bowers <git@maxdeviant.com>
Co-authored-by: Mikayla <mikayla@zed.dev>
Co-authored-by: Peter Tripp <peter@zed.dev>