Commit Graph

11 Commits

Author SHA1 Message Date
Vasyl Protsiv
138286f3b1 sum_tree: Make SumTree::append run in logarithmic time (#43349)
The `SumTree::append` method is slow when appending large trees to small
trees. The reason is this code here:

f57f4cd360/crates/sum_tree/src/sum_tree.rs (L628-L630)

`append` is called recursively until `self` and `other` have the same
height, effectively making this code `O(log^2 n)` in the number of
leaves of `other` tree in the worst case.

There are no algorithmic reasons why appending large trees must be this
much slower.

This PR proves it by providing implementation of `append` that works in
logarithmic time regardless if `self` is smaller or larger than `other`.

The helper method `append_large` has the symmetric logic to
`push_tree_recursive` but moves the (unlikely) case of merging
underflowing node in a separate helper function to reduce stack usage. I
am a bit unsure about some implementation choices made in
`push_tree_recursive` and would like to discuss some of these later, but
at the moment I didn't change anything there and tried to follow the
same logic in `append_large`.

We might also consider adding `push_front`/`prepend` methods to
`SumTree`.

I did not find a good benchmark that covers this case so I added a new
one to rope benchmarks.

<details>
<summary>cargo bench (compared to current main)</summary>

```
     Running benches\rope_benchmark.rs (D:\zed\target\release\deps\rope_benchmark-59c669d2895cd2c4.exe)
Gnuplot not found, using plotters backend
push/4096               time:   [195.67 µs 195.75 µs 195.86 µs]
                        thrpt:  [19.944 MiB/s 19.955 MiB/s 19.964 MiB/s]
                 change:
                        time:   [+0.2162% +0.3040% +0.4057%] (p = 0.00 < 0.05)
                        thrpt:  [-0.4040% -0.3030% -0.2157%]
                        Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
  2 (2.00%) low mild
  6 (6.00%) high mild
  6 (6.00%) high severe
Benchmarking push/65536: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.8s, enable flat sampling, or reduce sample count to 50.
push/65536              time:   [1.4431 ms 1.4485 ms 1.4546 ms]
                        thrpt:  [42.966 MiB/s 43.147 MiB/s 43.310 MiB/s]
                 change:
                        time:   [-3.2257% -1.2013% +0.6431%] (p = 0.27 > 0.05)
                        thrpt:  [-0.6390% +1.2159% +3.3332%]
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  5 (5.00%) high severe

append/4096             time:   [15.107 µs 15.128 µs 15.149 µs]
                        thrpt:  [257.86 MiB/s 258.22 MiB/s 258.58 MiB/s]
                 change:
                        time:   [+0.9650% +1.5256% +1.9057%] (p = 0.00 < 0.05)
                        thrpt:  [-1.8701% -1.5026% -0.9557%]
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high severe
append/65536            time:   [1.2870 µs 1.4496 µs 1.6484 µs]
                        thrpt:  [37.028 GiB/s 42.106 GiB/s 47.425 GiB/s]
                 change:
                        time:   [-28.699% -16.073% -0.3133%] (p = 0.04 < 0.05)
                        thrpt:  [+0.3142% +19.151% +40.250%]
                        Change within noise threshold.
Found 17 outliers among 100 measurements (17.00%)
  1 (1.00%) high mild
  16 (16.00%) high severe

slice/4096              time:   [30.580 µs 30.611 µs 30.639 µs]
                        thrpt:  [127.49 MiB/s 127.61 MiB/s 127.74 MiB/s]
                 change:
                        time:   [-2.2958% -0.9674% -0.1835%] (p = 0.08 > 0.05)
                        thrpt:  [+0.1838% +0.9769% +2.3498%]
                        No change in performance detected.
slice/65536             time:   [614.86 µs 795.04 µs 1.0293 ms]
                        thrpt:  [60.723 MiB/s 78.613 MiB/s 101.65 MiB/s]
                 change:
                        time:   [-12.714% +7.2092% +30.676%] (p = 0.52 > 0.05)
                        thrpt:  [-23.475% -6.7244% +14.566%]
                        No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
  14 (14.00%) high severe

bytes_in_range/4096     time:   [3.3298 µs 3.3416 µs 3.3563 µs]
                        thrpt:  [1.1366 GiB/s 1.1416 GiB/s 1.1456 GiB/s]
                 change:
                        time:   [+2.0652% +3.0667% +4.3765%] (p = 0.00 < 0.05)
                        thrpt:  [-4.1930% -2.9754% -2.0234%]
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe
bytes_in_range/65536    time:   [80.640 µs 80.825 µs 81.024 µs]
                        thrpt:  [771.38 MiB/s 773.28 MiB/s 775.05 MiB/s]
                 change:
                        time:   [-0.6566% +1.0994% +2.9691%] (p = 0.27 > 0.05)
                        thrpt:  [-2.8835% -1.0875% +0.6609%]
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) high mild
  8 (8.00%) high severe

chars/4096              time:   [763.17 ns 763.68 ns 764.36 ns]
                        thrpt:  [4.9907 GiB/s 4.9952 GiB/s 4.9985 GiB/s]
                 change:
                        time:   [-2.1138% -0.7973% +0.1096%] (p = 0.18 > 0.05)
                        thrpt:  [-0.1095% +0.8037% +2.1595%]
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  6 (6.00%) low mild
  3 (3.00%) high severe
chars/65536             time:   [12.479 µs 12.503 µs 12.529 µs]
                        thrpt:  [4.8714 GiB/s 4.8817 GiB/s 4.8910 GiB/s]
                 change:
                        time:   [-2.4451% -1.0638% +0.6633%] (p = 0.16 > 0.05)
                        thrpt:  [-0.6589% +1.0753% +2.5063%]
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe

clip_point/4096         time:   [63.148 µs 63.182 µs 63.229 µs]
                        thrpt:  [61.779 MiB/s 61.825 MiB/s 61.859 MiB/s]
                 change:
                        time:   [+1.0107% +2.1329% +4.2849%] (p = 0.02 < 0.05)
                        thrpt:  [-4.1088% -2.0883% -1.0006%]
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
Benchmarking clip_point/65536: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.8s, enable flat sampling, or reduce sample count to 50.
clip_point/65536        time:   [1.2578 ms 1.2593 ms 1.2608 ms]
                        thrpt:  [49.573 MiB/s 49.631 MiB/s 49.690 MiB/s]
                 change:
                        time:   [+0.4881% +0.8942% +1.3488%] (p = 0.00 < 0.05)
                        thrpt:  [-1.3308% -0.8863% -0.4857%]
                        Change within noise threshold.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) high mild
  14 (14.00%) high severe

point_to_offset/4096    time:   [16.211 µs 16.235 µs 16.257 µs]
                        thrpt:  [240.28 MiB/s 240.61 MiB/s 240.97 MiB/s]
                 change:
                        time:   [-1.4913% +0.1685% +2.2662%] (p = 0.89 > 0.05)
                        thrpt:  [-2.2159% -0.1682% +1.5139%]
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
point_to_offset/65536   time:   [360.06 µs 360.58 µs 361.16 µs]
                        thrpt:  [173.05 MiB/s 173.33 MiB/s 173.58 MiB/s]
                 change:
                        time:   [+0.0939% +0.8792% +1.8751%] (p = 0.06 > 0.05)
                        thrpt:  [-1.8406% -0.8715% -0.0938%]
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe

cursor/4096             time:   [19.266 µs 19.282 µs 19.302 µs]
                        thrpt:  [202.38 MiB/s 202.58 MiB/s 202.75 MiB/s]
                 change:
                        time:   [+1.2457% +2.2477% +2.8702%] (p = 0.00 < 0.05)
                        thrpt:  [-2.7901% -2.1983% -1.2304%]
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
cursor/65536            time:   [467.63 µs 468.36 µs 469.14 µs]
                        thrpt:  [133.22 MiB/s 133.44 MiB/s 133.65 MiB/s]
                 change:
                        time:   [-0.2019% +1.3419% +2.8915%] (p = 0.10 > 0.05)
                        thrpt:  [-2.8103% -1.3241% +0.2023%]
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe

append many/small to large
                        time:   [37.419 ms 37.656 ms 37.929 ms]
                        thrpt:  [321.84 MiB/s 324.17 MiB/s 326.22 MiB/s]
                 change:
                        time:   [+0.8113% +1.7361% +2.6538%] (p = 0.00 < 0.05)
                        thrpt:  [-2.5852% -1.7065% -0.8047%]
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  9 (9.00%) high severe
append many/large to small
                        time:   [51.289 ms 51.437 ms 51.614 ms]
                        thrpt:  [236.50 MiB/s 237.32 MiB/s 238.00 MiB/s]
                 change:
                        time:   [-87.518% -87.479% -87.438%] (p = 0.00 < 0.05)
                        thrpt:  [+696.08% +698.66% +701.13%]
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) high mild
  9 (9.00%) high severe
```
</details>

Release Notes:

- sum_tree: Make SumTree::append run in logarithmic time
2025-11-24 14:49:00 +01:00
Lukas Wirth
5fc54986c7 Revert "sum_tree: Replace rayon with futures (#41586) (#41846)
This causes the background executor to hang

Release Notes:

- N/A *or* Added/Fixed/Improved ...
2025-11-03 19:25:15 +00:00
Lukas Wirth
f2ce06c7b0 sum_tree: Replace rayon with futures (#41586)
Release Notes:

- N/A *or* Added/Fixed/Improved ...

Co-authored by: Kate <kate@zed.dev>
2025-10-31 10:39:01 +00:00
Martin Pool
b519f53b3e Rope benchmarks: Generate random strings measured in bytes, not chars (#39951)
Follows on from https://github.com/zed-industries/zed/pull/39949.

Again I'm not 100% sure of the intent but I think this is a fix:

`generate_random_string(rng, 4096)` would previously give you a string
of 4096 *chars* which could be anywhere between 4kB and 16kB in bytes.
This seems probably not what was intended, because Ropes generally work
in bytes not chars, including for the offsets used to index into them.

This seems to possibly cause a _regression_ in benchmark performance,
which is surprising because it should generally cause smaller test data.
But, possibly it's doing better at exercising different paths?

cc @mrnugget 

Release Notes:

- N/A
2025-10-22 00:42:05 +02:00
Martin Pool
65acf125fb Fix use of PRNG in Rope benchmarks (#39949)
Using a seeded PRNG to produce consistent test data and reduce
variability makes sense.

However, the way it was used previously, by always cloning the RNG,
means that every generated string is the same, and every offset is the
same. After this change, the tested value stream should still be the same on
each run of the benchmark, but the values within each run will vary.

The `generate_random_text` measured in chars also seems possibly
inconsistent with later comments about it being a number of bytes.

Release Notes:

- N/A
2025-10-17 00:15:28 +02:00
Lukas Wirth
e1b57f00a0 sum_tree: Reduce Cursor size for contextless summary types (#38776)
This reduces the size of cursor by a usize when the summary does not
require a context making Cursor usages and constructions slightly more
efficient.

This change is a bit annoying though, as Rust has no means of
specializing, so this uses a `ContextlessSummary` trait with a blanket
impl while turning the `Context` into a GAT `Context<'a>`. This means
`Summary` implies are a bit more verbose now while contextless ones are
slimmer. It does come with the downside that the lifetime in the GAT is
always considered invariant, so some lifetime splitting occurred due to
that.


 ```
push/4096               time:   [352.65 µs 360.87 µs 367.80 µs]
                        thrpt:  [10.621 MiB/s 10.825 MiB/s 11.077 MiB/s]
                 change:
time: [-2.6633% -1.3640% -0.0561%] (p = 0.05 < 0.05)
                        thrpt:  [+0.0561% +1.3828% +2.7361%]
                        Change within noise threshold.
Found 16 outliers among 100 measurements (16.00%)
  7 (7.00%) low severe
  3 (3.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe
push/65536              time:   [1.2917 ms 1.2949 ms 1.2979 ms]
                        thrpt:  [48.156 MiB/s 48.267 MiB/s 48.387 MiB/s]
                 change:
time: [+1.4428% +1.9844% +2.5299%] (p = 0.00 < 0.05)
                        thrpt:  [-2.4675% -1.9458% -1.4223%]
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high severe

append/4096             time:   [677.87 ns 678.87 ns 679.83 ns]
                        thrpt:  [5.6112 GiB/s 5.6192 GiB/s 5.6274 GiB/s]
                 change:
time: [-0.8924% -0.5017% -0.1705%] (p = 0.00 < 0.05)
                        thrpt:  [+0.1708% +0.5043% +0.9004%]
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
append/65536            time:   [9.3275 µs 9.3406 µs 9.3536 µs]
                        thrpt:  [6.5253 GiB/s 6.5344 GiB/s 6.5435 GiB/s]
                 change:
time: [+0.5409% +0.7215% +0.9054%] (p = 0.00 < 0.05)
                        thrpt:  [-0.8973% -0.7163% -0.5380%]
                        Change within noise threshold.

slice/4096              time:   [27.673 µs 27.791 µs 27.907 µs]
                        thrpt:  [139.97 MiB/s 140.56 MiB/s 141.16 MiB/s]
                 change:
time: [-1.1065% -0.6725% -0.2429%] (p = 0.00 < 0.05)
                        thrpt:  [+0.2435% +0.6770% +1.1189%]
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) low mild
  1 (1.00%) high mild
slice/65536             time:   [507.55 µs 517.40 µs 535.60 µs]
                        thrpt:  [116.69 MiB/s 120.80 MiB/s 123.14 MiB/s]
                 change:
time: [-1.3489% +0.0599% +2.2591%] (p = 0.96 > 0.05)
                        thrpt:  [-2.2092% -0.0598% +1.3674%]
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe

bytes_in_range/4096     time:   [3.3917 µs 3.4108 µs 3.4313 µs]
                        thrpt:  [1.1117 GiB/s 1.1184 GiB/s 1.1247 GiB/s]
                 change:
time: [-5.3466% -4.7193% -4.1262%] (p = 0.00 < 0.05)
                        thrpt:  [+4.3038% +4.9531% +5.6487%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
bytes_in_range/65536    time:   [88.175 µs 88.613 µs 89.111 µs]
                        thrpt:  [701.37 MiB/s 705.31 MiB/s 708.82 MiB/s]
                 change:
time: [-0.6935% +0.3769% +1.4655%] (p = 0.50 > 0.05)
                        thrpt:  [-1.4443% -0.3755% +0.6984%]
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

chars/4096              time:   [678.70 ns 680.38 ns 682.08 ns]
                        thrpt:  [5.5927 GiB/s 5.6067 GiB/s 5.6206 GiB/s]
                 change:
time: [-0.6969% -0.2755% +0.1485%] (p = 0.20 > 0.05)
                        thrpt:  [-0.1483% +0.2763% +0.7018%]
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) low mild
  4 (4.00%) high mild
chars/65536             time:   [12.720 µs 12.775 µs 12.830 µs]
                        thrpt:  [4.7573 GiB/s 4.7778 GiB/s 4.7983 GiB/s]
                 change:
time: [-0.6172% -0.1110% +0.4179%] (p = 0.68 > 0.05)
                        thrpt:  [-0.4162% +0.1112% +0.6211%]
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

clip_point/4096         time:   [33.240 µs 33.310 µs 33.394 µs]
                        thrpt:  [116.98 MiB/s 117.27 MiB/s 117.52 MiB/s]
                 change:
time: [-2.8892% -2.6305% -2.3438%] (p = 0.00 < 0.05)
                        thrpt:  [+2.4000% +2.7015% +2.9751%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  7 (7.00%) high severe
clip_point/65536        time:   [1.6531 ms 1.6586 ms 1.6640 ms]
                        thrpt:  [37.560 MiB/s 37.683 MiB/s 37.808 MiB/s]
                 change:
time: [-6.6381% -5.9395% -5.2680%] (p = 0.00 < 0.05)
                        thrpt:  [+5.5610% +6.3146% +7.1100%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  4 (4.00%) high severe

point_to_offset/4096    time:   [11.586 µs 11.603 µs 11.621 µs]
                        thrpt:  [336.15 MiB/s 336.67 MiB/s 337.16 MiB/s]
                 change:
time: [-14.289% -14.111% -13.939%] (p = 0.00 < 0.05)
                        thrpt:  [+16.197% +16.429% +16.672%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) low severe
  5 (5.00%) low mild
  4 (4.00%) high mild
point_to_offset/65536   time:   [527.74 µs 532.08 µs 536.51 µs]
                        thrpt:  [116.49 MiB/s 117.46 MiB/s 118.43 MiB/s]
                 change:
time: [-6.7825% -4.6235% -2.3533%] (p = 0.00 < 0.05)
                        thrpt:  [+2.4100% +4.8477% +7.2760%]
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

cursor/4096             time:   [16.154 µs 16.192 µs 16.232 µs]
                        thrpt:  [240.66 MiB/s 241.24 MiB/s 241.81 MiB/s]
                 change:
time: [-3.2536% -2.9145% -2.5526%] (p = 0.00 < 0.05)
                        thrpt:  [+2.6194% +3.0019% +3.3630%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe
cursor/65536            time:   [509.60 µs 511.24 µs 512.93 µs]
                        thrpt:  [121.85 MiB/s 122.25 MiB/s 122.65 MiB/s]
                 change:
time: [-7.3677% -6.6017% -5.7840%] (p = 0.00 < 0.05)
                        thrpt:  [+6.1391% +7.0683% +7.9537%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
```
Release Notes:

- N/A *or* Added/Fixed/Improved ...
2025-09-24 14:35:38 +02:00
Nathan Sobo
1ae326432e Extract a scheduler crate from GPUI to enable unified integration testing of client and server code (#37326)
Extracts and cleans up GPUI's scheduler code into a new `scheduler`
crate, making it pluggable by external runtimes. This will enable
deterministic integration testing with cloud components by providing a
unified test scheduler across Zed and backend code. In Zed, it will
replace the existing GPUI scheduler for consistent async task management
across platforms.

## Changes

- **Core Implementation**: `TestScheduler` with seed-based
randomization, session tracking (`SessionId`), and foreground/background
task separation for reproducible testing.
- **Executors**: `ForegroundExecutor` (!Send, thread-local) and
`BackgroundExecutor` (Send, with blocking/timeout support) as
GPUI-compatible wrappers.
- **Clock and Timer**: Controllable `TestClock` and future-based `Timer`
for time-sensitive tests.
- **Testing APIs**: `once()`, `with_seed()`, and `many()` methods for
configurable test runs.
- **Dependencies**: Added `async-task`, `chrono`, `futures`, etc., with
updates to `Cargo.toml` and lock file.

## Benefits

- **Integration Testing**: Facilitates reliable async tests involving
cloud sessions, reducing flakiness via deterministic execution.
- **Pluggability**: Trait-based design (`Scheduler`) allows easy
integration into non-GPUI runtimes while maintaining GPUI compatibility.
- **Cleanup**: Refactors GPUI scheduler logic for clarity, correctness
(no `unwrap()`, proper error handling), and extensibility.

Follows Rust guidelines; run `./script/clippy` for verification.

- [x] Define and test a core scheduler that we think can power our cloud
code and GPUI
- [ ] Replace GPUI's scheduler


Release Notes:

- N/A

---------

Co-authored-by: Antonio Scandurra <me@as-cii.com>
2025-09-04 17:14:53 +02:00
Piotr Osiewicz
dc64ec9cc8 chore: Bump Rust edition to 2024 (#27800)
Follow-up to https://github.com/zed-industries/zed/pull/27791

Release Notes:

- N/A
2025-03-31 20:55:27 +02:00
Antonio Scandurra
4431ef1870 Speed up point translation in the Rope (#19913)
This pull request introduces an index of Unicode codepoints, newlines
and UTF-16 codepoints.

Benchmarks worth a thousand words:

```
push/4096               time:   [467.06 µs 470.07 µs 473.24 µs]
                        thrpt:  [8.2543 MiB/s 8.3100 MiB/s 8.3635 MiB/s]
                 change:
                        time:   [-4.1462% -3.0990% -2.0527%] (p = 0.00 < 0.05)
                        thrpt:  [+2.0957% +3.1981% +4.3255%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
push/65536              time:   [1.4650 ms 1.4796 ms 1.4922 ms]
                        thrpt:  [41.885 MiB/s 42.242 MiB/s 42.664 MiB/s]
                 change:
                        time:   [-3.2871% -2.3489% -1.4555%] (p = 0.00 < 0.05)
                        thrpt:  [+1.4770% +2.4054% +3.3988%]
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low severe
  3 (3.00%) low mild

append/4096             time:   [729.00 ns 730.57 ns 732.14 ns]
                        thrpt:  [5.2103 GiB/s 5.2215 GiB/s 5.2327 GiB/s]
                 change:
                        time:   [-81.884% -81.836% -81.790%] (p = 0.00 < 0.05)
                        thrpt:  [+449.16% +450.53% +452.01%]
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) low mild
  6 (6.00%) high mild
  2 (2.00%) high severe
append/65536            time:   [504.44 ns 505.58 ns 506.77 ns]
                        thrpt:  [120.44 GiB/s 120.72 GiB/s 121.00 GiB/s]
                 change:
                        time:   [-94.833% -94.807% -94.782%] (p = 0.00 < 0.05)
                        thrpt:  [+1816.3% +1825.8% +1835.5%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

slice/4096              time:   [29.661 µs 29.733 µs 29.816 µs]
                        thrpt:  [131.01 MiB/s 131.38 MiB/s 131.70 MiB/s]
                 change:
                        time:   [-48.833% -48.533% -48.230%] (p = 0.00 < 0.05)
                        thrpt:  [+93.161% +94.298% +95.440%]
                        Performance has improved.
slice/65536             time:   [588.00 µs 590.22 µs 592.17 µs]
                        thrpt:  [105.54 MiB/s 105.89 MiB/s 106.29 MiB/s]
                 change:
                        time:   [-45.599% -45.347% -45.099%] (p = 0.00 < 0.05)
                        thrpt:  [+82.147% +82.971% +83.821%]
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low severe
  1 (1.00%) high mild

bytes_in_range/4096     time:   [3.8630 µs 3.8811 µs 3.8994 µs]
                        thrpt:  [1001.8 MiB/s 1006.5 MiB/s 1011.2 MiB/s]
                 change:
                        time:   [+0.0600% +0.6000% +1.1833%] (p = 0.03 < 0.05)
                        thrpt:  [-1.1695% -0.5964% -0.0600%]
                        Change within noise threshold.
bytes_in_range/65536    time:   [98.178 µs 98.545 µs 98.931 µs]
                        thrpt:  [631.75 MiB/s 634.23 MiB/s 636.60 MiB/s]
                 change:
                        time:   [-0.6513% +0.7537% +2.2265%] (p = 0.30 > 0.05)
                        thrpt:  [-2.1780% -0.7481% +0.6555%]
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  8 (8.00%) high mild
  3 (3.00%) high severe

chars/4096              time:   [878.91 ns 879.45 ns 880.06 ns]
                        thrpt:  [4.3346 GiB/s 4.3376 GiB/s 4.3403 GiB/s]
                 change:
                        time:   [+9.1679% +9.4000% +9.6304%] (p = 0.00 < 0.05)
                        thrpt:  [-8.7844% -8.5923% -8.3979%]
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe
chars/65536             time:   [15.615 µs 15.691 µs 15.757 µs]
                        thrpt:  [3.8735 GiB/s 3.8899 GiB/s 3.9087 GiB/s]
                 change:
                        time:   [+5.4902% +5.9345% +6.4044%] (p = 0.00 < 0.05)
                        thrpt:  [-6.0190% -5.6021% -5.2045%]
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) low mild

clip_point/4096         time:   [29.677 µs 29.835 µs 30.019 µs]
                        thrpt:  [130.13 MiB/s 130.93 MiB/s 131.63 MiB/s]
                 change:
                        time:   [-46.306% -45.866% -45.436%] (p = 0.00 < 0.05)
                        thrpt:  [+83.272% +84.728% +86.240%]
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe
clip_point/65536        time:   [1.5933 ms 1.6116 ms 1.6311 ms]
                        thrpt:  [38.318 MiB/s 38.782 MiB/s 39.226 MiB/s]
                 change:
                        time:   [-30.388% -29.598% -28.717%] (p = 0.00 < 0.05)
                        thrpt:  [+40.286% +42.040% +43.653%]
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild


running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 7 filtered out; finished in 0.00s

point_to_offset/4096    time:   [14.493 µs 14.591 µs 14.707 µs]
                        thrpt:  [265.61 MiB/s 267.72 MiB/s 269.52 MiB/s]
                 change:
                        time:   [-71.990% -71.787% -71.588%] (p = 0.00 < 0.05)
                        thrpt:  [+251.96% +254.45% +257.01%]
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe
point_to_offset/65536   time:   [700.72 µs 713.75 µs 727.26 µs]
                        thrpt:  [85.939 MiB/s 87.566 MiB/s 89.194 MiB/s]
                 change:
                        time:   [-61.778% -61.015% -60.256%] (p = 0.00 < 0.05)
                        thrpt:  [+151.61% +156.51% +161.63%]
                        Performance has improved.
```

Calling `Rope::chars` got slightly slower but I don't think it's a big
issue (we don't really call `chars` for an entire `Rope`).

In a future pull request, I want to use the tab index (which we're not
yet using) and the char index to make `TabMap` a lot faster.

Release Notes:

- N/A
2024-10-30 10:59:03 +01:00
张小白
fdddbfc179 Fix caret movement issue for some special characters (#10198)
Currently in Zed, certain characters require pressing the key twice to
move the caret through that character. For example: "❤️" and "y̆".

The reason for this is as follows:

Currently, Zed uses `chars` to distinguish different characters, and
calling `chars` on `y̆` will yield two `char` values: `y` and `\u{306}`,
and calling `chars` on `❤️` will yield two `char` values: `❤` and
`\u{fe0f}`.

Therefore, consider the following scenario (where ^ represents the
caret):

- what we see: ❤️ ^
- the actual buffer: ❤ \u{fe0f} ^

After pressing the left arrow key once:

- what we see: ❤️ ^
- the actual buffer: ❤ ^ \u{fe0f}

After pressing the left arrow key again:
- what we see: ^ ❤️
- the actual buffer: ^ ❤ \u{fe0f}

Thus, two left arrow key presses are needed to move the caret, and this
PR fixes this bug (or this is actually a feature?).

I have tried to keep the scope of code modifications as minimal as
possible. In this PR, Zed handles such characters as follows:

- what we see: ❤️ ^
- the actual buffer: ❤ \u{fe0f} ^

After pressing the left arrow key once:

- what we see: ^ ❤️
- the actual buffer: ^ ❤ \u{fe0f}

Or after pressing the delete key:

- what we see: ^
- the actual buffer: ^

Please note that currently, different platforms and software handle
these special characters differently, and even the same software may
handle these characters differently in different situations. For
example, in my testing on Chrome on macOS, GitHub treats `y̆` as a
single character, just like in this PR; however, in Rust Playground,
`y̆` is treated as two characters, and pressing the delete key does not
delete the entire `y̆` character, but instead deletes `\u{306}` to yield
the character `y`. And they both treat `❤️` as a single character,
pressing the delete key will delete the entire `❤️` character.

This PR is based on the principle of making changes with the smallest
impact on the code, and I think that deleting the entire character with
the delete key is more intuitive.

Release Notes:

- Fix caret movement issue for some special characters

---------

Co-authored-by: Conrad Irwin <conrad.irwin@gmail.com>
Co-authored-by: Thorsten <thorsten@zed.dev>
Co-authored-by: Bennet <bennetbo@gmx.de>
2024-04-10 13:01:25 -06:00
Thorsten Ball
0533923f91 Reduce memory usage to represent buffers by up to 50% (#10321)
This should help with some of the memory problems reported in
https://github.com/zed-industries/zed/issues/8436, especially the ones
related to large files (see:
https://github.com/zed-industries/zed/issues/8436#issuecomment2037442695),
by **reducing the memory required to represent a buffer in Zed by
~50%.**

### How?

Zed's memory consumption is dominated by the in-memory representation of
buffer contents.

On the lowest level, the buffer is represented as a
[Rope](https://en.wikipedia.org/wiki/Rope_(data_structure)) and that's
where the most memory is used. The layers above — buffer, syntax map,
fold map, display map, ... — basically use "no memory" compared to the
Rope.

Zed's `Rope` data structure is itself implemented as [a `SumTree` of
`Chunks`](8205c52d2b/crates/rope/src/rope.rs (L35-L38)).

An important constant at play here is `CHUNK_BASE`:

`CHUNK_BASE` is the maximum length of a single text `Chunk` in the
`SumTree` underlying the `Rope`. In other words: It determines into how
many pieces a given buffer is split up.

By changing `CHUNK_BASE` we can adjust the level of granularity
withwhich we index a given piece of text. Theoretical maximum is the
length of the text, theoretical minimum is 1. Sweet spot is somewhere
inbetween, where memory use and performance of write & read access are
optimal.

We started with `16` as the `CHUNK_BASE`, but that wasn't the result of
extensive benchmarks, more the first reasonable number that came to
mind.

### What

This changes `CHUNK_BASE` from `16` to `64`. That reduces the memory
usage, trading it in for slight reduction in performance in certain
benchmarks.

### Benchmarks

I added a benchmark suite for `Rope` to determine whether we'd regress
in performance as `CHUNK_BASE` goes up. I went from `16` to `32` and
then to `64`. While `32` increased performance and reduced memory usage,
`64` had one slight drop in performance, increases in other benchmarks
and substantial memory savings.

| `CHUNK_BASE` from `16` to `32` | `CHUNK_BASE` from `16` to `64` |
|-------------------|--------------------|
|
![chunk_base_16_to_32](https://github.com/zed-industries/zed/assets/1185253/fcf1f9c6-4f43-4e44-8ef5-29c1e5d8e2b9)
|
![chunk_base_16_to_64](https://github.com/zed-industries/zed/assets/1185253/d82a0478-eeef-43d0-9240-e0aa9df8d946)
|

### Real World Results

We tested this by loading a 138 MB `*.tex` file (parsed as plain text)
into Zed and measuring in `Instruments.app` the allocation.

#### standard allocator
Before, with `CHUNK_BASE: 16`, the memory usage was ~827MB after loading
the buffer.

| `CHUNK_BASE: 16` |
|---------------------|
|
![memory_consumption_chunk_base_16_std_alloc](https://github.com/zed-industries/zed/assets/1185253/c1e04c34-7d1a-49fa-bb3c-6ad10aec6e26)
|


After, with `CHUNK_BASE: 64`, the memory usage was ~396MB after loading
the buffer.

| `CHUNK_BASE: 64` |
|---------------------|
|
![memory_consumption_chunk_base_64_std_alloc](https://github.com/zed-industries/zed/assets/1185253/c728e134-1846-467f-b20f-114a582c7b5a)
|


#### `mimalloc`

`MiMalloc` by default and that seems to be pretty aggressive when it
comes to growing memory. Whereas the std allocator would go up to
~800mb, MiMalloc would jump straight to 1024MB.

I also can't get `MiMalloc` to work properly with `Instruments.app` (it
always shows 15MB of memory usage) so I had to use these `Activity
Monitor` screenshots:

| `CHUNK_BASE: 16` |
|---------------------|
|
![memory_consumption_chunk_base_16_mimalloc](https://github.com/zed-industries/zed/assets/1185253/1e6e05e9-80c2-4ec7-9b0e-8a6fa78836eb)
|

| `CHUNK_BASE: 64` |
|---------------------|
|
![memory_consumption_chunk_base_64_mimalloc](https://github.com/zed-industries/zed/assets/1185253/8a47e982-a675-4db0-b690-d60f1ff9acc8)
|

### Release Notes

Release Notes:

- Reduced memory usage for files by up to 50%.

---------

Co-authored-by: Antonio <antonio@zed.dev>
2024-04-09 18:07:53 +02:00