Remove temporary planning docs, consolidate into scheduler integration doc

- Remove .rules, PLAN.md, STATUS.md (temporary working documents)
- Remove docs/scheduler_integration_regression_analysis.md
- Consolidate valuable lessons learned into crates/scheduler/full_integration_plan.md:
  - Never cache Entity<T> in process-wide statics
  - block_with_timeout behavior depends on tick budget
  - Realtime priority must panic in tests
Nathan Sobo
2025-12-15 11:06:57 -07:00
parent 96996950f4
commit b245db699a
5 changed files with 135 additions and 1238 deletions

.rules

@@ -1,162 +0,0 @@
# Rust coding guidelines
* Prioritize code correctness and clarity. Speed and efficiency are secondary priorities unless otherwise specified.
* Do not write organizational comments or comments that merely summarize the code. Comments should only be written to explain "why" the code is written a certain way, in cases where the reason is tricky or non-obvious.
* Prefer implementing functionality in existing files unless it is a new logical component. Avoid creating many small files.
* Avoid using functions that panic like `unwrap()`, instead use mechanisms like `?` to propagate errors.
* Be careful with operations like indexing which may panic if the indexes are out of bounds.
* Never silently discard errors with `let _ =` on fallible operations. Always handle errors appropriately:
- Propagate errors with `?` when the calling function should handle them
- Use `.log_err()` or similar when you need to ignore errors but want visibility
- Use explicit error handling with `match` or `if let Err(...)` when you need custom logic
- Example: avoid `let _ = client.request(...).await?;` - use `client.request(...).await?;` instead
* When implementing async operations that may fail, ensure errors propagate to the UI layer so users get meaningful feedback.
* Never create files with `mod.rs` paths - prefer `src/some_module.rs` instead of `src/some_module/mod.rs`.
* When creating new crates, prefer specifying the library root path in `Cargo.toml` using `[lib] path = "...rs"` instead of the default `lib.rs`, to maintain consistent and descriptive naming (e.g., `gpui.rs` or `main.rs`).
* Avoid creative additions unless explicitly requested
* Use full words for variable names (no abbreviations like "q" for "queue")
* Use variable shadowing to scope clones in async contexts for clarity, minimizing the lifetime of borrowed references.
Example:
```rust
executor.spawn({
let task_ran = task_ran.clone();
async move {
*task_ran.borrow_mut() = true;
}
});
```
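The error-handling guidance above can be illustrated with a small self-contained sketch (the function name is made up for the example):

```rust
use std::num::ParseIntError;

// Propagate failures with `?` so the caller decides what to do with them.
fn parse_port(input: &str) -> Result<u16, ParseIntError> {
    let port = input.trim().parse::<u16>()?;
    Ok(port)
}

fn main() {
    // Handle the error explicitly instead of discarding it with `let _ =`.
    match parse_port("8080") {
        Ok(port) => println!("port = {port}"),
        Err(error) => eprintln!("invalid port: {error}"),
    }
}
```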
# GPUI
GPUI is a UI framework which also provides primitives for state and concurrency management.
## Context
Context types allow interaction with global state, windows, entities, and system services. They are typically passed to functions as the argument named `cx`. When a function takes callbacks they come after the `cx` parameter.
* `App` is the root context type, providing access to global state and read and update of entities.
* `Context<T>` is provided when updating an `Entity<T>`. This context dereferences into `App`, so functions which take `&App` can also take `&Context<T>`.
* `AsyncApp` and `AsyncWindowContext` are provided by `cx.spawn` and `cx.spawn_in`. These can be held across await points.
## `Window`
`Window` provides access to the state of an application window. It is passed to functions as an argument named `window` and comes before `cx` when present. It is used for managing focus, dispatching actions, directly drawing, getting user input state, etc.
## Entities
An `Entity<T>` is a handle to state of type `T`. With `thing: Entity<T>`:
* `thing.entity_id()` returns `EntityId`
* `thing.downgrade()` returns `WeakEntity<T>`
* `thing.read(cx: &App)` returns `&T`.
* `thing.read_with(cx, |thing: &T, cx: &App| ...)` returns the closure's return value.
* `thing.update(cx, |thing: &mut T, cx: &mut Context<T>| ...)` allows the closure to mutate the state, and provides a `Context<T>` for interacting with the entity. It returns the closure's return value.
* `thing.update_in(cx, |thing: &mut T, window: &mut Window, cx: &mut Context<T>| ...)` takes an `AsyncWindowContext` or `VisualTestContext`. It's the same as `update` while also providing the `Window`.
Within the closures, the inner `cx` provided to the closure must be used instead of the outer `cx` to avoid issues with multiple borrows.
Trying to update an entity while it's already being updated must be avoided as this will cause a panic.
When `read_with`, `update`, or `update_in` are used with an async context, the closure's return value is wrapped in an `anyhow::Result`.
`WeakEntity<T>` is a weak handle. It has `read_with`, `update`, and `update_in` methods that work the same, but always return an `anyhow::Result` so that they can fail if the entity no longer exists. This can be useful to avoid memory leaks - if entities have mutually recursive handles to each other they will never be dropped.
## Concurrency
All use of entities and UI rendering occurs on a single foreground thread.
`cx.spawn(async move |cx| ...)` runs an async closure on the foreground thread. Within the closure, `cx` is an async context like `AsyncApp` or `AsyncWindowContext`.
When the outer cx is a `Context<T>`, the use of `spawn` instead looks like `cx.spawn(async move |handle, cx| ...)`, where `handle: WeakEntity<T>`.
To do work on other threads, `cx.background_spawn(async move { ... })` is used. Often this background task is awaited on by a foreground task which uses the results to update state.
Both `cx.spawn` and `cx.background_spawn` return a `Task<R>`, which is a future that can be awaited upon. If this task is dropped, then its work is cancelled. To prevent this one of the following must be done:
* Awaiting the task in some other async context.
* Detaching the task via `task.detach()` or `task.detach_and_log_err(cx)`, allowing it to run indefinitely.
* Storing the task in a field, if the work should be halted when the struct is dropped.
A task which doesn't do anything but provide a value can be created with `Task::ready(value)`.
## Elements
The `Render` trait is used to render some state into an element tree that is laid out using flexbox layout. An `Entity<T>` where `T` implements `Render` is sometimes called a "view".
Example:
```rust
struct TextWithBorder(SharedString);
impl Render for TextWithBorder {
fn render(&mut self, _window: &mut Window, _cx: &mut Context<Self>) -> impl IntoElement {
div().border_1().child(self.0.clone())
}
}
```
Since `impl IntoElement for SharedString` exists, it can be used as an argument to `child`. `SharedString` is used to avoid copying strings, and is either an `&'static str` or `Arc<str>`.
UI components that are constructed just to be turned into elements can instead implement the `RenderOnce` trait, which is similar to `Render`, but its `render` method takes ownership of `self`. Types that implement this trait can use `#[derive(IntoElement)]` to use them directly as children.
The style methods on elements are similar to those used by Tailwind CSS.
If some attributes or children of an element tree are conditional, `.when(condition, |this| ...)` can be used to run the closure only when `condition` is true. Similarly, `.when_some(option, |this, value| ...)` runs the closure when the `Option` has a value.
## Input events
Input event handlers can be registered on an element via methods like `.on_click(|event, window, cx: &mut App| ...)`.
Often event handlers will want to update the entity that's in the current `Context<T>`. The `cx.listener` method provides this - its use looks like `.on_click(cx.listener(|this: &mut T, event, window, cx: &mut Context<T>| ...))`.
## Actions
Actions are dispatched via user keyboard interaction or in code via `window.dispatch_action(SomeAction.boxed_clone(), cx)` or `focus_handle.dispatch_action(&SomeAction, window, cx)`.
Actions with no data are defined with the `actions!(some_namespace, [SomeAction, AnotherAction])` macro. Otherwise the `Action` derive macro is used. Doc comments on actions are displayed to the user.
Action handlers can be registered on an element via the event handler `.on_action(|action, window, cx| ...)`. Like other event handlers, this is often used with `cx.listener`.
## Notify
When a view's state has changed in a way that may affect its rendering, it should call `cx.notify()`. This will cause the view to be rerendered. It will also cause any observe callbacks registered for the entity with `cx.observe` to be called.
## Entity events
While updating an entity (`cx: Context<T>`), it can emit an event using `cx.emit(event)`. Entities register which events they can emit by declaring `impl EventEmitter<EventType> for EntityType {}`.
Other entities can then register a callback to handle these events by doing `cx.subscribe(other_entity, |this, other_entity, event, cx| ...)`. This will return a `Subscription` which deregisters the callback when dropped. Typically `cx.subscribe` happens when creating a new entity and the subscriptions are stored in a `_subscriptions: Vec<Subscription>` field.
## Recent API changes
GPUI has had some changes to its APIs. Always write code using the new APIs:
* `spawn` methods now take async closures (`AsyncFn`), and so should be called like `cx.spawn(async move |cx| ...)`.
* Use `Entity<T>`. This replaces `Model<T>` and `View<T>` which no longer exist and should NEVER be used.
* Use `App` references. This replaces `AppContext` which no longer exists and should NEVER be used.
* Use `Context<T>` references. This replaces `ModelContext<T>` which no longer exists and should NEVER be used.
* `Window` is now passed around explicitly. The new interface adds a `Window` reference parameter to some methods, and adds some new "*_in" methods for plumbing `Window`. The old types `WindowContext` and `ViewContext<T>` should NEVER be used.
## General guidelines
- Use `./script/clippy` instead of `cargo clippy`
## Progress-first mode
By default, prioritize forward progress over waiting for confirmation. The following guidelines replace the standard uncertainty/deferral behavior:
* **Attempt budget over early deferral**: Make up to 5 distinct attempts to fix diagnostics or solve a problem before stopping to summarize. Only stop earlier if you've exhausted meaningfully different approaches.
* **Assumption-driven progress**: When requirements are ambiguous, proceed with the most reasonable interpretation. State assumptions explicitly in your response, and implement in a way that's easy to adjust (isolated functions, flags, config). Only stop for decisions that would cause breaking changes or require large rewrites.
* **Speculative fixes are allowed** if they are: (a) localized/reversible, (b) accompanied by a test or diagnostic logging where practical, and (c) clearly noted as speculative.
* **Bias toward small, reversible changes**: When uncertain, prefer incremental edits (tests, logging, guards, better error messages) that improve observability before attempting larger refactors.
* **Keep moving unless blocked by high-impact ambiguity**: Only ask questions when there are multiple plausible paths with significantly different outcomes (e.g., architectural choices, breaking API changes, unclear product behavior). For lower-stakes uncertainty, pick the most conservative option and proceed.
* **Summarize at the end**: After completing a task or exhausting the attempt budget, summarize what was done, what assumptions were made, and what (if anything) still needs decision or review.
## Ongoing Work
### Scheduler Integration
GPUI is being updated to use the `scheduler` crate for its async scheduling infrastructure. See `crates/scheduler/plan.md` for the detailed plan. Key points:
- The scheduler crate provides `TestScheduler` for deterministic testing with session isolation
- GPUI wraps scheduler types to maintain API compatibility
- `TestDispatcher` now uses `TestScheduler` internally for timing/clock/rng

PLAN.md

@@ -1,258 +0,0 @@
# PLAN: Reproducing and fixing the extension host LSP test hang
## Goal
Find and fix the root cause of the CI timeouts on macOS/Linux/Windows caused by a test that hangs indefinitely. The primary suspect has been identified as:
- `extension_host::extension_store_test::test_extension_store_with_test_extension`
The immediate hang is that the test awaits a fake LSP server spawn (`fake_servers.next().await`) that never arrives.
This plan is written to be executed iteratively until the hang is eliminated and CI completes within its 60-minute job timeout.
---
## High-signal symptom
Running the workspace tests (as CI does) shows one test running forever:
- `extension_host extension_store_test::test_extension_store_with_test_extension`
When isolated and run with `--nocapture`, it stalls after:
- dev extension install completes
- buffer is opened with LSP
- the extension successfully calls `latest_github_release` and `download_file` and extraction completes
- then it blocks forever awaiting the first fake LSP server: `fake_servers.next().await`
---
## Known relevant code paths
### The test hang point
In `crates/extension_host/src/extension_store_test.rs`, the hang is at:
- `let fake_server = fake_servers.next().await.unwrap();`
This receiver gets a value only when the fake server is pushed to the channel.
### Fake server emission gate
In `crates/language/src/language_registry.rs`, `create_fake_language_server` creates the fake server but only sends it to the channel **after** it receives `lsp::notification::Initialized`:
- `fake_server.try_receive_notification::<lsp::notification::Initialized>().await.is_some()`
- then `tx.unbounded_send(fake_server.clone())`
If `Initialized` never arrives, the channel never yields a server, and the test hangs forever.
### Fake server creation
`crates/project/src/lsp_store.rs` starts language servers and, in tests, attempts to create a fake server in `start_language_server`:
- it awaits `get_language_server_binary`
- then calls `languages.create_fake_language_server(...)`
- if it returns `Some(server)`, the fake server should be used instead of spawning a real process
If `create_fake_language_server` returns `None`, or if it returns `Some` but never emits the fake server to the channel (due to waiting on `Initialized` forever), the test will stall.
---
## Reproduction commands (iterate fast)
### 0) Ensure enough disk
The workspace build can consume >200GiB. If disk is full, tests won't run reliably.
- Check disk usage (system / repo)
- If needed, run `cargo clean` to free space
- Prefer keeping logs in `/tmp` to avoid growing the repo
### 1) Run only the suspect test with a hard timeout and logs
Use a short outer timeout wrapper (macOS doesn't ship `timeout` by default). Prefer 10-15 seconds once the build is warm.
Run:
- `cargo test -p extension_host extension_store_test::test_extension_store_with_test_extension -- --nocapture --test-threads=1`
Environment:
- `RUST_LOG=info` (or `debug` if needed)
Capture output to a log file under `/tmp/ci-hang/…`.
### 2) Confirm the hang is still at `fake_servers.next().await`
The log should show progress markers like:
- install_dev_extension completed
- opened buffer with LSP
- awaiting first fake LSP server spawn
- then it stops
If it no longer hangs there, update this plan to the new hang point and continue.
---
## Instrumentation (keep it minimal and high-signal)
Add logs that pin down whether `Initialized` ever arrives.
### A) LanguageRegistry: log fake-server lifecycle
In `crates/language/src/language_registry.rs` inside `create_fake_language_server`, add logs:
- created fake server, waiting for Initialized
- background task started
- received Initialized (then sending to channel)
- did NOT receive Initialized (never sending)
This tells whether the hang is because `Initialized` never occurs.
### B) Extension host (already useful to keep)
In `crates/extension_host/src/wasm_host/wit/since_v0_8_0.rs`, log:
- `latest_github_release` begin and success (repo/version/assets)
- `download_file` begin/request/status/done
- capability grant decision for download_file
These confirm the hang is not from download/network.
### C) The test: add progress markers around critical awaits
In `crates/extension_host/src/extension_store_test.rs`, log:
- before/after `install_dev_extension`
- before/after registering fake LSP server
- before/after `open_local_buffer_with_lsp`
- immediately before awaiting `fake_servers.next().await`
- immediately after receiving the fake server
Keep these logs even after fixing; they prevent regressions.
---
## Primary hypotheses to validate (in order)
### Hypothesis 1 (most likely): `Initialized` notification never arrives for the fake server
Evidence expected:
- LanguageRegistry logs show “created fake server … waiting for Initialized”
- but never log “received Initialized … sending”
- so fake server is never pushed to the receiver
Next steps if confirmed:
1. Identify why `Initialized` never arrives:
- Is `LanguageServer::initialize(...)` not being called on the fake server?
- Is it being called but not emitting `Initialized` to the fake server's notification stream?
- Is it blocked behind tasks that never make progress (scheduler/foreground/background starvation)?
2. Locate where the host or LSP layer emits `Initialized` for fake servers:
- Determine whether `Initialized` is expected to be a client->server notification, a server->host notification, or synthetic for fakes.
- Confirm the direction matches `try_receive_notification::<Initialized>()` usage.
3. Fix either:
- Ensure fake servers receive (or synthesize) `Initialized` reliably in tests, OR
- Stop gating “yield fake server to tests” on `Initialized`, if that guarantee is unnecessary for correctness.
### Hypothesis 2: The fake server is created, but the background task that waits for `Initialized` never runs
Evidence expected:
- Log shows “created fake server …”
- but not even “background task started …”
- or it starts but never progresses beyond awaiting notification
This suggests executor scheduling starvation or a deadlock.
Next steps:
- Confirm the background executor is ticking during the test.
- Add a small “heartbeat” runnable scheduled on background executor to verify the scheduler keeps making progress.
- Inspect any recent changes in scheduler integration that alter how test executors drive background/foreground work.
### Hypothesis 3: `create_fake_language_server` returns `None` due to missing fake server entry
Evidence expected:
- No LanguageRegistry logs from `create_fake_language_server`
- or add a log just before the `?` that fetches `fake_server_entries.get_mut(name)` to see if it's missing
Next steps:
- Ensure fake server registration happens before starting the language server.
- Confirm server name matches exactly (`LanguageServerName("gleam")`).
- Confirm `LanguageRegistry` used by the project matches the one you registered with (no duplicate registries).
---
## Fix strategy (do not stop at “it times out less”)
### Step 1: Make the hang impossible
Short-term safety fix (acceptable if it does not weaken correctness materially):
- Modify `create_fake_language_server` so it sends the fake server into the channel **without waiting** for `Initialized`, OR it sends with a bounded wait (e.g. send immediately, and also log if Initialized never arrives).
Rationale:
- The CI hang is catastrophic (60-minute timeout across OSes).
- Most tests need the fake server handle to set request handlers and inspect binary paths; they can tolerate receiving the fake server before it's “initialized”.
If you do this, add a comment explaining why the `Initialized` gate was removed or bounded in test mode.
### Step 2: Determine the real underlying bug
Even if the safety fix unblocks CI, keep digging:
- Why isn't `Initialized` being observed?
- Is it a protocol direction mistake?
- Is initialization not occurring due to scheduler integration changes?
- Is there a deadlock between foreground and background execution when tests block?
### Step 3: Restore correctness (if needed)
If removing the gate causes flaky behavior:
- Replace unconditional send with:
- send immediately to allow tests to proceed
- separately log/assert that `Initialized` arrives eventually, with a bounded timeout under test support
This maintains correctness checks without hanging indefinitely.
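A plain-channel sketch of this pattern (std types standing in for the actual registry and LSP machinery): hand the value to the receiver immediately, then check the follow-up signal with a bounded wait instead of gating the send on it.

```rust
use std::sync::mpsc;
use std::time::Duration;

fn main() {
    let (server_tx, server_rx) = mpsc::channel::<&str>();

    // Send the fake server handle immediately so the test can proceed.
    server_tx.send("fake server").unwrap();
    assert_eq!(server_rx.recv().unwrap(), "fake server");

    // Separately, wait for the "Initialized" signal with a bounded timeout,
    // logging instead of hanging forever if it never arrives.
    let (_init_tx, init_rx) = mpsc::channel::<()>();
    match init_rx.recv_timeout(Duration::from_millis(10)) {
        Ok(()) => println!("Initialized arrived"),
        Err(_) => println!("Initialized never arrived (logged, not hung)"),
    }
}
```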
---
## Validation matrix (must pass)
### Local
1. Run the single test:
- `cargo test -p extension_host extension_store_test::test_extension_store_with_test_extension -- --nocapture --test-threads=1`
- It must complete quickly (<30s cold, <10s warm).
2. Run extension_host crate tests:
- `cargo test -p extension_host -- --test-threads=1`
- Ensure no new hangs.
3. Run nextest for key suites:
- `cargo nextest run -p extension_host -j 1 --no-fail-fast`
- `cargo nextest run --workspace -j 1 --no-fail-fast` (expect longer, but must proceed past the prior hang)
### CI
- Verify that `run_tests_{mac,linux,windows}` no longer hit the 60-minute job timeout.
- The previously hanging test must complete, and the workflow should finish.
---
## Operational notes for fast iteration
- Prefer `-j 1` for reproducibility while isolating the hang.
- Keep hard timeouts short (10-15s once the build is warm) and capture logs to `/tmp/ci-hang/`.
- After each change:
- re-run the single test first
- only then widen scope (crate → workspace)
- Avoid speculative refactors; only change what reduces the hang risk or improves observability.
---
## Deliverables (when solved)
- A code fix that prevents the hang and restores deterministic progress
- Updated logs removed or reduced (keep only those that are useful long-term)
- CI green on macOS/Linux/Windows without timeout
- A brief note in the PR description explaining:
- what caused the hang
- why the fix is correct
- how it was validated

STATUS.md

@@ -1,212 +0,0 @@
# STATUS — Scheduler integration (current branch state)
Branch: `scheduler-integration`
PR: https://github.com/zed-industries/zed/pull/44810
This file is intended to capture what changed on this branch, what we learned while validating it, and what decisions were made to reduce risk. It should help reviewers and future work pick up the thread quickly.
---
## Summary of major changes (since main)
### 1) Scheduler integration (core)
GPUI's async execution has been unified around the `scheduler` crate:
- GPUI background/foreground executors delegate task scheduling, blocking, and timers through the scheduler abstraction.
- `TestScheduler` is used in tests to provide deterministic scheduling and deterministic time control.
- `PlatformScheduler` is used in production builds to route scheduling through platform dispatchers.
This branch touched high-risk areas:
- task scheduling and wakeup behavior
- timer semantics
- test determinism and ordering
---
## Testing-driven fixes and what we learned
### A) Do not cache `gpui::Entity<T>` in process-wide statics (OnceLock/static)
We reproduced and fixed a real failure mode where a process-wide cached `Entity<T>` (from one `App` context) was used from a later test `App` context. This can cause:
- “used a entity with the wrong context” panics
- `Option::unwrap()` failures in leak-detection / entity clone paths
- ordering-dependent flakes (whichever test initializes the static first “wins”)
Fix applied: avoid caching `Entity<T>` globally; construct per-`App` instead (store plain configuration data globally if needed).
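A minimal sketch of the fixed pattern (hypothetical names): only plain configuration data lives in the static, and the context-bound object is constructed fresh per context.

```rust
use std::sync::OnceLock;

// Plain data is safe to cache process-wide.
static DEFAULT_NAME: OnceLock<String> = OnceLock::new();

// Stands in for an object tied to a particular `App`/test context.
struct Session {
    name: String,
}

fn new_session() -> Session {
    // Construct the context-bound value fresh each time, from cached plain data.
    let name = DEFAULT_NAME.get_or_init(|| "default".to_string()).clone();
    Session { name }
}

fn main() {
    // Each "test" gets its own Session; no handle leaks across contexts.
    assert_eq!(new_session().name, "default");
    assert_eq!(new_session().name, "default");
    println!("ok");
}
```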
### B) TestScheduler timeouts are not purely duration-based: `timeout_ticks` matters
We learned that in `TestScheduler`, “timeout” behavior depends on an internal tick budget (`timeout_ticks`) when a timeout is present. During the allotted ticks, the scheduler can poll futures and step other tasks. This means:
- a future can sometimes complete “within a timeout” in tests due to scheduler progress, even if you didn't explicitly advance simulated time
- if a test needs deterministic timeout behavior, it must constrain the tick budget
We added/updated a scheduler regression test demonstrating:
- a timed-out future must remain pollable to completion later
- deterministic time advancement should be explicit (e.g. `advance_clock(...)`)
- deterministic timeout behavior can be enforced by setting `timeout_ticks` to `0..=0` for the test
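A toy model (not the real `TestScheduler`) of why the tick budget matters: with a nonzero budget the scheduler may poll the future to completion before reporting a timeout, while a zero budget makes the timeout deterministic.

```rust
// Toy tick-budgeted timeout: returns true if `poll` completes within the budget.
fn run_with_tick_budget(tick_budget: u32, mut poll: impl FnMut() -> bool) -> bool {
    for _ in 0..tick_budget {
        if poll() {
            return true; // completed "within the timeout" via scheduler progress
        }
    }
    false // budget exhausted: deterministic timeout
}

fn main() {
    // Completes on the second poll, so a budget of 5 avoids the timeout...
    let mut polls = 0;
    assert!(run_with_tick_budget(5, || {
        polls += 1;
        polls >= 2
    }));

    // ...while a budget of 0 always times out, even for a ready future.
    assert!(!run_with_tick_budget(0, || true));
    println!("tick budget demo ok");
}
```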
---
## Realtime priority decision (important)
### Realtime priority has been removed (for now)
Even though a realtime-priority implementation historically existed (dedicated OS thread + bounded channel feeding runnables), we removed realtime priority entirely from:
- `scheduler::Priority`
- GPUIs public API surface
- `PlatformDispatcher` trait and its platform implementations (mac/linux/windows/test)
Rationale:
- There were no in-tree call sites using realtime priority.
- The correctness/backpressure semantics are non-trivial for arbitrary futures:
- blocking enqueue risks stalling latency-sensitive threads
- non-blocking enqueue implies dropping runnables under load, which can break correctness for general futures (IO/state machines)
- Rather than ship ambiguous or risky semantics, we removed the API until there is a concrete in-tree use case and an explicitly defined contract.
This should be straightforward to reintroduce later once the semantics are agreed and tested.
CC: the prior realtime implementation on main was introduced in the “Multiple priority scheduler” work by @localcc (from blame).
---
## Current test status (local)
At the time of writing:
- `cargo test -p gpui` passed
- `cargo test -p scheduler` passed
(Other crates were not exhaustively re-run after the final realtime removal, but GPUI + scheduler are clean.)
---
## CRITICAL ISSUE: extension_host test hang (UNRESOLVED)
**Test**: `extension_host::extension_store_test::test_extension_store_with_test_extension`
**Symptom**: Test hangs indefinitely awaiting `fake_servers.next()` for the first fake LSP server spawn. This causes CI timeouts (60min) on all platforms.
### What we've learned
#### 1) The scheduler semantic change broke `run_until_parked`
With the new `TestScheduler`, `tick()` only processes runnable tasks and expired timers but **does not advance time**. This meant:
- `run_until_parked` (which loops `while tick() {}`) would stop when no runnables remain
- but `has_pending_tasks()` would still be true if timers were pending
- tests expecting "drain all progress" semantics would stall
**Fix applied**: `gpui::BackgroundExecutor::run_until_parked` now:
- ticks all runnable work
- calls `advance_clock_to_next_timer()` when no runnables remain
- repeats until no runnables and no timers
- this restores historical "drain everything that can make progress" semantics
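A toy model of the restored drain loop, with counters standing in for the real task and timer queues:

```rust
struct Sim {
    runnables: u32,
    timers: u32,
}

impl Sim {
    // Process one runnable task, if any.
    fn tick(&mut self) -> bool {
        if self.runnables > 0 {
            self.runnables -= 1;
            true
        } else {
            false
        }
    }

    // Advance time to the next timer; an expired timer wakes a task.
    fn advance_clock_to_next_timer(&mut self) -> bool {
        if self.timers > 0 {
            self.timers -= 1;
            self.runnables += 1;
            true
        } else {
            false
        }
    }

    // Drain everything that can make progress: tick until parked, then
    // advance the clock, repeating until no runnables and no timers remain.
    fn run_until_parked(&mut self) {
        loop {
            while self.tick() {}
            if !self.advance_clock_to_next_timer() {
                break;
            }
        }
    }
}

fn main() {
    let mut sim = Sim { runnables: 3, timers: 2 };
    sim.run_until_parked();
    assert_eq!((sim.runnables, sim.timers), (0, 0));
    println!("drained");
}
```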
#### 2) The LSP startup task IS running and calling `create_fake_language_server`
Instrumentation shows:
- `[project::lsp_store][start_language_server] attempting name="gleam"`
- `[project::lsp_store][start_language_server] resolved_binary ...`
- `[project::lsp_store][start_language_server] attempting_fake ...`
- `[project::lsp_store][start_language_server] using_fake ...`
- `[language::language_registry] create_fake_language_server: called name=gleam id=0 generation=0 ...`
So the scheduler IS progressing the task. The fake server IS being created.
#### 3) The fake server registration is NOT being overwritten
Added a `generation: u64` counter to `FakeLanguageServerEntry` to detect overwrites:
- test registers: `generation=0`
- `create_fake_language_server` uses: `generation=0`
- no "overwriting existing fake server registration" warnings appeared
So it's the same entry.
#### 4) But the receiver still gets nothing
The test's `await fake_servers.next()` times out after 10s, even though:
- `create_fake_language_server` is called
- it does `tx.unbounded_send(fake_server.clone())`
This points to: **the sender and receiver are not paired** (different channels).
### Leading hypothesis: Two LanguageRegistry instances
The most likely explanation:
1. Test calls `Project::test(...)` which creates a `LanguageRegistry` internally (`LanguageRegistry::test(cx.executor())`)
2. Test extracts `language_registry = project.languages().clone()` — this is an `Arc<LanguageRegistry>`, so it's a reference to the same registry
3. Test calls `language_registry.register_fake_lsp_server(...)` on that registry, getting a receiver
4. **Somewhere during extension initialization or LSP startup**, a NEW `LanguageRegistry` is created (or the project's registry gets replaced)
5. When `LocalLspStore::start_language_server` calls `this.languages.create_fake_language_server(...)`, it's using a **different** registry instance
6. That different registry has no fake server entries, OR has a different channel pair
Result: sender and receiver are on different channels → send succeeds (or returns None) but receiver never gets the value.
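This failure mode is easy to reproduce with plain channels (std types standing in for the two registries):

```rust
use std::sync::mpsc;

fn main() {
    // Two independent channel pairs, analogous to two LanguageRegistry instances.
    let (registry_a_tx, registry_a_rx) = mpsc::channel::<&str>();
    let (_registry_b_tx, registry_b_rx) = mpsc::channel::<&str>();

    // The "LSP store" sends on registry A's channel...
    registry_a_tx.send("fake server").unwrap();

    // ...but the test is listening on registry B's channel: the send
    // succeeds, yet the receiver the test holds never gets a value.
    assert!(registry_a_rx.try_recv().is_ok());
    assert!(registry_b_rx.try_recv().is_err());
    println!("receiver on the other channel got nothing");
}
```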
### Next steps for investigation
#### STEP 1: Prove/disprove the "two registries" hypothesis
Add a unique identity marker to each `LanguageRegistry` instance:
```rust
// In LanguageRegistry or LanguageRegistryState
#[cfg(any(test, feature = "test-support"))]
registry_id: u64, // assigned from AtomicU64::fetch_add
```
Log it in:
- `LanguageRegistry::new()` / `LanguageRegistry::test()` (when created)
- `register_fake_lsp_server` (when test registers)
- `create_fake_language_server` (when LSP store creates)
- In the test, right after `let language_registry = project.languages().clone()`
If registry IDs differ between registration and creation: **confirmed different instances**.
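The identity marker itself is a one-liner; a runnable sketch of the scheme (simplified, outside any `cfg` gating):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

static NEXT_REGISTRY_ID: AtomicU64 = AtomicU64::new(0);

// Simplified stand-in for LanguageRegistry with the proposed identity marker.
struct Registry {
    registry_id: u64,
}

impl Registry {
    fn new() -> Self {
        Registry {
            registry_id: NEXT_REGISTRY_ID.fetch_add(1, Ordering::Relaxed),
        }
    }
}

fn main() {
    let first = Registry::new();
    let second = Registry::new();
    // Distinct instances always get distinct ids, so mismatched ids in the
    // logs would confirm the "two registries" hypothesis.
    assert_ne!(first.registry_id, second.registry_id);
    println!("ids: {} {}", first.registry_id, second.registry_id);
}
```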
#### STEP 2: If confirmed, trace where the new registry comes from
Likely suspects:
- `language_extension::init(...)` — check if it creates or replaces the language registry
- Extension loading / reloading paths in `ExtensionStore`
- Project initialization race: does something async replace `project.languages` after `Project::test` returns?
Key code paths to audit:
- `crates/language_extension/src/language_extension.rs` — does `init` create a new registry?
- `crates/extension_host/src/extension_store.rs` — does extension loading mutate/replace the language registry?
- `crates/project/src/lsp_store.rs` — is `this.languages` actually a stable reference to the project's registry?
#### STEP 3: Fix options (once root cause is known)
**Option A**: If extension init replaces the registry:
- Ensure test uses the registry **after** extension init, not before
- Or ensure extension init mutates the existing registry instead of replacing it
**Option B**: If LSP store has a stale/cloned registry:
- Ensure `LspStore` holds an `Arc<LanguageRegistry>` that points to the same instance the test registered with
- Avoid cloning registries; always pass `Arc<LanguageRegistry>` by reference
**Option C**: If the issue is test ordering:
- Move `register_fake_lsp_server` to happen AFTER all initialization (extension load, language_extension::init, etc.)
- Currently test does: register → install_dev_extension → open_buffer
- Try: install_dev_extension → register → open_buffer
### Files with instrumentation added (can be cleaned up after fix)
- `crates/gpui/src/executor.rs``GPUI_RUN_UNTIL_PARKED_LOG=1` logging (keep minimal version)
- `crates/language/src/language_registry.rs` — generation counter, registration/creation logs
- `crates/project/src/lsp_store.rs` — LSP startup detailed logs
- `crates/extension_host/src/extension_store_test.rs` — per-await timeouts, progress markers
### Reference logs
Detailed logs captured in `/tmp/ci-hang/`:
- `generation_debug.log` — shows generation=0 on both registration and creation
- `generation_debug2.log` — includes scheduler debug output
- `run_until_parked_debug.log` — shows run_until_parked correctly draining after fix
## Review guidance / things to focus on
- Verify scheduler integration preserves expected semantics for:
- blocking (`block_on`, `block_with_timeout`)
- timer behavior (determinism in tests; routing in production)
- task ordering changes that could affect tests or user-visible behavior
- Confirm removal of realtime priority is acceptable and doesn't break any downstream expectations.
- Pay attention to any ordering-sensitive tests (collab/editor/etc.) that may rely on old behavior; prefer explicit synchronization in tests where possible.
---
## Future work (if/when reintroducing realtime)
Before restoring realtime priority:
- define a contract: coalescible vs must-run tasks
- define backpressure strategy and failure modes (block vs drop vs bounded secondary queue)
- guarantee scheduling callback cannot block UI / critical threads
- add deterministic tests that encode the chosen contract

# GPUI Scheduler Integration
This document describes the integration of GPUI's async execution with the scheduler crate, including architecture, design decisions, and lessons learned.
## Goal
Unify GPUI's async execution with the scheduler crate, eliminating duplicate blocking/scheduling logic and enabling deterministic testing.
## Current Status: ✅ Integration Complete
All phases are complete. The scheduler integration is ready for final review and merge.
---
## ✅ Completed Work
### Phase 7: Delete Dead Code
Deleted the orphaned file `crates/gpui/src/platform/platform_scheduler.rs` - an older version of `PlatformScheduler` with an incompatible `block()` signature (took `LocalBoxFuture<()>` instead of `Pin<&mut dyn Future<Output = ()>>`). The actual implementation at `crates/gpui/src/platform_scheduler.rs` is used via `mod platform_scheduler` in `gpui.rs`.
### Phase 6: Restore Realtime Priority Support
**Problem**: The old implementation had special handling for `Priority::Realtime(_)` which spawned tasks on dedicated OS threads with elevated priority (critical for audio processing). This was removed during the integration.
**Solution implemented**: Option 2 - Handle in `gpui::BackgroundExecutor::spawn_with_priority`. This was chosen because the channel-based approach for realtime tasks needs to happen at spawn time, not at the `schedule_background_with_priority` level (which only receives a single runnable, not the full spawning context).
The implementation in `crates/gpui/src/executor.rs`:
```rust
if let Priority::Realtime(realtime) = priority {
let (tx, rx) = flume::bounded::<Runnable<RunnableMeta>>(1);
dispatcher.spawn_realtime(realtime, Box::new(move || {
while let Ok(runnable) = rx.recv() {
// Profiler timing integration (matches other dispatchers)
let start = Instant::now();
let location = runnable.metadata().location;
let mut timing = TaskTiming { location, start, end: None };
profiler::add_task_timing(timing);
runnable.run();
timing.end = Some(Instant::now());
profiler::add_task_timing(timing);
}
}));
// Create task that sends runnables to the channel
let (runnable, task) = async_task::Builder::new()
.metadata(RunnableMeta { location })
.spawn(move |_| future, move |runnable| { let _ = tx.send(runnable); });
runnable.schedule();
Task::from_scheduler(scheduler::Task::from_async_task(task))
} else {
// Normal priority path delegates to scheduler
}
```
This restores the original behavior where realtime tasks run on dedicated OS threads with elevated priority, suitable for audio workloads. Includes profiler timing integration for consistency with other platform dispatchers.
### Phase 1: Scheduler Trait and TestScheduler
The scheduler crate provides:
- `Scheduler` trait with `block()`, `schedule_foreground()`, `schedule_background_with_priority()`, `timer()`, `clock()`
- `TestScheduler` implementation for deterministic testing
- `ForegroundExecutor` and `BackgroundExecutor` that wrap `Arc<dyn Scheduler>`
- `Task<T>` type with `ready()`, `is_ready()`, `detach()`, `from_async_task()`
### Phase 2: PlatformScheduler
Created `PlatformScheduler` in GPUI (`crates/gpui/src/platform_scheduler.rs`) that:
- Implements `Scheduler` trait for production use
- Wraps `PlatformDispatcher` (Mac, Linux, Windows)
- Uses `parking::Parker` for blocking operations
- Uses `dispatch_after` for timers
- Provides a `PlatformClock` that delegates to the dispatcher
### Phase 3: Unified GPUI Executors
GPUI's executors (`crates/gpui/src/executor.rs`) now:
- `gpui::ForegroundExecutor` wraps `scheduler::ForegroundExecutor` internally
- `gpui::BackgroundExecutor` holds `Arc<dyn Scheduler>` directly (creates temporary `scheduler::BackgroundExecutor` when spawning)
- Select `TestScheduler` or `PlatformScheduler` based on dispatcher type (no optional fields)
- Wrap `scheduler::Task<T>` in a thin `gpui::Task<T>` that adds `detach_and_log_err()`
- Use `Scheduler::block()` for all blocking operations
### Phase 4: Removed Duplicate Logic
Eliminated from GPUI:
- Custom blocking loop implementations (now delegated to scheduler)
- Separate test/production code paths for spawn/block operations
- `TaskLabel` and deprioritization infrastructure (see Intentional Removals below)
- `unparker` mechanism (no longer needed - scheduler handles task coordination)
### Phase 5: Simplify block_internal and Remove Unparkers
Final cleanup:
- Removed debug logging infrastructure from executor.rs
- Simplified block_internal to use waker_fn without Parker
- Removed unparkers mechanism from TestDispatcher
- TestDispatcher now just holds session_id and scheduler (~70 lines)
---
## Intentional Removals
### `spawn_labeled` and `deprioritize` Removed
**What was removed**:
- `BackgroundExecutor::spawn_labeled(label: TaskLabel, future)` - spawn with a label for test control
- `BackgroundExecutor::deprioritize(label: TaskLabel)` - deprioritize labeled tasks in tests
- `TaskLabel` type
**Why**: These were only used in a few places for test ordering control. The new priority-weighted scheduling in `TestScheduler` provides similar functionality through `Priority::High/Medium/Low`.
**Migration**: Use `spawn()` instead of `spawn_labeled()`. For test ordering, use explicit synchronization (channels, etc.) or priority levels.
**Approval**: @as-cii reviewed and approved this removal.
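The migration to explicit synchronization can be sketched with plain `std` primitives (a stand-in for the real GPUI executor code; names are illustrative):

```rust
use std::sync::mpsc;
use std::thread;

// Instead of labeling a task and deprioritizing it, make the ordering
// explicit: the "second" task waits on a channel that the "first" task
// signals when it is done. (Hypothetical tasks for illustration.)
fn run_in_order() -> Vec<&'static str> {
    let (tx, rx) = mpsc::channel();
    let (done_tx, done_rx) = mpsc::channel();

    let first = thread::spawn(move || {
        // ... the work that must happen first ...
        tx.send("first").expect("receiver alive");
    });
    let second = thread::spawn(move || {
        // Block until the first task has signalled completion.
        let earlier = rx.recv().expect("sender alive");
        done_tx.send(vec![earlier, "second"]).expect("receiver alive");
    });

    first.join().expect("first task panicked");
    second.join().expect("second task panicked");
    done_rx.recv().expect("result available")
}

fn main() {
    // The observed order is deterministic regardless of thread scheduling.
    assert_eq!(run_in_order(), ["first", "second"]);
    println!("ordering enforced explicitly");
}
```

The same shape applies inside tests using async channels and `spawn()` in place of threads.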
### `start_waiting` / `finish_waiting` Debug Methods Removed
**What was removed**:
- `BackgroundExecutor::start_waiting()` - mark that code is waiting (for debugging)
- `BackgroundExecutor::finish_waiting()` - clear the waiting marker
- Associated `waiting_backtrace` tracking in TestDispatcher
**Why**: The new `TracingWaker` in `TestScheduler` provides better debugging capability. Run tests with `PENDING_TRACES=1` to see backtraces of all pending futures when parking is forbidden.
---
## Code Quality Notes
### Lock Ordering Inconsistency (Low Priority)
In `TestScheduler`, there's inconsistent lock ordering between `rng` and `state` mutexes:
- `block()` line 375-377: locks `rng` first, then `state`
- `schedule_foreground()` line 428-430: locks `state` first, then `rng`
This could theoretically cause deadlocks with concurrent access, but `TestScheduler` is single-threaded so it's not a practical concern. Worth fixing for code hygiene but not blocking.
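The hygiene fix can be sketched by funneling acquisition of both locks through a single helper that fixes the order (stand-in types, not the real `TestScheduler`):

```rust
use std::sync::Mutex;

// Stand-ins for TestScheduler's two mutexes (names mirror the doc; the
// fields here are simplified placeholders).
struct Sched {
    rng: Mutex<u64>,
    state: Mutex<Vec<u64>>,
}

impl Sched {
    // The convention: every method that needs both locks acquires `rng`
    // first, then `state`. Funneling acquisition through one helper makes
    // the order impossible to get wrong at call sites.
    fn with_both<R>(&self, f: impl FnOnce(&mut u64, &mut Vec<u64>) -> R) -> R {
        let mut rng = self.rng.lock().expect("rng poisoned");
        let mut state = self.state.lock().expect("state poisoned");
        f(&mut rng, &mut state)
    }

    fn step(&self) -> u64 {
        self.with_both(|rng, state| {
            *rng = rng.wrapping_mul(6364136223846793005).wrapping_add(1);
            state.push(*rng);
            *rng
        })
    }
}

fn main() {
    let s = Sched { rng: Mutex::new(1), state: Mutex::new(Vec::new()) };
    s.step();
    s.step();
    assert_eq!(s.state.lock().expect("state poisoned").len(), 2);
    println!("consistent lock order via a single helper");
}
```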
### `dispatch_after` Panics in TestDispatcher
`TestDispatcher::dispatch_after` intentionally panics:
```rust
fn dispatch_after(&self, _duration: Duration, _runnable: RunnableVariant) {
panic!(
"dispatch_after should not be called in tests. \
Use BackgroundExecutor::timer() which uses the scheduler's native timer."
);
}
```
This is correct because:
- In tests, `TestScheduler` is used (not `PlatformScheduler`)
- `TestScheduler::timer()` creates native timers without using `dispatch_after`
- Any code hitting this panic has a bug (should use `executor.timer()`)
---
## Architecture
### Scheduler Trait
The scheduler crate provides:
- `Scheduler` trait with `block()`, `schedule_foreground()`, `schedule_background_with_priority()`, `timer()`, `clock()`
- `TestScheduler` implementation for deterministic testing
- `ForegroundExecutor` and `BackgroundExecutor` that wrap `Arc<dyn Scheduler>`
- `Task<T>` type with `ready()`, `is_ready()`, `detach()`, `from_async_task()`
### PlatformScheduler
`PlatformScheduler` in GPUI (`crates/gpui/src/platform_scheduler.rs`):
- Implements `Scheduler` trait for production use
- Wraps `PlatformDispatcher` (Mac, Linux, Windows)
- Uses `parking::Parker` for blocking operations
- Uses `dispatch_after` for timers
- Provides a `PlatformClock` that delegates to the dispatcher
### GPUI Executors
GPUI's executors (`crates/gpui/src/executor.rs`):
- `gpui::ForegroundExecutor` wraps `scheduler::ForegroundExecutor` internally
- `gpui::BackgroundExecutor` holds `Arc<dyn Scheduler>` directly
- Select `TestScheduler` or `PlatformScheduler` based on dispatcher type
- Wrap `scheduler::Task<T>` in a thin `gpui::Task<T>` that adds `detach_and_log_err()`
- Use `Scheduler::block()` for all blocking operations
---
## Design Decisions
### Key Design Principles
1. **No optional fields**: Both test and production paths use the same executor types with different `Scheduler` implementations underneath.
2. **Scheduler owns blocking logic**: The `Scheduler::block()` method handles all blocking, including timeout and task stepping (for tests).
3. **GPUI Task wrapper**: Thin wrapper around `scheduler::Task` that adds `detach_and_log_err()` which requires `&App`.
### Foreground Priority Not Supported
`ForegroundExecutor::spawn_with_priority` accepts a priority parameter but ignores it. This is acceptable because:
- macOS (primary platform) ignores foreground priority anyway
- TestScheduler runs foreground tasks in order
- There are no external callers of this method in the codebase
- Linux/Windows could theoretically use it, but the benefit is minimal
The method is kept for API compatibility but documents that priority is ignored.
### Profiler Integration Unchanged
The profiler task timing infrastructure continues to work because:
- `PlatformScheduler::schedule_background_with_priority` calls `dispatcher.dispatch()`
- `PlatformScheduler::schedule_foreground` calls `dispatcher.dispatch_on_main_thread()`
- All platform dispatchers (Mac, Linux, Windows) wrap task execution with profiler timing
- Mac writes to `THREAD_TIMINGS` directly in its trampoline; Linux/Windows call `profiler::add_task_timing()`
### Session IDs for Foreground Isolation
Each `ForegroundExecutor` gets a `SessionId` to prevent reentrancy when blocking. This ensures that when blocking on a future, we don't run foreground tasks from the same session (which could cause issues with re-entrancy).
### Runtime Scheduler Selection
In test builds, we check `dispatcher.as_test()` to choose between `TestScheduler` and `PlatformScheduler`. This allows the same executor types to work in both test and production environments.
---
## Intentional Removals
### Realtime Priority
**What was removed**: `Priority::Realtime` variant and associated OS thread spawning.
**Why**: There were no in-tree call sites using realtime priority. The correctness/backpressure semantics are non-trivial:
- Blocking enqueue risks stalling latency-sensitive threads
- Non-blocking enqueue implies dropping runnables under load, which breaks correctness for general futures
Rather than ship ambiguous or risky semantics, we removed the API until there is a concrete in-tree use case.
---
## Lessons Learned
These lessons were discovered during integration testing and represent important design constraints.
### 1. Never Cache `Entity<T>` in Process-Wide Statics
**Problem**: `gpui::Entity<T>` is a handle tied to a particular `App`'s entity-map. Storing an `Entity<T>` in a process-wide static (`OnceLock`, `LazyLock`, etc.) and reusing it across different `App` instances causes:
- "used a entity with the wrong context" panics
- `Option::unwrap()` failures in leak-detection clone paths
- Nondeterministic behavior depending on test ordering
**Solution**: Cache plain data (env var name, URL, etc.) in statics, and create `Entity<T>` per-`App`.
**Guideline**: Never store `gpui::Entity<T>` or other `App`-context-bound handles in process-wide statics unless explicitly keyed by `App` identity.
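The guideline can be illustrated with stand-in types (a hypothetical `AppContext`/`Handle` pair modeling `App`/`Entity<T>`; this is not GPUI code):

```rust
use std::sync::OnceLock;

// Stand-ins: a Handle is only valid for the context that created it,
// just as an Entity<T> is bound to one App's entity map.
struct AppContext { id: u64 }
struct Handle { owner_id: u64, url: &'static str }

impl AppContext {
    fn create_handle(&self, url: &'static str) -> Handle {
        Handle { owner_id: self.id, url }
    }
}

// OK: a process-wide static holding plain data (a URL string).
static API_URL: OnceLock<&'static str> = OnceLock::new();

// The guideline applied: look up plain data from the static, but build
// the context-bound handle fresh for whichever context is asking.
fn api_key_handle(cx: &AppContext) -> Handle {
    let url = *API_URL.get_or_init(|| "https://api.example.com");
    cx.create_handle(url)
}

fn main() {
    let app1 = AppContext { id: 1 };
    let app2 = AppContext { id: 2 };
    // Each context gets a handle tied to itself; nothing stale is reused.
    assert_eq!(api_key_handle(&app1).owner_id, 1);
    assert_eq!(api_key_handle(&app2).owner_id, 2);
    println!("handles created per context; only plain data is static");
}
```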
### 2. `block_with_timeout` Behavior Depends on Tick Budget
**Problem**: In `TestScheduler`, "timeout" behavior depends on an internal tick budget (`timeout_ticks`), not just elapsed wall-clock time. During the allotted ticks, the scheduler can poll futures and step other tasks.
**Implications**:
- A future can complete "within a timeout" in tests due to scheduler progress, even without explicit `advance_clock()`
- Yielding does not advance time
- If a test needs time to advance, it must do so explicitly via `advance_clock()`
**For deterministic timeout tests**: Set `scheduler.set_timeout_ticks(0..=0)` to prevent any scheduler stepping during timeout, then explicitly advance time.
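The tick-budget semantics can be modeled in miniature (a deliberately simplified sketch, not the real `TestScheduler`): blocking with a timeout gets a budget of polls, so a future that would complete given enough polls "times out" only when the budget is zero.

```rust
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

// Minimal model of a tick-budgeted timeout: the blocked future is polled
// up to `max_ticks` times; each poll is one "tick" during which the real
// scheduler could also step other tasks.
fn block_with_tick_budget<F: Future>(fut: F, max_ticks: usize) -> Result<F::Output, ()> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    for _ in 0..max_ticks {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return Ok(value);
        }
    }
    Err(()) // tick budget exhausted: reported as a timeout
}

// A future that needs to be polled `n` more times before completing.
fn needs_polls(mut n: usize) -> impl Future<Output = &'static str> {
    std::future::poll_fn(move |cx| {
        if n == 0 {
            Poll::Ready("done")
        } else {
            n -= 1;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    })
}

fn main() {
    // With a generous tick budget the future completes "within the timeout"
    // even though no clock was advanced.
    assert_eq!(block_with_tick_budget(needs_polls(3), 10), Ok("done"));
    // With a zero tick budget (`set_timeout_ticks(0..=0)` in the real
    // scheduler) nothing can make progress, so the call times out.
    assert_eq!(block_with_tick_budget(needs_polls(3), 0), Err(()));
    println!("timeout behavior is a function of the tick budget");
}
```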
### 3. Realtime Priority Must Panic in Tests
**Problem**: `Priority::Realtime` spawns dedicated OS threads outside the test scheduler, which breaks determinism and causes hangs/flakes.
**Solution**: The test dispatcher's `spawn_realtime` implementation panics with a clear message. This is an enforced invariant, not an implementation detail.
---
## Files Changed
- `crates/scheduler/src/scheduler.rs` - Updated `Scheduler::block()` signature to take `Pin<&mut dyn Future>` and return `bool`
- `crates/scheduler/src/executor.rs` - Added `from_async_task()`, `use<Fut>` on `block_with_timeout`
- `crates/scheduler/src/test_scheduler.rs` - Updated `block()` implementation
- `crates/scheduler/src/tests.rs` - Fixed `block_with_timeout` test assertions
- `crates/gpui/src/executor.rs` - Rewritten to use scheduler executors
- `crates/gpui/src/platform_scheduler.rs` - New file implementing `Scheduler` for production
- `crates/gpui/src/platform/platform_scheduler.rs` - Deleted in Phase 7 (older version with an incompatible `block()` signature)
- `crates/gpui/src/gpui.rs` - Added `platform_scheduler` module
- `crates/gpui/src/profiler.rs` - Added `#[allow(dead_code)]` to `add_task_timing`
- `crates/gpui/Cargo.toml` - Added `chrono` dependency
- `crates/gpui/src/platform/test/dispatcher.rs` - Simplified to ~70 lines, delegates to TestScheduler
- `crates/gpui/src/platform.rs` - Simplified `RunnableVariant` to type alias, removed `TaskLabel` from dispatch
- `crates/gpui/src/platform/mac/dispatcher.rs` - Removed `RunnableVariant::Compat` handling
- `crates/gpui/src/platform/linux/dispatcher.rs` - Removed label parameter from dispatch
- `crates/gpui/src/platform/windows/dispatcher.rs` - Removed label parameter from dispatch
- `crates/miniprofiler_ui/src/miniprofiler_ui.rs` - Changed `.dispatcher` to `.dispatcher()`
- `crates/repl/src/repl.rs` - Changed `.dispatcher` to `.dispatcher()`, wrap runnables with metadata
- `crates/zed/src/reliability.rs` - Changed `.dispatcher` to `.dispatcher()`
- `crates/buffer_diff/src/buffer_diff.rs` - Use `spawn()` instead of `spawn_labeled()`
- `crates/fs/src/fake_git_repo.rs` - Use `spawn()` instead of `spawn_labeled()`
- `crates/language/src/buffer.rs` - Use `spawn()` instead of `spawn_labeled()`
---
## Tests Status
✅ All GPUI tests pass (including Mac platform tests)
✅ All scheduler tests pass
✅ All three originally failing editor tests pass:
- `test_soft_wrap_editor_width_auto_height_editor`
- `test_no_hint_updates_for_unrelated_language_files`
- `test_inside_char_boundary_range_hints`
✅ Clippy passes with no warnings
✅ No unused dependencies (cargo-machete passes)
---
## Future Considerations
### Potential Improvements
1. **Foreground priority support**: If needed in the future, add `schedule_foreground_with_priority` to the `Scheduler` trait and plumb it through to platforms that support it (Linux, Windows).
2. **Profiler integration in scheduler**: Could move task timing into the scheduler crate itself for more consistent profiling across all code paths.
3. **Additional test utilities**: The `TestScheduler` could be extended with more debugging/introspection capabilities.
4. **Fix lock ordering**: Clean up the `rng`/`state` lock ordering inconsistency in `TestScheduler` for better code hygiene.

# Scheduler Integration (scheduler-integration branch) — Regression Risk Analysis & Proposed Tests
## Updates from testing / experimentation (scheduler behavior learnings)
### 2) `block_with_timeout` behavior depends on “timeout tick budget” (not just wall-clock duration)
**What happened**
While adding a regression test for `ForegroundExecutor::block_with_timeout`, I initially wrote a test expecting a task that awaits a 100ms timer to reliably time out with a 50ms timeout unless we explicitly advanced time.
That assumption was wrong: depending on the scheduler's behavior, the task could complete without an explicit `advance_clock()` call.
**Root cause**
In `TestScheduler`, timeouts are implemented as a bounded number of scheduler iterations (“ticks”), controlled by `TestSchedulerConfig::timeout_ticks`. When a timeout is present, `block()` picks a `max_ticks` value from `timeout_ticks` and may:
- poll the blocked future,
- step other tasks,
- and potentially advance timers,
all within that tick budget. This means “timeout” is not purely about elapsed wall-clock time in tests; it's also about how many ticks were allowed.
**Fix / approach used in the regression test**
To make the timeout portion deterministic, the regression test forces:
- `scheduler.set_timeout_ticks(0..=0)`
This guarantees `block_with_timeout` cannot perform any scheduler stepping/advancing while it is trying to enforce the timeout, making it safe to assert that the call returns `Err(future)` for a timer-based future.
Then, after explicitly calling `advance_clock(...)` and `run()`, the returned future can be polled to completion.
**Takeaway**
- Yielding should _not_ advance time, and it doesn't.
- If a test needs time to advance, it should do so explicitly via `advance_clock`.
- If a test needs deterministic timeout behavior, it should control `timeout_ticks` to avoid incidental progress during a timeout.
### 3) Realtime priority must be enforced as “not allowed in tests” (determinism contract)
**What happened**
`Priority::Realtime(_)` is implemented in GPUI by spawning a dedicated OS thread and feeding it runnables. That is appropriate for production realtime workloads (e.g. audio), but it breaks determinism.
Because GPUI tests rely on `TestDispatcher` + `TestScheduler` for deterministic execution, allowing realtime spawns inside tests would introduce:
- nondeterministic scheduling (real threads outside the test scheduler)
- hangs/flakes that vary by platform and timing
- unreproducible failures even with the same seed
**What we want**
In test mode, attempting to spawn a realtime task should fail fast and loudly, so misuse is caught immediately.
**Enforcement**
The test dispatcher's `spawn_realtime` implementation panics with a clear message explaining that realtime threads are not supported in tests. A regression test should assert this behavior so it can't be accidentally weakened by future refactors.
**Takeaway**
- Treat “no realtime threads in tests” as an invariant, not an implementation detail.
- The most valuable test here is not “does realtime work”, but “does realtime misuse fail deterministically”.
---
This document captures a review of the `scheduler-integration` branch compared to `main`, with a focus on:
- Problems in the **plan** that could lead to regressions
- Plan vs **implementation** mismatches
- Implementation concerns / what might go wrong / what might be missing
- Concrete **tests** worth adding to catch regressions early
The branch integrates GPUI's async execution with the `scheduler` crate and updates test infrastructure to be deterministic via `TestScheduler`.
---
## Updates from testing / experimentation (learned weak points)
### 1) Caching `Entity<T>` in process-wide statics breaks across `App` contexts (causes panics and test flakiness)
**What happened**
While running `zed` tests, multiple failures showed up with panics like:
- `used a entity with the wrong context`
- `called Option::unwrap() on a None value` during `Entity` clone paths
The backtraces pointed at a `OnceLock<Entity<ApiKeyState>>` used by the Mistral provider for the Codestral API key entity (a process-wide singleton).
**Root cause**
`gpui::Entity<T>` is **not** a plain owned value; it is a handle tied to a particular `App`'s entity-map/context. If you store an `Entity<T>` in a process-wide static and then create a _different_ `App` instance later (very common in tests, but also plausible in multi-app contexts), reusing that cached `Entity<T>` will reference the wrong entity-map.
This manifests as:
- context assertion failures (`wrong context`)
- weak `entity_map` upgrades failing (leading to `unwrap()` panics in leak-detection-only clone paths)
- nondeterministic behavior depending on test ordering (because the first `App` that initializes the static “wins”)
**Fix applied**
Avoid caching `Entity<T>` in process-wide statics. Instead:
- cache plain data (env var name, URL, etc.), and
- create the `Entity<T>` per-`App`.
In this case, the `OnceLock<Entity<ApiKeyState>>` was removed and `codestral_api_key(cx)` now constructs the entity per call / per `App`.
**Takeaway / design guideline**
> Never store `gpui::Entity<T>` (or other `App`-context-bound handles) in process-wide statics (`static`, `OnceLock`, `LazyLock`), unless the value is explicitly per-`App` or keyed by an `App` identity.
**Test recommendation**
Add (or keep) at least one regression test that creates two distinct `App` instances sequentially in the same process and exercises any global/singleton initialization paths, asserting no “wrong context” panics occur.
---
## Scope reviewed (high-level)
Major surfaces touched by this branch include:
- `crates/gpui/src/executor.rs` (large rewrite; realtime priority handling moved here)
- `crates/gpui/src/platform_scheduler.rs` (new)
- `crates/scheduler/src/{scheduler.rs,test_scheduler.rs,executor.rs}` (API and determinism plumbing)
- platform dispatchers (`crates/gpui/src/platform/*/dispatcher.rs`)
- large set of tests updated for ordering/timing (`collab`, editor inlay hints, etc.)
---
## Executive summary (what to worry about)
### Highest regression risk areas
1. **Realtime tasks (`Priority::Realtime`) implementation uses a bounded, blocking channel**
- This can block scheduling paths and create deadlocks or latent stalls under load.
- It can also mask failures (ignored send errors) leading to “stuck” tasks.
2. **Documentation/plan says realtime panics in tests, but runtime guard may be missing**
- If realtime is accidentally used in tests, it could introduce nondeterminism or hangs.
- Different platforms/test dispatchers could behave differently (panic vs silently spawning threads).
3. **Behavioral changes in test ordering**
- Removal of label-based deprioritization and changes in scheduling semantics can subtly reorder operations.
- Many tests were already adjusted; more order-sensitive tests may still exist, especially in collaboration and UI pipelines.
### Medium risk areas
- Timer semantics: switching to scheduler-native timers can change edge ordering and “when” callbacks occur.
- `block()` signature and semantics change (pinned future, returns bool): easy to misuse and hard to reason about without tests.
- Profiler timing recording for realtime tasks: potential double-counting or inconsistent task timing signals.
---
## Plan vs implementation mismatches
### 1) “Realtime panics in tests” is a documented promise that needs enforcement
The plan documents:
- “`Priority::Realtime` will panic in tests because real OS threads break test determinism.”
Risk if not enforced:
- A developer uses realtime priority in a test (directly or indirectly).
- Depending on platform/test dispatcher behavior, the test may:
- become nondeterministic and flaky
- hang (if the realtime thread waits for work / blocks)
- pass locally and fail in CI
**What to ensure**:
- There should be an explicit guard: if the current scheduler/dispatcher indicates tests, `Priority::Realtime` should panic with a clear message.
**Why it matters**:
- This is the sort of “rare option” that will bite later: everything works until one test or feature accidentally uses it.
### 2) The plan downplays lock-ordering issues in `TestScheduler`
The plan calls out potential inconsistent lock ordering between internal mutexes (e.g. rng/state) and describes it as low priority because “single-threaded”.
Risk:
- Even if the scheduler is conceptually single-threaded, tests and supporting code can invoke methods from multiple threads.
- A deadlock from inconsistent lock ordering is catastrophic and hard to diagnose (especially in CI).
Mitigation:
- Enforce lock ordering consistently.
- Add a stress test to ensure no deadlocks if concurrent calls occur (even if “not intended”).
---
## Implementation concerns / weak points (what might go wrong)
### A) Realtime path uses a bounded, blocking channel in the scheduling closure
Observed pattern:
- realtime spawns a dedicated OS thread
- schedules `async_task::Runnable`s by sending them through `flume::bounded(1)`
- scheduling closure calls `tx.send(runnable)` and ignores the result
Risks:
1. **Blocking send in scheduling closure**
- `send` can block if buffer is full.
- The scheduling closure is often invoked from the async runtime / scheduler worker.
- Blocking there can stall progress and create deadlocks (especially if the receiver needs other tasks to run to make progress).
2. **Backpressure semantics may differ from previous implementation**
- Even if “equivalent” conceptually, the bounded buffer changes throughput behavior and can surface as latency spikes or stalls.
3. **Ignored `send` result**
- If the receiver drops early (thread exits), send will fail.
- Ignoring this can silently drop wakeups; tasks can hang in a “never re-polled” state.
4. **Thread lifetime / leakage**
- If sender lives longer than expected, the thread loops forever on `recv`.
- If the task is dropped/canceled, ensure sender drops promptly.
**Suggested mitigation**:
- Ensure the schedule path cannot block:
- prefer `try_send` + fallback queueing strategy, or an unbounded channel
- or push onto a lock-free queue and signal via lightweight notification
- Consider logging or metrics when send fails (debug builds at least)
- Add explicit test guards (panic in tests) as promised
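The suggested non-blocking schedule path can be sketched with `std`'s `sync_channel` standing in for the bounded flume channel (a sketch of the mitigation, not the current implementation; `Runnable` here is a stand-in for `async_task::Runnable`):

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

// Stand-in for a realtime runnable (the real type is async_task::Runnable).
type Runnable = Box<dyn FnOnce() + Send>;

// The scheduling callback must never block. Instead of `send` (which parks
// the caller when the bounded buffer is full), use `try_send` and surface
// the failure so a dropped wakeup is visible rather than silent.
fn schedule(tx: &SyncSender<Runnable>, runnable: Runnable) -> Result<(), &'static str> {
    match tx.try_send(runnable) {
        Ok(()) => Ok(()),
        Err(TrySendError::Full(_)) => Err("realtime queue full; wakeup not delivered"),
        Err(TrySendError::Disconnected(_)) => Err("realtime thread exited; wakeup dropped"),
    }
}

fn main() {
    let (tx, rx): (SyncSender<Runnable>, Receiver<Runnable>) = sync_channel(1);
    // First enqueue fits in the buffer of 1.
    assert!(schedule(&tx, Box::new(|| ())).is_ok());
    // Second enqueue would have parked the caller with `send`; `try_send`
    // fails fast instead.
    assert!(schedule(&tx, Box::new(|| ())).is_err());
    // Drain one runnable and scheduling succeeds again.
    rx.recv().expect("runnable queued")();
    assert!(schedule(&tx, Box::new(|| ())).is_ok());
    println!("scheduling path never blocks");
}
```

A real mitigation would also need a fallback (unbounded queue or re-notification) so a `Full` result does not lose the wakeup entirely.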
### B) Profiler timing integration for realtime tasks may be inconsistent
The realtime worker thread records task timings around each runnable.
Risks:
- Double-recording (start event + end event) might differ from other execution paths
- Profiler UI or consumers might assume a single record per runnable, or might aggregate incorrectly
This may already match existing platform behavior, but is worth validating explicitly to avoid subtle regressions in profiling, performance diagnostics, or telemetry.
### C) Timer and `block()` semantics need parity tests
Scheduler changes:
- Timers now route through scheduler-native timer mechanisms
- `Scheduler::block` signature changed to accept a pinned future reference and return a completion boolean
Risks:
- Timeout boundary differences vs prior implementation (especially around “just at the deadline”)
- Futures continued after timeout could be incorrectly polled or starved
- The pinned-future API makes it easy to accidentally:
- repoll after timeout in an invalid state
- assume ownership moved into block (it doesn't)
- `block()` reentrancy/session isolation behavior might differ from old GPUI behavior
---
## What could be missing (defensive checks / invariants)
1. **Explicit runtime check** for realtime in test environment (panic early)
2. **Non-blocking scheduling invariant** in realtime schedule callback
3. **TestScheduler concurrency safety**: even if not intended, tests should not deadlock if two threads touch it
4. **Timer determinism contract**: ordering and firing semantics should be documented and tested
5. **Cancellation semantics**: dropping tasks should not leak threads or leave runnables queued forever
---
## Recommended tests to add or improve
### 1) Test: realtime priority panics in tests
Goal:
- Prevent accidental nondeterministic realtime usage in tests.
Test idea:
- Build a test `App` / executor on the test dispatcher path
- Call `spawn_with_priority(Priority::Realtime(..), async { .. })`
- Assert panic message contains guidance: use `Priority::High` in tests.
Why:
- This enforces the plan's documented contract.
### 2) Test: realtime scheduling callback does not block
Goal:
- Catch deadlock/stall potential from bounded channel and blocking `send`.
Test idea (requires controllable dispatcher/fake dispatcher or careful harnessing):
- Create a dispatcher/scheduler scenario where the realtime receiver does not consume (or consumes slowly).
- Trigger multiple wakeups/schedules.
- Assert that the scheduling path returns promptly and system continues making progress.
If hard to test deterministically today, that's a signal to adjust the design to avoid blocking sends.
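The non-blocking property can be demonstrated with a plain `std::sync::mpsc` bounded channel, standing in for the realtime wakeup channel (the real channel type is an assumption). The receiver never drains, simulating a stalled consumer; `try_send` keeps the scheduling path prompt where a blocking `send` would stall once the channel fills:

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

// Returns (sent, dropped) wakeup counts. A blocking `send` here would
// never return once the channel is full and nobody consumes.
fn schedule_wakeups_without_blocking(capacity: usize, wakeups: usize) -> (usize, usize) {
    let (tx, _rx) = sync_channel::<()>(capacity); // receiver held but never drained
    let (mut sent, mut dropped) = (0, 0);
    for _ in 0..wakeups {
        match tx.try_send(()) {
            Ok(()) => sent += 1,
            Err(TrySendError::Full(())) => dropped += 1, // surface this in debug logs
            Err(TrySendError::Disconnected(())) => break,
        }
    }
    (sent, dropped)
}
```

With capacity 2 and 10 wakeups, the path returns immediately with 2 sends and 8 visible drops instead of deadlocking.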
### 3) TestScheduler: priority weighting sanity + determinism
Goals:
- Ensure scheduling remains deterministic for a fixed RNG seed.
- Ensure `High` tends to run more often than `Low` in repeated selection without starvation.
Test idea:
- Spawn multiple tasks at different priorities that increment counters.
- Run a fixed number of ticks.
- Assert:
- total run counts match expectation
- `High` counter > `Medium` > `Low` by a margin (not strict ordering)
- with same seed, results are stable across runs
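A sketch of the weighting and determinism properties, using a tiny seeded LCG in place of the scheduler's real RNG (an assumption; the real selection logic and weights differ):

```rust
// Minimal seeded RNG so the example is self-contained (Knuth's MMIX constants).
struct Lcg(u64);

impl Lcg {
    fn next(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

enum Priority {
    High,
    Medium,
    Low,
}

// Weighted pick: High is favored, but Low is never starved outright.
fn pick(rng: &mut Lcg) -> Priority {
    match rng.next() % 10 {
        0..=5 => Priority::High,   // weight 6
        6..=8 => Priority::Medium, // weight 3
        _ => Priority::Low,        // weight 1
    }
}

// Run a fixed number of ticks and count how often each priority was chosen.
fn run_ticks(seed: u64, ticks: usize) -> (usize, usize, usize) {
    let mut rng = Lcg(seed);
    let (mut high, mut medium, mut low) = (0, 0, 0);
    for _ in 0..ticks {
        match pick(&mut rng) {
            Priority::High => high += 1,
            Priority::Medium => medium += 1,
            Priority::Low => low += 1,
        }
    }
    (high, medium, low)
}
```

The assertions check a margin, not exact counts, so the test stays robust to weight tweaks while still catching starvation or lost determinism.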
### 4) TestScheduler: no deadlock under concurrent access (lock ordering regression test)
Goal:
- Prevent reintroducing lock ordering issues.
Test idea:
- Spawn two threads that repeatedly call methods that acquire both locks (rng and state) in different sequences.
- Run for a short bound and ensure completion (no deadlock).
- This test should stay small and time-bounded so a reintroduced deadlock fails fast rather than hanging.
Even if “not supported,” it prevents catastrophic CI hangs.
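A sketch of the harness shape, with plain `Mutex`es standing in for the rng and state locks (the real lock types are assumptions). Both locks are always taken in the same order, which is the invariant under test, and completion is bounded with `recv_timeout` so a deadlock fails fast:

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;
use std::time::Duration;

// Two threads hammer both locks; returns true only if both finish in time.
fn hammer_locks(iterations: usize) -> bool {
    let rng = Arc::new(Mutex::new(0u64));
    let state = Arc::new(Mutex::new(Vec::<u64>::new()));
    let (done_tx, done_rx) = mpsc::channel();

    for _ in 0..2 {
        let (rng, state, done) = (rng.clone(), state.clone(), done_tx.clone());
        thread::spawn(move || {
            for _ in 0..iterations {
                // Invariant: rng lock first, state lock second, on every path.
                let mut r = rng.lock().unwrap();
                *r = r.wrapping_add(1);
                state.lock().unwrap().push(*r);
            }
            done.send(()).unwrap();
        });
    }

    // Bound the wait so a reintroduced lock-ordering bug hangs for seconds,
    // not for the whole CI run.
    (0..2).all(|_| done_rx.recv_timeout(Duration::from_secs(5)).is_ok())
}
```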
### 5) Timer semantics tests (determinism + ordering)
Goals:
- Ensure timers don't fire early.
- Ensure `advance_clock()` triggers timers deterministically.
- If ordering matters, define it and test it.
Test ideas:
- Schedule timers at different durations; advance clock in steps; assert firings.
- Multiple timers at same timestamp: ensure stable ordering (or document it as unspecified and test only that all fire).
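The test ideas above can be exercised against a deterministic fake clock. This sketch keys timers by `(deadline, insertion sequence)` so same-timestamp timers fire in a stable order; the names (`FakeClock`, `advance_clock`) are illustrative, not the real `TestScheduler` API:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

struct FakeClock {
    now: u64,
    seq: u64,
    // Min-heap of (deadline, insertion order, label).
    timers: BinaryHeap<Reverse<(u64, u64, &'static str)>>,
}

impl FakeClock {
    fn new() -> Self {
        FakeClock { now: 0, seq: 0, timers: BinaryHeap::new() }
    }

    fn schedule(&mut self, after: u64, label: &'static str) {
        self.seq += 1;
        self.timers.push(Reverse((self.now + after, self.seq, label)));
    }

    // Advance the clock and return every timer whose deadline has passed,
    // in (deadline, insertion) order. Timers never fire early.
    fn advance_clock(&mut self, by: u64) -> Vec<&'static str> {
        self.now += by;
        let mut fired = Vec::new();
        while let Some(&Reverse((deadline, _, label))) = self.timers.peek() {
            if deadline > self.now {
                break;
            }
            self.timers.pop();
            fired.push(label);
        }
        fired
    }
}
```

Stepping the clock in increments then asserts both the no-early-firing rule and the stable ordering of timers sharing a deadline.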
### 6) `block()` / timeout behavior tests
Goals:
- Ensure `block_with_timeout` returns `false` only on timeout.
- Ensure future can continue after timeout and still complete.
- Ensure `block()` does not accidentally run re-entrant foreground tasks for same session if session isolation is required.
Test ideas:
- A future that completes after N ticks; timeout at N-1: assert false then later true.
- A future that yields and schedules work; ensure `block` processes until ready/timeout properly.
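A sketch of the pinned-reference `block_with_timeout` shape with a tick budget, using a hand-rolled future that needs N polls before it is ready. The signature is an approximation of the new API, not the real one; the key property shown is that the caller keeps ownership, so the same future can be polled again after a timeout:

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A future that becomes ready only after `remaining` + 1 polls, standing in
// for work that completes after N scheduler ticks.
struct ReadyAfter {
    remaining: u32,
}

impl Future for ReadyAfter {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.remaining == 0 {
            Poll::Ready(())
        } else {
            self.remaining -= 1;
            Poll::Pending
        }
    }
}

// A waker that does nothing; polling is driven by the tick loop below.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// Poll a pinned *reference* for at most `ticks` iterations and report
// completion; false means timeout, never "the future was dropped".
fn block_with_timeout<F: Future>(future: &mut Pin<&mut F>, ticks: u32) -> bool {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    for _ in 0..ticks {
        if future.as_mut().poll(&mut cx).is_ready() {
            return true;
        }
    }
    false
}
```

A future needing 3 polls, blocked with a budget of 2, times out (`false`); blocking again resumes the same future and completes (`true`), which is exactly the continuation behavior the test should pin down.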
### 7) Profiler timing consistency test (optional but valuable)
Goal:
- Ensure profiler events are consistent across dispatchers and realtime path.
Test idea:
- Spawn a task that runs a known number of runnables.
- Validate profiler receives expected event count and timestamps in non-decreasing order.
This may require internal visibility/test hooks.
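The property itself is small once a hook exists. In this sketch, `Event` and `run_runnables` are illustrative stand-ins for the profiler's event stream; the assertions are the ones the real test would make — exactly one event per executed runnable, with non-decreasing timestamps:

```rust
use std::time::Instant;

// Illustrative profiler event: which runnable ran, and when.
struct Event {
    runnable_id: usize,
    at: Instant,
}

// Execute `count` runnables, emitting one event per runnable.
fn run_runnables(count: usize) -> Vec<Event> {
    let mut events = Vec::with_capacity(count);
    for runnable_id in 0..count {
        // ... execute the runnable here ...
        events.push(Event { runnable_id, at: Instant::now() });
    }
    events
}

fn timestamps_non_decreasing(events: &[Event]) -> bool {
    events.windows(2).all(|pair| pair[0].at <= pair[1].at)
}
```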
---
## Suggested implementation guardrails (small changes with big payoff)
1. **Panic in tests for realtime priority**
- Detect test scheduler/dispatcher and panic with a clear message.
2. **Avoid blocking in scheduling callback**
- Prefer non-blocking `try_send` or an unbounded queue.
- At minimum, ensure failures are visible in debug logs.
3. **Document timer ordering**
- If stable ordering is desired, implement and test it.
- Otherwise, document it as unspecified so tests don't accidentally rely on it.
---
## Next steps (how to use this doc)
- Use this as a checklist while running CI and reviewing flake-prone areas.
- Prioritize adding:
1. realtime-in-tests panic test
2. timer semantics tests
3. block/timeout continuation tests
- Consider redesigning realtime scheduling to ensure scheduling never blocks.
---