Scheduler Integration - Debugging Status
Problem
PR #44810 causes Zed to hang on startup on Linux/Windows, but works fine on Mac.
From PR comment by @yara-blue and @localcc:
"With it applied zed hangs without ever responding when you open it"
What Was Cleaned Up
Removed unrelated changes that were accidentally committed:
- ConfiguredApiCard, InstructionListItem, api_key.rs (UI components)
- Debug instrumentation in terminal_tool.rs, capability_granter.rs, wasm_host/wit/since_v0_8_0.rs, lsp_store.rs
- Planning docs (.rules, PLAN.md, old STATUS.md)
Kept language_registry.rs changes as they may be relevant to debugging.
Analysis So Far
Code paths verified as correct:
- Priority queue algorithm - The weighted random selection in crates/gpui/src/queue.rs is mathematically sound. When the last non-empty queue is checked, the probability is always 100%.
- async_task::Task implements Unpin - So Pin::new(task).poll(cx) is valid.
- parking::Parker semantics - If unpark() is called before park(), the park() returns immediately. This is correct.
- Waker creation - waker_fn with Unparker (which is Clone + Send + Sync) should work correctly.
- PlatformScheduler::block implementation - Identical logic to the old block_internal for production builds.
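The 100% claim for the last non-empty queue can be checked on a small model. The following is a minimal sketch, not the code in crates/gpui/src/queue.rs: it assumes three priority levels with made-up weights, and it takes the random number as a parameter so the behavior is deterministic. The names WEIGHTS and pop_weighted are hypothetical.

```rust
use std::collections::VecDeque;

/// Hypothetical weights, ordered from highest to lowest priority.
const WEIGHTS: [u32; 3] = [6, 3, 1];

/// Pops one item, choosing a non-empty queue with probability proportional
/// to its weight. `roll` is a caller-supplied random number (assumption:
/// the real queue draws it from an RNG).
pub fn pop_weighted<T>(queues: &mut [VecDeque<T>; 3], roll: u32) -> Option<T> {
    // Only non-empty queues contribute to the total weight.
    let remaining: u32 = queues
        .iter()
        .zip(WEIGHTS)
        .filter(|(queue, _)| !queue.is_empty())
        .map(|(_, weight)| weight)
        .sum();
    if remaining == 0 {
        return None; // every queue is empty
    }

    // Reduce the roll into [0, remaining).
    let mut roll = roll % remaining;
    for (queue, weight) in queues.iter_mut().zip(WEIGHTS) {
        if queue.is_empty() {
            continue;
        }
        if roll < weight {
            return queue.pop_front();
        }
        roll -= weight;
    }
    // roll < remaining, so the last non-empty queue always satisfies
    // roll < weight: it is selected with probability 100% once reached.
    unreachable!("a non-empty queue is always selected")
}
```

In this model, a selection that skips empty queues and normalizes by the remaining weight cannot fall through without returning an item, whatever value the RNG produces.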
The blocking flow:
- Task spawned → runnable scheduled → sent to priority queue
- Background thread waiting on condvar in PriorityQueueReceiver::recv()
- send() pushes to queue and calls condvar.notify_one()
- Background thread wakes, pops item, runs runnable
- When task completes, async_task wakes the registered waker
- Waker calls unparker.unpark()
- parker.park() returns
- Future is polled again, returns Ready
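The wake/park half of this flow can be reproduced with the standard library alone. The sketch below is an approximation under stated assumptions: std::thread::park and Thread::unpark stand in for parking::Parker and Unparker, a plain thread stands in for the background pool, and Task, Slot, ThreadWaker, spawn and block_on are hypothetical names, not the GPUI types.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the blocked thread (stand-in for `unparker.unpark()`).
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Shared state between the background thread and the waiting future.
#[derive(Default)]
struct Slot {
    value: Option<u32>,
    waker: Option<Waker>,
}

/// Future that resolves once the background thread has stored a value.
pub struct Task(Arc<Mutex<Slot>>);

impl Future for Task {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        let mut slot = self.0.lock().unwrap();
        match slot.value.take() {
            Some(value) => Poll::Ready(value),
            None => {
                // Register the waker under the lock so a wake cannot be missed.
                slot.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

/// Runs `work` on a background thread and returns a future for its result.
pub fn spawn(work: impl FnOnce() -> u32 + Send + 'static) -> Task {
    let slot = Arc::new(Mutex::new(Slot::default()));
    let shared = slot.clone();
    thread::spawn(move || {
        let value = work();
        let waker = {
            let mut slot = shared.lock().unwrap();
            slot.value = Some(value);
            slot.waker.take()
        };
        // Wake outside the lock, after the value is visible.
        if let Some(waker) = waker {
            waker.wake();
        }
    });
    Task(slot)
}

/// Polls `future` on the current thread, parking between polls.
pub fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = pin!(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(output) = future.as_mut().poll(&mut cx) {
            return output;
        }
        // Returns immediately if unpark() already ran since the last park().
        thread::park();
    }
}
```

Calling block_on(spawn(|| 40 + 2)) returns 42 whether the background thread finishes before or after the first poll. If an equivalent standalone program hangs on Linux, the problem is more likely in how work reaches the background threads than in the park/unpark handshake.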
Files involved:
- crates/gpui/src/platform_scheduler.rs - PlatformScheduler::block() implementation
- crates/gpui/src/executor.rs - BackgroundExecutor::block() wraps futures
- crates/gpui/src/queue.rs - Priority queue with parking_lot::Condvar
- crates/gpui/src/platform/linux/dispatcher.rs - Background thread pool
- crates/scheduler/src/executor.rs - scheduler::BackgroundExecutor::spawn_with_priority
What to investigate next
1. Verify background threads are actually running
Add logging at the start of background worker threads in LinuxDispatcher::new():
.spawn(move || {
    log::info!("[LinuxDispatcher] background worker {} started", i);
    for runnable in receiver.iter() {
        // ...
    }
})
2. Verify tasks are being dispatched
Add logging in PlatformScheduler::schedule_background_with_priority:
fn schedule_background_with_priority(&self, runnable: Runnable<RunnableMeta>, priority: Priority) {
    log::info!("[PlatformScheduler] dispatching task priority={:?}", priority);
    self.dispatcher.dispatch(runnable, priority);
}
3. Verify the priority queue send/receive
In crates/gpui/src/queue.rs, add logging to send() and recv():
fn send(&self, priority: Priority, item: T) -> Result<(), SendError<T>> {
    // ...
    self.condvar.notify_one();
    log::debug!("[PriorityQueue] sent item, notified condvar");
    Ok(())
}

fn recv(&self) -> Result<...> {
    log::debug!("[PriorityQueue] recv() waiting...");
    while queues.is_empty() {
        self.condvar.wait(&mut queues);
    }
    log::debug!("[PriorityQueue] recv() got item");
    // ...
}
4. Check timing of dispatcher creation vs task spawning
Trace when LinuxDispatcher::new() is called vs when the first spawn() happens. If tasks are spawned before background threads are ready, they might be lost.
5. Check for platform-specific differences in parking or parking_lot
The parking crate (used for Parker/Unparker) and parking_lot (used for Condvar in the priority queue) may have platform-specific behavior. Check their GitHub issues for Linux-specific bugs.
6. Verify the startup sequence
The hang happens during startup. Key calls in crates/zed/src/main.rs:
// Line ~292: Tasks spawned BEFORE app.run()
let system_id = app.background_executor().spawn(system_id());
let installation_id = app.background_executor().spawn(installation_id());
let session = app.background_executor().spawn(Session::new(session_id.clone()));
// Line ~513-515: Inside app.run() callback, these BLOCK waiting for the tasks
let system_id = cx.background_executor().block(system_id).ok();
let installation_id = cx.background_executor().block(installation_id).ok();
let session = cx.background_executor().block(session);
If background threads aren't running yet when block() is called, or if the tasks never got dispatched, it will hang forever.
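While debugging, a bounded wait turns a silent hang into an observable failure that can be logged. The helper below is a hypothetical illustration built on std::sync::mpsc, not a GPUI API; wait_with_deadline and its parameters are assumed names.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical debugging helper: runs `work` on a background thread and
/// waits at most `deadline` for the result instead of blocking forever.
pub fn wait_with_deadline<T, F>(work: F, deadline: Duration) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (sender, receiver) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already have given up; ignore the send error.
        let _ = sender.send(work());
    });
    // `None` covers both a timeout and a worker that exited without sending.
    receiver.recv_timeout(deadline).ok()
}
```

A None result for work that should finish quickly points at the dispatch path rather than at the task body. The same idea could be applied temporarily to the three block() calls above to find out which task never completes.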
Hypotheses to test
- Background threads not started yet - Race condition where tasks are dispatched before threads are listening on the queue.
- Condvar notification lost - notify_one() called but no thread was waiting yet, and subsequent waits miss it.
- Platform-specific parking behavior - parking::Parker or parking_lot::Condvar behaves differently on Linux.
- Priority queue never releases items - Something in the weighted random selection is wrong on Linux (different RNG behavior?).
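Whether the first two hypotheses can occur depends on how the wait loop is written. The following is a minimal sketch using std::sync::Condvar as a stand-in for parking_lot::Condvar, with a hypothetical Queue type rather than the code in crates/gpui/src/queue.rs. If recv() checks the queue under the same mutex that send() pushes under, a notify_one() issued before any thread is waiting loses nothing, because the item is still in the queue when the receiver arrives.

```rust
use std::collections::VecDeque;
use std::sync::{Condvar, Mutex};

/// Hypothetical single-priority queue guarded by a mutex and a condvar.
pub struct Queue<T> {
    items: Mutex<VecDeque<T>>,
    condvar: Condvar,
}

impl<T> Queue<T> {
    pub fn new() -> Self {
        Queue {
            items: Mutex::new(VecDeque::new()),
            condvar: Condvar::new(),
        }
    }

    /// Pushes under the lock, then notifies one waiter (if any).
    pub fn send(&self, item: T) {
        self.items.lock().unwrap().push_back(item);
        self.condvar.notify_one();
    }

    /// Checks the queue before every wait, so an item sent before the
    /// receiver started waiting is returned without needing a notification.
    pub fn recv(&self) -> T {
        let mut items = self.items.lock().unwrap();
        loop {
            if let Some(item) = items.pop_front() {
                return item;
            }
            // Releases the lock while waiting and re-acquires it on wake-up.
            items = self.condvar.wait(items).unwrap();
        }
    }
}
```

If the real recv() follows this shape, hypotheses 1 and 2 are unlikely on their own. The places worth checking are any paths that wait without re-checking the queue, or that test emptiness on a different lock than the one send() holds.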
Running tests
To get logs, set RUST_LOG=info or RUST_LOG=debug when running Zed.
For the extension_host test hang (separate issue):
cargo test -p extension_host extension_store_test::test_extension_store_with_test_extension -- --nocapture
Key commits
- 5b07e2b242 - "WIP: scheduler integration debugging" - This accidentally added unrelated UI components
- d8ebd8101f - "WIP: scheduler integration debugging + agent terminal diagnostics" - Added debug instrumentation (now removed)