https://github.com/zed-industries/zed/pull/40883 implemented this
incorrectly. It was marking a random background thread as a wasm thread
(whatever thread picked up the wasm epoch timer background task),
instead of marking the threads that actually run the wasm extension.
This has two implications:
1. it didn't prevent extension panics from tearing down as planned
2. Worse, it actually made us hide legit panics in sentry for one of our
background workers.
Now 2 still technically applies for all tokio threads after this, but we
basically only use these for wasm extensions in the main zed binary.
Release Notes:
- Fixed extension panics crashing Zed on Linux
Closes#41471
We were killing the crash handler when it received a second copy of any
of the messages, but this GPU specs one is sent on each new window
rather than once at startup. We could gate the sending to only happen
once, but it's simpler to just allow multiple gpu specs messages.
Release Notes:
- N/A
Set a TLS bit to skip invoking the crash handler when a detached thread
panics.
cc @P1n3appl3 - is this at odds with what we need the crash handler to
do?
May close#39289, cannot repro without a nightly build
Release Notes:
- Fixed extension panics crashing Zed on Linux
Co-authored-by: dino <dinojoaocosta@gmail.com>
We've been considering removing workspace-hack for a couple reasons:
- Lukas ran into a situation where its build script seemed to be causing
spurious rebuilds. This seems more likely to be a cargo bug than an
issue with workspace-hack itself (given that it has an empty build
script), but we don't necessarily want to take the time to hunt that
down right now.
- Marshall mentioned hakari interacts poorly with automated crate
updates (in our case provided by rennovate) because you'd need to have
`cargo hakari generate && cargo hakari manage-deps` after their changes
and we prefer to not have actions that make commits.
Currently removing workspace-hack causes our workspace to grow from
~1700 to ~2000 crates being built (depending on platform), which is
mainly a problem when you're building the whole workspace or running
tests across the the normal and remote binaries (which is where
feature-unification nets us the most sharing). It doesn't impact
incremental times noticeably when you're just iterating on `-p zed`, and
we'll hopefully get these savings back in the future when
rust-lang/cargo#14774 (which re-implements the functionality of hakari)
is finished.
Release Notes:
- N/A
This allows to filter by main zed binary or remote server crashes, as
well as easily tell whether a crash happened in a remote-server binary
or not.
Release Notes:
- N/A
std commands can block for an arbitrary duration and so runs risk of
blocking tasks for too long. This replaces all such uses where sensible
with async processes.
Release Notes:
- N/A *or* Added/Fixed/Improved ...
@maxdeviant We can eventually turn down the panic telemetry endpoint,
but should probably leave it up while there's still a bunch of stable
users hitting it.
@maxbrunsfeld We're optimistic that this change also fixed the macos
crashed-thread misreporting. We think it was because the
`CrashContext::exception` was getting set to `None` only on macos, while
on linux it was getting a real exception value from the sigtrap. Now
we've unified and it uses `SIGABRT` on both platforms (I need to double
check that this works as expected for windows).
We unconditionally set `RUST_BACKTRACE=1` for the current process so
that we see backtraces when running in a terminal by default. This
should be fine but I just wanted to note it since it's a bit abnormal.
Release Notes:
- N/A
---------
Co-authored-by: Conrad Irwin <conrad.irwin@gmail.com>
Closes #ISSUE
Adds system GPU collection to crash reporting. Currently this is Linux
only.
The system GPUs are determined by reading the `/sys/class/drm` directory
structure, rather than using the exisiting `gpui::Window::gpu_specs()`
method in order to gather more information, and so that the GPU context
is not dependent on Vulkan context initialization (i.e. we still get GPU
info when Zed fails to start because Vulkan failed to initialize).
Unfortunately, the `blade` APIs do not support querying which GPU _will_
be used, so we do not know which GPU was attempted to be used when
Vulkan context initialization fails, however, when Vulkan initialization
succeeds, we send a message to the crash handler containing the result
of `gpui::Window::gpu_specs()` to include the "Active" gpu in any crash
report that may occur
Release Notes:
- N/A *or* Added/Fixed/Improved ...
On the latest build @maxbrunsfeld got a panic that hung zed. It appeared
that the hang occured after the minidump had been successfully written,
so our theory on what happened is that the `suspend_all_other_threads`
call in the crash handler suspended the panicking thread (due to the
signal from simulate_exception being received on a different thread),
and then when the crash handler returned everything was suspended so the
panic hook never made it to the `process::abort`.
This change makes the crash handler avoid _both_ the current and the
panicking thread which should avoid that scenario.
Release Notes:
- N/A
We see a bunch of crash events with truncated minidumps where they have
a valid header but no events. We think this is due to an issue
generating them, so we're attaching the relevant result to the uploaded
tags.
Release Notes:
- N/A
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
This should improve our detection of which thread crashed since they
wont be able to resume while the minidump is being generated.
Release Notes:
- N/A
The minidump-based crash reporting is now entirely separate from our
legacy panic_hook-based reporting. This should improve the association
of minidumps with their metadata and give us more consistent crash
reports.
Release Notes:
- N/A
---------
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>
We only want minidumps to be generated on actual release builds. Now we
avoid spawning crash handler processes for dev builds. To test
minidumping you can still set the `ZED_GENERATE_MINIDUMPS` env var which
force-enable the feature.
Release Notes:
- N/A
- [x] Handle uploading minidumps from the remote_server
- [x] Associate minidumps with panics with some sort of ID (we don't use
session_id on the remote)
- [x] Update the protobufs and client/server code to request panics
- [x] Upload minidumps with no corresponding panic
- [x] Fill in panic info when there _is_ a corresponding panic
- [x] Use an env var for the sentry endpoint instead of hardcoding it
Release Notes:
- Zed now generates minidumps for crash reporting
---------
Co-authored-by: Max Brunsfeld <maxbrunsfeld@gmail.com>