Automatically retry the agent's LLM completion requests when the
provider returns 429 Too Many Requests. Uses the Retry-After header to
determine the retry delay if it is available.
Many providers are frequently overloaded or have low rate limits. These
providers are essentially unusable without automatic retries.
Tested with Cerebras configured via openai_compatible.
Related: #31531
Release Notes:
- Added automatic retries for OpenAI-compatible LLM providers
---------
Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de>
I am using an Azure OpenAI instance since that is what is provided at
work and with how they have it setup not all responses contain a delta,
which lead to errors and truncated responses. This is related to how
they are filtering potentially offensive requests and responses. I don't
believe this filter was made in-house, instead I believe it is provided
by Microsoft/Azure, so I suspect this fix may help other users.
Release Notes:
- N/A
Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de>
We've been considering removing workspace-hack for a couple reasons:
- Lukas ran into a situation where its build script seemed to be causing
spurious rebuilds. This seems more likely to be a cargo bug than an
issue with workspace-hack itself (given that it has an empty build
script), but we don't necessarily want to take the time to hunt that
down right now.
- Marshall mentioned hakari interacts poorly with automated crate
updates (in our case provided by rennovate) because you'd need to have
`cargo hakari generate && cargo hakari manage-deps` after their changes
and we prefer to not have actions that make commits.
Currently removing workspace-hack causes our workspace to grow from
~1700 to ~2000 crates being built (depending on platform), which is
mainly a problem when you're building the whole workspace or running
tests across the the normal and remote binaries (which is where
feature-unification nets us the most sharing). It doesn't impact
incremental times noticeably when you're just iterating on `-p zed`, and
we'll hopefully get these savings back in the future when
rust-lang/cargo#14774 (which re-implements the functionality of hakari)
is finished.
Release Notes:
- N/A
Co-Authored-By: Ben K <ben@zed.dev>
Co-Authored-By: Anthony <anthony@zed.dev>
Co-Authored-By: Mikayla <mikayla@zed.dev>
Release Notes:
- settings: Major internal changes to settings. The primary user-facing
effect is that some settings which did not make sense in project
settings files are no-longer read from there. (For example the inline
blame settings)
---------
Co-authored-by: Ben Kunkle <ben@zed.dev>
Co-authored-by: Mikayla Maki <mikayla.c.maki@gmail.com>
Co-authored-by: Anthony <anthony@zed.dev>
Follow up: #34532Closes#35434
Mostly fixes a issue were when the tool_choice is none it was getting
serialised as null. This was fixed for openrouter just wanted to follow
up and cleanup for other providers which might have this issue as this
is against the spec.
Release Notes:
- N/A
This prevents the common footgun of copy/pasting an API key
starting/ending with extra newlines, which would lead to a "bad request"
error.
Closes#37038
Release Notes:
- agent: Support pasting language model API keys that contain newlines.
Some APIs fail when they get this parameter
Closes#36215
Release Notes:
- Fixed OpenAI-compatible providers that don't support prompt caching
and/or reasoning
Release Notes:
- Added `reasoning_effort` support to custom models
Tested using the following config:
```json5
"language_models": {
"openai": {
"available_models": [
{
"name": "gpt-5-mini",
"display_name": "GPT 5 Mini (custom reasoning)",
"max_output_tokens": 128000,
"max_tokens": 272000,
"reasoning_effort": "high" // Can be minimal, low, medium (default), and high
}
],
"version": "1"
}
}
```
Docs:
https://platform.openai.com/docs/api-reference/chat/create#chat_create-reasoning_effort
This work could be used to split the GPT 5/5-mini/5-nano into each of
it's reasoning effort variant. E.g. `gpt-5`, `gpt-5 low`, `gpt-5
minimal`, `gpt-5 high`, and same for mini/nano.
Release Notes:
* Added a setting to control `reasoning_effort` in OpenAI models
Context: In this PR: https://github.com/zed-industries/zed/pull/33362,
we started to use underlying open_ai crate for making api calls for
vercel as well. Now whenever we get the error we get something like the
below. Where on part of the error mentions OpenAI but the rest of the
error returns the actual error from provider. This PR tries to make the
error generic for now so that people don't get confused seeing OpenAI in
their v0 integration.
```
Error interacting with language model
Failed to connect to OpenAI API: 403 Forbidden {"success":false,"error":"Premium or Team plan required to access the v0 API: https://v0.dev/chat/settings/billing"}
```
Release Notes:
- N/A
Ran into this while adding support for Vercel v0s models:
- The timestamp seems to be returned in Milliseconds instead of seconds
so it breaks the bounds of `created: u32`. We did not use this field
anywhere so just decided to remove it
- Sometimes the `choices` field can be empty when the last chunk comes
in because it only contains `usage`
Release Notes:
- N/A
Previously we were using a mix of `u32` and `usize`, e.g. `max_tokens:
usize, max_output_tokens: Option<u32>` in the same `struct`.
Although [tiktoken](https://github.com/openai/tiktoken) uses `usize`,
token counts should be consistent across targets (e.g. the same model
doesn't suddenly get a smaller context window if you're compiling for
wasm32), and these token counts could end up getting serialized using a
binary protocol, so `usize` is not the right choice for token counts.
I chose to standardize on `u64` over `u32` because we don't store many
of them (so the extra size should be insignificant) and future models
may exceed `u32::MAX` tokens.
Release Notes:
- N/A
Remove unnecessary alias attributes from Model enum variants and add
max_output_tokens limits for all OpenAI models. Also fix
supports_system_messages to explicitly handle all model variants.
Release Notes:
- N/A
There is no ISSUE opened on this topic
Release Notes:
- N/A
---------
Co-authored-by: Peter Tripp <peter@zed.dev>
Co-authored-by: Ben Brandt <benjamin.j.brandt@gmail.com>
Fixes regression caused by:
https://github.com/zed-industries/zed/pull/30639
Assistant messages can come back with no content, and we no longer
allowed that in the deserialization.
Release Notes:
- open_ai: fixed deserialization issue if assistant content was empty
https://github.com/zed-industries/zed/issues/30972 brought up another
case where our context is not enough to track the actual source of the
issue: we get a general top-level error without inner error.
The reason for this was `.ok_or_else(|| anyhow!("failed to read HEAD
SHA"))?; ` on the top level.
The PR finally reworks the way we use anyhow to reduce such issues (or
at least make it simpler to bubble them up later in a fix).
On top of that, uses a few more anyhow methods for better readability.
* `.ok_or_else(|| anyhow!("..."))`, `map_err` and other similar error
conversion/option reporting cases are replaced with `context` and
`with_context` calls
* in addition to that, various `anyhow!("failed to do ...")` are
stripped with `.context("Doing ...")` messages instead to remove the
parasitic `failed to` text
* `anyhow::ensure!` is used instead of `if ... { return Err(...); }`
calls
* `anyhow::bail!` is used instead of `return Err(anyhow!(...));`
Release Notes:
- N/A
* Adds a fast / cheaper model to providers and defaults thread
summarization to this model. Initial motivation for this was that
https://github.com/zed-industries/zed/pull/29099 would cause these
requests to fail when used with a thinking model. It doesn't seem
correct to use a thinking model for summarization.
* Skips system prompt, context, and thinking segments.
* If tool use is happening, allows 2 tool uses + one more agent response
before summarizing.
Downside of this is that there was potential for some prefix cache reuse
before, especially for title summarization (thread summarization omitted
tool results and so would not share a prefix for those). This seems fine
as these requests should typically be fairly small. Even for full thread
summarization, skipping all tool use / context should greatly reduce the
token use.
Release Notes:
- N/A
Release Notes:
- Add support for OpenAI o3 and o4-mini models via OpenAI API and
Copilot Chat providers.
---------
Co-authored-by: Peter Tripp <peter@zed.dev>
Release Notes:
- Add support for OpenAI GPT-4.1 via Copilot Chat and OpenAI API
---------
Co-authored-by: Danilo Leal <daniloleal09@gmail.com>
Co-authored-by: Bennet Bo Fenner <bennetbo@gmx.de>
This PR disables `parallel_tool_calls` for the models that support it,
as the Agent currently expects at most one tool use per turn.
It was a bit of trial and error to figure this out. OpenAI's API
annoyingly will return an error if passing `parallel_tool_calls` to a
model that doesn't support it.
Release Notes:
- N/A
This adds a "workspace-hack" crate, see
[mozilla's](https://hg.mozilla.org/mozilla-central/file/3a265fdc9f33e5946f0ca0a04af73acd7e6d1a39/build/workspace-hack/Cargo.toml#l7)
for a concise explanation of why this is useful. For us in practice this
means that if I were to run all the tests (`cargo nextest r
--workspace`) and then `cargo r`, all the deps from the previous cargo
command will be reused. Before this PR it would rebuild many deps due to
resolving different sets of features for them. For me this frequently
caused long rebuilds when things "should" already be cached.
To avoid manually maintaining our workspace-hack crate, we will use
[cargo hakari](https://docs.rs/cargo-hakari) to update the build files
when there's a necessary change. I've added a step to CI that checks
whether the workspace-hack crate is up to date, and instructs you to
re-run `script/update-workspace-hack` when it fails.
Finally, to make sure that people can still depend on crates in our
workspace without pulling in all the workspace deps, we use a `[patch]`
section following [hakari's
instructions](https://docs.rs/cargo-hakari/0.9.36/cargo_hakari/patch_directive/index.html)
One possible followup task would be making guppy use our
`rust-toolchain.toml` instead of having to duplicate that list in its
config, I opened an issue for that upstream: guppy-rs/guppy#481.
TODO:
- [x] Fix the extension test failure
- [x] Ensure the dev dependencies aren't being unified by Hakari into
the main dependencies
- [x] Ensure that the remote-server binary continues to not depend on
LibSSL
Release Notes:
- N/A
---------
Co-authored-by: Mikayla <mikayla@zed.dev>
Co-authored-by: Mikayla Maki <mikayla.c.maki@gmail.com>
There's still a bit more work to do on this, but this PR is compiling
(with warnings) after eliminating the key types. When the tasks below
are complete, this will be the new narrative for GPUI:
- `Entity<T>` - This replaces `View<T>`/`Model<T>`. It represents a unit
of state, and if `T` implements `Render`, then `Entity<T>` implements
`Element`.
- `&mut App` This replaces `AppContext` and represents the app.
- `&mut Context<T>` This replaces `ModelContext` and derefs to `App`. It
is provided by the framework when updating an entity.
- `&mut Window` Broken out of `&mut WindowContext` which no longer
exists. Every method that once took `&mut WindowContext` now takes `&mut
Window, &mut App` and every method that took `&mut ViewContext<T>` now
takes `&mut Window, &mut Context<T>`
Not pictured here are the two other failed attempts. It's been quite a
month!
Tasks:
- [x] Remove `View`, `ViewContext`, `WindowContext` and thread through
`Window`
- [x] [@cole-miller @mikayla-maki] Redraw window when entities change
- [x] [@cole-miller @mikayla-maki] Get examples and Zed running
- [x] [@cole-miller @mikayla-maki] Fix Zed rendering
- [x] [@mikayla-maki] Fix todo! macros and comments
- [x] Fix a bug where the editor would not be redrawn because of view
caching
- [x] remove publicness window.notify() and replace with
`AppContext::notify`
- [x] remove `observe_new_window_models`, replace with
`observe_new_models` with an optional window
- [x] Fix a bug where the project panel would not be redrawn because of
the wrong refresh() call being used
- [x] Fix the tests
- [x] Fix warnings by eliminating `Window` params or using `_`
- [x] Fix conflicts
- [x] Simplify generic code where possible
- [x] Rename types
- [ ] Update docs
### issues post merge
- [x] Issues switching between normal and insert mode
- [x] Assistant re-rendering failure
- [x] Vim test failures
- [x] Mac build issue
Release Notes:
- N/A
---------
Co-authored-by: Antonio Scandurra <me@as-cii.com>
Co-authored-by: Cole Miller <cole@zed.dev>
Co-authored-by: Mikayla <mikayla@zed.dev>
Co-authored-by: Joseph <joseph@zed.dev>
Co-authored-by: max <max@zed.dev>
Co-authored-by: Michael Sloan <michael@zed.dev>
Co-authored-by: Mikayla Maki <mikaylamaki@Mikaylas-MacBook-Pro.local>
Co-authored-by: Mikayla <mikayla.c.maki@gmail.com>
Co-authored-by: joão <joao@zed.dev>
This removes the `low_speed_timeout` setting from all providers as a
response to issue #19509.
Reason being that the original `low_speed_timeout` was only as part of
#9913 because users wanted to _get rid of timeouts_. They wanted to bump
the default timeout from 5sec to a lot more.
Then, in the meantime, the meaning of `low_speed_timeout` changed in
#19055 and was changed to a normal `timeout`, which is a different thing
and breaks slower LLMs that don't reply with a complete response in the
configured timeout.
So we figured: let's remove the whole thing and replace it with a
default _connect_ timeout to make sure that we can connect to a server
in 10s, but then give the server as long as it wants to complete its
response.
Closes#19509
Release Notes:
- Removed the `low_speed_timeout` setting from LLM provider settings,
since it was only used to _increase_ the timeout to give LLMs more time,
but since we don't have any other use for it, we simply remove the
setting to give LLMs as long as they need.
---------
Co-authored-by: Antonio <antonio@zed.dev>
Co-authored-by: Peter Tripp <peter@zed.dev>
Users of our http_client crate knew they were interacting with isahc as
they set its extensions on the request. This change adds our own
equivalents for their APIs in preparation for changing the default http
client.
Release Notes:
- N/A