The DeepSeek API counts the requested max_output_tokens against the prompt token total. This looks like a bug on their end, since most other providers don't do this. We now default to None for the main models and let the API fall back to its default behavior, which works fine.

Closes: #45134

Release Notes:

- deepseek: Fix issue with the DeepSeek API that was causing the token limit to be reached sooner than necessary
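
For reviewers, here is a minimal sketch of the accounting described above. It is not the code in this PR: the `Model` enum, both function names, and the numbers are hypothetical, and it assumes that omitting the limit reserves nothing from the prompt budget and that custom models keep whatever limit they were configured with.

```rust
/// Hypothetical model kinds, for illustration only (not the real types).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Model {
    Chat,
    Reasoner,
    /// Assumption: a user-defined model may still carry an explicit limit.
    Custom { max_output_tokens: Option<u64> },
}

/// Output-token limit to send with the request.
/// `None` means the field is omitted and the API applies its own default.
fn requested_max_output_tokens(model: Model) -> Option<u64> {
    match model {
        // Main models: send no explicit limit.
        Model::Chat | Model::Reasoner => None,
        Model::Custom { max_output_tokens } => max_output_tokens,
    }
}

/// Prompt budget left when the provider subtracts the requested output
/// limit from the context window. Assumption: an omitted limit
/// subtracts nothing.
fn remaining_prompt_budget(context_window: u64, requested: Option<u64>) -> u64 {
    context_window.saturating_sub(requested.unwrap_or(0))
}
```

With an illustrative 128,000-token window, requesting 8,192 output tokens would leave 119,808 tokens for the prompt under this model, while sending no limit leaves the full window available.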