Support opening and saving files with legacy encodings (#44819)
## Summary Addresses #16965 This PR adds support for **opening and saving** files with legacy encodings (non-UTF-8). Previously, Zed failed to open files encoded in Shift-JIS, EUC-JP, Big5, etc., displaying a "Could not open file" error screen. This PR implements automatic encoding detection upon opening and ensures the original encoding is preserved when saving. ## Implementation Details 1. **Worktree (Loading)**: * Updated `load_file` to use `chardetng` for automatic encoding detection. * Files are decoded to UTF-8 internal strings for editing, while preserving the detected `Encoding` metadata. 2. **Language / Buffer**: * Added an `encoding` field to the `Buffer` struct to store the detected encoding. 3. **Worktree (Saving)**: * Updated `write_file` to accept the stored encoding. * **Performance Optimization**: * **UTF-8 Path**: Uses the existing optimized `fs.save` (streaming chunks directly from Rope), ensuring no performance regression for the vast majority of files. * **Legacy Encoding Path**: Implemented a fallback that converts the Rope to a contiguous `String/Bytes` in memory, re-encodes it to the target format (e.g., Shift-JIS), and writes it to disk. * *Note*: This fallback involves memory allocation, but it is necessary to support legacy encodings without refactoring the `fs` crate's streaming interfaces. ## Changes - `crates/worktree`: - Add dependencies: `encoding_rs`, `chardetng`. - Update `load_file` to detect encoding and decode content. - Update `write_file` to handle re-encoding on save. - `crates/language`: Add `encoding` field and accessors to `Buffer`. - `crates/project`: Pass encoding information between Worktree and Buffer. - `crates/vim`: Update `:w` command to use the new `write_file` signature. ## Verification I validated this manually using a Rust script to generate test files with various encodings. **Results:** * ✅ **Success (Opened & Saved correctly):** * **Japanese:** `Shift-JIS` (CP932), `EUC-JP`, `ISO-2022-JP` * **Chinese:** `Big5` (Traditional), `GBK/GB2312` (Simplified) * **Western/Unicode:** `Windows-1252` (CP1252), `UTF-16LE`, `UTF-16BE` * ⚠️ **limitations (Detection accuracy):** * Some specific encodings like `KOI8-R` or generic `Latin1` (ISO-8859-1) may partially display replacement characters (`?`) depending on the file content length. This is a known limitation of the heuristic detection library (`chardetng`) rather than the saving logic. Release Notes: - Added support for opening and saving files with legacy encodings (Shift-JIS, Big5, etc.) --------- Co-authored-by: CrazyboyQCD <53971641+CrazyboyQCD@users.noreply.github.com> Co-authored-by: Conrad Irwin <conrad.irwin@gmail.com>
This commit is contained in:
39
Cargo.lock
generated
39
Cargo.lock
generated
@@ -2667,9 +2667,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "cap-fs-ext"
|
||||
version = "3.4.5"
|
||||
version = "3.4.4"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "d5528f85b1e134ae811704e41ef80930f56e795923f866813255bc342cc20654"
|
||||
checksum = "e41cc18551193fe8fa6f15c1e3c799bc5ec9e2cfbfaa8ed46f37013e3e6c173c"
|
||||
dependencies = [
|
||||
"cap-primitives",
|
||||
"cap-std",
|
||||
@@ -2679,9 +2679,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "cap-net-ext"
|
||||
version = "3.4.5"
|
||||
version = "3.4.4"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "20a158160765c6a7d0d8c072a53d772e4cb243f38b04bfcf6b4939cfbe7482e7"
|
||||
checksum = "9f83833816c66c986e913b22ac887cec216ea09301802054316fc5301809702c"
|
||||
dependencies = [
|
||||
"cap-primitives",
|
||||
"cap-std",
|
||||
@@ -2691,9 +2691,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "cap-primitives"
|
||||
version = "3.4.5"
|
||||
version = "3.4.4"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "b6cf3aea8a5081171859ef57bc1606b1df6999df4f1110f8eef68b30098d1d3a"
|
||||
checksum = "0a1e394ed14f39f8bc26f59d4c0c010dbe7f0a1b9bafff451b1f98b67c8af62a"
|
||||
dependencies = [
|
||||
"ambient-authority",
|
||||
"fs-set-times",
|
||||
@@ -2709,9 +2709,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "cap-rand"
|
||||
version = "3.4.5"
|
||||
version = "3.4.4"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "d8144c22e24bbcf26ade86cb6501a0916c46b7e4787abdb0045a467eb1645a1d"
|
||||
checksum = "0acb89ccf798a28683f00089d0630dfaceec087234eae0d308c05ddeaa941b40"
|
||||
dependencies = [
|
||||
"ambient-authority",
|
||||
"rand 0.8.5",
|
||||
@@ -2719,9 +2719,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "cap-std"
|
||||
version = "3.4.5"
|
||||
version = "3.4.4"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "b6dc3090992a735d23219de5c204927163d922f42f575a0189b005c62d37549a"
|
||||
checksum = "07c0355ca583dd58f176c3c12489d684163861ede3c9efa6fd8bba314c984189"
|
||||
dependencies = [
|
||||
"cap-primitives",
|
||||
"io-extras",
|
||||
@@ -2731,9 +2731,9 @@ dependencies = [
|
||||
|
||||
[[package]]
|
||||
name = "cap-time-ext"
|
||||
version = "3.4.5"
|
||||
version = "3.4.4"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "def102506ce40c11710a9b16e614af0cde8e76ae51b1f48c04b8d79f4b671a80"
|
||||
checksum = "491af520b8770085daa0466978c75db90368c71896523f2464214e38359b1a5b"
|
||||
dependencies = [
|
||||
"ambient-authority",
|
||||
"cap-primitives",
|
||||
@@ -2896,6 +2896,17 @@ dependencies = [
|
||||
"util",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "chardetng"
|
||||
version = "0.1.17"
|
||||
source = "registry+https://github.com/rust-lang/crates.io-index"
|
||||
checksum = "14b8f0b65b7b08ae3c8187e8d77174de20cb6777864c6b832d8ad365999cf1ea"
|
||||
dependencies = [
|
||||
"cfg-if",
|
||||
"encoding_rs",
|
||||
"memchr",
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "chrono"
|
||||
version = "0.4.42"
|
||||
@@ -8797,6 +8808,7 @@ dependencies = [
|
||||
"ctor",
|
||||
"diffy",
|
||||
"ec4rs",
|
||||
"encoding_rs",
|
||||
"fs",
|
||||
"futures 0.3.31",
|
||||
"fuzzy",
|
||||
@@ -12465,6 +12477,7 @@ dependencies = [
|
||||
"dap",
|
||||
"dap_adapters",
|
||||
"db",
|
||||
"encoding_rs",
|
||||
"extension",
|
||||
"fancy-regex",
|
||||
"fs",
|
||||
@@ -20231,8 +20244,10 @@ version = "0.1.0"
|
||||
dependencies = [
|
||||
"anyhow",
|
||||
"async-lock 2.8.0",
|
||||
"chardetng",
|
||||
"clock",
|
||||
"collections",
|
||||
"encoding_rs",
|
||||
"fs",
|
||||
"futures 0.3.31",
|
||||
"fuzzy",
|
||||
|
||||
Reference in New Issue
Block a user