Compare commits

35 Commits

Author SHA1 Message Date
Antonio Scandurra
9e6bd78913 WIP 2024-07-01 17:56:23 +02:00
Antonio Scandurra
5d0902735f Implement chunking for long summaries in combine_summaries
Improves handling of large summary inputs by breaking them into
manageable chunks, processing each chunk separately, and then
combining the results. This approach prevents token limit issues
and enhances the quality of the final summary for large inputs.
2024-07-01 12:16:34 +02:00
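
Note: the chunking described in this commit lives in `crates/miner/src/miner.rs`, whose diff is suppressed on this page, so the following is only a minimal sketch of the idea and not the commit's code. It assumes a byte-length budget as a stand-in for a token count, and `combine` is a hypothetical closure standing in for the language model call.

```rust
/// Groups summaries into chunks whose combined length stays within `max_len`.
/// Length is measured in bytes here as a stand-in for a token count (assumption).
pub fn chunk_summaries(summaries: &[String], max_len: usize) -> Vec<Vec<String>> {
    let mut chunks = Vec::new();
    let mut current: Vec<String> = Vec::new();
    let mut current_len = 0;
    for summary in summaries {
        // Start a new chunk when adding this summary would exceed the budget.
        if !current.is_empty() && current_len + summary.len() > max_len {
            chunks.push(std::mem::take(&mut current));
            current_len = 0;
        }
        current_len += summary.len();
        current.push(summary.clone());
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

/// Combines each chunk separately, then combines the intermediate results.
/// `combine` is a hypothetical stand-in for the model call.
pub fn combine_in_chunks<F>(summaries: &[String], max_len: usize, combine: &F) -> String
where
    F: Fn(&[String]) -> String,
{
    let chunks = chunk_summaries(summaries, max_len);
    match chunks.len() {
        0 => String::new(),
        1 => combine(chunks[0].as_slice()),
        // Every chunk holds a single summary: chunking cannot reduce the
        // input any further, so combine everything in one call.
        n if n == summaries.len() => combine(summaries),
        _ => {
            let partial: Vec<String> = chunks.iter().map(|c| combine(c.as_slice())).collect();
            combine_in_chunks(&partial, max_len, combine)
        }
    }
}
```

The recursion terminates because each level either produces strictly fewer items than it received or falls back to a single call.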
Antonio Scandurra
de6e78c5ae Treat files that don't have any symbols as normal text 2024-07-01 11:44:13 +02:00
Antonio Scandurra
d2e55884a5 Refactor HTTP clients to use internal http crate
This change improves consistency and maintainability by using our internal
`http` crate instead of `reqwest` for HuggingFace and Ollama clients. It
reduces external dependencies and aligns these components with our existing
HTTP infrastructure, making it easier to manage and update in the future.
2024-07-01 11:00:07 +02:00
Antonio Scandurra
b72735453e Implement language model abstraction for improved modularity
This change introduces a LanguageModel trait to abstract away the specifics
of different language model implementations. By doing so, we enhance the
flexibility and extensibility of the miner, allowing for easier integration
of new language models in the future. The HuggingFaceClient and OllamaClient
have been updated to implement this trait, promoting a more uniform interface
for language model interactions throughout the codebase.

Additionally, this commit includes improvements to the project summarization
process, such as better handling of ignored files and directories, and a more
robust cleanup of deleted entries from the database. These changes contribute
to a more accurate and efficient mining process.

The introduction of comprehensive tests for the Miner struct ensures the
reliability of the summarization process and validates the correct behavior
of the new language model abstraction.
2024-07-01 10:47:05 +02:00
Antonio Scandurra
c570aca3a7 Remove tokio from miner 2024-07-01 09:39:18 +02:00
Antonio Scandurra
76b4844eec Implement Database and CachedSummary structures
- Create new database.rs file with Database and CachedSummary structures
- Implement methods for Database including new() and transact()
- Move database-related code from miner.rs to database.rs
- Update miner.rs to use the new Database module

This commit improves code organization by separating database
functionality into its own module, enhancing maintainability
and readability of the codebase.
2024-07-01 09:21:36 +02:00
Antonio Scandurra
9d7467a2f6 Ignore .git directory when scanning files
This commit adds a check to skip the .git directory when scanning files
during the mining process. This prevents unnecessary processing of Git
metadata files and improves overall performance.
2024-06-30 17:40:13 +02:00
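
Note: a minimal sketch of the check this commit describes, written against `std::fs` only. The real code goes through the project's `Fs` trait and the `ignore` crate; the function names `is_git_dir` and `collect_files` are illustrative assumptions.

```rust
use std::ffi::OsStr;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Returns true when the final path component is exactly `.git`.
pub fn is_git_dir(path: &Path) -> bool {
    path.file_name() == Some(OsStr::new(".git"))
}

/// Recursively collects file paths under `root`, skipping any `.git` directory.
pub fn collect_files(root: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let path = entry?.path();
        if is_git_dir(&path) {
            // Git metadata is skipped entirely, including everything below it.
            continue;
        }
        if path.is_dir() {
            collect_files(&path, out)?;
        } else {
            out.push(path);
        }
    }
    Ok(())
}
```

Matching on the final component means files such as `.gitignore` are still scanned.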
Nathan Sobo
c60b6a6bac WIP: Implement Miner with Fs trait and GitIgnore support
This commit updates the Miner struct and related functions to use the Fs
trait instead of direct filesystem operations, while also incorporating
GitIgnore functionality. Key changes include:

1. Add fs and git dependencies to Cargo.toml
2. Update Miner struct to include an Arc<dyn Fs> field
3. Modify Miner::new to accept an Arc<dyn Fs> parameter
4. Replace std::fs and tokio::fs calls with Fs trait methods
5. Implement a custom walk_directory function using ignore crate's GitignoreBuilder
6. Update error handling for metadata operations
7. Add DirEntry struct for consistency with Fs trait
8. Implement build_gitignore function to create GitIgnore patterns

These changes improve testability, allow for easier mocking of filesystem
operations, and ensure proper handling of .gitignore rules during project
summarization.
2024-06-29 23:16:37 -06:00
Nathan Sobo
cbf0ac889b Enhance Rust symbol context extraction
Introduce ParsedFile struct to share parsed syntax trees, enabling richer
context extraction for multiple symbols within a file. This approach
provides more comprehensive symbol information by leveraging the shared
tree structure.

Key changes:
1. ParsedFile struct for sharing file content and syntax tree
2. Methods for extracting symbol context, module structure, and nearby functions
3. RustSymbol entry now includes shared ParsedFile
4. Refactor parse_and_enqueue_rust_symbols to use shared ParsedFile
5. Update process_rust_symbol and summarize_rust_symbol with extracted context

These improvements boost efficiency and provide more comprehensive
information for generating accurate Rust symbol summaries.
2024-06-29 22:27:47 -06:00
Nathan Sobo
5e4a658b88 Implement combine_summaries method
This commit implements the combine_summaries method to create a concise
summary from multiple input summaries. It uses the AI model to generate
a cohesive overview of the provided summaries, improving the quality of
project-level summaries.

Key changes:
- Handle empty and single summary cases
- Create a combined prompt from input summaries
- Use the AI client to generate a combined summary
- Stream and concatenate the AI-generated content
2024-06-29 22:02:32 -06:00
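
Note: the four key changes listed in this commit can be sketched roughly as follows. This is not the code from `miner.rs` (its diff is suppressed on this page). The prompt wording, the return values for the empty and single cases, and the `complete` closure that stands in for the streaming AI client are all assumptions.

```rust
/// Builds one prompt from several summaries (the wording is illustrative only).
pub fn build_combined_prompt(summaries: &[String]) -> String {
    let mut prompt =
        String::from("Combine the following summaries into one concise summary:\n\n");
    for (i, summary) in summaries.iter().enumerate() {
        prompt.push_str(&format!("Summary {}:\n{}\n\n", i + 1, summary));
    }
    prompt
}

/// Combines summaries into one. `complete` is a hypothetical stand-in for the
/// AI client: it receives the prompt and yields the streamed pieces of text.
pub fn combine_summaries<F, I>(summaries: &[String], complete: F) -> String
where
    F: FnOnce(String) -> I,
    I: IntoIterator<Item = String>,
{
    match summaries {
        // Nothing to combine (assumed to yield an empty summary).
        [] => String::new(),
        // A single summary needs no model call (assumed to pass through).
        [only] => only.clone(),
        // Otherwise build the prompt and concatenate the streamed pieces.
        many => complete(build_combined_prompt(many)).into_iter().collect(),
    }
}
```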
Nathan Sobo
d72a0d4d61 Document methods 2024-06-29 21:57:19 -06:00
Nathan Sobo
68c8614792 Implement directory summary combination
When popping a directory from the queue, we now combine the summaries
of all its contents that have been processed. This approach ensures
that directory summaries are created only when all child entries
have been summarized, providing a more comprehensive overview of
the directory's contents.

Key changes:
- Modified process_directory to check and combine summaries
- Re-enqueue directories if not all contents are summarized
- Implemented combine_summaries method for summary aggregation

This commit improves the overall summarization process by creating
more meaningful directory-level summaries based on their contents.
2024-06-29 21:47:31 -06:00
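
Note: a simplified sketch of the re-enqueue behaviour described in this commit. The `Dir` type, the `summarize_dirs` function, and joining child summaries with `"; "` are hypothetical; the real `process_directory` calls the model through `combine_summaries` and works on the miner's own queue.

```rust
use std::collections::{HashMap, VecDeque};

/// A directory and the names of its direct children (hypothetical model).
pub struct Dir {
    pub name: String,
    pub children: Vec<String>,
}

/// Processes a queue of directories. A directory whose children are not all
/// summarized yet goes to the back of the queue and is retried later.
/// Returns the order in which directories were summarized.
pub fn summarize_dirs(
    mut queue: VecDeque<Dir>,
    summaries: &mut HashMap<String, String>,
) -> Vec<String> {
    let mut order = Vec::new();
    let mut stalled = 0; // consecutive re-enqueues without any progress
    while let Some(dir) = queue.pop_front() {
        let ready = dir.children.iter().all(|c| summaries.contains_key(c));
        if ready {
            // Stand-in for the model call that combines child summaries.
            let combined = dir
                .children
                .iter()
                .map(|c| summaries[c].as_str())
                .collect::<Vec<_>>()
                .join("; ");
            summaries.insert(dir.name.clone(), combined);
            order.push(dir.name);
            stalled = 0;
        } else {
            queue.push_back(dir);
            stalled += 1;
            if stalled >= queue.len() {
                break; // every queued directory was tried without progress
            }
        }
    }
    order
}
```

The `stalled` counter is an addition for this sketch: without it, a directory whose child never gets summarized would spin in the queue forever.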
Nathan Sobo
0df293941d Prevent double enqueueing of files
Modify the `scan_directory` method to avoid enqueueing files that have already been processed.
I still don't understand why it's happening though.
2024-06-29 14:52:58 -06:00
Nathan Sobo
a58cc29991 Save Rust symbol summaries to database
This commit implements saving individual Rust symbol summaries to the database
as they are processed. It also saves the complete file summary when all symbols
for a file have been processed. This change ensures that every LLM call result
is persisted, allowing for better recovery in case of interruptions or failures.

Key changes:
- Add logic to save symbol summaries with unique keys
- Save complete file summaries when all symbols are processed
- Improve error handling and progress tracking
2024-06-29 11:34:36 -06:00
Nathan Sobo
96ff2cc531 WIP 2024-06-29 11:28:30 -06:00
Nathan Sobo
1919982116 Don't panic in parse_and_enqueue_rust_symbols
- Replace unwrap() calls with ? operator for better error handling
- Update scan_file to handle Result from parse_and_enqueue_rust_symbols
- Improve error reporting for failed Rust symbol parsing
2024-06-29 10:30:02 -06:00
Nathan Sobo
63dd33eb13 WIP: A couple errors in generated code but gotta go 2024-06-29 09:38:53 -06:00
Nathan Sobo
0b8d398751 Update endpoint 2024-06-28 18:05:39 -06:00
Nathan Sobo
e40bdcf138 WIP 2024-06-28 17:57:53 -06:00
Nathan Sobo
8daa7a7e9f WIP 2024-06-28 17:18:07 -06:00
Nathan Sobo
634f265e66 WIP: Qwen2 2024-06-28 17:07:57 -06:00
Nathan Sobo
6883f0a2ba WIP 2024-06-28 17:02:42 -06:00
Antonio Scandurra
34b728539f WIP: start on accumulating exports/imports 2024-06-27 18:57:03 +02:00
Antonio Scandurra
9ef2d85fa8 WIP 2024-06-27 15:32:08 +02:00
Antonio Scandurra
4d5a70ccbf Checkpoint 2024-06-27 13:45:49 +02:00
Antonio Scandurra
d4992ecab4 Checkpoint 2024-06-27 13:34:11 +02:00
Antonio Scandurra
49be47d322 Checkpoint 2024-06-27 11:39:23 +02:00
Antonio Scandurra
f18e9b073b Checkpoint 2024-06-27 10:40:22 +02:00
Antonio Scandurra
df829e50ea WIP 2024-06-26 21:08:52 +02:00
Antonio Scandurra
330bb4c1ce WIP 2024-06-26 18:02:01 +02:00
Antonio Scandurra
65d47587c8 Checkpoint 2024-06-26 17:54:02 +02:00
Nathan Sobo
aceb5581b3 Try aider 2024-06-24 14:30:14 -06:00
Nathan Sobo (aider)
24c8bad8de Added a new crate 'miner' to the Cargo.toml file. 2024-06-24 14:26:47 -06:00
Nathan Sobo (aider)
8f6ea25a95 Added the 'semantic_mining' crate to the Cargo.toml file. 2024-06-24 14:25:10 -06:00
11 changed files with 2209 additions and 18 deletions

1
.gitignore vendored

@@ -28,3 +28,4 @@ DerivedData/
.vscode
.wrangler
.flatpak-builder
.aider*

367
Cargo.lock generated

@@ -1280,7 +1280,7 @@ dependencies = [
"once_cell",
"pin-project-lite",
"pin-utils",
"rustls",
"rustls 0.21.12",
"tokio",
"tracing",
]
@@ -3189,6 +3189,41 @@ version = "1.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "96a6ac251f4a2aca6b3f91340350eab87ae57c3f127ffeb585e92bd336717991"
[[package]]
name = "darling"
version = "0.20.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "83b2eb4d90d12bdda5ed17de686c2acb4c57914f8f921b8da7e112b5a36f3fe1"
dependencies = [
"darling_core",
"darling_macro",
]
[[package]]
name = "darling_core"
version = "0.20.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "622687fe0bac72a04e5599029151f5796111b90f1baaa9b544d807a5e31cd120"
dependencies = [
"fnv",
"ident_case",
"proc-macro2",
"quote",
"strsim 0.11.1",
"syn 2.0.59",
]
[[package]]
name = "darling_macro"
version = "0.20.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "733cabb43482b1a1b53eee8583c2b9e8684d592215ea83efd305dd31bc2f0178"
dependencies = [
"darling_core",
"quote",
"syn 2.0.59",
]
[[package]]
name = "dashmap"
version = "5.5.3"
@@ -3296,6 +3331,37 @@ dependencies = [
"syn 1.0.109",
]
[[package]]
name = "derive_builder"
version = "0.20.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0350b5cb0331628a5916d6c5c0b72e97393b8b6b03b47a9284f4e7f5a405ffd7"
dependencies = [
"derive_builder_macro",
]
[[package]]
name = "derive_builder_core"
version = "0.20.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d48cda787f839151732d396ac69e3473923d54312c070ee21e9effcaa8ca0b1d"
dependencies = [
"darling",
"proc-macro2",
"quote",
"syn 2.0.59",
]
[[package]]
name = "derive_builder_macro"
version = "0.20.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "206868b8242f27cecce124c19fd88157fbd0dd334df2587f36417bafbc85097b"
dependencies = [
"derive_builder_core",
"syn 2.0.59",
]
[[package]]
name = "derive_more"
version = "0.99.17"
@@ -3742,6 +3808,15 @@ dependencies = [
"libc",
]
[[package]]
name = "esaxx-rs"
version = "0.1.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d817e038c30374a4bcb22f94d0a8a0e216958d4c3dcde369b1439fec4bdda6e6"
dependencies = [
"cc",
]
[[package]]
name = "etagere"
version = "0.2.8"
@@ -5082,6 +5157,23 @@ version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "dfa686283ad6dd069f105e5ab091b04c62850d3e4cf5d67debad1933f55023df"
[[package]]
name = "hf-hub"
version = "0.3.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2b780635574b3d92f036890d8373433d6f9fc7abb320ee42a5c25897fc8ed732"
dependencies = [
"dirs 5.0.1",
"indicatif",
"log",
"native-tls",
"rand 0.8.5",
"serde",
"serde_json",
"thiserror",
"ureq",
]
[[package]]
name = "hidden-trait"
version = "0.1.2"
@@ -5273,7 +5365,7 @@ dependencies = [
"http 0.2.9",
"hyper",
"log",
"rustls",
"rustls 0.21.12",
"rustls-native-certs",
"tokio",
"tokio-rustls",
@@ -5321,6 +5413,12 @@ version = "2.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "25a2bc672d1148e28034f176e01fffebb08b35768468cc954630da77a1449005"
[[package]]
name = "ident_case"
version = "1.0.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b9e0384b61958566e926dc50660321d12159025e767c18e043daf26b70104c39"
[[package]]
name = "idna"
version = "0.5.0"
@@ -5407,6 +5505,19 @@ dependencies = [
"serde",
]
[[package]]
name = "indicatif"
version = "0.17.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "763a5a8f45087d6bcea4222e7b72c291a054edf80e4ef6efd2a4979878c7bea3"
dependencies = [
"console",
"instant",
"number_prefix",
"portable-atomic",
"unicode-width",
]
[[package]]
name = "indoc"
version = "1.0.9"
@@ -6230,6 +6341,22 @@ dependencies = [
"libc",
]
[[package]]
name = "macro_rules_attribute"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8a82271f7bc033d84bbca59a3ce3e4159938cb08a9c3aebbe54d215131518a13"
dependencies = [
"macro_rules_attribute-proc_macro",
"paste",
]
[[package]]
name = "macro_rules_attribute-proc_macro"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b8dd856d451cc0da70e2ef2ce95a18e39a93b7558bedf10201ad28503f918568"
[[package]]
name = "malloc_buf"
version = "0.0.6"
@@ -6427,6 +6554,30 @@ version = "0.3.17"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6877bb514081ee2a7ff5ef9de3281f14a4dd4bceac4c09388074a6b5df8a139a"
[[package]]
name = "miner"
version = "0.1.0"
dependencies = [
"anyhow",
"async-watch",
"clap 4.4.4",
"collections",
"fs",
"futures 0.3.28",
"git",
"gpui",
"heed",
"http 0.1.0",
"ignore",
"indicatif",
"serde",
"serde_json",
"tempfile",
"tokenizers",
"tree-sitter",
"tree-sitter-rust",
]
[[package]]
name = "minimal-lexical"
version = "0.2.1"
@@ -6489,6 +6640,27 @@ dependencies = [
"windows-sys 0.48.0",
]
[[package]]
name = "monostate"
version = "0.1.13"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0d208407d7552cd041d8cdb69a1bc3303e029c598738177a3d87082004dc0e1e"
dependencies = [
"monostate-impl",
"serde",
]
[[package]]
name = "monostate-impl"
version = "0.1.13"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a7ce64b975ed4f123575d11afd9491f2e37bbd5813fbfbc0f09ae1fbddea74e0"
dependencies = [
"proc-macro2",
"quote",
"syn 2.0.59",
]
[[package]]
name = "multi_buffer"
version = "0.1.0"
@@ -6917,6 +7089,12 @@ dependencies = [
"syn 1.0.109",
]
[[package]]
name = "number_prefix"
version = "0.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "830b246a0e5f20af87141b25c173cd1b609bd7779a4617d6ec582abaf90870f3"
[[package]]
name = "nvim-rs"
version = "0.6.0-pre"
@@ -7005,6 +7183,28 @@ version = "1.19.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3fdb12b2476b595f9358c5161aa467c2438859caa136dec86c26fdd2efe17b92"
[[package]]
name = "onig"
version = "6.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8c4b31c8722ad9171c6d77d3557db078cab2bd50afcc9d09c8b315c59df8ca4f"
dependencies = [
"bitflags 1.3.2",
"libc",
"once_cell",
"onig_sys",
]
[[package]]
name = "onig_sys"
version = "69.8.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7b829e3d7e9cc74c7e315ee8edb185bf4190da5acde74afd7fc59c35b1f086e7"
dependencies = [
"cc",
"pkg-config",
]
[[package]]
name = "oo7"
version = "0.3.0"
@@ -7721,6 +7921,12 @@ version = "0.2.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5da3b0203fd7ee5720aa0b5e790b591aa5d3f41c3ed2c34a3a393382198af2f7"
[[package]]
name = "portable-atomic"
version = "1.6.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7170ef9988bc169ba16dd36a7fa041e5c4cbeb6a35b76d4c03daded371eae7c0"
[[package]]
name = "postage"
version = "0.5.0"
@@ -8255,19 +8461,30 @@ dependencies = [
[[package]]
name = "rayon"
version = "1.8.0"
version = "1.10.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9c27db03db7734835b3f53954b534c91069375ce6ccaa2e065441e07d9b6cdb1"
checksum = "b418a60154510ca1a002a752ca9714984e21e4241e804d32555251faf8b78ffa"
dependencies = [
"either",
"rayon-core",
]
[[package]]
name = "rayon-core"
version = "1.12.0"
name = "rayon-cond"
version = "0.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5ce3fb6ad83f861aac485e76e1985cd109d9a3713802152be56c3b1f0e0658ed"
checksum = "059f538b55efd2309c9794130bc149c6a553db90e9d99c2030785c82f0bd7df9"
dependencies = [
"either",
"itertools 0.11.0",
"rayon",
]
[[package]]
name = "rayon-core"
version = "1.12.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1465873a3dfdaa8ae7cb14b4383657caab0b3e8a0aa9ae8e04b044854c8dfce2"
dependencies = [
"crossbeam-deque",
"crossbeam-utils",
@@ -8900,10 +9117,24 @@ checksum = "3f56a14d1f48b391359b22f731fd4bd7e43c97f3c50eee276f3aa09c94784d3e"
dependencies = [
"log",
"ring",
"rustls-webpki",
"rustls-webpki 0.101.7",
"sct",
]
[[package]]
name = "rustls"
version = "0.22.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bf4ef73721ac7bcd79b2b315da7779d8fc09718c6b3d2d1b2d94850eb8c18432"
dependencies = [
"log",
"ring",
"rustls-pki-types",
"rustls-webpki 0.102.4",
"subtle",
"zeroize",
]
[[package]]
name = "rustls-native-certs"
version = "0.6.3"
@@ -8925,6 +9156,12 @@ dependencies = [
"base64 0.21.7",
]
[[package]]
name = "rustls-pki-types"
version = "1.7.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "976295e77ce332211c0d24d92c0e83e50f5c5f046d11082cea19f3df13a3562d"
[[package]]
name = "rustls-webpki"
version = "0.101.7"
@@ -8935,6 +9172,17 @@ dependencies = [
"untrusted",
]
[[package]]
name = "rustls-webpki"
version = "0.102.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ff448f7e92e913c4b7d4c6d8e4540a1724b319b4152b8aef6d4cf8339712b33e"
dependencies = [
"ring",
"rustls-pki-types",
"untrusted",
]
[[package]]
name = "rustversion"
version = "1.0.14"
@@ -9663,9 +9911,9 @@ dependencies = [
[[package]]
name = "smallvec"
version = "1.11.1"
version = "1.13.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "942b4a808e05215192e39f4ab80813e599068285906cc91aa64f923db842bd5a"
checksum = "3c5e1a9a646d36c3599cd173a41282daf47c44583ad367b8e6837255952e5c67"
[[package]]
name = "smol"
@@ -9766,6 +10014,18 @@ dependencies = [
"der 0.7.8",
]
[[package]]
name = "spm_precompiled"
version = "0.1.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5851699c4033c63636f7ea4cf7b7c1f1bf06d0cc03cfb42e711de5a5c46cf326"
dependencies = [
"base64 0.13.1",
"nom",
"serde",
"unicode-segmentation",
]
[[package]]
name = "sptr"
version = "0.3.2"
@@ -9854,7 +10114,7 @@ dependencies = [
"paste",
"percent-encoding",
"rust_decimal",
"rustls",
"rustls 0.21.12",
"rustls-pemfile",
"serde",
"serde_json",
@@ -9868,7 +10128,7 @@ dependencies = [
"tracing",
"url",
"uuid",
"webpki-roots",
"webpki-roots 0.24.0",
]
[[package]]
@@ -10845,6 +11105,39 @@ version = "0.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1f3ccbac311fea05f86f61904b462b55fb3df8837a366dfc601a0161d0532f20"
[[package]]
name = "tokenizers"
version = "0.19.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e500fad1dd3af3d626327e6a3fe5050e664a6eaa4708b8ca92f1794aaf73e6fd"
dependencies = [
"aho-corasick",
"derive_builder",
"esaxx-rs",
"getrandom 0.2.10",
"hf-hub",
"indicatif",
"itertools 0.12.1",
"lazy_static",
"log",
"macro_rules_attribute",
"monostate",
"onig",
"paste",
"rand 0.8.5",
"rayon",
"rayon-cond",
"regex",
"regex-syntax 0.8.2",
"serde",
"serde_json",
"spm_precompiled",
"thiserror",
"unicode-normalization-alignments",
"unicode-segmentation",
"unicode_categories",
]
[[package]]
name = "tokio"
version = "1.37.0"
@@ -10902,7 +11195,7 @@ version = "0.24.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c28327cf380ac148141087fbfb9de9d7bd4e84ab5d2c28fbc911d753de8a7081"
dependencies = [
"rustls",
"rustls 0.21.12",
"tokio",
]
@@ -11537,6 +11830,15 @@ dependencies = [
"tinyvec",
]
[[package]]
name = "unicode-normalization-alignments"
version = "0.1.12"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "43f613e4fa046e69818dd287fdc4bc78175ff20331479dab6e1b0f98d57062de"
dependencies = [
"smallvec",
]
[[package]]
name = "unicode-properties"
version = "0.1.1"
@@ -11551,9 +11853,9 @@ checksum = "7d817255e1bed6dfd4ca47258685d14d2bdcfbc64fdc9e3819bd5848057b8ecc"
[[package]]
name = "unicode-segmentation"
version = "1.10.1"
version = "1.11.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1dd624098567895118886609431a7c3b8f516e41d30e0643f03d94592a147e36"
checksum = "d4c87d22b6e3f4a18d4d40ef354e97c90fcb14dd91d7dc0aa9d8a1172ebf7202"
[[package]]
name = "unicode-width"
@@ -11585,6 +11887,26 @@ version = "0.9.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "8ecb6da28b8a351d773b68d5825ac39017e680750f980f3a1a85cd8dd28a47c1"
[[package]]
name = "ureq"
version = "2.9.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d11a831e3c0b56e438a28308e7c810799e3c118417f342d30ecec080105395cd"
dependencies = [
"base64 0.22.0",
"flate2",
"log",
"native-tls",
"once_cell",
"rustls 0.22.4",
"rustls-pki-types",
"rustls-webpki 0.102.4",
"serde",
"serde_json",
"url",
"webpki-roots 0.26.3",
]
[[package]]
name = "url"
version = "2.5.0"
@@ -12404,7 +12726,16 @@ version = "0.24.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b291546d5d9d1eab74f069c77749f2cb8504a12caa20f0f2de93ddbf6f411888"
dependencies = [
"rustls-webpki",
"rustls-webpki 0.101.7",
]
[[package]]
name = "webpki-roots"
version = "0.26.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bd7c23921eeb1713a4e851530e9b9756e4fb0e89978582942612524cf09f01cd"
dependencies = [
"rustls-pki-types",
]
[[package]]
@@ -13676,9 +14007,9 @@ dependencies = [
[[package]]
name = "zeroize"
version = "1.6.0"
version = "1.8.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2a0956f1ba7c7909bfb66c2e9e4124ab6f6482560f6628b5aaeba39207c9aad9"
checksum = "ced3678a2879b30306d323f4542626697a464a97c0a07c9aebf7ebca65cd4dde"
dependencies = [
"zeroize_derive",
]


@@ -58,6 +58,7 @@ members = [
"crates/markdown_preview",
"crates/media",
"crates/menu",
"crates/miner",
"crates/multi_buffer",
"crates/node_runtime",
"crates/notifications",
@@ -240,6 +241,7 @@ task = { path = "crates/task" }
tasks_ui = { path = "crates/tasks_ui" }
search = { path = "crates/search" }
semantic_index = { path = "crates/semantic_index" }
miner = { path = "crates/miner" }
semantic_version = { path = "crates/semantic_version" }
settings = { path = "crates/settings" }
snippet = { path = "crates/snippet" }

42
crates/miner/Cargo.toml Normal file

@@ -0,0 +1,42 @@
[package]
name = "miner"
version = "0.1.0"
edition = "2021"
publish = false
license = "GPL-3.0-or-later"
[[bin]]
name = "miner"
path = "src/miner.rs"
[features]
test-support = [
"collections/test-support",
"fs/test-support",
"gpui/test-support",
]
[dependencies]
anyhow.workspace = true
async-watch.workspace = true
collections.workspace = true
clap.workspace = true
futures.workspace = true
fs.workspace = true
git.workspace = true
gpui.workspace = true
heed.workspace = true
http.workspace = true
ignore.workspace = true
indicatif = "0.17.8"
serde.workspace = true
serde_json.workspace = true
tree-sitter.workspace = true
tree-sitter-rust.workspace = true
tokenizers = { version = "0.19.1", features = ["http"] }
[dev-dependencies]
collections = { workspace = true, features = ["test-support"] }
fs = { workspace = true, features = ["test-support"] }
gpui = { workspace = true, features = ["test-support"] }
tempfile.workspace = true


@@ -0,0 +1,80 @@
use anyhow::{anyhow, Result};
use futures::{
    channel::{mpsc, oneshot},
    SinkExt, StreamExt,
};
use gpui::BackgroundExecutor;
use heed::{
    types::{SerdeJson, Str},
    Database as HeedDatabase, EnvOpenOptions, RwTxn,
};
use serde::{Deserialize, Serialize};
use std::{path::Path, time::SystemTime};

#[derive(Debug, Serialize, Deserialize)]
pub struct CachedSummary {
    pub summary: String,
    pub mtime: SystemTime,
}

#[derive(Clone)]
pub struct Database {
    tx: mpsc::Sender<Box<dyn FnOnce(&HeedDatabase<Str, SerdeJson<CachedSummary>>, RwTxn) + Send>>,
}

impl Database {
    pub async fn new(db_path: &Path, root: &Path, executor: &BackgroundExecutor) -> Result<Self> {
        std::fs::create_dir_all(&db_path)?;
        let env = unsafe {
            EnvOpenOptions::new()
                .map_size(1024 * 1024 * 1024)
                .max_dbs(3000)
                .open(db_path)?
        };
        let mut wtxn = env.write_txn()?;
        let db_name = format!("summaries_{}", root.to_string_lossy());
        let db: HeedDatabase<Str, SerdeJson<CachedSummary>> =
            env.create_database(&mut wtxn, Some(&db_name))?;
        wtxn.commit()?;
        let (tx, mut rx) = mpsc::channel::<
            Box<dyn FnOnce(&HeedDatabase<Str, SerdeJson<CachedSummary>>, RwTxn) + Send>,
        >(100);
        executor
            .spawn(async move {
                while let Some(f) = rx.next().await {
                    let wtxn = env.write_txn().unwrap();
                    f(&db, wtxn);
                }
            })
            .detach();
        Ok(Self { tx })
    }

    pub async fn transact<F, T>(&self, f: F) -> Result<T>
    where
        F: FnOnce(&HeedDatabase<Str, SerdeJson<CachedSummary>>, &mut RwTxn) -> Result<T>
            + Send
            + 'static,
        T: 'static + Send,
    {
        let (tx, rx) = oneshot::channel();
        self.tx
            .clone()
            .send(Box::new(move |db, mut txn| {
                let result = f(db, &mut txn);
                if result.is_ok() {
                    if let Err(e) = txn.commit() {
                        let _ = tx.send(Err(anyhow::Error::from(e)));
                        return;
                    }
                }
                let _ = tx.send(result);
            }))
            .await
            .map_err(|_| anyhow!("database closed"))?;
        Ok(rx.await.map_err(|_| anyhow!("transaction failed"))??)
    }
}
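
Note: the file above funnels every transaction through one background task. Closures are sent over a channel, run one at a time, and reply through a oneshot channel. The sketch below shows the same pattern using only the standard library, with a thread instead of a `gpui` executor and a `HashMap` instead of a `heed` database. `Db`, `Store`, and `Job` are illustrative names, not part of the commit.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// In-memory stand-in for the heed database (assumption for this sketch).
pub type Store = HashMap<String, String>;
/// A unit of work executed on the thread that owns the store.
pub type Job = Box<dyn FnOnce(&mut Store) + Send>;

/// Cloneable handle that forwards closures to a single owning thread.
#[derive(Clone)]
pub struct Db {
    tx: mpsc::Sender<Job>,
}

impl Db {
    pub fn new() -> Self {
        let (tx, rx) = mpsc::channel::<Job>();
        thread::spawn(move || {
            let mut store = Store::new();
            // Jobs run strictly one at a time, so writes never overlap.
            for job in rx {
                job(&mut store);
            }
        });
        Self { tx }
    }

    /// Runs `f` on the owning thread and blocks until its result comes back.
    pub fn transact<F, T>(&self, f: F) -> Result<T, String>
    where
        F: FnOnce(&mut Store) -> T + Send + 'static,
        T: Send + 'static,
    {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.tx
            .send(Box::new(move |store: &mut Store| {
                let _ = reply_tx.send(f(store));
            }))
            .map_err(|_| "database closed".to_string())?;
        reply_rx.recv().map_err(|_| "transaction failed".to_string())
    }
}
```

Serializing work this way keeps a single writer without callers having to share a lock, at the cost of one channel round trip per transaction.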


@@ -0,0 +1,131 @@
use crate::{LanguageModel, Message};
use anyhow::{anyhow, Result};
use futures::{
    channel::mpsc, future::BoxFuture, io::BufReader, AsyncBufReadExt, AsyncReadExt, FutureExt,
    SinkExt, StreamExt,
};
use gpui::BackgroundExecutor;
use http::HttpClient;
use serde::Deserialize;
use std::sync::Arc;

pub struct HuggingFaceClient {
    client: Arc<dyn HttpClient>,
    endpoint: String,
    api_key: String,
    background_executor: BackgroundExecutor,
}

impl HuggingFaceClient {
    pub fn new(endpoint: String, api_key: String, background_executor: BackgroundExecutor) -> Self {
        Self {
            client: http::client(None),
            endpoint,
            api_key,
            background_executor,
        }
    }
}

impl LanguageModel for HuggingFaceClient {
    fn stream_completion(
        &self,
        messages: Vec<Message>,
    ) -> BoxFuture<Result<mpsc::Receiver<String>>> {
        async move {
            let (mut tx, rx) = mpsc::channel(100);
            let mut inputs = messages
                .iter()
                .map(|msg| format!("<|im_start|>{}\n{}<|im_end|>", msg.role, msg.content))
                .collect::<Vec<String>>()
                .join("\n");
            inputs.push_str("<|im_end|>");
            inputs.push_str("<|im_start|>assistant\n");
            let request = serde_json::json!({
                "inputs": inputs,
                "stream": true,
                "max_tokens": 2048
            });
            let request = http::Request::builder()
                .method(http::Method::POST)
                .uri(&self.endpoint)
                .header("Authorization", format!("Bearer {}", self.api_key))
                .header("Content-Type", "application/json")
                .body(http::AsyncBody::from(serde_json::to_vec(&request)?))?;
            let mut response = self.client.send(request).await?;
            if !response.status().is_success() {
                let mut body = Vec::new();
                response.body_mut().read_to_end(&mut body).await?;
                let body_str = std::str::from_utf8(&body)?;
                return Err(anyhow!(
                    "Failed to connect to API: {} {}",
                    response.status(),
                    body_str
                ));
            }
            let reader = BufReader::new(response.into_body());
            let stream = reader.lines().filter_map(|line| async move {
                match line {
                    Ok(line) => {
                        let line = line.strip_prefix("data: ")?;
                        match serde_json::from_str::<StreamOutput>(line) {
                            Ok(output) => {
                                if !output.token.special {
                                    Some(Ok(output.token.text))
                                } else {
                                    None
                                }
                            }
                            Err(error) => Some(Err(anyhow!(error))),
                        }
                    }
                    Err(error) => Some(Err(anyhow!(error))),
                }
            });
            self.background_executor
                .spawn(async move {
                    futures::pin_mut!(stream);
                    while let Some(result) = stream.next().await {
                        match result {
                            Ok(text) => {
                                if tx.send(text).await.is_err() {
                                    break;
                                }
                            }
                            Err(e) => {
                                eprintln!("Error in stream: {:?}", e);
                                break;
                            }
                        }
                    }
                })
                .detach();
            Ok(rx)
        }
        .boxed()
    }
}

#[derive(Debug, Deserialize)]
struct StreamOutput {
    index: u32,
    token: Token,
    generated_text: Option<String>,
    details: Option<serde_json::Value>,
}

#[derive(Debug, Deserialize)]
struct Token {
    id: u32,
    text: String,
    logprob: f64,
    special: bool,
}
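
Note: two small pieces of the client above can be isolated and tested without any HTTP: building the ChatML-style prompt and stripping the `data: ` prefix from streamed lines. The sketch below is an illustration with hypothetical helper names (`format_chatml`, `sse_payload`). It follows the usual ChatML layout, where each turn is closed once and a newline precedes the open assistant turn, so its output differs slightly from the string the diff builds (the diff appends one extra `<|im_end|>` and no newline).

```rust
/// Formats (role, content) pairs as a ChatML-style prompt that ends with an
/// open assistant turn for the model to complete.
pub fn format_chatml(messages: &[(&str, &str)]) -> String {
    let mut prompt = messages
        .iter()
        .map(|(role, content)| format!("<|im_start|>{}\n{}<|im_end|>", role, content))
        .collect::<Vec<_>>()
        .join("\n");
    prompt.push_str("\n<|im_start|>assistant\n");
    prompt
}

/// Returns the payload of a streamed line of the form `data: <payload>`,
/// or `None` for lines without that prefix (blank lines, comments).
pub fn sse_payload(line: &str) -> Option<&str> {
    line.strip_prefix("data: ")
}
```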


@@ -0,0 +1,16 @@
use anyhow::Result;
use futures::{channel::mpsc, future::BoxFuture};
use serde::Serialize;

#[derive(Debug, Serialize)]
pub struct Message {
    pub role: String,
    pub content: String,
}

pub trait LanguageModel: Send + Sync {
    fn stream_completion(
        &self,
        messages: Vec<Message>,
    ) -> BoxFuture<Result<mpsc::Receiver<String>>>;
}
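
Note: one benefit of putting the model behind a trait is that tests can substitute a fake. The sketch below illustrates that with a synchronous, std-only analog of the trait above: it returns a `std::sync::mpsc::Receiver` directly instead of a `BoxFuture` wrapping a `futures` channel. `EchoModel` and `collect_completion` are hypothetical names, and the types here are simplified copies rather than the ones in the diff.

```rust
use std::sync::mpsc;
use std::thread;

pub struct Message {
    pub role: String,
    pub content: String,
}

/// Synchronous stand-in for the streaming trait (simplified for this sketch).
pub trait LanguageModel: Send + Sync {
    fn stream_completion(&self, messages: Vec<Message>) -> mpsc::Receiver<String>;
}

/// Fake model that streams the last message back word by word.
pub struct EchoModel;

impl LanguageModel for EchoModel {
    fn stream_completion(&self, messages: Vec<Message>) -> mpsc::Receiver<String> {
        let (tx, rx) = mpsc::channel();
        thread::spawn(move || {
            let last = messages
                .last()
                .map(|m| m.content.clone())
                .unwrap_or_default();
            for word in last.split_whitespace() {
                // Stop early if the receiver was dropped.
                if tx.send(word.to_string()).is_err() {
                    break;
                }
            }
        });
        rx
    }
}

/// Drains the stream and joins the pieces; works with any implementation.
pub fn collect_completion(model: &dyn LanguageModel, messages: Vec<Message>) -> String {
    model
        .stream_completion(messages)
        .into_iter()
        .collect::<Vec<_>>()
        .join(" ")
}
```

Code written against `&dyn LanguageModel` does not need to know which backend is in use, which is the property the HuggingFace and Ollama clients rely on.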

1476
crates/miner/src/miner.rs Normal file

File diff suppressed because it is too large

105
crates/miner/src/ollama.rs Normal file

@@ -0,0 +1,105 @@
#![allow(unused)]

use crate::{BackgroundExecutor, LanguageModel, Message};
use anyhow::{anyhow, Result};
use futures::{
    channel::mpsc, future::BoxFuture, io::BufReader, AsyncBufReadExt, AsyncReadExt, FutureExt,
    SinkExt, StreamExt,
};
use http::{AsyncBody, HttpClient, Method, Request as HttpRequest};
use serde::Deserialize;
use std::sync::Arc;

pub struct OllamaClient {
    client: Arc<dyn HttpClient>,
    base_url: String,
    model: String,
    executor: BackgroundExecutor,
}

impl OllamaClient {
    pub fn new(base_url: String, model: String, executor: BackgroundExecutor) -> Self {
        Self {
            client: http::client(None),
            base_url,
            model,
            executor,
        }
    }
}

impl LanguageModel for OllamaClient {
    fn stream_completion(
        &self,
        messages: Vec<Message>,
    ) -> BoxFuture<Result<mpsc::Receiver<String>>> {
        async move {
            let (mut tx, rx) = mpsc::channel(100);
            let request = serde_json::json!({
                "model": &self.model,
                "messages": messages,
                "stream": true,
            });
            let uri = format!("{}/api/chat", self.base_url);
            let request = HttpRequest::builder()
                .method(Method::POST)
                .uri(uri)
                .header("Content-Type", "application/json")
                .body(AsyncBody::from(serde_json::to_vec(&request)?))?;
            let mut response = self.client.send(request).await?;
            if !response.status().is_success() {
                let mut body = Vec::new();
                response.body_mut().read_to_end(&mut body).await?;
                let body_str = std::str::from_utf8(&body)?;
                return Err(anyhow!(
                    "Failed to connect to API: {} {}",
                    response.status(),
                    body_str
                ));
            }
            let reader = BufReader::new(response.into_body());
            let stream = reader.lines().filter_map(|line| async move {
                match line {
                    Ok(line) => match serde_json::from_str::<serde_json::Value>(&line) {
                        Ok(response) => {
                            if let Some(content) = response["message"]["content"].as_str() {
                                Some(Ok(content.to_string()))
                            } else {
                                None
                            }
                        }
                        Err(error) => Some(Err(anyhow!(error))),
                    },
                    Err(error) => Some(Err(anyhow!(error))),
                }
            });
            self.executor
                .spawn(async move {
                    futures::pin_mut!(stream);
                    while let Some(result) = stream.next().await {
                        match result {
                            Ok(text) => {
                                if tx.send(text).await.is_err() {
                                    break;
                                }
                            }
                            Err(e) => {
                                eprintln!("Error in stream: {:?}", e);
                                break;
                            }
                        }
                    }
                })
                .detach();
            Ok(rx)
        }
        .boxed()
    }
}


@@ -0,0 +1,6 @@
(mod_item name: (identifier) @export)
(struct_item name: (type_identifier) @export)
(impl_item type: (type_identifier) @export)
(enum_item name: (type_identifier) @export)
(function_item name: (identifier) @export)
(trait_item name: (type_identifier) @export)


@@ -0,0 +1 @@
(use_declaration) @import