Commit Graph

19151 Commits

Author | SHA1 | Message | Date
Antonio Scandurra | 4ff38fc823 | Index files as they change on disk | 2024-04-10 15:42:10 -07:00
Kyle Kelley | 3007fb51c3 | show score | 2024-04-10 15:42:10 -07:00
Kyle Kelley | cd3972bc52 | touchup | 2024-04-10 15:42:10 -07:00
Kyle Kelley | 5ceb6ff351 | switch to OpenAI provider for example | 2024-04-10 15:42:10 -07:00
Kyle Kelley | 06da24697d | show snippets from search results | 2024-04-10 15:42:10 -07:00
Antonio Scandurra | b503dd63e6 | Fix compile errors and make search work | 2024-04-10 15:42:10 -07:00
Kyle Kelley | 9589630dfe | wip | 2024-04-10 15:42:10 -07:00
Kyle Kelley | 1dbbde02ae | show just three digits and truncate otherwise for display | 2024-04-10 15:42:10 -07:00
Kyle Kelley | fc1ff4b061 | wip | 2024-04-10 15:42:10 -07:00
Kyle Kelley | f0b7ea9a50 | adapt tests to new batching setup | 2024-04-10 15:42:10 -07:00
Kyle Kelley | 912d5469d4 | move args check further up | 2024-04-10 15:42:10 -07:00
Kyle Kelley | 225f21dd95 | implement batching for OpenAI | 2024-04-10 15:42:10 -07:00
Kyle Kelley | 26be3c22a1 | readjust tests | 2024-04-10 15:42:10 -07:00
Kyle Kelley | c6c53d8fd3 | make EmbeddingProvider trait be Send + Sync | 2024-04-10 15:42:10 -07:00
Kyle Kelley | f035697232 | WIP | 2024-04-10 15:42:10 -07:00
Kyle Kelley | 3a6ffc7de4 | WIP | 2024-04-10 15:42:10 -07:00
Kyle Kelley | 8cce847ea7 | WIP | 2024-04-10 15:42:10 -07:00
Kyle Kelley | 32b3c1e378 | WIP | 2024-04-10 15:42:10 -07:00
Antonio Scandurra | 8389c8e254 | WIP | 2024-04-10 15:42:10 -07:00
Antonio Scandurra | 757532a09e | Fix key ordering in database | 2024-04-10 15:42:10 -07:00
Antonio Scandurra | 228a4286ad | 🎨 | 2024-04-10 15:42:10 -07:00
Antonio Scandurra | 6b27c860a8 | Maintain embeddings in the database | 2024-04-10 15:42:10 -07:00
Kyle Kelley | e76b0dc38c | add support for OpenAI embedding | 2024-04-10 15:42:10 -07:00
Kyle Kelley | 355ce405cb | create an OpenAI embedding client | 2024-04-10 15:42:10 -07:00
Kyle Kelley | ec3dd27bc6 | reorganize embedding provider segment | 2024-04-10 15:42:10 -07:00
Kyle Kelley | 9705e26cff | create a not-exactly-a-benchmark | 2024-04-10 15:42:10 -07:00
Kyle Kelley | 7ddf0467a5 | pass embedding model on through, rely on data types for more safety | 2024-04-10 15:42:10 -07:00
Kyle Kelley | 19aadacdef | create more specificity around the embedding's backing model | 2024-04-10 15:42:10 -07:00
Kyle Kelley | f80ac2c190 | set up way to run the indexing example with actual embeddings | 2024-04-10 15:42:10 -07:00
Kyle Kelley | 876d017294 | embed each chunk | 2024-04-10 15:42:10 -07:00
Kyle Kelley | 84e3063d4b | test vector normalization | 2024-04-10 15:42:10 -07:00
Kyle Kelley | 4999cf136f | quick integration test on embedding | 2024-04-10 15:42:10 -07:00
Kyle Kelley | 8099fb9845 | set up embedding | 2024-04-10 15:42:10 -07:00
Kyle Kelley | 636bdf1196 | create an embedding provider | 2024-04-10 15:42:10 -07:00
Kyle Kelley | 93501bcb0c | create an enum for embedding sizes | 2024-04-10 15:42:10 -07:00
Antonio Scandurra | f72e74e310 | Add next steps | 2024-04-10 15:42:10 -07:00
Antonio Scandurra | cc753b88e1 | WIP | 2024-04-10 15:42:10 -07:00
Antonio Scandurra | 55a8d3b696 | WIP: Start on making indexing a long-lived task | 2024-04-10 15:42:10 -07:00
Antonio Scandurra | 02a5da3e0e | Rework indexing to be on a per-worktree basis | 2024-04-10 15:42:10 -07:00
Antonio Scandurra | 2f8dc894e1 | Checkpoint | 2024-04-10 15:42:10 -07:00
Antonio Scandurra | 7e5a585ca7 | Fall back to using line-based splitting when a grammar can't be found | 2024-04-10 15:42:10 -07:00
Antonio Scandurra | 078f9ed689 | 🎨 | 2024-04-10 15:42:10 -07:00
Antonio Scandurra | 514902cbac | Rework embedding to simplify determining when a project scan completes | 2024-04-10 15:42:10 -07:00
Nathan Sobo | 6a61f9577f | Start on an embed_chunks task that processes batches of chunked files | 2024-04-10 15:42:10 -07:00
Nathan Sobo | 57d4878d4a | Compute a digest for each chunk | 2024-04-10 15:42:10 -07:00
Nathan Sobo | 9e1706feb0 | Introduce ChunkedFile struct to prepare to fetch and store embeddings | 2024-04-10 15:42:10 -07:00
Antonio Scandurra | a7345fa596 | WIP: flush less eagerly | 2024-04-10 15:42:10 -07:00
Kyle Kelley | ba4c2a56e0 | accept a path by arg | 2024-04-10 15:42:10 -07:00
Antonio Scandurra | 9bfcc631b9 | WIP | 2024-04-10 15:42:10 -07:00
Antonio Scandurra | 8ee48a7133 | WIP | 2024-04-10 15:42:10 -07:00