This code converts high-level code completion or chat calls into low-level LLM prompts and converts the results back.
It's written in Rust and compiles into the refact-lsp binary. This binary is bundled with the VS Code, JetBrains IDEs, VS Classic, Sublime Text, and Qt Creator plugins.
It's a great way to organize code for the plugins, because it can absorb all the common logic, such as caching, debouncing, telemetry, and scratchpads for different models.
Pick the command that matches the API key you have handy, or point --address-url at your Refact self-hosted server:
cargo build && target/debug/refact-lsp --address-url Refact --api-key YYYY --http-port 8001 --lsp-port 8002 --logs-stderr
cargo build && target/debug/refact-lsp --address-url HF --api-key hf_XXXX --http-port 8001 --lsp-port 8002 --logs-stderr
cargo build && target/debug/refact-lsp --address-url http://127.0.0.1:8008/ --http-port 8001 --lsp-port 8002 --logs-stderr
Try --help for more options.
HTTP example:
curl http://127.0.0.1:8001/v1/code-completion -k \
-H 'Content-Type: application/json' \
-d '{
"inputs": {
"sources": {"hello.py": "def hello_world():"},
"cursor": {
"file": "hello.py",
"line": 0,
"character": 18
},
"multiline": true
},
"stream": false,
"parameters": {
"temperature": 0.1,
"max_new_tokens": 20
}
}'
Output is [{"code_completion": "\n return \"Hello World!\"\n"}].
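The same request as the curl example above can be sketched in Python using only the standard library. The function names build_completion_request and post_completion are hypothetical helpers made up for this illustration, and the base URL assumes the binary was started with --http-port 8001 as in the commands above.

```python
import json
import urllib.request


def build_completion_request(file_name, source, line, character,
                             multiline=True, temperature=0.1, max_new_tokens=20):
    """Build the same JSON body that the curl example sends."""
    return {
        "inputs": {
            "sources": {file_name: source},
            "cursor": {"file": file_name, "line": line, "character": character},
            "multiline": multiline,
        },
        "stream": False,
        "parameters": {"temperature": temperature, "max_new_tokens": max_new_tokens},
    }


def post_completion(payload, base_url="http://127.0.0.1:8001"):
    """POST the payload to /v1/code-completion and return the parsed JSON reply."""
    req = urllib.request.Request(
        base_url + "/v1/code-completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With refact-lsp running locally, post_completion(build_completion_request("hello.py", "def hello_world():", 0, 18)) should send the same body as the curl call.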
The flags --basic-telemetry and --snippet-telemetry control what telemetry is sent. To be clear: without these flags, no telemetry is sent. These flags are typically controlled from the IDE plugin settings.
Basic telemetry means counters and error messages without information about you or your code. It is "compressed" into the .cache/refact/telemetry/compressed folder, then from time to time it's sent and moved to the .cache/refact/telemetry/sent folder.
"Compressed" means similar records are joined together, increasing the counter. "Sent" means the Rust binary communicates with an HTTP endpoint specified in caps (see Caps section below) and sends the .json file exactly as you see it in .cache/refact/telemetry. The files are human-readable.
When using a Refact self-hosted server, telemetry goes to the self-hosted server, not to the cloud.
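To illustrate what "compressed" means here, the following is a minimal Python sketch of joining similar records and increasing a counter. It is not the binary's actual code: the record fields ("scope", "error", "counter") are a hypothetical schema chosen for the example, and the real .json files may look different.

```python
from collections import Counter


def compress_records(records):
    """Join records that agree on every field except "counter", summing the counters."""
    totals = Counter()
    for rec in records:
        # Hypothetical schema: every field other than "counter" identifies the record.
        key = tuple(sorted((k, v) for k, v in rec.items() if k != "counter"))
        # A record without an explicit counter counts as one occurrence.
        totals[key] += rec.get("counter", 1)
    # Rebuild one dict per distinct key, with the accumulated counter attached.
    return [dict(key, counter=n) for key, n in totals.items()]
```

Two identical error records therefore end up as a single record whose counter is the sum of the originals, which keeps the files small and still readable.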
The --address-url parameter largely controls the behavior of this program. The address is first used to construct the $URL/coding_assistant_caps.json address to fetch the caps file. Furthermore, there are compiled-in caps you can use via the magic addresses "Refact" and "HF".
The caps file describes which models are running, the default models for completion and chat, where to send the telemetry, how to download a tokenizer, and where the endpoint for accessing the actual language models is. To read more, check out the compiled-in caps in caps.rs.
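The address handling described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the logic in caps.rs: caps_url is a hypothetical helper, and stripping a trailing slash before joining is a guess at how an address such as http://127.0.0.1:8008/ is treated.

```python
# Magic addresses named in the text; they select compiled-in caps instead of a download.
COMPILED_IN = {"Refact", "HF"}


def caps_url(address_url):
    """Return the URL of the caps file, or None when compiled-in caps apply."""
    if address_url in COMPILED_IN:
        return None
    # Assumption: a trailing slash is tolerated, so strip it before joining.
    return address_url.rstrip("/") + "/coding_assistant_caps.json"
```

For example, caps_url("http://127.0.0.1:8008/") yields http://127.0.0.1:8008/coding_assistant_caps.json, while caps_url("HF") returns None to signal that no caps file is fetched.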
The test to run often is test_edge_cases.py.
You can also run measure_humaneval_fim.py for your favorite model.
The initial version of this project was written by looking at llm-ls by @McPatate. He's a Rust fan who inspired this project!