ReplitCode Local is a Visual Studio Code extension that facilitates working with the replit-code-v1-3b language model. It provides auto-completions for code generation based on your current code, allowing you to quickly generate snippets and streamline your coding experience. You can install it from the Visual Studio Code Marketplace.
- Auto-completions: Receive code suggestions and completions as you type, generated by the replit-code-v1-3b language model.
- Customizable Settings: Adjust the extension's behavior to your preferences with configurable settings:
  - Typing Delay: Set the delay in milliseconds before a request is sent to the Flask server.
  - Flask URL: Specify the URL used for requests to the Flask server.
  - Code Lines: Set the number of lines to include in the code-generation prompt.
- Open Visual Studio Code.
- Go to the Extensions view by clicking the square icon in the left sidebar or using the shortcut Ctrl+Shift+X.
- Search for "ReplitCode Local" and click the "Install" button.
You can customize the extension's behavior by modifying the following settings in your Visual Studio Code settings:
- Typing Delay: Set the delay in milliseconds before a request is sent to the Flask server.
- Flask URL: Specify the URL used for requests to the Flask server.
- Code Lines: Set the number of lines to include in the code-generation prompt.
To access these settings, open the command palette (Ctrl+Shift+P), type "Preferences: Open Settings (JSON)", and add the desired values to your settings.json file:
{
"replitcode-local.typingDelay": 1000,
"replitcode-local.flaskUrl": "http://localhost:5000/predict",
"replitcode-local.codeLines": 3
}
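To illustrate how these settings are assumed to fit together: the extension presumably waits typingDelay milliseconds after your last keystroke, takes the last codeLines lines of the buffer as the prompt, and POSTs them to flaskUrl. A minimal sketch of the prompt-window step (the helper name is hypothetical, not part of the extension's source):

```python
def prompt_window(buffer_text: str, code_lines: int = 3) -> str:
    """Return the last `code_lines` lines of the editor buffer -- the
    portion assumed to be sent to the Flask endpoint as the prompt."""
    lines = buffer_text.splitlines()
    return "\n".join(lines[-code_lines:])

# With "replitcode-local.codeLines": 3, only the last three lines
# of the file would be included in the request.
```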
- Start typing code in your editor.
- As you type, the extension provides code completions based on your input, generated by the replit-code-v1-3b language model.
- Press Tab to accept a suggestion and insert it into your code.
- Enjoy improved coding productivity and faster code generation with ReplitCode Local!
from flask import Flask, request, jsonify
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

MAX_TOKENS_LENGTH = 512

app = Flask(__name__)
set_seed(0)

# Load the LLM model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "replit/replit-code-v1-3b",
    trust_remote_code=True,
    truncation_side="left",
)
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path="replit/replit-code-v1-3b",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
).to(device="cuda:7", dtype=torch.bfloat16)
model.eval()

# Define the API endpoint
@app.route("/predict", methods=["POST"])
def predict():
    # Get the input data from the request
    input_data = request.json["input_data"]
    prompt = f"generate in python:\n{input_data}"
    x = tokenizer.encode(
        prompt, return_tensors="pt", truncation=True, max_length=MAX_TOKENS_LENGTH
    )
    x = x.to(device="cuda:7")
    y = model.generate(
        x,
        max_new_tokens=50,
        do_sample=True,  # required for temperature/top_p/top_k to take effect
        temperature=0.2,
        top_p=0.9,
        top_k=4,
        use_cache=True,
        repetition_penalty=1.0,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
    generated_code = tokenizer.decode(
        y[0], skip_special_tokens=True, clean_up_tokenization_spaces=False
    )
    # Strip the echoed prompt and keep only the first generated block
    generated_code = generated_code.replace(prompt, "")
    generated_code = generated_code.split("\n\n")[0]
    # Return the prediction as a JSON response
    return jsonify({"generated_code": generated_code})

if __name__ == "__main__":
    app.run(debug=True, use_reloader=False)
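Once the Flask server is running, you can exercise the endpoint outside the editor with a short client script. This is a sketch using only the standard library; the payload shape matches the predict handler above, and the function name is our own:

```python
import json
from urllib import request as urlrequest

def request_completion(code_snippet, url="http://localhost:5000/predict", timeout=30):
    """POST the current code to the /predict endpoint and return the
    generated completion from the JSON response."""
    payload = json.dumps({"input_data": code_snippet}).encode("utf-8")
    req = urlrequest.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urlrequest.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["generated_code"]

# Example (requires the server above to be running):
# print(request_completion("def fibonacci(n):"))
```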
If you encounter any issues, have feature suggestions, or would like to contribute to the development of this extension, please visit the GitHub repository and create an issue or pull request.
If you have any questions or feedback, don't hesitate to reach out to the project maintainer:
Yaroslav Poltoran
- Telegram: https://t.me/yaroslavpoltoran
Feel free to connect and get in touch — your feedback is highly appreciated!
ReplitCode Local is an open-source project created by Yaroslav Poltoran. Thank you for your support!