a-real-ai/pywinassistant

Is this library compatible with open-source LLMs such as Llama 3 or Mixtral-Large?

Opened this issue · 8 comments

To add compatibility with open-source LLMs

Support for local LLMs is in the works.

I have tried other LLMs, but the assistant's results seem to worsen when using the same prompting techniques, which were designed for GPT-3 and GPT-4. A year ago I tried it with Llama 1, but it failed so often that it would have required a total rework of the algorithm's prompting techniques.

Now I'm trying a local Llama 3 and Mixtral-Large, benchmarking different prompting techniques intended for those models and developing new agents that generate new prompts to benchmark different models within the same framework. When it's ready, I will choose the best and most secure ones and update the project so it can be used locally too, but expect that for version 1.0.0 (we're at 0.4.0).

For security reasons I'm not willing to publish those agents, but be aware that this is possible.

Could you not just fine-tune Llama 3 on your style of prompting (if it's just a prompting technique)? For example, like Unsloth did in this Colab: https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe1Z0kqjyYIkDXp?

I would also like to point out that you could use the Azure GPT versions if you want to address security, GDPR, and the like. As a European, this is a big topic for us because of the new AI Act and our regulations. Azure should deploy the same models as OpenAI, so there shouldn't be much of a difference.

There is also a lot of ongoing research into prompt-engineering techniques; maybe some of these can help your project:


Paper name             Date      Institute                  Paper
Self-consistency       March 22  Google                     https://arxiv.org/abs/2203.11171
Generated knowledge    Sep 22    University of Washington   https://arxiv.org/pdf/2110.08387.pdf
Chain of thought       Jan 23    Google                     https://arxiv.org/pdf/2201.11903.pdf
Least to most          Apr 23    Google                     https://arxiv.org/pdf/2205.10625.pdf
Chain of verification  Sep 23    Meta                       https://arxiv.org/pdf/2309.11495.pdf
Skeleton of thought    Oct 23    Microsoft                  https://arxiv.org/pdf/2307.15337.pdf
Step-back prompting    Oct 23    Google                     https://arxiv.org/pdf/2310.06117.pdf
Rephrase and Respond   Nov 23    UCLA                       https://arxiv.org/pdf/2311.04205.pdf
Emotion Stimuli        Nov 23    Microsoft                  https://arxiv.org/pdf/2307.11760.pdf
OPRO                   Dec 23    Google                     https://arxiv.org/pdf/2309.03409.pdf
System 2 attention     Nov 23    Meta                       https://arxiv.org/pdf/2311.11829.pdf

I have pirated this list from J. Yarkoni - https://docs.google.com/presentation/d/1fboeXSrRhMBDuNKhs8ctKntTnIE5c4BqUAUt48TAvGE/edit#slide=id.g2c8b4d20382_2_0
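For illustration, the first technique in the list (self-consistency) boils down to sampling several answers for the same prompt and taking a majority vote. A minimal sketch, with a toy sampler standing in for a real LLM call (the sampler and its answers are purely illustrative):

```python
import random
from collections import Counter

def self_consistency(sample_fn, prompt, n=5):
    """Sample n answers from the model and return the majority vote."""
    answers = [sample_fn(prompt) for _ in range(n)]
    winner, _ = Counter(answers).most_common(1)[0]
    return winner

# Toy sampler standing in for a real LLM call (an assumption for the demo)
def toy_sampler(prompt):
    return random.choice(["42", "42", "42", "17"])  # a noisy "model"

print(self_consistency(toy_sampler, "What is 6*7?", n=11))
```

In the paper the samples come from temperature-based decoding of chain-of-thought traces; the vote is taken over the final answers only.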

best regards
Vincent


@henyckma Would it be possible for you to put together a small guide on how to port this to Ollama (or similar) so it runs on Llama 3? I have been trying, but I keep getting weird errors, so a guide from you would be great. No worries if not, though; I just want to try this out without the cost, haha.


One vital point, as mentioned, is that switching to a weaker model than, for instance, Mixtral will cause issues with the prompts used. However, since this is not my project (and I hope I'm not stepping on any toes), could you not just modify the core_api.py file to something like this?

import requests
import json

# Assuming you've set up your local LLM (e.g. Ollama) to accept POST requests at this endpoint
LOCAL_LLM_ENDPOINT = "http://localhost:11434/api/chat"

def api_call(messages, model_name="YOUR_OLLAMA_MODEL_NAME", temperature=0.5, max_tokens=150):
    # Prepare the payload; temperature and max_tokens map to Ollama's
    # "options" field (num_predict is Ollama's name for the token limit)
    payload = {
        "model": model_name,
        "messages": messages,
        "stream": True,  # assuming your local model supports streaming responses
        "options": {
            "temperature": temperature,
            "num_predict": max_tokens,
        },
    }

    try:
        # stream=True so we can consume the response line by line as it arrives
        response = requests.post(LOCAL_LLM_ENDPOINT, json=payload, stream=True)
        response.raise_for_status()

        # Concatenate streamed chunks until the "done" message arrives
        output = ""
        for line in response.iter_lines():
            if line:
                body = json.loads(line)
                if "error" in body:
                    raise Exception(body["error"])
                if not body.get("done", False):
                    output += body.get("message", {}).get("content", "")
                else:
                    break

        return output.strip() if output else None

    except Exception as e:
        raise Exception(f"An error occurred while calling local LLM: {e}")

# Example usage
# Replace this payload with the actual messages sequence for your use case
messages_payload = [
    {"role": "system", "content": "You are a helpful and knowledgeable assistant."},
    {"role": "user", "content": "Please help me troubleshoot my JavaScript code."}
]

result = api_call(messages_payload, temperature=0.7, max_tokens=100)
print(f"AI Analysis Result: '{result}'")


Hi @Razorbob Vincent,

I'm aware of several AI regulations in the EU and elsewhere. For now the project complies with the federal AI Standards Coordination Working Group, the Asilomar AI Principles, and the IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems.

Before those regulations even existed, I was already investigating and aware of the implications of what I should and should not release into the wild, so I removed some parts of the code and hardcoded them in natural language instead of using pure agent inference.

Thank you for the information, as it is very relevant to this project; I'm studying it all. You have great proposals.

Regarding the agents that learn and refine prompts for Windows: releasing them into the wild (within this project) could, if developed wrongly, lead to security concerns and would not comply with federal regulations, since bad actors could misuse them, something I truly don't wish for. This project is purely intended to help people.

Anthropic just released a similar agent:
https://x.com/anthropicai/status/1788958483565732213

My private agents are specifically designed for alignment and refinement on Windows to generate better prompting techniques. Instead of uploading those agents, I will choose the best prompts and update the project to use local LLMs with them. It is going to take some time, but it will be better this way for security reasons.

For now, the Single Action Model of this project using the ChatGPT API works very well, enabling users with disabilities to use the Windows OS at minimal cost.

I'm working with local LLMs, aiming to make the project available to everyone for fast and free assistance. Using local LLMs greatly decreases accuracy compared to the current ChatGPT API implementation, which makes me believe the ChatGPT models were probably trained on actual OS screens too.

I'm also working on fine-tuning Llama 3 for OS screen knowledge.


Yeah, you are right about the security concerns of releasing them. I would even go so far as to say that a poorly developed model could wreck your whole system and data, for example by deleting files in System32 or setting random registry keys.

Thanks for your feedback; glad I can help. I love the work you did, and I think you have a good project going. I'll try to contribute further, but sadly I have a company to run and can only do so much. I will open a PR to choose between OpenAI and Azure as soon as they release my sponsorship credits for the API. Hopefully this is fine for you :). Do you have some kind of roadmap for what you want to do in the next months? Maybe more people would be willing to contribute from here on.

OpenRouter uses LiteLLM to serve the models. If that were implemented, it would give us the ability to use all kinds of models across local and cloud providers (OpenAI, Anthropic, Together, OpenRouter, etc.).
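For anyone who wants to try that route, here is a rough sketch of what a LiteLLM-backed api_call could look like. The provider-prefix convention ("ollama/", "openrouter/", "azure/") is LiteLLM's; the api_call signature is borrowed from the snippet earlier in this thread, and the model names are just examples:

```python
def litellm_model_id(provider, model):
    """Map a provider choice to a LiteLLM model string (prefix convention)."""
    prefixes = {"openai": "", "ollama": "ollama/", "openrouter": "openrouter/", "azure": "azure/"}
    return prefixes[provider] + model

def api_call(messages, provider="ollama", model="llama3", temperature=0.5, max_tokens=150):
    # Lazy import so the module loads even without LiteLLM installed (pip install litellm)
    from litellm import completion
    resp = completion(
        model=litellm_model_id(provider, model),
        messages=messages,
        temperature=temperature,
        max_tokens=max_tokens,
    )
    return resp.choices[0].message.content

print(litellm_model_id("ollama", "llama3"))  # -> ollama/llama3
```

This keeps the rest of the project untouched: only the transport changes, while the OpenAI-style messages list is passed through as-is.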

...and may I just add that, after looking through the code more thoroughly, you will need to include some sort of model for the image analysis; BakLLaVA would perhaps work in cooperation with the Mixtral model for proper responses.
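For what it's worth, Ollama's /api/chat accepts base64-encoded images on a message for multimodal models, so feeding a screenshot to something like bakllava could look roughly like this (the model choice and the overall flow are assumptions, not something tested against this project):

```python
import base64

def vision_message(prompt, image_path):
    """Build an Ollama-style chat message carrying a base64-encoded image,
    for use with multimodal models (e.g. llava / bakllava)."""
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return {"role": "user", "content": prompt, "images": [encoded]}

# The resulting dict would go into the "messages" list of a POST to
# http://localhost:11434/api/chat with "model": "bakllava".
```

The text-only model (Mixtral) could then plan actions from the vision model's description of the screen.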

A lot of considerations and prompts would be needed if this were to be done; however, if there are ethical concerns, this ends here :)