Task Evaluator

This tool is built to evaluate the code generated by generative AI tools for coding such as GitHub Copilot. In total, there are 4 major components to achieve this:

  • Code Generator
  • Code Evaluator
  • Database & Data Visualization
  • Interface (App, Web Api or Cli)

Setup

The tool is using multiple external services that need to be set up to gain full functionality. There are external tools for code generation, code evaluation and data visualization.

In any case, you'll first have to

  • Install Docker
  • Start Docker-Engine
  • Copy .env.example to .env and fill the commented out variables (see the individual sections below for more information)
  • Start the Docker Containers with
    docker-compose up -d

Manual Setup

Detailed explanations for the setup of the individual Docker containers.

GitHub Copilot

  • Add the following .NET User Secrets for TaskEvaluator
{
    "GitHubCopilot": {
        "CompletionsUrl": "https://copilot-proxy.githubusercontent.com/v1/engines/copilot-codex/completions",
        "TokenUrl": "https://api.github.com/copilot_internal/v2/token",
        "UserAgent": "GithubCopilot",
        "UserAgentVersion": "1.138.0",
        "EditorVersion": "vscode/1.84.1",
        "EditorPluginVersion": "copilot/1.138.0",
        "BearerToken": "YOUR_BEARER_TOKEN",
        "Openai-Organization": "github-copilot",
        "Openai-Intent": "copilot-ghost"
    }
}
  • Replace the Bearer Token with your GitHub Copilot Bearer Token
    • Run this python script
    • Connect with GitHub Account
    • Copy the Bearer Token from the console output

Tabby

  • Pull Tabby Image
    docker pull tabbyml/tabby
  • Add the following .NET User Secrets for TaskEvaluator
{
    "Tabby": {
        "CompletionsUrl": "http://localhost:8080/v1/completions"
    }
}
  • Start Tabby Container (when running on GPU with CUDA support)
    docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model TabbyML/StarCoder-1B --device cuda
  • Start Tabby Container (when running on CPU)
    docker run --entrypoint /opt/tabby/bin/tabby-cpu -it -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model TabbyML/StarCoder-1B

SonarQube

  • Pull SonarQube Image
    docker pull sonarqube
  • Start SonarQube Container
    docker network create taskevaluator_sonarqube_net
    docker run -d --name sonarqube -p 9000:9000 --net taskevaluator_sonarqube_net sonarqube
    • Open localhost:9000
      • Optionally set custom Environment Variable SONARQUBE_URL
    • Add the following .NET User Secrets (use custom credentials if applicable)
{
    "SonarQube": {
        "Url": "http://sonarqube:9000",
        "User": "admin",
        "Password": "admin"
    }
}

Data Sink

PostgreSQL Sink

  • Pull Postgres Image
    docker pull postgres
  • Start Postgres Container
    docker network create taskevaluator_postgres_net
    docker run -d --name postgres -u postgres -e POSTGRES_PASSWORD=YOUR_PASSWORD -p 5432:5432 --net taskevaluator_postgres_net postgres
    • Add the following .NET User Secrets
{
    "Database": {
        "ConnectionString": "User ID=postgres;Host=localhost;Port=5432;Password=YOUR_PASSWORD;"
    }
}

Visualization

Grafana

  • Pull Grafana Image
    docker pull grafana/grafana
  • Start Grafana Container
    docker run -d --name grafana -u grafana -p 3000:3000 --net taskevaluator_postgres_net grafana/grafana
  • Login (default credentials: admin/admin)
  • Add a new Data Source
    • Type: PostgreSQL
    • Host: taskevaluator-db-1:5432
    • Database: postgres
    • User: postgres
    • Password: YOUR_PASSWORD
    • SSL Mode: disable

Language Support

Currently, the tool only supports C# as a programming language.

Running a Task Set

  • Create a directory which contains your task set
  • Add the path to the directory to your .NET User Secrets
{
    "TaskSet": {
        "DirectoryPath": "YOUR_TASK_SET_DIRECTORY_PATH"
    }
}
  • The directory should have the following structure:
    • [Language]
      • [TestName]
        • File including the name "Program" for the source code
        • File including the name "UnitTest" for the unit tests
        • metadata.json with additional information
          • Example:
        {
            "id": "112c5a6e-0e7c-4e49-b699-8e2be2e24e4a",
            "isHumanEval": true
        }
  • Here an example
    • CSharp
      • Test1
        • Program.cs
        • UnitTests.cs
        • metadata.json
      • Test2
        • Program.cs
        • UnitTest
        • metadata.json
      • Test3
        • MyProgram
        • UnitTests // No metadata.json - default values will be used
      • ...
    • ...