/common-contributing-guide

Your go-to resource for development environment setup, coding standards, and contribution workflows.

MIT License

Alphanome.AI Common Contribution Guide

Welcome to the unified contribution guide for select Alphanome-AI projects. This document serves as a comprehensive resource for developers, detailing everything you need to contribute effectively. It covers environment setup, coding standards, and contribution workflows.

This guide is specifically referenced in the CONTRIBUTING.md files for the projects sec-parser and sec-ai.

Development Environment: Quick Start

Welcome! If you're eager to start contributing, you're in the right place. This quick guide will get you set up and ready to contribute as quickly as possible.

Note This Quick Start Guide is intended for those who are already experienced with development tools and environments. If you're new or need detailed instructions, please read the Setting Up Your Development Environment section.

  1. Setting up Poetry: We prefer Poetry to pip for managing dependencies. To install Poetry, follow the official Poetry installation instructions, or simply run the command below:
curl -sSL https://install.python-poetry.org | python3 -
  2. Installing Project Dependencies: After installing Poetry, navigate to the project directory and install the project dependencies:
poetry install
  3. Getting Task Ready: Task is a straightforward tool that we use solely for running predefined commands. Installation instructions are available on the official Task website. As a quick example, say we have this Taskfile.yml:
version: '3'
tasks:
  one:
    cmds:
      - echo Hello
      - task: two
      - echo !!!
  two:
    cmds:
      - echo World

Running task one prints "Hello", "World", and "!!!" on separate lines, and running task two prints "World".

  4. Exploring Common Operations: We've set up tasks for common operations. To see a list of the most commonly used ones, run:
task --list
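
To make the nested call in the Taskfile example from step 3 concrete, here is a minimal Python sketch of how a `task:` entry expands in place. It is only a simplified model, not how Task itself is implemented; the names `TASKS` and `expand` are hypothetical.

```python
from typing import Union

# A command is either a shell string or a reference to another task.
Command = Union[str, dict]

# Hypothetical in-memory mirror of the example Taskfile.yml.
TASKS: dict[str, list[Command]] = {
    "one": ["echo Hello", {"task": "two"}, "echo !!!"],
    "two": ["echo World"],
}


def expand(task_name: str, tasks: dict[str, list[Command]]) -> list[str]:
    """Return the shell commands a task would run, in order."""
    commands: list[str] = []
    for cmd in tasks[task_name]:
        if isinstance(cmd, dict):
            # A nested `task:` entry runs the referenced task in place.
            commands.extend(expand(cmd["task"], tasks))
        else:
            commands.append(cmd)
    return commands


print(expand("one", TASKS))  # ['echo Hello', 'echo World', 'echo !!!']
print(expand("two", TASKS))  # ['echo World']
```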

Congratulations! You're now prepared to contribute. We're thrilled to have you and can't wait to see your contributions.

For a more in-depth understanding of our contribution workflow, coding standards, and development environment, please continue reading the sections that follow.

Setting Up Your Development Environment

Supported Operating Systems

Our development environment is optimized for Linux and macOS. If you're using these operating systems, the setup should be straightforward. For Windows users, we recommend setting up the Windows Subsystem for Linux (WSL) and using the "Opening a WSL 2 folder in a container" method as outlined in the VS Code documentation. Please be aware that we officially support only Linux and macOS. Windows users may need to troubleshoot independently.

Installing Poetry

For managing dependencies in our projects, we prefer using Poetry over pip. Poetry is a robust tool for package management in Python applications. It allows you to declare the libraries your project relies on, and it takes care of installing and updating them for you. You can read more in the official Poetry documentation.

To get started with Poetry, you first need to install it. You can do this by running the following command:

curl -sSL https://install.python-poetry.org | python3 -

Once you've installed Poetry, you can confirm that the installation was successful by checking its version. Run the following command to do this:

poetry --version

Now that you have Poetry installed, you can use it to install the dependencies for this project. Navigate to the project directory and run the following command:

poetry install

This command will install all the necessary dependencies for the project.

Note To execute a command within the context of the project, use poetry run. This ensures that the command is executed with the virtual environment activated and with all the dependencies available. For instance, if you want to run a Python script in the project, you would use the following command: poetry run python your_script.py. This command will execute your_script.py using the Python interpreter in the project's virtual environment.
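
One way to see the effect of poetry run is to execute a small script with and without it and compare the output. The sketch below assumes a hypothetical file name, check_env.py, and relies on the standard library convention that `sys.prefix` differs from `sys.base_prefix` inside a virtual environment.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # In a virtual environment, sys.prefix points at the environment while
    # sys.base_prefix points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print(f"Inside a virtual environment: {in_virtualenv()}")
```

Running it as poetry run python check_env.py should report the interpreter from the project's virtual environment, assuming Poetry is configured to create one.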

Installing Task

In our projects, we use Task for task management. Task is a flexible and straightforward tool that lets us define and run tasks with ease. You can learn more in the official Task documentation.

To start using Task, you first need to install it. You can do this by following the instructions provided on the Task installation page.

Confirming Installation of Poetry and Task

Once you have Task installed, you can confirm that both Task and Poetry are functioning correctly by running the project's unit tests. To do this, navigate to the project directory and execute the following command:

task unit-tests

This command will initiate the unit tests for the project. If all tests pass successfully, it indicates that both Task and Poetry are properly set up.
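
If the command fails before any test runs, it can help to check whether both tools are on your PATH at all. The following is a small sketch using only the standard library; the function name `locate_tools` is hypothetical and not part of either project.

```python
import shutil
from typing import Optional


def locate_tools(names: list[str]) -> dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_tools(["poetry", "task"]).items():
        status = path if path else "NOT FOUND on PATH"
        print(f"{name}: {status}")
```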

Managing Tasks with Task

To view the most commonly used tasks that you can execute with Task, navigate to the project directory and execute the following command:

task --list

This command will display a list of the most commonly used tasks. Here's an example of what you might see:

$ task --list
task: Available tasks for this project:
* ci-checks:                    Execute all CI/CD checks for debugging a failing CI/CD pipeline.
* e2e-generate-dataset:         Create end-to-end dataset snapshots using the latest parser outputs.
* e2e-verify-dataset:           Validate the end-to-end dataset snapshots against the latest parser outputs.
* launch-debug-dashboard:       Start a local debugging dashboard in the browser.
* launch-docs:                  Start a local server to preview and automatically rebuild documentation upon file modification.
* monitor-unit-tests:           Run unit tests and rerun them immediately upon file modification.
* pre-commit-checks:            Execute all pre-commit checks before committing code.
* pre-push-checks:              Execute all pre-push checks before pushing code or creating a PR.

To view all tasks and their descriptions, open the Taskfile.yml file located at the root of the project.

Choosing an IDE

While we understand and respect that developers have their own preferred tools and setups, you might find Cursor to be a highly effective option for contributing to this project. Cursor is a powerful, AI-augmented version of VSCode, equipped to make the coding process smoother and more efficient.

Our codebase is enriched with docstrings, type hints, and a high level of unit test coverage. If you choose to use Cursor, its AI capabilities can leverage these features for smarter code completions, effective error detection, and various other useful suggestions. This can help you write clean, error-free code more rapidly, reducing debugging time.

However, please note that the use of Cursor is completely optional. Our primary concern is the quality of your contributions, not the tools you use to produce them.

For those interested in trying out Cursor, it is free and includes all the functionalities you appreciate in VSCode, along with AI-enhanced features. You can learn more and download it from the Cursor website.

Note For optimal performance, we recommend using OpenAI's GPT-4 model.

Note Cursor Free allows you to use unlimited AI features at no cost when you provide your own API key. You can generate your own API key at OpenAI's API key page and add it to your Cursor settings for optimal performance.

Using Pre-commit (Optional)

Pre-commit is a tool that integrates with Git to automatically run various code checks, including unit tests, before each commit. This ensures that all modifications are validated before they are committed.

This tool is installed as a package dependency when you run poetry install. To activate it for your project, run the following command (if the command is not found in your shell, run it through Poetry as poetry run pre-commit install):

pre-commit install

Once you turn on pre-commit, it will check your code automatically every time you try to make a commit. If it finds any problems, you'll need to fix them before you can go ahead and commit your changes. Some issues will be fixed for you, while for others, you'll get a warning to let you know something needs attention.

Adhering to Coding Standards

The pre-commit tool you've installed is designed to perform a series of checks with each commit you make. These checks are mirrored in our CI/CD pipeline and must pass before any PR can be merged. This process ensures the quality and consistency of our code across the entire project. Remember, you have the flexibility to run these checks individually or all at once using Task.

Code Formatting

We use Ruff for Python linting due to its speed and extensive rule set. It consolidates the functionality of multiple tools, supports automatic error correction, and is trusted by major open-source projects. For more details, refer to the Ruff Documentation.

Note Cursor AI could be a helpful resource in addressing lint issues. However, it's important to review and confirm the AI's recommendations to ensure they are suitable and meet the intended purpose, rather than blindly applying them.

Type Hints

We use type hints in our code to make it more readable and maintainable. They help catch certain types of errors early and allow IDEs to provide better code completion. For a quick introduction and further reading, visit Real Python's guide on type hinting.
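
As a brief illustration of the style, here is a small type-hinted function. The function `find_heading` is a hypothetical example and does not come from sec-parser or sec-ai.

```python
from typing import Optional


def find_heading(lines: list[str], prefix: str = "Item") -> Optional[str]:
    """Return the first line that starts with `prefix`, or None if there is none."""
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(prefix):
            return stripped
    return None


print(find_heading(["  Item 1. Business", "Some text"]))  # Item 1. Business
print(find_heading(["No headings here"]))  # None
```

The `Optional[str]` return type tells both readers and type checkers that callers must handle the case where no heading is found.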

Note Cursor AI can assist in transforming untyped code into typed code within your project. However, it's crucial to review and verify the AI's output for accuracy, rather than applying it blindly.

Unit Tests

Unit tests are essential for verifying the functionality of code under various inputs and conditions. They aid in early bug detection and ensure safer refactoring. While writing tests, the focus is on functionality rather than cleanliness, hence code quality checks are not enforced on test code. This approach enhances productivity. For more information on Python testing, refer to Real Python's guide.

Note Consider using Cursor AI to facilitate your unit test writing process. However, remember that the purpose of unit tests is not just to "cover" your code. Unit tests establish a specification of the expected behavior of correctly written code under a variety of circumstances, isolated from other pieces of code.
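
To show what "tests as a specification" can look like, here is a minimal sketch. It assumes a pytest-style runner that collects functions named test_*, and `word_count` is a hypothetical function under test. Each test name states one expected behavior.

```python
def word_count(text: str) -> int:
    """Hypothetical function under test: count whitespace-separated words."""
    return len(text.split())


def test_counts_words_separated_by_spaces() -> None:
    assert word_count("annual report 2023") == 3


def test_empty_string_has_no_words() -> None:
    assert word_count("") == 0


def test_repeated_whitespace_is_ignored() -> None:
    assert word_count("  net \n income\t") == 2
```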

Note We suggest using the Coverage Gutters extension if you're a VSCode or Cursor user. This tool seamlessly integrates with our projects to show line coverage. It visually highlights the lines of code that have been executed during unit testing.

Conventional Commits (Optional)

We encourage the use of conventional commits for your contributions. Conventional commits provide a structured format for commit messages, making them more readable and easy to automate. For more details, refer to the Conventional Commits specification.

Note Cursor AI can assist in structuring your commit messages. However, remember that the purpose of a commit message is to state why you're making a change, not just what the change is. For example, a commit message like "Refactor serialization logic to improve performance" is much more informative than just "Update serialization".
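
For reference, the header line of a conventional commit has the shape `<type>[optional scope][optional !]: <description>`. The sketch below checks only that header shape with a regular expression; it is a simplified illustration, not a full implementation of the specification, and the names `HEADER_PATTERN` and `parse_header` are hypothetical.

```python
import re
from typing import Optional

# Header shape from the Conventional Commits specification:
#   <type>[optional scope][optional !]: <description>
HEADER_PATTERN = re.compile(
    r"^(?P<type>[A-Za-z]+)"        # type, such as feat or fix
    r"(?:\((?P<scope>[^()]+)\))?"  # optional scope in parentheses
    r"(?P<breaking>!)?"            # optional breaking-change marker
    r": (?P<description>\S.*)$"    # colon, space, non-empty description
)


def parse_header(header: str) -> Optional[dict[str, Optional[str]]]:
    """Split a commit header into its parts, or return None if it does not match."""
    match = HEADER_PATTERN.match(header)
    return match.groupdict() if match else None


print(parse_header("refactor(serialization): speed up large payloads"))
print(parse_header("Update serialization"))  # None
```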

Working with Complex Data: Unit Testing Approach

When dealing with complex data, a common and effective strategy is to encapsulate the complexity within a unit test. This approach involves defining the various scenarios you anticipate and then focusing on testing these scenarios rather than the entire document or using extensive debugging tools.

This method significantly reduces the time required to verify if your modifications are working as expected. Here's how you can do it:

  1. Isolate the complexity: Identify the complex part of your data and isolate it in a unit test. The code under test could be a function, a class, or any other component that you find complex.

  2. Define the scenarios: Determine what you want to happen for different inputs or states of your program. These scenarios will form the basis of your unit tests.

  3. Work with the unit test: Once you have your unit test set up, you can make changes and run the test to see if your changes are working as expected. This is much quicker and more efficient than working with the full document or using full debugging tools.

Remember, the goal here is to make your testing process more efficient and manageable. By isolating complexity and focusing on unit tests, you can achieve this goal and ensure your changes work as intended.
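
The three steps above can be sketched as a table of scenarios driving a single test. This is only an illustration under stated assumptions: `normalize_whitespace` is a hypothetical component, and the sample strings stand in for small fragments copied out of a much larger document.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Hypothetical component under test: collapse whitespace runs into single spaces."""
    # In Python 3, \s on str patterns also matches non-breaking spaces (\xa0).
    return re.sub(r"\s+", " ", text).strip()


# Each scenario pairs a small input fragment with the expected result.
SCENARIOS = [
    ("Item 1.\xa0Business", "Item 1. Business"),
    ("  Item 1A.\n  Risk   Factors ", "Item 1A. Risk Factors"),
    ("", ""),
    ("\t\n", ""),
]


def test_normalize_whitespace_scenarios() -> None:
    for raw, expected in SCENARIOS:
        assert normalize_whitespace(raw) == expected, repr(raw)
```

When a change breaks one scenario, the failing input is printed directly, so there is no need to rerun the full document to find the problem.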

Contribution Workflow

We're excited about your interest in contributing to Alphanome AI's projects! To ensure a smooth and efficient process for all contributors, we've established this workflow. Please follow these steps to contribute effectively and avoid overlapping efforts.

Step 1: Select a Task

  1. Option A: Explore Open Issues:

    • Check out our Request For Contributions board for tasks that are ready for contributions.
    • Alternatively, browse through the GitHub Issues page of a specific project, such as sec-parser Issues or sec-ai Issues.
    • Tips:
      • Look for tasks labeled contributions-welcome. These tasks align with the project goals.
      • If you're new to the project, look for tasks labeled good-first-issue.
      • Be sure to check if a task is already tagged in-progress to avoid duplicate efforts.
  2. Option B: Propose a New Task:

    • Go through our Short-Term Roadmap to understand our focus areas and upcoming projects.
    • If you discover an issue or have a novel idea, feel free to propose it. Initiate a conversation either in the Discussions forum or on our Discord server.

Step 2: Prepare for Contribution

  1. Read CONTRIBUTING.md:

    • Before you begin, read the CONTRIBUTING.md file of the project for guidelines on setup, coding standards, and codebase understanding.
  2. Fork the Project:

    • Fork the project on GitHub to create your own workspace.
  3. Communicate Your Plan:

    • We recommend commenting on the issue you're tackling to discuss your approach and seek guidance. This also allows us to tag the issue as in-progress.
  4. Continuously Sync Your Fork:

    • Follow this GitHub Guide to synchronize your fork with the main repository.

Step 3: Begin Your Contribution

  1. Submit a Pull Request:

    • Create a pull request with your changes, clearly explaining your contributions.
  2. Check for Errors:

    • Run our automated checks and your local tests to catch and fix any issues before final submission.

We're grateful for your contributions and look forward to your valuable input in our project!

Seeking Assistance and Asking Questions

If you have any questions or concerns, or need further clarification, feel free to reach out. Please use our Discussions page for more detailed queries and Discord for quick, conversational questions. For questions specific to a GitHub issue or pull request, kindly post them directly in the respective issue or PR thread.

Conclusion

Thank you for being interested in contributing to our projects! We're thrilled to have you on board. Your contributions help make our projects better, and we sincerely appreciate your time and effort.

Happy coding!