/mi

an experiment on Source Operation Metrics

Primary LanguagePythonGNU General Public License v3.0GPL-3.0

================================
Meta-information Indicators (MI)
================================

Overview
========

**MI** project collects data from GitHub repositories. You can use it to either collect data stored locally or within Amazon's S3 cloud.
For personal usage, checkout <Usage> section.

Together with `mi-scheduler <https://github.com/thoth-station/mi-scheduler>`_, we provide automated data extraction pipeline
for data minig of requested repositories and organizations. This pipeline can
be scheduled customly, e.g. to run daily, weekly, and so on.


Data extraction request
-----------------------
To request data extraction for repository or organization,
create **Data Extraction** Issue in **MI-Scheduler** repository. Use this link TODO


Data extraction Pipeline (diagram)
----------------------------------
**MI** pipeline is simple to understand, see diagram below

.. code-block::

                      +---------+
                      |ConfigMap|
                      +----+----+
                           |
                +--+-------+--------+--+
                |  |                |  |
                |  |  mi-scheduler  |  |
                |  |                |  |
                +------+---+---+-------+
                    |   |   |   |    |
                    |   |   |   |    |
                    |   |   |   |    |
                    | Argo Workflows |
                    |   |   |   |    |
                    |   |   |   |    |
    +---------------v---v---v---v----v------------------+                                          +--------------------        +--------------------+
    |                                                   |                                          |   Visualization   |        |   Recommendation   |
    |  +---------+  +---------+            +---------+  |                                          +-------------------+        +--------------------+
    |  |thoth/   |  |  AICoE  |            | your    |  |                                          |   Project Health  |        |   thoth            |
    |  |  station|  |         |            |     org |  |                                          |    (dashboard)    |        |                    |
    |  +---------+  +---------+            +---------+  |                                          |                   |        |                    |
    |  |solver   |  |...      |            |your     |  |                                          +---------+---------+        +----------+---------+
    |  |         |  |         |            |   repos |  |           thoth-station/mi                         ^                             ^
    |  |amun     |  |...      | X X X X X  |         |  |     (Meta-information Indicators)                  |                             |
    |  |         |  |         |            |         |  |                                                    +-------------+---------------+
    |  |adviser  |  |...      |            |         |  |                                                                  |
    |  |         |  |         |            |         |  |                                                                  |
    |  |....     |  |...      |            |         |  |                                                +-----------------+-------------------+
    |  |         |  |         |            |         |  |                                                |                                     |
    |  +---------+  +---------+            +---------+  |                                                |       Knowledge Processsing         |
    |                                                   |                                                |                                     |
    +-----------------------+---------------------------+                                                +-----------------+-------------------+
    GitHub repositories   |                                                                                              ^
                            |                 +--------------------------------------------------------+                   |
                            |                 |                                                        |                   |
                            |                 |      Entities Analysis   +------->      Knowledge      |                   |
                            +---------------->-+                                                      +--------------------+
                                              +---------+----------------+----------+------------------+
                                              |  Issues |  Pull Requests |  Readmes |  etc...........  |
                                              |         |                |          |                  |
                                              +---------+----------------+----------+------------------+



What can **MI** extract from GitHub?
------------------------------------
**MI** analyses entities specified on the srcopsmetrics/entities page
Entity is essentialy a repository metadata that is being inspected (e.g. Issue or Pull Request),
from which specified *features* are extracted and are stored to dataframe.

**MI** is essentialy wrapped around PyGitHub module to provide careless data
extraction with API rate limit handling and data updating.


Install
=======

pip
---

MI is available through PyPI, so you can do

.. code-block:: console

    pip install srcopsmetrics

git
---

Alternatively, you can install srcopsmetrics by cloning repository

.. code-block:: console

    git clone https://github.com/thoth-station/mi.git

    cd mi

    pipenv install --dev


Usage
=====

Setup
-----

Connect to GitHub
~~~~~~~~~~~~~~~~~
To be able to extract data from GitHub, access token must be configured. To generate one, read `this <https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token>`_
To use the token with mi, set ``GITHUB_ACESS_TOKEN`` environment variable to the token value, for example:

.. code-block:: console

    export GITHUB_ACESS_TOKEN=<token_string>


Data Location
~~~~~~~~~~~~~
To store data locally, use ``-l`` when calling CLI or set is_local=True when using **MI** as a module.

By default **MI** will try to store the data on Ceph.
In order to store on Ceph you need to provide the following env variables:

- ``S3_ENDPOINT_URL`` Ceph Host name
- ``CEPH_BUCKET`` Ceph Bucket name
- ``CEPH_BUCKET_PREFIX`` Ceph Prefix
- ``CEPH_KEY_ID`` Ceph Key ID
- ``CEPH_SECRET_KEY`` Ceph Secret Key

For more information about Ceph storing look `here <https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html>`_


CLI
---

To view all of the available commands and their description use

.. code-block:: console

    python -m srcopsmetrics.cli --help


See some of the general usage examples below

Get repository PullRequest data locally
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

    python -m srcopsmetrics.cli --create --is-local --repository foo_repo --entities PullRequest

which is equivalent to

.. code-block:: console

    python -m srcopsmetrics.cli -clr foo_repo -e PullRequest


Get organization PR data locally
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

    python -m srcopsmetrics.cli -clo foo_org -e PullRequest


Get multiple repository PR data locally
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

    python -m srcopsmetrics.cli -clr foo_repo,bar_repo -e PullRequest


Get multiple entity data locally
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: console

    python -m srcopsmetrics.cli -clr foo_repo -e PullRequest,Issue,Commit


Meta-Information Entities Data
=================================

How to load data
----------------
- `Using pandas <https://github.com/thoth-station/mi/tree/master/srcopsmetrics/entities#using-pandas>`_
- `Using mi entities <https://github.com/thoth-station/mi/tree/master/srcopsmetrics/entities#using-mi-modules>`_

Indicators
----------
To know more about indicators that are extracted from data, check out `Meta-Information Indicators <https://github.com/thoth-station/mi/tree/master/srcopsmetrics/entities#meta-information-indicators-metrics>`_.


How to contribute
=================
Always feel free to open new Issues or engage in already existing ones!

Custom Entities & Metrics
=========================
If you want to contribute by adding new entity or metric that will be analysed from GitHub repositories,
feel free to open up an Issue and describe why do you think this new entity should be analysed and what
are the benefits of doing so according to the goal of ``thoth-station/mi project``.

After creating Issue, you can wait for the response of ``thoth-station`` devs
Do not forget to reference the Issue in your Pull Request.

Implementation
--------------
Look at `Template entity <https://github.com/thoth-station/mi/tree/master/srcopsmetrics/entities#meta-information-indicators-metrics>`_
to get an idea for requirements that need to be satisfied for custom entity implementation.