/tap-gitlab-custom

Singer.io Tap for extracting data from Gitlab's API

Primary LanguagePythonGNU Affero General Public License v3.0AGPL-3.0

tap-gitlab

This is a Singer tap that produces JSON-formatted data following the Singer spec.

Important notice

  • This repository's default branch (legacy-stable) is kept for compatibility reasons but is no longer under active development.
  • New development is being performed against the main branch, which is based on a port to the Meltano SDK in Pull Request #65.
  • For a stable experience, users of this tap should begin pinning their installations to a specific release instead of branch references. More instructions are provided within the README.md of the main branch.

About this tap

It is based on v0.5.1 of https://github.com/singer-io/tap-gitlab, but contains many additional improvements.

This tap:

Quick start

  1. Install

Currently this project is not hosted on Python Package Index. To install, run:

pip install git+https://gitlab.com/meltano/tap-gitlab.git
  1. Get your GitLab access token

    • Login to your GitLab account
    • Navigate to your profile page
    • Create an access token
  2. Create the config file

    Create a JSON file called config.json containing:

    • Access token you just created
    • API URL for your GitLab account. If you are using the public gitlab.com this will be https://gitlab.com/api/v4
    • Groups to track (space separated)
    • Projects to track (space separated)

    Notes on group and project options:

    • either groups or projects need to be provided
    • filling in 'groups' but leaving 'projects' empty will sync all group projects.
    • filling in 'projects' but leaving 'groups' empty will sync selected projects.
    • filling in 'groups' and 'projects' will sync selected projects of those groups.
    {
      "api_url": "https://gitlab.com",
      "private_token": "your-access-token",
      "groups": "myorg mygroup",
      "projects": "myorg/repo-a myorg/repo-b",
      "start_date": "2018-01-01T00:00:00Z",
      "ultimate_license": true,
      "fetch_merge_request_commits": false,
      "fetch_pipelines_extended": false,
      "fetch_group_variables": false,
      "fetch_project_variables": false
    }

    The api_url requires only the base URL of the GitLab instance, e.g. https://gitlab.com. tap-gitlab automatically uses the latest (v4) version of GitLab's API. If you really want to set a different API version, you can set the full API URL, e.g. https://gitlab.com/api/v3, but be warned that this tap is built for API v4.

    If ultimate_license is true (defaults to false), then the GitLab account used has access to the GitLab Ultimate or GitLab.com Gold features. It will enable fetching Epics, Epic Issues and other entities available for GitLab Ultimate and GitLab.com Gold accounts.

    If fetch_merge_request_commits is true (defaults to false), then for each Merge Request, also fetch the MR's commits and create the join table merge_request_commits with the Merge Request and related Commit IDs. In the current version of GitLab's API, this operation requires one API call per Merge Request, so setting this to True can slow down considerably the end-to-end extraction time. For example, in a project like gitlab-org/gitlab-foss, this would result to 15x more API calls than required for fetching all the other Entities supported by tap-gitlab.

    If fetch_pipelines_extended is true (defaults to false), then for every Pipeline fetched with sync_pipelines (which returns N pages containing all pipelines per project), also fetch extended details of each of these pipelines with sync_pipelines_extended. Similar concerns as those related to fetch_merge_request_commits apply here - every pipeline fetched with sync_pipelines_extended requires a separate API call.

    If fetch_group_variables is true (defaults to false), then Group-level CI/CD variables will be retrieved for each available / specified group. This feature is treated as an opt-in to prevent users from accidentally extracting any potential secrets stored as Group-level CI/CD variables.

    If fetch_project_variables is true (defaults to false), then Project-level CI/CD variables will be retrieved for each available / specified project. This feature is treated as an opt-in to prevent users from accidentally extracting any potential secrets stored as Project-level CI/CD variables.

  3. [Optional] Create the initial state file

    You can provide JSON file that contains a date for the API endpoints to force the application to only fetch data newer than those dates. If you omit the file it will fetch all GitLab data

    {
      "project_278964": "2017-01-17T00:00:00Z",
      "project_278964_issues": "2017-01-17T00:00:00Z",
      "project_278964_merge_requests": "2017-01-17T00:00:00Z",
      "project_278964_commits": "2017-01-17T00:00:00Z"
    }

    Note:

    • You have to provide the id of each project you are syncing. For example, in the case of gitlab-org/gitlab it is 278964.
    • You can find the Project ID for a project in the homepage for the project, under its name.
  4. Run the application

    tap-gitlab can be run with:

    tap-gitlab --config config.json [--state state.json]

Copyright © 2018 Stitch