This repository contains scripts that help fetch repository information from a GitHub organization, retrieve detailed data such as tags and commits for each repository, and then combine this data into a JSON file and CSV file for easy consumption.
fetch_repo_summary.py
: Fetches summary information for all repositories in a GitHub organization and saves it to a file (repo_summary.json
).fetch_repo_details.py
: Fetches detailed information (tags and commit history) for each repository listed inrepo_summary.json
and saves it to individual JSON files in therepo_tags/
andrepo_commits/
directories.combine_repo_data.py
: Combines the summary, tags, and commit information into a single JSON file (combined_repo_data.json
) and CSV file (combined_repo_data.csv
).run_all.sh
: A Bash script that runs all the above scripts in sequence, ensuring that all necessary data is fetched and processed into the final output.dockerRun.sh
: A script to build and run the Docker container for easier development.
If you prefer to use Docker for a consistent development environment, this repository provides a Dockerfile
and a dockerRun.sh
script to simplify the setup.
Before running the scripts with Docker, you need to build the Docker image:
docker build -t repocleanup:latest .
This will create a Docker image named repocleanup:latest
with all the necessary dependencies.
Once the image is built, you can use the dockerRun.sh
script to run the container interactively:
./dockerRun.sh
This script runs the container with appropriate settings, mounts the current working directory to /workspace
inside the container, and opens an interactive shell where you can run the Python scripts as usual.
#!/bin/bash
# Build the Docker image if it doesn't exist
docker build -t repocleanup:latest .
# Run the Docker container
docker run --rm -it \
--ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
--name repocleanup \
-v $(pwd):/workspace \
-w /workspace \
repocleanup:latest
You can run the entire process of fetching repository data, details, and combining the results by executing the run_all.sh
script:
./run_all.sh <org_name> <github_token>
<org_name>
: The name of the GitHub organization whose repositories you want to analyze.<github_token>
: Your personal access token for GitHub.
Example:
./run_all.sh trendscenter ghp_yourgithubtokenhere
This will:
- Fetch a summary of all repositories in the organization and save it to
repo_summary.json
. - Fetch tags and commit information for each repository and save them in
repo_tags/
andrepo_commits/
directories. - Combine all data into
combined_repo_data.json
andcombined_repo_data.csv
.
If you're using Docker, first run the container interactively:
./dockerRun.sh
Once inside the container, you can run the same scripts as you would on your local machine:
./run_all.sh <org_name> <github_token>
This method ensures a consistent environment with all dependencies pre-installed, making development and execution simpler.
repo_summary.json
: Contains summary data for each repository in the organization.repo_tags/
: Contains tag information for each repository.repo_commits/
: Contains commit information for each repository.combined_repo_data.json
: The final JSON output, combining repository summary, tags, and commits.combined_repo_data.csv
: The final CSV output, which can be opened in Excel or other spreadsheet tools.
-
If any script fails during execution, it will print an error message and stop. You can re-run the process using the
run_all.sh
script, and it will continue from where it left off by skipping already processed repositories. -
Ensure that your GitHub token has the necessary permissions to read repositories and access commit history.
- This README assumes that users may want to use Docker for a more streamlined and consistent development process.
- The
dockerRun.sh
script builds the Docker image if it doesn't already exist, then runs the container interactively.