Github Data Export Extractor

By Alex Free

Github Data Export Extractor makes the .tar.gz file generated by the GitHub data export feature more useful for archival purposes. It takes the .tar.gz file downloaded from GitHub and generates a new backup directory that contains two subdirectories:

  • repositories - contains all of your repos, cloned recursively from the local GitHub export data.
  • releases - contains every file released for your repos. If release files from different repos share a filename, each file is still copied. Duplicate filenames have .~1~ or a similar suffix appended to tell them apart. Your file explorer may hide these files by default, so you may want to enable showing hidden files. Alternatively, ls shows these files.
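
The .~1~ suffix described above matches what GNU cp produces with --backup=numbered. The following is a minimal sketch of that behavior, not gdee's actual code: the function name copy_release_file is hypothetical, and whether gdee uses this exact option is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of gdee): copy a release file into a shared
# directory without losing an existing file of the same name.
# Assumes GNU coreutils cp, which supports --backup=numbered.
copy_release_file() {
    src=$1
    dest_dir=$2
    mkdir -p "$dest_dir"
    # If dest_dir already holds a file with this name, the existing copy is
    # renamed to NAME.~1~, NAME.~2~, ... before the new copy is written.
    cp --backup=numbered "$src" "$dest_dir/"
}
```

Calling copy_release_file twice with two different files that share a name leaves the newest copy under the plain filename and the earlier one under the .~1~ name.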

This is much nicer to have than what GitHub provides with its data export feature alone. On its own, the GitHub data export is a poor fit for backups because:

  • It does not contain git repos; instead, it contains .pack files of the repos.
  • Every file released for your git repos is in its own directory, which contains only that single released file.
  • Besides my repositories and releases for each repository, I don't want any other files in a backup. The GitHub data export contains a bunch of .json files and other things I don't really need in my case.
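
As a rough illustration of turning the exported data back into ordinary working repos, here is a minimal sketch. It is not gdee's actual code. It assumes the extracted export stores each repo as a bare repository directory whose name ends in .git (with the .pack files inside it); the function name clone_exported_repos and that layout are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch, not gdee's actual code.
# Assumption: each exported repo is a bare repository directory ending in .git.
clone_exported_repos() {
    export_dir=$1
    out_dir=$2
    mkdir -p "$out_dir"
    # -prune stops find from descending into a repo once it has matched.
    find "$export_dir" -type d -name '*.git' -prune -exec sh -c '
        out_dir=$1; shift
        for repo in "$@"; do
            name=$(basename "$repo" .git)
            # --recurse-submodules also clones submodules; their URLs
            # may require network access.
            git clone --quiet --recurse-submodules "$repo" "$out_dir/$name" || exit 1
        done
    ' sh "$out_dir" {} +
}
```

In this sketch, repos with the same name under different owners would collide in the output directory; a real tool would need to handle that case.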

Downloads

v1.0.1 - (2/16/2023)

Github Data Export Extractor v1.0.1

Changes:

  • Recursively clones all repos to ensure any and all submodules are archived correctly.
  • Does not overwrite GitHub Releases files that share a filename. If release files from different repos have the same filename, each file is still copied. Duplicate filenames have .~1~ or a similar suffix appended to tell them apart. Your file explorer may hide these files by default, so you may want to enable showing hidden files. Alternatively, ls shows these files.

v1.0 - (2/22/2022)

Github Data Export Extractor v1.0

Usage

Download and extract the latest Github Data Export Extractor release. Inside is gdee, a bash script that takes two arguments:

  • The first argument is the .tar.gz file you downloaded through the GitHub data export feature.
  • The second argument is the name of the backup directory you want to create from the .tar.gz file.

Example command line usage:

./gdee 9d9617f2-11b5-11ec-35c8-2dfe00aa20a5.tar.gz alex-free-2-21-2022

This extracts all repos and released files from the GitHub data export .tar.gz file and puts them in a new directory named "alex-free-2-21-2022".
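
Before running gdee, it can help to sanity-check the two arguments. The following is a minimal sketch of such a pre-flight check. The function check_gdee_args is hypothetical and not part of gdee, and refusing an already existing backup directory is a conservative assumption rather than documented gdee behavior.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of gdee).
check_gdee_args() {
    archive=$1
    backup_dir=$2
    [ -f "$archive" ] || { echo "not a file: $archive" >&2; return 1; }
    # tar -tzf lists the archive without extracting it and fails if the
    # file is not a readable gzip-compressed tar archive.
    tar -tzf "$archive" >/dev/null 2>&1 || { echo "not a .tar.gz: $archive" >&2; return 1; }
    # Assumption: avoid mixing a new backup into an existing path.
    [ ! -e "$backup_dir" ] || { echo "already exists: $backup_dir" >&2; return 1; }
}
```

A possible use is to chain it with the real command: check_gdee_args export.tar.gz my-backup && ./gdee export.tar.gz my-backup, where export.tar.gz and my-backup are placeholder names.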

License

Github Data Export Extractor is released into the public domain. See the file unlicese.txt in each release for more info.