/git

An implementation of Git in Scala 3 with ZIO 2 with all episodes available on YouTube :tv:


Git implementation

Implementation of a subset of git features

Objectives

  • Learn how git works in depth
  • Try Scala 3
  • Have several loosely-coupled interchangeable components thanks to hexagonal architecture
  • Try to integrate practices and patterns from DDD
  • (double loop) TDD approach

Chapters

Chapter 1: Making a commit

Branch episode1

  • motivations and presentation of the objectives
  • generated the project with sbt new scala/scala3.g8
  • hash a blob
    • What is a blob?
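      • The hash of a blob is the SHA-1 of the file content prefixed with a header: blob <content_size>\0<content>
      • Hash of a blob: echo -n 'test content' | git hash-object --stdin
      • Comparing with the SHA-1 hash of the same prefixed string: printf 'blob 12\0test content' | shasum -a 1 (printf is used because echo does not interpret \0 in every shell)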
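
The blob hashing rule above can be sketched in a few lines of Scala. This is a minimal illustration using only the JDK (no ZIO), and the names blobPayload and hashBlob are hypothetical, not taken from the repository:

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.security.MessageDigest

// Builds the bytes Git hashes: "blob <size>", a NUL byte, then the raw content.
def blobPayload(content: Array[Byte]): Array[Byte] =
  s"blob ${content.length}".getBytes(UTF_8) ++ Array(0.toByte) ++ content

// Hex-encoded SHA-1 of the prefixed payload, i.e. the object id of the blob.
def hashBlob(content: Array[Byte]): String =
  MessageDigest
    .getInstance("SHA-1")
    .digest(blobPayload(content))
    .map(b => f"${b & 0xff}%02x")
    .mkString
```

Calling hashBlob("test content".getBytes(UTF_8)) should give the same value as the git hash-object command above, which is an easy way to check compatibility with Git.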

Branch episode2

  • refactoring and extension of the code to support other input options (file, write in database, type, etc.)
    • setup domain and infrastructure packages (hexagonal architecture)
    • write a test for Main
    • introducing a HashObjectCommand

Branch episode3

  • add zio (resource management, streaming, retries, parallelism, etc.)

Branch episode4

  • objective of the chapter: making a commit
  • hash stdin string - change the way the command is used:
    • hash-object --text "test content" instead of hash-object "test content"
  • Fix the encoding issue
  • Hashing a stream of bytes (ZStream and ZSink)
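
As a rough illustration of hashing a stream of bytes, here is a sketch that feeds chunks into the digest one at a time. It deliberately swaps ZStream and ZSink for a plain java.io.InputStream to stay dependency-free, so it is not the code from the episode; hashBlobStream and its parameters are hypothetical names:

```scala
import java.io.InputStream
import java.nio.charset.StandardCharsets.UTF_8
import java.security.MessageDigest

// Hashes a blob whose content arrives as a stream, one chunk at a time.
// The size must be known up front because it is part of the "blob <size>\0" prefix.
def hashBlobStream(in: InputStream, size: Long, chunkSize: Int = 8192): String =
  val digest = MessageDigest.getInstance("SHA-1")
  digest.update(s"blob $size".getBytes(UTF_8))
  digest.update(0.toByte) // NUL separator between header and content
  val buffer = new Array[Byte](chunkSize)
  var read = in.read(buffer)
  while read != -1 do
    digest.update(buffer, 0, read) // only the bytes actually read in this chunk
    read = in.read(buffer)
  digest.digest().map(b => f"${b & 0xff}%02x").mkString
```

The same idea carries over to a streaming library: the digest is the accumulator, and each chunk updates it.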

Branch episode5

  • Write test to hash a file
  • Refactor so the hash object usecase accepts several types of command
  • Implement hashing a file
  • Model the return type of the usecase with a richer type
  • Update test to hash several files and implement

Branch episode6

  • [Refactor/hexagonal arch.] extract reading a file and have the implementation in the infrastructure package.
    • problem in the hash object usecase
    • fixing the problem

Hexagonal diagram

Branch episode7

  • [Business Logic] write a blob in git objects directory
    • create an ObjectRepository
    • [/] write a test for HashObjectUseCase verifying that the repository is called

Branch episode8

  • [Business Logic] write a blob in git objects directory
    • create an ObjectRepository
    • write a test for HashObjectUseCase verifying that the repository is called
    • [/] create the implementation for the repository and test
      • what to test? we are looking to test compatibility with Git: right place, right format

Branch episode9

  • [Business Logic] write a blob in git objects directory
    • Object Repository File System
      • Refactor the ObjectRepositoryFileSystemSpec to generate a single hash to avoid a "cache" issue.
      • Implement Object Repository File System

Branch episode10

  • Check that hash object use case is calling the object repository with the right value (with the blob + size prefix)

Branch episode11

  • Put things together: hash and save a blob from the app and try to read it with git
    • Missing test: the repository must not be called when the save option is false
    • refactor main to extract the parsing and the formatting part

Chapter 2: Saving the current tree

Branch episode12

  • [Business Logic] read and write git index file
    • read the git index file

Branch episode13

  • [Business Logic] read and write git index file
    • create a dummy index file and read it
    • refactor the code to use case classes

Branch episode14

  • [Business Logic] read and write git index file
    • productionize the code

Next:

  • [Business Logic] write a tree in git object directory
  • refactor the MainSpec to separate the concerns
  • use a more specific type than string for dealing with files
  • [Business Logic] write a commit (with a tree hash provided)

Git internals

Objects

Source: https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain

Types of objects

Git is built around the concept of objects. There are 3 types of objects:

  • blobs. A blob represents the content of a file. It is stored in a file named after the hash of the content (prefixed with the blob header).
  • trees. Trees represent the hierarchy between blobs. A tree lists blobs and other trees together with their names. For instance:
100644 blob dc711f442241823069c499197accce1537f30928    .gitignore
100644 blob e5d351c3cd44aa1d8c1cb967c7e7fde1dee4b0ad    README.md
100644 blob 7a010b786eb29b895ba5799306052b996516d63b    build.sbt
040000 tree 8bac5f27882165d313f5732bb4f140003156c693    project
040000 tree 163727ec9bd17ef32ee088a52a31fe0b483fa18f    src
  • each tree entry starts with a mode that describes the type of the entry:
    • 100644 is a normal file,
    • 100755 is an executable file,
    • 120000 is a symbolic link,
    • 040000 is a tree (a directory),
    • 160000 is a submodule
  • commits. A commit captures:
    • the tree snapshot of the code
    • the parent commit(s). Usually a commit has only one parent, but it can have 0 to n parents. The first commit does not have any parent. A merge commit has several parents (usually 2).
    • the author
    • the committer
    • a blank line
    • the commit message
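
The entry modes and the commit layout described above could be modelled as follows. This is only a sketch: FileMode, Commit and renderCommit are hypothetical names, and the author and committer fields are assumed to already contain the full "Name <email> timestamp timezone" text:

```scala
// Modes that can appear in a tree entry, as displayed by git cat-file -p.
enum FileMode(val code: String):
  case Regular    extends FileMode("100644")
  case Executable extends FileMode("100755")
  case Symlink    extends FileMode("120000")
  case Directory  extends FileMode("040000")
  case Submodule  extends FileMode("160000")

// Hypothetical model of a commit; hashes are 40-character hexadecimal strings.
final case class Commit(
    tree: String,
    parents: List[String],
    author: String,
    committer: String,
    message: String
)

// Renders the commit content in the order listed above:
// tree, parents, author, committer, a blank line, then the message.
def renderCommit(c: Commit): String =
  val header =
    List(s"tree ${c.tree}") ++
      c.parents.map(p => s"parent $p") ++
      List(s"author ${c.author}", s"committer ${c.committer}")
  header.mkString("\n") + "\n\n" + c.message
```

A first commit simply has an empty parents list, and a merge commit has two or more entries.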

How objects are stored in the object repository

Prefixed by the first two characters of the hash

Objects are stored as files in .git/objects. Each file, whether it represents a blob, a tree or a commit, is stored within a directory named after the first two characters of its hexadecimal hash. For example, the object with the hash dc711f442241823069c499197accce1537f30928 is stored in the folder .git/objects/dc.

The filename is the hash without its first two characters. For the hash dc711f442241823069c499197accce1537f30928, the filename is 711f442241823069c499197accce1537f30928 (note that the prefix dc has been removed). The full path of the file for this hash is therefore .git/objects/dc/711f442241823069c499197accce1537f30928.
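
The path rule can be expressed as a small function. This is a minimal sketch; objectPath is a hypothetical name, and gitDir is assumed to point at the .git directory:

```scala
import java.nio.file.Path

// Computes where an object lives: <gitDir>/objects/<first 2 chars>/<remaining 38 chars>.
def objectPath(gitDir: Path, hash: String): Path =
  require(hash.length == 40, "expected a 40-character hexadecimal SHA-1")
  val (dir, file) = hash.splitAt(2)
  gitDir.resolve("objects").resolve(dir).resolve(file)
```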

Compressed using zlib

Git compresses each object with zlib before writing it to disk. zlib is a C library used for data compression. It only supports one algorithm: DEFLATE (also used in the zip archive format). This algorithm is widely used.
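
On the JVM, zlib-format compression is available through java.util.zip, so no native binding is needed. The sketch below shows a round trip; compress and decompress are hypothetical helper names, and readAllBytes assumes Java 9 or later:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{DeflaterOutputStream, InflaterInputStream}

// Compresses bytes into the zlib format (DEFLATE data with a zlib header and checksum).
def compress(bytes: Array[Byte]): Array[Byte] =
  val out = new ByteArrayOutputStream()
  val deflater = new DeflaterOutputStream(out)
  deflater.write(bytes)
  deflater.close() // flushes the remaining compressed data into `out`
  out.toByteArray

// Restores the original bytes from zlib-compressed data.
def decompress(bytes: Array[Byte]): Array[Byte] =
  val in = new InflaterInputStream(new ByteArrayInputStream(bytes))
  try in.readAllBytes()
  finally in.close()
```

Reading an object file from .git/objects and passing its bytes to decompress should yield the header followed by the content, for example blob 12, a NUL byte, then test content.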

Git index

https://git-scm.com/docs/index-format
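
According to the index format documentation linked above, the file starts with a 12-byte header: the 4-byte signature DIRC, a 4-byte version number and a 32-bit entry count, all in network byte order. A minimal sketch of reading that header follows; IndexHeader and parseIndexHeader are hypothetical names, and parsing the entries themselves is left out:

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.US_ASCII

// Hypothetical model of the fixed-size header at the start of .git/index.
final case class IndexHeader(version: Int, entryCount: Int)

def parseIndexHeader(bytes: Array[Byte]): Either[String, IndexHeader] =
  if bytes.length < 12 then Left("index file is shorter than the 12-byte header")
  else
    val buffer = ByteBuffer.wrap(bytes) // ByteBuffer is big-endian by default
    val signature = new Array[Byte](4)
    buffer.get(signature)
    if new String(signature, US_ASCII) != "DIRC" then Left("missing DIRC signature")
    else
      val version = buffer.getInt()
      val entryCount = buffer.getInt()
      Right(IndexHeader(version, entryCount))
```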

Useful git commands:

  • git cat-file shows information about an object
    • -p <hash> shows the content of an object. <hash> can be master^{tree} to reference the tree object pointed to by the latest commit on master.
    • -t <hash> shows the type of the object
  • git hash-object computes the hash of an object and optionally writes it to the object database
  • git update-index registers file contents from the working tree in the index
  • git write-tree writes the staging area to a tree object
  • git ls-files lists the files in the index
    • --stage or -s shows the mode, object hash and stage number of each tracked file
  • zlib-flate -uncompress < .git/objects/18/7fbaf52b4fdebd0111740829df5b51edc8b029 a separate program (shipped with qpdf) that inflates zlib-compressed files

Useful links: