


Git Companion Scripts

Introduction

A collection of useful scripts for Git. It currently contains scripts for validating and converting file encodings, aimed especially at multi-platform development.

They are expected to work in most Git environments without installing additional software. Git for Windows (msysgit) is also supported.

More specifically, the scripts assume the following software is available:

  • Perl 5.8 or later (with the Encode module).
  • Bourne shell (/bin/sh) and standard Unix commands (for the tests).
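A quick way to confirm the Perl requirement is sketched below. The function name check_prereqs is hypothetical and not part of the repository; it relies only on Perl's $] variable (the running Perl version as a number) and on loading Encode with -MEncode.

```sh
#!/bin/sh
# Hypothetical prerequisite check: Perl 5.8 or later with the Encode module.
check_prereqs() {
  if ! command -v perl >/dev/null 2>&1; then
    echo "perl not found" >&2
    return 1
  fi
  # $] is the running Perl version as a number, e.g. 5.008001 for 5.8.1.
  if ! perl -e 'exit($] >= 5.008 ? 0 : 1)'; then
    echo "Perl 5.8 or later is required" >&2
    return 1
  fi
  # Loading the module is the simplest way to prove it is installed.
  if ! perl -MEncode -e 1 2>/dev/null; then
    echo "the Encode module is missing" >&2
    return 1
  fi
}

check_prereqs && echo "prerequisites OK"
```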

Directories and files:

- hooks - Hook scripts for .git/hooks
  - pre-commit-encoding - pre-commit script to verify file encoding.

- utils - Utility scripts
  - ipconv - In-place converter of text encoding and newline characters.

- tests - Test scripts
  - shunit2 - shUnit2 Unix shell unit testing framework.
  - test-* - Individual test scripts.

- fixtures - Test fixtures
  - txtgen.pl - Text fixture generator.
  - *.txt - Text fixtures.

- runtests.sh - Script to run all tests.

Usage

hooks/pre-commit-encoding

Call this script from the pre-commit hook, .git/hooks/pre-commit.
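A minimal .git/hooks/pre-commit that delegates to the script might look like the sketch below. The install location $HOME/git-companion-scripts and the function name run_hook are hypothetical; adjust them to your setup. The sketch relies only on the Git convention that a pre-commit hook exiting non-zero aborts the commit.

```sh
#!/bin/sh
# .git/hooks/pre-commit (sketch)
# Assumption: the scripts were cloned to $HOME/git-companion-scripts (hypothetical path).

# Run the given hook script if it is executable; propagate its failure.
run_hook() {
  hook=$1
  shift
  if [ -x "$hook" ]; then
    "$hook" "$@" || return 1
  fi
  return 0
}

run_hook "$HOME/git-companion-scripts/hooks/pre-commit-encoding" || exit 1
```

As written, the sketch silently skips the check when the script is missing; exit with an error there instead if you prefer a hard failure. Remember to make the hook itself executable (chmod +x .git/hooks/pre-commit).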

In .gitattributes, you can specify which encodings are allowed to be committed for each file pattern by setting the attribute encoding or encoding=ENCODINGS, where ENCODINGS is an optional comma-separated list of encodings.

The script accepts Emacs-like encoding names such as utf-8, utf-8-dos and utf-8-with-signature-unix. Omitting the newline suffix (-unix, -dos or -mac) means 'don't care': any newline characters are accepted.
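To illustrate the notation, here is a small POSIX sh sketch that splits a name into its character set, whether it carries the -with-signature (BOM) suffix, and its newline suffix, reporting "any" when the newline suffix is omitted. The function parse_encoding is hypothetical and is not how the Perl script is written.

```sh
#!/bin/sh
# Hypothetical helper: split an Emacs-like encoding name such as
# utf-8-with-signature-dos into "<charset> <signature> <newline>".
parse_encoding() {
  spec=$1

  # Newline suffix; none given means "any newline characters".
  eol=any
  case $spec in
    *-unix) eol=unix; spec=${spec%-unix} ;;
    *-dos)  eol=dos;  spec=${spec%-dos} ;;
    *-mac)  eol=mac;  spec=${spec%-mac} ;;
  esac

  # Signature (UTF-8 BOM) suffix, which precedes the newline suffix.
  sig=no
  case $spec in
    *-with-signature) sig=yes; spec=${spec%-with-signature} ;;
  esac

  printf '%s %s %s\n' "$spec" "$sig" "$eol"
}

parse_encoding utf-8-with-signature-dos   # prints: utf-8 yes dos
parse_encoding utf-8                      # prints: utf-8 no any
```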

If the encoding attribute is given without an ENCODINGS value, the default encodings are used. The defaults can be passed as a script argument or set in the $default_encoding variable in the script.
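The fallback rule can be pictured with the states that git check-attr reports: an attribute given without a value is "set", a negated one (-encoding) is "unset", and an absent one is "unspecified". The sketch below maps those states to a list of allowed encodings. The default list and the choice to treat unset/unspecified as "no check" are assumptions, not taken from the script, and allowed_encodings is a hypothetical name.

```sh
#!/bin/sh
# Assumed default list; the real script keeps its own in $default_encoding.
default_encoding='ascii,utf-8'

# Map a git check-attr value of the encoding attribute to the
# comma-separated encodings to accept. Returns 1 when no check applies.
allowed_encodings() {
  case $1 in
    set)               printf '%s\n' "$default_encoding" ;;
    unset|unspecified) return 1 ;;
    *)                 printf '%s\n' "$1" ;;
  esac
}

allowed_encodings set        # prints: ascii,utf-8
allowed_encodings ascii-dos  # prints: ascii-dos
```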

Some .gitattributes examples:

# Force ascii on log files.
*.log encoding=ascii

# Specify default encodings on text files.
*.txt encoding

# Specify a macro for encodings MSVC can process.
[attr]msvc encoding=ascii-dos,utf-8-with-signature-dos
*.c msvc
*.h msvc
*.cpp msvc
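You can confirm what a file resolves to with git check-attr encoding -- FILE, which prints lines of the form "path: encoding: value" and also reports attributes set through macros such as msvc. The sketch below extracts the value from such a line; attr_value is a hypothetical name. It relies on the fact that .gitattributes values cannot contain whitespace, so the text after the last ": " is always the value.

```sh
#!/bin/sh
# Hypothetical helper: extract the value from one line of
#   git check-attr encoding -- <path>
# output, which has the form "<path>: encoding: <value>".
attr_value() {
  # Values contain no spaces, so everything after the last ": " is the
  # value, even if the path itself contains ": ".
  printf '%s\n' "${1##*: }"
}

attr_value 'hoge.c: encoding: ascii-dos,utf-8-with-signature-dos'
# prints: ascii-dos,utf-8-with-signature-dos
```

With the example attributes above, git check-attr encoding -- hoge.c should report the msvc list for a .c file.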

When you run git commit, the script checks that each staged file contains only characters valid in an allowed encoding, uses the allowed newline characters, and has or lacks a UTF-8 BOM as the allowed encoding requires. If any violation is found, the commit is aborted.
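The BOM and newline checks can be approximated with standard tools, as in the sketch below. has_bom and newline_style are hypothetical helpers, not the script's implementation (which requires Perl with Encode). The sketch assumes LF or CRLF line endings, so a bare-CR (-mac) file is not recognized, and it does not validate characters against an encoding.

```sh
#!/bin/sh
# Hypothetical helpers approximating two of the checks on a single file.
# Assumes LF or CRLF line endings; bare-CR (-mac) files are not detected.

# Succeed if the file starts with the UTF-8 BOM (bytes EF BB BF).
has_bom() {
  [ "$(dd if="$1" bs=3 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')" = efbbbf ]
}

# Print unix, dos, mixed or none according to the file's line endings.
newline_style() {
  cr=$(printf '\r')
  # grep exits 1 when nothing matches; keep its printed count anyway.
  total=$(grep -c '' "$1" || :)
  crlf=$(grep -c "$cr\$" "$1" || :)
  if [ "$total" -eq 0 ]; then
    echo none
  elif [ "$crlf" -eq 0 ]; then
    echo unix
  elif [ "$crlf" -eq "$total" ]; then
    echo dos
  else
    echo mixed
  fi
}
```

To examine the staged content rather than the working tree, one option is to write the index version to a temporary file first (for example git show ":hoge.c" > "$tmp") and pass that file to the helpers.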

% git add hoge.c
% git commit
hoge.c: utf8-unix (ascii-dos,utf-8-with-signature-dos)
Commit aborted!  (Use "git commit --no-verify" to skip this)

Use git commit --no-verify to skip checks by the pre-commit script.

Limitations:

  • With older versions of Git, such as v1.7.4, it does not work for the initial commit.
  • It cannot handle files whose names have the sequence ": " (COLON SPACE).

utils/ipconv

ipconv is an in-place converter of text encoding and newline characters.

Specify the output encoding with the -e option, or modify the $output_encoding variable in the script.

It accepts the same Emacs-like encoding names, such as utf-8, utf-8-unix and utf-8-with-signature-unix.
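As an illustration of what the -with-signature part means for a file, the sketch below adds or removes the UTF-8 BOM (bytes EF BB BF) in place and keeps the original as FILE.orig, like ipconv's backups. add_bom and strip_bom are hypothetical names; they assume the file is already UTF-8 and perform no character-set conversion.

```sh
#!/bin/sh
# Hypothetical helpers: add or remove the UTF-8 signature (BOM) in place,
# keeping the original as FILE.orig. Only the BOM is touched.

has_bom() {
  [ "$(dd if="$1" bs=3 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')" = efbbbf ]
}

add_bom() {
  has_bom "$1" && return 0          # already signed, nothing to do
  cp -p "$1" "$1.orig" || return 1
  { printf '\357\273\277'; cat "$1.orig"; } > "$1"
}

strip_bom() {
  has_bom "$1" || return 0          # no BOM, nothing to strip
  cp -p "$1" "$1.orig" || return 1
  tail -c +4 "$1.orig" > "$1"       # copy everything after the 3-byte BOM
}
```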

Specifying no newline suffix means 'don't touch': newline characters are left unchanged.

It creates backup files with the suffix .orig.
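Newline conversion with a .orig backup can be sketched in POSIX sh and awk as follows. convert_newlines is a hypothetical name; it only handles unix and dos targets, assumes LF or CRLF input, and adds a final newline if the last line lacks one. Character-set conversion, which ipconv also performs, is not shown.

```sh
#!/bin/sh
# Hypothetical sketch: rewrite FILE's newlines as unix (LF) or dos (CRLF),
# keeping the original as FILE.orig. Assumes LF or CRLF input.
convert_newlines() {
  file=$1
  # Validate the target before touching anything.
  case $2 in
    unix) fmt='%s\n' ;;
    dos)  fmt='%s\r\n' ;;
    *)    echo "unknown newline style: $2" >&2; return 1 ;;
  esac
  cp -p "$file" "$file.orig" || return 1
  # Drop any trailing CR, then terminate each line with the chosen newline.
  # awk processes the \r and \n escapes in a -v assignment.
  awk -v fmt="$fmt" '{ sub(/\r$/, ""); printf fmt, $0 }' "$file.orig" > "$file"
}
```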

License

tests/shunit2 is taken from shUnit2 and is licensed under the LGPL.

All other files are licensed under the MIT License.

Copyright (c) 2012 Takeshi Yaegashi.