git2 log command
seriema opened this issue · 8 comments
From the readme:
Generate a git log file using the following command:
git log --pretty=format:'[%h] %aN %ad %s' --date=short --numstat --after=YYYY-MM-DD
Note that there's a second supported Git format as well, imaginatively named git2. This format is more tolerant and faster to parse, so please prefer it over the plain git format described above:
git log --all -M -C --numstat --date=short --pretty=format:'--%h--%ad--%an' --no-renames
The second instruction uses --pretty=format:'--%h--%ad--%an', which results in this error:
$ java -jar code-maat-0.9.2-SNAPSHOT-standalone.jar -l test.log -c git -a entity-effort -o entity-test.csv
WARNING: update already refers to: #'clojure.core/update in namespace: incanter.core, being replaced by: #'incanter.core/update
Invalid argument: java.lang.IllegalArgumentException: input: --afefec5--2015-11-07--JP Johansson
5 3 filename1.js
60 0 filename2.js
55 81 filename3.js
126 158 filename.css
, reason: Parse error at line 1, column 1:
--afefec5--2015-11-07--JP Johansson
^
Expected:
[
This is Code Maat, a program used to collect statistics from a VCS.
Version: 0.9.2-SNAPSHOT
It happens with all -a flags to code-maat. I changed it to use the --pretty=format:'[%h] %aN %ad %s' from the first instruction and it works, but then the readme text about a "second supported Git format" is a bit confusing.
I also didn't quite understand the other flags in the second instruction, especially the last parameter --no-renames, which seems to contradict -M -C? That's how I read it on ExplainShell.
When using the second format, you need to specify a different parser as well. From the readme (a bit below the text you quoted):
if you use the second Git format, just specify git2 instead
So you just need to switch from '-c git' to '-c git2' and the analyses will work.
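As a minimal sketch of how the log format and the -c value pair up, the helper below looks at the first non-empty line of a log file and prints the matching parser name. The function name guess_parser is hypothetical and not part of Code Maat; the two patterns are assumptions derived only from the --pretty=format strings quoted in this thread.

```shell
# Hypothetical helper (not part of Code Maat): print the -c value that
# matches a log file, judged by its first non-empty line.
guess_parser() {
  first_line=$(sed -n '/./{p;q;}' "$1")
  case "$first_line" in
    --*--*--*) echo git2 ;;   # e.g. --afefec5--2015-11-07--JP Johansson
    \[*\]*)    echo git ;;    # e.g. [afefec5] JP Johansson 2015-11-07 subject
    *)         echo unknown; return 1 ;;
  esac
}
```

It could then feed the command from the report, for example: java -jar code-maat-0.9.2-SNAPSHOT-standalone.jar -l test.log -c "$(guess_parser test.log)" -a entity-effort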
Yes, you're correct about the contradiction between the different flags. It's the result of some experiments with different formats and nothing deliberate. I'll see if I can clean it up.
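The flag interaction can be checked in a throwaway repository. The sketch below assumes git is on the PATH and that, when rename options conflict, the one given last takes effect (which is how the flags appear to behave); the repository, file names, and variable names are made up for illustration.

```shell
# Throwaway repository containing a single rename.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" config user.name "Demo User"
git -C "$repo" config user.email "demo@example.com"

printf 'one\ntwo\nthree\n' > "$repo/old.txt"
git -C "$repo" add old.txt
git -C "$repo" commit -q -m "Add old.txt"
git -C "$repo" mv old.txt new.txt
git -C "$repo" commit -q -m "Rename old.txt to new.txt"

fmt='--%h--%ad--%an'

# Rename detection on: the rename is reported as one "old.txt => new.txt" entry.
with_renames=$(git -C "$repo" log -1 -M --numstat --date=short --pretty=format:"$fmt")

# Same flags followed by --no-renames: detection is switched off again, so the
# commit is reported as a deletion of old.txt plus an addition of new.txt.
without_renames=$(git -C "$repo" log -1 -M -C --numstat --date=short --pretty=format:"$fmt" --no-renames)

printf '%s\n\n%s\n' "$with_renames" "$without_renames"
```

If the assumption holds, only the first log contains a "=>" entry, which suggests that -M -C have no effect in the documented command.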
I see. The info was a bit down in "Generating a summary" so I probably skipped it since I was more interested in the other options. Maybe it could be moved up to "Running Code Maat"? Or maybe update the "Generate a git log ..." to give more context for a new user? When I read the first instructions, I had no context for what you meant by:
Note that there's a second supported Git format as well, imaginatively named git2.
It sounded like git 2.0 had a different format and that was supported as well. The relation to the -c flag could be pointed out there perhaps?
@seriema I've added a clarification in the paragraph where I introduce the git2 input data format.
I've cleaned up the git2 format by removing the overridden, and potentially conflicting, flags from the documentation. I also corrected the format to respect a mailmap (when present).
Sorry @adamtornhill, I didn't have time to check earlier. The new paragraph is good information. I created a PR for merging the info a bit and adding some info above "git1" to make it a bit clearer and have context.
Looking at the new git2 command, the difference besides the --pretty=format seems to be the --all and --no-renames flags. I'm guessing here, but --all changes the format a bit, whereas --no-renames would actually change the contents of the log: since it won't follow renames, old filenames will be in the log and our data gets skewed?
Yes, that's correct: old file names will show up in the logs with git2. But IIRC, the same limitation applies to the legacy git log format (-c git).
In practice I haven't found it that limiting because I always apply scripts to the output that post-process the data and filter it depending on the current content of the repository. In addition, most hotspots tend to stay where they are, although there's obviously no guarantee.
That said, there are cases where it's useful to do proper rename tracking. I have code to do that in another analysis tool. I hope to port that into Code Maat one day.
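The post-processing scripts mentioned above are not shown in this thread. As a rough sketch of the general idea only, a git2-style log could be filtered so that only files still present in the repository remain. The function name filter_to_current and the file names in the usage note are hypothetical, and the sketch assumes tab-separated numstat lines (added, deleted, path) with no tabs in paths.

```shell
# Hypothetical filter (not the maintainer's script): keep commit headers and
# blank lines, but drop numstat lines for paths that are not in the list.
# Usage: filter_to_current CURRENT_FILES_LIST LOG_FILE
filter_to_current() {
  awk -F '\t' -v list="$1" '
    # Load the list of current paths, one per line.
    BEGIN { while ((getline path < list) > 0) current[path] = 1 }
    # Headers and blank lines have fewer than three tab-separated fields.
    NF < 3 { print; next }
    # Numstat line: keep it only if the path still exists.
    ($3 in current) { print }
  ' "$2"
}
```

A possible usage, run from inside the repository: git ls-files > current.txt, followed by filter_to_current current.txt test.log > filtered.log. Commits whose files were all renamed or deleted would be left as bare headers, which may or may not matter for a given analysis.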
@adamtornhill This sounds like it should be pointed out in the Readme perhaps? As a user I only see two options for log format, and one being touted as "more tolerant and faster". I wouldn't expect to have to treat the resulting data differently given that description.