- Requirement: We had to notify (intimate) the authors of conflicting commits whenever a merge conflict occurred. Imagine there are Jenkins jobs configured to merge branches in order to keep 2 branches up to date. Ideally those jobs could be configured so that flipping a single switch sends emails to the commit authors who are the culprits of the merge conflict.
Jenkins already has built-in functionality for getting the authors and committers of the commits that broke the merge and caused the conflict. However, it did not work as expected, and we could not figure out why even after an entire week of trying. Hence we had to come up with a Python script to solve this problem.
There are 2 GitHub APIs that can extract commits from a repository.
- The first one is the compare commits API. It is powerful because it can compare the heads of 2 branches and return all the commits by which one branch is ahead of the other since the last merge. Say we made a merge commit at a certain point, and the second merge we try to perform fails because of a merge conflict. We can call this API and it will show us all commits made after the successful merge commit. It does not require any headers other than the Authorization header, which is needed only for private repos. There is a catch, however: no matter how good this API is, it can only give back 250 commits. For any more commits this approach won't work.
- The second one is the good old list commits API. It needs some input, which is why we have to store some information in a file on the system. The information that the API requires is:
- Filename of the file for which we want to get the commits.
- Branch name
- Timestamp since which we want to get the commits
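As a rough illustration of the two calls described above, here is a minimal sketch that builds the request URLs and pulls author details out of the returned commit objects. The helper names (`compare_url`, `list_commits_url`, `fetch_json`, `commit_authors`) are hypothetical and not part of our script; the endpoint paths and the `sha`, `path` and `since` query parameters are the ones documented for the GitHub REST API.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def compare_url(owner, repo, base, head):
    """URL of the compare commits endpoint for base...head."""
    return f"{API_ROOT}/repos/{owner}/{repo}/compare/{base}...{head}"


def list_commits_url(owner, repo, branch, path, since):
    """URL of the list commits endpoint, filtered by branch, file and time."""
    query = urllib.parse.urlencode({"sha": branch, "path": path, "since": since})
    return f"{API_ROOT}/repos/{owner}/{repo}/commits?{query}"


def fetch_json(url, token=None):
    """GET a URL and decode the JSON body; a token is only needed for private repos."""
    request = urllib.request.Request(url)
    if token:
        request.add_header("Authorization", f"token {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def commit_authors(commits):
    """Collect unique (name, email) pairs from a list of commit objects."""
    return {
        (c["commit"]["author"]["name"], c["commit"]["author"]["email"])
        for c in commits
    }
```

The compare endpoint returns an object whose `commits` key holds the list, while the list commits endpoint returns the list directly, so `commit_authors(fetch_json(url)["commits"])` would be used for the first and `commit_authors(fetch_json(url))` for the second.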
The following plugins will be necessary in Jenkins:
- Log Trigger plugin, also called the post build plugin
- Email extender plugin
- Env inject plugin
- Build timestamp plugin
- Log Trigger will be used to run the script in case the merge fails. It works by searching the console log for a specified string and, if the string is found, running a shell script.
- The Email extender plugin will be used to send emails to certain people on failure.
- The Env inject plugin will let us inject environment variables of our choice into the entire Jenkins job. Environment variables are pretty much the only way we can communicate between shell scripts and the plugins.
- The Build timestamp plugin will be used to store the build timestamp of the last successful build.
Python script trigger
The following line will be searched for by the Log Trigger plugin:
ERROR: Branch not suitable for integration as it does not merge cleanly
One important idea here is that we want to catch only the case where the branch does not merge because of a merge conflict. The concept is similar to exception handling, where we do not want to catch an exception that has too large a scope, like the class Exception. Reading the console output and parsing it for a specific line is definitely not a clean way to solve this issue. However, we could not find a better way to handle it, other than maybe writing a brand new plugin for Jenkins.
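The same narrow check can be repeated inside the Python script as a safety net. This is only a sketch: the function name `merge_conflict_detected` is hypothetical, and the marker is the exact line quoted above.

```python
MERGE_FAILURE_MARKER = (
    "ERROR: Branch not suitable for integration as it does not merge cleanly"
)


def merge_conflict_detected(log_text):
    """True only if the console log contains the specific merge failure line."""
    return any(MERGE_FAILURE_MARKER in line for line in log_text.splitlines())
```

Matching one specific line, instead of any line containing ERROR, keeps the script from reacting to unrelated build failures.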
There will be another log trigger that stores the build timestamp of the last successful build in a properties file, whose path is made known to Jenkins through the Env inject plugin. Properties files are the only way we can inject environment variables of our choice into Jenkins jobs.
The Python script will be run with the following command line arguments:
- jenkins_home
- job_name
- name of the branch to merge to.
- gmci properties home
The log file will be in the following location:
${JENKINS_HOME}/jobs/${JOB_NAME}/builds/lastFailedBuild/log
The location of the properties file will be:
${JENKINS_HOME}/jobs/${JOB_NAME}/gmci.properties
In Jenkins this entire location will be available in the environment variable
${GMCI_HOME}
The branch to merge to will be available as:
${BRANCH_TO_MERGE_TO}
These 2 environment variables are enough to find the log of the lastFailedBuild and the properties file where the last successful build timestamp is stored.
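A minimal sketch of how the script could accept these arguments and derive the log location. The argument names mirror the list above, but the helper names `parse_args` and `failed_build_log_path` are assumptions made for illustration.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the four positional arguments passed by the Jenkins job."""
    parser = argparse.ArgumentParser(description="Github merge conflict intimator")
    parser.add_argument("jenkins_home", help="value of ${JENKINS_HOME}")
    parser.add_argument("job_name", help="value of ${JOB_NAME}")
    parser.add_argument("branch_to_merge_to", help="value of ${BRANCH_TO_MERGE_TO}")
    parser.add_argument("gmci_home", help="value of ${GMCI_HOME}")
    return parser.parse_args(argv)


def failed_build_log_path(jenkins_home, job_name):
    """Location of the console log of the last failed build."""
    return os.path.join(jenkins_home, "jobs", job_name, "builds", "lastFailedBuild", "log")
```

In the job this would be invoked from the shell step with the Jenkins variables expanded into the four positions, in that order.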
Build timestamp trigger
The Log Trigger plugin will search for the following line, simply to store the build timestamp of the last successful build.
MERGE COMMIT SUCCESSFUL NO MERGE CONFLICTS FOUND
This line will be generated by the build step as an echo to the shell. If the merge fails, the build step is not executed in the first place.
A more direct approach would be to run the command that stores the last successful build timestamp directly in the build shell, since the build task only gets executed when the merge before build succeeds.
The configparser module is used for reading the properties file. It reads INI-style files, so a Java-style properties file needs a section header such as [DEFAULT] before it can be parsed. Although the script does not strictly need the build timestamp of the last successful build to be stored in a properties file, we do that just to keep things consistent.
>>> import configparser
>>> config = configparser.ConfigParser()
>>> config.read('/var/lib/jenkins/jobs/auto_merge_github_branches/build_timestamp.properties')
['/var/lib/jenkins/jobs/auto_merge_github_branches/build_timestamp.properties']
>>> config.sections()
[]
>>> config['DEFAULT']
<Section: DEFAULT>
>>> for key in config['DEFAULT']:
...     print(key)
...
build_timestamp_auto_merge_github_branches_60
build_timestamp_auto_merge_github_branches_61
>>> for key in config['DEFAULT']:
...     print(config['DEFAULT'][key])
...
2017-08-15T11:57:54+0530
2017-08-15T11:58:34+0530
>>> all_keys = list(config['DEFAULT'].keys())
>>> all_keys
['build_timestamp_auto_merge_github_branches_60', 'build_timestamp_auto_merge_github_branches_61']
>>> last_build_key = all_keys[-1]
>>> last_build_key
'build_timestamp_auto_merge_github_branches_61'
>>> config['DEFAULT'][last_build_key]
'2017-08-15T11:58:34+0530'
>>>
Finding the conflicting file in the log is easy as well, especially with Python's re module: we search the log for a specific line with a regex, and once we have that line we can extract the file path from it.
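A small sketch of that search, assuming the Jenkins console log contains git's own conflict lines of the form `CONFLICT (content): Merge conflict in <path>`. Whether those lines appear in the log depends on how the job runs the merge; the function name `conflicting_files` is hypothetical.

```python
import re

# git reports each conflicting file on a line like:
#   CONFLICT (content): Merge conflict in path/to/file
CONFLICT_LINE = re.compile(r"CONFLICT \([^)]*\): Merge conflict in (?P<path>.+)")


def conflicting_files(log_text):
    """Return the file paths named in git's merge conflict lines, in log order."""
    return [match.group("path").strip() for match in CONFLICT_LINE.finditer(log_text)]
```

The pattern is not anchored to the start of the line, so it still matches if the log prefixes each line with a timestamp.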
We will use only one file for all Github merge conflict intimator properties. That file will be called gmci.properties and will reside in the following location:
${JENKINS_HOME}/jobs/${JOB_NAME}/gmci.properties
Also note the contents of this file would be the following:
[DEFAULT]
GMCI_HOME=${JENKINS_HOME}/jobs/${JOB_NAME}/gmci.properties
BRANCH_TO_MERGE_TO=test_branch_20
MERGE_CONFLICT_INTIMATOR_HOME=${JENKINS_HOME}/jobs/${JOB_NAME}/Github_Merge_Conflict_Intimator.py
[LAST_SUCCESSFUL_BUILD_TIMESTAMP]
BUILD_TIMESTAMP_auto_merge_github_branches_75=2017-08-15T15:43:42+0530
[CULPRITS]
[DEVELOPERS]
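To show how the script might read this layout, here is a hedged sketch using configparser on a trimmed copy of the file. Two details matter: keys from [DEFAULT] are visible in every section, so they have to be filtered out, and configparser lowercases key names. The helpers `last_successful_timestamp` and `to_github_since` are hypothetical; converting to UTC is an assumption about the format we would pass to the list commits API as its `since` parameter.

```python
import configparser
from datetime import datetime, timezone

# Trimmed copy of gmci.properties, used here only for illustration.
SAMPLE = """\
[DEFAULT]
BRANCH_TO_MERGE_TO=test_branch_20
[LAST_SUCCESSFUL_BUILD_TIMESTAMP]
BUILD_TIMESTAMP_auto_merge_github_branches_75=2017-08-15T15:43:42+0530
[CULPRITS]
[DEVELOPERS]
"""


def last_successful_timestamp(config):
    """Return the stored timestamp, ignoring keys inherited from [DEFAULT]."""
    section = config["LAST_SUCCESSFUL_BUILD_TIMESTAMP"]
    defaults = config.defaults()
    own_values = [value for key, value in section.items() if key not in defaults]
    return own_values[-1] if own_values else None


def to_github_since(timestamp):
    """Convert a timestamp such as 2017-08-15T15:43:42+0530 to UTC ISO 8601."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S%z")
    return parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


config = configparser.ConfigParser()
config.read_string(SAMPLE)
```

With the sample above, `config["DEFAULT"]["branch_to_merge_to"]` gives the target branch and `to_github_since(last_successful_timestamp(config))` gives the UTC form of the stored timestamp.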
The following build script removes the previously stored build timestamp and replaces it with the latest one:
# Exit status of the command that ran just before this snippet.
exit_code=$?
if [ "$exit_code" -eq 0 ]
then
    echo "Storing last successful build timestamp"
    # Line number of the [LAST_SUCCESSFUL_BUILD_TIMESTAMP] section header.
    build_timestamp_segment=$(grep -n "LAST_SUCCESSFUL_BUILD_TIMESTAMP" "${GMCI_HOME}" | cut -d : -f 1)
    # Line number of the previously stored timestamp entry, if there is one.
    build_timestamp_line=$(grep -n "BUILD_TIMESTAMP_JOB" "${GMCI_HOME}" | cut -d : -f 1)
    if [ -n "$build_timestamp_line" ]
    then
        # Delete the stale entry before writing the new one.
        sed -i "${build_timestamp_line}d" "${GMCI_HOME}"
    fi
    # Append the new entry right after the section header.
    sed -i "${build_timestamp_segment}a\BUILD_TIMESTAMP_JOB_${JOB_NAME}_${BUILD_NUMBER}=${BUILD_TIMESTAMP}" "${GMCI_HOME}"
fi
The sed tool is used here to append, delete and insert lines in the file from the shell itself.
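For comparison, the same update can be sketched in Python on a list of lines. This is not what the job runs; `store_build_timestamp` is a hypothetical helper, and unlike the shell version it removes every stale entry, not just one.

```python
def store_build_timestamp(lines, job_name, build_number, timestamp):
    """Return a copy of the file's lines with the stored timestamp replaced."""
    entry = f"BUILD_TIMESTAMP_JOB_{job_name}_{build_number}={timestamp}"
    # Drop any previously stored entry.
    kept = [line for line in lines if not line.startswith("BUILD_TIMESTAMP_JOB")]
    # Insert the new entry right after the section header.
    header = kept.index("[LAST_SUCCESSFUL_BUILD_TIMESTAMP]")
    return kept[: header + 1] + [entry] + kept[header + 1 :]
```

Working on a list of lines keeps the function free of file I/O, so the caller decides how to read and write gmci.properties.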
It looks like the only way of comparing these commits to find conflicts is to compare them pairwise, in O(n**2). The conflicting commits can be separated by a massive gap of time.
Initially I was under the impression that the hunk portion of the diff could tell us exactly what changed in the file. However, that is not the case. It looks like we may have to go for a line by line comparison, which would make things absolutely insane.
Although it is essentially a line by line comparison, the idea is this: if a line added by one person is removed by the other, or vice versa, there is a merge conflict; if not, it is okay. This is a complicated problem when it comes to literally checking patch by patch. The current algorithm works as follows: for a given patch, we check the hunk of the patch, the number of lines inserted and the number of lines deleted.
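One rough way to approximate this check is to read the line ranges from the hunk headers of each patch and flag pairs of authors whose ranges overlap. The sketch below is a heuristic under stated assumptions, not git's merge algorithm: it assumes every patch is a unified diff of the same file against the same base version, and the names `old_ranges`, `ranges_overlap` and `probable_conflicts` are hypothetical.

```python
import re
from itertools import combinations

# Unified diff hunk header: @@ -old_start,old_len +new_start,new_len @@
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@", re.MULTILINE)


def old_ranges(patch):
    """Return (start, end) line ranges on the pre-change side, one per hunk."""
    ranges = []
    for match in HUNK_HEADER.finditer(patch):
        start = int(match.group(1))
        # A missing length means 1; a length of 0 is a pure insertion point.
        length = int(match.group(2)) if match.group(2) is not None else 1
        ranges.append((start, start + max(length, 1) - 1))
    return ranges


def ranges_overlap(a, b):
    """True if two inclusive (start, end) ranges share at least one line."""
    return a[0] <= b[1] and b[0] <= a[1]


def probable_conflicts(patches_by_author):
    """Pairwise O(n**2) comparison of (author, patch) pairs; returns suspect authors."""
    suspects = set()
    for (author_a, patch_a), (author_b, patch_b) in combinations(patches_by_author, 2):
        ranges_a, ranges_b = old_ranges(patch_a), old_ranges(patch_b)
        if any(ranges_overlap(x, y) for x in ranges_a for y in ranges_b):
            suspects.update({author_a, author_b})
    return suspects
```

This only yields a probable set of authors, since overlapping hunks do not always conflict and line numbers drift as other commits land.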
NOTE: There is no way of telling for certain who the actual culprits of the conflict are. The best we can do is give a probable set of commit authors whose commits conflict, plus a list of all authors who committed to the file where the conflict happened.
One important tool here is the git blame command. git blame shows, for each line of a file, who last changed it. It is amazingly effective: if we can parse the output of git blame for the conflicting file, we have the list of authors we need.
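A minimal sketch of that parsing, assuming the script has a local checkout to run git in. It relies on `git blame --line-porcelain`, which repeats `author` and `author-mail` header lines for every line of the file; the helper names `blame_authors` and `run_blame` are hypothetical.

```python
import subprocess


def blame_authors(porcelain_output):
    """Collect unique (name, email) pairs from `git blame --line-porcelain` output."""
    authors = set()
    name = None
    for line in porcelain_output.splitlines():
        # File content lines start with a tab, so they never match these prefixes.
        if line.startswith("author "):
            name = line[len("author "):]
        elif line.startswith("author-mail "):
            email = line[len("author-mail "):].strip("<>")
            authors.add((name, email))
    return authors


def run_blame(repo_dir, path):
    """Run git blame in a local checkout and return its porcelain output."""
    result = subprocess.run(
        ["git", "blame", "--line-porcelain", path],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout
```

Calling `blame_authors(run_blame(repo_dir, path))` would then give everyone who currently owns at least one line of the conflicting file.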