zsh logs commands and timestamps to ~/.zsh_history for shell features such as reverse history search.
This repository is a fun project that provides shell, Python, and R scripts to parse, analyze, and visualize .zsh_history files.
These scripts can be extended to support Bash's .bash_history.
You can run this on your .zsh_history files by cloning this repository with git clone https://github.com/bamos/zsh-history-analysis.git and installing the following prerequisites.
Ensure you have increased the history file size so commands aren't removed.
Then, follow the steps in Control Flow to generate the plots.
- PATH contains python3 and Rscript, which can be installed from your package manager. In Arch Linux, the required packages are python and r.
- R: ggplot2 and reshape are installed from an R shell with install.packages("ggplot2") and install.packages("reshape").
Unfortunately, zsh's history file size is limited to 10000 lines by default, and zsh will truncate the history to this length by deduplicating entries and removing old data.
Adding the following lines to .zshrc will remove the limits and deduplication of the history file.
export HISTSIZE=1000000000
export SAVEHIST=$HISTSIZE
setopt EXTENDED_HISTORY
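With EXTENDED_HISTORY set, zsh writes each entry as `: <start time>:<elapsed seconds>;<command>`, where the start time is in seconds since the epoch. The following is a minimal sketch of reading one such line. The name `parse_entry` is hypothetical and is not taken from analyze.py, and the sketch assumes single-line commands.

```python
import re
from typing import Optional, Tuple

# EXTENDED_HISTORY entries look like ": <start epoch>:<elapsed seconds>;<command>".
ENTRY = re.compile(r"^: (\d+):(\d+);(.*)$")


def parse_entry(line: str) -> Optional[Tuple[int, int, str]]:
    """Return (start, elapsed, command), or None if the line is not an entry header.

    Hypothetical helper for illustration; not part of this repository.
    """
    match = ENTRY.match(line.rstrip("\n"))
    if match is None:
        # For example, a continuation line of a multi-line command.
        return None
    start, elapsed, command = match.groups()
    return int(start), int(elapsed), command


print(parse_entry(": 1400000000:0;ls -la"))
```

Lines that do not match, such as the continuation lines of a multi-line command, return None so the caller can decide how to handle them.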
The following is the control flow for generating plots.
- Archive all .zsh_history files in data/<server>.zsh_history. ./pull-history-data.sh is a script to partially help archiving the data that will pull files from a list of servers separated by newlines in a file named servers.
- Run ./analyze.py to analyze the raw data files. ./analyze.py --help will provide a help menu with the supported options.
- Run ./plot.r to generate plots from the analyzed data.
At a given hour or weekday, how frequently do I run commands? The following shows the average number of commands executed for each hour and weekday. I average 10 commands per hour overnight and a little more during the day, and Wednesdays seem to be my least productive days.
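One way such an hourly average could be computed is sketched below. This is an illustration under stated assumptions, not the code in analyze.py: the function name `mean_commands_per_hour` is hypothetical, timestamps are epoch seconds, and every calendar day between the first and last command counts toward the average, including days with no commands.

```python
from collections import Counter
from datetime import datetime, timezone


def mean_commands_per_hour(timestamps, tz=timezone.utc):
    """Average number of commands for each hour of the day (0-23).

    Hypothetical helper: the average is taken over every calendar day in the
    observed range, so days without any commands pull the average down.
    """
    times = [datetime.fromtimestamp(t, tz) for t in timestamps]
    if not times:
        return [0.0] * 24
    per_hour = Counter(t.hour for t in times)
    first, last = min(times).date(), max(times).date()
    n_days = (last - first).days + 1
    return [per_hour[hour] / n_days for hour in range(24)]


# Two commands shortly after midnight on day one, one command at 01:00 on day two.
averages = mean_commands_per_hour([0, 60, 86400 + 3600])
print(averages[0], averages[1])
```

Grouping by `t.weekday()` instead of `t.hour` would give the per-weekday view, with the divisor changed to the number of times each weekday occurs in the range.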
Many hours have 0 commands executed since I'm not typing commands every hour of every day, so these points have a high standard deviation. Empirical Cumulative Distribution Functions (ECDFs) provide a deeper visualization of the distributions.
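For reference, an ECDF maps each observed value x to the fraction of samples less than or equal to x. The sketch below is a plain-Python illustration of that definition, not the plotting code in plot.r. The function name `ecdf` and the sample data are made up for the example.

```python
from bisect import bisect_right


def ecdf(samples):
    """Return (value, fraction of samples <= value) for each distinct value."""
    ordered = sorted(samples)
    n = len(ordered)
    # bisect_right gives the count of samples that are <= x in the sorted list.
    return [(x, bisect_right(ordered, x) / n) for x in sorted(set(ordered))]


# Hypothetical commands-per-hour counts: three idle hours and one busy hour.
print(ecdf([0, 0, 0, 10]))
```

In this made-up sample the curve already reaches 0.75 at zero, which is the kind of mass at 0 that a mean and standard deviation hide.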
What command was over 100 characters!? analyze.py will output the top five commands by length, and these long commands are from using the full path to an executable, such as the Android ARM cross compiler, as shown in the following output.
$ ./analyze.py commandLengths
105: /opt/android-ndk-r9/toolchains/arm-linux-androideabi-4.8/prebuilt/linux-x86/bin/arm-linux-androideabi-gcc
Zooming in on the majority of the data shows that almost 50% of my commands are one or two characters.
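Both length statistics can be sketched in a few lines. The helpers `longest_commands` and `fraction_at_most` are hypothetical names for illustration and are not the implementation behind ./analyze.py commandLengths. The sketch assumes the input is a list of command names, with arguments already stripped.

```python
import heapq


def longest_commands(commands, n=5):
    """Return the n longest commands as (length, command), longest first."""
    return [(len(c), c) for c in heapq.nlargest(n, commands, key=len)]


def fraction_at_most(commands, max_len):
    """Return the fraction of commands whose length is at most max_len."""
    if not commands:
        return 0.0
    return sum(len(c) <= max_len for c in commands) / len(commands)


# Made-up sample data.
sample = ["ls", "git", "l", "make"]
print(longest_commands(sample, n=2))
print(fraction_at_most(sample, 2))
```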
Since almost 50% of my commands are one or two characters, what are the top commands? The following plot shows the top commands are Linux utilities and oh-my-zsh aliases.
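A frequency count like this can be sketched with collections.Counter. The function name `top_commands` is hypothetical, and treating the first whitespace-separated word of each line as the command is an assumption of this sketch, not necessarily how analyze.py tokenizes entries.

```python
from collections import Counter


def top_commands(command_lines, n=10):
    """Return the n most frequent command names as (name, count) pairs."""
    # Assumption: the command name is the first whitespace-separated word.
    names = [line.split()[0] for line in command_lines if line.strip()]
    return Counter(names).most_common(n)


# Made-up sample data.
print(top_commands(["ls -la", "git status", "ls", "gs"], n=1))
```

Aliases such as those from oh-my-zsh are counted under the alias name, since the history file records what was typed rather than what the alias expands to.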