CAT is a tool for offline crash report analysis. It allows faster, more precise queries of ProSeries crash data. It includes a QuickBase crash report downloader, an XML parser, Pandas DataFrame helper functions, and some text analysis tools.
If you are new to Python 3 [1], Jupyter notebooks [2], or Pandas [3], check out the references section.
After installation, check out CrashAnalysisTour.ipynb to see examples of the commands available with CAT. To create a new crash report, copy and rename ExampleNotebook.ipynb.
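Copying through the Jupyter menu is the simplest route, but the step can also be scripted. The sketch below assumes the template lives at src/ExampleNotebook.ipynb (the path used in the running instructions); the helper name new_crash_report is hypothetical and not part of CAT.

```python
import shutil
from pathlib import Path


def new_crash_report(template: Path, name: str) -> Path:
    """Copy `template` to `<name>.ipynb` in the same folder and return the new path."""
    destination = template.with_name(name + ".ipynb")
    if destination.exists():
        # Refuse to overwrite an existing report.
        raise FileExistsError(str(destination))
    shutil.copyfile(str(template), str(destination))  # str() keeps Python 3.5 happy
    return destination
```

For example, new_crash_report(Path("src/ExampleNotebook.ipynb"), "PrintingCrashesMarch") would create src/PrintingCrashesMarch.ipynb (the report name is only an illustration).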
Python 3 is required (3.5+ preferred). We recommend installing Python with Anaconda.
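When several Python installations or conda environments coexist, it is easy to run the wrong interpreter. A quick check along these lines reports where the active interpreter stands against the requirement above; it is a small illustrative sketch, and the function name and messages are not part of CAT.

```python
import sys
from typing import Tuple

REQUIRED = (3, 0)   # "Python 3 is required"
PREFERRED = (3, 5)  # "(3.5+ preferred)"


def python_status(version: Tuple[int, int] = sys.version_info[:2]) -> str:
    """Classify a (major, minor) version against the README's requirements."""
    if version < REQUIRED:
        return "unsupported"
    if version < PREFERRED:
        return "supported, but 3.5+ preferred"
    return "ok"


print(sys.version.split()[0], "->", python_status())
```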
PyCharm or the Visual Studio Python plugin is recommended, but not required, for editing the crash_analysis library.
Git is also required for editing the crash_analysis library.
In a terminal or command prompt, do the following:
1. Download this repository:
   git clone https://github.intuit.com/arosengarten/CrashAnalysisTool.git
   If you don't have git installed, you can instead download the repository from its web page by clicking the "Clone or Download" button and selecting "Download ZIP". However, without a git clone you will not be able to commit changes back to the tool.
2. Go inside the directory:
   cd CrashAnalysisTool
   If you downloaded the ZIP file, extract it and go inside the extracted directory.
3. (Recommended) Create a virtual environment:
   conda create --name cat35 python=3.5
   If you skip this step, ensure that Python 3.5+ is your default Python installation.
4. (Recommended) Activate the Python 3.5 virtual environment:
   source activate cat35   (OSX/Linux)
   activate cat35          (Windows)
   On conda 4.4 and later, conda activate cat35 works on every platform.
5. Install the required Python packages:
   pip install -r requirements.txt
6. Open or create crash_analysis/private.py and enter the database ID, username, password, and app token as strings. See the internal ProSeries wiki for details.
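A minimal sketch of what crash_analysis/private.py might contain. The variable names are hypothetical placeholders; the real names are whatever the crash_analysis modules import, as documented on the internal ProSeries wiki.

```python
# crash_analysis/private.py
# Hypothetical sketch: these variable names are placeholders, not confirmed
# names from the crash_analysis library. Every value must be a string.
DATABASE_ID = "your-database-id"
USERNAME = "your.username@example.com"
PASSWORD = "your-password"
APP_TOKEN = "your-app-token"
```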
If you plan to contribute to the crash_analysis library, we recommend installing some extra Python packages.
Activate your CAT Python environment (step 4 in Set Up) and, from the CrashAnalysisTool directory, run the following commands:
pip install -r crash_analysis/module_requirements.txt
pip install -r crash_analysis/dev_requirements.txt
module_requirements.txt includes packages such as scikit-learn and gensim, which are needed for the machine-learning modules in the library (not currently publicly accessible). dev_requirements.txt includes packages that promote higher code quality, namely a Python linter (flake8/hacking) and a type checker (mypy).
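The dev tools work best on code written in the style the project is moving toward: type-annotated functions with doctest examples. The sketch below illustrates that style; tokenize_crash_message is a hypothetical helper in the spirit of preprocess.py, not a function the library is known to contain.

```python
import re
from typing import List

# Matches hexadecimal memory addresses such as 0x7FFE12AB.
_HEX_ADDRESS = re.compile(r"0x[0-9a-fA-F]+")


def tokenize_crash_message(message: str) -> List[str]:
    """Mask memory addresses, lower-case a crash message, and split it into words.

    >>> tokenize_crash_message("Access violation at 0x7FFE12AB in Print.dll")
    ['access', 'violation', 'at', '<addr>', 'in', 'print.dll']
    """
    masked = _HEX_ADDRESS.sub("<addr>", message)
    return masked.lower().split()


if __name__ == "__main__":
    import doctest
    doctest.testmod()  # silent when every docstring example passes
```

Assuming the file is saved as preprocess_sketch.py (a hypothetical name), flake8 preprocess_sketch.py checks style, mypy preprocess_sketch.py checks the annotations, and python -m doctest -v preprocess_sketch.py runs the docstring example.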
1. Open a command prompt or terminal inside the CrashAnalysisTool directory on your machine.
2. In the command prompt or terminal, start Jupyter Notebook:
   jupyter notebook
3. A browser window should open. Open src/ExampleNotebook.ipynb, copy it (File > Make a Copy...), and begin crash reporting!
- Revised documentation (this README, docstrings in the library, and explicit comments in the example notebook)
- Added types and doctests to a few modules.
- Added dev requirements
- Added a QuickBase downloader that can download crashes by time range in parallel
- Curated ExampleNotebook and CrashAnalysisTour
- Completely upgraded to Python 3
- (Finally) started writing documentation
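Downloading crashes "by time range in parallel" usually means cutting the range into fixed windows and fetching the windows concurrently. The sketch below shows that technique with a thread pool (a reasonable fit because the work is I/O-bound). It is not the tool's actual downloader: fetch is a hypothetical stand-in for one QuickBase query covering a single window.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta
from typing import Callable, List, Tuple

Window = Tuple[datetime, datetime]


def split_range(start: datetime, end: datetime, step: timedelta) -> List[Window]:
    """Cut [start, end) into consecutive windows, each at most `step` long."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    windows = []  # type: List[Window]
    current = start
    while current < end:
        upper = min(current + step, end)
        windows.append((current, upper))
        current = upper
    return windows


def download_range(start: datetime, end: datetime, step: timedelta,
                   fetch: Callable[[datetime, datetime], list],
                   workers: int = 4) -> list:
    """Fetch every window concurrently and return all records in time order.

    `fetch` is hypothetical: it should return the records for one window.
    """
    windows = split_range(start, end, step)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so chunks stay chronological.
        chunks = list(pool.map(lambda window: fetch(*window), windows))
    return [record for chunk in chunks for record in chunk]
```

For example, download_range(datetime(2017, 1, 1), datetime(2017, 2, 1), timedelta(days=1), fetch=my_query) would issue 31 one-day queries across four threads, where my_query is whatever function performs a single QuickBase request.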
- Finish adding type annotations
- Use hacking/flake8 to lint the project and make sure it adheres to the community style guide
- Refactor or gut analysis.py, which hasn't been used in a while anyway
- Add unit tests (specifically for parser.py, downloader.py, and maybe analysis.py)
- Finish adding doctests (specifically to preprocess.py); dataframe_helper.py is already fully doctested
- (optional) Create Sphinx documentation for the project (in a root docs/ directory)
- (optional) Reorganize modules into subpackages (e.g. parser.py, quickbase.py, and downloader.py could go into a download subpackage)
- (reach) Revisit the document clustering investigation (see kmeans.py and lda.py). With more time and effort, ML might prove useful for crash analysis.
- (reach) Refactor the downloader subpackage to live-update data into an AWS database, and refactor the notebooks to read data from that database instead of manually downloaded files.
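A starting point for the unit-test item could look like the sketch below. Because parser.py's real API is not described here, parse_records is a hypothetical stand-in that turns each record element of an XML export into a dict; the tag names are likewise illustrative.

```python
import unittest
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_records(xml_text: str) -> List[Dict[str, str]]:
    """Hypothetical stand-in for a parser.py function: one dict per <record>."""
    root = ET.fromstring(xml_text)
    return [{field.tag: field.text or "" for field in record}
            for record in root.iter("record")]


class ParseRecordsTest(unittest.TestCase):
    def test_one_dict_per_record(self) -> None:
        xml_text = ("<records><record><module>Printing</module></record>"
                    "<record><module>E-File</module></record></records>")
        modules = [r["module"] for r in parse_records(xml_text)]
        self.assertEqual(modules, ["Printing", "E-File"])

    def test_empty_field_becomes_empty_string(self) -> None:
        xml_text = "<records><record><module/></record></records>"
        self.assertEqual(parse_records(xml_text), [{"module": ""}])

    def test_malformed_xml_raises(self) -> None:
        with self.assertRaises(ET.ParseError):
            parse_records("<records><record>")
```

Saved in a tests/ directory (a suggested layout, not an existing one) as test_parser.py, this runs with python -m unittest discover -s tests.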