This manual provides detailed instruction to reproduce the analysis of this project.
-
Python version: Python 2.7.11
- Anaconda distribution for Python 2.7 is highly recommended since it has many of the following libraries preinstalled
-
Python Package Dependencies (older versions are not tested)
- pandas (0.17.1)
- numpy (1.10.4)
- matplotlib (1.5.1)
- statsmodels (0.6.1)
- csv (1.0)
- patsy (0.4.1)
- scipy (0.17.0)
- tablib (0.11.2)
-
Clone this repository to a local directory.
-
Download sample data of Enhanced TRACE from the WRDS database to sample cusip IDs. (A registered and subscribed account is required to download TRACE data)
-
Choose Date range to be the 1st month of interested period, select "Search the entire database" and select only two columns : Bond Symbol and CUSIP;
-
Change the Output Format to comma-delimited text**(*.csv)**, and keep everything else default;
-
Click Submit Query to acquire datasets. Wait for 10 sec to a few minutes, depending on size of the dataset, until a link to the .csv file appears below the query form;
-
Save the data file with name "cusip_id_start_raw.csv".
-
Repeat 1~4 with Date range set to last month of interested period. Save the data file with name "cusip_id_end_raw.csv".
-
Download Datastream data from a Datastream Terminal.
- Download list of bonds with fields ISIN, SIC and MOODY'S RATING as file "bond_list_datastream.csv".
- Open "bond_list_datastream.csv" with Excel, create a new column named "cusip_id" with each cell containing formula
=MID(ISIN_CELL, 3, 9)
. - Rename "MOODY'S RATING" to "MOODY_RATING" for easier Python usage.
-
Put the above data files in ./data/ folder of the repository.
-
In command prompt or Anaconda prompt, run
\MarketMakingProfitability> python cusip.py
to generate a list of cusip IDs
- A master list of cusip ID "cusip_id_list_TRACE.txt" is generated, as well as txt files for cusip ID lists of each market segment. The following instruction will use the master list as example.
- Go to Enhanced TRACE again to download raw data to be used for analysis
- Choose Date range to be the full length of the interested period;
- Choose "CUSIP" as search option and choose to upload a file. Upload "cusip_id_list_TRACE.txt" generated by step 5;
- Select ALL variables and Output Format as (.csv);
- Submit the query. Download the file to _.\data_ folder.
-
Go to Kenneth R. French's data library
-
Download the Fama/French 3 Factors in csv format;
-
Open the file with Excel, remove excess information (only keep data and column names "Date", "MKT", "SMB", "HML" and "RF");
-
Change the format of the Date colume to yyyymm;
-
Rename the file to "FF_monthly_data.csv"
-
Go to the Fed historical data page.
-
Download the monthly data of 20-year Nominal Treasury constant maturities;
-
Copy the data in the yield column to replace "RF" column in "FF_monthlly_data.csv" (make sure the dates are matched).
-
Go to the FRED Economic Data website to download the following three datasets
-
BofA Merrill Lynch US Corporate A Effective Yield: Download Monthly data (aggregated by End of Period), copy the data column to "FF_monthlly_data.csv" as a new column named "IGBonds";
-
10-year US Government Bond Yields benchmark: Download Monthly data, copy the data column to "FF_monthlly_data.csv" as a new column named "Gbond";
-
4-week Treasury Bill: Download Monthly data, copy the data column to "FF_monthlly_data.csv" as a new column named "Tbill";
-
Open "FF_monthlly_data.csv" with Excel, create a new column "DEF" by
=IGBonds_cell - Gbond_cell
and a new column "TERM" by=Gbond_cell - Tbill_cell
. Save "FF_monthlly_data.csv" in _.\data\ folder. -
Download BofA Merrill Lynch US Corporate A Effective Yield again in Daily, Close data, save to _.\data\ as "BofA_Corporate_Bond_Index.csv".
Now with all data ready in data folder, we can run the entire analysis procedure automatically with proper parameters set.
- Adjust configuration parameters by editing "MMP_config.py" file
- With raw datafile, set
readDaily = False
andreadRaw = True
. - Set the names of the raw files to parameter
RAW_DATA_NAMES
as a list. Note that all given raw files will be merged to one large raw file to run analysis as one portfolio. - Set
doOutput = False
to speed up data processing. IfdoOutput = True
, during the data cleaning, all transactions of each bond will be output to one csv file in folder _.\data\output_. - Set
dt_start
anddt_end
to the corresponded dates in yyyymmdd format. - Set
CLEAN_DATA_FILE
to be the desired prefix for analysis outputs.
- In command prompt, run
MarketMakingProfitability> python main.py
to start processing.
- Uncomment
print
statements in main function of "main.py" to output progress logs in console. - Depending the size of the raw dataset, this process will take from 5 mins to 1 hour.
- After first run of the script, a "_<CLEAN_DATA_FILE>daily.csv" file is generated. In "MMP_config.py" file, set
readDaily = True
andreadRaw = False
to speed up future analysis time.
The Python script will produce a series of output csv files as intermediate and final results of the analysis. All the files will have prefix CLEAN_DATA_FILE
set in "MMP_config.py". Use CLEAN_DATA_FILE = "list_clean_UTILITIES"
as an example.
-
list_clean_UTILITIES.csv: Fully processed clean transaction data
-
list_clean_UTILITIES_daily.csv: transaction data aggregated into daily yield
-
list_clean_UTILITIES_monthly_total_bond.csv: number of bonds traded each month
-
list_clean_UTILITIES_monthly_volume.csv: aggregated dollar volume traded each month
-
list_clean_UTILITIES_liquidity.csv: monthly liquidity measure and liquidity risk measure
-
list_clean_UTILITIES_Betas.csv: Betas of each bond resulted from Fama French regression
-
list_clean_UTILITIES_return_diff.csv: actual and expected yield from Fama French model, and regression error term
-
list_clean_UTILITIES_stats_summary.csv: adjusted R-squared of Fama French regression, and t-stats and p-stats of the Betas of each bond