Produce a python module (ayasdi_python_code.py) which does the following:
Create a tab-delimited file (ayasdi_assignment.csv) containing 20 columns and a million rows with the following characteristics:
- Column 1 (labelled col1) is the index column, with values from 1 to 1 million.
- The next 9 columns (2 to 10) are labelled col2_x ... col10_x, where 'x' is the mean of that column's distribution. Each column contains random values drawn from a Gaussian distribution, with a different mean and variance per column. Each of these columns also has 10% nulls.
- Columns 11 to 19 are labelled col11 ... col19, where each column has random strings selected from the English dictionary. Each of these columns has 10% nulls as well.
- Column 20 has random dates selected between January 1, 2014 and December 31, 2014. No nulls in this column.
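The column layout above could be generated roughly as follows. This is a minimal sketch, not the code in this repository: the means and standard deviations in GAUSS_PARAMS, the tiny WORDS list (a stand-in for a real dictionary file), the label col20 for the date column, and the use of an empty string to represent a null are all assumptions made for illustration.

```python
import csv
import datetime
import random

# Hypothetical parameters: the assignment does not fix the means,
# variances, or word source, so these are placeholders.
GAUSS_PARAMS = [(mean, 1.0 + mean) for mean in range(1, 10)]  # (mean, sd)
WORDS = ["apple", "banana", "cherry", "delta", "ember"]
NULL_RATE = 0.10
START = datetime.date(2014, 1, 1)
DAYS_IN_2014 = 365


def maybe_null(value, rng):
    """Return '' (standing in for null) about 10% of the time."""
    return "" if rng.random() < NULL_RATE else value


def header():
    """Build the 20 column labels: col1, col2_x..col10_x, col11..col20."""
    numeric = ["col%d_%d" % (pos + 2, mean)
               for pos, (mean, _) in enumerate(GAUSS_PARAMS)]
    strings = ["col%d" % pos for pos in range(11, 20)]
    return ["col1"] + numeric + strings + ["col20"]


def make_row(index, rng):
    """Build one row: index, 9 Gaussian values, 9 words, 1 date."""
    numeric = [maybe_null(rng.gauss(mean, sd), rng)
               for mean, sd in GAUSS_PARAMS]
    strings = [maybe_null(rng.choice(WORDS), rng) for _ in range(9)]
    date = START + datetime.timedelta(days=rng.randrange(DAYS_IN_2014))
    return [index] + numeric + strings + [date.isoformat()]


def write_dataset(path, n_rows, seed=None):
    """Write a tab-delimited file with a header row and n_rows data rows."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header())
        for index in range(1, n_rows + 1):
            writer.writerow(make_row(index, rng))
```

Calling write_dataset("ayasdi_assignment.csv", 1000000) would produce the full file. Rows are written one at a time, so memory use stays flat regardless of the row count.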
Once this dataset has been created, load it into a single table in a sqlite database (ayasdi_assignment.db).
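The loading step might look like the sketch below, which uses only the standard-library csv and sqlite3 modules. The function name load_into_sqlite and the table name assignment are hypothetical, and treating an empty field as NULL is an assumption about how nulls are encoded in the file.

```python
import csv
import sqlite3


def load_into_sqlite(csv_path, db_path, table="assignment", delimiter="\t"):
    """Load a delimited file with a header row into one SQLite table.

    Returns the number of columns created.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        columns = next(reader)  # the first row holds the column labels
        col_defs = ", ".join('"%s"' % name for name in columns)
        placeholders = ", ".join("?" for _ in columns)
        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute('DROP TABLE IF EXISTS "%s"' % table)
                # Columns are declared without types, so every value
                # is stored as the text read from the file.
                conn.execute('CREATE TABLE "%s" (%s)' % (table, col_defs))
                # Assumption: an empty field represents a null.
                rows = ([field if field != "" else None for field in row]
                        for row in reader)
                conn.executemany(
                    'INSERT INTO "%s" VALUES (%s)' % (table, placeholders),
                    rows)
        finally:
            conn.close()
    return len(columns)
```

Passing a generator to executemany keeps the million rows from being held in memory at once. For the comma-separated variant of the file, delimiter="," would be passed instead.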
Guidelines:
- The code should follow PEP 8 guidelines (https://www.python.org/dev/peps/pep-0008/)
- Stick to core-python packages in the code (no pandas, numpy, etc.)
- Write object-oriented code.
- Include basic doctests in your code.
Deviations from the above in the current code:
- The CSV file is comma-separated instead of tab-delimited.
- Doctests are not included.
- The code is written in a functional style rather than an object-oriented one.
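For reference, the object-oriented and doctest guidelines could be met along the following lines. This is only an illustrative sketch: the class GaussianColumn and its parameters are hypothetical and do not exist in the current code.

```python
import doctest
import random


class GaussianColumn(object):
    """Generate values for one numeric column with optional nulls.

    With a null rate of 1.0 every value is null, which makes the
    doctest deterministic:

    >>> col = GaussianColumn(mean=5, std_dev=2, null_rate=1.0)
    >>> col.label(2)
    'col2_5'
    >>> col.next_value() is None
    True
    """

    def __init__(self, mean, std_dev, null_rate=0.10, seed=None):
        self.mean = mean
        self.std_dev = std_dev
        self.null_rate = null_rate
        self._rng = random.Random(seed)

    def label(self, position):
        """Return the column label, e.g. col2_5 for position 2, mean 5."""
        return "col%d_%s" % (position, self.mean)

    def next_value(self):
        """Return None at the configured null rate, else a Gaussian draw."""
        if self._rng.random() < self.null_rate:
            return None
        return self._rng.gauss(self.mean, self.std_dev)


if __name__ == "__main__":
    doctest.testmod()
```

Running the module directly executes the doctests and prints nothing when they pass.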