Jupyter notebooks and individual scripts included.
Core Python implementation: I created a small library called redpandas, which has the following features.
: read_csv reads a *sv (delimiter-separated) file and stores it as a Python dict, with the column headers as keys and the list of each column's values as values.
: example read_csv('NST-EST2015-alldata.csv', sep='\t')
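The dict-of-lists layout described above can be sketched as follows. This is a minimal illustration, not the redpandas source: `read_delimited` is a hypothetical name, it relies on the standard `csv` module, it takes any iterable of lines so the example runs without a file on disk, and it assumes the first row holds the headers.

```python
import csv
import io

def read_delimited(lines, sep=","):
    """Parse delimited text into a dict mapping each header to its column values.

    Hypothetical helper for illustration; the real read_csv may differ.
    """
    reader = csv.reader(lines, delimiter=sep)
    headers = next(reader)              # assumption: first row holds column names
    table = {h: [] for h in headers}
    for row in reader:
        for h, value in zip(headers, row):
            table[h].append(value)      # values stay strings; no type conversion here
    return table

# Toy tab-separated input standing in for a real file.
sample = io.StringIO("NAME\tSUMLEV\nAlabama\t40\nAlaska\t40\n")
df = read_delimited(sample, sep="\t")
print(df)
```

Storing one list per column keeps column lookups cheap, at the cost of having to reorder every list together whenever rows are filtered or sorted.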
: filter_df takes a dataframe (the output of read_csv) and keeps only the rows whose value in a particular column satisfies the condition given as a lambda expression.
: example filter_df(df, 'SUMLEV', (lambda a: a==40))
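One way such a filter could work on a dict of column lists is to collect the indices of the matching rows first and then rebuild every column from them. The sketch below uses a hypothetical `filter_rows` function and assumes numeric columns are already stored as numbers; it is not the library's actual code.

```python
def filter_rows(df, column, predicate):
    """Keep only the rows whose value in `column` satisfies `predicate`.

    Hypothetical helper for illustration; the real filter_df may differ.
    """
    keep = [i for i, value in enumerate(df[column]) if predicate(value)]
    # Rebuild every column from the same row indices so rows stay aligned.
    return {name: [values[i] for i in keep] for name, values in df.items()}

# Toy table: one national row (SUMLEV 10) and two state rows (SUMLEV 40).
df = {"NAME": ["United States", "Alabama", "Alaska"], "SUMLEV": [10, 40, 40]}
states = filter_rows(df, "SUMLEV", lambda a: a == 40)
print(states)
```

The input table is left untouched; a new dict is returned.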
: sort_df takes a dataframe and the name of the column to sort by. When that column is sorted, the values of all the other columns are reordered by the same row indices, so each row stays intact.
: example sort_df(df, 'POPESTIMATE2015')
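Keeping the other columns aligned with the sorted one can be done by sorting row indices rather than the values themselves. The following is a minimal sketch with a hypothetical `sort_by` function and made-up toy numbers; the `reverse` parameter is an assumption and not something the original API is known to offer.

```python
def sort_by(df, column, reverse=False):
    """Return a new table with every column reordered by the sort order of `column`.

    Hypothetical helper for illustration; the real sort_df may differ.
    """
    # Sort the row indices by the key column, then apply that order everywhere.
    order = sorted(range(len(df[column])), key=lambda i: df[column][i], reverse=reverse)
    return {name: [values[i] for i in order] for name, values in df.items()}

# Toy values, not real population figures.
df = {"NAME": ["Texas", "Wyoming", "Ohio"], "POPESTIMATE2015": [300, 100, 200]}
by_population = sort_by(df, "POPESTIMATE2015")
print(by_population)
```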
: display_df takes a dataframe and a list of column names and displays those columns in tabular format.
: example display_df(df, col=['POPESTIMATE2015', 'NAME'])
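A tabular display of selected columns can be sketched by padding each cell to the width of its column. The `format_table` function and the exact layout (pipe separators, a dashed rule under the header) are assumptions made for illustration; the real output format of the library is not documented here.

```python
def format_table(df, col):
    """Build a fixed-width text table for the selected columns.

    Hypothetical helper for illustration; the real display_df may differ.
    """
    # Each column is as wide as its longest cell or its header, whichever is longer.
    widths = {c: max([len(c)] + [len(str(v)) for v in df[c]]) for c in col}
    header = " | ".join(c.ljust(widths[c]) for c in col)
    rule = "-+-".join("-" * widths[c] for c in col)
    n_rows = len(df[col[0]]) if col else 0
    rows = [
        " | ".join(str(df[c][i]).ljust(widths[c]) for c in col)
        for i in range(n_rows)
    ]
    return "\n".join([header, rule] + rows)

# Toy values, not real population figures.
df = {"NAME": ["Ohio", "Texas"], "POPESTIMATE2015": [200, 300], "SUMLEV": [40, 40]}
table = format_table(df, col=["POPESTIMATE2015", "NAME"])
print(table)
```

Returning the table as a string rather than printing it directly makes the formatting easy to check in a test.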
: histogram(data_list, bins=3) : equal_width_binning(l, bins=3) : mapper(bucket, item)
: It generates a histogram by transforming numerical values into categorical values with equal-width bins. equal_width_binning(l, bins=3) creates the break points and returns a bucket of n bins of equal width. mapper(bucket, item) is responsible for counting the values that fall into those bins. Finally, histogram(data_list, bins=3) combines these and returns a dict with the buckets as keys and the counts as values.
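The three steps can be sketched as below. The function names and signatures come from the list above, but the bodies are assumptions: in particular, this `mapper` returns the interval an item belongs to and leaves the counting to `histogram`, and the last interval is treated as closed on the right so the maximum value is counted.

```python
def equal_width_binning(l, bins=3):
    """Return `bins` (low, high) intervals of equal width spanning min(l)..max(l)."""
    low, high = min(l), max(l)
    width = (high - low) / bins
    return [(low + i * width, low + (i + 1) * width) for i in range(bins)]

def mapper(bucket, item):
    """Return the interval in `bucket` that contains `item` (assumed behaviour)."""
    for lo, hi in bucket[:-1]:
        if lo <= item < hi:             # all but the last interval are half-open
            return (lo, hi)
    return bucket[-1]                   # last interval also includes the maximum

def histogram(data_list, bins=3):
    """Count how many values fall into each equal-width interval."""
    bucket = equal_width_binning(data_list, bins)
    counts = {interval: 0 for interval in bucket}
    for item in data_list:
        counts[mapper(bucket, item)] += 1
    return counts

data = [1, 2, 3, 4, 5, 6, 7]
result = histogram(data, bins=3)
print(result)
```

With this data the range 1 to 7 is split into three intervals of width 2, so the boundary values 3 and 5 fall into the interval that starts at them.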
python-exercise.py
: It is the main script, which demonstrates task-1 and task-2 in action.
Jupyter Files:
: Task-1
: Task-2