The SurveyDataSet object takes a CSV in its constructor. init() parses every line and builds an OrderedDict containing the questions as keys to a list of data points (values / answers).
Once the data has been parsed into the ordered dictionary, the dictionary is iterated on, creating a list of new Question objects. A Question object stores the question name, the results for that question, and a reference to the "parent" SurveyDataSet object.
foreach survey_result (line in the csv) | O(n)
foreach question_result (column in the csv) | *O(m)
build the ordered dictionary (data) | * ((O(1) lookup + O(1) append))
foreach question in ordered dictionary (column in the csv) | O(m)
create a new question and append it to the member data list | O(1) append
This solution has a time complexity of O(n * m)
assuming ordered dictionary
and list getters and appends are O(1).
(Reference)
- n = # of rows (results) in the csv
- m = # of columns (questions) in the csv
In order to implement Question.get_conditionals()
, we want to pre-compute and store
the list of Questions which could be determiners for each Question in the survey.
Once this data is stored as class member data (self.conditionals
),
get_conditionals()
can be implemented as a simple getter for that data.
foreach survey_result (line in the csv) | O(n)
foreach question_result (column in the csv) | *O(m)
build the ordered dictionary (data) | *((O(1) lookup + O(1) append))
foreach question in ordered dictionary (column in the csv) | O(m)
get_conditionals(ordered dictionary from first to current element) | *O(m * n)
create a new question and append it to the member data list | O(1) append
get_conditionals(list of questions): O(m * n)
if list of results contains no None(s), return | O(n)
foreach q in dictionary of questions | O(m)
return if results in q are all the same | O(n)
conditional_list.append(complement(q, current_q)) | O(n)
return conditional_list
complement(list_a, list_b) O(n)
a_null_matches = result matches from `a` which correspond with None items in `b` | O(n)
comp_a_null_matches = the complement of a_null_matches | O(n)
if an item from comp_a_null_matches corresponds to every non-None result
and does not appear in the a_null_matches list, we have a conditional
This solution has a time complexity of O(m^2 * n)
assuming ordered dictionary
and list getters and appends are O(1).
(Reference)
- n = # of rows (results) in the csv
- m = # of columns (questions) in the csv
- I spent about 3 hours figuring out the current code and the solution
- I spent about 6 hours writing python code
- I would be curious to run this against a larger data set, to spot check for correctness and to get a sense for real-world runtime
- For the
self.conditional
member data, check that references to the objects are passed / stored, as opposed to copies.- It looks like references are stored in parent_dataset. I'm assuming we're good here but worth further eval if this were production code / I had time to learn more about references in python!
The first commit (c7a4ffe) contains the updates I found necessary for python3 compatibility. Ordinarily I would include these changes in a separate PR, but in this case given I don't have PR access to the originally forked repo. The following are my notes (I would typically include these in a PR):
- Removed
unicode
andstr
magic methods no longer necessary in python3 - Tweaked line length to adhere to PEP-8
- Simplified the csv open
- 'r' is default, 'b' is no longer necessary, U is deprecated
__repr__
as written was causing an infinite recursive loop- I wonder why?
- I made the output format match the original implementation,
however it seems that using
__str__()
as opposed to__repr__()
might be preferred becauserepr()
is suggested to return a valid python expression - I found the previous two items worth noting, but not worth spending much time on