Improvement suggestion: need an equivalent of Pandas value_counts()
minhster99 opened this issue · 1 comments
minhster99 commented
Hi
I have a bit of background using Python's Pandas and I've been evaluating Tablesaw from this perspective.
One of the most useful functions in Pandas is value_counts. It tells us more about the data in a column that holds a small set of distinct values, e.g. days of the week, months of the year, enums, etc. It is extremely useful during exploratory data work.
Here is an example in Pandas, with its output:
table['Marital_Status'].value_counts()
# groups the data by Marital_Status, counts each value, and sorts the result by count in descending order
Married 864
Together 580
Single 480
Divorced 232
To do the equivalent in Tablesaw:
table.summarize("Marital_Status", count).by("Marital_Status").sortDescendingOn("Count [Marital_Status]")
// note how the column name has to be repeated
 Marital_Status  |  Count [Marital_Status]  |
---------------------------------------------
        Married  |                     864  |
       Together  |                     580  |
         Single  |                     480  |
       Divorced  |                     232  |
It would be great if there were a convenience function like:
table.valueCounts("Marital_Status")
The important thing is not having to repeat the column name.
minhster99 commented
I have since discovered countBy, which does precisely this!
table.countBy("Marital_Status")
Edit: OK, it doesn't sort, so you'll have to do that part yourself.
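
For anyone comparing the two libraries, the behaviour being asked for (count each distinct value, then order by count, highest first) can be sketched in plain Java with only the standard library. This is not Tablesaw code: the class name `ValueCounts`, the helper `valueCounts`, and the sample data are all made up for illustration, and the sketch only shows the count-then-sort logic that value_counts performs.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ValueCounts {

    // Hypothetical helper (not part of Tablesaw): counts each distinct value
    // and returns the counts ordered from most to least frequent.
    static Map<String, Long> valueCounts(List<String> values) {
        // Step 1: group identical values and count them.
        Map<String, Long> counts = values.stream()
                .collect(Collectors.groupingBy(
                        Function.identity(), LinkedHashMap::new, Collectors.counting()));

        // Step 2: sort the entries by count, descending, and keep that order
        // by collecting into a LinkedHashMap.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .collect(Collectors.toMap(
                        Map.Entry::getKey, Map.Entry::getValue, (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        // Made-up sample data, not the dataset from this issue.
        List<String> status = List.of(
                "Single", "Married", "Married", "Together", "Married", "Together");
        valueCounts(status).forEach((value, count) -> System.out.println(value + " " + count));
    }
}
```

With Tablesaw itself, the same two steps would presumably be countBy followed by one of the table's sort methods on the resulting count column.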