marianogappa/chart

Support customised category names instead of `category 0`, `category 1`, etc.

GaaraZhu opened this issue · 8 comments

For example, I want to generate a bar chart for the CSV below:

GaryZ@GaryZhus-MacBook-Pro-2 ~/Desktop $ cat test2.csv
,jack,jame
age,25,30
height,168,172

After running `cat test2.csv | chart bar`, it generates the chart below. Instead of showing jack and jame, it shows category 0 and category 1 at the bottom.

[Image: example]

I don't think column names are used as labels at the moment. I noticed something similar. Seems like a good enhancement. 👍

This format is only common in CSVs, though; I wouldn't consider this as built-in behaviour.
This is also pretty much only useful in the context of multi-series categorical charts.
Nevertheless, I agree it's a good idea.

How would you guys have it implemented? I'm leaning towards a command-line option flag to interpret the first line of STDIN as category labels.
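To make the option-flag idea concrete, here is a minimal Go sketch of peeling the first input line off as labels. `splitHeader` and its `header` parameter are hypothetical names used for illustration only; nothing here is taken from chart's codebase.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// splitHeader is a hypothetical helper, not chart's actual code.
// When header is true, the first line is split on sep and returned as
// labels, and the remaining lines are returned as data. Otherwise all
// lines are data and labels is nil.
func splitHeader(lines []string, sep string, header bool) (labels, data []string) {
	if !header || len(lines) == 0 {
		return nil, lines
	}
	for _, f := range strings.Split(lines[0], sep) {
		labels = append(labels, strings.TrimSpace(f))
	}
	return labels, lines[1:]
}

func main() {
	// A string stands in for STDIN so the sketch runs on its own.
	sc := bufio.NewScanner(strings.NewReader("jack,jame\n25,30\n168,172\n"))
	var lines []string
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	labels, data := splitHeader(lines, ",", true)
	fmt.Println(labels)
	fmt.Println(data)
}
```

With the flag off, the input passes through untouched, so existing pipelines would keep behaving as they do today.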

I think there are a number of challenges with this. An obvious one is that even if one of the variables is discrete, it may still have many unique values, so I think the number of categories should be capped (top 5, top 10, etc.). Implementing this as an option flag that also accepts the column number to use seems like one reasonable approach.
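A cap like "top 5" could look roughly like the Go sketch below. The comment above does not say what "top" is measured by, so ranking by summed value is an assumption, and `topN` and `categoryTotal` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// categoryTotal pairs a category name with its summed value (hypothetical type).
type categoryTotal struct {
	Name  string
	Total float64
}

// topN returns at most n categories, largest total first.
// Ties are broken by name so the result is deterministic.
func topN(totals map[string]float64, n int) []categoryTotal {
	out := make([]categoryTotal, 0, len(totals))
	for name, t := range totals {
		out = append(out, categoryTotal{Name: name, Total: t})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Total != out[j].Total {
			return out[i].Total > out[j].Total
		}
		return out[i].Name < out[j].Name
	})
	if n >= 0 && n < len(out) {
		out = out[:n]
	}
	return out
}

func main() {
	totals := map[string]float64{"a": 3, "b": 10, "c": 7, "d": 1}
	for _, c := range topN(totals, 2) {
		fmt.Println(c.Name, c.Total)
	}
}
```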

Another possibility is e.g.:

chart bar --categories "Jack,Jame"

Which works for datasets that are not csvs as well, but requires one to write the categories manually.

In the original solution, there wouldn't be a "which column number to use" problem in my opinion, because chart parses the input and knows which columns are floats, so the category name for each float column would always be in the first row, at the same column index as that float.
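The column-alignment argument can be sketched in Go as follows. `floatColumnLabels` is a hypothetical function, and the float detection via `strconv.ParseFloat` is an assumption about how such a check might be done, not a description of chart's parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// floatColumnLabels is hypothetical. For every column of row that parses
// as a float, it takes the label from the same column index of header.
// If the header has no usable entry there, it falls back to the
// "category N" naming described in this issue.
func floatColumnLabels(header, row []string) []string {
	var labels []string
	for col, field := range row {
		if _, err := strconv.ParseFloat(strings.TrimSpace(field), 64); err != nil {
			continue // not a float column, so it needs no category label
		}
		if col < len(header) && strings.TrimSpace(header[col]) != "" {
			labels = append(labels, strings.TrimSpace(header[col]))
		} else {
			labels = append(labels, fmt.Sprintf("category %d", len(labels)))
		}
	}
	return labels
}

func main() {
	// First two lines of test2.csv from the original post.
	header := strings.Split(",jack,jame", ",")
	row := strings.Split("age,25,30", ",")
	fmt.Println(floatColumnLabels(header, row)) // prints [jack jame]
}
```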

I think my issue with having this information in the data is that you are technically requiring its use, instead of making it purely optional with an argument like --categories "Jack,Jame" or --categories="Jack,Jame". I think my preference would be to make it optional, without the format dependency, because format dependencies may mean other tools stop working on the same data, or two or more copies of the data have to be kept, etc.

Could we combine both by having --header when the input has the categories in the first row and --categories="Jack,Jame" when we want to add them manually?
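A rough Go sketch of the two-flag combination, using the standard `flag` package. Neither flag exists in chart at the time of this discussion; `resolveLabels` is a hypothetical helper, and letting `--categories` win over `--header` when both are given is an assumption.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// resolveLabels is hypothetical. args are the arguments after the chart
// type (Go's flag package stops parsing at the first non-flag argument).
// Explicit --categories take precedence; otherwise --header uses firstLine.
func resolveLabels(args []string, firstLine, sep string) ([]string, error) {
	fs := flag.NewFlagSet("chart", flag.ContinueOnError)
	header := fs.Bool("header", false, "treat the first input line as category labels")
	categories := fs.String("categories", "", "comma-separated category labels")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	switch {
	case *categories != "":
		return strings.Split(*categories, ","), nil
	case *header:
		return strings.Split(firstLine, sep), nil
	}
	return nil, nil // no labels requested: caller keeps the default names
}

func main() {
	manual, _ := resolveLabels([]string{"--categories", "Jack,Jame"}, "a,b", ",")
	fromInput, _ := resolveLabels([]string{"--header"}, "jack,jame", ",")
	fmt.Println(manual)
	fmt.Println(fromInput)
}
```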

Note that the example in the original post is a rather unusual CSV: generally a CSV has one data point per line, but here each line holds what would normally be a column. I'm not convinced we should support that format, as it's very rare. Also note that at the moment CSVs are barely supported, because field escaping is not supported (i.e. fields wrapped in quotes or other separators).
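For reference, Go's standard `encoding/csv` package already handles quoted fields, so it is one possible route to proper escaping. This is only a sketch of that option; `parseCSV` is a hypothetical wrapper and not how chart currently reads input.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseCSV is a hypothetical wrapper around encoding/csv.
// Quoted fields may contain the separator, and "" inside a quoted
// field is read as a literal double quote.
func parseCSV(input string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.FieldsPerRecord = -1 // allow rows with differing numbers of fields
	return r.ReadAll()
}

func main() {
	rows, err := parseCSV("label,value\n\"jack, jr\",25\n")
	if err != nil {
		panic(err)
	}
	for _, row := range rows {
		fmt.Printf("%q\n", row)
	}
}
```

A different separator could be selected through the reader's `Comma` field.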

However, I agree that support for category labels is desirable. @Kuraio's suggestion sounds good, although I'd first check if it's possible to give the two flags the same name and take the string optionally, and if not, to have similar names for both.
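On the "same name, optional string" question: if the flags were parsed with Go's standard `flag` package (an assumption), a custom `flag.Value` that also has an `IsBoolFlag() bool` method can be given with or without a value. The sketch below uses a hypothetical `labelsFlag` type and reuses the `--categories` name from this thread.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// labelsFlag is hypothetical. Bare "--categories" means "read labels from
// the first input line"; "--categories=Jack,Jame" carries explicit labels.
type labelsFlag struct {
	FromHeader bool
	Labels     []string
}

func (l *labelsFlag) String() string {
	if l == nil {
		return ""
	}
	return strings.Join(l.Labels, ",")
}

// Set receives "true" when the flag is given without a value.
// Caveat: a single category literally named "true" is ambiguous.
func (l *labelsFlag) Set(v string) error {
	if v == "true" {
		l.FromHeader, l.Labels = true, nil
		return nil
	}
	l.FromHeader, l.Labels = false, strings.Split(v, ",")
	return nil
}

// IsBoolFlag tells the flag package that the value is optional.
func (l *labelsFlag) IsBoolFlag() bool { return true }

func parseLabelsFlag(args []string) (*labelsFlag, error) {
	fs := flag.NewFlagSet("chart", flag.ContinueOnError)
	lf := &labelsFlag{}
	fs.Var(lf, "categories", "category labels, or no value to use the first input line")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return lf, nil
}

func main() {
	bare, _ := parseLabelsFlag([]string{"--categories"})
	explicit, _ := parseLabelsFlag([]string{"--categories=Jack,Jame"})
	fmt.Println(bare.FromHeader, explicit.Labels)
}
```

One trade-off of this approach: the space-separated form `--categories "Jack,Jame"` no longer works, because the parser stops treating the next argument as the flag's value; only the `=` form carries a string.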