text_histogram32

Repackage Bit.ly's data_hacks histogram for convenient script use.


text_histogram


Histograms are great for exploring data, but numpy and matplotlib are heavyweight overkill for quick analysis. They are also awkward to use on remote servers over ssh. Don't even get me started on installing them.

Bit.ly's data_hacks histogram.py is great, but it is difficult to call directly from Python code because it requires an optparse.OptionParser to pass histogram options. This package is histogram.py repackaged for convenient script use.

Python 3.11.3 (tags/v3.11.3:f3909b8, Apr  4 2023, 23:49:59) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from text_histogram import histogram
>>> histogram([1, 2, 2, 2, 2, 3, 3, 3], graph_char='#')
# NumSamples = 8; Min = 1.00; Max = 3.00
# Mean = 2.250000; Variance = 0.437500; SD = 0.661438; Median 2.000000
# each # represents a count of 1
     1.0000 -      1.2000 [     1]: #
     1.2000 -      1.4000 [     0]:
     1.4000 -      1.6000 [     0]:
     1.6000 -      1.8000 [     0]:
     1.8000 -      2.0000 [     4]: ####
     2.0000 -      2.2000 [     0]:
     2.2000 -      2.4000 [     0]:
     2.4000 -      2.6000 [     0]:
     2.6000 -      2.8000 [     0]:
     2.8000 -      3.0000 [     3]: ###
>>> histogram([1, 2, 2, 2, 2, 3, 3, 3], graph_char='#', display_empty_buckets=False)
# NumSamples = 8; Min = 1.00; Max = 3.00
# Mean = 2.250000; Variance = 0.437500; SD = 0.661438; Median 2.000000
# each # represents a count of 1
     1.0000 -      1.2000 [     1]: #
     1.8000 -      2.0000 [     4]: ####
     2.8000 -      3.0000 [     3]: ###
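
The summary line in the output above can be cross-checked with the standard library. This is a minimal sketch and not part of text_histogram; it assumes the reported variance and SD are the population versions (dividing by N), which is what matches the printed 0.437500 for this sample.

```python
import statistics

data = [1, 2, 2, 2, 2, 3, 3, 3]

mean = statistics.mean(data)
# Assumption: population variance/SD (divide by N), chosen because it
# reproduces the values printed by histogram() for this sample.
variance = statistics.pvariance(data)
sd = statistics.pstdev(data)
median = statistics.median(data)

print("Mean = %f; Variance = %f; SD = %f; Median %f" % (mean, variance, sd, median))
```

The sample variance (statistics.variance, dividing by N - 1) would give 0.5 here, so it does not match the output shown.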

Installation

pip uninstall text_histogram text_histogram3  # avoid conflicts
pip install text_histogram32

Source: https://github.com/clach04/text_histogram32

Fork of https://github.com/Kobold/text_histogram with a few tweaks:

  • Same namespace as the original text_histogram
  • Python 3 and 2.7 support
  • Control over the character used for the bar (see the graph_char and DEFAULT_graph_char options)
  • Option to NOT display empty buckets/bins/intervals (see the display_empty_buckets and DEFAULT_display_empty_buckets options)
  • Fixes
    • Kobold#4 - Zero for min or max value
    • when min == max value
    • improved error handling for empty data sets
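
To illustrate what graph_char and display_empty_buckets control, here is a rough pure-Python sketch of equal-width bucketing and row rendering. It is not the library's implementation: the helper names bucket_counts and render are hypothetical, the rule "a value goes into the first bucket whose upper boundary is >= the value" is an assumption that happens to reproduce the counts in the example output, and bar scaling for large counts is left out.

```python
def bucket_counts(values, buckets=10):
    """Count values into equal-width buckets between min and max (hypothetical helper)."""
    lo, hi = min(values), max(values)  # raises ValueError on empty input
    # Upper boundary of each bucket; the last boundary equals the maximum.
    boundaries = [lo + (hi - lo) * (i + 1) / float(buckets) for i in range(buckets)]
    counts = [0] * buckets
    for value in values:
        # Assumed rule: first bucket whose upper boundary is >= the value.
        for i, upper in enumerate(boundaries):
            if value <= upper:
                counts[i] += 1
                break
    return lo, boundaries, counts


def render(values, buckets=10, graph_char='*', display_empty_buckets=True):
    """Return one text row per bucket, optionally skipping empty buckets."""
    lo, boundaries, counts = bucket_counts(values, buckets)
    rows = []
    lower = lo
    for upper, count in zip(boundaries, counts):
        if count or display_empty_buckets:
            rows.append('%10.4f - %10.4f [%6d]: %s'
                        % (lower, upper, count, graph_char * count))
        lower = upper  # the next bucket starts where this one ends
    return rows


for row in render([1, 2, 2, 2, 2, 3, 3, 3], graph_char='#', display_empty_buckets=False):
    print(row)
```

For the sample data this prints three rows (1.0 to 1.2, 1.8 to 2.0, and 2.8 to 3.0), and with display_empty_buckets=True it prints all ten. When min == max, every value falls into the first bucket and no division by zero occurs.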

ToDo items

Python 2 and Python 3 differ in output for low counts. Script to reproduce:

from text_histogram import histogram
python2_3_difference_with_low_counts = []

python2_3_difference_with_low_counts += [  2540] *  1142
python2_3_difference_with_low_counts += [  5071] *   163
python2_3_difference_with_low_counts += [  7602] *    67
python2_3_difference_with_low_counts += [ 10134] *    28
python2_3_difference_with_low_counts += [ 12665] *    17
python2_3_difference_with_low_counts += [ 15196] *    14
python2_3_difference_with_low_counts += [ 17728] *     4
python2_3_difference_with_low_counts += [ 20259] *     4
python2_3_difference_with_low_counts += [ 22790] *     2
python2_3_difference_with_low_counts += [ 25322] *     4
python2_3_difference_with_low_counts += [ 27853] *     2
python2_3_difference_with_low_counts += [ 30384] *     4
python2_3_difference_with_low_counts += [ 32915] *     2
python2_3_difference_with_low_counts += [ 40509] *     2
python2_3_difference_with_low_counts += [ 45572] *     1
python2_3_difference_with_low_counts += [ 50635] *     1
python2_3_difference_with_low_counts += [ 55697] *     1
python2_3_difference_with_low_counts += [ 63291] *     1
python2_3_difference_with_low_counts += [129105] *     1
python2_3_difference_with_low_counts += [162012] *     1
python2_3_difference_with_low_counts += [253139] *     1

print(python2_3_difference_with_low_counts)

histogram(python2_3_difference_with_low_counts, buckets=100, graph_char='#', display_empty_buckets=False)