DataSketches - sketch data structures - for Ruby
Add this line to your application’s Gemfile:
gem "datasketches"
Distinct counting
Most frequent
Quantiles and histograms
Sampling
Create a CPC sketch
sketch = DataSketches::CpcSketch.new
Add data
sketch.update(1)
sketch.update(2.0)
sketch.update("three")
Estimate the count
sketch.estimate
Save a sketch
data = sketch.serialize
Load a sketch
sketch = DataSketches::CpcSketch.deserialize(data)
Get the union
u = DataSketches::CpcUnion.new(14)
u.update(sketch1)
u.update(sketch2)
u.result
Create an HLL sketch
sketch = DataSketches::HllSketch.new(14)
Add data
sketch.update(1)
sketch.update(2.0)
sketch.update("three")
Estimate the count
sketch.estimate
Save a sketch
data = sketch.serialize_updatable
# or
data = sketch.serialize_compact
Load a sketch
sketch = DataSketches::HllSketch.deserialize(data)
Get the union
u = DataSketches::HllUnion.new(14)
u.update(sketch1)
u.update(sketch2)
u.result
Create a Theta sketch
sketch = DataSketches::UpdateThetaSketch.new
Add data
sketch.update(1)
sketch.update(2.0)
sketch.update("three")
Estimate the count
sketch.estimate
Save a sketch
data = sketch.serialize
Load a sketch
sketch = DataSketches::UpdateThetaSketch.deserialize(data)
Get the union
u = DataSketches::ThetaUnion.new
u.update(sketch1)
u.update(sketch2)
u.result
Get the intersection
i = DataSketches::ThetaIntersection.new
i.update(sketch1)
i.update(sketch2)
i.result
Compute A not B (the items in a that are not in b)
d = DataSketches::ThetaANotB.new
d.compute(a, b)
Create a frequent strings sketch
sketch = DataSketches::FrequentStringsSketch.new(64)
Add data
sketch.update("a")
sketch.update("b")
sketch.update("c")
Estimate the frequency of an item
sketch.estimate("a")
Save a sketch
data = sketch.serialize
Load a sketch
sketch = DataSketches::FrequentStringsSketch.deserialize(data)
Create a KLL sketch
sketch = DataSketches::KllIntsSketch.new
# or
sketch = DataSketches::KllFloatsSketch.new
Add data
sketch.update(1)
sketch.update(2)
sketch.update(3)
Get quantiles
sketch.quantile(0.5)
Get the minimum and maximum values from the stream
sketch.min_value
sketch.max_value
Save a sketch
data = sketch.serialize
Load a sketch
sketch = DataSketches::KllIntsSketch.deserialize(data)
Merge sketches
sketch.merge(sketch2)
Create a VarOpt sketch
sketch = DataSketches::VarOptSketch.new(14)
Add data
sketch.update(1)
sketch.update(2.0)
sketch.update("three")
Sample data
sketch.samples
This library is modeled after the DataSketches Python API.
View the changelog
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone --recursive https://github.com/ankane/datasketches-ruby.git
cd datasketches-ruby
bundle install
bundle exec rake compile
bundle exec rake test