
A Perl module for tracking the top-N most frequent items in a streaming data set using fixed memory.



It seems that someone has already built this.  See:

  - https://github.com/gray/statistics-topk
  - http://search.cpan.org/dist/Statistics-TopK/

I'd recommend using that module unless you're particularly fond of mine
for some reason.

Algorithm-TopPercent version 0.02

A Perl extension for tracking the most popular items seen in a large
stream of data using fixed memory.

This module implements a simple algorithm first described to me by Udi
Manber when he was the Chief Scientist at Yahoo! Inc.  It uses a set
of data structures and a counting technique that allow you to track
the top N (or top N percent) items in a stream of data using fixed
memory, provided that certain conditions are met.  See the DETAILS
section for more information.

I have reimplemented it mostly from memory of his description, which
I heard roughly 8 years ago.
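This README does not spell out the counting technique, so the sketch
below uses a different, well-known fixed-memory algorithm, Space-Saving
(Metwally, Agrawal and El Abbadi, 2005), purely to illustrate the
general idea: keep at most k counters and, when a new item arrives and
memory is full, evict the item with the smallest count and let the
newcomer inherit that count.  It is not Manber's method and not this
module's API; the SpaceSaving package and its new, add and top methods
are hypothetical names chosen for the example.

```perl
use strict;
use warnings;

# Hypothetical illustration: a minimal Space-Saving counter
# (Metwally, Agrawal & El Abbadi, 2005). This is a well-known fixed-memory
# technique, NOT necessarily the algorithm Algorithm::TopPercent implements.
package SpaceSaving;

# Create a counter that monitors at most $capacity distinct items.
sub new {
    my ($class, $capacity) = @_;
    die "capacity must be a positive integer\n"
        unless defined $capacity && $capacity =~ /^\d+$/ && $capacity > 0;
    return bless {
        capacity => $capacity,
        count    => {},    # item => estimated frequency
        error    => {},    # item => maximum possible overcount
    }, $class;
}

# Record one occurrence of $item from the stream.
sub add {
    my ($self, $item) = @_;
    my $count = $self->{count};

    if (exists $count->{$item}) {                    # already monitored
        $count->{$item}++;
        return;
    }
    if (scalar(keys %$count) < $self->{capacity}) {  # a slot is free
        $count->{$item} = 1;
        $self->{error}{$item} = 0;
        return;
    }

    # Memory is full: find the item with the smallest count (ties broken
    # by sorted key order so results are deterministic) and replace it.
    my ($min_item, $min);
    for my $k (sort keys %$count) {
        ($min_item, $min) = ($k, $count->{$k})
            if !defined $min || $count->{$k} < $min;
    }
    delete $count->{$min_item};
    delete $self->{error}{$min_item};

    # The newcomer inherits the evicted count, so its estimate may be
    # too high by at most $min.
    $count->{$item} = $min + 1;
    $self->{error}{$item} = $min;
}

# Return up to $n [item, estimate, max_overcount] triples, highest first.
sub top {
    my ($self, $n) = @_;
    my $count = $self->{count};
    my @items = sort { $count->{$b} <=> $count->{$a} || $a cmp $b } keys %$count;
    splice(@items, $n) if defined $n && @items > $n;
    return map { [ $_, $count->{$_}, $self->{error}{$_} ] } @items;
}

package main;

my $ss = SpaceSaving->new(3);    # never tracks more than 3 distinct items
$ss->add($_) for qw(a b a c a d b a e b);
for my $row ($ss->top(2)) {
    my ($item, $estimate, $overcount) = @$row;
    printf "%s: ~%d (overcount <= %d)\n", $item, $estimate, $overcount;
}
```

With this technique each estimate overcounts the true frequency by at
most the recorded error, and any item occurring more than N/k times in
a stream of N items is guaranteed to still be monitored.  The linear
scan for the minimum keeps the sketch short; a production version
would typically use a heap or a bucketed "stream summary" structure so
that each update does not touch every counter.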


To install this module, type the following:

   perl Makefile.PL
   make test
   make install


This module requires these other modules and libraries:




Copyright (C) 2010 by Jeremy Zawodny

This library is free software; you can redistribute it and/or modify
it under the same terms as Perl itself, either Perl version 5.10.1 or,
at your option, any later version of Perl 5 you may have available.