SegFit is a C++ implementation of the SegFit Algorithm proposed by the following paper,
Honglin Yu, Lexing Xie and Scott Sanner, The Lifecyle of a Youtube Video: Phases, Content and Popularity, Accepted by ICWSM-15
The goal is to segment a sequence of numbers into consecutive of phases and fit each of them with a generalized power-law.
This repository also includes an easy-to-use python wrapper.
2.1
Curently, the software are only tested under Linux (specifically, Ubuntu)
-
first make sure
g++
,gfortran
andcmake
are installed. -
Optional: If you need python support, install
python-dev
andlibboost-python-dev
(tested under version >=1.54) -
Then run
cmake .
in the folder. If no error happened, the executable is./segfit
. Python lib file is./libpysegfit.so
Please notice that, the index is "zero-based" in the main program and "one-based" in the python library.
###Main routine:###
- run
./segfit --help
to see all the options. Main options are
-s
: the input sequence. Numbers can be separated by either blank or ','-l
: how long of the input sequence from the beginning is to be segmented. For example, if running./segfit -s "1,2,3,4,5,6" -l 5
will only segment the first 5 element of the input sequence, i.e."1,2,3,4,5"
.-smp
: the penalty used to avoid "over-segmnetation" (\eta in the paper), default value is 2.3
- Output format:
- Each row is a segment(phase) found consecutively from left to right
- In each row, the meanings of each number are (from left to right) "start index", "end index", "parameter a", "parameter b", "parameter c" and "fitting order". For example, if we run
./segfit -s "1,2,3,4,5,1,4,9,16" -l 9
, the output can be seen as,
start index | end index | parameter a | parameter b | parameter c | fitting order | |
---|---|---|---|---|---|---|
1st phase: | 0 | 4 | 6.25 | 1 | 0 | 0 |
2rd phase: | 5 | 8 | 6.25 | 2 | 0 | 0 |
We can see the algorithm segments the sequence into two phases. Please notice that, the program scaled the sequnce to [0,100], so the parametes actually describe sequcence [6.25, 12.5, 18.75, 25.0, 31.25, 6.25, 25.0, 56.25, 100.0].
###Python library###
First, you need to put the libpysegfit.so
to somewhere in the PYTHONPATH
. Here is an example
>>> from libpysegfit import PySegFit
>>> psf = PySegFit()
>>> psf.get_para() # see the default parameters
{'lb': -6.0, 'btype': 2, 'm': 15.0, 'smp': 2.3, 'maxiter': 1000.0, 'factr': 5000.0, 'pgtol': 1e-05, 'ub': 6.0}
>>> psf.segfit([1,2,3,4,5,1,4,9,16]) # segfit a sequence
[(1, 5, 6.25, 1.0, 0.0, 0), (6, 9, 6.25, 2.0, 0.0, 0)]
###As a C++ library###
The code is very easy to be used as an c++ library. See main.cpp
(line 164) for example.
This piece of code is written by Honglin Yu, except for the following files,
lib/solver.f
: by Ciyou Zhu et al.lib/program.h(cpp)
: originally by Peter Carbonetto
GPLv3