/SliceNStitch

SliceNStitch: Continuous CP Decomposition of Sparse Tensor Streams (ICDE'21)

Primary LanguageC++Apache License 2.0Apache-2.0

SliceNStitch: Continuous CP Decomposition of Sparse Tensor Streams

Source code for SliceNStitch, described in the paper SliceNStitch: Continuous CP Decomposition of Sparse Tensor Streams by Taehyung Kwon*, Inkyu Park*, Dongjin Lee, and Kijung Shin, to be presented at ICDE 2021.

SliceNStitch is an algorithm for continous CANDECOMP/PARAFAC (CP) decomposition, which has numerous time-critical applications. It has the following properties:

  • Any time: updates factor matrices immediately without having to wait until the current time period ends
  • Fast: with constant-time updates up to 464x faster than online methods
  • Accurate: with fitness comparable (specifically, 72 - 100%) to offline methods.

Supplementary Document

Please see supplementary.

Datasets

All parsed datasets are available at this link. The source of each dataset is listed below.

Name Structure Size # Non-zeros Source
Divvy Bikes sources x destinations x time (minutes) 673 x 673 x 525594 3.82M Link
Chicago Crime communities x crime types x time (hours) 77 x 32 x 148464 5.33M Link
New York Taxi sources x destinations x time (seconds) 265 x 265 x 5184000 84.39M Link
Ride Austin sources x destinations x colors x time (minutes) 219 x 219 x 24 x 285136 0.89M Link

Requirements

  • C/C++17 compiler
  • CMake
  • Git

Used Libraries

Download

git clone --recursive https://github.com/DMLab-Tensor/SliceNStitch.git

Generate Build System

To generate the build system, run the following command on your terminal:

mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=RELEASE

After that, you can build the program using the build automation software (e.g. Make, Ninja).

Execution

To test the algorithms in the paper, run the following command on your terminal:

./SliceNStitch [config_file]

Config File

Example config file

# test.yaml
data:
    filePath: "nyt_2019.csv"  # Path of the data stream file
    nonTempDim: [265, 265]  # Non-temporal dimension
    tempDim: 5184000  # Temporal length of data stream

tensor:
    unitNum: 10  # The number of indices in the time mode (W)
    unitSize: 3600  # Period (T)
    rank: 20  # Rank of CPD (R)

algorithm:
    name: "SNS_RND+"
    settings:  # Details are described in the next section
        numSample: 20
        clipping: 1000.0

test:
    outputPath: "out.txt"  # Path of the output file
    startTime: 0  # Starting time of the input data to be processed
    numEpoch: 180000  # The number of epochs
    checkPeriod: 1  # Period of printing the error
    updatePeriod: 1  # Period of updating the tensor
    random: true  # Randomness of the initial points

Examples of Possible Algorithms

algorithm:
    name: "ALS"
    settings:
algorithm:
    name: "SNS_MAT"
    settings:
        numIter: 1
algorithm:
    name: "SNS_VEC"
    settings:
algorithm:
    name: "SNS_VEC+"
    settings:
        clipping: 1000.0  # Clipping value (η)
algorithm:
    name: "SNS_RND"
    settings:
        numSample: 20  # Threshold (θ)
algorithm:
    name: "SNS_RND+"
    settings:
        numSample: 20  # Threshold (θ)
        clipping: 1000.0  # Clipping value (η)

Input & Output Format

Input (data:filePath in a config file) must be a CSV file that consists of a multi-aspect data stream. Each row of the file is a single event and the file should be formatted as follows.

For a CSV file with N columns,

  • First (N-2) columns represent the non-temporal indices of events
  • The (N-1)th column represents the time indices of events
  • The last column represents the values of events

The output (test:outputPath) of the code will be:

-----Test Results-----
RMSE	Fitness
0.33438	0.675436
0.329659 0.639608
...
0.37203	0.692263

-----The Total Number of Updates-----
4030721

-----Total Elapse Time-----
330.706 (sec)

-----Elapse Time per each Update-----
8.20463e-05 (sec)

Reference

If you use this code as part of any research, please cite the following paper.

@inproceedings{kwon2021slicenstitch,
  title={Slicenstitch: Continuous cp decomposition of sparse tensor streams},
  author={Kwon, Taehyung and Park, Inkyu and Lee, Dongjin and Shin, Kijung},
  booktitle={2021 IEEE 37th International Conference on Data Engineering (ICDE)},
  pages={816--827},
  year={2021},
  organization={IEEE}
}