Cassandra

Module Installing

The following external Python modules are required

  1. cassandra-driver
  2. numpy
  3. pandas

Cassandra Configuration

Add Cassandra to PATH environment variable
export PATH=$PATH:/temp/apache-cassandra-4.0.1/bin
Configure cassandra.yaml

cassandra.yaml on five nodes should have following configuration:

  • Set cluster name:

    cluster_name: 'groupM'
  • Set seed address

    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              # You can use any node addresses of your cluster as seeds
              # Ex: "<ip1>,<ip2>,<ip3>"
              # Here we use xcnc50.comp.nus.edu.sg and xcnc51.comp.nus.edu.sg as seeds
              - seeds: "192.168.51.13,192.168.51.14"
  • Set listening address

    listen_address: # leaving it blank leaves it up to InetAddress.getLocalHost().
  • Set port number

    # port for the CQL native transport to listen for clients on
    # For security reasons, you should not expose this port to the internet.  Firewall it if needed.
    native_transport_port: 6042
  • Set timeouts

    read_request_timeout_in_ms: 5000000
    range_request_timeout_in_ms: 10000000
    write_request_timeout_in_ms: 2000000
    counter_write_request_timeout_in_ms: 500000
    cas_contention_timeout_in_ms: 1000000
    truncate_request_timeout_in_ms: 600000
    request_timeout_in_ms: 1000000
Start Cassandra Server
cassandra

Preprocessing

Data preprocessing will take about three hours (mianly becuase of the creation of related_customer dataset) , we highly recommend to directly use the processed files saved in /temp/project_files/project_files_cassandra/data_files/

python3 preprocess_1.py
cqlsh localhost 6042 --request-timeout=100000 -f /home/stuproj/cs4224m/cs5424_cassandra/src/preprocess/load_raw.cql
python3 preprocess_2.py

Cassandra Setup and Data Loading

cqlsh localhost 6042 --request-timeout=100000 -f /home/stuproj/cs4224m/cs5424_cassandra/src/setup.cql

This will build data model and load data file into database.

Start Transaction Experiemnt

Put folders /benchmarking/node1 to /benchmarking/node5 into corresponding server (e.g. xcnc50.comp.nus.edu.sg to xcnc54.comp.nus.edu.sg). Inside which folders lies exp12.sh, we have to run these 5 exp12.sh at the same time on 5 servers by the same command:

exp12.sh [workload] [consistency_lvl]
  • workload = A or B
  • consistency_lvl = QUORUM or ROWA

When need to test another workload, please run /benchmarking/resetup.sh on node1.

Measurement Result

First save the dbstate by running /src/dbstate.py

python3 dbstate.py > dbstate.txt

Then run /src/stackClientMeasurement.py. You may need to check all the input and output path before generating the result at line 7 - line 11 in stackClientMeasurement.py.

It will generated clients.csv, throughput.csv and dbstate.csv.

The generated csv files are saved at /measurement/workloadX/