Cassandra

Installing Modules

The following external Python modules are required:

cassandra-driver
numpy
pandas
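
As a quick check before running any of the scripts, the sketch below reports which of the three modules cannot be imported. It assumes the packages are installed under their usual names (`cassandra-driver` is imported as `cassandra`); the helper name `missing_packages` is illustrative and not part of this project.

```python
from importlib.util import find_spec

# Package name -> importable module name.
# Note: cassandra-driver is imported as `cassandra`.
REQUIRED = {
    "cassandra-driver": "cassandra",
    "numpy": "numpy",
    "pandas": "pandas",
}


def missing_packages(required=REQUIRED):
    """Return the names of required packages that cannot be imported."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + " ".join(missing))
    else:
        print("All required modules are available.")
```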

Cassandra Configuration

Add Cassandra to the PATH environment variable

export PATH=$PATH:/temp/apache-cassandra-4.0.1/bin
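
To confirm that the PATH change took effect in the current shell, a small check such as the following may help. It is only a sketch: it looks up the `cassandra` and `cqlsh` executables (both ship in the Cassandra `bin` directory) and the helper name `find_executable` is hypothetical.

```python
import shutil


def find_executable(name="cassandra"):
    """Return the full path of `name` if it is found on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    for tool in ("cassandra", "cqlsh"):
        path = find_executable(tool)
        print(f"{tool}: {path if path else 'not found on PATH'}")
```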

Configure `cassandra.yaml`

The cassandra.yaml file on each of the five nodes should have the following configuration:

Set cluster name:
```
cluster_name: 'groupM'
```

Set seed address

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # You can use any node addresses of your cluster as seeds
          # Ex: "<ip1>,<ip2>,<ip3>"
          # Here we use xcnc50.comp.nus.edu.sg and xcnc51.comp.nus.edu.sg as seeds
          - seeds: "192.168.51.13,192.168.51.14"

Set listening address

listen_address: # leaving it blank leaves it up to InetAddress.getLocalHost().

Set port number

# port for the CQL native transport to listen for clients on
# For security reasons, you should not expose this port to the internet.  Firewall it if needed.
native_transport_port: 6042

Set timeouts

read_request_timeout_in_ms: 5000000
range_request_timeout_in_ms: 10000000
write_request_timeout_in_ms: 2000000
counter_write_request_timeout_in_ms: 500000
cas_contention_timeout_in_ms: 1000000
truncate_request_timeout_in_ms: 600000
request_timeout_in_ms: 1000000
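
The timeout values above are given in milliseconds, which makes their size hard to judge at a glance. The sketch below simply converts them to minutes; the numbers are copied from the settings above and the helper name `ms_to_minutes` is illustrative.

```python
# Timeout settings copied from the cassandra.yaml values above (milliseconds).
TIMEOUTS_MS = {
    "read_request_timeout_in_ms": 5_000_000,
    "range_request_timeout_in_ms": 10_000_000,
    "write_request_timeout_in_ms": 2_000_000,
    "counter_write_request_timeout_in_ms": 500_000,
    "cas_contention_timeout_in_ms": 1_000_000,
    "truncate_request_timeout_in_ms": 600_000,
    "request_timeout_in_ms": 1_000_000,
}


def ms_to_minutes(ms):
    """Convert milliseconds to minutes."""
    return ms / 60_000


if __name__ == "__main__":
    for name, ms in TIMEOUTS_MS.items():
        print(f"{name}: {ms_to_minutes(ms):.1f} min")
```

For example, the read timeout of 5000000 ms is roughly 83 minutes and the truncate timeout of 600000 ms is 10 minutes.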

Start Cassandra Server

cassandra
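
The server can take a while to start accepting clients. One way to wait for it before running the later steps is to poll the CQL port configured above (6042), as in the sketch below. The function names, retry count and delay are assumptions chosen for illustration; an open port only shows that the native transport is listening, not that the whole cluster is healthy.

```python
import socket
import time


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host="localhost", port=6042, attempts=30, delay=2.0):
    """Poll until the port accepts connections or the attempts run out."""
    for _ in range(attempts):
        if port_open(host, port):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    ready = wait_for_port()
    print("CQL port is accepting connections" if ready else "CQL port not reachable")
```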

Preprocessing

Data preprocessing takes about three hours (mainly because of the creation of the related_customer dataset), so we highly recommend using the processed files saved in /temp/project_files/project_files_cassandra/data_files/ directly. To run the preprocessing yourself:

python3 preprocess_1.py

cqlsh localhost 6042 --request-timeout=100000 -f /home/stuproj/cs4224m/cs5424_cassandra/src/preprocess/load_raw.cql

python3 preprocess_2.py

Cassandra Setup and Data Loading

cqlsh localhost 6042 --request-timeout=100000 -f /home/stuproj/cs4224m/cs5424_cassandra/src/setup.cql

This builds the data model and loads the data files into the database.

Start Transaction Experiment

Copy the folders /benchmarking/node1 to /benchmarking/node5 to the corresponding servers (e.g. xcnc50.comp.nus.edu.sg to xcnc54.comp.nus.edu.sg). Each folder contains an exp12.sh script; run all five scripts at the same time, one on each server, with the same command:

exp12.sh [workload] [consistency_lvl]

workload = A or B
consistency_lvl = QUORUM or ROWA
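
Starting the five scripts by hand at the same moment is awkward, so one option is to launch them from a single machine. The sketch below validates the two arguments and builds one `ssh` command per server. Several things here are assumptions and not part of this project: passwordless `ssh` access to the five hosts, the remote directory `~/benchmarking`, an executable `exp12.sh`, and the reading of ROWA as "read one, write all" (the names ONE, QUORUM and ALL match `ConsistencyLevel` in cassandra-driver, but the project's client code may map ROWA differently).

```python
import subprocess

# Hostnames taken from the example above: xcnc50 ... xcnc54.
HOSTS = [f"xcnc{n}.comp.nus.edu.sg" for n in range(50, 55)]
WORKLOADS = ("A", "B")

# ASSUMPTION: ROWA is interpreted as read-one / write-all.
CONSISTENCY_LEVELS = {
    "QUORUM": {"read": "QUORUM", "write": "QUORUM"},
    "ROWA": {"read": "ONE", "write": "ALL"},
}


def build_commands(workload, consistency_lvl, remote_dir="~/benchmarking"):
    """Return one ssh command per host; remote_dir is a hypothetical location."""
    if workload not in WORKLOADS:
        raise ValueError(f"workload must be one of {WORKLOADS}")
    if consistency_lvl not in CONSISTENCY_LEVELS:
        raise ValueError(f"consistency_lvl must be one of {list(CONSISTENCY_LEVELS)}")
    return [
        ["ssh", host,
         f"cd {remote_dir}/node{i} && ./exp12.sh {workload} {consistency_lvl}"]
        for i, host in enumerate(HOSTS, start=1)
    ]


def run_all(workload, consistency_lvl):
    """Start all five scripts without waiting in between, then collect exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in build_commands(workload, consistency_lvl)]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    for cmd in build_commands("A", "QUORUM"):
        print(" ".join(cmd))
```

Calling `run_all("A", "QUORUM")` would start the processes back to back, which is close to simultaneous but not exactly so.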

To test another workload, run /benchmarking/resetup.sh on node1.

Measurement Result

First, save the database state by running /src/dbstate.py:

python3 dbstate.py > dbstate.txt

Then run /src/stackClientMeasurement.py. Before generating the results, check the input and output paths on lines 7 to 11 of stackClientMeasurement.py.

It generates clients.csv, throughput.csv and dbstate.csv.

The generated csv files are saved at /measurement/workloadX/
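
To confirm that a measurement run produced all three files, a check like the one below can be used. It is a minimal sketch: the helper name `missing_outputs` is hypothetical, and the directory is passed on the command line, with `workloadX` left as a placeholder for the workload actually run.

```python
import sys
from pathlib import Path

# File names listed above.
EXPECTED = ("clients.csv", "throughput.csv", "dbstate.csv")


def missing_outputs(directory):
    """Return the expected CSV file names that are not present in `directory`."""
    d = Path(directory)
    return [name for name in EXPECTED if not (d / name).is_file()]


if __name__ == "__main__":
    # Usage: python3 check_outputs.py /measurement/workloadX/
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_outputs(target)
    print("All result files present." if not missing else f"Missing: {missing}")
```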

zirui-huang73/CS5424-distributed-database