This project is intended to test write, read and query performances of several NoSQL data-stores like MongoDB, Cassandra, HBase, Redis, Solr and various others.
Supported: MongoDB, Solr, Cassandra
WIP: HBase
Pre-req's:
- java
- git
cd /opt
git clone https://github.com/cloudwicklabs/benchmark.git
To run the benchmark programs, use the wrapper script from bin
folder
cd /opt/benchmark
chmod +x bin/benchmark
bin/benchmark
with out any arguments, run
script will show the supported sub-commands that a user can run:
Usage: benchmark DRIVER
Possible DRIVER(s)
mongo mongo benchmark driver
solr solr benchmark driver
cassandra cassandra benchmark driver
where, each sub-command represents a driver interface to the application that's used to benchmark.
####Benchmarking Mongo Mongo benchmark driver can do the following:
- Benchmark inserts for given number of events (also supports concurrency)
- Benchmark random reads (also supports concurrency)
- Benchmark predefined aggregate queries on top of inserted data
$bin/benchmark mongo --help
mongo 0.7
Usage: mongo_benchmark [options] [<totalEvents>...]
-m <insert|read|agg_query> | --mode <insert|read|agg_query>
operation mode ('insert' will insert log events, 'read' will perform random reads & 'agg_query' performs pre-defined aggregate queries)
-u <value> | --mongoURL <value>
mongo connection url to connect to, defaults to: 'mongodb://localhost:27017' (for more information on mongo connection url format, please refer: http://goo.gl/UglKHb)
-e <value> | --eventsPerSec <value>
number of log events to generate per sec
<totalEvents>...
total number of events to insert|read
-s <value> | --ipSessionCount <value>
number of times a ip can appear in a session, defaults to: '25'
-l <value> | --ipSessionLength <value>
size of the session, defaults to: '50'
-b <value> | --batchSize <value>
size of the batch to flush to mongo instead of single inserts, defaults to: '0'
-t <value> | --threadsCount <value>
number of threads to use for write and read operations, defaults to: 1
-p <value> | --threadPoolSize <value>
size of the thread pool, defaults to: 10
-d <value> | --dbName <value>
name of the database to create|connect in mongo, defaults to: 'logs'
-c <value> | --collectionName <value>
name of the collection to create|connect in mongo, defaults to: 'logEvents'
-w <value> | --writeConcern <value>
write concern level to use, possible values: none, safe, majority; defaults to: 'none'
-r <value> | --readPreference <value>
read preference mode to use, possible values: primary, primaryPreferred, secondary, secondaryPreferred or nearest; defaults to: 'none'
where,
primary - all read operations use only current replica set primary
primaryPreferred - if primary is unavailable fallback to secondary
secondary - all read operations use only secondary members of the replica set
secondaryPreferred - operations read from secondary members, fallback to primary
nearest - use this mode to read from both primaries and secondaries (may return stale data)
-i | --indexData
index data on 'response_code' and 'request_page' after inserting, defaults to: 'false'
--shard
specifies whether to create a shard collection or a normal collection
--shardPreSplit
specifies whether to pre-split a shard and move the chunks to the available shards in the cluster
-o <value> | --operationRetries <value>
number of times a operation has to retired before exhausting, defaults to: '10'
--help
prints this usage text
Mongo Benchmark Example(s):
-
Inserts of 100000, 1000000 and 100000000 documents consecutively on single mongo instance:
bin/benchmark mongo --mode insert 100000 1000000 100000000
-
Inserts of 100000, 1000000 and 100000000 documents on sharded mongo cluster, this requires running the benchmark from
mongos
(mongo router)bin/benchmark mongo --mode insert 100000 1000000 100000000 --shard
-
Inserts of 100000, 1000000 and 100000000 documents consecutively and also indexes the data once inserted:
bin/benchmark mongo --mode insert 100000 1000000 100000000 --indexData
-
Insert data with indexing and custom batch size:
bin/benchmark mongo --mode insert 100000 1000000 100000000 --indexData --batchSize 5000
-
Benchmark random reads of 10000, 100000 and 1000000 documents:
bin/benchmark mongo --mode read 10000 100000 1000000
-
Perform aggregation queries on the inserted data
bin/benchmark mongo --mode agg_query
All the inserts and reads can be run concurrently using specified number of threads and threadPools, see options for more details
NOTE: By default, mongo benchmark driver tries to connect to local instance of mongo, please use --mongoURL
to
specify the path where the mongo is listening. For more details on how to build the url please visit
here.
Example connection URI scheme:
mongodb://[username:password@]host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database][?options]]
####Benchmarking Solr Solr benchmark driver can do the following:
- Benchmark inserts for a given range of inputs
- Benchmark random reads
- Benchmark custom user input queries
$bin/benchmark solr --help
solr 0.1
Usage: index_logs [options] [<totalEvents>...]
Indexes log events to solr
-m <index|read|search> | --mode <index|read|search>
operation mode ('index' will index logs, 'search' will perform search &'read' will perform random reads against solr core)
-u <value> | --solrServerUrl <value>
url for connecting to solr server instance
<totalEvents>...
total number of events to insert
-s <value> | --ipSessionCount <value>
number of times a ip can appear in a session
-l <value> | --ipSessionLength <value>
size of the session
-b <value> | --batchSize <value>
flushes events from memory to solr index, defaults to: '1000'
-c | --cleanPreviousIndex
deletes the existing index on solr core
-q <value> | --query <value>
solr query to execute, defaults to: '*:*'
--queryCount <value>
number of documents to return on executed query, default:10
--help
prints this usage text
Before, benchmarking Solr you have to create a solr collection|core named logs
. The following steps illustrate how to
create a collection named logs on single solr core instance:
cp -r $SOLR_HOME/example $SOLR_HOME/logs
cp -r $SOLR_HOME/tweets/solr/collection1 $SOLR_HOME/tweets/solr/logs
rm -r $SOLR_HOME/tweets/solr/collection1
echo 'name=logs' > $SOLR_HOME/tweets/solr/logs/core.properties
where, $SOLR_HOME
is path where solr is installed (unpacked).
Also,
schema.xml
from logs collection should be replaced withresources/schema.xml
from the project dirsolrconfig.xml
from logs collection should be replaced withresources/solrconfig.xml
from the project dir
Solr Driver Example(s):
-
To benchmark the inserts of 100000, 1000000 and 100000000 log events consecutively:
bin/benchmark solr --mode insert 100000 1000000 100000000
-
To benchmark the inserts of 100000, 1000000 and 100000000 log events consecutively while clearing existing index data:
bin/benchmark solr --mode insert 100000 1000000 100000000 --cleanPreviousIndex
-
To benchmark the inserts of 100000, 1000000 and 100000000 log events consecutively while clearing existing index data and also providing batch size using which the driver will flush the data:
bin/benchmark solr --mode insert 100000 1000000 100000000 --cleanPreviousIndex --batchSize 10000
-
To benchmark random reads
bin/benchmark solr --mode read 10000 100000 1000000
-
To execute custom queries and return specified number of documents
bin/benchmark solr --mode search --query '*:*' --queryCount 20
####Benchmarking Cassandra Cassandra benchmark driver can do the following:
- Benchmark inserts for a given range of inputs
- Generates random movie dataset events
- Can insert in both
batch
mode andnormal
mode, batch mode automatically batches a set of events and sends them to cassandra - supports both
sync
andasync
operations, async does not wait for the cassandra to give the ack back - supports operation retries, default to 10 retries for operation before exhausting
- Benchmark random reads
- Automatically builds random queries to query the data from inserted 4 tables
$bin/benchmark cassandra --help
cassandra 0.5
Usage: cassandra_benchmark [options] [<totalEvents>...]
-m <insert|read|query> | --mode <insert|read|query>
operation mode ('insert' will insert log events, 'read' will perform random reads & 'query' performs pre-defined set of queries on the inserted data set)
-n <value> | --cassNode <value>
cassandra node to connect, defaults to: '127.0.0.1'
<totalEvents>...
total number of events to insert|read
-b <value> | --batchSize <value>
size of the batch to flush to cassandra; set this to avoid single inserts, defaults to: '0'
-c <value> | --customersDataSize <value>
size of the data set of customers to use for generating data, defaults to: '1000'
-k <value> | --keyspaceName <value>
name of the database to create|connect in cassandra, defaults to: 'moviedata'
-d | --dropExistingTables
drop existing tables in the keyspace, defaults to: 'false'
-a | --aSyncInserts
performs asynchronous inserts, defaults to: 'false'
-r <value> | --replicationFactor <value>
replication factor to use when inserting data, defaults to: '1'
-o <value> | --operationRetries <value>
number of times a operation has to retired before exhausting, defaults to: '10'
-t <value> | --threadsCount <value>
number of threads to use for write and read operations, defaults to: 1
-p <value> | --threadPoolSize <value>
size of the thread pool, defaults to: 10
--help
prints this usage text
Cassandra Benchmark Example(s):
-
Inserts of 2500, 25000, 250000 of rows into 4 tables which is equivalent to 10000, 100000, 1000000 insertions
bin/benchmark cassandra --mode insert 2500 25000 250000
-
Inserts of 2500, 25000, 250000 of rows into 4 tables using batch inserts with a batch size of 500
bin/benchmark cassandra --mode insert 2500 25000 250000 --batchSize 500
-
Inserts of 2500, 25000, 250000 of rows into 4 tables using batch inserts with a batch size of 500 and also delete the previously existing data in the tables:
bin/run cassandra --mode insert 2500 25000 250000 --batchSize 500 --dropExistingTables
-
Inserts in asynchronous mode, which does not required acknowledge back from cassandra:
bin/benchmark cassandra --mode insert 2500 25000 250000 --batchSize 500 --dropExistingTables --aSyncInserts
-
Random reads
bin/benchmark cassandra --mode read 2500 25000 250000
All inserts and read operations in cassandra benchmark driver can be made concurrent by specifying theads count and thread poolsize, see options for more details
###License and Authors Authors: Ashrith
Apache 2.0. Please see LICENSE.txt
. All contents copyright (c) 2013, Cloudwick Labs.