rochefort - poor man's kafka (with in-place modifications and inverted index search)

PUSH DATA, GET FILE OFFSET; no shenanigans

(if you can afford to lose data and do your own replication)

  • storage service that writes at disk speed and returns offsets to the stored values
  • if you are ok with losing some data (does not fsync on write)
  • supports: append, multiappend, modify, get, multiget, close, query, compact
  • clients: go, java

turns out when you are fine with losing some data, things are much faster and simpler :)


run in docker

run with docker: jackdoe/rochefort:2.5

docker run -e BIND=":8000" \
           -e ROOT="/tmp/rochefort" \
           -p 8000:8000 \
           jackdoe/rochefort:2.5

breaking change between 0.5 and 1.0

  • added 4 more bytes in the header
  • the -buckets parameter is gone, so everything is appended in one file per namespace

you can migrate your data by doing:

oldServer.scan(namespace: ns) do |offset, v|
  newServer.append(namespace:ns, data: v)
end
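
a rough Go sketch of the same migration (scanAll here is a hypothetical helper that walks every blob in the old namespace, for example by consuming the /scan endpoint described below; the Set/AppendInput call is the client API shown in the APPEND section):

err := scanAll(oldServer, ns, func(offset uint64, data []byte) error {
	// re-append each blob from the old namespace into the new server
	_, err := newServer.Set(&AppendInput{
		AppendPayload: []*Append{{
			Namespace: ns,
			Data:      data,
		}},
	})
	return err
})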

breaking change between 1.x and 2.0

  • moved get/multiget/append to protobuf
  • moved delete/close to protobuf

parameters

  • root: root directory, files will be created at root/namespace||default/append.raw
  • bind: address to bind to (default :8000)

don't forget to mount a persistent root directory
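
for example, with the image above you can mount a host directory over ROOT (the host path here is just illustrative):

docker run -e BIND=":8000" \
           -e ROOT="/tmp/rochefort" \
           -v /some/persistent/dir:/tmp/rochefort \
           -p 8000:8000 \
           jackdoe/rochefort:2.5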

compile from source

$ go run main.go query.go input.pb.go -bind :8000 -root /tmp
2018/02/10 12:06:21 starting http server on :8000
....

APPEND/MULTI APPEND

res, err := r.Set(&AppendInput{
	AppendPayload: []*Append{{
		Namespace: ns,
		Data:      []byte("abc"),
		AllocSize: 10,                      // so you can do in-place modification later
		Tags:      []string{"a", "b", "c"}, // so you can search it
	}, {
		Namespace: ns,
		Data:      []byte("zxc"),
	}},
})

you can always do in-place modifications to an object, and you can also reserve extra space (AllocSize) so you can add more data at the same offset later

the searchable tags are sanitized: all non-alphanumeric characters (excluding _), i.e. [^a-zA-Z0-9_]+, are removed, so for example the tag foo-bar! becomes foobar

inverted index

passing the tags a,b,c will create postings lists a.postings, b.postings and c.postings inside the namespace; later you can query only specific tags with /query
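
for example, after the Tags: []string{"a","b","c"} append above, you can fetch only the blobs tagged "a" with (the query dsl is described in the SEARCH section below):

curl -XGET -d '{"tag":"a"}' 'http://localhost:8000/query'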

MODIFY

_, err = r.Set(&AppendInput{
	ModifyPayload: []*Modify{{
		Namespace: ns,
		Offset:    off,
		Pos:       1,
		Data:      []byte("zxcv"),
	}},
})

modifies the blob in place at the given position; for example, to turn the 'abc' blob we appended at offset 0 into 'azz', we modify offset 0 with data 'zz' at position 1. If you pass Pos: -1 the data is appended at the previous end of the blob

in AppendInput you can mix modify and append commands
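
a minimal sketch of mixing both in one call, and of using Pos: -1 to append at the previous end of an existing blob (field names taken from the examples above):

_, err = r.Set(&AppendInput{
	AppendPayload: []*Append{{
		Namespace: ns,
		Data:      []byte("another blob"),
	}},
	ModifyPayload: []*Modify{{
		Namespace: ns,
		Offset:    off,
		Pos:       -1, // -1 appends at the previous end of the blob
		Data:      []byte("tail"),
	}},
})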

GET/MULTI GET

fetched, err := r.Get(&GetInput{
	GetPayload: []*Get{{
		Namespace: "example",
		Offset:    offset1,
	}, {
		Namespace: "example",
		Offset:    offset2,
	}},
})

the output is a GetOutput, which is just an array of byte arrays, so fetched[0] is the byte array holding the first blob and fetched[1] is the second blob

NAMESPACE

you can also pass the "namespace" parameter, which will create a different directory per namespace, for example

namespace: events_from_20171111 
namespace: events_from_20171112

will create {root_directory}/events_from_20171111/... and {root_directory}/events_from_20171112/...

and then you simply delete the directories you don't need (after closing them)

CLOSE/DELETE

Closes a namespace so it can be deleted (or you can directly delete it with DELETE)

STORAGE FORMAT

header is 20 bytes
D: data length: 4 bytes
R: reserved: 8 bytes
A: allocSize: 4 bytes
C: crc32(length, time): 4 bytes
V: the stored value

DDDDRRRRRRRRAAAACCCCVVVVVVVVVVVVVVVVVVVV...DDDDRRRRRRRRAAAACCCCVVVVVV....

as you can see, the value is not included in the checksum; only the header is checked, because my use case is quite ok with missing/corrupted data itself, but not with a corrupted header making us allocate 10gb in output := make([]byte, dataLen)
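
a minimal Go sketch of decoding one header (needs "encoding/binary"); the field order and sizes are from the layout above, the little-endian encoding is an assumption, matching what /scan uses:

// header layout from above: D(4) R(8) A(4) C(4) = 20 bytes
type header struct {
	DataLen   uint32 // D: data length
	Reserved  uint64 // R: reserved
	AllocSize uint32 // A: allocSize
	CRC       uint32 // C: crc32 over the header, not the value
}

func parseHeader(b []byte) header {
	return header{
		DataLen:   binary.LittleEndian.Uint32(b[0:4]),
		Reserved:  binary.LittleEndian.Uint64(b[4:12]),
		AllocSize: binary.LittleEndian.Uint32(b[12:16]),
		CRC:       binary.LittleEndian.Uint32(b[16:20]),
	}
}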

SCAN

scans the file

$ curl http://localhost:8000/scan?namespace=someStoragePrefix > dump.txt

the format is [len: 4 bytes (little endian)][offset: 8 bytes (little endian)]data...[len][offset]data...
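
a minimal Go sketch that walks such a dump (needs "encoding/binary" and "io"); query results stream in the same format, see SEARCH below:

func readDump(r io.Reader, cb func(offset uint64, data []byte)) error {
	hdr := make([]byte, 12) // [len: 4 bytes][offset: 8 bytes]
	for {
		if _, err := io.ReadFull(r, hdr); err != nil {
			if err == io.EOF {
				return nil // clean end of stream
			}
			return err
		}
		dataLen := binary.LittleEndian.Uint32(hdr[0:4])
		offset := binary.LittleEndian.Uint64(hdr[4:12])
		data := make([]byte, dataLen)
		if _, err := io.ReadFull(r, data); err != nil {
			return err
		}
		cb(offset, data)
	}
}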

SEARCH

you can search all tagged blobs; the dsl is fairly simple, just POST/GET a json blob to /query

  • basic tag query
{"tag":"xyz"}
  • basic OR query
{"or": [... subqueries ...]}
  • basic AND query
{"and": [... subqueries ...]}

example:

curl -XGET -d '{"and":[{"tag":"c"},{"or":[{"tag":"b"},{"tag":"c"}]}]}' 'http://localhost:8000/query'

it spits out the output in the same format as /scan, so the result of the query can be very big, but it is streamed

LICENSE

MIT

naming rochefort

Rochefort Trappistes 10 is my favorite beer and I was drinking it while doing the initial implementation on a Sunday night

losing data + NIH

You can lose data on crash and there is no replication, so you have to orchestrate that yourself doing double writes or something.

The super simple architecture allows for all kinds of hacks to do backups/replication/sharding but you have to do those yourself.

My use case is ok with losing some data, and we don't have the money to pay for kafka+zk+monitoring(kafka,zk), nor the time to learn how to optimize it for our quite big write and very big multi-read load.

Keep in mind that there is some not-invented-here syndrome involved in making it, but I use the service in production and it works very nicely :)

non atomic modify

there is a race between reading and modifying from the client's perspective

TODO

  • travis-ci
  • perl client
  • make c client that can be used from ruby/perl
  • javadoc for the java client
  • publish the java client on maven central