Emeralddb
Emeralddb is a JSON document NoSQL database, similar to MongoDB. It is based on BSON and Boost. Currently it only supports basic document key/value operations. The internal data is managed with mmap. It supports secondary LinearHash-based and BTree-based indexes on the primary key _id.
Limitations
- It doesn't support an intuitive query language, which makes it hard to use.
- The database is only an experiment and shouldn't be used in production.
Getting started
- Make sure you have Boost.
- Modify Makefile.am to change the Boost path. The default Boost path is the parent directory of src.
- Run ./build.sh. If you run into problems, your Boost version may be incompatible with the bundled BSON. My Boost version is 1.54.
Operation API
After building, you will have a client program named edb and a server program named emeralddb. Use them for testing. Before running edb, you need to start emeralddb first. You can type help for command usage.
./edb
edb>> help
Connect
Before executing operations, you need to connect to the server first. The default port is 48127.
edb>> connect localhost:48127
It will return +OK if the connection has been established successfully.
Scheme
The first time you use it, you need to create a scheme.
edb>> create test
Then you can type show schemes to see which schemes you have.
edb>> show schemes
# delete scheme test
edb>> drop test
No scheme is selected by default. You need to select one before executing any operations.
edb>> use test
Insert
Every record internally has an _id. Currently, the _id can be generated incrementally or randomly, or you can specify it yourself. Records are JSON documents.
# default _id
edb>> insert {"name":"lizhe", "age":"30"}
It will return the generated _id:
edb>> +OK _id = 1341409
# specify _id
edb>> insert {"_id":1, "email": "lizhe.ted@gmail.com"}
# return
edb>> +OK _id=1
Delete
Without an index, you can only delete data by _id.
edb>> delete {"_id":"1"}
# return
edb>> +OK
If you have an index on email, you can type:
edb>> delete {"email":"lizhe.ted@gmail.com"}
# you can also use a wildcard expression for deletion
edb>> delete {"email": "lizhe*"}
Query
edb>> query {"_id":"1"}
# return
edb>> {"_id":"1", "email": "lizhe.ted@gmail.com"}
edb>> query {"_id": "2"}
# return
edb>> NULL
Create index
Limitations
- An index can only contain one column.
- The column values must be unique. Very sad, isn't it? Come and help me!
You can specify the index type when creating an index. The default is a hash-based index.
The command format is:
create index on [scheme] (column_name [ASC|DESC]) [hash|btree]
edb>> create index on test (email) btree
Log
The system log is stored in [dialog.log].
The operation log is stored in [oper.log].
Currently the operation log is handled very badly: it stores all the data in one file and does not rotate automatically.
Driver
Currently only C++ and Java are supported. The Java driver is in /driver/java. If you want to test it in a distributed setup, you can configure the client with Ketama hashing.
Internal
Protocol
The protocol is very simple. Every message begins with a message header, which consists of a message length followed by a type that indicates the kind of message. For simplicity, both are stored as int32. The format looks like:
********MsgHeader*****
=====================
| Length |
=====================
| Type |
=====================
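In C++ this header could be sketched roughly as follows (the struct and field names are illustrative assumptions, not necessarily the names used in the source):
// A minimal sketch of the message header described above.
// Field names are assumptions; the on-wire layout is two int32 values.
#include <cstdint>

struct MsgHeader
{
    int32_t messageLen;   // total length of the message, including this header
    int32_t opCode;       // one of the OP_* types listed below
};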
Currently, it has the following types.
#define OP_REPLY 1
#define OP_INSERT 2
#define OP_MULTI_INSERT 3
#define OP_DELETE 3
#define OP_QUERY 4
#define OP_CREATE_SCHEME 16
#define OP_DROP_SCHEME 17
#define OP_CREATE_INDEX 18
#define OP_DROP_INDEX 19
#define OP_CONNECT 32
#define OP_DISCONNECT 33
#define OP_SNAPSHOT 33
For the client, the remainder of the message is the data you want to send. The idea is simple: if there is only one element, the remaining message is just your data. If there are multiple elements, it starts with an int indicating the number of elements, and every element starts with an int indicating its length. For example, a multi-insert looks like:
============================
| MSGHEADER |
============================
| elments_num |
============================
| element1_len |
============================
| element1_data |
============================
| element2_len |
============================
| element2_data |
............................
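As a rough illustration (not the actual client code), a multi-element body could be assembled like this, assuming each element is an already serialized document buffer:
// Sketch: build the body of a multi-element message.
// elements_num first, then for each element its length followed by its bytes.
#include <cstdint>
#include <string>
#include <vector>

std::string buildMultiBody(const std::vector<std::string> &elements)
{
    std::string body;
    int32_t num = static_cast<int32_t>(elements.size());
    body.append(reinterpret_cast<const char *>(&num), sizeof(num));
    for (const std::string &e : elements)
    {
        int32_t len = static_cast<int32_t>(e.size());
        body.append(reinterpret_cast<const char *>(&len), sizeof(len));
        body.append(e);
    }
    return body;   // prepend a MsgHeader with the total length before sending
}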
Storage structure
The internal storage structure is a little bit like MySQL, but much simpler. It uses four kinds of structures to manage the data.
DATABASE STRUCTURE
This header is read first when opening the database. It contains the basic information of the database. Magic is used to verify the header. Flag indicates whether the database was closed cleanly the last time. Scheme Num is the number of schemes the database currently has. The remainder holds the name of each scheme; the scheme name is used to locate the scheme file.
/********************************
DATABASE STRUCTURE
=============================
| Magic |
=============================
| Flag |
=============================
| Scheme Num |
=============================
| Scheme Name 1 |
=============================
| Scheme Name 2 |
.............................
*********************************/
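A rough C++ view of that header might look like the following (the field names, the scheme limit, and the name length are assumptions for illustration):
// Sketch of the database header; MAX_SCHEMES and MAX_SCHEME_NAME are assumed values.
#include <cstdint>

const int MAX_SCHEME_NAME = 64;
const int MAX_SCHEMES     = 16;

struct DbHeader
{
    int32_t magic;                                      // verifies the header
    int32_t flag;                                       // was the database closed cleanly last time?
    int32_t schemeNum;                                  // number of schemes
    char    schemeNames[MAX_SCHEMES][MAX_SCHEME_NAME];  // one name per scheme, used to locate scheme files
};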
SCHEME STRUCTURE
It contains the basic information of a scheme and has a fixed size of 4096 bytes.
/********************************
SCHEME STRUCTURE
=============================
| Magic |
=============================
| Page Num |
=============================
| Record Num |
=============================
| INDEX LIST |
.............................
*********************************/
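A comparable sketch for the 4096-byte scheme header (field names and the padding size are assumptions):
// Sketch of the fixed-size scheme header (4096 bytes); the index-list size is illustrative.
#include <cstdint>

struct SchemeHeader
{
    int32_t magic;        // verifies the header
    int32_t pageNum;      // number of pages in this scheme
    int32_t recordNum;    // number of records in this scheme
    char    indexList[4096 - 3 * sizeof(int32_t)];   // index definitions, padding the header to 4096 bytes
};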
The index structure stores the information of an index. Currently only BTree and LinearHash indexes are supported, and only one-to-one indexes.
*********************
INDEX STRUCTURE
=====================
| Type |
=====================
| Field Num |
=====================
| Field Name 1 | (64 char)
=====================
| Field Name 2 |
.....................
*********************
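Sketched in C++ (the 64-character field-name width comes from the diagram; everything else is an assumption):
// Sketch of an index definition entry; MAX_INDEX_FIELDS is an assumed limit.
#include <cstdint>

const int FIELD_NAME_LEN   = 64;
const int MAX_INDEX_FIELDS = 2;

struct IndexDef
{
    int32_t type;                                          // hash or btree
    int32_t fieldNum;                                      // number of fields (currently only 1 is supported)
    char    fieldNames[MAX_INDEX_FIELDS][FIELD_NAME_LEN];
};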
PAGE STRUCTURE
The data is managed in pages using mmap. A page has a fixed size of 4096 bytes. Every time the database reads data, it maps the whole page into memory. The data in a page is stored in reverse order, from the end toward the beginning; this makes it easy to check whether the page is full. A slot is the index of a record within the page.
*****************************
PAGE STRUCTURE
=============================
| PAGE HEADER |
=============================
| Slot List |
=============================
| Free Space |
=============================
| Data |
=============================
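The fullness check follows directly from this layout: the slot list grows forward after the page header while the data grows backward from the end of the page, so the page is full when the two regions would meet. A minimal sketch, assuming these header field names:
// Sketch of the page header and the fullness check; field names are assumptions.
#include <cstdint>

const int PAGE_SIZE = 4096;   // fixed page size

struct PageHeader
{
    int32_t slotNum;          // number of slots (records) in this page
    int32_t freeSpaceOffset;  // offset where the reversed data region currently begins
};

// There is room for a new record only if the gap between the end of the slot
// list and the start of the data region is large enough for the record plus a new slot.
bool hasRoomFor(const PageHeader &h, int32_t recordLen)
{
    int32_t slotListEnd = sizeof(PageHeader) + (h.slotNum + 1) * sizeof(int32_t);
    return h.freeSpaceOffset - recordLen >= slotListEnd;
}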
When the space is full, we extend the database. We do not extend by a single page (too small); instead we extend by a SEGMENT, which is a group of pages, currently configured as 64 pages.
RECORD STRUCTURE
A PAGEID and a SLOTID together locate the exact record.
*****************************
RECORD STRUCTURE
=============================
| PAGEID |
=============================
| SLOTID |
=============================
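In code, a record is addressed by a small (page, slot) pair, something like:
// Sketch of a record identifier; the names are assumptions.
#include <cstdint>

struct RecordID
{
    int32_t pageID;   // which page the record lives in
    int32_t slotID;   // which slot inside that page
};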
Contact
If you have problems, feel free to contact me at any time. My email is lizhe.ted@gmail.com.
If you are interested in developing this project, just fork it and do what you want.