Caskdb is a distributed key-value store inspired by consistent hashing, beansdb and beanseye. I implemented it for learning purposes.
- automatic data partition
- dynamic node adding
- multiple replicas
- data recovery (when restarting after a crash)
Install Go first, then
git clone URL(caskdb, bitcask_go)
mv caskdb bitcask_go $GOPATH/src/
go get
go install caskdb/master
go install caskdb/datanode
Prepare a configuration file (based on caskdb/master/conf/example.ini) and make sure static/ is in the current directory
> master
datanode -port=7901 -dbpath="test1" -debug
datanode -port=7902 -dbpath="test2" -debug
Open localhost:7908 in a browser to monitor the state of the data nodes
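For readers unfamiliar with Go command-line flags, here is a minimal sketch of how the three datanode flags used above could be declared with the standard `flag` package. The `nodeConfig` type, the `parseFlags` helper, the default port, and the rule that `-dbpath` must be set are assumptions made for illustration, not Caskdb's actual code.

```go
package main

import (
	"flag"
	"fmt"
)

// nodeConfig holds the three flags shown in the commands above.
// The type name is hypothetical.
type nodeConfig struct {
	Port   int
	DBPath string
	Debug  bool
}

// parseFlags parses datanode-style arguments.
// The default values and the "-dbpath is required" rule are assumptions.
func parseFlags(args []string) (nodeConfig, error) {
	var c nodeConfig
	fs := flag.NewFlagSet("datanode", flag.ContinueOnError)
	fs.IntVar(&c.Port, "port", 7900, "listening port")
	fs.StringVar(&c.DBPath, "dbpath", "", "directory for data files")
	fs.BoolVar(&c.Debug, "debug", false, "enable debug logging")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	if c.DBPath == "" {
		return c, fmt.Errorf("-dbpath is required")
	}
	return c, nil
}

func main() {
	// Same arguments as the first datanode command above.
	c, err := parseFlags([]string{"-port=7901", "-dbpath=test1", "-debug"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("port=%d dbpath=%s debug=%v\n", c.Port, c.DBPath, c.Debug)
}
```

A real binary would pass `os.Args[1:]` instead of a fixed slice.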
cd caskdb/bench
go build
./bench -sz=1K -n=1000 -t=W
Caskdb uses a master-slave architecture. The master node receives all client requests and forwards them to data nodes according to its data partitioning algorithm. It also serves as a cluster monitor with a web interface. Data nodes provide efficient set, get and delete interfaces.
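The request flow described above can be sketched roughly as follows. This is only an illustration: the `DataNode` interface, the in-memory `memNode`, the `Master` type and the placeholder `pickTwo` partition function are all hypothetical names, and the real master talks to data nodes over the network rather than through in-process calls.

```go
package main

import (
	"errors"
	"fmt"
)

// DataNode is a hypothetical interface for the set/get/delete operations
// a data node exposes; the real Caskdb API may differ.
type DataNode interface {
	Set(key string, value []byte) error
	Get(key string) ([]byte, error)
	Delete(key string) error
}

// ErrNotFound is returned when a key is absent.
var ErrNotFound = errors.New("key not found")

// memNode is an in-memory stand-in for a real data node.
type memNode struct{ data map[string][]byte }

func newMemNode() *memNode { return &memNode{data: make(map[string][]byte)} }

func (n *memNode) Set(key string, value []byte) error {
	n.data[key] = value
	return nil
}

func (n *memNode) Get(key string) ([]byte, error) {
	v, ok := n.data[key]
	if !ok {
		return nil, ErrNotFound
	}
	return v, nil
}

func (n *memNode) Delete(key string) error {
	delete(n.data, key)
	return nil
}

// Master forwards each request to the nodes chosen by pick.
type Master struct {
	nodes []DataNode
	pick  func(key string, n int) []int // returns indexes into nodes
}

// Set writes the value to every node selected for the key.
func (m *Master) Set(key string, value []byte) error {
	for _, i := range m.pick(key, len(m.nodes)) {
		if err := m.nodes[i].Set(key, value); err != nil {
			return err
		}
	}
	return nil
}

// Get returns the value from the first selected node that has the key.
func (m *Master) Get(key string) ([]byte, error) {
	for _, i := range m.pick(key, len(m.nodes)) {
		if v, err := m.nodes[i].Get(key); err == nil {
			return v, nil
		}
	}
	return nil, ErrNotFound
}

// pickTwo is a placeholder partition function (not consistent hashing):
// it sums the key bytes to choose a primary and uses the next node as the
// second copy. It assumes n >= 2.
func pickTwo(key string, n int) []int {
	sum := 0
	for i := 0; i < len(key); i++ {
		sum += int(key[i])
	}
	return []int{sum % n, (sum + 1) % n}
}

func main() {
	m := &Master{
		nodes: []DataNode{newMemNode(), newMemNode(), newMemNode()},
		pick:  pickTwo,
	}
	if err := m.Set("user:1", []byte("alice")); err != nil {
		panic(err)
	}
	v, err := m.Get("user:1")
	fmt.Println(string(v), err)
}
```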
- In Caskdb, the mapping between keys and nodes is determined by a consistent hashing algorithm.
- Every key/value pair is stored in two different nodes.
- Modify the configuration file (add the address of the new node).
- The master node notices the change to the configuration file, recalculates the hash ring and sends out data migration tasks.
- Some data nodes will execute the migration tasks.
- After all tasks are done, the new node is successfully added.
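The partitioning and node-adding steps above can be sketched as a small hash ring. This is a minimal illustration under stated assumptions, not Caskdb's implementation: it uses CRC32 as the hash function, places each node at a single point on the ring (no virtual nodes), and the names `Ring`, `Replicas`, `Migration` and `PlanMigration` as well as the address `localhost:7903` are hypothetical.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
)

// Ring is a minimal consistent-hash ring. Each node sits at one point,
// which keeps the sketch short; hash collisions between addresses are ignored.
type Ring struct {
	points []uint32          // sorted hash positions
	owner  map[uint32]string // position -> node address
}

func hashOf(s string) uint32 { return crc32.ChecksumIEEE([]byte(s)) }

// NewRing builds a ring from node addresses.
func NewRing(addrs ...string) *Ring {
	r := &Ring{owner: make(map[uint32]string)}
	for _, a := range addrs {
		p := hashOf(a)
		r.points = append(r.points, p)
		r.owner[p] = a
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Replicas returns up to n distinct nodes for key, walking clockwise
// from the key's position on the ring.
func (r *Ring) Replicas(key string, n int) []string {
	if len(r.points) == 0 || n <= 0 {
		return nil
	}
	if n > len(r.points) {
		n = len(r.points)
	}
	h := hashOf(key)
	start := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	out := make([]string, 0, n)
	for i := 0; len(out) < n; i++ {
		out = append(out, r.owner[r.points[(start+i)%len(r.points)]])
	}
	return out
}

// Migration describes one key that must be copied to a node.
type Migration struct {
	Key, To string
}

// PlanMigration lists the keys that gain a new holder when the ring
// changes from before to after.
func PlanMigration(before, after *Ring, keys []string, copies int) []Migration {
	var plan []Migration
	for _, k := range keys {
		had := make(map[string]bool)
		for _, n := range before.Replicas(k, copies) {
			had[n] = true
		}
		for _, n := range after.Replicas(k, copies) {
			if !had[n] {
				plan = append(plan, Migration{Key: k, To: n})
			}
		}
	}
	return plan
}

func main() {
	before := NewRing("localhost:7901", "localhost:7902")
	after := NewRing("localhost:7901", "localhost:7902", "localhost:7903")
	keys := []string{"alpha", "beta", "gamma", "delta"}
	for _, k := range keys {
		fmt.Println(k, "->", after.Replicas(k, 2))
	}
	for _, m := range PlanMigration(before, after, keys, 2) {
		fmt.Printf("copy %q to %s\n", m.Key, m.To)
	}
}
```

Because only the keys whose replica set changed appear in the plan, adding a node moves a fraction of the data rather than reshuffling everything.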
In Caskdb, the key-value engine is an implementation of Bitcask. A merge can be triggered by the size of a data file, a time window, and the percentage of stale data.
type Options struct {
	MaxFileSize  int32   // data file size
	MergeWindow  [2]int  // startTime-EndTime
	MergeTrigger float32 // percentage of useless data
	Path         string
}
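As a rough illustration of how these options might drive a merge decision, here is a sketch. How the three triggers combine is an assumption: this version only merges inside the time window, and then fires when the file has reached `MaxFileSize` or when the share of stale bytes reaches `MergeTrigger`. It also assumes `MergeWindow` holds hours of the day and `MergeTrigger` is a fraction between 0 and 1. The helpers `inWindow` and `ShouldMerge` are hypothetical names.

```go
package main

import "fmt"

// Options mirrors the struct above.
type Options struct {
	MaxFileSize  int32
	MergeWindow  [2]int // startTime-EndTime
	MergeTrigger float32
	Path         string
}

// inWindow reports whether hour falls in [start, end). A window whose start
// is greater than its end is treated as wrapping past midnight. Equal bounds
// give an empty window.
func inWindow(hour int, w [2]int) bool {
	start, end := w[0], w[1]
	if start <= end {
		return hour >= start && hour < end
	}
	return hour >= start || hour < end
}

// ShouldMerge decides whether to merge, given the current hour, the data
// file size in bytes and the number of stale bytes in it.
// The combination of conditions is an assumption, not Caskdb's actual rule.
func ShouldMerge(o Options, hour int, fileSize, deadBytes int64) bool {
	if fileSize <= 0 || !inWindow(hour, o.MergeWindow) {
		return false
	}
	if fileSize >= int64(o.MaxFileSize) {
		return true
	}
	return float32(deadBytes)/float32(fileSize) >= o.MergeTrigger
}

func main() {
	opts := Options{
		MaxFileSize:  1 << 30,
		MergeWindow:  [2]int{1, 5},
		MergeTrigger: 0.6,
		Path:         "test1",
	}
	// 70 MiB stale out of 100 MiB at 02:00: inside the window, ratio 0.7.
	fmt.Println(ShouldMerge(opts, 2, 100<<20, 70<<20)) // prints true
}
```

A caller would typically pass `time.Now().Hour()` as the hour.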
server_port=7900 # default port
port=7905 # proxy port for accessing
port=7908 # monitor port for web
proxy=localhost:7905 # proxy list to monitor
The monitor is borrowed from the monitor implementation in beanseye.