Tag Archives: database

Redis Database

How to Setup Redis Cluster from Source

What is Redis?

Redis is an open source, in-memory database. It stores data in a key-value format. Because it resides in memory, Redis is an excellent tool for caching. Redis also provides a rich set of data types, which gives it an upper hand over Memcached. Apart from caching, Redis can be used as a distributed message broker.
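If the redis-py client is installed (pip install redis), a minimal sketch of the key-value and list data types looks like the snippet below; the host, port, and key names are placeholders, not part of this tutorial's setup.

# Minimal sketch with the redis-py client; host, port, and key names are placeholders.
import redis

r = redis.Redis(host="127.0.0.1", port=6379, db=0)

# Plain key-value: the common caching pattern, with a 60-second expiry
r.set("greeting", "hello", ex=60)
print(r.get("greeting"))          # b'hello'

# A richer data type: a list used as a simple message queue
r.rpush("jobs", "job-1", "job-2")
print(r.lpop("jobs"))             # b'job-1'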

Redis Cluster and Sentinels

To achieve high availability, Redis can be deployed in a cluster along with Sentinels. Sentinel is a feature of Redis. Multiple sentinels are deployed across the cluster for monitoring purposes. When the Redis master goes down, the sentinels elect a new master from the slaves. When the old master comes up again, it is added as a slave.

Another use case of clustering is load distribution. In a high-load environment, we can send write requests to the master and read requests to the slaves, as sketched below.
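For example, with redis-py's Sentinel helper, writes can be routed to the current master and reads to a slave. This is only a sketch of the idea; the addresses and the master name redis-cluster below match the setup described later in this tutorial.

# Sketch of read/write splitting through Sentinel with redis-py.
from redis.sentinel import Sentinel

sentinel = Sentinel(
    [("192.168.0.1", 16379), ("192.168.0.2", 16380), ("192.168.0.3", 16381)],
    socket_timeout=0.5,
)

master = sentinel.master_for("redis-cluster", socket_timeout=0.5)  # writes go here
slave = sentinel.slave_for("redis-cluster", socket_timeout=0.5)    # reads go here

master.set("counter", 1)
print(slave.get("counter"))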

This tutorial is specifically focused on the Redis master-slave model with Sentinels. We will not cover data sharding across the cluster here; in data sharding, keys are distributed across multiple Redis nodes.

Setup for tutorial

For this tutorial, we will use 3 (virtual) servers. The Redis master will reside on one server, while the other two servers will be used for slaves. The standard Redis port is 6379. To differentiate easily, we will run the master on port 6379 and the slaves on ports 6380 and 6381. The same applies to the sentinel services: the master's sentinel will listen on port 16379, while the slaves' sentinels will listen on ports 16380 and 16381.

Let's put this simply.

    Server 1 (Redis Master)
    Redis Service at host 192.168.0.1 and port tcp:6379
    Sentinel Service at host 192.168.0.1 and port tcp:16379
    Server 2 (Redis Slave 1)
    Redis Service at host 192.168.0.2 and port tcp:6380
    Sentinel at host 192.168.0.2 and port tcp:16380
    Server 3 (Redis Slave 2)
    Redis Service at host 192.168.0.3 and port tcp:6381
    Sentinel Service at host 192.168.0.3 and port tcp:16381

This tutorial was tested on CentOS 6.9. For CentOS 7.x, check the Notes section below.

Installation

We will follow the same installation steps to set up all servers. The only difference will be in the configurations.

  • Step 1: Grab redis source, make and install
  • Step 2: Setup required directories
  • Step 3: Configure redis master
  • Step 4: Configure redis master sentinel
  • Step 5: Add low privileged user to run redis
  • Step 6: Setup init scripts
  • Step 7: Start service

Server 1 (Redis Master)


Install Redis

    cd /opt
    wget download.redis.io/releases/redis-4.0.9.tar.gz
    tar -xf redis-4.0.9.tar.gz
    cd redis-4.0.9
    make
    make install

Setup required directories

    mkdir -p /etc/redis /var/run/redis /var/log/redis /var/redis/6379
    cp redis.conf redis.conf.bak
    cp redis.conf /etc/redis/6379.conf

Configure redis master

Edit the config file `/etc/redis/6379.conf` in your favorite editor and change the options below.

    bind 192.168.0.1
    port 6379
    daemonize yes
    pidfile "/var/run/redis/redis_6379.pid"
    logfile "/var/log/redis/redis_6379.log"
    dir "/var/redis/6379"

Configure redis master sentinel

Add a config file for the sentinel at `/etc/redis/sentinel_6379.conf`. Open the file and add the content below.

    bind 192.168.0.1
    port 16379
    sentinel monitor redis-cluster 192.168.0.1 6379 2
    sentinel down-after-milliseconds redis-cluster 5000
    sentinel parallel-syncs redis-cluster 1
    sentinel failover-timeout redis-cluster 10000
    daemonize yes
    pidfile "/var/run/redis/sentinel_6379.pid"
    dir "/var/redis/6379"

Add non-privileged user

    adduser redis -M -g daemon
    passwd -l redis
    chown -R redis:daemon /opt/redis-4.0.9
    chown -R redis:daemon /var/run/redis
    chown -R redis:daemon /var/log/redis
    chown -R redis:daemon /var/redis/

Setup init scripts

You can find sample init scripts in the Notes section below.

    cp redis-6379-init-script /etc/init.d/redis_6379
    chmod 750 /etc/init.d/redis_6379
    chkconfig redis_6379 on
    cp sentinel-6379-init-script /etc/init.d/sentinel_6379
    chmod 750 /etc/init.d/sentinel_6379
    chkconfig sentinel_6379 on

Start service

    service redis_6379 start
    service sentinel_6379 start

Server 2 (Redis Slave 1)


Install Redis

    cd /opt
    wget download.redis.io/releases/redis-4.0.9.tar.gz
    tar -xf redis-4.0.9.tar.gz
    cd redis-4.0.9
    make
    make install

Setup required directories

    mkdir -p /etc/redis /var/run/redis /var/log/redis /var/redis/6380
    cp redis.conf redis.conf.bak
    cp redis.conf /etc/redis/6380.conf

Configure redis slave 1

Edit the config file `/etc/redis/6380.conf` in your favorite editor and change the options below.

    bind 192.168.0.2
    port 6380
    daemonize yes
    pidfile "/var/run/redis/redis_6380.pid"
    logfile "/var/log/redis/redis_6380.log"
    dir "/var/redis/6380"
    slaveof 192.168.0.1 6379

Configure redis slave 1 sentinel

Add a config file for the sentinel at `/etc/redis/sentinel_6380.conf`. Open the file and add the content below.

    bind 192.168.0.2
    port 16380
    sentinel monitor redis-cluster 192.168.0.1 6379 2
    sentinel down-after-milliseconds redis-cluster 5000
    sentinel parallel-syncs redis-cluster 1
    sentinel failover-timeout redis-cluster 10000
    daemonize yes
    pidfile "/var/run/redis/sentinel_6380.pid"
    dir "/var/redis/6380"

Add non-privileged user

    adduser redis -M -g daemon
    passwd -l redis
    chown -R redis:daemon /opt/redis-4.0.9
    chown -R redis:daemon /var/run/redis
    chown -R redis:daemon /var/log/redis
    chown -R redis:daemon /var/redis/

Setup init scripts

You can find sample init scripts in the Notes section below. Change the host and port variables accordingly.

    cp redis-6380-init-script /etc/init.d/redis_6380
    chmod 750 /etc/init.d/redis_6380
    chkconfig redis_6380 on
    cp sentinel-6380-init-script /etc/init.d/sentinel_6380
    chmod 750 /etc/init.d/sentinel_6380
    chkconfig sentinel_6380 on

Start service

    service redis_6380 start
    service sentinel_6380 start

Server 3 (Redis Slave 2)


Install Redis

    cd /opt
    wget download.redis.io/releases/redis-4.0.9.tar.gz
    tar -xf redis-4.0.9.tar.gz
    cd redis-4.0.9
    make
    make install

Setup required directories

    mkdir -p /etc/redis /var/run/redis /var/log/redis /var/redis/6381
    cp redis.conf redis.conf.bak
    cp redis.conf /etc/redis/6381.conf

Configure redis slave 2

Edit the config file `/etc/redis/6381.conf` in your favorite editor and change the options below.

    bind 192.168.0.3
    port 6381
    daemonize yes
    pidfile "/var/run/redis/redis_6381.pid"
    logfile "/var/log/redis/redis_6381.log"
    dir "/var/redis/6381"
    slaveof 192.168.0.1 6379

Configure redis slave 2 sentinel

Add a config file for the sentinel at `/etc/redis/sentinel_6381.conf`. Open the file and add the content below.

    bind 192.168.0.3
    port 16381
    sentinel monitor redis-cluster 192.168.0.1 6379 2
    sentinel down-after-milliseconds redis-cluster 5000
    sentinel parallel-syncs redis-cluster 1
    sentinel failover-timeout redis-cluster 10000
    daemonize yes
    pidfile "/var/run/redis/sentinel_6381.pid"
    dir "/var/redis/6381"

Add non-privileged user

    adduser redis -M -g daemon
    passwd -l redis
    chown -R redis:daemon /opt/redis-4.0.9
    chown -R redis:daemon /var/run/redis
    chown -R redis:daemon /var/log/redis
    chown -R redis:daemon /var/redis/

Setup init scripts

You can find sample init scripts in the Notes section below. Change the host and port variables accordingly.

    cp redis-6381-init-script /etc/init.d/redis_6381
    chmod 750 /etc/init.d/redis_6381
    chkconfig redis_6381 on
    cp sentinel-6381-init-script /etc/init.d/sentinel_6381
    chmod 750 /etc/init.d/sentinel_6381
    chkconfig sentinel_6381 on

Start service

    service redis_6381 start
    service sentinel_6381 start

Sentinel Testing

    # Connect to any redis sentinel
    /usr/local/bin/redis-cli -h 192.168.0.3 -p 16381

    # To see current masters
    192.168.0.3:16381> SENTINEL masters

    # To see slaves for given cluster
    192.168.0.3:16381> SENTINEL slaves redis-cluster
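The same checks can be scripted. Below is a small sketch with redis-py's Sentinel helper (assuming the library is installed) that mirrors SENTINEL masters and SENTINEL slaves.

# Sketch: discover the current master and slaves through one sentinel.
from redis.sentinel import Sentinel

sentinel = Sentinel([("192.168.0.3", 16381)], socket_timeout=0.5)
print(sentinel.discover_master("redis-cluster"))   # e.g. ('192.168.0.1', 6379)
print(sentinel.discover_slaves("redis-cluster"))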

Redis Fail-over Testing

For fail-over testing, we can take down the Redis master either by using the init script or with the command below.

    # Connect to redis master and execute below command
    /usr/local/bin/redis-cli -h 192.168.0.1 -p 6379
    192.168.0.1:6379> DEBUG SEGFAULT

We can also force the sentinels to run a fail-over using the command below.

    # Connect to any sentinel, e.g. the one on the master host
    /usr/local/bin/redis-cli -h 192.168.0.1 -p 16379
    # Forced fail-over
    192.168.0.1:16379> SENTINEL failover redis-cluster
    # Check after a few seconds. You should see a new master
    192.168.0.1:16379> SENTINEL masters

Sample init scripts

Redis Init Script

#!/bin/sh
#
# Simple Redis init.d script conceived to work on Linux systems
# as it does use of the /proc filesystem.
#
# chkconfig: 345 85 15
# description:  Redis is persistent key-value database
# processname: redis_6379

# Source function library
. /etc/init.d/functions

REDISHOST=192.168.0.1
REDISPORT=6379
EXEC=/usr/local/bin/redis-server
USER=redis
CLIEXEC=/usr/local/bin/redis-cli
PIDFILE=/var/run/redis/redis_${REDISPORT}.pid
CONF="/etc/redis/${REDISPORT}.conf"

case "$1" in
    start)
        if [ -f $PIDFILE ]
        then
                echo "$PIDFILE exists, process is already running or crashed"
        else
                echo "Starting Redis server..."
                daemon --user $USER $EXEC $CONF
        fi
        ;;
    stop)
        if [ ! -f $PIDFILE ]
        then
                echo "$PIDFILE does not exist, process is not running"
        else
                PID=$(cat $PIDFILE)
                echo "Stopping ..."
                daemon --user $USER $CLIEXEC -h $REDISHOST -p $REDISPORT shutdown
                while [ -x /proc/${PID} ]
                do
                    echo "Waiting for Redis to shutdown ..."
                    sleep 1
                done
                echo "Redis stopped"
        fi
        ;;
    restart)
        $0 stop
        sleep 3
        $0 start
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}"
        exit 1
        ;;
esac

Sentinel Init Script

#!/bin/sh
#
# Simple Redis Sentinel init.d script conceived to work on Linux systems
# as it does use of the /proc filesystem.
#
# chkconfig: 345 86 15
# description:  Redis Sentinel to monitor redis cluster
# processname: sentinel_6379

# Source function library
. /etc/init.d/functions

REDIS_PORT=6379
SENTINEL_PORT=16379
EXEC=/usr/local/bin/redis-server
CLIEXEC=/usr/local/bin/redis-cli
USER=redis

PIDFILE=/var/run/redis/sentinel_${REDIS_PORT}.pid
CONF="/etc/redis/sentinel_${REDIS_PORT}.conf"

case "$1" in
    start)
        if [ -f $PIDFILE ]
        then
                echo "$PIDFILE exists, process is already running or crashed"
        else
                echo "Starting Redis Sentinel server..."
                daemon --user $USER $EXEC $CONF --sentinel
        fi
        ;;
    stop)
        if [ ! -f $PIDFILE ]
        then
                echo "$PIDFILE does not exist, process is not running"
        else
                PID=$(cat $PIDFILE)
                echo "Stopping ..."
                kill -9 $PID
                rm $PIDFILE
                while [ -x /proc/${PID} ]
                do
                    echo "Waiting for Redis Sentinel to shutdown ..."
                    sleep 1
                done
                echo "Redis Sentinel stopped"
        fi
        ;;
    restart)
        $0 stop
        sleep 3
        $0 start
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}"
        exit 1
        ;;
esac

Notes

Security

  • NEVER EVER run Redis on a public interface.
  • If Redis is deployed in a cloud environment like AWS, set up security groups/firewalls carefully. Most of the time, cloud providers use ephemeral IPs. Because of ephemeral IPs, even if Redis is bound to a private IP, it may still be reachable over a public interface.
  • For more security, dangerous commands can be disabled (renamed). But be careful with them in a cluster environment.
  • Redis also provides a simple authentication mechanism. It is not covered here because of scope.

Sentinel management

  • During a Redis fail-over, the config files are rewritten by the sentinel program. So be careful when restarting the Redis cluster.

Sources

  • https://redis.io/topics/cluster-tutorial
  • https://redis.io/topics/security
  • https://redis.io/commands/debug-segfault
DBF file in Python

What is a DBF file? How do you read it in Linux and Python?

What is a DBF file?

A DBF file is a standard database file used by dBASE, a database management system application. It organises data into multiple records with fields stored in an array data type. DBF files are also compatible with other “xBase” database programs, which became an important feature because of the file format’s popularity.

Tools which can read or open DBF files

Below is a list of programs which can read and open DBF files.

  • Windows
    1. dBase
    2. Microsoft Access
    3. Microsoft Excel
    4. Visual Foxpro
    5. Apache OpenOffice
    6. dbfview
    7. dbf Viewer Plus
  • Linux
    1. Apache OpenOffice
    2. GTK DBF Editor

How to read a DBF file in Linux?

The “dbview” command, available in Linux, can read DBF files.

The code snippet below shows how to use the dbview command.

[lalit : temp]₹ dbview test.dbf 
Name       : John
Surname    : Miller
Initials   : JM
Birthdate  : 19800102

Name       : Andy
Surname    : Larkin
Initials   : AL
Birthdate  : 19810203

Name       : Bill
Surname    : Clinth
Initials   : 
Birthdate  : 19820304

Name       : Bobb
Surname    : McNail
Initials   : 
Birthdate  : 19830405

[lalit : temp]₹ 

How to read it using Python?

“dbfread” is the library available in Python to read DBF files. This library reads DBF files and returns the data as native Python data types for further processing.

dbfread requires Python 3.2 or 2.7. dbfread is a pure Python module, so it doesn't depend on any packages outside the standard library.

You can install the library with the command below.

pip install dbfread

The code snippet below reads a DBF file and retrieves the data as Python dictionaries.

>>> from dbfread import DBF

>>> for record in DBF('people.dbf'):
...     print(record)

Output:
OrderedDict([('NAME', 'Alice'), ('BIRTHDATE', datetime.date(1987, 3, 1))])
OrderedDict([('NAME', 'Bob'), ('BIRTHDATE', datetime.date(1980, 11, 12))])

You can also use the with statement:

with DBF('people.dbf') as table:
    ...

By default the records are streamed directly from the file. If you have enough memory, you can load them into a list instead. This allows random access.

>>> table = DBF('people.dbf', load=True)
>>> print(table.records[1]['NAME'])
Bob
>>> print(table.records[0]['NAME'])
Alice

How to write content to a DBF file using Python?

dbfpy is a Python-only module for reading and writing DBF files. dbfpy can read and write simple DBF files.

You can install it using the command below.

pip install dbfpy

The example below shows how to create a DBF file and write records into it.

import datetime
from mx import DateTime
from dbfpy import dbf

## create empty DBF, set fields

db = dbf.Dbf("test.dbf", new=True)
db.addField(
    ("NAME", "C", 15),
    ("SURNAME", "C", 25),
    ("INITIALS", "C", 10),
    ("BIRTHDATE", "D"),
)

## fill DBF with some records

for name, surname, initials, birthdate in (
    ("John", "Miller", "JM", (1980, 1, 2)),
    ("Andy", "Larkin", "AL", datetime.date(1981, 2, 3)),
    ("Bill", "Clinth", "", DateTime.Date(1982, 3, 4)),
    ("Bobb", "McNail", "", "19830405"),
):
    
    rec = db.newRecord()
    rec["NAME"] = name
    rec["SURNAME"] = surname
    rec["INITIALS"] = initials
    rec["BIRTHDATE"] = birthdate
    rec.store()
db.close()

You can also update a DBF file record using the dbf module.

The example below shows how to update a record in a .dbf file.

db = dbf.Dbf("test.dbf")
rec = db[2]
rec["INITIALS"] = "BC"
rec.store()

 

Asynchronous DB Operations in Twisted

Twisted is an asynchronous networking framework, but most database API implementations have blocking interfaces.

For this reason, twisted.enterprise.adbapi was created. It is a non-blocking interface which allows you to access a number of different RDBMSes.

General method to access a DB API:

1 ) Create a Connection with db.

 db = dbmodule.connect('dbname','user','password')

2) create a cursor.

 cursor = db.cursor()

3) do a query.

resultset = cursor.query('SELECT * FROM table WHERE condition=expression')

The cursor blocks while waiting for a response. Those delays are unacceptable when using an asynchronous framework such as Twisted.
To overcome the blocking interface, Twisted provides an asynchronous wrapper for DB modules: twisted.enterprise.adbapi.

Database Connection using adbapi API.

To use adbapi, we import dependencies as below

 from twisted.enterprise import adbapi

1) Connect Database using adbapi.ConnectionPool

db = adbapi.ConnectionPool("MySQLdb",db="agentdata",user="root",passwd="<yourpassword>")

Here, we do not need to import the dbmodule directly.
The arguments for dbmodule.connect are passed as extra arguments to adbapi.ConnectionPool's constructor.

2) Run Database Query

query = ("SELECT * FROM agentdetails WHERE name = '%s'" % (name))
return dbpool.runQuery(query).addCallback(self.receiveResult).addErrback(self.errorquery)

Here, I used the '%s' paramstyle for MySQL. If you use another database module, you need to use a compatible paramstyle. For more, see the DB-API specification.

Twisted doesn’t attempt to offer any sort of magic parameter munging – runQuery(query,params,…) maps directly onto cursor.execute(query,params,…).
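As an illustration only (not part of the demo below), a select like the one above can pass its parameter through runQuery and let the driver handle escaping; MySQLdb uses the %s placeholder style, and dbpool is the ConnectionPool created earlier.

# Sketch: pass parameters to runQuery instead of interpolating them into the SQL string.
def select_agent(name):
    return dbpool.runQuery(
        "SELECT * FROM agentdetails WHERE name = %s", (name,)
    )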

This query returns Deferred, which allows arbitrary callbacks to be called upon completion (or failure).

Demo : Select, Insert and Update query in Database.

from twisted.enterprise import adbapi
import datetime,logging
from twisted.internet import reactor


"""
Test DB : This File do database connection and basic operation.
"""

log = logging.getLogger("Test DB")

dbpool = adbapi.ConnectionPool("MySQLdb",db="agentdata",user="root",passwd="<yourpassword>")

class AgentDB():

    def getTime(self):
        log.info("Get Current Time from System.")
        time = str(datetime.datetime.now()).split('.')[0]
        return time

    def doSelect(self,name):
        log.info("Select operation in Database.")
        query = ("SELECT * FROM agentdetails WHERE name = '%s'" % (name))
        return dbpool.runQuery(query).addCallback(self.receiveResult).addErrback(self.errorquery)

    def receiveResult(self,result):
        print "Receive Result"
        print result
        # general purpose method to receive result from defer.
        return result

    def errorquery(self,result):
        print "error received", result
        return result

    def doInsert(self,name,status,state):
        log.info("Insert operation in Database.")
        query = """INSERT INTO agentdetails (name,state,status,logtime) values('%s','%s','%s','%s')""" % (
        name,state,status,self.getTime())
        return dbpool.runQuery(query).addCallback(self.receiveResult)

    def doUpdate(self,name,status,state):
        log.info("Update operation in Database.")
        query = ("UPDATE agentdetails SET status = '%s', state = '%s',logtime = '%s' WHERE name = '%s'" % (
        status,state, self.getTime(),name))
        return dbpool.runQuery(query).addCallback(self.receiveResult)

    def checkDB(self):
        self.doSelect('1011')
        self.doInsert('Test','Test','Test')
        self.doUpdate('Test','SecondTest','SecondTest')



a= AgentDB()
a.checkDB()
reactor.run()

Here, I have used the MySQLdb API, agentdata as the database name, root as the user, and <yourpassword> as a placeholder for the password.
Also, I have created select, insert and update queries for the select, insert and update operations respectively.
The runQuery method returns a Deferred. Add a callback and an errback to it to handle success and failure respectively.

DBSlayer

DBSlayer is a simple way to proxy MySQL.

DBSlayer can be queried via JSON over HTTP, and the responses can be returned in one of the following supported formats: JSON, PHP or Python, which makes processing the database results easy. DBSlayer is a multi-threaded server written in C.

Features :

  • Reduce configuration
  • Reduce dependencies
  • Handle failovers
  • Simple load balancing
  • Easy to monitor
  • Minimal performance overhead
  • Work in different configuration scenarios
  • Support different programming languages

Installing DBSlayer :

git clone https://github.com/derekg/dbslayer.git

./configure

make

make install

Database URI :

http://machine:port/db?URLENCODED(JSON OBJECT)

http://machine:port/dbform?URLENCODED(HTML FORM)

Parameters :

SQL – SQL to execute

Example Request :

http://localhost:9090/dbform?SQL=SELECT+name+FROM+emp

Example Response :

{'RESULT': {'HEADER': ['name'],
            'ROWS': [['name']],
            'TYPES': ['MYSQL_TYPE_VAR_STRING'],
            'SERVER': 'servername'}}

Example for python :

import urllib2, urllib, json

def dbex():
    uri = 'http://localhost:9090/dbform?SQL=%s'
    data = urllib2.urlopen(uri%urllib.quote('SELECT * FROM market')).read()
    print json.loads(data)

dbex()

Start dbslayer :

dbslayer -c /path/dbslayer.conf -s servername

This starts up a DBSlayer daemon on port 9090 (this default port can be changed), which acts as a proxy for the backend MySQL server. This proxy can be queried via JSON over HTTP.

Stop dbslayer :

pkill dbslayer

Other URI/API endpoints :

http://machine:port/stats [Queries per second]

http://machine:port/stats/log [Last 100 requests]

http://machine:port/stats/errors [Last 100 errors]

http://machine:port/shutdown [Only from localhost]

Riak Database

Introduction to Riak

Riak is a distributed database designed to deliver maximum data availability by distributing data across multiple servers. Riak is an open-source, distributed key/value database for high availability, fault-tolerance, and near-linear scalability.

Riak Components

Riak is a Key/Value (KV) database, built from the ground up to safely distribute data across a cluster of physical servers, called nodes.

Riak functions similarly to a very large hash space. Depending on your background, you may call it hashtable, a map, a dictionary, or an object. But the idea is the same: you store a value with an immutable key, and retrieve it later.

1) Key and value

2) Buckets

Key and Value

Key/value is the most basic construct in all of computerdom.

Buckets

Buckets in Riak  provide logical namespaces so that identical keys in different buckets will not conflict.

Buckets are so useful in Riak that all keys must belong to a bucket. There is no global namespace. The true definition of a unique key in Riak is actually bucket + key.

For convenience, we call a bucket/key + value pair an object, sparing ourselves the verbosity of “X key in the Y bucket and its value”.

Replication and Partitions

Distributing data across several nodes is how Riak is able to remain highly available, tolerating outages and partitioning. Riak combines two styles of distribution to achieve this: replication and partitions.

Replication

Replication is the act of duplicating data across multiple nodes. Riak replicates by default.

The obvious benefit  of replication is that if one node goes down, nodes that contain replicated data remain available to serve requests. In other words, the system remains available with no down time.

The downside with replication is that you are multiplying the amount of storage required for every duplicate. There is also some network overhead with this approach, since values must also be routed to all replicated nodes on write.

Partitions

A partition is how we divide a set of keys onto separate physical servers. Rather than duplicate values, we pick one server to exclusively host a range of keys, and the other servers to host remaining non-overlapping ranges.

With partitioning, our total capacity can increase without any big expensive hardware, just lots of cheap commodity servers. If we decided to partition our database into 1000 parts across 1000 nodes, we have (hypothetically) reduced the amount of work any particular server must do to 1/1000th.

There’s also another downside. Unlike replication, simple partitioning of data actually decreases uptime.

If one node goes down, that entire partition of data is unavailable. This is why Riak uses both replication and partitioning.

Replication+Partitions

Since partitions allow us to increase capacity, and replication improves availability, Riak combines them. We partition data across multiple nodes, as well as replicate that data into multiple nodes.

n_val

The Riak team suggests a minimum of 5 nodes for a Riak cluster, and replicating to 3 nodes (this setting is called n_val, for the number of nodes on which to replicate each object).

The Ring

Riak applies consistent hashing to map objects along the edge of a circle (the ring).

Riak partitions are not mapped alphabetically (as we used in the examples above); instead, a partition marks a range of key hashes (the SHA-1 function applied to a key). The maximum hash value is 2^160, and this space is divided into some number of partitions: 64 partitions by default (the Riak config setting is ring_creation_size).
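As a toy sketch (not Riak's actual implementation), mapping a key onto one of the 64 default partitions looks roughly like this:

# Toy sketch: SHA-1 the key, then map it into one of 64 equal slices of the 2^160 space.
import hashlib

RING_SIZE = 2 ** 160              # size of the SHA-1 hash space
NUM_PARTITIONS = 64               # default ring_creation_size
PARTITION_WIDTH = RING_SIZE // NUM_PARTITIONS

def partition_for(key):
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return h // PARTITION_WIDTH   # partition index, 0..63

print(partition_for("food/favorite"))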

The Ring is more than just a circular array of hash partitions. It’s also a system of metadata that gets copied to every node. Each node is aware of every other node in the cluster, which nodes own which vnodes, and other system data.

N/R/W Values

N

With our 5 node cluster, having an n_val=3 means values will eventually replicate to 3 nodes, as we’ve discussed above. This is the N value. You can set other values (R,W) to equal the n_val number with the shorthand all.

R

Reading involves similar tradeoffs. To ensure you have the most recent value, you can read from all 3 nodes containing objects (r=all). Even if only 1 of 3 nodes has the most recent value, we can compare all nodes against each other and choose the latest one, thus ensuring some consistency. Remember when I mentioned that RDBMS databases were write consistent? This is close to read consistency. Just like w=all, however, the read will fail unless 3 nodes are available to be read. Finally, if you only want to quickly read any value, r=1 has low latency, and is likely consistent if w=all.

W

But you may not wish to wait for all nodes to be written to before returning. You can choose to wait for all 3 to finish writing (w=3 or w=all), which means my values are more likely to be consistent. Or you could choose to wait for only 1 complete write (w=1), and allow the remaining 2 nodes to write asynchronously, which returns a response quicker but increases the odds of reading an inconsistent value in the short term. This is the W value.

Since Riak is a KV database, the most basic commands are setting and getting values. We’ll use the HTTP interface, via curl, but we could just as easily use Erlang, Ruby, Java, or any other supported language. The basic structure of a Riak request is setting a value, reading it, and maybe eventually deleting it. The actions are related to HTTP methods (PUT, GET, POST, DELETE).

PUT /riak/bucket/key
GET /riak/bucket/key
DELETE /riak/bucket/key
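The same three operations can also be sketched with the Python requests library, assuming a local Riak node on port 8098 as in the curl examples that follow.

# Sketch of the PUT/GET/DELETE flow against Riak's HTTP interface using requests.
import requests

base = "http://localhost:8098/riak"

# PUT: store the value "pizza" under the key favorite in the food bucket
requests.put(base + "/food/favorite", data="pizza",
             headers={"Content-Type": "text/plain"})

# GET: read it back
print(requests.get(base + "/food/favorite").text)   # pizza

# DELETE: remove the key
requests.delete(base + "/food/favorite")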

 

PUT

The simplest write command in Riak is putting a value. It requires a key, value, and a bucket. In curl, all HTTP methods are prefixed with -X. Putting the value pizza into the key favorite under the food bucket is done like this:

curl -XPUT "http://localhost:8098/riak/food/favorite" \
   -H "Content-Type:text/plain" \
   -d "pizza"

The -d flag denotes that the next string will be the value. We declare it as text with the preceding line -H 'Content-Type:text/plain'.

This declares the HTTP MIME type of this value as plain text. We could have set any value at all, be it XML or JSON, even an image or a video. Riak does not care at all what data is uploaded, as long as the object size doesn't get much larger than 4MB.

GET

The next command reads the value pizza under the bucket/key food/favorite.

curl -XGET "http://localhost:8098/riak/food/favorite"
pizza

This is the simplest form of read, responding with only the value. Riak contains much more information, which you can access if you read the entire response, including the HTTP header. In curl you can access a full response by way of the -i flag.

POST

Similar to PUT, POST will save a value. But with POST a key is optional. All it requires is a bucket name, and it will generate a key for you.

Let’s add a JSON value to represent a person under the people bucket. The response header is where a POST will return the key it generated for you.

curl -i -XPOST "http://localhost:8098/riak/people" \
    -H "Content-Type:application/json" \
    -d '{"name":"aaron"}'

HTTP/1.1 201 Created
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.9.2 (someone had painted...
Location: /riak/people/DNQGJY0KtcHMirkidasA066yj5V
Date: Wed, 10 Oct 2012 17:55:22 GMT
Content-Type: application/json
Content-Length: 0
You can extract this key from the Location value. Other than not being pretty, this key is treated the same as if you defined your own key via PUT.

Body

You may note that no body was returned with the response. For any kind of write, you can add the returnbody=true parameter to force a value to return, along with value-related headers like X-Riak-Vclock and ETag.

curl -i -XPOST "http://localhost:8098/riak/people?returnbody=true" \
    -H "Content-Type:application/json" \
    -d '{"name":"billy"}'

HTTP/1.1 201 Created
X-Riak-Vclock: a85hYGBgzGDKBVIcypz/fgaUHjmdwZTImMfKkD3z10m+LAA=
Vary: Accept-Encoding
Server: MochiWeb/1.1 WebMachine/1.9.0 (someone had painted...
Location: /riak/people/DnetI8GHiBK2yBFOEcj1EhHprss
Link: </riak/people>; rel="up"
Last-Modified: Tue, 23 Oct 2012 04:30:35 GMT
ETag: "7DsE7SEqAtY12d8T1HMkWZ"
Date: Tue, 23 Oct 2012 04:30:35 GMT
Content-Type: application/json
Content-Length: 16

{"name":"billy"}

DELETE

The final basic operation is deleting keys, which is similar to getting a value, but sending the DELETE method to the url/bucket/key.

curl -XDELETE "http://localhost:8098/riak/people/DNQGJY0KtcHMirkidasA066yj5V"

A deleted object in Riak is internally marked as deleted, by writing a marker known as a tombstone. Unless configured otherwise, another process called a reaper will later finish deleting the marked objects.

1. In Riak, a delete is actually a read and a write, and should be considered as such when calculating read/write ratios.

2. Checking for the existence of a key is not enough to know if an object exists. You might be reading a key after it has been deleted, so you should check for tombstone metadata.

Lists

Riak provides two kinds of lists. The first lists all buckets in your cluster, while the second lists all keys under a specific bucket. Both of these actions are called in the same way, and come in two varieties.

The following will give us all of our buckets as a JSON object.

curl "http://localhost:8098/riak?buckets=true"

{"buckets":["food"]}

And this will give us all of our keys under the food bucket.

curl "http://localhost:8098/riak/food?keys=true"
{
...
"keys": [
    "favorite"
]
}

Adjusting N/R/W to our needs

N is the number of total nodes that a value should be replicated to, defaulting to 3. But we can set this n_val to less than the total number of nodes.

Any bucket property, including n_val, can be set by sending a props value as a JSON object to the bucket URL. Let’s set the n_val to 5 nodes, meaning that objects written to cart will be replicated to 5 nodes.

curl -i -XPUT "http://localhost:8098/riak/cart" \
    -H "Content-Type: application/json" \
    -d '{"props":{"n_val":5}}'

Symbolic Values

A quorum is one more than half of all the total replicated nodes (floor(N/2) + 1); for example, with N=3 the quorum is 2, and with N=5 it is 3. This figure is important, since if more than half of all nodes are written to, and more than half of all nodes are read from, then you will get the most recent value (under normal circumstances).

Hooks

Another utility of buckets is their ability to enforce behaviors on writes by way of hooks. You can attach functions to run either before or after a value is committed to a bucket.

A function that runs before a write is called a precommit hook, and it has the ability to cancel a write altogether if the incoming data is considered bad in some way. A simple precommit hook is to check if a value exists at all.

I put my custom Erlang code files under the riak installation ./custom/my_validators.erl.

%% Erlang Code

-module(my_validators).
-export([value_exists/1]).

%% Object size must be greater than 0 bytes
value_exists(RiakObject) ->
    Value = riak_object:get_value(RiakObject),
    case erlang:byte_size(Value) of
        0 -> {fail, "A value size greater than 0 is required"};
        _ -> RiakObject
    end.

 

Then compile the file. (You need to install Erlang before installing Riak.)

erlc my_validators.erl

Install the file by informing the Riak installation of your new code via app.config (restart Riak).

{riak_kv,
...
{add_paths, ["./custom"]}
}

Then you need to set the Erlang module (my_validators) and function (value_exists) as a JSON value in the bucket's precommit array: {"mod":"my_validators","fun":"value_exists"}.

curl -i -XPUT http://localhost:8098/riak/cart \
-H "Content-Type:application/json" \
-d '{"props":{"precommit":[{"mod":"my_validators","fun":"value_exists"}]}}'

If you try and post to the cart bucket without a value, you should expect a failure.

curl -XPOST http://localhost:8098/riak/cart \
-H "Content-Type:application/json"
A value size greater than 0 is required

Siblings

Siblings occur when you have conflicting values, with no clear way for Riak to know which value is correct. Riak will try to resolve these conflicts itself if the allow_mult parameter is configured to false, but you can instead ask Riak to retain siblings to be resolved by the client if you set allow_mult to true.

curl -i -XPUT http://localhost:8098/riak/cart \
-H "Content-Type:application/json" \
-d '{"props":{"allow_mult":true}}'

Siblings arise in a couple cases.

1. A client writes a value using a stale (or missing) vector clock.

2. Two clients write at the same time with the same vector clock value.

We used the second scenario to manufacture a conflict in the previous chapter when we introduced the concept of vector clocks, and we’ll do so again here.

Resolving Conflicts

When we have conflicting writes, we want to resolve them. Since that problem is typically use-case specific, Riak defers it to us, and our application must decide how to proceed.

For our example, let’s merge the values into a single result set, taking the larger count if the item is the same. When done, write the new results back to Riak with the vclock of the multipart object, so Riak knows you’re resolving the conflict, and you’ll get back a new vector clock.

Successive reads will receive a single (merged) result.

curl -i -XPUT http://localhost:8098/riak/cart/fridge-97207?returnbody=true \
-H "Content-Type:application/json" \
-H "X-Riak-Vclock:a85hYGBgzGDKBVIcypz/fgaUHjmTwZTInMfKoG7LdoovCwA=" \
-d '[{"item":"kale","count":10},{"item":"milk","count":1},\
{"item":"almonds","count":12}]'

Will share more on this article soon.

mysql best practices

10 MySQL best practices

When we design a database schema, it's recommended to follow best practices to use memory optimally and to gain performance. Following are 10 MySQL best practices.

Always try to avoid redundancy

We can say a database schema design is a good one if it has no redundancy. If you want to avoid redundancy in your schema, normalize it after you design it.

Normalize tables

Database normalization is the process of organizing columns and tables in a relational database to avoid redundancy. Find more about normalization here.

Use (unique) indexes on foreign key columns

We use foreign keys for data integrity and to represent relationships. Sometimes these are a result of the normalization process. When tables are mutually related, we can't query the data without using joins, and (unique) indexes on the foreign key columns keep those joins fast.

Avoid using VARCHAR for fixed-width columns; use CHAR instead

Choose the right one: CHAR vs VARCHAR. CHAR(15) will always allocate space for 15 characters, but VARCHAR(15) will allocate only the space required by the number of characters you store.

Always use EXPLAIN to investigate your queries and learn how MySQL is using indexes

The EXPLAIN statement is very handy in MySQL. It gives you an analyzed report that you can use to improve your queries and schema. It works on both SELECT and UPDATE; if you try it on an UPDATE query, MySQL will treat that query as a SELECT and give you the report.
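For example, here is a sketch of running EXPLAIN from Python with MySQLdb (the driver used in the Twisted post above); the table, column, and credentials are placeholders.

# Sketch: ask MySQL how it plans to execute a query. Names and credentials are placeholders.
import MySQLdb

db = MySQLdb.connect(db="agentdata", user="root", passwd="<yourpassword>")
cursor = db.cursor()
cursor.execute("EXPLAIN SELECT * FROM agentdetails WHERE name = %s", ("Test",))
for row in cursor.fetchall():
    print(row)   # id, select_type, table, type, possible_keys, key, rows, Extra, ...
db.close()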

Use the right data type

Choosing the right data type for your columns will help you get rid of many bottlenecks. The MySQL query optimizer chooses indexes based on the data type used in the query and the column's datatype. There are many MySQL datatypes.

Use ENUM if required

ENUM is a datatype that MySQL supports. By using it, you can save a lot of memory if you have predefined and predictable values in your database column.

Don't use too many indexes; they will slow down inserts and updates. Only use indexes on selected columns

As you know, indexes help you query data much faster than expected. It's very tempting to use indexes on unintended columns. Adding an index on every column, or on unnecessary columns, will give you slow inserts and updates. You need to think of an index as a separate table, where MySQL needs to write an entry for every insert in a separate table/file. It's extra overhead.

Tune MySQL default parameters

MySQL comes with default parameters. These parameters are not suitable if you want to use MySQL on a dedicated machine or in production. You have to tune these parameters. Formally, we call them system variables.

Always create an account with associated hosts instead of the wildcard %

MySQL manages users along with their associated hosts. That is, the user root@localhost can't log in to MySQL from anywhere except localhost, but root@% can log in from everywhere. Using only associated hosts will mitigate many attacks that are in your blind spot.
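A minimal sketch of creating such an account from Python follows; the user, host, password, and database names are placeholders.

# Sketch: create an account bound to a single host instead of the wildcard '%'.
import MySQLdb

db = MySQLdb.connect(user="root", passwd="<yourpassword>")
cursor = db.cursor()
cursor.execute("CREATE USER 'appuser'@'192.168.0.10' IDENTIFIED BY 'S3cretPass!'")
cursor.execute("GRANT SELECT, INSERT, UPDATE ON agentdata.* TO 'appuser'@'192.168.0.10'")
db.close()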