Thursday 31 July 2014

From RDBMS to NoSQL to DBaaS

RDBMS
Relational Database Management Systems (RDBMS) were a subject I paid special attention to when I was at college.  I have used DB2 and Oracle, and the attention that NoSQL databases have received over the past two years or so made me wonder why we need them, and whether RDBMS will stay for the coming decades.

The ACID (Atomicity, Consistency, Isolation and Durability) properties and referential integrity that an RDBMS provides cannot be compromised in many systems.  For other systems, however, there were different needs, and these led to the emergence and wide use of NoSQL (Not Only SQL) systems.

NoSQL
An obvious reason that convinced me of the need for NoSQL is the increasing volume of unstructured and semi-structured data, and of other documents, used in the social networks that most of us use.  The other reasons became clear more slowly.  One is the speed at which we want a post to be published and the number of reads and likes to be displayed.  Another is concurrency: the number of people around the world using social networks and other systems at the same time is on the rise.  Above all this, we want 24 X 7 availability of most sites, without which our satisfaction will drop.

It is to satisfy the above requirements that the use of NoSQL is on the rise.  Let us see how this is made possible.

1. Use of distributed databases
Distributed databases can be located on servers at any geographic location.  This means that they could be available on servers across the internet, or on cloud infrastructure.  Distributed databases support replication, which keeps copies of the same data on several servers and thereby enables continuous availability.  Since data is available at many locations, concurrent usage is also made possible.
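
As a rough illustration of how replication helps availability, here is a minimal in-memory sketch in Python.  The class names `Replica` and `ReplicatedStore` and the server names are made up for this example.  Real distributed databases also have to bring a recovered replica back in sync, which this sketch ignores.

```python
class Replica:
    """Stand-in for one database server holding a full copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.online = True


class ReplicatedStore:
    """Writes go to every online replica; reads use the first one that answers."""

    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        for replica in self.replicas:
            if replica.online:
                replica.data[key] = value

    def read(self, key):
        for replica in self.replicas:
            if replica.online and key in replica.data:
                return replica.data[key]
        raise KeyError(key)


# Hypothetical servers in two locations.
india = Replica("server-india")
canada = Replica("server-canada")
store = ReplicatedStore([india, canada])

store.write("post:1", "Hello from NoSQL")
india.online = False          # one server goes down
print(store.read("post:1"))   # the other copy still serves the read
```

Because every online replica receives the write, losing one server does not make the data unavailable.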

2. Horizontal scaling (sharding)
All users of Facebook want a quick login and quicker updates.  Given this, it is wiser to store the data of Indian users on servers in India and that of Canadian users on servers in Canada than to store it at an arbitrary place on the globe.  This is an example of sharding: the data is split across servers, and each server holds only a part of it.
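
The idea of routing each user to a server by country can be sketched in a few lines of Python.  This is only an illustration: the shard names, the country codes and the `ShardedUserStore` class are assumptions made for the example, and say nothing about how Facebook actually shards its data.

```python
# Hypothetical shard map: country code -> server name (illustrative only).
SHARD_BY_COUNTRY = {"IN": "servers-india", "CA": "servers-canada"}
FALLBACK_SHARD = "servers-global"


def shard_for(country_code):
    """Pick the shard that should hold a user's data, based on country."""
    return SHARD_BY_COUNTRY.get(country_code.upper(), FALLBACK_SHARD)


class ShardedUserStore:
    """In-memory stand-in for a set of database servers, one dict per shard."""

    def __init__(self):
        shard_names = set(SHARD_BY_COUNTRY.values()) | {FALLBACK_SHARD}
        self.shards = {name: {} for name in shard_names}

    def put(self, user_id, country_code, profile):
        self.shards[shard_for(country_code)][user_id] = profile

    def get(self, user_id, country_code):
        return self.shards[shard_for(country_code)].get(user_id)


store = ShardedUserStore()
store.put("u1", "IN", {"name": "aditya"})
store.put("u2", "CA", {"name": "lily"})
print(shard_for("IN"), store.get("u1", "IN"))
```

Each shard holds only its own users, so a lookup for an Indian user never touches the Canadian servers.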

3. Scalability
Many NoSQL databases are capable of storing large quantities of data.  With Big Data, the volume of data generated every second is on the rise.  Hence the ability to store ever more data becomes important.

4. Schema-less databases
To enable storage of semi-structured and unstructured data, these databases do not store data in relational tables with a fixed schema.  Instead, data is stored as documents, columns, key-value pairs or graphs.

Let us have a look at a couple of ways in which data is stored.

a) Document
Document databases store documents that contain semi-structured data.  MongoDB is one such database.  It is platform independent and holds JSON-like documents.
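
To make "JSON-like documents" concrete, here is a small Python sketch that uses plain dictionaries as documents.  It does not use MongoDB or its API.  The `find` helper and the sample student data are assumptions made for illustration.

```python
import json

# Two documents in the same collection; they do not share the same fields.
students = [
    {"student_name": "aditya", "school_name": "sun shine", "city": "bangalore"},
    {"student_name": "lily", "school_name": "sun shine", "standard": "IV",
     "subjects": ["maths", "science"]},
]


def find(collection, **criteria):
    """Return the documents whose fields match all the given criteria."""
    return [doc for doc in collection
            if all(doc.get(key) == value for key, value in criteria.items())]


for doc in find(students, school_name="sun shine"):
    print(json.dumps(doc))
```

No schema had to be declared up front: the second document carries a nested list that the first one lacks, and both can still be queried together.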

b) Column
A column is a tuple of three elements (name, value, timestamp):
student_name: {name: "student_name", value: "vishnu", timestamp: 123456789}

A Column Family is a set of Columns.  This is in some ways similar to a table, but the main difference is that the same set of Columns need not be provided for every object (row) in a Column Family.  Notice the difference between the two rows given below: the first has a city column, while the second has a standard column instead.

{
    student_name: {name: "student_name", value: "aditya", timestamp: 123456789},
    school_name: {name: "school_name", value: "sun shine", timestamp: 123456789},
    city: {name: "city", value: "bangalore", timestamp: 123456789}
}
{
    student_name: {name: "student_name", value: "lily", timestamp: 123456789},
    school_name: {name: "school_name", value: "sun shine", timestamp: 123456789},
    standard: {name: "standard", value: "IV", timestamp: 123456789}
}
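
The same two rows can be modelled in Python as a rough sketch of the idea.  The `Column` tuple follows the (name, value, timestamp) description above.  The `make_row` helper and the row keys `row1` and `row2` are assumptions for illustration, and do not reflect the API of any particular column store.

```python
from typing import Dict, NamedTuple


class Column(NamedTuple):
    """A single column: (name, value, timestamp)."""
    name: str
    value: str
    timestamp: int


def make_row(timestamp: int, **values: str) -> Dict[str, Column]:
    """Build one row as a mapping from column name to Column."""
    return {name: Column(name, value, timestamp)
            for name, value in values.items()}


# A column family: row key -> row.  The rows need not share the same columns.
students = {
    "row1": make_row(123456789, student_name="aditya",
                     school_name="sun shine", city="bangalore"),
    "row2": make_row(123456789, student_name="lily",
                     school_name="sun shine", standard="IV"),
}

# Columns present in only one of the two rows.
print(set(students["row1"]) ^ set(students["row2"]))
```

The symmetric difference at the end lists the columns that only one of the rows has, which is exactly what a fixed-schema table would not allow.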

HBase, which is like a BigTable for Hadoop, uses column-oriented storage.  It is an open source database from the Apache Software Foundation.  Hadoop deployments use HBase to store critical data, the size of which is much smaller than the Big Data that Hadoop can store.

Database as a Service (DBaaS)
As the name indicates, the database is provided as a service by a cloud provider.  The cloud provider handles installation, upgrades and maintenance of the database, and customers invoke services on it.  For the customer, DBaaS reduces the time spent on installation and maintenance, and the manpower required for database management.  With the emergence of the cloud, DBaaS is not a surprise.  IBM Cloudant is a DBaaS that stores JSON documents.
