This article gives a light overview of NoSQL databases: when to use them and how they differ from relational databases.
What is RDBMS
Relational systems are the databases we’ve been using for a while now. Systems that support ACID transactions and joins are considered relational.
What is NoSQL
NoSQL (Not Only SQL) is a very broad category covering a group of persistence solutions.
Definition from http://nosql-database.org/: Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable.
In brief, NoSQL systems are typically characterized by:
- easy replication support,
- a simple API,
- eventual consistency / BASE (not ACID),
- huge amounts of data,
- and more.
Why NoSQL?
NoSQL systems were originally intended for modern web-scale databases.
Scaling and CAP Theorem. ACID vs. BASE
In order to guarantee the integrity of data, most classical database systems are based on transactions. This ensures consistency of data in all situations of data management. These transactional characteristics are also known as ACID (Atomicity, Consistency, Isolation, Durability). However, scaling out ACID-compliant systems has proven to be a problem. Conflicts arise between the different aspects of high availability in distributed systems that are not fully solvable. This is known as the CAP theorem:
- Strong Consistency: all clients see the same version of the data, even on updates to the dataset, e.g. by means of the two-phase commit protocol (XA transactions) and ACID,
- High Availability: all clients can always find at least one copy of the requested data, even if some of the machines in a cluster are down,
- Partition-tolerance: the system as a whole keeps its characteristics even when deployed across different servers, transparently to the client.
The CAP theorem postulates that only two of these three aspects of scaling out can be fully achieved at the same time.
Above all, many NoSQL databases have loosened the requirements on Consistency in order to achieve better Availability and Partition tolerance. This resulted in systems known as BASE (Basically Available, Soft-state, Eventually consistent). These have no transactions in the classical sense and introduce constraints on the data model to enable better partition schemes. A more comprehensive discussion of CAP, ACID and BASE is available in this introduction.
Relational databases give you consistency and availability, and a lot of popular NoSQL databases give you availability and partition tolerance. One of the primary goals of NoSQL systems is to bolster horizontal scalability. To scale horizontally, you need strong network partition tolerance, which requires giving up either consistency or availability. NoSQL solutions have lighter-weight transactional semantics than relational databases, but still have facilities for atomic operations at some level.
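To make the trade-off concrete, here is a minimal sketch of a toy two-replica cluster that behaves either as "CP" or as "AP" while the network is partitioned. The class names, the `mode` flag and the `heal` method are all made up for illustration; no real database works exactly like this.

```python
class Replica:
    """One node holding a copy of a single value."""

    def __init__(self):
        self.value = None


class TinyCluster:
    """Two replicas. `mode` is "CP" or "AP" (hypothetical labels)."""

    def __init__(self, mode):
        self.mode = mode
        self.a = Replica()
        self.b = Replica()
        self.partitioned = False  # True when a and b cannot talk to each other

    def write(self, value):
        """A client writes through replica a."""
        if self.partitioned:
            if self.mode == "CP":
                # Refuse the write: data stays consistent, availability is lost.
                raise RuntimeError("unavailable during partition")
            # AP: accept the write locally; replica b is now stale.
            self.a.value = value
            return
        # No partition: both copies are updated together.
        self.a.value = value
        self.b.value = value

    def read_from_b(self):
        """A client reads through replica b."""
        return self.b.value

    def heal(self):
        """The partition ends and b catches up with a (eventual consistency)."""
        self.partitioned = False
        self.b.value = self.a.value
```

In "AP" mode a write during the partition succeeds, but a read from the other replica returns the old value until `heal()` runs. In "CP" mode the same write raises an error instead, so no client ever sees diverging data.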
When to use NoSQL
NoSQL can be good when you have the following requirements:
- You plan to deploy a large-scale, high-concurrency database (hundreds of GB, thousands of users);
- It doesn’t need ACID guarantees;
- It doesn’t need relationships or constraints;
- It stores a fairly narrow set of data (the equivalent of 5-10 tables in SQL);
- It will be running on commodity hardware (e.g. Amazon EC2);
- It needs to be implemented on a very low budget and “scaled out.”
It is a good fit for most web sites. For example, Google and Twitter fit very neatly into these requirements. Does it really matter if a few tweets are lost or delayed?
When to use SQL Databases (RDBMSs)
Most business systems have very different requirements from web sites, such as:
- Medium-to-large-scale databases (10-100 GB) with fairly low concurrency (hundreds of users at most);
- ACID (especially the A and C – Atomicity and Consistency) is a hard requirement;
- Data is highly correlated (hierarchies, master-detail, histories);
- A wide assortment of data has to be stored: hundreds or thousands of tables are not uncommon in a normalized schema (more for denormalization tables, data warehouses, etc.);
- The system runs on high-end hardware;
- Lots of capital available.
High-end SQL databases (SQL Server, Oracle, Teradata, Vertica, etc.) are designed for vertical scaling: they like being on machines with lots and lots of memory and fast I/O through SANs and SSDs, with only the occasional horizontal scaling through clustering (HA) and partitioning (HC).
In short, NoSQL databases can be categorized according to their data model into the following four categories:
- Key-value systems
- Column-oriented systems
- Document-oriented systems
- Graph databases
The first three can be summarized as follows:
- Key-value systems basically support get, put, and delete operations based on a primary key.
- Column-oriented systems still use tables but have no joins (joins must be handled within your application). As the name suggests, they store data by column as opposed to traditional row-oriented databases. This makes aggregations much easier.
- Document-oriented systems store structured “documents” such as JSON or XML but have no joins (joins must be handled within your application). It’s very easy to map data from object-oriented software to these systems.
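The three data models described above can be sketched with plain Python data structures. This is only an in-memory illustration, not the API of any real product; `KeyValueStore`, `sum_column`, `join_orders_with_customers` and the field names (`_id`, `customer_id`, `name`) are all hypothetical.

```python
class KeyValueStore:
    """Key-value model: only get, put and delete by primary key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Deleting a missing key is a no-op in this sketch.
        self._data.pop(key, None)


def sum_column(columns, name):
    """Column-oriented model: each column is stored as its own list,
    so an aggregation touches only the one column it needs."""
    return sum(columns[name])


def join_orders_with_customers(orders, customers):
    """Document model: the store has no joins, so the application
    looks up each order's customer itself (an inner join by id)."""
    by_id = {c["_id"]: c for c in customers}
    return [
        {**order, "customer_name": by_id[order["customer_id"]]["name"]}
        for order in orders
        if order["customer_id"] in by_id
    ]
```

For example, with `columns = {"id": [1, 2], "price": [10, 15]}`, calling `sum_column(columns, "price")` reads a single list and never touches `id`. The join helper shows the price of giving up joins: the lookup table is built and walked in application code instead of inside the database.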
Consistent, Available (CA) Systems have trouble with partitions and typically deal with them through replication. Examples of CA systems include:
- Traditional RDBMSs like Postgres, MySQL, etc (relational)
- Vertica (column-oriented)
- Aster Data (relational)
- Greenplum (relational)
Consistent, Partition-Tolerant (CP) Systems have trouble with availability while keeping data consistent across partitioned nodes. Examples of CP systems include:
- BigTable (column-oriented/tabular)
- Hypertable (column-oriented/tabular)
- HBase (column-oriented/tabular)
- MongoDB (document-oriented)
- Terrastore (document-oriented)
- Redis (key-value)
- Scalaris (key-value)
- MemcacheDB (key-value)
- Berkeley DB (key-value)
Available, Partition-Tolerant (AP) Systems achieve “eventual consistency” through replication and verification. Examples of AP systems include:
- Dynamo (key-value)
- Voldemort (key-value)
- Tokyo Cabinet (key-value)
- KAI (key-value)
- Cassandra (column-oriented/tabular)
- CouchDB (document-oriented)
- SimpleDB (document-oriented)
- Riak (key-value)
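As a rough illustration of "eventual consistency through replication and verification", the sketch below compares the copies held by several replicas on read and pushes the newest one back to any replica that is behind. It assumes a simple last-write-wins rule based on a timestamp stored next to each value; that rule and the `read_repair` name are assumptions made for this example, and real AP systems may resolve conflicts differently.

```python
def read_repair(replicas, key):
    """Return the newest value for `key` and copy it to stale replicas.

    Each replica is a dict mapping key -> (timestamp, value).
    Conflict resolution is last-write-wins (an assumption of this sketch).
    """
    versions = [replica[key] for replica in replicas if key in replica]
    if not versions:
        return None  # no replica has ever seen this key

    # Verification step: pick the version with the highest timestamp.
    newest = max(versions, key=lambda version: version[0])

    # Repair step: overwrite missing or older copies with the newest one.
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest

    return newest[1]
```

After one call for a given key, every replica in the list holds the same version of it, which is the "eventual" part: the copies may disagree for a while, and a later read brings them back together.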
* This video at YouTube talks about NoSQL vs SQL. It describes how you solve problems in NoSQL that were solved in SQL relational databases.