A bit of wording ...
Database or data store. Is there a difference?
Yes, according to the following definitions from Wikipedia:
- A data store is a repository for persistently storing and managing collections of data. The term covers not just repositories like databases, but also simpler store types such as plain files and emails.
- A database is a series of bytes managed by a database management system (DBMS), relational or not.
What is it about?
NoSQL refers to data storage solutions that do not follow the Relational DataBase Management System (RDBMS) model, in which consistency is guaranteed but scalability may not be. NoSQL can be read as "No SQL" or as "Not Only SQL": the language used to query these kinds of databases is either entirely different from SQL, or is SQL plus something else.
NoSQL databases are also often described as schema-less databases. At the very least, we can say:
- In a classical RDBMS, you first define your schema (the tables, columns and fields), then you fill it with data according to this schema. The data is said to be structured.
  - We say that the schema is applied at write time.
  - If you define a table to hold information about users and do not foresee a column for the birth date, your application will not be able to store it until you review and change your schema.
- In NoSQL databases, you can load any kind of data, usually unstructured or semi-structured. It is the client application that holds the notion of schema.
  - We say that the schema is applied at read time.
  - If you define a table (or its structural equivalent in a NoSQL database), you can store a user record without a birth date, or with one if you know it. So without any change to the database, you can store a new kind of information.
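The difference between schema on write and schema on read can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SQLite (via the standard `sqlite3` module) stands in for an RDBMS, and a plain dictionary of JSON strings stands in for a document-oriented NoSQL store. The `users` table, the `user:1` key format and the helper names are invented for the example.

```python
import json
import sqlite3

# Schema on write: the table layout is fixed before any row is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Alice"))


def insert_with_birth_date(connection):
    """Try to store a user together with a birth date."""
    connection.execute(
        "INSERT INTO users (id, name, birth_date) VALUES (?, ?, ?)",
        (2, "Bob", "1990-04-12"),
    )


try:
    insert_with_birth_date(conn)
    rdbms_accepted = True
except sqlite3.OperationalError:
    # The column was not foreseen in the schema, so the write is rejected.
    rdbms_accepted = False

# The schema has to be changed before the new information can be stored.
conn.execute("ALTER TABLE users ADD COLUMN birth_date TEXT")
insert_with_birth_date(conn)

# Schema on read: a dict of JSON strings stands in for a document store
# (an assumption of this sketch, not a real NoSQL client).
document_store = {}
document_store["user:1"] = json.dumps({"name": "Alice"})
document_store["user:2"] = json.dumps({"name": "Bob", "birth_date": "1990-04-12"})


def read_birth_date(key):
    """The client decides at read time how to interpret each document."""
    document = json.loads(document_store[key])
    return document.get("birth_date")  # None when the field is absent
```

In the relational half, the first attempt to store a birth date fails until `ALTER TABLE` is run. In the document half, both records are accepted as they are, and the client code has to cope with the missing field when reading.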
Why do we need NoSQL?
Relational databases face the following challenges:
- Poorly suited to large volumes (petabytes) of data with a variety of data types (e.g. images, videos, text)
- Cannot scale for large data volumes
  - Cannot scale up, limited by memory and CPU capabilities
  - Cannot scale out, limited by cache-dependent read and write operations
  - Sharding (breaking the database into pieces stored on different nodes) causes operational problems (e.g. managing a shard failure)
- Complex RDBMS model
- Consistency limits scalability in an RDBMS
NoSQL databases address these challenges with:
- A scale-out, shared-nothing architecture, capable of running on a large number of nodes
- A non-locking concurrency control mechanism, so that real-time reads do not conflict with writes
- Scalable replication and distribution: thousands of machines with distributed data
- An architecture providing higher performance per node than an RDBMS
- A schema-less data model
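To make the idea of a scale-out, shared-nothing layout more concrete, here is a minimal sketch of hash-based sharding in Python. It is an illustration only: each "node" is just an in-memory dictionary, and the `ShardedStore` class and its method names are invented for the example rather than taken from any particular NoSQL product.

```python
import hashlib


class ShardedStore:
    """Toy key-value store whose keys are spread over independent nodes."""

    def __init__(self, node_count):
        # Each node owns its own data and shares nothing with the others.
        self.nodes = [{} for _ in range(node_count)]

    def _node_index(self, key):
        # A stable hash of the key decides which node is responsible for it.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._node_index(key)][key] = value

    def get(self, key):
        return self.nodes[self._node_index(key)].get(key)


store = ShardedStore(node_count=4)
for i in range(100):
    store.put(f"user:{i}", {"id": i})
```

Every key lives on exactly one node, so reads and writes for different keys can be served by different machines. Note that with plain modulo hashing, changing the number of nodes moves most keys to a different node, which is one reason real systems often prefer schemes such as consistent hashing.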
Choosing the right database
A multi-DB approach may prevail
Our view on databases is that a multi-database approach is often best. More often than not, using a single database results in using the wrong technology to address a given product need. In our experience, learning how to use each database for the use cases it addresses best is easier than trying to hack one database for all use cases (especially those it is particularly bad at).
To simplify a multi-database approach for other engineering layers, such as Platform Services and UX, we recommend the creation of a Data Services API. It exposes a standard REST API through which engineers outside of Data Services can query data without needing to know which database is used for persistence.
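The routing idea behind such a Data Services API could look roughly like the Python sketch below. It is a hedged illustration, not a prescribed design: the HTTP layer is left out, the two backends are in-memory stand-ins for a relational and a document database, and all class names, resource names and sample records are hypothetical.

```python
from typing import Any, Dict, Optional, Protocol


class Backend(Protocol):
    """Anything that can look up one record by key (hypothetical interface)."""

    def fetch(self, key: str) -> Optional[Dict[str, Any]]:
        ...


class RelationalBackend:
    """Stand-in for a relational database holding user accounts."""

    def __init__(self) -> None:
        self._rows = {"42": {"id": "42", "name": "Alice"}}

    def fetch(self, key: str) -> Optional[Dict[str, Any]]:
        return self._rows.get(key)


class DocumentBackend:
    """Stand-in for a document store holding free-form user profiles."""

    def __init__(self) -> None:
        self._docs = {"42": {"user_id": "42", "tags": ["nosql", "sharding"]}}

    def fetch(self, key: str) -> Optional[Dict[str, Any]]:
        return self._docs.get(key)


class DataServices:
    """Single entry point: callers name a resource, never a database."""

    def __init__(self) -> None:
        self._routes: Dict[str, Backend] = {}

    def register(self, resource: str, backend: Backend) -> None:
        self._routes[resource] = backend

    def get(self, resource: str, key: str) -> Optional[Dict[str, Any]]:
        backend = self._routes.get(resource)
        if backend is None:
            raise KeyError(f"unknown resource: {resource}")
        return backend.fetch(key)


api = DataServices()
api.register("users", RelationalBackend())
api.register("profiles", DocumentBackend())
```

A REST layer placed in front of this object could map a request such as `GET /users/42` to `api.get("users", "42")`. The caller only sees resources and keys, so the database behind a resource can be replaced without touching client code.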
A short analysis of the strengths and weaknesses of some well-known databases used for their NoSQL characteristics is presented in the next pages of this site.