NoSQL is a type of database defined by being non-relational. These systems make it possible to store and analyze Big Data. Find out everything you need to know: definition, history, how it works, use cases, advantages, training…
In the age of Big Data, relational databases alone are often no longer adequate. Storing and analyzing immense volumes of data calls for new solutions.
What is NoSQL?
A NoSQL database is a “non-relational” database. It is possible to store data in an unstructured form, without following a fixed schema. Joins are no longer necessary, and scaling is facilitated.
NoSQL databases are used in particular for distributed data stores with high storage capacity requirements. Thus, NoSQL is used for Big Data and real-time web applications. Technology giants such as Twitter, Facebook or Google collect several terabytes of data about their users every day.
The term “NoSQL” is usually read as “Not Only SQL”. Relational databases use SQL to store and query data.
This is not necessarily the case with a non-relational database. NoSQL systems are compatible with a wide variety of technologies that allow the storage of structured, unstructured, semi-structured or polymorphic data.
The history of NoSQL
The term NoSQL was coined in 1998 by Carlo Strozzi to designate his lightweight, open source relational database, which did not use SQL as its query language. The concept was later adopted and popularized by GAFAM companies such as Google, Facebook and Amazon, which were faced with huge volumes of data. For them, relational databases had become too slow.
Instead of upgrading their hardware to increase the performance of their RDBMS (Relational Database Management System), the tech giants chose to distribute the load over multiple host servers. This is known as the “scaling out” method. NoSQL databases are well suited to scaling out because they do not rely on joins between tables, so the data can be split across machines more easily.
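To make the idea of distributing the load more concrete, here is a minimal sketch of hash-based partitioning, assuming three servers modeled as in-memory dictionaries. The server names and the helper functions `pick_server`, `put` and `get` are hypothetical and exist only for this illustration.

```python
import hashlib

# Hypothetical server names, used only for this illustration.
SERVERS = ["node-a", "node-b", "node-c"]

# Each server holds only its own share of the data.
shards = {name: {} for name in SERVERS}


def pick_server(key, servers):
    """Map a key to one server using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def put(key, value):
    """Store the value on the server responsible for this key."""
    shards[pick_server(key, SERVERS)][key] = value


def get(key):
    """Read the value back from the server responsible for this key."""
    return shards[pick_server(key, SERVERS)].get(key)


# Spread 1,000 user profiles over the three servers.
for user_id in range(1000):
    put(f"user:{user_id}", f"profile-{user_id}")

# Show how many keys ended up on each server.
print({name: len(data) for name, data in shards.items()})
```

A simple modulo scheme like this one moves most keys whenever a server is added or removed. Systems such as Dynamo and Cassandra use consistent hashing to limit that reshuffling.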
In 2000, development of the graph database Neo4j began. Google Bigtable followed in 2004, and CouchDB in 2005. The history of NoSQL databases was also marked by Amazon Dynamo in 2007.
Then, in 2008, Facebook open-sourced Cassandra, the non-relational database it used internally. This tool became a reference for NoSQL databases, and put the term NoSQL back in the spotlight by giving it its current meaning and popularity.
What are the main characteristics of NoSQL?
The main feature of NoSQL databases is that they do not follow the relational model and do not store data in tables with fixed columns. These databases do not require data normalization or object-relational mapping. It is possible to interact with them without using complex query languages.
Another feature is the absence or flexibility of schemas. It is not necessary to define a data schema in advance, and data with different structures can be grouped together on the same system.
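As a rough illustration of schema flexibility, the sketch below models a collection as a plain Python list of dictionaries in which each record has its own set of fields. The `products` data and the `find` helper are hypothetical and do not correspond to any particular database API.

```python
# A "collection" is modeled here as a plain list of dictionaries.
# No schema is declared: each record carries its own fields.
products = [
    {"id": 1, "name": "Laptop", "ram_gb": 16},
    {"id": 2, "name": "T-shirt", "size": "M", "colors": ["red", "blue"]},
    {"id": 3, "name": "E-book", "download_url": "https://example.com/book"},
]


def find(collection, **criteria):
    """Return the records whose fields equal all the given criteria.

    A record that lacks one of the requested fields simply does not match.
    """
    return [
        doc
        for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


# Adding a record with a brand-new field requires no migration.
products.append({"id": 4, "name": "Hoodie", "size": "M", "hood": True})

medium = find(products, size="M")
print([doc["name"] for doc in medium])
```

The price of this flexibility is that the application, not the database, must cope with records that lack a given field.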
Non-relational databases are also distinguished by an easy-to-use interface for storing and querying data. APIs allow the manipulation of data with various selection methods. The protocols are text-based, mainly HTTP REST with JSON. Each system usually has its own query language rather than SQL.
The final characteristic of a NoSQL database is that it is distributed. Multiple instances can be run in a distributed fashion, providing auto-scaling and fail-over capabilities. ACID guarantees can be relaxed in favor of elasticity and performance.
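The trade-off between fail-over and strict consistency can be sketched with a toy replicated store. This is a simplified illustration under strong assumptions (in-memory replicas, no network, no repair mechanism), and the `Replica` and `ReplicatedStore` classes are hypothetical.

```python
class Replica:
    """One copy of the data, which can be marked as down."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class ReplicatedStore:
    """Writes go to every live replica; reads use the first live one."""

    def __init__(self, replicas):
        self.replicas = replicas

    def put(self, key, value):
        """Write to all live replicas and return how many accepted it."""
        acks = 0
        for replica in self.replicas:
            if replica.alive:
                replica.data[key] = value
                acks += 1
        return acks

    def get(self, key):
        """Fail over to the next replica when one is down."""
        for replica in self.replicas:
            if replica.alive and key in replica.data:
                return replica.data[key]
        raise KeyError(key)


replicas = [Replica("r1"), Replica("r2"), Replica("r3")]
store = ReplicatedStore(replicas)
store.put("greeting", "v1")

replicas[0].alive = False                    # r1 goes down
value_after_failure = store.get("greeting")  # still served, by r2
acks = store.put("greeting", "v2")           # only r2 and r3 accept the write

replicas[0].alive = True                     # r1 comes back without the update
stale_read = store.get("greeting")           # r1 answers first with the old value
print(value_after_failure, acks, stale_read)
```

The store stays available while a replica is down, but the recovered replica serves an outdated value until it is repaired. Real systems add mechanisms such as quorums or read repair to bound this window.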
What are the different types of NoSQL databases?
There are four main types of NoSQL databases: key/value pair, column-oriented, graph-oriented, and document-oriented. Each of these categories has its own characteristics and specific limitations, and none of them can solve every problem. It is necessary to choose the right database according to the use case.
Key/value pair databases
In key/value pair databases, the data is stored as key/value pairs. This makes it possible to support large volumes of data and heavy loads. The data is stored in a hash table in which each key is unique. The value can be a JSON document, a BLOB (binary large object), a string, or any other data.
This type of database is the most basic. It allows the developer to store data easily, without a schema. Examples include Redis and Dynamo. Amazon Dynamo is in fact the original model for this category of database.
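The key/value model described above can be sketched in a few lines. This is an in-memory illustration only, and the `KeyValueStore` class is hypothetical rather than the API of Redis or Dynamo.

```python
import json


class KeyValueStore:
    """A toy key/value store: unique keys mapped to opaque byte values."""

    def __init__(self):
        self._data = {}  # Python dicts are hash tables

    def put(self, key, value):
        """Insert or overwrite the value stored under the key."""
        self._data[key] = value

    def get(self, key, default=None):
        """Return the stored value, or the default if the key is absent."""
        return self._data.get(key, default)

    def delete(self, key):
        """Remove the key and report whether it existed."""
        return self._data.pop(key, None) is not None


store = KeyValueStore()

# The store does not interpret values: JSON text and binary data are both just bytes.
store.put("session:abc", json.dumps({"user": "alice", "cart": [3, 7]}).encode("utf-8"))
store.put("avatar:alice", b"\x89PNG placeholder bytes")

session = json.loads(store.get("session:abc"))
print(session["user"], session["cart"])
```

Because the store treats values as opaque, lookups are only possible by key. Searching inside the values is left to the application.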
Column-oriented databases
As their name indicates, these databases are based on columns, following Google's Bigtable model. Each column is treated separately, and the values of a column are stored contiguously.
This category of database offers high performance for aggregation queries such as SUM, COUNT, AVG and MIN. This is because all the values of a column are stored together, so a query only has to read the column it needs. Examples are HBase, Cassandra and Hypertable.
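The benefit of a column layout for aggregation can be illustrated with plain Python lists. The sample `rows` are invented for this sketch, and real column stores add compression and on-disk formats that are not shown here.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "customer": "alice", "amount": 30.0},
    {"order_id": 2, "customer": "bob", "amount": 12.5},
    {"order_id": 3, "customer": "alice", "amount": 7.5},
]

# Column layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregation touches a single column and ignores the others.
amounts = columns["amount"]
total = sum(amounts)       # SUM
count = len(amounts)       # COUNT
average = total / count    # AVG
minimum = min(amounts)     # MIN

print(total, count, average, minimum)
```

With the row layout, the same aggregation would have to read every field of every record just to extract the amounts.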
Graph-oriented databases
Graph-oriented databases store entities and the relationships between these entities. Each entity is stored as a node, and each relationship as an edge. This makes it easy to visualize the relationships between the nodes. Each node and each edge has a unique identifier.
This type of database is multi-relational. It is mainly used for social networks, logistics or spatial data. Popular examples include Neo4j, InfiniteGraph, OrientDB and FlockDB.
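The node and edge model can be sketched with two dictionaries and a breadth-first traversal. The sample social graph and the helpers `neighbors` and `reachable_within` are hypothetical, and they are not the query language of any of the products listed above.

```python
from collections import deque

# Nodes and edges each have a unique identifier.
nodes = {
    1: {"name": "Alice"},
    2: {"name": "Bob"},
    3: {"name": "Carol"},
    4: {"name": "Dave"},
}
# Each edge is (source node, relationship label, target node).
edges = {
    "e1": (1, "FOLLOWS", 2),
    "e2": (2, "FOLLOWS", 3),
    "e3": (1, "FOLLOWS", 4),
    "e4": (3, "FOLLOWS", 1),
}


def neighbors(node_id, label):
    """Return the nodes reached from node_id by one edge with this label."""
    return [
        target
        for source, edge_label, target in edges.values()
        if source == node_id and edge_label == label
    ]


def reachable_within(start, label, max_hops):
    """Breadth-first search: nodes reachable in at most max_hops edges."""
    seen = {start}
    found = set()
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors(node, label):
            if nxt not in seen:
                seen.add(nxt)
                found.add(nxt)
                queue.append((nxt, hops + 1))
    return found


# "Who does Alice follow, directly or through one intermediary?"
two_hops = reachable_within(1, "FOLLOWS", 2)
print(sorted(nodes[n]["name"] for n in two_hops))
```

Answering the same question in a relational database would typically require one self-join per hop, which is why traversals of this kind are a common motivation for graph databases.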
Document-oriented databases
Document-oriented databases also store and retrieve data as key/value pairs, but the value is a document in JSON or XML format. The database therefore understands the structure of the value, which can be found with a query on its content.
This type of database therefore offers increased flexibility. It is mainly used for content management systems (CMS), blogging platforms, or e-commerce applications. However, it is not suitable for complex transactions requiring multiple operations or queries on variable aggregate structures. The best-known examples in this category are Amazon SimpleDB, CouchDB, MongoDB, Riak, and Lotus Notes.
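What separates a document store from a plain key/value store is that values can be queried by their content. The sketch below illustrates this with JSON documents and a dotted field path. The sample `documents` and the helpers `get_path` and `find_keys` are hypothetical, and real document databases use indexes rather than scanning every document.

```python
import json

# Keys map to JSON documents; the store can look inside them.
documents = {
    "post:1": json.dumps(
        {"title": "Intro to NoSQL", "author": {"name": "Ana"}, "tags": ["nosql", "intro"]}
    ),
    "post:2": json.dumps(
        {"title": "Graph basics", "author": {"name": "Ben"}, "tags": ["graph"]}
    ),
    "post:3": json.dumps({"title": "Scaling out", "author": {"name": "Ana"}}),
}


def get_path(doc, path):
    """Follow a dotted path such as 'author.name'; return None if it is absent."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find_keys(path, expected):
    """Return the keys of all documents whose field at path equals expected."""
    return [
        key
        for key, raw in documents.items()
        if get_path(json.loads(raw), path) == expected
    ]


by_ana = find_keys("author.name", "Ana")
print(by_ana)
```

Note that the third document has no `tags` field at all, and the query still works: documents in the same collection do not need to share a structure.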
Advantages and disadvantages of NoSQL
NoSQL has many advantages, but also disadvantages. These databases are ideal for Big Data storage and analysis, and also avoid a single point of failure.
They facilitate replication, and do not require a separate caching layer. Performance is high, and horizontal scalability is possible. NoSQL databases can support structured or unstructured data in the same way.
In addition, NoSQL databases fit naturally with object-oriented programming, which makes them easy to use and flexible. They do not require a dedicated high-performance server, and they are compatible with all major programming languages. Implementation is simpler than with an RDBMS, and the flexible schema can be altered easily without interruption.
Nevertheless, this type of database also has some weak points. These include the lack of standardization and limited query capabilities. Traditional database guarantees, such as consistency when multiple transactions are performed simultaneously, may also be lacking.
In addition, it becomes difficult to maintain unique values as keys as the volume of data increases. This model also does not work as well for relational data. The learning curve can be steep for new developers, and open source options are not always popular within companies. Overall, relational databases and their tools are more mature, and therefore more widely adopted.
Why use NoSQL?
NoSQL databases are suitable for several use cases. They work well for storing and retrieving large volumes of data, and when the relationships between data are not particularly important.
They can also be used if the data changes over time and is unstructured. Finally, they are suitable when the volume of data increases continuously and regular scaling of the database is required to support it.
How to learn to use NoSQL?
Knowing how to handle NoSQL databases is a highly sought-after skill. Companies today are overwhelmed with data, so they need experts who can store and analyze that data on non-relational systems.
To learn how to use SQL and NoSQL databases, you can opt for DataScientest training courses. By following our Data Analyst, Data Engineer or Data Scientist courses, databases will no longer hold any secrets for you.
Our training courses prepare you for the different Data Science professions. Designed by professionals to meet the real needs of companies, they provide immediate access to the job market. Students receive a degree certified by Sorbonne University, and 93% of alumni have found work immediately.
The different courses adopt an innovative Blended Learning approach, combining distance and face-to-face learning. They can be done in BootCamp or in Continuing Education.