What makes MongoDB different?
As a document-oriented database, MongoDB stores each record in a BSON-formatted document. Without the typical restrictions of a table layout, fields and properties can be added and removed at any time, making it relatively easy to move from one data model design to another. This property makes MongoDB a very suitable candidate for a development database, reducing the time and effort of the development cycle.
Strengths
Schema-less
The definition of “schema” varies between database systems. In MySQL, schema is a synonym for database. In PostgreSQL, a schema is a namespace within a database that groups a collection of tables together.
Essentially, a schema is a collection of tables defined by sets of parameters. Being schemaless means there are fewer parameters or restrictions to conform to.
- No pre-defined data structure or modelling required
- No enforcement of data type for fields with the same name
- Allows data type transformation without remodelling, therefore little to no downtime
That is why a collection in MongoDB can look like this:
{
  name: "Mr Bob",
  phone: 12345678,
  height: 123.45
},
{
  name: "Tom",
  phone: "012.9876543"
},
{
  state: "liquid",
  formula: "H2O"
}
This cannot happen in a SQL database such as MySQL, where each column has a predefined data type: a phone column could not hold both numbers and strings. Also unlike relational databases, MongoDB lets new fields be added on the fly, since no field has to be defined before use.
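To make the contrast concrete, here is a minimal sketch that does not use MongoDB at all: a Python list of dicts stands in for a collection, and a hypothetical `TypedTable` class stands in for a SQL table with one fixed type per column. Both are illustrative assumptions, not real database APIs.

```python
# A plain list accepts documents of any shape, like a MongoDB collection.
collection = []
collection.append({"name": "Mr Bob", "phone": 12345678, "height": 123.45})
collection.append({"name": "Tom", "phone": "012.9876543"})
collection.append({"state": "liquid", "formula": "H2O"})


class TypedTable:
    """Hypothetical stand-in for a SQL table: each column has one required type."""

    def __init__(self, columns):
        self.columns = columns  # column name -> required Python type
        self.rows = []

    def insert(self, row):
        for name, value in row.items():
            if name not in self.columns:
                raise KeyError(f"unknown column: {name}")  # column must be defined first
            if not isinstance(value, self.columns[name]):
                raise TypeError(f"column {name} expects {self.columns[name].__name__}")
        self.rows.append(row)


people = TypedTable({"name": str, "phone": int, "height": float})
people.insert({"name": "Mr Bob", "phone": 12345678, "height": 123.45})
try:
    people.insert({"name": "Tom", "phone": "012.9876543"})
except TypeError as err:
    print("rejected:", err)  # rejected: column phone expects int
```

The second insert into the typed table fails because `phone` already holds integers, while the list happily keeps all three documents.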
Highly scalable
Unlike with relational databases, there is no need to worry about splitting tables when scaling MongoDB horizontally. Documents can be distributed across multiple servers with sharding and copied across servers with replica sets, giving the benefits of speed and redundancy.
Sharding means dividing a collection's data into sections (chunks, based on a shard key) and distributing those sections across multiple servers. It allows MongoDB as a whole to expand its storage capacity and increase its total processing power.
A replica set copies the data of a primary MongoDB instance to one or more secondary instances, increasing availability and redundancy.
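The routing idea behind hashed sharding can be sketched in a few lines. This is a simplification under stated assumptions: real MongoDB splits the range of hashed shard-key values into chunks that a balancer moves between shards, whereas the "hash modulo number of shards" rule and the `NUM_SHARDS` value below are hypothetical choices made only to show deterministic placement.

```python
import hashlib

NUM_SHARDS = 3  # hypothetical cluster size


def shard_for(shard_key_value, num_shards=NUM_SHARDS):
    """Map a shard-key value to a shard index (simplified: hash modulo shard count)."""
    digest = hashlib.sha256(str(shard_key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Route ten user documents by their user_id shard key.
shards = {i: [] for i in range(NUM_SHARDS)}
for user_id in range(1, 11):
    doc = {"user_id": user_id, "name": f"user{user_id}"}
    shards[shard_for(doc["user_id"])].append(doc)

for index, docs in shards.items():
    print(index, [d["user_id"] for d in docs])
```

Because the same key always hashes to the same shard, a query that includes the shard key can be sent to a single server instead of all of them.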
Familiar data structure
The BSON document format is a JSON-like structure, and JSON is widely used in both frontend and backend web applications. As a result, data retrieved from MongoDB can easily be processed and adapted to existing infrastructure and tools.
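As a small illustration, a Python driver typically returns a document as a plain dict, which maps directly onto JSON. BSON does have types that JSON lacks (dates, for example), so those need converting before the document reaches a web frontend. The document below is hypothetical and built by hand rather than queried from a server.

```python
import json
from datetime import datetime, timezone

# A document shaped like one returned by a driver; "created" is a BSON date.
document = {
    "name": "Mr Bob",
    "phone": 12345678,
    "height": 123.45,
    "created": datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
}


def to_json_value(value):
    """Fallback used by json.dumps for values JSON cannot represent natively."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"cannot serialize {type(value).__name__}")


payload = json.dumps(document, default=to_json_value)
print(payload)
```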
Higher throughput rate for inserting and updating
An experiment by Filip and Cegan (2020) found that MongoDB was up to 37.9 times faster than MySQL for non-transactional queries, and several times faster for transactional queries. The experiment also showed that indexing had a significant impact on MySQL's performance, which could be due to the higher cost of inserting into indexed tables compared with inserting documents.
Geospatial data queries
MongoDB can store GeoJSON objects and natively supports geospatial queries. In a workload combining updates and queries on real-time geospatial data, MongoDB was found to be about 50% faster than MySQL.
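The sketch below shows a GeoJSON `Point` as MongoDB stores it (coordinates in `[longitude, latitude]` order) and a `$near` filter in the shape a driver's `find()` accepts; on a real server `$near` also needs a geospatial index such as `2dsphere`. Since no server is assumed, the filter is approximated locally with the haversine great-circle distance. The place names reuse the rental example, and their coordinates are hypothetical.

```python
import math

places = [
    {"name": "City View", "location": {"type": "Point", "coordinates": [144.9631, -37.8136]}},
    {"name": "Uni Life", "location": {"type": "Point", "coordinates": [144.9612, -37.7964]}},
    {"name": "Far Away", "location": {"type": "Point", "coordinates": [151.2093, -33.8688]}},
]

center = [144.9631, -37.8136]
query = {
    "location": {
        "$near": {
            "$geometry": {"type": "Point", "coordinates": center},
            "$maxDistance": 5000,  # metres
        }
    }
}

EARTH_RADIUS_M = 6_371_000


def haversine_m(a, b):
    """Great-circle distance in metres between two [lng, lat] pairs."""
    lng1, lat1, lng2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lng2 - lng1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def near(docs, spec):
    """Local approximation of $near: keep docs within $maxDistance, nearest first."""
    geo = spec["location"]["$near"]
    origin = geo["$geometry"]["coordinates"]
    limit = geo["$maxDistance"]
    scored = [(haversine_m(origin, d["location"]["coordinates"]), d) for d in docs]
    return [d for dist, d in sorted(scored, key=lambda t: t[0]) if dist <= limit]


print([d["name"] for d in near(places, query)])
```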
Weaknesses
Not relational
Compared to a traditional relational database, building relations between documents is not as straightforward. Consider the following scenario.
A property rental service needs to manage:
- Rental listing: A list of rental properties
- Rental property reviews: Reviews left by users
- Users: A list of rental users
- User reviews: Reviews left by property owners
On a property page, one expects to find the following:
- Property details
- Property reviews
On a user details page, one expects the following:
- User description
- Reviews left by property owners
- Reviews left for properties
In a relational database, records can reference each other through foreign keys, and related records are combined at query time with joins. For example:
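Table: user
| user_id | user_name |
|---------|-----------|
| 1       | "Bob"     |
| 2       | "Tom"     |

Table: property
| property_id | property_name |
|-------------|---------------|
| 1           | "City View"   |
| 2           | "Uni Life"    |

Table: reviews
| review_id | property_id | user_id | score | comment         |
|-----------|-------------|---------|-------|-----------------|
| 1         | 1           | 1       | 3     | "Too much dust" |
| 2         | 1           | 2       | 2     | "No hot water"  |
| 3         | 2           | 1       | 5     | "Owner is nice" |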
To get a list of all reviews made by Bob:
SELECT u.user_name, p.property_name, r.score, r.comment
FROM user u
INNER JOIN reviews r
ON r.user_id = u.user_id
INNER JOIN property p
ON p.property_id = r.property_id
WHERE u.user_name = 'Bob'
Expected output:
| user_name | property_name | score | comment       |
|-----------|---------------|-------|---------------|
| Bob       | City View     | 3     | Too much dust |
| Bob       | Uni Life      | 5     | Owner is nice |
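To check the relational example end to end, the sketch below loads the three tables into an in-memory SQLite database, used here only as a convenient stand-in for MySQL or PostgreSQL, and runs the join for Bob's reviews. The `ORDER BY` clause is an addition so the row order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE property (property_id INTEGER PRIMARY KEY, property_name TEXT);
CREATE TABLE reviews (
    review_id INTEGER PRIMARY KEY,
    property_id INTEGER REFERENCES property(property_id),
    user_id INTEGER REFERENCES user(user_id),
    score INTEGER,
    comment TEXT
);
INSERT INTO user VALUES (1, 'Bob'), (2, 'Tom');
INSERT INTO property VALUES (1, 'City View'), (2, 'Uni Life');
INSERT INTO reviews VALUES
    (1, 1, 1, 3, 'Too much dust'),
    (2, 1, 2, 2, 'No hot water'),
    (3, 2, 1, 5, 'Owner is nice');
""")

# Each review row points at its user and property; the joins follow those keys.
rows = conn.execute("""
SELECT u.user_name, p.property_name, r.score, r.comment
FROM user u
INNER JOIN reviews r ON r.user_id = u.user_id
INNER JOIN property p ON p.property_id = r.property_id
WHERE u.user_name = ?
ORDER BY r.review_id
""", ("Bob",)).fetchall()

for row in rows:
    print(row)
```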
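In MongoDB, however, relations are not built into the data model: there are no foreign keys, and although the $lookup aggregation stage can join collections, documents are usually designed to be read without joins. One possible “solution” is to embed reviews into property documents: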
property collection:
{
  name: "City View",
  owner: "John",
  reviews: [
    {
      username: "Bob",
      score: 3,
      comment: "Too much dust"
    },
    {
      username: "Tom",
      score: 2,
      comment: "No hot water"
    }
  ]
},
{
  name: "Uni Life",
  owner: "Betty",
  reviews: [
    {
      username: "Bob",
      score: 5,
      comment: "Owner is nice"
    }
  ]
}
But this approach raises an issue with large collections when we want a list of all reviews made by a certain user: we would have to search every property document for that user's reviews, which is inefficient.
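The cost of that search can be sketched with the embedded-review documents held as plain Python data. Finding one user's reviews means visiting every property and every embedded review, so the work grows with the whole collection. On a real server the equivalent filter would be `{"reviews.username": "Bob"}`, and without an index on that field the server scans the collection in a similar way.

```python
properties = [
    {"name": "City View", "owner": "John", "reviews": [
        {"username": "Bob", "score": 3, "comment": "Too much dust"},
        {"username": "Tom", "score": 2, "comment": "No hot water"},
    ]},
    {"name": "Uni Life", "owner": "Betty", "reviews": [
        {"username": "Bob", "score": 5, "comment": "Owner is nice"},
    ]},
]


def reviews_by_user(property_docs, username):
    """Full scan: examine each embedded review of each property."""
    found = []
    checked = 0
    for prop in property_docs:
        for review in prop["reviews"]:
            checked += 1
            if review["username"] == username:
                found.append((prop["name"], review["score"], review["comment"]))
    return found, checked


found, checked = reviews_by_user(properties, "Bob")
print(found)    # both of Bob's reviews
print(checked)  # 3: every embedded review was examined
```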
If we want to avoid multiple database lookups, another possible “solution” is to embed every review into user documents as well, but that leads to the next problem: data duplication.
A better approach is to have a separate collection for reviews and to reference the review IDs from user and property documents. This takes less storage space than embedding each review in both places, and looking up a document by its indexed ID is faster than looping through a whole collection. This is generally how “relations” are built in MongoDB, which will be explored in the next chapter.
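A minimal sketch of the referencing approach, assuming dicts keyed by `_id` as stand-ins for collections with an index on `_id`. The integer IDs and the `resolve` helper are hypothetical; MongoDB generates `ObjectId` values by default.

```python
# Each review is stored once, in its own collection.
reviews = {
    1: {"_id": 1, "property_id": 101, "user_id": 1, "score": 3, "comment": "Too much dust"},
    2: {"_id": 2, "property_id": 101, "user_id": 2, "score": 2, "comment": "No hot water"},
    3: {"_id": 3, "property_id": 102, "user_id": 1, "score": 5, "comment": "Owner is nice"},
}
# User and property documents hold only references to review IDs.
users = {
    1: {"_id": 1, "name": "Bob", "review_ids": [1, 3]},
    2: {"_id": 2, "name": "Tom", "review_ids": [2]},
}
properties = {
    101: {"_id": 101, "name": "City View", "review_ids": [1, 2]},
    102: {"_id": 102, "name": "Uni Life", "review_ids": [3]},
}


def resolve(doc, review_collection):
    """Fetch the referenced reviews by ID (one keyed lookup per ID, no scan)."""
    return [review_collection[rid] for rid in doc["review_ids"]]


bob_reviews = resolve(users[1], reviews)
print([(properties[r["property_id"]]["name"], r["comment"]) for r in bob_reviews])
```

Against a real server, the same resolution step could be done with a query such as `find({"_id": {"$in": review_ids}})`, or inside an aggregation pipeline with the `$lookup` stage.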
Data duplication
Without relations between documents, a simple solution is to embed one document inside another.
It is fast to read since no further database lookup is required, but it has two problems:
- Every user review is duplicated between a user document and a property document.
- When a user modifies a review, the change must be written twice to the database: once to the user document and once to the property document.
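The double write can be sketched with one review embedded in both a user document and a property document. The returned write count is a stand-in for database round trips; if either update were missed, the two copies would disagree.

```python
user_doc = {"name": "Bob", "reviews": [
    {"review_id": 1, "property": "City View", "comment": "Too much dust"},
]}
property_doc = {"name": "City View", "reviews": [
    {"review_id": 1, "username": "Bob", "comment": "Too much dust"},
]}


def update_embedded(doc, review_id, new_comment):
    """Update one embedded copy of a review; return 1 if a copy was written."""
    for review in doc["reviews"]:
        if review["review_id"] == review_id:
            review["comment"] = new_comment
            return 1
    return 0


# The same edit has to be applied to every copy of the review.
writes = update_embedded(user_doc, 1, "Dust cleaned up")
writes += update_embedded(property_doc, 1, "Dust cleaned up")
print(writes)  # 2
```

On a real server each of these would typically be a separate `update_one` call using the positional operator, for example `{"$set": {"reviews.$.comment": "Dust cleaned up"}}`, issued once per collection.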
Opportunities
What can MongoDB be used for when it is not the main production database?
MongoDB can act as a complementary database for functions such as caching and buffering.
Development database
Since data types and structure do not need to be planned in advance, MongoDB allows faster prototyping and easier migration of data in or out.
Intermediate (cache) database
MongoDB is more forgiving with data types than databases such as MySQL, which makes it a good choice as an intermediate database, that is, a buffer in front of another database backend. Data of known or unknown types can be written to MongoDB first and then moved to a more permanent database for storage or other purposes.
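A minimal sketch of the buffering idea, under stated assumptions: a Python list stands in for the MongoDB staging collection, an in-memory SQLite table stands in for the permanent typed database, and the field names and normalization rules are hypothetical.

```python
import sqlite3

# Staging "collection": records are accepted as-is, whatever their types.
staging = [
    {"name": "Mr Bob", "phone": 12345678, "height": 123.45},
    {"name": "Tom", "phone": "012.9876543"},
    {"state": "liquid", "formula": "H2O"},  # not a contact record
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (name TEXT NOT NULL, phone TEXT NOT NULL)")

moved, skipped = 0, 0
for doc in staging:
    if "name" not in doc or "phone" not in doc:
        skipped += 1  # left in the buffer for a different destination
        continue
    # Normalize phone to a single type before inserting into the typed table.
    conn.execute("INSERT INTO contact VALUES (?, ?)", (doc["name"], str(doc["phone"])))
    moved += 1
conn.commit()

print(moved, skipped)  # 2 1
print(conn.execute("SELECT name, phone FROM contact ORDER BY name").fetchall())
```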
Since MongoDB can be scaled horizontally very easily, it can also act as a cache for other slower databases that do not scale as well.
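One common way to use a database as a cache is the cache-aside pattern, sketched below. A dict stands in for the MongoDB cache collection, and `slow_backend_lookup` is a hypothetical placeholder for the slower database; neither is a real API. On a real deployment, a TTL index (`expireAfterSeconds`) could expire cached documents automatically.

```python
import time

cache = {}
stats = {"backend_calls": 0}


def slow_backend_lookup(key):
    """Hypothetical slow query against the primary database."""
    stats["backend_calls"] += 1
    time.sleep(0.01)  # simulated latency
    return {"_id": key, "value": key.upper()}


def get(key):
    """Cache-aside read: try the cache first, fall back to the backend."""
    doc = cache.get(key)
    if doc is None:  # cache miss
        doc = slow_backend_lookup(key)
        cache[key] = doc  # populate the cache for later reads
    return doc


get("city-view")  # miss: goes to the backend
get("city-view")  # hit: served from the cache
print(stats["backend_calls"])  # 1
```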
Threats
Relational databases are being improved
Relational databases are starting to support JSON-formatted records and are gradually adding operators for them: MySQL has offered a native JSON data type since version 5.7.8, and PostgreSQL a json type since version 9.2 (followed by the binary jsonb type in 9.4).
Since the JSON-like format is one of the reasons MongoDB is so flexible, its adoption by relational databases may reduce the need for MongoDB. As other technologies evolve, MongoDB may not have as big an edge in flexibility and scalability in the future.
Alternatives to MongoDB
There are other “NoSQL” database management systems that can compete with MongoDB on multiple fronts: JSON-like format, horizontal scalability, ease of use, etc.
CouchDB
CouchDB shares several key features with MongoDB:
- Document-based database
- JSON-like format
- Similar query structure
- Scalability (clustering)
Yet, CouchDB has some additional features:
- REST API for most operations: shell access, CRUD operations and clustering
- A complementary browser-based database, PouchDB, which can replicate data from CouchDB to the browser for offline use.
CouchDB puts more emphasis on being resilient against adverse computing environments, such as high network latency, loss of connection, limited bandwidth, etc. So, it may be more suitable for applications that are expected to be frequently offline.
CouchDB may run slower than MongoDB, since MongoDB communicates over a binary wire protocol, while CouchDB relies on HTTP APIs.
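To show what "REST API for CRUD operations" looks like in practice, the sketch below builds (but does not send) the HTTP request that creates a CouchDB document: a `PUT` to `/{database}/{document_id}` with a JSON body. The host assumes CouchDB's default port 5984; the database name `rentals` and the document ID are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumes a local CouchDB on its default port

doc = {"name": "City View", "owner": "John"}
request = urllib.request.Request(
    f"{BASE_URL}/rentals/city-view",  # hypothetical database and document ID
    data=json.dumps(doc).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)

print(request.get_method(), request.full_url)
# Sending it requires a running CouchDB with credentials configured:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Reading the document back would be a `GET` on the same URL; every operation goes through HTTP, which is the overhead the paragraph above refers to.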
PostgreSQL
PostgreSQL is primarily a relational database, but it also supports a good selection of operators for JSON data, allowing it to process non-relational queries for fields with JSON data.
It can be an acceptable middle ground for those who are already invested in relational databases yet want to store JSON data as well. It may also suit existing document-oriented database users who want to migrate to a relational database while keeping the option to store JSON data.