Detailed presentation of MongoDB

Features

  • NoSQL
  • Automatic sharding
  • Own query language, similar to JavaScript, integrated into the drivers
  • BSON - Binary JSON
  • Aggregation framework (similar to GROUP BY)
    • $group
  • Lookup framework (similar to LEFT JOIN)
    • $lookup
  • Native support for MapReduce
  • MongoDB Connector for BI
    • Allows traditional BI tools to access semi-structured and unstructured data
  • Different types of indexes
  1. Compound indexes (reference to multiple fields)
  2. TTL indexes (indexes that expire automatically)
  3. Unique indexes (value in the field or combination of fields must be unique)
  4. Array indexes (when the field to index contains an array of values)
  5. Geospatial indexes (to optimize queries related to location in a 2D space)
  6. Sparse indexes (when a field is not present in all documents)
  7. Partial indexes (since 3.2) (indexing is based on an expression, e.g. {state: "active"})
  8. Hashed indexes (primarily used for hash-based sharding)
  9. Text search indexes (specialized index for text search using linguistic rules, …)
  • Document validation within MongoDB: users can enforce checks on document structure, data types, data ranges, and the presence of mandatory fields.
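
As a rough illustration of the index types and the document validation listed above, the sketch below writes them as plain Python data, in the shape that PyMongo's `create_index` and `create_collection` accept. All field names (`surname`, `created_at`, `email`, ...), the `people` collection name and the `RecordingCollection` stand-in are assumptions made for illustration; against a real deployment, a PyMongo collection object would be passed instead of the stand-in.

```python
# (keys, options) pairs in the shape PyMongo's
# Collection.create_index(keys, **options) accepts.
# Every field name below is hypothetical.
INDEX_SPECS = [
    ([("surname", 1), ("first_name", 1)], {}),            # 1. compound
    ([("created_at", 1)], {"expireAfterSeconds": 3600}),  # 2. TTL (one hour)
    ([("email", 1)], {"unique": True}),                   # 3. unique
    ([("cars.model", 1)], {}),                            # 4. array field
    ([("location", "2d")], {}),                           # 5. geospatial (2D)
    ([("nickname", 1)], {"sparse": True}),                # 6. sparse
    ([("city", 1)],                                       # 7. partial
     {"partialFilterExpression": {"state": "active"}}),
    ([("user_id", "hashed")], {}),                        # 8. hashed
    ([("biography", "text")], {}),                        # 9. text search
]

# Validation rule written with query operators: surname must be a string,
# city must be present.
PERSON_VALIDATOR = {
    "surname": {"$type": "string"},
    "city": {"$exists": True},
}
# With PyMongo this could be applied as:
#   db.create_collection("people", validator=PERSON_VALIDATOR)


def apply_indexes(collection):
    """Create every index in INDEX_SPECS; returns the index names."""
    return [collection.create_index(keys, **opts) for keys, opts in INDEX_SPECS]


class RecordingCollection:
    """Stand-in for a real collection so the sketch runs without a server."""

    def __init__(self):
        self.calls = []

    def create_index(self, keys, **options):
        self.calls.append((keys, options))
        return "_".join(f"{field}_{kind}" for field, kind in keys)


recorder = RecordingCollection()
print(apply_indexes(recorder)[0])  # surname_1_first_name_1
```

Keeping the definitions as data makes them easy to review and to apply identically in every environment.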

Key points

Can be spread across low-cost commodity hardware.
Native sharding mechanism to build a fully resilient implementation without adding (costly) specific options to the stack.
Migration from an RDBMS to MongoDB is possible with tools; of course, the way the applications work should then be reviewed and adapted.
Some BI vendors, such as MicroStrategy, provide a native MongoDB connector that does not use SQL.
Write operations are ACID at the single-document level.
Uses write-ahead logging to an on-disk journal. Change operations are written first into the journal; in case of server failure or error before writing in the database, the journaled operations can be reapplied.
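
The journaling behaviour described above can be illustrated with a toy write-ahead log. This is a conceptual sketch only, not MongoDB's actual journal format or recovery code; the `ToyStore` class and its methods are hypothetical.

```python
import json


class ToyStore:
    """Key-value store that journals each change before applying it."""

    def __init__(self):
        self.journal = []  # stands in for the on-disk journal
        self.data = {}     # stands in for the database files

    def write(self, key, value, crash_before_apply=False):
        # 1. Record the change in the journal first.
        self.journal.append(json.dumps({"key": key, "value": value}))
        if crash_before_apply:
            return  # simulated failure: the data files never saw the change
        # 2. Only then apply it to the data files.
        self.data[key] = value

    def recover(self):
        # Replay every journaled operation. Replaying is safe here because
        # each entry simply sets a key to a value.
        for entry in self.journal:
            op = json.loads(entry)
            self.data[op["key"]] = op["value"]


store = ToyStore()
store.write("a", 1)
store.write("b", 2, crash_before_apply=True)
print(store.data)  # {'a': 1}
store.recover()
print(store.data)  # {'a': 1, 'b': 2}
```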

Parallel with RDBMS

Terminology translation

RDBMS       MongoDB
Database    Database
Table       Collection
Row         Document
Index       Index
JOIN        Embedded document, document reference or $lookup
GROUP BY    Aggregation, $group
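
To make the last two rows of the table concrete, here is a minimal sketch of an aggregation pipeline that groups and then joins. The `orders` and `customers` collections and the `customer_id` and `amount` fields are hypothetical; the pipeline is plain Python data in the form PyMongo's `aggregate` accepts.

```python
# Rough SQL counterpart:
#   SELECT customer_id, SUM(amount) AS total FROM orders GROUP BY customer_id
#   followed by a LEFT JOIN on customers._id = customer_id
PIPELINE = [
    # GROUP BY customer_id, summing the amount field.
    {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},
    # LEFT JOIN: matching customers are attached as an array named "customer";
    # the array is empty when there is no match.
    {"$lookup": {
        "from": "customers",
        "localField": "_id",
        "foreignField": "_id",
        "as": "customer",
    }},
]


def totals_per_customer(orders):
    """Run the pipeline on a PyMongo-style collection object."""
    return list(orders.aggregate(PIPELINE))
```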

Equivalent queries

RDBMS                                                         MongoDB
INSERT INTO n tables                                          insert() to 1 document
SELECT and JOIN n tables                                      find() single document
INSERT INTO "review" table, foreign key to product document   insert() to "review" collection, reference to initial document
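
The rows above can be sketched as follows. The document shapes, the field names and the `save` / `load_product` helpers are hypothetical; `insert_one` and `find_one` are the PyMongo collection methods the helpers assume.

```python
import uuid  # stands in for ObjectId so the sketch needs no driver

product_id = uuid.uuid4().hex

# One insert replaces INSERTs into several tables: related rows are embedded.
product = {
    "_id": product_id,
    "name": "Kettle",
    "price": {"amount": 25, "currency": "EUR"},  # would be a separate table
    "tags": ["kitchen", "electric"],             # would be a join table
}

# A review lives in its own collection and refers back to the product,
# playing the role of a foreign key.
review = {"product_id": product_id, "rating": 4, "text": "Boils fast"}


def save(db):
    """Write both documents using PyMongo-style collections."""
    db["product"].insert_one(product)
    db["review"].insert_one(review)


def load_product(db):
    # A single lookup by _id returns the whole product, no JOIN needed.
    return db["product"].find_one({"_id": product_id})
```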

Document sample

{
   first_name: "Paul",
   surname: "Miller",
   city: "London",
   location: [45.123,47.232],
   cars: [
       { model: "Bentley",
         year: 1973,
         value: 100000, … },
       { model: "Rolls Royce",
         year: 1965,
         value: 330000, … },
   ]
}
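
A short sketch of how such a document might be queried. The sample is rewritten as a Python dict with the elided fields left out; the filter and projection are plain data in the form PyMongo's `find` accepts, and the `find_owners` helper is hypothetical.

```python
# The sample document as a Python dict (elided fields omitted).
person = {
    "first_name": "Paul",
    "surname": "Miller",
    "city": "London",
    "location": [45.123, 47.232],
    "cars": [
        {"model": "Bentley", "year": 1973, "value": 100000},
        {"model": "Rolls Royce", "year": 1965, "value": 330000},
    ],
}

# Dot notation reaches into the embedded array: match people who own
# at least one car built before 1970.
OWNERS_OF_PRE_1970_CARS = {"cars.year": {"$lt": 1970}}

# Projection: return only the name fields and hide _id.
NAME_ONLY = {"first_name": 1, "surname": 1, "_id": 0}


def find_owners(people):
    """Run the query on a PyMongo-style collection object."""
    return list(people.find(OWNERS_OF_PRE_1970_CARS, NAME_ONLY))


# Plain-Python equivalent of what the filter tests for this one document.
print(any(car["year"] < 1970 for car in person["cars"]))  # True
```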

Migration to MongoDB

Use the mongoimport tool.

Use any ETL tool (Informatica, Pentaho, Talend with native connector).
Many migrations involve running the existing RDBMS in parallel with the new MongoDB database, incrementally transferring production data:
  • As records are retrieved from the RDBMS, the application writes them back out to MongoDB in the required document schema.
  • Consistency checkers, for example using MD5 checksums, can be used to validate the migrated data.
  • All newly created or updated data is written to MongoDB only.
Shutterfly used this incremental approach to migrate the metadata of 6 billion images and 20 TB of data from Oracle to MongoDB.
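
The incremental approach above could look roughly like the sketch below. The row layout, the target document schema and all function names are assumptions; `replace_one(..., upsert=True)` is the PyMongo call assumed for the write, and the checksum is taken over a canonical JSON rendering so that key order does not matter.

```python
import hashlib
import json


def to_document(customer_row, car_rows):
    """Fold one parent row and its child rows into a single document."""
    return {
        "_id": customer_row["id"],
        "first_name": customer_row["first_name"],
        "surname": customer_row["surname"],
        "cars": [{"model": c["model"], "year": c["year"]} for c in car_rows],
    }


def checksum(document):
    """MD5 over a canonical JSON rendering (sorted keys, fixed separators)."""
    canonical = json.dumps(document, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def migrate_on_read(customer_row, car_rows, target):
    """Write a record to the new store when it is read from the old one."""
    doc = to_document(customer_row, car_rows)
    target.replace_one({"_id": doc["_id"]}, doc, upsert=True)
    return checksum(doc)


def is_consistent(expected_checksum, stored_document):
    """Consistency check: does the stored document still hash the same?"""
    return checksum(stored_document) == expected_checksum
```

Documents holding types that JSON cannot represent (dates, ObjectId values) would need a custom serializer before hashing.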

Security in MongoDB

A MongoDB Enterprise Advanced subscription is required to have all the advanced security features:
  • Kerberos
  • LDAP
  • Encryption
  • Audit logs
  • FIPS compliance
 
MongoDB ------> Kerberos / LDAP -------> Enterprise User Directory (Active Directory or LDAP)
 
Without a subscription, only internal user authentication is available.
A key file is used to authenticate nodes at the cluster level. X.509 certificates can also be used.
User authentication inside MongoDB is done at a per-database level:
  • Using Linux PAM integration
  • LDAP authentication, not authorization (in MongoDB Enterprise Advanced)
  • Kerberos => supports Active Directory as a central repository for users (in MongoDB Enterprise Advanced)
  • Support for X.509 certificate authentication (in MongoDB Enterprise Advanced)
  • Also integrates with Red Hat Identity Management
Roles provide fine-grained privileges for users or applications (at DB and collection level)
  • You can have the privilege to insert data but not to update or delete
  • Privileges to create DB or collections
  • Cluster-wide privileges for some users
=> roughly follows the granularity of privileges in an RDBMS, allowing segregation of duties
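
As an illustration of the "insert but not update or delete" case, the sketch below builds a `createRole` command document. The database name, collection name and role name are hypothetical; `find`, `insert`, `update` and `remove` are MongoDB privilege actions.

```python
def insert_only_role(db_name, collection_name):
    """Build a createRole command granting read and insert only."""
    return {
        "createRole": f"{collection_name}InsertOnly",
        "privileges": [
            {
                "resource": {"db": db_name, "collection": collection_name},
                # "update" and "remove" are deliberately left out.
                "actions": ["find", "insert"],
            }
        ],
        "roles": [],  # inherits from no other role
    }


command = insert_only_role("shop", "review")
print(command["createRole"])  # reviewInsertOnly
# With PyMongo this could be sent as: client["shop"].command(command)
```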
 
Possibility to restrict the returned content of a document (removing some fields) - done via the redaction stage ($redact) of the Aggregation Pipeline
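
A minimal sketch of such a redaction stage, assuming each (sub)document may carry a hypothetical `level` field and that anything marked "restricted" must be hidden. The `visible_people` helper is also hypothetical.

```python
REDACT_RESTRICTED = {
    "$redact": {
        "$cond": {
            "if": {"$eq": ["$level", "restricted"]},
            "then": "$$PRUNE",    # drop this (sub)document entirely
            "else": "$$DESCEND",  # keep it and inspect nested documents
        }
    }
}


def visible_people(people):
    """Run the redaction on a PyMongo-style collection object."""
    return list(people.aggregate([REDACT_RESTRICTED]))
```

When the goal is only to hide a fixed set of named fields, a `$project` stage is usually the simpler choice; `$redact` is useful when visibility depends on values stored in the document itself.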
Audit of administrative actions, schema actions and data manipulation
  • Filtering of the logs is possible
  • Logs can be written to multiple locations in a variety of formats (JSON, BSON, Syslog, …)
Encryption
  • Of the communication links
  • Of the storage (encrypted storage engine since version 3.2)
    • Transparent for the application
    • Supports KMIP for integration with third-party key management appliances
    • From user testing, storage engine encryption reduces performance by 15%
  • Supports FIPS-validated cryptographic modules (FIPS 140-2)
With MongoDB Enterprise Advanced, you also get the MongoDB Ops Manager:
  • Monitoring (with graphs of metrics covering many database parameters)
  • Backups
  • Point-in-time recovery