MongoDB Interview Questions

Here are MongoDB interview questions and answers for candidates from fresher to experienced levels. They cover basic NoSQL concepts and CRUD operations as well as advanced topics such as the Aggregation Pipeline, indexing strategies, replication, sharding, and real-world scenario-based questions.

What is MongoDB and why is it classified as a NoSQL database?
Answer: MongoDB is a source-available (SSPL-licensed), document-oriented NoSQL database that stores data in flexible, JSON-like documents encoded as BSON (Binary JSON). It is considered a NoSQL database because it does not use traditional tables, rows, or rigid schemas like relational (SQL) databases. Instead, it uses collections and documents, allowing for a dynamic schema and hierarchical data structures. This design suits applications whose data is nested or changes shape over time, and it scales horizontally through sharding.

How does a document database differ from a relational database?
Answer: A relational database organizes data into tables with fixed rows and columns, where relationships are maintained through foreign keys and JOIN operations. In contrast, MongoDB stores data as flexible, self-contained documents. A single document can embed related data, which often eliminates the need for expensive JOINs. This leads to faster reads for certain workloads and better horizontal scalability.

What is a collection in MongoDB?
Answer: A collection is a grouping of MongoDB documents. It is the equivalent of a table in a relational database. Collections do not enforce a document schema by default; the documents within a single collection can have different fields and structures unless validation rules are defined.

What is a document in MongoDB?
Answer: A document is the basic unit of data in MongoDB. It is a set of key-value pairs stored in BSON format (a binary representation of JSON). A document’s structure is flexible, and fields can contain arrays, sub-documents, and various data types.

What is BSON, and why does MongoDB use it instead of JSON?
Answer: BSON (Binary JSON) is a binary-encoded serialization format that extends JSON. It supports additional data types, such as Date, binary data, ObjectId, and distinct integer and decimal types, and it prefixes elements with type and length information so that documents can be traversed quickly. MongoDB uses BSON for storage and data exchange because it is fast to encode, decode, and scan; it is not always more compact than the equivalent JSON text.

What are the default indexes in MongoDB?
Answer: By default, MongoDB creates a unique index on the _id field for every collection. This index is on the primary key and prevents two documents from having the same _id value. You cannot remove this index.

Explain the CRUD operations in MongoDB and provide the corresponding methods.
Answer: CRUD stands for Create, Read, Update, Delete.

Create: insertOne(), insertMany().
Read: findOne(), find().
Update: updateOne(), updateMany(), replaceOne().
Delete: deleteOne(), deleteMany().

What is the difference between db.collection.find() and db.collection.findOne()?
Answer: The find() method returns a cursor to a set of documents that match the query filter. It can return zero, one, or many documents. The findOne() method returns a single document (the first match) or null if no documents match. It is a shorthand for find() with a limit of 1 and automatically returns the document object instead of a cursor.

How do you update documents in MongoDB? List the important update operators.
Answer: You can update documents using the updateOne(), updateMany(), or replaceOne() methods. Important update operators include:

$set: Sets the value of a field in a document.
$unset: Removes a specified field from a document.
$inc: Increments the value of a field by a specified amount.
$push: Appends a value to an array field.
$addToSet: Adds a value to an array only if it does not already exist.
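
To make the operator semantics concrete, here is a minimal in-memory sketch in plain JavaScript. The applyUpdate helper is hypothetical and only mimics top-level fields with primitive values; it is not how MongoDB implements updates, and the real operators also handle dotted paths, modifiers such as $each, and type errors.

```javascript
// Hypothetical helper: applies a small subset of MongoDB update operators
// to a plain object. Top-level fields and primitive array values only.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // work on a copy
  for (const [field, value] of Object.entries(update.$set ?? {})) {
    result[field] = value;
  }
  for (const field of Object.keys(update.$unset ?? {})) {
    delete result[field];
  }
  for (const [field, amount] of Object.entries(update.$inc ?? {})) {
    // A missing field is created and set to the increment amount.
    result[field] = (result[field] ?? 0) + amount;
  }
  for (const [field, value] of Object.entries(update.$push ?? {})) {
    // $push always appends, even if the value is already present.
    result[field] = [...(result[field] ?? []), value];
  }
  for (const [field, value] of Object.entries(update.$addToSet ?? {})) {
    // $addToSet appends only when the value is not already in the array.
    const current = result[field] ?? [];
    result[field] = current.includes(value) ? current : [...current, value];
  }
  return result;
}

const before = { _id: 1, name: "Ada", tags: ["admin"], legacy: true };
const after = applyUpdate(before, {
  $set: { name: "Ada Lovelace" },
  $unset: { legacy: "" },
  $inc: { logins: 1 },
  $push: { tags: "admin" },
  $addToSet: { roles: "editor" },
});
console.log(after);
```

The tags array ends up with a duplicate "admin" because $push does not check for existing values, while logins is created with the value 1 because $inc treats a missing field as a starting point.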

What is the explain() method used for?
Answer: The explain() method provides information about the query execution plan. It helps in performance analysis. It can be used with find(), count(), update(), and remove() operations to see how MongoDB executes the query, including details like index usage, number of documents scanned, and execution time.

What is the Aggregation Framework in MongoDB?
Answer: The Aggregation Framework is MongoDB’s tool for data processing and transformation, covering roughly what GROUP BY, JOIN, and computed columns do in SQL. It operates through a pipeline where documents pass through multiple stages, each performing a specific operation (like filtering, grouping, sorting, or reshaping). It is more versatile than simple find() queries and is the usual choice for analytics and reporting.

What are the main stages of an aggregation pipeline?
Answer: Common stages include:

$match: Filters documents (similar to find()).
$group: Groups documents for aggregation (e.g., sum, avg).
$project: Reshapes documents, adds new fields, or excludes existing ones.
$sort: Sorts documents.
$limit / $skip: Paginates results.
$unwind: Deconstructs an array field into multiple documents.
$lookup: Performs a left outer join with another collection.
$addFields: Adds new fields to documents.
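
As an illustration of how several of these stages chain together, the sketch below defines a pipeline as a plain JavaScript array. The orders collection and the status, customer_id, and amount fields are assumptions made up for this example.

```javascript
// Hypothetical orders documents: { customer_id, status, amount }.
const pipeline = [
  { $match: { status: "shipped" } },                               // filter first
  { $group: { _id: "$customer_id", total: { $sum: "$amount" } } }, // total per customer
  { $sort: { total: -1 } },                                        // largest totals first
  { $limit: 5 },                                                   // keep the top five
];

// In mongosh this array would be passed as: db.orders.aggregate(pipeline)
const stageNames = pipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageNames.join(" -> "));
```

Placing $match first is a deliberate choice: it reduces the number of documents the later stages have to process and lets the filter use an index.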

What is the $lookup stage? Provide an example.
Answer: The $lookup stage performs a left outer join between two collections, similar to SQL JOIN. It’s commonly used to combine data. Example: joining an orders collection with a users collection:

javascript

db.orders.aggregate([
  {
    $lookup: {
      from: "users",
      localField: "user_id",
      foreignField: "_id",
      as: "user_details"
    }
  }
])

This adds a user_details array to each orders document containing the matching users documents; the array is empty when there is no match.

What is the purpose of the $unwind stage?
Answer: The $unwind stage deconstructs an array field from input documents to output a separate document for each element of the array. This is essential before performing $group operations on the elements of an array.
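
The effect can be sketched in plain JavaScript. The unwind function below is a hypothetical stand-in that mimics the stage's default behaviour on a top-level field; the sample documents are invented.

```javascript
// Hypothetical helper mimicking the default behaviour of $unwind on a
// top-level field: one output document per array element.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const values = doc[field];
    // By default, documents whose field is missing or null are dropped.
    if (values === undefined || values === null) return [];
    // A non-array value is treated as a single element.
    if (!Array.isArray(values)) return [doc];
    // An empty array produces no output documents.
    return values.map((value) => ({ ...doc, [field]: value }));
  });
}

const orders = [
  { _id: 1, items: ["pen", "ink"] },
  { _id: 2, items: [] },
  { _id: 3 },
];
const unwound = unwind(orders, "items");
console.log(unwound);
```

Only order 1 survives, as two documents with items set to "pen" and "ink". In MongoDB, the preserveNullAndEmptyArrays option keeps documents like orders 2 and 3 in the output.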

What are the different types of indexes in MongoDB?
Answer: The main index types are:

Single Field Index: Index on a single field.
Compound Index: Index on multiple fields to support queries on those fields.
Multikey Index: Index on fields that contain array values; MongoDB automatically creates an index entry for each element in the array.
Text Index: Supports text search on string content.
Geospatial Index: Supports queries on geospatial coordinate data (2dsphere for GeoJSON on a sphere, 2d for legacy flat coordinate pairs).
Hashed Index: Indexes the hash of a field’s value, used for sharding with hashed shard keys.
TTL Index: A special single-field index that automatically removes documents after a specified time.

How does indexing affect write performance?
Answer: Indexes improve read performance by reducing the number of documents MongoDB must scan. However, each index slows down write operations (INSERT, UPDATE, DELETE) because every write operation must also update all indexes on that collection. The trade-off is a balance between read speed and write throughput.

What is a compound index, and how do you determine field order?
Answer: A compound index is an index on multiple fields. The order of fields is critical for query efficiency. A common guideline is the ESR rule: fields matched by equality first, then fields used for sorting, then fields used in range filters. Among equality fields, placing the more selective field first is a reasonable heuristic rather than a strict rule. The index can support queries on any prefix of the indexed fields. For example, an index on {state: 1, city: 1} can support queries on state alone and on both state and city, but not efficiently on city alone.
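
The prefix rule can be expressed as a small check. This is a simplified sketch: the usesIndexPrefix function is hypothetical, and it only tests whether the queried fields exactly cover a leading prefix of the index, whereas the real query planner weighs many more factors.

```javascript
// Returns true when the queried fields exactly cover a leading prefix of
// the index fields. The order of fields inside the query does not matter.
function usesIndexPrefix(indexFields, queryFields) {
  if (queryFields.length === 0 || queryFields.length > indexFields.length) {
    return false;
  }
  const prefix = indexFields.slice(0, queryFields.length);
  return queryFields.every((field) => prefix.includes(field));
}

const index = ["state", "city"]; // models the index {state: 1, city: 1}
console.log(usesIndexPrefix(index, ["state"]));         // true
console.log(usesIndexPrefix(index, ["city", "state"])); // true
console.log(usesIndexPrefix(index, ["city"]));          // false
```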

What is a covered query?
Answer: A covered query is one where all the fields in the query filter and all the fields returned by the projection are part of the same index. This usually means excluding _id from the projection unless _id is in the index. In this case, MongoDB can satisfy the query entirely from the index without fetching the actual documents, which makes it very fast.

What are Replica Sets in MongoDB?
Answer: A Replica Set is a group of MongoDB servers (nodes) that maintain the same data set, providing redundancy and high availability. A Replica Set typically has one primary node that receives all write operations and multiple secondary nodes that replicate the primary’s data. If the primary node fails, the set automatically elects a new primary.

What is the role of the Oplog (operations log) in replication?
Answer: The Oplog (operations log) is a special capped collection in the local database. The primary node records all write operations in its Oplog. Secondary nodes continuously read operations from the primary’s Oplog and apply them to their own data sets, ensuring eventual consistency.

What is Sharding in MongoDB?
Answer: Sharding is MongoDB’s method for distributing data across multiple machines horizontally. It is used to support deployments with very large data sets and high throughput operations. A sharded cluster consists of shards (data nodes), query routers (mongos), and config servers that store cluster metadata.

What is a Shard Key?
Answer: A shard key is a field or set of fields used to partition data across the shards. Choosing an appropriate shard key is vital for even data distribution and query performance. A poor shard key can lead to hot spots where a single shard handles most of the traffic.

What are the different sharding strategies?
Answer: There are two main strategies:

Hashed Sharding: Distributes data evenly by sharding on a hash of the shard key. It suits keys with monotonically changing values like timestamps, but range queries on the shard key must be sent to all shards.
Range Sharding: Splits data into ranges based on the shard key value. It allows for efficient range queries but can lead to uneven distribution and hot shards when the key values are skewed or monotonically increasing.
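
The difference shows up clearly with a monotonically increasing key. The simulation below is a toy model under stated assumptions: three shards, equal-width key ranges, and a simple multiplicative hash that stands in for MongoDB's real hash function.

```javascript
const SHARDS = 3;

// Range sharding stand-in: contiguous key ranges map to one shard each.
function rangeShard(key, maxKey) {
  return Math.min(SHARDS - 1, Math.floor((key / maxKey) * SHARDS));
}

// Hashed sharding stand-in: a toy multiplicative hash, NOT MongoDB's
// actual hash function.
function hashedShard(key) {
  return (Math.imul(key, 2654435761) >>> 0) % SHARDS;
}

// Simulate the 300 most recent inserts of a monotonically increasing key.
const maxKey = 3000;
const recentKeys = Array.from({ length: 300 }, (_, i) => maxKey - 300 + i);

function countPerShard(shardFn) {
  const counts = new Array(SHARDS).fill(0);
  for (const key of recentKeys) counts[shardFn(key)] += 1;
  return counts;
}

const rangeCounts = countPerShard((key) => rangeShard(key, maxKey));
const hashedCounts = countPerShard(hashedShard);
console.log("range :", rangeCounts); // every recent insert lands on the last shard
console.log("hashed:", hashedCounts);
```

With range sharding all 300 recent writes hit a single shard, which is the hot spot described above; the hashed variant spreads the same keys over all three shards.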

What is a mongos instance?
Answer: A mongos instance is a query router. It directs client read and write operations to the appropriate shard(s) in the cluster. It has no persistent data; it caches metadata from the config servers to know where to route requests.

What are Transactions in MongoDB?
Answer: MongoDB supports multi-document ACID transactions on replica sets since version 4.0 and on sharded clusters since version 4.2. A transaction is a logical unit of work that includes one or more read and write operations. All operations in a transaction are executed as a whole; if any part fails, the entire transaction is aborted and no changes are made, ensuring data consistency. Single-document writes have always been atomic, so transactions are only needed when a change spans multiple documents.

How do you handle schema design for a one-to-many relationship (e.g., a blog post and its comments)?
Answer: There are two main approaches:

Embedding: Store the comments as an array inside the blog post document. This is best if the number of comments is small, updates are relatively infrequent, and you always load the post with its comments.
Referencing: Store each comment in a separate collection with a reference back to the post’s _id. Use this when the “many” side is unbounded or large, updates are frequent, or you frequently want to load posts without their comments.
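
The two shapes look like this as plain JavaScript objects. All field names and values are invented for illustration.

```javascript
// Embedding: comments live inside the post document.
const embeddedPost = {
  _id: "post1",
  title: "Schema design",
  comments: [
    { author: "ana", text: "Nice post" },
    { author: "raj", text: "Thanks" },
  ],
};

// Referencing: each comment is its own document pointing back to the post.
const post = { _id: "post1", title: "Schema design" };
const comments = [
  { _id: "c1", post_id: "post1", author: "ana", text: "Nice post" },
  { _id: "c2", post_id: "post1", author: "raj", text: "Thanks" },
  { _id: "c3", post_id: "post2", author: "ana", text: "Other post" },
];

// Loading a referenced post with its comments needs a second lookup by post_id.
const commentsForPost = comments.filter((c) => c.post_id === post._id);
console.log(embeddedPost.comments.length, commentsForPost.length); // 2 2
```

With referencing, an index on post_id in the comments collection keeps that second lookup cheap.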

What are the trade-offs of embedding vs. referencing in MongoDB?
Answer:

Embedding: Provides better read performance (one round trip), atomic writes for the whole document, but can lead to document growth and document size limit (16MB) issues.
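Referencing: Allows for unbounded relationships and avoids data duplication, but reading related data takes either two application-side queries or a single aggregation with $lookup, both of which cost more than reading one embedded document.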

How can you enforce data integrity and validation in MongoDB?
Answer: MongoDB provides JSON Schema Validation. You can define validation rules for a collection to enforce document structure, required fields, data types, and allowed values. This allows you to maintain data quality without the fixed schema of SQL.
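
A validator is an ordinary document built around the $jsonSchema operator. The sketch below assumes a hypothetical users collection; the field names and rules are examples, not a recommended schema.

```javascript
// Hypothetical validator for a "users" collection.
const userValidator = {
  $jsonSchema: {
    bsonType: "object",
    required: ["name", "email"],
    properties: {
      name: { bsonType: "string", description: "must be a string" },
      email: { bsonType: "string", pattern: "^.+@.+$" },
      age: { bsonType: "int", minimum: 0 },
      role: { enum: ["admin", "editor", "viewer"] },
    },
  },
};

// In mongosh this would be applied with:
//   db.createCollection("users", { validator: userValidator })
console.log(JSON.stringify(userValidator, null, 2));
```

Documents that fail validation are rejected on insert and update by default.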

What are the differences between ObjectId and a traditional auto-incrementing integer key?
Answer: ObjectId is a 12-byte identifier usually generated by the client driver. It is unique across shards and distributed systems without a central coordinator. A traditional auto-incrementing integer is simpler but requires a central counter, which becomes a bottleneck in a distributed system. An ObjectId begins with a 4-byte timestamp, so ObjectIds sort roughly by creation time (to the second) and can be used for coarse time-based queries.
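
Because the leading four bytes hold seconds since the Unix epoch, the creation time can be recovered from the hex string alone. The objectIdToDate function is a hypothetical helper written for this example; in mongosh and the drivers, ObjectId exposes getTimestamp() for the same purpose.

```javascript
// The first 4 bytes (8 hex characters) of an ObjectId are seconds since
// the Unix epoch.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdToDate("507f1f77bcf86cd799439011");
console.log(created.toISOString()); // 2012-10-17T21:13:27.000Z
```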

How do you handle large files (e.g., images, videos) in MongoDB?
Answer: MongoDB stores large files using GridFS, a specification for storing and retrieving files that exceed the 16MB BSON document size limit. It splits a file into chunks (default 255KB) and stores them as separate documents in two collections: fs.chunks (file data) and fs.files (file metadata).
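
The number of chunk documents follows directly from the file size and the chunk size. The file size below is a made-up example.

```javascript
const DEFAULT_CHUNK_SIZE = 255 * 1024; // 255 KB default chunk size, in bytes

// Number of documents GridFS would write to fs.chunks for one file.
function gridfsChunkCount(fileSizeBytes, chunkSize = DEFAULT_CHUNK_SIZE) {
  return Math.ceil(fileSizeBytes / chunkSize);
}

const fileSize = 20 * 1024 * 1024; // a hypothetical 20 MiB video
console.log(gridfsChunkCount(fileSize)); // 81
```

The last chunk is only as large as the remaining bytes, so 80 full chunks plus one partial chunk hold the 20 MiB file, alongside a single metadata document in fs.files.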

What is the mongod process?
Answer: The mongod process is the primary daemon process for MongoDB. It handles data requests, manages data access, and is responsible for background data management operations. It is the core database instance.

What are some best practices for MongoDB security?
Answer:

Enable Authentication: Require users and applications to authenticate.
Role-Based Access Control (RBAC): Grant only the minimum necessary privileges.
Enable TLS/SSL: Encrypt data in transit.
Encrypt Data at Rest: Using the encrypted storage engine or file-system/disk encryption.
Network Hardening: Run MongoDB in a trusted network and bind to internal IPs.
Disable HTTP Status Interface: On versions before 3.6, turn off the legacy HTTP interface; later versions no longer include it.

What is the purpose of the mongostat tool?
Answer: mongostat is a command-line tool that provides a quick overview of the status of a running mongod or mongos instance. It reports metrics like inserts, queries, updates, deletes, getmores, commands, flushes, virtual memory, resident memory, and connection count. It is useful for real-time performance monitoring.

What is the mongotop tool?
Answer: mongotop is a diagnostic tool that monitors and reports the read and write activity per collection in a MongoDB database. It helps identify collections that are causing a high load.

How do you ensure high availability in a MongoDB deployment?
Answer: By using Replica Sets. A properly configured replica set ensures that if the primary node goes down, a secondary node is automatically elected as the new primary, allowing the application to continue with minimal downtime.

What is Causal Consistency in MongoDB?
Answer: Causal consistency guarantees that operations within a client session are observed in an order that respects their causal relationships. In practice, if a read is performed after a write in the same causally consistent session, the read sees the effect of that write, even when it is executed on a secondary node. This “read your own writes” guarantee requires majority read concern and majority write concern.

How would you choose a shard key for a large e-commerce application?
Answer: A good shard key must have high cardinality (many distinct values) to prevent ‘hotspots’, provide for even data distribution, and support the most common query patterns. For an e-commerce app, a poor choice would be {category: 1} because it has low cardinality. A better choice might be a hashed key such as {user_id: "hashed"}, which distributes user data evenly and still routes frequent queries for a specific user’s purchase history to a single shard.

What is the difference between a blocking and a non-blocking sort?
Answer: A non-blocking sort occurs when the query can use an index to return documents in sorted order. A blocking sort occurs when no suitable index is available; MongoDB must read all matching documents and sort them in memory before returning any results. Blocking sorts are slow on large result sets and are subject to a memory limit (100 MB by default in recent versions), beyond which the sort must spill to disk (allowDiskUse) or fail.

When would you disable the _id index on a MongoDB collection?
Answer: The _id index is mandatory on a standard collection. You cannot disable or remove it. It is required for the core functionality of the database.

What is a capped collection in MongoDB?
Answer: A capped collection is a fixed-size collection that automatically overwrites its oldest entries when it reaches its maximum size. They maintain insertion order and support high-throughput operations. They are excellent for use cases like logging, caching, or real-time message queuing.

How does MongoDB handle concurrency control?
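Answer: With the WiredTiger storage engine, MongoDB uses multi-granularity locking: operations take intent locks at the global, database, and collection levels, while conflicts between writes are resolved at the document level using optimistic concurrency control. Two writes to different documents in the same collection can therefore proceed concurrently, and a write that conflicts with another on the same document is retried transparently. Reads use snapshot isolation (MVCC), so they see a consistent point-in-time view. Only certain administrative operations, such as dropping a collection, take exclusive collection-level or database-level locks. Since 4.0, multi-document transactions build on the same snapshot mechanism.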

What are the limitations of JOINs in MongoDB, and how does it compensate?
Answer: MongoDB’s native $lookup stage for JOINs can be less efficient for heavy relational operations compared to SQL databases. To compensate, MongoDB encourages data modeling for the application’s access patterns. By embedding related data and denormalizing, applications can often avoid JOINs entirely, resulting in significantly faster reads. If a JOIN is necessary, the Aggregation Pipeline can perform it, though it might be slower.

What is the WiredTiger storage engine?
Answer: WiredTiger is the storage engine introduced in MongoDB 3.0 and the default since MongoDB 3.2. It provides:

Document-level concurrency control for write operations.
Compression for data and indexes.
Snapshots and checkpoints for consistency.
Better write throughput and storage efficiency than the older MMAPv1 engine.

Explain the concept of a Write Concern in MongoDB.
Answer: A Write Concern describes the level of acknowledgment requested from MongoDB for write operations. It is about durability. A value of {w: 1} acknowledges after the primary writes, {w: "majority"} acknowledges after a majority of replica set nodes have written, and {w: 0} provides no acknowledgment.
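
How many acknowledgments {w: "majority"} waits for depends on the size of the replica set. The sketch below assumes every member is a voting, data-bearing member, which is a simplification; arbiters and non-voting members change the calculation.

```javascript
// Acknowledgments needed for w: "majority", assuming every member is a
// voting, data-bearing member (a simplification).
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

for (const n of [1, 3, 5, 7]) {
  console.log(`${n} members -> majority of ${majority(n)}`);
}
```

A three-member set therefore acknowledges a majority write once two members have it, which is why such a write survives the loss of any single node.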

Explain the concept of a Read Preference in MongoDB.
Answer: A Read Preference describes how MongoDB clients route read operations to members of a replica set. It is about distribution and latency. Options include:

primary: All reads go to the primary.
primaryPreferred: Reads go to the primary, else secondary.
secondary: Reads go only to secondary.
secondaryPreferred: Reads go to secondary, else primary.
nearest: Reads go to the member with the lowest network latency.

How would you design a users and posts schema for a social media feed?
Answer: You would use a reference approach because posts can grow unbounded. The users collection would have _id, name, etc. The posts collection would have _id, user_id (reference to users), text, timestamp. To display a feed, you could run an aggregation with a $lookup to join the users collection, or better, denormalize the user’s name into the posts document to avoid a $lookup entirely for the feed view.

What happens when a document exceeds the 16MB BSON limit?
Answer: MongoDB will reject the write operation and throw an error. The application must redesign the document. Common solutions include breaking the document into multiple documents in a new collection and using references (normalizing) or using GridFS for large binary data files.

How can you secure a MongoDB deployment from unauthorized access?
Answer: Enable authorization in the configuration file and create an administrative user in the admin database. Use TLS/SSL certificates to encrypt communication. On versions before 3.6, disable the legacy HTTP interface (it was removed in 3.6). Apply the principle of least privilege with role-based access control (RBAC). Run MongoDB in a secure network, binding to internal IPs and using firewalls to restrict access.

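How do change streams work in MongoDB?
Answer: Change streams allow applications to subscribe to real-time data changes without the complexity and risk of tailing the Oplog directly. An application opens a change stream cursor on a collection, a database, or an entire deployment (replica set or sharded cluster) and receives events for operations such as insert, update, replace, and delete. Streams can be resumed after an interruption using a resume token, which makes them suitable for reactive, event-driven systems.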

What is an $inc operator used for?
Answer: The $inc operator increments the value of a field by a specified amount. If the field does not exist, $inc will create the field and set it to the specified amount.

What is the difference between $push and $addToSet?
Answer: $push always appends an element to the end of an array, even if it already exists, potentially creating duplicates. $addToSet adds the element only if it is not already in the array, ensuring all values remain unique.
