MDB-02 Data Modelling

Table of Contents

    MDB-02 Data Modelling

    Key Features of MongoDB

    • Schema-less design allows you to store data in JSON-like format.
    • Scalability: MongoDB supports horizontal scaling through sharding, making it efficient for big data.
    • High Performance: Designed for fast reads and writes with indexes and in memory capabilities.
    • Replication: Provides data redundancy and high availability

    Documents

    • Documents in MongoDB are the primary unit of storage, and each document is represented as a BSON object.
    • A document contains key-value pairs and can store complex data types like arrays and embedded documents.
    • The _id field is a unique identifier automatically generated for each document
    {
    	"_id": 1,
    	"name": “Pranav",
    	"address": {
    		"street": "123 Elm St",
    		"city": "Somewhere",
    	}
    }

    Collection

    • A collection is like a whole notebook that holds many pages (documents) about a specific topic.
    • In MongoDB, documents that are similar are grouped together in a collection.
    • Example: You might have a collection called "Pets" that contains all the documents for each of your pets

    Schema Design

    • MongoDB allows you to store data without a fixed schema, meaning documents in a collection can have different structures.
    • Schema Design Considerations:
      • Denormalization: Embed related data in a single document. This improves performance when reading, but increases the document size.
      • Normalization: Store data in separate collections and link them using references. This reduces redundancy but may lead to more complex queries.
      • Performance: Design schemas to optimize query performance based on read and write patterns.

    Embedded Documents

    • In embedding, the related data is stored within the same document as a nested structure.
    • Use this approach when data is closely related and usually accessed together.
    • Advantages: Faster read operations, simpler schema.
    {
      "_id": 1,
      "name": "Amit Sharma",
      "age": 16,
      "address": {
        "street": "MG Road",
        "city": "Ahmedabad",
        "pincode": "380015"
      }
    }

    References

    • Use when data is large, or there is a many-to-one or many-to-many relationship.
    • In referencing, related data is stored in separate documents, and a reference (usually an _id) is used to link them
    • Advantages: Avoids document duplication and reduces storage requirements.
    {
      "_id": 1,
      "name": "Amit Sharma",
      "age": 16,
      "address_id": 101
    }
    • Here, the address_id in the student's document references the _id of the address document in the Addresses collection. To get the complete information about a student, you would need to join or query both collections.

    Embedded vs References

    AspectEmbeddingReferencing
    SrtuctureNested data within the same documentRelated data stored separately
    PerformanceFaster reads, fewer queriesSlower reads due to joins
    Data DuplicationMay lead to duplicationReduces duplication
    FlexibilityHarder to u date shared dataEasier to update shared
    data
    Use CaseSimple, closely related dataComplex, reusable data

    MongoDb Data Structures

    • Database: A container for collections.
    • Collection: A group of documents, equivalent to a table in relational databases.
    • Document: A set of key-value pairs, analogous to a row in relational databases. Documents are stored in BSON (Binary JSON) format

    Querying MongoDB

    • MongoDB Query Language (MQL) allows you to query collections in a rich and flexible manner.
    db.collection.find({query})
    • The query returns documents that match the specified criteria.

    Comparison Operator

    • $eq: Matches values that are equal to a specified value.
    db.emp.find({ deptname: { $eq: "IT" } })
    • $ne: Matches values that are not equal.
    db.emp.find({ deptname: { $ne: "IT" } })
    • $gt: Matches values greater than a specified value.
    db.emp.find({ salary: { $gt: 80000 } })
    • $lt: Matches values less than a specified value.
    db.emp.find({ salary: { $lt: 80000 } })
    • $gte: Matches values greater than or equal to.
    db.emp.find({ empid: { $gte: 22 } })
    • $lte: Matches values less than or equal to
    db.emp.find({ empid: { $lte: 5 } })
    • $in: Allows you to match documents where a field’s value is within a specified array
    db.emp.find({ empid: { $in: [1,3,5,8] } })
    • $nin: Selects those documents where the field value is not equal to any of the given values in the array and the field that does not exist.
    db.emp.find({ empid: { $nin: [1,3,5,8] } })

    Logical Operator

    • $and: Joins query clauses with a logical AND
    db.emp.find({ $and: [ { deptname: "Sales" }, { salary: { $gt: 40000 } } ] })
    • $or: Joins query clauses with a logical OR
    db.emp.find({ $or: [ { deptname: "HR" }, { salary: { $lt: 30000 } } ] }
    • $not: Inverts the effect of a query expression.
    db.emp.find({ city: { $not: { $regex: "^B" } } })
    • $nor: Joins query clauses with a logical NOR.
    db.emp.find({$nor: [{salary: 3000}, {deptname: "IT"}]})
    • distinct
    db.emp.distinct(“deptname")

    Element Operator

    • $exists: Matches documents that have the specified field.
    db.emp.find({ gender: { $exists: true } })
    • $type: Selects documents if a field is of the specified type.
    db.emp.find({ salary: { $type: "number" } })

    Projection Operator

    • Retrieve only the name and age fields for all documents.
    db.students.find({}, { name: 1, age: 1 })
    • Exclude the address field from the results
    db.students.find({}, { address: 0 })
    • By default, MongoDB includes the _id field. To exclude it:
    db.students.find({}, { _id: 0, name: 1, age: 1 })

    CRUD Operations

    • MongoDB is a NoSQL database that stores data in a flexible, JSON-like format called BSON (Binary JSON)
    • It is known for its scalability, high performance, and rich querying capabilities.
    • CRUD stands for Create, Read, Update, and Delete.
    • These are the basic operations you can perform on data in a MongoDB database.

    Create Operation

    • To insert a document into a collection, use the insertOne() or insertMany() methods.
    // Switch to a database
     
    use school
    // Insert a single document
     
    db.students.insertOne({
    	name: "John Doe",
    	age: 21,
    	department: "Computer Science"
    })
    // Insert multiple documents
     
    db.students.insertMany([
    	{ name: "Jane Doe", age: 22, department: "Mathematics" },
    	{ name: "Sam Smith", age: 20, department: "Physics" }
    ])

    Read Operation

    • To retrieve documents from a collection, use the find() method.
    // Find all documents in the students collection
     
    db.students.find()
    // Find a document with a specific condition
     
    db.students.find({ age: 21 })
    // Find documents and format the output
     
    db.students.find({}, { name: 1, _id: 0 })

    Update Operation

    • To update documents, use the updateOne(), updateMany() methods.
    // Update a single document
     
    db.students.updateOne(
    	{ name: "John Doe" },
    	{ $set: { age: 22 } }
    )
    // Update multiple documents
     
    db.students.updateMany(
    	{ department: "Physics" },
    	{ $set: { department: "Astrophysics" } }
    )

    Delete Operation

    • To delete documents, use the deleteOne() or deleteMany() methods
    // Delete a single document
     
    db.students.deleteOne({ name: "Sam Smith" })
    // Delete multiple documents
     
    db.students.deleteMany({ age: { $lt: 21 } })

    Aggregation Framework

    • The Aggregation Framework in MongoDB is a powerful tool used for processing and analyzing data.
    • It allows you to transform and summarize data in a collection through a series of stages.
    • Each stage in an aggregation pipeline performs a specific operation on the input documents and passes the result to the next stage.
    • The output of the final stage is the result of the aggregation.

    Basic Stages of Aggregation

    • $match: Filters the documents to pass only those that match the specified condition.

    • $group: Groups input documents by a specified identifier and applies accumulator expressions.

    • $project: Shapes the documents by including or excluding fields.

    • $sort: Sorts the documents in ascending or descending order.

    • $limit: Limits the number of documents passed to the next stage.

    • $skip: Skips over a specified number of documents.

    • Let's consider a sales collection that contains documents about sales transactions.

    use store
    db.sales.insertMany([
    	{ item: "apple", quantity: 10, price: 5, date: new Date("2023-01-01")},
    	{ item: "banana", quantity: 5, price: 2, date: new Date("2023-01-02")},
    	{ item: "orange", quantity: 8, price: 3, date: new Date("2023-01-03")},
    	{ item: "apple", quantity: 3, price: 5, date: new Date("2023-01-04")},
    	{ item: "banana", quantity: 7, price: 2, date: new Date("2023-01-05")}
    ])
    • Using $match:
    // Filter sales of apples
     
    db.sales.aggregate([
    	{ $match: { item: "apple" } }
    ])
    • Using $group:
    // Group sales by item and calculate the total quantity sold:
     
    db.sales.aggregate([
    	{ $group: { _id: "$item", totalQuantity: { $sum: "$quantity" } } }
    ])
    • Using $project:
    // Project only the item and total cost (quantity * price):
     
    db.sales.aggregate([
    	{ $project: { item: 1, totalCost: { $multiply: ["$quantity", "$price"]}, _id: 0 } }
    ])
    • Using $sort:
    // Sort items by total quantity in descending order:
     
    db.sales.aggregate([
    	{ $group: { _id: "$item", totalQuantity: { $sum: "$quantity" } } },
    	{ $sort: { totalQuantity: -1 } }
    ])
    • Using $limit:
    // Limit the result to the top 2 items with the highest total quantity:
     
    db.sales.aggregate([
    	{ $group: { _id: "$item", totalQuantity: { $sum: "$quantity" } } },
    	{ $sort: { totalQuantity: -1 } },
    	{ $limit: 2 }
    ])
    • Using $skip:
    // Skip the first item and display the next one:
     
    db.sales.aggregate([
    	{ $group: { _id: "$item", totalQuantity: { $sum: "$quantity" } } },
    	{ $sort: { totalQuantity: -1 } },
    	{ $skip: 1 },
    	{ $limit: 1 }
    ])

    Use Cases of MongoDB Aggregation

    • Reporting: MongoDB’s aggregation framework can be used for generating complex reports, like calculating total sales per region, or average order value.
    • Data Transformation: Use aggregation to reshape data into the desired format.
    • Analytics: Useful in data analytics to derive insights from raw data, e.g., finding the most popular products in an e-commerce store.

    Resources


    Made By SOU Student for SOU Students