A data warehouse is a centralized repository of integrated data collected from different sources. It is designed to support decision-making processes by providing historical, subject-oriented, and consistent information.
Aspect | OLTP (Online Transaction Processing) | OLAP (Online Analytical Processing) |
---|---|---|
Characteristic | Operational processing | Informational processing |
Orientation | Transaction-based | Analysis-based |
Primary Task | Performs online transaction and query processing | Provides decision support through data analysis |
User | Clerks, DBAs, database professionals | Knowledge workers (e.g., managers, executives, analysts) |
Function | Supports day-to-day operations (purchasing, inventory, payroll, registration, accounting, banking) | Supports long-term informational requirements and complex decision-making |
DB Design | ER-model based, application-oriented | Star schema / Snowflake schema, subject-oriented |
Data | Current, up-to-date, real-time | Historical, accuracy maintained over time |
Summarization | Primitive, highly detailed | Summarized, consolidated |
View | Detailed, flat relational view | Summarized, multidimensional view |
Unit of Work | Short, simple transactions | Complex queries |
Access Pattern | Frequent Read/Write | Mostly Read |
Focus | Data In (capturing and updating data) | Information Out (reporting, analysis, insights) |
Operations | Index/hash on primary key | Lots of scans |
Number of Records | Tens | Millions |
Number of Users | Thousands | Hundreds |
Database Size | GB to high-order GB | ≥ TB |
Priority | High performance, high availability | High flexibility, end-user autonomy |
Metric | Transaction throughput | Query throughput, response time |
Examples | Banking systems, airline reservations, retail sales | Data warehouses, business reporting, trend analysis, forecasting |
A separate data warehouse is maintained alongside operational databases because both serve different purposes, have different workloads, and require different optimization strategies.
Operational Databases (OLTP):
Data Warehouse (OLAP):
OLTP Systems:
OLAP Systems:
Operational Databases:
Data Warehouses:
OLTP:
OLAP:
Because OLTP and OLAP serve different objectives, workloads, and data requirements, it is presently necessary to maintain separate databases: _ OLTP for operational efficiency. _ OLAP (Data Warehouse) for decision support, analysis, and strategic insights.
Data warehouses are typically built on a three-tier architecture that separates data storage, data processing, and data presentation. This layered approach improves scalability, performance, and maintainability.
The bottom tier is the data warehouse server, usually implemented as a relational database system (RDBMS).
Functions:
Gateways: Interfaces used for data extraction from heterogeneous sources.
Metadata Repository:
Tier | Main Component | Function |
---|---|---|
Bottom | Data warehouse DB server | Stores data; handles ETL (extraction, transformation, loading); maintains metadata. |
Middle | OLAP server (ROLAP/MOLAP) | Provides analytical processing and multidimensional modeling. |
Top | Front-end tools | Provides user access for querying, reporting, analysis, mining, and visualization. |
An enterprise warehouse collects information about subjects spanning the entire organization.
Provides corporate-wide data integration from multiple operational systems and external sources.
Scope:
Contents:
Implementation:
Use Case:
A data mart contains a subset of corporate-wide data, focusing on specific subjects or departments.
Scope:
Contents:
Implementation:
Challenges:
Types of Data Marts:
A virtual warehouse consists of a set of views over operational databases rather than storing data physically.
Contents:
Advantages:
Disadvantages:
Use Case:
Feature | Enterprise Data Warehouse (EDW) | Data Mart | Virtual Warehouse |
---|---|---|---|
Scope | Entire organization, cross-functional | Specific department or subject area | Logical views over operational databases |
Data | Detailed + summarized | Mostly summarized | Views (some materialized) |
Implementation | Large-scale, time-consuming (years) | Quick (weeks to months) | Easy and fast |
Cost | High | Moderate to low | Low (depends on operational DB capacity) |
Integration | Strong, consistent | May lack integration if independent | Limited |
Best for | Large enterprises with complex needs | Departments needing focused insights | Organizations needing quick setup |
Starts with the development of an Enterprise Data Warehouse (EDW) covering the entire organization.
From the EDW, dependent data marts are created for different departments or business units.
Advantages:
Disadvantages:
Starts with the creation of independent data marts for specific departments or functions.
Later, these data marts may be integrated into a broader enterprise warehouse.
Advantages:
Disadvantages:
Combines both strategies: an overall enterprise-wide architecture is planned (top-down), but incremental data marts are built first (bottom-up).
These data marts follow the enterprise-wide standards, ensuring integration.
Advantages:
Feature | Top-Down Approach | Bottom-Up Approach | Hybrid Approach |
---|---|---|---|
Starting Point | Enterprise Data Warehouse (EDW) | Independent Data Marts | Data Marts aligned with EDW architecture |
Cost | High | Lower | Moderate |
Development Time | Long (years) | Short (weeks/months) | Medium |
Flexibility | Low | High | Medium-High |
Integration | Strong and consistent | Weak, may lead to silos | Strong |
Return on Investment | Slow | Quick | Balanced |
Best for | Large organizations with long-term vision | Organizations needing quick insights | Most real-world use cases |
The most common modelling paradigm is the star schema, in which the data warehouse contains:
The schema graph resembles a starburst, with the dimension tables displayed in a radial pattern around the central fact table.
Example: A star schema for AllElectronics sales is shown in Figure.
Made By SOU Student for SOU Students