Familiarity with database concepts and a basic understanding of statistics and machine learning techniques will help in understanding data mining principles and methodologies.
Course Objective:
This course provides an introductory treatment of data mining. It covers how to identify significant data and extract important patterns from it.
Unit-01 Introduction to Data Mining (17%)
What is data mining?
What kind of data is mined?
Database data
Data Warehouses
Transactional data
Other kinds of data
What kind of patterns can be mined?
Introduction to the Knowledge Discovery in Databases (KDD) Process
Steps in the KDD Process
Real-life Applications of KDD
Examples from business intelligence, healthcare, and marketing
Unit-02 Data Pre-processing (17%)
Data Pre-processing: An Overview
Data Quality: Why Pre-process the Data?
Major Tasks in Data Pre-processing
Data Cleaning, Missing Values, Noisy Data
Data Cleaning as a Process
Data Integration
Entity identification problem
Redundancy and correlation analysis
Unit-03 Data Warehouse (24%)
What is a Data Warehouse?
Difference between Operational Database Systems and Data Warehouse
Why have a separate Data Warehouse?
Data Warehousing: A multitier architecture
Data Warehouse Models: Enterprise Warehouse, Data Mart, and Virtual Warehouse
Extraction, Transformation, and Loading
Data Warehouse Modelling: Data Cube and OLAP
Data Cube: A Multidimensional Data Model
Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Data Models
Typical OLAP Operations
Data Warehouse Design and Usage
From Information Processing and OLAP to Multidimensional Data Mining
Unit-04 Mining Frequent Patterns, Association: Basic Concepts and Methods (17%)
Market Basket Analysis: A Motivating Example
Frequent Itemsets, Closed Itemsets, and Association Rules
Frequent Itemset Mining Methods
Apriori Algorithm: Finding Frequent Itemsets by Confined Candidate Generation
Generating Association Rules from Frequent Itemsets
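Illustrative sketch for Unit-04: the Apriori and rule-generation topics above can be previewed with a small Python example. This is a minimal sketch, not a prescribed course implementation. The function names `apriori` and `association_rules`, the absolute support-count threshold, and the toy grocery transactions are all assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    min_support is an absolute count (an assumption of this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count the support of each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # confidence(A => B) = support(A and B) / support(A)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


# Hypothetical market-basket data for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent_itemsets = apriori(baskets, min_support=2)
strong_rules = association_rules(frequent_itemsets, min_confidence=0.9)
```

With these baskets, the pair {milk, butter} appears only once, so the prune step discards the 3-itemset candidate before it is ever counted; this is the "confined candidate generation" idea named in the topic list.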