Aim: Installation of WEKA Tool.
Description: The installation of the WEKA tool ensures that users can perform various machine learning experiments and analyze datasets with a user-friendly graphical interface. It's widely used for data mining, data preprocessing, classification, clustering, regression, and visualization.
Code:
1. Download the WEKA software from the official website: https://www.cs.waikato.ac.nz/ml/weka/
2. Choose the appropriate version for your operating system (Windows/Linux/Mac).
3. Install the Java Runtime Environment (JRE) if it is not already installed, since WEKA is a Java-based application.
4. Run the installer or executable file.
5. Launch the WEKA GUI Chooser to start working with the tool.
Output: Successful installation of WEKA, with the GUI Chooser window displaying options such as Explorer, Experimenter, KnowledgeFlow, and Simple CLI.
WEKA stands for -
WEKA is mainly used for -
Which programming language is WEKA primarily written in?
Conclusion: WEKA was successfully installed, enabling the user to perform machine learning tasks such as preprocessing, classification, clustering, and visualization using its GUI-based environment.
Aim: Create a new weather ARFF file with appropriate attributes and data types.
Description: The ARFF (Attribute-Relation File Format) is a text file format used in WEKA for representing datasets. It contains a description of the dataset's attributes and the actual data instances. This file can then be used for applying classification algorithms like Decision Trees or Naive Bayes in WEKA.
Steps:
1. Open a text editor and type the dataset definition given below.
2. Save the file as weather.arff, making sure to use the .arff extension.
3. In the WEKA Explorer, click Open file and load weather.arff.
Code:
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny, hot, high, FALSE, no
sunny, hot, high, TRUE, no
overcast, hot, high, FALSE, yes
rainy, mild, high, FALSE, yes
rainy, cool, normal, FALSE, yes
rainy, cool, normal, TRUE, no
overcast, cool, normal, TRUE, yes
sunny, mild, high, FALSE, no
sunny, cool, normal, FALSE, yes
rainy, mild, normal, FALSE, yes
sunny, mild, normal, TRUE, yes
overcast, mild, high, TRUE, yes
overcast, hot, normal, FALSE, yes
rainy, mild, high, TRUE, no
The full form of ARFF is -
Which symbol is used to define the data section in the ARFF file?
The file extension of ARFF file is - .arff
Which of the following is a valid nominal attribute declaration?
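The ARFF structure above can be checked with a short pure-Python sketch of a reader. This is illustrative only: `parse_arff` is a hypothetical helper, and WEKA's real parser handles many more cases (quoted values, comments with data, sparse format).

```python
# Minimal ARFF reader sketch (illustrative; not WEKA's own parser).
def parse_arff(text):
    relation, attributes, data = None, [], []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('%'):
            continue                          # skip blank lines and comments
        lower = line.lower()
        if lower.startswith('@relation'):
            relation = line.split(None, 1)[1]
        elif lower.startswith('@attribute'):
            attributes.append(line.split()[1])
        elif lower.startswith('@data'):
            in_data = True                    # everything after @data is instances
        elif in_data:
            data.append([v.strip() for v in line.split(',')])
    return relation, attributes, data

sample = """@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute play {yes, no}
@data
sunny, no
overcast, yes"""

rel, attrs, rows = parse_arff(sample)
print(rel, attrs, len(rows))   # weather ['outlook', 'play'] 2
```

Running this on the full weather.arff created above would return 5 attribute names and 14 data rows.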
Conclusion: A correctly formatted weather.arff file has been successfully created, demonstrating the basic ARFF structure and making the dataset ready for analysis in WEKA.
Aim: Create an employee ARFF file using the WEKA data mining tool.
Description: The ARFF (Attribute-Relation File Format) is used in WEKA to represent datasets in a structured format. In this practice, we'll create an Employee dataset in ARFF format by defining attributes (such as Employee ID, Name, Department, Age, Salary, and Status) with their appropriate data types (nominal or numeric). This dataset can later be used in WEKA for performing classification, clustering, and other data mining tasks.
Steps:
1. Open a text editor and type the dataset definition given below.
2. Save the file as employee.arff, using the .arff extension.
3. In the WEKA Explorer, click Open file and load employee.arff.
Code:
@relation employee
@attribute EmployeeID numeric
@attribute Name string
@attribute Department {HR, Sales, IT, Finance}
@attribute Age numeric
@attribute Salary numeric
@attribute Status {Permanent, Contract}
@data
1, "Amit", HR, 25, 30000, Permanent
2, "Neha", Sales, 28, 35000, Contract
3, "Raj", IT, 30, 45000, Permanent
4, "Simran", Finance, 26, 32000, Contract
5, "Karan", IT, 35, 60000, Permanent
6, "Pooja", HR, 29, 28000, Contract
7, "Arjun", Sales, 31, 40000, Permanent
8, "Meena", Finance, 27, 31000, Contract
9, "Vivek", IT, 40, 75000, Permanent
10, "Riya", HR, 24, 25000, Contract
Which keyword defines a relation name in an ARFF file?
Which data type is used for storing employee names in ARFF?
Which of the following is a nominal attribute in the Employee ARFF file?
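An ARFF file like the one above can also be generated programmatically. The sketch below builds the same layout from Python rows; `write_arff` is a hypothetical helper for illustration, not a WEKA API.

```python
# Sketch: building employee-style ARFF text from Python rows.
# write_arff is a hypothetical helper, not part of WEKA.
def write_arff(relation, attributes, rows):
    lines = [f"@relation {relation}"]
    for name, atype in attributes:            # atype: "numeric", "string", or "{a, b}"
        lines.append(f"@attribute {name} {atype}")
    lines.append("@data")
    for row in rows:
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines)

attrs = [("EmployeeID", "numeric"),
         ("Name", "string"),
         ("Department", "{HR, Sales, IT, Finance}"),
         ("Salary", "numeric")]
rows = [(1, '"Amit"', "HR", 30000),
        (2, '"Neha"', "Sales", 35000)]

text = write_arff("employee", attrs, rows)
print(text.splitlines()[0])   # @relation employee
```

Writing `text` to a file named employee.arff yields a dataset WEKA can open directly.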
Conclusion: We successfully created a formatted employee.arff file with multiple attributes. The dataset is now loaded in WEKA and is ready for data mining tasks like classification and clustering.
Aim: Apply Pre-Processing techniques to the training data set of Weather Table.
Description: Real-world databases are highly influenced by noise, missing values, and inconsistency due to their huge size. Data can be pre-processed to improve quality, handle missing values, and enhance efficiency. Three main pre-processing operations are used here: Add (insert a new attribute), Remove (delete an attribute), and Normalize (scale numeric values to a common range).
Steps:
1. Load the weather dataset (weather.arff) in the WEKA Explorer.
2. In the Preprocess tab, choose and apply the Add, Remove, and Normalize filters, observing the changes after each one.
Code:
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny, 85, 85, FALSE, no
sunny, 80, 90, TRUE, no
overcast, 83, 78, FALSE, yes
rainy, 70, 96, FALSE, yes
rainy, 68, 80, FALSE, yes
rainy, 65, 70, TRUE, no
overcast, 64, 65, TRUE, yes
sunny, 72, 95, FALSE, no
sunny, 69, 70, FALSE, yes
rainy, 75, 80, FALSE, yes
sunny, 75, 70, TRUE, yes
overcast, 72, 90, TRUE, yes
overcast, 81, 75, FALSE, yes
rainy, 71, 91, TRUE, no
What is the difference between Normalize and Standardize filters in WEKA?
Which filter in WEKA is used to replace missing values?
ReplaceMissingValues
What is the primary reason for applying data pre-processing?
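The Normalize/Standardize distinction can be illustrated numerically. The pure-Python sketch below (not WEKA's filter implementation) applies both transformations to the temperature column of the dataset: Normalize rescales values into [0, 1], while Standardize shifts them to mean 0 and standard deviation 1.

```python
# Sketch of what WEKA's Normalize (min-max) and Standardize (z-score)
# filters do to a numeric attribute. Values are the temperature column
# of the weather dataset above.
from statistics import mean, pstdev

temps = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]

lo, hi = min(temps), max(temps)
normalized = [(t - lo) / (hi - lo) for t in temps]   # Normalize: rescale to [0, 1]

m, s = mean(temps), pstdev(temps)
standardized = [(t - m) / s for t in temps]          # Standardize: mean 0, std 1

print(round(min(normalized), 2), round(max(normalized), 2))   # 0.0 1.0
```

Normalize is sensitive to outliers (one extreme value compresses everything else), whereas Standardize preserves the shape of the distribution around its mean.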
Conclusion: Pre-processing (Add, Remove, Normalize) was applied to the Weather dataset, improving its quality and making it ready for analysis in WEKA.
Aim: Apply Pre-Processing techniques to the training dataset of Employee Table.
Description: The Employee dataset created in Practical-03 can be pre-processed to improve data quality, handle missing or inconsistent values, and enhance efficiency. Pre-processing in WEKA includes the Add, Remove, and Normalize filters.
Steps:
1. Load the employee dataset (employee.arff) in the WEKA Explorer.
2. In the Preprocess tab, choose and apply the Add, Remove, and Normalize filters, observing the changes after each one.
Code:
@relation employee
@attribute EmployeeID numeric
@attribute Name string
@attribute Department {HR, Sales, IT, Finance}
@attribute Age numeric
@attribute Salary numeric
@attribute Status {Permanent, Contract}
@data
1, "Amit", HR, 25, 30000, Permanent
2, "Neha", Sales, 28, 35000, Contract
3, "Raj", IT, 30, 45000, Permanent
4, "Simran", Finance, 26, 32000, Contract
5, "Karan", IT, 35, 60000, Permanent
6, "Pooja", HR, 29, 28000, Contract
7, "Arjun", Sales, 31, 40000, Permanent
8, "Meena", Finance, 27, 31000, Contract
9, "Vivek", IT, 40, 75000, Permanent
10, "Riya", HR, 24, 25000, Contract
Why is pre-processing important in the Employee dataset?
Why is Employee_ID removed during pre-processing?
Which of the following can be used after pre-processing the employee data?
Conclusion: Pre-processing (Add, Remove, Normalize) was applied to the Employee dataset, improving its quality and making it ready for analysis in WEKA.
Aim: Find Association Rules for Buying Data.
Description: In data mining, association rule learning is a popular method for discovering interesting relationships between variables in large databases. It involves analyzing and presenting strong rules discovered in datasets using different measures of interestingness such as support, confidence, and lift. Association rules are commonly used in market basket analysis and are also applied in areas like web usage mining, intrusion detection, and bioinformatics.
Steps:
1. Create the dataset file shopping.arff (given below) and open it in the WEKA Explorer (Open file → shopping.arff).
2. In the Associate tab, choose weka.associations.Apriori and click Start.
Output (sample rules):
milk=t => bread=t (confidence: 1.0, support: 0.6)
butter=t => bread=t (confidence: 1.0, support: 0.6)
Code:
@relation shopping
@attribute milk {t, f}
@attribute bread {t, f}
@attribute butter {t, f}
@attribute jam {t, f}
@data
t,t,t,f
f,t,t,f
t,t,f,f
t,t,t,t
f,t,f,t
Which of the following is an example of an association rule?
What is Association Rule Mining?
What is the main use of Association Rules?
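The support and confidence values that Apriori reports can be reproduced by hand. The sketch below does so in pure Python over the same five transactions; the `support` and `confidence` helpers are illustrative, not WEKA APIs.

```python
# Sketch of support and confidence, the interestingness measures Apriori reports.
# Transactions mirror the shopping.arff data (t = item bought).
transactions = [
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "bread"},
    {"milk", "bread", "butter", "jam"},
    {"bread", "jam"},
]

def support(itemset):
    # Fraction of transactions containing every item of the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # Of the transactions containing the antecedent, how many also
    # contain the consequent.
    return support(antecedent | consequent) / support(antecedent)

# Rule: milk => bread
print(support({"milk", "bread"}), confidence({"milk"}, {"bread"}))   # 0.6 1.0
```

This matches the WEKA output above: every transaction containing milk also contains bread (confidence 1.0), and the pair occurs in 3 of 5 transactions (support 0.6).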
Conclusion: Using WEKA and the Apriori algorithm, we extracted strong association rules from buying data. For example, customers buying milk also tend to buy bread with 100% confidence.
Aim: Construct a Decision Tree for Weather data and classify it.
Description: Classification is the process of building a model to describe and predict data values. A Decision Tree is a classification scheme with root, internal, and external nodes, where branches represent attribute values and leaves represent classes. It also generates rules from the dataset, using training data for model building and testing data for evaluation.
Steps:
1. Load the weather dataset (weather.arff) in the WEKA Explorer.
2. In the Classify tab, choose the J48 algorithm (weka.classifiers.trees.J48) and click Start.
3. Right-click the result entry and select Visualize tree to view the decision tree.
Code:
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
What is a Decision Tree?
Which algorithm in WEKA is used to generate a Decision Tree?
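J48 is WEKA's implementation of C4.5, which (like ID3) chooses the split attribute with the highest information gain. The pure-Python sketch below recomputes the classic gain of the outlook attribute on this dataset; it is illustrative, not J48 itself.

```python
# Sketch of the information-gain computation behind ID3/C4.5
# (WEKA's J48 is a C4.5 implementation).
from math import log2
from collections import Counter

# (outlook, play) pairs taken from the weather dataset above.
rows = [("sunny","no"),("sunny","no"),("overcast","yes"),("rainy","yes"),
        ("rainy","yes"),("rainy","no"),("overcast","yes"),("sunny","no"),
        ("sunny","yes"),("rainy","yes"),("sunny","yes"),("overcast","yes"),
        ("overcast","yes"),("rainy","no")]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows):
    labels = [y for _, y in rows]
    base = entropy(labels)                       # entropy before the split
    remainder = 0.0
    for value in {x for x, _ in rows}:           # weighted entropy after split
        subset = [y for x, y in rows if x == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

print(round(info_gain(rows), 3))   # 0.247 — the well-known gain of 'outlook'
```

Outlook has the highest gain of the four attributes, which is why J48 places it at the root of the tree.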
Conclusion: Using the J48 algorithm in WEKA, we built a decision tree for weather data and visualized it in a tree diagram. The model classified instances with high accuracy.
Aim: Construct a Decision Tree for Customer data and classify it.
Description: Classification builds predictive models from data. A Decision Tree is widely used, consisting of root, internal, and external nodes, with branches representing attribute values and leaves representing decision classes. It uses training datasets for model construction and testing datasets for evaluation.
Steps:
1. Load the customer dataset (customer.arff) in the WEKA Explorer.
2. In the Classify tab, choose the J48 algorithm and click Start.
3. Right-click the result entry and select Visualize tree to view the decision tree.
Code:
@relation customer
@attribute age {young, middle, senior}
@attribute income {low, medium, high}
@attribute student {yes, no}
@attribute credit_rating {fair, excellent}
@attribute buys_computer {yes, no}
@data
young,high,no,fair,no
young,high,no,excellent,no
middle,high,no,fair,yes
senior,medium,no,fair,yes
senior,low,yes,fair,yes
senior,low,yes,excellent,no
middle,low,yes,excellent,yes
young,medium,no,fair,no
young,low,yes,fair,yes
senior,medium,yes,fair,yes
young,medium,yes,excellent,yes
middle,medium,no,excellent,yes
middle,high,yes,fair,yes
senior,medium,no,excellent,no
Which algorithm in WEKA is used to construct a Decision Tree?
Which WEKA tab is used to build a Decision Tree?
Write one real-life application of Decision Trees.
Conclusion: Using the J48 algorithm in WEKA, we built and visualized a decision tree for customer data. The model classified customer behavior effectively and identified key attributes for purchase prediction.
Aim: Apply filters on the Weather dataset using WEKA.
Description: Preprocessing with filters is an important step in data mining that improves data quality and prepares it for classification. In WEKA, filters can be used to handle missing values, discretize numeric attributes, or remove irrelevant attributes. Applying filters ensures the dataset is clean, consistent, and ready for building accurate models like decision trees.
Steps:
1. Load weather.arff (the given Weather dataset) in the WEKA Explorer.
2. In the Preprocess tab, apply the following filters one by one, clicking Apply each time and observing the changes:
   - weka.filters.unsupervised.attribute.ReplaceMissingValues → handles missing data.
   - weka.filters.unsupervised.attribute.Discretize → converts numeric attributes to nominal.
   - weka.filters.unsupervised.attribute.Remove → removes unwanted attributes.
Code:
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
WEKA does not allow applying more than one filter at a time......
WEKA filters are mainly used for.......... and.......... of data.
Filters in WEKA can be applied under which panel?
What is the file format of the Weather dataset used in WEKA?
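The effect of the Discretize filter can be sketched in pure Python. The example below uses equal-width binning on a few hypothetical numeric temperatures; WEKA's filter defaults to 10 bins (3 are used here for readability), and the `discretize` helper is illustrative, not WEKA's implementation.

```python
# Sketch of equal-width discretization: numeric values become nominal
# bin labels, as WEKA's unsupervised Discretize filter produces.
def discretize(values, bins=3):
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    labels = []
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)   # clamp the max into the last bin
        labels.append(f"bin{idx + 1}")
    return labels

temps = [64, 65, 68, 72, 75, 81, 85]   # hypothetical numeric temperatures
print(discretize(temps))               # ['bin1', 'bin1', 'bin1', 'bin2', 'bin2', 'bin3', 'bin3']
```

After discretization the attribute can be used by algorithms that require nominal inputs, at the cost of losing the exact numeric ordering within each bin.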
Conclusion: In this practical, we applied filters like ReplaceMissingValues, Discretize, and Remove on the Weather dataset in WEKA. These preprocessing steps improved dataset quality, making the data suitable for classification. After applying filters, the J48 decision tree was built successfully, giving more accurate results.
Aim: Apply filters on the customer dataset using WEKA.
Description: Data preprocessing is an important step in data mining to improve dataset quality, handle missing values, remove irrelevant attributes, and prepare it for classification. In this practical, we will apply filters such as Remove, ReplaceMissingValues, and Discretize on the Customer dataset. After preprocessing, we will classify the data using the J48 Decision Tree to observe the impact of filtering on model performance.
Steps:
1. Load customer.arff (created from the provided Customer dataset) in the WEKA Explorer.
2. In the Preprocess tab, apply the following filters one by one, clicking Apply each time and observing the changes:
   - weka.filters.unsupervised.attribute.ReplaceMissingValues → handles missing entries (if any).
   - weka.filters.unsupervised.attribute.Discretize → converts numeric attributes to nominal (if present).
   - weka.filters.unsupervised.attribute.Remove → removes the credit_rating attribute.
Code:
@relation customer
@attribute age {young, middle, senior}
@attribute income {low, medium, high}
@attribute student {yes, no}
@attribute credit_rating {fair, excellent}
@attribute buys_computer {yes, no}
@data
young,high,no,fair,no
young,high,no,excellent,no
middle,high,no,fair,yes
senior,medium,no,fair,yes
senior,low,yes,fair,yes
senior,low,yes,excellent,no
middle,low,yes,excellent,yes
young,medium,no,fair,no
young,low,yes,fair,yes
senior,medium,yes,fair,yes
young,medium,yes,excellent,yes
middle,medium,no,excellent,yes
middle,high,yes,fair,yes
senior,medium,no,excellent,no
Which filter is used to normalize numeric attributes in a customer dataset?
Which of the following is an example of an unsupervised filter?
A filter used for selecting important attributes from the customer dataset is
What is the main purpose of applying filters in WEKA?
Conclusion: Pre-processing filters were successfully applied to the Customer dataset, including removal of unnecessary attributes and handling of missing values. The J48 Decision Tree classifier was built on the filtered data, demonstrating improved data quality and readiness for accurate classification.
Aim: Implementation of K-means algorithm.
Description: K-Means is an unsupervised learning algorithm used to partition datasets into clusters based on similarity. In this practical, we will group customers using their age and income attributes. This helps in understanding customer segmentation and unsupervised learning concepts.
Steps:
1. Load the customer dataset (customer.arff) in the WEKA Explorer.
2. In the Cluster tab, choose SimpleKMeans, set the number of clusters (k = 2), and click Start.
3. Observe the cluster centroids and the assignment of instances to clusters.
Code:
@relation customer
@attribute name {x,y,z,u,v,l}
@attribute age {youth,middle,senior}
@attribute income {high,medium,low}
@attribute class {A,B}
@data
x,youth,high,A
y,youth,low,B
z,middle,high,A
u,middle,low,B
v,senior,high,A
l,senior,low,B
What does the K in K-Means stand for?
Which of the following is commonly used to measure distance in K-Means?
The value of K can be automatically determined by the algorithm. True/False
What is the main purpose of the K-Means algorithm?
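The algorithm itself can be sketched in a few lines of pure Python. The integer encoding of the nominal age/income values and the initial centroids below are assumptions made for illustration; WEKA's SimpleKMeans works directly on the ARFF data.

```python
# Minimal k-means sketch (the idea behind WEKA's SimpleKMeans).
# Nominal age/income values are encoded as integers so Euclidean
# distances can be computed; the encoding is an illustrative assumption.
AGE = {"youth": 0, "middle": 1, "senior": 2}
INCOME = {"low": 0, "medium": 1, "high": 2}

points = [(AGE[a], INCOME[i]) for a, i in
          [("youth", "high"), ("youth", "low"), ("middle", "high"),
           ("middle", "low"), ("senior", "high"), ("senior", "low")]]

def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:                              # assignment step
            d = [(p[0] - c[0])**2 + (p[1] - c[1])**2 for c in centroids]
            clusters[d.index(min(d))].append(p)
        centroids = [                                 # update step
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl else centroids[k]
            for k, cl in enumerate(clusters)]
    return centroids, clusters

centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (2.0, 2.0)])
print(sorted(len(c) for c in clusters))   # [3, 3] — two clusters of three customers
```

With these starting centroids the points converge into a high-income cluster and a low-income cluster, matching the A/B class labels in the dataset.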
Conclusion: The K-Means algorithm successfully grouped the customers into clusters based on age and income. This practical demonstrates unsupervised learning and customer segmentation effectively.
Aim: Apply k-Means clustering and visualize the results.
Description: Cluster analysis assigns objects into groups (clusters) so that items in the same cluster are more similar to each other than to those in other clusters. In this practical, we apply k-Means clustering on a small customer dataset and visualize the results in WEKA.
Steps:
1. Load the customer dataset (customer.arff) in the WEKA Explorer.
2. In the Cluster tab, choose SimpleKMeans with k = 2 and click Start.
3. Right-click the result entry and select Visualize cluster assignments to view the clusters.
Code:
@relation customer
@attribute name {x,y,z,u,v,l}
@attribute age {youth,middle,senior}
@attribute income {high,medium,low}
@attribute class {A,B}
@data
x,youth,high,A
y,youth,low,B
z,middle,high,A
u,middle,low,B
v,senior,high,A
l,senior,low,B
What type of machine learning does K-Means belong to?
Which type of visualization is best for showing K-Means clusters?
What is the effect of choosing the wrong K value?
Conclusion: k-Means clustering was successfully applied to the customer dataset. The algorithm grouped the customers into 2 distinct clusters, and visualization confirmed the natural groupings based on age and income.
Aim: Write a procedure for visualization of the weather dataset in WEKA.
Description: Data visualization is a key step in preprocessing to explore attributes and patterns before applying machine learning. In this practical, we load the weather dataset in WEKA and visualize attribute distributions and relationships.
Steps:
1. Load the weather dataset (weather.arff) in the WEKA Explorer.
2. In the Preprocess tab, observe the attribute distribution histograms.
3. Open the Visualize tab to view the scatter-plot matrix; adjust point size and jitter to improve clarity.
Code:
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
What is the purpose of visualizing the Weather dataset in WEKA?
The color of data points in the scatter plot indicates:
Visualization helps in identifying ......... and ........ in the dataset.
The Visualize panel in WEKA uses only bar charts. True/False
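The outlook-versus-play pattern that the scatter plots reveal can also be checked numerically with a simple cross-tabulation, sketched below in pure Python (an illustration, not a WEKA feature):

```python
# Cross-tabulate outlook against play to quantify the pattern seen
# in the Visualize panel.
from collections import Counter

# (outlook, play) pairs from the weather dataset above.
rows = [("sunny","no"),("sunny","no"),("overcast","yes"),("rainy","yes"),
        ("rainy","yes"),("rainy","no"),("overcast","yes"),("sunny","no"),
        ("sunny","yes"),("rainy","yes"),("sunny","yes"),("overcast","yes"),
        ("overcast","yes"),("rainy","no")]

crosstab = Counter(rows)                      # (outlook, play) -> count
print(crosstab[("overcast", "yes")], crosstab[("overcast", "no")])   # 4 0
```

The count confirms what the colored scatter plot shows: every overcast instance has play = yes, the strongest pattern in the dataset.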
Conclusion: We successfully visualized the weather dataset in WEKA. Visualization helped identify strong patterns such as the effect of weather conditions on the decision to play. Adjusting point size and jitter improved clarity, making trends easier to observe.
Aim: Write a procedure for visualization of the banking dataset in WEKA.
Description: Visualization allows us to explore distributions and relationships in datasets before applying models. In this practical, we load the banking dataset in WEKA and visualize attributes like gender, bank name, location, and deposit behavior.
Steps:
1. Load the banking dataset (given below) in the WEKA Explorer.
2. In the Preprocess tab, observe the attribute distribution histograms.
3. Open the Visualize tab and inspect scatter plots of attributes such as bankname, location, and deposit; adjust jitter and point size for readability.
Code:
@relation bank
@attribute cust {male, female}
@attribute accno {0101, 0102, 0103, 0104, 0105, 0106, 0107, 0108, 0109}
@attribute bankname {sbi, hdfc, sbh, abi}
@attribute location {hyd, jmd, antr, pdt2, kdp}
@attribute deposit {yes, no}
@data
male,0101,sbi,hyd,yes
female,0102,hdfc,jmd,no
male,0103,sbh,antr,yes
female,0104,abi,pdt2,no
male,0105,abi,kdp,yes
female,0106,sbi,hyd,no
male,0107,hdfc,jmd,yes
female,0108,sbh,antr,no
male,0109,abi,pdt2,yes
Why is visualization important in data preprocessing?
The Weather dataset is usually saved in ........... format in WEKA.
To inspect a specific plot in WEKA, ......... on it.
Conclusion: We successfully visualized the banking dataset in WEKA. The visualizations helped analyze deposit behavior across banks, genders, and locations. Adjusting jitter and point size improved readability of plots, making patterns and anomalies easier to identify.
Made By SOU Student for SOU Students