
Question



Due 20 May. Please give correct code with a detailed explanation; I want to understand it, not just see the code.
The goal of this homework assignment is to explore the KMeans algorithm using the given dataset airlines.csv. Throughout this assignment, you will perform various tasks including data description, data preprocessing, exploratory data analysis, and determining the optimal number of clusters using KMeans.
Tasks
1. Data Description
The provided raw data is in the airlines.csv file.
The description of the raw data is as follows:
id: Unique ID
balance: Number of miles eligible for award travel
qual_miles: Number of miles counted as qualifying for Topflight status.
cc1_miles: Number of miles earned with the frequent flyer credit card in the past 12 months.
cc2_miles: Number of miles earned with the Rewards credit card in the past 12 months.
cc3_miles: Number of miles earned with the Small Business credit card in the past 12 months.
cc1_miles, cc2_miles and cc3_miles are stored as codes rather than raw mile counts:
1: under 5,000
2: 5,000-10,000
3: 10,001-25,000
4: 25,001-50,000
5: over 50,000
bonus_miles: Number of miles earned from non-flight bonus transactions in the past 12 months.
bonus_trans: Number of non-flight bonus transactions in the past 12 months.
flight_miles_12mo: Number of flight miles in the past 12 months.
flight_trans_12: Number of flight transactions in the past 12 months.
days_since_enroll: Number of days since enrolling in the frequent flyer program.
award: Whether the customer has had an award flight (free flight): 1 = yes, 0 = no.
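A minimal pandas sketch of the data-description step, assuming pandas is installed. SAMPLE_CSV and CC_MILES_LABELS are hypothetical names; the sketch reads the ten sample rows quoted in the question so it runs without the real file. Because the cc*_miles columns hold ordinal codes 1-5, the sketch maps them to readable ranges only for reporting and leaves the numeric codes in the DataFrame for clustering.

```python
import io

import pandas as pd

# SAMPLE_CSV (hypothetical name): the ten rows quoted in the question
SAMPLE_CSV = """\
id,balance,qual_miles,cc1_miles,cc2_miles,cc3_miles,bonus_miles,bonus_trans,flight_miles_12mo,flight_trans_12,days_since_enroll,award
1,28143,0,1,1,1,174,1,0,0,7000,0
2,19244,0,1,1,1,215,2,0,0,6968,0
3,41354,0,1,1,1,4123,4,0,0,7034,0
4,14776,0,1,1,1,500,1,0,0,6952,0
5,97752,0,4,1,1,43300,26,2077,4,6935,1
6,16420,0,1,1,1,0,0,0,0,6942,0
7,84914,0,3,1,1,27482,25,0,0,6994,0
8,20856,0,1,1,1,5250,4,250,1,6938,1
9,443003,0,3,2,1,1753,43,3850,12,6948,1
10,104860,0,3,1,1,28426,28,1150,3,6931,1
"""

# Code ranges for cc1_miles, cc2_miles and cc3_miles, copied from the data description
CC_MILES_LABELS = {
    1: "under 5,000",
    2: "5,000-10,000",
    3: "10,001-25,000",
    4: "25,001-50,000",
    5: "over 50,000",
}

df = pd.read_csv(io.StringIO(SAMPLE_CSV))  # use pd.read_csv("airlines.csv") for the full file
print(df.shape)         # (number of rows, number of columns)
df.info()               # dtype and non-null count of every column
print(df.describe().T)  # count, mean, std, min, quartiles and max per column

# Readable ranges for reporting only; the numeric codes stay in df as features
cc_cols = ["cc1_miles", "cc2_miles", "cc3_miles"]
cc_labels = df[cc_cols].apply(lambda col: col.map(CC_MILES_LABELS))
print(cc_labels["cc1_miles"].value_counts())
```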
2. Check for Missing Values
Perform data preprocessing to check for any missing values in the dataset.
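A minimal sketch of the missing-value check with pandas. missing_value_report and SAMPLE_CSV are hypothetical names for illustration; the sketch reads the ten rows quoted in the question, so on that excerpt it reports zero missing values.

```python
import io

import pandas as pd

# SAMPLE_CSV (hypothetical name): the ten rows quoted in the question
SAMPLE_CSV = """\
id,balance,qual_miles,cc1_miles,cc2_miles,cc3_miles,bonus_miles,bonus_trans,flight_miles_12mo,flight_trans_12,days_since_enroll,award
1,28143,0,1,1,1,174,1,0,0,7000,0
2,19244,0,1,1,1,215,2,0,0,6968,0
3,41354,0,1,1,1,4123,4,0,0,7034,0
4,14776,0,1,1,1,500,1,0,0,6952,0
5,97752,0,4,1,1,43300,26,2077,4,6935,1
6,16420,0,1,1,1,0,0,0,0,6942,0
7,84914,0,3,1,1,27482,25,0,0,6994,0
8,20856,0,1,1,1,5250,4,250,1,6938,1
9,443003,0,3,2,1,1753,43,3850,12,6948,1
10,104860,0,3,1,1,28426,28,1150,3,6931,1
"""


def missing_value_report(df: pd.DataFrame) -> pd.DataFrame:
    """Number and percentage of missing (NaN/None) values in every column."""
    counts = df.isnull().sum()
    return pd.DataFrame({"missing": counts, "percent": 100 * counts / len(df)})


df = pd.read_csv(io.StringIO(SAMPLE_CSV))  # use pd.read_csv("airlines.csv") for the full file
report = missing_value_report(df)
print(report)
print("Total missing values:", int(report["missing"].sum()))
```

If the full airlines.csv does report missing values, dropping those rows (`df.dropna()`) or filling them (for example `df.fillna(df.median(numeric_only=True))`) are the usual options; which fits better depends on how many rows are affected.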
3. Analyze Features
Create histograms to understand the distribution of different features in the dataset.
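A minimal sketch of the histogram step, assuming pandas and matplotlib are installed. It uses pandas' `DataFrame.hist`, which draws one histogram per numeric column, and drops `id` because it is only a row identifier. SAMPLE_CSV, the bin count and the output file name are illustrative choices, not part of the assignment.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render to files only; drop these two lines in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# SAMPLE_CSV (hypothetical name): the ten rows quoted in the question
SAMPLE_CSV = """\
id,balance,qual_miles,cc1_miles,cc2_miles,cc3_miles,bonus_miles,bonus_trans,flight_miles_12mo,flight_trans_12,days_since_enroll,award
1,28143,0,1,1,1,174,1,0,0,7000,0
2,19244,0,1,1,1,215,2,0,0,6968,0
3,41354,0,1,1,1,4123,4,0,0,7034,0
4,14776,0,1,1,1,500,1,0,0,6952,0
5,97752,0,4,1,1,43300,26,2077,4,6935,1
6,16420,0,1,1,1,0,0,0,0,6942,0
7,84914,0,3,1,1,27482,25,0,0,6994,0
8,20856,0,1,1,1,5250,4,250,1,6938,1
9,443003,0,3,2,1,1753,43,3850,12,6948,1
10,104860,0,3,1,1,28426,28,1150,3,6931,1
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))  # use pd.read_csv("airlines.csv") for the full file
features = df.drop(columns=["id"])          # id is a row identifier, not a feature

# One histogram per column in a grid; each subplot is titled with its column name
axes = features.hist(bins=20, figsize=(14, 10))
plt.tight_layout()
plt.savefig("feature_histograms.png")
```

If a column turns out to be heavily skewed on the full file, more bins or a log-scaled x axis can make its shape easier to read.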
4. Calculate Percentage of Customers with/without Award
Find the percentage of customers who do not have an award flight and those who do have an award flight.
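A minimal sketch of the percentage calculation, assuming `award` is a 0/1 flag (0 = no award flight, 1 = award flight). `value_counts(normalize=True)` returns the share of each value, which is then scaled to percent. award_percentages is a hypothetical helper name; the series below is the `award` column of the ten sample rows, where 6 of 10 customers have no award flight.

```python
import pandas as pd

# award column of the ten sample rows quoted in the question
award = pd.Series([0, 0, 0, 0, 1, 0, 0, 1, 1, 1], name="award")


def award_percentages(award: pd.Series) -> pd.Series:
    """Percentage of customers without (0) and with (1) an award flight."""
    pct = award.value_counts(normalize=True).sort_index() * 100
    return pct.rename({0: "no award", 1: "award"})


pct = award_percentages(award)
print(pct.round(1))  # 60.0 for "no award" and 40.0 for "award" on this sample
```

On the full file the call would be `award_percentages(pd.read_csv("airlines.csv")["award"])`.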
5. Correlation Analysis
- Find which features are most strongly correlated with the balance feature.
- Draw a correlation heatmap to visualize the correlations among different features.
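A minimal sketch of both correlation tasks, assuming pandas and matplotlib. `DataFrame.corr()` computes Pearson coefficients by default; the `balance` column of that matrix is ranked by absolute value, and the whole matrix is drawn with `imshow`. On the ten-row sample, `qual_miles` and `cc3_miles` are constant, so their coefficients are NaN and are dropped from the ranking. If seaborn is installed, `seaborn.heatmap(corr, annot=True, cmap="coolwarm")` is a common one-line alternative for the plot.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render to files only; drop these two lines in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# SAMPLE_CSV (hypothetical name): the ten rows quoted in the question
SAMPLE_CSV = """\
id,balance,qual_miles,cc1_miles,cc2_miles,cc3_miles,bonus_miles,bonus_trans,flight_miles_12mo,flight_trans_12,days_since_enroll,award
1,28143,0,1,1,1,174,1,0,0,7000,0
2,19244,0,1,1,1,215,2,0,0,6968,0
3,41354,0,1,1,1,4123,4,0,0,7034,0
4,14776,0,1,1,1,500,1,0,0,6952,0
5,97752,0,4,1,1,43300,26,2077,4,6935,1
6,16420,0,1,1,1,0,0,0,0,6942,0
7,84914,0,3,1,1,27482,25,0,0,6994,0
8,20856,0,1,1,1,5250,4,250,1,6938,1
9,443003,0,3,2,1,1753,43,3850,12,6948,1
10,104860,0,3,1,1,28426,28,1150,3,6931,1
"""

df = pd.read_csv(io.StringIO(SAMPLE_CSV))  # use pd.read_csv("airlines.csv") for the full file
corr = df.drop(columns=["id"]).corr()       # Pearson correlation matrix, 11 x 11

# Correlation of every other feature with balance, strongest (by absolute value) first;
# constant columns yield NaN and are dropped
balance_corr = corr["balance"].drop("balance").dropna().sort_values(key=abs, ascending=False)
print(balance_corr.round(3))

# Heatmap of the full matrix with the coefficient written in every cell
fig, ax = plt.subplots(figsize=(10, 8))
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=90)
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
for i in range(len(corr.index)):
    for j in range(len(corr.columns)):
        ax.text(j, i, f"{corr.iat[i, j]:.2f}", ha="center", va="center", fontsize=7)
fig.colorbar(im, ax=ax)
ax.set_title("Correlation heatmap")
fig.tight_layout()
fig.savefig("correlation_heatmap.png")
```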
6. Plotting
Plot the relationship between frequent flying bonuses and non-flight bonus transactions.
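The task does not name the columns, so this sketch assumes "frequent flying bonuses" refers to `bonus_miles` and "non-flight bonus transactions" to `bonus_trans`; swap in other columns if the course intends something else. It draws a scatter plot with matplotlib and prints the Pearson coefficient as a one-number summary. The values are the two bonus columns of the ten sample rows.

```python
import matplotlib
matplotlib.use("Agg")  # render to files only; drop these two lines in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Assumption: "frequent flying bonuses" -> bonus_miles, "non-flight bonus transactions" -> bonus_trans.
# Values come from the ten sample rows quoted in the question; use pd.read_csv("airlines.csv") for all rows.
df = pd.DataFrame({
    "bonus_trans": [1, 2, 4, 1, 26, 0, 25, 4, 43, 28],
    "bonus_miles": [174, 215, 4123, 500, 43300, 0, 27482, 5250, 1753, 28426],
})

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(df["bonus_trans"], df["bonus_miles"], alpha=0.6)
ax.set_xlabel("bonus_trans (non-flight bonus transactions, past 12 months)")
ax.set_ylabel("bonus_miles (miles from non-flight bonus transactions)")
ax.set_title("bonus_miles vs. bonus_trans")
fig.tight_layout()
fig.savefig("bonus_scatter.png")

# Pearson correlation summarises how linear the relationship is
r = df["bonus_trans"].corr(df["bonus_miles"])
print(f"Pearson r = {r:.3f}")
```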
7. Determining Optimal Number of Clusters
- Apply MinMaxScaler to normalize the data.
- Use the Elbow Method and the Silhouette Score to find the ideal number of clusters for the KMeans algorithm.
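A minimal scikit-learn sketch of step 7. `MinMaxScaler` rescales each column to [0, 1] via (x - min) / (max - min), so a wide-range column such as `balance` does not swamp the 1-5 codes of `cc1_miles` in the Euclidean distances KMeans uses. For each candidate k the sketch records the inertia (within-cluster sum of squared distances; the elbow is where its decrease flattens) and the silhouette score (between -1 and 1; higher means better-separated clusters). Clustering on every column except `id`, the hypothetical K_RANGE, and `random_state=42` are assumptions, not requirements of the assignment.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render to files only; drop these two lines in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler

# SAMPLE_CSV (hypothetical name): the ten rows quoted in the question
SAMPLE_CSV = """\
id,balance,qual_miles,cc1_miles,cc2_miles,cc3_miles,bonus_miles,bonus_trans,flight_miles_12mo,flight_trans_12,days_since_enroll,award
1,28143,0,1,1,1,174,1,0,0,7000,0
2,19244,0,1,1,1,215,2,0,0,6968,0
3,41354,0,1,1,1,4123,4,0,0,7034,0
4,14776,0,1,1,1,500,1,0,0,6952,0
5,97752,0,4,1,1,43300,26,2077,4,6935,1
6,16420,0,1,1,1,0,0,0,0,6942,0
7,84914,0,3,1,1,27482,25,0,0,6994,0
8,20856,0,1,1,1,5250,4,250,1,6938,1
9,443003,0,3,2,1,1753,43,3850,12,6948,1
10,104860,0,3,1,1,28426,28,1150,3,6931,1
"""

# K_RANGE (hypothetical choice): 2..5 suits the ten-row sample; try range(2, 11) on the full file
K_RANGE = range(2, 6)

df = pd.read_csv(io.StringIO(SAMPLE_CSV))  # use pd.read_csv("airlines.csv") for the full file
X = df.drop(columns=["id"])                 # assumption: cluster on every column except id
X_scaled = MinMaxScaler().fit_transform(X)  # each column rescaled to [0, 1]

inertias, silhouettes = [], []
for k in K_RANGE:
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = km.fit_predict(X_scaled)
    inertias.append(km.inertia_)                            # within-cluster sum of squares
    silhouettes.append(silhouette_score(X_scaled, labels))  # higher is better

best_k = K_RANGE[silhouettes.index(max(silhouettes))]
print("Silhouette scores:", dict(zip(K_RANGE, (round(float(s), 3) for s in silhouettes))))
print("k with the highest silhouette score:", best_k)

# Elbow curve (look for the bend) next to the silhouette curve (look for the peak)
fig, (ax_elbow, ax_sil) = plt.subplots(1, 2, figsize=(12, 4))
ax_elbow.plot(list(K_RANGE), inertias, marker="o")
ax_elbow.set(xlabel="k", ylabel="Inertia", title="Elbow method")
ax_sil.plot(list(K_RANGE), silhouettes, marker="o")
ax_sil.set(xlabel="k", ylabel="Silhouette score", title="Silhouette score")
fig.tight_layout()
fig.savefig("kmeans_k_selection.png")
```

The silhouette choice is computed automatically, while the elbow is read off the plot; when the two disagree, reporting both and justifying the final k is a reasonable approach.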
Here is a short excerpt from the airlines.csv file:
id,balance,qual_miles,cc1_miles,cc2_miles,cc3_miles,bonus_miles,bonus_trans,flight_miles_12mo,flight_trans_12,days_since_enroll,award
1,28143,0,1,1,1,174,1,0,0,7000,0
2,19244,0,1,1,1,215,2,0,0,6968,0
3,41354,0,1,1,1,4123,4,0,0,7034,0
4,14776,0,1,1,1,500,1,0,0,6952,0
5,97752,0,4,1,1,43300,26,2077,4,6935,1
6,16420,0,1,1,1,0,0,0,0,6942,0
7,84914,0,3,1,1,27482,25,0,0,6994,0
8,20856,0,1,1,1,5250,4,250,1,6938,1
9,443003,0,3,2,1,1753,43,3850,12,6948,1
10,104860,0,3,1,1,28426,28,1150,3,6931,1
The file has 4000 lines in total with similar data. I also attached a photo.
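A minimal loader sketch, assuming pandas: it reads airlines.csv and checks that the header contains every column from the excerpt above, so a renamed or missing column fails early instead of deep inside the analysis. load_airlines and EXPECTED_COLUMNS are hypothetical names; the demo feeds it the first three excerpt rows so it runs without the file.

```python
import io

import pandas as pd

# Header of airlines.csv as quoted in the question
EXPECTED_COLUMNS = [
    "id", "balance", "qual_miles", "cc1_miles", "cc2_miles", "cc3_miles",
    "bonus_miles", "bonus_trans", "flight_miles_12mo", "flight_trans_12",
    "days_since_enroll", "award",
]


def load_airlines(source="airlines.csv") -> pd.DataFrame:
    """Read the CSV and raise ValueError if any expected column is missing."""
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"airlines.csv is missing columns: {missing}")
    return df


# First three excerpt rows, so the sketch runs without the real file
excerpt = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "1,28143,0,1,1,1,174,1,0,0,7000,0\n"
    "2,19244,0,1,1,1,215,2,0,0,6968,0\n"
    "3,41354,0,1,1,1,4123,4,0,0,7034,0\n"
)
df = load_airlines(excerpt)  # on the real data: load_airlines("airlines.csv")
print(df.shape)
```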
