Question
Explore a published online machine learning project and compose a research report that includes at least the following:
What is the problem?
What is the type of machine learning?
What are the feature variables and target variables?
What data preprocessing was used?
How did the author explore the data?
What machine learning algorithms were used?
How was the model's performance evaluated?
What is the conclusion? Is it reasonable?
If you were the author, which part would you want to improve?
Pick the following Kaggle project:
https://www.kaggle.com/code/yassineghouzam/titanic-top-4-with-ensemble-modeling/notebook
Predict credit card default
The training data set includes a binary variable, default payment (Yes = 1, No = 0), as the target variable, and the following 23 variables as feature variables:
X1: Amount of the given credit
X2: Gender (1 = male; 2 = female)
X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others)
X4: Marital status (1 = married; 2 = single; 3 = others)
X5: Age
X6-X11: History of past payments. We tracked the past monthly payment records (from April to September 2005) as follows: X6 = the repayment status in September 2005; X7 = the repayment status in August 2005; ...; X11 = the repayment status in April 2005. The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; ...; 8 = payment delay for eight months; 9 = payment delay for nine months and above.
X12-X17: Amount of bill statement. X12 = amount of bill statement in September 2005; X13 = amount of bill statement in August 2005; ...; X17 = amount of bill statement in April 2005.
X18-X23: Amount of previous payment. X18 = amount paid in September 2005; X19 = amount paid in August 2005; ...; X23 = amount paid in April 2005.
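The repayment-status scale above can be turned into a small lookup, which also helps spot codes the description does not define (it lists only -1 and 1 to 9, so a value such as 0 would need investigating). A minimal sketch, assuming the columns keep the names X6 to X11 from the description; the helper names `describe_pay_status` and `undocumented_pay_codes` are hypothetical:

```python
import pandas as pd


def describe_pay_status(code: int) -> str:
    """Translate one repayment-status code (X6-X11) into a readable label."""
    if code == -1:
        return "pay duly"
    if 1 <= code <= 8:
        unit = "month" if code == 1 else "months"
        return f"payment delay for {code} {unit}"
    if code == 9:
        return "payment delay for nine months and above"
    # Anything else is not covered by the documented scale
    return "undocumented code"


def undocumented_pay_codes(df: pd.DataFrame, cols=None) -> dict:
    """Count values in the repayment-status columns that fall outside the documented scale."""
    # Assumption: the columns are named X6..X11 as in the variable description
    cols = cols or [f"X{i}" for i in range(6, 12)]
    valid = {-1, *range(1, 10)}
    return {c: int((~df[c].isin(valid)).sum()) for c in cols}


# Usage with made-up values: every column contains one undocumented 0
sample = pd.DataFrame({f"X{i}": [-1, 0, 2] for i in range(6, 12)})
print(describe_pay_status(2))  # payment delay for 2 months
print(undocumented_pay_codes(sample))
```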
Hint: Check the target variable; are the classes balanced or imbalanced?
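One way to answer the hint is to count the target classes. This sketch assumes the target has already been loaded as a pandas Series; the function names are hypothetical. A ratio well above 1 means the classes are imbalanced, in which case plain accuracy can be misleading and a stratified split plus metrics such as recall, F1 or ROC AUC are more informative.

```python
import pandas as pd


def class_balance(y: pd.Series) -> pd.DataFrame:
    """Return the count and share of each target class."""
    counts = y.value_counts()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


def imbalance_ratio(y: pd.Series) -> float:
    """Majority-class count divided by minority-class count (1.0 means perfectly balanced)."""
    counts = y.value_counts()
    return float(counts.max() / counts.min())


# Usage with a toy target: three non-defaults and one default
toy = pd.Series([0, 0, 0, 1])
print(class_balance(toy))
print(imbalance_ratio(toy))  # 3.0
```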
Data file: https://raw.githubusercontent.com/franklin-univ-data-science/data/master/credit_default.csv
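A minimal loading sketch with pandas. The column layout of the CSV is not stated above, so the function assumes by default that the last column is the target; check `df.columns` after loading and pass `target_col` explicitly if that is not the case. `load_features_target` is a hypothetical helper name, and the demo uses a tiny made-up CSV so it runs without network access.

```python
import io

import pandas as pd

DATA_URL = "https://raw.githubusercontent.com/franklin-univ-data-science/data/master/credit_default.csv"


def load_features_target(source=DATA_URL, target_col=None):
    """Read the CSV and split it into a feature matrix X and a target vector y.

    Assumption: if target_col is not given, the last column is taken as the target.
    """
    df = pd.read_csv(source)
    target_col = target_col or df.columns[-1]
    X = df.drop(columns=[target_col])
    y = df[target_col]
    return X, y


# X, y = load_features_target()  # downloads the real file; requires network access

# Offline check with a tiny in-memory CSV (made-up values)
demo = io.StringIO("X1,X2,Y\n20000,2,1\n120000,1,0\n")
X_demo, y_demo = load_features_target(demo)
print(X_demo.shape, y_demo.tolist())
```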
Actions
Implement a model in a Jupyter Notebook and discuss the following topics:
Describe the problem
What is the problem?
What is the type of machine learning?
What are the feature variables and target variables?
Data exploration and preprocessing
How did you explore the data?
How did you clean the data (are there missing or invalid values)?
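For the exploration and cleaning questions, a common starting point is `df.info()`, `df.describe()` and per-column value counts; the sketch below adds a compact quality check. It assumes the education and marital-status columns are named X3 and X4 as in the description and treats any non-missing code outside the documented sets (1 to 4 and 1 to 3) as invalid. Folding such codes into the "others" category is one possible cleaning choice, not the only defensible one (dropping those rows is another); the helper names are hypothetical.

```python
import pandas as pd

# Documented category codes from the variable description (assumed column names)
ALLOWED_CODES = {"X3": {1, 2, 3, 4}, "X4": {1, 2, 3}}
# Code meaning "others" in each column
OTHERS = {"X3": 4, "X4": 3}


def data_quality_report(df: pd.DataFrame) -> dict:
    """Summarize missing values, duplicate rows and undocumented category codes."""
    return {
        "missing": df.isna().sum().to_dict(),
        "duplicates": int(df.duplicated().sum()),
        "invalid_codes": {
            col: int((df[col].notna() & ~df[col].isin(allowed)).sum())
            for col, allowed in ALLOWED_CODES.items()
        },
    }


def fold_invalid_codes(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where undocumented X3/X4 codes are recoded as 'others'."""
    out = df.copy()
    for col, allowed in ALLOWED_CODES.items():
        invalid = out[col].notna() & ~out[col].isin(allowed)
        out.loc[invalid, col] = OTHERS[col]
    return out


# Typical use on the loaded data:
# report = data_quality_report(df)
# df = fold_invalid_codes(df)
```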
Modeling
Hold out 20% of the data as the test set, using random state 123.
What machine learning algorithms were used? Which is better?
What evaluation metric do you prefer?
How did you evaluate the model's performance?
How did you diagnose the model? Is it overfitting, underfitting, or a good fit?
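The modeling steps can be sketched with scikit-learn: a 20% test split with `random_state=123`, two candidate algorithms, ROC AUC and F1 for evaluation, and a train/test comparison for diagnosing fit. The choice of logistic regression and a random forest, their hyperparameters, `class_weight="balanced"`, stratified splitting and the thresholds in `diagnose` are all illustrative assumptions rather than requirements of the assignment. ROC AUC is used as the main score here because it does not depend on a single decision threshold, which helps when the classes are imbalanced.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, random_state=123):
    """Hold out 20% as a test set, fit two candidate models and report train/test scores."""
    # stratify=y keeps the class proportions the same in both splits (a choice, not required)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state, stratify=y
    )
    models = {
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000, class_weight="balanced")
        ),
        "random_forest": RandomForestClassifier(
            n_estimators=200, min_samples_leaf=5,
            class_weight="balanced", random_state=random_state,
        ),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
        test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
        results[name] = {
            "train_auc": train_auc,
            "test_auc": test_auc,
            "test_f1": f1_score(y_test, model.predict(X_test)),
            "gap": train_auc - test_auc,
        }
    return results


def diagnose(train_score, test_score, gap_tol=0.05, min_score=0.70):
    """Rough heuristic with arbitrary thresholds: a large train/test gap suggests
    overfitting, low scores on both sets suggest underfitting."""
    if train_score - test_score > gap_tol:
        return "overfitting"
    if train_score < min_score and test_score < min_score:
        return "underfitting"
    return "good fit"


# Usage on synthetic imbalanced data (swap in the real X and y)
X_syn, y_syn = make_classification(n_samples=1000, n_features=23, weights=[0.8], random_state=0)
for name, r in compare_models(X_syn, y_syn).items():
    print(name, round(r["test_auc"], 3), round(r["test_f1"], 3), diagnose(r["train_auc"], r["test_auc"]))
```

A training score far above the test score points to overfitting, while low scores on both sets point to underfitting; a learning curve is a more thorough alternative to this single-split check.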
Results and discussion
What are your model's results? Are they good? Do you have any concerns?
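To judge whether the results are good, it helps to compare them with a trivial baseline: with imbalanced classes, a model that always predicts "no default" already reaches an accuracy equal to the share of non-defaults while catching no defaulters. A minimal sketch using scikit-learn's `DummyClassifier`; the helper name `compare_to_baseline` is hypothetical and the demo data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split


def compare_to_baseline(model, X_train, y_train, X_test, y_test):
    """Compare a fitted model with a majority-class baseline on the same test set."""
    baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
    report = {}
    for name, clf in {"baseline": baseline, "model": model}.items():
        pred = clf.predict(X_test)
        report[name] = {
            "accuracy": accuracy_score(y_test, pred),
            # recall on the positive (default) class
            "recall_default": recall_score(y_test, pred, zero_division=0),
            "roc_auc": roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]),
        }
    return report


# Usage on synthetic imbalanced data
X, y = make_classification(n_samples=1000, weights=[0.8], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=123, stratify=y)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(compare_to_baseline(clf, X_tr, y_tr, X_te, y_te))
```

If the model's accuracy is only slightly above the baseline's, recall on defaulters and ROC AUC are better indicators of whether it has learned anything useful.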