
Question


2. PRELIMINARY WRANGLING

Dataset Information: The dataset records people investing in each other in a way that is financially and socially rewarding. On loans, borrowers list loan requests between $2,000 and $35,000, and individual investors invest as little as $25 in each loan listing they select. Prosper handles the servicing of the loan on behalf of the matched borrowers and investors.

A. Read the dataset called Pri-Load.csv.
B. Check the data types and adjust the data type for all other categorical columns.
C. If you find any missing values in the ProsperRating column, drop them.

3. UNIVARIATE ANALYSIS

A. What are the main features of interest in your dataset?
Step 1: Apply univariate analysis using suitable charts for [Loan Status, Employment Status, Stated Monthly Income].
Step 2: Check whether any column's distribution is skewed.
Step 3: Write at least 2 observations for each visualization.

4. BIVARIATE ANALYSIS

A. Check the correlation matrix for all numeric variables. Maintain the strong positive and negative correlation columns.
B. Check the relation between the LoanOriginalAmount and BorrowerAPR columns.
Step 1: Use subplots.
Plot 1: Scatter plot of the LoanOriginalAmount and BorrowerAPR columns
Plot 2: Heat map of LoanOriginalAmount and BorrowerAPR
Step 2: Write your observations.
C. Display separate box plots for y = BorrowerAPR with x1 = LoanStatus and x2 = EmploymentStatus. Write your observations.

5. MULTIVARIATE ANALYSIS, FEATURE ENGINEERING

A. Write a program.
Step 1: Create a condition = 'LoanStatus' == 'Completed' | 'LoanStatus' == 'Defaulted' | 'LoanStatus' == 'Chargedoff'
Step 2: Create a user-defined function using the condition and the LoanStatus column.
Hint: df['LoanStatus'] = df.apply(user define function, axis=1)
Sample output:
Completed 168
Defaulted 59

B. Write a program.
Step 1: Create a dictionary called categories = {1: 'Debt Consolidation', 2: 'Home Improvement', 3: 'Business', 6: 'Auto', 7: 'Other'}
Step 2: Create a user-defined function using categories and the ListingCategory (numeric) column.
Hint: df['ListingCategory (numeric)'] = df.apply(user define function, axis=1)
Sample output:
Debt Consolidation 106
Other 65
Business 25
Home Improvement 22
Auto 9

C. Display the box plot for ProsperRating (Alpha) vs LoanOriginalAmount with hue = LoanStatus [Completed, Defaulted]. Write your observations.
D. Display the catplot for ProsperRating (Alpha) vs ListingCategory (numeric) [Debt Consolidation, Other, Business, Home Improvement, Auto] with hue = LoanStatus [Completed, Defaulted]. Write your observations.

Exploratory Data Analysis Project

1. DESCRIPTIVE STATISTICS

1. Create a dataframe using the data below and answer the following questions:

Hourly_Income = [1000, 2009, 24418, 444478, 324235, 243242, 3434234, 7567457, 9235, 238237, 1312, 3412]
Hourly_Expense = [651361, 217371, 2746, 2356, 13436, 5732, 346346, 3463, 1132, 23534, 242235, 235235]
family_members_count = [3, 4, 2, 3, 1, 4, 5, 6, 3, 6, 3, 5]
House_rent = [1299, 2300, 3411, 3422, 4566, 4211, 4600, 736, 672, 0, 734, 2374]
Highest_income_Member = ["olivia", "George", "Isla", "Harry", "Ava", "Noah", "Sophia", "Jacobi", "Freddie", "Ella", "Grace", "Ella"]

A. Display the five-point summary of the data.
B. What is the mean of the hourly expense?
C. What is the median of the hourly expense?
D. Find the family member with the maximum income using a suitable graph.
E. Calculate the IQR (the difference between the 75% and 25% quartiles) for Hourly_Income and Hourly_Expense.
F. Calculate the standard deviation for the first 2 columns.
G. Calculate the variance for the first 4 columns.
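One possible way to approach the descriptive statistics part (A to G) is sketched below with pandas, using the lists exactly as given in the question. A few things here are assumptions rather than part of the question: the variable names (`five_point`, `top_member`, and so on), reading "five-point summary" as the min, quartiles, and max rows of `describe()`, treating "Ella" as the intended final name in `Highest_income_Member`, and relying on pandas' default sample statistics (`ddof=1`) for standard deviation and variance.

```python
import pandas as pd

# Data copied from the question; column order matters for parts F and G.
df = pd.DataFrame({
    "Hourly_Income": [1000, 2009, 24418, 444478, 324235, 243242,
                      3434234, 7567457, 9235, 238237, 1312, 3412],
    "Hourly_Expense": [651361, 217371, 2746, 2356, 13436, 5732,
                       346346, 3463, 1132, 23534, 242235, 235235],
    "family_members_count": [3, 4, 2, 3, 1, 4, 5, 6, 3, 6, 3, 5],
    "House_rent": [1299, 2300, 3411, 3422, 4566, 4211,
                   4600, 736, 672, 0, 734, 2374],
    "Highest_income_Member": ["olivia", "George", "Isla", "Harry", "Ava", "Noah",
                              "Sophia", "Jacobi", "Freddie", "Ella", "Grace", "Ella"],
})

# A. Five-point summary: keep only min, quartiles, and max from describe().
five_point = df.describe().loc[["min", "25%", "50%", "75%", "max"]]

# B and C. Mean and median of the hourly expense.
mean_expense = df["Hourly_Expense"].mean()
median_expense = df["Hourly_Expense"].median()

# D. Member in the row with the largest Hourly_Income.
top_member = df.loc[df["Hourly_Income"].idxmax(), "Highest_income_Member"]

# E. IQR = 75th percentile minus 25th percentile.
quartiles = df[["Hourly_Income", "Hourly_Expense"]].quantile([0.25, 0.75])
iqr = quartiles.loc[0.75] - quartiles.loc[0.25]

# F. Sample standard deviation (ddof=1) of the first 2 columns.
std_first_two = df.iloc[:, :2].std()

# G. Sample variance (ddof=1) of the first 4 columns.
var_first_four = df.iloc[:, :4].var()

print(five_point)
print("Mean expense:", mean_expense)
print("Median expense:", median_expense)
print("Highest earner:", top_member)
print(iqr, std_first_two, var_first_four, sep="\n")
```

For the graph requested in part D, a bar chart of Hourly_Income labelled by member name (for example via `df.plot.bar(x="Highest_income_Member", y="Hourly_Income")`, which needs matplotlib installed) would be one suitable choice; it is left out of the sketch so that the block runs with pandas alone.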
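The feature engineering steps in part 5 (A and B) could be sketched as follows. Since Pri-Load.csv is not available here, the block uses a small hypothetical stand-in DataFrame; the rows, the function names `merge_status` and `label_category`, and the fallback to 'Other' for unlisted codes are all assumptions. The sketch also assumes, based on the sample output showing only Completed and Defaulted, that the intent is to keep rows matching the condition and fold 'Chargedoff' into 'Defaulted'; the question does not state this explicitly.

```python
import pandas as pd

# Hypothetical stand-in rows; the real data would come from
# pd.read_csv("Pri-Load.csv").
df = pd.DataFrame({
    "LoanStatus": ["Completed", "Chargedoff", "Current", "Defaulted", "Completed"],
    "ListingCategory (numeric)": [1, 7, 2, 6, 3],
})

# Part A, step 1: boolean condition over the LoanStatus column.
# Each comparison needs its own parentheses because | binds tighter than ==.
condition = (
    (df["LoanStatus"] == "Completed")
    | (df["LoanStatus"] == "Defaulted")
    | (df["LoanStatus"] == "Chargedoff")
)
df = df[condition].copy()

# Part A, step 2: user-defined function applied row by row.
# Assumption: 'Chargedoff' is treated as 'Defaulted'.
def merge_status(row):
    if row["LoanStatus"] == "Chargedoff":
        return "Defaulted"
    return row["LoanStatus"]

df["LoanStatus"] = df.apply(merge_status, axis=1)

# Part B, step 1: dictionary from the question.
categories = {1: "Debt Consolidation", 2: "Home Improvement",
              3: "Business", 6: "Auto", 7: "Other"}

# Part B, step 2: map each numeric code to its label.
# Assumption: codes missing from the dictionary fall back to 'Other'.
def label_category(row):
    return categories.get(row["ListingCategory (numeric)"], "Other")

df["ListingCategory (numeric)"] = df.apply(label_category, axis=1)

status_counts = df["LoanStatus"].value_counts()
category_counts = df["ListingCategory (numeric)"].value_counts()
print(status_counts)
print(category_counts)
```

The row-wise `df.apply(..., axis=1)` form follows the hint in the question. On a large dataset, `Series.replace` and `Series.map` would usually be faster, but they would not match the hint's wording.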
