Question

Python

The first task is to read the external data into the Python environment. The external files are named "cleveland" and "va", indicating data collected from two different locations. Both are .csv files with the same variable structure. Because the first row of each .csv file does not contain the variable names, you will have to supply them yourself. They are:

age: age of the patient in years
sex: 1 = male, 0 = female
cp: chest pain type: 1 = typical angina, 2 = atypical angina, 3 = non-anginal pain, 4 = asymptomatic
trestbps: resting blood pressure (in mm Hg on admission to the hospital)
chol: serum cholesterol in mg/dl
fbs: fasting blood sugar > 120 mg/dl (1 = true, 0 = false)
restecg: resting electrocardiographic results: 0 = normal, 1 = having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), 2 = showing probable or definite left ventricular hypertrophy by Estes' criteria
thalach: maximum heart rate achieved
exang: exercise-induced angina (1 = yes, 0 = no)
oldpeak: ST depression induced by exercise relative to rest
slope: the slope of the peak exercise ST segment: 1 = upsloping, 2 = flat, 3 = downsloping
ca: number of major vessels (0-3) colored by fluoroscopy
thal: 3 = normal, 6 = fixed defect, 7 = reversible defect
num: diagnosis of heart disease (angiographic disease status): 0 = < 50% diameter narrowing, 1 = > 50% diameter narrowing

Create a DataFrame for each location, read the external data into it, and add a new variable, say "location", to indicate which location the data come from: either 'cleveland' or 'va'.
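One way the first task might be approached with pandas is sketched below. The helper name `load_location`, the file paths in the example calls, and the `na_values="?"` setting (which assumes missing entries are coded as a question mark) are assumptions for illustration and are not stated in the question.

```python
import pandas as pd

# Column names taken from the variable list in the question, in the order given.
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]


def load_location(path, location):
    """Read one header-less CSV and tag every row with its location."""
    # header=None tells pandas that the first row is data, not variable names.
    # na_values="?" is an assumption that missing entries are coded as "?".
    df = pd.read_csv(path, header=None, names=COLUMNS, na_values="?")
    df["location"] = location  # new variable marking the data source
    return df


# Example calls (file paths are assumptions; adjust to where the files live):
# cleveland = load_location("cleveland.csv", "cleveland")
# va = load_location("va.csv", "va")
```

Passing `names=` together with `header=None` is what keeps the first data row from being consumed as a header.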

The second task is to get some useful information about each DataFrame, such as its dimensions and data types, and to extract only the variables needed for future use. The variables of interest at this point are: age, sex, cp, trestbps, chol, num, along with the location variable. Create a new DataFrame that combines the two locations and consists of all observations and only these 7 variables, and save the new DataFrame to an external .csv file (let's call it full.csv).

Could you show me the commands to run this?
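For the second task, a minimal sketch is given below, assuming the two per-location DataFrames already exist with the column names listed in the question plus "location". The helper names `describe_frame` and `combine_and_save` are hypothetical; only standard pandas calls are used.

```python
import pandas as pd

# The 7 variables the question asks to keep.
KEEP = ["age", "sex", "cp", "trestbps", "chol", "num", "location"]


def describe_frame(df):
    """Print basic facts about a DataFrame: dimensions, dtypes, missing counts."""
    print(df.shape)         # (number of rows, number of columns)
    print(df.dtypes)        # data type of each column
    df.info()               # index range, non-null counts, memory usage
    print(df.isna().sum())  # number of missing values per column


def combine_and_save(frames, out_path="full.csv"):
    """Stack the per-location frames, keep the 7 variables, write a CSV."""
    # ignore_index=True renumbers the rows 0..n-1 in the combined frame.
    full = pd.concat(frames, ignore_index=True)[KEEP]
    # index=False keeps the row index out of the output file.
    full.to_csv(out_path, index=False)
    return full


# Example calls, assuming `cleveland` and `va` are the per-location DataFrames:
# describe_frame(cleveland)
# describe_frame(va)
# full = combine_and_save([cleveland, va], "full.csv")
```

Selecting the columns after concatenation means the subset is defined once; selecting before concatenation would give the same result here.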
