Question
Read the instructions below before you start your analysis.

1. Create a markdown document to prepare your answers. You should upload two (2) files on Canvas: (i) an .ipynb file; and (ii) a .PDF file that is generated by exporting the output from the first file. Both files should contain the required Python code, tables and charts, and all the required explanations and answers to the questions in the homework.
2. Include your group number in the name of the file you upload. For example, if your group number is 7, then name the file DSCI5240_HW1_Group7.
3. DO NOT use an absolute directory path. I should be able to replicate your results without trying to find the input data in another directory.
4. Use a seed of 123, wherever necessary, to ensure replicability.
5. Label the charts and/or tables appropriately so that it is easy to understand the information contained in a chart or table.
6. Any assignment submitted after the deadline will be considered late and will not be graded.

DSCI5240 Homework 1

The Utilities dataset includes information on 22 public utility companies in the US. The variable definitions are provided below.

Fixed_charge = fixed-charge covering ratio (income/debt)
RoR = rate of return on capital
Cost = cost per kilowatt capacity in place
Load_factor = annual load factor
Demand_growth = peak kilowatthour demand growth from 1974 to 1975
Sales = sales (kilowatthour use per year)
Nuclear = percent nuclear
Fuel_Cost = total fuel costs (cents per kilowatthour)

For Questions 1-4 below, do not scale the data.

1. Compute the minimum, maximum, mean, median, and standard deviation for each of the numeric variables. Which variable(s) has the largest variability? Explain your answer.
2. Create boxplots for each of the numeric variables. Are there any extreme values for any of the variables? Which ones? Explain your answer.
3. Create a heatmap for the numeric variables. Discuss any interesting trend you see in this chart.
4. Run principal component analysis using unscaled numeric variables in the dataset. How do you interpret the results from this model?
5. Next, run principal component model after scaling the numeric variables. Did the results/interpretations change? How so? Explain your answers.
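
The workflow behind Questions 1-5 can be sketched roughly as follows, assuming pandas and NumPy are available. This is an illustration under stated assumptions, not a worked answer. The data frame is a synthetic stand-in with made-up values, the file name in the comment is hypothetical, and the helper `run_pca` is an illustrative name. PCA is computed here through an eigendecomposition of the covariance matrix rather than through a library class, and extreme values are flagged with the usual 1.5 x IQR boxplot whisker rule.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Utilities data: 22 rows and the eight variables
# named in the assignment. The (mean, std) pairs are arbitrary placeholders,
# NOT the real dataset; they only give the columns very different scales.
rng = np.random.default_rng(123)  # seed 123, as the instructions require
placeholder_scales = {
    "Fixed_charge": (1.1, 0.2), "RoR": (10.7, 2.2), "Cost": (168, 41),
    "Load_factor": (57, 4.5), "Demand_growth": (3.2, 3.1),
    "Sales": (8900, 3500), "Nuclear": (12, 17), "Fuel_Cost": (1.1, 0.6),
}
df = pd.DataFrame({c: rng.normal(m, s, 22) for c, (m, s) in placeholder_scales.items()})
# With the real data one would instead read it through a relative path, e.g.
# df = pd.read_csv("Utilities.csv")   # hypothetical file name

# Question 1: summary statistics, one row per variable
summary = df.agg(["min", "max", "mean", "median", "std"]).T

# Question 2: count values outside the boxplot whiskers (1.5 * IQR rule)
q1, q3 = df.quantile(0.25), df.quantile(0.75)
iqr = q3 - q1
outlier_counts = ((df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)).sum()

# Question 3: the correlation matrix is the table a heatmap would display
corr = df.corr()


def run_pca(data, scale):
    """Return (explained-variance ratios, loadings) for centered data.

    If scale is True, each column is also divided by its sample standard
    deviation, so the analysis uses the correlation matrix.
    """
    X = data.to_numpy(dtype=float)
    X = X - X.mean(axis=0)
    if scale:
        X = X / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # largest variance first
    ratios = eigvals[order] / eigvals.sum()
    names = [f"PC{i + 1}" for i in range(len(order))]
    loadings = pd.DataFrame(eigvecs[:, order], index=data.columns, columns=names)
    return ratios, loadings


# Questions 4 and 5: PCA without and with scaling
ratio_raw, loadings_raw = run_pca(df, scale=False)
ratio_scaled, loadings_scaled = run_pca(df, scale=True)

print(summary.round(2))
print(outlier_counts)
print("Unscaled explained variance:", ratio_raw.round(3))
print("Scaled explained variance:  ", ratio_scaled.round(3))
```

On the placeholder data, the unscaled first component is dominated by whichever column has the largest raw variance (here `Sales`), while scaling puts all eight variables on an equal footing. Whether the real Utilities data behaves the same way has to be checked against the actual file. For the charts themselves, `df.plot(kind="box", subplots=True)` and seaborn's `heatmap(corr, annot=True)` are common choices, assuming matplotlib and seaborn are installed.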