Question

The cage.csv file is in my google drive: https://drive.google.com/open?id=1zNuVGoXfwg3oVhiaARD8WtA08UZJ-xE8

Question 1a: Nicolas Cage: National Treasure or Threat To National Security? [6 points]

It is a well-established fact that, at least from the years 1999-2009, there was a strong positive correlation between the number of films Nicolas Cage appeared in and the number of people who died by falling into a swimming pool. That is, the more movies he was in, the more people died by falling into pools (specifically after falling in; the correlation does not hold for people who entered the pool of their own volition)... No, really. This is real data:

Number of movies Nicolas Cage appears in: https://m.imdb.comamemo
U.S. Mortality data: https://wonder.cdc.gov/ucd-icd10.html

Using the file cage.csv, explore the relationship between Nicolas Cage's acting and these (rather tragic) deaths. You may wish to consult http://scikit-learn.org/stable/modules/linear_model.html

(i) (2 points) Create two plots: one that shows the number of movies Nicolas Cage stars in per year, and one that shows the number of people who drown after falling into a pool per year.

In [ ]: # YOUR CODE HERE

(ii) (1 point) Use scikit-learn to create two linear models that relate the number of movies he stars in to the number of people who drowned after falling into a swimming pool. Your first model should use linear regression (also called ordinary least squares); the other should use the "lasso" model. (You can access the lasso model using the sklearn.linear_model.Lasso() function; the inputs and outputs are the same as for the sklearn.linear_model.LinearRegression() function we used in class.) You might need to use the fix1Darray function I showed in class for converting a one-dimensional list into the format needed by scikit-learn.

In [ ]: def fix1Darray(L):
            return [[x] for x in L]

In [ ]: # YOUR CODE HERE

(iii) (1 point) Now let's use our linear regression models to predict the number of people who might drown based on the number of films Nicolas Cage appears in. Then, create a scatter plot. The x axis will be the number of movies Nicolas Cage appears in; the y axis will be the number of people who drown after falling into a swimming pool. Put three things on the plot: a scatter plot showing the real relationship between the number of movies Nicolas Cage appears in (x axis) and the number of people who drowned after falling into a swimming pool (y axis), and two lines of best fit, one for the predictions of each model.

In [ ]: # YOUR CODE HERE

(iv) (2 points) Print out the R^2 score for each model, to 2 decimal places, using the method shown in class. Which model performs better? Does this answer the question of whether Nicolas Cage is a national treasure or a threat to life?

In [ ]: # YOUR CODE HERE

        # Fill in the blanks:
        print("I think the model that performs better is... because ...")
        print("I conclude that... because...")
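The four parts of the question can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the expected solution: the layout of cage.csv is not shown in the question, so the lists `years`, `films`, and `drownings` hold made-up stand-in values that should be replaced with the columns read from the file. The use of `sklearn.metrics.r2_score`, the default `alpha=1.0` for lasso, and the helper name `plot_cage` are also assumptions; the "method shown in class" for R^2 may differ.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import r2_score


def fix1Darray(L):
    """Wrap each value in its own list, giving the 2-D shape scikit-learn expects."""
    return [[x] for x in L]


# Made-up stand-in values for illustration only (NOT the contents of cage.csv).
# Replace these three lists with the columns read from the real file.
years = list(range(1999, 2010))
films = [1, 2, 2, 3, 1, 2, 3, 4, 2, 1, 4]
drownings = [90, 100, 98, 108, 88, 101, 110, 120, 97, 92, 118]

# Part (ii): fit ordinary least squares and lasso on the same single feature.
X = fix1Darray(films)
ols = LinearRegression().fit(X, drownings)
lasso = Lasso(alpha=1.0).fit(X, drownings)  # alpha=1.0 is the library default

# Part (iii): predictions from each model for the observed film counts.
ols_pred = ols.predict(X)
lasso_pred = lasso.predict(X)

# Part (iv): R^2 of each model on the data it was fitted to, to 2 decimal places.
ols_r2 = r2_score(drownings, ols_pred)
lasso_r2 = r2_score(drownings, lasso_pred)
print(f"Linear regression R^2: {ols_r2:.2f}")
print(f"Lasso R^2: {lasso_r2:.2f}")


def plot_cage(years, films, drownings, ols_pred, lasso_pred):
    """Hypothetical helper: draw the plots for parts (i) and (iii) in one figure."""
    import matplotlib.pyplot as plt  # imported here so the fitting code runs without it

    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15, 4))

    # Part (i): one time series per panel.
    ax1.plot(years, films, marker="o")
    ax1.set(xlabel="Year", ylabel="Nicolas Cage films")
    ax2.plot(years, drownings, marker="o")
    ax2.set(xlabel="Year", ylabel="Pool drownings")

    # Part (iii): observed points plus one fitted line per model.
    order = np.argsort(films)  # sort by x so each line is drawn left to right
    x_sorted = np.array(films)[order]
    ax3.scatter(films, drownings, label="Observed")
    ax3.plot(x_sorted, ols_pred[order], label="Linear regression")
    ax3.plot(x_sorted, lasso_pred[order], linestyle="--", label="Lasso")
    ax3.set(xlabel="Nicolas Cage films", ylabel="Pool drownings")
    ax3.legend()

    fig.tight_layout()
    return fig
```

In a notebook cell, calling `plot_cage(years, films, drownings, ols_pred, lasso_pred)` displays the figure. When both models are scored on the same data they were fitted to, ordinary least squares cannot score a lower R^2 than lasso, because it minimises the squared error directly while lasso adds an L1 penalty that shrinks the slope toward zero.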
