Question

INSTRUCTIONS

This assignment tests your ability to create a simple regression solution to a prediction problem. You will use a dataset of bike rentals from the Capital Bikeshare system, Washington D.C., USA, which is publicly available at http://capitalbikeshare.com/system-data. You will build a regression that predicts the number of bike rentals on a particular day given the weather conditions.

There are two questions in this assignment and one optional True and False section.

The file bikeshare_data.csv in Vocareum contains an extract of the data downloaded from the link provided in the description.

About the data

In any machine learning problem, it is worth getting a feel for the data in its entirety. Although we might not use all the columns provided, it is good practice to know what each column means, because understanding the data is key. The data consists of the following fields:

- instant: record index

- dteday : date

- season : season (1:spring, 2:summer, 3:fall, 4:winter)

- yr : year (0: 2011, 1: 2012)

- mnth : month (1 to 12)

- hr : hour (0 to 23)

- holiday : whether the day is a holiday or not (extracted from http://dchr.dc.gov/page/holiday-schedule)

- weekday : day of the week

- workingday : 1 if the day is neither a weekend nor a holiday, otherwise 0

- weathersit : weather situation, one of:

  - 1: Clear, Few clouds, Partly cloudy

  - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist

  - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds

  - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog

- temp : Normalized temperature in Celsius. The values are divided by 41 (max)

- atemp : Normalized feeling temperature in Celsius. The values are divided by 50 (max)

- hum : Normalized humidity. The values are divided by 100 (max)

- windspeed : Normalized wind speed. The values are divided by 67 (max)

- casual: count of casual users

- registered: count of registered users

- cnt: count of total rental bikes including both casual and registered
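As a small illustration of the normalized fields, the sketch below maps a normalized value back to its original scale. It assumes that "divided by N (max)" means each raw value was divided by the stated maximum; the names `SCALE` and `denormalize` are hypothetical and not part of the assignment.

```python
# Divisors quoted in the field list above (assumed to be the normalizing constants).
SCALE = {"temp": 41.0, "atemp": 50.0, "hum": 100.0, "windspeed": 67.0}


def denormalize(field, value):
    """Map a normalized value back to its original scale (hypothetical helper)."""
    return value * SCALE[field]
```

For example, under this assumption `denormalize("temp", 0.5)` corresponds to 20.5 degrees Celsius.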

Goal: use this data to do a regression analysis that focuses on predicting the number of bike rentals for a particular day.

Question 1: Preprocessing

In this question, you will prepare the data before building your regression model. After preparing the data, you will save it as a CSV.

Follow these steps to prepare the data:

  1. Read the data with the pandas read_csv function
  2. As a sanity check, see how many rows and columns the data has. There should be 17379 rows and 17 columns.
  3. Our objective here is to cleanse this data. We aim to do a linear regression analysis on the data only for working days between 9 AM and 6 PM. This means we have to remove the rows that are not of interest. The following steps describe what we need to do.
  4. Let us first remove all the data for the holidays. This means removing all the rows where the 'holiday' field is 1. First check how many such rows there are. There should be 500 such rows. Now remove them.
  5. Now, let us remove all the days that were not working days. This means removing all the rows where the 'workingday' field is 0. There should be 5014 such rows. Remove all of them.
  6. Now, keep only the data for times from 9 AM to 6 PM. This means keeping only the rows where the value of the 'hr' column is in the range [9, 17], since 17 represents the time slot from 5 PM to 6 PM. Remove all the rows that do not satisfy this condition. You should get 4477 rows.
  7. Since we want to see the impact of weather conditions on the number of bookings, create a subset of this data that contains temp, hum, windspeed, and cnt.
  8. Save this subset of data as a CSV called 'filtered.csv'. The output should look something like this (don't forget the header):

temp,hum,windspeed,cnt

0.16,0.43,0.3881,88

0.18,0.43,0.2537,44

0.20,0.40,0.3284,51

0.22,0.35,0.2985,61

WARNING: Do not change the order of the rows. If you do, the grader won't recognize the data and you will get a low grade.

HINT: The file 'filtered.csv' should have 4478 rows, counting the header.

The Jupyter notebook (bike_share_data) reads as follows.

There are some instructions you need to follow:

- You only need to write your code in the comment area "Your Code Here".

- Do not upload your own file. Please make the necessary changes in the Jupyter notebook file already present on the server.

- Please note, there are several cells in the assignment Jupyter notebook that are empty and read-only. Do not attempt to remove or edit them. They are used in grading your notebook. Doing so might lead to 0 points.

    import numpy as np
    import pandas as pd
    from sklearn import datasets, linear_model

Question 1

Transform the data.

    def filter_bike_data(filename = 'bikeshare_data.csv'):
        data = pd.read_csv(filename, header=0)
        ###
        ### YOUR CODE HERE
        ###
        data.to_csv('filtered.csv', index=False)
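One way the preprocessing steps of Question 1 could be filled in is sketched below. It is a minimal sketch, not the graded reference solution: it assumes the CSV has the columns named in the field list, and the `out_path` parameter and the return value are additions for illustration that the notebook skeleton does not have. Boolean-mask filtering keeps the original row order, which the warning above requires.

```python
import pandas as pd


def filter_bike_data(filename="bikeshare_data.csv", out_path="filtered.csv"):
    """Filter the hourly data to working-day rows with hr in [9, 17]."""
    data = pd.read_csv(filename, header=0)

    # Step 4: drop rows that fall on a holiday.
    data = data[data["holiday"] != 1]

    # Step 5: drop rows that are not working days.
    data = data[data["workingday"] != 0]

    # Step 6: keep hours 9 through 17 (between() is inclusive on both ends).
    data = data[data["hr"].between(9, 17)]

    # Step 7: keep only the weather columns and the target.
    data = data[["temp", "hum", "windspeed", "cnt"]]

    # Step 8: write the subset with a header row and without the index column.
    data.to_csv(out_path, index=False)
    return data  # returned for convenience; not required by the notebook
```

Checking `len(data)` after each step against the counts quoted in the steps (500 holiday rows, 5014 non-working rows, 4477 remaining rows) is a quick way to confirm the filters behave as intended.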
