
Question


Exploring the Taxi Data Set
In this section we'll have a first look at a large data set. The data set contains taxi cab journey information from New York City for June 2015. This particular data set is for New York's "Yellow Cabs" and is structured as a very large CSV file.
A "small" extract of the data set is available in the file:
L:\SCMS\ENGEN103\data\taxi\yellow_tripdata_2015-06-1%.csv
NOTE that you either need the @ symbol before the string, or to use \\ for every \. See the steel.csv example above.
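In Python, the closest counterpart to a leading @ is the r prefix, which creates a raw string literal. The sketch below compares the two spellings. It uses a shortened, hypothetical path fragment rather than the real file name, chosen because the "\t" in "\taxi" happens to be a genuine escape sequence.

```python
# Hypothetical path fragment; not the real file name from the exercise.
raw = r"data\taxi\trips.csv"        # raw string: backslashes kept exactly as typed
escaped = "data\\taxi\\trips.csv"   # ordinary string: every backslash doubled
broken = "data\taxi"                # "\t" is silently turned into a TAB character

print(raw == escaped)   # True: both spell the same path
print("\t" in broken)   # True: the unescaped version contains a tab
```

Either of the first two forms should work; the third is the mistake the note warns about.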
This file contains a random 1% sample of the complete file. We'll work with the complete file at the end of the exercise. It is on the local scratch drive, so reading it isn't slowed down by the network.
As a small but important aside, when working with large files, sometimes you'll want to abort a command that's running. You can do that by typing ^C (Control-C). Note: ^Z appears to do the same thing but actually doesn't. ^Z suspends a running program so that it is stopped for now, but can be restarted later. This means it is still holding all the resources it has (other than CPU time). Don't use ^Z if you want to abort a command; it frequently ends in confusion. Use ^C, not ^Z. If ^C doesn't work for some reason, try ???1
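Inside a Python program, Control-C normally arrives as a KeyboardInterrupt exception, so a long loop can catch it and still report what it has done so far. The following is only a sketch: the function name is made up for illustration, and the interrupt is simulated by a generator so the example can run unattended.

```python
def sum_until_interrupted(values):
    """Add up values, stopping cleanly if Control-C is pressed."""
    total = 0.0
    try:
        for value in values:
            total += value
    except KeyboardInterrupt:
        # By default, Control-C raises KeyboardInterrupt in the main thread.
        print("Interrupted; reporting the partial total.")
    return total


def simulated_input():
    """Yield two numbers, then behave as if the user pressed Control-C."""
    yield 1.0
    yield 2.0
    raise KeyboardInterrupt


print(sum_until_interrupted(simulated_input()))   # 3.0
```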
Write a Python program called taxi.py that reads the data set and prints the total tips for Vendor ID = 2. Print the result with a $ sign and two decimal places.
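One possible shape for such a program is sketched below. It is not the steel example's method, which is not shown here. The column names "VendorID" and "tip_amount" are assumptions and should be checked against the header row of the real file, and the function names are made up for illustration. The demonstration runs on a tiny invented sample instead of the real data.

```python
import csv
import io


def total_tips(rows, vendor_id="2"):
    """Sum the tip column over rows whose vendor column equals vendor_id.

    Assumes columns named "VendorID" and "tip_amount" (an assumption;
    adjust to match the real header row).
    """
    total = 0.0
    for row in rows:
        if row["VendorID"] == vendor_id:     # csv yields strings, so compare as text
            total += float(row["tip_amount"])
    return total


def total_tips_from_file(path, vendor_id="2"):
    """Open a CSV file and total the tips for one vendor."""
    with open(path, newline="") as handle:
        return total_tips(csv.DictReader(handle), vendor_id)


# Tiny invented sample standing in for the real file.
sample = io.StringIO("VendorID,tip_amount\n1,1.50\n2,2.25\n2,0.75\n")
print(f"${total_tips(csv.DictReader(sample)):.2f}")   # $3.00
```

For the real data, total_tips_from_file would be called with the path to the CSV file, and its result formatted the same way.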
Notes:
a) Begin with as similar a program as you can find. Use the steel program from above as an example of using the csv library.
b) Develop your program with the 1% dataset but once it's working, use the full dataset in the file:
L:\SCMS\ENGEN103\data\taxi\yellow_tripdata_2015-06-1%.csv
c) Running your program on the full dataset will take a long time. You may want to add a progress counter to see that your program is proceeding. Before your main loop, initialise a variable (say lines) to 0. Inside the loop, increment it and print its value every so often.
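The progress counter described in note c) could be sketched roughly as follows. The original snippet is not available, so the reporting interval, the message text, and the function name are all assumptions.

```python
def count_with_progress(rows, every=100_000):
    """Loop over rows, printing a progress line every `every` rows."""
    lines = 0                      # initialised before the main loop
    for row in rows:
        lines += 1
        if lines % every == 0:     # report only now and then to keep output short
            print(f"{lines} lines processed")
        # ... per-row work (for example, adding up tips) would go here ...
    return lines


# Demonstration on a small stand-in for the data, reporting every 2 rows.
print(count_with_progress(range(5), every=2))
```

Printing on every row would slow the program down noticeably on a large file, which is why the sketch reports only at intervals.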
