Question

Posted on Sep 10, 2024

Hi, I have attached a Python question. Please help us solve it. Thanks.

## Introduction

OpenData sources are growing in number. Many governments have "OpenData" initiatives that make portions of government/public data available for public access. These initiatives are motivated by the idea that some data should be freely available to everyone. From a government perspective, "opening up" data can also help to initiate innovative uses of public data.

For this assignment you will analyze an OpenData file on liquor purchases made by Class 'E' liquor license holders in the state of Iowa.

## Problem 1: 30 pts

Download the Iowa Liquor Sales dataset for 2018 as a csv file from CyBox for MIS407 at:

[Iowa-Liquor-Sales-2018.csv](https://iastate.box.com/shared/static/d7iwcwn9nizh9yp98pd9879hwq5eamx9.csv)

**NOTE: PLEASE DO NOT COMMIT THE IOWA LIQUOR SALES DATA FILE TO YOUR mis407s18-student-xx repo!** Downloading 20 copies of the file (and the repo containing a copy of the file) will overload my workstation.

Your task is to write a program that analyzes this file. There are 24 columns in the csv file, of which these are probably the interesting ones:

```
INVOICE_NUM = 0
DATE = 1
STORE_NUM = 2
STORE_NAME = 3
CITY = 5
COUNTY_NUM = 8
COUNTY_NAME = 9
VENDOR_NUM = 12
VENDOR_NAME = 13
ITEM_NUM = 14
ITEM_NAME = 15
BOTTLE_SIZE = 17
SALE = 21
VOLUME_LITERS = 22
VOLUME_GALS = 23
```
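
As a rough illustration of how these index constants might be used, the sketch below parses one row with the `csv` module and picks fields by position. The 24-column sample row is made up for the example (placeholder values, not taken from the real file), and only two of the constants are shown.

```python
import csv
import io

# Column indexes taken from the list above.
COUNTY_NAME = 9
VOLUME_LITERS = 22

# A made-up 24-column row (placeholder values, not real data).
fields = [f"col{i}" for i in range(24)]
fields[COUNTY_NAME] = "POLK"
fields[VOLUME_LITERS] = "10.5"
sample = io.StringIO(",".join(fields) + "\n")

for row in csv.reader(sample):
    county = row[COUNTY_NAME]            # every field arrives as a string
    liters = float(row[VOLUME_LITERS])   # convert before doing arithmetic
    print(county, liters)
```

Named constants keep the row lookups readable compared with bare numbers such as `row[22]`.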

Write a program `ia03_1.py` that:

* Asks for the name of the input file

* Reads the file

* Computes and (at the end) prints these values:

  * **Total volume sold by county** in liters *ordered by county name*

  * **Sum total of all counties** in liters

* Print the computed values on the screen, like this made-up sample output:

```
Alcohol sales by county:
1. Adair 31239.89 liters
2. Adams 5234.46 liters
3. Allamakee 77654.82 liters
...
97. Woodbury 702304.08 liters
98. Worth 20000.59 liters
99. Wright 62222.83 liters
Total: 20023945.01 liters
```

* Commit your program `ia03_1.py` for grading.

* Copy, paste, and commit the output of your program into the file `output-1.txt` for grading.
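
One possible way to produce output in the shape of the sample above is sketched here. The helper name `format_report` and the input dictionary are hypothetical, and the totals are made up; the sketch only assumes that the per-county sums are already held in a dictionary keyed by county name.

```python
def format_report(totals):
    """Return report lines: counties sorted by name, then the grand total."""
    lines = ["Alcohol sales by county:"]
    for rank, county in enumerate(sorted(totals), start=1):
        lines.append(f"{rank}. {county} {totals[county]:.2f} liters")
    lines.append(f"Total: {sum(totals.values()):.2f} liters")
    return lines


# Made-up totals for illustration only.
sample_totals = {"Adams": 5234.46, "Adair": 31239.89}
report = format_report(sample_totals)
print("\n".join(report))
```

Sorting the dictionary keys gives the alphabetical county order, and `enumerate(..., start=1)` supplies the running number in front of each line.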

__NOTE__: Select based on **total volume**, not dollar value.

The program must use the `csv` library for file I/O. To process the data, you can use any of the built-in data types (sets, lists, tuples, strings) and the various objects/structures found in the `collections` module. We are not, at this stage, using numpy, Pandas, or even SQLite to process such data. We will use these later in the course, but for now I want you to know how to do this assignment using the standard Python library.

*Hints*:

* Don't try to read the entire CSV file using `list(csvFile)`: your computer may not have enough memory to load the entire file into a list at once. Instead, use a for loop over your csvFile and process the data row by row.

* A per-county dictionary would be an excellent way to hold the data as you sum it. You could make a dictionary by county name, and for each row, add the liquor volume to the appropriate entry in the dictionary.

* Some of the data rows may be missing county names. You should probably skip those rows.

* Some of the county names may not be capitalized correctly. You could use the `.title()` method on the county name after you read it from the CSV row to get a consistent county title name.
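
The hints above could be combined roughly as follows. This is a minimal sketch, not a reference solution: the function name `sum_liters_by_county`, the `has_header` flag, and the demo rows are all hypothetical, and the sketch assumes the first row of the file is a header. With the real file, the `io.StringIO` object would be replaced by something like `open(filename, newline="")`.

```python
import csv
import io
from collections import defaultdict

COUNTY_NAME = 9
VOLUME_LITERS = 22


def sum_liters_by_county(csv_file, has_header=True):
    """Stream rows from an open CSV file and total liters per county."""
    totals = defaultdict(float)
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)  # assumption: the first row is a header
    for row in reader:  # one row at a time, never the whole file
        if len(row) <= VOLUME_LITERS:
            continue  # skip blank or truncated rows
        county = row[COUNTY_NAME].strip().title()
        if not county:
            continue  # skip rows with a missing county name
        totals[county] += float(row[VOLUME_LITERS])
    return dict(totals)


def make_row(county, liters):
    """Build a made-up 24-column row for the demo below."""
    fields = [""] * 24
    fields[COUNTY_NAME] = county
    fields[VOLUME_LITERS] = str(liters)
    return ",".join(fields)


demo = io.StringIO("\n".join([
    ",".join(f"h{i}" for i in range(24)),  # header row
    make_row("POLK", 10.5),
    make_row("Polk", 4.5),    # same county, different capitalization
    make_row("", 99.0),       # missing county: skipped
    make_row("adair", 2.0),
]) + "\n")

totals = sum_liters_by_county(demo)
print(totals)
```

Because `.title()` is applied before the dictionary lookup, `POLK` and `Polk` accumulate into the same entry.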

## Problem 2: 10 pts

Copy and modify your program to print only the top 5 counties by volume.

* Create a copy of your program `ia03_1.py` to `ia03_2.py`

* Update the code in `ia03_2.py` to ask for the name of the input file, open it, read it, process the volume of liters by county, and print the *top 5 counties* ordered by volume:

* *Hint*: The `Counter` class in the Python `collections` module can take a dictionary and return a list of its entries with the largest values, sorted from largest to smallest, like:

```
from collections import Counter
top_values = Counter(county_totals).most_common(5)  # county_totals: your per-county dict
```

* Print the top_values similar to how all 99 were printed (but don't print the total), like this made-up sample output:

```
Top 5 counties alcohol sales by volume:
1. Polk 4,111,222.15 liters
2. Linn 1,444,333.59 liters
3. Scott 1,333,666.80 liters
4. Johnson 1,222,777.53 liters
5. Black Hawk 1,111,888.34 liters
```

* Commit your program `ia03_2.py` for grading.

* Copy, paste, and commit the output of your program into the file `output-2.txt` for grading.
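
A small sketch of the top-5 step, assuming the per-county totals are already in a dictionary. The name `county_totals` and its values are made up for illustration; the `,` in the format specification adds the thousands separators seen in the sample output.

```python
from collections import Counter

# Made-up per-county totals in liters (illustration only).
county_totals = {
    "Polk": 4111222.15, "Linn": 1444333.59, "Scott": 1333666.80,
    "Johnson": 1222777.53, "Black Hawk": 1111888.34, "Adair": 31239.89,
}

# most_common(5) returns (county, liters) pairs, largest value first.
top_values = Counter(county_totals).most_common(5)

print("Top 5 counties alcohol sales by volume:")
top_lines = [
    f"{rank}. {county} {liters:,.2f} liters"
    for rank, (county, liters) in enumerate(top_values, start=1)
]
print("\n".join(top_lines))
```

`Counter` accepts float values here because `most_common` only needs the values to be comparable.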

## Bonus: 20 pts (optional)

Here is a CSV file containing total population for each county in Iowa: [iowa_county_pop.csv](https://iastate.box.com/shared/static/qmr1hlewae1hont0ylvpu7djifumgimk.csv). Use this data to calculate a per capita volume consumption figure by re-using your code from `ia03_1.py` in a new program.

Name this program `ia03_drinkers.py`. Ask the user for the county population filename, the Iowa alcohol sales filename, and the name of the output file.

Use this county population data and the liquor sales data file you downloaded above, and rank order the counties from top to bottom with respect to total **per capita volume** of liquor purchased by class E stores located within the county. Write the results to a csv file called `drinking_counties.csv`.

* Commit your program `ia03_drinkers.py` and your *output* `drinking_counties.csv` file in your `IA03` commits for grading.
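
The per capita ranking might be sketched as below. The layout of the population file is not described above, so this sketch starts from two dictionaries that are assumed to have been built already (county name to liters, county name to population). The function name, the output header, and all values are hypothetical, and `io.StringIO` stands in for the real output file.

```python
import csv
import io


def per_capita_ranking(totals, populations):
    """Return (county, liters per person) pairs, highest first."""
    ranked = [
        (county, liters / populations[county])
        for county, liters in totals.items()
        if populations.get(county)  # skip unknown or zero populations
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up inputs for illustration only.
totals = {"Polk": 1000.0, "Adair": 300.0, "Ghost": 5.0}
populations = {"Polk": 500, "Adair": 100}

ranking = per_capita_ranking(totals, populations)

out = io.StringIO()  # stands in for open(output_name, "w", newline="")
writer = csv.writer(out)
writer.writerow(["county", "liters_per_capita"])  # hypothetical header
for county, value in ranking:
    writer.writerow([county, f"{value:.2f}"])
print(out.getvalue())
```

County names in the two files would need the same normalization (for example `.title()`) for the lookups to match.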