Question

from csv import reader

### The Google Play data set ###
opened_file = open('googleplaystore.csv')
read_file = reader(opened_file)
android = list(read_file)      # every row of the file as a list of strings
android_header = android[0]    # first row holds the column names
android = android[1:]          # remaining rows are the apps

### The App Store data set ###
opened_file = open('AppleStore.csv')
read_file = reader(opened_file)
ios = list(read_file)
ios_header = ios[0]
ios = ios[1:]
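The loading code above opens both files without closing them and relies on the platform's default text encoding. A minimal sketch of a reusable loader follows; the helper name `load_csv` and the `utf8` default are assumptions for illustration, not part of the original code.

```python
from csv import reader


def load_csv(path, encoding='utf8'):
    """Return (header, rows) for the CSV file at `path`.

    `load_csv` is a hypothetical helper; the utf8 default is an assumption.
    """
    # The with-block closes the file even if reading fails.
    # newline='' is what the csv module documentation recommends for file objects.
    with open(path, encoding=encoding, newline='') as opened_file:
        rows = list(reader(opened_file))
    return rows[0], rows[1:]
```

Under these assumptions, `android_header, android = load_csv('googleplaystore.csv')` would produce the same two variables as the code above.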

To make it easier to explore the two data sets, we'll first write a function named explore_data() that we can use repeatedly to explore rows in a more readable way. We'll also add an option for our function to show the number of rows and columns for any data set.

def explore_data(dataset, start, end, rows_and_columns=False):
    # Print the rows from index start up to (but not including) end
    dataset_slice = dataset[start:end]
    for row in dataset_slice:
        print(row)
        print(' ')  # adds a new (empty) line between rows

    # Optionally report the size of the whole data set
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))

print(android_header)
print(' ')
explore_data(android, 0, 3, True)

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']

['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']

Number of rows: 10841
Number of columns: 13
print(ios_header)
print(' ')
explore_data(ios, 0, 3, True)

['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']

['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']

Number of rows: 7197
Number of columns: 16

Deleting Wrong Data

print(android[10472])  # incorrect row
print(' ')
print(android_header)  # header
print(' ')
print(android[0])      # correct row

['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']

['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']

print(len(android))
del android[10472]  # don't run this more than once
print(len(android))

10841
10840
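The incorrect row stands out because it has 12 fields while the header has 13. As a hedged sketch, such rows could be located by comparing lengths instead of knowing the index in advance. The helper `find_malformed_rows` is hypothetical, and the sample rows are made up for illustration.

```python
def find_malformed_rows(dataset, header):
    """Return the indexes of rows whose length differs from the header's."""
    expected = len(header)
    return [index for index, row in enumerate(dataset) if len(row) != expected]


# Made-up sample data, not taken from the real CSV files.
sample_header = ['App', 'Category', 'Rating']
sample_rows = [
    ['App A', 'TOOLS', '4.1'],
    ['App B', '1.9'],            # missing the Category field
    ['App C', 'SOCIAL', '4.5'],
]

bad_indexes = find_malformed_rows(sample_rows, sample_header)
print(bad_indexes)

# Delete from the end so earlier deletions do not shift later indexes.
for index in reversed(bad_indexes):
    del sample_rows[index]
print(len(sample_rows))
```

Unlike a hard-coded `del android[10472]`, this version is safe to rerun: a second pass finds no malformed rows and deletes nothing.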
for app in android:
    name = app[0]
    if name == 'Instagram':
        print(app)

['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']
['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']

In total, there are 1,181 cases where an app occurs more than once:

duplicate_apps = []  # one entry for every repeat occurrence of a name
unique_apps = []     # each name, recorded the first time it appears

for app in android:
    name = app[0]
    if name in unique_apps:
        duplicate_apps.append(name)
    else:
        unique_apps.append(name)

print('Number of duplicate apps:', len(duplicate_apps))
print(' ')
print('Examples of duplicate apps:', duplicate_apps[:15])

Number of duplicate apps: 1181
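The loop above tests `name in unique_apps` against a list, which rescans the list for every row. A sketch of the same count using `collections.Counter` from the standard library follows; `count_duplicates` is a hypothetical helper and the sample rows are invented.

```python
from collections import Counter


def count_duplicates(dataset, name_index=0):
    """Return (number of duplicate entries, names that occur more than once)."""
    counts = Counter(row[name_index] for row in dataset)
    # Each name contributes (occurrences - 1) duplicate entries,
    # which matches appending to duplicate_apps on every repeat.
    n_duplicates = sum(count - 1 for count in counts.values())
    repeated_names = [name for name, count in counts.items() if count > 1]
    return n_duplicates, repeated_names


# Invented sample rows; only the name column is present.
sample_rows = [['Instagram'], ['Slack'], ['Instagram'], ['Instagram'], ['Slack'], ['Box']]
n_duplicates, repeated_names = count_duplicates(sample_rows)
print('Number of duplicate apps:', n_duplicates)
print('Apps occurring more than once:', repeated_names)
```

Note that `repeated_names` lists each repeated app once, whereas `duplicate_apps` in the original loop holds one entry per repeat.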

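Let's start by building a dictionary, reviews_max, that maps each app name to the highest number of reviews found among that app's entries.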

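reviews_max = {}

for app in android:
    name = app[0]
    n_reviews = float(app[3])  # the Reviews column is read as a string

    if name in reviews_max and reviews_max[name] < n_reviews:
        # Seen before, but this entry has more reviews: keep the larger count
        reviews_max[name] = n_reviews
    elif name not in reviews_max:
        # First time this app name appears
        reviews_max[name] = n_reviews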

In a previous code cell, we found that there are 1,181 cases where an app occurs more than once, so the length of our dictionary (of unique apps) should be equal to the difference between the length of our data set and 1,181.

print('Expected length:', len(android) - 1181)
print('Actual length:', len(reviews_max))

Expected length: 9659
Actual length: 9659
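With the expected and actual lengths matching, `reviews_max` could then be used to keep one row per app. The sketch below is one plausible way to do that, not necessarily the approach the original exercise takes; `remove_duplicates` is a hypothetical helper, and the sample rows are shortened or invented for illustration.

```python
def remove_duplicates(dataset, name_index=0, reviews_index=3):
    """Keep, for each app name, the first row that has the highest review count."""
    # First pass: highest review count seen for each name.
    reviews_max = {}
    for app in dataset:
        name = app[name_index]
        n_reviews = float(app[reviews_index])
        if name not in reviews_max or reviews_max[name] < n_reviews:
            reviews_max[name] = n_reviews

    # Second pass: keep one row per name, the one matching that maximum.
    cleaned = []
    already_added = set()
    for app in dataset:
        name = app[name_index]
        n_reviews = float(app[reviews_index])
        if n_reviews == reviews_max[name] and name not in already_added:
            cleaned.append(app)
            already_added.add(name)
    return cleaned


# Illustrative rows only: name at index 0, review count at index 3.
sample_rows = [
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Instagram', 'SOCIAL', '4.5', '66577446'],
    ['Instagram', 'SOCIAL', '4.5', '66577313'],
    ['Slack', 'BUSINESS', '4.4', '51507'],
]
cleaned = remove_duplicates(sample_rows)
print(len(cleaned))
```

The `already_added` set matters when two entries tie for the highest review count: without it, both would be kept.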

Write comments on the code.
