
Question


CS 112 Project 5: Dictionaries and File IO
Due Date: Sunday, April 23rd, 11:59pm
Last chance to use tokens! (P6 won't allow late submissions)

The purpose of this assignment is to explore dictionaries and file IO. We will be reading in data from two different datasets, merging them into a single structure, and then analyzing our dataset by various metrics and searching it for various important items.

You will turn in a single Python file following our naming convention (example: gmason76_2XX_PX.py). As with previous projects, include your name, G#, lecture/lab sections, and any extra comments we ought to know, as a comment at the top of your file.

If you have questions, use Piazza (and professor/TA office hours) to obtain assistance. Remember, do not publicly post code for assignments on the forum! Ask a general question in public, or ask a private question (addressed to all "instructors") when you're asking about your particular code. Also, please have a specific question; instead of "my code doesn't work, please help", we need to see something like "I'm having trouble when I add that particular line, what am I misunderstanding?". If you are unsure whether a question may be public or not, just mark it as private to be sure. We can change a post to public afterwards if it could have been public.

Background:

Dictionaries give an enriched way to store values by more than just sequential indexes (as lists give us); we identify key-value pairs, and treat keys like indexes of various other types. The only restriction on keys is that they are "hashable", which we can approximate by thinking that they are "immutable all the way down". Though unordered, dictionaries help us simplify many tasks by keeping those key-value associations. Each key can only be paired with one value at a time in a dictionary.

When a file contains text, we can readily write programs to open the file and compute things based on the file's contents. It turns out that reading and writing text files gives our programs far more longevity: we can store data and results for later, save user preferences, and all sorts of other useful things. We will be reading text files that happen to be in the CSV format (CSV stands for comma separated values).

What's allowed?

Here is the exhaustive list of things you can use on the project. You can ask if we've omitted something, but the answer is probably no.

- all basic expressions/operators, indexing/slicing
- all basic statements: assignment, selection, and loop statements, break/continue, return
- functions: len(), range(), int(), float(), str(), set(), dict(), bool(), tuple()
- file reading: open(), .close(), .read(), .readline(), .readlines(), with syntax
- dictionaries: all methods listed in our slides on that chart
- methods:
    - lists: .insert(), .append(), .extend(), .pop(), .remove()
    - strings: .strip(), .split(), .join()
- sorted(), .sort(), reversed(), .reverse()

This means that you can't call anything not listed above. Focus on applying these functions to solve the task.

- you can't import any modules for this project (e.g. you can't import csv)
- you can't use try-except for this assignment

Procedure

Complete the function definitions as described; you can test your code with the included testing file.

- Sample csv files have been provided with this assignment.
- You can look for example file contents and databases, starting around line 75 or so in the tester.
- Invoke it as with prior assignments: python3 tester5p.py yourcode.py
- You can also test individual functions: python3 tester5p.py yourcode.py get_types
- You can also run your code in interactive mode: python3 -i yourcode.py
- Note that there are 64 test cases worth 1.25 points each, and 5 extra credit tests worth 1 point each.

Scenario

The Pokemon franchise consists of over 60 video games, 15 movies, several TV shows, a trading card game, and even a musical. It has been around for over two decades and at this point has fans of all ages. Because of this, people have become somewhat analytical about how they play the games. To help players of the Pokemon video games, some people have created a Pokemon data set with all the useful statistics, while other people have created data sets with the Pokemon's other properties, such as type (see footnote 1). It will be your job to merge these two data sets and give players some useful statistics and analysis of what you find.

CSV file: This is a file containing ASCII text where each line in the file represents one record of information, and each piece of info in the record is separated by a single comma. The very first line is the "header" row, which names the columns but is not part of the data. You will be given two CSV files to work with: an info file (containing information such as the Pokemon's type and the generation of the game it was introduced in) as well as a stats file (containing information on how good the Pokemon is at various things, such as attack and defense). Below are two very small sample files that can be used in our project.

Note: the file extension has no effect on the file contents; you can edit these files in your code editor, and you can give them any extension you want without changing the ability of your program. It's best not to use MS Excel, as it often uses several different notions of what a CSV file should be and it is easy to mess them up.
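As a rough illustration of the record layout just described, here is a minimal sketch that splits the text of a CSV file into its data lines and drops the header row. The helper name data_lines is hypothetical (it is not one of the required functions), the sample text mirrors the first rows of the stats sample file, and the sketch limits itself to string and list methods from the allowed list.

```python
def data_lines(text):
    """Hypothetical helper: return the non-header, non-empty lines of CSV text."""
    records = []
    lines = text.split("\n")
    # lines[0] is the header row, which names the columns but is not data.
    for line in lines[1:]:
        line = line.strip()
        if line != "":  # the file's final newline leaves one empty string behind
            records.append(line)
    return records


# Sample text shaped like what .read() would return for a small stats file.
sample_text = '"ID","HP","Attack","Defense","Speed"\n1,45,49,49,45\n4,39,52,43,65\n'
for record in data_lines(sample_text):
    print(record.split(","))
```

Splitting each record on commas is enough for a stats file; an info file needs extra care because its text fields are quoted and a name may itself contain a comma.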
Example info file:

"ID","Name","Type 1","Type 2","Generation","Legendary"
1,"Bulbasaur","Grass","Poison",1,"FALSE"
6,"Charizard","Fire","Flying",1,"FALSE"
4,"Charmander","Fire","",1,"FALSE"
169,"Crobat","Poison","Flying",2,"FALSE"
146,"Moltres","Fire","Flying",1,"TRUE"
643,"Reshiram","Dragon","Fire",5,"TRUE"
641,"Tornadus, (Incarnate Form)","Flying","",5,"TRUE"

Example stats file:

"ID","HP","Attack","Defense","Speed"
1,45,49,49,45
4,39,52,43,65
6,78,84,78,100
146,90,100,90,90
149,91,134,95,80
641,79,115,70,111
643,100,120,100,90

Footnote 1: These data sets actually come from a single data set available for free here: https://www.kaggle.com/abcsds/pokemon.

Pokemon: We will use three different representations for a Pokemon inside our programs, but all three will be based on a tuple with values in some order.

The first format (based on the information in an info file) will be called INFO FORMAT and will consist of a key-value pair structured as in the following examples.

# name: (id, type1, type2, generation, legendary)
# 'Bulbasaur': (1, 'Grass', 'Poison', 1, False)
# 'Charmander': (4, 'Fire', None, 1, False)

The second format (based on the information in a stats file) will be called STATS FORMAT and will consist of a key-value pair structured as follows:

# id: (hp, attack, defense, speed)
# 1: (45, 49, 49, 45)

The final format for a Pokemon will combine the two prior formats into a format for our DATABASE (see next section) and will consist of a key-value pair with the following structure:

# name: (id, type1, type2, hp, attack, defense, speed, generation, legendary)
# 'Bulbasaur': (1, 'Grass', 'Poison', 45, 49, 49, 45, 1, False)

Note that the name, type1, and type2 fields are strings; id, generation, and all the stats are integers; and the Pokemon's legendary status is a boolean. Additionally, for pokemon that aren't dual type, the 2nd type is set to None.
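To make the INFO FORMAT conversion concrete, here is a minimal sketch that turns one record of the info file into a (name, tuple) pair. The helper name parse_info_line is hypothetical, and the sketch assumes a name contains at most one comma (as in "Tornadus, (Incarnate Form)") while no other field contains any; it is one possible approach under those assumptions, not the required implementation.

```python
def parse_info_line(line):
    """Hypothetical helper: turn one info-file record into (name, info_tuple)."""
    fields = line.strip().split(",")
    if len(fields) == 7:
        # Assumption: the extra piece comes from a single comma inside the name,
        # so the two halves of the name are glued back together.
        fields = [fields[0], fields[1] + "," + fields[2]] + fields[3:]
    pokemon_id = int(fields[0])
    name = fields[1].strip('"')
    type1 = fields[2].strip('"')
    type2 = fields[3].strip('"')
    if type2 == "":
        type2 = None  # single-type pokemon store None as their 2nd type
    generation = int(fields[4])
    legendary = fields[5].strip('"') == "TRUE"
    return name, (pokemon_id, type1, type2, generation, legendary)


print(parse_info_line('4,"Charmander","Fire","",1,"FALSE"\n'))
print(parse_info_line('641,"Tornadus, (Incarnate Form)","Flying","",5,"TRUE"\n'))
```

Counting the pieces after the split is a simple way to detect the embedded comma, because every well-formed info record otherwise has exactly six fields.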
(Also notice that none of the formats duplicate the key as part of the data, as duplicated information is rarely a good idea.)

Database: a "database" of Pokemon can store multiple Pokemon by name. Our database will be a dictionary whose keys are Pokemon names, and whose values are tuples of interesting information about the Pokemon (in the combined final database format shown above). Only Pokemon with both stored information and statistics may be present.

sample_db = {
    "Bulbasaur": (1, "Grass", "Poison", 45, 49, 49, 45, 1, False),
    "Charmander": (4, "Fire", None, 39, 52, 43, 65, 1, False),
    "Charizard": (6, "Fire", "Flying", 78, 84, 78, 100, 1, False),
    "Moltres": (146, "Fire", "Flying", 90, 100, 90, 90, 1, True),
    "Crobat": (169, "Poison", "Flying", 85, 90, 80, 130, 2, False),
    "Tornadus, (Incarnate Form)": (641, "Flying", None, 79, 115, 70, 111, 5, True),
    "Reshiram": (643, "Dragon", "Fire", 100, 120, 100, 90, 5, True)
}

Functions

You must implement the following functions. Examples can be found later in this document (under Examples). Functions NEVER modify the given database, but some functions create a new database.

read_info_file(filename): This is one of only two functions that deal with reading a file. It accepts the file name as a string; assume it is a CSV file in the format described above for an info file. The function needs to open the file, read all the described pokemon, and create a dictionary of pokemon in the INFO FORMAT. It returns the dictionary it creates.
Note: the first line of the file is always a header line which does not correspond to any pokemon; the last line of the file always ends with a newline ('\n').
Special case: the Name field in the input file might contain one comma as part of the string, for example, "Tornadus, (Incarnate Form)". You can assume the name has at most one comma and that no other field contains a comma.

read_stats_file(filename): This is the other of the two functions that deal with reading a file. It accepts the file name as a string; assume it is a CSV file in the format described above for a stats file. The function needs to open the file, read all the described pokemon, and create a dictionary of pokemon in the STATS FORMAT. It returns the dictionary it creates.
Note: the first line of the file is always a header line which does not correspond to any pokemon; the last line of the file always ends with a newline ('\n').

combine_databases(info_db, stats_db): This function takes two dictionaries (one in INFO FORMAT and one in STATS FORMAT) and combines them into a final dictionary database (described in the previous section). Items from one dictionary that do not appear in the other should be discarded. It returns the combined dictionary.

NOTE: All functions below will be expecting a full database, not an info or stats dictionary.

pokemon_by_types(db, types): This function accepts a pokemon database db and a list of strings types. It needs to build and return a new database containing only the pokemon that have at least one of the given types.
NOTE: you must not change the original database with this function.

pokemon_by_hp_defense(db, lowest_hp, lowest_defense): Given a pokemon database db and two integers indicating the lowest hit points (hp) and lowest defense stats, this function creates and returns a new database consisting only of pokemon with an hp >= lowest_hp and a defense >= lowest_defense.
NOTE: you must not change the original database with this function.

get_types(db): Given a database db, this function determines all the pokemon types in the database. It returns the types as a list of strings, asciibetically sorted (in order based on the ASCII character values). The sorted() function or .sort() method can be helpful here.
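As an illustration of two of the descriptions above, here is a minimal sketch of combine_databases and get_types. It is one possible approach under the formats defined earlier, not the reference solution; the index positions follow the database tuple layout (id, type1, type2, hp, attack, defense, speed, generation, legendary), and the sample dictionaries are the ones shown in the Examples section.

```python
def combine_databases(info_db, stats_db):
    """Sketch: keep only pokemon that appear in both dictionaries."""
    combined = {}
    for name in info_db:
        pokemon_id, type1, type2, generation, legendary = info_db[name]
        if pokemon_id in stats_db:  # no stats for this id means it is discarded
            hp, attack, defense, speed = stats_db[pokemon_id]
            combined[name] = (pokemon_id, type1, type2,
                              hp, attack, defense, speed,
                              generation, legendary)
    return combined


def get_types(db):
    """Sketch: collect every distinct type, sorted by ASCII order."""
    types = []
    for name in db:
        for one_type in (db[name][1], db[name][2]):
            # None marks a missing 2nd type and is not a real type.
            if one_type is not None and one_type not in types:
                types.append(one_type)
    return sorted(types)


info_db1 = {'Crobat': (169, 'Poison', 'Flying', 2, False),
            'Charizard': (6, 'Fire', 'Flying', 1, False),
            'Reshiram': (643, 'Dragon', 'Fire', 5, True)}
stats_db1 = {6: (78, 84, 78, 100), 146: (90, 100, 90, 90), 643: (100, 120, 100, 90)}
db1 = combine_databases(info_db1, stats_db1)
print(db1)
print(get_types(db1))
```

Because the stats dictionary is keyed by id, checking membership with the in operator is enough to decide whether an info entry has matching stats; stats entries without an info entry are never visited and so drop out on their own.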
count_by_type(db, type): Given a database db and a single pokemon type (as a string), this function collects and reports three statistics:
1. how many pokemon in db have type as their only type
2. how many dual-type pokemon in db have type as one of their two types
3. the sum of the two values (1 and 2) above
A tuple of (single_type_count, dual_type_count, total_count) should be returned.

fastest_type(db): Given a database db, determine the type with the highest average speed. Ties are possible, so return a list of types (strings) sorted asciibetically.
Hints: The sorted() function or .sort() method can be helpful here, as can get_types() and pokemon_by_types().

legendary_count_of_types(db): Given a database db, for every type in that database, count how many pokemon of that type are legendary. Create a new dictionary to report the counts. It should have one entry for every type in the original database and be structured in the format type: count_of_legendary. For example, { "Fire": 2, "Ground": 1 }.
Hint: get_types() and pokemon_by_types() could be helpful here.

team_hp(db, team): Given a database db and a list of pokemon names (as strings) team, find the total hp of all pokemon on that team and return it as an integer. Assume all pokemon on the team are included in the database.

show_of_strength_game(db, team1, team2): Given a database db and two teams (two separate lists of pokemon names), have the teams play out a competition where each pokemon competes against the corresponding pokemon on the other team. Assume all pokemon are listed in the database. A team wins a match if its pokemon has the higher attack stat. If only one team has a pokemon to compete in that slot, that team wins automatically. Return the difference between Team1's score and Team2's score. For example:

Team1 = ["Pokemon1", "Pokemon2", "Pokemon3", "Pokemon4"]
Team2 = ["Pokemon5", "Pokemon6"]

[Team1]    [Team2]
Pokemon1   Pokemon5   # Pokemon5 has higher attack than Pokemon1, therefore Team2 wins
Pokemon2   Pokemon6   # Pokemon2 has higher attack than Pokemon6, therefore Team1 wins
Pokemon3              # Team2 ran out of pokemon, so Team1 automatically wins
Pokemon4              # Team2 ran out of pokemon, so Team1 automatically wins

Team1 wins three times, Team2 wins one time, return 3-1 = 2.

strongest_pokemon(db, type = None, generation = None): Given a database of pokemon db, determine the pokemon with the highest total of hp, attack, and defense. If the user decides to restrict the type or generation, provide only the strongest pokemon that also meet those criteria. Since ties are possible, return a list of the strongest pokemon by name, asciibetically sorted. Return None if no pokemon meet the specified requirements.
Hint: The sorted() function or .sort() method can be helpful here.

Extra Credit

top_team_with_best_attackers(db, size=6): Given a database of pokemon db and an integer size, determine the best team that can be made (based on attack only). The team should always have size members unless the number of pokemon in the original database db is lower than size, in which case the team has to include all available pokemon, still sorted by attack. Break ties asciibetically: the name that sorts lower comes first, e.g. if a Spearow and an Ekans have the same attack power, the Ekans will be chosen before the Spearow. Return the team as a list of pokemon names sorted by the pokemon's attack statistic, highest first.

Grading Rubric

Code passes shared tests:       80%  (64 tests @ 1.25 pts each == 80pts)
Well-documented/submitted:      10%
No globals used (just def's):   10%
---------------------------------
TOTAL:                         100%  +5 extra credit

Note About Testing

You should always test your code not only with the provided testing script, but also by directly calling your functions. If you store sample message strings to variables after all the definitions, you can use them in these interactive calls. This is also how you might test your code out in the visualizer, which we highly recommend. Just be sure to remove them before turning in your work: they are globals, which are not allowed in your final submission. Consider the file below, named shouter.py, which you can run as shown after it using interactive mode (-i).

def shout(msg):
    print(msg.upper())

mystring1 = "hello"
mystring2 = "another one"

demo$ python3 -i shouter.py
>>> shout("i wrote this")
I WROTE THIS
>>> shout(mystring1)
HELLO
>>> shout(mystring2)
ANOTHER ONE

You will not earn test case points if you hard-code the answers. Your program should work for all possible inputs, not just the provided test cases. If you notice that one test case calls something(3), and you write the following to pass that particular test case, you'd be hardcoding.

def something(x):
    if x==3:      # hard-coding example
        return 8  # a memorized answer that avoids calculating the number directly
    ...

Notice how it's not actually calculating, it's merely regurgitating a memorized answer. Doing this for all used test cases might make you feel like you've completed the program, but there are really unlimited numbers of test cases; hardcoded programs only work on the inputs that were hardcoded. Nobody learns, and the program isn't really that useful. When it's a true corner case (often around zero, empty lists, etc.), we might need to list a direct answer; this is not hardcoding.

Reminders on Turning It In:

No work is accepted more than 48 hours after the initial deadline, regardless of token usage. Tokens are automatically applied whenever they are available, based on your last valid submission's time stamp.

You can turn in your code as many times as you want; we only grade the last submission that is <=48 hours late. If you are getting perilously close to the deadline, it may be worth it to turn in an "almost-done" version about 30 minutes before the clock strikes midnight. Then, if you don't solve anything substantial at the last moment, you don't need to worry about turning in code that may or may not be runnable, or about being late by just an infuriatingly small number of seconds: you've already got a good version turned in that you knew worked at least for part of the program.

You can (and should) check your submitted files. If you re-visit BlackBoard and navigate to your submission, you can double-check that you actually submitted a file (it's possible to skip that crucial step and turn in a no-files submission!), you can re-download that file, and then you can re-test that file to make sure you turned in the version you intended to turn in. It is your responsibility to turn in the correct file, on time, to the correct assignment.

Use a backup service. Do future you an enormous favor, and keep all of your code in an automatically synced location, such as a Dropbox or Google Drive folder. Each semester someone's computer is lost/drowned/dead, or their USB drive or hard drive fails. Don't give these situations the chance to doom your project work!
Examples

>>> info_db1 = read_info_file("info_file2.csv")
>>> info_db1
{'Crobat': (169, 'Poison', 'Flying', 2, False), 'Charizard': (6, 'Fire', 'Flying', 1, False), 'Reshiram': (643, 'Dragon', 'Fire', 5, True)}
>>> stats_db1 = read_stats_file("stats_file2.csv")
>>> stats_db1
{6: (78, 84, 78, 100), 146: (90, 100, 90, 90), 643: (100, 120, 100, 90)}
>>> db1 = combine_databases(info_db1, stats_db1)
>>> db1
{'Charizard': (6, 'Fire', 'Flying', 78, 84, 78, 100, 1, False), 'Reshiram': (643, 'Dragon', 'Fire', 100, 120, 100, 90, 5, True)}
>>> db = {'Bulbasaur': (1, 'Grass', 'Poison', 45, 49, 49, 45, 1, False), 'Charizard': (6, 'Fire', 'Flying', 78, 84, 78, 100, 1, False), 'Charmander': (4, 'Fire', None, 39, 52, 43, 65, 1, False), 'Reshiram': (643, 'Dragon', 'Fire', 100, 120, 100, 90, 5, True), 'Tornadus, (Incarnate Form)': (641, 'Flying', None, 79, 115, 70, 111, 5, True)}
>>> pokemon_by_types(db,["Fire"])
{'Charizard': (6, 'Fire', 'Flying', 78, 84, 78, 100, 1, False), 'Charmander': (4, 'Fire', None, 39, 52, 43, 65, 1, False), 'Reshiram': (643, 'Dragon', 'Fire', 100, 120, 100, 90, 5, True)}
>>> pokemon_by_types(db,["Poison","Dragon"])
{'Bulbasaur': (1, 'Grass', 'Poison', 45, 49, 49, 45, 1, False), 'Reshiram': (643, 'Dragon', 'Fire', 100, 120, 100, 90, 5, True)}
>>> pokemon_by_hp_defense(db,30,72)
{'Charizard': (6, 'Fire', 'Flying', 78, 84, 78, 100, 1, False), 'Reshiram': (643, 'Dragon', 'Fire', 100, 120, 100, 90, 5, True)}
>>> get_types(db)
['Dragon', 'Fire', 'Flying', 'Grass', 'Poison']
>>> count_by_type(db,'Fire')
(1, 2, 3)
>>> legendary_count_of_types(db)
{'Dragon': 1, 'Fire': 1, 'Flying': 1, 'Grass': 0, 'Poison': 0}
>>> fastest_type(db)
['Flying']
>>> team_hp(db, ['Charizard','Reshiram','Bulbasaur'])
223
>>> show_of_strength_game(db, ['Reshiram','Bulbasaur'],['Bulbasaur','Charizard','Charmander','Tornadus, (Incarnate Form)'])
-2
>>> strongest_pokemon(db)
['Reshiram']
>>> strongest_pokemon(db,type="Flying")
['Tornadus, (Incarnate Form)']
>>> strongest_pokemon(db,type="Fire",generation = 1)
['Charizard']
>>> db_ties = {'Charmander': (4, 'Fire', None, 39, 52, 42, 65, 1, False),'Trapinch': (328, 'Ground', None, 45, 100, 66, 10, 3, False),'Charizard': (6, 'Fire', 'Flying', 78, 84, 78, 100, 1, False),'Vibrava': (329, 'Ground', 'Dragon', 50, 70, 50, 70, 3, False),'Charmeleon': (5, 'Fire', None, 58, 64, 58, 80, 1, False),'Camerupt': (323, 'Fire', 'Ground', 70, 100, 70, 40, 3, False),'Drifloon': (425, 'Ghost', 'Flying', 90, 50, 34, 70, 4, False)}
>>> strongest_pokemon(db_ties)
['Camerupt', 'Charizard']
>>> top_team_with_best_attackers(db)
['Reshiram', 'Tornadus, (Incarnate Form)', 'Charizard', 'Charmander', 'Bulbasaur']
>>> top_team_with_best_attackers(db_ties)
['Camerupt', 'Trapinch', 'Charizard', 'Vibrava', 'Charmeleon', 'Charmander']
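Two of the calls in the session above can be traced with sketches like the following. These are illustrative approaches, not the official solution. One point is an assumption: the assignment text does not say what happens when two pokemon have equal attack, so this sketch awards that slot to neither team. Both functions avoid anything outside the allowed list.

```python
def show_of_strength_game(db, team1, team2):
    """Sketch: slot-by-slot attack comparison; returns score1 - score2."""
    score1 = 0
    score2 = 0
    longest = len(team1)
    if len(team2) > longest:
        longest = len(team2)
    for i in range(longest):
        if i >= len(team2):
            score1 += 1  # only Team1 has a pokemon in this slot
        elif i >= len(team1):
            score2 += 1  # only Team2 has a pokemon in this slot
        else:
            attack1 = db[team1[i]][4]  # index 4 is the attack stat
            attack2 = db[team2[i]][4]
            if attack1 > attack2:
                score1 += 1
            elif attack2 > attack1:
                score2 += 1
            # Assumption: on equal attack, neither team scores.
    return score1 - score2


def strongest_pokemon(db, type=None, generation=None):
    """Sketch: names with the highest hp + attack + defense, or None."""
    best_total = None
    best_names = []
    for name in db:
        record = db[name]
        if type is not None and record[1] != type and record[2] != type:
            continue  # neither of its types matches the requested one
        if generation is not None and record[7] != generation:
            continue
        total = record[3] + record[4] + record[5]
        if best_total is None or total > best_total:
            best_total = total
            best_names = [name]  # a new best replaces all earlier candidates
        elif total == best_total:
            best_names.append(name)  # a tie joins the current best
    if len(best_names) == 0:
        return None
    return sorted(best_names)


db = {'Bulbasaur': (1, 'Grass', 'Poison', 45, 49, 49, 45, 1, False),
      'Charizard': (6, 'Fire', 'Flying', 78, 84, 78, 100, 1, False),
      'Charmander': (4, 'Fire', None, 39, 52, 43, 65, 1, False),
      'Reshiram': (643, 'Dragon', 'Fire', 100, 120, 100, 90, 5, True),
      'Tornadus, (Incarnate Form)': (641, 'Flying', None, 79, 115, 70, 111, 5, True)}
print(show_of_strength_game(db, ['Reshiram', 'Bulbasaur'],
                            ['Bulbasaur', 'Charizard', 'Charmander',
                             'Tornadus, (Incarnate Form)']))
print(strongest_pokemon(db, type="Fire", generation=1))
```

Tracking the best total and a list of names in a single pass handles ties without needing max(), which is not on the allowed list.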
