Question

Program Description

Program 3: Trees & Neighborhoods. Due 10am, Wednesday, 15 February.

Learning Objective: to successfully filter formatted data using standard Pandas operations for selecting and joining data, and to evaluate simple (constant) models using loss functions.

Available Libraries: Pandas and core Python 3.6+.

Data Sources: The New York City Street TreesCount Project, Neighborhood Tabulation Areas.

Sample Datasets:
- Census Demographics for Neighborhood Tabulation Areas
- Tree Census:
  - 2015: https://data.cityofnewyork.us/Environment/2015-Street-Tree-Census-Tree-Data/uvpi-gqnh
  - 2005: https://data.cityofnewyork.us/Environment/2005-Street-Tree-Census/29bw-z7pj
  - 1995: https://data.cityofnewyork.us/Environment/1995-Street-Tree-Census/kyad-zm4j

Is there a neighborhood in New York City with more trees than people? … for Manhattan: 'MN99' and name 'park-cemetery-etc-Manhattan'. … model the tree population using two common loss functions.

The assignment is broken into the following functions to allow for unit testing:
- clean_df(df, year=2015): This function takes two inputs:
  - df: the name of a DataFrame containing TreesCount data from OpenData NYC.
  - year: the year of the data set. There are three possible years: 1995, 2005, or 2015. The default value is 2015.
  The function does the following:
  - If the specified year is 2015, the function should take df and drop all columns except: ['tree_dbh', 'health', 'spc_latin', 'spc_common', 'nta', 'latitude', 'longitude']
  - If the specified year is 2005, the function should take df and drop all columns except: ['tree_dbh', 'status', 'spc_latin', 'spc_common', 'nta', 'latitude', 'longitude'] and rename the corresponding columns that differ from 2015 to the 2015 names. For example, status is renamed to health.
  - If the specified year is 1995, the function should take df and drop all columns except: ['diameter', 'condition', 'spc_latin', 'spc_common', 'nta_2010', 'latitude', 'longitude'] and rename the corresponding columns that differ from 2015 to the 2015 names. For example, diameter is renamed to tree_dbh.
  Regardless of the specified year, the function should return the resulting DataFrame.
  Hint: This is slightly different from the function in Program 2 in that different columns are dropped.
- make_nta_df(file_name): This function takes one input:
  - file_name: the name of a CSV file containing population and names for neighborhood tabulation areas (NYC OpenData NTA Demographics).
  … population (labeled as population).
- count_by_area(df): This function takes one input:
  - df: a DataFrame that includes the nta column.
- neighborhood_trees(tree_df, nta_df): This function takes two inputs:
  - tree_df: a DataFrame containing the column nta.
  - nta_df: a DataFrame with two columns, 'NTACode' and 'NTAName'.
  … following order:
  - nta
  - num_trees
  - nta_name
  - population
  - trees_per_capita: this is a newly calculated column, calculated by dividing the number of trees by the population in each neighborhood.
- compute_summary_stats(df, col): This function takes two inputs:
  - df: a DataFrame containing a column col.
  - col: the name of a numeric-valued column in the DataFrame.
- mse_loss(theta, y_vals): This function takes two inputs:
  - theta: a numeric value.
  - y_vals: a Series containing numeric values.
  … assignment and your function should compute MSE without using numpy.
- mae_loss(theta, y_vals): This function takes two inputs:
  - theta: a numeric value.
  - y_vals: a Series containing numeric values.
  … assignment and your function should compute MAE without using numpy.
- test_mse(loss_fnc=mse_loss): This test function takes one input:
  - loss_fnc: a function that takes in two input parameters (a numeric value and a Series of numeric values) and returns a numeric value.
  It has a default value of mse_loss. This is a test function, used to test whether the loss_fnc … returning True if the loss_fnc performs correctly (e.g. computes Mean Squared Error) and False otherwise.

Let's run through some testing code to check if your program is written correctly. For example, let's set up a DataFrame using the Tree Census restricted to Staten Island:

    df_si = pd.read_csv('trees_si_2015.csv')
    df_si = clean_df(df_si)
    print(df_si)

will print:

There are 105,318 trees recorded on Staten Island, and we have kept their diameter, health, species, NTA, and latitude and longitude.

Next, we'll make a DataFrame with the demographic information organized by neighborhood:

    nta_df = make_nta_df('Census_Demographics_NTA.csv')
    print(nta_df)

will print:

Using the count_by_area function:

    df_si_counts = count_by_area(df_si)
    print(df_si_counts)

will print a row for each neighborhood in Staten Island:

Combining the two DataFrames:

    df = neighborhood_trees(df_si_counts, nta_df)
    print(df)

will print: … of the resulting DataFrame.

Plotting the results:

    import matplotlib.pyplot as plt
    import seaborn as sns
    sns.histplot(df['trees_per_capita'], bins=5)
    plt.show()

would give the plot:

We can compute summary statistics for trees per capita on Staten Island:

What we have built here is a tester function, not unlike the ones used to grade assignments in Gradescope Autograder.
To test if our test function is working as expected, try the following:

    print(f'Testing mse_loss: {test_mse(mse_loss)}')
    print(f'Testing mae_loss: {test_mse(mae_loss)}')

will print:

    Testing mse_loss: True
    Testing mae_loss: False

Hints: … HackerRank or Codio, the autograder will crash since those are not available. … functions, either comment the code out before submitting or use a main function that is conditionally executed (see Think CS: Section 6.8 for details).
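
A minimal sketch of clean_df, assuming the column lists given in the specification above. The 1995 names were recovered from a garbled scan, so check them against the downloaded CSV before relying on them. Because the three lists are written in the same order, zip can pair each year's column with its 2015 name, so a single rename call covers both the 2005 and 1995 cases.

```python
import pandas as pd

# Columns kept for each census year. The lists are position-aligned:
# the i-th entry of every list corresponds to the i-th 2015 column.
# Assumption: the 1995 names come from a garbled scan of the spec; verify them.
KEEP_COLUMNS = {
    2015: ['tree_dbh', 'health', 'spc_latin', 'spc_common', 'nta',
           'latitude', 'longitude'],
    2005: ['tree_dbh', 'status', 'spc_latin', 'spc_common', 'nta',
           'latitude', 'longitude'],
    1995: ['diameter', 'condition', 'spc_latin', 'spc_common', 'nta_2010',
           'latitude', 'longitude'],
}


def clean_df(df, year=2015):
    """Keep the columns of interest for `year`, renamed to the 2015 names."""
    keep = KEEP_COLUMNS[year]                      # KeyError for other years
    to_2015 = dict(zip(keep, KEEP_COLUMNS[2015]))  # e.g. 'status' -> 'health'
    # Selecting with a list drops every other column and returns a new frame.
    return df[keep].rename(columns=to_2015)
```

An unsupported year raises a KeyError in this sketch; the specification does not say what should happen in that case.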
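
The count_by_area specification is cut off above, so the return format here is an assumption: this sketch returns one row per NTA code with hypothetical columns nta and num_trees, the latter chosen to match the num_trees column that neighborhood_trees is expected to output.

```python
import pandas as pd


def count_by_area(df):
    """Return one row per NTA code with the number of trees recorded there.

    Assumption: the output columns are 'nta' and 'num_trees'; the spec's
    description of the return value is missing.
    """
    # groupby drops rows whose nta is missing (dropna=True by default).
    counts = df.groupby('nta').size()           # Series: nta -> row count
    return counts.reset_index(name='num_trees')
```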
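
A sketch of neighborhood_trees under stated assumptions: tree_df holds per-NTA tree counts in columns nta and num_trees (assumed labels), and nta_df has the NTACode and NTAName columns named in the specification plus a population column. The population column is an assumption too, since the required output includes it even though the description of nta_df lists only two columns.

```python
import pandas as pd


def neighborhood_trees(tree_df, nta_df):
    """Attach NTA names and populations to per-NTA tree counts.

    Assumed inputs (hypothetical column labels):
      tree_df: columns 'nta' and 'num_trees'
      nta_df:  columns 'NTACode', 'NTAName', and 'population'
    """
    merged = tree_df.merge(nta_df, left_on='nta', right_on='NTACode',
                           how='left')          # keep every NTA with trees
    merged = merged.rename(columns={'NTAName': 'nta_name'})
    # Trees per person; pandas yields inf (not an error) when population is 0.
    merged['trees_per_capita'] = merged['num_trees'] / merged['population']
    return merged[['nta', 'num_trees', 'nta_name', 'population',
                   'trees_per_capita']]
```

A left merge keeps every neighborhood that appears in the tree counts; an NTA with no match in nta_df gets NaN for its name, population, and trees_per_capita rather than being dropped silently.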
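
The loss functions and the tester can be sketched in plain Python, assuming the standard definitions for a constant model theta over n values: MSE(theta) = (1/n) * sum((y_i - theta)**2) and MAE(theta) = (1/n) * sum(|y_i - theta|). Iterating over a pandas Series yields its values, so no numpy calls are needed. The three test cases inside test_mse are illustrative values worked out by hand, not the grader's; the first case separates MSE from MAE (14/3 versus 2.0), which is why mae_loss fails.

```python
import math

import pandas as pd


def mse_loss(theta, y_vals):
    """Mean squared error of the constant model theta (no numpy)."""
    values = list(y_vals)            # a Series iterates over its values
    return sum((y - theta) ** 2 for y in values) / len(values)


def mae_loss(theta, y_vals):
    """Mean absolute error of the constant model theta (no numpy)."""
    values = list(y_vals)
    return sum(abs(y - theta) for y in values) / len(values)


def test_mse(loss_fnc=mse_loss):
    """Return True if loss_fnc agrees with hand-computed MSE values."""
    # (theta, y_vals, expected MSE); the cases are illustrative choices.
    cases = [
        (0, pd.Series([1, 2, 3]), 14 / 3),   # (1 + 4 + 9) / 3
        (2, pd.Series([1, 2, 3]), 2 / 3),    # (1 + 0 + 1) / 3
        (5, pd.Series([5, 5]), 0.0),         # a perfect fit has zero loss
    ]
    for theta, y_vals, expected in cases:
        try:
            if not math.isclose(loss_fnc(theta, y_vals), expected):
                return False
        except Exception:
            return False             # a crashing loss function also fails
    return True


def main():
    print(f'Testing mse_loss: {test_mse(mse_loss)}')
    print(f'Testing mae_loss: {test_mse(mae_loss)}')


if __name__ == '__main__':
    main()
```

Run as a script, main prints "Testing mse_loss: True" and "Testing mae_loss: False", matching the expected output above. Importing the module (as an autograder would) does not call main, which is the conditionally executed main the hints describe. Both loss functions raise ZeroDivisionError on an empty Series.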