Question


Posted on Sep 03, 2024


Program Description

Program 3: Trees & Neighborhoods. Due 10am, Wednesday, 15 February.

Learning Objective: to successfully filter formatted data using standard Pandas operations for selecting and joining data and evaluate simple (constant) models using loss functions.

Available Libraries: Pandas and core Python 3.6+.

Data Sources: The New York City Street TreesCount Project, Neighborhood Tabulation Areas.

Sample Datasets:
- Census Demographics for Neighborhood Tabulation Areas
- Tree Census:
  - 2015: https://data.cityofnewyork.us/Environment/2015-Street-Tree-Census-Tree-Data/uvpi-gqnh
  - 2005: https://data.cityofnewyork.us/Environment/2005-Street-Tree-Census/29bw-z7pj
  - 1995: https://data.cityofnewyork.us/Environment/1995-Street-Tree-Census/kyad-zm4j

Is there a neighborhood in New York City with more trees than people? [...] for Manhattan: 'MN99' and name 'park-cemetery-etc-Manhattan'. [...] model the tree population using two common loss functions.

The assignment is broken into the following functions to allow for unit testing:

- clean_df(df, year = 2015): This function takes two inputs:
  o df: the name of a DataFrame containing TreesCount Data from OpenData NYC.
  o year: the year of the data set. There are three possible years: 1995, 2005, or 2015. The default value is 2015.
  The function does the following:
  o If the specified year is 2015, the function should take df and drop all columns except: ['tree_dbh', 'health', 'spc_latin', 'spc_common', 'nta', 'latitude', 'longitude'].
  o If the specified year is 2005, the function should take df and drop all columns except: ['tree_dbh', 'status', 'spc_latin', 'spc_common', 'nta', 'latitude', 'longitude'] and rename the corresponding columns that differ from 2015 to the 2015 names. For example, status is renamed to health.
  o If the specified year is 1995, the function should take df and drop all columns except: ['diameter', 'condition', 'spc_latin', 'spc_common', 'nta_2010', 'latitude', 'longitude'] and rename the corresponding columns that differ from 2015 to the 2015 names. For example, diameter is renamed to tree_dbh.
  Regardless of the specified year, the function should return the resulting DataFrame.
  Hint: This is slightly different than the function from Program 2 in that different columns are dropped.
- make_nta_df(file_name): This function takes one input:
  o file_name: the name of a CSV file containing population and names for neighborhood tabulation areas (NYC OpenData NTA Demographics).
  [...] population (labeled as population).
- count_by_area(df): This function takes one input:
  o df: a DataFrame that includes the nta column.
- neighborhood_trees(tree_df, nta_df): This function takes two inputs:
  o tree_df: a DataFrame containing the column nta.
  o nta_df: a DataFrame with two columns, 'NTACode' and 'NTAName'.
  [...] following order:
  o nta
  o num_trees
  o nta_name
  o population
  o trees_per_capita: this is a newly calculated column, calculated by dividing the number of trees by the population in each neighborhood.
- compute_summary_stats(df, col): This function takes two inputs:
  o df: a DataFrame containing a column col.
  o col: the name of a numeric-valued col in the DataFrame.
- mse_loss(theta, Y_vals): This function takes two inputs:
  o theta: a numeric value.
  o Y_vals: a Series containing numeric values.
  [...] assignment and your function should compute MSE without using numpy.
- mae_loss(theta, Y_vals): This function takes two inputs:
  o theta: a numeric value.
  o Y_vals: a Series containing numeric values.
  [...] assignment and your function should compute MAE without using numpy.
- test_mse(loss_fnc=mse_loss): This test function takes one input:
  o loss_fnc: a function that takes in two input parameters (a numeric value and a Series of numeric values) and returns a numeric value.
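As an illustration of the year-dependent column handling described for clean_df, here is a minimal sketch. It is not the official solution. The column lists come from the assignment text; only the renames status to health and diameter to tree_dbh are stated there, so the renames of condition to health and nta_2010 to nta are assumptions based on matching positions, and raising ValueError for any other year is also an assumption. The toy one-row DataFrame is hypothetical.

```python
import pandas as pd

# Columns to keep for each census year, as listed in the assignment.
KEEP_COLS = {
    2015: ['tree_dbh', 'health', 'spc_latin', 'spc_common',
           'nta', 'latitude', 'longitude'],
    2005: ['tree_dbh', 'status', 'spc_latin', 'spc_common',
           'nta', 'latitude', 'longitude'],
    1995: ['diameter', 'condition', 'spc_latin', 'spc_common',
           'nta_2010', 'latitude', 'longitude'],
}

# Renames to the 2015 names. 'status' -> 'health' and
# 'diameter' -> 'tree_dbh' are given in the assignment; the other
# two entries are assumptions inferred from the column positions.
RENAME_TO_2015 = {
    2015: {},
    2005: {'status': 'health'},
    1995: {'diameter': 'tree_dbh', 'condition': 'health', 'nta_2010': 'nta'},
}


def clean_df(df, year=2015):
    """Keep only the listed columns for `year` and use the 2015 names."""
    if year not in KEEP_COLS:
        # Assumption: the assignment does not say what to do here.
        raise ValueError(f'year must be 1995, 2005, or 2015, got {year}')
    # Select the listed columns, then rename any that differ from 2015.
    return df[KEEP_COLS[year]].rename(columns=RENAME_TO_2015[year])


# Hypothetical one-row 2005-style frame with one extra column to drop.
toy_2005 = pd.DataFrame({
    'tree_dbh': [6], 'status': ['Good'], 'spc_latin': ['ACER RUBRUM'],
    'spc_common': ['red maple'], 'nta': ['SI01'],
    'latitude': [40.6], 'longitude': [-74.1],
    'boroname': ['Staten Island'],
})
cleaned = clean_df(toy_2005, year=2005)
print(list(cleaned.columns))
```

Selecting with a list of column names keeps the columns in the listed order, so all three years end up with the same layout after renaming.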
The loss_fnc parameter of test_mse has a default value of mse_loss. This is a test function, used to test the loss_fnc, returning True if the loss_fnc performs correctly (e.g. computes Mean Squared Error) and False otherwise.

Let's run through some testing code to check if your program is written correctly. For example, let's set up a DataFrame using the Tree Census restricted to Staten Island:

    df_si = pd.read_csv('trees_si_2015.csv')
    df_si = clean_df(df_si)
    print(df_si)

will print: [...]

There are 105,318 trees recorded on Staten Island, and we have kept their diameter, health, species, NTA, and latitude and longitude. Next, we'll make a DataFrame with the demographic information organized by neighborhood:

    nta_df = make_nta_df('Census_Demographics_NTA.csv')
    print(nta_df)

will print: [...]

Using the count_by_area function:

    df_si_counts = count_by_area(df_si)
    print(df_si_counts)

will print a row for each neighborhood in Staten Island: [...]

Combining the two DataFrames:

    df = neighborhood_trees(df_si_counts, nta_df)
    print(df)

will print: [...] of the resulting DataFrame.

Plotting the results:

    import matplotlib.pyplot as plt
    import seaborn as sns
    sns.histplot(df['trees_per_capita'], bins=5)
    plt.show()

would give the plot: [...]

We can compute summary statistics for trees per capita on Staten Island: [...]

What we have built here is a tester function, not unlike the ones used to grade assignments in Gradescope Autograder.
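To make the idea of a tester function concrete, here is a minimal sketch of the two loss functions and of test_mse. It is only one possible approach, not the reference solution: the test cases inside test_mse are hypothetical, hand-computed values, and the 1e-9 tolerance is an assumption. Plain lists are used so the sketch runs without Pandas; the same code also accepts a Series, since it only iterates over the values and takes their length.

```python
def mse_loss(theta, Y_vals):
    """Mean squared error of the constant model theta, without numpy."""
    return sum((y - theta) ** 2 for y in Y_vals) / len(Y_vals)


def mae_loss(theta, Y_vals):
    """Mean absolute error of the constant model theta, without numpy."""
    return sum(abs(y - theta) for y in Y_vals) / len(Y_vals)


def test_mse(loss_fnc=mse_loss):
    """Return True if loss_fnc matches hand-computed MSE values."""
    # Hypothetical cases: (theta, values, expected MSE).
    cases = [
        (0, [1, 2, 3], 14 / 3),  # (1 + 4 + 9) / 3
        (2, [1, 2, 3], 2 / 3),   # (1 + 0 + 1) / 3
        (5, [5, 5], 0),          # perfect constant fit
    ]
    # Compare with a small tolerance to allow for floating-point error.
    return all(abs(loss_fnc(theta, ys) - expected) < 1e-9
               for theta, ys, expected in cases)
```

With these cases, passing mae_loss fails on the first one: the mean absolute error of theta = 0 on [1, 2, 3] is 2, not 14/3, so the tester tells the two losses apart.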
To test if our test function is working as expected, try the following:

    print(f'Testing mse_loss: {test_mse(mse_loss)}')
    print(f'Testing mae_loss: {test_mse(mae_loss)}')

will print:

    Testing mse_loss: True
    Testing mae_loss: False

Hints:
- [...] HackerRank or codio, the autograder will crash since those are not available.
- [...] functions, either comment the code out before submitting or use a main function that is conditionally executed (see Think CS: Section 6.8 for details).
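The conditionally executed main function mentioned in the last hint can be sketched as follows. This is a generic Python pattern, not code taken from the assignment; the body of main is a hypothetical demo, and mse_loss is redefined here only so the sketch is self-contained.

```python
def mse_loss(theta, Y_vals):
    """Mean squared error of the constant model theta, without numpy."""
    return sum((y - theta) ** 2 for y in Y_vals) / len(Y_vals)


def main():
    """Demo code that should not run when the file is imported."""
    result = mse_loss(2, [1, 2, 3])
    print(f'mse_loss(2, [1, 2, 3]) = {result}')
    return result


# main() runs only when this file is executed directly. When another
# program (such as an autograder) imports the file, __name__ is the
# module name rather than '__main__', so the demo code is skipped.
if __name__ == '__main__':
    main()
```

Any imports needed only for the demo, such as plotting libraries, can be placed inside main so that importing the file does not require them.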