
Question




School District Funding Gaps

3. (15 points) In this question, you'll work with data on school funding provided by the School Finance Indicators Database (SFID). The dataset contains information on each school district in the US, including student demographics, district spending per student, test score outcomes, and more. You'll work with the following three columns:

- state_name
- district
- fundinggap, the difference between how much the district should spend on each student and the amount it actually spends per student. Negative values indicate insufficient spending.

You can find more information about the data at the SFID website. For ease of visualization, we'll limit our analysis to the following five states: California, the District of Columbia, Nevada, Oregon, and Texas.

The file dcd.csv contains the dataset we will be using. You should use the provided q3.ipynb file to get started, which has cells with some useful variables already defined, and a hint about how to use fancy indexing. This notebook is not comprehensive: it only has some starter code and useful functions for this question.

(a) (1 point) Visualize the funding gap for all districts in the five states above. In two sentences or less, describe any differences you see between the data from larger states (California and Texas) and smaller ones (Nevada and DC).

(b) (3 points) We'll use a hierarchical model to help us understand state-level averages in the funding gap: each state i will have a state-level mean μ_i with common mean α, and for each district j in state i, the funding gap y_ij will be normally distributed with mean μ_i. We'll assume the variances are known, so the model can be written as:

    μ_i ~ Normal(α, σ_0)
    y_ij ~ Normal(μ_i, σ)

Draw a graphical model for this setup.

(c) (4 points) Implement the model from part (b) in PyMC, using α = $700 and σ_0 = σ = $4000. Using the plot_state_posterior_means function provided for you in the notebook, visualize the posterior distributions for the means of each of the five states. For which state(s)/district(s) is the posterior mean the most certain? For which state/district is it least certain? Explain why.

(d) (2 points) Re-run your model from the previous part, changing only one variable at a time as follows:

    (i)   α = $700,  σ_0 = $4000, σ = $400
    (ii)  α = $700,  σ_0 = $400,  σ = $4000
    (iii) α = -$700, σ_0 = $4000, σ = $4000

What changes in each of the three cases, and why? Hint: you can answer this question by focusing on the changes in the mean for the District of Columbia.

(e) (2 points) Suppose we had treated α as a normal random variable with mean γ and standard deviation λ. Draw a graphical model for this new model. Hint: the answer should only require a small change from your answer to part (b).

(f) (3 points) Implement the model from the previous part in PyMC, using γ = 0, σ_0 = σ = 4000, and λ = 10000. Using your samples, compute the posterior variance of the mean for the District of Columbia, var(μ_DC | y), and the posterior variance of the mean for California, var(μ_CA | y).

Starter notebook (q3.ipynb):

    import numpy as np
    import pandas as pd

    %matplotlib inline
    import matplotlib.pyplot as plt
    import seaborn as sns
    sns.set()

    import pymc as pm
    import arviz as az

Question 3

This notebook contains some starter code for Question 3 of Homework 2. This is not the same as the skeleton code that you see in the lab assignments. You may (and should) add cells of your own code as you work through Question 3.

    # Your code to load the data goes here
    dcd_full = ...

    # Do not change this line
    states_to_use = ('California', 'District of Columbia', 'Nevada', 'Oregon', 'Texas')

    # Your code to filter down to the states above goes here
    dcd = ...

    # This line gives the state index corresponding to each row
    state_indices_full = dcd_full['state_name'].astype('category').cat.codes
    state_indices = dcd['state_name'].astype('category').cat.codes

    # This cell shows you a very simple example of "fancy indexing"
    array_of_values = np.array([42, 672, 9001])
    array_of_indices = np.array([1, 1, 0, 2, 0, 0])
    array_of_values[array_of_indices]

    # This cell shows you an example of how to use "fancy indexing" with the state indices (above)
    # to take an array with one item per state and get the corresponding item for each district
    # You don't have to fill anything in here, but understanding it will help you later!

    # An array with the population of each of our five states of interest (CA, DC, NV, OR, TX)
    state_population_2020 = np.array([39538223, 689545, 3104614, 4237256, 29145505])
    state_pop_for_county = state_population_2020[state_indices]
    dcd.loc[:, 'state_pop'] = state_pop_for_county
    # The state_pop column now contains the population for each district's state
    dcd.loc[:, ['state_name', 'district', 'fundinggap', 'state_pop']]

    # You don't have to change this function
    def plot_state_posterior_means(trace, state_names, **kwargs):
        """Shows distribution of posterior means from a PyMC trace.

        Args:
            trace: the result of pm.sample(...). Assumes the state-level means
                have been called 'mu'
            state_names: a list or array of state names
            **kwargs: any extra arguments are passed in to sns.histplot
        """
        num_states = len(state_names)
        # Flatten chains and draws into rows, with one column per state
        mu_array = trace.posterior['mu'].values.reshape(-1, num_states)
        means_wide = pd.DataFrame(mu_array, columns=state_names)
        means_long = pd.melt(means_wide, var_name='State', value_name='Posterior mean')
        sns.histplot(means_long, x='Posterior mean', hue='State',
                     bins=np.linspace(-5000, 4000, 500), **kwargs)

    # If implemented correctly, your solutions to 3(d), 3(e), 3(g), and 3(h) shouldn't take more than one minute to run.
    # Hint: your solution should use the state_indices array defined earlier!
    num_states = 5
    with ...:
        # Your code for the random variables goes here

        # Don't change this line
        trace_d = pm.sample(5, chains=2, tune=100, return_inferencedata=True)

Hint: the rows of the mu array in the posterior correspond to the five states in the states_to_use array, in the same order.

    plot_state_posterior_means(trace_d, states_to_use)
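As a sanity check on the certainty question in part (c), the model in part (b) has a closed-form posterior for each state mean when the common mean and both spreads are fixed. The sketch below is not part of the original notebook and is not a substitute for the PyMC implementation the question asks for. It assumes the second argument of each Normal is a standard deviation (in dollars), the helper name `state_mean_posterior` is hypothetical, and the funding gaps are made-up numbers rather than values from dcd.csv.

```python
from math import sqrt


def state_mean_posterior(gaps, alpha, sigma_0, sigma):
    """Closed-form posterior of one state's mean mu_i.

    Assumed model (alpha, sigma_0 and sigma are known constants):
        mu_i ~ Normal(alpha, sigma_0)
        y_ij ~ Normal(mu_i, sigma)
    Returns the posterior mean and posterior standard deviation of mu_i.
    """
    n = len(gaps)
    prior_precision = 1.0 / sigma_0 ** 2   # information carried by the prior
    data_precision = n / sigma ** 2        # information carried by n districts
    post_variance = 1.0 / (prior_precision + data_precision)
    sample_mean = sum(gaps) / n
    # Precision-weighted average of the prior mean and the sample mean
    post_mean = post_variance * (prior_precision * alpha
                                 + data_precision * sample_mean)
    return post_mean, sqrt(post_variance)


# Made-up funding gaps in dollars (illustrative only, NOT taken from dcd.csv)
one_district = [-2000.0]            # a state with a single district
many_districts = [-2000.0] * 400    # a state with 400 districts

mean_1, sd_1 = state_mean_posterior(one_district, alpha=700, sigma_0=4000, sigma=4000)
mean_400, sd_400 = state_mean_posterior(many_districts, alpha=700, sigma_0=4000, sigma=4000)

print(f"1 district:    mean = {mean_1:.2f}, sd = {sd_1:.2f}")
print(f"400 districts: mean = {mean_400:.2f}, sd = {sd_400:.2f}")
```

Under these assumptions the posterior standard deviation shrinks as the number of districts grows, and the posterior mean moves from the prior mean toward the sample mean. The histograms from plot_state_posterior_means should show the same pattern, up to sampling noise, with states that have many districts giving narrower posteriors.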
