Question

1 Approved Answer

Posted on May 15, 2024

School District Funding Gaps

3. (15 points) In this question, you'll work with data on school funding provided by the School Finance Indicators Database. The dataset contains information on each school district in the US, including student demographics, district spending per student, test score outcomes, and more. You'll work with the following three columns: state_name; district; and fundinggap, the difference between how much the district should spend on each student and the amount it actually spends per student. Negative values indicate insufficient spending. You can find more information about the data at the SFID website.

For ease of visualization, we'll limit our analysis to the following five states: California, the District of Columbia, Nevada, Oregon, and Texas. The file dcd.csv contains the dataset we will be using. You should use the provided q3.ipynb file to get started, which has cells with some useful variables already defined, and a hint about how to use fancy indexing. This notebook is not comprehensive: it only has some starter code and useful functions for this question.

(a) (1 point) Visualize the funding gap for all districts in the five states above. In two sentences or less, describe any differences you see between the data from larger states (California and Texas) and smaller ones (Nevada and DC).

(b) (3 points) We'll use a hierarchical model to help us understand state-level averages in the funding gap: each state will have a state-level mean \(\mu_i\) with common mean \(\alpha\), and for each district \(j\) in state \(i\), the funding gap \(y_{ij}\) will be normally distributed with mean \(\mu_i\). We'll assume the variances are known, so the model can be written as:

\[ \mu_i \sim \mathrm{Normal}(\alpha, \sigma_0) \]
\[ y_{ij} \sim \mathrm{Normal}(\mu_i, \sigma) \]

Draw a graphical model for this setup.

(c) (4 points) Implement the model from part (b) in PyMC, using \(\alpha = \$700\) and \(\sigma_0 = \sigma = \$4000\). Using the plot_state_posterior_means function provided for you in the notebook, visualize the posterior distributions for the means of each of the five states.
For which state(s)/district(s) is the posterior mean the most certain? For which state/district is it least certain? Explain why.

(d) (2 points) Re-run your model from the previous part, changing only one variable at a time as follows:
(i) \(\alpha = \$700\), \(\sigma_0 = \$4000\), \(\sigma = \$400\)
(ii) \(\alpha = \$700\), \(\sigma_0 = \$400\), \(\sigma = \$4000\)
(iii) \(\alpha = -\$700\), \(\sigma_0 = \$4000\), \(\sigma = \$4000\)
What changes in each of the three cases, and why? Hint: you can answer this question by focusing on the changes in the mean for the District of Columbia.

(e) (2 points) Suppose we had treated \(\alpha\) as a normal random variable with mean \(\gamma\) and standard deviation \(\lambda\). Draw a graphical model for this new model. Hint: the answer should only require a small change from your answer to part (b).

(f) (3 points) Implement the model from the previous part in PyMC, using \(\gamma = 0\), \(\sigma_0 = \sigma = 4000\), and \(\lambda = 10000\). Using your samples, compute the posterior variance of the mean for the District of Columbia (DC), \(\mathrm{var}(\mu_{\mathrm{DC}} \mid y)\), and the posterior variance of the mean for California, \(\mathrm{var}(\mu_{\mathrm{CA}} \mid y)\).

    import numpy as np
    import pandas as pd
    %matplotlib inline
    import matplotlib.pyplot as plt
    import seaborn as sns
    sns.set()
    import pymc as pm
    import arviz as az

Question 3

This notebook contains some starter code for Question 3 of Homework 2. This is not the same as the skeleton code that you see in the lab assignments. You may (and should) add cells of your own code as you work through Question 3.

    # Your code to load the data goes here
    dcd_full = ...

    # Do not change this line
    states_to_use = ('California', 'District of Columbia', 'Nevada', 'Oregon', 'Texas')

    # Your code to filter down to the states above goes here
    dcd = ...
    # This line gives the state index corresponding to each row
    state_indices_full = dcd_full['state_name'].astype('category').cat.codes
    state_indices = dcd['state_name'].astype('category').cat.codes

    # This cell shows you a very simple example of "fancy indexing"
    array_of_values = np.array([42, 672, 9001])
    array_of_indices = np.array([1, 1, 0, 2, 0, 0])
    array_of_values[array_of_indices]

    # This cell shows you an example of how to use "fancy indexing" with the state indices (above)
    # to take an array with one item per state and get the corresponding item for each district
    # You don't have to fill anything in here, but understanding it will help you later!

    # An array with the population of each of our five states of interest (CA, DC, NV, OR, TX)
    state_population_2020 = np.array([39538223, 689545, 3104614, 4237256, 29145505])
    state_pop_for_county = state_population_2020[state_indices]
    dcd.loc[:, 'state_pop'] = state_pop_for_county
    # The state_pop column now contains the population for each district's state
    dcd.loc[:, ['state_name', 'district', 'fundinggap', 'state_pop']]

    # You don't have to change this function
    def plot_state_posterior_means(trace, state_names, **kwargs):
        """Shows distribution of posterior means from a PyMC trace.

        Args:
            trace: the result of pm.sample(...). Assumes the state-level
                means have been called 'mu'
            state_names: a list or array of state names
            kwargs: any extra arguments are passed in to sns.histplot
        """
        num_states = len(state_names)
        mu_array = trace.posterior['mu'].values.reshape(-1, num_states)
        means_wide = pd.DataFrame(mu_array, columns=state_names)
        means_long = pd.melt(means_wide, var_name='State', value_name='Posterior mean')
        sns.histplot(means_long, x='Posterior mean', hue='State',
                     bins=np.linspace(-5000, 4000, 500), **kwargs)

    # If implemented correctly, your solutions to 3(d), 3(e), 3(g), and 3(h) shouldn't take more than one minute to run.
    # Hint: your solution should use the state_indices array defined earlier!
    num_states = 5
    with ...:
        # Your code for the random variables goes here

        # Don't change this line
        trace_d = pm.sample(5, chains=2, tune=100, return_inferencedata=True)

Hint: the rows of the mu array in the posterior correspond to the five states in the states_to_use array, in the same order.

    plot_state_posterior_means(trace_d, states_to_use)
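
Because both standard deviations are treated as known, the posterior of each state-level mean has a closed form: its precision is the prior precision plus the number of districts in that state divided by the district-level variance. States with many districts therefore get narrow posteriors, while a state with very few districts stays close to the prior width. The plain-Python sketch below illustrates that calculation; the function name, the argument names (alpha, sigma0, sigma for the prior mean, prior standard deviation, and district-level standard deviation), and the toy numbers are hypothetical and are not taken from dcd.csv or the notebook.

```python
from collections import defaultdict


def posterior_for_state_means(gaps, states, alpha, sigma0, sigma):
    """Closed-form posterior of each state mean in the known-variance model.

    gaps:   funding gaps, one value per district
    states: state label for each district (same length as gaps)
    Returns {state: (posterior_mean, posterior_sd)}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for y, s in zip(gaps, states):
        totals[s] += y
        counts[s] += 1

    result = {}
    for s in counts:
        # Precisions add: one prior term plus one term per district.
        precision = 1.0 / sigma0**2 + counts[s] / sigma**2
        mean = (alpha / sigma0**2 + totals[s] / sigma**2) / precision
        result[s] = (mean, precision ** -0.5)
    return result


# Made-up numbers purely for illustration (not real funding gaps).
toy_gaps = [-1000.0, -3000.0, -2000.0, -2000.0, 500.0]
toy_states = ["Big", "Big", "Big", "Big", "Small"]
post = posterior_for_state_means(
    toy_gaps, toy_states, alpha=700.0, sigma0=4000.0, sigma=4000.0
)
for state, (mean, sd) in post.items():
    print(f"{state}: posterior mean {mean:.1f}, posterior sd {sd:.1f}")
```

With these toy inputs the four-district state ends up with posterior standard deviation 4000/√5 and the one-district state with 4000/√2, which is the kind of contrast parts (c) and (d) ask you to explain.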

Step by Step Solution

There are 3 Steps involved in it

Step: 1

Step: 2

Step: 3
