Question

1 Approved Answer

Posted on Oct 10, 2024

INSTRUCTIONS:

1. For this assignment you have to execute and interpret one regression model. The objective is to determine how much would be saved if 10% of complicated appendicitis cases were prevented, meaning they were instead "non-complicated." The dependent variable is the total charge associated with the inpatient episode (TCHGS). The "treatment variable" of interest is complicated versus uncomplicated appendicitis. Pay special attention to it when discussing the results and provide an explanation for the outcomes between the models. Is it statistically significant? Why is it positive or negative, and does this make sense? In addition to the main variable of interest, don't forget to discuss (very briefly) the other explanatory variables. In the model, in addition to the treatment variable, control for the influence of the following confounding variables (specifications are discussed in the short video): age, race, gender, ethnicity, insurance status, and severity. In your results section, provide both (a) the difference in total cost per case, e.g. complicated vs. non-complicated, and (b) the total savings if 10% of complicated cases were instead treated as non-complicated. To figure out how many cases are complicated versus non-complicated, you can use a frequency statement as shown below.

2. Which data set to use? Appendicitis.SAS7BDat.

3. Turn in a structured abstract (no more than 300 words, not including the title and section headers, so 307 in total) with the format shown below.
   a. A point will be deducted if you exceed the word limit. Adhering to the word limit is essential as it forces you to get to the point; at the same time you have to be careful not to omit anything of importance.
      i. Do not include tables or graphs in the abstract (that would go in the main text or presentation - if you were to do one)! Determine which values from your test(s) are most important to the reader and report those in the abstract.
      ii. Since the word limit applies to everyone equally, no exceptions will be made!!!

4. What counts as important or relevant information is by definition, at least partially, subjective. However, there are certain rules that have to be followed.
      i. For example, in the results section you should definitely discuss the significance of your test statistic (at minimum you should provide the p-value associated with the treatment variable). If you do not, at least one point will be deducted (yes, this is a big oversight). Given the word limit, you may want to group the other variables and report them as such.
      ii. In the methods section, provide a rationale for using the specific method.

5. Special importance is attached to the discussion section of the abstract (see below). This is where you have to apply your creativity as a storyteller. This is where you help make information out of data, as it involves the reader. Try to convey why the hypothesis and the analysis are important (if you personally don't care, ask yourself why anyone would). Imagine how and what decisions could be affected by the analysis and findings, and communicate this to your intended audience. Simple repetitive statements already expressed elsewhere in the abstract will result in deductions, so do not just repeat the conclusions!

Abstract

Objective: This contains a brief statement describing the general objective of the analysis. For example, "The objective of this analysis is to examine the impact of ..." Side note: each independent variable included in the model is associated with a hypothesis. For example, a hypothesis (in alternative form) is that Hispanics have a lower percentage of charge associated with labs because (you fill in the blank). Given your word limit, you should just focus on the treatment variable.

Data and methods: This contains a brief description of the particular dataset used to test the hypothesis. For example, concerning the dataset, briefly mention the unit of measurement, whether the data is publicly available, and what types of variables it includes. You should also discuss what you are controlling for (e.g. age).

Results: This contains a brief layman's description of the least squares results and whether or not each variable (or group of variables) was significant.

Conclusions: A very brief statement indicating your conclusions pertaining to the hypothesis - did your results reject the null hypothesis or not?

Discussion: This is perhaps the most important section. Here you tell the story of why you believe the analysis was important - why is it "information" and not just "data"? Why would anyone within related professional or policy circles find your analysis and results interesting? How could your analysis help such individuals in terms of decision making? You may want to select one of the listed independent variables and assume that it is your treatment variable in an experiment. This will help focus your abstract. As part of the discussion, indicate whether you would have liked to test the influence of additional variables if they had been available.

*************************Data output of SAS FOR LINEAR REGRESSION **************************

Dependent Variable is Operating room Total charges

The REG Procedure

Model: MODEL1

Dependent Variable: TCHGS TCHGS

Number of Observations Read	47460
Number of Observations Used	47460

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	F Value	Pr>F
Model	8	1.842725E13	2.303406E12	2473.72	<.0001
Error	47451	4.418411E13	931152403
Corrected Total	47459	6.261136E13

Root MSE	30515	R-Square	0.2943
Dependent Mean	50896	Adj R-Sq	0.2942
Coeff Var	59.95464
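
The fit statistics above follow directly from the Analysis of Variance table. As a rough consistency check, the sketch below recomputes them in Python from the printed sums of squares and degrees of freedom. The variable names are illustrative, and because the printed inputs are rounded to seven significant figures, the results should match the SAS values only approximately.

```python
import math

# Values copied from the Analysis of Variance table above (rounded as printed).
ss_model = 1.842725e13
ss_error = 4.418411e13
df_model = 8
df_error = 47451
dependent_mean = 50896

# Mean squares and the overall F statistic.
ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_value = ms_model / ms_error

# R-square is the share of total variation explained by the model.
ss_total = ss_model + ss_error
r_square = ss_model / ss_total

# Adjusted R-square penalizes for the number of regressors:
# (n - 1) / (n - p - 1) equals df_total / df_error.
adj_r_square = 1 - (1 - r_square) * (df_model + df_error) / df_error

# Root MSE and the coefficient of variation (Root MSE as % of the mean).
root_mse = math.sqrt(ms_error)
coeff_var = 100 * root_mse / dependent_mean

print(f"F value      : {f_value:.2f}")
print(f"R-Square     : {r_square:.4f}")
print(f"Adj R-Sq     : {adj_r_square:.4f}")
print(f"Root MSE     : {root_mse:.0f}")
print(f"Coeff Var    : {coeff_var:.3f}")
```

An R-Square near 0.29 means the eight regressors together explain a bit under a third of the variation in total charges, which is worth stating plainly in the results section.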

Parameter Estimates
Variable	Label	DF	Parameter Estimate	Standard Error	t Value	Pr > |t|
Intercept	Intercept	1	271956	2177.14211	124.91	<.0001
AGE	AGE	1	91.20939	7.26987	12.55	<.0001
black		1	5684.95352	470.91645	12.07	<.0001
otherNW		1	6813.95711	425.26911	16.02	<.0001
female		1	311.73308	284.31449	1.10	0.2729
hispanic		1	2457.95591	342.49490	7.18	<.0001
uninsured		1	1136.03849	366.21885	3.10	0.0019
complicated	complicated	1	11983	313.34904	38.24	<.0001
SeverityOverall	SeverityOverall	1	-237142	2110.40017	-112.37	<.0001
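
The `complicated` row is the one the assignment asks about: holding the other covariates fixed, a complicated case is associated with about $11,983 more in total charges. The sketch below recomputes its t value and an approximate 95% confidence interval, then applies the savings arithmetic from the instructions (complicated cases × 10% × charge difference per case). The number of complicated cases is not shown in this output, so `n_complicated_example` is a hypothetical placeholder to be replaced with the count from the frequency table; the 1.96 multiplier is a normal approximation, which seems reasonable with 47,451 error degrees of freedom.

```python
# Coefficient and standard error for "complicated", copied from the table above.
coef_complicated = 11983.0
se_complicated = 313.34904

# t value = estimate / standard error.
t_value = coef_complicated / se_complicated

# Approximate 95% confidence interval (normal approximation, z = 1.96).
z = 1.96
ci_low = coef_complicated - z * se_complicated
ci_high = coef_complicated + z * se_complicated


def total_savings(n_complicated, share_prevented=0.10,
                  diff_per_case=coef_complicated):
    """Charges avoided if a share of complicated cases were non-complicated."""
    return n_complicated * share_prevented * diff_per_case


# HYPOTHETICAL count for illustration only; it is NOT taken from the data.
n_complicated_example = 10_000
savings_example = total_savings(n_complicated_example)

print(f"t value            : {t_value:.2f}")
print(f"95% CI             : ({ci_low:,.0f}, {ci_high:,.0f})")
print(f"Savings (example)  : {savings_example:,.0f}")
```

Running the same function with `ci_low` and `ci_high` as `diff_per_case` gives a plausible range for the savings figure rather than a single point estimate.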

Logistic regression with dependent variable = postOperative

The LOGISTIC Procedure

Model Information
Data Set	WORK.APPEND
Response Variable	postOperative	postOperative
Number of Response Levels	2
Model	binary logit
Optimization Technique	Fisher's scoring

Number of Observations Read	28736
Number of Observations Used	28736

Response Profile
Ordered Value	postOperative	Total Frequency
1	1	1848
2	0	26888

Probability modeled is postOperative='1'.

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion	Intercept Only	Intercept and Covariates
AIC	13718.520	10969.309
SC	13726.786	11043.702
-2 Log L	13716.520	10951.309

Testing Global Null Hypothesis: BETA=0
Test	Chi-Square	DF	Pr>ChiSq
Likelihood Ratio	2765.2111	8	<.0001
Score	4864.3455	8	<.0001
Wald	1982.3999	8	<.0001
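
The likelihood ratio test and the information criteria above can all be derived from the two -2 Log L values. The sketch below is a minimal check, assuming the usual definitions AIC = -2 Log L + 2k and SC = -2 Log L + k·ln(n), where k is the number of estimated parameters (1 for the intercept-only model, 9 with the eight covariates) and n is the number of observations used.

```python
import math

# Values copied from the Model Fit Statistics table above.
neg2_log_l_null = 13716.520   # intercept only
neg2_log_l_full = 10951.309   # intercept and covariates
n_obs = 28736
k_null = 1                    # intercept
k_full = 9                    # intercept + 8 covariates

# Likelihood ratio chi-square: the drop in -2 Log L, with 8 degrees of freedom.
lr_chi_square = neg2_log_l_null - neg2_log_l_full
lr_df = k_full - k_null

# Akaike information criterion: -2 Log L + 2k.
aic_null = neg2_log_l_null + 2 * k_null
aic_full = neg2_log_l_full + 2 * k_full

# Schwarz criterion: -2 Log L + k * ln(n).
sc_null = neg2_log_l_null + k_null * math.log(n_obs)
sc_full = neg2_log_l_full + k_full * math.log(n_obs)

print(f"LR chi-square : {lr_chi_square:.3f} on {lr_df} df")
print(f"AIC           : {aic_null:.3f} -> {aic_full:.3f}")
print(f"SC            : {sc_null:.3f} -> {sc_full:.3f}")
```

Both criteria fall sharply once the covariates are added, which agrees with the global tests rejecting BETA=0.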

Analysis of Maximum Likelihood Estimates
Parameter	DF	Estimate	Standard Error	Wald Chi-Square	Pr>ChiSq
Intercept	1	7.5077	0.3847	380.8706	<.0001
AGE	1	0.0101	0.00227	19.9878	<.0001
black	1	0.1945	0.0808	5.7930	0.0161
otherNW	1	-0.1207	0.0913	1.7481	0.1861
female	1	-0.4945	0.0558	78.4029	<.0001
hispanic	1	-0.1497	0.0716	4.3635	0.0367
uninsured	1	-0.0698	0.0639	1.1918	0.2750
SeverityOverall	1	-11.3226	0.3626	975.2230	<.0001
complicated	1	1.1893	0.0551	466.5174	<.0001

Odds Ratio Estimates
Effect	Point Estimate	95% Wald Confidence Limits
AGE	1.010	1.006	1.015
black	1.215	1.037	1.423
otherNW	0.886	0.741	1.060
female	0.610	0.547	0.680
hispanic	0.861	0.748	0.991
uninsured	0.933	0.823	1.057
SeverityOverall	<0.001	<0.001	<0.001
complicated	3.285	2.949	3.659
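
Each odds ratio in the table above is the exponentiated logit coefficient, and the Wald limits are the exponentiated ends of estimate ± 1.96 × standard error. The sketch below reproduces three rows from the maximum likelihood estimates; the helper name `odds_ratio_with_ci` is illustrative and not part of the SAS output.

```python
import math

# (estimate, standard error) pairs copied from the
# Analysis of Maximum Likelihood Estimates table above.
estimates = {
    "AGE": (0.0101, 0.00227),
    "female": (-0.4945, 0.0558),
    "complicated": (1.1893, 0.0551),
}


def odds_ratio_with_ci(estimate, std_error, z=1.96):
    """Return (odds ratio, lower limit, upper limit) for a logit coefficient."""
    point = math.exp(estimate)
    lower = math.exp(estimate - z * std_error)
    upper = math.exp(estimate + z * std_error)
    return point, lower, upper


results = {name: odds_ratio_with_ci(b, se) for name, (b, se) in estimates.items()}

for name, (point, lower, upper) in results.items():
    print(f"{name:12s} OR = {point:.3f}  95% CI = ({lower:.3f}, {upper:.3f})")
```

Read this way, the `complicated` row says that the odds of postOperative = 1 are roughly 3.3 times higher for complicated cases, with the other covariates held fixed.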