
Question


1) (75 pts) The National Science Foundation's Higher Education Research and Development Survey is the primary source of information on R&D expenditures at U.S. colleges and universities. The survey collects information on R&D expenditures by field of research and source of funds, as well as information on type of research and on expenses and headcounts of R&D personnel. The survey is an annual census of institutions that expended at least $150,000 in separately accounted for R&D in the fiscal year. The data in HERD FY20 gives research expenditures (dollars in thousands) for FYs 2010-2020.

a) Give summary statistics for all years including all institutions (mean, median, SD, IQR). You will have to make decisions about the observations that have missing data or text indicating their reporting status for that year. (Data cleanup is part of this step. Cells that contain nonnumeric values need to be addressed, and empty columns removed.)

b) Create boxplots for the top 75 and top 200 universities, based upon 2020 spending on R&D expenditures across higher education institutions, for each of the last 5 years, and describe any changes that you can recognize in the graphs. Your boxplots should include the mean data point to help interpret the graphs. Translating graphics to meaningful text is important to help readers understand the findings. What information about expenditures becomes more meaningful in graphics than in a spreadsheet?

c) Analyze and describe the distribution of FY20 R&D expenditures among institutions. For example, are R&D expenditures uniformly distributed across universities, or do a few universities account for the majority of expenditures? (Calculate each institution's expenditures as a % of total expenditures for the year. How would you describe the spending among all universities?)

d) Identify FY2019-2020 overall rankings and percentiles of Texas universities (Texas schools appear in red). Which Texas universities moved up in the rankings and which moved down from FY2019 to FY2020? (Sort schools based upon 2019 expenditures and assign a number based upon rank, then compare that with the 2020 rankings.)

e) Based on the last five fiscal years, project R&D expenditures and overall rankings for Texas universities for FY21. (Use the TREND function.) (Display the top 15 schools graphically using a timeline chart.)

2) (75 pts) The original Wisconsin Breast Cancer data consists of attribute information on fine-needle aspirated breast tissue from 569 women (this is the student sample file that is included with the file to be used for analysis). You will use a sample file that contains 470 observations for your analysis. Information on each sample includes:

Column A: Patient ID number
Column B: Diagnosis (M = malignant, B = benign)
Columns C-AF: Summary statistics on ten real-valued features computed from each cell nucleus in a sample:

a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)

For each feature, the mean, standard deviation, and "worst" or largest (mean of the three largest values) were computed for each sample, resulting in 30 measurements. For example, column C is Mean Radius (of cells measured in biopsy), column M is Radius SD (of cells measured in biopsy), and column W is Worst Radius (worst cell radius measured in biopsy).

a) Give summary statistics (mean, median, SD, range, IQR) for each attribute feature overall and with respect to diagnosis (summary stats for benign and summary stats for malignant).

b) Construct parallel boxplots for each attribute feature, comparing differences between diagnoses. Can you identify, simply by a visual comparison of the boxplots, any variables that have very different measurements for benign cells compared to malignant cells?

c) From the summary statistics and boxplots, suggest (and justify) guidelines for automating the diagnosis using the attribute features.
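The assignment is written for a spreadsheet, but the core calculations in parts 1a, 1c, and 1e can be sketched in a few lines of Python for checking results. This is only a minimal illustration under stated assumptions: the function names are hypothetical, the sample values are made up and are not HERD data, and dropping nonnumeric cells is just one possible cleanup decision. The trend function fits an ordinary least-squares straight line, which is the kind of fit Excel's TREND produces for a single x variable.

```python
import math
import statistics


def to_number(cell):
    """Return a finite float for numeric cells, or None for blanks and status text."""
    try:
        value = float(str(cell).replace(",", ""))
    except ValueError:
        return None
    return value if math.isfinite(value) else None


def summarize(cells):
    """Mean, median, sample SD, and IQR of the numeric cells only (others are dropped)."""
    values = [v for v in map(to_number, cells) if v is not None]
    # method="inclusive" interpolates quartiles over the observed min..max range
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "iqr": q3 - q1,
    }


def share_of_total(values):
    """Each value as a percentage of the sum of all values."""
    total = sum(values)
    return [100 * v / total for v in values]


def linear_trend(xs, ys, new_x):
    """Forecast y at new_x from a least-squares straight line through (xs, ys)."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept + slope * new_x


# Made-up cells for one fiscal-year column, including a blank and a status note.
column = ["100", "250", "", "Not reported", "1,000", 400]
print(summarize(column))

# Made-up five-year series projected one year ahead.
print(linear_trend([2016, 2017, 2018, 2019, 2020], [10, 12, 14, 16, 18], 2021))
```

Whether nonnumeric cells should be dropped, as here, or treated some other way is exactly the judgment call part 1a asks for, so the choice should be stated alongside the results.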

