R&D Expenses (introduced in Chapter 19) This data file contains a variety of accounting and financial values

Question:

R&D Expenses (introduced in Chapter 19) This data file contains a variety of accounting and financial values that describe companies operating in the information and professional services sectors of the economy. One column gives the expenses on research and development (R&D), and another gives the total assets of the companies. Both of these columns are reported in millions of dollars. This data table expands previous versions (introduced in Chapter 19) by adding data for professional services. To estimate regression models, we need to transform both expenses and assets to a log scale.
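The log transformation mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the helper name `to_log_scale` is hypothetical, the values are made up rather than taken from the data file, and natural logs are assumed (a different base rescales the logs but leaves a log-log slope unchanged). Values that are zero or missing have no logarithm, so the sketch marks them as `None` instead of guessing.

```python
import math


def to_log_scale(values):
    """Natural log of each positive value; None where the log is undefined."""
    return [math.log(v) if v is not None and v > 0 else None for v in values]


# Made-up amounts in millions of dollars, for illustration only.
rd_expenses = [12.5, 0.0, 340.0]
log_rd = to_log_scale(rd_expenses)  # the zero entry becomes None
```

Companies whose transformed value is `None` would have to be set aside before fitting, which is worth noting when reporting the sample size n.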
(a) Plot the log of R&D expenses on the log of assets for both sectors together in one scatterplot. Use color-coding or distinct symbols to distinguish the groups. Does it appear that the relationship is different in these two sectors or can you capture the association with a single simple regression?
A common question asked when fitting models to subsets is “Do the equations for the two groups differ from each other?” For example, does the equation for the information sector differ from the equation for professional services? We’ve been answering this question informally, using the t-statistics for the slopes of the dummy variable and interaction. There’s just one small problem: We’re using two tests to answer one question. What’s the chance for a false-positive error? If you’ve got one question, better to use one test.
To see if there’s any difference, we can use a variation on the F-test for $R^2$. The idea is to test both slopes at once rather than separately. The method uses the change in the size of $R^2$. If the $R^2$ of the model increases by a statistically significant amount when we add both the dummy variable and interaction to the model, then something changed and the model is different. The form of this incremental, or partial, F-test is
$$F = \frac{(\text{Change in } R^2)\,/\,(\text{number of added slopes})}{(1 - R^2_{\text{full}})\,/\,(n - k_{\text{full}} - 1)}$$
In this formula, $k_{\text{full}}$ denotes the number of variables in the model with the extra features, including dummy variables and interactions, and $R^2_{\text{full}}$ is the $R^2$ for that model. As usual, a value of this F-statistic larger than about 4 counts as big.
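The incremental F-statistic can be computed directly from the two $R^2$ values. The sketch below is a minimal illustration, not the textbook's software: the function name `incremental_f` and its argument names are hypothetical, and the inputs are made-up numbers chosen only to show the arithmetic.

```python
def incremental_f(r2_reduced, r2_full, n, k_full, added):
    """Incremental (partial) F-statistic for the change in R^2.

    r2_reduced: R^2 of the model without the extra slopes
    r2_full:    R^2 of the model with the extra slopes
    n:          number of cases
    k_full:     number of explanatory variables in the full model
    added:      number of slopes added to get the full model
    """
    if not 0 <= r2_reduced <= r2_full < 1:
        raise ValueError("need 0 <= r2_reduced <= r2_full < 1")
    residual_df = n - k_full - 1
    if residual_df <= 0 or added <= 0:
        raise ValueError("need added > 0 and n > k_full + 1")
    numerator = (r2_full - r2_reduced) / added
    denominator = (1 - r2_full) / residual_df
    return numerator / denominator


# Made-up inputs, for illustration only (not results from the data file).
f_stat = incremental_f(r2_reduced=0.60, r2_full=0.64, n=104, k_full=3, added=2)
print(round(f_stat, 2))  # about 5.56
```

For a formal p-value, the statistic is compared with an F distribution having `added` numerator degrees of freedom and `n - k_full - 1` denominator degrees of freedom.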
(b) Add a dummy variable (coded as 0 for information companies and 1 for those in professional services) and its interaction with Log Assets to the model. Does the fit of this model meet the conditions for the MRM? Comment on the consequences of any problem that you identify.
(c) Assuming that the model meets the conditions for the MRM, use the incremental F-test to assess the size of the change in $R^2$. Does the test agree with your visual impression? (The value of $k_{\text{full}}$ for the model with dummy and interaction is 3, with 2 slopes added. You will need to fit the simple regression of Log R&D Expenses on Log Assets to get the $R^2$ from this model.)
(d) Summarize the fit of the model that best captures what is happening in these two sectors.
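The coding asked for in part (b) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `design_row` and `group_equations`, the sector labels, and the coefficient values are all hypothetical and are not estimates from the data file. The point is only to show how the dummy variable shifts the intercept and the interaction shifts the slope for professional services.

```python
SECTORS = ("information", "professional services")  # assumed labels


def design_row(log_assets, sector):
    """Return [1, Log Assets, D, D * Log Assets] for one company.

    D is 0 for information companies and 1 for professional services,
    matching the coding in part (b).
    """
    if sector not in SECTORS:
        raise ValueError(f"unknown sector: {sector!r}")
    d = 1 if sector == "professional services" else 0
    return [1.0, log_assets, float(d), d * log_assets]


def group_equations(b0, b1, b2, b3):
    """Intercept and slope implied for each sector by the four coefficients."""
    return {
        "information": (b0, b1),                      # D = 0
        "professional services": (b0 + b2, b1 + b3),  # D = 1
    }


# Hypothetical coefficients, for illustration only.
eqs = group_equations(b0=-1.0, b1=0.75, b2=0.5, b3=-0.25)
for sector, (intercept, slope) in eqs.items():
    print(f"{sector}: Log R&D = {intercept} + {slope} * Log Assets")
```

If the incremental F-test finds no significant change, the coefficients on the dummy and the interaction are dropped and both sectors share the single equation from the simple regression.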