Question

Posted on May 19, 2024

Complete using R:

Part 1

1. Regression assignment. The goal of this assignment is to examine how the money spent on marketing via facebook, youtube and newspaper affects sales.

a. Open the RegressionAssignment2.r file. Check the libraries in the file and make sure to install any packages you may need.

b. The data file you will be using is "marketing". It is included in the package called datarium, so install that package as well, for example with install.packages("datarium").

c. The rest of the file is similar to what we did in the regression module: run the R program and look at the relationship between facebook marketing dollars and sales.

d. Run the example file and answer the following:

e. Given marketing through facebook and sales, which is the dependent variable and which is the independent variable?

f. State the null hypothesis and the alternative hypothesis as we did in the income vs. happiness example. Remember that the null hypothesis is always that one variable has no effect on the other, and the alternative hypothesis is that one variable does influence the other.

g. Run the example and interpret the results, referencing the R-squared value and the p-value. What do they tell you about the accuracy of your regression model?
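As a rough illustration of steps f and g, the sketch below pulls the slope, the R-squared value, and the slope's p-value out of a fitted lm object. The null hypothesis being tested is that the slope is zero (no effect of the predictor on sales). The helper name summarize_fit and the small toy data frame are made up for illustration; the toy numbers are not taken from the marketing data.

```r
# Hypothetical helper: fit response ~ predictor and return the key statistics.
summarize_fit <- function(data, predictor, response = "sales") {
  fit <- lm(reformulate(predictor, response = response), data = data)
  s <- summary(fit)
  data.frame(
    predictor = predictor,
    slope     = unname(coef(fit)[predictor]),
    r_squared = s$r.squared,                             # share of variance explained
    p_value   = s$coefficients[predictor, "Pr(>|t|)"]    # test of H0: slope = 0
  )
}

# Made-up numbers, only to show the call; NOT the marketing data.
toy <- data.frame(
  facebook = c(5, 10, 15, 20, 25, 30, 35, 40),
  sales    = c(9.1, 10.4, 12.2, 12.9, 15.1, 15.8, 18.0, 18.7)
)

res <- summarize_fit(toy, "facebook")
print(res)

# Reject the null hypothesis at the 5% level when the slope p-value is below 0.05.
reject_null <- res$p_value < 0.05
```

With datarium installed, the same helper should work on the real data after data("marketing", package = "datarium"), for example summarize_fit(marketing, "facebook").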

Repeat steps a to g by changing facebook to youtube

Repeat steps a to g by changing youtube to newspaper

Summarize your findings and state which of the three marketing media results in the best sales. Look at the slopes of the graphs and decide which of the three venues gave the best sales per dollar.
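One way to line up the three slopes side by side is sketched below. The helper slopes_by_channel is a made-up name, and the fallback data frame contains invented numbers that are used only when datarium is not installed, so that the snippet still runs. Each slope comes from a separate one-predictor regression, as in the assignment.

```r
# Hypothetical helper: slope of response ~ predictor, one simple regression per predictor.
slopes_by_channel <- function(data, predictors, response = "sales") {
  sapply(predictors, function(p) {
    fit <- lm(reformulate(p, response = response), data = data)
    unname(coef(fit)[p])
  })
}

# Use the real data when datarium is installed; otherwise fall back to
# made-up numbers that only demonstrate the call.
if (requireNamespace("datarium", quietly = TRUE)) {
  data("marketing", package = "datarium")
  dat <- marketing
} else {
  dat <- data.frame(
    youtube   = c(20, 60, 110, 150, 200, 260, 300, 340),
    facebook  = c(12, 40, 8, 33, 50, 5, 27, 45),
    newspaper = c(30, 10, 70, 45, 20, 90, 15, 60),
    sales     = c(6.1, 11.8, 10.2, 14.9, 19.5, 16.0, 20.3, 24.8)
  )
}

slopes <- slopes_by_channel(dat, c("facebook", "youtube", "newspaper"))
print(sort(slopes, decreasing = TRUE))

# Channel with the largest estimated change in sales per unit of budget.
best <- names(which.max(slopes))
print(best)
```

The slope is the estimated change in sales for one extra unit of budget, so the comparison is only meaningful if all three budgets are recorded in the same units. It is worth reading the slopes alongside each model's R-squared value and p-value rather than on their own.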

Part 2

For this part you will use the files KmeansEuroJobVAssignment and Eurojob.csv.

Then open the R file and import Eurojob.csv.

1. Run each line of the code and observe the output.

In this example I have used Man (manufacturing) and SPS (personal services) as the two variables.

Print the 2 plots from this file. Notice that the second plot automatically labels the x and y axes as Dim1 and Dim2. This is the default output of fviz_cluster. When the data set has more than two variables, it combines all of them into two new variables (the first two principal components) and plots the clusters on those. You will learn more about this technique, called principal component analysis, in more advanced courses. For now this is sufficient to know.
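To get a feel for where Dim1 and Dim2 come from, the sketch below runs a principal component analysis directly with prcomp from base R, on the Eurojob.csv values listed further down in this post (retyped here as comma-separated text so the snippet is self-contained). It assumes standardized variables, which is what fviz_cluster uses by default; the exact numbers on the fviz_cluster axes may differ in sign or rounding.

```r
# Values copied from the Eurojob.csv listing in this post.
eurojobs_csv <- "Country,Agr,Min,Man,PS,Con,SI,Fin,SPS,TC
Belgium,3.3,0.9,27.6,0.9,8.2,19.1,6.2,26.6,7.2
Denmark,9.2,0.1,21.8,0.6,8.3,14.6,6.5,32.2,7.1
France,10.8,0.8,27.5,0.9,8.9,16.8,6,22.6,5.7
W. Germany,6.7,1.3,35.8,0.9,7.3,14.4,5,22.3,6.1
Ireland,23.2,1,20.7,1.3,7.5,16.8,2.8,20.8,6.1
Italy,15.9,0.6,27.6,0.5,10,18.1,1.6,20.1,5.7
Luxembourg,7.7,3.1,30.8,0.8,9.2,18.5,4.6,19.2,6.2
Netherlands,6.3,0.1,22.5,1,9.9,18,6.8,28.5,6.8
United Kingdom,2.7,1.4,30.2,1.4,6.9,16.9,5.7,28.3,6.4
Austria,12.7,1.1,30.2,1.4,9,16.8,4.9,16.8,7
Finland,13,0.4,25.9,1.3,7.4,14.7,5.5,24.3,7.6
Greece,41.4,0.6,17.6,0.6,8.1,11.5,2.4,11,6.7
Norway,9,0.5,22.4,0.8,8.6,16.9,4.7,27.6,9.4
Portugal,27.8,0.3,24.5,0.6,8.4,13.3,2.7,16.7,5.7
Spain,22.9,0.8,28.5,0.7,11.5,9.7,8.5,11.8,5.5
Sweden,6.1,0.4,25.9,0.8,7.2,14.4,6,32.4,6.8
Switzerland,7.7,0.2,37.8,0.8,9.5,17.5,5.3,15.4,5.7
Turkey,66.8,0.7,7.9,0.1,2.8,5.2,1.1,11.9,3.2
Bulgaria,23.6,1.9,32.3,0.6,7.9,8,0.7,18.2,6.7
Czechoslovakia,16.5,2.9,35.5,1.2,8.7,9.2,0.9,17.9,7
E. Germany,4.2,2.9,41.2,1.3,7.6,11.2,1.2,22.1,8.4
Hungary,21.7,3.1,29.6,1.9,8.2,9.4,0.9,17.2,8
Poland,31.1,2.5,25.7,0.9,8.4,7.5,0.9,16.1,6.9
Rumania,34.7,2.1,30.1,0.6,8.7,5.9,1.3,11.7,5
USSR,23.7,1.4,25.8,0.6,9.2,6.1,0.5,23.6,9.3
Yugoslavia,48.7,1.5,16.8,1.1,4.9,6.4,11.3,5.3,4
"
Eurojobs <- read.csv(text = eurojobs_csv, row.names = 1)

# PCA on centered, unit-variance columns.
pca <- prcomp(Eurojobs, center = TRUE, scale. = TRUE)

# Share of total variance captured by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(100 * var_explained[1:2], 1))

# Coordinates of the first few countries on the first two components.
print(head(pca$x[, 1:2]))

# How strongly each original variable contributes to the first two components.
print(round(pca$rotation[, 1:2], 2))
```

Each component is a weighted combination of all nine sector variables, which is why the axes no longer carry names such as Man or SPS.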

2. Next, experiment with any other pair of variables from the Eurojobs data and see what kind of clusters you get. For example, with Man and SPS you'll notice that one cluster contains more data points than the other. See whether this changes, and document it.
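A small sketch of one way to document the cluster sizes for different variable pairs follows. It assumes the intent is to cluster on just the two chosen columns; note that the provided script instead clusters on all nine scaled columns and only plots two of them, so its cluster sizes do not change with the plotted pair. The helper name cluster_sizes is made up, and only three columns from the Eurojob.csv listing are retyped here to keep the example short.

```r
# Three columns copied from the Eurojob.csv listing in this post.
Eurojobs <- data.frame(
  Agr = c(3.3, 9.2, 10.8, 6.7, 23.2, 15.9, 7.7, 6.3, 2.7, 12.7, 13, 41.4, 9,
          27.8, 22.9, 6.1, 7.7, 66.8, 23.6, 16.5, 4.2, 21.7, 31.1, 34.7, 23.7, 48.7),
  Man = c(27.6, 21.8, 27.5, 35.8, 20.7, 27.6, 30.8, 22.5, 30.2, 30.2, 25.9, 17.6, 22.4,
          24.5, 28.5, 25.9, 37.8, 7.9, 32.3, 35.5, 41.2, 29.6, 25.7, 30.1, 25.8, 16.8),
  SPS = c(26.6, 32.2, 22.6, 22.3, 20.8, 20.1, 19.2, 28.5, 28.3, 16.8, 24.3, 11, 27.6,
          16.7, 11.8, 32.4, 15.4, 11.9, 18.2, 17.9, 22.1, 17.2, 16.1, 11.7, 23.6, 5.3)
)

# Hypothetical helper: scale the chosen columns, run k-means, count points per cluster.
cluster_sizes <- function(data, vars, k = 2, seed = 140) {
  set.seed(seed)                      # fixed seed so repeated runs give the same result
  scaled <- scale(data[, vars])       # center and scale each chosen column
  km <- kmeans(scaled, centers = k, nstart = 50, iter.max = 5000)
  table(km$cluster)
}

sizes_man_sps <- cluster_sizes(Eurojobs, c("Man", "SPS"))
sizes_agr_sps <- cluster_sizes(Eurojobs, c("Agr", "SPS"))

print(sizes_man_sps)
print(sizes_agr_sps)
```

Cluster labels 1 and 2 are arbitrary, so compare the sizes themselves rather than which label is larger.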

RegressionAssignment2.r:

library(ggplot2)

library(dplyr)

library(broom)

library(ggpubr)

data("marketing", package = "datarium")

head(marketing, 4) # Show the first 4 rows of data

summary(marketing)

plot(sales ~ facebook, data = marketing)

sales.facebook.lm <- lm(sales ~ facebook, data = marketing)

summary(sales.facebook.lm)

#Plot the data points on a graph

sales.graph <- ggplot(marketing, aes(x = facebook, y = sales)) +

geom_point()

sales.graph

#Add the regression line to the plot

sales.graph <- sales.graph + geom_smooth(method = "lm", col = "black")

sales.graph

#Add the equation for the regression line

#label.x and label.y are in data units; sales tops out near 32, so 30 keeps the label inside the plot (adjust if needed)

sales.graph <- sales.graph +

stat_regline_equation(label.x = 3, label.y = 30)

sales.graph

#Add titles and labels

sales.graph +

theme_bw() +

labs(title = "Reported sales y as a function of marketing budget for facebook x",

x = "facebook marketing x ",

y = "sales y")

KmeansEuroJobVAssignment:

#This example uses the Eurojobs data, which gives the percent of employment per sector:

#agriculture, mining, manufacturing,power supply, construction, services,

#finance, personal services and transportation

if (!requireNamespace("ISLR", quietly = TRUE)) install.packages("ISLR") # install only if missing

library(ISLR)

library(cluster)

Eurojobs <- read.csv(

file = "https://statsandr.com/blog/data/Eurojobs.csv",

sep = ",", dec = ".", header = TRUE

)

head(Eurojobs) # head() is used to display only the first 6 observations

#Re-read the file, using column 1 (Country) as row names so that only numeric columns remain

Eurojobs <- read.csv(

file = "https://statsandr.com/blog/data/Eurojobs.csv",

sep = ",", dec = ".", header = TRUE, row.names = 1

)

head(Eurojobs)

summary(Eurojobs)

set.seed(140) # Fix the random seed so the results are reproducible

Eurojobs.scaled <- as.data.frame(apply(Eurojobs, MARGIN = 2, FUN = scale))

#head(Eurojobs.scaled)

scaledClusters <- kmeans(Eurojobs.scaled, centers = 2, nstart = 50, iter.max = 5000)

scaledClusters

scaledClusters$cluster

Eurojobs.scaled[c("Man", "SPS")]

plot(Eurojobs.scaled[c("Man", "SPS")],

col = scaledClusters$cluster, main = "Eurojobs clusters, scaled")

scaledClusters$centers

scaledClusters$centers[, c("Man", "SPS")]

# cex scales the symbol size, pch selects the plotting symbol

points(scaledClusters$centers[, c("Man", "SPS")],

col = 1:2, pch = 4, cex = 3)

library(factoextra)

library(NbClust)

library(cluster)

#Note: this call clusters the unscaled Eurojobs data, unlike scaledClusters above

km_res <- kmeans(Eurojobs, centers = 2, nstart = 20)

fviz_cluster(km_res, Eurojobs, ellipse.type = "norm")

Eurojob.csv:

Country	Agr	Min	Man	PS	Con	SI	Fin	SPS	TC
Belgium	3.3	0.9	27.6	0.9	8.2	19.1	6.2	26.6	7.2
Denmark	9.2	0.1	21.8	0.6	8.3	14.6	6.5	32.2	7.1
France	10.8	0.8	27.5	0.9	8.9	16.8	6	22.6	5.7
W. Germany	6.7	1.3	35.8	0.9	7.3	14.4	5	22.3	6.1
Ireland	23.2	1	20.7	1.3	7.5	16.8	2.8	20.8	6.1
Italy	15.9	0.6	27.6	0.5	10	18.1	1.6	20.1	5.7
Luxembourg	7.7	3.1	30.8	0.8	9.2	18.5	4.6	19.2	6.2
Netherlands	6.3	0.1	22.5	1	9.9	18	6.8	28.5	6.8
United Kingdom	2.7	1.4	30.2	1.4	6.9	16.9	5.7	28.3	6.4
Austria	12.7	1.1	30.2	1.4	9	16.8	4.9	16.8	7
Finland	13	0.4	25.9	1.3	7.4	14.7	5.5	24.3	7.6
Greece	41.4	0.6	17.6	0.6	8.1	11.5	2.4	11	6.7
Norway	9	0.5	22.4	0.8	8.6	16.9	4.7	27.6	9.4
Portugal	27.8	0.3	24.5	0.6	8.4	13.3	2.7	16.7	5.7
Spain	22.9	0.8	28.5	0.7	11.5	9.7	8.5	11.8	5.5
Sweden	6.1	0.4	25.9	0.8	7.2	14.4	6	32.4	6.8
Switzerland	7.7	0.2	37.8	0.8	9.5	17.5	5.3	15.4	5.7
Turkey	66.8	0.7	7.9	0.1	2.8	5.2	1.1	11.9	3.2
Bulgaria	23.6	1.9	32.3	0.6	7.9	8	0.7	18.2	6.7
Czechoslovakia	16.5	2.9	35.5	1.2	8.7	9.2	0.9	17.9	7
E. Germany	4.2	2.9	41.2	1.3	7.6	11.2	1.2	22.1	8.4
Hungary	21.7	3.1	29.6	1.9	8.2	9.4	0.9	17.2	8
Poland	31.1	2.5	25.7	0.9	8.4	7.5	0.9	16.1	6.9
Rumania	34.7	2.1	30.1	0.6	8.7	5.9	1.3	11.7	5
USSR	23.7	1.4	25.8	0.6	9.2	6.1	0.5	23.6	9.3
Yugoslavia	48.7	1.5	16.8	1.1	4.9	6.4	11.3	5.3	4