Question


Posted on Oct 29, 2024


---
output:
  html_document: default
  pdf_document: default
---

---
title: 'Twitter Retweetability Analysis'
subtitle: 'UMaine BUA684 Module 3'
author:
  - FirstName LastName
date: "`r format(Sys.time(), '%d %B %Y')`"
output: pdf_document
---

# Problem

The aim of this assignment is to understand the relationship between the retweetability of a tweet (whether the tweet is retweeted and, if it is, how many times) and some features of the tweet.

# Data

```{r message=FALSE, warning=FALSE}
# import data
# install.packages("readr")  # run once in the Console if readr is not installed
library(readr)
LLBean_retweet <- read_csv("LLBean_retweet.csv")

# wrangle data
# install.packages(c("dplyr", "stringr", "stringi"))  # run once if not installed
library(dplyr)
LLBean1 <- LLBean_retweet %>%
  filter(language == "en") %>%
  mutate(
    hashtags = gsub("\\[|\\]", "", hashtags),
    urls = gsub("\\[|\\]", "", urls)
  )

library(stringr)
library(stringi)
LLBean2 <- LLBean1 %>%
  mutate(
    tweet_length = str_length(tweet),
    url_ind = ifelse(str_length(urls) == 0, 0, 1),
    hashtags_count = ifelse(str_length(hashtags) == 0, 0, stri_count_fixed(hashtags, ",") + 1),
    retweet_ind = as.numeric(retweets_count > 0)
  ) %>%
  select(-language)

LLBean3 <- LLBean2 %>% filter(retweet_ind == 1)

head(LLBean_retweet)
head(LLBean1)
head(LLBean2)
head(LLBean3)
```
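
To see what the derived `hashtags_count` column holds, the sketch below applies the same bracket-stripping and comma-counting logic to three made-up strings. It uses base R only (so `nchar()` and `gregexpr()` stand in for the `stringr` and `stringi` calls), and the sample values are hypothetical: it assumes the CSV stores hashtags as bracketed, comma-separated lists.

```r
# Hypothetical raw values; the real file's exact formatting is an assumption
hashtags_raw <- c("[]", "['llbean']", "['llbean', 'outdoors', 'hiking']")

# Strip the square brackets with the same pattern the wrangling code uses
hashtags_clean <- gsub("\\[|\\]", "", hashtags_raw)

# Count commas per string (base R stand-in for stri_count_fixed)
n_commas <- lengths(regmatches(hashtags_clean,
                               gregexpr(",", hashtags_clean, fixed = TRUE)))

# An empty string means no hashtags; otherwise items = commas + 1
hashtags_count <- ifelse(nchar(hashtags_clean) == 0, 0, n_commas + 1)

print(data.frame(hashtags_raw, hashtags_clean, hashtags_count))
```

The same empty-string test is what turns `urls` into the 0/1 indicator `url_ind`.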

Below, please write all your answers in **bold** font.

*Problem 1: Based on the data outputs `LLBean_retweet`, `LLBean1`, `LLBean2`, and `LLBean3`, describe how the above R code wrangles the data.* **(Note: Because the dataset is large, to get a complete view of the four datasets you must also run the above code in the Console, not just in this R Notebook. Alternatively, you can open each of the four datasets from the Environment panel by clicking on the dataset name.)**

**Your answer: ( )**

# Analysis

## Logistic regression model

*Problem 2: In the following chunk, use the business case example for the logistic model as a reference to build a logistic regression model based on the dataset `LLBean2` to predict `retweet_ind` using `tweet_length`, `url_ind`, `hashtags_count`, and `video` as predictors. Show all your R code in the submission, including the code for addressing the multicollinearity problem. If the problem appears, you need to update your model to resolve it. Also, you should evaluate the importance of the different predictors in the model.*

```{r message=FALSE, warning=FALSE}

```
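
As a rough illustration of the workflow Problem 2 describes, the sketch below fits a logistic model, checks multicollinearity, and ranks predictors. It runs on simulated data, not on `LLBean2`: the data frame `sim` and its data-generating coefficients are assumptions made up for this example. The variance inflation factor is computed by hand from the textbook definition, and a packaged `vif()` function (for example from `car`) may report slightly different values for a GLM. Ranking by absolute z statistic is one simple importance measure, not necessarily the one used in the business case example.

```r
set.seed(684)
n <- 500

# Synthetic stand-in for LLBean2 (all values are simulated, not real tweet data)
sim <- data.frame(
  tweet_length   = round(runif(n, 20, 280)),
  url_ind        = rbinom(n, 1, 0.6),
  hashtags_count = rpois(n, 1.5),
  video          = rbinom(n, 1, 0.2)
)
# Assumed data-generating process, chosen only so the model has signal to find
lin_pred <- -1 + 0.004 * sim$tweet_length + 0.8 * sim$video - 0.3 * sim$url_ind
sim$retweet_ind <- rbinom(n, 1, plogis(lin_pred))

# Logistic regression: binomial family with the default logit link
logit_fit <- glm(retweet_ind ~ tweet_length + url_ind + hashtags_count + video,
                 data = sim, family = binomial)
print(summary(logit_fit)$coefficients)

# VIF per predictor: 1 / (1 - R^2), where R^2 comes from regressing
# that predictor on the remaining predictors
predictors <- c("tweet_length", "url_ind", "hashtags_count", "video")
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = sim))$r.squared
  1 / (1 - r2)
})
print(round(vif, 2))

# One simple importance ranking: absolute z statistics, intercept excluded
importance <- sort(abs(summary(logit_fit)$coefficients[-1, "z value"]),
                   decreasing = TRUE)
print(importance)
```

If a VIF came out large, a typical response would be to drop or combine the offending predictor and refit; what counts as "large" is a rule-of-thumb choice.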

*Problem 3: Based on your final model results for Problem 2, interpret the meaning of the regression coefficient estimate for the most important predictor.*

**Your answer: ( )**

*Problem 4: In the following chunk, estimate the possible range of the regression coefficient estimate for the most important predictor at the 95% confidence level.*

```{r message=FALSE, warning=FALSE}

```
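
One way to obtain such a range for a logistic model is a Wald interval, sketched below on simulated data. The data frame `sim`, its coefficients, and the choice of `video` as the "most important" predictor are all assumptions for illustration. `confint.default()` gives the Wald interval on the log-odds scale, and exponentiating it gives the interval for the odds ratio.

```r
set.seed(684)
n <- 500

# Simulated data; the coefficients below are assumptions for illustration only
sim <- data.frame(tweet_length = round(runif(n, 20, 280)),
                  video        = rbinom(n, 1, 0.2))
sim$retweet_ind <- rbinom(n, 1,
                          plogis(-1 + 0.004 * sim$tweet_length + 0.8 * sim$video))

logit_fit <- glm(retweet_ind ~ tweet_length + video, data = sim, family = binomial)

# Wald 95% interval on the log-odds scale: estimate +/- normal quantile * SE
ci_logodds <- confint.default(logit_fit, parm = "video", level = 0.95)

# Exponentiate to read the interval as an odds ratio
ci_oddsratio <- exp(ci_logodds)

print(ci_logodds)
print(ci_oddsratio)
```

An odds-ratio interval that lies entirely above 1 suggests the predictor raises the odds of a retweet; one that straddles 1 does not support a clear direction.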

*Problem 5: Based on the result for Problem 4, interpret the generalized meaning of the regression coefficient estimate for the most important predictor.*

**Your answer: ( )**

*Problem 6: In the following chunk, measure the performance of the logistic regression model you built. Show all your R code in the submission.*

```{r message=FALSE, warning=FALSE}

```
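
Two common performance measures for a binary classifier are accuracy from a confusion matrix and the area under the ROC curve (AUC). The sketch below computes both in base R on simulated data. The data frame `sim`, the 0.5 cutoff, and the choice of these two measures are assumptions; the course's business case example may use a different measure.

```r
set.seed(684)
n <- 500

# Simulated data; the coefficients are assumptions for illustration only
sim <- data.frame(tweet_length = round(runif(n, 20, 280)),
                  video        = rbinom(n, 1, 0.2))
sim$retweet_ind <- rbinom(n, 1,
                          plogis(-1 + 0.004 * sim$tweet_length + 0.8 * sim$video))
logit_fit <- glm(retweet_ind ~ tweet_length + video, data = sim, family = binomial)

p_hat <- fitted(logit_fit)  # in-sample predicted probabilities
y <- sim$retweet_ind

# Confusion matrix and accuracy at a 0.5 cutoff (the cutoff is an assumption)
pred_class <- as.integer(p_hat > 0.5)
conf_mat <- table(actual = y, predicted = factor(pred_class, levels = c(0, 1)))
accuracy <- mean(pred_class == y)

# AUC via the rank-sum (Mann-Whitney) identity
n1 <- sum(y == 1)
n0 <- sum(y == 0)
auc <- (sum(rank(p_hat)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

print(conf_mat)
print(c(accuracy = accuracy, auc = auc))
```

These are in-sample figures, so they tend to be optimistic; holding out a test set would give a fairer estimate.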

*Problem 7: According to the model performance measure in Problem 6, is this model a poor/average/good/strong model?*

**Your answer: ( )**

## Least-square regression model

*Problem 8: In the following chunk, use the business case example for the least-square model as a reference to build a least-square regression model based on the dataset `LLBean3` to predict `retweets_count` using `tweet_length`, `url_ind`, `hashtags_count`, and `video` as predictors. Show all your R code in the submission, including the code for checking the assumptions of this type of regression model, detecting influential outliers, and addressing the multicollinearity problem. If the outlier problem and/or the multicollinearity problem appears, you need to update your model to resolve the problem(s). Also, you should evaluate the importance of the different predictors in the model.*

```{r message=FALSE, warning=FALSE}

```
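
The steps in Problem 8 can be sketched as follows on simulated data. Everything about `sim` is made up for this example, including the skewed count outcome. The Shapiro-Wilk test, the `4 / n` Cook's distance cutoff, and the pairwise correlation check are common conventions chosen here for illustration, not necessarily the ones in the business case example.

```r
set.seed(684)
n <- 300

# Synthetic stand-in for LLBean3 (retweeted tweets only); values are simulated
sim <- data.frame(
  tweet_length   = round(runif(n, 20, 280)),
  url_ind        = rbinom(n, 1, 0.6),
  hashtags_count = rpois(n, 1.5),
  video          = rbinom(n, 1, 0.2)
)
# Assumed right-skewed count outcome, at least 1 because every row was retweeted
sim$retweets_count <- 1 + rpois(n, exp(0.5 + 0.003 * sim$tweet_length +
                                         0.6 * sim$video))

ols_fit <- lm(retweets_count ~ tweet_length + url_ind + hashtags_count + video,
              data = sim)

# Assumption check: normality of residuals
print(shapiro.test(residuals(ols_fit)))
# plot(ols_fit)  # diagnostic plots; uncomment when running interactively

# Quick multicollinearity screen: pairwise correlations among predictors
predictors <- c("tweet_length", "url_ind", "hashtags_count", "video")
print(round(cor(sim[, predictors]), 2))

# Influential observations: Cook's distance above 4 / n (a rule of thumb)
cooks <- cooks.distance(ols_fit)
influential <- which(cooks > 4 / n)

# Refit without the flagged rows, if any were found
sim_clean <- if (length(influential) > 0) sim[-influential, ] else sim
ols_fit2 <- update(ols_fit, data = sim_clean)

# Absolute t statistics as one simple importance ranking
print(sort(abs(coef(summary(ols_fit2))[-1, "t value"]), decreasing = TRUE))
```

Dropping flagged rows is only one possible response to influential points; transforming a skewed outcome (for example with a log) is another option worth comparing.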

*Problem 9: Are any assumptions not satisfied by this dataset? If so, which ones?*

**Your answer: ( )**

*Problem 10: Based on your final model results for Problem 8, interpret the meaning of the regression coefficient estimate for the most important predictor.*

**Your answer: ( )**

*Problem 11: In the following chunk, estimate the possible range of the regression coefficient estimate for the most important predictor at the 95% confidence level.*

```{r message=FALSE, warning=FALSE}

```
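
For a least-square model, `confint()` returns a t-based interval. The sketch below shows it on simulated data and reproduces the same interval by hand. The data frame `sim`, its coefficients, and the choice of `video` as the predictor of interest are assumptions for illustration.

```r
set.seed(684)
n <- 300

# Simulated data; the coefficients are assumptions for illustration only
sim <- data.frame(tweet_length = round(runif(n, 20, 280)),
                  video        = rbinom(n, 1, 0.2))
sim$retweets_count <- 2 + 0.01 * sim$tweet_length + 1.5 * sim$video + rnorm(n)

ols_fit <- lm(retweets_count ~ tweet_length + video, data = sim)

# t-based 95% confidence interval for one coefficient
ci_video <- confint(ols_fit, parm = "video", level = 0.95)
print(ci_video)

# Same interval by hand: estimate +/- t quantile * standard error
coef_table <- coef(summary(ols_fit))
est <- coef_table["video", "Estimate"]
se  <- coef_table["video", "Std. Error"]
by_hand <- est + c(-1, 1) * qt(0.975, df = df.residual(ols_fit)) * se
print(by_hand)
```

The interval is read in the units of the outcome: the plausible change in `retweets_count` associated with a one-unit change in the predictor, holding the other predictors fixed.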

*Problem 12: Based on the result for Problem 11, interpret the generalized meaning of the regression coefficient estimate for the most important predictor.*

**Your answer: ( )**

*Problem 13: In the following chunk, measure the performance of the least-square regression model you built. Show all your R code in the submission.*

```{r message=FALSE, warning=FALSE}

```
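
Typical performance measures for a least-square model are R-squared, adjusted R-squared, and the root mean squared error (RMSE). The sketch below extracts them in base R from a model fit on simulated data. The data frame `sim` and its coefficients are assumptions, and which measure the course expects is not stated here.

```r
set.seed(684)
n <- 300

# Simulated data; the coefficients are assumptions for illustration only
sim <- data.frame(tweet_length = round(runif(n, 20, 280)),
                  video        = rbinom(n, 1, 0.2))
sim$retweets_count <- 2 + 0.01 * sim$tweet_length + 1.5 * sim$video + rnorm(n)

ols_fit <- lm(retweets_count ~ tweet_length + video, data = sim)
fit_summary <- summary(ols_fit)

# Share of outcome variance explained, raw and penalized for model size
r_squared     <- fit_summary$r.squared
adj_r_squared <- fit_summary$adj.r.squared

# Typical prediction error, in units of the outcome
residual_se <- fit_summary$sigma                   # divides by residual df
rmse        <- sqrt(mean(residuals(ols_fit)^2))    # divides by n

print(c(r_squared = r_squared, adj_r_squared = adj_r_squared,
        residual_se = residual_se, rmse = rmse))
```

As with any in-sample measure, these values describe fit to the data used for estimation rather than accuracy on new tweets.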

*Problem 14: According to the model performance measure in Problem 13, do you think this model is a good model?*

**Your answer: ( )**

# Discussion

*Reflect on the ways in which the logistic and least-square regression model results could contribute to the development of an enhanced social media marketing strategy for the company. Although this analysis won't be detailed here, you will have the opportunity to collaborate with your project team, allowing you to work together with your teammates to further examine this aspect and devise comprehensive marketing approaches.*