Question

Posted on Sep 26, 2024

All regression code should be written in Python in a Jupyter Notebook. The dataset weibo_data.csv contains the columns: a) location, b) show_id, c) episode_num, d) censor_dummy, e) log_rating, f) log_tweet, g) av_tweets, h) day_id, i) mainland_dummy

Here is a screenshot of weibo_data.csv for your reference:

[Screenshot of weibo_data.csv; image not reproduced]

1) Measuring the impact of online word-of-mouth

You are trying to measure the impact of online word-of-mouth on product demand in the Chinese TV market. Specifically, you are interested in finding out whether consumers' tweets about a TV show lead to higher viewership of the show. You obtain episode-level data of ratings (market share in terms of viewership) for a large set of TV shows and information on the number of tweets on Sina Weibo mentioning the name of the show on the day on which a specific episode aired. You also have data on ratings for a set of shows in Hong Kong, where Sina Weibo has almost no market penetration because Hong Kong residents mainly use Twitter (which is blocked in mainland China).

Use the dataset weibo_data.csv for the following questions:

1.1) Simple regression

Question 1a) In Python, regress the (log) ratings of each show on the (log) number of tweets per episode.

Question 1b) Do you think this regression gives you the causal effect of tweets on show viewership? If not, do you think your estimate will be biased upwards or downwards?
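A minimal sketch of how the regression in Question 1a) could be set up. It uses plain NumPy least squares so the mechanics stay visible; in a notebook, a regression library such as statsmodels would more commonly be used, since it also reports standard errors. The helper name `simple_ols` and the synthetic stand-in data (with an arbitrary slope of 0.2) are assumptions for illustration only and say nothing about the real weibo_data.csv.

```python
import numpy as np
import pandas as pd


def simple_ols(df, y="log_rating", x="log_tweet"):
    """Return (intercept, slope) from an OLS regression of y on x."""
    data = df[[y, x]].dropna()  # drop episodes with missing values
    X = np.column_stack([np.ones(len(data)), data[x].to_numpy(dtype=float)])
    beta, *_ = np.linalg.lstsq(X, data[y].to_numpy(dtype=float), rcond=None)
    return beta[0], beta[1]


# Synthetic stand-in data (NOT the real dataset). With the real file one
# would instead load it, e.g. df = pd.read_csv("weibo_data.csv").
rng = np.random.default_rng(0)
log_tweet = rng.normal(3.0, 1.0, size=500)
demo = pd.DataFrame({
    "log_tweet": log_tweet,
    "log_rating": 0.5 + 0.2 * log_tweet + rng.normal(0.0, 0.05, size=500),
})

intercept, slope = simple_ols(demo)
print(f"intercept = {intercept:.3f}, slope on log_tweet = {slope:.3f}")
```

Because both variables are in logs, the slope reads as an elasticity: the approximate percentage change in ratings associated with a 1% change in tweets. Whether that association is causal is exactly what Question 1b) asks.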

1.2) Geographic Difference-in-difference

During the time period of your data, the Chinese government blocked the entire Sina Weibo platform due to a political scandal for 3 days (a dummy for those 3 days is called censor_dummy). Assume that the censorship constitutes an exogenous shock that affected the number of tweets during the 3 days it lasted. You want to exploit this shock in order to analyze whether ratings decreased during the censorship.

Question 1.2a) i) In Python, run a regression of episode-level (log) ratings on show fixed effects and the censorship dummy, using only data from mainland China.

Question 1.2a) ii) Interpret the coefficient on the censorship dummy. Is this result what you expected?
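One way the mainland-only fixed-effects regression in 1.2a) i) might be sketched is with the within transformation: demeaning every variable by show gives the same slope as including one dummy per show. The helper `within_ols` is hypothetical, the coding `mainland_dummy == 1` for mainland China is an assumption about the dataset, and the data below are synthetic with an arbitrary censorship effect of -0.10.

```python
import numpy as np
import pandas as pd


def within_ols(df, y, xs, group):
    """OLS slopes with group fixed effects, via demeaning by group."""
    cols = [y] + list(xs)
    data = df[cols + [group]].dropna()
    demeaned = data[cols] - data.groupby(group)[cols].transform("mean")
    beta, *_ = np.linalg.lstsq(
        demeaned[list(xs)].to_numpy(dtype=float),
        demeaned[y].to_numpy(dtype=float),
        rcond=None,
    )
    return pd.Series(beta, index=list(xs))


# Synthetic stand-in panel (NOT the real data): 20 shows observed on 30 days.
rng = np.random.default_rng(0)
shows, days = np.meshgrid(np.arange(20), np.arange(30), indexing="ij")
demo = pd.DataFrame({"show_id": shows.ravel(), "day_id": days.ravel()})
demo["mainland_dummy"] = 1  # assumption: 1 = mainland China
demo["censor_dummy"] = demo["day_id"].between(10, 12).astype(int)  # 3 days
demo["log_rating"] = (
    0.1 * demo["show_id"]           # show-specific baseline rating
    - 0.10 * demo["censor_dummy"]   # arbitrary "true" censorship effect
    + rng.normal(0.0, 0.01, len(demo))
)

mainland = demo[demo["mainland_dummy"] == 1]
coef = within_ols(mainland, "log_rating", ["censor_dummy"], "show_id")
print(coef)
```

The coefficient on censor_dummy is then the average within-show change in log ratings on the censored days relative to that show's other episodes.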

Question 1.2b) i) Was it necessary to control for show fixed effects in the regression above?

Question 1.2b) ii) If you ran the regression without show fixed effects, how would the interpretation of the coefficient on the censorship dummy differ?

Question 1.2c) i) In Python, run the same regression as in part 1.2a) i), but use only data from Hong Kong (and not mainland China). Make sure to control for show fixed effects.

Question 1.2c) ii) Interpret the coefficient on the censorship dummy. Is this result what you expected?

Question 1.2d) i) In Python, use data from both Hong Kong and mainland China to implement a difference-in-differences regression with mainland China as the treatment group and Hong Kong as the control group (i.e., show that the censorship event had a differential effect in mainland China relative to Hong Kong). Make sure to control for show fixed effects.

Question 1.2d) ii) Interpret the relevant coefficients of this regression.
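The geographic difference-in-differences in 1.2d) i) could be sketched with explicit show dummies plus an interaction term. The function `did_with_show_dummies` and the column name `censor_x_mainland` are hypothetical, and the sketch assumes that `mainland_dummy` is 1 for mainland shows and 0 for Hong Kong shows and that each show_id belongs to exactly one location. The data are synthetic, with no effect in Hong Kong and an arbitrary -0.10 effect in mainland China.

```python
import numpy as np
import pandas as pd


def did_with_show_dummies(df):
    """DiD by least squares with one dummy per show (no separate intercept).

    mainland_dummy itself is left out because it is constant within a show
    and therefore collinear with the show dummies.
    """
    cols = ["log_rating", "censor_dummy", "mainland_dummy", "show_id"]
    data = df[cols].dropna()
    X = pd.get_dummies(data["show_id"], prefix="show", dtype=float)
    X["censor_dummy"] = data["censor_dummy"].astype(float)
    X["censor_x_mainland"] = (
        data["censor_dummy"] * data["mainland_dummy"]
    ).astype(float)
    beta, *_ = np.linalg.lstsq(
        X.to_numpy(), data["log_rating"].to_numpy(dtype=float), rcond=None
    )
    coefs = pd.Series(beta, index=X.columns)
    return coefs[["censor_dummy", "censor_x_mainland"]]


# Synthetic stand-in panel (NOT the real data): shows 0-9 are mainland,
# shows 10-19 are Hong Kong, each observed on 30 days.
rng = np.random.default_rng(1)
shows, days = np.meshgrid(np.arange(20), np.arange(30), indexing="ij")
demo = pd.DataFrame({"show_id": shows.ravel(), "day_id": days.ravel()})
demo["mainland_dummy"] = (demo["show_id"] < 10).astype(int)
demo["censor_dummy"] = demo["day_id"].between(10, 12).astype(int)
demo["log_rating"] = (
    0.1 * demo["show_id"]
    - 0.10 * demo["censor_dummy"] * demo["mainland_dummy"]
    + rng.normal(0.0, 0.01, len(demo))
)

coefs = did_with_show_dummies(demo)
print(coefs)
```

In this setup the coefficient on censor_dummy captures the change in the control group (Hong Kong) during the censored days, and censor_x_mainland captures the additional change in mainland China, which is the difference-in-differences estimate.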

1.3) Across-show Difference-in-difference

From here onwards, use only observations from shows in mainland China.

The variable av_tweets denotes the average number of tweets associated with an episode of each show (outside of the censored time period). This variable is therefore show-specific and does not vary over time. We can use it to capture the general level of social media interest in each show.

Generate a set of 3 dummy variables based on the av_tweets variable, where:

The first dummy = 1 for shows with fewer than 5 tweets per episode,
The second dummy = 1 for shows with at least 5 but fewer than 100 tweets per episode, and
The third dummy = 1 for shows with at least 100 tweets per episode.
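The three dummies described above could be generated along these lines. The function name and the column names `low_activity`, `mid_activity` and `high_activity` are hypothetical choices, and the five example rows are made up to sit on the threshold boundaries.

```python
import pandas as pd


def add_activity_dummies(df, col="av_tweets"):
    """Return a copy of df with three mutually exclusive activity dummies.

    Rows with a missing value in `col` get 0 in all three dummies.
    """
    out = df.copy()
    out["low_activity"] = (out[col] < 5).astype(int)
    out["mid_activity"] = ((out[col] >= 5) & (out[col] < 100)).astype(int)
    out["high_activity"] = (out[col] >= 100).astype(int)
    return out


# Made-up rows that sit on and around the two thresholds.
demo = pd.DataFrame({
    "show_id": [1, 2, 3, 4, 5],
    "av_tweets": [0.0, 4.9, 5.0, 99.9, 100.0],
})
demo = add_activity_dummies(demo)
print(demo)
```

Since av_tweets is constant within a show, each show falls into exactly one of the three groups for all of its episodes.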

Question 1.3a) i) In Python, run a regression for shows with fewer than 5 tweets per episode.

Question 1.3a) ii) In Python, run a regression for shows with at least 5 but fewer than 100 tweets per episode.

Question 1.3a) iii) In Python, run a regression for shows with at least 100 tweets per episode.

Question 1.3a) iv) What do you find in terms of the impact of the censorship event across the three regressions?
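The question does not spell out the specification for these three regressions. The sketch below assumes it is the same one as in 1.2a) i), that is, log ratings on the censorship dummy with show fixed effects, run separately on each activity group. The helper `within_ols` is hypothetical, and the synthetic data use arbitrary effects of 0, -0.05 and -0.15 for the three groups.

```python
import numpy as np
import pandas as pd


def within_ols(df, y, xs, group):
    """OLS slopes with group fixed effects, via demeaning by group."""
    cols = [y] + list(xs)
    data = df[cols + [group]].dropna()
    demeaned = data[cols] - data.groupby(group)[cols].transform("mean")
    beta, *_ = np.linalg.lstsq(
        demeaned[list(xs)].to_numpy(dtype=float),
        demeaned[y].to_numpy(dtype=float),
        rcond=None,
    )
    return pd.Series(beta, index=list(xs))


# Synthetic stand-in panel (NOT the real data): 30 mainland shows, 30 days,
# cycling through three made-up activity levels.
rng = np.random.default_rng(2)
shows, days = np.meshgrid(np.arange(30), np.arange(30), indexing="ij")
demo = pd.DataFrame({"show_id": shows.ravel(), "day_id": days.ravel()})
tier = (demo["show_id"] % 3).to_numpy()
demo["av_tweets"] = np.array([1.0, 50.0, 500.0])[tier]
demo["censor_dummy"] = demo["day_id"].between(10, 12).astype(int)
demo["log_rating"] = (
    0.1 * demo["show_id"]
    + np.array([0.0, -0.05, -0.15])[tier] * demo["censor_dummy"]
    + rng.normal(0.0, 0.01, len(demo))
)

groups = {
    "fewer than 5": demo["av_tweets"] < 5,
    "5 to under 100": (demo["av_tweets"] >= 5) & (demo["av_tweets"] < 100),
    "at least 100": demo["av_tweets"] >= 100,
}
effects = {
    label: within_ols(demo[mask], "log_rating", ["censor_dummy"], "show_id")[
        "censor_dummy"
    ]
    for label, mask in groups.items()
}
for label, effect in effects.items():
    print(f"{label:>15}: censorship coefficient = {effect:.3f}")
```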

Question 1.3b) i) In Python, run a difference-in-difference regression that allows the censorship event to have a different effect for the 3 sets of shows with the 3 different activity levels defined above.

Question 1.3b) ii) Interpret the relevant coefficients.
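A pooled version of the across-show comparison could interact the censorship dummy with the activity groups, here with the lowest-activity group as the baseline. This is a sketch under assumptions: `within_ols`, `censor_x_mid` and `censor_x_high` are hypothetical names, the choice of baseline group is arbitrary, and the data are synthetic with made-up effects of 0, -0.05 and -0.15.

```python
import numpy as np
import pandas as pd


def within_ols(df, y, xs, group):
    """OLS slopes with group fixed effects, via demeaning by group."""
    cols = [y] + list(xs)
    data = df[cols + [group]].dropna()
    demeaned = data[cols] - data.groupby(group)[cols].transform("mean")
    beta, *_ = np.linalg.lstsq(
        demeaned[list(xs)].to_numpy(dtype=float),
        demeaned[y].to_numpy(dtype=float),
        rcond=None,
    )
    return pd.Series(beta, index=list(xs))


# Synthetic stand-in panel (NOT the real data): 30 mainland shows, 30 days.
rng = np.random.default_rng(3)
shows, days = np.meshgrid(np.arange(30), np.arange(30), indexing="ij")
demo = pd.DataFrame({"show_id": shows.ravel(), "day_id": days.ravel()})
tier = (demo["show_id"] % 3).to_numpy()
demo["av_tweets"] = np.array([1.0, 50.0, 500.0])[tier]
demo["censor_dummy"] = demo["day_id"].between(10, 12).astype(int)
demo["log_rating"] = (
    0.1 * demo["show_id"]
    + np.array([0.0, -0.05, -0.15])[tier] * demo["censor_dummy"]
    + rng.normal(0.0, 0.01, len(demo))
)

# Interactions of the censorship dummy with the two higher-activity groups.
# The group dummies themselves are absorbed by the show fixed effects.
mid = ((demo["av_tweets"] >= 5) & (demo["av_tweets"] < 100)).astype(int)
high = (demo["av_tweets"] >= 100).astype(int)
demo["censor_x_mid"] = demo["censor_dummy"] * mid
demo["censor_x_high"] = demo["censor_dummy"] * high

coefs = within_ols(
    demo,
    "log_rating",
    ["censor_dummy", "censor_x_mid", "censor_x_high"],
    "show_id",
)
print(coefs)
```

With this parameterization, censor_dummy is the censorship effect for the lowest-activity shows, and each interaction is the additional effect for that group relative to the baseline.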

Question 1.3c) i) Relate your findings across shows with different activity levels to the geographic difference-in-difference approach.

Question 1.3c) ii) Which regression is more informative regarding the impact of the censorship on ratings?

Please refer to the screenshot at the top of the page for more on the weibo_data.csv dataset.