Question
Problem 4: Capturing global customers with machine translation [24 points] Data Source: https://docs.google.com/spreadsheets/d/1G-tXIGYoW17awkI1S8K0FpuPOvz3i_xJre3kvS4juGg/edit?usp=sharing As many e-commerce platforms try to grow their business beyond borders, they
Problem 4: Capturing global customers with machine translation [24 points]
Data Source:
https://docs.google.com/spreadsheets/d/1G-tXIGYoW17awkI1S8K0FpuPOvz3i_xJre3kvS4juGg/edit?usp=sharing
As many e-commerce platforms try to grow their business beyond borders, they realize that language is one key barrier in international trade. For example, buyers in Mexico may not understand listing titles and descriptions in English, and therefore do not buy from U.S. sellers even though they are interested in the sellers' products. To mitigate language barrier across borders, an e-commerce platform adopts machine translation (MT) to automatically translate listings for buyers. The platform first adopts MT from English to Spanish for buyers in Spanish-speaking Latin American countries. In particular, buyers from these countries will see listing titles and descriptions in Spanish, instead of in English.
Question a). The platform wants to evaluate the effect of the introduction of MT on international trade via a difference-in-difference (DiD) approach. The DiD approach compares intertemporal changes in exports from the U.S. to Spanish-speaking Latin American countries to intertemporal changes in exports from the U.S. to other countries. Use "data_DID.csv" to perform the DiD analysis. The outcome variable should be "log_revenue". Report the line of code that you used for running the regression. Interpret the estimated coefficients (except the fixed effects and the constant). [8 points]
- "buyer_country" is the ID of the buyer's country.
- "treatment" is the treatment status dummy: =1 for Spanish-speaking Latin American countries; =0 for other countries
- "norm_m" is the same as in part a).
- "post": =1 if "norm_m">=0; "post"=0 if "norm_m"<0
- "log_unit": logarithm of U.S. exports to a country, where exports is measured in units/quantity.
- "log_revenue": logarithm of U.S. exports to a country, where exports is measured in dollars.
Question b). One data scientist in the company argues that we should control for "log_unit" in the regression because changes in revenue should obviously be correlated with changes in the number of units sold. Do you agree with this view? Why? [10 points]
Question c). What is an important assumption in this DiD approach? Do you think the assumption is justified in this scenario? Why? (You should use both "data_graph.csv" and your reasoning to answer this question) [10 points]
- "norm_m" is the normalized month, so that "0" means the first month that MT was introduced. "-1" means the month before MT was introduced.
- "norm_revenue_treated" is the average monthly U.S. exports in dollar terms to Spanish-speaking Latin American countries.
- "norm_revenue_control" is the (normalized) average monthly U.S. exports in dollar terms to other countries.
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started