Question:

Part of your job as a data analyst will be to find information on the web, clean it, and present the data in a meaningful format other people can understand. In this assignment, you are going to do this with a tourism dataset from the World Bank. You will download the data, clean it, and then write a function that lets a user input a variable number of countries and years and outputs a graph of the countries vs. years. This assignment directly maps to the following learning outcomes:

 

  1. Utilize the R programming language to write functions and loops, examine and explore data, and utilize libraries for added functionality for data analysis such as dplyr, ggplot2, lubridate, and tidyr.
  2. Demonstrate how to turn unstructured data (messy data) into structured data (tidy data).
  3. Demonstrate how to search for online databases, find open data sources on the internet, and utilize the data.
  4. Retrieve data from the web, clean it, and present the data to a user in a readable, often visual, format which utilizes tools and techniques learned throughout the course.


 

 

Directions

  1. Download the dataset from:
    http://data.worldbank.org/indicator/ST.INT.RCPT.CD
  2. Unzip the file and load it into RStudio.
    1. You can use read.csv(), or read.xlsx() from library(xlsx).
  3. Clean and tidy the data.
    1. Note: you need to convert the data from wide format to long format.
  4. Plot a graph of three countries' tourism dollars vs. time.
    1. You need to use ggplot() for this part of the problem. If you use another plotting function, e.g. plot() or qplot(), you will receive only 50% credit for this part of the assignment.
    2. Convert your y-axis to a log axis.
  5. Make a function by wrapping your code in a function.
    1. Your arguments should be three countries.
    2. Extra Credit part 1 - Use the "..." argument to pass multiple countries and multiple years to the function. This will allow the user to plot as many countries, and for whatever years, they want.
    3. Extra Credit part 2 - Create an argument that allows you to select a sequential range of years, for example 1997:2005.
  6. Save the code as a .R file or a .Rmd file and upload the file to Moodle.
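
For steps 2 and 3, a minimal sketch of the wide-to-long conversion might look like the following. It assumes the World Bank CSV has a "Country Name" column followed by one column per year, and that a few metadata rows precede the header (hence the `skip = 4` in the commented-out read.csv() call); check both against the file you actually download. The file name, the helper name `tidy_tourism`, and the toy values are hypothetical and only there so the example runs on its own.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the World Bank file; the numbers are made up for illustration.
# With the real download, something like this is assumed to work:
#   wide <- read.csv("tourism.csv", skip = 4, check.names = FALSE)
wide <- data.frame(
  `Country Name` = c("China", "Ghana"),
  `Country Code` = c("CHN", "GHA"),
  `1995` = c(100, 10),
  `1996` = c(NA, 12),
  check.names = FALSE
)

# Hypothetical helper: reshape one-column-per-year data into tidy (long) form.
tidy_tourism <- function(wide) {
  wide %>%
    # keep the country name plus every column whose name is a four-digit year
    select(Country = `Country Name`, matches("^[0-9]{4}$")) %>%
    # wide to long: one row per country-year pair
    pivot_longer(-Country, names_to = "Year", values_to = "Dollars") %>%
    mutate(Year = as.integer(Year)) %>%
    # drop country-year pairs with no reported value
    filter(!is.na(Dollars))
}

long <- tidy_tourism(wide)
print(long)
```

The result has three columns (Country, Year, Dollars), which is the shape ggplot() expects when each country is drawn as its own line.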

 

Note: Your R function call and plot should look like the example below.

 

 

 

tourism_plot("China", "Ghana", "United States")

[Figure: line plot of Dollars (log-scale y-axis, 1e+08 to 1e+11) against Years (1995 to 2017), with a Countries legend: China, Ghana, United States.]
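
One way to get a call like the one above is sketched here. It is not the assignment's reference solution: it assumes a tidy data frame with columns Country, Year, and Dollars (here a toy `long` with made-up values), takes the country names through `...`, and uses a separate, hypothetical `years` argument for the year range rather than passing years through `...` as well.

```r
library(ggplot2)

# Toy tidy data so the sketch runs on its own; the values are made up.
long <- data.frame(
  Country = rep(c("China", "Ghana", "United States"), each = 3),
  Year    = rep(1995:1997, times = 3),
  Dollars = c(1e10, 2e10, 3e10, 1e8, 2e8, 3e8, 1e11, 2e11, 3e11)
)

# Sketch: any number of countries via `...`, optional year range via `years`.
tourism_plot <- function(..., years = NULL, data = long) {
  countries <- c(...)                          # collect the country names
  d <- data[data$Country %in% countries, ]     # keep the requested countries
  if (!is.null(years)) {
    d <- d[d$Year %in% years, ]                # e.g. years = 1997:2005
  }
  ggplot(d, aes(x = factor(Year), y = Dollars,
                colour = Country, group = Country)) +
    geom_line() +
    geom_point() +
    scale_y_log10() +                          # log-scale y-axis
    labs(x = "Years", y = "Dollars", colour = "Countries")
}

p <- tourism_plot("China", "Ghana", years = 1995:1996)
print(p)
```

Keeping `years` as a named argument avoids having to guess which values in `...` are countries and which are years; a call such as tourism_plot("China", "Ghana", "United States", years = 1997:2005) then covers both extra-credit parts.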
