Question
R code:
## 4. __Scrape baseball-reference.com with rvest__
You will use the package rvest to scrape data from the website baseball-reference.com.
Begin at the teams page (http://www.baseball-reference.com/teams/).
For each active team (30), visit the team's page and download the "Franchise History" table. The node you will want to use is "#franchise_years". Combine all the tables into one. Note that some franchises have changed names and locations. To keep track of each team, add a column to the data frame called "current" that contains the current name of the team (e.g., in the `current` column, the row for the 1965 Milwaukee Braves will contain the value 'Atlanta Braves').
__Hint:__ When I ran my code, my table had 2624 rows and 22 columns.
__Hint:__ _I used the function `html_table()` to extract the table from each team's page._
```{r}
library(rvest)

# starting page
teampage <- read_html("http://www.baseball-reference.com/teams/")

# write your r code here
# create a table called baseball that contains all of the teams' franchise histories

# at the end, be sure to print out the dimensions of your baseball table.
dim(baseball)
head(baseball)
```
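A minimal sketch of one way to fill in the chunk above. It assumes rvest 1.0 or later (for `html_element()`, `html_elements()` and `html_text2()`), that the active franchises on the teams page sit in a table with the id `teams_active` whose team links look like `/teams/ARI/`, and that every team's `#franchise_years` table has the same column names. None of these page-markup details are confirmed by the assignment, and the helper names `add_current()`, `active_team_links()`, `franchise_history()` and `scrape_all()` are hypothetical. The network calls are left commented out.

```r
# Sketch only: the "#teams_active" selector and the "/teams/XXX/" href pattern
# are assumptions about baseball-reference.com's markup.

# Add a `current` column holding the franchise's present-day name to one table.
add_current <- function(tbl, current) {
  tbl <- as.data.frame(tbl)
  tbl$current <- rep(current, nrow(tbl))
  tbl
}

# Collect the current name and absolute URL of each active franchise.
active_team_links <- function(teampage,
                              base = "http://www.baseball-reference.com/teams/") {
  links <- rvest::html_elements(teampage, "#teams_active a")
  href <- rvest::html_attr(links, "href")
  name <- rvest::html_text2(links)
  # keep franchise-page links such as "/teams/ARI/", first occurrence only
  keep <- grepl("^/teams/[A-Z]{3}/$", href) & !duplicated(href)
  teams <- data.frame(
    current = name[keep],
    url = xml2::url_absolute(href[keep], base),
    stringsAsFactors = FALSE
  )
  if (nrow(teams) != 30) warning("expected 30 active teams, found ", nrow(teams))
  teams
}

# Download one team's "Franchise History" table and tag it with the current name.
franchise_history <- function(url, current) {
  page <- rvest::read_html(url)
  tbl <- rvest::html_table(rvest::html_element(page, "#franchise_years"))
  add_current(tbl, current)
}

# Visit every team page and stack the tables (assumes identical column names).
scrape_all <- function(teams, pause = 3) {
  tables <- vector("list", nrow(teams))
  for (i in seq_len(nrow(teams))) {
    tables[[i]] <- franchise_history(teams$url[i], teams$current[i])
    Sys.sleep(pause)  # wait between requests to avoid hammering the server
  }
  do.call(rbind, tables)
}

# Usage (needs network access, so not run here):
# teampage <- rvest::read_html("http://www.baseball-reference.com/teams/")
# teams <- active_team_links(teampage)
# baseball <- scrape_all(teams)
# dim(baseball)
```

Tagging each table inside `franchise_history()`, before the tables are stacked, keeps the current name attached to the rows it came from. That is what the 1965 Milwaukee Braves example requires.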
```{r baseball_cleanup, error = TRUE}
# you should not need to modify this code, but you will probably need to run it.
library(stringr)

# This code checks whether the text in the table uses a regular space character.
# Because the text from the web uses a non-breaking space, we expect a mismatch.
# I'm converting to raw because, when displayed on screen, we cannot see the difference
# between a regular breaking space and a non-breaking space.
all.equal(charToRaw(baseball$Tm[1]), charToRaw("Arizona Diamondbacks"))

# identify which columns are character columns
char_cols <- which(vapply(baseball, is.character, logical(1)))

# for each character column, convert to UTF-8,
# then replace the non-breaking space with a regular space
for (i in char_cols) {
  baseball[[i]] <- str_conv(baseball[[i]], "UTF-8")
  baseball[[i]] <- str_replace_all(baseball[[i]], "\\s", " ")
  # baseball[[i]] <- str_replace_all(baseball[[i]], "[:space:]", " ")
  # you might have to use this depending on your operating system and which meta characters it recognizes
}

# check to see if the conversion worked
## should now be TRUE
all.equal(charToRaw(baseball$Tm[1]), charToRaw("Arizona Diamondbacks"))
```
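The non-breaking space that the cleanup chunk works around can be seen with base R alone. The sketch below uses a hypothetical helper, `replace_nbsp()`. In place of stringr's `\\s` regex it substitutes a fixed-string `gsub()` that targets only U+00A0. It is an illustration of the same idea, not a replacement for the provided chunk.

```r
# U+00A0 NO-BREAK SPACE looks like " " on screen but has different bytes.
nbsp <- "\u00a0"
charToRaw(nbsp)  # [1] c2 a0  (its UTF-8 encoding)
charToRaw(" ")   # [1] 20

scraped <- paste0("Arizona", nbsp, "Diamondbacks")
identical(scraped, "Arizona Diamondbacks")  # FALSE

# Hypothetical helper: replace U+00A0 with a regular space in every character column.
replace_nbsp <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(col) gsub("\u00a0", " ", col, fixed = TRUE))
  df
}

# toy one-row table; the integer column is left untouched
toy <- data.frame(Tm = scraped, Year = 1998L, stringsAsFactors = FALSE)
identical(replace_nbsp(toy)$Tm, "Arizona Diamondbacks")  # TRUE
```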