Question

1 Approved Answer

Posted on Sep 22, 2024

Please use R to do Q1 It hasically and informatie embedded inside HTML elements, with long bediiden pepec It will try 10 Atat se pogo240-2013

Please use R to do Q1 image text in transcribed

It hasically and informatie embedded inside HTML elements, with long bediiden pepec It will try 10 Atat se pogo240-2013 10 10 2 * a od track (5) i class 1 * P1RE THE V4. pi//www.ua/tutti/p/ CO) ratu calott./ 101 ** HTML team write with a start the and with the continua .a), 2.... heading to put ong, - Page tended . Ondertus . Individual I **) as declarecie 121 431d-swerthodowy to cha The wage to extracting the locationstead of the that we can extract information found that location heading conting Index 13-13.Chinese-13 What we are two mring with dig/trailing white sem. The weal Information is embedded in the ITML . We cycle themelerepe sion from the previoun lab to remove the white HTML the HTML IK Gremove cach otherwys move all the ... The first way office will the way might be income. My viento try to the ... If the propondo while can be red with into place de W with double y singhe Question la plata) Wille Rode to do the Mine weite for this year's offering of the wall Inps move all of the HTML formatting and willing and the white and we dance dind the print out the Pride the de then in, polt the code brom the the wh BR//www.au/out.2021/pin/340/2100, provide the de Ar that the mode with the of the city that we that Questo 1e (poleta) Wille tilfe Maping the URL of and The last two wheel by the code, lot with two the name of the tree of the with all word Demostrat de URL provide the code det I Wth that want to HTML have two what we are to The les persone di ty Law of the wounded by the ITM. We can be in the prep) function * 314 316 Turva- Topo Index for the Parsing HTML The Winter imple Male we them, will I/www.in.ca/chart, wie bo of the top 10 song with their be the mandate proll We would like to ale with informatie. Of course the whold the tomato Read the wc toate profesia wewny de ve ding to extract the Inter wie dalam kepalaw that can be 1 The advantage to extracting the location instead of the value is that we can extract information around that location. heading - course_page[heading_index[1] -1): (heading_index [2] +1)] What we get are two strings with leading/trailing white spaces. The useful information is embedded in the HTML tags. We can recycle the regular expres- sion from the previous lab to remove the white spaces and HTML tags. For the HTML tags, we can remove each of them separately, or we can remove all the <...> tags at once. The first way is somewhat inefficient, while the second way might cause trouble in some cases. My advice here is to try to remove all the <...> tags first. If the unexpected results appears, do some fixes. Excess whitespace can be removed with trimws and using gsub to replace the regular expression w+ with '' (i.e., double spaces are replaced by single spaces). Question la, (2 points): Write R code to download the course out- line website for this year's offering of this course and extract all h3 head- ings remove all of the HTML formatting and any excess white space (leading and trailing white space and also repeated whitespace characters) from all h3 headings, and then print out those headings. Provide the R code. Ques- tion 1b, (2 points): Extract the course code from the text of the web- site https://www.sfu.ca/outlines.html?2021/spring/stat/240/2100, and provide the R code. Argue that the same code works on the pages for the out- lines of other courses (or, modify that your code so that it does). Question 1c, (6 points): Write an R function called course. This function should take as an argument a string specifying the URL of a course outline, and return a list. The list should have two elements, one with name course and value given by the course code, and one with name instructor and value given by the name of the instructor of the course (with all extreneous whitespace removed). Demonstrate that this code works on a few URLs and provide the code and the demonstration. It hasically and informatie embedded inside HTML elements, with long bediiden pepec It will try 10 Atat se pogo240-2013 10 10 2 * a od track (5) i class 1 * P1RE THE V4. pi//www.ua/tutti/p/ CO) ratu calott./ 101 ** HTML team write with a start the and with the continua .a), 2.... heading to put ong, - Page tended . Ondertus . Individual I **) as declarecie 121 431d-swerthodowy to cha The wage to extracting the locationstead of the that we can extract information found that location heading conting Index 13-13.Chinese-13 What we are two mring with dig/trailing white sem. The weal Information is embedded in the ITML . We cycle themelerepe sion from the previoun lab to remove the white HTML the HTML IK Gremove cach otherwys move all the ... The first way office will the way might be income. My viento try to the ... If the propondo while can be red with into place de W with double y singhe Question la plata) Wille Rode to do the Mine weite for this year's offering of the wall Inps move all of the HTML formatting and willing and the white and we dance dind the print out the Pride the de then in, polt the code brom the the wh BR//www.au/out.2021/pin/340/2100, provide the de Ar that the mode with the of the city that we that Questo 1e (poleta) Wille tilfe Maping the URL of and The last two wheel by the code, lot with two the name of the tree of the with all word Demostrat de URL provide the code det I Wth that want to HTML have two what we are to The les persone di ty Law of the wounded by the ITM. We can be in the prep) function * 314 316 Turva- Topo Index for the Parsing HTML The Winter imple Male we them, will I/www.in.ca/chart, wie bo of the top 10 song with their be the mandate proll We would like to ale with informatie. Of course the whold the tomato Read the wc toate profesia wewny de ve ding to extract the Inter wie dalam kepalaw that can be 1 The advantage to extracting the location instead of the value is that we can extract information around that location. heading - course_page[heading_index[1] -1): (heading_index [2] +1)] What we get are two strings with leading/trailing white spaces. The useful information is embedded in the HTML tags. We can recycle the regular expres- sion from the previous lab to remove the white spaces and HTML tags. For the HTML tags, we can remove each of them separately, or we can remove all the <...> tags at once. The first way is somewhat inefficient, while the second way might cause trouble in some cases. My advice here is to try to remove all the <...> tags first. If the unexpected results appears, do some fixes. Excess whitespace can be removed with trimws and using gsub to replace the regular expression w+ with '' (i.e., double spaces are replaced by single spaces). Question la, (2 points): Write R code to download the course out- line website for this year's offering of this course and extract all h3 head- ings remove all of the HTML formatting and any excess white space (leading and trailing white space and also repeated whitespace characters) from all h3 headings, and then print out those headings. Provide the R code. Ques- tion 1b, (2 points): Extract the course code from the text of the web- site https://www.sfu.ca/outlines.html?2021/spring/stat/240/2100, and provide the R code. Argue that the same code works on the pages for the out- lines of other courses (or, modify that your code so that it does). Question 1c, (6 points): Write an R function called course. This function should take as an argument a string specifying the URL of a course outline, and return a list. The list should have two elements, one with name course and value given by the course code, and one with name instructor and value given by the name of the instructor of the course (with all extreneous whitespace removed). Demonstrate that this code works on a few URLs and provide the code and the demonstration