
Question

1. Use basic vector operations with jane.vector so you can convince yourself that this is in fact a vector of strings.
   a. Display lines 350-355 of jane.vector (if this is empty, try another set of 5 or so lines).
   b. How many lines of jane.vector are there?
   c. Display the last line of jane.vector.
   d. Are there any lines that have 30 characters? How do you know? Display any lines with 30 characters.
   e. What is the median number of characters in jane.vector?
   f. Where is the longest line(s) of jane.vector located?
   g. Display the longest line(s) of jane.vector.
   h. Create a boxplot of the number of characters per line in jane.vector.
   i. Are there any lines with no characters? If so, remove them, turning jane.vector into a cleaner version. Check that the new length of jane.vector makes sense to you.

2. Collapse all of the lines in jane.vector into one big string, separating each line by a space in doing so, using the paste() command. Call the resulting string jane.string. In the following problems, be sure you are using string commands such as substr, gsub/sub, strsplit, etc.
   a. How many characters does jane.string have?
   b. Display the 120th-170th characters in jane.string.
   c. Display the last 50 characters of jane.string.

3. Split up jane.string into words, using the strsplit() command, and save the result as jane.all. Be absolutely sure you understand the structure of the object that strsplit returns! Note: pay attention to how you should split the string in question; look at the syntax for strsplit for this!
   d. What type of object is jane.all? Is it easy to work with?
   e. How many components does jane.all have?

3. Save the first component of jane.all as a vector of strings called jane.words, and answer the following questions. (Note: this is a simple subsetting command. The rest of these questions deal again with vector subsetting.)
   a. How long is this vector, i.e., how many words are there?
   b. Display words 900 through 1050.
   c. Give the 5 number summary of characters in the words. What is the word with the largest number of characters?
   d. How many words have 8 characters? List a few of these words.

4. To really study this set of words, we first need to see just how flawed the data is. There are probably digits, empty characters, and punctuation marks that are saved as words. Use a grep command to see if there is any punctuation in jane.words. Then, show exactly what one of those lines contains.

5. The easiest way to deal with the punctuation problem is to use strsplit to split on punctuation as well as blanks. (So we are redoing the exercise above in part 3.) Execute the following line to do so: jane.all […] R to all lowercase, and do this to the current jane.words. Again, overwrite the current jane.words so that the result does not have any punctuation, uppercase words, digits, or blank spaces! It may not be perfect for word analysis, but it should be MUCH better!
   a. Display words 1000 through 1250 in jane.words.
   b. Display the last 30 words in jane.words.

6. Using the "unique" function, find all of the unique words in jane.words, and store them in jane.words.unique.
   a. How many unique words are there? Compare this to how many words there are in jane.words.

7. Get a boxplot of the number of characters in jane.words.unique. What do you notice? (It is easiest to use the basic boxplot command rather than ggplot in this case.)

8. Remember, the sort() function sorts a vector into increasing order. A close relative is the order() function, which returns the indices that put the vector into increasing order. Try the following code for an example.

   v <- c(10, 1, 1000, 100, 1)
   order(v)
   x
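
As a hedged illustration of the sort()/order() relationship described in item 8, the sketch below uses the example vector from the exercise. The names sorted_values and sort_indices are introduced here for illustration only and are not part of the assignment.

```r
# Example vector from the exercise text.
v <- c(10, 1, 1000, 100, 1)

# sort() returns the values themselves in increasing order.
sorted_values <- sort(v)
print(sorted_values)   # 1 1 10 100 1000

# order() returns the positions in v that would put it in increasing order.
# Ties (the two 1s) keep their original relative order.
sort_indices <- order(v)
print(sort_indices)    # 2 5 1 4 3

# Indexing v by order(v) reproduces sort(v).
print(identical(v[sort_indices], sorted_values))   # TRUE
```

The practical difference is that order() can reorder a second vector of the same length (for example, words ranked by their character counts), which sort() alone cannot do.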

