In this lab, we will be exploring ggplot. For all questions, please put appropriate labels and...
Question:
In this lab, we will be exploring ggplot. For all questions, please put appropriate labels and titles. It is good practice to get in the habit of this, since you will be presenting ideas through graphs in data science. Feel free to make your plots prettier and explore new commands. Try to get the basic idea of the graph down first, then add little details once you feel comfortable.

We will be working with `presidential_races.RData` for **Part 1 and 3**. The data set includes state information from several decades of presidential elections. Here are the names of the columns:

* year
* state
* state_po
* state_fips
* state_cens
* state_ic
* office
* candidate
* party_detailed
* writein
* candidatevotes
* totalvotes
* version
* notes
* party_simplified

Load in the presidential races data. Load {tidyverse}.

```{r}
load("presidential_races.RData")
library(tidyverse)
```

**NOTE**: All the categorical variables in the `presidential_races.RData` data need to be changed to a `factor`, because they are read in weirdly. Write code to fix this below:

```{r}
```

# Part 1 (Base R)

Using the `presidential_races.RData` data, create a ...

simple scatter plot

```{r}
```

simple barplot

```{r}
```

simple histogram

```{r}
```

simple lineplot

```{r}
```

simple boxplot

```{r}
```

# Part 2 {ggplot2}

A dataset containing the prices and other attributes of almost 54,000 diamonds. The variables are as follows:

* `price` = price in US dollars ($326-$18,823)
* `carat` = weight of the diamond (0.2-5.01)
* `cut` = quality of the cut (Fair, Good, Very Good, Premium, Ideal)
* `color` = diamond color, from J (worst) to D (best)
* `clarity` = a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
* `x` = length in mm (0-10.74)
* `y` = width in mm (0-58.9)
* `z` = depth in mm (0-31.8)
* `depth` = total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43-79)
* `table` = width of top of diamond relative to widest point (43-95)

The diamonds data frame is large, so let's take a random sample of 100 records.
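As a rough illustration of the two data-preparation steps described above (converting categorical columns to factors and drawing a 100-row sample of `diamonds`), here is a minimal sketch. It assumes the categorical columns arrive as character vectors; `races_demo` is a hypothetical stand-in with placeholder values, since the real object stored in `presidential_races.RData` is not shown, and the seed is an arbitrary choice made only for reproducibility.

```r
library(dplyr)
library(ggplot2)  # provides the built-in `diamonds` data frame

# Hypothetical stand-in for the loaded election data (placeholder values only)
races_demo <- tibble(
  year = c(1976, 1976),
  state = c("ALABAMA", "ALASKA"),
  party_simplified = c("DEMOCRAT", "REPUBLICAN")
)

# Convert every character column to a factor; numeric columns are untouched
races_demo <- races_demo %>%
  mutate(across(where(is.character), as.factor))
str(races_demo)

# Draw a reproducible random sample of 100 diamonds
set.seed(1)  # assumption: any fixed seed works
diamonds_sample <- slice_sample(diamonds, n = 100)
dim(diamonds_sample)
```

If the categorical columns were read in as some other type, the `where(is.character)` selector would need to be replaced with an explicit list of column names.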
```{r}
```

Produce a scatter plot in the minimalist theme that communicates the relationship between the `price` (y-axis) of diamonds and the `carat` (x-axis). Does the price also depend on the color of the diamond? Fill the scatter points with the `color` variable. Fit a smoother to the data and display the smooth and its standard error. Remember to put a title and labels for axes.

```{r}
```

Produce a bar chart that shows the frequency of each diamond `cut` type. Change the color of the bars to "coral" and set the limits of the y-axis from 0 to 40. Display the frequency on top of each bar.

```{r}
```

Produce a multiple box plot chart that displays the distribution of `price` across the different diamond `color`s. Fill each box with a different color corresponding to the diamond color. Display the outliers as red (`color`) crosses (`shape`) for each box-and-whisker. Adjust the transparency level to `0.5`.

```{r}
```

Produce a histogram that represents the distribution of diamond `carat` and fill the color of each bin corresponding to the diamond `clarity`. Adjust the `bin width` so that it is suitable for the context.

```{r}
```

Produce a density chart to represent the distribution of `carat` (limit 0 to 3) by the diamond `color`.

```{r}
```

# Part 3 (ggplot)

1. Preliminary Items

a. What are the dimensions of the data? What is the structure? Look at the first few rows.

```{r}
```

b. We need to do some cleaning. For this lab, we will only be concerned with the columns `year`, `state`, `state_po`, `candidate`, `candidatevotes`, `totalvotes`, and `party_simplified`. Create a new data frame called `presidents` with only these columns selected.

```{r}
```

c. For this lab, we will also only be considering the Democrat and Republican candidates. Filter the data such that our data only contains candidates who are `Democrat` and `Republican`. Store the results in `presidents`.

```{r}
```

d. There is one `NA` in this data set. Use `na.omit()` to remove that observation from the data set.

```{r}
```

2.
The goal of this question is to create plots for an analysis of presidential results in the state of Tennessee.

a. Filter the data set to only include the results of Tennessee from the past several decades. Call the data set `TN`.

```{r}
```

b. Using the `TN` data set, create a plot of the total voter turnout for each election year. Plot it with both lines and points. Also, find a way to have an x-axis tick mark for each election year (i.e., `1976`, `1980`, `1984`, ..., `2020`). Add `theme_minimal()` to obtain a plain background.

```{r}
```

c. Based on the previous plot, make a plot for both Republican and Democratic candidates showing the votes each candidate obtained for the year. Change the color of the lines: "red" for Republican and "blue" for Democrat.

```{r}
```

d. Explain why raw voter turnout can be misleading in these figures.

3. It may be more interesting to look at how the percentage of votes changes over time. Create a plot similar to the one from the previous problem. However, use a different state and plot the values based on percentage of votes rather than total votes.

a. Create a new variable in `presidents` called `TotalPer`, which is the percentage of votes for each candidate.

```{r}
```

b. Filter the data with a state of your interest.

```{r}
```

c. Create a plot of percentage of candidate votes for this new state over the past decades.

```{r}
```

4. Create a bar plot for the percentage of votes by each candidate in all 13 battleground states in one plot.

a. Filter the data for just the election of 2020 and the state `AZ`. Call this data set `AZ2020`.

```{r}
```

b. Create a bar plot with just the information from `AZ2020`.

```{r}
```

c. Now, filter the data set to the 13 battleground states for the 2020 election: `AZ`, `FL`, `GA`, `IA`, `MI`, `MN`, `NV`, `NH`, `NC`, `OH`, `PA`, `TX`, `WI`. Call this data set `bgpres2020`.

```{r}
```

d. Create a bar plot with the 13 battleground states and the candidate percentage comparison for each state.

```{r}
```
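The percentage and line-plot steps in Part 3 could be sketched roughly as follows. This is one possible approach under stated assumptions, not the expected solution: the `presidents` tibble here is a hypothetical stand-in whose vote counts are made-up placeholders (not real election results), and the party labels `"DEMOCRAT"` and `"REPUBLICAN"` are assumed spellings that may differ from the real data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for `presidents`; all counts are made-up placeholders
presidents <- tibble(
  year = rep(c(2012, 2016, 2020), each = 2),
  state_po = "TN",
  party_simplified = factor(rep(c("DEMOCRAT", "REPUBLICAN"), times = 3)),
  candidatevotes = c(40, 60, 35, 65, 45, 55),
  totalvotes = 100
)

# Percentage of the total vote won by each candidate
presidents <- presidents %>%
  mutate(TotalPer = 100 * candidatevotes / totalvotes)

# Keep a single state (assumed to be identified by its postal code)
TN <- filter(presidents, state_po == "TN")

# Lines plus points, one x-axis tick per election year, party colors set by hand
p <- ggplot(TN, aes(x = year, y = TotalPer, color = party_simplified)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = unique(TN$year)) +
  scale_color_manual(values = c(DEMOCRAT = "blue", REPUBLICAN = "red")) +
  labs(
    title = "Share of votes by party (placeholder data)",
    x = "Election year",
    y = "Percentage of total votes",
    color = "Party"
  ) +
  theme_minimal()
print(p)
```

For the battleground comparison, the same aesthetics could plausibly be reused with `state_po` on the x-axis and `geom_col(position = "dodge")` in place of the line and point layers.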
Expert Answer:
Part 1 (Base R): Load the necessary packages and data: `load("presidential_races.RData")`, `library(tidyverse)`. Fix the categorical variables to factor: presidentialraces mutat...