Question
ALY6010 Module 1 Project
Instructor: Dr. Dee Chiluiza, PhD
Discrete probability and normal distributions
Overview and Rationale
This assignment is designed to provide hands-on experience in applying descriptive statistical methods to a data set. The data set is provided in an Excel workbook and contains a wide range of data types that you will need to work with.
Assignment Summary
Using the data provided in the attached Excel workbook, apply the methods of graphical and numerical descriptive statistics. Follow the instructions in the project document to analyze the data in the Excel workbook, then complete one report summarizing your data analyses. Important note on the report: for this project, your report will be an HTML file produced with R Markdown. Files to submit: for this project you must submit two files, the R Markdown file and the HTML report.
Tasks to complete before starting the project.
1. Install the latest versions of R and R Studio on your computer. (Read file: 01a R Install, create folder and project.ppt)
2. Create a folder on your computer named "ALY6010 R Project" with a subfolder named "DataSets".
3. From R Studio, create an R Project for this class using the "ALY6010 R Project" folder created above. (Read file: 01a R Install, create folder and project.ppt)
4. Learn how to import data sets into R using the strategy requested by the instructor. (Read file: 03 R Import data sets.ppt)
5. Learn how to use R Markdown. Only basic code will be needed to produce the HTML report. (Read file: R Markdown Introduction.ppt)
6. Save the file "M1data_carsales.xlsx" inside the DataSets folder.
7. Create an R Markdown file inside the ALY6010 R Project and name it Project1_myname.Rmd. (Read file: R Markdown Introduction.ppt)
8. Import the data set into R Studio using the strategy learned above, and present the code in an initial R chunk.
9. Do not present install.packages() code in the report. If you still need to install a new package, do it directly in the R Studio console.
Create an initial R chunk that loads the libraries and imports the data set. Use the following header on this R chunk: {r message=FALSE, warning=FALSE}. Some libraries to include in the libraries R chunk (if you do not have them, install the packages in the console):
library(readxl)
library(tidyverse)
library(dplyr)
library(DT)
library(RColorBrewer)
library(rio)
library(dbplyr)
library(psych)
library(FSA)
Report starts here
Title.
Create a title for the report with the report's name (Project 1 Report) and the name and CRN of the class.
Introduction. Create a title for the Introduction section. (A) Write a few sentences presenting general information about the car sales market, both global and in India. The following websites are examples you can read; find others if you prefer:
Wagner, I. February 5, 2021. Automotive industry worldwide - statistics & facts. Statista. Link: https://www.statista.com/topics/1487/automotive-industry/
Thakkar, K. January 11, 2021. Indian car market may post record 30% growth in 2021 on low base. Auto.com. Link: https://auto.economictimes.indiatimes.com/news/passenger-vehicle/cars/indian-car-market-may-post-record-30-growth-in-2021-on-low-base/80218106
Culver, M. December 17, 2020. Global Auto Sales Expected to Gain Momentum Next Year; 83.4 Million Light Vehicles to Be Sold In 2021, According to IHS Markit. Business Wire. Link: https://www.businesswire.com/news/home/20201217005798/en/Global-Auto-Sales-Expected-to-Gain-Momentum-Next-Year-83.4-Million-Light-Vehicles-to-Be-Sold-In-2021-According-to-IHS-Markit
(B) Write one paragraph describing and explaining the importance of discrete and continuous probability distributions.
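To make the discrete/continuous contrast concrete before writing the paragraph, a minimal base-R sketch may help. The binomial parameters (10 trials, success probability 0.5) and the standard normal are illustrative choices, not values from the assignment. A discrete distribution assigns a probability to each individual value, while a continuous one assigns probability to intervals, as the area under its density curve.

```r
# Discrete example (illustrative parameters): Binomial(n = 10, p = 0.5)
x <- 0:10
p_discrete <- dbinom(x, size = 10, prob = 0.5)  # P(X = x) for every possible count
p_five <- dbinom(5, size = 10, prob = 0.5)      # probability of exactly 5 successes
total_discrete <- sum(p_discrete)               # probabilities of all outcomes add up to 1

# Continuous example: standard normal Z ~ N(0, 1)
# Probabilities are areas under the density, so they come from pnorm(), not dnorm()
p_within_1sd <- pnorm(1) - pnorm(-1)            # P(-1 < Z < 1)
total_area <- integrate(dnorm, -Inf, Inf)$value # the whole density integrates to 1

print(round(c(p_five = p_five, total_discrete = total_discrete,
              p_within_1sd = p_within_1sd, total_area = total_area), 4))
```

Here P(X = 5) is about 0.246, while for the normal only intervals carry probability: P(-1 < Z < 1) is about 0.683.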
(C) Write one sentence describing the data set.
Analysis section.
Task 1
Create one R chunk. Start with the name of the data set, then, using the pipe %>%, apply dplyr::select() to select only the variables Efficiency, Power_bhp, Seats, Km, and Price. With a second pipe, apply psych::describe() with nothing inside the parentheses. Run the code. Two things should call your attention: the descriptive statistics are in the columns rather than in the rows, and there are too many decimals. Correct these issues. Using another pipe, add t() to transpose the values, then run the code and observe the result. Using another pipe, add round(2) to reduce the values to 2 decimals. Using another pipe, add knitr::kable() to improve the table presentation. Present the table in your report and write some observations about the code strategy.
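The select, describe, transpose, round pipeline can be sketched in base R. Assumptions: the built-in mtcars data stands in for M1data_carsales.xlsx (its columns mpg, hp, wt and qsec stand in for the assignment variables), the hypothetical helper describe_like() replaces psych::describe() with a smaller set of statistics, and the native pipe |> (R 4.1 or later) replaces %>% so the chunk runs without extra packages.

```r
# Hypothetical stand-in for psych::describe(): one row per variable,
# one column per statistic (the layout that prompts the t() step)
describe_like <- function(df) {
  do.call(rbind, lapply(df, function(x) c(
    n      = sum(!is.na(x)),
    mean   = mean(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE)
  )))
}

task1_table <- mtcars[, c("mpg", "hp", "wt", "qsec")] |>  # select the variables (stand-ins)
  describe_like() |>                                      # variables in rows, statistics in columns
  t() |>                                                  # transpose: statistics in rows
  round(2)                                                # keep two decimals

print(task1_table)
```

In the report, the real chain would end with knitr::kable() to render the result as an HTML table; t() is the step that moves the statistics from columns into rows, and round(2) removes the excess decimals.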
Task 2.
Prepare and present one bar plot showing the frequency of each category of the variable location, and one pie chart showing the percentage of each category of the variable fuel type. Important: use par(mfcol=c(1,2)) to arrange the two plots side by side in a 1x2 layout. Improve the graphs with clear x- and y-axis labels and colors. Write some observations about the figure.
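A minimal base-R sketch of the side-by-side layout, assuming the built-in mtcars data as a stand-in: mtcars$cyl plays the role of location and mtcars$gear the role of fuel type. hcl.colors() is used here instead of RColorBrewer only to avoid a package dependency.

```r
# Stand-ins (assumption): mtcars$cyl for location, mtcars$gear for fuel type
loc_freq <- table(mtcars$cyl)                               # frequency of each category
fuel_pct <- round(100 * prop.table(table(mtcars$gear)), 1)  # percentage of each category

op <- par(mfcol = c(1, 2))  # 1 row x 2 columns: the two plots side by side
barplot(loc_freq,
        col  = hcl.colors(length(loc_freq), "Set 2"),
        xlab = "Category (stand-in for location)",
        ylab = "Frequency",
        main = "Frequencies by category")
pie(fuel_pct,
    labels = paste0(names(fuel_pct), ": ", fuel_pct, "%"),  # show the percentage on each slice
    col    = hcl.colors(length(fuel_pct), "Set 2"),
    main   = "Percentages by category")
par(op)  # restore the previous layout
```

Saving the value returned by par() and restoring it afterwards keeps the 1x2 layout from leaking into the next chunk's plots.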
Task 3
Create one table with the variable Owner in the rows, presenting the corresponding frequencies, cumulative frequencies, percentages, and cumulative percentages. Notice that the table contains 5 columns, plus the labels for each row. If it contains decimals, always reduce them to 2 or 3. Follow these steps:
1. Create one table presenting the locations and their frequencies.
2. Convert the table using as.data.frame().
3. Rename the columns: Var1 to Location and Freq to Frequency.
4. Use mutate() to create three new calculated columns: the cumulative frequencies (column name: CumFrequency), the percentages (column name: Percentage), and the cumulative percentages (column name: CumPercentage).
Present the table using the knitr library, in this case knitr::kable(). Optional: practice the following code to present the table (you will need to install the package kableExtra):
knitr::kable(digits = 2, caption = "Task 3 Table") %>% kable_classic(full_width = FALSE, font_size = 12)
Check: https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html. Write some observations about the table.
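The four steps can be sketched in base R as follows. Assumptions: mtcars$cyl stands in for the location variable, and plain column assignments replace dplyr::mutate(). The dplyr form would be along the lines of mutate(CumFrequency = cumsum(Frequency), ...).

```r
# Step 1: frequency table (stand-in: mtcars$cyl instead of the location variable)
freq_df <- as.data.frame(table(mtcars$cyl))   # Step 2: columns Var1 and Freq
names(freq_df) <- c("Location", "Frequency")   # Step 3: rename Var1 and Freq

# Step 4: the three calculated columns (base-R equivalent of dplyr::mutate())
total <- sum(freq_df$Frequency)
freq_df$CumFrequency  <- cumsum(freq_df$Frequency)
freq_df$Percentage    <- round(100 * freq_df$Frequency / total, 2)
freq_df$CumPercentage <- round(100 * freq_df$CumFrequency / total, 2)  # from unrounded counts

print(freq_df)
```

Computing CumPercentage from the cumulative counts, rather than summing already-rounded percentages, guarantees the last row reaches exactly 100. In the report, knitr::kable(freq_df, digits = 2) would render the table.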
Task 4.
Using density() and then plot(), present one density curve for kilometers. Using abline(), add one vertical line for the mean, one vertical line for the value with standard score = 2.4, and one vertical line for the value with standard score = -3.1. Using mtext(), add the actual value of each line. Visit https://rpubs.com/Dee_Chiluiza/816756 to learn how to use these functions (see Histogram Example 1). Write some observations about the figure.
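The value with a given standard score follows from rearranging z = (x - mean) / sd into x = mean + z * sd. A minimal sketch, assuming mtcars$disp as a stand-in for the kilometers variable:

```r
km <- mtcars$disp        # stand-in (assumption) for the kilometers variable
m  <- mean(km)
s  <- sd(km)

# Rearranging z = (x - mean) / sd gives x = mean + z * sd
x_pos <- m + 2.4 * s     # value with standard score 2.4
x_neg <- m - 3.1 * s     # value with standard score -3.1
marks <- c(m, x_pos, x_neg)
cols  <- c("red", "blue", "darkgreen")

d <- density(km)         # kernel density estimate
plot(d, main = "Density of kilometers (stand-in data)",
     xlab = "Kilometers", ylab = "Density",
     xlim = range(c(d$x, marks)))    # widen the x-range so every line is visible
abline(v = marks, col = cols, lty = 2, lwd = 2)              # mean, z = 2.4, z = -3.1
mtext(round(marks, 1), side = 3, at = marks, col = cols, cex = 0.8)  # values above the plot
```

With this right-skewed stand-in data the z = -3.1 value is negative, which is impossible for a distance; it is worth checking whether the same happens with the real Km data and mentioning it in the observations.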
Task 5
Prepare one horizontal box plot and one histogram to display the distribution of the continuous numerical variable kilometers. Use the code par(mfrow=c(2,1), mai=c(1,1,1,1)) at the beginning of the R chunk. mfrow presents the two figures as a group, one on top of the other; here c(2,1) indicates 2 rows and 1 column. mai changes the figure margins, given in inches in the order bottom, left, top, right. Play with the mai numbers to observe the changes. Remove the graph titles by using main = NA. Always make observations after each task.
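A minimal sketch of the stacked layout, assuming mtcars$disp as a stand-in for kilometers. The mai values below are one starting point to tweak (a smaller top margin since main = NA removes the titles); running the same chunk on the price variable covers Task 6.

```r
km <- mtcars$disp   # stand-in (assumption) for the kilometers variable

# 2 rows x 1 column; mai = margins in inches (bottom, left, top, right)
op <- par(mfrow = c(2, 1), mai = c(0.8, 1, 0.2, 0.4))
boxplot(km, horizontal = TRUE, main = NA, col = "lightblue")  # horizontal box plot
h <- hist(km, main = NA, xlab = "Kilometers (stand-in)", ylab = "Frequency",
          col = "lightblue")                                  # histogram; h keeps the bin counts
par(op)             # restore the previous layout and margins
```

Note that mai needs c(); writing mai=(1,1,1,1) without it is a syntax error in R.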
Task 6.
Similar to task 5, present the box plot and histogram for variable price.
Task 7
Prepare and present one box plot displaying the price distribution per owner. This is not a single box plot for all of the variable's data: the graph must contain one box for each category of owner. Give the figure a good presentation format. Remember to always make observations after each task.
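The formula interface y ~ group draws one box per category. A minimal sketch, assuming mtcars as a stand-in: mpg plays the role of Price and gear the role of Owner. Task 8 follows the same pattern with kilometers ~ location (las = 2 can rotate long category labels).

```r
# Stand-ins (assumption): mpg plays the role of Price, gear the role of Owner
b <- boxplot(mpg ~ gear, data = mtcars,
             col  = hcl.colors(3, "Set 2"),   # one colour per category
             xlab = "Owner category (stand-in)",
             ylab = "Price (stand-in)",
             main = "Price distribution per owner category")
print(b$n)   # number of observations behind each box
```

Printing b$n is a quick check worth reporting: a box drawn from very few observations deserves less weight in the observations.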
Task 8
Similar to Task 7, prepare and present one box plot displaying the kilometers distribution per location.
Task 9
Apply boxplot.stats() to the variable kilometers and present the outcome. Explain the information obtained from this code.
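boxplot.stats() returns a list with four components: stats, n, conf and out. The middle three values of stats are the median and Tukey's hinges (the same as fivenum()), which can differ slightly from the quartiles reported by quantile(); the whiskers reach the most extreme data points within 1.5 times the hinge spread (the default coef = 1.5). A minimal sketch, assuming mtcars$disp as a stand-in for kilometers:

```r
km <- mtcars$disp   # stand-in (assumption) for the kilometers variable
bs <- boxplot.stats(km)

print(bs$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(bs$n)      # number of non-missing observations
print(bs$conf)   # notch limits: median +/- 1.58 * (hinge spread) / sqrt(n)
print(bs$out)    # points beyond the whiskers, i.e. candidate outliers
```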
Task 10
With the information obtained in Task 9, prepare and present one dotchart() displaying the quartile values for the variable kilometers. The code strategy for this is boxplot.stats()$stats.
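A minimal sketch, assuming mtcars$disp as a stand-in for kilometers. Naming the five values makes clear that the first and last are whiskers while the middle three are the hinges (quartile approximations) and the median; dotchart() uses the names as row labels.

```r
km <- mtcars$disp   # stand-in (assumption) for the kilometers variable
five <- boxplot.stats(km)$stats
names(five) <- c("Lower whisker", "Q1 (lower hinge)", "Median",
                 "Q3 (upper hinge)", "Upper whisker")

dotchart(five, pch = 19, col = "steelblue",
         xlab = "Kilometers (stand-in)", main = "Box-plot statistics")
text(five, seq_along(five), labels = round(five, 1), pos = 3, cex = 0.8)  # value above each dot
```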
Create one CONCLUSIONS title.
In the conclusions section, write a global summary of the results and of what has been learned from the whole work performed. Provide an overall observation of the project, explain the meaning of the results with respect to the direction of the project, and describe any new skills gained. Also, imagine you are preparing this report for a company or research institution; the report must therefore make meaningful contributions. Think about what recommendations you can provide.
Create one BIBLIOGRAPHY title.
In the reference section, list all information sources that have been used to support the work on this project.
References must be cited in the main body of the report. Technically speaking, if no references are mentioned in the main body of the report, you did not use any references, even if a list is added at the end. Cite references in the main body of the report at the place where they serve as an information source. Use either the first author's last name and the year, e.g., (Bluman, 2017), and list the references in the bibliography section in alphabetical order, or use a number in order of appearance and list the references in the bibliography section in that numerical order.
Appendix.
Since the report is presented as an HTML file but you will also submit the original Rmd file, create an appendix section and write one sentence: An R Markdown file has been attached to this report. The name of the file is....
R codes: If you are comfortable using R, feel free to add additional code using more complex libraries. Do not present install.packages() code in the Rmd file. Install all required packages using the console or the Packages tab in R Studio.
Grade: 100 Points
Dataset Link: https://docs.google.com/spreadsheets/d/1s3u2ScZuBgzgdt8g1p5rZBh4QbRHpJRU/edit?usp=sharing&ouid=116339528075866201272&rtpof=true&sd=true