Question
For your final project, imagine you're a data analyst at an organization that helps other institutions with their data. Your supervisor has provided you with data and is asking you to take a preliminary look and give a brief report of your findings. Since you are new to the organization, they would also like to see your thought process and have asked for both the code used to run your analysis and screen captures of your work in addition to your informed conclusions.
Specifically, you must address the critical elements listed below. Most of the critical elements align with a particular course outcome (shown in brackets).
Data Assessment: Your first task is to review the Excel file and perform a preliminary data assessment. You will be sharing this work in the Appendices section only.
Excel Calculations: For two columns in each data set, calculate the minimum, maximum, and average.
Source Code Management: Include text box comments in the document to explain your work. Then, take a screenshot/screen capture of your work and include it in the Appendices section.
Data Validation and Discovery: In this section, you'll be validating the information you discovered in the previous section. You will be sharing this work in the Appendices section only.
Prepare Data: Utilize the command line interface in Linux to copy files from the ~/workspace/SNHU/DAT-500/finalproject folder to the ~/workspace/Analysis folder.
From the supplied files in the ~/workspace/SNHU/DAT-500/finalproject folder, copy only the dat500_final_project_GBR_data.csv and dat500_final_project_USA_data.csv data set files to the Analysis folder in preparation for analysis.
Modify Files: Utilize Linux commands to rename files.
From the supplied files, rename them by removing the dat500_final_project_ portion of the file name.
Import Data: From the RStudio integrated development environment (IDE), import the data files into your workspace using Rscript.
Summary: Perform the summary function to get the descriptive statistics from both files for comparison. Describe your findings and include your rationale. Show your work and be sure to include inline comments.
Variables: Create variables for the minimum, maximum, and average of both columns from the previous section. Show your work and be sure to include inline comments.
Source Code Management: Include inline comments (denoted by # symbol) to explain why you are using certain code. Then, take a screenshot/screen capture of your work and include it in the Appendices section.
Data Structures: Next, you will create additional data structures to more easily compare the data sets and come to an informed conclusion. You will be sharing this work in the Appendices section only.
Create Vectors: Create vectors for each data set including the minimum, maximum, and average using the six variables you created in part IIE.
Combine Data Into Single Matrix: Create a matrix that combines the data from both data sets for better comparison.
Data Frame: Create a data frame using the combined matrix and the difference between the two constructs.
Final Script and Output: Rewrite a clean final script including output from running the commands in the IDE.
Source Code Management: Include inline comments for all work completed (denoted by # symbol) to explain why you are using certain code. Name data structures in coordination with data set. Then, take a screenshot/screen capture of your work and include it in the Appendices section.
Data Analysis Report:
Findings: Briefly summarize (3 to 5 sentences) your findings from the calculations you ran in the above sections.
Informed Conclusion: Based on the calculations and commands you have run above, what were you able to discern from the information? Clearly state your informed conclusion.
Looking Ahead: If you had additional information regarding the data sets or a larger amount of data, how might you handle this information by leveraging the data structures you have learned about thus far? Consider how a data frame or matrix may flex to accommodate this additional data.
Reflection:
Command Line Interface: Discuss your experience with the command line interface and how you might use it in your current or future profession. How might the command line interface have value in the industry?
Data Analysis Process: Describe your experience in analyzing the data. What would you have done differently if given the chance? Explain why.
Source Code Management Strategies: Evaluate your methods for source code management as you were analyzing the data. Do your comments effectively communicate the intent of your code? Explain why.
Database Management Systems: Determine whether a database management system would have been appropriate given the data sets you worked with. Explain your rationale.
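
The Prepare Data and Modify Files steps above could be sketched in shell roughly as follows. This is a minimal sketch, not a required solution: the function name prepare_analysis_files is hypothetical, and it assumes both data set files exist in the source folder. It uses only standard mkdir, cp, mv, and basename.

```shell
#!/bin/sh
# Sketch: copy the two data set files into the Analysis folder, then strip
# the "dat500_final_project_" prefix from each copied file name.
# prepare_analysis_files is a hypothetical helper name, not part of the assignment.
prepare_analysis_files() {
    src_dir=$1    # e.g. ~/workspace/SNHU/DAT-500/finalproject
    dest_dir=$2   # e.g. ~/workspace/Analysis
    prefix="dat500_final_project_"

    # Create the destination folder if it does not exist yet.
    mkdir -p "$dest_dir" || return 1

    # Copy only the GBR and USA data set files; other supplied files stay behind.
    cp "$src_dir/${prefix}GBR_data.csv" \
       "$src_dir/${prefix}USA_data.csv" \
       "$dest_dir/" || return 1

    # Rename each copy by removing the prefix (GBR_data.csv, USA_data.csv).
    for f in "$dest_dir/$prefix"*.csv; do
        base=$(basename "$f")
        mv "$f" "$dest_dir/${base#"$prefix"}" || return 1
    done
}
```

With the folders named in the assignment, the call would be prepare_analysis_files ~/workspace/SNHU/DAT-500/finalproject ~/workspace/Analysis. The originals are copied rather than moved, so the supplied folder is left unchanged.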
