
Question


In this assignment, you are given a dataset to perform an exploratory analysis to better understand the
shape, structure and quality of the data, investigate and resolve data issues, and develop preliminary
insights & analysis. Your final submission will take the form of a report consisting of obtained results and
captioned visualizations that convey key insights gained during your analysis.
Note: Do not include the questions or the dataset in your submission (to avoid similarity with other
submissions).
Business Problem
The data comes from a pharmaceutical study. During the study, the subjects were surveyed about their
sleep quality and other demographic factors in order to assess which sleep disorder, if any, they might be
prone to.
Dataset:
Feature                    Definition
Age                        Patient's age
Gender                     Patient's gender
Occupation                 Patient's occupation
Sleep duration             Sleep duration in hours
Quality of sleep           A ranked quality of sleep for each individual
Physical activity level    A ranked physical activity level
Stress level               Stress level on a scale of 1 to 10
BMI Category               Body mass index category
Blood pressure             Blood pressure (high/low)
Heart Rate                 Heart rate in BPM
Daily steps                Daily step count
y                          Whether the patient might be prone to a sleep disorder, and if so, which one
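If the blood pressure column stores both readings in a single "high/low" text value, splitting it into two numeric columns makes it usable in charts and summaries. The sketch below illustrates this in standard-library Python under that assumption; the "126/83"-style format and the function name `split_blood_pressure` are hypothetical and not confirmed by the dataset description.

```python
from typing import Optional, Tuple


def split_blood_pressure(raw: Optional[str]) -> Tuple[Optional[int], Optional[int]]:
    """Split an assumed 'high/low' reading such as '126/83' into (systolic, diastolic).

    Blank or malformed values return (None, None) so they can be handled later
    in the data quality step instead of raising an error here.
    """
    if raw is None:
        return None, None
    parts = [p.strip() for p in raw.strip().split("/")]
    # isdecimal() only accepts characters that int() can parse.
    if len(parts) != 2 or not all(p.isdecimal() for p in parts):
        return None, None
    return int(parts[0]), int(parts[1])


# Usage with made-up readings (not taken from the assignment's data):
for reading in ["126/83", " 140 / 90 ", "", "high"]:
    print(repr(reading), "->", split_blood_pressure(reading))
```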
Required Analysis
You are expected to perform an extensive exploratory analysis of the dataset, including the following
exploration phases:
- A data quality report.
- Identification of any data issues, a data quality plan, and mitigation of the identified issues.
- Several meaningful data insights (in the form of tables and graphs) that help to understand the
dataset. There is no limit on the number of insights you can provide, but a minimum of 4 is
expected. (Note: Creativity is important in this section.)
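A data quality report usually starts with simple per-column statistics. As a minimal sketch, assuming the data has been exported to CSV and using only the Python standard library, the hypothetical `quality_report` function below lists for each column the number of rows, the number and percentage of missing values, the number of distinct values, and the min/max when every present value is numeric. The set of tokens treated as missing is an assumption and should be adjusted to whatever the raw file actually contains.

```python
import csv
import io

# Placeholder spellings treated as missing (an assumption; adjust to the raw file).
MISSING_TOKENS = {"", "na", "n/a", "nan", "null"}


def is_missing(value):
    return value is None or value.strip().lower() in MISSING_TOKENS


def to_float(value):
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def quality_report(rows):
    """Summarise each column: row count, missing count/percentage, distinct values, numeric range."""
    columns = list(rows[0].keys()) if rows else []
    report = []
    for col in columns:
        values = [r.get(col) for r in rows]
        present = [v for v in values if not is_missing(v)]
        numbers = [to_float(v) for v in present]
        # A column counts as numeric only if every non-missing value parses as a number.
        numeric = bool(present) and all(n is not None for n in numbers)
        n_missing = len(values) - len(present)
        report.append({
            "column": col,
            "count": len(values),
            "missing": n_missing,
            "missing_pct": round(100 * n_missing / len(values), 1),
            "distinct": len({v.strip() for v in present}),
            "min": min(numbers) if numeric else None,
            "max": max(numbers) if numeric else None,
        })
    return report


# Usage with a tiny made-up sample (not the assignment's data):
sample = "\n".join([
    "Age,Gender,Sleep duration,y",
    "27,Male,6.1,Insomnia",
    "28,Female,,Sleep Apnea",
    ",Male,7.8,Insomnia",
])
rows = list(csv.DictReader(io.StringIO(sample)))
for entry in quality_report(rows):
    print(entry)
```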
Deliverables
1. Project Proposal
The main purpose of the proposal is for us to check whether the scope of the project is in the range of what
we're expecting, whether your plans are crisp enough, and, in cases where you plan to use a different dataset
than one from the list above, whether it looks suitable and promising. On average we expect proposals to be
about half a page long, though we know the lengths will vary. Please create a document containing the
following two parts.
1. Dataset
- Describe the data. As part of this, please include the total size of the dataset (e.g. number of rows)
and a small sample of the data.
- Include a link to the source of the data, and discuss any difficulties you anticipate in getting the data
ready for analysis.
2. Goals
- Formulate a specific set of questions you want to answer, points you want to make, or issues you
wish to explore through the data. Be as concrete as possible.
What To Turn In
Your proposal should be a PDF document named project1_proposal.pdf. Include clearly at the top of the
document the name(s) and SUID(s) of the student(s) submitting the proposal, then include the two parts of the
proposal specified above. Upload the PDF document along with the complete project.
2. Complete Project
Use data mining techniques and tools to manipulate, analyze, and possibly visualize the data in order to achieve
your objectives. Here are a few tips and techniques:
1. How to implement data mining in Excel
2. How to treat missing values
3. How to import data from a website to Excel
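One common way to treat missing values is simple imputation: fill numeric gaps with the column median and categorical gaps with the most frequent value (mode), while recording which cells were filled. The sketch below illustrates that approach in standard-library Python; the function name `impute` and the `<column>_was_missing` flag columns are hypothetical, and dropping incomplete rows is an equally valid choice when only a few rows are affected.

```python
from collections import Counter
from statistics import median

# Placeholder spellings treated as missing (an assumption; adjust to the raw file).
MISSING_TOKENS = {"", "na", "n/a", "nan", "null"}


def is_missing(value):
    return value is None or str(value).strip().lower() in MISSING_TOKENS


def impute(rows, numeric_cols, categorical_cols):
    """Fill gaps with the column median (numeric) or mode (categorical).

    Returns new rows; the input is not modified. Each imputed column also gets a
    boolean '<column>_was_missing' flag so imputed values remain traceable.
    """
    fills = {}
    for col in numeric_cols:
        observed = [float(r[col]) for r in rows if not is_missing(r.get(col))]
        fills[col] = median(observed) if observed else None
    for col in categorical_cols:
        observed = [r[col].strip() for r in rows if not is_missing(r.get(col))]
        fills[col] = Counter(observed).most_common(1)[0][0] if observed else None

    cleaned = []
    for r in rows:
        new = dict(r)
        for col in [*numeric_cols, *categorical_cols]:
            missing = is_missing(r.get(col))
            new[col + "_was_missing"] = missing
            if missing:
                new[col] = fills[col]
            elif col in numeric_cols:
                new[col] = float(r[col])  # store observed numbers as floats too
        cleaned.append(new)
    return cleaned


# Usage with made-up rows:
rows = [
    {"Age": "27", "Gender": "Male"},
    {"Age": "", "Gender": "Female"},
    {"Age": "35", "Gender": ""},
    {"Age": "30", "Gender": "Male"},
]
for row in impute(rows, ["Age"], ["Gender"]):
    print(row)
```

The median is used here rather than the mean because it is less sensitive to outliers in small survey samples.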
It is likely you will end up developing a data processing pipeline, where in each step you transform or otherwise
manipulate some or all of your data to get it into a form that's suitable for the next step. In the final step your
data should be in the best form to answer your questions or otherwise achieve your objectives.
In many cases the early steps in a pipeline are more about preparing the data -- correcting mistakes, filling in
missing values, creating consistent representations, mapping corresponding values -- while the later steps are
more focused on summarization and analysis. If you use one of the recommended datasets, your preparation
steps may be minimal.
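The pipeline idea can be sketched as a list of functions applied in order, with preparation steps first and a summarization step last. In this minimal Python sketch, the steps `strip_whitespace` and `unify_bmi_labels`, the label mapping and the column names are hypothetical examples of creating consistent representations and mapping corresponding values; `mean_by_group` then produces a small summary table of the kind that could support an insight.

```python
from collections import defaultdict
from functools import reduce
from statistics import mean

# Hypothetical label variants mapped to one canonical spelling.
BMI_LABELS = {
    "normal": "Normal",
    "normal weight": "Normal",
    "overweight": "Overweight",
    "obese": "Obese",
}


def strip_whitespace(rows):
    """Preparation step: trim stray spaces from every text value."""
    return [{k: v.strip() if isinstance(v, str) else v for k, v in r.items()} for r in rows]


def unify_bmi_labels(rows):
    """Preparation step: map label variants to one consistent representation."""
    return [
        {**r, "BMI Category": BMI_LABELS.get(r["BMI Category"].lower(), r["BMI Category"])}
        for r in rows
    ]


def run_pipeline(rows, steps):
    """Apply each step to the output of the previous one."""
    return reduce(lambda data, step: step(data), steps, rows)


def mean_by_group(rows, group_col, value_col):
    """Analysis step: average of value_col for each category of group_col."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_col]].append(float(r[value_col]))
    return {g: round(mean(vals), 2) for g, vals in sorted(groups.items())}


# Usage with made-up rows:
raw = [
    {"BMI Category": " Normal Weight", "Sleep duration": "7.5"},
    {"BMI Category": "Normal", "Sleep duration": "8.0"},
    {"BMI Category": "Overweight ", "Sleep duration": "6.0"},
]
clean = run_pipeline(raw, [strip_whitespace, unify_bmi_labels])
print(mean_by_group(clean, "BMI Category", "Sleep duration"))
```

Keeping each step as a separate function makes it easy to describe the pipeline step by step in the Data processing section of the writeup.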
If you derive features using flagging, aggregation, ratio, or mapping techniques, state which features you
derived and the type of each feature.
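As an illustration of documenting derived features together with their types, the sketch below adds one feature per technique named above. All feature names, the 7-hour short-sleep threshold, the choice of ratio and the BMI ordinal codes are assumptions made for this example, and the input rows are assumed to be cleaned already (numeric columns stored as numbers).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ordinal codes; extend to whatever BMI labels remain after cleaning.
BMI_ORDER = {"Normal": 0, "Overweight": 1, "Obese": 2}
SHORT_SLEEP_HOURS = 7  # assumed threshold, not taken from the assignment


def derive_features(rows):
    """Return rows with four illustrative derived features added (inputs assumed already numeric)."""
    stress_by_job = defaultdict(list)
    for r in rows:
        stress_by_job[r["Occupation"]].append(r["Stress level"])
    job_mean_stress = {job: mean(levels) for job, levels in stress_by_job.items()}

    derived = []
    for r in rows:
        quality = r["Quality of sleep"]
        derived.append({
            **r,
            # Flagging -> binary feature: sleeps less than the assumed threshold.
            "short_sleep": r["Sleep duration"] < SHORT_SLEEP_HOURS,
            # Ratio -> continuous feature (None if quality is 0, to avoid division by zero).
            "stress_per_quality": r["Stress level"] / quality if quality else None,
            # Mapping -> ordinal feature (None for unmapped labels).
            "bmi_code": BMI_ORDER.get(r["BMI Category"]),
            # Aggregation -> continuous feature: mean stress level within the person's occupation.
            "occupation_mean_stress": job_mean_stress[r["Occupation"]],
        })
    return derived


# Usage with made-up rows:
rows = [
    {"Occupation": "Nurse", "Sleep duration": 6.5, "Stress level": 8,
     "Quality of sleep": 4, "BMI Category": "Overweight"},
    {"Occupation": "Nurse", "Sleep duration": 7.2, "Stress level": 6,
     "Quality of sleep": 6, "BMI Category": "Normal"},
    {"Occupation": "Engineer", "Sleep duration": 8.0, "Stress level": 3,
     "Quality of sleep": 9, "BMI Category": "Normal"},
]
for row in derive_features(rows):
    print(row)
```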
What To Turn In
You will be turning in a single PDF writeup to Gradescope.
The writeup should include parts 1 and 2 from the project proposal, discuss in reasonable detail how you went
about your analysis, and finally (and most importantly) discuss the conclusions drawn from your data-driven
study. On average we expect the writeups to be about 3-5 pages long, though we know the lengths will vary.
Data visualizations can be pasted into the writeup. At the end of your writeup, include a section titled
Description of Files Used that lists all the artifacts that you used to generate the analysis and visualizations,
with a clear description of what each one contains. For example:
data_visualization.tbx - This Tableau file performs the main data analyses, using queries
OR
data_cleaning.xlsx - This spreadsheet performs additional data manipulations and contains the final
visualizations
Here is a guideline for the sections in the main writeup:
1. Include clearly at the top of the document the name(s) and Student-ID(s) for the
student or student-pair submitting the project.
2. Dataset: as in project proposal
3. Goals: as in project proposal
4. Data processing: Description of steps that were taken from raw data to final results
5. Visualizations: you need to share your visualizations in either Power BI or Tableau.
Please share your Tableau dashboards on Tableau Public (you first need to create a
profile here, and then follow this link to learn how to publish the dashboard).
For Power BI: either share them in a public workspace or, if you can't, share them
in a OneDrive folder and share the link.
6. Conclusions: resolution of questions, issues, or points from part 2, based on your study
7. Description of Files Used
Upload the pdf document under the Assignment 1 link.
Data Analysis and Visualization Tools:
Feel free to use any of the following tools for data analysis and visualization:
- Tableau
- Power BI
1.0 Grading Rubric
Key Points: Grade Allocation (%)
- Format (font type, size, table, formulas), overall content, including references if required (APA Style): 20
- Results, analysis and assumptions: 60
- Novelty and creativity in solution: 20
N.B. Failure to comply with the above would result in low grades.
