
Question


Instructions for Students

1. Download the attached ipynb file "IDS_Assignment_Template.ipynb" (pasted below) and read through it to understand the expectations of the assignment. You are required to fill in your code in the respective "areas" of the same ipynb file and to execute the code in that file. In some places you are also asked to justify your choice of a particular method or to write your observations about a specific result. Everything should be written inline in the same file, in the required "areas".
2. After completing the assignment, submit 2 files: the original ipynb file and its pdf version. Both files must contain the required outputs (i.e., the notebook should be executed, with outputs inline in the ipynb/pdf file). If outputs are missing from the required "areas", marks will be deducted.
3. It is the responsibility of the assignment group to ensure that the uploaded pdf/ipynb file is correct, that all results are visible, that the formatting is correct, and that no incomplete or older version of the file is uploaded. Pay extra attention to the formatting of the converted pdf file.
4. Students are free to use any kind of data.
5. Late submissions will carry negative marks (2 marks will be deducted).
6. Plagiarism checks will be performed; if plagiarism is found, both groups will be awarded zero marks.

--------------------------------------------------------------- IDS_Assignment_Template.ipynb ------

{ "cells": [ { "cell_type": "markdown", "id": "9bda93ba", "metadata": { "id": "9bda93ba" }, "source": [ "#

**Introduction to Data Science (S1-22_DSECLZG532)-ASSIGNMENT**

", " ", "## Group No ", " ", "## Group Member Names: ", "1. ", "2. ", "3. ", "4." ] }, { "cell_type": "markdown", "id": "f5d80c60", "metadata": { "id": "f5d80c60" }, "source": [ "# 1. Business Understanding ", " ", "Students are expected to identify a data analytics task of your choice. You have to detail the Business Understanding part of your problem under this heading which basically addresses the following questions. ", " ", " 1. What is the business problem that you are trying to solve? ", " 2. What data do you need to answer the above problem? ", " 3. What are the different sources of data? ", " 4. What kind of analytics task are you performing? ", " ", "Score: 1 Mark in total (0.25 mark each)" ] }, { "cell_type": "raw", "id": "3298e121", "metadata": { "id": "3298e121" }, "source": [ "--------------Type the answers below this line-------------- " ] }, { "cell_type": "markdown", "id": "3cc8e0cb", "metadata": { "id": "3cc8e0cb" }, "source": [ "# 2. Data Acquisition ", " ", "For the problem identified , find an appropriate data set (Your data set must ", "be unique) from any public data source. ", " ", "--- ", " ", " ", " ", "## 2.1 Download the data directly ", " " ] }, { "cell_type": "code", "execution_count": null, "id": "4b51d895", "metadata": { "id": "4b51d895" }, "outputs": [], "source": [ "##---------Type the code below this line------------------##" ] }, { "cell_type": "markdown", "id": "49530d0c", "metadata": { "id": "49530d0c" }, "source": [ "## 2.2 Code for converting the above downloaded data into a dataframe" ] }, { "cell_type": "code", "execution_count": null, "id": "c1f4c171", "metadata": { "id": "c1f4c171" }, "outputs": [], "source": [ "##---------Type the code below this line------------------##" ] }, { "cell_type": "markdown", "id": "7b1fea4d", "metadata": { "id": "7b1fea4d" }, "source": [ "## 2.3 Confirm the data has been downloaded correctly by displaying the first 5 and last 5 records." 
] }, { "cell_type": "code", "execution_count": null, "id": "624e6c58", "metadata": { "id": "624e6c58" }, "outputs": [], "source": [ "##---------Type the code below this line------------------##" ] }, { "cell_type": "markdown", "id": "bb84fc56", "metadata": { "id": "bb84fc56" }, "source": [ "## 2.4 Display the column headings, statistical information, description and statistical summary of the data." ] }, { "cell_type": "code", "execution_count": null, "id": "086ad28e", "metadata": { "id": "086ad28e" }, "outputs": [], "source": [ "##---------Type the code below this line------------------##" ] }, { "cell_type": "markdown", "id": "812edb18", "metadata": { "id": "812edb18" }, "source": [ "## 2.5 Write your observations from the above. ", "1. Size of the dataset ", "2. What type of data attributes are there? ", "3. Is there any null data that has to be cleaned? ", " ", "Score: 2 Marks in total (0.25 marks for 2.1, 0.25 marks for 2.2, 0.5 marks for 2.3, 0.25 marks for 2.4, 0.75 marks for 2.5)" ] }, { "cell_type": "raw", "id": "60d80d2f", "metadata": { "id": "60d80d2f" }, "source": [ "--------------Type the answers below this line--------------" ] }, { "cell_type": "markdown", "id": "102e0e36", "metadata": { "id": "102e0e36" }, "source": [ "# 3. 
Data Preparation" ] }, { "cell_type": "raw", "id": "637cb469", "metadata": { "id": "637cb469" }, "source": [ "If input data is numerical or categorical, do 3.1, 3.2 and 3.4 ", "If input data is text, do 3.3 and 3.4" ] }, { "cell_type": "markdown", "id": "2bcb953b", "metadata": { "id": "2bcb953b" }, "source": [ "## 3.1 Check for ", " ", "* duplicate data ", "* missing data ", "* data inconsistencies " ] }, { "cell_type": "code", "execution_count": null, "id": "5a5b960e", "metadata": { "id": "5a5b960e" }, "outputs": [], "source": [ "##---------Type the code below this line------------------##" ] }, { "cell_type": "markdown", "id": "06fdebf8", "metadata": { "id": "06fdebf8" }, "source": [ "## 3.2 Apply techniques ", "* to remove duplicate data ", "* to impute or remove missing data ", "* to remove data inconsistencies " ] }, { "cell_type": "code", "execution_count": null, "id": "dd3118eb", "metadata": { "id": "dd3118eb" }, "outputs": [], "source": [ "##---------Type the code below this line------------------##" ] }, { "cell_type": "markdown", "id": "2139fedf", "metadata": { "id": "2139fedf" }, "source": [ "## 3.3 Encode categorical data" ] }, { "cell_type": "code", "execution_count": null, "id": "da6886fc", "metadata": { "id": "da6886fc" }, "outputs": [], "source": [ "##---------Type the code below this line------------------##" ] }, { "cell_type": "markdown", "id": "ae5a2917", "metadata": { "id": "ae5a2917" }, "source": [ "## 3.4 Text data ", " ", "1. Remove special characters ", "2. Change the case (up-casing and down-casing). ", "3. Tokenization process of discretizing words within a document. ", "4. Filter Stop Words." 
] }, { "cell_type": "code", "execution_count": null, "id": "c86d5b0b", "metadata": { "id": "c86d5b0b" }, "outputs": [], "source": [ "##---------Type the code below this line------------------##" ] }, { "cell_type": "code", "execution_count": null, "id": "a3b2cdee", "metadata": { "id": "a3b2cdee" }, "outputs": [], "source": [ "##---------Type the code below this line------------------##" ] }, { "cell_type": "markdown", "id": "e3cec4fc", "metadata": { "id": "e3cec4fc" }, "source": [ "## 3.4 Report ", " ", "Mention and justify the method adopted ", "* to remove duplicate data, if present ", "* to impute or remove missing data, if present ", "* to remove data inconsistencies, if present ", " ", "OR for text data ", "* How many tokens after step 3? ", "* How many tokens after stop words filtering? ", " ", "If any of the above are not present, state that in the report below as well. ", " ", "Score: 2 Marks (based on the dataset you have, the data preparation you had to do and report typed, marks will be distributed between 3.1, 3.2, 3.3 and 3.4)" ] }, { "cell_type": "code", "execution_count": null, "id": "3ab84ce6", "metadata": { "id": "3ab84ce6" }, "outputs": [], "source": [ "##---------Type the code below this line------------------##" ] }, { "cell_type": "code", "execution_count": null, "id": "f0fea7e8", "metadata": { "id": "f0fea7e8" }, "outputs": [], "source": [ "##---------Type the code below this line------------------##" ] }, { "cell_type": "markdown", "id": "793cd04b", "metadata": { "id": "793cd04b" }, "source": [ "## 3.5 Identify the target variables. ", " ", "* Separate the data from the target such that the dataset is in the form of (X,y) or (Features, Label) ", " ", "* Discretize / Encode the target variable or perform one-hot encoding on the target or any other as and if required. 
", " ", "* Report the observations ", " ", "Score: 1 Mark" ] }, { "cell_type": "code", "execution_count": null, "id": "c9089b57", "metadata": { "id": "c9089b57" }, "outputs": [], "source": [ "##---------Type the code below this line------------------##" ] }, { "cell_type": "markdown", "id": "3ae0b5d2", "metadata": { "id": "3ae0b5d2" }, "source": [ "# 4. Data Exploration using various plots ", " " ] }, { "cell_type": "markdown", "id": "186bf4d7", "metadata": { "id": "186bf4d7" }, "source": [ "## 4.1 Scatter plot of each quantitative attribute with the target. ", " ", "Score: 1 Mark" ] }, { "cell_type": "code", "execution_count": null, "id": "868d7b27", "metadata": { "id": "868d7b27" }, "outputs": [], "source": [ "##---------Type the code below this line------------------##" ] }, { "cell_type": "markdown", "id": "575f9e37", "metadata": { "id": "575f9e37" }, "source": [ "## 4.2 EDA using visuals ", "* Use (minimum) 2 plots (pair plot, heat map, correlation plot, regression plot...) to identify the optimal set of attributes that can be used for classification. ", "* Name them, explain why you think they can be helpful in the task and perform the plot as well. Unless proper justification for the choice of plots given, no credit will be awarded. ", " ", "Score: 2 Marks" ] }, { "cell_type": "code", "execution_count": null, "id": "4d614311", "metadata": { "id": "4d614311" }, "outputs": [], "source": [ "##---------Type the code below this line------------------##" ] },

Step by Step Solution

There are 3 Steps involved in it

Step: 1
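
One possible way to approach sections 2.2 to 2.4, 3.1 to 3.3 and 3.5 of the template for numerical or categorical data is sketched below. It is not the official solution. The inline CSV, its column names (`age`, `city`, `income`, `purchased`) and the choice of median imputation are assumptions made purely for illustration; a real submission would load its own unique dataset from a public source and justify its own cleaning choices.

```python
import io

import pandas as pd

# Hypothetical stand-in for a downloaded file (assumption, not real data).
RAW_CSV = """age,city,income,purchased
25,Pune,30000,no
32,Delhi,54000,yes
32,Delhi,54000,yes
47,pune,,yes
51,Mumbai,91000,no
"""

# 2.2 Convert the raw data into a dataframe.
df = pd.read_csv(io.StringIO(RAW_CSV))

# 2.3 / 2.4 First and last records, column headings, summary statistics.
print(df.head(5))
print(df.tail(5))
print(df.columns.tolist())
print(df.describe(include="all"))

# 3.1 Check for duplicates, missing values, and inconsistent labels.
n_duplicates = int(df.duplicated().sum())
n_missing = df.isna().sum()
print(n_duplicates, n_missing.to_dict(), df["city"].unique())

# 3.2 Remove duplicates, normalise label case ("pune" vs "Pune"),
#     and impute the missing income with the column median.
clean = df.drop_duplicates().copy()
clean["city"] = clean["city"].str.title()
clean["income"] = clean["income"].fillna(clean["income"].median())

# 3.3 / 3.5 Separate features from the target, one-hot encode the
#     categorical feature, and map the target labels to integers.
X = pd.get_dummies(clean.drop(columns="purchased"), columns=["city"])
y = clean["purchased"].map({"no": 0, "yes": 1})
print(X.shape, y.tolist())
```

Median imputation is used here only because it is robust to outliers in a skewed column such as income; dropping the row or using the mean may suit other datasets better, and the report section should state which was chosen and why.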


Step: 2
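
For text input, the four steps listed under "3.4 Text data" (remove special characters, change case, tokenize, filter stop words) could be sketched as follows, using only the Python standard library. The sample sentence, the `preprocess` helper and the tiny `STOP_WORDS` set are hypothetical and exist only for illustration; a real submission would more likely rely on a fuller stop-word list from a library.

```python
import re

# Hypothetical, deliberately small stop-word list (assumption for this sketch).
STOP_WORDS = {"the", "is", "a", "of", "and", "to", "in"}


def preprocess(text):
    """Return (tokens, filtered_tokens) for one document."""
    # 1. Remove special characters, keeping letters, digits and whitespace.
    no_special = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    # 2. Change the case (down-casing).
    lowered = no_special.lower()
    # 3. Tokenize on whitespace.
    tokens = lowered.split()
    # 4. Filter stop words.
    filtered = [token for token in tokens if token not in STOP_WORDS]
    return tokens, filtered


tokens, filtered = preprocess(
    "The price of the phone is great, and delivery was FAST!"
)
# The two counts asked for in the report section.
print(len(tokens), len(filtered))
```

The two printed counts correspond to the report questions on the number of tokens after tokenization and after stop-word filtering.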


Step: 3
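
Sections 4.1 and 4.2 ask for scatter plots of each quantitative attribute against the target and for at least two exploratory plots. A minimal sketch with matplotlib is given below, assuming a small hypothetical dataframe with two quantitative attributes and a binary target; the column names and values are invented. The `Agg` backend is selected only so that the sketch runs without a display; inside a notebook that line can be dropped so the figures render inline.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit inside a notebook

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical prepared data (assumption): two quantitative
# attributes and an already encoded binary target.
data = pd.DataFrame(
    {
        "age": [25, 32, 47, 51, 38, 29],
        "income": [30000, 54000, 82000, 91000, 60000, 41000],
        "purchased": [0, 1, 1, 0, 1, 0],
    }
)
features = ["age", "income"]

# 4.1 One scatter plot per quantitative attribute against the target.
fig, axes = plt.subplots(1, len(features), figsize=(8, 3))
for ax, name in zip(axes, features):
    ax.scatter(data[name], data["purchased"])
    ax.set_xlabel(name)
    ax.set_ylabel("purchased")
fig.tight_layout()

# 4.2 Correlation matrix drawn as a heat map.
corr = data.corr()
fig2, ax2 = plt.subplots(figsize=(4, 3))
image = ax2.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax2.set_xticks(range(len(corr.columns)))
ax2.set_xticklabels(corr.columns, rotation=45)
ax2.set_yticks(range(len(corr.columns)))
ax2.set_yticklabels(corr.columns)
fig2.colorbar(image, ax=ax2)
fig2.tight_layout()
```

A heat map of the correlation matrix helps spot attributes that are strongly related to the target or redundant with each other, which is one way to justify the choice of attributes; the template still requires a second plot type and a written justification.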
