Assessment and Submission Details
Marks: 40% of the Total Assessment for the Course
Due Date: 11:59pm Sunday, Week 12
Submit your assignment to Blackboard Task 3. Please follow the submission instructions in Blackboard.
The assignment will be marked out of a total of 100 marks and forms 40% of the total assessment for the course. ALL assignments will be checked for plagiarism automatically by the SafeAssign system provided by Blackboard. Refer to your Course Outline or the Course Web Site for a copy of the Student Misconduct, Plagiarism and Collusion guidelines.
Late submissions will be penalised according to the policy in the course outline. Please note that Saturday and Sunday are included in the count of days late.
Requests for an extension to an assignment MUST be made to the course coordinator prior to the date of submission. Requests made on the day of submission or after the submission date will only be considered in exceptional circumstances. Assignment submission extensions will only be granted in accordance with the official University guidelines.
Assignment Task
This assignment consists of two deliverables:
The marking rubrics are viewable on Blackboard.
Report Format
Your report should be about 1000 words. The report MUST be formatted using the following guidelines:
- One code implementation (40%). This requires a zip file, which should include:
  o The code file in Jupyter Notebook format.
- A report (60%). The report must be uploaded as a separate file.
- For code reproduction, your code must be self-contained. That is, it should not require any libraries other than the PySpark environment we have used this semester. The data files must be packaged with your code file.
- The data sets used in the lecture slides must not be used as the data set for the assignment. Doing so will result in a mark of 0 for the coding component.
- Exploratory data analysis
  o stating the number of rows and columns in the data set
  o cleaning the data (missing values or duplicated records) if necessary
  o selecting 3 columns and drawing 1 plot (e.g. bar chart, histogram, boxplot, etc.) for each to summarise it
- Recommendation engine
  o Model training and predictions
  o Model evaluation using MSE
- Classification
  o Logistic Regression model training
  o Model evaluation
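As a minimal sketch of the MSE evaluation named in the recommendation engine task, the plain-Python function below averages the squared differences between held-out ratings and predicted ratings. The function name `mean_squared_error` and the sample ratings are hypothetical and chosen only for illustration; they do not come from the assignment.

```python
from typing import Sequence


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Return the mean of the squared differences between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one rating is required")
    # Square each prediction error, then average over all ratings.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical held-out ratings and the model's predictions for them.
actual_ratings = [4.0, 3.0, 5.0, 1.0]
predicted_ratings = [3.5, 3.0, 4.0, 2.0]

mse = mean_squared_error(actual_ratings, predicted_ratings)
print(f"MSE = {mse:.4f}")  # prints "MSE = 0.5625"
```

In the notebook itself, the same quantity would more likely be obtained from PySpark's `RegressionEvaluator` with `metricName="mse"` applied to the prediction DataFrame; the sketch above only shows what that number means.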
- Provide a high-level survey on the advances of data science in the past 2 years.
- Compare the features of Spark version 2.4 that we used this semester and the new version 3.0.
- Explain your design and implementation of the machine learning parts in your code, including the following topics:
Table of Contents
1.0 Advancement of Data Science (500 words)
2.0 Comparison of Spark 2.4 and 3.0 (250 words)
3.0 Machine Learning Implementation (250 words)
  3.1 Data set
  3.2 Collaborative filtering
    Features of the model, key parameters and configuration
    Evaluation
  3.3 Logistic regression
    Features of the model, key parameters and configuration
    Evaluation
References
- Title Page: Must not contain headers, footers, or page numbering. Include your name as the report's author.
- Header: Report title
- Footer: Your name and the page number
- Paragraph text: 12 point Calibri, single line spacing
- Headings: Arial in an appropriate type size
- Margins: 2.5cm on all sides
- Page numbering
  o Introduction and onwards to use conventional numerals (1, 2, 3, 4), starting at page 1 from the introduction.
- The report is to be created as a single Microsoft Word document (version 2007 or later). No other format is acceptable; submitting in another format will result in the deduction of marks.
- Ensure that you clearly understand the requirements for the assignment: what must be done and what the deliverables are.
- If you do not understand any of the assignment requirements, please ASK your tutor.
- Each time you work on any aspect of the assignment, reread the assignment requirements to ensure that what is required is clearly understood.
- We have practiced nearly all coding tasks in DataCamp before. If you have any difficulty, redoing the practices in DataCamp is recommended.
- All work must be submitted through SafeAssign.
- SafeAssign will pick up any similarities with work online as well as with work from other students (in this semester and previous ones).
- Please make sure you reference your work properly. If you are using any material from the internet or any books from the library, you need to cite the work correctly. Failure to do so will result in possible cases of Academic Misconduct.
- Please do not share your work with other students. Do not give anyone your files to look at. SafeAssign will pick up collusion, but keep in mind that the percentages for collusion may not be reported accurately until all student assignments have been submitted. Both the person copying and the person providing the work will potentially be held accountable.
- You can submit a draft assignment through SafeAssign before making the actual submission.