Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Oct 06, 2020

Examine real-world scenarios and apply relevant data mining concepts Scenario 1: Data from an experiment on learning mindsets Suppose you are given a dataset from

Examine real-world scenarios and apply relevant data mining concepts

Scenario 1: Data from an experiment on learning mindsets

Suppose you are given a dataset from a research experiment. In this experiment, researchers surveyed 10,000 high school students to ask them about demographic characteristics and personality traits.

The researchers then randomly divided students into two groups. The first group (the control group) viewed a video lecture about the structure of the human brain. The second group (the experimental group) viewed a video lecture that told them they are capable of learning difficult concepts if they try hard and seek help when they get stuck.

The researchers hope that students in the experimental group will improve their grades because of the lecture they watched. The data they collected have 5 demographic characteristics (variables such as age) and 10 personality traits (variables such as level of anxiety). The researchers also recorded students grade point average (GPA) before the experiment and one year later to measure improvement.

The researchers ask you to predict student improvement from these data (Explain all your answers well)

a) What is one way that you could measure improvement from the two GPA scores recorded for each student?

b) What features could you use to predict improvement?

c) What kind of data mining problem is this (classification, clustering, regression, etc.) and

why?

d) What steps would you take to approach this problem, from start to finish?

e) Could you transform this into another type of data mining problem? If yes, how? If no,

why not?

Scenario 2: Movie recommendations

A movie streaming company has decided to provide recommendations to their customers about what movies they might like to watch, hoping that the customers will like the recommendations and watch more movies on their service. The company has already collected ratings, on a scale from 1 to 10, from their customers about some of the movies they have watched. You are given a subset of their data and asked to develop a model to predict whether an existing customer (somebody in the dataset) will like a particular movie that they have not seen.

The dataset includes data from 1,000 customers and 500 movies. Each customer has watched an average of 8 movies, although some customers have only watched 1 movie and other customers have watched 100 movies. Each instance has a customer ID, a movie ID, and a feature that may have a missing value: their rating of that movie. You can assume that the customer watched the movie if there is an instance, and that if there is no instance for some particular customer ID with some particular movie ID then they did not watch that movie.

Your model should take a customer ID and a movie ID as input and make a prediction for the customer’s rating of that movie.

a) How many instances are in the dataset?

b) If every customer watched every movie, how many instances would there be?

c) Is there any advantage to creating a larger dataset with every possible (customer ID,

movie ID) pair as an instance? What might a model learn from the fact that a customer

did not watch a particular movie?

d) What kind of data mining problem is this, and why?

e) What preprocessing is needed for these data?

f) What model(s) would likely be appropriate for this problem, and why?

g) How would you cross-validate your model?

h) Do you need to split the training data into validation and training sets for the model you

proposed? Why or why not?

Step by Step Solution

★★★★★

3.40 Rating (156 Votes )

There are 3 Steps involved in it

Step: 1

Solution 1a You can calculate the mean deviation for the marks for each and every student and if it is significantly positive then yes the student has ... blur-text-image