We will use the dataset in the link below to detect the expertise of a group of problem solvers based on the ideas that they generate. Our problem solvers have different expertise levels in their problem domain (expert, moderate, or novice). We asked them to come up with solution ideas for given problems. We have evaluations of their ideas based on different dimensions (variables 3 to 16). We also collected information about the problem solvers that may indicate their expertise levels (variables 17 to 28). We want to know whether idea evaluations or user evaluations can predict user expertise. To that end, you will create a series of predictive models to detect Expertise.

Data:

library(readr)  # read_csv() is provided by readr (part of the tidyverse)
ideas <- read_csv(url('https://ygenc.github.io/lectures/data/ideas.csv'))

Below is a list of variables in the dataset. Each row represents an idea described by a problem solver (a user). The expertise of the user who created the idea is recorded in the first column; this is the output variable you will predict.

The ideas themselves are recorded as free text in the second column (#2).

Each idea was later evaluated by other individuals. Variables 3 to 16 rate the idea in the second column on different aspects, such as creativity or practicality. These columns are labeled "Idea Eval. Variable" in the list below.

While the ideas were being created, we also collected information about the problem solvers. Variables 17 to 28 hold demographic and other relevant information about them. These columns are labeled "Expert Eval. Variable" in the list below and are called the "User Evaluation variables" in the questions.
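The R line above loads the data. The solution excerpt at the end uses scikit-learn parameter names (n_estimators, max_depth, random_state), so the sketches here use Python with pandas and scikit-learn. This first one is a minimal pandas sketch. It loads the same CSV and splits it using the 1-based column numbers given above. Note that pandas positions start at 0 and `iloc` slice ends are exclusive. The helper names `load_ideas` and `split_columns` are hypothetical, and the code assumes the file has at least 28 columns in the stated order.

```python
import pandas as pd

# URL from the assignment text; the helper names below are hypothetical.
IDEAS_URL = "https://ygenc.github.io/lectures/data/ideas.csv"


def load_ideas(source=IDEAS_URL) -> pd.DataFrame:
    """Read the ideas CSV (a pandas counterpart of the R read_csv call)."""
    return pd.read_csv(source)


def split_columns(ideas: pd.DataFrame):
    """Split the table by position, following the 1-based column numbers in the text.

    Column k in the text is pandas position k - 1, and an iloc slice
    excludes its end point, so columns 3-16 become iloc[:, 2:16].
    """
    if ideas.shape[1] < 28:
        raise ValueError(f"expected at least 28 columns, got {ideas.shape[1]}")
    y = ideas.iloc[:, 0]              # column 1: Expertise (the target)
    idea_text = ideas.iloc[:, 1]      # column 2: the idea as free text
    idea_eval = ideas.iloc[:, 2:16]   # columns 3-16: Idea Evaluation variables
    user_eval = ideas.iloc[:, 16:28]  # columns 17-28: User Evaluation variables
    return y, idea_text, idea_eval, user_eval


# Usage (downloads the file):
# ideas = load_ideas()
# y, idea_text, idea_eval, user_eval = split_columns(ideas)
# print(y.value_counts(normalize=True))  # class shares; the largest is the majority-class baseline
```

Selecting by position rather than by name avoids guessing column headers, which the text does not spell out.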

1A (30 pts). Create a predictive model to predict Expertise (column 1) based on the Idea Evaluation variables (columns 3 to 16). Select a model that can be tuned later, that is, one with hyperparameters you can adjust. In this question, run it with initial default values (or with values you choose yourself).

1B (20 pts). Update your model by tuning its parameters to improve its accuracy. Has the model improved? Briefly explain.
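One common way to tune a random forest is a cross-validated grid search on the training data, followed by a single check on the untouched test set. The sketch below is a Python/scikit-learn version of that idea. The grid values and the helper name are assumptions for illustration. Keeping the same seed as the untuned model keeps the test split identical, so the two test accuracies are directly comparable.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Illustrative grid; these values are assumptions, not taken from the assignment.
DEFAULT_GRID = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 5, 10, 20],
    "min_samples_leaf": [1, 3, 5],
    "max_features": ["sqrt", 0.5],
}


def tune_idea_eval_forest(ideas: pd.DataFrame, param_grid=None, seed: int = 42):
    """Q1B sketch (hypothetical helper): grid search over forest settings on columns 3-16."""
    y = ideas.iloc[:, 0]
    X = ideas.iloc[:, 2:16]
    # Same seed as the untuned model -> same train/test split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid or DEFAULT_GRID,
        scoring="accuracy",
        # 5-fold CV on the training part only; the test set is not used for tuning.
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=seed),
    )
    search.fit(X_train, y_train)  # refits the best setting on all training rows
    test_accuracy = accuracy_score(y_test, search.predict(X_test))
    return search.best_params_, search.best_score_, test_accuracy


# Usage:
# best_params, cv_acc, test_acc = tune_idea_eval_forest(ideas)
# cv_acc is the best mean CV accuracy on the training part (optimistic, since it
# was used for selection); compare test_acc with the untuned model's test accuracy.
```

The full default grid means 72 settings times 5 folds, so it can take a while. A smaller grid, or scikit-learn's `RandomizedSearchCV`, keeps run time manageable.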

2A (30 pts). Create another model, using a different algorithm from the one you selected above, to predict Expertise (column 1) based on the "User Evaluation variables" (columns 17 to 28). You don't have to tune this model, so you can select any algorithm that wasn't used in question 1.
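The text describes columns 17 to 28 as demographic and other information, so they may mix numbers and categories. The sketch below uses logistic regression, a different algorithm from a random forest. It routes numeric columns through median imputation and scaling and non-numeric columns through one-hot encoding. The column types, the imputation choices, and the helper names are assumptions, since the variable list is not reproduced here.

```python
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_user_eval_model():
    """Hypothetical helper: logistic regression with per-type preprocessing."""
    preprocess = ColumnTransformer([
        # Numeric columns: fill gaps with the median, then standardize.
        ("num", make_pipeline(SimpleImputer(strategy="median"), StandardScaler()),
         make_column_selector(dtype_include="number")),
        # Non-numeric columns (e.g. categorical demographics): fill gaps, then one-hot encode.
        ("cat", make_pipeline(SimpleImputer(strategy="most_frequent"),
                              OneHotEncoder(handle_unknown="ignore")),
         make_column_selector(dtype_exclude="number")),
    ])
    return make_pipeline(preprocess, LogisticRegression(max_iter=1000))


def fit_user_eval_model(ideas, seed=42):
    """Q2A sketch: train on columns 17-28 and report held-out accuracy."""
    y = ideas.iloc[:, 0]       # column 1: Expertise
    X = ideas.iloc[:, 16:28]   # columns 17-28 (0-based positions 16..27)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = build_user_eval_model().fit(X_train, y_train)
    return model, accuracy_score(y_test, model.predict(X_test))


# Usage:
# user_model, user_acc = fit_user_eval_model(ideas)
```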

2B (20 pts). Did your model based on the "User Evaluation variables" (2A) perform better or worse than your best model based on the "Idea Evaluation variables" (question 1)? What are the possible reasons for one model to outperform the other?

3 (Bonus, 10 pts). Create a model that predicts Expertise (column #1) based only on the Idea (column #2). Hint: since the ideas are unstructured text, you must convert them into structured features. Evaluate your model using accuracy. Is the model's prediction better than random selection? Briefly explain.
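One standard way to turn free text into structured features is TF-IDF weighting of words and word pairs. Below is a minimal Python/scikit-learn sketch that feeds TF-IDF features into logistic regression, which is just one reasonable choice of classifier. It scores the result next to two chance baselines on the same held-out split. With three expertise levels, uniform random guessing has an expected accuracy of 1/3, while always predicting the most common level scores that level's share of the test set. The helper name and the vectorizer settings are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def evaluate_text_model(ideas: pd.DataFrame, seed: int = 42) -> dict:
    """Bonus sketch (hypothetical helper): TF-IDF + logistic regression vs. chance baselines."""
    y = ideas.iloc[:, 0]                             # column 1: Expertise
    text = ideas.iloc[:, 1].fillna("").astype(str)   # column 2: the idea text
    t_train, t_test, y_train, y_test = train_test_split(
        text, y, test_size=0.25, stratify=y, random_state=seed
    )

    # Unstructured text -> sparse matrix of word and word-pair weights.
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=2, sublinear_tf=True),
        LogisticRegression(max_iter=1000),
    )
    model.fit(t_train, y_train)
    scores = {"tfidf_logreg": accuracy_score(y_test, model.predict(t_test))}

    # Baselines ignore the features, so a placeholder column of zeros suffices.
    dummy_train = np.zeros((len(y_train), 1))
    dummy_test = np.zeros((len(y_test), 1))
    for strategy in ("most_frequent", "uniform"):
        dummy = DummyClassifier(strategy=strategy, random_state=seed)
        dummy.fit(dummy_train, y_train)
        scores[strategy] = accuracy_score(y_test, dummy.predict(dummy_test))
    return scores


# Usage:
# print(evaluate_text_model(ideas))  # tfidf_logreg should clearly beat both baselines
```

The "uniform" score from one split is a single draw around 1/3. Comparing against the "most_frequent" score as well guards against calling a model useful when it only exploits class imbalance.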

Solution

1A. Model: Random Forest Classifier. Parameters: n_estimators = 100, max_depth = 10, random_state = 42. Accuracy: 0.75.

1B. Parameter tuning: n_estimators = 200, maxdept...