
Question



Code the following question in Python:


1(d) Let's focus on the following features: danceability, tempo, energy, valence. For each of these features (in order), produce a histogram that shows the distribution of the feature values in the training set, separated for positive and negative examples. By "positive examples" we mean target = 1 (user liked the song, positive sentiment) and by "negative examples" we mean target = 0 (user disliked the song, negative sentiment). As an example, here is what the histogram would look like for a different feature, loudness. (You don't have to match all the details exactly, such as colour, but your histograms should look something like this, with a reasonable number of bins to see the shape of the distribution.) There are two different histograms, one for target = 0 and one for target = 1, overlaid on top of each other. The loudness histogram shows that extremely quiet songs tend to be disliked (more blue bars than orange on the left) and very loud songs also tend to be disliked (more blue than orange on the far right).

Here is some code that separates the dataset into positive and negative examples, to help you get started:

In [ ]: negative_examples = df_train.query("target == 0")
        positive_examples = df_train.query("target == 1")

1(e) Let's say you had to make a decision stump (decision tree with depth 1), by hand, to predict the target class. Just from looking at the plots above, describe a reasonable split (feature name and threshold) and what class you would predict in the two cases.
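The overlaid per-class histograms requested in 1(d) could be produced along these lines. This is a minimal sketch, assuming df_train is a pandas DataFrame containing the listed feature columns and a target column; bins=50 is just one reasonable choice, and alpha blending makes the overlap visible:

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_feature_histograms(df_train,
                            features=("danceability", "tempo", "energy", "valence")):
    """Overlay per-class histograms for each feature, as in the loudness example."""
    negative_examples = df_train.query("target == 0")
    positive_examples = df_train.query("target == 1")
    for feature in features:
        plt.figure()
        # Semi-transparent bars so the two distributions remain visible where they overlap.
        plt.hist(negative_examples[feature], bins=50, alpha=0.5, label="target = 0")
        plt.hist(positive_examples[feature], bins=50, alpha=0.5, label="target = 1")
        plt.xlabel(feature)
        plt.ylabel("count")
        plt.legend()
```

In a Jupyter notebook the figures render automatically after calling plot_feature_histograms(df_train); in a script you would add plt.show() at the end.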
For example, in the loudness histogram provided earlier, it seems that very large values of loudness are generally disliked (more blue on the right side of the histogram), so you might answer something like this: "A reasonable split would be to predict 0 if loudness > -5 (and predict 1 otherwise)."

1(f) Let's say that, for a particular feature, the histograms of that feature are identical for the two target classes. Does that mean the feature is not useful for predicting the target class?

1(g) Note that the dataset includes two free-text features labeled song_title and artist:

In [ ]: df_train[["song_title", "artist"]].head()

Do you think these features could be useful in predicting whether the user liked the song or not? Would there be any difficulty in using them in your model?
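Part 1(e) only asks for a hand-chosen split, but a quick way to sanity-check a candidate is to compute its training accuracy. A minimal sketch, assuming df_train exists; the loudness > -5 threshold below is the hypothetical one from the example, not a value derived from the data:

```python
import numpy as np


def stump_accuracy(df_train, feature, threshold, predict_if_above):
    """Training accuracy of a depth-1 stump: predict `predict_if_above`
    when feature > threshold, and the other class otherwise."""
    above = df_train[feature] > threshold
    pred = np.where(above, predict_if_above, 1 - predict_if_above)
    return (pred == df_train["target"]).mean()


# e.g. the example split: predict 0 when loudness > -5, else predict 1
# stump_accuracy(df_train, "loudness", -5, predict_if_above=0)
```

Comparing this accuracy across a few candidate features and thresholds is a reasonable way to justify the split you describe in words.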
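For 1(f), note that identical per-class histograms only mean the feature is useless on its own; it can still matter in combination with other features. A toy illustration on synthetic data (not the song dataset), where the target is the XOR of two binary features:

```python
import numpy as np
import pandas as pd

# Synthetic example: target = x XOR y.
rng = np.random.default_rng(0)
toy = pd.DataFrame({"x": rng.integers(0, 2, 10_000),
                    "y": rng.integers(0, 2, 10_000)})
toy["target"] = toy["x"] ^ toy["y"]

# The per-class distributions of x alone are (nearly) identical...
print(toy.groupby("target")["x"].mean())  # both close to 0.5

# ...yet x and y together determine the target exactly.
assert ((toy["x"] ^ toy["y"]) == toy["target"]).all()
```

Here the histogram of x is the same for both classes, so x looks useless in isolation, even though (x, y) jointly predicts the target perfectly.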


