Question

Need R code for questions 4-7.

Find the csv file genre_data.csv on Canvas. This file contains data extracted from Spotify on a sample of 1797 songs from pop and metal artists. It contains the following variables:

- song: song title
- artist: the performer
- danceability: a measure between 0-100 of how danceable the song is. The higher, the easier it is to dance to.
- energy: a measure between 0-100 of how energetic the song is. The higher, the more energetic.
- loudness: loudness in decibels. The higher, the louder the song.
- valence: a measure between 0-100 of how happy/cheerful the song is. The higher, the more cheerful.
- tempo: a measure of the overall pace of the song (in bpm: beats per minute)
- popul: streaming popularity of the song (between 0-100). The higher, the more the song has been streamed.
- genre: genre of the song (pop or metal)
- duration: play time in minutes

We are interested in classifying songs to a genre (pop or metal) based on one of their features. The actual genre for the list of songs is contained in the genre variable. Use the dataset for the following questions.

Note: most of the questions below are quickly answered once you have completed Question 1, so program this first to save time and effort!

1. Write an R function named ML_rule that does the following: it takes as input 2 vectors, 1 vector containing a feature of the songs (loudness or energy, for instance) and 1 vector with the true genre of the songs. The function then performs the ML rule (with normal populations and equal variances) and returns the following objects:
   (i) a vector called R1 that contains the lower and upper bounds of R1
   (ii) a vector called R2 that contains the lower and upper bounds of R2
   (iii) a matrix called confusion_matrix that contains the confusion matrix
   (iv) a vector called mis_prob that contains the probabilities of misclassification
   Note: the function should work for any variable you give as input (not only for e.g. loudness; the vectors can have any length, ...).

2. One could argue that metal songs are typically hard(er) to dance to than pop songs. Plot in the same graph the distributions of the danceability variable for metal and pop songs (make clear which distribution belongs to which genre). Does it look like danceability might be a good tool to classify the songs into metal or pop? Explain why (not)!

3. Consider the variable danceability to obtain the discriminant regions R1 and R2 for the genre of the songs. Use the ML rule with normal distributions and equal variances. Report and comment on the results.

4. Report the number of misclassified observations in a confusion matrix. Obtain the estimated probabilities of misclassification and the APER. Comment on the results.

5. According to your classification rule, if all you are told about a song is that danceability = 40, which genre would you say it is? What is the estimated probability that you are wrong?

6. It is sometimes said that "Metal is just loud noise". Based on this dataset, is loudness indeed a good discriminating variable between pop music and metal? Use the ML rule to compute the confusion matrix and APER based on this variable, and comment on this claim.

7. Consider instead energy as the classification variable. Compare the classification performance to the earlier results and comment on the differences.
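Since questions 3-7 all build on Question 1, here is a minimal sketch of what ML_rule could look like. It is not an official solution. It assumes the two genre labels are ordered alphabetically (so population 1 is the first factor level), estimates the common variance with the pooled sample variance, and reports the plug-in normal misclassification probabilities. The extra APER element and the simulated vectors sim_x and sim_genre are additions for illustration only; they are not part of the assignment or the real data.

```r
# Sketch of the ML rule for two normal populations with equal variances.
# x: numeric feature vector; genre: true labels (exactly two classes).
ML_rule <- function(x, genre) {
  stopifnot(is.numeric(x), length(x) == length(genre))
  keep <- !is.na(x) & !is.na(genre)      # drop incomplete pairs
  x <- x[keep]
  genre <- factor(genre[keep])
  stopifnot(nlevels(genre) == 2)
  lev <- levels(genre)                   # population 1 = lev[1], 2 = lev[2]

  x1 <- x[genre == lev[1]]
  x2 <- x[genre == lev[2]]
  n1 <- length(x1)
  n2 <- length(x2)
  m1 <- mean(x1)
  m2 <- mean(x2)

  # Pooled standard deviation (equal-variance assumption)
  s <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))

  # With equal variances the two normal densities cross at the midpoint
  cutoff <- (m1 + m2) / 2
  if (m1 < m2) {
    R1 <- c(-Inf, cutoff)
    R2 <- c(cutoff, Inf)
  } else {
    R1 <- c(cutoff, Inf)
    R2 <- c(-Inf, cutoff)
  }

  # Assign each song to the population with the nearer mean
  # (a tie exactly at the cutoff goes to population 1)
  predicted <- factor(ifelse(abs(x - m1) <= abs(x - m2), lev[1], lev[2]),
                      levels = lev)
  confusion_matrix <- table(actual = genre, predicted = predicted)

  # Plug-in misclassification probabilities; equal under this model
  p <- pnorm(-abs(m1 - m2) / (2 * s))
  mis_prob <- c(p, p)
  names(mis_prob) <- c(paste0("P(", lev[2], " | ", lev[1], ")"),
                       paste0("P(", lev[1], " | ", lev[2], ")"))

  # Apparent error rate: share of misclassified training observations
  APER <- 1 - sum(diag(confusion_matrix)) / length(x)

  list(R1 = R1, R2 = R2, confusion_matrix = confusion_matrix,
       mis_prob = mis_prob, APER = APER)
}

# Illustration on simulated data (NOT the Spotify sample)
set.seed(1)
sim_x <- c(rnorm(200, mean = 45, sd = 12), rnorm(200, mean = 65, sd = 12))
sim_genre <- rep(c("metal", "pop"), each = 200)
res <- ML_rule(sim_x, sim_genre)
print(res$R1)
print(res$R2)
print(res$confusion_matrix)
print(res$mis_prob)
print(res$APER)

# With the real file (assuming it sits in the working directory):
# dat <- read.csv("genre_data.csv")
# ML_rule(dat$danceability, dat$genre)   # questions 3-5
# ML_rule(dat$loudness, dat$genre)       # question 6
# ML_rule(dat$energy, dat$genre)         # question 7
```

For question 5, the predicted genre is the one whose region contains 40. Comparing the APER values across the three calls gives a direct way to rank danceability, loudness and energy as classifiers, with the caveat that the APER is computed on the same data used to fit the rule and therefore tends to be optimistic.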
