Only need answers to questions 2 to 7



Find the csv file genre_data.csv on Canvas. This file contains data extracted from Spotify on a sample of 1797 songs from pop and metal artists. It contains the following variables:

- song: song title
- artist: the performer
- danceability: measure between 0-100 of how danceable the song is. The higher, the easier it is to dance to.
- energy: measure between 0-100 of how energetic the song is. The higher, the more energetic.
- loudness: loudness in decibels. The higher, the louder the song.
- valence: measure between 0-100 of how happy/cheerful the song is. The higher, the more cheerful.
- tempo: measure of the overall pace of the song (in bpm: beats per minute)
- popul: streaming popularity of the song (between 0-100). The higher, the more the song has been streamed.
- genre: genre of the song (pop or metal)
- duration: play time in minutes

We are interested in classifying songs to a genre (pop or metal) based on one of their features. The actual genre of each song is contained in the genre variable. Use the dataset for the following questions.

Note: most of the questions below are quickly answered once you have completed Question 1, so program this first to save time and effort!

1. Write an R function named ML_rule that does the following. It takes 2 vectors as input: 1 vector containing a feature of the songs (loudness or energy, for instance) and 1 vector with the true genre of the songs. The function then performs the ML rule (with normal populations and equal variances) and returns the following objects:
   (i) a vector called R1 that contains the lower and upper bounds of R1
   (ii) a vector called R2 that contains the lower and upper bounds of R2
   (iii) a matrix called confusion_matrix that contains the confusion matrix
   (iv) a vector called mis_prob that contains the probabilities of misclassification
   Note: the function should work for any variable given as input (not only for e.g. loudness; the vectors can have any length, ...).

2. One could argue that metal songs are typically hard(er) to dance to than pop songs. Plot the distributions of the danceability variable for metal and pop songs in the same graph (make clear which distribution belongs to which genre). Does it look like danceability might be a good tool to classify the songs into metal or pop? Explain why (not)!

3. Consider the variable danceability to obtain the discriminant regions R1 and R2 for the genre of the songs. Use the ML rule with normal distributions and equal variances. Report and comment on the results.

4. Report the number of misclassified observations in a confusion matrix. Obtain the estimated probabilities of misclassification and the APER. Comment on the results.

5. According to your classification rule, if all you are told about a song is that danceability = 40, which genre would you say it is? What is the estimated probability that you are wrong?

6. It is sometimes said that "Metal is just loud noise". Based on this dataset, is loudness indeed a good discriminating variable between pop music and metal? Use the ML rule to compute the confusion matrix and APER based on this variable, and comment on this claim.

7. Consider instead energy as the classification variable. Compare the classification performance to the earlier results and comment on the differences.
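Since questions 3 to 7 all build on the function from Question 1, here is a minimal sketch of what ML_rule might look like. It rests on several assumptions that are not stated in the assignment: population 1 is taken to be the first genre in alphabetical order (so "metal" before "pop"), rows with missing values are dropped, observations exactly on the boundary go to population 2, and the misclassification probabilities are estimated from the fitted normal model with the pooled standard deviation. The helper names (x1, m1, sp, cut) are only illustrative; the returned names R1, R2, confusion_matrix and mis_prob follow the assignment.

```r
ML_rule <- function(feature, genre) {
  # Assumption: drop observations with a missing feature or genre
  keep <- !is.na(feature) & !is.na(genre)
  x <- feature[keep]
  g <- factor(genre[keep])
  stopifnot(nlevels(g) == 2)
  lev <- levels(g)  # population 1 = lev[1], population 2 = lev[2]

  x1 <- x[g == lev[1]]
  x2 <- x[g == lev[2]]
  n1 <- length(x1)
  n2 <- length(x2)
  m1 <- mean(x1)
  m2 <- mean(x2)

  # Pooled standard deviation under the equal-variance assumption
  sp <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))

  # With equal variances, the two normal densities cross at the midpoint
  # of the sample means, so that point separates R1 from R2
  cut <- (m1 + m2) / 2
  if (m1 < m2) {
    R1 <- c(-Inf, cut)
    R2 <- c(cut, Inf)
  } else {
    R1 <- c(cut, Inf)
    R2 <- c(-Inf, cut)
  }

  # Classify each song to the population whose mean is closer
  # (ties on the boundary go to population 2 in this sketch)
  predicted <- ifelse(abs(x - m1) < abs(x - m2), lev[1], lev[2])
  confusion_matrix <- table(actual = g,
                            predicted = factor(predicted, levels = lev))

  # Model-based misclassification probabilities; with a midpoint cut
  # and a common variance, both are equal to pnorm(-|m1 - m2| / (2 * sp))
  p <- pnorm(-abs(m1 - m2) / (2 * sp))
  mis_prob <- setNames(
    c(p, p),
    c(paste0("P(", lev[2], "|", lev[1], ")"),
      paste0("P(", lev[1], "|", lev[2], ")"))
  )

  list(R1 = R1, R2 = R2,
       confusion_matrix = confusion_matrix,
       mis_prob = mis_prob)
}

# Toy data (made up for illustration, not taken from genre_data.csv)
toy_res <- ML_rule(c(1, 2, 3, 7, 8, 9),
                   c("metal", "metal", "metal", "pop", "pop", "pop"))

# APER = share of observations off the diagonal of the confusion matrix
cm <- toy_res$confusion_matrix
aper <- 1 - sum(diag(cm)) / sum(cm)
```

On the real data, and assuming the column names match the variable list above, the call would be something like ML_rule(dat$danceability, dat$genre) after dat <- read.csv("genre_data.csv"). For Question 5, the genre for danceability = 40 can then be read off by checking which of R1 or R2 contains 40.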