Question

1 Approved Answer

Posted on Sep 08, 2024

I have finished the ML_function as below, use ML_function to answer question 8 NEED R CODE FOR 8.!! ML_rule metal_mean){ predicted_genre = (pop_mean + metal_mean)/2,

I have finished the ML_function as below, use ML_function to answer question 8

NEED R CODE FOR 8.!!

ML_rule metal_mean){ predicted_genre = (pop_mean + metal_mean)/2, "pop", "metal") confusion_matrix = (pop_mean + metal_mean)/2, "metal", "pop") confusion_matrix

ML_rule(dta$danceability, dta$genre)

image text in transcribed

Find the csv file genre_data.csv on Canvas. This file contains data extracted from Spotify on a sample of 1797 songs from pop and metal artists. It contains the following variables: - song: song title - artist: the performer - danceability: measure between 0-100 of how danceable the song is. The higher the easier it is to dance to it. - energy: measure between 0100 of how energetic the song is. The higher the more energetic. - loudness: loudness in decibels. The higher the louder the song. - valence: a measure between 0-100 of how happy/cheerful the song is. The higher the more cheerful. - tempo: a measure of the overall pace of the song (in bpm: beats per minute) - popul: streaming popularity of the song (between 0-100). The higher, the more a song has been streamed - genre: genre of the song (pop or metal) - duration: play time in minutes We are interested in classifying songs to a genre (pop or metal) based on one of its features. The actual genre for the list of songs is contained in the genre variable. Use the dataset for the following questions: Note: most of the questions below are quickly answered once you have completed Question 1 , so program this first to save time and effort! 1. Write an R function named ML_rule that does the following: It takes as input 2 vectors: 1 vector containing features of the song (loudness or energy, for instance) and 1 vector with the true genre of the songs. The function then performs the ML rule (with normal populations and equal variances) and returns the following objects: (i) a vector called R1 that contains the lower and upper bounds of R1 (ii) a vector called R2 that contains the lower and upper bound of R2 (iii) a matrix called confusion_matrix that contains the confusion matrix (iv) a vector called mis_prob that contains the probabilities of misclassification Note: the function should work for a general variable you give as input (not only for e.g. loudness, the variables can have a general length,...). Test next the reliability of the function ML_rule (obtained in Question 1) through a simulation study. Implement the simulation study by repeating the following two steps 1000 times for i=1,,1000 : (a) Generate 150 observations from a population 1 with normal distribution N(1,2) where 1=1 and 2=1, and 150 observations from a population 2 with normal distribution N(2,2) where 2=2 and variance 2=1. For each iteration, set the seed of the simulations equal to 1+i. (b) Use the function ML_rule 1 to implement the ML rule and obtain the estimated missclassification probabilities p12 and p21 from the simulated data. Compare the distribution of p12 (you can calculate the kernel density from the 1000 values of p12 from the simulations) with the true probability p12, and compare the distribution of p21 with the true probability p21. Note that since the data is generated we can compute the true missclassification probabilities as p12=p21=(212), where () is the cumulative distribution function of the standard normal distribution (i.e. N(0,1) ) Comment on the results