Question
Nieto and colleagues at New York University wrote a paper in 2014 looking at how humans evaluate systems that automatically find section boundaries in songs (e.g. between verse and chorus).
They evaluated precision, recall and F-measure as ways of approximating human judgements. They found that precision was far more important than recall in explaining people's responses.
What might this imply for the standard F-measure? Select ONE observation that best applies.
a. Since the F-measure is based on the harmonic mean of precision and recall, it should give a sweet spot for evaluation.
b. The standard F-measure favours high scores on one measure even if the other score is low. This will work poorly here.
c. The standard F-measure penalises low scores on one measure even if the other score is high. This will work well here.
d. The standard F-measure penalises low scores on one measure even if the other score is high. This will work poorly here.
e. The standard F-measure gives equal weight to high precision and high recall, so will perform poorly if one is more important than the other.
f. The standard F-measure favours high scores on one measure even if the other score is low. This will work well here.
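
For reference when weighing the options, the behaviour of the F-measure can be checked numerically. The following is a minimal sketch, not taken from the paper: the function name `f_beta` and the example scores (0.9 and 0.3) are hypothetical, chosen only for illustration. It uses the general F-beta formula, where beta = 1 gives the standard F-measure (the harmonic mean of precision and recall) and beta < 1 shifts weight towards precision.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta = 1 is the standard F-measure (F1); beta < 1 weights
    precision more heavily, beta > 1 weights recall more heavily.
    """
    if precision <= 0.0 or recall <= 0.0:
        return 0.0  # the harmonic mean is zero if either score is zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical scores for two boundary-detection systems (illustrative only).
precise_system = (0.9, 0.3)   # high precision, low recall
thorough_system = (0.3, 0.9)  # low precision, high recall

# Standard F-measure: symmetric in precision and recall.
print(round(f_beta(*precise_system), 3))   # 0.45
print(round(f_beta(*thorough_system), 3))  # 0.45

# Precision-weighted variant (beta = 0.5): the two systems now differ.
print(round(f_beta(*precise_system, beta=0.5), 3))   # 0.643
print(round(f_beta(*thorough_system, beta=0.5), 3))  # 0.346
```

With beta = 1 the two hypothetical systems receive the same score, because swapping precision and recall leaves the standard F-measure unchanged. The two systems are separated only once beta is moved away from 1.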