Question

Posted on Sep 26, 2024

Nieto and colleagues at New York University wrote a paper in 2014 looking at how humans evaluate systems that automatically find section boundaries in songs (e.g. between verse and chorus).

They evaluated precision, recall and F-measure as a way of approximating to human judgements. They found that precision was far more important than recall in explaining people's responses.

What might this imply for the standard F-measure? Select ONE observation that best applies.

a. Since the F-measure is based on the harmonic mean of precision and recall, it should give a sweet spot for evaluation.

b. The standard F-measure favours high scores on one measure even if the other score is low. This will work poorly here.

c. The standard F-measure penalises low scores on one measure even if the other score is high. This will work well here.

d. The standard F-measure penalises low scores on one measure even if the other score is high. This will work poorly here.

e. The standard F-measure gives equal weight to high precision and high recall, so will perform poorly if one is more important than the other.

f. The standard F-measure favours high scores on one measure even if the other score is low. This will work well here.
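
For reference when weighing the options, the following is a minimal sketch of how the standard F-measure (the harmonic mean of precision and recall) behaves. The function name `f_beta`, the `beta` parameter, and the example scores are illustrative assumptions and do not come from the question or from the Nieto paper. The weighted variant shown is the general F-beta formula, where `beta < 1` gives more weight to precision.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1.0 gives the standard F-measure (harmonic mean).

    Illustrative helper only: beta < 1 weights precision more heavily,
    beta > 1 weights recall more heavily.
    """
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0:
        # Avoid division by zero when there is nothing to average.
        return 0.0
    return (1 + b2) * precision * recall / denominator


# Hypothetical scores for two boundary-detection systems.
high_precision = f_beta(precision=0.9, recall=0.3)  # standard F-measure
high_recall = f_beta(precision=0.3, recall=0.9)     # standard F-measure

# The same two systems scored with a precision-weighted variant (beta = 0.5).
high_precision_weighted = f_beta(0.9, 0.3, beta=0.5)
high_recall_weighted = f_beta(0.3, 0.9, beta=0.5)

print(high_precision, high_recall)
print(high_precision_weighted, high_recall_weighted)
```

With `beta=1.0` the two hypothetical systems receive the same score, because the formula is symmetric in precision and recall. With `beta=0.5` the high-precision system scores higher than the high-recall one.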