Question


Posted on Sep 13, 2024



In this problem, we will study how one can use binary classification to do multiclass classification. Suppose we want to construct a $k$-class classifier $f$ that maps data from some input space to the label space $\{1, 2, \ldots, k\}$. There are two popular methods to construct $f$ just by combining results from multiple binary classifiers:

- One-vs-rest (OvR) technique: for each class $i \in [k]$, one can view the classification problem as computing a function $f_i : \mathbb{R}^d \to \{\text{class } i, \text{not class } i\}$ (i.e., assigning examples from class $i$ the label 1 and all other classes the label 0). One can combine the results of each $f_i$ to construct a multiclass classifier $f$.
- One-vs-one (OvO) technique: for each pair of classes $i, j \in [k]$ ($i, j$ distinct), one can view the classification problem as computing a function $f_{ij} : \mathbb{R}^d \to \{\text{class } i, \text{class } j\}$ (taking only training points with labels $i$ or $j$). One can combine the results of each $f_{ij}$ to construct a multiclass classifier $f$.

i) Assuming that your base binary classifiers can only be linear, show a training dataset for each of the following cases. (Your example training dataset for each case must have the following properties: (i) the number of classes in the dataset is $k > 2$, (ii) the dataset contains an equal number of datapoints per class, and (iii) each class contains at least two datapoints.)

- OvR gives better accuracy than OvO.
- OvO gives better accuracy than OvR.
- For any $\varepsilon > 0$, both OvO and OvR give accuracy of at most $\varepsilon$ (your example training set can depend on $\varepsilon$).
- For any $\varepsilon > 0$, both OvO and OvR give accuracy of at least $1 - \varepsilon$ (your example training set can depend on $\varepsilon$).

ii) Suppose our goal is to minimize the number of calls made to binary classifiers during test time (let's call this quantity $c$). Propose a technique to construct a $k$-class classifier $f$ from binary classifiers that minimizes $c$. For your proposed technique, what is $c$? (That is, express it in terms of parameters of the data, such as the number of classes $k$, the number of datapoints $n$, the dimensionality of your dataset $d$, etc.) Prove that your technique indeed minimizes $c$, that is, there is no other technique that makes fewer binary classification calls than your technique during test time and still achieves comparable accuracy.
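To make the OvR and OvO constructions described in the question concrete, here is a minimal Python sketch. It rests on several assumptions that are not in the problem statement: the base linear classifier is a plain perceptron, labels are 0-indexed (0 to k-1 rather than 1 to k), OvR combines results by taking the largest raw score, OvO combines results by majority vote, and all function names (`train_perceptron`, `fit_ovr`, `fit_ovo`, and so on) are hypothetical. It is an illustration of the two schemes, not a solution to either part of the question.

```python
import numpy as np
from itertools import combinations

def train_perceptron(X, y, epochs=1000):
    """Assumed base learner: a linear classifier with bias; y is in {-1, +1}."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a constant bias feature
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:  # misclassified or on the boundary
                w += yi * xi
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w

def score(w, X):
    """Signed linear score w.x + b for each row of X."""
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def fit_ovr(X, y, k):
    # One binary problem per class: class i (+1) versus all other classes (-1).
    return [train_perceptron(X, np.where(y == i, 1, -1)) for i in range(k)]

def predict_ovr(models, X):
    # Assumed combination rule: pick the class whose classifier scores highest.
    scores = np.column_stack([score(w, X) for w in models])
    return scores.argmax(axis=1)

def fit_ovo(X, y, k):
    # One binary problem per unordered pair, trained only on points of i or j.
    models = {}
    for i, j in combinations(range(k), 2):
        mask = (y == i) | (y == j)
        models[(i, j)] = train_perceptron(X[mask], np.where(y[mask] == i, 1, -1))
    return models

def predict_ovo(models, X, k):
    # Assumed combination rule: each pairwise classifier casts one vote.
    votes = np.zeros((len(X), k), dtype=int)
    for (i, j), w in models.items():
        s = score(w, X)
        votes[s > 0, i] += 1
        votes[s <= 0, j] += 1
    return votes.argmax(axis=1)

# Toy dataset: k = 4 classes, two points per class, one cluster per quadrant.
X = np.array([[-3, -3], [-3, -4], [3, -3], [3, -4],
              [-3, 3], [-3, 4], [3, 3], [3, 4]], dtype=float)
y = np.array([0, 0, 1, 1, 2, 2, 3, 3])
k = 4

ovr_models = fit_ovr(X, y, k)
ovo_models = fit_ovo(X, y, k)
print("binary classifiers called per test point (OvR):", len(ovr_models))
print("binary classifiers called per test point (OvO):", len(ovo_models))
print("OvR training accuracy:", (predict_ovr(ovr_models, X) == y).mean())
print("OvO training accuracy:", (predict_ovo(ovo_models, X, k) == y).mean())
```

On this toy set every binary sub-problem is linearly separable, so both schemes should classify all training points correctly. The sketch also shows the test-time cost of each scheme as implemented here: OvR evaluates k binary classifiers per test point, while OvO evaluates k(k-1)/2.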