
Question

Machine Learning Project. Part A is due at the end of class in week 5; Part B is due at midnight on 28 February in Canvas. Part B will be accepted up to 2 days after the deadline. A penalty of 10% will be imposed for each day it is late, irrespective of how much time has lapsed after midnight on the day it is turned in. No exceptions will be made. As with project 1, this project involves two parts, Part A and Part B. Part A is individual and will be completed in class on Tuesday or Wednesday, depending on when your section is scheduled.

Context and Background

The purpose of this project is to investigate the effectiveness of using image classification to improve search time in massive image galleries (e.g., the Shutterstock image library has some 50 million images). Two applications of image search are: 1) biometric authentication for customers who wish to get access to their financial records, and 2) people who are interested in searching for an image of a certain person in an image gallery. Given that image search is a very time-consuming process, the key question is what can be done to improve efficiency. One approach that immediately comes to mind is whether a hash filter will help. Hash filters have been used successfully to minimize search time in large textual databases, and hence it is natural to ask whether they could be used to accelerate search in image databases. This project works from the hypothesis that a hash filter is effective. The question then becomes: how exactly is a hash filter built? Basically, we have two choices: use landmark points on the face as components of the filter, or use high-level features such as Age, Gender and Race (assuming that the use of low-level features such as pixels would be hopelessly inefficient). In this project we shall take the latter approach.

Building the Hash Filter

Now that the decision has been taken to use demographics for the filter, we have to focus on the specifics of these demographic features. Gender, of course, can be divided into two categories: male and female. With respect to race, we can draw on standard classifications such as White Caucasian, Black, Indian, etc. This choice will depend on the availability of annotations as well as optimization decisions you will make downstream in this project. The handling of Age is entirely your own decision. One possibility would be to divide age into decades of life and impose a cap at the higher end (say 61 and above). If this approach is taken, 7 age categories emerge (this is only a suggestion, and your experimentation may reveal that a different categorization is more efficient; remember this is not an age-estimation problem per se, rather we are interested in filtering on age).

Organization of the Hash Filter

Once built, the filter will greatly reduce search overhead. For the moment we will assume that classification accuracy on each of the 3 variables is 100%. Then, if we use 7 categories for age, 5 for race and 2 for gender, we will on average (assuming a uniform distribution of images over these 3 variables) reduce the search effort by a factor of 70. To realize these time savings, the data must be organized into 70 hash buckets at the leaf level of the hierarchy, with each bucket containing the images corresponding to its definition. Thus, for example, one of the buckets will contain all females who are white and 31-40 years of age.
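To make the layout concrete, the sketch below maps an (Age, Race, Gender) triple to one of the 70 leaf buckets. The decade bins, the race/gender encodings and the feature order Age, then Race, then Gender are assumptions for illustration only; the project leaves all of these design choices to you.

```python
# A minimal sketch of the leaf-level bucket layout, assuming 7 decade-style
# age bins, 5 race categories and 2 gender categories (70 buckets in total).
# The bin boundaries, label encodings and feature order are placeholders;
# the project leaves these design choices to you.

AGE_BINS = [(0, 10), (11, 20), (21, 30), (31, 40), (41, 50), (51, 60), (61, 200)]
N_RACE = 5      # e.g. five race labels, as in the UTK Face annotations
N_GENDER = 2    # e.g. 0 = male, 1 = female

def age_to_bin(age: int) -> int:
    """Map a raw age to one of the 7 assumed bins."""
    for i, (lo, hi) in enumerate(AGE_BINS):
        if lo <= age <= hi:
            return i
    raise ValueError(f"age out of range: {age}")

def bucket_index(age_bin: int, race: int, gender: int) -> int:
    """Leaf bucket id (0-69) for the assumed feature order Age -> Race -> Gender."""
    return (age_bin * N_RACE + race) * N_GENDER + gender

# Example: a 35-year-old white (race=0) female (gender=1) hashes to one bucket
print(bucket_index(age_to_bin(35), race=0, gender=1))   # -> 31
```

With 100% accurate classifiers, every query would go straight to one of these 70 buckets, which is where the factor-of-70 reduction in search effort comes from.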
Because classification accuracy is not 100%, errors will result in some images not being found in the hash bucket predicted by the classification process. For example, a sample that has been predicted as a 31-40-year-old white male may actually correspond to a 21-30-year-old Asian male. This means that both the age and race features have been misclassified. For each classification error you will need to compute the error distance, which is the number of tree traversals between the predicted bucket and the bucket that contains the person's id. Traversal is always left to right and bottom to top. A generic example of this process, for 3 generic features (F1, F2 and F3) with binary domains, is illustrated below.

[Figure: a hash filter hierarchy built from binary features F1, F2 and F3, with buckets numbered 0-12.]

After the classification process the bucket indicated is 3. Each hash bucket is annotated by its hash key (the combination of F1, F2 and F3). The composite (overall) prediction for the sample is then compared with the hash key for bucket 3, and if they are the same then the error distance is 0. On the other hand, if the ground truth value for that sample has the hash key corresponding to bucket 11, then the error distance is 6. In practice, errors cause considerable overhead: an unnecessary search has to be done at the predicted bucket, and thereafter searches are done by left-to-right and bottom-to-top movement until the sample (image) is located in one of the buckets. Search overhead is directly proportional to the error distance, and that is why we use the error distance to weight errors. Note that you will not actually search records in buckets, but rather use the ground truth to establish search overhead in terms of error distance. Note also that in practice ground truth is not known for test samples and actual searches would need to be done; error distance is therefore a good proxy measure of the search overhead caused by incorrect classification. This backtracking process then raises an issue: in what order should the filter be defined? Should it be F1, F2, F3 or F1, F3, F2 or one of the other 4 possibilities? What influences this decision? Do the individual classification accuracy rates on the Age, Gender and Race features determine the order in which they should be sequenced?

Project Tasks

Task 1 (both partners working together)

Implement a simple CNN from scratch (use the custom-built CNN for the cats-vs-dogs classifier). Determine classification accuracy on the test dataset (use a 70/30 split) for each of the Age, Gender and Race features of the UTK Face dataset. Submit your Python code for this task. Use these accuracies as baseline values for Task 2.

Task 2 (partner 1)

On the basis of the baseline results produced in Task 1, determine the sequence in which the features should be used to build the hash filter. Justify your choice of feature sequence in the design of the hash filter. Determine the cost of each classification error at the leaves of the hash filter hierarchy tree. The error cost function (loss) is unique to this application. For each sample that is in error, use the ground truth value (the actual triple for Age, Gender and Race) and then determine the number of buckets to be searched to reach the target bucket according to the search procedure outlined above for the binary tree (siblings first, then parent, etc.). The number of buckets to be searched is the cost for that sample. The overall loss is then the sum of all loss values for samples that are misclassified.
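The following is a minimal sketch of the Task 2 loss computation, assuming each test sample has already been mapped to an integer leaf-bucket id for both its predicted and its ground-truth triple. The traversal_count helper simply counts leaf positions between the two buckets; this is only a placeholder, and you should replace it with the sibling-first, bottom-to-top traversal rule described above, applied to your own bucket numbering.

```python
# A minimal sketch of the Task 2 loss. Bucket ids are assumed to be integer
# leaf indices (0..69 for a 7 x 5 x 2 filter). The traversal rule used here
# (count leaf positions between predicted and true bucket) is a placeholder;
# substitute the sibling-first, bottom-to-top traversal of your actual
# hash filter hierarchy.

def traversal_count(pred_bucket: int, true_bucket: int) -> int:
    """Number of buckets searched beyond the predicted one (0 if correct)."""
    return abs(true_bucket - pred_bucket)   # placeholder traversal rule

def average_loss(pred_buckets, true_buckets) -> float:
    """Average error distance over the test set; correctly classified
    samples contribute a distance of 0."""
    distances = [traversal_count(p, t)
                 for p, t in zip(pred_buckets, true_buckets)]
    return sum(distances) / len(distances)

# Example: predicted vs ground-truth leaf buckets for five test samples
print(average_loss([3, 10, 7, 7, 42], [3, 11, 7, 9, 42]))   # -> 0.6
```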
2.1 Justify your choice of feature sequence in the design of the hash filter.

2.2 Determine the average loss value over the test dataset. Submit a Python script that automates the task of computing the loss; note that no actual searching is needed, all that is needed is to compute the tree distance between the predicted bucket and the actual bucket for each sample in error.

Task 3 (partner 2)

Develop a more sophisticated classifier than the one developed in Task 1. Use transfer learning to leverage the VGG16 CNN to classify the UTK Face dataset.

3.1 State the cut-off level at which weights are frozen in VGG16. Justify your answer.

3.2 Submit the Python code for developing the new classifier based on transfer learning.

Task 4 (individual answers from each partner)

4.1 What skills did you learn from this project? A paragraph is all that is required.

4.2 In your opinion, was transfer learning effective in this case? Justify your answer, whatever it may be.

In summary, what you need to do is:

1. Build baseline CNN models for Age, Race and Gender (a minimal sketch is given below).
2. Design the hash filter based on the results of step 1.
3. Compute the loss function, which is the average error distance over the test dataset.
4. Use transfer learning on VGG16 to improve the baselines from step 1 (a transfer-learning sketch is also given below).
5. Reflect on what you learnt in this project.
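For step 1 (the Task 1 baseline), here is a minimal sketch of a small CNN built from scratch with Keras, in the spirit of the custom cats-vs-dogs classifier. The input size, layer widths and the idea of training one model per feature are illustrative assumptions, not requirements.

```python
# A minimal from-scratch CNN baseline (Task 1), sketched with Keras.
# Train one such model per filter feature, e.g. n_classes = 2 for Gender,
# 5 for Race, 7 for the assumed age bins. Input size and layer widths are
# illustrative choices, not requirements.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_baseline_cnn(n_classes: int, img_size=(128, 128)):
    model = models.Sequential([
        layers.Input(shape=(*img_size, 3)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Example: gender baseline on a 70/30 train/test split of UTK Face
# gender_model = build_baseline_cnn(n_classes=2)
# gender_model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=10)
```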

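For step 4 (Task 3), the sketch below shows one common way to set up transfer learning on VGG16 with Keras ImageNet weights. Freezing every convolutional block is only an example cut-off; choosing and justifying the actual cut-off level is the point of question 3.1.

```python
# A minimal transfer-learning sketch for Task 3, using Keras' VGG16 with
# ImageNet weights. The cut-off shown here (all convolutional layers frozen)
# and the head sizes are assumptions for illustration; Task 3.1 asks you to
# choose and justify the cut-off yourself.

import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models

def build_transfer_model(n_classes: int, img_size=(224, 224)):
    base = VGG16(weights="imagenet", include_top=False,
                 input_shape=(*img_size, 3))
    base.trainable = False          # freeze every convolutional layer
    # To unfreeze only the last block instead, e.g.:
    # for layer in base.layers:
    #     layer.trainable = layer.name.startswith("block5")

    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# One model per filter feature, e.g. gender_model = build_transfer_model(2)
```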
 


