
Question

1 Formal Definition

Recall a pool-based teacher cannot create arbitrary data points. Instead, the teacher is given a "pool" of data points $P = \{(x_1, y_1), \ldots, (x_N, y_N)\}$, and the teacher must select pairs from the pool to form its teaching set $D$. For this homework, we allow $D$ to be a multiset (i.e. allow repeated items from $P$). Exact teaching is in general infeasible in pool-based teaching. Instead, the teacher aims to approximately teach the student the target model $g$. In theory, let $f = \mathrm{kNN}(D)$, where we treat kNN as a function (a learning algorithm) that takes a training set $D$ and outputs a classifier $f$. The teacher has a target classifier $g$ and a probability density function $p(x)$ over the input space $\mathcal{X}$. Let the 0-1 loss be $\ell(y, y') = 1$ if $y \neq y'$ and $0$ otherwise. Then the teacher can define the disagreement between $f$ and $g$ as

$$d(f, g) := \int_{\mathcal{X}} p(x)\,\ell(f(x), g(x))\,dx. \quad (1)$$

Clearly, $d(f, g) \in [0, 1]$, and $d(f, g) = 0$ if $f$ equals $g$ with probability one. We can now state the teaching problem as

$$D^* = \operatorname*{argmin}_{D \subseteq P} \; d(\mathrm{kNN}(D), g). \quad (2)$$

2 Approximating the Integral with Sampling

In practice, integrating over $\mathcal{X}$ can be difficult. Instead, we approximate the integral with sampling. For this, we need a large sample $Z = \{(x'_1, y'_1 = g(x'_1)), \ldots, (x'_m, y'_m = g(x'_m))\}$ where $x'_i \sim p(x)$. In general, $Z$ is distinct from the pool $P$, though the two could be the same. We can then approximate the integral by an average and define

$$\hat{d}(f, g) := \frac{1}{m} \sum_{i=1}^{m} \ell(f(x'_i), y'_i). \quad (3)$$

It will be the case that $\hat{d}(f, g) \approx d(f, g)$ when $m$ is large. Note $\hat{d}(f, g)$ is a random variable because it depends on the sample $Z$, while $d(f, g)$ is a constant. Given $Z$, the teacher can solve a slightly different problem:

$$\hat{D} = \operatorname*{argmin}_{D \subseteq P} \; \hat{d}(\mathrm{kNN}(D), g). \quad (4)$$

Again, there are at least two ways to solve this problem:

1. Enumeration. The teacher enumerates all subsets of $P$. Note a subset can be even smaller than $k$: kNN is well defined on any training set size. If a training set is smaller than $k$, kNN will simply do a majority vote on all training items.

2. Greedy. Given a partial teaching set $D$ (initially empty), the teacher enumerates all one-item extensions $D \cup \{(x, y)\}$ where $(x, y) \in P$. The teacher adds the best one-item extension $(x^*, y^*)$ to $D$, where

$$(x^*, y^*) = \operatorname*{argmin}_{(x, y) \in P} \; \hat{d}(\mathrm{kNN}(D \cup \{(x, y)\}), g). \quad (5)$$

This greedy process repeats so $D$ grows.

3 Adding a Teaching Cost

The teacher's objective so far is to guide the student close to $g$. Finally, we introduce the notion of a "teaching cost" function $c(D)$, which specifies how costly it is for the teacher to use a teaching set $D$. The new teaching problem is

$$\hat{D} = \operatorname*{argmin}_{D \subseteq P} \; \hat{d}(\mathrm{kNN}(D), g) + c(D). \quad (6)$$

$c(D)$ can be thought of as a "regularizer" in the data space that encourages cheap teaching sets. For this homework, we will consider a particularly simple teaching cost: $c(D) = 0$ if $|D| \leq n^*$, and $c(D) = \infty$ otherwise, for a given threshold $n^*$. This simply prevents $D$ from being larger than $n^*$. For enumeration, you will consider subsets of size at most $n^*$; for greedy, simply stop after size $n^*$.

4 Hand In

For this homework let $k = 1$ (1NN), $n^* = 20$, and $P = Z =$ hw5data.txt. Each row "x1 x2 y" is a data item with 2D feature $(x_1, x_2)$ and binary label $y$. Implement both enumeration and greedy. For each teaching set size, show:

1. the number of teaching sets of that size that you have to search through;
2. the number of seconds it takes for that size;
3. $\hat{d}(\mathrm{kNN}(D), g)$ of the best teaching set of that size;
4. for enumeration, plot the best teaching set $D$ in relation to $P$ (i.e. plot both, but use different symbols for $D$). For greedy, only plot one figure for the last teaching set $D$ of size $n^*$ in relation to $P$, but see if you can mark the order in which teaching items enter the greedy teaching set (e.g. use numbers instead of symbols to show items in $D$).

Again, you may not be able to enumerate teaching sets of large size, and that is OK.
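The following is a minimal Python sketch of the greedy teacher above, specialized to $k = 1$ and assuming hw5data.txt contains whitespace-separated rows "x1 x2 y" with $P = Z$ as in the hand-in section. The names (dhat, greedy_teacher) and the convention of returning disagreement 1.0 for an empty teaching set are illustrative assumptions, not part of the assignment statement.

```python
import numpy as np

def dhat(D, ZX, Zy):
    # Approximate disagreement d-hat(kNN(D), g) of Eq. (3), specialized to k = 1.
    if len(D) == 0:
        return 1.0  # convention for an empty teaching set (assumption, not from the assignment)
    DX = np.array([x for x, _ in D])
    Dy = np.array([y for _, y in D])
    # squared Euclidean distance from every sample point in Z to every teaching item in D
    dists = ((ZX[:, None, :] - DX[None, :, :]) ** 2).sum(axis=-1)
    preds = Dy[dists.argmin(axis=1)]          # 1NN label for each sample point
    return float((preds != Zy).mean())        # fraction of disagreements with g

def greedy_teacher(PX, Py, ZX, Zy, n_star=20):
    # Repeatedly add the pool item minimizing d-hat(kNN(D u {(x, y)}), g), per Eq. (5),
    # stopping once |D| reaches n_star (the teaching-cost threshold of Section 3).
    D, history = [], []
    pool = list(zip(PX, Py))
    for _ in range(n_star):
        best = min(pool, key=lambda item: dhat(D + [item], ZX, Zy))
        D.append(best)
        history.append(dhat(D, ZX, Zy))
    return D, history

if __name__ == "__main__":
    data = np.loadtxt("hw5data.txt")           # columns: x1 x2 y
    X, y = data[:, :2], data[:, 2]
    D, history = greedy_teacher(X, y, X, y)    # P = Z for this homework
    for size, err in enumerate(history, start=1):
        print(size, err)                       # d-hat after each greedy step
```

The enumeration teacher would instead evaluate dhat on every subset of $P$ up to size $n^*$ (e.g. via itertools.combinations), which is why large sizes may be infeasible to enumerate.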

