Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Sep 11, 2024

Stochastic gradient descent 1 point possible ( graded ) Even if we can fit a large data set in the memory of our computer, computing

Stochastic gradient descent

1

point possible

(

graded

)

Even if we can fit a large data set in the memory of our computer, computing the gradient of requires iterating over every data point. This can be a considerably lengthy process, and must be repeated often for each step in the gradient descent process. Instead, we seek to reduce the number of data points that are required for computing the gradient. We will do this by taking a subsample of the data.

Notice that we can freely rescale the loss function by a multiplicative constant:

This looks a lot like a mean. Therefore, if we define our data set to be a population, then we can write the loss function as an expectation over this population:

We can then estimate the update step by using the gradient for only one data point,

The data point to use should be chosen randomly at each iteration step.

As long as we choose uniformly from the data set, we find that this estimator is unbiased:

This is known as stochastic gradient descent.

In order to ensure that we use all available data, we should choose the maximum number of iterations to be several times larger than the data set. There are also variations on this scheme where the data point used at each step is selected sequentially an array of data points. In that case, the update rule would be

A sequential stochastic gradient descent can be useful if selecting random data points on the fly is computationally expensive. This can occur for some storage mediums, such as hard drives, which have a slow random access time. By reading the data sequentially, we improve performance by ordering our read requests to the storage medium.

In the sequential variation of stochastic gradient descent, how should we initialize this array of data points to minimize the number of iterations required to converge

(

i

.

e

.

increase the algorithmic performance of SGD itself

) ?

In the order that the data points were observed experimentally.

In random order.

The initialization strategy for the array of data points does not matter.

unanswered

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Professional IPhone And IPad Database Application Programming

Professional IPhone And IPad Database Application Programming

Authors: Patrick Alessi

1st Edition

0470636173, 978-0470636176

More Books

Students also viewed these Databases questions

Question

★★★★★

Tres Company manufactures one product. Standard costs for each unit of the product follow: Materials: 5 gallons at $1.15..............................$ 5.75 Direct labor: 1 hour at...

Answered: 1 week ago

Question

★★★★★

Review the GLOBE research. Take a particular culture (or set of cultures) you are familiar with and analyze how effectively you believe the GLOBE research captures the nuances of this culture(s)....

Answered: 1 week ago

Question

★★★★★

=+how they could be specifically applied.

Answered: 1 week ago

Question

★★★★★

Briefly describe some of the similarities and differences between U.S. GAAP and iGAAP with respect to the accounting for leases.

Answered: 1 week ago

Question

★★★★★

Stochastic gradient descent 1 point possible ( graded ) Even if we can fit a large data set in the memory of our computer, computing the gradient of requires iterating over every data point. This can...

Answered: 1 week ago

Question

★★★★★

The Elkmont Corporation needs to raise $63.8 million to finance its expansion into new markets. The company will sell new shares of equity via a general cash offering to raise the needed funds. The...

Answered: 1 week ago

Question

★★★★★

During July, the following are given: Standard unit price 12.50 Standard Quantity Allowed for actual production 6,300 units Actual unit purchase price Quantity purchased and used for production...

Answered: 1 week ago

Question

★★★★★

An RFP is used for what type of procurement? Give an advantage and a disadvantage of the RFP process

Answered: 1 week ago

Question

★★★★★

You are the area manager of several starbucks and you make all of the decisions regarding land, building, and equipment. Additionally you are responsible for all of the profits of the starbucks in...

Answered: 1 week ago

Question

★★★★★

Question 26 2 Points Dalal faces a reoccurring situation every month, introducing a new product. Accordingly, she created a complex integration mechanism, a permanent committee responsible for...

Answered: 1 week ago

Question

★★★★★

An electronics company headquartered in India, manufacturer & supplier of semiconductor chips to automobile companies across Europe, US and Asia- the Pacific. Looking at the product market demand...

Answered: 1 week ago

Question

★★★★★

Teamwork. As your instructor directs, role-play one or more of the following situations in which you are required to give constructive feedback. (Objective 3) a. You are the leader of a...

Answered: 1 week ago

Question

★★★★★

Teamwork. Recall a recent conflict in which you were involved. How did you behave? Did you allow the other person to treat you as if you had little or no value (passively)? Did you treat the other...

Answered: 1 week ago

Question

★★★★★

What are some of the negative consequences a business may suffer as a result of unresolved conflicts between employees? (Objective 4)

Answered: 1 week ago

Previous Question Next Question