Question

I have the MATLAB code below for the attached question. I need help running this in Python. Can I get some help translating it into Python code?

% Poisson penalty
W = rand(V,K);
H = rand(K,N);
obj = [];
for ite = 1:150
    Wn = W./repmat(sum(W,1),V,1);
    X2 = X./(W*H);
    H = H.*(Wn'*X2) + eps;
    Hn = H./repmat(sum(H,2),1,N);
    X2 = X./(eps+W*H);
    W = W.*(X2*Hn') + eps;
    obj(end+1) = sum(sum(X2)) - sum(sum(X.*log(X2+eps)));
end
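A minimal NumPy sketch of the MATLAB loop above, offered as one possible translation rather than a verified solution. The function name `nmf_divergence` and its parameters are hypothetical. Two choices here are assumptions that differ from the MATLAB lines: W and H are initialized from Uniform(1,2) as the problem statement suggests (the MATLAB code uses `rand`), and the tracked objective is the usual divergence objective, sum(WH - X log(WH)), instead of the expression on the `obj(end+1)` line. `Wn.T` and `Hn.T` are the transposes needed for the matrix shapes to agree.

```python
import numpy as np


def nmf_divergence(X, K, n_iter=100, eps=1e-16, seed=0):
    """NMF with the divergence (Poisson) penalty via multiplicative updates.

    X is a nonnegative (V x N) array. Returns W (V x K), H (K x N) and the
    list of objective values, one per iteration.
    """
    rng = np.random.default_rng(seed)
    V, N = X.shape
    # ASSUMPTION: Uniform(1, 2) initialization, as suggested in the problem text.
    W = rng.uniform(1.0, 2.0, size=(V, K))
    H = rng.uniform(1.0, 2.0, size=(K, N))
    obj = []
    for _ in range(n_iter):
        # H update: normalize the columns of W, then H <- H * (Wn^T (X / WH)).
        Wn = W / W.sum(axis=0, keepdims=True)
        R = X / (W @ H + eps)
        H = H * (Wn.T @ R) + eps

        # W update: normalize the rows of H, then W <- W * ((X / WH) Hn^T).
        Hn = H / H.sum(axis=1, keepdims=True)
        R = X / (W @ H + eps)
        W = W * (R @ Hn.T) + eps

        # ASSUMPTION: standard divergence objective (constants in X dropped).
        WH = W @ H
        obj.append(float(np.sum(WH - X * np.log(WH + eps))))
    return W, H, obj
```

Calling `W, H, obj = nmf_divergence(X, K=25, n_iter=100)` and then plotting `obj` against the iteration index (for example with matplotlib's `plt.plot(obj)`) would give the curve that part (a) asks for.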

Question:

In this problem you will factorize an N x M matrix X into a rank-K approximation WH, where W is N x K, H is K x M, and all values in the matrices are nonnegative. Each value in W and H can be initialized randomly to a positive number, e.g., from a Uniform(1,2) distribution.

The data to be used for this problem consists of 8447 documents from The New York Times. (See below for how to process the data.) The vocabulary size is 3012 words. You will need to use this data to construct the matrix X, where Xij is the number of times word i appears in document j. Therefore, X is 3012 x 8447 and most values in X will equal zero.

a) Implement and run the NMF algorithm on this data using the divergence penalty. Set the rank to 25 and run for 100 iterations. This corresponds to learning 25 topics. Plot the objective as a function of iteration.

b) After running the algorithm, normalize the columns of W so they sum to one. For each column of W, list the 10 words having the largest weight and show the weight. The ith row of W corresponds to the ith word in the "dictionary" provided with the data. Organize these lists in a 5 x 5 table.

Comments about Problem 2:

You can add a very small number (e.g., 10^-16) to the denominator to avoid dividing by zero.

Each row in nyt-data.txt corresponds to a single document. It gives the index of words appearing in that document and the number of times they appear. It uses the format "idx:cnt" with commas separating each unique word in the document. Any index that doesn't appear in a row has a count of zero for that word. The vocabulary word corresponding to each index is given in the corresponding row of nyt vocab.dat.
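The data format and part (b) described above could be handled along the following lines. This is a sketch under stated assumptions: the helper names `build_count_matrix` and `top_words` are hypothetical, and word indices in the data file are assumed to be 1-based, which the problem text does not state explicitly.

```python
import numpy as np


def build_count_matrix(lines, vocab_size):
    """Build X (vocab_size x n_docs) from rows of the form 'idx:cnt,idx:cnt,...'.

    Each element of `lines` is one document. Indices that never appear in a
    row keep a count of zero.
    """
    X = np.zeros((vocab_size, len(lines)))
    for j, line in enumerate(lines):
        for pair in line.strip().split(","):
            if not pair:
                continue  # tolerate an empty row
            idx, cnt = pair.split(":")
            # ASSUMPTION: indices in the file start at 1, so shift to 0-based.
            X[int(idx) - 1, j] = float(cnt)
    return X


def top_words(W, vocab, n_top=10):
    """Normalize the columns of W to sum to one and list the heaviest words.

    Returns one list per topic (column of W), each holding (word, weight)
    pairs sorted from largest to smallest weight.
    """
    Wn = W / W.sum(axis=0, keepdims=True)
    topics = []
    for k in range(Wn.shape[1]):
        order = np.argsort(Wn[:, k])[::-1][:n_top]
        topics.append([(vocab[i], float(Wn[i, k])) for i in order])
    return topics
```

With the real files, `lines` would come from reading the data file line by line and `vocab` from reading the vocabulary file line by line; the 25 lists returned by `top_words` can then be laid out as the 5 x 5 table. A dense 3012 x 8447 float array takes roughly 200 MB, so a sparse representation may be worth considering if memory is tight.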
