Question

1 Approved Answer

Posted on Aug 25, 2024


Please help write the algorithm.

Let's take a look at how this works for a dataset about Linear Algebra. ProofWiki is a website that has mathematical definitions and proofs. Because they are laid out in Wikipedia format, it is relatively easy to scrape the pages and create a network. In this case, let's look at a network which includes all the mathematical definitions on the ProofWiki website. An edge points from node a to node b if definition a has node b in its description. For example, the article on Linear Span has the text: "The linear span can be interpreted as the set of all linear combinations (of finite length) of these vectors." This means that Linear Span has edges out to Linear Combination, Finite, and Vector. In this case, to make it smaller and faster, I've limited it to the definitions that are contained within the topic of Linear Algebra.

Load this network (to do this, make sure you download the proofwikidefs_la.gml file into the same directory as your Python file):

    G = zen.io.gml.read('proofwikidefs_la.gml', weight_fxn=lambda x: x['weight'])

We can use a variety of formats, but the GML format is a useful one to keep the full names of the nodes if the node names have spaces in them. The last part is some "advanced" Python which defines a function in place. It describes how the value of the edge weights should be read in. In this case, there is a property in the GML file called "weight" (sometimes it is called something different, e.g., value).

Create a function to compute the new cocitation network (and then ultimately the cocitation matrix), assuming an original network that is weighted (the text describes how to incorporate weights into the cocitation calculation). The simple way is to weight the cocitation network using just the count of common in-bound neighbors. You may want to do this first and then develop the weighted extension. You will want to look at the functions G.in_neighbors() and G.weight().

One subtlety to be aware of: in zen, if you add an edge with zero weight, it is still considered an edge, so it will be used to find (in/out)-neighbors of nodes. Later in the course, we'll see some algorithms on graphs in which defining zero weights can be useful. This is a case where the graph has additional information beyond what the adjacency matrix contains. Here, we don't want zero-weight edges, so be careful not to add them!

    def cocitation(G):
        # write algorithm here
        return G_cocitation

You'll need to create a new Graph object and then build it up according to the rules of the cocitation graph. Then use this function to calculate the cocitation matrix of the ProofWiki network. Compare the matrix of this cocitation network with the formula in the book, C = AA^T (recall that you need to switch A to A.transpose()). In NumPy (one of the math modules in Python), to calculate a matrix multiplication, you call the function numpy.dot(A, B):

    C1 = numpy.dot(A.transpose(), A)
    C2 = G_cocitation.matrix()
    Cdiff = C1 - C2
    print('Difference between cocitation methods: %i' % Cdiff.sum().sum())
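The weighted cocitation rule described in the task can be sketched without zen, which may make the logic easier to check before porting it. The following is only an illustration under stated assumptions: the function name cocitation_weights and the (source, target, weight) edge-list input are hypothetical and are not part of the zen API. The cocitation weight of a pair (i, j) is taken to be the sum, over every common in-neighbor k, of weight(k -> i) * weight(k -> j), and zero-weight edges are skipped as the task warns. In the real assignment, the in-neighbor lookup and the weights would presumably come from G.in_neighbors and G.weight instead.

```python
from collections import defaultdict


def cocitation_weights(edges):
    """Return {(i, j): weight} for i <= j, given (source, target, weight) triples.

    Hypothetical helper for illustration; not a zen function.
    """
    # in_nbrs[i][k] = weight of the edge k -> i (k cites i)
    in_nbrs = defaultdict(dict)
    for src, tgt, w in edges:
        if w == 0:
            continue  # a zero-weight edge must not count as a citation
        in_nbrs[tgt][src] = w

    nodes = sorted(in_nbrs)
    result = {}
    for pos, i in enumerate(nodes):
        for j in nodes[pos:]:  # includes j == i, i.e. the diagonal
            common = set(in_nbrs[i]) & set(in_nbrs[j])
            w = sum(in_nbrs[i][k] * in_nbrs[j][k] for k in common)
            if w != 0:
                result[(i, j)] = w
    return result


# Toy network: a and d both cite b and c; the d -> e edge has zero weight.
toy_edges = [
    ("a", "b", 2), ("a", "c", 3),
    ("d", "b", 1), ("d", "c", 1),
    ("d", "e", 0),
]
toy_cocitation = cocitation_weights(toy_edges)
print(toy_cocitation)
```

Note that the diagonal entries (a node paired with itself) are kept, since the matrix product used for comparison also has a nonzero diagonal. Setting every weight to 1 reduces this to the simple count of common in-bound neighbors.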
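To see why the transpose matters in the comparison, here is a small pure-Python check on a toy matrix. It assumes the adjacency matrix stores the weight of edge i -> j at A[i][j], which is what the transpose remark in the task suggests but is not confirmed here. The helpers transpose and matmul are hypothetical stand-ins written for this sketch; with NumPy the same products would be numpy.dot(A.transpose(), A) and numpy.dot(A, A.transpose()).

```python
def transpose(M):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(X, Y):
    """Plain matrix product of two list-of-rows matrices."""
    Y_cols = transpose(Y)
    return [[sum(x * y for x, y in zip(row, col)) for col in Y_cols] for row in X]


# Toy adjacency matrix, nodes ordered a, b, c, d.
# Assumed convention: A[i][j] is the weight of the edge i -> j.
A = [
    [0, 2, 3, 0],  # a -> b (weight 2), a -> c (weight 3)
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],  # d -> b (weight 1), d -> c (weight 1)
]

C_in = matmul(transpose(A), A)    # entry (i, j) sums A[k][i] * A[k][j]: shared in-neighbors
C_out = matmul(A, transpose(A))   # entry (i, j) sums A[i][k] * A[j][k]: shared out-neighbors

for row in C_in:
    print(row)
```

Under this convention only the first product links b and c, the two nodes cited by both a and d. The second product instead links a and d, the two nodes that cite the same targets, so using it would compare against the wrong matrix.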