Question

You will be writing a few stored procedures in SQL Server to analyze a graph data set.
The data set contains citation information for about 5000 papers from the arXiv high-energy physics theory paper archive, with around 14,400 citations between those papers. It consists of two database tables:
nodes (paperID, paperTitle);
edges (paperID, citedPaperID);
The first table gives a unique paper identifier, as well as the paper title. The second table indicates citations between the papers (note that citations have a direction).
Your task is to write stored procedures that analyze this data.
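
To make the two-table layout concrete, here is a minimal Python sketch of how the rows relate. It is only an illustration, not the T-SQL the assignment asks for. The sample rows and the names `titles`, `cites`, and `cited_by` are made up for this example; only the column order (paperID, paperTitle) and (paperID, citedPaperID) comes from the description above.

```python
from collections import defaultdict

# Hypothetical sample rows; the real tables hold about 5000 papers.
nodes = [(1, "Paper A"), (2, "Paper B"), (3, "Paper C")]  # (paperID, paperTitle)
edges = [(1, 2), (3, 2)]                                  # (paperID, citedPaperID)

# Look up a title by paper identifier.
titles = dict(nodes)

# Citations are directed: keep both directions separately.
cites = defaultdict(set)      # paperID -> papers it cites
cited_by = defaultdict(set)   # paperID -> papers that cite it
for paper_id, cited_paper_id in edges:
    cites[paper_id].add(cited_paper_id)
    cited_by[cited_paper_id].add(paper_id)

for paper_id, title in nodes:
    print(paper_id, title, "cites", sorted(cites[paper_id]),
          "cited by", sorted(cited_by[paper_id]))
```

Treating the graph as undirected amounts to merging the two maps, so that each paper's neighbors are the union of the papers it cites and the papers that cite it.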
1.1 Connected Components
You will first write a stored procedure that treats the graph as being undirected (that is, do not worry about the direction of citation) and finds all connected components in the graph that have more than four and at most ten papers, printing out the associated lists of paper titles. My implementation found eight such connected components in the data set.
To refresh your memory, a connected component is a subgraph such that there exists a path between each pair of nodes in the subgraph. Such a subgraph must be maximal in the sense that it is not possible to add any additional nodes that are connected to any node in the subgraph.
The standard method for computing a connected component is a simple breadth-first search. Pick a random starting node, and then search for all nodes reachable from the starting node, then search for all nodes reachable from all of those nodes, and then search for all of the nodes reachable from those nodes, and so on, until no new nodes are found. The entire set of discovered nodes is a connected component. If there are any nodes that are not part of any connected component analyzed so far, then pick one of those nodes, and restart the computation. You are done when all of the nodes are part of exactly one connected component.
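
The breadth-first search described above can be sketched in Python as follows. This is an illustration of the algorithm only, not the required stored procedure. The function name `connected_components`, the `adjacency` dictionary, and the toy graph are assumptions made for this example; the adjacency map is assumed to be undirected already, so each edge appears in both directions.

```python
from collections import deque

def connected_components(adjacency):
    """Return a list of components, each a set of node IDs.

    `adjacency` maps every node to the set of its undirected neighbors.
    """
    unvisited = set(adjacency)
    components = []
    while unvisited:
        start = unvisited.pop()        # arbitrary node not yet in any component
        component = {start}
        frontier = deque([start])
        while frontier:                # expand until no new nodes are found
            node = frontier.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    frontier.append(neighbor)
        unvisited -= component         # these nodes are now assigned
        components.append(component)
    return components

# Hypothetical toy graph: a path 1-2-3, a pair 4-5, and an isolated node 6.
adjacency = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}, 6: set()}
components = connected_components(adjacency)
print(sorted(sorted(c) for c in components))
```

Every node ends up in exactly one component because a node is removed from `unvisited` as soon as the search that reaches it finishes. In a SQL Server setting the same loop would presumably operate on a working table of (paperID, componentID) rows rather than on in-memory sets.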
Your program should first compute all of the connected components, and then print out all of the connected components that are larger than size four, and no larger than size ten. When you print out the components, print each paper ID as well as the title.
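
The size filter and the printing step can be sketched the same way. Again this is a hedged Python illustration rather than the T-SQL solution. The function `report_components`, its parameters, the output layout, and the sample data are all assumptions for this example. "Larger than size four, and no larger than size ten" is read as 5 to 10 papers inclusive.

```python
def report_components(components, titles, min_size=5, max_size=10):
    """Return printable lines for components with min_size..max_size papers."""
    lines = []
    selected = [c for c in components if min_size <= len(c) <= max_size]
    for index, component in enumerate(selected, start=1):
        lines.append(f"Component {index} ({len(component)} papers)")
        for paper_id in sorted(component):
            # Print each paper ID together with its title.
            lines.append(f"  {paper_id}\t{titles[paper_id]}")
    return lines

# Hypothetical data: components of size 4, 5, and 11; only the second qualifies.
titles = {i: f"Paper {i}" for i in range(1, 21)}
components = [{1, 2, 3, 4}, {5, 6, 7, 8, 9}, set(range(10, 21))]
report = report_components(components, titles)
print("\n".join(report))
```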

