Question
You will be writing a few stored procedures in SQL Server to analyze a graph data set.
The data set to analyze contains citation information for papers from the arXiv high-energy physics theory paper archive, along with the citations between those papers. The data set comprises two database tables:
nodes (paperID, paperTitle);
edges (paperID, citedPaperID);
The first table gives a unique paper identifier, as well as the paper title. The second table indicates citations between the papers (note that citations have a direction).
Your task is to write stored procedures that analyze this data.
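To make the two tables and the direction of a citation concrete, here is a minimal sketch that builds them in an in-memory SQLite database from Python. This is only an illustration: the assignment targets SQL Server, the column types are assumptions (the problem statement gives only the column names), the three sample rows are made up, and the helper names `papers_cited_by` and `papers_citing` are hypothetical.

```python
import sqlite3

# In-memory stand-in for the real database (assumption: integer IDs, text titles).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (paperID INTEGER PRIMARY KEY, paperTitle TEXT);
    CREATE TABLE edges (paperID INTEGER, citedPaperID INTEGER);
""")

# Made-up sample data: papers 1 and 3 both cite paper 2.
conn.executemany("INSERT INTO nodes VALUES (?, ?)",
                 [(1, "Paper A"), (2, "Paper B"), (3, "Paper C")])
conn.executemany("INSERT INTO edges VALUES (?, ?)", [(1, 2), (3, 2)])


def papers_cited_by(conn, paper_id):
    """Return the IDs of the papers that paper_id cites (outgoing edges)."""
    rows = conn.execute(
        "SELECT citedPaperID FROM edges WHERE paperID = ? ORDER BY citedPaperID",
        (paper_id,),
    )
    return [row[0] for row in rows]


def papers_citing(conn, paper_id):
    """Return the IDs of the papers that cite paper_id (incoming edges)."""
    rows = conn.execute(
        "SELECT paperID FROM edges WHERE citedPaperID = ? ORDER BY paperID",
        (paper_id,),
    )
    return [row[0] for row in rows]
```

Because an edge row is directed, the two queries give different answers for the same paper: in the sample data, paper 2 cites nothing but is cited by papers 1 and 3.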
Connected Components
You will first write a stored procedure that treats the graph as undirected (that is, do not worry about the direction of citation) and finds all connected components in the graph that have more than four and at most ten papers, printing out the associated lists of paper titles. My implementation found eight such connected components in the data set.
To refresh your memory, a connected component is a subgraph such that there exists a path between each pair of nodes in the subgraph. Such a subgraph must be maximal in the sense that it is not possible to add any additional nodes that are connected to any node in the subgraph.
The standard method for computing a connected component is a simple breadth-first search. Pick a random starting node, find all nodes adjacent to the starting node, then all nodes adjacent to those nodes, and so on until no new nodes are found. The entire set of discovered nodes is a connected component. If there are any nodes that are not part of any connected component found so far, pick one of those nodes and restart the computation. You are done when every node is part of exactly one connected component.
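The breadth-first procedure described above can be sketched as follows. This is an illustration in Python rather than the T-SQL stored procedure the assignment asks for. The function name `connected_components` and its inputs are hypothetical: a list of paper IDs and a list of (paperID, citedPaperID) pairs, mirroring the two tables.

```python
from collections import deque


def connected_components(node_ids, edges):
    """Partition the nodes into connected components using repeated BFS.

    node_ids: iterable of paper IDs.
    edges: iterable of (paperID, citedPaperID) pairs; direction is ignored.
    Returns a list of sets, one set of paper IDs per component.
    """
    # Build an undirected adjacency map by storing each edge in both directions.
    adjacency = {node: set() for node in node_ids}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)

    unvisited = set(adjacency)
    components = []
    while unvisited:
        start = unvisited.pop()  # arbitrary node not yet in any component
        component = {start}
        frontier = deque([start])
        while frontier:
            current = frontier.popleft()
            for neighbor in adjacency[current]:
                if neighbor in unvisited:
                    unvisited.remove(neighbor)
                    component.add(neighbor)
                    frontier.append(neighbor)
        components.append(component)
    return components
```

A paper with no citations in either direction ends up as a component of size one. A set-based SQL version would typically expand the whole frontier in a single statement per round instead of visiting one node at a time, but the stopping condition is the same: a round that discovers no new nodes.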
Your program should first compute all of the connected components, and then print out all of the connected components that are larger than size four, and no larger than size ten. When you print out the components, print each paper ID as well as the title.
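The size filter and the printing step could look like the sketch below, again in Python for illustration only. The names `select_components` and `format_component` are hypothetical, and the components are assumed to be sets of paper IDs with a separate dictionary mapping each paper ID to its title.

```python
def select_components(components, min_exclusive=4, max_inclusive=10):
    """Keep components with more than four and at most ten papers."""
    return [c for c in components if min_exclusive < len(c) <= max_inclusive]


def format_component(component, titles):
    """Return one 'paperID<TAB>title' line per paper, ordered by paper ID."""
    return [f"{paper_id}\t{titles[paper_id]}" for paper_id in sorted(component)]


def print_report(components, titles):
    """Print each selected component as a block of ID and title lines."""
    for index, component in enumerate(select_components(components), start=1):
        print(f"Component {index} ({len(component)} papers)")
        for line in format_component(component, titles):
            print(line)
        print()
```

Note the asymmetric bounds: a component of exactly four papers is excluded, while a component of exactly ten is included.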