Question
We will study a political blog dataset first compiled for the paper: Lada A. Adamic and Natalie Glance,
"The political blogosphere and the 2004 US Election", in Proceedings of the WWW Workshop on the
Weblogging Ecosystem. It is assumed that blog sites with the same political orientation are more
likely to link to each other, thus forming a community or cluster in a graph. In this question, we will
see whether or not this hypothesis is likely to be true based on the data.
The dataset nodes.txt contains a graph with n vertices (nodes) corresponding to political
blogs.
The dataset edges.txt contains the edges between the vertices. You may remove isolated nodes (nodes
that are not connected to any other node) during preprocessing.
We will treat the network as an undirected graph; thus, when constructing the adjacency matrix, make
it symmetric, e.g., by setting the entry in the adjacency matrix to one whenever there is an edge between
the two nodes in either direction.
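The preprocessing described above can be sketched as follows. This is a minimal illustration, not the assignment's reference solution: the function names `build_adjacency` and `drop_isolated` are hypothetical, node ids are assumed to be 0-based integers, self-loops are assumed to be ignored, and the toy edge list is made up rather than read from edges.txt.

```python
import numpy as np

def build_adjacency(edges, n):
    """Symmetric 0/1 adjacency matrix from (source, target) pairs.

    Assumes 0-based integer node ids; shift the ids first if the
    data file numbers nodes from 1.
    """
    A = np.zeros((n, n), dtype=int)
    for i, j in edges:
        if i != j:  # assumption: self-loops are ignored
            A[i, j] = 1  # an edge in either direction ...
            A[j, i] = 1  # ... sets both symmetric entries
    return A

def drop_isolated(A):
    """Remove nodes with degree zero; also return the kept original indices."""
    keep = np.flatnonzero(A.sum(axis=1) > 0)
    return A[np.ix_(keep, keep)], keep

# Made-up toy edge list: node 3 has no edges and is removed.
edges = [(0, 1), (1, 0), (2, 1)]
A, keep = drop_isolated(build_adjacency(edges, n=4))
print(A)
print(keep)
```

Keeping the `keep` index array matters later: the true labels have to be filtered with the same indices so that they stay aligned with the rows of the reduced matrix.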
In addition, each vertex has a label (in the 3rd column of the data file) corresponding to the true
political orientation of that blog. We will treat this as the true label and check whether spectral clustering
groups nodes with the same political orientation together as much as possible.
points Use spectral clustering to find the k clusters in the network of political blogs
(each node is a blog, and the edges are defined in the file edges.txt). Find the majority label (the same
idea as the purity score from the image compression problem) in each cluster, for each of the different k values.
For example, if there are k clusters, and their labels are and then the majority
label for the first cluster is and for the second cluster is. You are required to implement the
algorithms yourself rather than calling them from a package.
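One possible from-scratch pipeline is sketched below, using NumPy only for linear algebra. Several choices here are assumptions, since the problem statement does not fix them: the embedding uses the k leading eigenvectors of the normalized adjacency matrix D^{-1/2} A D^{-1/2} with row normalization, and k-means starts from a deterministic farthest-point initialization, chosen only to make the sketch reproducible (random restarts are a common alternative). All function names are hypothetical.

```python
import numpy as np

def farthest_point_init(X, k):
    """Deterministic init: start at row 0, then repeatedly add the farthest row."""
    centers = [X[0]]
    for _ in range(1, k):
        # squared distance from every row to its nearest chosen center
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers)

def kmeans(X, k, n_iter=100):
    """Plain Lloyd iterations; an emptied cluster keeps its old center."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (n, k)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_clustering(A, k):
    """Cluster the nodes of a symmetric adjacency matrix with no isolated nodes."""
    deg = A.sum(axis=1).astype(float)
    d_inv_sqrt = 1.0 / np.sqrt(deg)  # would divide by zero for isolated nodes
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(M)   # eigenvalues come back in ascending order
    Z = eigvecs[:, -k:]              # eigenvectors of the k largest eigenvalues
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)  # row-normalize
    return kmeans(Z, k)

# Made-up toy graph: two triangles joined by the single edge (2, 3).
A = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
labels = spectral_clustering(A, 2)
print(labels)  # nodes 0-2 and nodes 3-5 are expected to fall in different clusters
```

On the real graph the same call would be repeated for each required value of k, using the adjacency matrix with isolated nodes already removed.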
Now compare the majority label with the individual labels in each cluster, and report the mismatch
rate (also known as the misclassification rate) for each cluster, when k For instance, in
the example above, the mismatch rate for the first cluster is (only the first node differs from the
majority) and for the second cluster is
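The per-cluster computation can be sketched in a few lines of standard-library Python. The function name `cluster_mismatch` is hypothetical, and the labels in the usage example are made up for illustration; they are not the values from the example in the problem statement.

```python
from collections import Counter

def cluster_mismatch(true_labels, cluster_ids):
    """Return {cluster_id: (majority_label, mismatch_rate)}.

    The mismatch rate of a cluster is the fraction of its nodes whose true
    label differs from the cluster's majority label. On a tie, the label
    that appears first in the cluster is taken as the majority.
    """
    groups = {}
    for y, c in zip(true_labels, cluster_ids):
        groups.setdefault(c, []).append(y)
    result = {}
    for c, ys in groups.items():
        majority, count = Counter(ys).most_common(1)[0]
        result[c] = (majority, 1 - count / len(ys))
    return result

# Made-up labels: cluster 0 holds 4 nodes, cluster 1 holds 5 nodes.
true_labels = [1, 1, 0, 1, 0, 0, 0, 1, 0]
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1, 1]
result = cluster_mismatch(true_labels, cluster_ids)
# cluster 0: majority label 1, one of four nodes mismatched (rate 1/4)
# cluster 1: majority label 0, one of five nodes mismatched (rate 1/5)
print(result)
```

An overall mismatch rate for a given k can be obtained by weighting each cluster's rate by its size, which is the same as dividing the total number of mismatched nodes by the number of nodes.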
points Tune your k and find the number of clusters that achieves a reasonably small mismatch rate.
Please explain how you tune k and what the achieved mismatch rate is. Please explain intuitively what
this result tells us about the network community structure.