Question

Posted on Feb 29, 2024

I am stuck on a spectral clustering problem in Python.

This is the question:

Political blogs dataset

We will study a political blog dataset first compiled for the paper Lada A. Adamic and Natalie Glance, "The political blogosphere and the 2004 US Election", in Proceedings of the WWW-2005 Workshop on the Weblogging Ecosystem (2005). It is assumed that blog sites with the same political orientation are more likely to link to each other, thus forming a "community" or "cluster" in a graph. In this question, we will see whether or not this hypothesis is likely to be true based on the data.

1. The dataset nodes.txt contains a graph with n = 1490 vertices ("nodes") corresponding to political blogs.

2. The dataset edges.txt contains edges between the vertices. You may remove isolated nodes (nodes that are not connected to any other nodes) in the pre-processing.

We will treat the network as an undirected graph; thus, when constructing the adjacency matrix, make it symmetric by, e.g., setting the entry in the adjacency matrix to one whenever there is an edge between the two nodes (in either direction).
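A minimal sketch of the pre-processing described above: it builds a symmetric 0/1 adjacency matrix and then drops isolated nodes. The helper names `build_adjacency` and `drop_isolated` are hypothetical, and the code assumes that node IDs in edges.txt start at 1 and that self-loops can be ignored; neither is stated in the assignment, so check both against the actual files.

```python
import numpy as np

def build_adjacency(edges, n):
    """Symmetric 0/1 adjacency matrix from an (m, 2) array of node IDs."""
    A = np.zeros((n, n), dtype=int)
    src = edges[:, 0] - 1   # assumption: IDs are 1-based
    dst = edges[:, 1] - 1
    A[src, dst] = 1
    A[dst, src] = 1         # an edge in either direction counts
    np.fill_diagonal(A, 0)  # assumption: self-loops are ignored
    return A

def drop_isolated(A):
    """Remove degree-zero nodes; return the reduced matrix and kept indices."""
    keep = np.flatnonzero(A.sum(axis=1) > 0)
    return A[np.ix_(keep, keep)], keep

# Toy graph: 5 nodes, node 5 has no edges, and edge 1-2 is listed twice.
toy_edges = np.array([[1, 2], [2, 1], [2, 3], [4, 1]])
A, keep = drop_isolated(build_adjacency(toy_edges, n=5))
```

The `keep` array matters later: the true labels must be indexed with the same array so that they stay aligned with the rows of the reduced matrix.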

In addition, each vertex has a 0-1 label (in the 3rd column of the data file) corresponding to the true political orientation of that blog. We will consider this as the true label and check whether spectral clustering groups nodes with the same political orientation together as much as possible.

These are the two questions:

1. Use spectral clustering to find k = 2, 5, 10, 25 clusters in the network of political blogs (each node is a blog, and the edges are defined in the file edges.txt). Find the majority label in each cluster for each value of k. For example, if there are k = 2 clusters, and their labels are {0, 1, 1, 1} and {0, 0, 1}, then the majority label for the first cluster is 1 and for the second cluster is 0. You are required to implement the algorithms yourself rather than calling them from a package.

Now compare the majority label with the individual labels in each cluster, and report the mismatch rate for each cluster, when k = 2, 5, 10, 25. For instance, in the example above, the mismatch rate for the first cluster is 1/4 (only the first node differs from the majority), and the second cluster is 1/3.

2. Tune your k and find the number of clusters to achieve a reasonably small mismatch rate. Please explain how you tune k and what is the achieved mismatch rate. Please explain intuitively what this result tells about the network community structure.
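The majority-label and mismatch-rate bookkeeping from question 1 can be sketched as follows, using the worked example from the text. The function names are hypothetical, and breaking ties in favour of label 1 is an assumption the assignment does not specify. The overall rate (total mismatches divided by the number of nodes) is one possible quantity to compare across values of k in question 2.

```python
import numpy as np

def cluster_mismatch(labels, assignments):
    """Return {cluster: (majority_label, mismatch_rate)} for 0/1 labels."""
    labels = np.asarray(labels)
    assignments = np.asarray(assignments)
    result = {}
    for c in np.unique(assignments):
        members = labels[assignments == c]
        ones = int(members.sum())
        zeros = members.size - ones
        majority = 1 if ones >= zeros else 0  # assumption: ties go to label 1
        result[int(c)] = (majority, min(ones, zeros) / members.size)
    return result

def overall_mismatch(labels, assignments):
    """Total number of minority-label nodes divided by the number of nodes."""
    labels = np.asarray(labels)
    assignments = np.asarray(assignments)
    wrong = 0
    for c in np.unique(assignments):
        members = labels[assignments == c]
        ones = int(members.sum())
        wrong += min(ones, members.size - ones)
    return wrong / labels.size

# Example from the question: clusters {0, 1, 1, 1} and {0, 0, 1}.
labels = [0, 1, 1, 1, 0, 0, 1]
assignments = [0, 0, 0, 0, 1, 1, 1]
per_cluster = cluster_mismatch(labels, assignments)
# Cluster 0: majority 1, mismatch rate 1/4. Cluster 1: majority 0, rate 1/3.
total = overall_mismatch(labels, assignments)  # 2 mismatches out of 7 nodes
```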

I've already seen the conceptual parts explained on YouTube, so please don't answer the conceptual questions. I checked every step of the code, but I still can't get the right answer.

This is my code:

    from scipy import sparse
    import numpy as np
    import pandas as pd
    from matplotlib import pyplot as plt
    from sklearn.cluster import KMeans
    from os.path import abspath

    np.random.seed(14)

    # Load both files, then convert them to NumPy arrays below
    file_1 = pd.read_csv(abspath("nodes.txt"), sep='\t', header=None)
    file_2 = pd.read_csv(abspath("edges.txt"), sep='\t', header=None)
    # print(file_1)
    # print(file_2)

    # Keep only the first column and the third column (the 0-1 label) from nodes
    nodes = file_1.to_numpy()[:, [0, 2]]
    # print(nodes)
    edges = file_2.to_numpy()
    # print(edges)
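For reference, here is a minimal sketch of one common spectral clustering variant written without a clustering package: embed each node using the k leading eigenvectors of the normalized adjacency matrix D^{-1/2} A D^{-1/2}, scale the rows to unit length, and run a plain Lloyd's k-means on them. This is not necessarily the variant the course expects (some use the unnormalized Laplacian or skip the row scaling). The function names, the `n_iter` and `seed` parameters, and the toy graph are all hypothetical, and the code assumes isolated nodes have already been removed so every degree is positive.

```python
import numpy as np

def spectral_embedding(A, k):
    """Unit-length rows of the k leading eigenvectors of D^{-1/2} A D^{-1/2}."""
    deg = A.sum(axis=1).astype(float)  # assumes every degree is > 0
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    M = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(M)     # eigenvalues come back in ascending order
    Z = eigvecs[:, -k:]                # columns for the k largest eigenvalues
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z / np.where(norms == 0, 1.0, norms)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster index per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        new_centers = centers.copy()
        for c in range(k):
            if np.any(assign == c):    # an empty cluster keeps its old center
                new_centers[c] = X[assign == c].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign

# Toy graph: two triangles (nodes 0-2 and 3-5) joined by the single edge 2-3.
A_toy = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A_toy[i, j] = A_toy[j, i] = 1

assign = kmeans(spectral_embedding(A_toy, k=2), k=2)
```

On the toy graph the two triangles end up in different clusters. On the real data, k-means depends on its random initialization, so it may be worth running it with several seeds before comparing mismatch rates.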
