Question

Discuss the relative advantages and disadvantages of EM clustering as compared to K-means.

16. Rather than relying only on distance (as in K-means) or probability (as in EM), we might instead cluster points based on density. That is, we can determine clusters based on the areas where points tend to bunch together. DBSCAN is the best-known example of such a density-based clustering strategy.

In DBSCAN, there are two parameters, a positive integer m that specifies the minimum number of data points in a "core" neighborhood, and a parameter ε > 0 that determines the neighborhood size. Based on these parameters, every point in the dataset can be classified as follows.

• Core point: We say that X is a core point provided that m or more data points are within a distance of ε of X, where X itself is included among these m points.
• Reachable point: The point X is said to be reachable if it satisfies d(X, Y) ≤ ε for some core point Y. Note that any core point is also a reachable point, but not all reachable points are core points.
• Outlier: Any X that is not a reachable point is classified as an outlier. The outliers are often viewed as noise.

In the DBSCAN algorithm, we randomly select a point X that has not previously been visited. If X is not a core point, simply mark it as visited and select another non-visited point X. If the selected point X is a core point, then we grow a cluster based on X by including all points in the ε neighborhood of X. Furthermore, for each core point Y included in this manner, we iterate the process, that is, we include all points in the ε neighborhood of Y, thus extending the cluster based on the core points in neighborhoods of neighbors. This continues until no more points can be added to the cluster. In this way, all reachable points are added to this X-based cluster. Only core points are used to extend a cluster to additional neighborhoods. Thus, we can view the non-core reachable points as being on the edge of the cluster. Finally, the algorithm terminates when there are no more unvisited points to select from when attempting to initiate a new cluster.

Unlike K-means and EM clustering, for DBSCAN, we do not specify the number of clusters. Another major distinction is that with DBSCAN, outlier points are not assigned to any cluster. And, recall that K-means clusters are round, while EM clusters based on Gaussian distributions are elliptical. In contrast, a DBSCAN cluster can be of any arbitrary shape, since clusters are based on the local density of points, not on a fixed topology.

a) In DBSCAN, a previously visited point can be added to a cluster. Clearly explain how this could happen.
b) The DBSCAN algorithm is not entirely deterministic. Explain how a point X could be assigned to different clusters depending on the order in which points are selected.
c) Implement the DBSCAN algorithm and use your program to cluster the data below, which can be found in the file dbscan.txt on the textbook website. Test each of the 4 pairs of parameters (ε, m) ∈ {(0.6, 3), (0.75, 4), (1.0, 5), (2.0, 10)}.

[Figure: the data set to be clustered (image not reproduced)]
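
The procedure described in the question can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference solution: the names `dbscan` and `region_query`, the `seed` argument, and the use of -1 as the outlier label are all choices made here, distance is assumed to be Euclidean, and points are assumed to be tuples of numbers.

```python
import math
import random


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, m, seed=0):
    """Return one label per point: 0, 1, 2, ... for clusters, -1 for outliers."""
    rng = random.Random(seed)
    order = list(range(len(points)))
    rng.shuffle(order)  # stands in for "randomly select an unvisited point"

    labels = [-1] * len(points)
    visited = [False] * len(points)
    cluster = 0

    for i in order:
        if visited[i]:
            continue
        visited[i] = True
        neighbors = region_query(points, i, eps)
        if len(neighbors) < m:
            # Not a core point: stays unlabeled unless a later cluster reaches it.
            continue

        labels[i] = cluster
        frontier = list(neighbors)
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                # j may be a previously visited non-core point (an edge point).
                labels[j] = cluster
            if visited[j]:
                continue
            visited[j] = True
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= m:
                # Only core points extend the cluster to further neighborhoods.
                frontier.extend(j_neighbors)
        cluster += 1

    return labels
```

With a list of points loaded from the data file (its format is not shown here, so loading is left out), each parameter pair could be tried with a call such as `dbscan(points, eps=0.6, m=3)`. A point that is already labeled is never relabeled in this sketch, so an edge point within ε of core points from two clusters goes to whichever cluster is grown first, and that depends on the shuffle order.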

a. Show by a code example the declaration of a function named least() that returns an int and accepts three int parameters, but is not a definition of that function.
b. As C and C++ perform short-circuit evaluation of logical expressions, describe how the order of the clauses and the values of the variables in the following if statement's conditional expression affect whether the different clauses are evaluated. Where would specific results in left-to-right evaluation result in short-circuit termination of evaluation? if (vvalld || ((x | y) && () && k))) saturate();
