Questions and Answers of
Statistics
Find all the frequent subsequences with support ≥ 50% given the sequence database shown in Table 7.15. Assume that there are no timing constraints imposed on the sequences.
(a) For each of the sequences w = given below, determine whether they are subsequences of the sequence Table 7.15. Example of event sequences generated by various sensors. (b) Determine whether each of
For each of the sequences w = {e1, . . . , elast} below, determine whether they are subsequences of the following data sequence: ({A,B}{C,D}{A,B}{C,D}{A,B}{C,D}) subjected to the following timing
Consider the following frequent 3-sequences: < {1, 2, 3} >, < {1, 2}{3} >, < {1}{2, 3} >, < {1, 2}{4} >, < {1, 3}{4} >, < {1, 2, 4} >, < {2, 3}{3} >, < {2, 3}{4} >, < {2}{3}{3} >, and < {2}{3}{4}
Consider the data sequence shown in Table 7.16 for a given object. Count the number of occurrences for the sequence ({p}{q}{r}) according to the following counting methods: Assume that ws = 0, mingap
Describe the types of modifications necessary to adapt the frequent subgraph mining algorithm to handle: (a) Directed graphs (b) Unlabeled graphs (c) Acyclic graphs (d) Disconnected graphs For each
Draw all candidate subgraphs obtained from joining the pair of graphs shown in Figure 7.2. Assume the edge-growing method is used to expand the subgraphs.
Draw all the candidate subgraphs obtained by joining the pair of graphs shown in Figure 7.4. Assume the edge-growing method is used to expand the subgraphs.
(a) If support is defined in terms of induced subgraph relationship, show that the confidence of the rule g1 → g2 can be greater than 1 if g1 and g2 are allowed to have overlapping vertex
Consider a graph mining algorithm that uses the edge-growing method to join the two undirected and unweighted subgraphs shown in Figure 19a.i. Draw all the distinct cores obtained when merging the
(a) Consider the data set shown in Table 7.4. Suppose we apply the following discretization strategies to the continuous attributes of the data set. D1: Partition the range of each continuous
The original association rule mining framework considers only presence of items together in the same transaction. There are situations in which itemsets that are infrequent may also be informative.
Suppose we would like to extract positive and negative itemsets from a data set that contains d items. (a) Consider an approach where we introduce a new variable to represent each negative item. With
For each type of pattern defined below, determine whether the support measure is monotone, anti-monotone, or non-monotone (i.e., neither monotone nor anti-monotone) with respect to increasing itemset
Consider the data set shown in Table 7.8. The first attribute is continuous, while the remaining two attributes are asymmetric binary. A rule is considered to be strong if its support exceeds 15% and
Consider the data set shown in Table 7.12 (Data set for Exercise 4). (a) For each combination of rules given below, specify the rule that has the highest confidence. i. 15 ii. 15 iii. 15 (b)
For the data set with the attributes given below, describe how you would convert it into a binary transaction data set appropriate for association analysis. Specifically, indicate for each attribute
Consider the data set shown in Table 7.13. Suppose we are interested in extracting the following association rule: {α1 ≤ Age ≤ α2, Play Piano =
Consider the transactions shown in Table 7.14 (Example of market basket transactions), with an item taxonomy given in Figure 7.25. (a) What are the main challenges of mining association
The following questions examine how the support and confidence of an association rule may vary in the presence of a concept hierarchy.
(a) List all the 4-subsequences contained in the following data sequence: < {1, 3} {2} {2, 3} {4} >, assuming no timing constraints. (b) List all the 3-element subsequences contained in the data
Consider a data set consisting of 2^20 data vectors, where each vector has 32 components and each component is a 4-byte value. Suppose that vector quantization is used for compression and that 2^16
Would the cosine measure be the appropriate similarity measure to use with K-means clustering for time series data? Why or why not? If not, what similarity measure would be more appropriate?
Total SSE is the sum of the SSE for each separate attribute. What does it mean if the SSE for one variable is low for all clusters? Low for just one cluster? High for all clusters? High for just one
The leader algorithm (Hartigan [4]) represents each cluster using a point, known as a leader, and assigns each point to the cluster corresponding to the closest leader, unless this distance is above
The Voronoi diagram for a set of K points in the plane is a partition of all the points of the plane into K regions, such that every point (of the plane) is assigned to the closest point among the K
You are given a data set with 100 records and are asked to cluster the data. You use K-means to cluster the data, but for all values of K, 1 ≤ K ≤ 100, the K-means algorithm returns only one
Traditional agglomerative hierarchical clustering routines merge two clusters at each step. Does it seem likely that such an approach accurately captures the (nested) cluster structure of a set of
Use the similarity matrix in Table 8.1 to perform single and complete link hierarchical clustering. Show your results by drawing a dendrogram. The dendrogram should clearly show the order in which
Hierarchical clustering is sometimes used to generate K clusters, K > 1 by taking the clusters at the Kth level of the dendrogram. (Root is at level 1.) By looking at the clusters produced in this
Suppose we find K clusters using Ward's method, bisecting K-means, and ordinary K-means. Which of these solutions represents a local or global minimum? Explain.
Hierarchical clustering algorithms require O(m² log(m)) time, and consequently, are impractical to use directly on larger data sets. One possible technique for reducing the time required is to sample
Consider the following four faces shown in Figure 8.7. Again, darkness or number of dots represents density. Lines are used only to distinguish regions and do not represent points. (a) For each
Compute the entropy and purity for the confusion matrix in Table 8.2.
You are given two sets of 100 points that fall within the unit square. One set of points is arranged so that the points are uniformly spaced. The other set of points is generated from a uniform
Using the data in Exercise 24, compute the silhouette coefficient for each point, each of the two clusters, and the overall clustering.
Given the set of cluster labels and similarity matrix shown in Tables 8.4 and 8.5, respectively, compute the correlation between the similarity matrix and the ideal similarity matrix, i.e., the
Compute the hierarchical F-measure for the eight objects {p1, p2, p3, p4, p5, p6, p7, p8} and hierarchical clustering shown in Figure 8.8. Class A contains points p1, p2, and p3, while p4, p5, p6,
Prove that ∑_{i=1}^{K} ∑_{x∈C_i} (x − m_i)(m − m_i) = 0. This fact was used in the proof that TSS = SSE + SSB on page 557.
Many partitional clustering algorithms that automatically determine the number of clusters claim that this is an advantage. List two situations in which this is not the case.
Clusters of documents can be summarized by finding the top terms (words) for the documents in the cluster, e.g., by taking the most frequent k terms, where k is a constant, say 10, or by taking all
We can represent a data set as a collection of object nodes and a collection of attribute nodes, where there is a link between each object and each attribute, and where the weight of that link is the
Identify the clusters in Figure 8.3 using the center-, contiguity-, and density-based definitions. Also indicate the number of clusters for each case and give a brief indication of your reasoning.
For the following sets of two-dimensional points, (1) provide a sketch of how they would be split into clusters by K-means for the given number of clusters and (2) indicate approximately where the
Consider the mean of a cluster of objects from a binary transaction data set. What are the minimum and maximum values of the components of the mean? What is the interpretation of components of the
Give an example of a data set consisting of three natural clusters, for which (almost always) K-means would likely find the correct clusters, but bisecting K-means would not.
For sparse data, discuss why considering only the presence of non-zero values might give a more accurate view of the objects than considering the actual magnitudes of values. When would such an
We take a sample of adults and measure their heights. If we record the gender of each person, we can calculate the average height and the variance of the height, separately, for men and women.
Compare the membership weights and probabilities of Figures 9.1 and 9.4, which come, respectively, from applying fuzzy and EM clustering to the same set of data points. What differences do you
Figure 9.1 shows a clustering of a two-dimensional point data set with two clusters: The leftmost cluster, whose points are marked by asterisks, is somewhat diffuse, while the rightmost cluster,
Show that the MST clustering technique of Section 9.4.2 produces the same clusters as single link. To avoid complications and special cases, assume that all the pairwise similarities are distinct.
One way to sparsify a proximity matrix is the following: For each object (row in the matrix), set all entries to 0 except for those corresponding to the object's k-nearest neighbors. However, the
Give an example of a set of clusters in which merging based on the closeness of clusters leads to a more natural set of clusters than merging based on the strength of connection (interconnectedness)
For the definition of SNN similarity provided by Algorithm 9.10, the calculation of SNN distance does not take into account the position of shared neighbors in the two nearest neighbor lists. In
Grid-clustering techniques are different from other clustering techniques in that they partition space instead of sets of points. (a) How does this affect such techniques in terms of the description
In CLIQUE, the threshold used to find cluster density remains constant, even as the number of dimensions increases. This is a potential problem since density drops as dimensionality increases; i.e.,
Given a set of points in Euclidean space, which are being clustered using the K-means algorithm with Euclidean distance, the triangle inequality can be used in the assignment step to avoid
Consider a set of documents. Assume that all documents have been normalized to have unit length of 1. What is the "shape" of a cluster that consists of all documents whose cosine similarity to a
Discuss the advantages and disadvantages of treating clustering as an optimization problem. Among other factors, consider efficiency, non-determinism, and whether an optimization based approach
What is the time and space complexity of fuzzy c-means? Of SOM? How do these complexities compare to those of K-means?
For the fuzzy c-means algorithm described in this book, the sum of the membership degree of any point over all clusters is 1. Instead, we could only require that the membership degree of a point in a
Explain the difference between likelihood and probability.
Equation 9.12 gives the likelihood for a set of points from a Gaussian distribution as a function of the mean μ and the standard deviation σ. Show mathematically that the maximum likelihood
Compare and contrast the different techniques for anomaly detection that were presented in Section 10.1.2. In particular, try to identify circumstances in which the definitions of anomalies used in
Compare the following two measures of the extent to which an object belongs to a cluster: (1) distance of an object from the centroid of its closest cluster and (2) the silhouette coefficient
Consider the (relative distance) K-means scheme for outlier detection described in Section 10.5 and the accompanying figure, Figure 10.10. (a) The points at the bottom of the compact cluster shown in
If the probability that a normal object is classified as an anomaly is 0.01 and the probability that an anomalous object is classified as anomalous is 0.99, then what is the false alarm rate and
When a comprehensive training set is available, a supervised anomaly detection technique can typically outperform an unsupervised anomaly technique when performance is evaluated using measures such
Consider a group of documents that has been selected from a much larger set of diverse documents so that the selected documents are as dissimilar from one another as possible. If we consider
Consider a set of points, where most points are in regions of low density, but a few points are in regions of high density. If we define an anomaly as a point in a region of low density, then most
Consider a set of points that are uniformly distributed on the interval [0,1]. Is the statistical notion of an outlier as an infrequently observed value meaningful for this data?
An analyst applies an anomaly detection algorithm to a data set and finds a set of anomalies. Being curious, the analyst then applies the anomaly detection algorithm to the set of anomalies. (a)
Consider the following definition of an anomaly: An anomaly is an object that is unusually influential in the creation of a data model. (a) Compare this definition to that of the standard model-based
In one approach to anomaly detection, objects are represented as points in a multidimensional space, and the points are grouped into successive shells, where each shell represents a layer around a
Association analysis can be used to find anomalies as follows. Find strong association patterns, which involve some minimum number of objects. Anomalies are those objects that do not belong to any
Discuss techniques for combining multiple anomaly detection techniques to improve the identification of anomalous objects. Consider both supervised and unsupervised cases.
Describe the potential time complexity of anomaly detection approaches based on the following approaches: model-based using clustering, proximity-based, and density. No knowledge of specific
The Grubbs' test, which is described by Algorithm 10.1, is a more statistically sophisticated procedure for detecting outliers than that of Definition 10.3. It is iterative and also takes into
Many statistical tests for outliers were developed in an environment in which a few hundred observations was a large data set. We explore the limitations of such approaches. (a) For a set of
The probability density of a point x with respect to a multivariate normal distribution having a mean μ and covariance matrix Σ is given by the equation Using the sample
Let Y denote the number of "heads" that occur when two coins are tossed. (a) Derive the probability distribution of Y. (b) Derive the cumulative probability distribution of Y. (c) Derive the mean and
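A minimal sketch of parts (a) and (b) and the mean from part (c), assuming two fair, independent coins (the question as shown does not state fairness, so that is an assumption). The names `pmf`, `cdf`, and `mean` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Assumption: both coins are fair and independent, so the four
# outcomes HH, HT, TH, TT are equally likely.
outcomes = list(product("HT", repeat=2))

# (a) Probability distribution of Y = number of heads.
pmf = {}
for outcome in outcomes:
    heads = outcome.count("H")
    pmf[heads] = pmf.get(heads, Fraction(0)) + Fraction(1, len(outcomes))

# (b) Cumulative distribution: Pr(Y <= y) for each attainable y.
cdf = {}
running = Fraction(0)
for y in sorted(pmf):
    running += pmf[y]
    cdf[y] = running

# (c) Mean of Y as the probability-weighted sum of its values.
mean = sum(y * p for y, p in pmf.items())

print("pmf:", pmf)
print("cdf:", cdf)
print("mean:", mean)
```

Exact fractions are used so the probabilities are not blurred by floating-point rounding.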
Compute the following probabilities: (a) If Y is distributed χ²_4, find Pr(Y ≤ 7.78). (b) If Y is distributed χ²_10, find Pr(Y > 18.31). (c) If Y is distributed F_{10,∞}, find Pr(Y > 1.83). (d) Why
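Parts (a) to (c) can be checked numerically without a statistics library. The sketch below reads the distributions as chi-squared with 4 and 10 degrees of freedom and F with (10, ∞) degrees of freedom. It relies on two standard facts: for an even number of degrees of freedom 2m, Pr(Y > x) = e^(−x/2) ∑_{k=0}^{m−1} (x/2)^k / k!, and an F_{m,∞} variable has the same distribution as a χ²_m variable divided by m. The function name `chi2_sf_even_df` is hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Pr(Y > x) for a chi-squared variable with an even, positive df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here needs an even, positive df")
    half = x / 2.0
    # Finite sum from the Poisson/gamma identity for even df.
    total = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * total

# (a) Pr(Y <= 7.78) with 4 degrees of freedom.
p_a = 1.0 - chi2_sf_even_df(7.78, 4)

# (b) Pr(Y > 18.31) with 10 degrees of freedom.
p_b = chi2_sf_even_df(18.31, 10)

# (c) F(10, infinity) equals chi-squared(10) / 10, so
#     Pr(F > 1.83) = Pr(chi-squared(10) > 18.3).
p_c = chi2_sf_even_df(1.83 * 10, 10)

print(f"(a) {p_a:.4f}  (b) {p_b:.4f}  (c) {p_c:.4f}")
```

The three values come out near 0.90, 0.05, and 0.05 respectively.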
X is a Bernoulli random variable with Pr(X = 1) = 0.99, Y is distributed N(0, 1), W is distributed N(0, 100), and X, Y, and W are independent. Let S = XY + (1 - X)W. (That is, S = Y when X = 1, and S
Suppose Yi, i = 1, 2, . . . , n, are i.i.d. random variables, each distributed N(10, 4). (a) Compute Pr(9.6 ≤ Ȳ ≤ 10.4) when (i) n = 20, (ii) n = 100, and (iii) n = 1000. (b) Suppose c is a
Yi, i = 1, . . . , n, are i.i.d. Bernoulli random variables with p = 0.4. Let Ȳ denote the sample mean. (a) Use the central limit theorem to compute approximations for i. Pr(Ȳ ≥ 0.43) when n = 100. ii. Pr(Ȳ ≤
Consider two random variables X and Y. Suppose that Y takes on k values y1, . . . , yk and that X takes on l values x1, . . . , xl. (b) Use your answer to (a) to verify Equation (2.19). (c) Suppose
X is a random variable with moments E(X), E(X²), E(X³), and so forth. (a) Show E(X − µ)³ = E(X³) − 3[E(X²)][E(X)] + 2[E(X)]³. (b) Show E(X − µ)⁴ = E(X⁴) − 4[E(X)][E(X³)] + 6[E(X)]²[E(X²)] −
This exercise provides an example of a pair of random variables X and Y for which the conditional mean of Y given X depends on X but corr (X, Y) = 0. Let X and Z be two independently distributed
(Review of summation notation) Let x1, . . . , xn denote a sequence of numbers, y1, . . . , yn denote another sequence of numbers, and a, b, and c denote three constants. Show that (a) (b) (c) (d)
X and Z are two jointly distributed random variables. Suppose you know the value of Z, but not the value of X. Let X̃ = E(X | Z) denote a guess of the value of X using the information on Z, and let W
Using the random variables X and Y from Table 2.2, consider two new random variables W = 3 + 6X and V = 20 − 7Y. Compute (a) E(W) and E(V); (b) σ²W and σ²V; and (c) σWV and corr(W, V).
In September, Seattle's daily high temperature has a mean of 70°F and a standard deviation of 7°F. What are the mean, standard deviation, and variance in °C?
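A short worked sketch of the conversion, using the linear map C = (F − 32) × 5/9. Under a linear transformation a + bX, the mean becomes a + b·E(X), the standard deviation scales by |b|, and the variance scales by b². The function name is hypothetical.

```python
def fahrenheit_stats_to_celsius(mean_f, sd_f):
    """Convert a mean and standard deviation from Fahrenheit to Celsius."""
    scale = 5.0 / 9.0
    mean_c = (mean_f - 32.0) * scale  # the shift affects only the mean
    sd_c = sd_f * scale               # spread is scaled, not shifted
    var_c = sd_c ** 2                 # variance scales by (5/9)^2
    return mean_c, sd_c, var_c

mean_c, sd_c, var_c = fahrenheit_stats_to_celsius(70.0, 7.0)
print(f"mean = {mean_c:.2f} C, sd = {sd_c:.2f} C, variance = {var_c:.2f} C^2")
```

With the figures from the question this gives a mean of about 21.11°C, a standard deviation of about 3.89°C, and a variance of about 15.12 (°C)².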
In a given population of two-earner male/female couples, male earnings have a mean of $40,000 per year and a standard deviation of $12,000. Female earnings have a mean of $45,000 per year and a
X and Y are discrete random variables with the following joint distribution: That is, Pr(X = 1, Y = 14) = 0.02, and so forth. (a) Calculate the probability distribution, mean, and variance of Y. (b)
In a population, µY = 100 and σ²Y = 43. Use the central limit theorem to answer the following questions: (a) In a random sample of size n = 100, find Pr(Ȳ < 101). (b) In a random sample of size n
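Part (a) can be sketched as follows, assuming the missing symbol in the probability is the sample mean. By the central limit theorem the sample mean is approximately normal with mean 100 and variance 43/n, so the probability reduces to a standard normal CDF evaluated at the standardized value. The helper `normal_cdf` is an illustrative name built on `math.erf`.

```python
import math

def normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu_y = 100.0   # population mean from the question
var_y = 43.0   # population variance from the question
n = 100        # sample size in part (a)

# Standard error of the sample mean: sqrt(var / n).
se = math.sqrt(var_y / n)

# Standardize the threshold 101 and apply the normal approximation.
z = (101.0 - mu_y) / se
p_a = normal_cdf(z)

print(f"z = {z:.3f}, Pr(sample mean < 101) = {p_a:.4f}")
```

The standardized value is about 1.525, which gives a probability of roughly 0.936.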
Consider the estimator Ỹ, defined in Equation (3.1). Show that (a) E(Ỹ) = µY and (b) var(Ỹ) = 1.25σ²Y/n.
Data on fifth-grade test scores (reading and mathematics) for 420 school districts in California yield Ȳ = 646.2 and standard deviation sY = 19.5. (a) Construct a 95% confidence interval for the mean
Read the box "The Gender Gap of Earnings of College Graduates in the United States" in Section 3.5. (a) Construct a 95% confidence interval for the change in men's average hourly earnings between
(a) Ȳ is an unbiased estimator of µY. Is Ȳ² an unbiased estimator of µ²Y? (b) Ȳ is a consistent estimator of µY. Is Ȳ² a consistent estimator of µ²Y?
Show that the pooled standard error [SE_pooled(Ȳm − Ȳw)] given following Equation (3.23) equals the usual standard error for the difference in means in Equation (3.19) when the two group sizes are the