
Question

It is important to define or select similarity measures in data mining applications such as clustering, outlier analysis, and nearest-neighbor classification. However, studies show that no single similarity measure consistently outperforms the others in all situations. Nonetheless, seemingly different similarity measures may be equivalent after some transformations. Let us consider the 5 data objects in Table 1:

      skin   insu   mass   pedi
x1    19     88     33.6   0.627
x2    20     188    26.6   0.351
x3    28     128    23.3   0.672
x4    21     94     28.1   0.167
x5    34     168    43.1   2.288

Table 1: Diabetes

Attribute information is listed below:

Triceps skin fold thickness in mm (skin): the minimum value is 0 and the maximum value is 99.

2-Hour serum insulin in mu U/ml (insu): the minimum value is 0 and the maximum value is 850.

Body mass index measured as weight in kg/(height in m)^2 (mass): the minimum value is 0 and the maximum value is 70.0.

Diabetes pedigree function (pedi): the minimum value is 0.05 and the maximum value is 2.50.

Given a new object (20, 98, 25.6, 0.201) as a query, rank the objects in Table 1 based on similarity with the query using Cosine similarity. Then, identify which of the following is a true statement about the ranking.

x2, x1, x3, x4, x5

x1, x2, x5, x3, x4

x3, x2, x5, x4, x1

x3, x1, x2, x4, x5
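
The cosine similarity between two vectors a and b is their dot product divided by the product of their Euclidean norms. The following is a minimal sketch of that computation, assuming the raw attribute values from Table 1 are used without any normalization (the question does not say whether the attributes should be rescaled first). The names `cosine_similarity`, `objects`, `query`, `scores`, and `ranking` are illustrative only.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Raw (un-normalized) values from Table 1: skin, insu, mass, pedi.
objects = {
    "x1": (19, 88, 33.6, 0.627),
    "x2": (20, 188, 26.6, 0.351),
    "x3": (28, 128, 23.3, 0.672),
    "x4": (21, 94, 28.1, 0.167),
    "x5": (34, 168, 43.1, 2.288),
}
query = (20, 98, 25.6, 0.201)

scores = {name: cosine_similarity(vec, query) for name, vec in objects.items()}

# Sort from most similar to least similar; reverse the list for the opposite order.
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(name, round(scores[name], 4))
```

Because insu dominates the magnitude of every vector, all five similarities land close to 1, so the ranking depends on small differences. Rescaling the attributes with the minimum and maximum values listed above could change the order.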

Suppose the test scores of a group of 12 students, 72, 50, 21, 65, 97, 36, 85, 69, 70, 77, 88, and 93, are partitioned into four intervals using the equal-width approach. Do the partition, and identify the second smallest and second largest values (among those values that appear in the list above) in each of the intervals. Then, find the true statement in the list below.

a. 36 is the second smallest value in its interval.

b. 88 is the second largest value in its interval.

c. 72 is the second largest value in its interval.

d. 93 is the second smallest value in its interval.
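
Equal-width partitioning divides the range from the minimum to the maximum score into intervals of the same width. The sketch below assumes half-open intervals [low, low + width) with the maximum value placed in the last interval. That boundary convention is an assumption, not something stated in the question, although no score here falls exactly on an interior boundary. The function name `equal_width_bins` is illustrative.

```python
def equal_width_bins(values, k):
    """Split values into k intervals of equal width between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in sorted(values):
        # The maximum value would index bin k, so clamp it into the last bin.
        idx = min(int((v - lo) / width), k - 1)
        bins[idx].append(v)
    return bins

test_scores = [72, 50, 21, 65, 97, 36, 85, 69, 70, 77, 88, 93]
width_bins = equal_width_bins(test_scores, 4)
for i, members in enumerate(width_bins, start=1):
    print(f"interval {i}: {members}")
```

With a minimum of 21 and a maximum of 97, each interval is (97 - 21) / 4 = 19 units wide, so the intervals hold different numbers of scores.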

Suppose the test scores of a group of 12 students, 72, 50, 21, 65, 97, 36, 85, 69, 70, 77, 88, and 93, are partitioned into four intervals using the equal-frequency approach. Do the partition, and identify the smallest and largest values (among those values that appear in the list above) in each of the intervals. Then, find the true statement in the list below.

a. 36 is the smallest value in its interval.

b. 65 is the largest value in its interval.

c. 88 is the smallest value in its interval.

d. 93 is the largest value in its interval.
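
Equal-frequency partitioning sorts the scores and places the same number of values in each interval. A minimal sketch follows, assuming the number of values divides evenly by the number of intervals (12 scores, 4 intervals, 3 per interval). The function name `equal_frequency_bins` is illustrative.

```python
def equal_frequency_bins(values, k):
    """Split sorted values into k intervals holding the same number of values."""
    ordered = sorted(values)
    if len(ordered) % k != 0:
        raise ValueError("this sketch assumes len(values) is divisible by k")
    size = len(ordered) // k
    return [ordered[i * size:(i + 1) * size] for i in range(k)]

test_scores = [72, 50, 21, 65, 97, 36, 85, 69, 70, 77, 88, 93]
freq_bins = equal_frequency_bins(test_scores, 4)
for i, members in enumerate(freq_bins, start=1):
    print(f"interval {i}: min={members[0]}, max={members[-1]}, values={members}")
```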

Suppose a hospital recorded the age and body fat percentage of 18 randomly selected adults, with the following results:

age    8     12    13    13    13    14    14    16    17    18    19    20    20    20    21    21    22    25
%fat   9.5   6.5   7.8   16.5  30.2  25.3  26.4  26.1  30.5  33.5  41.5  26.6  12.5  28.5  25.3  12.3  14.0  15.0

The five-number summary of a distribution (minimum, first quartile, median, third quartile, maximum) provides a good summary of the shape of the distribution. The five-number summary of the %fat data is:

a. 7.80, 26.68, 30.70, 33.93, 41.50

b. 8.00, 13.25, 17.50, 20.00, 25.00

c. 6.50, 12.88, 25.30, 28.03, 41.50

d. 41.50, 33.93, 28.78, 26.68, 7.80
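
The five-number summary can be checked with a short sketch using Python's standard `statistics` module. Quartile conventions differ between textbooks and tools. This sketch assumes the "inclusive" linear-interpolation method, and other conventions may give slightly different first and third quartiles. The function name `five_number_summary` is illustrative.

```python
import statistics

# %fat values in the order given in the question.
fat = [9.5, 6.5, 7.8, 16.5, 30.2, 25.3, 26.4, 26.1, 30.5,
       33.5, 41.5, 26.6, 12.5, 28.5, 25.3, 12.3, 14.0, 15.0]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using inclusive interpolation."""
    # statistics.quantiles sorts the data itself and returns the three cut points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (min(data), q1, median, q3, max(data))

summary = five_number_summary(fat)
print([round(v, 2) for v in summary])
```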

Give one other term used for each of the following:

Input variable:

Target variable:

Attribute:

Row:

In what ways is data mining different from statistics? Choose the correct statement from the following.

a. Statistics tends to employ simpler algorithms.

b. Data mining tends to employ simpler algorithms.

c. In classical statistical inference, the same sample is used to make an estimate, and also to determine how reliable that estimate might be. In data mining, different samples are used.

d. Data mining tends not to involve the strict limits around the question being addressed that classical inference requires.

From a statistical perspective, accurate models can be built in data mining with as few as several hundred records.

a. The statement is true because what we need to build a model is a 'good' representation of the population, which we can often get with a few hundred records.

b. The statement is contradictory to the idea of data mining sifting through large amounts of data to gain useful information and therefore false.

c. The statement is false because several hundred records are very unlikely to lead to an accurate model.

d. The statement is false because using a small number of records would lead to overfitting.

List, in the correct order, the essential steps for building a data mining model.

- Run several modeling techniques, choosing one on the basis of its performance on the validation data. Results with the test data are an indicator of how well it will do with the rest of the database.

- Sampling from a larger database.

- Explore, clean, preprocess, and reduce the data, including treatment of outliers and missing data.

- Develop an understanding of the variables and select the variables for building a model.

- Data partitioning into training, validation, and test data sets.
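
The partitioning step named in the list above can be illustrated with a minimal sketch that shuffles the records and slices them into three disjoint sets. The 50/30/20 split, the fixed seed, and the names `partition`, `train_frac`, and `valid_frac` are assumptions chosen for illustration. The question does not prescribe any proportions.

```python
import random

def partition(records, train_frac=0.5, valid_frac=0.3, seed=0):
    """Shuffle records and split them into training, validation, and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains goes to the test set
    return train, valid, test

# Hypothetical data: 100 record identifiers standing in for sampled rows.
train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))
```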
