Question
1) What is the type of each of the following attributes: (a) age (in years), (b) salary, (c) ZIP code, (e) height, and (f) intensity of rain? Classify each as continuous or discrete, and as
qualitative (nominal or ordinal) or quantitative (interval or ratio).
2) An analyst sets up a sensor network to measure the temperature at different
locations over a period of time. What is the type of the attribute collected (temperature)? What is the type of the data set?
3) It is desired to partition customers into similar groups on the basis of their demographic profile.
a. What features could we use? Provide 3 examples. Would you describe such data as heterogeneous?
b. Which data mining problem is best suited to this task?
4) Suppose that you had a set of arbitrary objects, each representing different characteristics of gadgets. A domain expert gave you the similarity value between every pair of objects. How would you convert these objects into a multidimensional data set for clustering the gadgets?
5) Suppose that you had a data set, such that each data point corresponds to sea-surface
temperatures over a square mile of resolution 10 × 10. In other words, each data record contains a 10 × 10 grid of temperature values with spatial locations. You also have some text
associated with each 10 × 10 grid. How would you convert this data into a multidimensional
data set? How many features will each data point have?
6) Compute the cosine similarity, Jaccard coefficient (where possible, that is, for binary vectors), Euclidean distance, and correlation coefficient for the following vectors x and y:
a. x = (0, -1, 1, 2, -2), y = (0, -2, 2, 4, -4)
b. x = (0, 1, 0, 0, 0), y = (0, 1, 0, 0, 1)
c. x = (-1, -1, -1, -1, -1), y = (1, 1, 1, 1, 1)
7) Compute the cosine similarity and the Jaccard coefficient between the two sets {A, B, C} and {A, C, D, E}. Hint: how will you represent each set?
8) Create three documents, A, B, and C such that the Euclidean distance between A and B is smaller than the Euclidean distance between A and C, even though documents A and B have no common words whereas documents A and C have some common words.
9) Are the following similarity measures good or bad for finding similarity in document-term data? Provide a one-line justification for each answer.
a. correlation
b. cosine
c. Euclidean
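The measures named in questions 6 and 7 can be checked numerically. The following is a minimal sketch in plain Python, not a reference solution: the function names (`cosine`, `euclidean`, `correlation`, `jaccard`, `sets_to_binary`) are hypothetical helpers introduced here, and the sketch assumes that an undefined measure (a zero norm, a constant vector, or a non-binary input to Jaccard) should be reported as `None`. For question 7 it assumes each set is represented as a binary membership vector over the union of the two sets, which is one reasonable reading of the hint.

```python
import math

def cosine(x, y):
    # Dot product divided by the product of the Euclidean norms.
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else None  # assumption: None when a norm is zero

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def correlation(x, y):
    # Pearson correlation; undefined when either vector is constant.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else None

def jaccard(x, y):
    # Defined here only for binary (0/1) vectors: f11 / (f11 + f10 + f01).
    if any(v not in (0, 1) for v in tuple(x) + tuple(y)):
        return None
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    return both / either if either else None

def sets_to_binary(s, t):
    # Assumed representation for question 7: membership vectors over the union.
    universe = sorted(s | t)
    return (tuple(int(e in s) for e in universe),
            tuple(int(e in t) for e in universe))

pairs = {
    "6a": ((0, -1, 1, 2, -2), (0, -2, 2, 4, -4)),
    "6b": ((0, 1, 0, 0, 0), (0, 1, 0, 0, 1)),
    "6c": ((-1, -1, -1, -1, -1), (1, 1, 1, 1, 1)),
    "7": sets_to_binary({"A", "B", "C"}, {"A", "C", "D", "E"}),
}

for name, (x, y) in pairs.items():
    print(name, cosine(x, y), jaccard(x, y), euclidean(x, y), correlation(x, y))
```

Returning `None` rather than raising keeps the undefined cases visible in the printed table: the correlation in 6c, where both vectors are constant, and the Jaccard coefficient in 6a and 6c, where the vectors are not binary.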