Question
1.Which of the following is a true statement about neural networks?
A.Neural networks can only be used for classification problems.
B.All machine learning problems can be solved by a neural network that only has input and output layers and no hidden layers.
C.Some machine learning problems can only be solved by a neural network if the network has at least one hidden layer.
D.Neural networks can only be used for regression problems.
2.Using parsing and standardization for record linkage will tend to:
A.not affect precision and recall
B.lead to very high precision but very low recall
C.increase recall
D.decrease recall
3.With rank swapping, lowering the value of p will do what?
A.cause the individual values of a particular variable to be replaced by a central measure such as the mean or median of the variable
B.ensure that each record in the modified dataset is equal to at least p other records
C.make the modified dataset more similar to the original dataset
D.make the modified dataset less similar to the original dataset
4.With k-micro-aggregation, lowering the value of k makes the modified dataset more similar to the original dataset. (T/F)
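To see how k affects the output, here is a minimal sketch of univariate micro-aggregation: sort the values, form groups of at least k consecutive records, and replace each value with its group mean. The function name `microaggregate` and the sample data are illustrative assumptions, and real implementations often use more refined grouping heuristics.

```python
def microaggregate(values, k):
    """Simplified univariate micro-aggregation (illustrative sketch).

    Sorts the records, forms consecutive groups of at least k records,
    and replaces every value by the mean of its group.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    result = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        # Fold a short tail into the current group so no group has fewer than k records.
        if n - end < k:
            end = n
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
        start = end
    return result


data = [10, 12, 20, 22, 30, 32]  # made-up values
print(microaggregate(data, 1))  # [10.0, 12.0, 20.0, 22.0, 30.0, 32.0]
print(microaggregate(data, 2))  # [11.0, 11.0, 21.0, 21.0, 31.0, 31.0]
print(microaggregate(data, 3))  # [14.0, 14.0, 14.0, 28.0, 28.0, 28.0]
```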
5.With the Perceptron training algorithm, if the algorithm does one pass over the dataset and no instances have been misclassified, which of the following is true?
A.the weights will be re-initialized to zeros
B.some weights will be changed and some unchanged during that pass
C.the weights will be re-initialized to ones
D.the weights will be unchanged during that pass
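The update rule behind this question can be sketched as follows. This is a minimal version of one Perceptron training pass, assuming labels of +1/-1, a learning rate of 1, and a hypothetical helper name `perceptron_epoch`; the weights are only touched when an instance is misclassified.

```python
def perceptron_epoch(weights, bias, samples, lr=1.0):
    """Run one pass of the Perceptron rule; labels are assumed to be +1 or -1.

    Returns the (possibly updated) weights, bias, and number of mistakes.
    """
    weights = list(weights)  # copy so the caller's list is not mutated
    mistakes = 0
    for x, y in samples:
        activation = sum(w * xi for w, xi in zip(weights, x)) + bias
        predicted = 1 if activation > 0 else -1
        if predicted != y:
            # The only place the weights change: a misclassified instance.
            weights = [w + lr * y * xi for w, xi in zip(weights, x)]
            bias += lr * y
            mistakes += 1
    return weights, bias, mistakes


samples = [((1.0, 1.0), 1), ((-1.0, -1.0), -1)]  # made-up, linearly separable
print(perceptron_epoch([1.0, 1.0], 0.0, samples))  # ([1.0, 1.0], 0.0, 0)
print(perceptron_epoch([0.0, 0.0], 0.0, samples))  # ([1.0, 1.0], 1.0, 1)
```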
6.Global recoding of a variable (for the purpose of anonymization) means:
A.the variable is represented using more detail (finer granularity).
B.the variable is represented using less detail (coarser granularity).
C.the variable has some values randomly deleted.
D.the variable has random pairs of values swapped.
7.A benefit of using swapping for preserving confidentiality in a dataset is:
A.it does not affect the ability to do linear regression using the data
B.it preserves correlations between variables
C.it does not modify the data
D.it preserves basic summary statistics of a variable
8.Given only a Soundex representation of a name, we can determine what the original name was. (T/F)
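For reference, a minimal sketch of the common American Soundex variant is shown below; the function name `soundex` is an assumption, and other variants differ in details such as how H and W are handled. Note that two different names can map to the same code.

```python
# Letter-to-digit table for American Soundex; letters not listed get no digit.
CODES = {}
for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex code of a name (illustrative sketch)."""
    name = "".join(ch for ch in name.upper() if ch.isalpha())
    if not name:
        return ""
    result = name[0]
    prev = CODES.get(name[0], "")
    for ch in name[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code after a vowel is kept
    return (result + "000")[:4]


print(soundex("Robert"))  # R163
print(soundex("Rupert"))  # R163
```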
9.Blocking in record linkage will tend to:
A.decrease recall
B.increase precision and recall
C.not affect precision and recall
D.decrease precision
10.A histogram is a useful way to visualize:
A.time series fluctuations
B.the distribution of a single variable
C.the correlation between two variables
D.how a continuous variable differs according to the values of a categorical variable
11.Binarization of a continuous numeric variable (using some threshold) causes:
A.the variable values to be randomly replaced with 1's and 0's
B.the variable to be turned into two new variables
C.information to be gained
D.information to be lost
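Thresholding can be illustrated with a short sketch. The function name `binarize`, the threshold of 35, and the sample ages are assumptions chosen for illustration; after the transformation, values on the same side of the threshold can no longer be told apart.

```python
def binarize(values, threshold):
    """Map each value to 1 if it is above the threshold, otherwise 0."""
    return [1 if v > threshold else 0 for v in values]


ages = [18, 35, 36, 90]  # made-up values
print(binarize(ages, 35))  # [0, 0, 1, 1]
```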
12.Hidden Markov Models are called hidden because:
A.the transition probabilities are not known
B.the observations are not known
C.the observation probabilities are not known
D.the true state sequence is not known
13.The Viterbi algorithm and the calculation of minimum edit distance are both examples of dynamic programming. (T/F)
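As a concrete reference for the edit-distance half of this statement, here is a minimal sketch of Levenshtein distance, assuming unit costs for insertion, deletion, and substitution. Each table cell is computed from previously solved subproblems; only the previous row is kept in memory.

```python
def edit_distance(a, b):
    """Minimum number of unit-cost insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```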
14.The Markov assumption is that a state's probability depends on the entire prior sequence of states. (T/F)
15.If we use string similarity metrics to augment record linkage (rather than just saying that fields agree or disagree with each other), this will tend to:
A.increase precision
B.reduce the number of pairs we have to consider
C.cause every record pair to be considered a definite match
D.increase recall
16.The goal of blocking in record linkage is:
A.to increase the number of record pairs that need to be considered
B.to do closer inspection of record pairs that seem unlikely to be linked
C.to decrease the number of record pairs that need to be considered
D.to standardize common expressions
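The effect of blocking on the number of comparisons can be sketched as below. The records, the choice of `zip` as the blocking key, and the helper name `candidate_pairs` are all hypothetical; only records that share a block are paired, so pairs that fall into different blocks are never compared.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(records, block_key):
    """Group records by a blocking key and pair records only within a block."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)
    pairs = []
    for members in blocks.values():
        pairs.extend(combinations(members, 2))
    return pairs


# Made-up records for illustration.
records = [
    {"id": 1, "name": "Smith", "zip": "10001"},
    {"id": 2, "name": "Smyth", "zip": "10001"},
    {"id": 3, "name": "Jones", "zip": "20002"},
    {"id": 4, "name": "Jonas", "zip": "20002"},
    {"id": 5, "name": "Smith", "zip": "20002"},
]

all_pairs = list(combinations(records, 2))
blocked_pairs = candidate_pairs(records, lambda r: r["zip"])
print(len(all_pairs))      # 10
print(len(blocked_pairs))  # 4
```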
17.If a binary classifier always predicts 1 (the positive class), then precision will be 1. (T/F)
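The statement can be checked numerically with a small sketch. The labels below are made up, and the helper names `precision` and `recall` are assumptions; precision is TP / (TP + FP) and recall is TP / (TP + FN).

```python
def precision(y_true, y_pred):
    """TP / (TP + FP); returns 0.0 when nothing is predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true, y_pred):
    """TP / (TP + FN); returns 0.0 when there are no actual positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


y_true = [1, 0, 0, 1, 0]     # made-up ground truth: 2 positives out of 5
y_pred = [1] * len(y_true)   # a classifier that always predicts 1
print(precision(y_true, y_pred))  # 0.4
print(recall(y_true, y_pred))     # 1.0
```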
18.With the Fellegi-Sunter model, an m-probability represents:
A.the probability of being a match divided by the probability of not being a match
B.the prior probability of records matching
C.the probability of a record pair agreeing on a particular field, given that the pair is not a match.
D.the probability of a record pair agreeing on a particular field, given that the pair is a match.
19.A good blocking variable should be one that:
A.has values that are not usually standardized
B.has a large number of possible values
C.has values that tend to change frequently
D.has a small number of possible values
20.The Soundex representation of a name is meant to accurately reflect how the name is pronounced. (T/F)