Question
QUESTION 14

After the quarantine is lifted, and people start riding the Green Line again, a person sees you scribbling in a notebook while riding the T. He looks over your shoulder and says, "Are you seriously using the Jaccard Coefficient to compare records based on categorical variables? That is so dumb of you!" You ask him politely to stop bothering you, but he insists on saying more. "Jaccard is pretty dumb, yo. If you really care about seeing how similar two records are, just use the matching coefficient instead. Take the number of things that the two observations BOTH have, add that to the number of things that NEITHER observation has, and then just divide that sum by the total number of variables that you're looking at."

Why might this rude stranger's advice not be good?

This rude stranger has overlooked the impact of data normalization. His advice might be good if the data does not require normalization, but it will not be applicable if some form of normalization is needed here.

If there is a huge number of variables being considered, a misleading result could occur if this guy's recommendation is followed.

If many of the categories are not present for either of the records being compared, the matching coefficient would imply a misleading sense of similarity between those records.

The stranger is assuming that the data has already undergone some form of dimensionality reduction, most likely through a process known as Principal Components Analysis (PCA). If that has not yet occurred, then he cannot be sure which approach will be more applicable.

The stranger should have mentioned Gower's distance instead; that is the only certain way to prevent any risk of obtaining a misleading result.
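The two measures named in the question can be compared on a small example. The sketch below is illustrative only: the two records, the choice of 20 binary variables, and the function names are assumptions, not part of the original question. It computes the matching coefficient exactly as the stranger describes it, and the Jaccard coefficient as the shared presences divided by the variables present in at least one record.

```python
def matching_coefficient(a, b):
    """(both present + both absent) / total number of variables."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    neither = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    return (both + neither) / len(a)


def jaccard_coefficient(a, b):
    """both present / variables present in at least one record."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # If neither record has any category present, define the score as 0.0.
    return both / either if either else 0.0


# Hypothetical records over 20 binary (present/absent) variables.
# They share one present category, differ on two, and both lack the other 17.
record_a = [1, 1, 0] + [0] * 17
record_b = [1, 0, 1] + [0] * 17

print("matching:", matching_coefficient(record_a, record_b))
print("jaccard: ", jaccard_coefficient(record_a, record_b))
```

In this made-up case the 17 variables that neither record has count toward the matching coefficient, which comes out at 18/20, while the Jaccard coefficient ignores them and comes out at 1/3. This is the situation described in the option about categories that are not present for either record.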