Question 3

Suppose we want to perform a marketing study on the customers of an online store (shoes, movies, songs - it does not matter what we are selling). Our data points are customers, C1, C2, ..., CN, and the variables are (rated) products P1, P2, ..., PM. For each customer we have a vector of product ratings p1, ..., pM (thus a customer is a point in an M-dimensional space). For the sake of the argument, we are oversimplifying the problem here by assuming that for each customer we have a complete, or at least almost complete (with but a few missing values), vector p1, ..., pM (or maybe we have had a large focus group that we enticed into rating a large number of products by promising that one lucky person will win a Starbucks gift card worth $25). [In real life, of course, those vectors are most likely very sparse, with very few ratings available per customer across hundreds or thousands of products, so the problem of "similarity" will be much more difficult!] With such a dataset in hand, we can pursue a few (related) goals. Indeed, we can try the following:
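
To make the setup concrete, here is a minimal sketch (NumPy; the numbers are invented purely for illustration) of what such a customer-by-product rating matrix looks like, with missing ratings marked as NaN:

```python
import numpy as np

# Toy example: N = 4 customers rating M = 5 products on a 0-5 scale.
# np.nan marks a missing rating (a product the customer has not rated yet).
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0, 5.0],    # C1
    [4.0, 5.0, 1.0, 0.0, np.nan], # C2
    [0.0, 1.0, 5.0, 4.0, 0.0],    # C3
    [1.0, 0.0, 4.0, 5.0, np.nan], # C4
])

# Each row is one customer, i.e. one point in the M-dimensional product space.
n_customers, n_products = ratings.shape
```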

  • for a new customer CX with some preferences known, find a few (let's say 10) "closest" customers based on the similarity of the preference profiles across products, and try predicting the missing preferences for customer CX based on the preferences of those "similar" ones (this is the spirit of the K-nearest neighbors approach!); if our existing N customers themselves have some preferences missing, we can also impute the missing preferences on the very dataset we already have and make some educated suggestions for customers C1...CN with regard to the products/movies/songs they did not rate/buy yet.
  • we can try clustering data points C1...CN, in the hope that if there are indeed strong, well separated clusters in the data, then it is possible that we would obtain a more reliable and robust preference prediction if, instead of just looking at the 10 "nearest" customers, we map the new customer CX onto the nearest cluster and use that cluster's average preference profile for prediction. Note that if we have the cluster structure, we can also tell how close the new customer is to its center (i.e. how "typical" they are), while if we simply do it KNN-style, then it is harder to know whether the nearest neighbors are representative enough and allow for generalization, or whether our customer CX is very unusual and their nearest neighbors are some rare/unusual outliers themselves, in which case using them as the basis for prediction is riskier.
  • Furthermore, if we do discover good, well separated clusters ("segments" in our customer population), we might want to use customer data (e.g. sex, age, education level, zip code, socio-economic parameters - if we have them, of course!) to try performing classification, i.e. fitting a model that predicts the (categorical) cluster ID 1...K based on those customer data (classification is the theme of the remaining part of this course!). It might turn out that it is much easier to predict the cluster (the distinct customer group we just discovered) from the customer's personal data and then use that cluster's average preference profile to predict what the new customer is going to like, instead of trying to directly regress the preferences for each product against customer personal data (just think how many models we'd need for that!). The difference from the previous item (where it was also suggested to map the new customer onto a cluster) is that here we would not have to know the (partial) preference profile for the new customer at all. In other words, the logic is: (1) find clusters/distinct groups of customers 1...K based on preferences; (2) learn to predict cluster 1...K based on customer personal data (but not preferences!); (3) as the new customer arrives, use our model and their personal data to predict the cluster (group of customers) they are likely to fall into - and as we know the cluster, we also know the new customer's likely (at least, "likelier") preferences!
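
The three-step logic above can be sketched end to end. This is a minimal NumPy-only illustration on synthetic data: the tiny hand-rolled k-means loop stands in for a real clustering routine, and the midpoint threshold stands in for a real classifier; none of the data or names come from the question itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic preference profiles: 200 customers, 6 products, two taste groups
# planted around opposite prototypes so that clusters actually exist.
prototypes = np.array([[5, 5, 5, 0, 0, 0],
                       [0, 0, 0, 5, 5, 5]], dtype=float)
group = rng.integers(0, 2, size=200)
prefs = prototypes[group] + rng.normal(0.0, 0.5, size=(200, 6))

# Step 1: discover clusters from preferences alone (a few k-means iterations;
# init: first point plus the point farthest from it).
centers = np.stack([prefs[0],
                    prefs[np.linalg.norm(prefs - prefs[0], axis=1).argmax()]])
for _ in range(10):
    d = np.linalg.norm(prefs[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    centers = np.array([prefs[labels == k].mean(axis=0) for k in range(2)])

# Step 2: "personal data" (one synthetic feature correlated with taste group);
# fit the crudest possible classifier: a midpoint threshold between clusters.
personal = group + rng.normal(0.0, 0.2, size=200)
m0, m1 = personal[labels == 0].mean(), personal[labels == 1].mean()
threshold = (m0 + m1) / 2.0

# Step 3: a new customer arrives with personal data only. Predict their
# cluster (pick the cluster whose personal-data mean lies on the same side
# of the threshold), then read off that cluster's mean preference profile.
x_new = 0.9
pred = int((x_new > threshold) == (m1 > m0))
predicted_profile = centers[pred]
```

The point of the sketch is the division of labor: preferences are used only once, to discover the clusters; after that, a new customer is routed to a cluster using personal data alone.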

Now, think about those different avenues our analysis can take. Whether it's clustering or KNN, we have so far operated with a (deliberately vague) notion of "similarity" between two data points (customers) i and j: (pi1, ..., piM) vs (pj1, ..., pjM). In reality the preferences might be (semi-)categorical "star ratings", but these are still ordered variables and they run over the range 0-5, so to a first approximation we can consider them to be numerical. Thus we do not have to deal with defining a distance between realizations of categorical variables (which is still possible). Assuming that the ratings are just numbers, and that you want to be able to group together customers that tend to give similar actual ratings to the same products, across many different products (and/or you want to predict the actual rating of a product for a new customer by assigning this customer to a previously discovered cluster), which distance metric would you try?

[* Note that for this AND the following question we assume that the data are NOT scaled]

( ) correlation

( ) Euclidean distance

( ) Hamming distance (Hamming distance between two strings/vectors equals the number of mismatches, i.e. unequal values)

( ) absolute value of correlation
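
Before choosing, it may help to see numerically how two of the candidate metrics behave on unscaled ratings. In this toy NumPy example (vectors invented for illustration), customer b matches a's actual ratings almost exactly, while customer c matches only a's like/dislike pattern at a muted scale:

```python
import numpy as np

a = np.array([5.0, 4.0, 0.0, 1.0])   # enthusiastic customer
b = np.array([5.0, 4.0, 1.0, 0.0])   # nearly identical actual ratings
c = np.array([3.0, 2.5, 1.5, 2.0])   # same pattern, muted scale

def euclid(x, y):
    return float(np.linalg.norm(x - y))

def corr(x, y):
    return float(np.corrcoef(x, y)[0, 1])

# Euclidean distance is small only when the actual ratings agree,
# while correlation is high whenever the pattern agrees:
d_ab, d_ac = euclid(a, b), euclid(a, c)   # d_ab < d_ac
r_ab, r_ac = corr(a, b), corr(a, c)       # both close to 1
```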

Question 4

Continuing after Question 3. Suppose that you believe that some people are very opinionated and easily moved, and they tend to give either very low or very high ratings (0 or 5, hate it or love it!), while others are much more reserved/demanding and tend to give products they like a modest rating of 3-4 (reserving 5 for something truly incredible that remains to be seen). In other words, customers may have preference profiles that differ only by the magnitude of the "like" and "dislike" ratings. If you suspect that this is what happens in your data, you want to account for such a possibility, and you would like to call customers "similar" as long as they like the same products more and the same products less, regardless of the actual strength of their opinions - then what distance metric would you try for clustering?

( ) correlation

( ) Hamming distance

( ) Euclidean distance

( ) absolute value of correlation
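
The "same pattern, different strength" scenario has a clean numerical signature: rescaling a profile (a positive affine transformation) leaves the correlation at exactly 1, even though the Euclidean distance between the two profiles is large. A tiny NumPy check with invented ratings:

```python
import numpy as np

opinionated = np.array([5.0, 0.0, 5.0, 0.0, 5.0])  # hate-it-or-love-it profile
reserved = 0.2 * opinionated + 3.0                 # same pattern: [4, 3, 4, 3, 4]

r = float(np.corrcoef(opinionated, reserved)[0, 1])  # 1.0 up to floating point
d = float(np.linalg.norm(opinionated - reserved))    # clearly nonzero
```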

Continuing after Questions 3 and 4.

Would it make sense to perform two-way clustering in this problem (i.e. also consider the products as the data points and cluster them in the N-dimensional space of variables C1,...,CN), and what would the clustering of P1, ...PM represent?

Group of answer choices

A ( ) Yes, it makes sense. The clusters, if any, would represent similarly priced products

B ( ) It makes no sense. There is no clear concept of "similarity" between product preference profiles across customers

C ( ) Yes, it makes sense. The clusters, if any, would represent customers that like the same products

D ( ) Yes, it makes sense. Clusters (if any) would represent groups of products that tend to be all liked or all disliked, simultaneously, by customers. Hmmm... could we use those clusters for making recommendations (of movies, for instance) too? Or to make sure we place those products next to each other on the store shelves to increase the sales?
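
Mechanically, two-way clustering just means transposing the matrix: each product becomes a point in the N-dimensional space of customers, and the same machinery applies. A small NumPy sketch with invented ratings (the block structure is planted on purpose so that co-liked products are visible):

```python
import numpy as np

ratings = np.array([   # rows: customers C1..C4, columns: products P1..P5
    [5, 4, 0, 1, 5],
    [4, 5, 1, 0, 4],
    [0, 1, 5, 4, 0],
    [1, 0, 4, 5, 1],
], dtype=float)

products = ratings.T   # shape (M, N): each row is now one product

# Products that customers tend to like (or dislike) together have highly
# correlated rows; here P1 and P5 track each other, P1 and P3 are opposites.
product_corr = np.corrcoef(products)
```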
