Question
For this analysis assignment, you will work with transaction data recorded between 1-Dec-2010 and 30-Nov-2011 for a UK-based online retail store. The company mainly sold unique, all-occasion gifts. Many of the company's customers were wholesalers.
Download the (Excel) data file for this assignment from Canvas by clicking on the following link: https://www.dropbox.com/s/pl5zm8xfree2c5l/Online%2BRetail.xlsx?dl=0
With the data file saved in your working directory, run the following code to clean up the dataset and get it ready for analysis.
library(readxl) # you may need to install the 'readxl' package first
eretail = read_excel("Online Retail.xlsx")
dim(eretail)
names(eretail)
eretail = eretail[eretail$Country != "Unspecified",] # remove 'unspecified' country
eretail = eretail[eretail$Quantity > 0,] # remove returns/cancellations
IDtab = table(eretail$Country, eretail$CustomerID) # crosstab country by customer ID
IDtab = apply(IDtab > 0, 2, sum) # is any customer ID duplicated across countries?
duplicateIDs = names(IDtab[IDtab > 1]) # duplicate IDs to clean up
eretail = eretail[!is.element(eretail$CustomerID, duplicateIDs),]
rm(IDtab)
eretail$InvoiceMth = substr(eretail$InvoiceDate, 1, 7) # extract month of invoice
eretail = eretail[as.vector(eretail$InvoiceMth) != "2011-12",] # remove December 2011 as it only covers first week
eretail$Amount = eretail$Quantity * eretail$UnitPrice # compute amount per invoice item
eaggr = aggregate(Amount~Country+CustomerID, data=eretail, sum) # compute aggregate amount spent per customer
row.names(eaggr) = eaggr$CustomerID
eaggr = eaggr[,-2]
eaggr = cbind(eaggr, aggregate(InvoiceMth~CustomerID, data=eretail, min)[,-1]) # 1st month of customer interaction
names(eaggr)[3] = "FirstMth"
eaggr = cbind(eaggr, aggregate(InvoiceMth~CustomerID, data=eretail, max)[,-1]) # last month of cust. interaction
names(eaggr)[4] = "LastMth"
# relabel months and compute duration of customer interaction
eaggr$FirstMth = as.factor(eaggr$FirstMth)
eaggr$LastMth = as.factor(eaggr$LastMth)
levels(eaggr$FirstMth) = 1:12
levels(eaggr$LastMth) = 1:12
eaggr$Months = as.numeric(eaggr$LastMth) - as.numeric(eaggr$FirstMth) + 1
eaggr = cbind(eaggr, apply(table(eretail$CustomerID, eretail$InvoiceMth), 1, sum))
names(eaggr)[6] = "Purchases"
# Some useful statistics (which you may or may not decide to use)
eaggr$Amount.per.Purchase = eaggr$Amount / eaggr$Purchases
eaggr$Purchases.per.Month = eaggr$Purchases / eaggr$Months
eaggr$Amount.per.Month = eaggr$Amount / eaggr$Months
eaggr[1:30,]
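A note on the month relabeling in the code above: `levels(x) = 1:12` renames factor levels by position, so it yields the intended month index only if all 12 months occur in the column. That is likely true for this dataset, but the code does not check it. The toy example below uses made-up month strings (not the retail data) to show the difference. The names `mth`, `all_months`, and `idx` are illustrative only.

```r
# Toy illustration with made-up month strings (NOT the retail data)
mth <- c("2011-03", "2010-12", "2011-11", "2011-03")

# Positional relabeling, as in the assignment code
g <- as.factor(mth)      # levels are sorted: "2010-12" "2011-03" "2011-11"
levels(g) <- 1:12        # first three levels become "1", "2", "3"
as.numeric(g)            # position among the months present, not the calendar index

# Alternative sketch: match against the full, explicit list of 12 months
all_months <- format(
  seq(as.Date("2010-12-01"), as.Date("2011-11-01"), by = "month"),
  "%Y-%m"
)
idx <- match(mth, all_months)   # 1 = Dec 2010, ..., 12 = Nov 2011
idx
```

With the real data, comparing `length(unique(eretail$InvoiceMth))` against 12 before relabeling would be one way to confirm the positional approach is safe.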
- Conduct an exploratory data analysis.
- The objective is to segment this company's consumers by the recency and duration of their interaction with the company, the frequency of their purchases/orders, and the amount they spent (amounts are in pounds sterling). Perform a cluster analysis. Include and interpret the results.
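One possible starting point for the two tasks above is sketched below. It is not the official solution. It assumes the `eaggr` table produced by the cleaning code, and it makes several choices that are assumptions rather than requirements: recency is measured as months before November 2011, skewed variables are log-transformed, k-means is the clustering method, and k = 4 is a placeholder to be revised after looking at the elbow plot. If `eaggr` is not in the workspace, a small simulated stand-in is generated so the code runs on its own. Those simulated numbers are not results.

```r
# Use the real 'eaggr' if present; otherwise build a SIMULATED stand-in
# (made-up values, for running the sketch only).
if (!exists("eaggr")) {
  set.seed(1)
  n <- 200
  first <- sample(1:12, n, replace = TRUE)
  last  <- pmin(12, first + sample(0:11, n, replace = TRUE))
  eaggr <- data.frame(
    Amount   = round(rlnorm(n, meanlog = 6, sdlog = 1.2), 2),
    FirstMth = first,
    LastMth  = last
  )
  eaggr$Months    <- eaggr$LastMth - eaggr$FirstMth + 1
  eaggr$Purchases <- rpois(n, 5 * eaggr$Months) + 1
}

# Segmentation variables (assumed definitions)
seg <- data.frame(
  Recency   = 12 - as.numeric(as.character(eaggr$LastMth)), # months since last purchase
  Months    = eaggr$Months,                                 # duration of interaction
  Frequency = log1p(eaggr$Purchases / eaggr$Months),        # purchases per month, logged
  Monetary  = log1p(eaggr$Amount)                           # total spend, logged
)

# Exploratory checks
summary(seg)
round(cor(seg), 2)

# Standardize so no variable dominates the Euclidean distance
X <- scale(seg)

# Elbow plot: total within-cluster sum of squares for k = 1..8
set.seed(123)
wss <- sapply(1:8, function(k) kmeans(X, centers = k, nstart = 25)$tot.withinss)
plot(1:8, wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster SS")

# Fit with a placeholder k and profile the clusters on the unscaled variables
k  <- 4   # assumption; choose after inspecting the elbow plot
km <- kmeans(X, centers = k, nstart = 25)
profile <- aggregate(seg, by = list(Cluster = km$cluster), FUN = mean)
profile
table(km$cluster)
```

Interpretation would then rest on the `profile` table: for example, a cluster with low `Recency`, high `Months`, and high `Monetary` would describe long-standing, recently active, high-spending customers, which in this dataset may correspond to wholesalers.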