Answered step by step
Verified Expert Solution
Link Copied!
Question
1 Approved Answer

Data Management and Data Analytics . Objective of the project . Base SAS Programming Using SAS Studio on SAS Viya C. SAS Visual Data Mining

Data Management and Data Analytics . Objective of the project . Base SAS Programming Using SAS Studio on SAS Viya C. SAS Visual Data Mining and machine Learning (In some questions Base SAS Programming and SAS Visual Analytics should also be used) D. Instructions . Datasets description

image text in transcribed image text in transcribed image text in transcribed image text in transcribed image text in transcribed image text in transcribed image text in transcribed image text in transcribed image text in transcribed image text in transcribed image text in transcribed image text in transcribed image text in transcribed Data Management and Data Analytics A. Objective of the project This Milestone Project is part of the required procedure for obtaining the SAS Academic Specialization in Data Management and Data Analytics. The objective of the project is to apply techniques for accessing. processing, managing and mining of real world data and to provide solutions to business problems that today's organizations face through the use of Base SAS Programming. SAS Visual Analytics and SAS Visual Data Mining and Machine Learning on SAS Viya. In order to accomplish the above objectives you are given a set of real world POS data that are related to sales of a retail compary along with other related data that are presented and described in section E of this document. You are asked to analyze the given data through the use of Base SAS, SAS VA and SAS VDMML and to write a relevant report (dellverable) to be handed to the management team of the organization by answering the question that follow. You are asked to analyze the given data through the use of SAS Viya and to write a relevant business report (deliverable) to be handed to the management team of the organization by answering the question that follow. The business format of the report means that, as it was explained in class, it should [1] contain advice and propositions on how the organization should act to become more efficient and more effective in their operation, based on the descriptive and predictive output of the data management, reporting and analytics work done using the data provided. It should be underlined that this is an individual project and the deadline for submission is 02/04/2023. The deliverable should be sent in paf and docx format to Andreas.Zaras@amail,com and to the secretariat of the program with title "FirstName_LastName_SAS:" Where FirstName and LastName is the first and last name of each student respectively. In the first page of the report the credentials used to access 5AS Vlya software should also be included. B. Base SAS Programming Using SAS Studio on SAS Viya The following tasks require the use of Base 5A5. Please take into account the following: - The data sets should be transformed to SAS format with the use of the data step or data sets must be done using exclusively the data step but the calculation of the statistics e.g. visits, total cost of purchases etc can be done using proc sql. Create a pie chart and a frequency table with the percentages of customers that belong to each age group. Augment your analysis by providing pie charts for the behavioral characteristics for each age group. 3. Exploration and understanding of sales: - What was the level of Sales and Returns? Create a bar chart with the monetary values. - Create graphs for the average basket size i.e. number of SKU's, total monetary value, etc and comment on your findings, - Create a report that shows the top products per product line and product type with respect to sales value in descending order. Show also the subtotal sales of each product type. - Use graphs to show the contribution to the company's revenues of each region of the country. - For the top region found In the previous question show the contribution to the company's revenues per gender. Proc sql can be used only for the calculation of the statistics e.g. of the average basket and not e.g. for merging data sets (for this data step should be used). 4. Zoom into the promotional activities by answering the following questions: - Use graphs to show what is the percentage of products that are sold without promotion and what is the percentage of products sold with promotion. Create a format to display the 0% promotion as "No Promotion" and the 10%, 20% and 30% as "Promotion". - Create pie charts to show the percentage of products that are sold on each promotion type (use the deseription of the promotion and not its code). Do not include the products sold without promotion. - What is the distribution of sales per day of the week? Is there any difference among the various days with respect to the number of distinct SKU's per invoice. In order to find the day of the week when the sale takes place use the weekday function. [4] 5. It should be also mentioned that the SKU of each product contains "hidden" information. The ninth (92) digit indicates the company that supplied the product (supplier), In order to unhide this piece of information use relevant functions and then store it to a new column. If we assume that an SKU is 58720443450301, then the supplier code is 4 . - Create a frequency report and a relevant chart to show the percentage of products sold by each supplier (use the name of the supplier and not its code). Weight the frequency of the skU by the quantity sold. This will show the supplier with the highest demand. - Create graphs to show the percentage and actual revenues of products sold by each supplier (use the name of the supplier and not its code). - Create a cross tabulation table to show the total revenue of the company with respect to the origins of the products sold by each supplier (Use the names of the suppliers and the names of the countries of origins and not their codes. Put the total revenue in the middle of the cross tabulation, the origin in the rows and the suppliers in the columns). For this task you have to use proc tabulate (find relevant instructions in the web or in sas helpl. 6. The company wants to protile its customers based on their importance so as to offer them personalized services and products. The customer segmentation is asked to be done based on the three parameters of the RFM model. Before the application of the RFM model the RFM data set should be created. It is reminded that the RFM model is based on the following three parameters: relevant data are stored in the data set customersAboutClas597.xlsx. The percentage of customers that bought category 97 is 30% and the percentage of customers that did not buy category 97 is 70%. The company will use this data set to build a predictive model and then it will apply the model to new data. - What should be done as part of the modelling process it the proportion of buyers non buyers was 5% (1200 customers) - 95% (22800 customers)? - Are there any missing values in the variables of the dataset? Provide a screenshot form the software to prove it. - What is the proportion of buyers and non-buyers in the data set? Provide a screenshot from the software to prove it (pie chart). - Provide a pie chart showing the proportion of buyers and non-buyers in ages over 40. What do you observer? - What is the average age of buyers and non-buyers? What does this mean with regards to the target variable? The marketing department of the company has conckuded to the following profit matrix. Explain the protit matrix by making any assumption that you like and use it so as to choose to how many and which customers you will send the promotional activity. - Based on this profit matrix what is the minimum probability that a customer must have so as to be considered as buyer? Provide the mathematical calculations that lead to the above result. - How many terminal leaves does the maximal tree have? Check the performance of the tree for the training and the validation data set and provide relevant charts. How is the phenomenon that is presented with the line of the training data set is called? Explain it in a few lines. Describe what is the solution to this phenomenon problem. The maximal tree should be used and explained only in this questions; in all the other tasks use and explain the optimal tree. [7] - As you know the decision tree and the decision tree model are two different concepts. Provide a description of the decision tree model. - Write a paragraph to explain the decision tree to non-technical people. What are the most important variables that differentiate buyers from non-buyers? - Check the curnulative $ response graph of the validation data set. Explain what this graph shows by using the 15\% point of the horizontal axis. - Check the \% response graph of the validation data set. How was this graph constructed and what do the values in the horizontal axis show? Explain what this graph shows by using the 15\% point of the horizontal axis. - Check the cumulative lift chart of the validabion data set. Explain what this graph shows by using the 20% point on the horizontal axis. - Check the cumulative \% captured response graph of the validation data set. Explain what this graph shows by using the 20As point on the horizontal axis. The company wants to apply the model by scoring 200 customers that are placed in the file entitled newCustomers.xlsx. - How many of those 200 customers are predicted to be buyers and how many are predicted to be non-buyers according to the model? - What is the biggest probability for someone to be buyer among these 200 The company wants to apply the model by scoring 200 customers that are placed in the file entitied newCustomers. x lsx. - How many of those 200 customers are predicted to be buyers and how many are predicted to be non-buyers according to the model? - What is the biggest probability for someone to be buyer among these 200 customers? What is the smallest probability? Note: In order to build the predictive model (decision tree) use the file customersAboutClass97xdsx. The data to be scored can be found in the file entitled NewCustomers.xlsx. In order to conduct the analysis you must transform the Excel files to SAS data sets and upload them to SAS Viya. D. Instructions - It is underined that the answers to the above questions should be addressed to business people so they should be written accordingly to be understandabie and aid in the decision making process. [8] - Charts and tables that document the answers should be included in the main deliverable. - Screenshots of technical details about the software and about how it was v ind produce the results should be included in the appendix of the report. The sas 9/18 should also be included in the appendix. E. Datasets description The datasets consist of POS data from a retail store. The available data are included in the following tables. The first one of them is related to data about customers and is entitled Customer, the second and the third are related to POS data and are entitled Imvoice & Basket respectively. the fourth contains the coding of the payment method done and is entitled Payment_Method, the fitth contains the coding of the promotional activities running and is entitled Promotions, the sixth contains the coding of the suppliers and is entitled Suppliers and finally the seventh contains the coding of the product origin and is entitied Product_Origin. This table is related to the data about the customers and contains the following columns: - CustomeriD: Customer ID, (unique for every customer) - LastName: The surname of the customer - FirstName: The first name of the customer We make the assumption that an imvoice can be paid with more than one payment methods. This table contains the following columns: - InvoicelD: The ID of the invoice (unique for every invoice) - ProductID: The ID of the product (unique for every product) - Promotionid: The promotion id - Quantity: The quantity of the product sold [12] This table contains the following columns: - ProductiD: The ID of the product (unique for every product) - ProductLine: The upper level of the product hierarchy This table contains the following colurnns: - SupplieriD: The ID of the supplier (unique for every supplier) - SupplierName: The name of the supplier - Cust_Code : The code of the customer - Class97: 1/0, whether the customer bought class 97 or not - BoughtClasses: The number of product classes the customer bought - Frequency: The number of times the customer has bought products from the company 16/14 SSaS forms - Recency: The number of months that have passed since the last time the customer made a purchase from the company - Monetary; The average purchase in monetary units that the customer has done in company's products - Age: The age of the customer - City: The city where the customer lives Data Management and Data Analytics A. Objective of the project This Milestone Project is part of the required procedure for obtaining the SAS Academic Specialization in Data Management and Data Analytics. The objective of the project is to apply techniques for accessing. processing, managing and mining of real world data and to provide solutions to business problems that today's organizations face through the use of Base SAS Programming. SAS Visual Analytics and SAS Visual Data Mining and Machine Learning on SAS Viya. In order to accomplish the above objectives you are given a set of real world POS data that are related to sales of a retail compary along with other related data that are presented and described in section E of this document. You are asked to analyze the given data through the use of Base SAS, SAS VA and SAS VDMML and to write a relevant report (dellverable) to be handed to the management team of the organization by answering the question that follow. You are asked to analyze the given data through the use of SAS Viya and to write a relevant business report (deliverable) to be handed to the management team of the organization by answering the question that follow. The business format of the report means that, as it was explained in class, it should [1] contain advice and propositions on how the organization should act to become more efficient and more effective in their operation, based on the descriptive and predictive output of the data management, reporting and analytics work done using the data provided. It should be underlined that this is an individual project and the deadline for submission is 02/04/2023. The deliverable should be sent in paf and docx format to Andreas.Zaras@amail,com and to the secretariat of the program with title "FirstName_LastName_SAS:" Where FirstName and LastName is the first and last name of each student respectively. In the first page of the report the credentials used to access 5AS Vlya software should also be included. B. Base SAS Programming Using SAS Studio on SAS Viya The following tasks require the use of Base 5A5. Please take into account the following: - The data sets should be transformed to SAS format with the use of the data step or data sets must be done using exclusively the data step but the calculation of the statistics e.g. visits, total cost of purchases etc can be done using proc sql. Create a pie chart and a frequency table with the percentages of customers that belong to each age group. Augment your analysis by providing pie charts for the behavioral characteristics for each age group. 3. Exploration and understanding of sales: - What was the level of Sales and Returns? Create a bar chart with the monetary values. - Create graphs for the average basket size i.e. number of SKU's, total monetary value, etc and comment on your findings, - Create a report that shows the top products per product line and product type with respect to sales value in descending order. Show also the subtotal sales of each product type. - Use graphs to show the contribution to the company's revenues of each region of the country. - For the top region found In the previous question show the contribution to the company's revenues per gender. Proc sql can be used only for the calculation of the statistics e.g. of the average basket and not e.g. for merging data sets (for this data step should be used). 4. Zoom into the promotional activities by answering the following questions: - Use graphs to show what is the percentage of products that are sold without promotion and what is the percentage of products sold with promotion. Create a format to display the 0% promotion as "No Promotion" and the 10%, 20% and 30% as "Promotion". - Create pie charts to show the percentage of products that are sold on each promotion type (use the deseription of the promotion and not its code). Do not include the products sold without promotion. - What is the distribution of sales per day of the week? Is there any difference among the various days with respect to the number of distinct SKU's per invoice. In order to find the day of the week when the sale takes place use the weekday function. [4] 5. It should be also mentioned that the SKU of each product contains "hidden" information. The ninth (92) digit indicates the company that supplied the product (supplier), In order to unhide this piece of information use relevant functions and then store it to a new column. If we assume that an SKU is 58720443450301, then the supplier code is 4 . - Create a frequency report and a relevant chart to show the percentage of products sold by each supplier (use the name of the supplier and not its code). Weight the frequency of the skU by the quantity sold. This will show the supplier with the highest demand. - Create graphs to show the percentage and actual revenues of products sold by each supplier (use the name of the supplier and not its code). - Create a cross tabulation table to show the total revenue of the company with respect to the origins of the products sold by each supplier (Use the names of the suppliers and the names of the countries of origins and not their codes. Put the total revenue in the middle of the cross tabulation, the origin in the rows and the suppliers in the columns). For this task you have to use proc tabulate (find relevant instructions in the web or in sas helpl. 6. The company wants to protile its customers based on their importance so as to offer them personalized services and products. The customer segmentation is asked to be done based on the three parameters of the RFM model. Before the application of the RFM model the RFM data set should be created. It is reminded that the RFM model is based on the following three parameters: relevant data are stored in the data set customersAboutClas597.xlsx. The percentage of customers that bought category 97 is 30% and the percentage of customers that did not buy category 97 is 70%. The company will use this data set to build a predictive model and then it will apply the model to new data. - What should be done as part of the modelling process it the proportion of buyers non buyers was 5% (1200 customers) - 95% (22800 customers)? - Are there any missing values in the variables of the dataset? Provide a screenshot form the software to prove it. - What is the proportion of buyers and non-buyers in the data set? Provide a screenshot from the software to prove it (pie chart). - Provide a pie chart showing the proportion of buyers and non-buyers in ages over 40. What do you observer? - What is the average age of buyers and non-buyers? What does this mean with regards to the target variable? The marketing department of the company has conckuded to the following profit matrix. Explain the protit matrix by making any assumption that you like and use it so as to choose to how many and which customers you will send the promotional activity. - Based on this profit matrix what is the minimum probability that a customer must have so as to be considered as buyer? Provide the mathematical calculations that lead to the above result. - How many terminal leaves does the maximal tree have? Check the performance of the tree for the training and the validation data set and provide relevant charts. How is the phenomenon that is presented with the line of the training data set is called? Explain it in a few lines. Describe what is the solution to this phenomenon problem. The maximal tree should be used and explained only in this questions; in all the other tasks use and explain the optimal tree. [7] - As you know the decision tree and the decision tree model are two different concepts. Provide a description of the decision tree model. - Write a paragraph to explain the decision tree to non-technical people. What are the most important variables that differentiate buyers from non-buyers? - Check the curnulative $ response graph of the validation data set. Explain what this graph shows by using the 15\% point of the horizontal axis. - Check the \% response graph of the validation data set. How was this graph constructed and what do the values in the horizontal axis show? Explain what this graph shows by using the 15\% point of the horizontal axis. - Check the cumulative lift chart of the validabion data set. Explain what this graph shows by using the 20% point on the horizontal axis. - Check the cumulative \% captured response graph of the validation data set. Explain what this graph shows by using the 20As point on the horizontal axis. The company wants to apply the model by scoring 200 customers that are placed in the file entitled newCustomers.xlsx. - How many of those 200 customers are predicted to be buyers and how many are predicted to be non-buyers according to the model? - What is the biggest probability for someone to be buyer among these 200 - Address: The street address and number of the customer - Country: The country of origin of each customer - Postal_code: The postal code of the customer - City: The city where the customer resides. - Region: The region where the customer resides - Gender: The gender of the customer - Day_Of_Birth: The day when the customer was born - Month_Of_Birth: The month when the customer was born - Year_of_Birth: The year when the customer was born This table contains data about the issued invoice (sale or return) and contains the following columns: - InvoicelD: The ID of the invoice funique for every inwoice) - InvoiceNo: The issue number of the invoice (unique for every invoice) - InwoiceDate: The date when the invoice was issued - CustomeriD: Customer ID, (unique for every customer) - Payment_Method: The code of the payment method [10] - Operation: Denotes whether the invoice is related to Sales or Return - ProductType: The middle level of the product hierarchy - Product: The name of the product (lowest level of the product hierarchy) - SKU: The stock keeping unit of the product. - ProductOrigin: The ID of the origin of the product - ProductPrices The price of the product This table contains the following columns: - Promotion_ID: The ID of the promotion - Promotion: The $ discount on the product price This table contains the following columns: - Countrya The country of origin of the product - Code: The code of the country of origin of the product - Monetary; The average purchase in monetary units that the customer has done in company's products - Age: The age of the customer - City: The city where the customer lives

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image
Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image_2

Step: 3

blur-text-image_3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Database Processing

Authors: David M. Kroenke, David Auer

11th Edition

B003Y7CIBU, 978-0132302678

More Books

Students explore these related Databases questions

Question

Group Size and Communication

Answered: 3 weeks ago

Question

Understanding Group Roles

Answered: 3 weeks ago