Question


Posted on Sep 19, 2024


Amazon wants to launch new digital marketing campaigns across various categories and brands, built around new Christmas deals, in order to: increase their sales by a certain percentage, promote products that are not selling well, and promote products that generate higher profits. They have provided a transactional data CSV containing a few years of historical transactions, including product details across multiple categories. As an analytics consultant, your responsibility is to provide valuable insights to the marketing, product, sales, and procurement teams. You have to provide statistics across products, brands, categories, and segments, and identify which are performing well and which need improvement. These statistics will help run effective digital marketing campaigns. The scope of this project is limited to data engineering and analysis.

Domain: Retail & Payments

Analysis to be done: Using the transaction data, we will collect statistics about products, customers, and brand segments, decide on a strategy to increase sales, and find out whether any segment is underperforming in terms of sales.

Content: The data set contains transactions in CSV format with the following features:
Invoice Number: Unique ID of the transaction
Product Code: Unique ID of the product
Description: Product description
Brand: Product brand
Category: Category of the product
Sub Category: Sub-category of the product
Actual Price: Actual price printed on the product
Profit Margin: Profit margin in % (the margin is on the transaction amount)
Discount Percentage: Discount offered on the product
Quantity: Quantity sold
Transaction Amount: Amount spent on purchasing the product
Invoice Date: Date of the transaction
Customer Id: Unique ID of the customer
Country: Customer country
Mode of Payment: Cash / Card

Steps to Solve:
1) Read the CSV file and push the data to Kafka using Kafka Streams. Points to consider:
a) Make sure no transaction log is lost.
b) Transactions should be ordered sequentially.
c) What should the partition key in Kafka be?
d) Theoretically, how many partitions are needed?
e) How will you increase the producer rate (event publishing rate)?
2) Use Apache Flume to push data from Kafka into HDFS.
3) Read data from HDFS using Spark Streaming [Stats]:
a) Keep displaying live reports in the Spark logs.
b) For live reporting
4) Send aggregate data to a NoSQL database for the dashboard (UI purpose):
a) Send aggregate data to HBase.
i) Define column families.
5) Data insights to be provided:
a) Non-real-time insights:
i) Which inventory needs to be stored at a specific location:
- Find the top 5 best-performing brands in mobiles in November, December, and January for each year.
- Find the top 5 products sold most in November, December, and January for each year.
ii) For investors:
1) Number of products sold, along with revenue, aggregated quarterly.
2) Number of products sold in each category, aggregated quarterly.
3) Number of products sold for each brand, aggregated quarterly.
iii) Find products which generated profit:
1) Top 5 products with the highest profit margins in a given duration.
iv) Which category should a brand be marketing (least and most performing brands across categories):
- Brand-wise segregation of the best-performing categories, profit-wise.
- Brand-wise segregation of the least-performing categories, order-wise.
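For step 1 (publishing to Kafka), here is a minimal Python sketch of the producer-side reasoning behind points a) to e). It rests on several assumptions. The CSV header is assumed to use the feature names listed above. `Customer Id` is assumed as the partition key; this is one reasonable option that keeps each customer's transactions in order, because Kafka only guarantees ordering within a partition. The hash is `zlib.crc32`, a stand-in and not the murmur2 hash that Kafka's default partitioner actually uses. The configuration keys are real Kafka producer settings, but the values are illustrative, not tuned. The partition-count helper applies the common rule of thumb partitions >= max(t/p, t/c), where t is the target throughput and p and c are the measured per-partition producer and consumer throughputs. The helper names are hypothetical.

```python
import csv
import io
import math
import zlib
from collections import defaultdict

# Real Kafka producer setting names; the values are illustrative only.
PRODUCER_CONFIG = {
    "acks": "all",                 # a) wait for all in-sync replicas so no record is lost
    "enable.idempotence": "true",  # a/b) broker drops duplicate retries and keeps per-partition order
    "max.in.flight.requests.per.connection": "5",  # must be <= 5 when idempotence is on
    "linger.ms": "20",             # e) wait briefly so batches fill up
    "batch.size": "65536",         # e) larger batches mean fewer requests
    "compression.type": "lz4",     # e) smaller payloads raise throughput
}

def partition_for(key: str, num_partitions: int) -> int:
    """Hypothetical stand-in partitioner: crc32, NOT Kafka's murmur2."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def partitions_needed(target_mb_s: float, producer_mb_s: float, consumer_mb_s: float) -> int:
    """d) Rule of thumb: enough partitions for producers AND consumers to keep up."""
    return math.ceil(max(target_mb_s / producer_mb_s, target_mb_s / consumer_mb_s))

def route_rows(csv_text: str, num_partitions: int, key_column: str = "Customer Id"):
    """c) Group CSV rows by partition, preserving file order within each partition."""
    partitions = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        partitions[partition_for(row[key_column], num_partitions)].append(row)
    return partitions
```

With a real client such as confluent-kafka, each row would instead be sent with `Producer.produce(topic, value=..., key=...)` and Kafka's own partitioner would do the routing. `route_rows` only illustrates why records that share a key stay in order. A skewed key (for example, one very active customer) can overload a single partition, so the key choice should be checked against the data.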
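For step 4 (column families in HBase), here is one possible table layout, offered purely as an assumption. It uses a row key of `brand#category#YYYY-MM` and two column families, `sales` (units and revenue) and `profit`. The dict shapes match what the happybase client accepts in `Connection.create_table(name, families)` and `Table.put(row, data)`. The sketch only builds these structures and never connects to HBase. `hbase_row` and `COLUMN_FAMILIES` are hypothetical names, and the profit amount is assumed to be computed upstream, for example as Transaction Amount x Profit Margin / 100, because the margin is stated to be on the transaction amount.

```python
# Hypothetical column families for the dashboard table (happybase create_table format).
COLUMN_FAMILIES = {"sales": {}, "profit": {}}

def hbase_row(brand: str, category: str, year_month: str,
              units: int, revenue: float, profit: float):
    """Build (row_key, data) for one monthly brand/category aggregate."""
    # Brand first, so the dashboard can prefix-scan one brand's history.
    row_key = f"{brand}#{category}#{year_month}".encode()
    data = {
        b"sales:units": str(units).encode(),
        b"sales:revenue": f"{revenue:.2f}".encode(),
        b"profit:amount": f"{profit:.2f}".encode(),
    }
    return row_key, data
```

HBase stores and flushes each column family separately, so a small number of families is usually preferred. Values that are read together (units and revenue) share one family here.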
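The non-real-time insights in 5 a) i) and ii) reduce to group-by aggregations. Below is a minimal pure-Python sketch under these assumptions:
- Rows are dicts keyed by the feature names above.
- `Invoice Date` is ISO formatted (`YYYY-MM-DD`).
- The mobiles category is labelled `"Mobiles"`, which is a hypothetical value.
- "Best performing" and "sold most" both mean most units sold.

In the actual pipeline the same logic would run as Spark aggregations. Here, January is grouped under its own calendar year rather than with the preceding November and December. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date

def quarter_of(d: date) -> str:
    """Map a date to a label like '2023-Q4'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"

def quarterly_stats(rows):
    """ii) Units and revenue per quarter, plus units per (quarter, category) and (quarter, brand)."""
    units, revenue = Counter(), defaultdict(float)
    by_category, by_brand = Counter(), Counter()
    for r in rows:
        q = quarter_of(date.fromisoformat(r["Invoice Date"]))
        qty = int(r["Quantity"])
        units[q] += qty
        revenue[q] += float(r["Transaction Amount"])
        by_category[(q, r["Category"])] += qty
        by_brand[(q, r["Brand"])] += qty
    return units, dict(revenue), by_category, by_brand

def top_festive(rows, group_by="Brand", category="Mobiles", months=(11, 12, 1), n=5):
    """i) Top-n values of `group_by` by units sold in the given months, per calendar year.
    Use group_by="Product Code", category=None for the top products across all categories."""
    per_year = defaultdict(Counter)
    for r in rows:
        d = date.fromisoformat(r["Invoice Date"])
        if d.month in months and (category is None or r["Category"] == category):
            per_year[d.year][r[group_by]] += int(r["Quantity"])
    return {year: [k for k, _ in c.most_common(n)] for year, c in per_year.items()}
```

Ranking by revenue instead of units only requires accumulating `Transaction Amount` in place of `Quantity`.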

v) Find products which should be part of a lightning deal:
- Find the top 5 selling products in a given month (hot products).
- Find the top 5 products with the highest margin in a given month (we can afford to give higher discount rates).
- Find the 5 least-selling products in a given month (we can increase the click rate and clear old stock).
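The three lightning-deal lists for a given month are rankings over that month's transactions. Below is a minimal sketch under these assumptions:
- Rows are dicts keyed by the feature names in the data set description.
- `Invoice Date` is ISO formatted.
- "Selling" is measured in units sold.
- A product's margin is the average `Profit Margin` across its transactions in the month.

`lightning_deal_candidates` is a hypothetical name.

```python
from collections import Counter, defaultdict

def lightning_deal_candidates(rows, year_month: str, n: int = 5):
    """Return (hot, high_margin, slow) product codes for one zero-padded 'YYYY-MM' month."""
    units = Counter()
    margin_sum = defaultdict(float)
    margin_count = Counter()
    for r in rows:
        if not r["Invoice Date"].startswith(year_month):
            continue  # keep only the requested month
        code = r["Product Code"]
        units[code] += int(r["Quantity"])
        margin_sum[code] += float(r["Profit Margin"])
        margin_count[code] += 1
    # Ties are broken by product code so the output is deterministic.
    hot = sorted(units, key=lambda c: (-units[c], c))[:n]
    slow = sorted(units, key=lambda c: (units[c], c))[:n]
    avg_margin = {c: margin_sum[c] / margin_count[c] for c in margin_count}
    high_margin = sorted(avg_margin, key=lambda c: (-avg_margin[c], c))[:n]
    return hot, high_margin, slow
```

The slow-moving list only covers products sold at least once in the month. Finding old stock that sold nothing at all would require joining against a product catalog, which the transaction data alone does not provide.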

b) Real-time insights:
i) Provide an alert when the target revenue for a given month has been achieved:
8+ crores: Over Achieved
6+ crores: On Target
< 6 crores: Under Achieved
ii) Trending brands for a given month, order-wise.

Points to consider while designing the solution:
It is mandatory to solve this by creating a pipeline that you could also use in production systems.
The components used should be loosely coupled.
In a real scenario you won't get data in the form of CSV files. Most companies have their own data warehouse and might use the approaches below:
- For historical data, they consume from HDFS or a different file system, so your code should be able to read and process data from file systems.
- For real-time data, they use messaging systems such as RabbitMQ, Kafka, SQS, Kinesis, etc. Your code should be able to consume from them and process the data.
It is important to avoid a single point of failure.
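The real-time insights amount to maintaining running totals per month and raising an alert whenever the revenue band changes. Below is a minimal sketch under these assumptions:
- "8+ crores" means at least 8 crore.
- Amounts are in rupees, with 1 crore = 10,000,000.
- "Order-wise" counts one order per transaction event. Counting distinct invoices would also need `Invoice Number`.
- One `MonthlyMonitor` instance is used per month.

`MonthlyMonitor` and `revenue_status` are hypothetical names. In production, this state would live in the stream processor (for example, Spark Streaming) rather than in a Python object.

```python
from collections import Counter

CRORE = 10_000_000  # 1 crore = 10 million (Indian numbering system)

def revenue_status(month_revenue: float) -> str:
    """Classify a month's running revenue against the stated targets (highest band first)."""
    if month_revenue >= 8 * CRORE:
        return "Over Achieved"
    if month_revenue >= 6 * CRORE:
        return "On Target"
    return "Under Achieved"

class MonthlyMonitor:
    """Running revenue and per-brand order counts for one month."""

    def __init__(self):
        self.revenue = 0.0
        self.brand_orders = Counter()
        self.status = revenue_status(0.0)
        self.alerts = []

    def on_event(self, amount: float, brand: str):
        """Update totals for one transaction and record an alert if the band changes."""
        self.revenue += amount
        self.brand_orders[brand] += 1
        new_status = revenue_status(self.revenue)
        if new_status != self.status:
            self.alerts.append(new_status)
            self.status = new_status

    def trending_brands(self, n: int = 5):
        """Brands with the most orders so far this month."""
        return [b for b, _ in self.brand_orders.most_common(n)]
```

Alerting only on band changes avoids emitting one alert per transaction once a threshold has been crossed.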
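The design points ask for loosely coupled components that can read both from file systems and from messaging systems. One common way to achieve this is to make the processing code depend only on a small source interface. The sketch below illustrates this pattern. All class and function names are hypothetical. `CsvFileSource` takes any already-opened text stream (for example, a local file or one obtained from an HDFS client). `InMemoryQueueSource` stands in for a real Kafka, SQS, or Kinesis consumer, which it does not implement.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, Protocol

Record = Dict[str, str]

class TransactionSource(Protocol):
    """Anything that yields transaction records."""
    def records(self) -> Iterator[Record]: ...

class CsvFileSource:
    """Batch source over an already-opened text stream."""
    def __init__(self, stream):
        self.stream = stream

    def records(self) -> Iterator[Record]:
        yield from csv.DictReader(self.stream)

class InMemoryQueueSource:
    """Stand-in for a message-queue consumer; a real one would wrap a client library."""
    def __init__(self, messages: Iterable[Record]):
        self.messages = list(messages)

    def records(self) -> Iterator[Record]:
        yield from self.messages

def total_quantity(source: TransactionSource) -> int:
    """Processing logic sees only the interface, not where the data comes from."""
    return sum(int(r["Quantity"]) for r in source.records())
```

Swapping the historical file reader for a live consumer then requires no change to the aggregation code, which is what keeps the components loosely coupled.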