Question
You have been tasked with developing a prototype fraud-detection system for a compliance team at a financial institution. The first proof of concept aims to assist this team by detecting anomalous events within data pushed to a topic by operational systems. These data are consumed and evaluated by your model, and responses are sent via Pub/Sub for manual investigation by the compliance team. The results of the investigation (confirming that the point was an 'outlier') are returned via Pub/Sub for integration into the model-improvement process.
1. Your data science solution will reside within the Evaluate component, and the operational systems have been publishing data to the metrics topic for the last few days. How do you retrieve data for evaluation (describe it without code)? (1)
2. Suppose your solution is written in Python and is running within a pod in Kubernetes (K8s), where you are currently using a synchronous pull approach to consume data. You notice that you are falling behind on processing and scale your solution horizontally by adding more processing instances (pods) and subscriptions. Unfortunately, your Pub/Sub costs have gone up after scaling out; why is that, and how would you reduce it? (3)
3. Given the simplicity of the problem, you have implemented a scikit-learn model which doesn't require any other state when scoring (besides what is consumed from the metrics topic). To leverage the scalability of the Dataflow runner, you have decided to deploy your model into a Beam pipeline. Describe the approach (do not provide code without an explanation). (3)
Step by Step Solution
There are 3 steps involved.
Step: 1
Retrieving data for evaluation. The operational systems publish their metrics to a topic on a streaming platform; in this architecture that is Google Cloud Pub/Sub (Apache Kafka would play the equivalent role in a Kafka-based stack). To retrieve the data, create a subscription attached to the metrics topic and have the Evaluate component pull messages from that subscription, acknowledging each message once it has been scored so it is not redelivered. Note that a subscription only receives messages published after it was created, so the last few days of data are available only if the subscription already existed or if message retention is enabled on the topic, in which case you can seek the subscription back to an earlier timestamp.
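The question asks for a description rather than code, but for concreteness, a minimal sketch of that synchronous pull in Python might look like the following, assuming the google-cloud-pubsub client library and placeholder project and subscription names:

# Minimal sketch (hypothetical names): synchronously pull a batch of metrics
# messages from a Pub/Sub subscription for evaluation.
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "metrics-eval-sub")

# Ask for up to 100 messages in a single synchronous pull request.
response = subscriber.pull(
    request={"subscription": subscription_path, "max_messages": 100}
)

ack_ids = []
for received in response.received_messages:
    payload = received.message.data.decode("utf-8")
    # ... hand the payload to the anomaly-detection model here ...
    ack_ids.append(received.ack_id)

# Acknowledge the batch so Pub/Sub does not redeliver these messages.
if ack_ids:
    subscriber.acknowledge(
        request={"subscription": subscription_path, "ack_ids": ack_ids}
    )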
Step: 2
Why costs went up and how to reduce them. In Pub/Sub, every subscription attached to a topic receives its own copy of each published message, and subscriber (delivery) throughput is billed per subscription. By scaling out with a new subscription per pod, every message is now delivered, and charged for, once per pod, so delivery volume (plus any associated network egress) grows with the number of pods. Repeated synchronous pull requests from each pod add further overhead even when few messages are available. To cut the cost while keeping the horizontal scaling: attach all pods to one shared subscription, since Pub/Sub load-balances messages across subscriber clients on the same subscription, so each message is processed by exactly one pod and billed only once; and switch from synchronous pull to streaming (asynchronous) pull, which keeps a single open stream per pod and removes the constant polling.
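A minimal sketch of the cheaper setup, again assuming the google-cloud-pubsub library and placeholder names: every pod runs the same code against the one shared subscription, and Pub/Sub spreads the messages across them.

# Minimal sketch (hypothetical names): each pod attaches to the SAME shared
# subscription and uses streaming (asynchronous) pull, so messages are
# load-balanced across pods rather than copied into a subscription per pod.
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "metrics-eval-sub")

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    payload = message.data.decode("utf-8")
    # ... score the payload with the anomaly-detection model here ...
    message.ack()

# Cap how many messages this pod holds in flight at once.
flow_control = pubsub_v1.types.FlowControl(max_messages=100)

streaming_pull_future = subscriber.subscribe(
    subscription_path, callback=callback, flow_control=flow_control
)

with subscriber:
    try:
        streaming_pull_future.result()  # block until the pod is shut down
    except KeyboardInterrupt:
        streaming_pull_future.cancel()
        streaming_pull_future.result()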
Step: 3
Deploying the model in a Beam pipeline. Because the model needs no extra state, scoring is a pure element-wise transformation and maps directly onto a ParDo. Build a streaming Beam pipeline that: reads raw messages from the metrics topic with ReadFromPubSub; parses each message into the feature representation the model expects; scores each element in a DoFn that loads the serialized scikit-learn model once per worker in its setup() method, so the pickle is not reloaded for every element; and writes detected anomalies back out with WriteToPubSub for the compliance team to investigate. Running the pipeline on the Dataflow runner in streaming mode lets Dataflow autoscale workers against the Pub/Sub backlog, which replaces the manual pod and subscription management from the previous question. (Newer Beam releases also provide a RunInference transform with a scikit-learn model handler that packages this load-once, score-many pattern.)
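A minimal sketch of such a pipeline, assuming placeholder topic names, a pickled model on Cloud Storage, a simple JSON message schema, and a score threshold as the outlier criterion (the real schema and threshold would come from the actual model and topic):

# Minimal sketch (hypothetical names, schema, and threshold) of the Beam
# pipeline: read from the metrics topic, score with a scikit-learn model
# loaded once per worker, and publish anomalies for investigation.
import json
import pickle

import apache_beam as beam
from apache_beam.io.filesystems import FileSystems
from apache_beam.io.gcp.pubsub import ReadFromPubSub, WriteToPubSub
from apache_beam.options.pipeline_options import PipelineOptions


class ScoreDoFn(beam.DoFn):
    def __init__(self, model_path):
        self._model_path = model_path
        self._model = None

    def setup(self):
        # Runs once per worker, not per element, so the model is reused.
        with FileSystems.open(self._model_path) as f:
            self._model = pickle.load(f)

    def process(self, message):
        event = json.loads(message.decode("utf-8"))
        score = self._model.decision_function([event["features"]])[0]
        if score < 0:  # placeholder outlier criterion
            yield json.dumps({"event": event, "score": float(score)}).encode("utf-8")


def run():
    # Runner, project, region, etc. are supplied as command-line flags.
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadMetrics" >> ReadFromPubSub(topic="projects/my-project/topics/metrics")
            | "Score" >> beam.ParDo(ScoreDoFn("gs://my-bucket/model.pkl"))
            | "PublishAnomalies" >> WriteToPubSub(topic="projects/my-project/topics/anomalies")
        )


if __name__ == "__main__":
    run()

Loading the model in setup() rather than in process() is what keeps the pipeline cheap to scale: each worker pays the deserialization cost once, so Dataflow can add or remove workers freely as the backlog changes.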