Question

Posted on Sep 25, 2024

3. (20 pts.) A friend majoring in finance wants to implement an algorithm for computing the prefix variance of a sequence dataset of n real-valued numbers $a_0, a_1, \ldots, a_{n-1}$, defined as the sequence of n real-valued numbers $\sigma_0^2, \sigma_1^2, \ldots, \sigma_{n-1}^2$ such that, for all $t = 0, 1, \ldots, n-1$,

$$\sigma_t^2 = \frac{1}{t+1} \sum_{i=0}^{t} \left( a_i - \frac{1}{t+1} \sum_{j=0}^{t} a_j \right)^2$$

Our friend has found the following two algorithms, PREFIXVARIANCE1 and PREFIXVARIANCE2, that perform the task correctly but, not knowing which one to implement, is asking for your recommendation. Both algorithms take as input an n-element array A such that A[i] = $a_i$ for all i = 0, 1, ..., n - 1, and output an n-element array B such that B[i] = $\sigma_i^2$ for all i = 0, 1, ..., n - 1, the corresponding prefix variance of the sequence in A.

Algorithm 1 PREFIXVARIANCE1(A)

    B ← new array of n numbers
    for t ← 0 to n - 1 do
        v ← 0
        for i ← 0 to t do
            z ← 0
            for j ← 0 to t do
                z ← z + A[j]
            z ← z/(t+1)
            d ← A[i] - z
            v ← v + d*d
        B[t] ← v/(t+1)
    return B

Algorithm 2 PREFIXVARIANCE2(A)

    B ← new array of n numbers
    for t ← 0 to n - 1 do
        z ← 0
        for i ← 0 to t do
            z ← z + A[i]
        z ← z/(t+1)
        v ← 0
        for i ← 0 to t do
            d ← A[i] - z
            v ← v + d*d
        B[t] ← v/(t+1)
    return B

(a) Give the tightest running-time and space characterization using Big-Oh and Big-Omega, or Big-Theta, of PREFIXVARIANCE1 in terms of n. Briefly justify your answer.

Time:
Space:

(b) Give the tightest running-time and space characterization using Big-Oh and Big-Omega, or Big-Theta, of PREFIXVARIANCE2 in terms of n. Briefly justify your answer.

Time:
Space:

(c) Based solely on asymptotic-analysis considerations of the algorithms, which one do you recommend our friend implement, and why?

4. (15 pts.) Detecting anomalies in a data set is an important task in data science. One approach to anomaly detection involves the detection, retrieval, and analysis of outliers.
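For Problem 3, it can help to see the two prefix-variance algorithms as runnable code, since the loop nesting is then easy to inspect. The following is a minimal Python sketch that transcribes the pseudocode line by line; the function names `prefix_variance1` and `prefix_variance2` are illustrative and not part of the original problem.

```python
def prefix_variance1(a):
    """Transcription of PREFIXVARIANCE1: the mean z is recomputed for every i."""
    n = len(a)
    b = [0.0] * n                      # output array B
    for t in range(n):
        v = 0.0
        for i in range(t + 1):
            z = 0.0
            for j in range(t + 1):     # innermost loop: sum of A[0..t]
                z += a[j]
            z /= t + 1                 # mean of A[0..t]
            d = a[i] - z
            v += d * d
        b[t] = v / (t + 1)
    return b


def prefix_variance2(a):
    """Transcription of PREFIXVARIANCE2: the mean z is computed once per t."""
    n = len(a)
    b = [0.0] * n                      # output array B
    for t in range(n):
        z = 0.0
        for i in range(t + 1):         # first pass: sum of A[0..t]
            z += a[i]
        z /= t + 1                     # mean of A[0..t]
        v = 0.0
        for i in range(t + 1):         # second pass: squared deviations
            d = a[i] - z
            v += d * d
        b[t] = v / (t + 1)
    return b
```

Both functions return the same values for the same input; they differ only in how often the mean of the prefix A[0..t] is recomputed.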
The algorithm GETOUTLIERS takes as input an array A of n numbers and a positive number c and outputs a sorted list L of the numbers in A containing only outliers, where an outlier is a number that deviates from the average μ of the numbers in A by more than a factor c of the standard deviation σ of the numbers in A. It uses several auxiliary functions. The functions MEAN and STD both take as input an array of n numbers and output the average and standard deviation of those numbers, respectively. Assume that they both run in linear time and use a constant amount of space. The function FINDOUTSIDE extracts all the elements of an array A of n numbers that are smaller than a given value x or larger than another given value y, all given as input, and returns those elements (i.e., the ones in the lower and upper regions outside the interval from x to y on the real line) using a sorted-list data structure.

Algorithm 3 GETOUTLIERS(A, c)

    μ ← MEAN(A)
    σ ← STD(A)
    return FINDOUTSIDE(A, μ - c*σ, μ + c*σ)

(a) Provide an efficient algorithm, in pseudocode, for the function FINDOUTSIDE described above: complete the step-by-step description by writing down the missing statements, already started for you below. Assume that you have available an implementation of the sorted-list ADT that includes the method INSERT which, taking as input an element, inserts the element in the proper position in the sorted list, and does so in linear time and constant space. (Make sure to use indentation to clearly indicate the proper scope of each statement.)

Algorithm 4 FINDOUTSIDE(A, x, y)

    L ← new sorted list, initially empty
    ______
    ______
    ______
    return L

(b) Give the tightest/best possible time and space characterization, Big-Oh, Big-Theta, or Big-Omega, in terms of n, of the algorithm FINDOUTSIDE. Justify your answer. Assume the implementation of the insert operation takes time linear in the size of the sorted list and uses a constant amount of space.

(c) Give the tightest/best possible time and space characterization, Big-Oh, Big-Theta, or Big-Omega, in terms of n, of the algorithm GETOUTLIERS. Justify your answer. Assume the implementation of the insert operation takes time linear in the size of the sorted list and uses a constant amount of space.
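To make the GETOUTLIERS pipeline in Problem 4 concrete, here is one possible Python sketch, offered as an illustration rather than the intended answer. It rests on several assumptions not stated in the problem: a plain Python list kept in order with `bisect.insort` stands in for the sorted-list ADT and its INSERT method, `statistics.fmean` stands in for MEAN, and `statistics.pstdev` (population standard deviation) stands in for STD, since the problem does not say whether the population or the sample version is meant. The names `find_outside` and `get_outliers` are illustrative.

```python
import bisect
import statistics


def find_outside(a, x, y):
    """Return, in sorted order, the elements of a that are < x or > y."""
    result = []                           # plays the role of the sorted list L
    for value in a:
        if value < x or value > y:
            # Assumed stand-in for INSERT: places value at its sorted
            # position; shifting list elements makes this linear time.
            bisect.insort(result, value)
    return result


def get_outliers(a, c):
    """Mirror of Algorithm 3: mean, standard deviation, then filter."""
    mu = statistics.fmean(a)              # assumed stand-in for MEAN
    sigma = statistics.pstdev(a)          # assumed stand-in for STD (population)
    return find_outside(a, mu - c * sigma, mu + c * sigma)
```

For example, `find_outside([5, 1, 9, 3, 7], 3, 7)` keeps only 1 and 9, because the comparisons are strict and the boundary values 3 and 7 are not outside the interval.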