I need this done in Julia. Please answer all parts; it is urgent and I would highly appreciate your help.
(a) [5 pts] Have a look at the function add_to_A_B_times_C! in HW2_your_code.jl. This function adds the product of the input variables B and C to the input variable A. It is written to use the optimal loop order and employs multiple optimizations by means of the @turbo macro. Go to section (a) of the file HW2_driver.jl to see how this function performs compared to the built-in mul! function. If necessary, adapt the size of the matrix to your system. Hand in the file performance_per_size.pdf produced by the code with your homework.
(b)locked [5 pts] Now implement a blocked/tiled variant of add_to_A_B_times_C! that takes an integer bks as a fourth input. This method should perform the matrix multiplication by calling the original add_to_A_B_times_C! on blocks of size (roughly) bks times bks. Again, your code should be correct and avoid memory allocations, as checked by section (b) of HW2_driver.jl.
(c) [5 pts] Use section (c) of HW2_driver.jl to test the performance of the blocked algorithm for different matrix sizes and block sizes. For the most interesting set of parameters, hand in the resulting plot with your homework. Describe and explain the results.
(d) oblivious [5 pts] We can sometimes obtain better performance by using a hierarchical algorithm. Complete the skeleton for oblivious_add_to_A_B_times_C! to obtain an algorithm that equally divides each of its input matrices into four parts and then recurses on the resulting eight subproblems, until one of the problems reaches a size below bks. Again, make sure that your algorithm does not allocate memory and is correct, using part (d) of HW2_driver.jl.
(e) [5 pts] Use part (e) of HW2_driver.jl to benchmark your new algorithm for different values of bks. Are you able to obtain an improvement over the blocked algorithm? Hand in the figure produced by part (e) of HW2_driver.jl for what you consider the most interesting set of parameters.
Algorithms of this type are called "cache-oblivious." Explain why this is the case and what could be the theoretical advantages of this particular cache-oblivious algorithm.
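For orientation, here is a minimal sketch of what parts (b) and (d) describe. It is not the course's code: HW2_your_code.jl and its skeleton are not shown in the post, so the three-argument kernel below is a plain-loop stand-in for the @turbo version, the argument order (A, B, C, bks) is an assumption taken from the problem text, and the recursion stops when the smallest dimension is at most bks (the assignment says "below bks"; using <= is an assumption that keeps the recursion finite for bks = 1). Views are used so that no block is copied.

```julia
# Stand-in kernel (assumption: the real one in HW2_your_code.jl uses @turbo).
# Computes A += B * C with the column-major-friendly loop order j, k, i.
function add_to_A_B_times_C!(A, B, C)
    for j in axes(C, 2), k in axes(B, 2), i in axes(A, 1)
        A[i, j] += B[i, k] * C[k, j]
    end
    return A
end

# Blocked/tiled variant: call the kernel on blocks of roughly bks x bks.
# Blocks on the right/bottom edge are smaller when bks does not divide the size.
function add_to_A_B_times_C!(A, B, C, bks::Integer)
    bks >= 1 || throw(ArgumentError("bks must be at least 1"))
    m, n = size(A)
    p = size(B, 2)
    for j0 in 1:bks:n, k0 in 1:bks:p, i0 in 1:bks:m
        is = i0:min(i0 + bks - 1, m)
        js = j0:min(j0 + bks - 1, n)
        ks = k0:min(k0 + bks - 1, p)
        add_to_A_B_times_C!(view(A, is, js), view(B, is, ks), view(C, ks, js))
    end
    return A
end

# Recursive variant: split A, B and C into four quadrants each and recurse on
# the eight products A[is, js] += B[is, ks] * C[ks, js].
function oblivious_add_to_A_B_times_C!(A, B, C, bks::Integer)
    bks >= 1 || throw(ArgumentError("bks must be at least 1"))
    m, n = size(A)
    p = size(B, 2)
    if min(m, n, p) <= bks
        # Base case: the subproblem is small enough for the plain kernel.
        add_to_A_B_times_C!(A, B, C)
        return A
    end
    hm, hn, hp = m ÷ 2, n ÷ 2, p ÷ 2
    for is in (1:hm, (hm + 1):m), js in (1:hn, (hn + 1):n), ks in (1:hp, (hp + 1):p)
        oblivious_add_to_A_B_times_C!(view(A, is, js), view(B, is, ks),
                                      view(C, ks, js), bks)
    end
    return A
end

# Small usage example with sizes that are not multiples of the block size.
B = [sin(i + 2j) for i in 1:37, j in 1:29]
C = [cos(i * j) for i in 1:29, j in 1:41]
A_blocked = ones(37, 41)
A_oblivious = ones(37, 41)
add_to_A_B_times_C!(A_blocked, B, C, 8)
oblivious_add_to_A_B_times_C!(A_oblivious, B, C, 8)
```

Both results should agree with ones(37, 41) + B * C up to floating-point rounding. The blocked version needs bks tuned to a particular cache level, whereas the recursive version keeps halving the problem, so at some depth the subproblems fit into each level of the memory hierarchy without the code knowing the cache sizes. That is the usual argument for calling it cache-oblivious, and bks only decides when to hand over to the fast kernel.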