Answered step by step
Verified Expert Solution
Question
1 Approved Answer
A large meteorological data organisation regularly collects various meteorological data from various locations spread throughout the nation, applies some analytics and provides district-wise weather outlook
A large meteorological data organisation regularly collects various meteorological data from various locations spread throughout the nation, applies some analytics and provides district-wise weather outlook for the next 6 hours, 12 hours, 24 hours and 48 hours. It is known that historical data for each district, past prediction details and new district data collected are together used to generate the forecasts. The data is fetched hourly, and details are updated every hour. For each district, the whole fetching process takes about 15 minutes and the core analytics work takes about 30 minutes. Since they still use a legacy network, any parallelization will result in a 5 min communication overhead. Assume there are 650 districts, and a cluster of 65,000 nodes. 80% of code can be parallelised. i.How much speedup is theoretically achievable given the high communication overhead? ii. The organisation was using reduced data size so that they could complete work in time at the cost of reduced accuracy for larger time windows. If it uses full data, then time will increase by a factor of 4. Will the company be able to execute for full data if communication overhead was reduced to zero? Justify with relevant computation. A large meteorological data organisation regularly collects various meteorological data from various locations spread throughout the nation, applies some analytics and provides district-wise weather outlook for the next 6 hours, 12 hours, 24 hours and 48 hours. It is known that historical data for each district, past prediction details and new district data collected are together used to generate the forecasts. The data is fetched hourly, and details are updated every hour. For each district, the whole fetching process takes about 15 minutes and the core analytics work takes about 30 minutes. Since they still use a legacy network, any parallelization will result in a 5 min communication overhead. Assume there are 650 districts, and a cluster of 65,000 nodes. 80% of code can be parallelised. i.How much speedup is theoretically achievable given the high communication overhead? ii. The organisation was using reduced data size so that they could complete work in time at the cost of reduced accuracy for larger time windows. If it uses full data, then time will increase by a factor of 4. Will the company be able to execute for full data if communication overhead was reduced to zero? Justify with relevant computation
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started