Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

A large meteorological data organisation regularly collects various meteorological data from various locations spread throughout the nation, applies some analytics and provides district-wise weather outlook

image text in transcribed

A large meteorological data organisation regularly collects various meteorological data from various locations spread throughout the nation, applies some analytics and provides district-wise weather outlook for the next 6 hours, 12 hours, 24 hours and 48 hours. It is known that historical data for each district, past prediction details and new district data collected are together used to generate the forecasts. The data is fetched hourly, and details are updated every hour. For each district, the whole fetching process takes about 15 minutes and the core analytics work takes about 30 minutes. Since they still use a legacy network, any parallelization will result in a 5 min communication overhead. Assume there are 650 districts, and a cluster of 65,000 nodes. 80% of code can be parallelised. i.How much speedup is theoretically achievable given the high communication overhead? ii. The organisation was using reduced data size so that they could complete work in time at the cost of reduced accuracy for larger time windows. If it uses full data, then time will increase by a factor of 4. Will the company be able to execute for full data if communication overhead was reduced to zero? Justify with relevant computation. A large meteorological data organisation regularly collects various meteorological data from various locations spread throughout the nation, applies some analytics and provides district-wise weather outlook for the next 6 hours, 12 hours, 24 hours and 48 hours. It is known that historical data for each district, past prediction details and new district data collected are together used to generate the forecasts. The data is fetched hourly, and details are updated every hour. For each district, the whole fetching process takes about 15 minutes and the core analytics work takes about 30 minutes. Since they still use a legacy network, any parallelization will result in a 5 min communication overhead. Assume there are 650 districts, and a cluster of 65,000 nodes. 80% of code can be parallelised. i.How much speedup is theoretically achievable given the high communication overhead? ii. The organisation was using reduced data size so that they could complete work in time at the cost of reduced accuracy for larger time windows. If it uses full data, then time will increase by a factor of 4. Will the company be able to execute for full data if communication overhead was reduced to zero? Justify with relevant computation

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Databases In Networked Information Systems 6th International Workshop Dnis 2010 Aizu Wakamatsu Japan March 2010 Proceedings Lncs 5999

Authors: Shinji Kikuchi ,Shelly Sachdeva ,Subhash Bhalla

2010th Edition

3642120377, 978-3642120374

More Books

Students also viewed these Databases questions

Question

3. Discuss the process of behavior modeling training.

Answered: 1 week ago