Question

1 Approved Answer

Posted on Sep 27, 2024

Kindly explain the attached files further. Please give an example if possible. Thank you.

176 CHAPTER 8 Workload Management in the Data Warehouse

weight to the already voluminous data warehouse from a workload perspective, causing overwhelming workloads and underperforming systems. Distributing the workload does not improve scalability and reduce workload as one would anticipate, since each distribution comes with limited scalability.

New workloads and Big Data

Big Data brings about a new definition to the world of workloads. Apart from the traditional challenges that exist in the world of data, the volume, velocity, variety, complexity, and ambiguous nature of Big Data create a new class of challenges and issues. The key challenges and issues that we need to understand regarding data in the Big Data world include:

- Data does not have a finite architecture and can have multiple formats.
- Data is self-contained and needs several external business rules to be created to interpret and process the data.
- Data has a minimal or zero concept of referential integrity.
- Data is not relational.
- Data needs more analytical processing.
- Data depends on metadata for creating context.
- Data has no specificity with volume or complexity.
- Data is semi-structured or unstructured.
- Data needs multiple cycles of processing, but each cycle needs to be processed in one pass due to the size of the data.
- Data needs business rules for processing, as structured data does today, but these rules need to be created in a rules engine architecture rather than in the database or the ETL tool.
- Data needs more governance than data in the database.
- Data has no defined quality.

Big Data workloads

Workload management as it pertains to Big Data is completely different from traditional data and its management. The major areas where workload definitions are important to understand for design and processing efficiency include:

- Data is file based for acquisition and storage. Whether you choose Hadoop, NoSQL, or any other technique, most Big Data is file based.
The underlying reason for choosing file-based management is the ease of managing files, replication, and the ability to store any format of data for processing.
- Data processing will happen in three steps:
  1. Discovery: in this step the data is analyzed and categorized.
  2. Analysis: in this step the data is associated with master data and metadata.
  3. Analytics: in this step the data is converted to metrics and structured.
- Each of these steps brings a workload characteristic:
  - Discovery will mandate interrogation of data by users. The data will need to be processed where it is and not moved across the network. The reason for this is the size and complexity of the data itself, and this requirement is a design goal for Big Data architecture: compute and process data at the storage layer.
  - Analysis will mandate parsing of data with data visualization tools. This will require minimal transformation and movement of data across the network.
  - Analytics will require converting the data to a structured format and extracting it for processing to the data warehouse or analytical engines.

Big Data workloads are drastically different from traditional workloads because no database is involved in the processing of Big Data. This removes a large scalability constraint but adds the complexity of maintaining file system-driven consistency. Another key factor to remember is that processing Big Data involves data processing rather than transaction processing. These factors are the design considerations when building a Big Data system, which we will discuss in Chapters 10 and 11.

Big Data workloads from an analytical perspective will be very similar to adding new data to the data warehouse. The key difference here is that the tables that will be added are of the narrow/narrow type, but the impact on the analytical model can be that of a wide/narrow table that will become wide/wide.
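Since the question asks for an example, the three steps above (Discovery, Analysis, Analytics) can be sketched in a few lines of Python. This is only an illustrative sketch, not the book's implementation: the record layout, the `MASTER_CUSTOMERS` lookup, and the function names `discover`, `analyze`, and `analytics` are all assumptions made up for this answer.

```python
import json
from collections import defaultdict

# Hypothetical raw input: one semi-structured record per line, as it might sit in a file.
RAW_LINES = [
    '{"type": "click", "customer": "C1", "page": "/home"}',
    '{"type": "purchase", "customer": "C1", "amount": 30.0}',
    '{"type": "purchase", "customer": "C2", "amount": 12.5}',
    'not json at all',
]

# Hypothetical master data: customer id -> business segment.
MASTER_CUSTOMERS = {"C1": "Retail", "C2": "Wholesale"}


def discover(lines):
    """Step 1, Discovery: parse each record and categorize it by its type."""
    categorized = defaultdict(list)
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # No defined quality: keep what cannot be parsed instead of failing.
            categorized["unparsed"].append({"raw": line})
            continue
        categorized[record.get("type", "unknown")].append(record)
    return dict(categorized)


def analyze(records, master):
    """Step 2, Analysis: associate each record with master data."""
    return [
        dict(record, segment=master.get(record.get("customer"), "unknown"))
        for record in records
    ]


def analytics(records):
    """Step 3, Analytics: convert enriched records into structured metric rows."""
    totals = defaultdict(float)
    for record in records:
        totals[record["segment"]] += record.get("amount", 0.0)
    return sorted(totals.items())


if __name__ == "__main__":
    categories = discover(RAW_LINES)
    purchases = analyze(categories.get("purchase", []), MASTER_CUSTOMERS)
    print(analytics(purchases))  # [('Retail', 30.0), ('Wholesale', 12.5)]
```

The output of the last step is a small, structured table (segment, total amount), which is the kind of narrow result that could then be loaded into the data warehouse.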
Big Data query workloads are more a matter of executing MapReduce programs, which is the complete opposite of executing SQL and optimizing for SQL performance.

The major difference in Big Data workload management is that tuning the data processing bottlenecks results in linear scalability and instant outcomes, as opposed to the traditional RDBMS world of data management. This is due to the file-based processing of data, the self-contained nature of the data, and the maturity of the algorithms on the infrastructure itself.

Technology choices

As we look back and think about how to design the next generation of data warehouses with the concept of a workload-driven architecture, there are several technologies that have come into being in the last decade, and these technologies are critical to consider for the new architecture. A key aspect to remember is that the concept of data warehousing is not changing, but the deployment and the architecture of the data warehouse will evolve from being tightly coupled to the database and its infrastructure to being distributed across different layers of infrastructure and data architecture.

The goal of building the workload-driven architecture is to leverage all the technology improvements for the flexibility and scalability of the data warehouse and Big Data processing, thereby creating a coexistence platform that leverages all current-state and future-state investments for better ROI.
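To make the earlier contrast between MapReduce program execution and SQL concrete, here is a rough sketch that computes the same totals both ways. The log lines and the function names are hypothetical, and the map, shuffle, and reduce phases are imitated in a single Python process; this is not the Hadoop API, only the programming model.

```python
import sqlite3
from collections import defaultdict

# Hypothetical file contents: "page,bytes" per line.
LOG_LINES = ["/home,120", "/cart,300", "/home,80", "/cart,50", "/help,10"]


def map_phase(line):
    """Map: turn one raw line into (key, value) pairs."""
    page, size = line.split(",")
    yield page, int(size)


def shuffle(pairs):
    """Shuffle: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def run_mapreduce(lines):
    """Procedural style: the program itself walks the file-based data."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return sorted(reduce_phase(key, values) for key, values in shuffle(pairs).items())


def run_sql(lines):
    """Declarative style: load into a database first, then let SQL do the grouping."""
    rows = [(page, int(size)) for page, size in (line.split(",") for line in lines)]
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (page TEXT, bytes INTEGER)")
    conn.executemany("INSERT INTO hits VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT page, SUM(bytes) FROM hits GROUP BY page ORDER BY page"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run_mapreduce(LOG_LINES))  # [('/cart', 350), ('/help', 10), ('/home', 200)]
    print(run_sql(LOG_LINES) == run_mapreduce(LOG_LINES))  # True
```

In the SQL version the tuning effort goes into the database (schema, indexes, query plan); in the MapReduce version it goes into the program and into how the files are split across workers, which is the workload difference the text describes.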
Another viewpoint to think about is that by design Big Data processing is built around procedural processing (more akin to programming language-driven processing), which can take advantage of multicore CPU and SSD or DRAM technologies to the fullest extent, as opposed to the RDBMS architecture, where large cycles of processing and memory are left underutilized.

SUMMARY

The next chapter will focus on the technologies that we are discussing, including Big Data appliances, cloud computing, data virtualization, and much more. As we look back at what we have learned from this chapter, remember that if you create architectures without understanding the workload of the system, you are bound to have limited success. In conclusion, the goal is for us to start thinking like designers of space exploration vehicles, which mandate several calculations and optimization techniques to achieve superior performance and reusable systems. This radical change of thinking will help architects and designers of new solutions to create robust ecosystems of technologies