Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

Notice the use of the WITH keyword. This feature is known as Common Table Expressions . The easiest way to think of this is that

Notice the use of the WITH keyword. This feature is known as Common Table Expressions. The easiest way to think of this is that it creates a named temporary table which gets used later as a nested query. This is important is it is crucial to our issue. MySQL docs for this feature: https://dev.mysql.com/doc/refman/8.0/en/with.html

Also notice that there are several operators in this query. They are specific to the Amazon Redshift database this query was written for an will not work in MySQL. These are the :: for casting types and || for concatenation. There are a couple of other features specific to Redshift such as ILIKE as a case insensitive version of LIKE and the syntax for INTERVAL being slightly different. A version of this query modified to work with MySQL will be provided below.

One more feature to examine is the CASE statement. This is not relevant to this optimization task but a highly useful feature which functions similar to SWITCH statements in programming languages. https://www.w3schools.com/SQL/func_mysql_case.asp

Now that you've looked through the original query, and realize it is a series of smaller queries nested, let's try the MySQL version.

First, run the creates.sql

initialization query.

Second, load the converted query: OptimizeQueryMySQL.sql

The query should run no problem, despite no data entered in the table. We don't need data, we need the query plan. Try running EXPLAIN on this query. A query this complicated gives a messy result which makes optimization difficult. If this was your starting point, the the best approach would be to pull out each temporary table and explain in isolation, step by step.

But since this is originally from Redshift, let's look at Redshift's EXPLAIN results: QueryPlan.txt

What you should draw your attention to are the COST values. While these are arbitrary, they are not meaningless. Find the section of the query where this grows the largest, that is the slowest part of the query in need of optimization. (Save yourself some time as search for other_publisher_views) As this query had been previously reviewed for proper indexing, all queries should be indexed so we need to find another optimization.

Since this is already indexed, draw your attention to the ROWS returned. If you look at the next query which makes use of these results, you'll notice the ROWS is much smaller. So our problem is we are looking at way more data than we need to. We are pulling over 100,000 rows when we are trying to find 4 of them.

Now let's extract the critical section to work with, along with simplified versions of its dependencies:

WITH listing_info_raw AS ( SELECT '' AS property_id_sha, '' AS mc_export_property_id_sha, '(Rental)' AS pretty_address, ' (Rental)' AS display_address ), periods AS ( SELECT true AS is_current_period, CAST('2020-11-09' AS date) AS start_date, CAST('2020-11-15' AS date) AS end_date ), other_publisher_views AS ( SELECT clv.date, clv.streeteasy_page_views, clv.trulia_page_views, clv.zillow_page_views, clv.realtor_page_views FROM agent_insights_lt.agent_insights__combined_listing_views AS clv JOIN listing_info_raw ON clv.property_id_sha = listing_info_raw.property_id_sha ), other_publisher_views_by_period AS ( SELECT is_current_period, start_date, end_date, coalesce(sum(streeteasy_page_views), 0) AS streeteasy_page_views, coalesce(sum(trulia_page_views), 0) AS trulia_page_views, coalesce(sum(zillow_page_views), 0) AS zillow_page_views, coalesce(sum(realtor_page_views), 0) AS realtor_page_views FROM periods LEFT JOIN other_publisher_views ON other_publisher_views.date BETWEEN periods.start_date AND periods.end_date ) SELECT * FROM other_publisher_views_by_period;

Please optimize this section of the query, which includes other_publisher_views & other_publisher_views_by_period, thankyou

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Implementing Ai And Machine Learning For Business Optimization

Authors: Robert K Wiley

1st Edition

B0CPQJW72N, 979-8870675855

More Books

Students also viewed these Databases questions

Question

Explain the difference between a product cost and a period cost.

Answered: 1 week ago

Question

List the basic documents used in estate planning.

Answered: 1 week ago

Question

LOQ 15-5: Do psychological disorders predict violent behavior?

Answered: 1 week ago

Question

u = 5 j , v = 6 i Find the angle between the vectors.

Answered: 1 week ago