Question
Consider the following relational schema. Actors( aID string, fName string, lName string, gender string) Movies( mID integer, name string, mYear year) Directors ( dID integer,
Consider the following relational schema.
Actors(aID string, fName string, lName string, gender string)
Movies(mID integer, name string, mYear year)
Directors (dID integer, fName string, lName string)
Casts (aID string, mID integer, role string)
MovieDirectors (dID integer, mID integer)
Genres (mID integer, genre string)
The primary key fields are underlined, and the domain of each field is listed after the field name. Thus aID is the key for Actors, mID is the key for Movies, dID is the key for Directors, (aID, mID, role) forms the key for Casts, (dID, mID) forms the key for MovieDirectors, and (mID,genre)forms the key for Genres.
These tables have been created using MySQL server. Download IndexDB.sql to your computer and run it on MySQL. You will need to import the data from the movies database, which is a set of plain-text data files. Each file contains all the rows for the named table, one row per line.
In this assignment, you have to import the Movies dataset into MySQL, to analyze the workload of each SQL statement, and to create indexes on the data to make the queries run fast. Turn in your Word file of workload analysis and your SQL index statements to the assignment link.
Task 1: Importing the MoviesDB database. (10 points)
Task 2: Analyzing the workload of each of the following SQL statements. (50 Points)
Queries | SQL Statements | Access Table | Output Column | Join Conditions | Search Conditions | |
Q1 | Find the first and last names of all the actors who played in any comedy movie. | SELECT fName, lName FROM Actors A WHERE A.aID IN (SELECT C.aID FROM Casts C WHERE C.mID IN (SELECT M.mID FROM Movies M WHERE M.mID IN (SELECT G.mID FROM Genres G WHERE G.genre = 'Action')));
|
|
|
|
|
Q2 | Find the first and last names of all the directors who directed an 'Drama' movie in 2005 | SELECT fName, lName FROM Directors D JOIN MovieDirectors MD ON (D.dID = MD.dID) JOIN Movies M ON (M.mID = MD.mID) JOIN Genres G ON (M.mID = G.mID) WHERE genre ='Drama' AND mYear = '2005'; |
|
|
|
|
Q3 | Find the first and last names of all the actors who casted as Librarian in a film before 1980, and also in a film after 2000. (That is: < 1980 and > 2000.) | SELECT fName, lName FROM Actors A WHERE A.aID IN (SELECT C.aID FROM CASTS C WHERE role ='Librarian' AND C.mID IN (SELECT M.mID FROM Movies M, Movies M2 WHERE M.mYear<'1980' AND M2.mYear>'2000')); |
|
|
|
|
Q4 | Find the first and last name of all directors who directed 500 movies or more, in ascending order of the number of movies they directed. Return the directors' names and the number of movies each of them directed. | SELECT fName, lName, NoOfMovies FROM Directors D, ( SELECT dID, COUNT(*) NoOfMovies FROM MovieDirectors GROUP BY dID HAVING COUNT(*) >=500) AS T WHERE D.dID = T.dID ORDER BY NoOfMovies; |
|
|
|
|
Q5 | Find the first and last names of actors that played three or more roles in the same movie since the year 2010. | SELECT fName, lName FROM Actors A WHERE A.aID IN (SELECT C.aID FROM CASTS C JOIN Movies M ON ( M.mID = C.mID) WHERE M.mYear >'2010' GROUP BY aID, M.mID HAVING COUNT(*) >=3); |
|
|
|
|
Once you have imported the movie database, your main task is to analyze each of the above SQL statements. You may find that even simple queries will take a long time to execute. This is because the original database lacks indexes on frequently accessed attributes of its tables yet. Record the execute time of each query before and after adding indexes. Your goal is to choose up to about 12 indexes, such that once the indexes are created, each query should be executed under 1 minute on a modern computer. Be sure to create only indexes that matter. Put your answers in the following table.
| Frequency | Fraction | Estimated Performance | Execute Time (before indexing) | Execute Time (after indexing) | Chosen Index Type | Justification |
Q1 | 300 | 30% | <0.0015 Sec. |
|
|
|
|
Q2 | 200 | 20% | < 0.0015 Sec. |
|
|
|
|
Q3 | 100 | 10% | < 0.0015 Sec. |
|
|
|
|
Q4 | 150 | 15% | < 0.0015 Sec. |
|
|
|
|
Q5 | 250 | 25% | < 0.0015 Sec. |
|
|
|
|
Task 3: Writing SQL statements to create appropriate indexes for each of the above queries (40 Points).
Place a comment for each CREATE INDEX statement to denote the reason for choosing such index (e.g., in order to speed up the selection in question Q1 or in order to speed up the join in query 2). Save all of your code for Task 3 into a single file called creatingIndexYourTeamNumber.sql.
------------------------------------------------------------------------------------------------------
index
DROP DATABASE IF EXISTS IndexDB; CREATE DATABASE IndexDB; USE IndexDB;
CREATE TABLE Actors( aID VARCHAR(10), fName VARCHAR(50), lName VARCHAR(50), gender CHAR(1), PRIMARY KEY(aID));
CREATE TABLE Movies( mID VARCHAR(10), name VARCHAR(200), mYear CHAR(4) NULL, PRIMARY KEY(mID));
CREATE TABLE Directors( dID VARCHAR(10), fName VARCHAR(100), lName VARCHAR(100), PRIMARY KEY(dID));
CREATE TABLE Casts( aID VARCHAR(10), mID VARCHAR(10), role VARCHAR(40), PRIMARY KEY(aID, mID, role));
#FOREIGN KEY(aID) REFERENCES Actors(aID), #FOREIGN KEY(mID) REFERENCES Movies(mID));
CREATE TABLE MovieDirectors( dID VARCHAR(10), mID VARCHAR(10), PRIMARY KEY(dID, mID));
CREATE TABLE Genres( mID VARCHAR(10), genre VARCHAR(40), PRIMARY KEY(mID, genre), FOREIGN KEY(mID) REFERENCES Movies(mID));
LOAD DATA INFILE '/MoviesDB/actors.txt' into table Actors FIELDS TERMINATED BY '|' ENCLOSED BY '"' LINES TERMINATED BY ' ' (aID , fName , lName , gender );
LOAD DATA INFILE '/MoviesDB/movies.txt' into table Movies FIELDS TERMINATED BY '|' ENCLOSED BY '"' LINES TERMINATED BY ' ' (mID, name, mYear );
LOAD DATA INFILE '/MoviesDB/directors.txt' into table Directors FIELDS TERMINATED BY '|' ENCLOSED BY '"' LINES TERMINATED BY ' ' (dID, fname, lname );
LOAD DATA INFILE '/MoviesDB/movieDirectors.txt' into table MovieDirectors FIELDS TERMINATED BY '|' ENCLOSED BY '"' LINES TERMINATED BY ' ' (dID, mID );
LOAD DATA INFILE '/MoviesDB/genre.txt' into table Genres FIELDS TERMINATED BY '|' ENCLOSED BY '"' LINES TERMINATED BY ' ' (mID, genre );
LOAD DATA INFILE '/MoviesDB/casts.txt' into table Casts FIELDS TERMINATED BY '|' ENCLOSED BY '"' LINES TERMINATED BY ' ' (aID, mID , role );
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started