Question

You were hired as a Big Data Analyst for a large, 50-year-old UMGC academic records system. Most of the academic records are stored in ASCII-based text files. The files are stored on high-volume hard disks and optical discs (CDs and DVDs). Three sample records from a file are shown below:

Record 1

Program: Information Technology, Specialization: Database Systems, Course: DBST651 Grade: A, Course: ITEC630 Grade: B, Course: DBST667 Grade: A, instructor 1: James Smith (DBST651), instructor 2: Jennifer Lopez (DBST651), instructor 3: Jennifer Lopez (ITEC630), instructor 4: Catharine Murphy (DBST667), student name: Yelena Bytenskaya, EmplD: 123456 , User Name : ybytensk, instructor id: 234567, graduated: Yes

Record 2

Program: Data Analytics, Course: DATA610 Grade: B, Course: DATA620 Grade: A, Course: DATA630 Grade: C, Course: DATA630 Grade: A, Course: DATA640 Grade: B, Course: DATA650 Grade: A, Course: DATA670 Grade: B, instructor 1: Steve Knode (DATA610), instructor2: Caroline Beam (DATA620), instructor 3: Bati Firdu (DATA630), instructor 4: Elena Gortcheva (DATA650) , instructor 5: Ozan Ozcan (DATA650) instructor 7: Jon McKeeby (DATA670), instructor 8: Steve Knode (DATA670), instructor 9: Steve Knode (DATA640), instructor 10: TA Yelena Bytenskaya (DATA650), student name: Linesh Dave, EmlID:567890, user name: ldave, instructor id: 567907, graduated: Yes

Record 3

Program: Information Technology, Specialization: Database Systems, Specialization: Project Management, Specialization: Software Engineering, course: DBST651 grade: F, course: DBST651 grade: B, course: ITEC610 grade: B, course: ITEC620 grade: A , course PMAN634 grade: C, instructor 1: Brandon Morris (ITEC 610), instructor 2: Elena Gortcheva (ITEC620), Instructor 3: James Green (DBST651), Instructor4: TA Yelena Bytenskaya (DBST651), student name: Jeff Martin, emplID: 987654, user name: jmartin, graduated: No
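The sample records are loosely formatted: field labels vary in case and spelling (`EmplD`, `EmlID`, `emplID`), some colons and commas are missing, and course codes sometimes contain a space (`ITEC 610`). The following is a minimal Python sketch of a tolerant parser, assuming every record follows roughly the patterns seen in these three samples. The function name `parse_record` and the output dictionary keys are illustrative choices, not part of the assignment.

```python
import re

FLAGS = re.IGNORECASE
PROGRAM_RE = re.compile(r"\bprogram:\s*([^,]+)", FLAGS)
SPECIALIZATION_RE = re.compile(r"\bspecialization:\s*([^,]+)", FLAGS)
# The colon after "course" is optional because Record 3 omits it once.
COURSE_RE = re.compile(r"\bcourse:?\s*(\w+)\s+grade:\s*([A-F])\b", FLAGS)
# "instructor 3:" and "Instructor4:" both occur; "TA" marks a teaching assistant.
INSTRUCTOR_RE = re.compile(
    r"\binstructor\s*\d+:\s*(?P<ta>TA\s+)?(?P<name>[^()]+?)\s*"
    r"\((?P<course>[A-Z]+\s*\d+)\)",
    FLAGS,
)
STUDENT_NAME_RE = re.compile(r"\bstudent\s+name:\s*([^,]+)", FLAGS)
# Matches the label variants EmplD, EmlID and emplID seen in the samples.
EMPLID_RE = re.compile(r"\bem\w*d\s*:\s*(\d{6})", FLAGS)
USERNAME_RE = re.compile(r"\buser\s*name\s*:\s*(\w+)", FLAGS)
INSTRUCTOR_ID_RE = re.compile(r"\binstructor\s+id:\s*(\d+)", FLAGS)
GRADUATED_RE = re.compile(r"\bgraduated:\s*(yes|no)", FLAGS)


def _first(pattern, text):
    """Return the first captured group, stripped, or None if absent."""
    match = pattern.search(text)
    return match.group(1).strip() if match else None


def parse_record(text):
    """Turn one free-text record into a dictionary of normalized fields."""
    graduated = _first(GRADUATED_RE, text)
    return {
        "program": _first(PROGRAM_RE, text),
        "specializations": [s.strip() for s in SPECIALIZATION_RE.findall(text)],
        # Every attempt is kept, in the order it appears in the record.
        "attempts": [(c.upper(), g.upper()) for c, g in COURSE_RE.findall(text)],
        "instructors": [
            {
                "name": m["name"].strip(),
                "course": "".join(m["course"].split()).upper(),
                "is_ta": m["ta"] is not None,
            }
            for m in INSTRUCTOR_RE.finditer(text)
        ],
        "student_name": _first(STUDENT_NAME_RE, text),
        "emplid": _first(EMPLID_RE, text),
        "username": _first(USERNAME_RE, text),
        "instructor_id": _first(INSTRUCTOR_ID_RE, text),
        "graduated": None if graduated is None else graduated.lower() == "yes",
    }
```

Regular expressions are used here instead of splitting on commas because the samples drop separators in places (for example between `instructor 5` and `instructor 7` in Record 2). Real files would likely need additional patterns as new variants turn up.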

You are given the following information about the data (metadata):

A student is enrolled in a program.

Some programs may offer specializations. A student enrolled in a program that offers specializations may choose one or more specializations.

New specializations could be added to a program. If a new specialization is added to a program that the student is enrolled in, the student may choose that specialization.

A student takes multiple classes and receives the final letter grade in each class.

A class session may have multiple instructors. A student may take multiple classes with the same instructor.

A student who graduated could be hired as an instructor.

A student could have multiple IDs: a 6-digit emplID, a user name for accessing the online classroom and academic records, and a faculty ID if the student is hired as an instructor.

If a student repeats a class, the grade received on the last attempt overwrites the grades received on prior attempts for GPA calculation and on the transcript. However, the system should track all attempts for academic advising purposes.

The courses can be taken in any order.
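The repeat rule above (the last attempt counts for the GPA and transcript, but all attempts are retained) can be sketched in Python as follows. This is an illustration under stated assumptions: attempts are listed in chronological order within a record, all courses carry equal credit, and letter grades map to a conventional 4.0 scale. None of these is stated in the problem, and `summarize_attempts` is a hypothetical name.

```python
from collections import defaultdict

# Assumed 4.0 grade-point scale; the problem statement does not define one.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}


def summarize_attempts(attempts):
    """Split (course, grade) attempts into advising history and transcript.

    `attempts` is assumed to be in chronological order, so the last grade
    seen for a course is the one that counts.
    """
    history = defaultdict(list)
    for course, grade in attempts:
        history[course].append(grade)  # keep every attempt for advising

    # Only the most recent attempt appears on the transcript.
    transcript = {course: grades[-1] for course, grades in history.items()}

    # Unweighted GPA over transcript grades (equal credits assumed).
    gpa = None
    if transcript:
        gpa = sum(GRADE_POINTS[g] for g in transcript.values()) / len(transcript)
    return dict(history), transcript, gpa
```

Applied to the attempts in Record 3, the history for DBST651 is `["F", "B"]`, the transcript shows `B`, and the unweighted GPA under these assumptions is 3.0.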

Your task: Theoretically set up a searchable database that can flexibly accommodate all of the above requirements and can hold the records of several hundred million students.

Your paper must have an Introduction, Problem Statement, Design, Implementation Methods, and a Conclusion with a discussion of the pros and cons of your design. The following are required:

1. A design showing the different Big Data systems that you will use to solve this problem.

2. Pseudocode of a function that will read in each record, parse it, and transform it into an HBase table.

3. An HBase data model showing column families and columns.

4. The student in Record 3 above is enrolled in the Information Technology program. How would you handle adding a new specialization to the Information Technology program and letting the student choose it as an additional specialization?

5. A discussion of the pros and cons of choosing ACID vs. CAP systems for this problem.

6. A discussion of the queries that database users would run.

7. Ideas for improving the speed of the query tool.
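As a starting point for items 3 and 4, one possible wide-column layout can be sketched in plain Python without touching a real HBase client. In HBase, column families are fixed when a table is created, but column qualifiers within a family can be added per row at write time, so a new specialization can be stored as one more qualifier. The family names (`info`, `spec`, `grade`), the qualifier scheme, the choice of emplID as row key, and the specialization name `Cybersecurity` are all hypothetical choices for illustration, not a prescribed answer.

```python
def to_cells(student):
    """Map one parsed student dict to (row_key, {"family:qualifier": value})."""
    cells = {
        "info:name": student["name"],
        "info:program": student["program"],
        "info:graduated": "Yes" if student["graduated"] else "No",
    }
    # One qualifier per specialization; the value is only a presence marker.
    for spec in student["specializations"]:
        cells[f"spec:{spec}"] = "1"

    # One qualifier per attempt, numbered per course, so repeats are preserved.
    attempt_no = {}
    for course, grade in student["attempts"]:
        attempt_no[course] = attempt_no.get(course, 0) + 1
        cells[f"grade:{course}#{attempt_no[course]}"] = grade

    return student["emplid"], cells


def add_specialization(cells, spec):
    """Return a copy of the row with one more qualifier in the spec family."""
    updated = dict(cells)
    updated[f"spec:{spec}"] = "1"
    return updated


# Record 3, restricted to the fields used in this sketch.
jeff = {
    "emplid": "987654",
    "name": "Jeff Martin",
    "program": "Information Technology",
    "graduated": False,
    "specializations": ["Database Systems", "Project Management", "Software Engineering"],
    "attempts": [("DBST651", "F"), ("DBST651", "B"), ("ITEC610", "B"),
                 ("ITEC620", "A"), ("PMAN634", "C")],
}
row_key, row = to_cells(jeff)
row = add_specialization(row, "Cybersecurity")  # hypothetical new specialization
```

Because the new specialization is only an extra cell in an existing family, no table-level schema change is implied by this layout. Whether attempts are better modeled as separate qualifiers, as here, or through HBase cell versions is a design decision the paper would need to argue.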
