Question

Posted on Sep 10, 2024

The final project is an opportunity to broaden or deepen your knowledge of big data topics that have not been covered in class. It could take one of the following forms: 1. learn new functionality in big data libraries you already know; 2. learn new big data tools or software libraries; 3. learn new scalable models, algorithms, or frameworks. Note that "new" here only means new to you, or to all the members of your group. It does not necessarily mean newly open-sourced software or newly published frameworks.

For the project, you will work individually or in a team of 2-3 students on a project of your choosing that is interesting, significant, and relevant to big data. The goal of your final project is to research something new, dig deep into it, and share what you have learned with the rest of the class. All members of a group will receive the same grade on group work. Therefore, it is in your interest to choose other group members (ideally in the first week of class) who have the same goals in the class as you do. It is also in your interest to work together and ensure that all tasks are completed effectively. Your score on group work may be adjusted based on your contribution.

Project Idea

We provide a concrete sample project here to demonstrate what a project should look like, followed by some project ideas for your consideration.

Sample project:

We will learn the Spark library (week 5) in class, as well as how to handle data streams (week 7), but not all built-in libraries will be covered in detail in class or in the assignments. For example, we will not cover the Spark Streaming library in class. Your project could be to study the Spark Streaming library and practice some of the functions it offers with a dataset you have chosen. Here is more detailed documentation of Spark Streaming. The documentation is based on Spark 3.3, which is available on ODU's cluster. Please email the instructor to set up the environment for Spark 3.3. Here we only briefly mention the overall idea; your project abstract should go into more detail than what we provide here. For example, the detailed problem could be to figure out how to use DataFrames and SQL queries on streaming data by working with some dataset. Note that in this project you need to actually implement and test the functions using data and code. Reading through the documentation should be part of the project, but it should not be the only thing in the project.
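To make the "SQL queries on streaming data" idea concrete before setting up Spark, the sketch below imitates the model in plain Python: events arrive in micro-batches, each batch is appended to an ever-growing table, and the same SQL aggregation is re-run after every batch. This is not Spark code. It uses the standard-library sqlite3 module as a stand-in, and the `events` table, the sensor readings, and the `run_stream` helper are all hypothetical names chosen for illustration.

```python
import sqlite3

# Hypothetical stand-in for a data stream: each inner list is one
# micro-batch of (sensor, reading) events that arrive together.
MICRO_BATCHES = [
    [("a", 1.0), ("b", 4.0)],
    [("a", 3.0)],
    [("b", 6.0), ("c", 5.0)],
]

# The same aggregation is evaluated after every micro-batch.
QUERY = """
    SELECT sensor, COUNT(*) AS n, AVG(reading) AS mean_reading
    FROM events
    GROUP BY sensor
    ORDER BY sensor
"""


def run_stream(batches):
    """Append each micro-batch to a growing table and re-run QUERY.

    Returns one list of result rows per micro-batch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (sensor TEXT, reading REAL)")
    snapshots = []
    for batch in batches:
        conn.executemany("INSERT INTO events VALUES (?, ?)", batch)
        snapshots.append(conn.execute(QUERY).fetchall())
    conn.close()
    return snapshots


snapshots = run_stream(MICRO_BATCHES)
for i, rows in enumerate(snapshots, start=1):
    print(f"after batch {i}: {rows}")
```

A real project would replace the in-memory table with a streaming DataFrame and let Spark maintain the aggregate incrementally, rather than recomputing it over all rows as this sketch does.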

Project ideas:

You may wonder what type of big data system you want to study. Below, I summarize popular big data systems and group them into categories:

Database or data warehouse: Amazon Redshift, Vertica, Google BigQuery, Hive, Presto, Amazon Athena, Azure SQL Data Warehouse, Snowflake

Big data processing: Apache Spark, AWS EMR (Spark), Azure HDInsight, and Amazon SageMaker

Machine Learning (ML) on large scale: Amazon SageMaker, BigQuery ML, Spark MLlib.

Deep Learning: Azure Batch AI, TensorFlow, PyTorch.

other popular systems:

graph database service: Amazon Neptune, Neo4j

NoSQL (not only SQL) database: Amazon DynamoDB

streaming data: Spark Streaming

natural language processing: Amazon Comprehend

graph processing: Spark GraphX

Some of the systems mentioned above may be unfamiliar. The following notes provide more context for some of them:

For TensorFlow or PyTorch, a sample project could be: code and test a multi-layer perceptron neural network to recognize handwritten digits.

Presto: supports interactive SQL queries

To use Spark (e.g. Spark MLlib and Spark SQL), you can choose among ODU's cluster, Azure HDInsight, AWS EMR, and Amazon SageMaker

Amazon Athena: interactive query service for querying and analyzing big data in Amazon S3

Spark Streaming is a built-in library in Spark that allows data engineers and data scientists to process real-time data from various sources including (but not limited to) Kafka, Flume, and Amazon Kinesis.
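For the multi-layer perceptron idea in the notes above, it can help to see the moving parts before reaching for TensorFlow or PyTorch. The following is a minimal sketch in plain Python, not code from either framework: one hidden tanh layer, one sigmoid output, trained with per-sample gradient descent. The 3x3 glyphs, the `flip` helper, the layer size, and the learning rate are all assumptions made for illustration. A real project would use an actual handwritten-digit dataset and ten output classes.

```python
import math
import random

# Hypothetical 3x3 "handwritten" glyphs, flattened row by row (1 = ink).
ZERO = [1, 1, 1,
        1, 0, 1,
        1, 1, 1]
ONE = [0, 1, 0,
       0, 1, 0,
       0, 1, 0]


def flip(glyph, i):
    """Return a copy of glyph with pixel i flipped, imitating a sloppy stroke."""
    noisy = list(glyph)
    noisy[i] = 1 - noisy[i]
    return noisy


# Label 0 for the digit zero, label 1 for the digit one.
DATA = [(ZERO, 0), (flip(ZERO, 0), 0), (flip(ZERO, 8), 0),
        (ONE, 1), (flip(ONE, 1), 1), (flip(ONE, 6), 1)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class MLP:
    """One hidden tanh layer followed by a single sigmoid output unit."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each row holds one unit's weights followed by its bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        # zip stops at the shorter sequence, so the trailing bias is added separately.
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + row[-1])
                  for row in self.w1]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.w2[-1])
        return hidden, out

    def train_step(self, x, y, lr):
        hidden, out = self.forward(x)
        d_out = out - y  # cross-entropy gradient w.r.t. the output pre-activation
        # Backpropagate through tanh before the output weights are changed.
        d_hidden = [d_out * self.w2[j] * (1 - h * h) for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            self.w2[j] -= lr * d_out * h
        self.w2[-1] -= lr * d_out
        for j, row in enumerate(self.w1):
            for i, v in enumerate(x):
                row[i] -= lr * d_hidden[j] * v
            row[-1] -= lr * d_hidden[j]


def mean_loss(net, data):
    """Average binary cross-entropy over the dataset."""
    total = 0.0
    for x, y in data:
        p = min(max(net.forward(x)[1], 1e-12), 1 - 1e-12)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


net = MLP(n_in=9, n_hidden=4)
loss_before = mean_loss(net, DATA)
for _ in range(500):
    for x, y in DATA:
        net.train_step(x, y, lr=0.2)
loss_after = mean_loss(net, DATA)

print(f"mean loss: {loss_before:.3f} -> {loss_after:.3f}")
print("P(one | ZERO glyph) =", round(net.forward(ZERO)[1], 3))
print("P(one | ONE glyph)  =", round(net.forward(ONE)[1], 3))
```

Writing the backward pass by hand once makes it clearer what a framework's automatic differentiation and optimizers are doing when the same network is rebuilt in TensorFlow or PyTorch.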

The above are all ideas for your consideration. There are many other interesting topics, such as studying views in SQL or studying hyperparameter tuning libraries for deep neural networks.
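As a small illustration of the SQL view topic, the sketch below uses Python's standard-library sqlite3 module. The `orders` table and the `customer_totals` view are hypothetical names invented for this example. It shows the key property of a view: it stores the query, not the result, so later changes to the base table appear when the view is queried again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES ('ann', 20.0), ('bob', 5.0), ('ann', 15.0);

    -- The view saves this query under a name; no rows are copied.
    CREATE VIEW customer_totals AS
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer;
""")

before = conn.execute(
    "SELECT * FROM customer_totals ORDER BY customer").fetchall()

# A new row in the base table is visible through the view immediately.
conn.execute("INSERT INTO orders VALUES ('bob', 10.0)")
after = conn.execute(
    "SELECT * FROM customer_totals ORDER BY customer").fetchall()

print(before)
print(after)
conn.close()
```

A project on views could go further and compare this behavior with materialized views in one of the data warehouses listed above, where results are stored and must be refreshed.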
