Question


  • What are the 3V's, what do they mean, and which is the most important for the majority of businesses?
  • Which of the 3V's did McKinsey's definition of Big Data focus on in 2011?
  • What % of data science is data wrangling?
  • Why did the description of data science go from "sexiest job of the century" to being a "data plumber"?
  • What is Hofstadter's Law?
  • Questions on the syllabus:

o What can the SONA activities replace?

o What are the team points on the team assignments?

o If you have a question on an assignment grade, how should you contact me?

o Why do you need to sign in?

For the following chart types, what is each good for? Which is good for data over time? Distributions of data? Comparing totals or other aggregates?

o line chart

o area chart

o bar chart

o stacked bar

o scatter plot

o box plot

o 100% stacked bar

  • When is a line chart or area chart better?
  • What does the "box" in a box plot show?
  • When is a box plot or histogram better?
  • Stacked bars and 100% stacked bars - what is the difference? Which is a good replacement for a pie chart?
  • Is a pie chart a good alternative to a 100% stacked bar?
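
For the box plot question above, here is a minimal sketch, assuming the usual convention that the box spans the first to third quartile with a line at the median. The sample values are made up, and the `inclusive` quartile method is only one of several conventions, so a plotting tool may report slightly different cut points.

```python
import statistics

# Made-up sample values, for illustration only.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# With n=4, statistics.quantiles returns the three quartile cut points.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

# The "box" runs from Q1 to Q3; its height is the interquartile range (IQR).
iqr = q3 - q1

print(f"box bottom (Q1): {q1}")
print(f"line inside the box (median): {median}")
print(f"box top (Q3): {q3}")
print(f"IQR: {iqr}")
```

The box therefore summarizes where the middle half of the values fall, which is why a box plot is a compact way to compare distributions side by side.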

  • If given a description of a data problem's Context, Need, Vision, or Outcome, could you say which part of the CoNVO framework it's discussing?
  • For the team project this semester, what is your team's context in terms of the CoNVO framework?
  • Could you say whether a specific question could be valid as a need in the Yelp context?
  • The author points out that this is an iterative process. Which parts are the most iterative?
  • Should we start with data or a question?
  • Your team selected a question to work on this semester. Which of the four CoNVO components is that question most closely related to?
  • The "Vision" in this context is a visual mock-up (do not confuse it with an organization's vision).
  • The mock-up for the Vision is supposed to be a possible successful answer, which means it must be feasible (and not violate the goals of our Yelp context) and it must show a pattern that could be useful.
  • How much detail goes into specifying the Need?
  • Is the Vision the same as a specification? If the Vision is not the result, is the project a failure?
  • Should the vision be a successful answer? What does that mean?
  • Should the vision be based on an analysis of the data?
  • Should the technology to be used be identified first so we don't waste time?
  • If you don't scope a problem first (such as using the CoNVO framework) before diving into the data, what is the problem with the questions you'll end up addressing?
  • Why is it important to have a vision? It allows you to share ideas with others on the team.
  • Understand the six techniques described in the book for evolving the Vision for a project. See the example given in the chapter. If an activity was described, could you match it up with one of the six techniques for evolving a project's vision? The six techniques are:

o Interviews

o Rapid investigation

o Kitchen sink interrogation (brainstorming)

o Working backwards

o More mock-ups

o Role-playing

  • Profiling in data wrangling (week 4) is most like which of these six techniques?
  • Understand the difference between rapid investigation, brainstorming, and working backwards.

Week 3 - JSON, Yelp data, and Data Wrangling (Principles of Data Wrangling - Ch 3)

JSON file format - understand the components:

o key:value pairs

o JSON object

o JSON array

Are objects or arrays composed of key:value pairs?

  • Can values in a key:value pair be JSON objects and/or JSON arrays? What examples did we talk about in class from the Yelp data?
  • What is each record (e.g., a user, business, review, or tip) in the Yelp data - a JSON array or a JSON object?
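
To make the JSON components above concrete, here is a small sketch that parses one record with Python's `json` module. The record is hypothetical: its field names and values are illustrative assumptions, not copied from the Yelp dataset.

```python
import json

# Hypothetical record shaped like a business entry; the field names and
# values are illustrative assumptions, not taken from the Yelp dataset.
raw = """
{
  "name": "Example Cafe",
  "stars": 4.5,
  "categories": ["Coffee & Tea", "Breakfast"],
  "hours": {"Monday": "7:00-15:00", "Tuesday": "7:00-15:00"}
}
"""

record = json.loads(raw)

# A JSON object becomes a Python dict: a collection of key:value pairs.
print(type(record).__name__)                # dict
# A JSON array becomes a Python list: ordered values with no keys.
print(type(record["categories"]).__name__)  # list
# A value can itself be a JSON object (nested key:value pairs).
print(type(record["hours"]).__name__)       # dict

for key, value in record.items():
    print(key, "->", value)
```

Curly braces mark an object and square brackets mark an array, so the outermost delimiter of a record tells you which one it is.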

At a high level, understand what data is in the dataset.

o How many metro areas are in your data? What countries are in your data?

o What are the relationships between metro areas, reviews, businesses, and users?

o Do you have all of the reviews or businesses in a metro area (or at least those reviewed on Yelp)?

o Do you have all of the reviews and tips by each user in the dataset?

o Do you have all of the reviews or tips for each business in the dataset?

o Do all of the users in the data consider these 10 metro areas their "home"?

o Do you have a user's gender? Their age? Their "home" city?

  • What are the two most reviewed categories on Yelp?
  • Is the user's gender in the dataset?

  • When you reopen your notebook a day later and see the output of a cell, what do you need to do before working further with that output?
  • How do you create a markdown cell? How do you create headers (large bold text) in markdown cells?
  • When you exit a markdown cell (click on another cell), what is the markdown displayed as?
  • Can you export and import notebooks? Is the data included with the notebook when it's exported?
  • If you publish a notebook, who can see it?
  • How do you add cells to a notebook? Can they only be added at the end of the notebook?
  • Does the whole notebook need to be recalculated when you add a cell?
  • Along the left-hand side of the Databricks interface is a toolbar. Which option do you click on when you want to open one of your notebooks or import a new notebook?
  • What does it mean that a variable is in a notebook's "state"?
  • If you export a notebook, does it include the data? If you import a notebook, does it include the data?
  • If you do not start a cluster, what can you do in your notebook (e.g., can you view the code, markdown, run Python, run Spark)?
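
The idea of a notebook's "state" in the questions above can be illustrated with a rough simulation. This sketch uses `exec` and plain dictionaries to stand in for notebook sessions; it is an assumption-laden analogy and does not use any Databricks API.

```python
# Simulate one notebook session: every "cell" runs against a shared namespace.
session = {}
exec("total = 10", session)            # cell 1 defines a variable
exec("doubled = total * 2", session)   # cell 2 can use it: it is in the state
print(session["doubled"])

# Simulate reopening the notebook later with a fresh session. Old output may
# still be displayed, but the variables are gone until the cells are rerun.
new_session = {}
try:
    exec("doubled = total * 2", new_session)
    rerun_needed = False
except NameError:
    rerun_needed = True

print("rerun needed:", rerun_needed)
```

The displayed output of a cell is just saved text, while the state is whatever variables currently exist in the running session.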

Week 4 - Data Wrangling (Principles of Data Wrangling - Ch 3)

  • What are the four steps in data wrangling? Of the 4 steps, which 2 are the most iterative?
  • Understand the difference between individual and set-based profiling.

o Which of these profiling techniques is more often checking for valid values?

o Which of these profiling techniques is often aggregating (e.g., averaging or summing) multiple rows?

  • What are the 3 types of transformations?
  • If some transformation or profiling of the Yelp data was described, could you identify the type of transformation or profiling being described?
  • Enriching and structuring could both add columns. Can you identify which is being done?
  • Cleaning and structuring could both remove records. Can you identify which is being done?
  • What is metadata (data about data)? Where can you find metadata about the Yelp data?
  • What are the 3 types of publishing in the data wrangling process?

o How does the work you are doing fit in with publishing?
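
The contrast between individual and set-based profiling in the Week 4 questions can be sketched on a few rows. The rows below are made up for illustration, and the 1 to 5 star range is stated here as an assumption about what counts as a valid value.

```python
# Made-up review rows for illustration; not taken from the Yelp dataset.
reviews = [
    {"business": "A", "stars": 5},
    {"business": "A", "stars": 3},
    {"business": "B", "stars": 7},  # outside the assumed 1-5 range
    {"business": "B", "stars": 4},
]


def is_valid(row):
    """Individual check: look at one value on its own."""
    return 1 <= row["stars"] <= 5


# Individual profiling: test each value for validity, row by row.
invalid = [row for row in reviews if not is_valid(row)]
valid = [row for row in reviews if is_valid(row)]

# Set-based profiling: aggregate across many rows, here mean stars per business.
sums, counts = {}, {}
for row in valid:
    name = row["business"]
    sums[name] = sums.get(name, 0) + row["stars"]
    counts[name] = counts.get(name, 0) + 1
averages = {name: sums[name] / counts[name] for name in sums}

print("invalid rows:", invalid)
print("average stars per business:", averages)
```

The first step inspects values one at a time, while the second only makes sense over a group of rows.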

Week 4 - Working with Files

  • To move, rename, or delete files, do you drag-and-drop files within DBFS folders, write code in notebook cells, or something else?
  • If shown the method names from dbutils could you match them with what they do (rm, ls, mkdirs, mv, head)?
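
The dbutils method names listed above correspond to familiar file operations. Because `dbutils` is only available inside a Databricks notebook, this sketch uses Python standard-library analogues on a temporary local folder. The pairing in the comments is a rough correspondence rather than the Databricks API itself, and the file name and contents are hypothetical.

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
folder = os.path.join(root, "raw", "yelp")

# mkdirs: create a directory, including any missing parent directories.
os.makedirs(folder)

# Write a small file to work with (hypothetical name and contents).
src = os.path.join(folder, "sample.json")
with open(src, "w") as f:
    f.write('{"stars": 4}\n{"stars": 5}\n')

# ls: list the contents of a directory.
listing = os.listdir(folder)

# head: read only the first bytes of a file.
with open(src) as f:
    first_bytes = f.read(12)

# mv: move or rename a file.
dst = os.path.join(root, "raw", "sample_renamed.json")
shutil.move(src, dst)

# rm: delete a file.
os.remove(dst)
remaining = os.listdir(os.path.join(root, "raw"))

shutil.rmtree(root)  # clean up the temporary folder

print(listing, repr(first_bytes), remaining)
```

In a Databricks notebook the same kinds of operations are written as code in a cell, for example `dbutils.fs.ls(...)` with a DBFS path.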

Possible Essay or Short Answer Topics

  • Understand the "state" example we covered in the notebook in class.
  • Explain why one chart is better than another as a mock-up as the Vision for a specific CoNVO Need.
  • Describe the problems with a Vision mock-up that you are shown.
  • Say whether a vision mock-up could be a successful answer for Yelp.
  • Understand at a high level what is in the data set: all the reviews and tips for all of the businesses reviewed on Yelp in 10 metro areas, and the users who wrote those reviews.
  • What is Yelp's mission?
  • What is your team's question?
