
Question

Study the scenario and complete the questions that follow:
Data Wrangling
Data wrangling, also called data munging or data remediation, is the process of converting raw data into usable information. It typically happens before any data analysis, to ensure that your data is reliable and complete. It comprises a series of processes designed to explore, transform, and validate raw data, turning it from messy and complex forms into high-quality data. You can use the wrangled data to produce valuable insights and guide business decisions.
2.1 Read the data set into a DataFrame and display the first 15 rows. You should also display a general overview of the data set by determining its dimensions.
2.2 A standard naming convention is required for all the column names. As such, it was decided that all columns should be written in camel case. Examine all column names and rename those that do not use the specified naming convention.
2.3 Some rows in the data set are empty. Identify and remove all rows with empty values and display a count of the rows removed.
2.4 The Author rating column needs to have only 3 distinct values (Beginner, Intermediate, and Expert) and the genre column should only have 3 distinct values (Non-Fiction, Fiction, and Children). Alter the dataframe so that these specifications are reflected.
2.5 Create a new DataFrame that contains only the books where the book rating is greater than or equal to 4, and display the top 10 books with the highest ratings in descending order.
2.6 Create a new column in the original DataFrame named operatingCost that shows the cost involved in publishing a given book (defined by the formula: grossSales - publisherRevenue). Remove the salesRank column from the DataFrame.
2.7 Calculate the average book rating for each distinct publisher.
2.8 Calculate the mean, median, and standard deviation for the grossSales column.
2.9 The publishing year column has values that are not displayed correctly. Trim the extra characters so that only a proper year is shown (i.e. if the year is 1975.0, change it to 1975). You will also notice that there are some incorrect values in this column (values such as -300). Identify these rows and remove all rows containing incorrect dates.
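
One possible way to approach tasks 2.1 to 2.9 with pandas is sketched below. It is not the approved answer, only an illustration under stated assumptions. The data set itself is not shown in the question, so the small `raw` frame is made-up stand-in data. The original column names, the existing rating and genre labels, the mappings onto the three required values, the subtraction in the operatingCost formula, and the rule that a valid year is greater than 0 are all assumptions that would need to be checked against the real file.

```python
import re

import pandas as pd

# Made-up stand-in rows; with the real file this would be a call such as
# pd.read_csv("books.csv"), where the file name is hypothetical.
raw = pd.DataFrame({
    "Book Name": ["A", "B", "C", "D", "E", None],
    "Publishing Year": [1975.0, 2001.0, -300.0, 1999.0, 2010.0, None],
    "Author_Rating": ["Novice", "Famous", "Excellent", "Intermediate", "Excellent", None],
    "genre": ["genre fiction", "nonfiction", "fiction", "children", "nonfiction", None],
    "Book_average_rating": [4.5, 3.9, 4.1, 4.8, 4.0, None],
    "gross sales": [1000.0, 500.0, 800.0, 1200.0, 300.0, None],
    "publisher revenue": [600.0, 300.0, 500.0, 700.0, 100.0, None],
    "sales rank": [2, 4, 3, 1, 5, None],
    "Publisher": ["P1", "P2", "P1", "P2", "P1", None],
})

# 2.1 First 15 rows and the dimensions (rows, columns).
print(raw.head(15))
print(raw.shape)


# 2.2 Rename every column to camel case (names already in camel case are kept).
def to_camel_case(name):
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", name) if p]
    head, *tail = parts
    return head[0].lower() + head[1:] + "".join(p[0].upper() + p[1:] for p in tail)


df = raw.rename(columns=to_camel_case)

# 2.3 Drop rows that contain missing values and count how many were removed.
rows_before = len(df)
df = df.dropna()
rows_removed = rows_before - len(df)
print("Rows removed:", rows_removed)

# 2.4 Collapse labels to the three allowed values (mappings are assumptions).
df["authorRating"] = df["authorRating"].replace(
    {"Novice": "Beginner", "Famous": "Expert", "Excellent": "Expert"}
)
df["genre"] = df["genre"].replace(
    {"genre fiction": "Fiction", "fiction": "Fiction",
     "nonfiction": "Non-Fiction", "children": "Children"}
)

# 2.5 New DataFrame with ratings >= 4, top 10 in descending order.
top_rated = (
    df[df["bookAverageRating"] >= 4]
    .sort_values("bookAverageRating", ascending=False)
    .head(10)
)
print(top_rated)

# 2.6 Operating cost (assumed to be gross sales minus publisher revenue),
# then remove the salesRank column.
df["operatingCost"] = df["grossSales"] - df["publisherRevenue"]
df = df.drop(columns="salesRank")

# 2.7 Average book rating per publisher.
avg_rating = df.groupby("publisher")["bookAverageRating"].mean()
print(avg_rating)

# 2.8 Mean, median, and (sample) standard deviation of grossSales.
stats = df["grossSales"].agg(["mean", "median", "std"])
print(stats)

# 2.9 Show years as whole numbers (1975.0 -> 1975), then keep only rows
# with a plausible year; "greater than 0" is an assumed validity rule.
df["publishingYear"] = df["publishingYear"].astype(int)
df = df[df["publishingYear"] > 0]
print(df)
```

Note that `dropna()` removes any row with at least one missing value, which is one reading of "rows with empty values"; `dropna(how="all")` would remove only rows that are entirely empty. The steps follow the order of the questions, so the statistics in 2.7 and 2.8 are computed before the rows with invalid years are removed in 2.9.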
