Question
Study the scenario and complete the questions that follow:
Data Wrangling
Data wrangling, also called data munging or data remediation, is the process of converting raw data into information. It typically happens before any data analysis is conducted, to ensure that your data is reliable and complete. The term describes a series of processes designed to explore, transform, and validate raw data, turning its messy and complex forms into high-quality data. You can use your wrangled data to produce valuable insights and guide business decisions.
Read the data set into a DataFrame and display its first rows. You should also give a general overview of the data set by determining its dimensions.
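A minimal sketch of this step, assuming the data is available as CSV and pandas is used. The inline text below is a made-up stand-in for the real file, and its column names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file; names and values are invented.
csv_text = """Publishing Year,Book Name,gross sales
1999,Book A,100.0
2005,Book B,250.5
2010,Book C,75.25
"""

# With a real file this would be pd.read_csv("<path to the data set>").
books = pd.read_csv(io.StringIO(csv_text))

# head() shows the first 5 rows by default; pass a number to change that.
print(books.head())

# shape holds the dimensions as (rows, columns).
n_rows, n_cols = books.shape
print(f"{n_rows} rows x {n_cols} columns")
```

`books.info()` is another option for an overview, since it also lists each column's dtype and non-null count.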
A standard naming convention is required for all the column names. As such, it was decided that all columns should be written in camel case. Examine all column names and rename those that do not use the specified naming convention.
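One possible way to handle the renaming is a small helper that converts a label to camel case. The helper name `to_camel_case` and the sample column labels are assumptions for illustration, since the real column names are not listed here.

```python
import re


def to_camel_case(name: str) -> str:
    """Convert a label such as 'Author_Rating' or 'gross sales' to camelCase."""
    # Split on whitespace, underscores and hyphens, dropping empty pieces.
    words = [w for w in re.split(r"[\s_\-]+", name.strip()) if w]
    if not words:
        return name
    # Fully upper-case words (e.g. 'GENRE') are lowered before recombining.
    words = [w.lower() if w.isupper() else w for w in words]
    # First word starts lower-case; later words start upper-case.
    first = words[0][0].lower() + words[0][1:]
    rest = "".join(w[0].upper() + w[1:] for w in words[1:])
    return first + rest


# Hypothetical column labels, not taken from the actual data set.
columns = ["Publishing Year", "Author_Rating", "gross sales", "Publisher ", "salesRank"]
renamed = {c: to_camel_case(c) for c in columns}
print(renamed)
```

With pandas, the same helper can be passed directly as `df.rename(columns=to_camel_case)`, which applies it to every column label and leaves names that already follow the convention unchanged.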
Some rows in the data set are empty. Identify and remove all rows with empty values and display a count of the rows removed.
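A sketch of identifying and removing incomplete rows with pandas. It assumes that blank or whitespace-only strings should count as empty alongside true missing values. The sample frame is invented.

```python
import numpy as np
import pandas as pd

# Invented sample: rows 1, 2 and 3 each have one empty value.
books = pd.DataFrame({
    "bookName": ["Book A", None, "Book C", "Book D"],
    "genre": ["Fiction", "Children", "", "Fiction"],
    "bookRating": [4.1, 3.8, 4.0, np.nan],
})

rows_before = len(books)

# ASSUMPTION: blank strings are treated as missing values as well.
books = books.replace(r"^\s*$", np.nan, regex=True)

# Identify the rows that contain at least one missing value.
incomplete = books[books.isna().any(axis=1)]
print(incomplete)

# dropna() removes every row with at least one missing value (how="any").
cleaned = books.dropna()
rows_removed = rows_before - len(cleaned)
print(f"Rows removed: {rows_removed}")
```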
The Author rating column should contain only the distinct values Beginner, Intermediate, and Expert, and the genre column should contain only the distinct values NonFiction, Fiction, and Children. Alter the DataFrame so that these specifications are reflected.
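How the raw values deviate from the allowed ones is not stated, so the following is only a sketch under one assumption: the raw values differ from the targets in case, surrounding whitespace, and punctuation. The column names `authorRating` and `genre` and the sample values are hypothetical. If the real data uses entirely different labels, an explicit mapping from each raw label to its target would be needed instead.

```python
import pandas as pd

# Invented sample values with inconsistent case, spacing and punctuation.
books = pd.DataFrame({
    "authorRating": ["beginner", " Expert", "INTERMEDIATE", "Beginner"],
    "genre": ["fiction", "non-fiction", "Children ", "nonfiction"],
})

# Ratings: trim whitespace, then title-case ('INTERMEDIATE' -> 'Intermediate').
books["authorRating"] = books["authorRating"].str.strip().str.title()

# Genres: reduce each value to lower-case letters only, then look up the target.
genre_map = {"fiction": "Fiction", "nonfiction": "NonFiction", "children": "Children"}
genre_key = books["genre"].str.lower().str.replace(r"[^a-z]", "", regex=True)
books["genre"] = genre_key.map(genre_map)  # unmapped values become NaN

print(books["authorRating"].unique())
print(books["genre"].unique())
```

Checking `unique()` afterwards confirms that only the allowed values remain. Any `NaN` in `genre` marks a raw value the mapping did not cover.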
Create a new DataFrame that contains only the books where the book rating is greater than or equal to and display the top books with the highest ratings in descending order.
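A sketch of the filter-and-sort step. The rating threshold and the number of books to display are not given here, so `MIN_RATING` and `TOP_N` are placeholders to be replaced with the real values. The column names and data are invented.

```python
import pandas as pd

books = pd.DataFrame({
    "bookName": ["A", "B", "C", "D", "E"],
    "bookRating": [3.2, 4.6, 4.1, 3.9, 4.8],
})

MIN_RATING = 4.0  # placeholder threshold, not taken from the assignment
TOP_N = 2         # placeholder count, not taken from the assignment

# Boolean mask keeps the rows at or above the threshold; copy() makes the
# result an independent DataFrame rather than a view-like slice.
high_rated = books[books["bookRating"] >= MIN_RATING].copy()

# Sort from highest to lowest rating and keep the first TOP_N rows.
top_books = high_rated.sort_values("bookRating", ascending=False).head(TOP_N)
print(top_books)
```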
Create a new column in the original DataFrame named operatingCost that shows the cost involved in publishing a given book, defined by the formula: grossSales publisherRevenue. Remove the salesRank column from the DataFrame.
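A sketch of the derived column and the column removal. The operator between grossSales and publisherRevenue is missing from the formula as given. The code below assumes subtraction, which should be checked against the original assignment. The sample values are invented.

```python
import pandas as pd

books = pd.DataFrame({
    "grossSales": [1000.0, 500.0],
    "publisherRevenue": [600.0, 200.0],
    "salesRank": [1, 2],
})

# ASSUMPTION: the formula is grossSales minus publisherRevenue; the operator
# is not visible in the task text, so verify this before relying on it.
books["operatingCost"] = books["grossSales"] - books["publisherRevenue"]

# drop(columns=...) returns a new DataFrame without the named column.
books = books.drop(columns=["salesRank"])
print(books)
```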
Calculate the average book rating for each distinct publisher.
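A per-publisher average can be sketched with a pandas group-by. The column names `publisher` and `bookRating` and the data are assumptions for illustration.

```python
import pandas as pd

books = pd.DataFrame({
    "publisher": ["P1", "P2", "P1", "P2"],
    "bookRating": [4.0, 3.0, 5.0, 4.0],
})

# Group the rows by publisher, then average the ratings within each group.
avg_rating = books.groupby("publisher")["bookRating"].mean()
print(avg_rating)
```

The result is a Series indexed by publisher. Calling `.reset_index()` on it gives a two-column DataFrame if that form is easier to display.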
Calculate the mean, median, and standard deviation for the grossSales column.
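The three statistics can be computed in one call, as in this sketch with invented values. Note that pandas computes the sample standard deviation (`ddof=1`) by default. Pass `ddof=0` to `Series.std` if the population value is wanted.

```python
import pandas as pd

books = pd.DataFrame({"grossSales": [100.0, 200.0, 300.0, 400.0]})

# agg() with a list of function names returns a Series keyed by those names.
stats = books["grossSales"].agg(["mean", "median", "std"])
print(stats)
```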
The publishing year column has values that are not displayed correctly. Trim the extra characters so that only a proper year is shown, i.e. if the year is change it to You will also notice that there are some values in this column that are incorrect, such as Identify these rows and remove all rows containing incorrect dates.
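The example values for this step are missing from the text, so the following is a sketch under stated assumptions: a proper year is a standalone run of four digits, and anything without one, or outside a plausible range, counts as incorrect. The bounds `MIN_YEAR` and `MAX_YEAR`, the column name, and the sample values are all hypothetical.

```python
import pandas as pd

# Invented sample values; the real malformed entries may look different.
books = pd.DataFrame({"publishingYear": ["1999.0", " 2005x", "-560", "20", "2012"]})

# Extract a run of exactly four digits that is not preceded by a digit or a
# minus sign and not followed by another digit; no match yields NaN.
year = books["publishingYear"].astype(str).str.extract(
    r"(?<![\d\-])(\d{4})(?!\d)", expand=False
)
books["publishingYear"] = pd.to_numeric(year, errors="coerce")

# ASSUMPTION: hypothetical plausibility bounds; adjust to the data set.
MIN_YEAR, MAX_YEAR = 1400, 2100
valid = books["publishingYear"].between(MIN_YEAR, MAX_YEAR)  # NaN -> False

# Identify the rows with incorrect dates, then remove them.
invalid_rows = books[~valid]
print(invalid_rows)

books = books[valid].copy()
books["publishingYear"] = books["publishingYear"].astype(int)
print(books)
```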