Question
Assignment 2 (Data Processing)
Consider the following table:
transaction_id, customer_id, timestamp, product_category, quantity, price, numeric_feature1
(Ten data rows; only the product_category values survive, in row order: Electronics, Clothing, Electronics, Home & Garden, Electronics, Clothing, Home & Garden, Electronics, Clothing, Clothing.)
Create a table in PostgreSQL and connect to it from Python to complete the following exercises.
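A minimal sketch of the table setup and connection, assuming the `psycopg2` driver. The table name `transactions`, the column types (guessed from the column names), and the connection string are all assumptions, not part of the assignment.

```python
import pandas as pd

# Column types are guesses based on the column names in the assignment.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS transactions (
    transaction_id   INTEGER PRIMARY KEY,
    customer_id      INTEGER,
    "timestamp"      TIMESTAMP,
    product_category TEXT,
    quantity         INTEGER,
    price            NUMERIC(10, 2),
    numeric_feature1 DOUBLE PRECISION
);
"""


def load_transactions(dsn: str) -> pd.DataFrame:
    """Create the table if it is missing and return its rows as a DataFrame."""
    # Imported here so the module still loads where the driver is not installed.
    import psycopg2

    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            cur.execute(CREATE_TABLE_SQL)
        conn.commit()
        with conn.cursor() as cur:
            cur.execute("SELECT * FROM transactions ORDER BY transaction_id")
            columns = [desc[0] for desc in cur.description]
            rows = cur.fetchall()
    finally:
        conn.close()
    return pd.DataFrame(rows, columns=columns)


# Example call with placeholder credentials (not executed here):
# df = load_transactions("dbname=shop user=postgres password=secret host=localhost")
```

The rows themselves would still need to be inserted, for example with `INSERT` statements or `COPY`, before the query returns anything.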
Data Import and Exploration:
Import the dataset using Pandas.
Display the first few rows of the dataset.
Check the data types and missing data count for each column.
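The import and exploration steps above might look like the following sketch. The rows are invented placeholders, and reading from an in-memory CSV string stands in for whatever source (file or database query) is actually used.

```python
import io

import pandas as pd

# Placeholder rows: the values are invented for illustration only.
CSV_TEXT = """transaction_id,customer_id,timestamp,product_category,quantity,price,numeric_feature1
1,101,2023-01-05 10:00:00,Electronics,2,299.99,0.5
2,102,2023-01-17 12:30:00,Clothing,1,49.50,
3,101,2023-02-03 09:15:00,Electronics,,120.00,1.2
4,103,2023-02-20 16:45:00,Home & Garden,3,35.25,0.8
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

print(df.head())    # first few rows
print(df.dtypes)    # data type of each column

missing_counts = df.isna().sum()  # number of missing values per column
print(missing_counts)
```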
Data Cleaning and Transformation:
Handle missing values using an appropriate strategy (e.g., imputation).
Identify and handle outliers in the dataset.
Create a new feature by applying a custom transformation to existing columns.
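One possible approach to the cleaning steps, sketched on invented values: median imputation, capping outliers with the 1.5 × IQR rule, and a derived `total_value` column. The choice of median, the IQR rule, and the new column name are assumptions; other strategies are equally valid.

```python
import numpy as np
import pandas as pd

# Invented values with two gaps and one obvious price outlier.
df = pd.DataFrame({
    "quantity": [2, 1, np.nan, 3, 2, 1],
    "price": [20.0, 25.0, 22.0, np.nan, 24.0, 500.0],
})

# Impute missing values with the column median.
df["quantity"] = df["quantity"].fillna(df["quantity"].median())
df["price"] = df["price"].fillna(df["price"].median())

# Compute the 1.5 * IQR bounds and cap prices that fall outside them.
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df["price_capped"] = df["price"].clip(lower=lower, upper=upper)

# Custom transformation: total value of each transaction.
df["total_value"] = df["quantity"] * df["price_capped"]

print(df)
```

Capping keeps every row; dropping the flagged rows instead is the other common choice.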
Data Merging and Joining:
Merge the dataset with another dataset containing customer information based on a common column.
Perform an inner join, left join, and outer join.
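The three join types can be compared with `DataFrame.merge`, as in this sketch. The customer table and its `customer_name` column are hypothetical; `customer_id` is assumed to be the common column.

```python
import pandas as pd

transactions = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4],
    "customer_id": [101, 102, 101, 104],
    "price": [299.99, 49.50, 120.00, 35.25],
})

# Hypothetical customer table; customer 104 is missing and 103 has no transactions.
customers = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "customer_name": ["Ada", "Ben", "Cy"],
})

inner = transactions.merge(customers, on="customer_id", how="inner")
left = transactions.merge(customers, on="customer_id", how="left")
outer = transactions.merge(customers, on="customer_id", how="outer")

print(len(inner), len(left), len(outer))  # 3 4 5
```

The inner join drops the transaction for customer 104, the left join keeps it with a missing `customer_name`, and the outer join additionally keeps customer 103, who has no transactions.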
Time Series Data Processing:
Assuming the dataset includes a timestamp column, convert it to datetime format.
Resample the data to monthly frequency.
Apply a rolling window to calculate a day moving average.
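A sketch of the time series steps on invented values. The window length is not stated in the text above, so a 7-day window is assumed here purely for illustration, as is summing `price` per month.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": ["2023-01-05", "2023-01-08", "2023-02-03", "2023-02-20", "2023-03-01"],
    "price": [100.0, 50.0, 120.0, 30.0, 80.0],
})

# Convert the string column to datetime and use it as a sorted index.
df["timestamp"] = pd.to_datetime(df["timestamp"])
ts = df.set_index("timestamp").sort_index()

# Resample to monthly frequency; "MS" labels each bin by the month start.
monthly_total = ts["price"].resample("MS").sum()

# Time-based rolling mean over the trailing 7 days (assumed window length).
ts["price_ma"] = ts["price"].rolling("7D").mean()

print(monthly_total)
print(ts)
```

A time-based window such as `"7D"` averages whatever rows fall in the trailing seven days, which suits irregularly spaced transactions better than a fixed row count.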
Introduction to NumPy:
Create a NumPy array with random integer values.
Calculate the mean, maximum, and minimum of the array.
Reshape the array into a x matrix.
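The NumPy steps could be sketched as follows. The array length and matrix shape are not given above, so 12 values reshaped to 3 × 4 are assumed for illustration.

```python
import numpy as np

# Seeded generator so the example is reproducible.
rng = np.random.default_rng(seed=0)

# 12 random integers in [0, 100); the size and range are assumptions.
values = rng.integers(low=0, high=100, size=12)

mean_value = values.mean()
max_value = values.max()
min_value = values.min()

# The reshape only works when rows * columns equals the number of elements.
matrix = values.reshape(3, 4)

print(values)
print(mean_value, max_value, min_value)
print(matrix)
```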
Data Visualization with Matplotlib and Seaborn:
Create a line plot using Matplotlib to visualize the trend of a numeric feature over time.
Use Seaborn to create a box plot showing the distribution of a categorical feature.
Combine multiple visualizations (e.g., a scatter plot and a bar chart) in a single Matplotlib figure.
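The sketch below puts all three visualization tasks in one figure. The category order follows the table in the question; the dates, quantities, and prices are invented. The box plot is interpreted here as the distribution of `price` within each `product_category`, which is one reading of the task.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "timestamp": pd.date_range("2023-01-01", periods=10, freq="D"),
    "product_category": [
        "Electronics", "Clothing", "Electronics", "Home & Garden", "Electronics",
        "Clothing", "Home & Garden", "Electronics", "Clothing", "Clothing",
    ],
    "quantity": [1, 2, 1, 3, 2, 1, 4, 1, 2, 3],
    "price": [300.0, 40.0, 250.0, 60.0, 320.0, 55.0, 75.0, 280.0, 35.0, 45.0],
})

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Line plot: trend of a numeric feature over time.
axes[0, 0].plot(df["timestamp"], df["price"], marker="o")
axes[0, 0].set_title("Price over time")
axes[0, 0].tick_params(axis="x", rotation=45)

# Seaborn box plot: price distribution per category.
sns.boxplot(data=df, x="product_category", y="price", ax=axes[0, 1])
axes[0, 1].set_title("Price by category")

# Scatter plot: quantity against price.
axes[1, 0].scatter(df["quantity"], df["price"])
axes[1, 0].set_title("Quantity vs. price")

# Bar chart: number of transactions per category.
counts = df["product_category"].value_counts()
axes[1, 1].bar(counts.index, counts.values)
axes[1, 1].set_title("Transactions per category")

fig.tight_layout()
# fig.savefig("transactions_overview.png")  # uncomment to write the figure to disk
```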
Handling Categorical and Numeric Data:
Apply one-hot encoding to a categorical feature.
Implement Min-Max scaling on a numeric feature.
Bin a numeric feature into discrete categories.
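These three transformations can be done with pandas alone, as in this sketch on invented values. The bin edges and labels are arbitrary assumptions; scikit-learn's `MinMaxScaler` is a common alternative to the manual scaling formula.

```python
import pandas as pd

df = pd.DataFrame({
    "product_category": ["Electronics", "Clothing", "Home & Garden", "Clothing"],
    "price": [300.0, 50.0, 100.0, 150.0],
})

# One-hot encoding: one indicator column per category.
encoded = pd.get_dummies(df, columns=["product_category"], prefix="category")

# Min-Max scaling: (x - min) / (max - min) maps the column onto [0, 1].
price_min, price_max = df["price"].min(), df["price"].max()
encoded["price_scaled"] = (df["price"] - price_min) / (price_max - price_min)

# Binning: arbitrary edges chosen for illustration.
encoded["price_band"] = pd.cut(
    df["price"],
    bins=[0, 75, 200, float("inf")],
    labels=["low", "medium", "high"],
)

print(encoded)
```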
Data Sampling and Splitting:
Randomly sample rows from the dataset.
Split the dataset into training and testing sets using a split ratio.
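A sketch of sampling and splitting with pandas only. The sample size (3 rows) and the 80/20 split ratio are assumptions, since the text above does not state them; scikit-learn's `train_test_split` is the usual alternative.

```python
import pandas as pd

df = pd.DataFrame({
    "transaction_id": range(1, 11),
    "price": [300.0, 40.0, 250.0, 60.0, 320.0, 55.0, 75.0, 280.0, 35.0, 45.0],
})

# Random sample of 3 rows; random_state makes the draw reproducible.
sample = df.sample(n=3, random_state=42)

# 80/20 split: draw the training rows, then keep the remainder for testing.
train_set = df.sample(frac=0.8, random_state=42)
test_set = df.drop(train_set.index)

print(len(sample), len(train_set), len(test_set))  # 3 8 2
```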
Advanced Numeric Data Handling:
Remove outliers using the z-score method.
Apply log transformation to a skewed numeric feature.
Create a new feature by squaring an existing numeric feature.
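The steps above can be sketched on invented prices as follows. The cutoff of 2.5 is an assumption: 3 is a common default, but in a sample of only ten rows no z-score can exceed 3 in absolute value, so a lower cutoff is used here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "price": [20.0, 22.0, 25.0, 24.0, 21.0, 23.0, 26.0, 22.0, 24.0, 500.0],
})

# z-score: distance from the mean in units of the (population) standard deviation.
z_scores = (df["price"] - df["price"].mean()) / df["price"].std(ddof=0)

# Keep rows whose absolute z-score is below the assumed cutoff.
filtered = df[z_scores.abs() < 2.5].copy()

# Log transformation; log1p computes log(1 + x) and is safe at zero.
filtered["price_log"] = np.log1p(filtered["price"])

# New feature: the square of an existing numeric feature.
filtered["price_squared"] = filtered["price"] ** 2

print(filtered)
```

Because the outlier inflates both the mean and the standard deviation, the z-score method is less sensitive on small samples than quantile-based rules such as the IQR.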