
Question


Assignment 2(Data Processing)
Consider the following table
transaction_id,customer_id,timestamp,product_category,quantity,price,numeric_feature
1,101,2022-01-01 08:30:00,Electronics,2,500,45
2,102,2022-01-01 09:15:00,Clothing,1,120,30
3,103,2022-01-02 10:00:00,Electronics,3,400,50
4,101,2022-01-03 14:45:00,Home & Garden,1,200,20
5,104,2022-01-04 11:20:00,Electronics,2,450,35
6,102,2022-01-05 13:00:00,Clothing,1,100,25
7,103,2022-01-06 16:30:00,Home & Garden,2,300,40
8,104,2022-01-07 09:45:00,Electronics,1,550,55
9,101,2022-01-08 12:15:00,Clothing,3,150,28
10,103,2022-01-09 15:00:00,Clothing,1,130,32
Create a table in PostgreSQL, connect to it using Python, and complete the following exercises in Python.
1. Data Import and Exploration:
- Import the dataset using Pandas.
- Display the first few rows of the dataset.
- Check the data types and missing data count for each column.
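A minimal sketch of exercise 1, assuming the timestamps read as `YYYY-MM-DD HH:MM:SS`. So that it runs without a database, the rows are loaded from an inline CSV string. The helper `read_from_postgres`, its `url` argument and the table name `transactions` are hypothetical names for illustration; the helper relies on SQLAlchemy plus a PostgreSQL driver being installed.

```python
import io
import pandas as pd

CSV = """transaction_id,customer_id,timestamp,product_category,quantity,price,numeric_feature
1,101,2022-01-01 08:30:00,Electronics,2,500,45
2,102,2022-01-01 09:15:00,Clothing,1,120,30
3,103,2022-01-02 10:00:00,Electronics,3,400,50
4,101,2022-01-03 14:45:00,Home & Garden,1,200,20
5,104,2022-01-04 11:20:00,Electronics,2,450,35
6,102,2022-01-05 13:00:00,Clothing,1,100,25
7,103,2022-01-06 16:30:00,Home & Garden,2,300,40
8,104,2022-01-07 09:45:00,Electronics,1,550,55
9,101,2022-01-08 12:15:00,Clothing,3,150,28
10,103,2022-01-09 15:00:00,Clothing,1,130,32
"""


def read_from_postgres(url, table="transactions"):
    """Hypothetical helper: read a whole PostgreSQL table into a DataFrame.

    `url` is an assumed connection string such as
    "postgresql+psycopg2://user:password@localhost:5432/mydb".
    """
    from sqlalchemy import create_engine  # imported lazily; optional dependency

    engine = create_engine(url)
    return pd.read_sql_table(table, engine)


# Stand-in for the database read, so the sketch is self-contained.
df = pd.read_csv(io.StringIO(CSV))

print(df.head())          # first few rows
print(df.dtypes)          # data type of each column
missing = df.isna().sum() # missing-value count per column
print(missing)
```

With a running PostgreSQL instance, `df.to_sql("transactions", engine, index=False)` is one way to create and fill the table from the same DataFrame before reading it back.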
2. Data Cleaning and Transformation:
- Handle missing values by using appropriate strategies (e.g., imputation).
- Identify and handle outliers in the dataset.
- Create a new feature by applying a custom transformation to existing columns.
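One possible approach to exercise 2 is sketched below. The sample table has no missing values, so one price is blanked artificially to give the imputation something to do. Median imputation, the 1.5 x IQR clipping rule and the `total_amount` feature are assumptions chosen for illustration, not requirements of the assignment.

```python
import io
import numpy as np
import pandas as pd

CSV = """transaction_id,customer_id,timestamp,product_category,quantity,price,numeric_feature
1,101,2022-01-01 08:30:00,Electronics,2,500,45
2,102,2022-01-01 09:15:00,Clothing,1,120,30
3,103,2022-01-02 10:00:00,Electronics,3,400,50
4,101,2022-01-03 14:45:00,Home & Garden,1,200,20
5,104,2022-01-04 11:20:00,Electronics,2,450,35
6,102,2022-01-05 13:00:00,Clothing,1,100,25
7,103,2022-01-06 16:30:00,Home & Garden,2,300,40
8,104,2022-01-07 09:45:00,Electronics,1,550,55
9,101,2022-01-08 12:15:00,Clothing,3,150,28
10,103,2022-01-09 15:00:00,Clothing,1,130,32
"""

df = pd.read_csv(io.StringIO(CSV))

# Artificial gap for the demonstration only: blank the price in the third row.
df["price"] = df["price"].astype(float)
df.loc[2, "price"] = np.nan

# Missing values: fill numeric gaps with the column median.
df["price"] = df["price"].fillna(df["price"].median())

# Outliers: clip numeric_feature to the 1.5 * IQR fences.
q1 = df["numeric_feature"].quantile(0.25)
q3 = df["numeric_feature"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df["numeric_feature"] = df["numeric_feature"].clip(lower, upper)

# Custom transformation: total amount spent per transaction.
df["total_amount"] = df["quantity"] * df["price"]
print(df[["transaction_id", "price", "numeric_feature", "total_amount"]])
```

Clipping keeps every row and only caps extreme values; dropping the rows outside the fences is an equally common choice.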
3. Data Merging and Joining:
- Merge the dataset with another dataset containing customer information based on a common column.
- Perform an inner join, left join, and outer join.
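The three joins in exercise 3 can be sketched as follows. The assignment does not supply a customer dataset, so the `customers` table here (including the names and the extra customer 105) is invented purely so that the joins give different row counts.

```python
import io
import pandas as pd

CSV = """transaction_id,customer_id,timestamp,product_category,quantity,price,numeric_feature
1,101,2022-01-01 08:30:00,Electronics,2,500,45
2,102,2022-01-01 09:15:00,Clothing,1,120,30
3,103,2022-01-02 10:00:00,Electronics,3,400,50
4,101,2022-01-03 14:45:00,Home & Garden,1,200,20
5,104,2022-01-04 11:20:00,Electronics,2,450,35
6,102,2022-01-05 13:00:00,Clothing,1,100,25
7,103,2022-01-06 16:30:00,Home & Garden,2,300,40
8,104,2022-01-07 09:45:00,Electronics,1,550,55
9,101,2022-01-08 12:15:00,Clothing,3,150,28
10,103,2022-01-09 15:00:00,Clothing,1,130,32
"""

df = pd.read_csv(io.StringIO(CSV))

# Hypothetical customer table: 104 is deliberately absent, 105 has no transactions.
customers = pd.DataFrame(
    {
        "customer_id": [101, 102, 103, 105],
        "customer_name": ["Customer A", "Customer B", "Customer C", "Customer E"],
    }
)

# Inner join: only transactions whose customer_id appears in both tables.
inner = df.merge(customers, on="customer_id", how="inner")

# Left join: every transaction, with NaN names for unknown customers.
left = df.merge(customers, on="customer_id", how="left")

# Outer join: every transaction plus customers without transactions.
outer = df.merge(customers, on="customer_id", how="outer")

print(len(inner), len(left), len(outer))
```

Passing `indicator=True` to `merge` adds a `_merge` column showing which side each row came from, which helps when checking an outer join.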
4. Time Series Data Processing:
- Assuming the dataset includes a timestamp column, convert it to datetime format.
- Resample the data to monthly frequency.
- Apply a rolling window to calculate a 7-day moving average.
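A sketch of exercise 4, assuming the timestamps read as `YYYY-MM-DD HH:MM:SS`. Summing `quantity` and `price` per month and averaging `price` over the window are illustrative choices; the assignment does not say which columns or aggregations to use. The month-start alias `"MS"` is used for the monthly frequency.

```python
import io
import pandas as pd

CSV = """transaction_id,customer_id,timestamp,product_category,quantity,price,numeric_feature
1,101,2022-01-01 08:30:00,Electronics,2,500,45
2,102,2022-01-01 09:15:00,Clothing,1,120,30
3,103,2022-01-02 10:00:00,Electronics,3,400,50
4,101,2022-01-03 14:45:00,Home & Garden,1,200,20
5,104,2022-01-04 11:20:00,Electronics,2,450,35
6,102,2022-01-05 13:00:00,Clothing,1,100,25
7,103,2022-01-06 16:30:00,Home & Garden,2,300,40
8,104,2022-01-07 09:45:00,Electronics,1,550,55
9,101,2022-01-08 12:15:00,Clothing,3,150,28
10,103,2022-01-09 15:00:00,Clothing,1,130,32
"""

df = pd.read_csv(io.StringIO(CSV))

# Convert the text column to datetime and use it as a sorted index.
df["timestamp"] = pd.to_datetime(df["timestamp"])
ts = df.set_index("timestamp").sort_index()

# Resample to monthly frequency (labels at month start), summing per month.
monthly = ts[["quantity", "price"]].resample("MS").sum()

# Time-based rolling window: mean price over the trailing 7 days of each row.
rolling_7d = ts["price"].rolling("7D").mean()

print(monthly)
print(rolling_7d)
```

Because all ten rows fall in January 2022, the monthly result has a single row. The `"7D"` window covers the seven days up to and including each row's timestamp, so early rows average over fewer observations.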
5. Introduction to NumPy:
- Create a NumPy array with random integer values.
- Calculate the mean, maximum, and minimum of the array.
- Reshape the array into a 2x5 matrix.
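Exercise 5 might look like the sketch below. The seed, the value range 0 to 99 and the array length of 10 (needed for a 2x5 reshape) are assumptions.

```python
import numpy as np

# Seeded generator so the run is repeatable; the seed value is arbitrary.
rng = np.random.default_rng(42)

# Ten random integers in [0, 100); ten elements are needed for a 2x5 matrix.
arr = rng.integers(0, 100, size=10)

print("array:", arr)
print("mean:", arr.mean())
print("max:", arr.max())
print("min:", arr.min())

# Reshape into 2 rows by 5 columns; the element count must stay the same.
matrix = arr.reshape(2, 5)
print(matrix)
```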
6. Data Visualization with Matplotlib and Seaborn:
- Create a line plot using Matplotlib to visualize the trend of a numeric feature over time.
- Use Seaborn to create a box plot showing the distribution of a categorical feature.
- Combine multiple visualizations (e.g., scatter plot, bar chart) in a single Matplotlib figure.
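A sketch of exercise 6 that places all plots in one figure. A box plot summarizes a numeric column, so the example reads the second task as "price distribution per product category"; that reading, the column choices and the 2x2 layout are assumptions. The `Agg` backend is selected only so the code runs without a display.

```python
import io
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove for on-screen plots
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

CSV = """transaction_id,customer_id,timestamp,product_category,quantity,price,numeric_feature
1,101,2022-01-01 08:30:00,Electronics,2,500,45
2,102,2022-01-01 09:15:00,Clothing,1,120,30
3,103,2022-01-02 10:00:00,Electronics,3,400,50
4,101,2022-01-03 14:45:00,Home & Garden,1,200,20
5,104,2022-01-04 11:20:00,Electronics,2,450,35
6,102,2022-01-05 13:00:00,Clothing,1,100,25
7,103,2022-01-06 16:30:00,Home & Garden,2,300,40
8,104,2022-01-07 09:45:00,Electronics,1,550,55
9,101,2022-01-08 12:15:00,Clothing,3,150,28
10,103,2022-01-09 15:00:00,Clothing,1,130,32
"""

df = pd.read_csv(io.StringIO(CSV), parse_dates=["timestamp"])

fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Line plot (Matplotlib): numeric_feature over time.
axes[0, 0].plot(df["timestamp"], df["numeric_feature"], marker="o")
axes[0, 0].set_title("numeric_feature over time")
axes[0, 0].tick_params(axis="x", rotation=45)

# Box plot (Seaborn): price distribution within each product category.
sns.boxplot(data=df, x="product_category", y="price", ax=axes[0, 1])
axes[0, 1].set_title("price by product_category")

# Scatter plot: quantity against price.
axes[1, 0].scatter(df["quantity"], df["price"])
axes[1, 0].set_xlabel("quantity")
axes[1, 0].set_ylabel("price")

# Bar chart: total quantity sold per category.
totals = df.groupby("product_category")["quantity"].sum()
axes[1, 1].bar(list(totals.index), totals.values)
axes[1, 1].set_title("total quantity by category")

fig.tight_layout()
# plt.show() would display the figure in an interactive session.
```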
7. Handling Categorical and Numeric Data:
- Apply one-hot encoding to a categorical feature.
- Implement Min-Max scaling on a numeric feature.
- Bin a numeric feature into discrete categories.
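Exercise 7 can be sketched with pandas alone. The bin edges (0, 30, 45, 60) and the labels are assumptions picked to suit the sample values; scikit-learn's `OneHotEncoder` and `MinMaxScaler` are common alternatives for the first two tasks.

```python
import io
import pandas as pd

CSV = """transaction_id,customer_id,timestamp,product_category,quantity,price,numeric_feature
1,101,2022-01-01 08:30:00,Electronics,2,500,45
2,102,2022-01-01 09:15:00,Clothing,1,120,30
3,103,2022-01-02 10:00:00,Electronics,3,400,50
4,101,2022-01-03 14:45:00,Home & Garden,1,200,20
5,104,2022-01-04 11:20:00,Electronics,2,450,35
6,102,2022-01-05 13:00:00,Clothing,1,100,25
7,103,2022-01-06 16:30:00,Home & Garden,2,300,40
8,104,2022-01-07 09:45:00,Electronics,1,550,55
9,101,2022-01-08 12:15:00,Clothing,3,150,28
10,103,2022-01-09 15:00:00,Clothing,1,130,32
"""

df = pd.read_csv(io.StringIO(CSV))

# One-hot encoding: one indicator column per product category.
dummies = pd.get_dummies(df["product_category"], prefix="category")
encoded = pd.concat([df, dummies], axis=1)

# Min-Max scaling: map price linearly onto the range [0, 1].
price_min, price_max = df["price"].min(), df["price"].max()
encoded["price_scaled"] = (df["price"] - price_min) / (price_max - price_min)

# Binning: intervals (0, 30], (30, 45], (45, 60] with assumed labels.
encoded["feature_bin"] = pd.cut(
    df["numeric_feature"],
    bins=[0, 30, 45, 60],
    labels=["low", "medium", "high"],
)

print(encoded[["product_category", "price_scaled", "feature_bin"]])
```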
8. Data Sampling and Splitting:
- Randomly sample 100 rows from the dataset.
- Split the dataset into training and testing sets using an 80-20 split ratio.
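A sketch of exercise 8. The sample table has only 10 rows, so drawing 100 rows is only possible with replacement; that choice and the seed values are assumptions. The split uses pandas only, and scikit-learn's `train_test_split` is a common alternative.

```python
import io
import pandas as pd

CSV = """transaction_id,customer_id,timestamp,product_category,quantity,price,numeric_feature
1,101,2022-01-01 08:30:00,Electronics,2,500,45
2,102,2022-01-01 09:15:00,Clothing,1,120,30
3,103,2022-01-02 10:00:00,Electronics,3,400,50
4,101,2022-01-03 14:45:00,Home & Garden,1,200,20
5,104,2022-01-04 11:20:00,Electronics,2,450,35
6,102,2022-01-05 13:00:00,Clothing,1,100,25
7,103,2022-01-06 16:30:00,Home & Garden,2,300,40
8,104,2022-01-07 09:45:00,Electronics,1,550,55
9,101,2022-01-08 12:15:00,Clothing,3,150,28
10,103,2022-01-09 15:00:00,Clothing,1,130,32
"""

df = pd.read_csv(io.StringIO(CSV))

# 100 random rows; replace=True is required because only 10 rows exist.
sample_100 = df.sample(n=100, replace=True, random_state=42)

# 80-20 split: draw 80% of the rows for training, keep the rest for testing.
train = df.sample(frac=0.8, random_state=42)
test = df.drop(train.index)

print(len(sample_100), len(train), len(test))
```

On a dataset with at least 100 rows, `replace=True` can be dropped so that no row is sampled twice.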
9. Advanced Numeric Data Handling:
- Remove outliers using the z-score method.
- Apply log transformation to a skewed numeric feature.
- Create a new feature by squaring an existing numeric feature.
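Exercise 9 could be approached as below. The z-score threshold of 3, the use of `price` for both the outlier filter and the log transform, and the new column names are assumptions. `log1p` computes log(1 + x), which also behaves well for values at zero.

```python
import io
import numpy as np
import pandas as pd

CSV = """transaction_id,customer_id,timestamp,product_category,quantity,price,numeric_feature
1,101,2022-01-01 08:30:00,Electronics,2,500,45
2,102,2022-01-01 09:15:00,Clothing,1,120,30
3,103,2022-01-02 10:00:00,Electronics,3,400,50
4,101,2022-01-03 14:45:00,Home & Garden,1,200,20
5,104,2022-01-04 11:20:00,Electronics,2,450,35
6,102,2022-01-05 13:00:00,Clothing,1,100,25
7,103,2022-01-06 16:30:00,Home & Garden,2,300,40
8,104,2022-01-07 09:45:00,Electronics,1,550,55
9,101,2022-01-08 12:15:00,Clothing,3,150,28
10,103,2022-01-09 15:00:00,Clothing,1,130,32
"""

df = pd.read_csv(io.StringIO(CSV))

# Z-score: distance from the mean in units of standard deviation.
z = (df["price"] - df["price"].mean()) / df["price"].std(ddof=0)

# Keep rows whose absolute z-score is below the assumed threshold of 3.
filtered = df[z.abs() < 3].copy()

# Log transformation to compress the long right tail of a skewed feature.
filtered["log_price"] = np.log1p(filtered["price"])

# New feature: square of an existing numeric column.
filtered["numeric_feature_sq"] = filtered["numeric_feature"] ** 2

print(filtered[["price", "log_price", "numeric_feature", "numeric_feature_sq"]])
```

With only ten rows no z-score reaches 3, so the filter keeps every row here; the step matters on larger datasets.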
