Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

Let's finally deal with that pesky Export commodities column! It's not a number, but represents a list of items. It is common to deal with

image text in transcribed

image text in transcribed

Let's finally deal with that pesky "Export commodities" column! It's not a number, but represents a list of items. It is common to deal with data that is not numeric (arbitrary strings, strings representing single items, strings representing list/sets of items, etc). To deal with these non-numeric columns, we need to encode them. That is, we need to convert them from a non-numeric form to a numeric one. In computer science, there are many encodings and encoding methods. An encoding that you are probably already familiar with is ASCII, which is a way to represent Latin letters as numbers. Encodings are just known ways to represent non-numeric values as numbers. In machine learning and data science, probably the most popular encoding method is the One-Hot encoding. In One-Hot encoding, we take all the possible values an item can take and assign each possible value an index. Presence of that value results in a 1, which absence results in a 0. Essentially, we are turning a list of items into a series of binary columns which ask "Do you have this value?". For example, consider the following tables. In the first one, a student's major is kept as a string (or a list of strings). To encode it, a column is created for each possible major and a 1 is present if that student has that major (otherwise a 0 is used). Students: Students with One-Hot: Complete the function below that takes in a frame and a column name. The function should modify the frame to add multiple columns represented a one-hot encoding of the specified column. Assume that the given column contains a comma-separated list of values. Each value should be made lowercase and have additional whitespace removed from the beginning and end. The new column names should be named: "old column name>: value>. (Like in the students example above.) Return a new frame with the columns of the passed-in frame (minus the specified column) and all the new one-hot columns. Note that the above semantics will not work perfectly for our world data, but it will provide a good starting point. def one_hot(frame, column_name): return Notimplemented \# Before calling your function, we are going to clean up the data a little bit. \# You are not required to learn what we are doing below (regular expressions), \# but they are very useful and will always be useful to you. world_data['Export commodities'].replace (r\s\(\d{4}(?:\sest\.)?\\s ', '', regex = True, inplace = True ) regex = True, inplace = True ) world_data['Export commodities'].replace(r' \s\d{2}%\s, ' ', regex = True, inplace = True ) print("One-Hot encoded exports:") world_data = one_hot (world_data, 'Export commodities') world_data

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Fundamentals Of Database Systems

Authors: Ramez Elmasri, Shamkant B. Navathe

7th Edition Global Edition

1292097612, 978-1292097619

More Books

Students also viewed these Databases questions

Question

1. Which is the most abundant gas presented in the atmosphere?

Answered: 1 week ago