
Question

Data clean-up using R code: We have a .csv file with only one column/field, which contains job descriptions. The dataset contains nearly identical descriptions (with small differences) and misspellings. Our goal is to compare those nearly identical job descriptions (with a fuzzy match/pattern match) and change them to a single, consistent description. We do not want to consolidate the rows, so the result will contain duplicate entries (that is fine): if we have 5000 rows now, we will still have 5000 rows after the data transformation/clean-up.

Here are some example clusters:

Cluster 1:
manager (occurs 100 times)
management (occurs 10 times)
manger - in - training (occurs 5 times)
manager - (occurs 3 times)
manager) (occurs 4 times)
management (occurs 3 times)

1. Create a look-up table containing the patterns we will try to match against the actual CSV file entries, e.g. for Cluster 1 the pattern can be 'manage'.
2. Use fuzzy-match logic, a regular expression (regex), or any pattern-matching algorithm to match against the look-up table and convert each entry to a meaningful value, e.g. all Cluster 1 entries (125 rows in total) should be converted to 'manager' because they match the pattern 'manage'.
3. Make the R code scalable so that new patterns can be added to the table for comparison with the data.

Some other example clusters for unit testing the R code:

Cluster 2: (match with pattern 'engine')
engineer
engineering
engineer
engineering
sr. engineer
engineer/sales

Cluster 3: (match with pattern 'intern')
intern
internship
intern.
intern/co-op
co-op/interm
intern/assistant
worker/intern
(intern)
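
The three requirements above could be approached roughly as follows. This is only a minimal sketch, not a verified solution: the names `lookup`, `standardize_jobs` and `max_distance`, the output labels 'engineer' and 'intern', and the sample vector `jobs` are assumptions made for illustration. It uses base R's `agrepl()` for approximate matching so that a misspelling such as 'manger' can still hit the pattern 'manage'. In practice the descriptions would come from the .csv file (for example via `read.csv()`), which is left out here so the example runs on its own.

```r
# Hypothetical look-up table: one row per cluster.
# New clusters are supported by adding rows; the function below does not change.
lookup <- data.frame(
  pattern = c("manage", "engine", "intern"),
  label   = c("manager", "engineer", "intern"),
  stringsAsFactors = FALSE
)

# Replace each description with the label of the first pattern it matches.
# Entries that match no pattern are returned unchanged, and the output always
# has the same length as the input, so no rows are consolidated or dropped.
standardize_jobs <- function(x, lookup, max_distance = 0.1) {
  out <- x
  matched <- rep(FALSE, length(x))
  for (i in seq_len(nrow(lookup))) {
    # agrepl() performs approximate (edit-distance) substring matching.
    # With max.distance = 0.1 and a 6-letter pattern, one edit is tolerated.
    hit <- !matched & agrepl(lookup$pattern[i], x,
                             max.distance = max_distance,
                             ignore.case = TRUE)
    out[hit] <- lookup$label[i]
    matched <- matched | hit
  }
  out
}

# Sample entries taken from the three clusters, plus one that matches nothing.
jobs <- c(
  "manager", "management", "manger - in - training", "manager -", "manager)",
  "engineer", "engineering", "sr. engineer", "engineer/sales",
  "intern", "internship", "intern.", "intern/co-op", "co-op/interm",
  "intern/assistant", "worker/intern", "(intern)",
  "teacher"
)

cleaned <- standardize_jobs(jobs, lookup)
print(data.frame(original = jobs, cleaned = cleaned))
```

A plain `grepl(pattern, x, ignore.case = TRUE)` would be a stricter alternative, but it would miss 'manger - in - training' and 'co-op/interm', since neither contains its cluster pattern exactly. The tolerance in `max_distance` is a trade-off: larger values catch more misspellings but also raise the risk of pulling unrelated descriptions into a cluster, so the result is worth spot-checking on the real data.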

