
Question




Task 3: Remove Duplicates

Step 3.1: isDup Function (20 Marks)

While Excel's built-in Remove Duplicates feature was able to remove identical tweets, we would also like to remove similar but not identical tweets, such as:

    $1 USD is currently worth 0.00010627 BTC!

and

    $1 USD is currently worth 0.00010692 BTC!

To do this, we will create a VBA function named isDup that detects whether two tweets are sufficiently similar to be considered duplicates. isDup returns True if the two tweets given to it are duplicates and False otherwise. The function must have the following header:

    Function isDup(tweet1 As String, tweet2 As String, threshold As Double) As Boolean

where tweet1 and tweet2 are the text of two different tweets and threshold is the fraction of words that they must have in common to be considered duplicates. The fraction is based on the total number of words in the first tweet. If we are using a threshold of 0.7 and the first tweet has 100 words, at least 70 of those words must be in common with the second tweet for isDup to return True. If fewer than 70 words are in common, for example 56 or 34, the tweet is not deemed a duplicate and False should be returned. Note that threshold is passed as an argument to the function with a value between 0 and 1 and should not be hard-coded as 0.7 in your function.

Example: If tweet1 is

    Hours of planning can save weeks of coding

and tweet2 is

    Weeks of programming can save you hours of planning

the correct count of words in common is 7 out of 8, because the only word of tweet1 with no match in tweet2 is "coding" (tweet2 has "programming" instead), and isDup should return True if the threshold is less than 0.875.

Note that the total number of words is based on the length of tweet1, and each word in tweet1 is matched at most once. "of" occurs twice in tweet1, so it is counted twice (and not four times). Capitalization should be ignored.

Some Hints:

You will need to use the string functions StrComp and Split. Use Split to break the strings into individual words and StrComp to compare them while ignoring capitalization.

You will need to use nested loops: one to go through each word of tweet1 and one to go through each word of tweet2.

Examples:

tweet1 = "Hours of planning can save weeks of coding"
tweet2 = "Weeks of programming can save you hours of planning"
7/8 words the same

tweet1 = "Hours of planning can save weeks of coding"
tweet2 = "Weeks of programming can save you hours planning"
7/8 words the same

tweet1 = "Hours of planning can save weeks of coding"
tweet2 = "Weeks programming can save you hours planning"
5/8 words the same

tweet1 = "Hours of planning can save weeks coding"
tweet2 = "Weeks of programming can save you hours of planning"
6/7 words the same

tweet1 = "Hours planning can save weeks coding"
tweet2 = "Weeks of programming can save you hours of planning"
5/6 words the same

tweet1 = "Hours planning can save weeks coding"
tweet2 = "Weeks programming can save you hours planning"
5/6 words the same
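One way the hints could fit together is sketched below. This is a minimal sketch under stated assumptions, not the official solution: it assumes words are separated by single spaces, that punctuation is left attached to words, that a word in tweet2 may match more than one word in tweet1 (which is what the second example, 7/8 with only one "of" in tweet2, suggests), and that a ratio exactly equal to threshold counts as a duplicate ("at least"). The variable names are illustrative.

```vba
Option Explicit

' Sketch: True when the share of tweet1's words that also appear
' somewhere in tweet2 is at least threshold (assumed >=, see lead-in).
Function isDup(tweet1 As String, tweet2 As String, threshold As Double) As Boolean
    Dim words1() As String
    Dim words2() As String
    Dim i As Long, j As Long
    Dim matches As Long
    Dim total As Long

    ' Assumption: words are separated by single spaces.
    words1 = Split(Trim(tweet1), " ")
    words2 = Split(Trim(tweet2), " ")

    ' Split("") returns an empty array (UBound = -1), so total is 0.
    total = UBound(words1) - LBound(words1) + 1
    If total = 0 Then
        isDup = False
        Exit Function
    End If

    ' Outer loop: each word of tweet1. Inner loop: each word of tweet2.
    For i = LBound(words1) To UBound(words1)
        For j = LBound(words2) To UBound(words2)
            ' vbTextCompare makes the comparison ignore capitalization.
            If StrComp(words1(i), words2(j), vbTextCompare) = 0 Then
                matches = matches + 1
                Exit For   ' count each tweet1 word at most once
            End If
        Next j
    Next i

    isDup = (matches / total >= threshold)
End Function
```

With the example tweets from the question, the loops count 7 matches out of 8 words, so a call such as isDup("Hours of planning can save weeks of coding", "Weeks of programming can save you hours of planning", 0.7) evaluates 0.875 >= 0.7 and returns True. From a worksheet the function could be called as =isDup(A2, A3, 0.7), where the cell references are placeholders.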


