Question
You are given a set of 10 million Twitter accounts.
1) You are asked to sample 10% of the accounts using different sampling methods. How do you apply “Stratified sampling” and “Sampling without replacement” on the given dataset?
2) (Missing data) You need to work on location data, but many of the users do not reveal their location in their profiles. What approach do you use to deal with missing location attributes in the Twitter dataset?
3) (Duplicate data) Propose an approach to find highly similar Twitter accounts.
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Sampling Methods
a) Stratified Sampling: To perform stratified sampling on the given dataset of 10 million Twitter accounts, you would first need to identify some strata, or groups, within the dataset. For ...
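As a rough illustration of the stratified step, here is a minimal sketch. It assumes each account is a dict with a hypothetical `follower_bucket` field used as the stratum; the source does not say which attribute defines the strata, so the field name, the toy data, and the `stratified_sample` helper are all illustrative. Within each stratum the draw uses `random.sample`, which samples without replacement.

```python
import random
from collections import defaultdict


def stratified_sample(accounts, stratum_of, fraction=0.10, seed=0):
    """Draw about `fraction` of the accounts from every stratum.

    `stratum_of` maps an account to its stratum label (hypothetical choice).
    Each stratum is sampled without replacement.
    """
    rng = random.Random(seed)

    # Group the accounts by stratum label.
    strata = defaultdict(list)
    for account in accounts:
        strata[stratum_of(account)].append(account)

    # Take the same fraction from each group so that the sample
    # keeps the group proportions of the full dataset.
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))  # no account is picked twice
    return sample


# Toy stand-in for the 10 million accounts: 1,000 records, 10% in the "high" bucket.
accounts = [
    {"id": i, "follower_bucket": "high" if i % 10 == 0 else "low"}
    for i in range(1000)
]
sample = stratified_sample(accounts, lambda a: a["follower_bucket"])
```

With this toy data the sample holds 10% of each bucket, so a small stratum is not lost the way it might be in a single simple random draw over all accounts.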
Step: 2
Step: 3