Question
Classifying Classified Ads Submitted Online. Consider the case of a website that caters to the needs of a specific farming community and carries classified ads intended for that community. Anyone, including robots, can post an ad via a web interface, and the site owners have problems with ads that are fraudulent, spam, or simply not relevant to the community. They have provided a file of 4143 ads, one ad per row, each labeled either -1 (not relevant) or 1 (relevant). The goal is to develop a predictive model that can classify ads automatically.
Open the file farm-ads.csv and briefly review some of the relevant and non-relevant ads to get a flavor for their contents.
Following the example in the chapter, preprocess the data in R, then create a term-document matrix and a concept-document matrix. Limit the number of concepts to 20.
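The preprocessing step above can be sketched as follows. This is a minimal Python illustration, not the chapter's R code, and it rests on several assumptions:

- The ad texts have already been loaded as a list of strings.
- It uses a small made-up stop-word list and made-up example ads, not rows of farm-ads.csv.
- It skips stemming.
- Counts are weighted by TF-IDF.
- The concept matrix comes from latent semantic analysis, i.e. a truncated SVD, computed here by power iteration.

The helper names (`tokenize`, `term_document_matrix`, `tfidf`, `concept_matrix`) are hypothetical. Pure-Python loops like these only suit small corpora. For all 4143 ads, a library SVD is the practical route, with `k=20`; in R, the `lsa` package is one option.

```python
import math
import random
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real pipelines use a fuller list).
STOPWORDS = {"a", "an", "and", "for", "from", "in", "is", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase, keep runs of letters (dropping punctuation and digits), remove stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def term_document_matrix(docs):
    """Return (terms, tdm) where tdm[i][j] counts terms[i] in docs[j]."""
    counts = [Counter(tokenize(d)) for d in docs]
    terms = sorted(set().union(*counts))
    return terms, [[c[t] for c in counts] for t in terms]


def tfidf(tdm):
    """Multiply each count by log(N / df); a term found in every document gets weight 0."""
    n_docs = len(tdm[0])
    out = []
    for row in tdm:
        df = sum(1 for x in row if x > 0)
        out.append([x * math.log(n_docs / df) for x in row])
    return out


def _top_eigenvector(g, found, rng, iters):
    """Power iteration on symmetric g, kept orthogonal to the vectors in `found`."""
    n = len(g)
    v = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iters):
        w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
        for u in found:  # Gram-Schmidt against concepts already extracted
            dot = sum(a * b for a, b in zip(w, u))
            w = [a - dot * b for a, b in zip(w, u)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm < 1e-12:
            return None  # no variance left outside the concepts already found
        v = [a / norm for a in w]
    return v


def concept_matrix(a, k, iters=500, seed=0):
    """LSA: top-k right singular vectors of term-document matrix `a`.

    Eigenvectors of the document Gram matrix A^T A are A's right singular
    vectors, so row j of the result holds document j's k concept scores.
    """
    n = len(a[0])
    g = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    found = []
    while len(found) < min(k, n):
        v = _top_eigenvector(g, found, rng, iters)
        if v is None:
            break
        found.append(v)
    return [[u[j] for u in found] for j in range(n)]


# Usage with made-up ads (not rows of farm-ads.csv); the real task would use k=20.
ads = [
    "Hereford bulls for sale, grass fed cattle",
    "Round hay bales for sale, delivery available",
    "Used tractor and baler for sale",
    "Cheap loans and fast cash, apply now",
    "Earn fast cash from home",
    "Cattle and hay wanted for winter feeding",
]
terms, tdm = term_document_matrix(ads)
concepts = concept_matrix(tfidf(tdm), k=2)
for ad, scores in zip(ads, concepts):
    print(" ".join(f"{s:+.3f}" for s in scores), ad)
```

Each row of `concepts` holds one ad's scores on the concepts. With the real data, those 20 columns would become the predictor variables. The singular vectors are left unscaled. Rescaling the columns by the singular values would change the fitted coefficients but not an unregularized logistic regression's predictions.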
a) Partition the data (60% training, 40% validation) and, using logistic regression, develop a model to classify the documents as relevant or non-relevant. Comment on its efficacy.
b) Why use the concept-document matrix, and not the term-document matrix, to provide the predictor variables?
Use the R Markdown file available on Blackboard (Week 7) to complete the homework.
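For part (a), the partition-fit-evaluate loop can be sketched as follows. It is self-contained and runs on synthetic stand-in data: three random features, with labels coded -1/1 like the ads file. It does not use the real concept matrix; to apply it, swap in the 20 concept columns and the ad labels.

Logistic regression is fitted by plain batch gradient descent on the log-loss. This stands in for R's `glm(..., family = binomial)`, approximating the same maximum-likelihood fit with a different algorithm. The function names, learning rate, epoch count, seeds and 0.5 cutoff are all assumptions.

```python
import math
import random


def train_valid_split(rows, labels, train_frac=0.6, seed=1):
    """Reproducibly shuffle record indices, then split them train/validation."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * len(idx))

    def pick(ids):
        return [rows[i] for i in ids], [labels[i] for i in ids]

    return pick(idx[:cut]), pick(idx[cut:])


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear(b, w, xi):
    """Logit: intercept plus weighted sum of one record's features."""
    return b + sum(wj * xj for wj, xj in zip(w, xi))


def fit_logistic(x, y, lr=0.5, epochs=1000):
    """Intercept b and weights w from batch gradient descent on mean log-loss (y in {0, 1})."""
    n, p = len(x), len(x[0])
    b, w = 0.0, [0.0] * p
    for _ in range(epochs):
        gb, gw = 0.0, [0.0] * p
        for xi, yi in zip(x, y):
            err = sigmoid(linear(b, w, xi)) - yi  # gradient of log-loss w.r.t. the logit
            gb += err
            for j in range(p):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
    return b, w


def predict(b, w, x, cutoff=0.5):
    """Class 1 (relevant) when the fitted probability reaches the cutoff."""
    return [1 if sigmoid(linear(b, w, xi)) >= cutoff else 0 for xi in x]


def confusion(actual, predicted):
    """Counts keyed by (actual, predicted) for classes 0 and 1."""
    cm = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for a, p in zip(actual, predicted):
        cm[(a, p)] += 1
    return cm


# Synthetic stand-in data (hypothetical): labels coded -1/1 like the ads file.
rng = random.Random(42)
features, raw_labels = [], []
for _ in range(200):
    xi = [rng.gauss(0, 1) for _ in range(3)]
    score = 2.0 * xi[0] - 1.5 * xi[1] + rng.gauss(0, 0.5)
    features.append(xi)
    raw_labels.append(1 if score > 0 else -1)

y = [1 if v == 1 else 0 for v in raw_labels]  # recode -1/1 as 0/1 for the model
(x_tr, y_tr), (x_va, y_va) = train_valid_split(features, y)  # 60% / 40%
b, w = fit_logistic(x_tr, y_tr)
pred = predict(b, w, x_va)
cm = confusion(y_va, pred)
accuracy = (cm[(0, 0)] + cm[(1, 1)]) / len(y_va)
baseline = max(y_va.count(0), y_va.count(1)) / len(y_va)  # always guess the majority class
print("confusion (actual, predicted):", cm)
print(f"validation accuracy {accuracy:.3f} vs majority-class baseline {baseline:.3f}")
```

Efficacy is judged on the validation records only. The confusion matrix and accuracy, compared against the naive rule of always predicting the majority class, indicate whether the predictors carry real signal.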