Question:
In this problem, you will use the data and scenario described in this chapter’s example, in which the task is to develop a model to classify documents as either auto-related or electronics-related.
a. Using the process shown in Figure 21.6, store the data as an ExampleSet. Then, load the data in a new process and create a label vector.
b. Following the example in this chapter, preprocess the documents. Explain what would be different if you did not perform the “stemming” step.
c. Use latent semantic analysis (LSA) to create 10 concepts. Explain how the concept matrix differs from the TF-IDF matrix.
d. Using this matrix, fit a predictive model (different from the model presented in the chapter illustration) to classify documents as autos or electronics. Compare its performance with that of the model presented in the chapter illustration.
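For part (b), one way to see what stemming changes is to compare the vocabulary (the columns of the term-document matrix) built with and without it. The following is a minimal Python sketch under stated assumptions, not the chapter's process: the two documents are hypothetical toy text rather than the chapter's data, and `crude_stem` is a hypothetical, deliberately rough suffix stripper standing in for a real stemmer such as Porter's.

```python
import re

# Hypothetical toy documents; NOT the chapter's autos/electronics data.
docs = [
    "The engine stalled while driving; engines in older cars stall often.",
    "Charging the battery charged the circuit; chargers vary by voltage.",
]

def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(token):
    """Rough suffix stripping (assumption: a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "ers", "er", "s"):
        # Strip the first matching suffix, keeping at least 3 letters.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def vocabulary(documents, stem=False):
    """Return the sorted set of distinct terms across all documents."""
    terms = set()
    for doc in documents:
        for tok in tokenize(doc):
            terms.add(crude_stem(tok) if stem else tok)
    return sorted(terms)

vocab_raw = vocabulary(docs)
vocab_stemmed = vocabulary(docs, stem=True)

print(len(vocab_raw), "distinct terms without stemming")
print(len(vocab_stemmed), "distinct terms with stemming")
# Terms that exist only because inflected forms were not merged:
print(sorted(set(vocab_raw) - set(vocab_stemmed)))
```

Without stemming, inflected forms such as "charging", "charged", and "chargers" each get their own column, so the term matrix is wider and sparser, and evidence for a single underlying word is split across several terms. With stemming, those forms collapse into one term whose counts are pooled.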
Machine Learning For Business Analytics
ISBN: 9781119828792
1st Edition
Authors: Galit Shmueli, Peter C. Bruce, Amit V. Deokar, Nitin R. Patel