Question
Give sentences explaining each step. This assignment asks you to examine text mining for classification. Provide your answers to the questions in a Word document named AssignLastName.doc along with your source code, which should be saved as AssignLastName, and click the title link to upload and submit them.
The BC text file includes news related to Blockchain technologies, and the DS text file contains news regarding Ebola issues. Preprocessing is critical for text files, and because this assignment asks you to mine text files, it becomes even more critical for completing the two tasks below. Please read the directions carefully and answer the questions.
Please approach this as a data scientist and come up with solutions. To provide the best answers to the questions below, you may need to iterate on the procedures several times.
Data Preprocessing:
Import the necessary packages.
Import the two text files and create the corpus. You can create one corpus containing both text files or a separate corpus for each text file. If you create two corpora, you need to combine the two document-term matrices to answer the questions below.
Clean the data by applying the necessary steps. While cleaning, come up with reasonable self-stopwords for both text files. If you have two corpora, perform the cleaning process separately for each corpus and combine the document-term matrices after cleaning the data. There is no single way to conduct this task; hence, you're strongly encouraged to provide detailed comments on your code.
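The preprocessing steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the assignment does not name a language or library, the four sample sentences stand in for the real BC and DS files, and the stopword lists (including the self-stopwords "said", "new", "reports") are hypothetical choices that would normally come from inspecting the actual text.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the two news files; real text would be read from disk.
blockchain_docs = [
    "Blockchain startups said the new ledger will cut costs.",
    "Banks test blockchain ledgers, reports said.",
]
ebola_docs = [
    "Health officials said the Ebola outbreak is spreading.",
    "New Ebola vaccine trials begin, officials said.",
]

# A tiny generic stopword list plus assumed corpus-specific self-stopwords.
GENERIC_STOPWORDS = {"the", "is", "will", "a", "an", "of", "and"}
SELF_STOPWORDS = {"said", "new", "reports"}


def clean(text, stopwords):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stopwords]


def document_term_matrix(docs, stopwords):
    """Return one Counter of term frequencies per document."""
    return [Counter(clean(d, stopwords)) for d in docs]


def combine(*dtms):
    """Stack several DTMs into one dense matrix over a shared, sorted vocabulary."""
    rows = [row for dtm in dtms for row in dtm]
    vocab = sorted({term for row in rows for term in row})
    # Counter returns 0 for terms a document does not contain.
    matrix = [[row[term] for term in vocab] for row in rows]
    return vocab, matrix


stop = GENERIC_STOPWORDS | SELF_STOPWORDS
vocab, dtm = combine(
    document_term_matrix(blockchain_docs, stop),
    document_term_matrix(ebola_docs, stop),
)
```

Cleaning each corpus separately and merging afterwards, as here, keeps the option of using different self-stopwords per corpus; the shared vocabulary is what makes the two matrices compatible once combined.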
Apply the Latent Dirichlet Allocation (LDA) technique to complete the following:
a. Find the optimal number of topics (k) for the text data. After determining k, provide a rationale for why this number of topics is considered optimal for the dataset.
b. Display the term-probability and document-probability by setting the number of topics to k. Subsequently, explain the significance of these probabilities in the context of the chosen topics.
c. Verify your outcomes from a and b above with the tidy function. Discuss how the tidy function's results support or refine your findings, and explain any insights or patterns revealed by this analysis.
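To make the two probabilities concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is not the method the assignment prescribes: the function `lda_gibbs`, its hyperparameters `alpha` and `eta`, and the toy documents are all assumptions for illustration. The names `beta` (term-probability per topic) and `gamma` (document-probability per topic) follow the convention used by the `tidy()` function in R's tidytext package, on the assumption that this is the tidy function meant. Choosing k would normally involve fitting several candidate values and comparing a fit measure such as perplexity or topic coherence, which this sketch does not do.

```python
import random


def lda_gibbs(docs, k, alpha=0.1, eta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of token lists.

    Returns (vocab, beta, gamma), where beta[t][w] estimates
    P(term w | topic t) and gamma[d][t] estimates P(topic t | doc d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    v = len(vocab)

    doc_topic = [[0] * k for _ in docs]       # tokens in doc d assigned to topic t
    topic_term = [[0] * v for _ in range(k)]  # times term w is assigned to topic t
    topic_total = [0] * k                     # all tokens assigned to topic t
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            t = rng.randrange(k)
            z_doc.append(t)
            doc_topic[d][t] += 1
            topic_term[t][index[w]] += 1
            topic_total[t] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, t = index[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][t] -= 1
                topic_term[t][wi] -= 1
                topic_total[t] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_term[j][wi] + eta)
                    / (topic_total[j] + v * eta)
                    for j in range(k)
                ]
                t = rng.choices(range(k), weights=weights)[0]
                assignments[d][i] = t
                doc_topic[d][t] += 1
                topic_term[t][wi] += 1
                topic_total[t] += 1

    # Smoothed estimates; each row sums to 1.
    beta = [
        [(topic_term[t][w] + eta) / (topic_total[t] + v * eta) for w in range(v)]
        for t in range(k)
    ]
    gamma = [
        [(doc_topic[d][t] + alpha) / (len(doc) + k * alpha) for t in range(k)]
        for d, doc in enumerate(docs)
    ]
    return vocab, beta, gamma


# Toy documents (hypothetical), already cleaned and tokenized.
docs = [
    ["blockchain", "ledger", "bank", "blockchain"],
    ["ledger", "bank", "blockchain"],
    ["ebola", "outbreak", "vaccine", "ebola"],
    ["vaccine", "outbreak", "ebola"],
]
vocab, beta, gamma = lda_gibbs(docs, k=2)
for t, row in enumerate(beta):
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print("topic", t, [(w, round(p, 3)) for p, w in top])
```

Reading the result: a high `beta` value means a term is characteristic of a topic, and a high `gamma` value means a document is mostly about that topic, so the top terms per topic and the dominant topic per document are the natural things to inspect when explaining the chosen k.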
Note that the data files and preprocessing will be used in the test as well. It is strongly recommended that you prepare the source code of your assignment AssignLastName for the test.