Question
While the CSV file is not provided, please just code as you would, using comments to indicate parts you cannot do without the file. Thank you.
A1: Preprocessing NSSI Posts and Comments, Stage 1
Objective: Write a Python script to remove HTML tags and extract metadata from NSSI posts and comments.
Output Requirement: Two CSV files, posts-extended.csv and comments-extended.csv, comprising the data from the original files (posts.csv and comments.csv) augmented with six new columns each.
Instructions:
1. Starting Point: The file posts.csv contains all public posts harvested from NSSI communities (collective blogs) on LiveJournal. Read the column headers and the data rows from the file. Report the number of data rows.
2. Dataset Structure Identification: Locate the 'body' and 'blog' columns.
3. Data Extraction:
   - Calculate the total number of unique blogs in the dataset by adding their identifiers to a set. Once all blog identifiers are in the set, report the total count of unique blogs.
   - For each post body, perform the following extractions:
     - Extract the plain text, removing all HTML markup.
     - Calculate the length of the extracted plain text.
     - Determine the presence of emphasis-related HTML tags within the post. For this purpose, group the tags as follows: b and …
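The steps above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the approved answer. It uses the standard library's `html.parser.HTMLParser` in place of a third-party parser such as BeautifulSoup. The new column names (`text`, `text_length`, `has_bold`), the helper names, the in-memory `SAMPLE` data standing in for posts.csv, and the `EMPHASIS_GROUPS` mapping are all hypothetical. The question's tag grouping is cut off after "b and", so only `b` is used. The sketch therefore produces fewer than the six required columns, and the remaining ones still need to be added. Comments mark the places where the real file would be opened.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical grouping: the question's list of emphasis tags is cut off after
# "b and", so only "b" is known. Add the remaining tags/groups here.
EMPHASIS_GROUPS = {"bold": {"b"}}

# Hypothetical stand-in for posts.csv, which was not provided.
SAMPLE = (
    "blog,body\n"
    'b1,"<p>Hello <b>world</b></p>"\n'
    "b2,plain text\n"
    'b1,"<i>x</i> &amp; y"\n'
)


class TextAndTags(HTMLParser):
    """Collects the plain text and the set of start-tag names of one HTML body."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &amp; etc. into characters
        self.chunks = []
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)  # HTMLParser lower-cases tag names

    def handle_data(self, data):
        self.chunks.append(data)


def parse_body(body):
    """Return (plain_text, tag_names) for one post body."""
    parser = TextAndTags()
    parser.feed(body or "")
    parser.close()  # flush any buffered trailing text
    return "".join(parser.chunks), parser.tags


def process_posts(f, emphasis_groups):
    """Steps 1-3: read the CSV, check the columns, collect blogs, add metadata."""
    reader = csv.DictReader(f)
    header = list(reader.fieldnames or [])
    # Step 2: locate the 'body' and 'blog' columns.
    for col in ("body", "blog"):
        if col not in header:
            raise ValueError(f"missing column: {col}")
    rows, blogs = [], set()
    for row in reader:
        blogs.add(row["blog"])  # Step 3: unique blog identifiers go into a set
        text, tags = parse_body(row["body"])
        row["text"] = text
        row["text_length"] = len(text)
        for name, group in emphasis_groups.items():
            row[f"has_{name}"] = int(bool(tags & group))  # 1 if any tag of the group occurs
        rows.append(row)
    new_cols = ["text", "text_length"] + [f"has_{n}" for n in emphasis_groups]
    return header, new_cols, rows, blogs


def write_extended(out, header, new_cols, rows):
    """Write the original columns followed by the new ones."""
    writer = csv.DictWriter(out, fieldnames=header + new_cols)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # With the real file: open("posts.csv", newline="", encoding="utf-8")
    header, new_cols, rows, blogs = process_posts(io.StringIO(SAMPLE), EMPHASIS_GROUPS)
    print("data rows:", len(rows))
    print("unique blogs:", len(blogs))
    # With the real file: open("posts-extended.csv", "w", newline="", encoding="utf-8")
    out = io.StringIO()
    write_extended(out, header, new_cols, rows)
    print(out.getvalue())
```

Passing file-like objects rather than paths keeps the functions testable without the missing CSV. The same functions could presumably be reused for comments.csv and comments-extended.csv, but only if that file also has `body` and `blog` columns. The truncated question does not say whether it does.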