Question

The CSV file is not provided, so please write the code as you normally would and use comments to mark the parts that cannot be done without the file. Thank you.

A1: Preprocessing NSSI Posts and Comments, Stage 1

Objective: Write a Python script to remove HTML tags and extract metadata from NSSI posts and comments.

Output Requirement: Two CSV files, posts-extended.csv and comments-extended.csv, comprising the data from the original files (posts.csv and comments.csv) augmented with six new columns each.

Instructions:

1. Starting Point: The file posts.csv contains all public posts harvested from NSSI communities (collective blogs) on LiveJournal. Read the column headers and the data rows from the file. Report the number of data rows.
2. Dataset Structure Identification: Locate the 'body' and 'blog' columns.
3. Data Extraction:
   - Calculate the total number of unique blogs in the dataset by adding their identifiers to a set. Once all blog identifiers are in the set, report the total count of unique blogs.
   - For each post body, perform the following extractions:
     - Extract the plain text, removing all HTML markup.
     - Calculate the length of the extracted plain text.
     - Determine the presence of emphasis-related HTML tags within the post. For this purpose, group the tags as follows: b and >
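The instructions above could be approached roughly as follows. This is only a sketch using the Python standard library (`csv` and `html.parser`), not a verified solution, since the data files are not available. The new column names (`text`, `text_length`, `has_bold`, `has_italic`) are hypothetical, and because the tag grouping is cut off in the assignment text, the groups in `EMPHASIS_GROUPS` are placeholder assumptions that would need to be replaced with the groups from the full brief.

```python
import csv
import io
from html.parser import HTMLParser

# ASSUMPTION: the assignment's tag grouping is truncated ("b and ..."), so
# these groups and flag names are placeholders, not the required ones.
EMPHASIS_GROUPS = {
    "has_bold": {"b", "strong"},
    "has_italic": {"i", "em"},
}


class BodyParser(HTMLParser):
    """Collects the plain text and the set of tag names seen in one body."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)  # HTMLParser passes tag names in lowercase

    def handle_data(self, data):
        self.chunks.append(data)


def extract(body):
    """Return (plain_text, length, {flag_name: bool}) for one HTML body."""
    parser = BodyParser()
    parser.feed(body or "")
    parser.close()  # flushes any text still buffered by the parser
    text = "".join(parser.chunks)
    flags = {name: bool(parser.tags & group)
             for name, group in EMPHASIS_GROUPS.items()}
    return text, len(text), flags


def process(infile, outfile):
    """Copy rows from infile to outfile, adding the extracted columns.

    Expects 'body' and 'blog' columns in the input.
    Returns (number_of_data_rows, number_of_unique_blogs).
    """
    reader = csv.DictReader(infile)
    new_cols = ["text", "text_length", *EMPHASIS_GROUPS]  # hypothetical names
    writer = csv.DictWriter(outfile, fieldnames=[*reader.fieldnames, *new_cols])
    writer.writeheader()
    blogs = set()
    n_rows = 0
    for row in reader:
        n_rows += 1
        blogs.add(row["blog"])
        text, length, flags = extract(row["body"])
        row.update({"text": text, "text_length": length, **flags})
        writer.writerow(row)
    return n_rows, len(blogs)


# Cannot be run without the real file (posts.csv is not provided):
# with open("posts.csv", newline="", encoding="utf-8") as src, \
#      open("posts-extended.csv", "w", newline="", encoding="utf-8") as dst:
#     rows, blogs = process(src, dst)
#     print(f"{rows} data rows, {blogs} unique blogs")
```

The same `process` function could presumably be pointed at comments.csv to produce comments-extended.csv, assuming that file also has 'body' and 'blog' columns. Only four of the six required columns are sketched here, because the rest of the column list is not visible in the question.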
