Question


I wrote this page scraper in Python, and now when I open the CSV file it saves to, the format is weird. How do I clean this up and fix it? The text has parentheses and brackets around it, and a "u" in front of everything. I would like everything to be in two columns. Thanks.

import requests
import csv
from datetime import datetime
from bs4 import BeautifulSoup

# download the page
myurl = requests.get("https://en.wikipedia.org/wiki/Goodyear_Tire_and_Rubber_Company")
# create BeautifulSoup object
soup = BeautifulSoup(myurl.text, 'html.parser')

# pull the class containing all tire name
name = soup.find(class_ = 'logo')
# pull the div in the class
nameinfo = name.find('div')

# just grab text inbetween the div
nametext = nameinfo.text

# print information about goodyear logo on wiki page
#print(nameinfo)

# now, print type of company, private or public
#status = soup.find(class_ = 'category')
#for link in soup.select('td.category a'):
#    print link.text

# now get the ceo information
#for employee in soup.select('td.agent a'):
#    print employee.text

# print area served
#area = soup.find(class_ = 'infobox vcard')
#print(area)

# grab information in bold on the left hand side
vcard = soup.find(class_ = 'infobox vcard')
rows = vcard.find_all('tr')
first = []
for row in rows:
    cols = row.find_all('th')
    first.append([x.text.strip() for x in cols])
    #print cols

vcard = soup.find(class_ = 'infobox vcard')
rows = vcard.find_all('tr')
second = []
for row in rows:
    cols2 = row.find_all('td')
    second.append([x.text.strip() for x in cols2])
    #print second

with open('index.csv', 'w') as csv_file:

    for f,s in zip(first,second):
        csv_file.write(str(f)+","+str(s))
        csv_file.write(" ")
        print(f,s)
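
The brackets and the "u" most likely come from calling str() on a list: that writes the list's Python representation (for example [u'Type'] under Python 2) rather than the text inside it. A minimal sketch of one possible fix follows, assuming Python 3 and the standard csv module. The helper name write_two_columns and the sample values are hypothetical, chosen only to mirror the shape of the first and second lists in the scraper; it is not the original author's solution.

```python
import csv


def write_two_columns(first, second, path):
    """Write one CSV row per (header cells, data cells) pair, as two columns."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        for f, s in zip(first, second):
            # Join each list of cell texts into one plain string, so the
            # list representation (brackets, quotes) never reaches the file.
            writer.writerow([" ".join(f), " ".join(s)])


# Hypothetical sample data shaped like the `first` and `second` lists
# built by the scraper (one inner list of cell texts per table row).
first = [["Type"], ["Founded"], []]
second = [["Public"], ["August 29, 1898"], ["Akron, Ohio, U.S."]]
write_two_columns(first, second, "index.csv")
```

Using csv.writer rather than manual string concatenation also means that values containing commas are quoted automatically, so they stay in a single column when the file is opened in a spreadsheet.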
