Answered step by step
Verified Expert Solution
Question
1 Approved Answer
Using Python to write this program, TIOBE provides a monthly index of the most popular programming languages. Write a program that shall collect and save
Using Python to write this program,
TIOBE provides a monthly index of the most popular programming languages. Write a program that shall collect and save information about the most common keywords (tags) associated with the top 20 most popular languages using questions on stackoverflow.com. Start by collecting the list of the 20 most popular languages on TIOBE in January 2018 and identifying the major StackOverflow seed tags that correspond to them (you can do this by hand). Choose only one most general tag for each language (for example, choose python for Python, not python-2.7 or python-3.x). Your program shall: 1. Download the main page for each seed tag (e.g., https://stackoverflow.com/questions/tagged/erlang for Erlang) using module urilib. 2. Extract the total number of tagged questions (e.g., 7,842 for Erlang). 3. Extract all questions on the page and their tags using module Beautifulsoup. 4. Repeat items 1 and 3 for the first 1,000 questions by extracting and following the links at the bottom of each page. Assume that each seed tag has at least 1,000 questions. Leave the downloaded pages in the cache folder as %language%-%n%.html, where %language% is the name of the seed tag and %n% is the page number as a 3-digit, zero-padded-on-the-left integer number (the number of the first page is 001). If the folder does not exist (at the first run), it shall be created. 5. Find 10 most frequently used sister tags, aside from the seed tag itself. Assume that each seed tag has at least ten associated sister tags. 6. Write the data into a CSV file named tags.csv using module csv. Each row in the file shall represent one language. The first column shall have the seed tags. The second column shall have the count of tagged questions as an integer number without decimal commas. The next 10 columns shall contain the most frequently used sister tags, one tag per columns, as strings, arranged from the most frequently used tag to the least frequently used tag. The first row of the file shall have column titles: "Language", "Count", "Tag1", "Tag2", ..., "Tag10" (less the quotes). The rows must be ordered by the "Count" column in the decreasing numerical order. TIOBE provides a monthly index of the most popular programming languages. Write a program that shall collect and save information about the most common keywords (tags) associated with the top 20 most popular languages using questions on stackoverflow.com. Start by collecting the list of the 20 most popular languages on TIOBE in January 2018 and identifying the major StackOverflow seed tags that correspond to them (you can do this by hand). Choose only one most general tag for each language (for example, choose python for Python, not python-2.7 or python-3.x). Your program shall: 1. Download the main page for each seed tag (e.g., https://stackoverflow.com/questions/tagged/erlang for Erlang) using module urilib. 2. Extract the total number of tagged questions (e.g., 7,842 for Erlang). 3. Extract all questions on the page and their tags using module Beautifulsoup. 4. Repeat items 1 and 3 for the first 1,000 questions by extracting and following the links at the bottom of each page. Assume that each seed tag has at least 1,000 questions. Leave the downloaded pages in the cache folder as %language%-%n%.html, where %language% is the name of the seed tag and %n% is the page number as a 3-digit, zero-padded-on-the-left integer number (the number of the first page is 001). If the folder does not exist (at the first run), it shall be created. 5. Find 10 most frequently used sister tags, aside from the seed tag itself. Assume that each seed tag has at least ten associated sister tags. 6. Write the data into a CSV file named tags.csv using module csv. Each row in the file shall represent one language. The first column shall have the seed tags. The second column shall have the count of tagged questions as an integer number without decimal commas. The next 10 columns shall contain the most frequently used sister tags, one tag per columns, as strings, arranged from the most frequently used tag to the least frequently used tag. The first row of the file shall have column titles: "Language", "Count", "Tag1", "Tag2", ..., "Tag10" (less the quotes). The rows must be ordered by the "Count" column in the decreasing numerical orderStep by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started