[Solved] Using Python to write this program, TIOBE

Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Sep 09, 2024

Using Python to write this program, TIOBE provides a monthly index of the most popular programming languages. Write a program that shall collect and save

Using Python to write this program,

image text in transcribed

TIOBE provides a monthly index of the most popular programming languages. Write a program that shall collect and save information about the most common keywords (tags) associated with the top 20 most popular languages using questions on stackoverflow.com. Start by collecting the list of the 20 most popular languages on TIOBE in January 2018 and identifying the major StackOverflow seed tags that correspond to them (you can do this by hand). Choose only one most general tag for each language (for example, choose python for Python, not python-2.7 or python-3.x). Your program shall: 1. Download the main page for each seed tag (e.g., https://stackoverflow.com/questions/tagged/erlang for Erlang) using module urilib. 2. Extract the total number of tagged questions (e.g., 7,842 for Erlang). 3. Extract all questions on the page and their tags using module Beautifulsoup. 4. Repeat items 1 and 3 for the first 1,000 questions by extracting and following the links at the bottom of each page. Assume that each seed tag has at least 1,000 questions. Leave the downloaded pages in the cache folder as %language%-%n%.html, where %language% is the name of the seed tag and %n% is the page number as a 3-digit, zero-padded-on-the-left integer number (the number of the first page is 001). If the folder does not exist (at the first run), it shall be created. 5. Find 10 most frequently used sister tags, aside from the seed tag itself. Assume that each seed tag has at least ten associated sister tags. 6. Write the data into a CSV file named tags.csv using module csv. Each row in the file shall represent one language. The first column shall have the seed tags. The second column shall have the count of tagged questions as an integer number without decimal commas. The next 10 columns shall contain the most frequently used sister tags, one tag per columns, as strings, arranged from the most frequently used tag to the least frequently used tag. The first row of the file shall have column titles: "Language", "Count", "Tag1", "Tag2", ..., "Tag10" (less the quotes). The rows must be ordered by the "Count" column in the decreasing numerical order. TIOBE provides a monthly index of the most popular programming languages. Write a program that shall collect and save information about the most common keywords (tags) associated with the top 20 most popular languages using questions on stackoverflow.com. Start by collecting the list of the 20 most popular languages on TIOBE in January 2018 and identifying the major StackOverflow seed tags that correspond to them (you can do this by hand). Choose only one most general tag for each language (for example, choose python for Python, not python-2.7 or python-3.x). Your program shall: 1. Download the main page for each seed tag (e.g., https://stackoverflow.com/questions/tagged/erlang for Erlang) using module urilib. 2. Extract the total number of tagged questions (e.g., 7,842 for Erlang). 3. Extract all questions on the page and their tags using module Beautifulsoup. 4. Repeat items 1 and 3 for the first 1,000 questions by extracting and following the links at the bottom of each page. Assume that each seed tag has at least 1,000 questions. Leave the downloaded pages in the cache folder as %language%-%n%.html, where %language% is the name of the seed tag and %n% is the page number as a 3-digit, zero-padded-on-the-left integer number (the number of the first page is 001). If the folder does not exist (at the first run), it shall be created. 5. Find 10 most frequently used sister tags, aside from the seed tag itself. Assume that each seed tag has at least ten associated sister tags. 6. Write the data into a CSV file named tags.csv using module csv. Each row in the file shall represent one language. The first column shall have the seed tags. The second column shall have the count of tagged questions as an integer number without decimal commas. The next 10 columns shall contain the most frequently used sister tags, one tag per columns, as strings, arranged from the most frequently used tag to the least frequently used tag. The first row of the file shall have column titles: "Language", "Count", "Tag1", "Tag2", ..., "Tag10" (less the quotes). The rows must be ordered by the "Count" column in the decreasing numerical order