Question

1 Approved Answer

Posted on Sep 26, 2024



Here is the draft of my group project, titled "The impacts of a well-balanced diet on immunity in combating the COVID-19 virus in various countries". How many countries adhere to the health authorities' recommendation to consume at least 40% vegetables for a balanced diet?
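The question can be checked directly against the food-supply table. The following is a minimal sketch, not the project's own method: it assumes the Vegetables column is a percentage share (which the draft's "100 - sum" calculation for an 'Others' column suggests), and the five sample rows are copied from the head() output in the draft. The names food, THRESHOLD, meets and n_meeting are illustrative only.

```python
import pandas as pd

# Five sample rows taken from the draft's head() output; the real table has 170 rows.
food = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria", "Angola", "Antigua and Barbuda"],
    "Vegetables": [6.7642, 11.7753, 11.6484, 2.3041, 5.4495],
})

THRESHOLD = 40.0  # recommended minimum vegetable share, in percent (from the question)

# Keep only the countries at or above the threshold and count them.
meets = food[food["Vegetables"] >= THRESHOLD]
n_meeting = len(meets)

print(f"{n_meeting} of {len(food)} countries reach {THRESHOLD}% vegetables")
print("Highest vegetable share in this sample:", food["Vegetables"].max())
```

Run on the full table instead of the sample, the same two lines give the count the question asks for. Printing the maximum share alongside the count shows how far the best-performing country is from the threshold.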

Getting the shape of the data set to know its size and columns

    print('The shape of our dataset is:', foodIntakeData.shape)

From the results, there are 170 rows and 32 columns in the data set.

    The shape of our dataset is: (170, 32)

Get a summary of the dataset to spot anomalies so we can clean it before sending it to our models

    foodIntakeData.describe()

Summary statistics (count is 170 for every column). The header row is garbled in the transcription, so column names are matched to the category list below, and the table is shown transposed. The Spices column and everything to its right are cut off.

    Column                     mean        std         min        25%        50%         75%         max
    Alcoholic Beverages        3.022971    2.382243    0.000000   0.895625   2.866150    4.710950    15.370600
    Animal fats                0.221064    0.278304    0.001000   0.040225   0.116850    0.253900    1.355900
    Animal Products            12.181871   5.852635    1.739100   7.236850   12.097550   16.444125   26.886500
    Aquatic Products, Other    0.013994    0.129382    0.000000   0.000000   0.000000    0.001400    1.679400
    Cereals - Excluding Beer   11.800347   5.824870    3.401400   7.226850   10.142750   15.148950   29.80500
    Eggs                       0.470570    0.331209    0.023900   0.187575   0.460150    0.644150    1.696000
    Fish, Seafood              1.387195    1.257382    0.034200   0.557100   1.029250    1.821275    8.795900
    Fruits - Excluding Wine    5.621405    3.152849    0.659600   3.541950   5.021250    6.827750    19.302800
    Meat                       3.375934    1.762911    0.356000   1.891475   3.424750    4.422450    8.170000
    Milk - Excluding Butter    6.519776    5.020379    0.096300   2.172250   5.336900    10.407100   20.837800
    Miscellaneous              0.443122    0.685727    0.000000   0.032325   0.196850    0.583625    3.663400
    Offals                     0.193435    0.159634    0.000000   0.105050   0.166800    0.228575    1.225600
    Oilcrops                   0.818120    1.772273    0.009800   0.134075   0.326650    0.691675    12.176300
    Pulses                     0.537131    0.601111    0.000000   0.129650   0.300800    0.734900    3.483800

    import pandas as pd
    foodIntakeDataDesc = pd.read_csv('Supply_Food_Data_Descriptions.csv')
    foodIntakeDataDesc

        Categories                 Items
    0   Alcoholic Beverages        Alcohol, Non-Food; Beer; Beverages, Alcoholic...
    1   Animal fats                Butter, Ghee; Cream; Fats, Animals, Raw; Fish....
    2   Animal Products            Aquatic Animals, Others; Aquatic Plants; Bovin...
    3   Aquatic Products, Other    Aquatic Animals, Others; Aquatic Plants; Meat...
    4   Cereals - Excluding Beer   Barley and products; Cereals, Other; Maize and...
    5   Eggs                       Eggs
    6   Fish, Seafood              Cephalopods; Crustaceans; Demersal Fish; Fresh...
    7   Fruits - Excluding Wine    Apples and products; Bananas; Citrus, Other; D...
    8   Meat                       Bovine Meat; Meat, Other; Mutton & Goat Meat; ...
    9   Milk - Excluding Butter    Milk - Excluding Butter
    10  Miscellaneous              Infant food; Miscellaneous
    11  Offals                     Offals, Edible
    12  Oilcrops                   Coconuts - Incl Copra; Cottonseed; Groundnuts ...
    13  Pulses                     Beans; Peas; Pulses, Other and products
    14  Spices                     Cloves; Pepper; Pimento; Spices, Other
    15  Starchy Roots              Cassava and products; Potatoes and products; R...
    16  Stimulants                 Cocoa Beans and products; Coffee and products....
    17  Sugar & Sweeteners         Honey; Sugar (Raw Equivalent); Sugar non-centr...
    18  Sugar Crops                Sugar beet; Sugar cane
    19  Treenuts                   Nuts and products
    20  Vegetable Oils             Coconut oil; Cottonseed Oil; Groundnut Oil; Ma...
    21  Vegetables                 Onions; Tomatoes and products; Vegetables, Other
    22  Vegetal Products           Alcohol, Non-Food; Apples and products; Banana...

From the list above we are interested in the following items:
Proteins (Meat; Fish, Seafood; Eggs)
Vegetables (Vegetables)
Fruits (Fruits - Excluding Wine)
Grains (Cereals - Excluding Beer)

    foodIntakeNewDf = foodIntakeData[['Country', 'Meat', 'Fish, Seafood', 'Eggs', 'Vegetables',
                                      'Fruits - Excluding Wine', 'Cereals - Excluding Beer',
                                      'Obesity', 'Population', 'Confirmed', 'Deaths', 'Recovered']]
    foodIntakeNewDf.head(5)

       Country              Meat    Fish, Seafood  Eggs    Vegetables  Fruits - Excluding Wine  Cereals - Excluding Beer  Obesity  Population  Confirmed  Deaths    Recovered
    0  Afghanistan          1.2020  0.0350         0.2090  6.7642      5.3495                   24.8097                   4.5      39928000.0  0.142134   0.006186  0.123374
    1  Albania              1.8945  0.2126         0.5815  11.7753     6.7861                   5.7817                    22.3     2839000.0   2.967301   0.050951  1.792636
    2  Algeria              1.1305  0.2416         0.5277  11.6484     6.3801                   13.6816                   26.6     44357000.0  0.244597   0.006558  0.167572
    3  Angola               2.0571  1.7707         0.0587  2.3041      6.0005                   9.1065                    6.8      32522000.0  0.061687   0.001461  0.056808
    4  Antigua and Barbuda  5.6888  4.1489         0.2274  5.4495      10.7451                  5.9960                    19.1     98000.0     0.293878   0.007143  0.190816

    # Sum Meat, Fish/Seafood and Eggs into a single Proteins column
    foodIntakeNewDf['Proteins'] = foodIntakeNewDf.iloc[:, -11:-8].sum(axis=1)
    foodIntakeNewDf.drop(['Meat', 'Fish, Seafood', 'Eggs'], axis=1, inplace=True)
    foodIntakeNewDf.rename(columns={'Fruits - Excluding Wine': 'Fruits',
                                    'Cereals - Excluding Beer': 'Grains'}, inplace=True)
    # Move Proteins next to Country
    cols = list(foodIntakeNewDf)
    cols.insert(1, cols.pop(cols.index('Proteins')))
    foodIntakeNewDf = foodIntakeNewDf.loc[:, cols]
    # Everything that is not Proteins, Vegetables, Fruits or Grains
    foodIntakeNewDf['Others'] = 100.0000 - foodIntakeNewDf.iloc[:, -9:-5].sum(axis=1)
    cols = list(foodIntakeNewDf)
    cols.insert(5, cols.pop(cols.index('Others')))
    foodIntakeNewDf = foodIntakeNewDf.loc[:, cols]
    # Keep only Country, Vegetables and Obesity
    foodIntakeNewDf.drop(['Proteins', 'Fruits', 'Grains', 'Others', 'Population',
                          'Confirmed', 'Deaths', 'Recovered'], axis=1, inplace=True)
    foodIntakeNewDf.head(5)

       Country              Vegetables  Obesity
    0  Afghanistan          6.7642      4.5
    1  Albania              11.7753     22.3
    2  Algeria              11.6484     26.6
    3  Angola               2.3041      6.8
    4  Antigua and Barbuda  5.4495      19.1

    # Check the null values
    foodIntakeNewDf.isnull().sum()

    Country       0
    Vegetables    0
    Obesity       (count not legible)
    dtype: int64

    # Replace all NaN values with the column mean
    # numeric_only=True skips the text column 'Country'
    foodIntakeNewDf.fillna(foodIntakeNewDf.mean(numeric_only=True), inplace=True)
    foodIntakeNewDf.isnull().sum()

    Country       0
    Vegetables    0
    Obesity       0
    dtype: int64

Above is the pre-processed dataset that we will use to evaluate which countries have a higher vegetable intake and how that relates to Covid-19 cases or deaths.

2. Scraping Latest Worldometer Covid data

As we couldn't find any reliable, up-to-date Covid data, we are scraping it from https://www.worldometers.info/coronavirus/

    from bs4 import BeautifulSoup
    import requests as req
    import pandas as pd
    import os
    from datetime import date

    # getting worldometer data
    contents = req.get("https://www.worldometers.info/coronavirus/")
    res = []
    soup = BeautifulSoup(contents.text, 'lxml')
    table = soup.find_all('table')[1]
    table_rows = table.find_all('tr')
    data_row = pd.DataFrame()
    for tr in table_rows:
        td = tr.find_all('td')
        row = [i.text for i in td]
        if len(row) > 0:
            res.append(row)

    # Formatting Data
    column_dict = {1: 'Country', 2: 'Total cases', 3: 'New Cases', 4: 'Total Deaths', 5: 'New Deaths',
                   6: 'Total Recovered', 7: 'New Recovered', 8: 'Active cases', 9: 'Serious, Critical',
                   10: 'Tot cases 1M pop', 11: 'Death 1M pop', 12: 'Total Tests', 13: 'Tests 1M pop',
                   14: 'Population'}
    # pd.set_option("display.max_rows", None, "display.max_columns", None)
    tmp_df = pd.DataFrame(res)
    tmp_df = tmp_df[tmp_df[0] != ""].reset_index()
    tmp_df.drop(['index', 0, 15, 16, 17, 18, 19, 20, 21], axis=1, inplace=True)
    tmp_df.rename(columns=column_dict, inplace=True)
    tmp_df.to_csv('covid_19_country_data.csv', index=False)

    import pandas as pd
    foodIntakeData = pd.read_csv('Food_Supply_Quantity_kg_Data.csv')
    covidCountryDataSet = pd.read_csv('/content/covid_19_country_data.csv')
    covidCountryDataSet.head(5)

First five rows (shown transposed, one column per country):

    Field              0 China        1 USA        2 India        3 Brazil     4 France
    Total cases        105,484        71,194,579   38,903,731     23,757,741   16,001,498
    New Cases          +73            +779,036     +337,704       +168,820     +400,851
    Total Deaths       4,636          887,643      488,911        622,647      128,347
    New Deaths         NaN            +2777        +489           +396         +233
    Total Recovered    97,675         44,191,512   36,301,482     21,851,922   10,063,812
    New Recovered      +197           +143,713     +242,676       NaN          +218,881
    Active cases       3,173          26,315,424   2,113,336      1,283,172    5,809,339
    Serious, Critical  12             26,002       8,944          8,318        3,881
    Tot cases 1M pop   73             213,744      27,767         110,549      244,305
    Death 1M pop       3              2,657        349            2,897        1,960
    Total Tests        160,000,000    901,261,460  711,538,938    63,776,166   216,918,555
    Tests 1M pop       111,163        2,698,230    507,848        296,761      3,311,833
    Population         1,439,323,776  334,019,477  1,401,067,479  214,907,717  65,498,037

Getting the shape of the data set to know its size and columns

    print('The shape of our dataset is:', covidCountryDataSet.shape)

From the results, there are 224 rows and 14 columns in the data set.

    The shape of our dataset is: (224, 14)

Get a summary of the dataset to spot anomalies so we can clean it before sending it to our models

    covidCountryDataSet.describe()

Only the count row is recoverable; the unique, top and freq rows are illegible in the transcription.

    Column             count
    Country            224
    Total cases        224
    New Cases          168
    Total Deaths       224
    New Deaths         125
    Total Recovered    216
    New Recovered      143
    Active cases       216
    Serious, Critical  159
    Tot cases 1M pop   222
    Death 1M pop       211
    Total Tests        209
    Tests 1M pop       209
    Population         224

    # Counting null values in every column
    covidCountryDataSet.isnull().sum(axis=0)

    Country               0
    Total cases           0
    New Cases            56
    Total Deaths          0
    New Deaths           99
    Total Recovered       8
    New Recovered        81
    Active cases          8
    Serious, Critical    65
    Tot cases 1M pop      2
    Death 1M pop         13
    Total Tests          15
    Tests 1M pop         15
    Population            0
    dtype: int64

Dropping the unnecessary columns

    covidCountryDataSet.drop(['New Cases', 'New Deaths', 'New Recovered', 'Serious, Critical',
                              'Tot cases 1M pop', 'Death 1M pop', 'Tests 1M pop', 'Total Tests'],
                             axis=1, inplace=True)

Renaming the columns so they are easier to understand

    covidCountryDataSet.rename(columns={'Total cases': 'total_covid_cases',
                                        'Total Deaths': 'total_covid_deaths',
                                        'Total Recovered': 'total_covid_recovered',
                                        'Active cases': 'active_covid_cases'}, inplace=True)
    covidCountryDataSet = covidCountryDataSet.sort_values('Country', ascending=True)
    covidCountryDataSet.head()

         Country      total_covid_cases  total_covid_deaths  total_covid_recovered  active_covid_cases  Population
    105  Afghanistan  159,516            7,390               146,135                5,991               40,289,298
    96   Albania      244,182            3,292               216,785                24,105              2,872,912
    99   Algeria      232,325            6,468               157,628                68,229              45,080,724
    150  Andorra      33,025             144                 27,872                 5,009               77,457
    119  Angola       95,676             1,884               86,928                 6,864               34,452,146

    # Counting null values in every column
    covidCountryDataSet.isnull().sum(axis=0)

The output lists Country, total_covid_cases, total_covid_deaths, total_covid_recovered, active_covid_cases and Population (dtype: int64); the counts are not legible in the transcription.

    # as there are very few null values, we drop them
    covidCountryDataSet.dropna(inplace=True)
    covidCountryDataSet.head()

The head() output shows the same five rows as above (Afghanistan, Albania, Algeria, Andorra, Angola).

    # final shape of covid dataset
    covidCountryDataSet.shape

    (216, 6)

Merging the covid dataset and the food intake dataset on country name

    merged_df = pd.merge(covidCountryDataSet, foodIntakeNewDf, how='inner', on='Country')

Dropping the rows which have null values

    merged_df.dropna(inplace=True)
    merged_df.head()

       Country              total_covid_cases  total_covid_deaths  total_covid_recovered  active_covid_cases  Population  Vegetables  Obesity
    0  Afghanistan          159,516            7,390               146,135                5,991               40,289,298  6.7642      4.5
    1  Albania              244,182            3,292               216,785                24,105              2,872,912   11.7753     22.3
    2  Algeria              232,325            6,468               157,628                68,229              45,080,724  11.6484     26.6
    3  Angola               95,676             1,884               86,928                 6,864               34,452,146  2.3041      6.8
    4  Antigua and Barbuda  5,815              122                 4,501                  1,192               99,189      5.4495      19.1

    # Getting the shape of data set to know the size and columns of the data set
    print('The shape of our merged dataset is:', merged_df.shape)

    The shape of our merged dataset is: (149, 8)

Visualization

Converting a few columns to float type for plotting

    # Worked
    merged_df["active_covid_cases"] = merged_df["active_covid_cases"].str.replace(',', '').astype(float)
    merged_df["total_covid_cases"] = merged_df["total_covid_cases"].str.replace(',', '').astype(float)
    merged_df["total_covid_recovered"] = merged_df["total_covid_recovered"].str.replace(',', '').astype(float)

    dataTypeSeries = merged_df.dtypes
    print('Data type of each column of Dataframe :')
    print(dataTypeSeries)

    Data type of each column of Dataframe :
    Country                   object
    total_covid_cases        float64
    total_covid_deaths        object
    total_covid_recovered    float64
    active_covid_cases       float64
    Population                object
    Vegetables               float64
    Obesity                  float64
    dtype: object

    # Not working
    merged_df['total_covid_deaths'].replace([' ', ''], '0', inplace=True)
    merged_df['total_covid_deaths'] = merged_df['total_covid_deaths'].str.replace(',', '').astype(float)

    import pandas as pd
    pd.plotting.register_matplotlib_converters()
    import matplotlib.pyplot as plt
    %matplotlib inline
    import seaborn as sns
    import plotly.express as px
    import os
    print("Setup Complete")

    Setup Complete

    import pycountry
    exceptions = []
    def get_alpha_3_code(cou):
        # Look up the ISO alpha-3 code; remember names that cannot be matched
        try:
            return pycountry.countries.search_fuzzy(cou)[0].alpha_3
        except LookupError:
            exceptions.append(cou)
    merged_df['iso_alpha'] = merged_df['Country'].apply(lambda x: get_alpha_3_code(x))
    # removing exceptions
    for exc in exceptions:
        # the transcribed cell appears to filter on an undefined 'worldometer' frame; merged_df is used here
        merged_df = merged_df[merged_df['Country'] != exc]
    fig = px.scatter_geo(merged_df, locations="iso_alpha",
                         color="Country",            # which column to use to set the color of markers
                         hover_name="Country",       # column added to hover information
                         size="active_covid_cases",  # size of markers
                         projection="orthographic")
    fig

    fig, ax = plt.subplots(figsize=(10, 5))
    merged_df.sort_values(by='Obesity', ascending=False, inplace=True)
    sns.barplot(x="Obesity", y="Country", data=merged_df[:20]);

[Figure: bar chart of Obesity by Country, top 20: Samoa, Kuwait, Saudi Arabia, Jordan, Turkey, Bahamas, New Zealand, Canada, Lebanon, Egypt, Malta, Australia, Fiji, Uruguay, Chile, Hungary, Czechia, Argentina, Lithuania, Mexico]

    fig, ax = plt.subplots(figsize=(10, 5))
    merged_df.sort_values(by='Vegetables', ascending=False, inplace=True)
    sns.barplot(x="Vegetables", y="Country", data=merged_df[:20]);

[Figure: bar chart of Vegetables by Country, x-axis from 0 to 20, top 20: Tajikistan, Armenia, Tunisia, Guyana, Uzbekistan, North Macedonia, Turkey, China, Kuwait, Croatia, Vietnam, Albania, Algeria, Bosnia and Herzegovina, Kyrgyzstan, Malta, Oman, Egypt, Jordan, Niger]

    fig, ax = plt.subplots(figsize=(10, 5))
    merged_df.sort_values(by='total_covid_cases', ascending=False, inplace=True)
    sns.barplot(x="total_covid_cases", y="Country", data=merged_df[:20]);

[Figure: bar chart of total_covid_cases (x-axis scaled by 1e7) by Country, top 20: India, Brazil, France, Turkey, Italy, Spain, Germany, Argentina, Colombia, Mexico, Poland, Indonesia, Ukraine, Netherlands, South Africa, Philippines, Canada, Malaysia, Czechia, Belgium]

    fig, ax = plt.subplots(figsize=(10, 5))
    merged_df.sort_values(by='total_covid_deaths', ascending=False, inplace=True)
    sns.barplot(x="total_covid_deaths", y="Country", data=merged_df[:20]);

[Figure: bar chart of total_covid_deaths (x-axis from 0 to 600000) by Country, top 20: Brazil, India, Mexico, Indonesia, Italy, Colombia, France, Argentina, Germany, Poland, Ukraine, South Africa, Spain, Turkey, Romania, Philippines, Hungary, Chile, Czechia, Vietnam]

    fig, ax = plt.subplots(figsize=(10, 5))
    merged_df.sort_values(by='active_covid_cases', ascending=False, inplace=True)
    sns.barplot(x="active_covid_cases", y="Country", data=merged_df[:20]);

[Figure: bar chart of active_covid_cases (x-axis scaled by 1e6) by Country, top 20: France, Spain, Italy, India, Germany, Australia, Brazil, Argentina, Netherlands, Turkey, Switzerland, Mexico, Belgium, Sweden, Poland, Norway, Ireland, Israel, Portugal, Finland]

    fig, ax = plt.subplots(figsize=(10, 5))
    merged_df.sort_values(by='total_covid_recovered', ascending=False, inplace=True)
    sns.barplot(x="total_covid_recovered", y="Country", data=merged_df[:20]);

[Figure: bar chart of total_covid_recovered (x-axis scaled by 1e7) by Country, top 20: India, Brazil, Turkey, France, Germany, Italy, Argentina, Spain, Colombia, Indonesia, Poland, Ukraine, Mexico, South Africa, Philippines, Netherlands, Malaysia, Canada, Czechia, Thailand]

The next cell starts with sns.regplot(data=merged_df[merged_df['Population'] ... and fails with a TypeError raised in pandas/_libs/ops.pyx (pandas._libs.ops.scalar_compare). Both the cell and the error message are cut off in the transcription.

[Figure: scatter plot of Vegetables (y-axis) against total_covid_cases (x-axis, scaled by 1e7)]

    plt.figure(figsize=(10, 6))
    sns.regplot(x=merged_df['Obesity'], y=merged_df['Vegetables'])
    plt.ylabel("Vegetables consumption in kg")
    plt.xlabel("Obesity % each country")
    plt.title("Veggies consumption and Obesity %")
    plt.savefig('output.png', dpi=300)

[Figure: "Veggies consumption and Obesity %", Vegetables consumption in kg against Obesity % each country]

With Regression Line

    plt.figure(figsize=(10, 6))
    sns.regplot(x=merged_df['Obesity'], y=merged_df['active_covid_cases'])
    plt.ylabel("active_covid_cases")
    plt.xlabel("Obesity each country")
    plt.title("Active_covid_cases and Obesity")

    Text(0.5, 1.0, 'Active_covid_cases and Obesity')

[Figure: "Active_covid_cases and Obesity", active_covid_cases (scaled by 1e6) against Obesity each country]

    plt.figure(figsize=(10, 6))
    sns.regplot(x=merged_df['Vegetables'], y=merged_df['total_covid_recovered'])
    plt.ylabel("total_covid_recovered")
    plt.xlabel("Vegetables intake of each country")
    plt.title("total_covid_recovered and Vegetables intake of each country")

    Text(0.5, 1.0, 'total_covid_recovered and Vegetables intake of each country')

[Figure: "total_covid_recovered and Vegetables intake of each country", total_covid_recovered (scaled by 1e7) against Vegetables intake of each country]

    plt.figure(figsize=(10, 6))
    sns.regplot(x=merged_df['Vegetables'], y=merged_df['active_covid_cases'])
    plt.ylabel("active_covid_cases")
    plt.xlabel("Vegetables intake of each country")
    plt.title("active_covid_cases and Vegetables intake of each country")

    Text(0.5, 1.0, 'active_covid_cases and Vegetables intake of each country')

[Figure: "active_covid_cases and Vegetables intake of each country", active_covid_cases against Vegetables intake of each country]

    plt.figure(figsize=(10, 6))
    sns.regplot(x=merged_df['Vegetables'], y=merged_df['active_covid_cases'])
    plt.ylabel("active_covid_cases")
    plt.xlabel("Vegetables each country")
    plt.title("Active_covid_cases and Vegetables")

    Text(0.5, 1.0, 'Active_covid_cases and Vegetables')

[Figure: "Active_covid_cases and Vegetables %", active_covid_cases (scaled by 1e6) against Vegetables % each country]

    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    X = merged_df['Vegetables', 'Obesity']
    y = merged_df['active_covid_cases']
    inertia = []
    for i in range(2, 11):
        # initialise kmeans
        k = KMeans(n_clusters=i, max_iter=100, random_state=0)
        k.fit(X)
        inertia.append(k.inertia_)
        cluster_labels = k.labels_
        # silhouette score
        silhouette_avg = silhouette_score(X, cluster_labels)
        print("For n_clusters={0}, the silhouette score is {1}".format(i, silhouette_avg))

    # Visualize the elbow plot and choose the number of clusters that makes a corner in the line;
    # in this case choose 5 groups
    plt.figure(figsize=(12, 6))
    sns.lineplot(x=[i for i in range(2, 11)], y=inertia)
    # marker position is partly illegible in the transcription; cluster 5 is assumed from the comment above
    plt.scatter(5, inertia[3], s=30, c='red', marker='D')
    plt.title('The Elbow Method')
    plt.xlabel('No. of Clusters')
    plt.ylabel('Inertia');
    plt.show()

    KeyError                                  Traceback (most recent call last)
    /usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
       2897
    -> 2898                 return self._engine.get_loc(casted_key)
       2899             except KeyError as err:
    pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
    pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
    pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
    pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
    KeyError: ('Vegetables', 'Obesity')

    The above exception was the direct cause of the following exception:

    KeyError                                  Traceback (most recent call last)
    2 frames
    /usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
       2898                 return self._engine.get_loc(casted_key)
       2899             except KeyError as err:
    -> 2900                 raise KeyError(key) from err
       2901
       2902         if tolerance is not None:
    KeyError: ('Vegetables', 'Obesity')

Machine Learning Regression - Second Dataset

    # Importing the libraries
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    %matplotlib inline

    # dataset
    merged_df.head()

       Country              total_covid_cases  total_covid_deaths  total_covid_recovered  active_covid_cases  Population  Vegetables  Obesity
    0  Afghanistan          159516.0           7,390               146135.0               5991.0              40,289,298  6.7642      4.5
    1  Albania              244182.0           3,292               216785.0               24105.0             2,872,912   11.7753     22.3
    2  Algeria              232325.0           6,468               157628.0               68229.0             45,080,724  11.6484     26.6
    3  Angola               95676.0            1,884               86928.0                6864.0              34,452,146  2.3041      6.8
    4  Antigua and Barbuda  5815.0             122                 4501.0                 1192.0              99,189      5.4495      19.1

    # Defining x and y
    x = merged_df['total_covid_cases'].values   # Independent
    y = merged_df['total_covid_recovered']      # Dependent
    # y = merged_df.iloc[:, 1:-4].values        # Dependent

    # building correlation matrix
    import seaborn as sns
    # numeric_only=True (pandas >= 1.5) skips the text columns
    sns.heatmap(merged_df.corr(numeric_only=True))

    <matplotlib.axes._subplots.AxesSubplot at 0x7f69c9b359d0>

[Figure: correlation heatmap of total_covid_cases, total_covid_recovered, active_covid_cases, Vegetables and Obesity]

    # Evaluate the model
    from sklearn.metrics import r2_score
    r2_score(y_test, y_pred)

    0.5837705486126143

Regression - Second Dataset

    # dataset
    # Covid daily cases RAW data from 31/12/2019 till 14/12/2020
    pip install pycountry
    import pandas as pd
    import pycountry
    import plotly.express as px
    URL_DATASET = "https://raw.githubusercontent.com/mahamad-ums-ir-alik-sit/covid-data/main/COVID-16-reographic-disbtribution-worldwide-2020-12-14.csv"
    covid_daily_cases = pd.read_csv(URL_DATASET)
    covid_daily_cases.head()

    Requirement already satisfied: pycountry in /usr/local/lib/python3.7/dist-packages (22.1.10)
    Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from pycountry) (57.4.)

       dateRep     day  month  year  cases  deaths  countriesAndTerritories
    0  14/12/2020  14   12     2020  746    6       Afghanistan
    1  13/12/2020  13   12     2020  298    9       Afghanistan
    2  12/12/2020  12   12     2020  113    11      Afghanistan
    3  11/12/2020  11   12     2020  63     10      Afghanistan
    4  10/12/2020  10   12     2020  202    16      Afghanistan

    # SIMPLE REGRESSION FOR TESTING
    x = covid_daily_cases['cases'].values   # Independent
    y = covid_daily_cases['deaths']         # Dependent
    from sklearn import linear_model
    regr = linear_model.LinearRegression()
    regr.fit(x.reshape(-1, 1), y)
    # predict deaths for 112 daily cases
    predicted = regr.predict([[112]])
    print(predicted)

    [11.05537432]

    # Defining x and y
    # x = covid_daily_cases['cases'].values   # Independent
    # y = covid_daily_cases['deaths']         # Dependent
    # y = merged_df.iloc[:, 1:-4].values      # Dependent

    # building correlation matrix
    import seaborn as sns
    # numeric_only=True (pandas >= 1.5) skips the text columns
    sns.heatmap(covid_daily_cases.corr(numeric_only=True))

[Figure: correlation heatmap of day, month, year, cases and deaths]

    # Splitting the dataset into the Training set and Test set
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=...)  # random_state value is cut off in the transcription

    # Fitting Linear Regression to the Training set (single feature)
    from sklearn.linear_model import LinearRegression
    reg = LinearRegression()
    reg.fit(X_train.reshape(-1, 1), y_train)

    LinearRegression()

    # Predicting the Test set results
    y_pred = reg.predict(X_test.reshape(-1, 1))
    y_pred

    array([-4.82533734e+04,  4.94939359e+05, -1.39481706e+04, -5.70813287e+04,
            1.79883549e+06,  1.60920173e+05,  7.85413642e+04,  1.34235874e+05,
            6.34907272e+04,  1.09384679e+04,  7.01941104e+05, -2.88516585e+03,
            1.50688494e+06,  4.42468511e+04,  1.06833421e+06,  1.60351141e+06,
            2.92703036e+06,  9.31678174e+05, -4.00659125e+04,  3.47386900e+05,
            4.09517533e+05,  1.27598447e+06,  2.45301994e+06, -4.52985472e+04,
            1.18493359e+06,  9.04223322e+05, -4.10903473e+04,  2.51046044e+06,
            1.88577240e+05,  2.04033218e+06, -5.73672797e+04,  7.45977357e+04,
            4.12026953e+05,  8.48595107e+06,  2.30081992e+05,  1.61826130e+05,
           -3.12762621e+04,  2.81701033e+05,  8.54667403e+05,  1.43455787e+05,
           -5.41621350e+04, -3.50720157e+04,  5.29794193e+05,  1.82335066e+06,
            1.24369231e+05, -5.88789891e+04,  1.84677636e+06,  6.96585537e+05,
            1.73374023e+06,  1.52571227e+06])

    # Testing
    reg.predict([[159516.0]])

    array([78596.59458557])

    # Evaluate the model
    from sklearn.metrics import r2_score
    r2_score(y_test, y_pred)

    0.9346188510191625

    # Plot the results
    import matplotlib.pyplot as plt
    plt.figure(figsize=(15, 10))
    plt.scatter(y_test, y_pred)
    plt.xlabel('Actual')
    plt.ylabel('Predicted')
    plt.title('Actual vs Predicted')

    Text(0.5, 1.0, 'Actual vs Predicted')

[Figure: "Actual vs Predicted", scatter of Predicted against Actual, both axes scaled by 1e6]

    # Predicted values
    y_pred_df = pd.DataFrame({'Actual Value': y_test, 'Predicted value': y_pred, 'Difference': y_test - y_pred})
    y_pred_df[0:5]

         Actual Value  Predicted value  Difference
    133  16971.0       -4.825337e+04    65224.373408
    109  535203.0       4.949394e+05    40263.641184
    59   42013.0       -1.394817e+04    55961.170614
    80   5747.0        -5.708133e+04    62828.328736
    7    903993.0       1.798835e+06    -894842.487280
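Two cells in the draft fail: the total_covid_deaths conversion marked "Not working", and the clustering cell that raises KeyError: ('Vegetables', 'Obesity'). The following is a minimal sketch of one possible fix for each, not a confirmed diagnosis. It assumes the deaths column holds comma-formatted strings with blank or whitespace-only entries for missing values. The small frame is built from rows in the draft plus one hypothetical row ("Exampleland") to exercise the blank case, and the helper name to_number is illustrative.

```python
import pandas as pd

# Four rows from the draft plus one hypothetical row with a blank deaths value.
merged_df = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria", "Angola", "Exampleland"],
    "total_covid_deaths": ["7,390", "3,292", "6,468", "1,884", " "],
    "Vegetables": [6.7642, 11.7753, 11.6484, 2.3041, 5.0],
    "Obesity": [4.5, 22.3, 26.6, 6.8, 10.0],
})

def to_number(series):
    """Strip thousands separators and whitespace, then convert to float.

    Values that cannot be parsed (for example blanks) become NaN instead of
    raising, which is what errors="coerce" does.
    """
    cleaned = series.astype(str).str.replace(",", "", regex=False).str.strip()
    return pd.to_numeric(cleaned, errors="coerce")

# Assign the result back to the column rather than calling replace(..., inplace=True)
# on a selected column, which may not modify the original frame.
merged_df["total_covid_deaths"] = to_number(merged_df["total_covid_deaths"]).fillna(0)

# Selecting several columns needs a list inside the brackets (double brackets).
# merged_df['Vegetables', 'Obesity'] looks up a single column named by the tuple
# ('Vegetables', 'Obesity'), which is what triggers the KeyError.
X = merged_df[["Vegetables", "Obesity"]]

print(merged_df["total_covid_deaths"].tolist())
print(X.shape)
```

The same to_number helper could be applied to Population, which is also stored as comma-formatted text in the draft. Whether a missing death count should become 0 or stay NaN is a modelling choice; filling with 0 here only mirrors what the original cell was trying to do.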