

Here is a simple definition of data science: Data science combines multiple fields including statistics, scientific methods, and data analysis to extract value from data.


Those who practice data science are called data scientists, and they combine a range of skills to analyze data collected from the web, smartphones, customers, sensors, and other sources.

Data science definitely has the potential to help others.

Data science enables retailers to influence our purchasing habits, but the importance of gathering data extends much further.

Data science can improve public health through wearable trackers that motivate individuals to adopt healthier habits and can alert people to potentially critical health issues. Data can also improve diagnostic accuracy, accelerate finding cures for specific diseases, or even stop the spread of a virus. When the Ebola virus outbreak hit West Africa in 2014, scientists were able to track the spread of the disease and predict the areas most vulnerable to the illness. This data helped health officials get in front of the outbreak and prevent it from becoming a worldwide epidemic.

Data science has critical applications across most industries. For example, data is used by farmers for efficient food growth and delivery, by food suppliers to cut down on food waste, and by nonprofit organizations to boost fundraising efforts and predict funding needs.

The commented code below walks through the analysis:

# Import libraries

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

# Import the dataset (Pokemon stats, taken from Kaggle)

df = pd.read_csv("C:/Users/aksha/Downloads/Pokemon.csv", sep=",")

# Column positions depend on the CSV layout; check df.columns before relying on them.

x = df.iloc[:, 6].values.reshape(-1, 1) # Feature: intended to be the Defense stat of each Pokemon

y = df.iloc[:, 4].values.reshape(-1, 1) # Target: the Total stat of each Pokemon

plt.scatter(x, y) # Let's visualize our raw data

plt.show()


from sklearn.linear_model import LinearRegression

from sklearn.preprocessing import PolynomialFeatures

from sklearn.metrics import r2_score

# Linear Regression Method

linear_reg = LinearRegression()

linear_reg.fit(x, y)

y_LinPredict = linear_reg.predict(x)

plt.scatter(x, y, color="lightblue") # Let's visualize our raw data

plt.plot(x, y_LinPredict, color="darkorange") # Then visualize the Linear Regression model

plt.show()

print("R Square Score for the Linear Regression : ", r2_score(y, y_LinPredict))

# The R-squared score shows that simple linear regression is not a good fit for this data set


# Polynomial Linear Regression Method

poly_reg = PolynomialFeatures(degree=2)

x_poly = poly_reg.fit_transform(x)

poly_linear_reg = LinearRegression()

poly_linear_reg.fit(x_poly, y)

y_PolyPredict = poly_linear_reg.predict(x_poly)

plt.scatter(x, y, color="lightblue") # Let's visualize our raw data

order = x[:, 0].argsort() # Sort by x so the fitted curve is drawn left to right

plt.plot(x[order], y_PolyPredict[order], color="darkorange") # Then visualize the polynomial regression model

plt.show()

print("R Square Score for the Polynomial Linear Regression : ", r2_score(y, y_PolyPredict))

# The polynomial regression is also not a good fit for this data set. If the fitted curve looks like a zigzag, the cause is plotting unsorted x values as a connected line, which the sort above avoids.


# Multiple Linear Regression Method

# In this step I re-define my parameter data "x"

x = df.iloc[:, [5, 6]].values # Features: intended to be the Attack and Defense stats (check df.columns)

y_multilinear_reg = LinearRegression()

y_multilinear_reg.fit(x, y)

y_MultiLinearPredict = y_multilinear_reg.predict(x)

print("R Square Score for the Multiple Linear Regression: ", r2_score(y, y_MultiLinearPredict))

# In this case, multiple linear regression fits better than the simple linear and polynomial regressions. However, its R-squared score is still not that high.


The Pandas library is a robust piece of software with several advantages:

  • Excellent representation of data
  • Less coding done, more work accomplished
  • Efficient handling of large data sets
  • Extensive feature set
  • Built for Python
  • Flexible data handling and easy customization

Disadvantages:

  • A complex syntax that is not always in line with Python
  • A steep learning curve: Pandas may seem easy to use and navigate at first, but that is only the tip of the iceberg
  • Poor documentation
  • Poor 3D matrix compatibility

R-squared (R², or the coefficient of determination) is a statistical measure in a regression model that gives the proportion of variance in the dependent variable that can be explained by the independent variables. In other words, R-squared shows how well the data fit the regression model (the goodness of fit).
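To make the definition concrete, here is a minimal sketch that computes R-squared by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the sample numbers are made up for illustration; they do not come from the Pokemon data.

```python
def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: variation the predictions fail to explain.
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # Total sum of squares: variation of the observations around their mean.
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Made-up numbers for illustration only (not taken from the Pokemon data).
observed = [2.0, 4.0, 6.0, 8.0]
perfect = [2.0, 4.0, 6.0, 8.0]      # predictions that match every observation
mean_only = [5.0, 5.0, 5.0, 5.0]    # always predicting the mean of the observations

print(r_squared(observed, perfect))    # 1.0
print(r_squared(observed, mean_only))  # 0.0
```

Predictions that are worse than always guessing the mean give a negative value, which is why R-squared is not strictly confined to the range 0 to 1.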

Interpretation of R-Squared

The most common interpretation of R-squared is how well the regression model fits the observed data. For example, an R-squared of 60% means that the model explains 60% of the variance in the dependent variable. Generally, a higher R-squared indicates a better fit for the model.

However, a high R-squared is not always good for the regression model. The quality of the statistical measure depends on many factors, such as the nature of the variables employed in the model, the units of measure of the variables, and the applied data transformation. Thus, a high R-squared can sometimes indicate problems with the regression model.
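One reason a high R-squared can mislead: for an ordinary least squares model with an intercept, scored on the same data it was fitted to, adding predictors never lowers R-squared, even when the extra predictors are uninformative. A common correction is adjusted R-squared, which penalizes the number of predictors. The sketch below is illustrative only; the function name `adjusted_r_squared` and the example scores are hypothetical and are not results from the Pokemon data.

```python
def adjusted_r_squared(r2, n_samples, n_features):
    """Adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    denominator = n_samples - n_features - 1
    if denominator <= 0:
        raise ValueError("need more samples than features plus one")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / denominator


# Hypothetical scores for a one-feature and a two-feature model on 800 rows.
print(adjusted_r_squared(0.40, n_samples=800, n_features=1))
print(adjusted_r_squared(0.41, n_samples=800, n_features=2))
```

Comparing adjusted values, or scoring on rows held out from fitting, gives a fairer comparison between models with different numbers of features than raw training R-squared.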

A low R-squared figure is generally a bad sign for predictive models. However, in some cases, a good model may show a small value.

There is no universal rule on how to incorporate the statistical measure in assessing a model. The context of the experiment or forecast is extremely important and, in different scenarios, the insights from the metric can vary.

Python 3

This is the answer to the questions on pages 4 and 5, but I also need the Python code that generated this project and a clear view of the questions to be answered. Please look at the hint.
