Question
Here is a simple definition of data science: Data science combines multiple fields including statistics, scientific methods, and data analysis to extract value from data.
Those who practice data science are called data scientists, and they combine a range of skills to analyze data collected from the web, smartphones, customers, sensors, and other sources.
Data science definitely has the potential to help others.
Data science enables retailers to influence our purchasing habits, but the importance of gathering data extends much further.
Data science can improve public health through wearable trackers that motivate individuals to adopt healthier habits and can alert people to potentially critical health issues. Data can also improve diagnostic accuracy, accelerate finding cures for specific diseases, or even stop the spread of a virus. When the Ebola virus outbreak hit West Africa in 2014, scientists were able to track the spread of the disease and predict the areas most vulnerable to the illness. This data helped health officials get in front of the outbreak and prevent it from becoming a worldwide epidemic.
Data science has critical applications across most industries. For example, data is used by farmers for efficient food growth and delivery, by food suppliers to cut down on food waste, and by nonprofit organizations to boost fundraising efforts and predict funding needs.
Follow the commented code below:
#import libraries
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#import the dataset
#dataset taken from kaggle (pokemon stats)
df = pd.read_csv("C:/Users/aksha/Downloads/Pokemon.csv", sep=",")
x = df.iloc[:, 6].values.reshape(-1, 1) # Defensive stats of the pokemons, this is our parameter
y = df.iloc[:, 4].values.reshape(-1, 1) # Total stats of the pokemons, this is our target
plt.scatter(x, y) # Let's visualize our raw data
plt.show()
output:
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score
# Linear Regression Method
linear_reg = LinearRegression()
linear_reg.fit(x, y)
y_LinPredict = linear_reg.predict(x)
plt.scatter(x, y, color="lightblue") # Let's visualize our raw data
plt.plot(x, y_LinPredict, color="darkorange") # Then visualize the Linear Regression model
plt.show()
print("R Square Score for the Linear Regression : ", r2_score(y, y_LinPredict))
#It can be seen that the Linear Regression does not fit this data set very well
output:
# Polynomial Linear Regression Method
poly_reg = PolynomialFeatures(degree=2)
x_poly = poly_reg.fit_transform(x)
poly_linear_reg = LinearRegression()
poly_linear_reg.fit(x_poly, y)
y_PolyPredict = poly_linear_reg.predict(x_poly)
plt.scatter(x, y, color="lightblue") # Let's visualize our raw data
order = np.argsort(x.ravel()) # Sort by x so the fitted curve is drawn from left to right
plt.plot(x[order], y_PolyPredict[order], color="darkorange") # Then visualize the Polynomial Linear Regression model
plt.show()
print("R Square Score for the Polynomial Linear Regression : ", r2_score(y, y_PolyPredict))
#It can be seen that the Polynomial Linear Regression also does not fit this data set very well. Note that plotting the curve without sorting by x joins the points in row order, which makes the plot look erratic.
output:
# Multiple Linear Regression Method
# In this step I re-define my parameter data "x"
x = df.iloc[:, [5, 6]].values # I define my parameters as Attack and Defense Stats
y_multilinear_reg = LinearRegression()
y_multilinear_reg.fit(x, y)
y_MultiLinearPredict = y_multilinear_reg.predict(x)
print("R Square Score for the Multiple Linear Regression: ", r2_score(y, y_MultiLinearPredict))
#It can be seen that the Multiple Linear Regression fits better than the Linear Regression and Polynomial Linear Regression methods in this case. However, its R Square score is also not that high.
output:
The Pandas library is a robust piece of software and is full of advantages.
- 1. Excellent representation of data
- 2. Less coding done, more work accomplished
- 3. Efficient handling of huge data
- 4. Extensive feature set
- 5. Built for Python
- 6. The flexibility of data and easy customization
Disadvantages:
- 1. A complex syntax that is not always in line with Python
- 2. Learning curve: Pandas has a very steep learning curve. While it may seem easy to use and navigate in the beginning, that is just the tip of the iceberg.
- 3. Poor documentation
- 4. Poor 3D matrix compatibility
R-Squared (R² or the coefficient of determination) is a statistical measure in a regression model that determines the proportion of variance in the dependent variable that can be explained by the independent variable or variables. In other words, r-squared shows how well the data fit the regression model (the goodness of fit).
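The definition above can be written as 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean of the dependent variable. A minimal sketch in plain Python follows; the function name `r_squared` and the sample values are hypothetical, chosen only for illustration. For a single target, `r2_score` from scikit-learn computes the same quantity.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Variation left unexplained by the model
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # Total variation of the observations around their mean
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed values and model predictions
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 6.5, 9.5]

score = r_squared(y_true, y_pred)  # SS_res = 1, SS_tot = 20
print(score)
```

A model that always predicts the mean scores 0, and a model that reproduces every observation scores 1.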
Interpretation of R-Squared
The most common interpretation of r-squared is how well the regression model fits the observed data. For example, an r-squared of 60% means that the model explains 60% of the variance in the dependent variable. Generally, a higher r-squared indicates a better fit for the model.
However, it is not always the case that a high r-squared is good for the regression model. The quality of the statistical measure depends on many factors, such as the nature of the variables employed in the model, the units of measure of the variables, and the applied data transformation. Thus, a high r-squared can sometimes indicate problems with the regression model.
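One way a higher r-squared can mislead: when a least-squares model is scored on the same data it was fitted to, adding a predictor cannot lower r-squared, even if that predictor is pure noise. The sketch below illustrates this with synthetic data (not the Pokemon data set); the helper `in_sample_r2` and the variable names are assumptions made for this example.

```python
import numpy as np


def in_sample_r2(X, y):
    """Fit ordinary least squares with an intercept; return the in-sample R^2."""
    A = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)            # a predictor that really drives y
y = 3.0 * x1 + rng.normal(size=n)  # synthetic target
junk = rng.normal(size=n)          # a predictor unrelated to y

r2_one = in_sample_r2(x1.reshape(-1, 1), y)
r2_two = in_sample_r2(np.column_stack([x1, junk]), y)
print("one predictor :", r2_one)
print("plus noise    :", r2_two)
```

Because the second model contains the first as a special case, its in-sample score is at least as high. A gain in r-squared from an extra predictor is therefore not, on its own, evidence that the predictor is meaningful.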
A low r-squared figure is generally a bad sign for predictive models. However, in some cases, a good model may show a small value.
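A correctly specified model can still show a small r-squared when the data are very noisy. The sketch below, which uses synthetic data and assumed parameter values (true slope 2, noise standard deviation 10), fits a straight line with `np.polyfit`: the estimated slope should land close to the true one while r-squared stays low.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0.0, 10.0, size=n)
# The true relationship is linear, but the noise is large relative to the signal
y = 2.0 * x + rng.normal(scale=10.0, size=n)

# np.polyfit returns coefficients from highest degree to lowest
slope, intercept = np.polyfit(x, y, deg=1)
y_pred = slope * x + intercept

r2 = 1.0 - np.sum((y - y_pred) ** 2) / np.sum((y - y.mean()) ** 2)
print("estimated slope:", slope)
print("r-squared      :", r2)
```

Here the low score reflects the amount of noise in the data rather than a wrong model form.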
There is no universal rule on how to incorporate the statistical measure in assessing a model. The context of the experiment or forecast is extremely important and, in different scenarios, the insights from the metric can vary.
python 3
This is the answer to the questions on pages 4 and 5, but I also need the Python code that generated this project and a clearer view of the questions to be answered, so please look at the hint.