Question

Posted on Jul 29, 2024

Your final grade will be based on your project. Your project consists of finding a data set on any topic of your preference. I am attaching the links previously sent to you; you can find any data set there to build your project. Your job is to build models on your data set. Throughout the semester, lectures were built as steps to follow for your project. From the first lecture to the last, code was provided and explained in each lecture. Your project must include all information presented to you in class (lectures). Your project may not need all the code provided, but I am sure that most of it can be used to create your final project. Can you change the code provided? Can you come up with different code? Of course. As a graduate student, your project is expected to be innovative. Be creative. Find the right approach for your delivery; this is huge on your grade.

Video presentation: as you build your project, build your presentation in Microsoft PowerPoint. When you are done with your project, make a video of your presentation of about 10 minutes (+/-), just like a presentation you would deliver in class.

Make a second two (2) minute video presentation: a summary of your project highlighting major milestones. Each slide of your PowerPoint presentation must include a frame with the logo and/or name of our institution. It must be very clear, without flaws. Before recording, practice your presentation. Do not speak too fast or too slow, and not too loud or too soft; use a normal tone of voice. I did not ask much of you during the semester because the bar is set a little high. I am very excited to see great analysis, including data manipulation (data management), data frames, explanation of your variables, descriptive statistics, modeling, findings, conclusions and recommendations: all the good stuff that only a Data Scientist can put together. All I ask is great work. Do not just make me think you are the best; show it to the whole world.

KDnuggets: A Good List of Free Data Sources and a Good Source of Interesting Data Science Information

https://www.kdnuggets.com/2017/12/big-data-free-sources.html

Some Multivariate Datasets

http://archive.ics.uci.edu/ml/datasets.html
http://kaggle.com
https://opendata.socrata.com/
http://data.gov/
http://hadoopilluminated.com/hadoop_illuminated/hadoop-illuminated.pdf Pages 64++
www.Kdnuggets.com

Term Paper Topics on Big Data Titles:

Real Time Stream Processing, Analysis and Architecture
BIG DATA IN TELECOM SECTOR
Big Data in Human Resource
Recommender Systems and Big Data
Big Data and Blockchain Make a Powerful Couple for New Trend
Evolution in Master Data Management (MDM) to Address the Needs of the Big Data Revolution
Data Visualization in The Era of Big Data
Big Data in Banking & Finance
A Survey of Music Recommender Systems
Heart Failure Prediction based on Multi-Layer Perceptron Classifier
Data Mining Technology and Its Application in E-commerce
BIG DATA IN MANUFACTURING
Big Data In Baseball
Time series analysis using the hybrid of artificial neural network and Autoregressive integrated moving average (ARIMA) model
HEDGE FUNDS AND BIG DATA
BIG DATA IN FOOD INDUSTRY
Big Data and Machine Learning
Big Data in Deep Learning: How Big Data Is Empowering AI and Machine Learning
Data Migration from Relational Database to NoSQL Database
Impact of Big Data on Video Games
Accelerating Data Management Systems with Machine Learning
Big Data in NBA
A Survey of Big Data in Sports
DATA VISUALIZATION IN THE WORLD OF BIG DATA
An Overview of Current RNA Sequencing Methods
Big Data Application: Medical Care Science
A Survey on Web Usage Mining

Grading Sheet for Term Papers: individual traits are graded from 1 to 5, 5 being best.

Topic:
Content:
Knowledge Displayed:
Depth (Degree of Difficulty):
Clarity:
Style:
Overall:

Project Grading: grades based on a range of 0 to 5 (average 3):

Presentation & Paper:
- Abstract
- Brief Description
- Assumptions
- Data Set - Variables
- Data Analysis: Descriptive Stats
- Defining Variables: Independent, Dependent
- Fitting Model
- Conclusion
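The grading items "Descriptive Stats", "Defining Variables: Independent, Dependent" and "Fitting Model" can be sketched in a few lines of pandas and NumPy. This is only an illustration under stated assumptions, not the course's own method: the data frame is synthetic, the column names `x1`, `x2` and `y` are hypothetical, and ordinary least squares via `numpy.linalg.lstsq` stands in for whatever model suits your own data set.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only; replace with your own data set.
rng = np.random.default_rng(seed=0)
n = 200
df = pd.DataFrame({
    "x1": rng.normal(50, 10, n),   # hypothetical independent variable
    "x2": rng.uniform(0, 1, n),    # hypothetical independent variable
})
# Hypothetical dependent variable: a known linear relationship plus noise.
df["y"] = 3.0 + 0.5 * df["x1"] - 2.0 * df["x2"] + rng.normal(0, 1, n)

# Descriptive statistics: count, mean, std, min, quartiles and max per column.
summary = df.describe()
print(summary)

# Define variables: independent (X) and dependent (y).
X = df[["x1", "x2"]].to_numpy()
y = df["y"].to_numpy()

# Fit ordinary least squares; a column of ones provides the intercept.
X_design = np.column_stack([np.ones(len(X)), X])
coef, _, _, _ = np.linalg.lstsq(X_design, y, rcond=None)

# R-squared: share of the variance in y explained by the fitted model.
y_hat = X_design @ coef
r_squared = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print("Intercept and slopes:", coef.round(3))
print("R-squared:", round(r_squared, 3))
```

Because the synthetic relationship is known, the fitted slopes should land close to 0.5 and -2.0, which is a quick sanity check before applying the same steps to real data.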

How can I attach the code he provided in class to you?

I can't copy and paste from the PPTs he gave; I need to attach them. I only have this code; the rest is in the PPTs.

# coding: utf-8

# In[22]:

import pandas as pd

df = pd.read_csv('C:\\Users\\Vic\\Documents\\SAINT PETERS\\DS 690\\Week 3 ch 6 made simple\\'
                 + 'ESRD QIP - Complete QIP Data - Payment Year 2018.csv', header=0)

# In[23]:

print(df)

# In[24]:

print('Number of rows: ' + str(df.shape[0]))

# In[25]:

print('Number of columns: ' + str(df.shape[1]))

# In[26]:

print(df.head(n=5))

# In[27]:

print(df.columns)

# In[28]:

for column in df.columns:
    print(column)
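The inspection cells above (row and column counts, `head`, column names) can be complemented by checking how pandas typed each column. A minimal sketch, assuming a few hypothetical rows that only mimic the layout of the ESRD QIP file (the values are invented): a single text entry such as "No Score" keeps the whole score column non-numeric, which is why the score must be filtered and converted before averaging.

```python
import io

import pandas as pd

# Hypothetical rows mimicking the ESRD QIP layout; not real data.
csv_text = """Facility Name,State,Total Performance Score
Clinic A,CA,72
Clinic B,NY,No Score
Clinic C,CA,65
Clinic D,TX,88
"""
df = pd.read_csv(io.StringIO(csv_text), header=0)

n_rows, n_cols = df.shape  # same values as df.shape[0] and df.shape[1]
print(f"Number of rows: {n_rows}")
print(f"Number of columns: {n_cols}")

# dtypes shows how pandas parsed each column; the 'No Score' entry keeps
# the score column as text instead of a numeric dtype.
print(df.dtypes)
score_dtype = df["Total Performance Score"].dtype
```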

# In[30]:

df_states = df.groupby('State').size()
print(df_states)

# In[31]:

df_states = df.groupby('State').size().sort_values(ascending=False)
print(df_states)

# In[33]:

df_states = df.groupby('State').size().sort_values(ascending=False).head(n=10)
print(df_states)
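The group, count, sort and top-n chain above has a one-call equivalent in `Series.value_counts()`, which counts and sorts in descending order. A minimal sketch on a hypothetical `State` column; note that the order of tied counts is not guaranteed to match between the two approaches.

```python
import pandas as pd

# Hypothetical facility rows; only the 'State' column matters here.
df = pd.DataFrame({"State": ["CA", "NY", "CA", "TX", "CA", "NY", "FL"]})

# The approach used above: group, count rows per group, sort, keep the top n.
top_groupby = df.groupby("State").size().sort_values(ascending=False).head(n=3)

# value_counts() counts and sorts in descending order in one call.
top_counts = df["State"].value_counts().head(3)

print(top_groupby)
print(top_counts)
```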

# In[40]:

df_ca = df.loc[df['State'] == 'CA']
print(df_ca)

# In[41]:

print(df.groupby('Total Performance Score').size())

# In[45]:

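# .copy() makes df_filt independent of df, so the numeric conversion below
# does not raise a SettingWithCopyWarning
df_filt = df.loc[df['Total Performance Score'] != 'No Score'].copy()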

# In[46]:

df_filt['Total Performance Score'] = pd.to_numeric(df_filt['Total Performance Score'])
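The filter-then-convert step above can also be written with `pd.to_numeric(..., errors="coerce")`, which turns any non-numeric text into `NaN`, so placeholders other than "No Score" are caught too. A minimal sketch with hypothetical scores; this is an alternative to the class code, not a replacement required by it.

```python
import pandas as pd

# Hypothetical scores; 'No Score' marks facilities without a rating.
df = pd.DataFrame({
    "Facility Name": ["Clinic A", "Clinic B", "Clinic C", "Clinic D"],
    "Total Performance Score": ["72", "No Score", "65", "88"],
})

# errors='coerce' converts any non-numeric text (here 'No Score') to NaN.
scores = pd.to_numeric(df["Total Performance Score"], errors="coerce")

# Keep only rows with a valid score; .copy() gives an independent frame
# so the assignment below does not trigger SettingWithCopyWarning.
df_filt = df.loc[scores.notna()].copy()
df_filt["Total Performance Score"] = scores.loc[df_filt.index]

print(df_filt)
```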

# In[53]:

df_tps = df_filt[['Facility Name', 'State', 'Total Performance Score']].sort_values('Total Performance Score')
print(df_tps.head(n=5))

# In[54]:

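# Passing 'mean' as a string avoids the FutureWarning that recent pandas
# versions raise when np.mean is passed to a groupby aggregation
df_state_means = df_filt.groupby('State').agg({'Total Performance Score': 'mean'})
print(df_state_means.sort_values('Total Performance Score', ascending=False))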

# In[56]:

# Mean score per state, plus the number of facilities per state
df_state_means = df_filt.groupby('State').agg({'Total Performance Score': 'mean', 'State': 'size'})
print(df_state_means.sort_values('Total Performance Score', ascending=False))
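The per-state mean and count above can also be written with pandas named aggregation, where each output column is given as `(input column, function)`. This avoids aggregating the grouping column itself and yields self-describing column names. A minimal sketch on hypothetical, already-cleaned scores; the output names `mean_score` and `n_facilities` are illustrative choices.

```python
import pandas as pd

# Hypothetical cleaned scores ('No Score' rows already removed).
df_filt = pd.DataFrame({
    "State": ["CA", "CA", "NY", "TX", "TX", "TX"],
    "Total Performance Score": [70.0, 80.0, 90.0, 60.0, 65.0, 70.0],
})

# Named aggregation: output_name=(input column, aggregation function).
df_state_means = (
    df_filt.groupby("State")
    .agg(
        mean_score=("Total Performance Score", "mean"),
        n_facilities=("Total Performance Score", "size"),
    )
    .sort_values("mean_score", ascending=False)
)
print(df_state_means)
```

Keeping the count next to the mean matters when interpreting the ranking: a state with one facility can top the list on a single score.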

# In[75]:

import os

# In[76]:

os.path.abspath("hvbp_tps_11_07_2017.csv")

# In[79]:

import pandas as pd

pathname = 'C:\\Users\\Vic\\Documents\\SAINT PETERS\\DS 690\\Week 3 ch 6 made simple\\Hospital_Revised_Flatfiles\\'

files_of_interest = ['hvbp_tps_11_07_2017.csv','hvbp_clinical_care_11_07_2017.csv','hvbp_safety_11_07_2017.csv', 'hvbp_efficiency_11_07_2017.csv','hvbp_hcahps_11_07_2017.csv']

dfs = {foi: pd.read_csv(pathname + foi, header=0) for foi in files_of_interest}
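The dictionary comprehension above loads several files keyed by file name. A minimal sketch of the same pattern with `pathlib.Path`, which joins folder and file name with the correct separator on any operating system and avoids hand-escaping Windows backslashes. The folder and the two small CSV files are hypothetical stand-ins created in a temporary directory; their contents are invented.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical folder with two small CSV files standing in for the
# Hospital_Revised_Flatfiles downloads; contents are invented.
folder = Path(tempfile.mkdtemp())
(folder / "hvbp_tps_11_07_2017.csv").write_text("Provider Number,Score\n10001,55\n10002,61\n")
(folder / "hvbp_safety_11_07_2017.csv").write_text("Provider_Number,Safety\n10001,0.8\n")

files_of_interest = ["hvbp_tps_11_07_2017.csv", "hvbp_safety_11_07_2017.csv"]

# The / operator on Path objects builds the full file path portably.
dfs = {foi: pd.read_csv(folder / foi, header=0) for foi in files_of_interest}

for name, frame in dfs.items():
    print(f"{name} - Number of rows: {frame.shape[0]}, Number of columns: {frame.shape[1]}")
```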

# In[80]:

for k, v in dfs.items():
    print(k + ' - Number of rows: ' + str(v.shape[0]) + ', Number of columns: ' + str(v.shape[1]))

# In[85]:

for v in dfs.values():
    for column in v.columns:
        print(column)
    print(' ')  # blank separator between files

# In[87]:

df_master = dfs[files_of_interest[0]].merge(dfs[files_of_interest[1]], on='Provider Number', how='left', copy=False)
print(df_master.shape)
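A left merge keeps every row of the left table and fills unmatched rows with `NaN`. Two real `DataFrame.merge` options help verify such a merge: `indicator=True` adds a `_merge` column recording where each row was found, and `validate="one_to_one"` raises a `MergeError` if a key repeats on either side. A minimal sketch on hypothetical tables keyed by `Provider Number`; the column names `TPS` and `Clinical` are invented.

```python
import pandas as pd

# Hypothetical tables keyed by 'Provider Number'; values are invented.
tps = pd.DataFrame({"Provider Number": [10001, 10002, 10003], "TPS": [55.0, 61.0, 47.0]})
clinical = pd.DataFrame({"Provider Number": [10001, 10003], "Clinical": [0.7, 0.4]})

# how='left' keeps all rows of tps; indicator=True records match status,
# validate='one_to_one' fails loudly on duplicated keys.
df_master = tps.merge(clinical, on="Provider Number", how="left",
                      indicator=True, validate="one_to_one")
print(df_master)
print(df_master["_merge"].value_counts())
```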

# In[88]:

print(df_master.columns)

# In[90]:

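# Some files spell the key 'Provider_Number'; rename it so every file uses 'Provider Number'.
# A separate loop variable (frame) keeps the ESRD data frame 'df' from being overwritten.
for frame in dfs.values():
    frame.columns = [col if col != 'Provider_Number' else 'Provider Number' for col in frame.columns]

# Left-merge the remaining three files onto the master table
for num in [2, 3, 4]:
    df_master = df_master.merge(dfs[files_of_interest[num]], on='Provider Number', how='left', copy=False)
print(df_master.shape)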
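Normalizing the key name and then merging file after file can also be expressed with `DataFrame.rename` and `functools.reduce`, which applies the same left merge pairwise across a list of tables. A minimal sketch, assuming three hypothetical tables (one spelling the key `Provider_Number`, as some of the files above do); the value columns are invented.

```python
from functools import reduce

import pandas as pd

# Hypothetical tables; the second spells the key 'Provider_Number'.
frames = [
    pd.DataFrame({"Provider Number": [10001, 10002], "TPS": [55.0, 61.0]}),
    pd.DataFrame({"Provider_Number": [10001, 10002], "Safety": [0.8, 0.6]}),
    pd.DataFrame({"Provider Number": [10002], "Efficiency": [0.3]}),
]

# rename() leaves frames that have no 'Provider_Number' column unchanged.
frames = [f.rename(columns={"Provider_Number": "Provider Number"}) for f in frames]

# reduce computes merge(merge(frames[0], frames[1]), frames[2]) with a left join.
df_master = reduce(
    lambda left, right: left.merge(right, on="Provider Number", how="left"),
    frames,
)
print(df_master)
```

If two files share a non-key column, such as a hospital name, pandas appends the suffixes `_x` and `_y` to keep both; passing `suffixes=` or dropping the duplicate column before merging keeps the master table readable.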

# In[91]:

for column in df_master.columns:
    print(column)