Question

Your final grade will be based on your project. Your project consists of finding a data set on any topic of your preference. I am attaching links previously sent to you; you can use any of these sets to build your project. Your job is to build models on your data set.

Throughout the semester, lectures were built as steps to follow for your project. From the first lecture to the last, code was provided and explained in each lecture. Your project must include all information presented to you in class (lectures). Your project may not need all of the code provided, but I am sure that most of it can be used to create your final project. Can you change the code provided? Can you come up with different code? Of course. As a graduate student, your project is expected to be innovative. Be creative. Find the right approach for your delivery. This is huge for your grade.

Video presentation: As you build your project, build your presentation in Microsoft PowerPoint. When you are done with your project, make a video of your presentation of about 10 minutes, just like a presentation you would deliver in class.

Make a second video presentation of two (2) minutes: a summary of your project highlighting major milestones. Your PowerPoint presentation must include on each slide a frame with the logo and/or name of our institution. It must be very clear and without flaws. Before recording, practice your presentation. Do not speak too fast or too slow, and not too loud or too soft; use a normal tone of voice. I did not ask much of you during the semester because the bar is set a little high. I am very excited to see great analysis including data manipulation (data management), data frames, explanation of your variables, descriptive statistics, modeling, findings, conclusions, and recommendations: all the good stuff that only a Data Scientist can put together. All I ask is great work. Do not just make me think you are the best; show it to the whole world.

KDnuggets: A Good List of Free Data Sources and a Good Source for Interesting Data Science Information

  • https://www.kdnuggets.com/2017/12/big-data-free-sources.html

Some Multivariate Datasets

  • http://archive.ics.uci.edu/ml/datasets.html
  • http://kaggle.com
  • https://opendata.socrata.com/
  • http://data.gov/
  • http://hadoopilluminated.com/hadoop_illuminated/hadoop-illuminated.pdf (pages 64 and following)
  • www.Kdnuggets.com

Term Paper Topics on Big Data Titles:

  • Real Time Stream Processing, Analysis and Architecture
  • BIG DATA IN TELECOM SECTOR
  • Big Data in Human Resource
  • Recommender Systems and Big Data
  • Big Data and Blockchain Make a Powerful Couple for New Trend
  • Evolution in Master Data Management (MDM) to Address the Needs of the Big Data Revolution
  • Data Visualization in The Era of Big Data
  • Big Data in Banking & Finance
  • A Survey of Music Recommender Systems
  • Heart Failure Prediction based on Multi-Layer Perceptron Classifier
  • Data Mining Technology and Its Application in E-commerce
  • BIG DATA IN MANUFACTURING
  • Big Data In Baseball
  • Time series analysis using the hybrid of artificial neural network and Autoregressive integrated moving average (ARIMA) model
  • HEDGE FUNDS AND BIG DATA
  • BIG DATA IN FOOD INDUSTRY
  • Big Data and Machine Learning
  • Big Data in Deep Learning: How Big data is empowering AI and Machine Learning
  • Data Migration from Relational Database to NoSQL Database
  • Impact of Big Data on Video Games
  • Accelerating Data Management Systems with Machine Learning
  • Big Data in NBA
  • A Survey of Big Data in Sports
  • DATA VISUALIZATION IN THE WORLD OF BIG DATA
  • An Overview of Current RNA Sequencing Methods
  • Big Data Application: Medical Care Science
  • A Survey on Web Usage Mining

Grading Sheet for Term Papers: individual traits are graded 1 to 5, 5 being best.

  • Topic:
  • Content:
  • Knowledge Displayed:
  • Depth (Degree of Difficulty):
  • Clarity:
  • Style:
  • Overall:

Project Grading: Grades based on (range 0 to 5, average 3):

  • Presentation & Paper:
    • Abstract
    • Brief Description
    • Assumptions
    • Data Set - Variables
    • Data Analysis: Descriptive Stats
    • Defining Variables: Independent, Dependent
    • Fitting Model
    • Conclusion
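
The analysis items on this checklist (descriptive statistics, defining independent and dependent variables, fitting a model) can be sketched roughly as follows. This is a minimal illustration on a small made-up data set, not the instructor's code: the column names `hours_studied` and `exam_score`, the values, and the choice of a straight-line least-squares fit with `numpy.polyfit` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; replace with the data set chosen for the project.
data = pd.DataFrame({
    "hours_studied": [1.0, 2.0, 3.0, 4.0, 5.0],
    "exam_score": [52.0, 54.0, 56.0, 58.0, 60.0],
})

# Descriptive statistics: count, mean, std, min, quartiles, max per column.
summary = data.describe()
print(summary)

# Define the variables: X is independent (predictor), y is dependent (response).
X = data["hours_studied"].to_numpy()
y = data["exam_score"].to_numpy()

# Fit a straight line y = slope * X + intercept by ordinary least squares.
slope, intercept = np.polyfit(X, y, deg=1)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")

# Use the fitted line to predict the response for a new observation.
predicted = slope * 6.0 + intercept
print(f"predicted score for 6 hours: {predicted:.2f}")
```

A real project would likely need more than one predictor and some check of model fit; the sketch only shows how the checklist items connect to each other.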

How can I attach the code he provided in class to you?

I can't copy and paste the PowerPoint files he gave us; I need to attach them. I have only this code; the rest is in PowerPoint slides.

# coding: utf-8

# In[22]:

import pandas as pd

# Load the ESRD QIP file; every backslash in the Windows path must be escaped.
df = pd.read_csv(
    'C:\\Users\\Vic\\Documents\\SAINT PETERS\\DS 690\\Week 3 ch 6 made simple\\'
    + 'ESRD QIP - Complete QIP Data - Payment Year 2018.csv',
    header=0
)
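
Hard-coded Windows paths with escaped backslashes are easy to get wrong. One option is to build the path with `pathlib`, which handles the separators. The sketch below writes a tiny CSV to a temporary folder so that it runs anywhere; the folder name, file name, and file contents are made up for illustration and are not the class data.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Build the path piece by piece; no manual backslash escaping is needed.
    data_dir = Path(tmp) / "Week 3"
    data_dir.mkdir()
    csv_path = data_dir / "example data.csv"

    # Hypothetical two-row file standing in for the real CSV.
    csv_path.write_text("State,Total Performance Score\nCA,60\nNJ,No Score\n")

    # read_csv accepts a Path object directly.
    example = pd.read_csv(csv_path, header=0)

print(example.shape)
print(example.dtypes)
```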

# In[23]:

print(df)

# In[24]:

print('Number of rows: ' + str(df.shape[0]))

# In[25]:

print('Number of columns: ' + str(df.shape[1]))

# In[26]:

print(df.head(n=5))

# In[27]:

print(df.columns)

# In[28]:

for column in df.columns:
    print(column)

# In[30]:

# Count facilities per state.
df_states = df.groupby('State').size()
print(df_states)

# In[31]:

# Same counts, largest first.
df_states = df.groupby('State').size().sort_values(ascending=False)
print(df_states)

# In[33]:

# Ten states with the most facilities.
df_states = df.groupby('State').size().sort_values(ascending=False).head(n=10)
print(df_states)

# In[40]:

# Rows for California only.
df_ca = df.loc[df['State'] == 'CA']
print(df_ca)

# In[41]:

print(df.groupby('Total Performance Score').size())

# In[45]:

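# Keep only facilities that received a score; .copy() avoids a SettingWithCopyWarning on the assignment in the next cell.
df_filt = df.loc[df['Total Performance Score'] != 'No Score'].copy()

# In[46]:

# Convert the score column from strings to numbers.
df_filt['Total Performance Score'] = pd.to_numeric(df_filt['Total Performance Score'])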
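
An alternative to filtering out the `'No Score'` rows before converting is to let `pd.to_numeric` turn non-numeric entries into `NaN` with `errors='coerce'` and then drop those rows. The sketch below uses a small made-up frame (the facility names and scores are hypothetical), not the ESRD QIP file.

```python
import pandas as pd

# Hypothetical stand-in for the score column, stored as strings.
toy = pd.DataFrame({
    "Facility Name": ["A", "B", "C", "D"],
    "Total Performance Score": ["45", "No Score", "78", "60"],
})

# errors='coerce' converts anything unparseable (here 'No Score') to NaN.
toy["Total Performance Score"] = pd.to_numeric(
    toy["Total Performance Score"], errors="coerce"
)

# Drop the rows that had no usable score.
toy_filt = toy.dropna(subset=["Total Performance Score"])
print(toy_filt)
print(toy_filt["Total Performance Score"].mean())
```

Coercing also catches unexpected non-numeric values other than `'No Score'`, so it is worth counting how many rows were dropped before relying on the result.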

# In[53]:

# Sort ascending so the lowest-scoring facilities come first.
df_tps = df_filt[['Facility Name', 'State', 'Total Performance Score']].sort_values('Total Performance Score')
print(df_tps.head(n=5))

# In[54]:

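# Mean score per state; the string 'mean' gives the same result as np.mean and avoids a FutureWarning in recent pandas.
df_state_means = df_filt.groupby('State').agg({'Total Performance Score': 'mean'})
print(df_state_means.sort_values('Total Performance Score', ascending=False))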

# In[56]:

# Mean score and number of facilities per state ('size' counts the rows in each group).
df_state_means = df_filt.groupby('State').agg({'Total Performance Score': 'mean', 'State': 'size'})
print(df_state_means.sort_values('Total Performance Score', ascending=False))
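
The same per-state summary can also be written with pandas named aggregation, which gives each output column an explicit name instead of reusing `State` as both the index and a column. This is a minimal sketch on made-up data; the output names `mean_tps` and `n_facilities` are chosen here for illustration.

```python
import pandas as pd

# Hypothetical scores for two states.
toy = pd.DataFrame({
    "State": ["CA", "CA", "NJ", "NJ", "NJ"],
    "Total Performance Score": [60, 80, 50, 70, 60],
})

state_summary = (
    toy.groupby("State")
    .agg(
        # Each keyword is an output column: (input column, aggregation).
        mean_tps=("Total Performance Score", "mean"),
        n_facilities=("Total Performance Score", "size"),
    )
    .sort_values("mean_tps", ascending=False)
)
print(state_summary)
```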

# In[75]:

import os

# In[76]:

os.path.abspath("hvbp_tps_11_07_2017.csv")

# In[79]:

import pandas as pd

# Folder holding the hospital flat files; every backslash in the Windows path must be escaped.
pathname = 'C:\\Users\\Vic\\Documents\\SAINT PETERS\\DS 690\\Week 3 ch 6 made simple\\Hospital_Revised_Flatfiles\\'

files_of_interest = [
    'hvbp_tps_11_07_2017.csv',
    'hvbp_clinical_care_11_07_2017.csv',
    'hvbp_safety_11_07_2017.csv',
    'hvbp_efficiency_11_07_2017.csv',
    'hvbp_hcahps_11_07_2017.csv',
]

# Read each file into a DataFrame, keyed by file name.
dfs = {foi: pd.read_csv(pathname + foi, header=0) for foi in files_of_interest}

# In[80]:

for k, v in dfs.items():
    print(k + ' - Number of rows: ' + str(v.shape[0]) + ', Number of columns: ' + str(v.shape[1]))

# In[85]:

# Print every column name, with a blank separator line after each file.
for v in dfs.values():
    for column in v.columns:
        print(column)
    print(' ')

# In[87]:

# Left-join the clinical care file onto the TPS file by provider.
df_master = dfs[files_of_interest[0]].merge(
    dfs[files_of_interest[1]],
    on='Provider Number',
    how='left',
    copy=False
)
print(df_master.shape)

# In[88]:

print(df_master.columns)

# In[90]:

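# Rename any 'Provider_Number' column to 'Provider Number' so every file shares the same key name.
# The loop variable is called frame so that it does not overwrite the earlier df.
for frame in dfs.values():
    frame.columns = [col if col not in ['Provider_Number'] else 'Provider Number' for col in frame.columns]

# Left-join the remaining three files onto the master table.
for num in [2, 3, 4]:
    df_master = df_master.merge(
        dfs[files_of_interest[num]],
        on='Provider Number',
        how='left',
        copy=False
    )

print(df_master.shape)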
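
When chaining left merges like this, it can help to check that the key is unique in each table and to see which rows found no match. The sketch below uses two made-up frames; `validate` and `indicator` are standard `DataFrame.merge` parameters, while the frame names and values are hypothetical.

```python
import pandas as pd

tps = pd.DataFrame({
    "Provider Number": [1, 2, 3],
    "Total Performance Score": [55.0, 62.5, 48.0],
})
safety = pd.DataFrame({
    "Provider_Number": [1, 2],  # Same key, spelled with an underscore.
    "Safety Score": [8.0, 6.5],
})

# Harmonize the key name before merging.
safety = safety.rename(columns={"Provider_Number": "Provider Number"})

merged = tps.merge(
    safety,
    on="Provider Number",
    how="left",
    validate="one_to_one",  # Raises MergeError if a key is duplicated on either side.
    indicator=True,         # Adds a _merge column: 'both' or 'left_only' for a left join.
)
print(merged.shape)
print(merged["_merge"].value_counts())
```

Rows marked `left_only` are providers missing from the right-hand file; their new columns are filled with `NaN`.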

# In[91]:

for column in df_master.columns:
    print(column)
