
Question


Project Purpose: Demonstrate your ability to apply the modeling tools and methods to a data set of your own interest. The project is a way to put "it all together" in terms of the material taught to you. The project is not something to agonize over. Learn, experience, and have fun with data!

I want you to focus on the predictive goal for the project. The first step is the identification of variables. In this course, we will study predictive models for predicting a Y variable. Y variables come in two types:

1) Continuous Y variable. This is a Y variable that takes on a range of values (e.g., $, weight, rating scale, temperature, winning %). For such a variable, we will use the regression model.
2) Binary Y variable. This is a Y variable that takes on two values (yes/no). Examples might be whether a bank customer defaults on a loan or not, a student graduates or not, a customer returns or not, etc. For such a variable, we will use the logistic regression model.

Our coverage of the continuous Y variable will make up about 90% of the course. We will not get to the binary Y scenario until the very end of the course. So, I would like your project experience to be the building of models for a continuous Y variable (not a binary Y variable). Your project should minimally include a complete regression model build; you can consider adding regression trees if your data set is not a time series.

Predictor Variables/Data

After you have determined the target Y variable that you are trying to predict, the next step is to determine your "X" variables. X variables are the information that you think might be insightful in the prediction of the Y variable. You should target minimally 5-10 predictor variables. I would minimally target 30-50 observations; more is better.

Note: If your Y variable and X variables are time-oriented (e.g., Y is monthly sales), then the time frequency should be the same for the Y and X variables (month vs. month, week vs. week, daily vs. daily, etc.). Don't mix time frequencies (e.g., don't collect annual Y data and try to predict the Y variable with monthly X variables). If you are looking at sports data, then consider getting 2-3 consecutive seasons (e.g., two seasons of NFL teams gives 2*32 = 64 observations). If you are dealing with a time series on monthly sales, then get at least 3-5 years' worth, which will give you 36-60 observations; this gives the opportunity to see a repeat of a given month for seasonal estimation.
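To make the frequency rule concrete, here is a minimal Python sketch (Python is used only for illustration; the project itself is done in JMP). It assumes a situation the brief advises against, namely X values already collected monthly while Y is annual, and shows one possible way to bring X to the annual frequency by averaging. The function name `aggregate_to_annual` and all numbers are hypothetical and made up for this illustration; collecting Y and X at the same frequency in the first place remains the simpler route.

```python
from collections import defaultdict
from statistics import mean


def aggregate_to_annual(monthly):
    """Average {(year, month): value} readings into {year: mean value}."""
    by_year = defaultdict(list)
    for (year, _month), value in monthly.items():
        by_year[year].append(value)
    return {year: mean(values) for year, values in sorted(by_year.items())}


# Hypothetical monthly X values (made-up numbers, not real data).
monthly_x = {(2022, m): 100.0 + m for m in range(1, 13)}
monthly_x.update({(2023, m): 110.0 + m for m in range(1, 13)})

# One X value per year, so each annual Y observation has a matching X.
annual_x = aggregate_to_annual(monthly_x)
print(annual_x)
```

Averaging is only one choice; a sum or an end-of-period value may suit other variables better, and aggregating this way leaves far fewer observations than the 30-50 the brief asks for.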

Examples

Let me emphasize that the goal is to show me your ability to model a set of data. I am open to your data being from your company or some personal interest (data gathered on your own or from the internet). Here are some examples of past projects:

Y variable: Monthly Claims Expenses of a particular insurance company. X variables: Variety of monthly economic indicators such as medical CPI.
Y variable: Selling price of used Acuras. X variables: Mileage, age, transmission type, and a variety of accessory information.
Y variable: NBA teams' winning percentages. X variables: A slew of statistics (RBI, saves, etc.).

There are millions of possibilities. If the data are from your company, then that is your source. If you are looking externally on the internet, there are tens of thousands of sites. Google is your best friend. I have not had a student not come up with a data set to analyze.

Note: Data may be collected personally, from your business, or from a variety of internet sources. However, I will NOT allow the following sources:

1) Sample data found within software such as JMP or others (R, Minitab, etc.). These are not acceptable sources.
2) Data examples found in other statistics books.

Finance applications: Some students explore stock-related data. For such data, we generally look at the changes in prices relative to changes to prices rather than the prices per se. Looking at a single stock series and its changes is not sufficient for the project.

Make sure that your Y variable is not somehow defined by your X variables. Here are some problematic projects of the past:

1) Predicting a baseball team's winning % as a Y with one X being the number of runs scored in the season and the other X being the number of runs against the team in the season. These X's pretty much "define" the Y and nearly perfectly predict Y. No insights are gained.
2) Like the above problematic project, I had a student do a project of predicting how many hits her favorite baseball player had per game as a Y variable, and her X variables included the number of times at bat per game and the number of strikeouts per game. Other than the number of walks, her X variables pretty much "define" the Y and nearly perfectly predict Y. No insights are gained.
3) Predicting average monthly temperature in Milwaukee as a Y with X variables being the high and low temperatures of the month. Here again the X variables are almost one and the same as the Y variable. It would be no surprise that they nearly perfectly predict Y. No insights are gained.

Software

The main software for this course is JMP. The project must be done in JMP. It is fine to supplement JMP output with occasional Excel output.

Your final report is to be neat, clearly written, and concise. State clearly your objective, methods, and conclusion(s). Some weight will be given to the quality of report writing. Include necessary figures to permit the reader to understand how and why you reached your conclusion(s). I suggest that your report should minimally contain the following:

Introduction and statement of the problem being addressed.
Source of data.
Unambiguous explanation of what variable is being predicted and what variables are the potential predictor variables. Provide clear, objective definitions of the variables.
Presentation of the data and the data analysis (all relevant software output).
Conclusions and recommendations for further study.

It is critical that you show ALL output of your data journey (from initial plots to implementing model selection methods to model estimation to model diagnostic checks to model performance measures to final model). Beyond any standard regression hypothesis testing (e.g., t-test, F-test) that you may do, your project should include some type of model selection, whether stepwise with AIC or stepwise with data splitting (refer to lecture on model selection).
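For readers who want to see the idea behind "stepwise with AIC", here is a minimal Python sketch of forward stepwise selection. It is an illustration only and not a substitute for the JMP output the project requires. Assumptions: the AIC is taken in its Gaussian least-squares form, n*ln(RSS/n) + 2k, up to an additive constant (JMP may report a different variant, such as a small-sample corrected AIC); the data are synthetic; and the names `ols_rss`, `aic`, and `forward_stepwise` are hypothetical helpers written for this sketch.

```python
import math
import random


def ols_rss(columns, y):
    """Residual sum of squares of a least-squares fit (with intercept)."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    # Normal equations (A^T A) b = A^T y, stored as an augmented matrix.
    aug = [[sum(design[i][r] * design[i][c] for i in range(n)) for c in range(p)]
           + [sum(design[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    # Assumes the predictors are not perfectly collinear.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(p):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    beta = [aug[r][p] / aug[r][r] for r in range(p)]
    return sum((y[i] - sum(b * v for b, v in zip(beta, design[i]))) ** 2
               for i in range(n))


def aic(columns, y):
    """Gaussian AIC up to a constant: n*ln(RSS/n) + 2k, k = fitted coefficients."""
    n = len(y)
    k = len(columns) + 1  # predictors plus the intercept
    return n * math.log(ols_rss(columns, y) / n) + 2 * k


def forward_stepwise(predictors, y):
    """Add the predictor that lowers AIC the most until no addition helps."""
    selected = []
    best = aic([], y)  # intercept-only model
    remaining = sorted(predictors)
    while remaining:
        score, name = min(
            (aic([predictors[s] for s in selected + [cand]], y), cand)
            for cand in remaining
        )
        if score >= best:
            break
        best = score
        selected.append(name)
        remaining.remove(name)
    return selected, best


# Synthetic data: y depends on x1 and x3 only; x2 and x4 are pure noise.
random.seed(0)
n_obs = 60
predictors = {name: [random.gauss(0, 1) for _ in range(n_obs)]
              for name in ("x1", "x2", "x3", "x4")}
y = [3 + 2 * predictors["x1"][i] - 1.5 * predictors["x3"][i] + random.gauss(0, 0.5)
     for i in range(n_obs)]

selected, final_aic = forward_stepwise(predictors, y)
print(selected, round(final_aic, 2))
```

On this synthetic data the procedure picks up x1 and x3; a noise variable can occasionally slip in as well, which is one reason the report should follow selection with diagnostic checks and comments on what the output shows.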

I expect comments with all the output. The comments do not have to be long; a few sentences telling the reader what you see and what your next steps are will do. Do not put output in an Appendix. Integrate the output of your data journey into your report write-up.

Finally, you are not graded on how well your model predicts. Some students will have models that explain a high % of the data while others will find that their X's didn't do so well. You are not graded on this dimension. Data are the data. Let the data speak, whatever the results are. In the end, you are graded on your professionalism of presentation, your completeness in providing all the steps of your data journey, and your comments along the way. Wrap up the paper with a conclusion and next steps.

Remember that this is your project (not mine). My role is not to "do" your project. So, have confidence in yourself to tell me a data story. So, please don't send me a draft of the project prior to submission asking, "Is it okay?"

I am presuming that you will create the report in a Word document. Name your Word document with your last name, e.g., "Smithproject.docx". Also, I ask that you electronically submit the data file. Submit the data file as a CSV file. This means that you will submit two files (no less, no more): (1) the report; and (2) the CSV data file.
