Question
Problem 2. This problem is continuing from Assignment 7.
1. Suppose you have a $100,000 account. Starting from day 1500, at the end of the trading day, you decide whether to long (meaning buy) the stock based on the ARIMA model prediction for day 1501. If the prediction is positive, then you long (buy) using all your money at the adjusted closing price of day 1500, and close (meaning sell) at the end of day 1501 using the corresponding adjusted closing price of day 1501. Otherwise you do nothing. We neglect transaction costs here. What is the value of your portfolio at the end of the process? Note: if you buy on day 1500, then you will sell at the end of day 1501 (using the adjusted closing price of day 1501). If the prediction for day 1502 is also positive, then you will use all your money to buy again at the end of day 1501 and sell on day 1502. (Effectively this means you buy on day 1500 and sell on day 1502.)
2. Now, do the same thing as in the previous step, except that you long (buy) only when the predicted return is greater than the average return of the training set (the first 1500 days). What is the value of your portfolio at the end of the process?
3. Now, do the same thing as the previous step, except that you long(buy) only when the predicted return is greater than the mean return of the training set + 0.25 standard deviation of the time series (the first 1500). What is the value of your portfolio at the end of the process?
4. Do the same thing as the previous step, except that you long only when the predicted return is greater than the average return of the past twenty trading days + k times the standard deviation of the time series, where k = 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5. Use a for loop for this computation. What's the value of your portfolio at the end of the process for each k? Which k is optimal?
## ---------------------------------------------------------------------------
## ARIMA modelling and rolling out-of-sample backtesting of SPY daily returns.
## Training set: first 1500 observations; holdout: the remainder.
## ---------------------------------------------------------------------------

# Install only when missing (avoids reinstalling on every run).
if (!requireNamespace("quantmod", quietly = TRUE)) install.packages("quantmod")
library(quantmod)
if (!requireNamespace("fpp", quietly = TRUE)) install.packages("fpp")
library(fpp)
library(forecast)

# Define the start and end dates
start_date <- as.Date("2013-10-01")
end_date <- as.Date("2019-12-31")

# Download SPY data (getSymbols creates the xts object `SPY` in the workspace)
getSymbols("SPY", from = start_date, to = end_date)

# Daily log returns of the adjusted close.
# FIX: the column is named log_return, but the original call relied on
# dailyReturn()'s default type = "arithmetic"; request log returns explicitly.
SPY$log_return <- dailyReturn(Ad(SPY), type = "log")

# Split into training (first 1500 rows) and holdout sets
split_point <- 1500
training_data <- SPY[1:split_point, ]
holdout_data <- SPY[(split_point + 1):nrow(SPY), ]
log_returns <- training_data$log_return[!is.na(training_data$log_return)]

## Question 1 ----
# auto.arima() is run to inspect the automatically selected order; the model
# actually carried forward is the ARMA(1,1) refit on the next line.
m1 <- auto.arima(log_returns)
m1 <- arima(log_returns, order = c(1, 0, 1))
summary(m1)

## Question 2 ----
acf(log_returns, lag.max = 40)
pacf(log_returns, lag.max = 40)
m2 <- arima(log_returns, order = c(0, 0, 1))  # MA(1) model (model 2)
m3 <- arima(log_returns, order = c(2, 0, 0))  # AR(2) model (model 3)

## Question 3 ----
# Rolling one-step(-to-h-step)-ahead backtest (Tsay-style).
#   m1:       a fitted arima model; its order is read from m1$arma
#   rt:       the full return series
#   orig:     first forecast origin
#   h:        forecast horizon
#   xre:      optional exogenous regressors
#   fixed:    optional parameter constraints
#   inc.mean: include a constant term when refitting?
# Prints and returns per-horizon RMSE and mean absolute error of the
# out-of-sample forecast errors, plus the raw error matrix.
# FIX: in the pasted original, the `regor` assignment sat behind a `#`,
# leaving it undefined when arima() used it below; it is restored here.
backtest <- function(m1, rt, orig, h, xre = NULL, fixed = NULL,
                     inc.mean = TRUE) {
  regor <- c(m1$arma[1], m1$arma[6], m1$arma[2])
  seaor <- list(order = c(m1$arma[3], m1$arma[7], m1$arma[4]),
                period = m1$arma[5])
  T <- length(rt)
  if (orig > T) orig <- T
  if (h < 1) h <- 1
  rmse <- rep(0, h)
  mabso <- rep(0, h)
  nori <- T - orig            # number of forecast origins
  err <- matrix(0, nori, h)
  jlast <- T - 1
  for (n in orig:jlast) {
    jcnt <- n - orig + 1
    x <- rt[1:n]
    pretor <- if (is.null(xre)) NULL else xre[1:n]
    # Refit the same specification using only data up to origin n
    mm <- arima(x, order = regor, seasonal = seaor, xreg = pretor,
                fixed = fixed, include.mean = inc.mean)
    nx <- if (is.null(xre)) NULL else xre[(n + 1):(n + h)]
    fore <- predict(mm, h, newxreg = nx)
    kk <- min(T, n + h)
    nof <- kk - n             # effective number of forecasts at origin n
    pred <- fore$pred[1:nof]
    obsd <- rt[(n + 1):kk]
    err[jcnt, 1:nof] <- obsd - pred
  }
  for (i in 1:h) {
    iend <- nori - i + 1      # horizon i has fewer usable origins
    tmp <- err[1:iend, i]
    mabso[i] <- sum(abs(tmp)) / iend
    rmse[i] <- sqrt(sum(tmp^2) / iend)
  }
  print("RMSE of out-of-sample forecasts")
  print(rmse)
  print("Mean absolute error of out-of-sample forecasts")
  print(mabso)
  list(origin = orig, error = err, rmse = rmse, mabso = mabso)
}

results_m1 <- backtest(m1, rt = SPY$log_return, orig = 1500, h = 1)
results_m2 <- backtest(m2, rt = SPY$log_return, orig = 1500, h = 1)
results_m3 <- backtest(m3, rt = SPY$log_return, orig = 1500, h = 1)

## Question 4 ----
# Rolling backtest that additionally tracks directional accuracy.
# Same refitting scheme as backtest(); returns the fraction of forecasts
# whose sign matched the realised return, plus per-horizon MSE/RMSE.
rolling_accuracy <- function(model, returns, orig, h, xre = NULL,
                             fixed = NULL, inc.mean = TRUE) {
  regor <- c(model$arma[1], model$arma[6], model$arma[2])
  seaor <- list(order = c(model$arma[3], model$arma[7], model$arma[4]),
                period = model$arma[5])
  T <- length(returns)
  if (orig > T) orig <- T
  if (h < 1) h <- 1
  correct_predictions <- 0
  jlast <- T - 1
  sse <- rep(0, h)            # running sum of squared errors per horizon
  for (n in orig:jlast) {
    x <- returns[1:n]
    pretor <- if (is.null(xre)) NULL else xre[1:n]
    mm <- arima(x, order = regor, seasonal = seaor, xreg = pretor,
                fixed = fixed, include.mean = inc.mean)
    nx <- if (is.null(xre)) NULL else xre[(n + 1):(n + h)]
    fore <- predict(mm, h, newxreg = nx)
    predictions <- fore$pred
    actual_returns <- returns[(n + 1):(n + h)]
    predicted_direction <- ifelse(predictions > 0, 1, -1)
    actual_direction <- ifelse(actual_returns > 0, 1, -1)
    correct_predictions <- correct_predictions +
      sum(predicted_direction == actual_direction)
    for (i in 1:h) {
      sse[i] <- sse[i] + (actual_returns[i] - predictions[i])^2
    }
  }
  n_origins <- jlast - orig + 1
  # FIX: the original divided by (h * jlast), i.e. by every day up to T-1,
  # not by the number of out-of-sample forecasts actually made, which
  # badly understated the hit rate.
  accuracy <- correct_predictions / (h * n_origins)
  # FIX: the original returned the accumulated *sum* of squared errors as
  # "mse"; report the mean, with RMSE as its square root.
  mse <- sse / n_origins
  rmse <- sqrt(mse)
  list(accuracy = accuracy, rmse = rmse, mse = mse)
}

results_model_1 <- rolling_accuracy(m1, SPY$log_return, 1500, 1)
results_model_2 <- rolling_accuracy(m2, SPY$log_return, 1500, 1)
results_model_3 <- rolling_accuracy(m3, SPY$log_return, 1500, 1)

cat("Percentage of Correct Predictions for Model 1: ",
    results_model_1$accuracy * 100, "% ")
cat("RMSE for Model 1: ", results_model_1$rmse, " ")
cat("MSE for Model 1: ", results_model_1$mse, " ")
cat("Percentage of Correct Predictions for Model 2: ",
    results_model_2$accuracy * 100, "% ")
cat("RMSE for Model 2: ", results_model_2$rmse, " ")
cat("MSE for Model 2: ", results_model_2$mse, " ")
cat("Percentage of Correct Predictions for Model 3: ",
    results_model_3$accuracy * 100, "% ")
cat("RMSE for Model 3: ", results_model_3$rmse, " ")
cat("MSE for Model 3: ", results_model_3$mse, " ")

# here is my assignment 7 for reference
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started