[Solved] import pandas as pd import os DATA_DIR =

Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Sep 22, 2024

import pandas as pd import os DATA_DIR = 'data' # -- covered this - this week - symbol_to_path() # -- provided - symbol to path

image text in transcribed

import pandas as pd import os DATA_DIR = 'data' # -- covered this - this week - symbol_to_path() # -- provided - symbol to path --- # -- sample function ------------- # keep directory paths portable across platforms def symbol_to_path( symbol, base_dir = os.path.join( ".", DATA_DIR ) ): """ Returns a string - which is the CSV file relative path given a ticker symbol. Example: symbol_to_path( GOOG, data_yyy ) returns: data_yyy/GOOG.csv (UNIX) Default: subdirectory name is DATA_DIR - hard-coded for now. """ return os.path.join(base_dir, "{}.csv".format( str(symbol) )) # Single symbol change def get_StockDataHD( symbol, dates ): """Read stock data (adjusted close) for given symbols from CSV files. format of CSV file is consistent to a yahoo finance downlaod """ # 0a. Create an empty frame indexed by a date range df = pd.DataFrame( index = dates ) print( f"before strip -- head(2) of df( {symbol} ): " ) print( df.head(2) ) # 0b. strip of the time stamp, just keep the dates df.index = pd.to_datetime(df.index, utc=True).date # just keep the date strip time print( f"after strip head(2) of df( {symbol} ): " ) print( df.head(2) ) # exit(0) symbols = ['GLD', 'GOOG', 'SPY'] # STEP XX --------------- add a for loop here --------------------------------- for symbol in symbols: # 1. read data from symbol # --> we need to use 'symbol_to_path' to be portable across platform datafile = symbol_to_path( symbol ) # portability df_temp = pd.read_csv( datafile, index_col='Date', parse_dates=True, usecols=['Date', 'Adj Close'], na_values=['nan'] ) df_temp = df_temp.rename( columns={'Adj Close': symbol} ) # --> Challenge part of LAB # left JOIN # df - restricted range of dates restricted by the parameter # df - all dates in file - has some holes on dates it didn't trace (e.g., weekends) # Keep continues dates and FILL with NaNs for data that is lacking. # left join - use callees index ("df" below), which is the given date range, fill with NaN # right join - use dates in downloaded file (only) df = df.join( df_temp, how='left') return df if __name__ == '__main__': date_start = "2020-01-01" date_end = "2020-02-28" symbol = 'GOOG' dates = pd.date_range( date_start, date_end ) print(f"first date {dates[0]} last date {dates[-1]}") df = get_StockDataHD( symbol, dates ) print( f" data frame = {df.head(10)}")

Task 1: modify as follows: def get_StockDataHD( symbol, dates ) 1) input parameter takes a list of symbols instead of a single symbol: test: ['SPY', 'CVNA','XOM', 'AMZN', 'GLD'] use date range: 2020-03-31 2) add a loop that adds symbols iteratively to the frame using join. Example Output (you should reproduce this output): Task 1: Suggested Steps 1. for symbol in symbols: to read in each symbol file 2. use rename() to rename 'Adj Close' column (already done) by symbol string. 3. next you use a left join() that's the default it simply retains the keys (index) from the left frame only, so you get the full range of the left frame. (left frame is the frame that's on the left of "." which is df in the below join: df = df.join( df_temp, how = left' ) df_temp is on the right. 4. make sure the return of the df is not inside the for loop. save frame as csv file name it task1.csv