Question

1 Approved Answer

Posted on Sep 21, 2024

I have Python homework. Please help me solve it, and please also share the code as text. (Only file I/O, string formatting, NumPy and NumPy functions, engineering applications of NumPy, and basic Matplotlib are allowed.)

[{"metadata":{},"cell_type":"markdown","source":"### Time Series Data The `daily_energy.csv` file includes the daily power consumption, wind power generation and solar power generation of Germany from January 1st 2015 to December 31st 2017. The first row has the column names (Date, Consumption, Wind, Solar); the following rows have their values. There are no missing values. Fill in the functions given below. Coding requirements are written as comments!"},{"metadata":{"trusted":true},"cell_type":"code","source":"# Imports. Do not add any external libraries other than NumPy and Matplotlib import numpy as np import matplotlib.pyplot as plt","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"def read_data(fname): ''' Load the data given in the file in a desired format. Inputs: Path for the csv file (fname). Outputs: Dictionary of numpy arrays. See below for details The first row includes the column names. The rest contain the corresponding values. The first column is always the date in the format: YYYY-MM-DD (e.g. May 4, 2016 is given as 2016-05-04) You need to store the date in a numpy array as a string and put it in the dictionary with the 'Date' key. You also need to parse this into Year, Month and Day. Your return dictionary should have these as keys, with the numpy arrays that will contain the parsed numbers as values. Note that these numpy arrays must be integers. The other columns are general. You can assume they are all floats. The dictionary keys come from the first row. Their values are numpy arrays. For example: - Let the first row be: Date,Consumption,Wind,Solar - Let the i'th row be: 2016-05-04,1430.1360,84.9610,174.1850 Then: dict['Date'][i] = '2016-05-04' dict['Year'][i] = 2016 dict['Month'][i] = 5 dict['Day'][i] = 4 dict['Consumption'][i] = 1430.1360 dict['Wind'][i] = 84.9610 dict['Solar'][i] = 174.1850 You do not have to build the numpy array row by row. You can do other stuff too. 
This example is provided to clarify the values at the i'th row. Note that the dictionary values should be numpy arrays! WARNING: Other than the date column, other columns might be different during grading! You need to write general code ''' data = {} # Fill in the data as described above # Do not edit below this line return data","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"# Read data: energy_data = read_data('daily_energy.csv') # Simple tests: desired_keys = ['Date', 'Year', 'Month', 'Day', 'Consumption', 'Wind', 'Solar'] desired_num_cols = 1096 averages = {'Consumption':1383.146247, 'Wind':234.620880, 'Solar':96.124640} for key in desired_keys: if key not in energy_data.keys(): print(f'Missing key {key} in the dictionary') for key in energy_data.keys(): if not isinstance(energy_data[key],np.ndarray): print(f'The value corresponding to the key, {key}, in the dictionary is not a numpy array') elif energy_data[key].shape[0] != desired_num_cols: print(f'The numpy array corresponding to {key} does not have the correct number of elements {desired_num_cols} vs {energy_data[key].shape[0]}') for key, item in averages.items(): tmp = energy_data[key].mean() if abs(tmp-averages[key]) > 1e-6: print(f'The correct average ({averages[key]:.2f}) and the average of the dictionary ({tmp:.2f}) do not match.')","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"def plot_data(data, keys = None, date_range = None, ylabel = 'GWh', title = 'Germany Electricity'): ''' Plot the desired values given the keys. Input: - data: dictionary of data, at least containing the key 'Date' - keys: The list of keys whose values will be plotted. If None, all the keys of the dictionary should be plotted - date_range: Tuple of the first and last dates to plot. If None, use the date range given in the data. You can assume that the date is always increasing. 
An example: ('2015-01-01', '2017-12-31') - ylabel: The desired ylabel of the figure - title: The desired title of the figure Output: The handle of the plotted figure (done for you, do not change it!) Plotting requirements: - This should be a line plot (plt.plot(...)). The x's are the days, the y's are the values in the numpy arrays given by the keys. The style of the lines should be left as defaults - x-axis: The data within the given range. The plot will be for each day (entry) within the range. The ticks and their labels will be different (see below) You can assume the limits are already within the data (e.g. no 2014 or 2018 for the given example file) - y-axis: the numpy values corresponding to keys - The y-ticks should be left as default. - The x-ticks should be on the first day of March, June, September and December with the label \"MarYY\", \"JunYY\", \"SepYY\" and \"DecYY\" respectively where YY is the last two digits of the year. For example Jun16, Dec17 ... - The tick labelsize should be 12 for both axes. - xlabel: Should be 'Date'. Should have a fontsize of 18 - ylabel and title: Given as an input. Both should have a fontsize of 18 - Legend: The legend should have the keys corresponding to the lines. It should have a fontsize of 18 - Extra Text: Write the mean and standard deviation of the keys within the given time range somewhere on the figure. If there are multiple keys, write this for all of them. Play around with the placement to make sure the rest of the plot is visible! Hint: To set the fontsizes, look at the HistogramApplication.ipynb. 
For the legend, it is set the same way as the labels ''' plt.figure(figsize=(12,8)) # Your code goes below this line # Do not edit below this line return plt.gcf()","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"Example pictures are given with the homework (attaching them here makes the notebook impractically large and we do not want you to submit the notebook with images)"},{"metadata":{"trusted":true},"cell_type":"code","source":"all_data_plot = plot_data(energy_data) all_data_plot.savefig('alldata.png',dpi=300)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"subset_plot = plot_data(energy_data,['Wind','Solar']) subset_plot.savefig('subset.png',dpi=300)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"shorter_date_plot = plot_data(energy_data,date_range=('2015-07-01','2017-01-31')) shorter_date_plot.savefig('shorter.png',dpi=300)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"shorter_date_subset_plot = plot_data(energy_data,['Consumption','Wind'],date_range=('2016-01-01','2017-06-30')) shorter_date_subset_plot.savefig('shortersubset.png',dpi=300)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"def fit_line(data, key1, key2): ''' Fit a line between the values given by key1 and key2. Return the coefficients and the R2 score. Input: - data: dictionary of data - key1: The key of the first column (x) - key2: The key of the second column (y) Output: - coef: A list of values that correspond to the linear fit between x and y. i.e. y = coef[0]*x+coef[1] - r2: The R2 value. 
Note that you can calculate this using the residuals which can be obtained using the lstsq function of the numpy linear algebra module ''' coef = [0,0] r2 = 0 # Fill in the coef and the r2 variables with their correct values # Your code goes below this line # Do not edit below this line return coef, r2","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"print(''' Expected Output (close to): [ -0.14 128.93] [0.15] [-1.41e-01 2.92e+02] [0.14] [ 0.11 88.49] [0.0097] ''') coefWS,r2WS = fit_line(energy_data, 'Wind', 'Solar') print(coefWS,r2WS) coefCS,r2CS = fit_line(energy_data, 'Consumption', 'Solar') print(coefCS,r2CS) coefCW,r2CW = fit_line(energy_data, 'Consumption', 'Wind') print(coefCW,r2CW)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"def plot_xy_line(data, key1, key2, coef): ''' Plot the key1 vs key2 and the line defined by the coef Input: - data: dictionary of data - key1: The key of the first column (x) - key2: The key of the second column (y) - coef: The coefficients of the linear function between the keys (coef[0]*x+coef[1]) Output: The handle of the plotted figure (done for you, do not change it!) Plotting requirements: - This should be a line plot (plt.plot(...)). The key1 vs key2 should only be diamond markers (hint: 'd'). The line should have the default parameters - xlabel: Should be key1. Should have a fontsize of 18 - ylabel: Should be key2. Should have a fontsize of 18 - Legend: Should be f'{key1} vs {key2}' and 'Fit' ''' plt.figure(figsize=(12,8)) # Your code goes below this line # Do not edit below this line return plt.gcf() ","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"Since these are close to trivial, we are not providing example plots. One hint, they do not look like good fits. 
R2 being close to 0 in each case already tells us this."},{"metadata":{"trusted":true},"cell_type":"code","source":"ws_plot = plot_xy_line(energy_data,'Wind','Solar',coefWS)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"cs_plot = plot_xy_line(energy_data,'Consumption','Solar',coefCS)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"cw_plot = plot_xy_line(energy_data,'Consumption','Wind',coefCW)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"def plot_hist(data, key, bin_width = None, date_range = None): ''' Plot the histogram of the values corresponding to the given key, in the given date_range. Input: - data: dictionary of data - key: The key of the values we want to plot - bin_width: The approximate width of each bin. When you are calculating the number of bins, round the number up. If None, use 10 bins by default. - date_range: Tuple of the first and last dates to plot. If None, use the date range given in the data. You can assume that the date is always increasing. Output: - The handle of the plotted figure (done for you, do not change it!) - The number of items in each bin (set the counts variable) - The bin edges (set the bins variable) Hint: The last two are the outputs of the hist function ''' counts = 0 bins = 0 plt.figure(figsize=(12,8)) # Your code below this line # Do not edit below this line return plt.gcf(), counts, bins","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"This should also be easy to handle. 
Just pay attention to how to set the bins."},{"metadata":{"trusted":true},"cell_type":"code","source":"wh, whc, whe = plot_hist(energy_data,'Wind')","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"sh, shc, she = plot_hist(energy_data,'Solar')","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"ssh, sshc, sshe = plot_hist(energy_data,'Solar',25,('2017-06-01','2017-08-30'))","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"if(len(sshc)!=8): print('Check how you calculate the number of bins given bin width!')","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"def energy_generation_pie(data, date_range = None): ''' This function is specific to the given file. You should plot the pie chart of energy generation through wind, solar and other means. The other means can be calculated as Consumption - (Wind + Solar). This should be based on the averages in the given date range. Each slice/wedge should include the key name! Input: - data: dictionary of data - date_range: Tuple of the first and last dates to plot. If None, use the date range given in the data. You can assume that the date is always increasing. Output: - The handle of the plotted figure (done for you, do not change it!) - The averages of each key as a dictionary ''' # Fill this correctly avg_dict = {'Other':-1,'Wind':-1,'Solar':-1} plt.figure(figsize=(12,8)) # Your code below this line # Do not edit below this line return plt.gcf(), avg_dict ","execution_count":null,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"You are already given a way to check whether your averages are correct! 
(See the cell right after the part where you filled in `read_data`.)"},{"metadata":{"trusted":true},"cell_type":"code","source":"pc, avgs = energy_generation_pie(energy_data)","execution_count":null,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"#We expect Solar contribution to increase pcs, avgss = energy_generation_pie(energy_data,('2017-06-01','2017-08-30'))","execution_count":null,"outputs":[]}]
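
The non-plotting parts of the notebook (loading the CSV, restricting to a date range, the least-squares fit with its R2 score, and the bin count for the histogram) can be sketched with file I/O and NumPy only. This is a minimal sketch under stated assumptions, not the graded reference solution: the helpers `date_mask` and `num_bins` are hypothetical names that do not appear in the notebook, the CSV is assumed to be comma-separated with no quoted fields, and the bin count is assumed to be the value range divided by `bin_width`, rounded up.

```python
import numpy as np


def read_data(fname):
    """Load a CSV whose first column is a YYYY-MM-DD date into a dict of arrays."""
    with open(fname, "r") as f:
        header = f.readline().strip().split(",")
        # Skip blank lines so a trailing newline does not create an empty row.
        rows = [line.strip().split(",") for line in f if line.strip()]

    # A 2-D string array lets each column be sliced in one step.
    table = np.array(rows)
    dates = table[:, 0]

    data = {"Date": dates}
    # Parse 'YYYY-MM-DD' by fixed character positions into integer arrays.
    data["Year"] = np.array([int(d[0:4]) for d in dates])
    data["Month"] = np.array([int(d[5:7]) for d in dates])
    data["Day"] = np.array([int(d[8:10]) for d in dates])

    # Remaining columns are general: key from the header, values as floats.
    for j, name in enumerate(header[1:], start=1):
        data[name] = table[:, j].astype(float)
    return data


def date_mask(data, date_range=None):
    """Hypothetical helper: boolean mask of rows inside an inclusive date range."""
    dates = data["Date"]
    if date_range is None:
        return np.ones(dates.shape[0], dtype=bool)
    first, last = date_range
    # ISO dates sort correctly as plain strings, so string comparison suffices.
    return (dates >= first) & (dates <= last)


def fit_line(data, key1, key2):
    """Fit y = coef[0]*x + coef[1] and return the coefficients and R2."""
    x, y = data[key1], data[key2]
    # Design matrix with a column of ones for the intercept.
    A = np.column_stack((x, np.ones_like(x)))
    coef, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
    # lstsq returns the sum of squared residuals as a 1-element array.
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - residuals / ss_tot
    return coef, r2


def num_bins(values, bin_width=None):
    """Hypothetical helper: bin count from an approximate width, rounded up."""
    if bin_width is None:
        return 10  # default stated in the plot_hist docstring
    return int(np.ceil((values.max() - values.min()) / bin_width))
```

With these in place, something like `mask = date_mask(energy_data, ('2017-06-01', '2017-08-30'))` followed by `energy_data['Solar'][mask]` gives the values to pass to `plt.hist(..., bins=num_bins(...))`, and the same mask can feed the means and standard deviations that `plot_data` and `energy_generation_pie` ask for. Because `residuals` is a one-element array, `r2` comes back as a one-element array as well, which matches the bracketed form shown in the notebook's expected output.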