In this project, we'll be looking at the performance of eight technology and retail stocks during the coronavirus pandemic. We'll pull information on each stock from Yahoo Finance and visualize different aspects of the data. We'll investigate the effects of the pandemic and we'll cover a few ways of analyzing the risks of each stock based on historical performance. Finally, we will predict future stock prices through a Monte Carlo method.
We'll be answering the following questions along the way:
Let's start by pulling our stock info and performing some initial exploratory analysis.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
from datetime import datetime
from datetime import date
# To read our stock information from Yahoo:
import pandas_datareader as pdr
sns.set_style('whitegrid')
# Identify our stocks - four tech, four retail
ticker_list = ['AAPL','GOOG','MSFT','AMZN','COST','WBA','WMT','HD']
# Set timeframe
end = date(2020,8,19)
start = datetime(end.year-1,end.month,end.day)
# Pull stock info and assign ticker as name of dataframe
for stock in ticker_list:
globals()[stock] = pdr.get_data_yahoo(stock,start,end)
Now, let's do a bit of exploration on one of the stocks to better understand the info in our dataframes. We'll pick AAPL.
AAPL.describe()
AAPL.info()
Awesome! Now let's get historical views of the closing prices for each stock.
fig, axes = plt.subplots(4,2,figsize=(16,16))
frames = [AAPL,GOOG,MSFT,AMZN,COST,WBA,WMT,HD]
for i, ax in enumerate(axes.flatten()):
frames[i]['Adj Close'].plot(legend=True,ax=ax)
ax.set_title(label=ticker_list[i], fontsize=16, fontweight='bold', pad=10)
ax.set_xlabel(None)
plt.tight_layout(h_pad=2)
Interesting, the tech stocks have seen positive growth over the past year, especially during the coronavirus pandemic time period. Retail is a bit less consistent, however, with COST, WMT and especially HD seeing growth during the pandemic, but not WBA.
Now let's get a historical view of the daily trading volume.
fig, axes = plt.subplots(4,2,figsize=(16,16))
frames = [AAPL,GOOG,MSFT,AMZN,COST,WBA,WMT,HD]
for i, ax in enumerate(axes.flatten()):
frames[i]['Volume'].plot(legend=True,ax=ax)
ax.set_title(label=ticker_list[i], fontsize=16, fontweight='bold', pad=10)
ax.set_xlabel(None)
plt.tight_layout(h_pad=2)
Great, we can see a clear uptick in volume traded during the first few months of the pandemic across each stock.
Now that we have a good idea of the price and volume over the past year, let's make things a bit more interesting and calculate the moving average for each stock.
fig, axes = plt.subplots(4,2,figsize=(20,16))
ma_size = [10,20,50]
for i, ax in enumerate(axes.flatten()):
for ma in ma_size:
col = "MA for %s days" %(str(ma))
frames[i][col] = frames[i]['Adj Close'].rolling(ma).mean()
frames[i][['Adj Close','MA for 10 days','MA for 20 days','MA for 50 days']].plot(legend=True,ax=ax)
ax.set_title(label=ticker_list[i], fontsize=16, fontweight='bold', pad=10)
ax.set(xlabel=None)
plt.tight_layout(h_pad=2)
Awesome! Again, we see fairly consistent behavior between the tech stocks' moving averages (Note, however, AMZN does not show as much of an inital downward dip at the beginning of the pandemic).
Now that we've done some baseline analysis, let's analyze the risk of these stocks. We'll start by calculating daily returns.
fig, axes = plt.subplots(4,2,figsize=(20,16))
for i, ax in enumerate(axes.flatten()):
frames[i]['Daily Return'] = frames[i]['Adj Close'].pct_change()
frames[i]['Daily Return'].plot(legend=True,linestyle='--',marker='o', ax=ax)
ax.set_title(label=ticker_list[i], fontsize=16, fontweight='bold', pad=10)
ax.set(xlabel=None)
plt.tight_layout(h_pad=2)
Notice the substantial increase in volatility during the COVID-19 pandemic in March, quite a change!
Let's take a look at the average daily returns in a histogram, to get a better sense of their distribution.
fig = plt.figure(figsize=(16,8))
g = fig.add_gridspec(2,4)
ax1 = fig.add_subplot(g[0, 0:1])
ax2 = fig.add_subplot(g[0, 1:2])
ax3 = fig.add_subplot(g[0, 2:3])
ax4 = fig.add_subplot(g[0, 3:4])
ax5 = fig.add_subplot(g[1, 0:1])
ax6 = fig.add_subplot(g[1, 1:2])
ax7 = fig.add_subplot(g[1, 2:3])
ax8 = fig.add_subplot(g[1, 3:4])
axes = [ax1,ax2,ax3,ax4,ax5,ax6,ax7,ax8]
colors = ['Red','Orange','y','Green','Blue','Indigo','Violet','Brown']
for i, ax in enumerate(axes):
sns.distplot(frames[i]['Daily Return'].dropna(),bins=100, ax=ax,
kde_kws={'color':'darkolivegreen','label':'Kde','gridsize':1000,'linewidth':2},
hist_kws={'color':'goldenrod','label':"Histogram",'edgecolor':'darkslategray'})
ax.set_title(label=ticker_list[i], fontsize=16, fontweight='bold', pad=10)
ax.set(xlabel=None)
plt.tight_layout(h_pad=2)
Awesome! Now, let's put our stocks' return info into one table so we can more easily calculate correlations between them.
Adj_Close_df = pdr.get_data_yahoo(ticker_list,start,end)['Adj Close']
returns_df = Adj_Close_df.pct_change()
returns_df.head()
Let's take a look at each pair of stock returns to see their correlations.
# Define corrfunc to plot pearson coefficient in top left corner of each plot
from scipy.stats import pearsonr
def corrfunc(x,y,ax=None, **kws):
r, _ = pearsonr(x, y)
ax = plt.gca()
# Unicode for lowercase rho (ρ)
rho = '\u03C1'
ax.annotate(f'{rho} = {r:.2f}', xy=(.1, .9), xycoords=ax.transAxes)
# Plot each pair's returns to see their correlation
g = sns.PairGrid(returns_df.dropna())
g.map_lower(corrfunc)
g.map_lower(plt.scatter)
g.map_upper(sns.kdeplot,cmap='Blues')
g.map_diag(plt.hist)
#plt.show()
Let's also look at the heatmap of correlations to get a better sense of what's going on.
plt.figure(figsize=(10,8))
sns.heatmap(returns_df.corr(),annot=True,square=True,cmap='Blues')
Interesting... The most highly-correlated pairs (p-val >= 0.8) all come from tech (specifically MSFT, GOOG & AAPL). This makes sense, given the common trend we saw in their Adjusted Closing prices above.
Costco and Walmart are the most correlated of the retail stocks, which seems reasonable, given how similar their businesses are.
Perhaps most interesting are HD's correlations. HD is much more correlated with MSFT, GOOG & AAPL than any of the retail stocks, and even sees higher correlation with these three stocks than does AMZN. One hypothesis for this is that during the coronavirus pandemic, many people were spending more time at home, and as they invested more in WFH technology (AAPL, MSFT, GOOG), they simultaneously invested more in home projects (HD).
It's somewhat surprising that AMZN isn't more correlated with the other tech stocks... let's take a look at the growth of AMZN's stock price throughout the year, relative to the other stocks.
fig, (ax1, ax2) = plt.subplots(2,1,figsize=(16,8),sharex=True)
def to_percent(y,position):
return str(str(int(round(y*100,0)))+"%")
# Plot Annual Growth (%)
normalized_df = Adj_Close_df.copy()
for ticker in ticker_list:
baseline = normalized_df[ticker].iloc[0]
normalized_df[ticker] = normalized_df[ticker] / baseline - 1
normalized_df.plot(legend=True, ax=ax1)
ax1.set_title(label='Annual Growth (%)', fontsize=16, fontweight='bold', pad=10)
ax1.yaxis.set_major_formatter(mpl.ticker.FuncFormatter(to_percent))
# Plot Growth During Corona Pandemic (%)
normalized_corona_df = Adj_Close_df.copy().loc['2020-02-20':]
for ticker in ticker_list:
baseline = normalized_corona_df[ticker].iloc[0]
normalized_corona_df[ticker] = normalized_corona_df[ticker] / baseline - 1
normalized_corona_df.plot(legend=True, ax=ax2)
ax2.set_title(label='Growth During Corona Pandemic (%)', fontsize=16, fontweight='bold', pad=10)
ax2.set(xlabel=None)
ax2.yaxis.set_major_formatter(mpl.ticker.FuncFormatter(to_percent))
Interesting... Notice how AAPL stock grew the most over the entire year, yet AMZN stock grew the most since the beginning of the pandemic. Also notice how up until the beginning of the pandemic, AMZN's growth lagged behind each of the other tech stocks. Clearly the pandemic was somewhat of a turning point for AMZN, which may help explain why AMZN is a bit less correlated with the other tech stocks.
Awesome! Now that we've investigated returns, let's do some actual risk analysis.
The risk of a stock can be evaluated in a variety of ways. First, let's look at each stock's daily returns, and simply compare the mean to the standard deviation for each.
returns = returns_df.dropna()
area = np.pi*20
plt.figure(figsize=(10,8))
plt.scatter(returns.mean(), returns.std(), s=50)
plt.xlim(-.002,.005)
plt.ylim(.015,.03)
plt.xlabel('Expected Daily Return')
plt.ylabel('Standard Deviation')
for label, x, y in zip(returns.columns, returns.mean(), returns.std()):
plt.annotate(
label,
xy = (x,y), xytext = (25,25),
textcoords = 'offset points', ha = 'right', va = 'center',
arrowprops = dict(arrowstyle = '-', connectionstyle = 'arc3,rad=-.5', ec='purple'))
plt.title(label='Daily Returns: Mean vs Standard Deviation', fontsize=16, fontweight='bold', pad=10)
Great! We can see that WBA would be a poor investment, not only because the expected return is slightly negative, but because their is a great deal of volatility (risk) as well. Stocks like AMZN and AAPL, however, have strong positive returns with relatively less risk than WBA.
We define Value At Risk (VaR) as the maximum amount we would expect to lose on an investment with a certain amount of confidence. We'll investigate VaR for our stocks via two methods: Bootstrap and Monte Carlo.
This method involves looking at the distribution of each stock's returns, and calculating VaR based on a certain quantile. In particular, to be 95% and 99% confident in our VaR, we'll look at the quantiles .05 and .01, respectively.
fig = plt.figure(figsize=(16,8))
g = fig.add_gridspec(2,4)
ax1 = fig.add_subplot(g[0, 0:1])
ax2 = fig.add_subplot(g[0, 1:2])
ax3 = fig.add_subplot(g[0, 2:3])
ax4 = fig.add_subplot(g[0, 3:4])
ax5 = fig.add_subplot(g[1, 0:1])
ax6 = fig.add_subplot(g[1, 1:2])
ax7 = fig.add_subplot(g[1, 2:3])
ax8 = fig.add_subplot(g[1, 3:4])
axes = [ax1,ax2,ax3,ax4,ax5,ax6,ax7,ax8]
for i, ax in enumerate(axes):
data = returns[ticker_list[i]]
std = data.std()
ax.text(min(data)*1.1, 45, s=f'Avg Daily Return = {(data.mean()):.2%}')
ax.text(min(data)*1.1, 55, s=f'95%_ci = {(data.quantile(.05)):.2%}')
ax.text(min(data)*1.1, 50, s=f'99%_ci = {(data.quantile(.01)):.2%}')
sns.distplot(data, bins=100, ax=ax,
kde_kws={'color':'darkolivegreen','gridsize':1000,'linewidth':2},
hist_kws={'color':'goldenrod','edgecolor':'darkslategray'})
ax.set(ylim=[0,60],title=ticker_list[i],xlabel=None)
Great! After a day of investing in AAPL for example, we can be 95% confident we won't lose more than 3.42% and 99% confident we won't lose more than 7.20%.
Estimating VaR using a Monte Carlo simulation involves performing many "random walks" for the stock price. These random walks will be determined by the mean and standard deviation of the stock's daily returns. We'll estimate VaR based on the distribution of final prices for these random walks.
Let's first define the function "monte_carlo" to perform our random walks:
def monte_carlo(start_price, days, mu, sigma):
price = np.zeros(days)
price[0] = start_price
change = np.zeros(days)
for i in range(1,days):
change[i] = np.random.normal(loc=mu, scale=sigma)
price[i] = price[i-1] * (1 + (change[i]))
return price
Great! Let's visualize the price array to get a better sense of the information monte_carlo is returning:
fig, axes = plt.subplots(4,2,figsize=(16,24))
for i, ax in enumerate(axes.flatten()):
days = 365
mu = returns.mean()[ticker_list[i]]
sigma = returns.std()[ticker_list[i]]
start_price = Adj_Close_df[ticker_list[i]][0]
for runs in range(100):
ax.plot(monte_carlo(start_price,days,mu,sigma))
ax.set(title=ticker_list[i])
ax.set(xlabel='Days',ylabel='Price')
Awesome! As we can see, stocks with higher mean daily returns (AAPL, AMZN) see a higher increase in their final prices, whereas stocks with zero-to-negative mean returns (WBA) do not.
Leveraging the computing power of Python, let's get a histogram of the final prices for a much larger run of simulations.
runs = 10000
simulations = np.zeros(runs)
for run in range(runs):
simulations[run] = monte_carlo(start_price,days,mu,sigma)[days-1];
fig, axes = plt.subplots(4,2,figsize=(16,24))
colors = ['Red','Orange','y','Green','Blue','Indigo','Violet','Brown']
for i, ax in enumerate(axes.flatten()):
runs = 10000
simulations = np.zeros(runs)
ticker = ticker_list[i]
days = 365
mu = returns.mean()[ticker]
sigma = returns.std()[ticker]
start_price = Adj_Close_df[ticker][0]
for run in range(runs):
simulations[run] = monte_carlo(start_price,days,mu,sigma)[days-1];
q = np.percentile(simulations, 1)
ax.hist(simulations, bins=200, color = colors[i])
ax.text(0.7, 0.9, s="Start price: $%.2f" %start_price, transform=ax.transAxes)
ax.text(0.7, 0.85, "Mean final price: $%.2f" % simulations.mean(), transform=ax.transAxes)
ax.text(0.7, 0.8, "VaR(0.99): $%.2f" % (start_price - q), transform=ax.transAxes)
ax.text(0.1, 0.9, "q(0.99): $%.2f" % q, transform=ax.transAxes)
ax.axvline(x=q, linewidth=4, color='r')
ax.set_title("Final price distribution for %s Stock after %s days" % (ticker,days), weight='bold')
Awesome! With 99% confidence (.01 quantile), we have an estimate of the one year VaR for each of our stocks. Using AAPL as an example, we are 99% confident that buying one share will not result in more than a $0.91 loss in one year.
One final note. These estimated returns are all based on the historical daily returns we saw over the past year. Given this year was so unique with the coronavirus pandemic, these results may not be realistic in the future (unless we go through a similar situation again, god forbid). For example, it's perhaps a bit hard to imagine that a company like HD will grow from 202.80 to 372.23 in a year's time, given people will probably not continue working on home renovation projects at the same rate they have during the pandemic so far... but if they do, then perhaps this projection isn't so unrealistic...
Regardless, the point is this was a unique past year, and these projections are based on the assumption that the future will be representative of the past. Please take these projections with a grain of salt. 😉