<  return home

Tech & Retail Stock Analysis During the Coronavirus Pandemic

In this project, we'll be looking at the performance of eight technology and retail stocks during the coronavirus pandemic. We'll pull information on each stock from Yahoo Finance and visualize different aspects of the data. We'll investigate the effects of the pandemic and we'll cover a few ways of analyzing the risks of each stock based on historical performance. Finally, we will predict future stock prices through a Monte Carlo method.

We'll be answering the following questions along the way:

  1. How did these stocks' prices change over time?
  2. How did daily trading volume fluctuate throughout the year?
  3. What were the moving average of these stocks?
  4. What were the daily returns of these stock prices on average?
  5. What were the correlations between different stocks' closing prices?
  6. What were the correlations between different stocks' daily returns?
  7. Which stocks benefited the most during the coronavirus pandemic?
  8. How much value do we put at risk by investing in each stock?
  9. How might we expect each stock to behave over the next year?

Basic Analysis of Stock Information

Let's start by pulling our stock info and performing some initial exploratory analysis.

In [234]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
from datetime import datetime
from datetime import date

# To read our stock information from Yahoo:
import pandas_datareader as pdr

sns.set_style('whitegrid')
In [235]:
# Identify our stocks - four tech, four retail
ticker_list = ['AAPL','GOOG','MSFT','AMZN','COST','WBA','WMT','HD']

# Set timeframe
end = date(2020,8,19)
start = datetime(end.year-1,end.month,end.day)

# Pull stock info and assign ticker as name of dataframe
for stock in ticker_list:
    globals()[stock] = pdr.get_data_yahoo(stock,start,end)

Now, let's do a bit of exploration on one of the stocks to better understand the info in our dataframes. We'll pick AAPL.

In [236]:
AAPL.describe()
Out[236]:
High Low Open Close Volume Adj Close
count 254.000000 254.000000 254.000000 254.000000 2.540000e+02 254.000000
mean 298.190709 291.140788 294.288858 295.059881 3.609855e+07 293.608820
std 61.221947 59.736081 60.416792 60.654505 1.720931e+07 61.153582
min 205.720001 201.000000 204.100006 202.639999 1.165440e+07 200.649200
25% 256.319992 246.450001 249.112499 248.832497 2.460498e+07 246.748463
50% 288.225006 280.930008 284.690002 286.869995 3.139560e+07 285.580643
75% 324.202492 318.739998 322.144997 321.775002 4.168990e+07 320.979256
max 468.649994 462.440002 464.250000 462.829987 1.067212e+08 462.829987
In [237]:
AAPL.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 254 entries, 2019-08-19 to 2020-08-19
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   High       254 non-null    float64
 1   Low        254 non-null    float64
 2   Open       254 non-null    float64
 3   Close      254 non-null    float64
 4   Volume     254 non-null    float64
 5   Adj Close  254 non-null    float64
dtypes: float64(6)
memory usage: 13.9 KB

Awesome! Now let's get historical views of the closing prices for each stock.

In [238]:
fig, axes = plt.subplots(4,2,figsize=(16,16))
frames = [AAPL,GOOG,MSFT,AMZN,COST,WBA,WMT,HD]

for i, ax in enumerate(axes.flatten()):
    frames[i]['Adj Close'].plot(legend=True,ax=ax)
    ax.set_title(label=ticker_list[i], fontsize=16, fontweight='bold', pad=10)
    ax.set_xlabel(None)

plt.tight_layout(h_pad=2)

Interesting, the tech stocks have seen positive growth over the past year, especially during the coronavirus pandemic time period. Retail is a bit less consistent, however, with COST, WMT and especially HD seeing growth during the pandemic, but not WBA.

Now let's get a historical view of the daily trading volume.

In [239]:
fig, axes = plt.subplots(4,2,figsize=(16,16))
frames = [AAPL,GOOG,MSFT,AMZN,COST,WBA,WMT,HD]

for i, ax in enumerate(axes.flatten()):
    frames[i]['Volume'].plot(legend=True,ax=ax)
    ax.set_title(label=ticker_list[i], fontsize=16, fontweight='bold', pad=10)
    ax.set_xlabel(None)

plt.tight_layout(h_pad=2)

Great, we can see a clear uptick in volume traded during the first few months of the pandemic across each stock.

Now that we have a good idea of the price and volume over the past year, let's make things a bit more interesting and calculate the moving average for each stock.

In [240]:
fig, axes = plt.subplots(4,2,figsize=(20,16))
ma_size = [10,20,50]

for i, ax in enumerate(axes.flatten()):
    for ma in ma_size:
        col = "MA for %s days" %(str(ma))
        frames[i][col] = frames[i]['Adj Close'].rolling(ma).mean()
    frames[i][['Adj Close','MA for 10 days','MA for 20 days','MA for 50 days']].plot(legend=True,ax=ax)
    ax.set_title(label=ticker_list[i], fontsize=16, fontweight='bold', pad=10)
    ax.set(xlabel=None)

plt.tight_layout(h_pad=2)

Awesome! Again, we see fairly consistent behavior between the tech stocks' moving averages (Note, however, AMZN does not show as much of an inital downward dip at the beginning of the pandemic).

Daily Returns

Now that we've done some baseline analysis, let's analyze the risk of these stocks. We'll start by calculating daily returns.

In [241]:
fig, axes = plt.subplots(4,2,figsize=(20,16))

for i, ax in enumerate(axes.flatten()):
    frames[i]['Daily Return'] = frames[i]['Adj Close'].pct_change()
    frames[i]['Daily Return'].plot(legend=True,linestyle='--',marker='o', ax=ax)
    ax.set_title(label=ticker_list[i], fontsize=16, fontweight='bold', pad=10)
    ax.set(xlabel=None)

plt.tight_layout(h_pad=2)

Notice the substantial increase in volatility during the COVID-19 pandemic in March, quite a change!

Let's take a look at the average daily returns in a histogram, to get a better sense of their distribution.

In [242]:
fig = plt.figure(figsize=(16,8))
g = fig.add_gridspec(2,4)
ax1 = fig.add_subplot(g[0, 0:1])
ax2 = fig.add_subplot(g[0, 1:2])
ax3 = fig.add_subplot(g[0, 2:3])
ax4 = fig.add_subplot(g[0, 3:4])
ax5 = fig.add_subplot(g[1, 0:1])
ax6 = fig.add_subplot(g[1, 1:2])
ax7 = fig.add_subplot(g[1, 2:3])
ax8 = fig.add_subplot(g[1, 3:4])

axes = [ax1,ax2,ax3,ax4,ax5,ax6,ax7,ax8]
colors = ['Red','Orange','y','Green','Blue','Indigo','Violet','Brown']

for i, ax in enumerate(axes):
    sns.distplot(frames[i]['Daily Return'].dropna(),bins=100, ax=ax,
             kde_kws={'color':'darkolivegreen','label':'Kde','gridsize':1000,'linewidth':2},
             hist_kws={'color':'goldenrod','label':"Histogram",'edgecolor':'darkslategray'})
    ax.set_title(label=ticker_list[i], fontsize=16, fontweight='bold', pad=10)
    ax.set(xlabel=None)

plt.tight_layout(h_pad=2)

Awesome! Now, let's put our stocks' return info into one table so we can more easily calculate correlations between them.

In [243]:
Adj_Close_df = pdr.get_data_yahoo(ticker_list,start,end)['Adj Close']
returns_df = Adj_Close_df.pct_change()
returns_df.head()
Out[243]:
Symbols AAPL GOOG MSFT AMZN COST WBA WMT HD
Date
2019-08-19 NaN NaN NaN NaN NaN NaN NaN NaN
2019-08-20 0.000048 -0.013150 -0.008309 -0.008116 -0.009579 -0.009578 -0.015464 0.043953
2019-08-21 0.010839 0.007238 0.011147 0.012302 0.009928 0.004539 -0.000268 0.015247
2019-08-22 -0.000846 -0.001444 -0.007277 -0.010353 0.015380 0.000982 -0.000982 0.002813
2019-08-23 -0.046220 -0.032147 -0.031862 -0.030499 -0.021113 -0.031992 -0.009651 -0.016062

Let's take a look at each pair of stock returns to see their correlations.

In [244]:
# Define corrfunc to plot pearson coefficient in top left corner of each plot
from scipy.stats import pearsonr
def corrfunc(x,y,ax=None, **kws):
    r, _ = pearsonr(x, y)
    ax = plt.gca()
    # Unicode for lowercase rho (ρ)
    rho = '\u03C1'
    ax.annotate(f'{rho} = {r:.2f}', xy=(.1, .9), xycoords=ax.transAxes)

# Plot each pair's returns to see their correlation
g = sns.PairGrid(returns_df.dropna())
g.map_lower(corrfunc)
g.map_lower(plt.scatter)
g.map_upper(sns.kdeplot,cmap='Blues')
g.map_diag(plt.hist)
#plt.show()
Out[244]:
<seaborn.axisgrid.PairGrid at 0x7fb7ca39eb10>

Let's also look at the heatmap of correlations to get a better sense of what's going on.

In [245]:
plt.figure(figsize=(10,8))
sns.heatmap(returns_df.corr(),annot=True,square=True,cmap='Blues')
Out[245]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fb7e5423210>

Interesting... The most highly-correlated pairs (p-val >= 0.8) all come from tech (specifically MSFT, GOOG & AAPL). This makes sense, given the common trend we saw in their Adjusted Closing prices above.

Costco and Walmart are the most correlated of the retail stocks, which seems reasonable, given how similar their businesses are.

Perhaps most interesting are HD's correlations. HD is much more correlated with MSFT, GOOG & AAPL than any of the retail stocks, and even sees higher correlation with these three stocks than does AMZN. One hypothesis for this is that during the coronavirus pandemic, many people were spending more time at home, and as they invested more in WFH technology (AAPL, MSFT, GOOG), they simultaneously invested more in home projects (HD).

It's somewhat surprising that AMZN isn't more correlated with the other tech stocks... let's take a look at the growth of AMZN's stock price throughout the year, relative to the other stocks.

In [246]:
fig, (ax1, ax2) = plt.subplots(2,1,figsize=(16,8),sharex=True)

def to_percent(y,position):
    return str(str(int(round(y*100,0)))+"%")

# Plot Annual Growth (%)
normalized_df = Adj_Close_df.copy()
for ticker in ticker_list:
    baseline = normalized_df[ticker].iloc[0]
    normalized_df[ticker] = normalized_df[ticker] / baseline - 1
normalized_df.plot(legend=True, ax=ax1)
ax1.set_title(label='Annual Growth (%)', fontsize=16, fontweight='bold', pad=10)
ax1.yaxis.set_major_formatter(mpl.ticker.FuncFormatter(to_percent))

# Plot Growth During Corona Pandemic (%)
normalized_corona_df = Adj_Close_df.copy().loc['2020-02-20':]
for ticker in ticker_list:
    baseline = normalized_corona_df[ticker].iloc[0]
    normalized_corona_df[ticker] = normalized_corona_df[ticker] / baseline - 1
normalized_corona_df.plot(legend=True, ax=ax2)
ax2.set_title(label='Growth During Corona Pandemic (%)', fontsize=16, fontweight='bold', pad=10)
ax2.set(xlabel=None)
ax2.yaxis.set_major_formatter(mpl.ticker.FuncFormatter(to_percent))

Interesting... Notice how AAPL stock grew the most over the entire year, yet AMZN stock grew the most since the beginning of the pandemic. Also notice how up until the beginning of the pandemic, AMZN's growth lagged behind each of the other tech stocks. Clearly the pandemic was somewhat of a turning point for AMZN, which may help explain why AMZN is a bit less correlated with the other tech stocks.

Awesome! Now that we've investigated returns, let's do some actual risk analysis.

Risk Analysis

The risk of a stock can be evaluated in a variety of ways. First, let's look at each stock's daily returns, and simply compare the mean to the standard deviation for each.

In [247]:
returns = returns_df.dropna()
area = np.pi*20

plt.figure(figsize=(10,8))
plt.scatter(returns.mean(), returns.std(), s=50)

plt.xlim(-.002,.005)
plt.ylim(.015,.03)
plt.xlabel('Expected Daily Return')
plt.ylabel('Standard Deviation')

for label, x, y in zip(returns.columns, returns.mean(), returns.std()):
    plt.annotate(
        label,
        xy = (x,y), xytext = (25,25),
        textcoords = 'offset points', ha = 'right', va = 'center',
        arrowprops = dict(arrowstyle = '-', connectionstyle = 'arc3,rad=-.5', ec='purple'))
plt.title(label='Daily Returns: Mean vs Standard Deviation', fontsize=16, fontweight='bold', pad=10)
Out[247]:
Text(0.5, 1.0, 'Daily Returns: Mean vs Standard Deviation')

Great! We can see that WBA would be a poor investment, not only because the expected return is slightly negative, but because their is a great deal of volatility (risk) as well. Stocks like AMZN and AAPL, however, have strong positive returns with relatively less risk than WBA.

Value At Risk

We define Value At Risk (VaR) as the maximum amount we would expect to lose on an investment with a certain amount of confidence. We'll investigate VaR for our stocks via two methods: Bootstrap and Monte Carlo.

Bootstrap

This method involves looking at the distribution of each stock's returns, and calculating VaR based on a certain quantile. In particular, to be 95% and 99% confident in our VaR, we'll look at the quantiles .05 and .01, respectively.

In [248]:
fig = plt.figure(figsize=(16,8))
g = fig.add_gridspec(2,4)
ax1 = fig.add_subplot(g[0, 0:1])
ax2 = fig.add_subplot(g[0, 1:2])
ax3 = fig.add_subplot(g[0, 2:3])
ax4 = fig.add_subplot(g[0, 3:4])
ax5 = fig.add_subplot(g[1, 0:1])
ax6 = fig.add_subplot(g[1, 1:2])
ax7 = fig.add_subplot(g[1, 2:3])
ax8 = fig.add_subplot(g[1, 3:4])

axes = [ax1,ax2,ax3,ax4,ax5,ax6,ax7,ax8]

for i, ax in enumerate(axes):
    data = returns[ticker_list[i]]
    std = data.std()
    ax.text(min(data)*1.1, 45, s=f'Avg Daily Return = {(data.mean()):.2%}')
    ax.text(min(data)*1.1, 55, s=f'95%_ci = {(data.quantile(.05)):.2%}')
    ax.text(min(data)*1.1, 50, s=f'99%_ci = {(data.quantile(.01)):.2%}')
    sns.distplot(data, bins=100, ax=ax,
             kde_kws={'color':'darkolivegreen','gridsize':1000,'linewidth':2},
             hist_kws={'color':'goldenrod','edgecolor':'darkslategray'})
    ax.set(ylim=[0,60],title=ticker_list[i],xlabel=None)

Great! After a day of investing in AAPL for example, we can be 95% confident we won't lose more than 3.42% and 99% confident we won't lose more than 7.20%.

Monte Carlo Simulation

Estimating VaR using a Monte Carlo simulation involves performing many "random walks" for the stock price. These random walks will be determined by the mean and standard deviation of the stock's daily returns. We'll estimate VaR based on the distribution of final prices for these random walks.

Let's first define the function "monte_carlo" to perform our random walks:

In [249]:
def monte_carlo(start_price, days, mu, sigma):
    price = np.zeros(days)
    price[0] = start_price
    change = np.zeros(days)

    for i in range(1,days):
        change[i] = np.random.normal(loc=mu, scale=sigma)
        price[i] = price[i-1] * (1 + (change[i]))

    return price

Great! Let's visualize the price array to get a better sense of the information monte_carlo is returning:

In [250]:
fig, axes = plt.subplots(4,2,figsize=(16,24))

for i, ax in enumerate(axes.flatten()):
    days = 365
    mu = returns.mean()[ticker_list[i]]
    sigma = returns.std()[ticker_list[i]]
    start_price = Adj_Close_df[ticker_list[i]][0]

    for runs in range(100):
        ax.plot(monte_carlo(start_price,days,mu,sigma))

    ax.set(title=ticker_list[i])
    ax.set(xlabel='Days',ylabel='Price')

Awesome! As we can see, stocks with higher mean daily returns (AAPL, AMZN) see a higher increase in their final prices, whereas stocks with zero-to-negative mean returns (WBA) do not.

Leveraging the computing power of Python, let's get a histogram of the final prices for a much larger run of simulations.

In [251]:
runs = 10000
simulations = np.zeros(runs)
for run in range(runs):
    simulations[run] = monte_carlo(start_price,days,mu,sigma)[days-1];

fig, axes = plt.subplots(4,2,figsize=(16,24))
colors = ['Red','Orange','y','Green','Blue','Indigo','Violet','Brown']

for i, ax in enumerate(axes.flatten()):
    runs = 10000
    simulations = np.zeros(runs)
    ticker = ticker_list[i]
    days = 365
    mu = returns.mean()[ticker]
    sigma = returns.std()[ticker]
    start_price = Adj_Close_df[ticker][0]
    for run in range(runs):
        simulations[run] = monte_carlo(start_price,days,mu,sigma)[days-1];

    q = np.percentile(simulations, 1)
    ax.hist(simulations, bins=200, color = colors[i])

    ax.text(0.7, 0.9, s="Start price: $%.2f" %start_price, transform=ax.transAxes)
    ax.text(0.7, 0.85, "Mean final price: $%.2f" % simulations.mean(), transform=ax.transAxes)
    ax.text(0.7, 0.8, "VaR(0.99): $%.2f" % (start_price - q), transform=ax.transAxes)
    ax.text(0.1, 0.9, "q(0.99): $%.2f" % q, transform=ax.transAxes)

    ax.axvline(x=q, linewidth=4, color='r')
    ax.set_title("Final price distribution for %s Stock after %s days" % (ticker,days), weight='bold')

Awesome! With 99% confidence (.01 quantile), we have an estimate of the one year VaR for each of our stocks. Using AAPL as an example, we are 99% confident that buying one share will not result in more than a $0.91 loss in one year.

One final note. These estimated returns are all based on the historical daily returns we saw over the past year. Given this year was so unique with the coronavirus pandemic, these results may not be realistic in the future (unless we go through a similar situation again, god forbid). For example, it's perhaps a bit hard to imagine that a company like HD will grow from 202.80 to 372.23 in a year's time, given people will probably not continue working on home renovation projects at the same rate they have during the pandemic so far... but if they do, then perhaps this projection isn't so unrealistic...

Regardless, the point is this was a unique past year, and these projections are based on the assumption that the future will be representative of the past. Please take these projections with a grain of salt. 😉

In [1]:
In [ ]: