Time Series Forecasting – ARIMA vs LSTM

Every observation in time series data has a timestamp associated with it. The observations may be taken at equally spaced points in time (e.g. monthly revenue, weekly sales) or spread out unevenly (e.g. clinical trials that track patients' health, high-frequency trading in finance). Time series data is commonly described by two components: trend (how the data increases or decreases over time) and seasonality (variations that recur within a particular time frame). Whatever remains after removing these is usually treated as noise.
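To make trend and seasonality concrete, here is a minimal Python sketch that separates the two with a centered moving average. The helper names (`moving_average_trend`, `seasonal_means`) and the synthetic series are assumptions made up for illustration, and the approach is a simple additive decomposition rather than what any particular library does.

```python
from statistics import mean

def moving_average_trend(series, period):
    """Estimate the trend with a centered moving average over one full period.

    Assumes an odd period so the window is symmetric; the edges, where a full
    window does not fit, are left as None.
    """
    half = period // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        trend[i] = mean(series[i - half:i + half + 1])
    return trend

def seasonal_means(series, trend, period):
    """Average the detrended values at each position within the period."""
    buckets = [[] for _ in range(period)]
    for i, (value, level) in enumerate(zip(series, trend)):
        if level is not None:
            buckets[i % period].append(value - level)
    return [mean(bucket) for bucket in buckets]

# Synthetic data: a trend of +1 per step plus a repeating 3-step pattern.
pattern = [2.0, 0.0, -2.0]
series = [float(t) + pattern[t % 3] for t in range(12)]

trend = moving_average_trend(series, period=3)
seasonal = seasonal_means(series, trend, period=3)
```

On this synthetic series the moving average recovers the straight-line trend and the seasonal averages recover the repeating pattern. Real data would also leave a noisy remainder.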

Two of the most common types of analysis done on time series data are:
1. Pattern and outlier detection
2. Forecasting

Time series forecasting has been practiced for several decades with techniques like ARIMA. More recently, recurrent neural networks such as LSTMs have been used with much success. Here are a few pros and cons of each.

Advantages of ARIMA
1. Simple to implement; with auto.arima the model orders are selected automatically, so little manual tuning is needed
2. Easier to handle multivariate data
3. Quick to run

Advantages of LSTM
1. No prerequisites such as stationarity or the absence of level shifts
2. Can model non-linear functions

Disadvantages of LSTM
1. Needs a lot of data
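The stationarity prerequisite listed above is typically met for ARIMA by differencing the series, which is the "I" (integrated) part of the name. A minimal Python sketch of the idea, using a hypothetical helper `difference` and made-up data:

```python
def difference(series, lag=1):
    """Return the lag-differenced series: value at t minus value at t - lag."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

# A series with a steady upward trend of 3 per step (made-up data).
trended = [10 + 3 * t for t in range(8)]
detrended = difference(trended)           # constant, so the trend is gone

# A series that repeats every 3 steps (made-up data).
seasonal = [5, 1, 3] * 3
deseasoned = difference(seasonal, lag=3)  # all zeros, so the pattern is gone
```

Differencing with lag 1 removes a linear trend, and differencing with a lag equal to the seasonal period removes an exactly repeating seasonal pattern. An LSTM can be trained on the raw series, although in practice scaling or differencing the input may still help.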

Here’s how simple it can be to implement an ARIMA model in R.

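library(forecast)  # provides auto.arima() and forecast()

alldata <- AirPassengers                     # monthly totals, 1949 to 1960
train <- window(alldata, end = c(1958, 12))  # through December 1958
test <- window(alldata, start = c(1959, 1))  # January 1959 onwards

fit <- auto.arima(train)                     # selects the ARIMA orders automatically
pred <- forecast(fit, h = length(test))      # forecast over the held-out period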

Here’s how an LSTM can be implemented.

from keras.models import Sequential
from keras.layers import Input, LSTM, Dense

model = Sequential()
model.add(Input(shape=(50, 2)))               # 50 time steps (an assumed window length), 2 features
model.add(LSTM(300, return_sequences=False))  # 300 units; returns only the final output
model.add(Dense(2))                           # 2 outputs, one per input feature (an assumption)
model.compile(loss="mae", optimizer="adam")

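A more complex network can be created by stacking more layers. Note that return_sequences is set to True on every LSTM layer except the last. With return_sequences=True, an LSTM layer outputs the full sequence as a three-dimensional tensor (samples, time steps, units), which is the input shape the next LSTM layer expects. The last LSTM layer sets return_sequences=False so that it emits only its final output, a two-dimensional tensor (samples, units), and the Dense layer then produces one forecast per sample. Depending on the Keras version, passing the three-dimensional sequence straight into the Dense layer either raises an error or applies the layer at every time step, and neither gives a single forecast per sample.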

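from keras.models import Sequential
from keras.layers import Input, LSTM, Dense

model = Sequential()
model.add(Input(shape=(50, 2)))               # 50 time steps (an assumed window length), 2 features
model.add(LSTM(300, return_sequences=True))   # passes the full sequence to the next LSTM
model.add(LSTM(500, return_sequences=True))
model.add(LSTM(200, return_sequences=False))  # returns only the final output
model.add(Dense(3))                           # 3 output values
model.compile(loss="mae", optimizer="adam")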
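The effect of return_sequences on tensor shapes can be traced by hand. The sketch below does not call Keras at all. It uses two hypothetical helpers (`lstm_output_shape`, `dense_output_shape`) that encode the shape rules as understood for recent Keras versions, and an assumed batch of 32 samples with 50 time steps and 2 features.

```python
def lstm_output_shape(input_shape, units, return_sequences):
    """Shape rule for an LSTM layer; input_shape is (samples, timesteps, features)."""
    samples, timesteps, _features = input_shape  # fails unless the input is 3-D
    if return_sequences:
        return (samples, timesteps, units)       # one output vector per time step
    return (samples, units)                      # only the final time step's output

def dense_output_shape(input_shape, units):
    """A Dense layer replaces the last axis with its number of units."""
    return input_shape[:-1] + (units,)

shape = (32, 50, 2)  # assumed: 32 samples, 50 time steps, 2 features
shapes = []
for units, return_sequences in [(300, True), (500, True), (200, False)]:
    shape = lstm_output_shape(shape, units, return_sequences)
    shapes.append(shape)
shape = dense_output_shape(shape, 3)
shapes.append(shape)
```

Here `shapes` ends with a two-dimensional (samples, outputs) shape. Setting return_sequences=False on an earlier LSTM layer would hand a two-dimensional shape to the next LSTM, and the unpacking in `lstm_output_shape` would fail, mirroring the fact that an LSTM layer needs three-dimensional input.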

One common area of confusion when building any neural network is shaping the input data. The first decision is how many past observations the network will learn from before predicting the next value. To get this structure from a single column of values, we add time-shifted (lagged) copies of that column. For instance, if we decide that the network needs 50 observations to learn from, we shift the column down repeatedly so that each row holds the 50 preceding values alongside the value to predict. The same approach can be extended to multivariate time series data, although it does require some additional data engineering. More on this in a future blog.
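The shifting described above amounts to cutting the series into sliding windows. Here is a minimal sketch in plain Python, where the helper name `make_windows` and the 60-point series are assumptions made up for illustration:

```python
def make_windows(series, n_lags):
    """Split a univariate series into (inputs, targets) for supervised learning.

    Each input holds n_lags consecutive observations and the matching target
    is the observation that immediately follows them.
    """
    inputs, targets = [], []
    for start in range(len(series) - n_lags):
        inputs.append(series[start:start + n_lags])
        targets.append(series[start + n_lags])
    return inputs, targets

series = list(range(100, 160))          # 60 made-up observations
X, y = make_windows(series, n_lags=50)  # 10 windows of 50 values each
```

For a Keras LSTM, the windows would then be converted to a three-dimensional array of shape (samples, 50, 1), for example with `numpy.array(X).reshape(len(X), 50, 1)`, since a univariate series has one feature per time step.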