Stock Price Prediction using ARIMA Model

Gohil Rushabh Navinchandra
Oct 28, 2021

Prediction of any kind is difficult in the real world, especially when the future is highly dynamic. The stock market is volatile and unpredictable by nature, so investors take on risk in the hope of making a profit. Many factors influence stock prices: supply and demand, market trends, the global economy, corporate results, historical prices, public sentiment, sensitive financial information, and popularity (such as good or bad news about a company or its products). Any of these can increase or decrease the number of buyers. Even after analyzing many such factors, it remains difficult to perform well in the stock market or to predict future prices in general. Predicting the price of a specific stock just one day ahead is, by itself, a very complicated task.

In this blog post, I predict the next-day price of the TATAGLOBAL stock for each individual day of a certain year, and compare each prediction with the actual price to validate the model. The historical data provided (time series data) includes features such as the opening and closing stock prices, volume, date, and so on.

A time series is a sequence of data points collected over a period of time and indexed, listed or graphed in time order. Time series data are sequential and tend to follow some pattern; they are also called historical data or past data. Using historical values to predict future values is a central goal of time series analysis. The daily closing price of stocks, heights of ocean tides, and counts of sunspots are some examples of time series data.

Time series data are studied for several purposes, such as forecasting the future based on knowledge of the past, understanding the phenomenon underlying the measurements, or simply describing the salient features of the series succinctly. Forecasting future values of an observed time series plays an important role in nearly all fields of science, engineering, finance, business intelligence, economics, meteorology, telecommunications, etc. In this post, a time series model called Auto Regressive Integrated Moving Average (ARIMA) is used as the technique to analyze and predict future stock prices based on historical prices.

Let’s first import the required libraries:

Load the Data:
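A minimal sketch of the import and loading steps, assuming the historical data is stored as a CSV file with a "Date" column alongside price columns such as "Open". The helper name load_prices and the file name in the usage comment are hypothetical and not taken from the original dataset.

```python
import pandas as pd


def load_prices(csv_source):
    """Read daily prices from a CSV path or file-like object.

    Assumption: the file has a "Date" column that pandas can parse.
    """
    df = pd.read_csv(csv_source, parse_dates=["Date"], index_col="Date")
    # Sort chronologically so that later positional splits respect time order.
    return df.sort_index()


# Hypothetical usage (file name is an assumption):
# df = load_prices("tataglobal.csv")
# print(df.head())
```

Sorting by date up front matters because some exported price files list the most recent day first, and every later step relies on the rows being in time order.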

Figure 1: Dataset head

Before starting work on the time series prediction, I analyzed the autocorrelation plot of the “Open” feature using a fixed lag of 5 (Figure 2). The results in Figure 2 indicate that past prices carry information about later ones, which suggested that ARIMA would be a good model for this type of data.
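As a rough numeric companion to the plot, the relationship between each value and the value 5 steps earlier can be summarized as a single correlation coefficient. This is only a sketch: the function name lag_correlation is hypothetical, and it uses NumPy rather than whatever plotting call produced Figure 2 (pandas offers pandas.plotting.lag_plot for drawing such a scatter).

```python
import numpy as np


def lag_correlation(values, lag=5):
    """Pearson correlation between a series and itself shifted by `lag` steps."""
    x = np.asarray(values, dtype=float)
    if lag < 1 or len(x) - lag < 2:
        raise ValueError("lag must be >= 1 and leave at least two pairs")
    # Pair each observation x[t] with the observation x[t + lag].
    return float(np.corrcoef(x[:-lag], x[lag:])[0, 1])


# Hypothetical usage with a DataFrame that has an "Open" column:
# print(lag_correlation(df["Open"], lag=5))
```

A value close to 1 means the price five days ago is a strong linear predictor of the current price, which is the kind of structure an autoregressive model can exploit.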

Figure 2: Autocorrelation plot using a Lag of 5

Next, I divided the data into a training set and a test set. I then plotted both on the same figure to get a feeling for what the time series looks like (Figure 3).
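A chronological split can be sketched as below. The post does not state the split ratio, so the 80/20 default is an assumption, and the helper name chronological_split is hypothetical. The key point is that the data are not shuffled: the test set must come strictly after the training set in time.

```python
def chronological_split(data, train_fraction=0.8):
    """Split an ordered sequence into (train, test) without shuffling.

    Assumption: `data` is already sorted from oldest to newest.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    split = int(len(data) * train_fraction)
    # Everything before the split point is training data, the rest is test data.
    return data[:split], data[split:]


# Hypothetical usage on a list or NumPy array of opening prices:
# train, test = chronological_split(open_prices)
```

For a pandas DataFrame, the same idea would be expressed with df.iloc[:split] and df.iloc[split:].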

Figure 3: Graphical Representation of Train/Test Split

In order to evaluate the ARIMA model, I decided to use two different error functions: Mean Squared Error (MSE) and Symmetric Mean Absolute Percentage Error (SMAPE). SMAPE is commonly used as an accuracy measure based on relative errors (Figure 4).

Because SMAPE is not currently supported in Scikit-learn as a loss function, I first had to create this function on my own.

Figure 4: SMAPE (Symmetric mean absolute percentage error) [2]
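The two error functions can be sketched with NumPy as follows. This assumes Figure 4 shows the common SMAPE definition, in which the absolute error is divided by the mean of the absolute actual and predicted values and the result is expressed as a percentage; other variants of SMAPE exist. It also assumes prices are strictly positive, so the denominator is never zero.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean Squared Error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def smape(y_true, y_pred):
    """Symmetric Mean Absolute Percentage Error, in percent (0 to 200).

    Assumption: |actual| + |predicted| is never zero (true for stock prices).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denominator = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    return float(np.mean(np.abs(y_pred - y_true) / denominator) * 100.0)
```

MSE is reported in squared price units and is dominated by large misses, while SMAPE is relative, so it stays comparable across stocks trading at different price levels.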

ARIMA model:

ARIMA stands for AutoRegressive Integrated Moving Average. It is specified by three ordered parameters (p, d, q), where:

  • p is the order of the autoregressive model (number of time lags)
  • d is the degree of differencing (number of times the data have had past values subtracted)
  • q is the order of the moving average model

Before building an ARIMA model, we have to make sure our data is stationary. The differencing step controlled by d is what removes trends from the series so that the autoregressive and moving average parts can be fitted to a stationary signal.
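To make the role of d concrete, here is a small sketch of differencing with NumPy. The function names are hypothetical. With d=1, each value is replaced by its change from the previous value, which removes a linear trend; summing the changes back up recovers the original series, which is the "Integrated" part of the name.

```python
import numpy as np


def difference(values, d=1):
    """Apply d rounds of first differencing: x[t] - x[t-1]."""
    return np.diff(np.asarray(values, dtype=float), n=d)


def undo_first_difference(first_value, diffs):
    """Rebuild the original series from its first value and its first differences."""
    diffs = np.asarray(diffs, dtype=float)
    return np.concatenate(([float(first_value)], first_value + np.cumsum(diffs)))


# Example: a series with an upward trend and its day-to-day changes.
prices = [1.0, 3.0, 6.0, 10.0]
changes = difference(prices, d=1)
restored = undo_first_difference(prices[0], changes)
```

Whether one round of differencing is enough can be checked with a stationarity test such as the Augmented Dickey-Fuller test, available in statsmodels as adfuller.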

Afterwards, I created the ARIMA model to be used for this implementation. In this case I set the ARIMA parameters to p=5, d=1 and q=0, that is, five autoregressive lags, one round of differencing and no moving average term.

Finally, I plotted the training, test and predicted prices against time to visualize how the model performed against the actual prices (Figure 5).

Figure 5: TATAGLOBAL Price Prediction

Figure 6 instead provides a zoomed-in version of Figure 5. It shows how closely the two curves follow each other. However, the predicted price looks like a “noisy” version of the actual price.

Figure 6: Prediction vs Actual Price Comparison

Conclusion

Overall, this analysis using ARIMA led to appreciable results. The model offered good prediction accuracy and was relatively fast compared to alternatives such as RNNs (Recurrent Neural Networks).
