In this article, we will use neural networks, specifically the LSTM model, to predict the behavior of time series data. The problem to be solved is the classical stock market forecast. All the data and code used can be obtained from me.
Although this is an old problem, it remains unsolved today. The reason is simple: stock prices are determined by several factors, and historical prices are only a small part of them. Therefore, forecasting price behavior is a very difficult problem.
Abstract
First, I will introduce the data set with some data visualization. Then, I will briefly discuss the difficulty and limitations of using the moving average algorithm to predict stock market behavior. Next, I will introduce the concepts of recurrent neural networks and LSTM, and use an LSTM to predict the stock price of a single company. Finally, I will show an LSTM that predicts the prices of 4 companies at the same time, and compare the results to see whether the forecast improves as we use more companies at once.
 Data visualization 
The data set was downloaded from Yahoo Finance in CSV format. It contains the stock prices of 4 companies for the period 01/08/2010 to 01/07/2019. We call them companies A, B, C and D.
The basic step is to open the CSV file with Pandas. First, look at the data:
import pandas as pd
import matplotlib.pyplot as plt

# Load company A; df_B, df_C and df_D are loaded the same way
df_A = pd.read_csv('data/company_A.csv')
df_A['Date'] = pd.to_datetime(df_A['Date'])
df_A.tail()

# Plot the closing price of all 4 companies
plt.figure(figsize=(15, 10))
plt.plot(df_A['Date'], df_A['Close'], label='Company A')
plt.plot(df_B['Date'], df_B['Close'], label='Company B')
plt.plot(df_C['Date'], df_C['Close'], label='Company C')
plt.plot(df_D['Date'], df_D['Close'], label='Company D')
plt.legend(loc='best')
plt.show()
Closing prices of all 4 companies
 Moving average 
A classical algorithm used for this problem is the moving average (MA). It consists of calculating the average of the past m observed days and using this result as the prediction for the next day. To illustrate, here is an example of a moving average of the company's closing price, using m equal to 10 and 20 days.
# shift so the day we want to predict won't be used
df['MA_window_10'] = df['Close'].rolling(10).mean().shift()
df['MA_window_20'] = df['Close'].rolling(20).mean().shift()
One-step forecast of company A's closing price using the moving average
When we try to use the moving average to predict the closing price for the next 10 days, the result is as follows:
10-day closing price forecast for company A using the moving average
Each red line represents a 10-day forecast based on the past 10 days. Therefore, the red line is discontinuous.
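The article does not say how the 10-day forecast is built from the moving average. One common way, assumed here, is to forecast recursively: predict the next day as the average of the last m values, then feed that prediction back into the window. The function name ma_forecast and the sample prices are made up for illustration.

```python
import numpy as np

def ma_forecast(history, window=10, horizon=10):
    """Recursive moving-average forecast (an assumed scheme, not necessarily the author's)."""
    values = list(history[-window:])
    forecast = []
    for _ in range(horizon):
        # Average the most recent `window` values, real or already predicted
        next_value = float(np.mean(values[-window:]))
        forecast.append(next_value)
        values.append(next_value)
    return np.array(forecast)

closes = np.arange(1.0, 21.0)  # made-up closing prices 1..20
pred = ma_forecast(closes, window=10, horizon=10)
```

Because each new prediction is an average of earlier values, the forecast quickly flattens, which is consistent with the short, nearly straight red segments described above.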
Using the exponential moving average (EMA), we achieve some small improvement:
One-step prediction of company A's closing price using the exponential moving average.
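The EMA code is not shown in the article. A minimal sketch with Pandas could look like the following, assuming a span of 3 and made-up prices; the column name EMA is also an assumption. As with the moving average, the result is shifted so the day being predicted is not used.

```python
import pandas as pd

df = pd.DataFrame({'Close': [10.0, 11.0, 12.0, 11.0, 13.0, 14.0]})  # made-up prices
span = 3  # assumed smoothing span; alpha = 2 / (span + 1)

# Exponentially weighted mean of past closes, shifted by one day
df['EMA'] = df['Close'].ewm(span=span, adjust=False).mean().shift()
```

Unlike the plain moving average, the EMA gives recent days more weight, so it reacts faster to a change in price.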
Comparing MA and EMA:
Comparison of one-step predictions of company A's closing price using MA and EMA
This method is very simple. What we really want is to forecast the stock's future trend n days ahead, and neither MA nor EMA can complete that task.
 Recurrent neural network (RNN)
To understand the LSTM network, we first need to understand recurrent neural networks. This kind of network is used to identify patterns when past results have an influence on the current result. An example use of RNNs is time series, where the order of the data is very important.
In this network architecture, a neuron uses as input not only the regular input (the output of the previous layer) but also its own previous state.
RNN architecture
Note that H represents the state of the neuron. Therefore, when in state H_1, the neuron uses as input the parameter X_1 and H_0 (its previous state). The main problem with this model is memory loss: old states are quickly forgotten. In sequences where we need to remember more than the immediate past, the RNN fails to remember.
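The recurrence described above can be sketched in NumPy. This is a minimal illustration of a standard RNN cell with a tanh activation; the weight names W_x, W_h and b, the sizes and the random initialization are assumptions, not taken from the article.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """New state H_t computed from the input X_t and the previous state H_{t-1}."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(0)
input_size, hidden_size = 1, 4  # assumed sizes
W_x = rng.normal(size=(hidden_size, input_size))
W_h = rng.normal(size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)  # H_0, the initial state
sequence = [np.array([0.1]), np.array([0.2]), np.array([0.3])]  # X_1..X_3
states = []
for x_t in sequence:
    h = rnn_step(x_t, h, W_x, W_h, b)  # H_t depends on X_t and H_{t-1}
    states.append(h)
```

Each state is squashed through tanh and multiplied by W_h again at the next step, which is one way to see why the influence of old inputs fades quickly.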
LSTM network 
The LSTM network is derived from the RNN, but it can solve the memory loss problem by changing the structure of the neuron.
LSTM neuron architecture
The new neuron has 3 gates, each with a different goal. The gates are:
 *  Input gate
 *  Output gate
 *  Forget gate
LSTM Neurons still receive their previous state as input :
An LSTM neuron passing its previous state as a parameter.
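The three gates can be sketched with the standard LSTM equations. The following NumPy step is an illustration only: the stacked weight layout (W, U, b), the sizes and the random values are assumptions, and real libraries such as Keras organize their weights differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4*hidden, input), U: (4*hidden, hidden), b: (4*hidden,)."""
    hidden = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:hidden])                # input gate: how much new information to store
    f = sigmoid(z[hidden:2 * hidden])       # forget gate: how much of the old cell state to keep
    o = sigmoid(z[2 * hidden:3 * hidden])   # output gate: how much of the cell state to expose
    g = np.tanh(z[3 * hidden:])             # candidate values for the cell state
    c_t = f * c_prev + i * g                # updated cell state (the long-term memory)
    h_t = o * np.tanh(c_t)                  # updated hidden state (the neuron's output)
    return h_t, c_t

rng = np.random.default_rng(0)
input_size, hidden_size = 1, 4  # assumed sizes
W = rng.normal(size=(4 * hidden_size, input_size))
U = rng.normal(size=(4 * hidden_size, hidden_size))
b = np.zeros(4 * hidden_size)

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in [np.array([0.1]), np.array([0.2]), np.array([0.3])]:
    h, c = lstm_step(x_t, h, c, W, U, b)
```

The cell state c is updated additively and is scaled only by the forget gate, which is what lets the LSTM keep information longer than the plain RNN.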
Predicting a single company with LSTM
Finally, let's use an LSTM to predict the behavior of company A.
But first, consider the following parameters. We want to predict the next n days (forward_days) given the past m days of observations (look_back). So, if the input is the past m days, the network output will be the forecast for the next n days. We will split the data into Train and Test sets. The test will consist of k periods (num_periods), each of which is a series of n days of forecast.
look_back = 40
forward_days = 10
num_periods = 20
Now, we open the CSV file with Pandas and keep only the columns we will use: the date and the closing price. The initial closing price chart of company A is:
plt.figure(figsize=(15, 10))
plt.plot(df)
plt.show()
Closing price of company A
In order, we scale the input, split the data into Train/Validation and Test sets, and format it to feed the model.
Now, we build and train the model.
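The preprocessing code is not shown in the article. The steps could be sketched as follows, using plain NumPy and made-up prices. The helper names min_max_scale and make_windows, the split position and the choice to compute the scaling range on the training part only are all assumptions.

```python
import numpy as np

look_back, forward_days = 40, 10

def min_max_scale(values, lo, hi):
    """Scale values to [0, 1] relative to the range [lo, hi]."""
    return (values - lo) / (hi - lo)

def make_windows(series, look_back, forward_days):
    """Slide over the series: look_back days as input, the next forward_days as target."""
    X, y = [], []
    for start in range(len(series) - look_back - forward_days + 1):
        end = start + look_back
        X.append(series[start:end])
        y.append(series[end:end + forward_days])
    # Keras LSTM layers expect input shaped (samples, time steps, features)
    return np.array(X).reshape(-1, look_back, 1), np.array(y)

prices = np.linspace(100.0, 200.0, 300)  # made-up closing prices
split = 200                              # assumed train/test split position
train, test = prices[:split], prices[split:]

lo, hi = train.min(), train.max()        # range taken from the training data only
X_train, y_train = make_windows(min_max_scale(train, lo, hi), look_back, forward_days)
```

Taking the scaling range from the training part avoids leaking information about future prices into the inputs; a validation set can be carved out of X_train and y_train in the same way.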
from keras.models import Sequential
from keras.layers import LSTM, Dense

NUM_NEURONS_FirstLayer = 128
NUM_NEURONS_SecondLayer = 64
EPOCHS = 220

# Build the model: two stacked LSTM layers and a Dense layer with one output per forecast day
model = Sequential()
model.add(LSTM(NUM_NEURONS_FirstLayer, input_shape=(look_back, 1), return_sequences=True))
model.add(LSTM(NUM_NEURONS_SecondLayer))  # input shape is inferred from the previous layer
model.add(Dense(forward_days))
model.compile(loss='mean_squared_error', optimizer='adam')

# Train the model, monitoring the loss on the validation set
history = model.fit(X_train, y_train, epochs=EPOCHS, validation_data=(X_validate, y_validate), shuffle=True, batch_size=2, verbose=2)
 The result is :
Company A's model applied to all the data
Taking a closer look at the test set:
Company A's model applied to the test set
Note that each red line represents a 10-day forecast (forward_days) based on the past 40 days (look_back). There are 20 red lines because we chose to test on 20 periods (num_periods). This is why the red prediction line is discontinuous.
Repeating the same process for all the companies, the best result on the test set is the prediction for company C:
Company C's model applied to the test set
Although this is the best model, the result is not very good. There are many possible reasons for this. Some of them may be:
 *  The historical closing price alone is not enough to predict the stock price
 *  The model can still be improved
Predicting 4 companies with LSTM
Finally, we will use an LSTM model to predict the behavior of all 4 companies, A, B, C and D, and compare the results with those of the single-company LSTM. The goal is to analyze whether using data from several companies can improve the individual forecast for each company.
It should be pointed out that all 4 CSV files have the same dates. This way, the network does not receive future information from one company to predict the value of another.
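If the files did not already share the same dates, they could be aligned before training. The sketch below is a hypothetical guard, not something the article does: it uses an inner merge on the Date column with two tiny made-up frames so that only the dates present in both remain.

```python
import pandas as pd

# Made-up frames standing in for two of the company CSV files
df_A = pd.DataFrame({
    'Date': pd.to_datetime(['2019-01-02', '2019-01-03', '2019-01-04']),
    'Close': [1.0, 2.0, 3.0],
})
df_B = pd.DataFrame({
    'Date': pd.to_datetime(['2019-01-03', '2019-01-04', '2019-01-07']),
    'Close': [10.0, 20.0, 30.0],
})

# Keep only the dates present in both frames, one closing-price column per company
merged = df_A.merge(df_B, on='Date', how='inner', suffixes=('_A', '_B'))
```

With every row holding the prices of all companies for the same day, a window of past rows can never contain a later date for one company than for another.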
 The initial data is :
Closing prices of all 4 companies
After normalizing the data and formatting it for the model, we train the model:
NUM_NEURONS_FirstLayer = 100
NUM_NEURONS_SecondLayer = 50
EPOCHS = 200

# Build the model: each time step now has num_companies features (4 here)
model = Sequential()
model.add(LSTM(NUM_NEURONS_FirstLayer, input_shape=(look_back, num_companies), return_sequences=True))
model.add(LSTM(NUM_NEURONS_SecondLayer))  # input shape is inferred from the previous layer
model.add(Dense(forward_days * num_companies))  # one output per forecast day and company
model.compile(loss='mean_squared_error', optimizer='adam')

# Train the model, monitoring the loss on the validation set
history = model.fit(X_train, y_train, epochs=EPOCHS, validation_data=(X_validate, y_validate), shuffle=True, batch_size=1, verbose=2)
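The model takes inputs of shape (look_back, num_companies) and produces forward_days * num_companies outputs. The formatting that yields such arrays is not shown in the article; a minimal sketch, with the hypothetical helper make_multi_windows and random made-up data, could be:

```python
import numpy as np

look_back, forward_days, num_companies = 40, 10, 4

def make_multi_windows(data, look_back, forward_days):
    """data has shape (num_days, num_companies); targets are flattened per window."""
    X, y = [], []
    for start in range(len(data) - look_back - forward_days + 1):
        end = start + look_back
        X.append(data[start:end])                              # (look_back, num_companies)
        y.append(data[end:end + forward_days].reshape(-1))     # forward_days * num_companies values
    return np.array(X), np.array(y)

data = np.random.default_rng(0).random((100, num_companies))  # made-up scaled prices
X, y = make_multi_windows(data, look_back, forward_days)
```

To read the predictions back per company, each output row can be reshaped to (forward_days, num_companies), which is the inverse of the flattening used for the targets.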
 The result is :
Results of the 4-company LSTM model
Taking a closer look at the test set:
Test set predictions of the 4-company LSTM model
It's time to compare the results. The single-company LSTM results are shown on the left and the 4-company LSTM results on the right. The first row shows the predictions on the test set, and the second row shows the predictions on the whole data set.
Company A
Company B
Company C
Company D
Conclusion
It is not possible to predict the behavior of the stock market from historical prices alone. The LSTM forecasts are not acceptable. Even when the historical prices of several companies are used, the predictions get worse.