Preprocessing data and creating stock market predictions with TensorFlow Keras
`lstm.ipynb` is the notebook for creating the LSTM model, plotting the data, and predicting prices.
`dense.ipynb` is a notebook for a feed-forward neural network with Dense layers; its classification predicts movement in a single direction.
`classification.ipynb` is a notebook for a convolutional neural network that classifies movements as up or down.
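For illustration, here is a minimal sketch of the kind of sequence model `lstm.ipynb` could build. The window size, layer widths, and single close-price feature are assumptions for the example, not the notebook's actual configuration:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

WINDOW = 60   # assumed: 60 days of history per sample
FEATURES = 1  # assumed: closing price only

def make_windows(series):
    """Turn a 1-D price series into (samples, WINDOW, 1) inputs with next-day targets."""
    X, y = [], []
    for i in range(len(series) - WINDOW):
        X.append(series[i:i + WINDOW])
        y.append(series[i + WINDOW])
    return np.array(X).reshape(-1, WINDOW, FEATURES), np.array(y)

model = keras.Sequential([
    keras.Input(shape=(WINDOW, FEATURES)),
    layers.LSTM(50, return_sequences=True),
    layers.LSTM(50),
    layers.Dense(25, activation="relu"),
    layers.Dense(1),  # regression output: the predicted next-day price
])
model.compile(optimizer="adam", loss="mse")
```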
As you can see in the .gitignore file, many directories have not been published to the VCS, mainly because of their size. The following files served the purpose of creating an all-round model; the attempt was unsuccessful, but I learned a lot:
- `main.py`
- `main_lstm.py`
- `main_train_test_aggregate.py`
- `main_train_test_scale.py`
- `main_train_test_split.py`
As the data is stationary, I tried processing it step by step; that is why I have these files.
I used the following directory structure for local development; you cannot see these directories in the VCS:
- All the raw CSV data is stored in the `.\data` folder.
- The data needs to be split into training and testing sets; I split the dataset with `test_size=0.2` using the scikit-learn library and turned shuffling off (a sketch of this step follows the list).
- Training data is stored in `.\data_train`.
- Testing data is stored in `.\data_test`.
- Up to this point, every stock is stored separately.
- Next, the data needed to be aggregated into one file so the neural network could be fed everything at once: in `.\data_test\aggregated` and `.\data_train\aggregated` I merged all the data into one file while converting every value to a number: datetime became days relative to 1970-01-01, a ticker field was added so stocks can still be separated, and an `Is Stock` boolean field (0 or 1) was added (see the second sketch below).
- After aggregating the data into single files, a further preprocessing step was to scale the data, to eliminate biases coming from differently sized values (see the third sketch below).
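A minimal sketch of the split step, assuming a hypothetical `AAPL.csv` file. `shuffle=False` preserves chronological order, so the test set is the most recent 20% of rows:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv(r".\data\AAPL.csv")  # hypothetical ticker file

# shuffle=False keeps time order intact: the last 20% of rows become the test set.
train_df, test_df = train_test_split(df, test_size=0.2, shuffle=False)

train_df.to_csv(r".\data_train\AAPL.csv", index=False)
test_df.to_csv(r".\data_test\AAPL.csv", index=False)
```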
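The aggregation step could look like the sketch below. The `Date` column name, the numeric ticker id, and the output file name are assumptions for illustration; the source only specifies days since 1970-01-01, a ticker field, and an `Is Stock` flag:

```python
import glob
import pandas as pd

frames = []
for ticker_id, path in enumerate(sorted(glob.glob(r".\data_train\*.csv"))):
    df = pd.read_csv(path)
    # Convert the datetime column into days elapsed since 1970-01-01.
    df["Date"] = (pd.to_datetime(df["Date"]) - pd.Timestamp("1970-01-01")).dt.days
    df["Ticker"] = ticker_id  # numeric id so stocks stay separable after merging
    df["Is Stock"] = 1        # boolean flag (0 or 1) for the asset type
    frames.append(df)

pd.concat(frames, ignore_index=True).to_csv(
    r".\data_train\aggregated\all.csv", index=False  # hypothetical output name
)
```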
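Finally, a sketch of the scaling step. The column list is an assumption; the key point is fitting the scaler on the training data only, so nothing from the test period leaks into it:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

train = pd.read_csv(r".\data_train\aggregated\all.csv")
test = pd.read_csv(r".\data_test\aggregated\all.csv")

cols = ["Open", "High", "Low", "Close", "Volume"]  # assumed column names
scaler = MinMaxScaler()
train[cols] = scaler.fit_transform(train[cols])  # fit on training data only
test[cols] = scaler.transform(test[cols])        # reuse the fitted scaler
```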
Not only neural networks: with good visualization we can also explore certain patterns with our own eyes.
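For example, a quick matplotlib sketch (hypothetical file and column names) that overlays a moving average on the closing price makes trends visible at a glance:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv(r".\data\AAPL.csv", parse_dates=["Date"])  # hypothetical file

plt.figure(figsize=(12, 5))
plt.plot(df["Date"], df["Close"], label="Close")
plt.plot(df["Date"], df["Close"].rolling(30).mean(), label="30-day moving average")
plt.title("Closing price with moving average")
plt.legend()
plt.show()
```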
