LSTM Prediction, Coca Cola stock closing price $51.35 3/19/21

TL;DR: I’m teaching myself about investing and TensorFlow. My model predicts Coca Cola will close at $51.35 on 3/19/2021. I have no idea what I’m doing. Do not use this to make financial decisions.

So last week I predicted Coca Cola would close at $42.56. It actually closed at $50.36. I would have been better off using the lazy predictor and just assuming the price would not move from the previous week’s close of $50.79. See my previous post:

https://www.reddit.com/r/tensorflow/comments/lzv277/silly_prediction_coca_cola_stock_closing_price/

So the first thing I did was scale my numbers. Having price points in the tens of dollars and volume in the millions blew up my model pretty badly. My previous unscaled version had errors relative to my training data in the tens of dollars per week.

I scaled volume to shares outstanding, and share price to % change, and it looks like this model is performing better. The scaled model’s errors are much tighter. I’ve attached a histogram of my model’s error using the last 100 weeks of data. Next Friday will be the first real test though.
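For anyone curious what that scaling might look like, here is a rough sketch and not my exact code. It assumes weekly closes, weekly volumes, and shares outstanding are already lined up as equal-length arrays. The function names `scale_features` and `pct_change_to_price` are made up for illustration.

```python
import numpy as np

def scale_features(close, volume, shares_outstanding):
    """Scale raw weekly data into (pct_change, turnover) rows.

    All three inputs are assumed to be equal-length sequences in
    chronological order, one entry per week.
    """
    close = np.asarray(close, dtype=float)
    volume = np.asarray(volume, dtype=float)
    shares_outstanding = np.asarray(shares_outstanding, dtype=float)

    # Week-over-week fractional price change. The first week has no
    # previous week to compare against, so it is dropped.
    pct_change = close[1:] / close[:-1] - 1.0

    # Volume as a fraction of shares outstanding, aligned with pct_change.
    turnover = volume[1:] / shares_outstanding[1:]

    return np.column_stack([pct_change, turnover])

def pct_change_to_price(last_close, predicted_pct_change):
    """Convert a predicted fractional change back into a dollar price."""
    return last_close * (1.0 + predicted_pct_change)
```

Both columns end up as small fractions on a similar scale, instead of dollars next to millions of shares. The model then predicts a % change, which has to be converted back to a price with something like `pct_change_to_price`.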

So the bad news. I define success as my model performing better than a “Lazy” predictor, which assumes that next week’s closing price will equal this week’s closing price. My model’s error histogram pretty much sits exactly on top of the Lazy predictor histogram.

Comparing root mean square error, my LSTM predictor is actually slightly worse…
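Here is a minimal sketch of the comparison, assuming a chronological array of weekly closes and one model prediction for each week after the first. The helper names are hypothetical.

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

def lazy_predictions(closes):
    """Lazy predictor: next week's close equals this week's close."""
    closes = np.asarray(closes, dtype=float)
    return closes[:-1]

def compare_to_lazy(closes, model_predictions):
    """Score the model and the lazy baseline on the same target weeks.

    closes: chronological weekly closes, length n.
    model_predictions: predictions for weeks 1..n-1, length n - 1.
    """
    closes = np.asarray(closes, dtype=float)
    actual = closes[1:]
    return {
        "lazy_rmse": rmse(lazy_predictions(closes), actual),
        "model_rmse": rmse(model_predictions, actual),
    }
```

Plugging in last week's numbers, `compare_to_lazy([50.79, 50.36], [42.56])` gives a lazy error of about $0.43 against a model error of about $7.80. The model only counts as a success if `model_rmse` comes out below `lazy_rmse` over many weeks.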

There are a few known issues that I need to work out. I haven’t included dividend data, nor do I have a method to handle splits. I’ll need to be able to address splits if I want to include other companies.
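One possible way to handle splits is to back-adjust the older prices so a split doesn't show up as a fake crash in the % change. I haven't built this yet, so it's only a sketch. The `splits` mapping and the function name are assumptions for illustration.

```python
def back_adjust_for_splits(closes, splits):
    """Rescale pre-split closes so the series is continuous across splits.

    closes: chronological list of weekly closing prices.
    splits: dict mapping the index of the first post-split week to the
            split ratio (2.0 for a 2-for-1 split). Hypothetical format.
    """
    adjusted = [float(c) for c in closes]
    for split_index, ratio in splits.items():
        # Every close before the split is divided by the ratio, so
        # earlier splits get compounded by later ones automatically.
        for i in range(split_index):
            adjusted[i] /= ratio
    return adjusted
```

With made-up numbers, a 2-for-1 split before the third week turns `[100.0, 102.0, 51.5]` into `[50.0, 51.0, 51.5]`. Volume and shares outstanding would need the opposite adjustment, and dividends are not handled here at all.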

Data is starting to get a bit messy to handle. I’m currently stuck doing some manual manipulation and manual data entry, as Alphavantage is missing historical shares outstanding. Ideally I would include float as well.

Looks like I might need to set up a SQL server locally to host all this data.
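A lighter option than a full SQL server might be SQLite, which ships with Python as the `sqlite3` module. This swaps in SQLite for a server and is only a sketch. The `weekly_bars` table and its columns are a made-up schema.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) a local database with one table of weekly bars."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS weekly_bars (
               symbol TEXT NOT NULL,
               week_ending TEXT NOT NULL,   -- ISO date, e.g. 2021-03-12
               close REAL NOT NULL,
               volume INTEGER NOT NULL,
               shares_outstanding INTEGER,  -- NULL until entered by hand
               PRIMARY KEY (symbol, week_ending)
           )"""
    )
    return conn

def upsert_bar(conn, symbol, week_ending, close, volume, shares_outstanding=None):
    """Insert a weekly bar, replacing any existing row for that week."""
    conn.execute(
        "INSERT OR REPLACE INTO weekly_bars VALUES (?, ?, ?, ?, ?)",
        (symbol, week_ending, close, volume, shares_outstanding),
    )
    conn.commit()

def load_closes(conn, symbol):
    """Return (week_ending, close) rows for one symbol, oldest first."""
    cursor = conn.execute(
        "SELECT week_ending, close FROM weekly_bars "
        "WHERE symbol = ? ORDER BY week_ending",
        (symbol,),
    )
    return cursor.fetchall()
```

Passing a file path to `open_store` instead of the default `":memory:"` keeps the data between runs. The nullable `shares_outstanding` column leaves room for the numbers I'm currently entering manually.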

I’m starting to understand why many of the tutorials spend 80% of their time on data manipulation and only 20% on TensorFlow.

submitted by /u/squirrelaway4all