I’m a little confused by what I’m getting vs. what I’m
expecting. I’m using TensorFlow 2.1 with Python 3.7 in Anaconda
3-2020.07.
Here’s my problem:
- I want my output to be the next value in an hour-by-hour time
series.
- My input has 99 features.
- I have 24,444 data points for training. Some of the data was
corrupted/reserved for validation.
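I’m trying to build a two-layer deep neural network using LSTM
layers:
import tensorflow
from tensorflow.keras.models import Sequential
model = Sequential()
model.add(tensorflow.keras.layers.LSTM(64, return_sequences=True, input_dim=99))
model.add(tensorflow.keras.layers.LSTM(32, return_sequences=True))
# Dense is applied to every time step, giving one output value per hour
model.add(tensorflow.keras.layers.Dense(1))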
I plan to feed it windows of 72 consecutive hours (3 days) of
training data.
So when I give my model training data:
model.fit(X_data, Y_data, …)
I planned on giving X_data the shape [24444, 72, 99], where the
first dimension, 24444, is the number of data points, 72 is the 72
hours of history, and 99 is the number of training features.
My Y_data has the shape [24444, 72, 1], where the first dimension,
24444, is the number of training points, 72 is the history, and 1
is my output feature.
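For concreteness, here is a minimal NumPy sketch of how arrays with these shapes could be built from one long hourly series. The helper name `make_windows`, the placeholder data, and the labelling rule (at every hour of a window, the label is the target one hour later) are assumptions made for illustration; the post does not say how X_data and Y_data were actually produced.

```python
import numpy as np

def make_windows(features, target, window=72):
    """Slice a (T, F) feature series and a (T,) target series into overlapping windows.

    Assumption (not stated in the post): at every hour of a window the label
    is the target value one hour later.
    """
    n_windows = len(features) - window      # the last window still needs a "next hour"
    # idx[i, t] is the row of the series used at step t of window i
    idx = np.arange(n_windows)[:, None] + np.arange(window)[None, :]
    X = features[idx]                       # (n_windows, window, F)
    Y = target[idx + 1][..., None]          # (n_windows, window, 1)
    return X, Y

# Small placeholder series: 100 hours, 99 features
features = np.random.rand(100, 99).astype("float32")
target = np.arange(100, dtype="float32")

X_data, Y_data = make_windows(features, target)
print(X_data.shape, Y_data.shape)  # (28, 72, 99) (28, 72, 1)
```

With these shapes, each of the 72 outputs in a window has its own label, which matches a model whose LSTM layers all use `return_sequences=True`.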
My question is, when training is done, and I’m actively using my
model for predictions, what should my production input size be?
prediction = model.predict(production_data)
Should my production input have the shape [1, 72, 99], where 1 is
the number of output points I expect, 72 is my history, and 99 is
my feature size?
When I do this, I get an output size of [72, 1]. That feels…
weird?
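As a shape check, here is a minimal sketch using an untrained copy of the model. Two assumptions: `tf.keras.Input(shape=(72, 99))` stands in for `input_dim=99`, and `production_data` is random placeholder data. Because both LSTM layers use `return_sequences=True`, Keras returns one value per input hour and keeps the batch axis, so `predict` gives [1, 72, 1]; dropping the batch axis leaves [72, 1]. If the labels were next-hour values, the forecast for the hour after the window would be the last time step.

```python
import numpy as np
import tensorflow as tf

# Untrained stand-in for the model in the post (weights are random)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(72, 99)),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.Dense(1),
])

# One sample: 72 hours of history, 99 features (placeholder values)
production_data = np.random.rand(1, 72, 99).astype("float32")

prediction = model.predict(production_data, verbose=0)
print(prediction.shape)             # (1, 72, 1)

per_hour = prediction[0]            # shape (72, 1): one output per input hour
next_value = prediction[0, -1, 0]   # output at the final hour of the window
```

Setting `return_sequences=False` on the second LSTM layer would instead give a single output per window, but the labels would then need the shape [24444, 1].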
What is the difference between feeding my model input of [72, 1,
99] vs [1, 72, 99]? Does the first case not propagate the internal
state forward?
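One way to see the difference is to compare the two layouts on the same data. This is a sketch under assumptions: the model is an untrained stand-in with the time axis left as `None` so it accepts both 72-step and 1-step inputs, and `history` is random placeholder data. In a Keras LSTM with the default `stateful=False`, every sample in the batch starts from a zero state and the state only moves along the time axis, so [72, 1, 99] is treated as 72 unrelated one-hour sequences.

```python
import numpy as np
import tensorflow as tf

# Untrained stand-in; time axis is None so any sequence length is accepted
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 99)),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(32, return_sequences=True),
    tf.keras.layers.Dense(1),
])

history = np.random.rand(72, 99).astype("float32")  # placeholder data

# [1, 72, 99]: one sample of 72 time steps, state flows from hour to hour
as_sequence = model.predict(history[np.newaxis, :, :], verbose=0)  # (1, 72, 1)

# [72, 1, 99]: 72 separate samples of one time step each
as_batch = model.predict(history[:, np.newaxis, :], verbose=0)     # (72, 1, 1)

# Hour 10 fed entirely on its own, with no history at all
alone = model.predict(history[10:11, np.newaxis, :], verbose=0)    # (1, 1, 1)

# Row 10 of the [72, 1, 99] run matches the hour fed alone ...
same_as_alone = np.allclose(as_batch[10], alone[0], atol=1e-5)
# ... while the [1, 72, 99] run at hour 10 has seen hours 0..9 first
same_as_sequence = np.allclose(as_sequence[0, 10], as_batch[10, 0], atol=1e-5)
print(same_as_alone, same_as_sequence)
```

The first comparison should hold and the second should not, which is the sense in which the [72, 1, 99] layout does not carry state forward.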
If I give my model input of shape [1, 1, 99], do I need to loop
over my model predictions myself? And how would I do this?
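A hedged sketch of what such a loop could look like, assuming the LSTM layers are rebuilt with `stateful=True` and a fixed batch size of 1 so that Keras keeps the internal state between `predict` calls. The helper name `predict_stepwise` and the random `hours` array are illustrative, and the model here is untrained; in practice the trained weights would have to be copied into it.

```python
import numpy as np
import tensorflow as tf

# stateful=True keeps the LSTM state from one predict() call to the next;
# it needs a fixed batch size, here 1 sample of 1 time step
stateful_model = tf.keras.Sequential([
    tf.keras.Input(shape=(1, 99), batch_size=1),
    tf.keras.layers.LSTM(64, return_sequences=True, stateful=True),
    tf.keras.layers.LSTM(32, return_sequences=True, stateful=True),
    tf.keras.layers.Dense(1),
])

def predict_stepwise(model, hours):
    """Feed hours of shape (T, 99) one at a time; return predictions of shape (T, 1)."""
    # Start from a clean state so earlier calls do not leak into this run
    for layer in model.layers:
        if hasattr(layer, "reset_states"):
            layer.reset_states()
    outputs = []
    for hour in hours:
        step = hour.reshape(1, 1, -1)                      # [1, 1, 99]
        outputs.append(model.predict(step, batch_size=1, verbose=0)[0, 0])
    return np.stack(outputs)

hours = np.random.rand(72, 99).astype("float32")           # placeholder data
predictions = predict_stepwise(stateful_model, hours)
print(predictions.shape)  # (72, 1)
```

The trade-off is that the state must be reset by hand whenever a new, unrelated sequence starts; feeding the whole window as [1, 72, 99] avoids that bookkeeping.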