I am a beginner in machine learning and I would like to forecast some pollution data.
I am using a dataset with pm2.5, pm10 and pm1 values as features, and I am predicting the pm2.5 values. I built an LSTM network, but the predicted values are quite far from the real values.
What I used:

    win_length = 2
    batch_size = 32
    num_features = 3
    train_generator = TimeseriesGenerator(x_train, y_train, length=win_length, sampling_rate=1, batch_size=batch_size)
    test_generator = TimeseriesGenerator(x_test, y_test, length=win_length, sampling_rate=1, batch_size=batch_size)

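For reference, here is a minimal plain-Python sketch of the windowing that a generator configured like this is expected to produce, assuming the default stride of 1: each sample is win_length consecutive rows, and its target is the value at the next time step. The helper `make_windows` and the toy rows are hypothetical and only illustrate the idea; they are not taken from the notebook or the real dataset.

```python
def make_windows(features, targets, win_length):
    """Pair each run of `win_length` consecutive rows with the target that follows it."""
    windows, labels = [], []
    for end in range(win_length, len(features)):
        windows.append(features[end - win_length:end])  # rows end-win_length .. end-1
        labels.append(targets[end])                      # target at the next time step
    return windows, labels


# Made-up rows of (pm2.5, pm10, pm1); NOT values from the real dataset.
toy_x = [
    [10.0, 20.0, 5.0],
    [11.0, 21.0, 6.0],
    [12.0, 22.0, 7.0],
    [13.0, 23.0, 8.0],
]
toy_y = [row[0] for row in toy_x]  # the target is the pm2.5 column

windows, labels = make_windows(toy_x, toy_y, win_length=2)
print(len(windows))  # 4 rows with win_length=2 give 2 samples
print(windows[0])    # the first two rows
print(labels[0])     # pm2.5 of the third row
```

With win_length=2 the network only ever sees the two previous readings for each prediction, which is worth keeping in mind when judging the results.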
The model is a stacked LSTM:

    model = tf.keras.Sequential()
    model.add(tf.keras.layers.LSTM(200, input_shape=(win_length, num_features), return_sequences=True))
    model.add(tf.keras.layers.LeakyReLU(alpha=0.5))
    model.add(tf.keras.layers.LSTM(128, return_sequences=True))
    model.add(tf.keras.layers.LeakyReLU(alpha=0.5))
    model.add(tf.keras.layers.Dropout(0.3))
    model.add(tf.keras.layers.LSTM(64, return_sequences=False))
    model.add(tf.keras.layers.Dropout(0.3))
    model.add(tf.keras.layers.Dense(1))

This is the data snippet and how the predicted values compare with the original pm2.5 values: [image: prediction snippet]
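One way to put a number on how far the predictions are from the real values is to compute MAE and RMSE and compare them against a naive persistence forecast (predicting each value as the previous one). The sketch below is only an illustration: the function names are hypothetical and the numbers are made up, not results from the notebook.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Made-up pm2.5 readings and model outputs; NOT values from the real dataset.
actual = [12.0, 15.0, 14.0, 18.0, 17.0]
model_pred = [13.0, 13.5, 15.0, 16.0, 18.5]

# Persistence baseline: the forecast for step t is the observed value at step t-1,
# so the first step has no baseline forecast and is dropped from the comparison.
naive_pred = actual[:-1]
actual_eval = actual[1:]
model_eval = model_pred[1:]

print("baseline MAE:", mae(actual_eval, naive_pred))
print("model MAE:   ", mae(actual_eval, model_eval))
print("baseline RMSE:", rmse(actual_eval, naive_pred))
print("model RMSE:   ", rmse(actual_eval, model_eval))
```

If the network does not beat the persistence baseline on the test set, the plot alone can be misleading, since a model that simply copies the last reading often looks close to the real curve.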
How can I increase the accuracy of the forecast? I am also attaching the Jupyter notebook, which contains the full analysis: https://github.com/creativitylab/dataset/blob/main/pollution%20data.ipynb
submitted by /u/MobileInformal460