Browsing freely available sources, I find both claims: that DQN works well in stochastic environments, and that it does not.
As far as I understand it, the Q-network predicts the expected return of taking an action in a state, which can then be used to choose actions, e.g. greedily; training makes that prediction better. If the environment is stochastic, repeated updates should nudge the prediction toward the mean of the return distribution, since (at least for a squared-error loss) that is where the loss is minimal.
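To make that intuition concrete, here is a minimal sketch of what I mean. It is not a DQN: it assumes a single state-action pair, a squared-error loss, and a made-up reward distribution, and the names `estimate_q` and `noisy_reward` are just mine for illustration. Each update is one gradient step on the squared error against a noisy sampled return.

```python
import random


def noisy_reward(rng):
    # Hypothetical stochastic return: +10 with probability 0.3, else -1.
    # True expected value: 0.3 * 10 + 0.7 * (-1) = 2.3
    return 10.0 if rng.random() < 0.3 else -1.0


def estimate_q(sample_return, steps=50_000, lr=0.001, seed=0):
    """Repeatedly nudge a scalar estimate q toward noisy sampled returns."""
    rng = random.Random(seed)
    q = 0.0
    for _ in range(steps):
        target = sample_return(rng)
        # Gradient step on 0.5 * (target - q)**2 with respect to q.
        q += lr * (target - q)
    return q


TRUE_MEAN = 2.3
q_hat = estimate_q(noisy_reward)
print(f"estimate: {q_hat:.3f}, true expected return: {TRUE_MEAN}")
```

With a small constant learning rate the estimate ends up hovering around the mean rather than settling on it exactly, and the noisier the returns, the smaller the learning rate (and the more samples) it takes to get close.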
So it should work, but it might need a lot of time to get there (law of large numbers), especially since the game is played by two agents that both suffer from the same problem, and each agent is part of the stochastic “environment” as seen by its opponent!
Maybe there is another technique in Deep Learning / Reinforcement Learning that is much better suited to such a strongly stochastic environment? Any advice?