Hello, I’m currently working through the TensorFlow Agents (TF-Agents) library and its bandit tutorials, but I can’t fully work out for myself what the difference between the driver and the trainer is. Is it basically that the trainer trains the agent (for example, its neural network) toward an (approximately) optimal policy for a given environment? And the driver just executes a given policy, without trying to optimize it along the way?
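To make the question concrete, here is a toy sketch of how I currently picture the split. It is plain Python, not the TF-Agents API, and every class and function name in it is made up for illustration. The "driver" only runs the current policy in the environment and records the experience. The "trainer" step uses that recorded experience to update the policy.

```python
import random


class ToyBanditEnv:
    """Hypothetical Bernoulli bandit: arm i pays 1.0 with probability probs[i]."""

    def __init__(self, probs, seed=0):
        self.probs = probs
        self.rng = random.Random(seed)

    def step(self, action):
        return 1.0 if self.rng.random() < self.probs[action] else 0.0


class EpsilonGreedyPolicy:
    """Hypothetical policy: explores with probability epsilon, else picks the best estimate."""

    def __init__(self, num_arms, epsilon=0.1, seed=1):
        self.values = [0.0] * num_arms  # estimated mean reward per arm
        self.counts = [0] * num_arms    # how often each arm was used in training
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def action(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        # Ties go to the lowest arm index.
        return max(range(len(self.values)), key=lambda a: self.values[a])


def run_driver(env, policy, num_steps, buffer):
    """'Driver' (my assumption): execute the current policy and record (action, reward).
    It never changes the policy's estimates."""
    for _ in range(num_steps):
        a = policy.action()
        r = env.step(a)
        buffer.append((a, r))


def train(policy, buffer):
    """'Trainer' (my assumption): use the recorded experience to update the policy."""
    for a, r in buffer:
        policy.counts[a] += 1
        policy.values[a] += (r - policy.values[a]) / policy.counts[a]  # running mean
    buffer.clear()


# Alternate collecting with the current policy and training on what was collected.
env = ToyBanditEnv([0.2, 0.5, 0.8])
policy = EpsilonGreedyPolicy(num_arms=3)
buffer = []
for _ in range(50):
    run_driver(env, policy, num_steps=20, buffer=buffer)
    train(policy, buffer)
```

In this picture the driver is only the data-collection loop, and all of the "learning" happens in `train`. My question is essentially whether TF-Agents draws the line in the same place.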

