def collect_step(self, environment, policy, buffer):
    time_step = environment.current_time_step()
    observation_root = time_step.observation.numpy()
    # Pad each axis by 2 with the median value so windows near the edge stay in bounds.
    observation_root_padded = np.array([np.pad(observation_root, (2,), 'median')])
    # Channel 3 marks the player; argmax gives its flat index, unravel_index its coordinates.
    player_matrix = observation_root_padded[:, 1, :, :, 3]
    row, col = np.unravel_index(np.argmax(player_matrix), player_matrix.shape)[1:3]
    # Crop a 3x3 spatial window centered on the player (note: each axis needs its own coordinate).
    new_observation = observation_root_padded[:, 2:102, row - 1:row + 2, col - 1:col + 2, 2:5]
    # Rebuild the TimeStep with the reduced observation before querying the policy.
    new_ts = ts.TimeStep(time_step.step_type, time_step.reward, time_step.discount, new_observation)
    action_step = policy.action(new_ts)
    next_time_step = environment.step(action_step.action)
    traj = trajectory.from_transition(new_ts, action_step, next_time_step)
    buffer.add_batch(traj)
Long story short, I don’t want my bot observing the entire playspace or state, only the immediately observable area around it.
Imagine training a Minecraft bot to traverse land. If TensorFlow had its way, it would be the environment’s responsibility to reduce the scope of visibility (fog). Why not have the bot put the blinders on instead?
In this sample, the environment provides a gradient for the bot to follow, so the bot doesn’t need to see the entire 1000×1000 playspace — only the immediate 10×10 around it.
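The cropping idea can be shown in isolation. This is a minimal sketch, not the code above: the 10×10 grid, the pad width of 1, the 3×3 window, and the `local_window` helper are all illustrative assumptions, and the player is assumed to be marked by the grid's maximum value.

```python
import numpy as np

def local_window(grid, pad=1):
    """Return the (2*pad+1)-square neighborhood around the grid's max value."""
    # Pad with the median so a window near the edge never runs out of bounds.
    padded = np.pad(grid, pad, mode='median')
    # Treat the maximum value as the player's marker and locate it.
    row, col = np.unravel_index(np.argmax(padded), padded.shape)
    # Slice the window centered on the player; each axis uses its own coordinate.
    return padded[row - pad:row + pad + 1, col - pad:col + pad + 1]

grid = np.zeros((10, 10))
grid[4, 7] = 1.0            # hypothetical player marker
window = local_window(grid)
print(window.shape)         # (3, 3)
print(window[1, 1])         # 1.0 — the player sits at the center of its own view
```

The policy then only ever sees the small window, so the same trained network works no matter how large the full playspace grows.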