I understand the simple tf.distribute.mirroredStrategy(), it’s actually pretty simple. I’m hoping to scale a large problem across multiple computers so I’m trying to learn how to use multiWorkerMirrorredStrategy() but I have not found a good example yet.
My understanding is that I would write one python script and distribute it across the machines. Then I define the roles of each machine via the environment variable TF_CONFIG. I create the strategy and then do something like:
mystrategy = tf.distribute.multiWorkerMirroredStrategy() with mystrategy.scope(): model = buildModel() model.compile() model.fit(x_train, y_train)
That’s all very straightforward. My question is about the data. This code is executed on all nodes. Is each node supposed to parse TF_CONFIG and load its own subset of data? Does just the chief load all the data and then the scope block parses out shards? Or does each node load all the data?