Large spikes after each epoch using tf.Keras API

I am training a model using tf.keras. The code is the following:

import datetime
from functools import partial

import tensorflow as tf

AUTOTUNE = tf.data.experimental.AUTOTUNE  # not in my original snippet; assumed
NUM_EPOCHS = 10                           # likewise assumed

class CustomCallback(tf.keras.callbacks.Callback):
    def __init__(self, val_dataset, **kwargs):
        self.val_dataset = val_dataset
        super().__init__(**kwargs)

    def on_train_batch_end(self, batch, logs=None):
        if batch % 1000 == 0:
            val = self.model.evaluate(self.val_dataset, return_dict=True)
            print("*** Val accuracy: %.2f ***" % (val['sparse_categorical_accuracy']))
        super().on_train_batch_end(batch, logs)

## DATASET ##

# Create a dictionary describing the features.
image_feature_description = {
    'train/label': tf.io.FixedLenFeature((), tf.int64),
    'train/image': tf.io.FixedLenFeature((), tf.string)
}

def _parse_image_function(example_proto):
    # Parse the input tf.train.Example proto using the dictionary above.
    parsed_features = tf.io.parse_single_example(example_proto, image_feature_description)
    image = tf.image.decode_jpeg(parsed_features['train/image'])
    image = tf.image.resize(image, [224, 224])
    # augmentation
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, 0.2)
    image = tf.image.random_jpeg_quality(image, 50, 95)
    image = image / 255.0
    label = tf.cast(parsed_features['train/label'], tf.int32)
    return image, label

def load_dataset(filenames, labeled=True):
    ignore_order = tf.data.Options()
    ignore_order.experimental_deterministic = False  # disable order, increase speed
    dataset = tf.data.TFRecordDataset(filenames)  # automatically interleaves reads from multiple files
    dataset = dataset.with_options(ignore_order)  # uses data as soon as it streams in, rather than in its original order
    dataset = dataset.map(partial(_parse_image_function), num_parallel_calls=AUTOTUNE)
    return dataset

def get_datasets(filenames, labeled=True, BATCH=64):
    dataset = load_dataset(filenames, labeled=labeled)
    train_dataset = dataset.skip(2000)
    val_dataset = dataset.take(2000)
    train_dataset = train_dataset.shuffle(4096)
    train_dataset = train_dataset.prefetch(buffer_size=AUTOTUNE)
    train_dataset = train_dataset.batch(BATCH)
    val_dataset = val_dataset.batch(BATCH)
    return train_dataset, val_dataset

train_dataset, val_dataset = get_datasets('data/train_224.tfrecords', BATCH=64)

## CALLBACKS ##

log_path = './logs/' + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
checkpoint_path = './checkpoints/' + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

tb_callback = tf.keras.callbacks.TensorBoard(
    log_path, update_freq=100, profile_batch=0)
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path + '/weights.{epoch:02d}-{accuracy:.2f}.hdf5',
    save_weights_only=False,
    save_freq=200)
custom_callback = CustomCallback(val_dataset=val_dataset)

## MODEL ##

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    0.005, decay_steps=300, decay_rate=0.98, staircase=True)

model = tf.keras.applications.MobileNetV2(
    include_top=True, weights=None, classes=2, alpha=0.25)

model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=lr_schedule),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
    metrics=['accuracy', 'sparse_categorical_accuracy'])

model.fit(train_dataset,
          epochs=NUM_EPOCHS,
          shuffle=True,
          validation_data=val_dataset,
          validation_steps=None,
          callbacks=[model_checkpoint_callback, tb_callback, custom_callback])

model.save('model.hdf5')

At the end of each epoch I can see a spike in the batch accuracy and loss, as you can see in the figure below. After the spike, the metrics gradually return to previous values and keep improving.

What could be the reason for this strange behaviour?

https://preview.redd.it/sg8lcdieylh61.png?width=417&format=png&auto=webp&s=7c93ff11fe29da2cb8d90591ce44c69b8da92c3e
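
One thing I already wondered about (not sure it explains the spikes): with experimental_deterministic = False, the parallel map can return records out of order, so the skip(2000)/take(2000) split is not guaranteed to be the same 2000 records on every pass, and the random augmentations currently hit the validation set too. A sketch of a split I could try instead, done before the non-deterministic map:

# Hypothetical fix: split the raw records *before* the parallel map, so the
# validation set is the same 2000 records on every epoch.
raw = tf.data.TFRecordDataset('data/train_224.tfrecords')
val_dataset = raw.take(2000).map(_parse_image_function, num_parallel_calls=AUTOTUNE)
train_dataset = raw.skip(2000).map(_parse_image_function, num_parallel_calls=AUTOTUNE)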

submitted by /u/fralbalbero


KeyError 'metrics' when trying to train a model in tensorflow

I was working off some github code where I’m training a model to recognize laughter. This particular bit of code is giving me problems:

from keras.models import Sequential
from keras.layers import Dense, BatchNormalization, Flatten
from keras_tqdm import TQDMNotebookCallback  # progress-bar callback used below

lr_model = Sequential()
# lr_model.add(keras.Input((None, 128)))
lr_model.add(BatchNormalization(input_shape=(10, 128)))
lr_model.add(Flatten())
lr_model.add(Dense(1, activation='sigmoid'))

# try using different optimizers and different optimizer configs
lr_model.compile(loss='binary_crossentropy',
                 optimizer='adam',
                 metrics=['accuracy'])

batch_size = 32
CV_frac = 0.1
# data_generator is defined elsewhere in the repo
train_gen = data_generator(batch_size, '../Data/bal_laugh_speech_subset.tfrecord', 0, 1 - CV_frac)
val_gen = data_generator(128, '../Data/bal_laugh_speech_subset.tfrecord', 1 - CV_frac, 1)
rec_len = 18768

lr_h = lr_model.fit_generator(train_gen,
                              steps_per_epoch=int(rec_len * (1 - CV_frac)) // batch_size,
                              epochs=100,
                              validation_data=val_gen,
                              validation_steps=int(rec_len * CV_frac) // 128,
                              verbose=0,
                              callbacks=[TQDMNotebookCallback()])

I get the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-15-9eced074de11> in <module>
      7 rec_len = 18768
      8
----> 9 lr_h = lr_model.fit_generator(train_gen, steps_per_epoch=int(rec_len*(1-CV_frac))//batch_size, epochs=100,
     10                               validation_data=val_gen, validation_steps=int(rec_len*CV_frac)//128,
     11                               verbose=0, callbacks=[TQDMNotebookCallback()])

~/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, validation_freq, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1845         'will be removed in a future version. '
   1846         'Please use `Model.fit`, which supports generators.')
-> 1847     return self.fit(
   1848         generator,
   1849         steps_per_epoch=steps_per_epoch,

~/.local/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
   1103             logs = tmp_logs  # No error, now safe to assign to logs.
   1104             end_step = step + data_handler.step_increment
-> 1105             callbacks.on_train_batch_end(end_step, logs)
   1106             if self.stop_training:
   1107               break

~/.local/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py in on_train_batch_end(self, batch, logs)
    452     """
    453     if self._should_call_train_batch_hooks:
--> 454       self._call_batch_hook(ModeKeys.TRAIN, 'end', batch, logs=logs)
    455
    456   def on_test_batch_begin(self, batch, logs=None):

~/.local/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py in _call_batch_hook(self, mode, hook, batch, logs)
    294       self._call_batch_begin_hook(mode, batch, logs)
    295     elif hook == 'end':
--> 296       self._call_batch_end_hook(mode, batch, logs)
    297     else:
    298       raise ValueError('Unrecognized hook: {}'.format(hook))

~/.local/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py in _call_batch_end_hook(self, mode, batch, logs)
    314       self._batch_times.append(batch_time)
    315
--> 316     self._call_batch_hook_helper(hook_name, batch, logs)
    317
    318     if len(self._batch_times) >= self._num_batches_for_timing_check:

~/.local/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py in _call_batch_hook_helper(self, hook_name, batch, logs)
    358         if numpy_logs is None:  # Only convert once.
    359           numpy_logs = tf_utils.to_numpy_or_python_type(logs)
--> 360         hook(batch, numpy_logs)
    361
    362     if self._check_timing:

~/.local/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py in on_train_batch_end(self, batch, logs)
    708     """
    709     # For backwards compatibility.
--> 710     self.on_batch_end(batch, logs=logs)
    711
    712   @doc_controls.for_subclass_implementers

~/env/py385/lib/python3.8/site-packages/keras_tqdm/tqdm_callback.py in on_batch_end(self, batch, logs)
    115         self.inner_count += update
    116         if self.inner_count < self.inner_total:
--> 117             self.append_logs(logs)
    118             metrics = self.format_metrics(self.running_logs)
    119             desc = self.inner_description_update.format(epoch=self.epoch, metrics=metrics)

~/env/py385/lib/python3.8/site-packages/keras_tqdm/tqdm_callback.py in append_logs(self, logs)
    134
    135     def append_logs(self, logs):
--> 136         metrics = self.params['metrics']
    137         for metric, value in six.iteritems(logs):
    138             if metric in metrics:

KeyError: 'metrics'

All the other KeyError 'metrics' problems I've googled have to do with something else called livelossplot. Any help will be much appreciated!
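
From the bottom of the traceback, the failing lookup is self.params['metrics'] inside keras_tqdm's callback (keras_tqdm predates TF 2, whose callbacks no longer receive a 'metrics' entry in params, as far as I can tell). A workaround I'm considering, untested: swap it for tqdm's own Keras callback, which doesn't read that key:

from tqdm.keras import TqdmCallback  # bundled with recent versions of tqdm

lr_h = lr_model.fit(train_gen,
                    steps_per_epoch=int(rec_len * (1 - CV_frac)) // batch_size,
                    epochs=100,
                    validation_data=val_gen,
                    validation_steps=int(rec_len * CV_frac) // 128,
                    verbose=0,
                    callbacks=[TqdmCallback()])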

submitted by /u/imstupidfeelbad


Deep Neural Network Hyper-parameter Tuning with a Genetic Algorithm

Hello there. I'm thinking of using a genetic algorithm (GA) to tune the hyper-parameters of neural networks.

Apart from this paper and this blog, I can't find anything related to the topic. There are many GA libraries, such as PyGAD, but they only apply GA to the weights to fine-tune the model, instead of searching for the best hyper-parameters.

Has anyone here tried anything like this before, i.e. using a GA to find the best hyper-parameters of a TensorFlow/Keras model? A rough sketch of what I have in mind follows. Mind sharing your thoughts?
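
Everything in this sketch (search space, rates, the tiny model used to score an individual) is made up for illustration; fitness is just validation accuracy after a couple of cheap epochs:

import random

import tensorflow as tf

SPACE = {
    'lr': [1e-4, 3e-4, 1e-3, 3e-3],
    'units': [32, 64, 128, 256],
    'dropout': [0.0, 0.2, 0.4],
}

def random_individual():
    return {k: random.choice(v) for k, v in SPACE.items()}

def fitness(ind, x_train, y_train, x_val, y_val):
    # Score a hyper-parameter set by briefly training a small model with it.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(ind['units'], activation='relu'),
        tf.keras.layers.Dropout(ind['dropout']),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(ind['lr']),
                  loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=2, verbose=0)
    return model.evaluate(x_val, y_val, verbose=0)[1]

def crossover(a, b):
    # Uniform crossover: each gene comes from one of the two parents.
    return {k: random.choice([a[k], b[k]]) for k in SPACE}

def mutate(ind, rate=0.2):
    # With probability `rate`, resample a gene from the search space.
    return {k: random.choice(v) if random.random() < rate else ind[k]
            for k, v in SPACE.items()}

def evolve(x_train, y_train, x_val, y_val, pop_size=8, generations=5):
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda i: fitness(i, x_train, y_train, x_val, y_val),
                        reverse=True)
        parents = scored[:pop_size // 2]  # truncation selection
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return scored[0]  # best individual from the last scored generation

Whether two-epoch accuracy is a good enough proxy for fully-trained performance is exactly the kind of thing I'm unsure about.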

thanks!

submitted by /u/Obvious-Salad4973


Assertion Error when training DNNClassifier

I’m trying to create a DNN classifier and am running into the following error:

Invalid argument: assertion failed: [Labels must be <= n_classes - 1] [Condition x <= y did not hold element-wise:] [x (head/losses/labels:0) = ] [[3][2][4]...] [y (head/losses/check_label_range/Const:0) = ] [4]

I am following the general structure from https://www.tensorflow.org/tutorials/estimator/premade and am not sure what I’m doing wrong.

In my data, there are 255 columns and 4 possible classifications for each row.

I have excluded the imports

training_data = pd.read_csv(data_file)
target_data = pd.read_csv(target_file)

train_y = training_data.pop('StateCode')
target_y = target_data.pop('StateCode')

def input_fn(features, labels, training=True, batch_size=256):
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
    # Shuffle and repeat if you are in training mode.
    if training:
        dataset = dataset.shuffle(1000).repeat()
    return dataset.batch(batch_size)

my_feature_columns = []
for key in training_data.keys():
    my_feature_columns.append(tf.feature_column.numeric_column(key=key))

labels = list(training_data.columns)
print(labels)

classifier = tf.estimator.DNNClassifier(
    feature_columns=my_feature_columns,
    # Two hidden layers of 30 and 10 nodes respectively.
    hidden_units=[30, 10],
    # The model must choose between 4 classes.
    n_classes=4)

classifier.train(
    input_fn=lambda: input_fn(training_data, train_y, training=True),
    steps=5000)
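
In case it helps narrow things down: with n_classes=4, DNNClassifier expects labels in {0, 1, 2, 3}, and the [[3][2][4]...] in the message makes me think a 4 is present. This is how I plan to check and, if needed, remap the labels (a sketch):

print(sorted(train_y.unique()))               # e.g. [1, 2, 3, 4] would be out of range
train_y, state_codes = pd.factorize(train_y)  # remaps whatever the codes are to 0..3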

Any help would be appreciated.

submitted by /u/rk2danker


How to use my model to analyze real-time data?

Hi everyone,

I am trying to use a model saved in .h5 format to analyze a video stream and identify the speed in real time. How do I implement this in Python?
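
Something like the sketch below is what I imagine, with OpenCV grabbing frames and the model predicting per frame; the file name, input size, and preprocessing are placeholders for whatever the model was trained with:

import cv2
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model('speed_model.h5')  # placeholder path
cap = cv2.VideoCapture(0)  # 0 = default camera; a file path or stream URL also works

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Placeholder preprocessing: must match the training pipeline.
    x = cv2.resize(frame, (224, 224)).astype(np.float32) / 255.0
    speed = model.predict(x[np.newaxis, ...], verbose=0)[0]
    print('predicted speed:', speed)

cap.release()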

Thank you in advance!

submitted by /u/sleepingmousie


GPU bug using the TensorFlow 2.x Object Detection API

Hey!

I'm following this installation guide for object detection using TensorFlow 2; my goal is to train a CNN on my GPU. TensorFlow recognizes the GPU after the GPU-support section and everything runs smoothly. However, after I install the Object Detection API, TensorFlow just starts ignoring the GPU and runs on the CPU. Any help would be deeply appreciated, thanks!
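
For reference, this is the check I'm running before and after installing the API (a sketch; my unverified guess is that the API's setup may pull in a different tensorflow wheel, which pip list would reveal):

import tensorflow as tf

print(tf.__version__)                          # did the install change the TF version?
print(tf.config.list_physical_devices('GPU'))  # an empty list means TF can't see the GPU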

submitted by /u/smcsb


How do you install TF (and TFLite) on a Raspberry Pi Zero?

After several hours of Google searching, I cannot find a straightforward answer. I'm not working with the W model, so my RPi doesn't have a wireless adapter. All the results I have found involve cloning repos on the RPi itself, but I have no clue how to go about that. Do I need a Raspbian build that already has TF built in? Or is there a way I can scp all the files over to my Pi?

I've tried what's listed here, but downloaded TF and its dependencies to my local machine instead; I didn't know what to do about the file paths from there.
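
For what it's worth, all I need on the Pi in the end is inference, something like the sketch below, assuming I can get a tflite_runtime wheel built for the Zero's ARMv6 onto it (e.g. by scp-ing the .whl over and pip-installing it offline; model.tflite is a placeholder):

import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path='model.tflite')  # placeholder model file
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

x = np.zeros(inp['shape'], dtype=inp['dtype'])        # dummy input, just to test
interpreter.set_tensor(inp['index'], x)
interpreter.invoke()
print(interpreter.get_tensor(out['index']))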

If anyone knows of a good guide or knows how to do this, please let me know. Thanks!

submitted by /u/seekingstars


3D Scene Understanding with TensorFlow 3D

submitted by /u/AR_MR_XR

Is TensorFlow compilable to a binary?

If I install all the external dependencies of TensorFlow, such as Bazel and make, on my Linux machine and build TensorFlow from source, would I arrive at a binary I could drop onto some server machine and just run?

I am also asking whether the Python API just calls a bunch of dynamically linked libraries, and whether there is some model.build() or model.compile() command that produces a binary in a directory on my Linux machine.
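
To make the question concrete: does something like the snippet below ever end in a self-contained executable, or only in a SavedModel directory (a serialized graph plus weights) that still needs the TF runtime to load it? (Toy model purely for illustration.)

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer='adam', loss='mse')

# Writes /tmp/my_model/saved_model.pb plus a variables/ folder --
# serialized data that a runtime loads, not an executable.
tf.saved_model.save(model, '/tmp/my_model')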

submitted by /u/GingerGengar123


tf.distribute.MultiWorkerMirroredStrategy and data sharding

I understand tf.distribute.MirroredStrategy(); it's actually pretty simple. I'm hoping to scale a large problem across multiple computers, so I'm trying to learn how to use MultiWorkerMirroredStrategy(), but I have not found a good example yet.

My understanding is that I would write one Python script and distribute it across the machines. Then I define the role of each machine via the TF_CONFIG environment variable. I create the strategy and then do something like:

mystrategy = tf.distribute.MultiWorkerMirroredStrategy()
with mystrategy.scope():
    model = buildModel()
    model.compile()
model.fit(x_train, y_train)

That’s all very straightforward. My question is about the data. This code is executed on all nodes. Is each node supposed to parse TF_CONFIG and load its own subset of data? Does just the chief load all the data and then the scope block parses out shards? Or does each node load all the data?
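
To make my mental model concrete, here is roughly what I picture every machine running (a sketch; the host addresses are made up, and buildModel/x_train/y_train are the same placeholders as above):

import json
import os

import tensorflow as tf

# Same script on every machine; only the task index differs per worker.
os.environ['TF_CONFIG'] = json.dumps({
    'cluster': {'worker': ['host1:12345', 'host2:12345']},
    'task': {'type': 'worker', 'index': 0},   # 1 on the second machine
})

strategy = tf.distribute.MultiWorkerMirroredStrategy()

# As I understand it, every worker builds the full dataset and tf.data
# auto-shards it across workers (by file when possible, otherwise by
# skipping elements), so no manual TF_CONFIG parsing for the data.
options = tf.data.Options()
options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.DATA
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(64).with_options(options)

with strategy.scope():
    model = buildModel()
    model.compile(optimizer='adam', loss='mse')   # placeholder compile args
model.fit(dataset)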

submitted by /u/Simusid