Getting TensorFlow PrefetchDataset through Keras TextVectorization layer

I am on tf_nightly-2.7.0 and used TensorFlow’s “make_csv_dataset” to build a dataset from a TSV file, but the resulting PrefetchDataset doesn’t seem to carry shape information. I could have used a Pandas DataFrame, but I would like to try TensorFlow’s dataset API. Here is the code, without the imports:

    !wget
    train_file_path = "train-data.tsv"
    train_data = tf.data.experimental.make_csv_dataset(
        train_file_path,
        header=False,
        field_delim='\t',
        column_names=['label', 'text'],
        batch_size=5,
        label_name='label',
        num_epochs=1,
        ignore_errors=True)

    examples, labels = next(iter(train_data))  # Just the first batch.
    print("FEATURES: \n", examples, "\n")
    print("LABELS: \n", labels)

    encoder = keras.layers.TextVectorization(
        max_tokens=None,
        output_mode='int',
        output_sequence_length=160)
    encoder.adapt(train_data)

Here is how the dataset looks in the print output:

    FEATURES:
     OrderedDict([('text', <tf.Tensor: shape=(5,), dtype=string, numpy=
    array([b'rt-king pro video club>> need help? or call 08701237397 you must be 16+ club credits redeemable at! enjoy!',
           b'good afternoon sunshine! how dawns that day ? are we refreshed and happy to be alive? do we breathe in the air and smile ? i think of you, my love ... as always',
           b'they have a thread on the wishlist section of the forums where ppl post nitro requests. start from the last page and collect from the bottom up.',
           b'no current and food here. i am alone also',
           b'die... i accidentally deleted e msg i suppose 2 put in e sim archive. haiz... i so sad...'],
          dtype=object)>)])

    LABELS:
     tf.Tensor([b'spam' b'ham' b'ham' b'ham' b'ham'], shape=(5,), dtype=string)

Here is the error raised on the line encoder.adapt(train_data):

    AttributeError: 'NoneType' object has no attribute 'ndims'

The desired outcome is for the TensorFlow dataset, after whatever manipulation it needs, to pass through encoder.adapt without an error.
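One possible explanation, offered as an assumption rather than a confirmed diagnosis: with label_name set, make_csv_dataset yields (features dict, labels) pairs, while adapt expects a dataset of plain string tensors. The sketch below maps the dataset down to the 'text' column before calling adapt. The sample rows and the temporary file are hypothetical stand-ins for train-data.tsv, and shuffle=False is added only to make the example deterministic.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical stand-in for train-data.tsv: label<TAB>text, no header row.
rows = [
    "ham\tsee you at lunch",
    "spam\twin a free prize now",
    "ham\tcall me when you are free",
    "ham\tno current and food here",
]
tmp_dir = tempfile.mkdtemp()
train_file_path = os.path.join(tmp_dir, "train-data.tsv")
with open(train_file_path, "w") as f:
    f.write("\n".join(rows) + "\n")

train_data = tf.data.experimental.make_csv_dataset(
    train_file_path,
    batch_size=2,
    column_names=["label", "text"],
    label_name="label",
    field_delim="\t",
    header=False,
    num_epochs=1,
    shuffle=False,  # assumption: fixed order, for a reproducible example
    ignore_errors=True,
)

# Each element is (features_dict, labels); keep only the string tensor
# so that adapt sees text batches instead of a nested structure.
text_only = train_data.map(lambda features, labels: features["text"])

encoder = tf.keras.layers.TextVectorization(
    max_tokens=None, output_mode="int", output_sequence_length=160
)
encoder.adapt(text_only)

# Vectorize the first batch to check that the layer is usable.
examples, labels = next(iter(train_data))
vectorized = encoder(examples["text"])
print(vectorized.shape)
```

If this is indeed the cause, the same map call should work on the original train_data without any other change to the pipeline.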

Thank you for the help in advance!

submitted by /u/na_haran