Manipulating batches prior to sending them to the model

I have a somewhat unique issue that I cannot solve because nothing relevant comes up on Google.

My data are one-hot encoded DNA sequences of VARYING length. Each sample is a (4 x m) array, where m = sequence length (which varies), so the full dataset of n samples is jagged. However, the size requirement after zero-padding the entire dataset to the maximum sequence length is insane, and I need to avoid doing that.
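To illustrate why full-dataset padding blows up, here is a quick back-of-the-envelope sketch (the sequence counts and length distribution are made-up numbers, not from the original post): jagged storage grows with the *sum* of lengths, while padding grows with n times the *maximum* length.

```python
import numpy as np

# Hypothetical numbers, just to illustrate the padding cost.
n = 100_000                                            # number of sequences
lengths = np.random.default_rng(0).integers(100, 10_000, size=n)

# Jagged storage (a Python list of (4, m_i) float32 arrays):
# memory is proportional to the sum of the lengths.
jagged_bytes = 4 * int(lengths.sum()) * 4              # 4 rows * total length * 4 bytes

# Zero-padded to the global max length:
# memory is proportional to n * max(lengths).
padded_bytes = 4 * n * int(lengths.max()) * 4

print(f"jagged: {jagged_bytes / 1e9:.2f} GB, padded: {padded_bytes / 1e9:.2f} GB")
```

With a skewed length distribution, the padded array can be an order of magnitude larger than the jagged data, which is exactly the blow-up described above.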

The solution I have thought up is as follows:

  1. Generate a jagged NumPy array (varying input lengths)
  2. Extract k sequences from this large array where k = batch size
  3. Zero-pad the batch
  4. Pass to model
  5. Repeat from step 2

Any help would be greatly appreciated. Thanks!

submitted by /u/RAiD78
