submitted by /u/maneesh123456 [visit reddit] [comments] |
Month: April 2022
Ultimate Guide to Activation Functions
submitted by /u/SirFletch
[visit reddit] [comments]
errors when trying to load dataset
I have been trying to develop a machine learning model that works with NIfTI files. I originally just loaded them into a NumPy array, but after augmentation the amount of data was too large for the RAM.
I discovered that one should make use of things such as generators / the tools provided by tf.data.
As such I attempted the following.
def load_images(imagePath):
    image = nib.load(imagePath)
    image = image.get_fdata()
    image = tf.image.per_image_standardization(image)
    label = int(imagePath.split('Grade')[1][0]) - 1
    return (image, label)

dataset = tf.data.Dataset.from_tensor_slices(all_paths)
dataset = (dataset
    .shuffle(1024)
    .map(load_images, num_parallel_calls=AUTOTUNE)
    .cache()
    .repeat()
    .batch(64)
    .prefetch(AUTOTUNE)
)
Hopefully the code is straightforward, but if anything needs further clarification please do ask. Originally the first two lines of load_images() were their own function, but I merged them in like this to try to resolve the issue.
The issue is that I am getting the following error at the map line:
TypeError: in user code:

    File "<ipython-input-64-56e9744da5d3>", line 5, in load_images  *
        image = nib.load(imagePath)
    File "/usr/local/lib/python3.7/dist-packages/nibabel/loadsave.py", line 42, in load  *
        stat_result = os.stat(filename)

    TypeError: stat: path should be string, bytes, os.PathLike or integer, not Tensor
I don’t think it likes me using nibabel functions in the mapping function, but I cannot think of any other way to do it. I tried following the answer here, but this just gave me another error about using tf.function.
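The TypeError arises because Dataset.map traces the mapped function in graph mode, so imagePath arrives as a symbolic Tensor and not as a Python string that nibabel can open. One common workaround is to wrap the file loading in tf.py_function, which runs the wrapped Python code eagerly. The following is only a sketch under assumptions: the helper names (parse_label, build_dataset, load_image, tf_load) are hypothetical, all_paths is assumed to be a list of path strings, and cache() is left out because caching decoded volumes would hold them in memory again.

```python
def parse_label(path):
    """Return the zero-based grade encoded in a path such as '.../Grade3/scan.nii'."""
    if isinstance(path, bytes):  # tf.data hands string elements over as bytes
        path = path.decode("utf-8")
    return int(path.split("Grade")[1][0]) - 1


def build_dataset(all_paths, batch_size=64):
    """Build a tf.data pipeline that loads NIfTI volumes through tf.py_function."""
    # Heavy dependencies are imported lazily so parse_label stays usable without them.
    import nibabel as nib
    import numpy as np
    import tensorflow as tf

    def load_image(path_tensor):
        # Runs eagerly inside tf.py_function, so .numpy() is available here.
        path = path_tensor.numpy().decode("utf-8")
        volume = nib.load(path).get_fdata().astype(np.float32)
        image = tf.image.per_image_standardization(volume)
        return image, np.int32(parse_label(path))

    def tf_load(path):
        image, label = tf.py_function(load_image, [path], [tf.float32, tf.int32])
        label.set_shape([])  # py_function drops static shapes; the label is a scalar
        return image, label

    return (
        tf.data.Dataset.from_tensor_slices(all_paths)
        .shuffle(len(all_paths))
        .map(tf_load, num_parallel_calls=tf.data.AUTOTUNE)
        .batch(batch_size)
        .prefetch(tf.data.AUTOTUNE)
    )
```

Because tf.py_function also discards the static shape of the image, calling image.set_shape(...) inside tf_load may be needed if all volumes share a known size and the model requires it.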
submitted by /u/15150776
[visit reddit] [comments]
Object detection is a long-standing computer vision task that attempts to recognize and localize all objects of interest in an image. The complexity arises when trying to identify or localize all object instances while also avoiding duplication. Existing approaches, like Faster R-CNN and DETR, are carefully designed and highly customized in the choice of architecture and loss function. This specialization of existing systems has created two major barriers: (1) it adds complexity in tuning and training the different parts of the system (e.g., region proposal network, graph matching with GIOU loss, etc.), and (2) it can reduce the ability of a model to generalize, necessitating a redesign of the model for application to other tasks.
In “Pix2Seq: A Language Modeling Framework for Object Detection”, published at ICLR 2022, we present a simple and generic method that tackles object detection from a completely different perspective. Unlike existing approaches that are task-specific, we cast object detection as a language modeling task conditioned on the observed pixel inputs. We demonstrate that Pix2Seq achieves competitive results on the large-scale object detection COCO dataset compared to existing highly-specialized and well-optimized detection algorithms, and its performance can be further improved by pre-training the model on a larger object detection dataset. To encourage further research in this direction, we are also excited to release to the broader research community Pix2Seq’s code and pre-trained models along with an interactive demo.
Pix2Seq Overview
Our approach is based on the intuition that if a neural network knows where and what the objects in an image are, one could simply teach it how to read them out. By learning to “describe” objects, the model can learn to ground the descriptions on pixel observations, leading to useful object representations. Given an image, the Pix2Seq model outputs a sequence of object descriptions, where each object is described using five discrete tokens: the coordinates of the bounding box’s corners [ymin, xmin, ymax, xmax] and a class label.
Pix2Seq framework for object detection. The neural network perceives an image, and generates a sequence of tokens for each object, which correspond to bounding boxes and class labels.
With Pix2Seq, we propose a quantization and serialization scheme that converts bounding boxes and class labels into sequences of discrete tokens (similar to captions), and leverage an encoder-decoder architecture to perceive pixel inputs and generate the sequence of object descriptions. The training objective function is simply the maximum likelihood of tokens conditioned on pixel inputs and the preceding tokens.
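Written out, the maximum likelihood objective described above takes the standard autoregressive form. The notation here is an assumption chosen for illustration (x for the image, y for the target token sequence of length L, and θ for the model parameters), not necessarily the paper's own:

```latex
\max_{\theta} \; \sum_{j=1}^{L} \log P_{\theta}\left(y_j \mid x,\; y_{1:j-1}\right)
```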
Sequence Construction from Object Descriptions
In commonly used object detection datasets, images have variable numbers of objects, represented as sets of bounding boxes and class labels. In Pix2Seq, a single object, defined by a bounding box and class label, is represented as [ymin, xmin, ymax, xmax, class]. However, typical language models are designed to process discrete tokens (or integers) and are unable to comprehend continuous numbers. So, instead of representing image coordinates as continuous numbers, we normalize the coordinates between 0 and 1 and quantize them into one of a few hundred or thousand discrete bins. The quantized coordinates, like the class labels, then become discrete tokens, so each object description resembles an image caption that the language model can interpret. The quantization process is achieved by multiplying the normalized coordinate (e.g., ymin) by the number of bins minus one, and rounding it to the nearest integer (the detailed process can be found in our paper).
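The quantization rule described above can be sketched in a few lines. This is an illustrative sketch, not the released Pix2Seq code: the function names, the clamping step, and the choice of 1001 bins in the usage example are assumptions.

```python
def quantize(coord, num_bins):
    """Map a normalized coordinate in [0, 1] to an integer bin in [0, num_bins - 1]."""
    coord = min(max(coord, 0.0), 1.0)  # clamp boxes that touch or cross the border
    # Python's round() sends exact ties to the nearest even integer.
    return int(round(coord * (num_bins - 1)))


def dequantize(token, num_bins):
    """Approximate inverse of quantize: bin index back to a value in [0, 1]."""
    return token / (num_bins - 1)


def quantize_box(box, height, width, num_bins):
    """Normalize a pixel-space [ymin, xmin, ymax, xmax] box and quantize each coordinate."""
    ymin, xmin, ymax, xmax = box
    normalized = (ymin / height, xmin / width, ymax / height, xmax / width)
    return [quantize(c, num_bins) for c in normalized]


# A box covering the region from (120, 160) to (240, 320) in a 480x640 image.
print(quantize_box((120, 160, 240, 320), height=480, width=640, num_bins=1001))
```

With more bins the round trip through quantize and dequantize loses less precision, at the cost of a larger vocabulary.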
After quantization, the object annotations provided with each training image are ordered into a sequence of discrete tokens (shown below). Since the order of the objects does not matter for the detection task per se, we randomize the order of objects each time an image is shown during training. We also append an End of Sequence (EOS) token at the end as different images often have different numbers of objects, and hence sequence lengths.
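A minimal sketch of this sequence construction follows. The vocabulary layout is a hypothetical choice made for illustration (coordinate tokens first, then class tokens, then a single EOS token), and build_sequence is not a function from the released code.

```python
import random


def build_sequence(objects, num_bins, num_classes, rng=random):
    """Flatten quantized objects into one token sequence ending in EOS.

    objects: list of ((ymin, xmin, ymax, xmax), class_id) with coordinates
             already quantized to integers in [0, num_bins - 1].
    Assumed vocabulary layout (hypothetical):
        [0, num_bins)                        coordinate tokens
        [num_bins, num_bins + num_classes)   class tokens
        num_bins + num_classes               EOS token
    """
    order = list(objects)
    rng.shuffle(order)  # object order carries no meaning, so randomize it per call
    tokens = []
    for (ymin, xmin, ymax, xmax), class_id in order:
        tokens.extend([ymin, xmin, ymax, xmax, num_bins + class_id])
    tokens.append(num_bins + num_classes)  # EOS marks the end of the object list
    return tokens


objects = [((10, 20, 30, 40), 1), ((5, 6, 7, 8), 0)]
print(build_sequence(objects, num_bins=1000, num_classes=80))
```

Calling build_sequence again on the same objects may yield a different ordering, which mirrors the per-epoch randomization described above.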
The Model Architecture, Objective Function, and Inference
We treat the sequences that we constructed from object descriptions as a “dialect” and address the problem via a powerful and general language model with an image encoder and an autoregressive language decoder. Similar to language modeling, Pix2Seq is trained to predict tokens, given an image and preceding tokens, with a maximum likelihood loss. At inference time, we sample tokens from the model likelihood. The sampled sequence ends when the EOS token is generated. Once the sequence is generated, we split it into chunks of 5 tokens for extracting and de-quantizing the object descriptions (i.e., obtaining the predicted bounding boxes and class labels). It is worth noting that both the architecture and loss function are task-agnostic in that they don’t assume prior knowledge about object detection (e.g., bounding boxes). We describe how we can incorporate task-specific prior knowledge with a sequence augmentation technique in our paper.
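The post-processing step described above (cut at EOS, split into chunks of 5, de-quantize) can be sketched as follows. The function name and the vocabulary layout (coordinate tokens, then class tokens, then EOS) are assumptions for illustration and may differ from the released implementation.

```python
def decode_sequence(tokens, num_bins, num_classes):
    """Turn a generated token sequence into (normalized_box, class_id) pairs.

    Assumed vocabulary layout (hypothetical):
        [0, num_bins)                        coordinate tokens
        [num_bins, num_bins + num_classes)   class tokens
        num_bins + num_classes               EOS token
    """
    eos = num_bins + num_classes
    if eos in tokens:
        tokens = tokens[:tokens.index(eos)]  # ignore everything after EOS
    usable = len(tokens) - len(tokens) % 5   # drop an incomplete trailing chunk
    detections = []
    for start in range(0, usable, 5):
        ymin, xmin, ymax, xmax, class_token = tokens[start:start + 5]
        # De-quantize: bin index back to a normalized coordinate in [0, 1].
        box = tuple(t / (num_bins - 1) for t in (ymin, xmin, ymax, xmax))
        detections.append((box, class_token - num_bins))
    return detections


print(decode_sequence([0, 0, 1000, 500, 1004, 1081], num_bins=1001, num_classes=80))
```

Multiplying the normalized coordinates by the image height and width recovers pixel-space boxes.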
Results
Despite its simplicity, Pix2Seq achieves impressive empirical performance on benchmark datasets. Specifically, we compare our method with well-established baselines, Faster R-CNN and DETR, on the widely used COCO dataset and demonstrate that it achieves competitive average precision (AP) results.
Since our approach incorporates minimal inductive bias or prior knowledge of the object detection task into the model design, we further explore how pre-training the model on a larger object detection dataset can impact its performance on COCO. Our results indicate that this training strategy (along with using bigger models) can further boost performance.
Pix2Seq can detect objects in densely populated and complex scenes, such as those shown below.
Example complex and densely populated scenes labeled by a trained Pix2Seq model. Try it out here.
Conclusion and Future Work
With Pix2Seq, we cast object detection as a language modeling task conditioned on pixel inputs for which the model architecture and loss function are generic, and have not been engineered specifically for the detection task. One can, therefore, readily extend this framework to different domains or applications, where the output of the system can be represented by a relatively concise sequence of discrete tokens (e.g., keypoint detection, image captioning, visual question answering), or incorporate it into a perceptual system supporting general intelligence, for which it provides a language interface to a wide range of vision and language tasks. We also hope that the release of Pix2Seq’s code, pre-trained models and interactive demo will inspire further research in this direction.
Acknowledgements
This post reflects the combined work with our co-authors: Saurabh Saxena, Lala Li, Geoffrey Hinton. We would also like to thank Tom Small for the visualization of the Pix2Seq illustration figure.
CIFAR10 Models
I’m looking for a model that does well on CIFAR10. Most of the ones I have found overfit way too much. Any suggestions for architectures? No transfer learning; I need to train them myself for research. I would appreciate any references as well.
submitted by /u/kaca541
[visit reddit] [comments]
Help with updating old code to new api
I’m trying to run a Jupyter notebook from this repository: https://github.com/artemyk/ibsgd, specifically the MNIST_SaveActivations.ipynb file. The problem is that the code is two years old and was written for TensorFlow 2.1.0. The notebook runs fine until the last cell, where I get an attribute error (see pictures below).
[Image: loggingreporter.py file where the error occurs]
[Image: how the inputs variable is used in loggingreporter.py]
I’ve been looking online but I can’t figure out how to fix this attribute error for the version of TensorFlow I’m running (2.8.0). Any guidance is appreciated! I’ve already tried downgrading to 2.1.0 but have not had success. I’m using the latest version of Python 3.
submitted by /u/mtot10
Hi, I am trying to develop an image recognition app. I already had great results with a two-category dataset, but the moment I add a third category the accuracy won’t increase. Can anybody help me with that?
from tensorflow.keras.models import Sequential
for dense_layer in dense_layers:
NAME = f"CONV{conv_layer}_size{layer_size}_dense{dense_layer}_{time.time()}"
submitted by /u/Mo_187_
Different parts of the globe are experiencing distinct climate challenges: severe drought, dangerous flooding, reduced biodiversity or dense air pollution. The challenges are so great that no country can solve them on its own. But innovative startups worldwide are lighting the way, demonstrating how these daunting challenges can be better understood and addressed with ...
The post By Land, Sea and Space: How 5 Startups Are Using AI to Help Save the Planet appeared first on NVIDIA Blog.
Hi everyone, I’ve come up with a research solution for my Bachelor’s thesis that provides MCUs with an intelligent decision support system to transfer TinyML models from an MCU to a smartphone for network-based implementations (federated learning). I would appreciate it if you could spare a few minutes to read the document describing the specifics of my project (why and how I came up with the solution) and then fill out a form with a few questions to get your thoughts on the project.
Thank you in advance 🙂
submitted by /u/dieselVeasel
[visit reddit] [comments]