Categories
Misc

Classification predictions completely different based on data size, though data doesn’t change

Hello, I’ve just started learning and messing around with neural networks. I’m not sure if this is a problem or just how neural networks work, but I’ve noticed that whenever I try to predict a binary classification outcome with my model, the predictions vary completely based on the size of the data I pass it.

For example, if I try to predict a single outcome with one row of data, I get something like 0.4. Then if I add another row of data and predict again, the first prediction of row 1 becomes 0.9, even though the data in row 1 did not change, I only added an additional row of data for an additional prediction.

My training data consists of 1266 entries with 54 features. I’ve tried reducing the batch_size to 1, different optimizers, number of layers, number of neurons and the result is mostly the same. Is this normal behavior?
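The post does not include code, so what follows is only a guess at one common cause of this symptom: scaling the inputs with statistics computed from whatever rows are passed to the prediction call, instead of statistics saved from the training set. In that case, adding a second row changes the scaled values of the first row, and therefore its prediction. A minimal sketch, in which the function names and toy numbers are hypothetical:

```python
from statistics import mean, pstdev

# Toy "training set": 4 rows, 2 features (the post has 1266 rows, 54 features).
train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
train_cols = list(zip(*train))
train_mean = [mean(c) for c in train_cols]
train_std = [pstdev(c) for c in train_cols]


def scale_with_batch_stats(batch):
    """Problematic: the statistics depend on which rows are passed in."""
    cols = list(zip(*batch))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # fall back to 1.0 when the spread is zero
    return [[(x - m) / s for x, m, s in zip(row, mu, sd)] for row in batch]


def scale_with_train_stats(batch):
    """Stable: every row is scaled with the same stored training statistics."""
    return [[(x - m) / s for x, m, s in zip(row, train_mean, train_std)]
            for row in batch]


row1 = [2.5, 15.0]
row2 = [4.0, 35.0]

# Scale row 1 on its own, then together with row 2, and keep row 1's result.
batch_alone = scale_with_batch_stats([row1])[0]
batch_pair = scale_with_batch_stats([row1, row2])[0]
train_alone = scale_with_train_stats([row1])[0]
train_pair = scale_with_train_stats([row1, row2])[0]

print(batch_alone, batch_pair)  # differ: row 1's model input changed
print(train_alone, train_pair)  # identical: row 1's model input is stable
```

If this is the cause, the usual remedy is to fit the scaler once on the training data and reuse it at prediction time. A batch normalization layer that runs in training mode during prediction can produce a similar dependence on the other rows in the batch.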

submitted by /u/CandyPoper

Categories
Misc

Pushing Forward the Frontiers of Natural Language Processing

Idea generation, not hardware or software, needs to be the bottleneck to the advancement of AI, Bryan Catanzaro, vice president of applied deep learning research at NVIDIA, said this week at the AI Hardware Summit. “We want the inventors, the researchers and the engineers that are coming up with future AI to be limited only Read article >

The post Pushing Forward the Frontiers of Natural Language Processing appeared first on The Official NVIDIA Blog.

Categories
Offsites

Toward Fast and Accurate Neural Networks for Image Recognition

As neural network models and training data size grow, training efficiency is becoming an important focus for deep learning. For example, GPT-3 demonstrates remarkable capability in few-shot learning, but it requires weeks of training with thousands of GPUs, making it difficult to retrain or improve. What if, instead, one could design neural networks that were smaller and faster, yet still more accurate?

In this post, we introduce two families of models for image recognition that leverage neural architecture search, and a principled design methodology based on model capacity and generalization. The first is EfficientNetV2 (accepted at ICML 2021), which consists of convolutional neural networks that aim for fast training speed for relatively small-scale datasets, such as ImageNet1k (with 1.28 million images). The second family is CoAtNet, which are hybrid models that combine convolution and self-attention, with the goal of achieving higher accuracy on large-scale datasets, such as ImageNet21K (with 13 million images) and JFT (with billions of images). Compared to previous results, our models are 4-10x faster while achieving new state-of-the-art 90.88% top-1 accuracy on the well-established ImageNet dataset. We are also releasing the source code and pretrained models on the Google AutoML GitHub.

EfficientNetV2: Smaller Models and Faster Training
EfficientNetV2 is based upon the previous EfficientNet architecture. To improve upon the original, we systematically studied the training speed bottlenecks on modern TPUs/GPUs and found: (1) training with very large image sizes results in higher memory usage and thus is often slower on TPUs/GPUs; (2) the widely used depthwise convolutions are inefficient on TPUs/GPUs, because they exhibit low hardware utilization; and (3) the commonly used uniform compound scaling approach, which scales up every stage of convolutional networks equally, is sub-optimal. To address these issues, we propose both a training-aware neural architecture search (NAS), in which the training speed is included in the optimization goal, and a scaling method that scales different stages in a non-uniform manner.

The training-aware NAS is based on the previous platform-aware NAS, but unlike the original approach, which mostly focuses on inference speed, here we jointly optimize model accuracy, model size, and training speed. We also extend the original search space to include more accelerator-friendly operations, such as FusedMBConv, and simplify the search space by removing unnecessary operations, such as average pooling and max pooling, which are never selected by NAS. The resulting EfficientNetV2 networks achieve improved accuracy over all previous models, while being much faster and up to 6.8x smaller.

To further speed up the training process, we also propose an enhanced method of progressive learning, which gradually changes image size and regularization magnitude during training. Progressive training has been used in image classification, GANs, and language models. Our approach focuses on image classification but, unlike previous approaches that often trade accuracy for improved training speed, it can slightly improve accuracy while also significantly reducing training time. The key idea in our improved approach is to adaptively change regularization strength, such as dropout ratio or data augmentation magnitude, according to the image size. For the same network, a small image size leads to lower network capacity and thus requires weak regularization; conversely, a large image size requires stronger regularization to combat overfitting.

Progressive learning for EfficientNetV2. Here we mainly focus on three types of regularizations: data augmentation, mixup, and dropout.
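The idea of growing image size and regularization strength together can be sketched as a per-stage schedule. The sketch below assumes a simple linear interpolation between a start and an end value. The function name, the number of stages, and all numeric values are hypothetical and chosen only for illustration; they are not taken from the post.

```python
def progressive_schedule(num_stages, size_start, size_end, reg_start, reg_end):
    """Linearly interpolate image size and regularization strengths per stage."""
    if num_stages < 2:
        raise ValueError("need at least two stages to interpolate")
    schedule = []
    for i in range(num_stages):
        t = i / (num_stages - 1)  # 0.0 at the first stage, 1.0 at the last
        size = round(size_start + (size_end - size_start) * t)
        reg = {name: reg_start[name] + (reg_end[name] - reg_start[name]) * t
               for name in reg_start}
        schedule.append({"image_size": size, **reg})
    return schedule


# Hypothetical start/end values for the three regularization types
# named in the caption (data augmentation, mixup, dropout).
stages = progressive_schedule(
    num_stages=4,
    size_start=128, size_end=320,
    reg_start={"dropout": 0.1, "augment_magnitude": 5.0, "mixup_alpha": 0.0},
    reg_end={"dropout": 0.3, "augment_magnitude": 15.0, "mixup_alpha": 0.2},
)
for stage in stages:
    print(stage)
```

Early stages train on small images with weak regularization, and later stages use larger images with stronger regularization, which matches the reasoning about network capacity given above.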

We evaluate the EfficientNetV2 models on ImageNet and a few transfer learning datasets, such as CIFAR-10/100, Flowers, and Cars. On ImageNet, EfficientNetV2 significantly outperforms previous models with about 5–11x faster training speed and up to 6.8x smaller model size, without any drop in accuracy.

EfficientNetV2 achieves much better training efficiency than prior models for ImageNet classification.

CoAtNet: Fast and Accurate Models for Large-Scale Image Recognition
While EfficientNetV2 is still a typical convolutional neural network, recent studies on Vision Transformer (ViT) have shown that attention-based transformer models could perform better than convolutional neural networks on large-scale datasets like JFT-300M. Inspired by this observation, we further expand our study beyond convolutional neural networks with the aim of finding faster and more accurate vision models.

In “CoAtNet: Marrying Convolution and Attention for All Data Sizes”, we systematically study how to combine convolution and self-attention to develop fast and accurate neural networks for large-scale image recognition. Our work is based on an observation that convolution often has better generalization (i.e., the performance gap between training and evaluation) due to its inductive bias, while self-attention tends to have greater capacity (i.e., the ability to fit large-scale training data) thanks to its global receptive field. By combining convolution and self-attention, our hybrid models can achieve both better generalization and greater capacity.

Comparison between convolution, self-attention, and hybrid models. Convolutional models converge faster, ViTs have better capacity, while the hybrid models achieve both faster convergence and better accuracy.

We observe two key insights from our study: (1) depthwise convolution and self-attention can be naturally unified via simple relative attention, and (2) vertically stacking convolution layers and attention layers in a way that considers their capacity and computation required in each stage (resolution) is surprisingly effective in improving generalization, capacity and efficiency. Based on these insights, we have developed a family of hybrid models with both convolution and attention, named CoAtNets (pronounced “coat” nets). The following figure shows the overall CoAtNet network architecture:

Overall CoAtNet architecture. Given an input image with size HxW, we first apply convolutions in the first stem stage (S0) and reduce the size to H/2 x W/2. The size continues to reduce with each stage. Ln refers to the number of layers. The first two stages after the stem (S1 and S2) mainly adopt MBConv building blocks consisting of depthwise convolution. The later two stages (S3 and S4) mainly adopt Transformer blocks with relative self-attention. Unlike the previous Transformer blocks in ViT, here we use pooling between stages, similar to Funnel Transformer. Finally, we apply a classification head to generate the class prediction.
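To make the stage layout concrete, the following sketch tabulates the feature-map size and block type per stage. It assumes that every stage halves the spatial size (the caption states only that S0 gives H/2 x W/2 and that the size keeps shrinking), and the function and dictionary names are hypothetical.

```python
# Block type per stage, as described in the caption.
STAGE_BLOCKS = {
    "S0": "convolution stem",
    "S1": "MBConv",
    "S2": "MBConv",
    "S3": "Transformer (relative self-attention)",
    "S4": "Transformer (relative self-attention)",
}


def coatnet_stage_shapes(height, width):
    """Return the (height, width) of the feature map after each stage.

    Assumption: each stage halves the spatial size, so S0 outputs
    H/2 x W/2, S1 outputs H/4 x W/4, and so on.
    """
    shapes = {}
    for index, stage in enumerate(STAGE_BLOCKS):
        factor = 2 ** (index + 1)
        shapes[stage] = (height // factor, width // factor)
    return shapes


shapes = coatnet_stage_shapes(224, 224)
for stage, shape in shapes.items():
    print(stage, shape, STAGE_BLOCKS[stage])
```

Under this assumption, attention is applied only in the last two stages, where the feature map is smallest, while the larger early feature maps are handled by convolution.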

CoAtNet models consistently outperform ViT models and their variants across a number of datasets, such as ImageNet1K, ImageNet21K, and JFT. When compared to convolutional networks, CoAtNet exhibits comparable performance on a small-scale dataset (ImageNet1K) and achieves substantial gains as the data size increases (e.g., on ImageNet21K and JFT).

Comparison between CoAtNet and previous models after pre-training on the medium-sized ImageNet21K dataset. Under the same model size, CoAtNet consistently outperforms both ViT and convolutional models. Notably, with only ImageNet21K, CoAtNet is able to match the performance of ViT-H pre-trained on JFT.

We also evaluated CoAtNets on the large-scale JFT dataset. To reach a similar accuracy target, CoAtNet trains about 4x faster than previous ViT models and, more importantly, achieves a new state-of-the-art top-1 accuracy on ImageNet of 90.88%.

Comparison between CoAtNets and previous ViTs. ImageNet top-1 accuracy after pre-training on the JFT dataset under different training budgets. The four best models are trained on JFT-3B with about 3 billion images.

Conclusion and Future Work
In this post, we introduce two families of neural networks, named EfficientNetV2 and CoAtNet, which achieve state-of-the-art performance on image recognition. All EfficientNetV2 models are open sourced and the pretrained models are also available on TF Hub. CoAtNet models will also be open sourced soon. We hope these new neural networks can benefit the research community and the industry. In the future we plan to further optimize these models and apply them to new tasks, such as zero-shot learning and self-supervised learning, which often require fast models with high capacity.

Acknowledgements
Special thanks to our co-authors Hanxiao Liu and Quoc Le. We also thank the Google Research, Brain Team and the open source contributors.

Categories
Misc

Running tensorflow/etc. inside vms…? Is it workable for performance?

This feels like a profoundly stupid question, and maybe that’s why I’m not finding any answers to it… am new to machine learning.

I’m used to doing development inside VMs, but as I want to benefit from the GPU that’s not really an option here, right? I was thinking maybe I could do it in a Docker container instead (am on Windows) but not sure that’s viable, either. Would either a VM or Docker work for Windows and doing ML? Thanks.

submitted by /u/asking4afriend40631

Categories
Misc

Easier way to use an old TF model with latest TF/Keras?

I have an object detection model (Faster R-CNN saved as a frozen graph) that was trained over two years ago. It requires TF GPU 1.14 and the TF Object Detection API. It’s a bit of a hassle to set up that environment, and I was wondering if there is a more streamlined way to use that model with the latest version of TF/Keras?

submitted by /u/bc_uk

Categories
Misc

GeForce NOW Members Are Free to Play a Massive Library of Most-Played Games, Included With Membership

Want to play awesome PC games for free without having to buy an expensive gaming rig? This GFN Thursday takes a look at the 90+ free-to-play PC games — including this week’s Fortnite Season 8 release and the Epic Games Store free game of the week, Speed Brawl, free to claim Sept. 16-23 — all Read article >

The post GeForce NOW Members Are Free to Play a Massive Library of Most-Played Games, Included With Membership appeared first on The Official NVIDIA Blog.

Categories
Misc

Getting Tensorflow PrefetchDataset through Keras TextVectorization layer

I am on tf_nightly-2.7.0 and used TensorFlow’s “make_csv_dataset” to make a dataset from a TSV file, but it seems the TensorFlow PrefetchDataset doesn’t have shape information. I could have used a Pandas dataframe but would like to try TensorFlow’s dataset. Here is the code, without the imports:

    !wget https://cdn.freecodecamp.org/project-data/sms/train-data.tsv

    train_file_path = "train-data.tsv"
    train_data = tf.data.experimental.make_csv_dataset(
        train_file_path,
        header=False,
        field_delim='\t',
        column_names=['label', 'text'],
        batch_size=5,
        label_name='label',
        num_epochs=1,
        ignore_errors=True)

    examples, labels = next(iter(train_data))  # Just the first batch.
    print("FEATURES: \n", examples, "\n")
    print("LABELS: \n", labels)

    encoder = keras.layers.TextVectorization(
        max_tokens=None, output_mode='int', output_sequence_length=160)
    encoder.adapt(train_data)

Here is how the dataset looks in the print output:

    FEATURES:
     OrderedDict([('text', <tf.Tensor: shape=(5,), dtype=string, numpy=
    array([b'rt-king pro video club>> need help? info@ringtoneking.co.uk or call 08701237397 you must be 16+ club credits redeemable at www.ringtoneking.co.uk! enjoy!',
           b'good afternoon sunshine! how dawns that day ? are we refreshed and happy to be alive? do we breathe in the air and smile ? i think of you, my love ... as always',
           b'they have a thread on the wishlist section of the forums where ppl post nitro requests. start from the last page and collect from the bottom up.',
           b'no current and food here. i am alone also',
           b'die... i accidentally deleted e msg i suppose 2 put in e sim archive. haiz... i so sad...'],
          dtype=object)>)])

    LABELS:
     tf.Tensor([b'spam' b'ham' b'ham' b'ham' b'ham'], shape=(5,), dtype=string)

Here is the error on the line encoder.adapt(train_data):

    AttributeError: 'NoneType' object has no attribute 'ndims'

The desired outcome would be no error message after manipulating the Tensorflow dataset.

Thank you for the help in advance!

submitted by /u/na_haran

Categories
Misc

German language sentiment classification – NLP Deep Learning

I am trying to build a sentiment classification (hate speech) for German language using NLP + Deep Learning. Any code tutorial? I found lots of research papers but few code implementations.

submitted by /u/grid_world

Categories
Misc

Manufacturing the Future of AI with Edge Computing

Read how the power of AI and edge computing is critical to driving operational efficiencies and productivity gains.

Automation and monitoring of industrial assets, systems, processes, and environments are increasingly important across manufacturing industries, including transportation, electronics, mining, and textiles. In order to implement safer and more productive practices, companies are automating their manufacturing processes with IoT sensors. IoT sensors generate vast amounts of data that, when combined with the power of AI, produce valuable insights that manufacturers can use to improve operational efficiency. 

Edge computing allows sensor-enabled devices to collect and process data locally to deliver insights on the factory floor without having to communicate with the cloud. Edge AI enables any device or computer to process data and make AI-led decisions in real time, with minimal latency. This convenience gives rise to new use cases where fast, real-time insights are required, like when scanning for product defects on assembly lines, identifying workplace hazards, flagging machines that require maintenance, and more.

By bringing AI processing tasks closer to the source, edge computing provides many advantages to manufacturers, including:

  • Ultra-Low Latency Processing: In manufacturing scenarios, throughput is critical. Inspection processes can be a key bottleneck in the overall process. Processing data at the edge saves valuable microseconds as the data does not need to be sent to and from the cloud.
  • Enhanced Security: A manufacturer’s data is key IP. Keeping data within the device compared to sending it through the cloud means that it stays secure and is less vulnerable to attacks or data breaches.
  • Bandwidth Savings: Sending only AI processed smart data to the cloud and processing the remaining high velocity (for example, vibration) and high volume (for example, image and video) data locally on the device lowers data transmission rates and frees up bandwidth, cutting costs.
  • Harnessing OT Domain Knowledge: Empowering OT domain experts to control the data processing AI parameters by leveraging their tacit knowledge enables them to create a highly adaptive and outcome focused agile solution.
  • Robust Infrastructure: Processing data on site through edge devices allows companies to keep their manufacturing processes moving without disruption, even if network outages occur. 

Use Cases of Edge Computing in Manufacturing

Manufacturers globally have started to use AI at the edge to transform their manufacturing processes. The following use cases explore how edge computing is promoting enhanced efficiency and productivity in manufacturing. 

  • Predictive Maintenance: Sensor data can be used to detect anomalies early and predict when a machine will fail. Sensors on equipment scan for flaws and alert management if a machine needs a repair so the issue can be addressed early, avoiding downtime. The combination of sensor data, AI, and edge computing accurately assesses equipment condition and allows the manufacturer to avoid costly unplanned downtime. For example, sensor-equipped video cameras in chemical plants are used to detect corrosion in pipes and alert staff before they can cause any damage.
  • Quality Control: Defect detection is an essential part of the manufacturing process. When running an assembly line where millions of products are made, defects need to be caught in real time. Devices that use edge computing can make decisions in microseconds, catch defects instantly, and alert staff. This capability provides a significant advantage to factories as it can reduce waste and improve manufacturing efficiency.
  • Equipment Effectiveness: Manufacturers are continuously looking to improve processes. When combined with sensor data, edge computing can be used to assess overall equipment effectiveness. For example, in the automotive welding process, manufacturers need to meet many requirements to ensure that their welding is of the highest quality. Using sensor data and edge computing, companies can monitor the process in real time and catch defects or safety risks before products leave the factory.
  • Yield Optimization: In food production plants, it is critical to know the exact quantity and quality of the ingredients being used in the manufacturing process. By using sensor data, AI, and edge computing, machines can recalibrate instantly if any parameters need to be changed in order to produce better quality products. There is no need for manual supervision, or to send data to a central location for review. The sensors on site are capable of making decisions in real time to improve yields.
  • Factory Floor Optimization: Manufacturers must understand how factory spaces are being used in order to improve processes. For example, in a car manufacturing plant it is inefficient if workers must walk to different locations to complete tasks. Supervisors may be unaware of this bottleneck if the data is not available. Sensors help analyze factory spaces: how they are being used, who is using them, and why. Data and critical information processed by edge AI are sent to a central location for a supervisor to review. The supervisor can then make informed optimizations to factory processes.
  • Supply Chain Analytics: There is a growing need for companies to have constant visibility on procurement, production, and inventory management. By automating these processes with AI and edge computing, companies can better predict and manage their supply chain. For example, an electronics manufacturing company with automated processes can immediately alert other production facilities across the country to generate more of a needed raw material so production is not affected.
  • Worker Safety: Industrial workers often operate heavy machinery and handle hazardous materials at manufacturing sites. Using a network of cameras and sensors equipped with AI-enabled video analytics, manufacturers can identify workers in unsafe conditions and quickly intervene to prevent accidents. Edge computing is critical to worker safety since life-saving decisions need to be made in real time. 

Edge computing will continue to transform the manufacturing industry by bringing about AI-driven operational efficiencies and productivity gains. Download this free e-book to learn how edge computing is helping build smarter and safer spaces around the world.

Categories
Offsites

Revisiting Mask-Head Architectures for Novel Class Instance Segmentation

Instance segmentation is the task of grouping pixels in an image into instances of individual things (countable objects such as people, animals, cars, etc.), identifying those things with a class label, and assigning a unique identifier to each (e.g., car_1 and car_2). As a core computer vision task, it is critical to many downstream applications, such as self-driving cars, robotics, medical imaging, and photo editing. In recent years, deep learning has made significant strides in solving the instance segmentation problem with architectures like Mask R-CNN. However, these methods rely on collecting a large labeled instance segmentation dataset. Unlike bounding box labels, which can be collected in 7 seconds per instance with methods like Extreme clicking, instance segmentation labels (called “masks”) can take up to 80 seconds per instance to collect, an effort that is costly and creates a high barrier to entry for this research. A related task, panoptic segmentation, requires even more labeled data.

The partially supervised instance segmentation setting, where only a small set of classes are labeled with instance segmentation masks and the remaining (majority of) classes are labeled only with bounding boxes, is an approach that has the potential to reduce the dependence on manually created mask labels, thereby significantly lowering the barriers to developing an instance segmentation model. However, this partially supervised approach also requires a stronger form of model generalization to handle novel classes not seen at training time, e.g., training with only animal masks and then tasking the model to produce accurate instance segmentations for buildings or plants. Further, naïve approaches, such as training a class-agnostic Mask R-CNN while ignoring mask losses for any instances that don’t have mask labels, have not worked well. For example, on the typical “VOC/Non-VOC” benchmark, where one trains on masks for a subset of 20 classes in COCO (called “seen classes”) and is tested on the remaining 60 classes (called “unseen classes”), a typical Mask R-CNN with a ResNet-50 backbone gets to only ~18% mask mAP (mean Average Precision, higher is better) on unseen classes, whereas when fully supervised it can achieve a much higher >34% mask mAP on the same set.

In “The surprising impact of mask-head architecture on novel class segmentation”, to be presented at ICCV 2021, we identify the main culprits for Mask R-CNN’s poor performance on novel classes and propose two easy-to-implement fixes (one training protocol fix, one mask-head architecture fix) that work in tandem to close the gap to fully supervised performance. We show that our approach applies generally to crop-then-segment models, i.e., a Mask R-CNN or Mask R-CNN-like architecture that computes a feature representation of the entire image and then subsequently passes per-instance crops to a second-stage mask prediction network—also called a mask-head network. Putting our findings together, we propose a Mask R-CNN–based model that improves over the current state-of-the-art by a significant 4.7% mask mAP without requiring more complex auxiliary loss functions, offline trained priors, or weight transfer functions proposed by previous work. We have also open sourced the code bases for two versions of the model, called Deep-MAC and Deep-MARC, and published a colab to interactively produce masks like the video demo below.

A demo of our model, Deep-MAC, which learns to predict accurate masks, given user-specified boxes, even on novel classes that were not seen at training time. Try it yourself in the colab. Image credits: Chris Briggs, Wikipedia and Europeana.

Impact of Cropping Methodology in Partially Supervised Settings
An important step of crop-then-segment models is cropping: Mask R-CNN is trained by cropping a feature map as well as the ground truth mask to a bounding box corresponding to each instance. These cropped features are passed to another neural network (called a mask-head network) that computes a final mask prediction, which is then compared against the ground truth crop in the mask loss function. There are two choices for cropping: (1) cropping directly to the ground truth bounding box of an instance, or (2) cropping to bounding boxes predicted by the model (called proposals). At test time, cropping is always performed with proposals, as ground truth boxes are not assumed to be available.

Cropping to ground truth boxes vs. cropping to proposals predicted by a model during training. Standard Mask R-CNN implementations use both types of crops, but we show that cropping exclusively to ground truth boxes yields significantly stronger performance on novel categories.
We consider a general family of Mask R-CNN–like architectures with one small, but critical difference from typical Mask R-CNN training setups: we crop using ground truth boxes (instead of proposal boxes) at training time.
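The two cropping choices can be illustrated on a toy ground truth mask. This is a simplified sketch: real implementations crop and resize feature maps (for example with ROIAlign), whereas the code below only slices a small binary mask with integer boxes. The helper names and the box coordinates are hypothetical.

```python
def crop(grid, box):
    """Crop a 2D list to box = (y0, x0, y1, x1), with exclusive end indices."""
    y0, x0, y1, x1 = box
    return [row[x0:x1] for row in grid[y0:y1]]


def coverage(cropped):
    """Fraction of cells in the crop that belong to the object."""
    cells = [value for row in cropped for value in row]
    return sum(cells) / len(cells)


# 6x6 toy ground truth mask containing a 2x3 object (rows 2-3, columns 1-3).
mask = [[0] * 6 for _ in range(6)]
for y in range(2, 4):
    for x in range(1, 4):
        mask[y][x] = 1

gt_box = (2, 1, 4, 4)        # tight ground truth box around the object
proposal_box = (1, 2, 4, 6)  # hypothetical, imperfect box from a detector

gt_crop = crop(mask, gt_box)
proposal_crop = crop(mask, proposal_box)

print(gt_crop, coverage(gt_crop))
print(proposal_crop, coverage(proposal_crop))
```

With the ground truth box, the mask head always sees a tightly framed object. With a proposal, the same object can appear shifted or partially cut off, so the training target varies with the quality of the detector's boxes.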

Typical Mask R-CNN implementations pass both types of crops to the mask head. However, this choice has traditionally been considered an unimportant implementation detail, because it does not affect performance significantly in the fully supervised setting. In contrast, for partially supervised settings, we find that cropping methodology plays a significant role—while cropping exclusively to ground truth boxes during training doesn’t change the results significantly in the fully supervised setting, it has a surprising and dramatic positive impact in the partially supervised setting, performing significantly better on unseen classes.

Performance of Mask R-CNN on unseen classes when trained either with both proposals and ground truth boxes (the default) or with only ground truth boxes. Training mask heads with only ground truth boxes yields a significant boost to performance on unseen classes, upwards of 9% mAP. We report performance with the ResNet-101-FPN backbone.

Unlocking the Full Generalization Potential of the Mask Head
Even more surprisingly, the above approach unlocks a novel phenomenon: with cropping-to-ground-truth enabled during training, the mask head of Mask R-CNN takes on a disproportionate role in the ability of the model to generalize to unseen classes. As an example, in the following figure, we compare models that all have cropping-to-ground-truth enabled, but different out-of-the-box mask-head architectures on a parking meter, cell phone, and pizza (classes unseen during training).

Mask predictions for unseen classes with four different mask-head architectures (from left to right: ResNet-4, ResNet-12, ResNet-20, Hourglass-20, where the number refers to the number of layers of the neural network). Despite never having seen masks from the ‘parking meter’, ‘pizza’ or ‘mobile phone’ class, the rightmost mask-head architecture can segment these classes correctly. From left to right, we show better mask-head architectures predicting better masks. Moreover, this difference is only apparent when evaluating on unseen classes — if we evaluate on seen classes, all four architectures exhibit similar performance.

Particularly notable is that these differences between mask-head architectures are not as obvious in the fully supervised setting. Incidentally, this may explain why previous works in instance segmentation have almost exclusively used shallow (i.e., low number of layers) mask heads, as there has been no benefit to the added complexity. Below we compare the mask mAP of three different mask-head architectures on seen versus unseen classes. All three models do equally well on the set of seen classes, but the deep hourglass mask heads stand out when applied to unseen classes. We find hourglass mask heads to be the best among the architectures we tried and we use hourglass mask heads with 50 or more layers to get the best results.

Performance of ResNet-4, Hourglass-10 and Hourglass-52 mask-head architectures on seen and unseen classes. There is a significant difference in performance on unseen classes, even though the performance on seen classes barely changes.

Finally, we show that our findings are general, holding for a variety of backbones (e.g., ResNet, SpineNet, Hourglass) and detector architectures including anchor-based and anchor-free detectors and even when there is no detector at all.

Putting It Together
To achieve the best result, we combined the above findings: We trained a Mask R-CNN model with cropping-to-ground-truth enabled and a deep Hourglass-52 mask head with a SpineNet backbone on high resolution images (1280×1280). We call this model Deep-MARC (Deep Mask heads Above RCNN). Without using any offline training or other hand-crafted priors, Deep-MARC exceeds previous state-of-the-art models by > 4.5% (absolute) mask mAP. Demonstrating the general nature of this approach, we also see strong results with a CenterNet-based (as opposed to Mask R-CNN-based) model (called Deep-MAC), which also exceeds the previous state of the art.

Comparison of Deep-MAC and Deep-MARC to other partially supervised instance segmentation approaches like MaskX R-CNN, ShapeMask and CPMask.

Conclusion
We develop instance segmentation models that are able to generalize to classes that were not part of the training set. We highlight the role of two key ingredients that can be applied to any crop-then-segment model (such as Mask R-CNN): (1) cropping to ground truth boxes during training, and (2) strong mask-head architectures. While neither of these ingredients has a large impact on the classes for which masks are available during training, employing both leads to significant improvement on novel classes for which masks are not available during training. Moreover, these ingredients are sufficient for achieving state-of-the-art performance on the partially supervised COCO benchmark. Finally, our findings are general and may also have implications for related tasks, such as panoptic segmentation and pose estimation.

Acknowledgements
We thank our co-authors Zhichao Lu, Siyang Li, and Vivek Rathod. We thank David Ross and our anonymous ICCV reviewers for their comments which played a big part in improving this research.