submitted by /u/SwapApp
GeForce NOW is always evolving, and so is this week’s GFN Thursday. Biomutant, the new open-world action RPG from Experiment 101 and THQ Nordic, is coming to GeForce NOW when it releases on May 25.
Everybody Was Kung Fu Fighting
Biomutant puts you in the role of an anthropomorphic rodent with swords, guns and martial …
The post GFN Thursday Set to Evolve as Biomutant Comes to GeForce NOW on May 25 appeared first on The Official NVIDIA Blog.
Machine Learning (ML) comes in many forms that have evaded the standard tools and techniques of cybersecurity professionals. Attacking ML requires an intersection of knowledge between data science and offensive security to answer the question, “How can this be attacked?” Cybersecurity professionals and data scientists need to hone these new skills to answer this difficult question. NVIDIA wants to inspire the ecosystem to better address this gap.
MintNV, an AI/ML educational exercise that showcases how an adversary can bypass defensive ML mechanisms to compromise a host, is now on the NVIDIA NGC catalog, NVIDIA’s hub of GPU-optimized HPC and AI applications. The MintNV docker container challenges the user to apply an adversarial thought process to ML. Creating MintNV as a vulnerable environment is a step in the right direction for ML, aligning closely with other NVIDIA contributions such as the Adversarial ML Threat Matrix.
MintNV is a bridge between AI/ML researchers and cybersecurity professionals throughout the ML landscape. It enables the offensive security community to practice adversarial ML techniques. We will continue contributing research, tools and training to promote community growth and to inspire more work of this kind.
Share this exercise and enjoy learning about various offensive security concepts such as enumeration, networking protocols, and administrative functions as you compromise MintNV. Learning about potential vulnerabilities of an ML system using the MintNV simulation helps ML developers understand how to build more secure solutions.
For more information, please visit MintNV’s NGC page.
NVIDIA would like to thank Will Pearce from Microsoft for providing the guidance necessary to implement Machine Learning elements into this educational exercise.
Happy Hacking!
NVIDIA Product Security Team
About NVIDIA Product Security Team:
NVIDIA takes security seriously and values contributions to secure, safe and unbiased use of Artificial Intelligence and Machine Learning. We will continue to create additional educational opportunities for the community. If you have any questions or feedback, please contact psirt@nvidia.com or tweet us at @NVIDIAPSIRT. See NVIDIA’s Corporate Social Responsibility website and NVIDIA’s 2020 CSR Report for more information.
Hi,
This is a cross-post of a Stack Overflow query. Sample code is attached there.
Task
- I am trying to implement Weight Standardization in Tensorflow 2.4.
- The goal here is to standardize the weights to mean=0, variance=1, BEFORE using them for convolution.
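For reference, the standardization step itself can be sketched without any framework. This is only an illustration of the arithmetic, not the TensorFlow code from the linked question: the function name `standardize_weights`, the `eps` term, and the choice to standardize each output channel separately are assumptions.

```python
import math


def standardize_weights(kernel, eps=1e-5):
    """Shift and scale each output channel's weights to mean 0, variance ~1.

    `kernel` is a list with one flattened weight list per output channel
    (a hypothetical layout chosen for this sketch).
    """
    standardized = []
    for channel in kernel:
        mean = sum(channel) / len(channel)
        var = sum((w - mean) ** 2 for w in channel) / len(channel)
        scale = math.sqrt(var + eps)  # eps guards against division by zero
        standardized.append([(w - mean) / scale for w in channel])
    return standardized


# Two output channels with four weights each (made-up values).
kernel = [[0.2, -0.5, 1.3, 0.7], [2.0, 2.5, 1.5, 3.0]]
ws_kernel = standardize_weights(kernel)
```

In a layer, this would run on the kernel inside the forward pass, so that the stored kernel variable stays trainable and only the standardized copy is used for the convolution.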
Methods tried
- I have tried two methods:
  - Subclassing by inheriting tf.keras.layers.Conv3D
  - tf.keras.layers.Wrapper, following the example of Weight Normalization from the tensorflow_addons package
Problem
- In both cases, when I pass an input through the layer and then check the trainable weights, I only see the bias; the kernel disappears. So what have I gotten wrong here? Is it the build() method?
submitted by /u/prerakmody
Tensorflow is awesome
submitted by /u/CrazyHelpful
Machine learning techniques attract a great deal of popular attention: it isn’t difficult to find articles explaining how to extract features or train models. But the end-to-end discovery process that produces a usable machine learning system consists of more than mere techniques. Furthermore, solving business problems with data extends beyond machine learning to encompass exploratory data science, business analytics, and scalable data processing.
This is the second installment of a series describing an end-to-end blueprint for predicting customer churn. In this article, we show how reporting and exploratory data analysis fit into discovery workflows and machine learning systems. We also explain how the RAPIDS Accelerator for Apache Spark makes it possible to execute these workloads on NVIDIA GPUs — enabling nearly a 700% speedup on the analytics portion of our churn prediction application.
If you haven’t yet read the first installment, in which we described the problem, discovery workflows, and data federation and preparation, check it out first!
Exploratory analysis
The first step in the data science discovery workflow is formalizing the problem we’re trying to solve, which depends on understanding the data and understanding the business. A well-defined problem can help to codify the ways in which our analytics efforts ultimately provide business value (rather than merely achieving excellent model performance metrics). Exploratory analysis can support formalizing the problem, developing these necessary understandings, and more:
- Understanding the business impact of an effective solution is important for prioritizing efforts.
- Business context helps practitioners identify finer-grained success criteria.
- Data scientists and business analysts need to define meaningful prediction targets for the models they’ll ultimately be training.
- Better understanding of the data can inspire novel modeling approaches.
- Exploratory analysis can support the data science workflow’s “inner loop” of feature extraction, model training and tuning, and validation and simulation (see Figure 1).
Prioritizing efforts to maximize business impact
Understanding the business impact of an effective solution is important for prioritizing efforts (not to mention that individual data scientists, like other employees, are interested in avenues to demonstrate the quantifiable impact of their work). While data scientists may take great pride in producing robust, general models with minimal prediction error, the business impact — not merely the predictive power of their models — should guide their efforts, and exploratory analysis is an important tool to identify the extent to which improved predictive performance might affect business metrics.
Tightening success criteria with business context
Business context helps practitioners identify finer-grained success criteria. While any organization wants to treat every customer equally in providing excellent service, it may not want to treat the risk of churn for every customer as equally important to the business. We may thus be interested in ranking churn risks by their expected remaining lifetime account value, by the expected cost to acquire a comparable account, or by other metrics.
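As a rough illustration of ranking churn risks by business value rather than by probability alone, the sketch below orders customers by churn probability multiplied by remaining lifetime value. The field names, the scoring formula, and the numbers are assumptions made for this example; they are not taken from the blueprint application.

```python
# Hypothetical customers: a churn probability from some model and an
# estimate of the remaining lifetime account value.
customers = [
    {"id": "A", "churn_probability": 0.80, "remaining_lifetime_value": 100.0},
    {"id": "B", "churn_probability": 0.30, "remaining_lifetime_value": 2000.0},
    {"id": "C", "churn_probability": 0.60, "remaining_lifetime_value": 500.0},
]


def expected_value_at_risk(customer):
    """Expected revenue lost if this customer churns (one possible metric)."""
    return customer["churn_probability"] * customer["remaining_lifetime_value"]


# Highest expected loss first.
ranked = sorted(customers, key=expected_value_at_risk, reverse=True)
ranking = [c["id"] for c in ranked]
```

Here customer A is the most likely to churn but ranks last, because the account value at stake is small.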
Defining meaningful prediction targets
Data scientists and business analysts need to define meaningful prediction targets for the models they’ll ultimately be training. On a long enough timeline, every customer will fail to renew their subscription, but a model that asserts “yes, eventually” for every customer isn’t useful or actionable. A model that only flags customers who have already begun closing their accounts as likely to churn is similarly dubious. Exploratory analysis can inform a carefully crafted prediction target by enabling data scientists and analysts to simulate the plausibility and impact of various prediction targets on historical data.
Inspiring novel modeling approaches
Better understanding of the data can inspire novel modeling approaches. For example, two customer attributes may not be strongly correlated with churning individually, but their combination may be. Exploratory analysis can thus inform the process of feature engineering.
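A small synthetic example makes the interaction effect concrete. In the sketch below, neither attribute changes the churn rate on its own, but their combination determines it completely. The attribute names and the data are invented for illustration.

```python
from collections import defaultdict


def churn_rate_by(records, key):
    """Churn rate for each group, where `key` maps a record to its group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [churned, total]
    for record in records:
        group = key(record)
        totals[group][0] += record["churned"]
        totals[group][1] += 1
    return {group: churned / total for group, (churned, total) in totals.items()}


# Synthetic customers: churn happens exactly when the two attributes differ.
records = [
    {"has_streaming": False, "paperless_billing": False, "churned": 0},
    {"has_streaming": False, "paperless_billing": True, "churned": 1},
    {"has_streaming": True, "paperless_billing": False, "churned": 1},
    {"has_streaming": True, "paperless_billing": True, "churned": 0},
] * 25

by_streaming = churn_rate_by(records, lambda r: r["has_streaming"])
by_billing = churn_rate_by(records, lambda r: r["paperless_billing"])
by_pair = churn_rate_by(
    records, lambda r: (r["has_streaming"], r["paperless_billing"])
)
```

Each single-attribute report shows a flat 50% churn rate, while the pairwise report separates the groups cleanly, which suggests a combined feature.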
Supporting discovery workflows with ubiquitous exploratory analysis
Exploratory analysis can support the data science workflow’s “inner loop” of feature extraction, model training and tuning, and validation and simulation in two ways: directly, by providing summary statistics, domains for categorical features, and distributions for numerical features, and indirectly, by enabling data scientists to disregard uninformative features before training a model and thus making the model and the system built around it more robust.
Business analytics and reporting
While exploratory analysis is a valuable part of the early stages of the data science discovery workflow, similar workloads are important in production and can provide insight and value even without training a model. The main difference between these workloads is one of context: while exploratory analysis is generally ad hoc, business analytics workloads are typically run regularly in production. Techniques or queries used in exploratory analysis may inform or even become parts of automated reporting or analytics workloads.
Analytics in churn modeling
In our blueprint application, we’ll incorporate a pair of analytics workloads. The first workload produces a machine-readable summary report that is — along with the single wide table of customer data we described in the previous installment — part of the input to the model training application. The second workload produces a series of reports intended to help analysts and stakeholders better understand the factors that make customers more likely to renew or churn, in order to guide human decisions in the modeling process as well as to inform business decisions and service offerings that incorporate effective renewal incentives. We’ll now examine each of these in turn.
Calculating feature summaries and domains
Part of understanding our data is understanding each feature individually; this is also a prerequisite to effective feature engineering or model training. Basic summary statistics (such as minimum, count, mean, median, and variance) are useful but may be insufficient on their own to characterize a dataset, since radically different datasets may have similar summary statistics. A famous example of this phenomenon is Anscombe’s Quartet (Figure 2), a set of four two-variable datasets that have nearly identical means, variances, correlations, and linear relationships but which exhibit obviously different shapes when plotted.
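The effect is easy to check numerically. The sketch below uses the first two of Anscombe’s four datasets, which share the same x values, and compares the mean and sample variance of their y values with Python’s standard `statistics` module. It is a standalone illustration and not part of the blueprint code.

```python
import statistics

# Anscombe's quartet, datasets I and II (they share the same x values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

summary = {
    name: (statistics.mean(ys), statistics.variance(ys))
    for name, ys in (("I", y1), ("II", y2))
}

for name, (mean, variance) in summary.items():
    print(f"dataset {name}: mean={mean:.2f} variance={variance:.2f}")
```

The two summaries agree to two decimal places even though dataset I is a noisy line and dataset II is a smooth curve, which is why distributions and plots are worth computing alongside the basic statistics.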
In order to more faithfully characterize our datasets, we’ll need to compute more descriptive summaries of individual features in addition to the basic descriptive statistics. One such summary is the cumulative distribution, which can both inform feature engineering decisions and provide valuable business context. For example, in many cases, it is more useful to know that a given customer’s monthly spend is in the 97th percentile than to know that that customer’s monthly spend is two standard deviations above the mean. Apache Spark supports efficient techniques for calculating approximate quantiles of columns in data frames, which we can use to produce cumulative distributions of these values.
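To see why a percentile can be more informative than a distance from the mean, consider a skewed spend distribution. The sketch below computes an exact empirical percentile rank with the standard library; it stands in for the approximate quantiles that Spark would compute at scale. The data and the `percentile_rank` helper are made up for this example.

```python
import bisect
import statistics


def percentile_rank(sorted_values, value):
    """Fraction of observations less than or equal to `value`."""
    return bisect.bisect_right(sorted_values, value) / len(sorted_values)


# Skewed, synthetic monthly spend: most customers spend little, a few a lot.
monthly_spend = sorted([20.0] * 90 + [60.0] * 7 + [400.0] * 3)

mean = statistics.mean(monthly_spend)
stdev = statistics.pstdev(monthly_spend)

customer_spend = 60.0
z_score = (customer_spend - mean) / stdev
rank = percentile_rank(monthly_spend, customer_spend)
```

In this data, a customer spending 60 sits at the 97th percentile yet is less than half a standard deviation above the mean, because a handful of very large accounts inflate the standard deviation.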
We also produce a report including basic summary statistics, such as mean, minimum, maximum, and variance (for numeric features) and the identities and counts of distinct values (for discrete or categorical columns). These statistics and distributions are useful in themselves but will also make the data scientist’s job easier, since they can inform feature encoding approaches.
Exploratory analysis and reporting in churn modeling
The second component of our analytics workload simulates both exploratory analysis and scheduled reporting by producing two kinds of reports:
- A set of data cubes, or multidimensional spreadsheets, showing the counts of customers with combinations of given attributes that churned or renewed, and
- A collection of rollup reports showing the total lifetime account value of every customer that churned in a given quarter.
The data cube reports enable analysts to quickly drill down on a combination of features and see which are most strongly correlated with renewal or churn. It is worth noting that many real-world analytics pipelines may end at this point, since reports like this can provide actionable insight even without a trained predictive model. Figures 3 and 4 show the value of these reports: by drilling down to plot the interaction of a customer’s contract status (month-to-month, annual, or two-year) and their tenure in quarters, we can clearly establish that most of the customers who churn are relatively new and on month-to-month contracts.
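A data cube of this kind can be sketched in a few lines: for every subset of the chosen dimensions, count the customers in each cell, split by outcome. The version below is a minimal in-memory illustration with invented records and a hypothetical `data_cube` helper; the blueprint application computes its reports with Spark.

```python
from collections import Counter
from itertools import combinations


def data_cube(records, dimensions, outcome="churned"):
    """Count records per (cell, outcome) for every subset of `dimensions`.

    A cell is a tuple of (dimension, value) pairs; the empty tuple is the
    grand total over all records.
    """
    cube = Counter()
    for record in records:
        for size in range(len(dimensions) + 1):
            for dims in combinations(dimensions, size):
                cell = tuple((d, record[d]) for d in dims)
                cube[(cell, record[outcome])] += 1
    return cube


records = [
    {"contract": "month-to-month", "tenure_quarters": 1, "churned": True},
    {"contract": "month-to-month", "tenure_quarters": 1, "churned": True},
    {"contract": "month-to-month", "tenure_quarters": 6, "churned": False},
    {"contract": "two-year", "tenure_quarters": 6, "churned": False},
    {"contract": "annual", "tenure_quarters": 1, "churned": False},
]

cube = data_cube(records, ["contract", "tenure_quarters"])

# Drill down: new month-to-month customers who churned.
new_monthly = (("contract", "month-to-month"), ("tenure_quarters", 1))
churned_new_monthly = cube[(new_monthly, True)]
```

Looking up a coarser cell, such as only the contract type, gives the corresponding rollup without rescanning the data.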
These figures show another advantage of exploratory analysis and reporting: since some of the features we consider in our summaries are numeric (e.g., months of tenure), we can also use data cube reports to investigate the impact of various quantization or bucketing strategies. Instead of seeing how many customers churned at each discrete month of tenure, for example, we might be more interested in abstracting the tenure of our customers by looking at how many quarters, half-years, or years an account had been active. (In Figures 3 and 4, we bucketed customer tenures by quarter.) Identifying how best to abstract numeric data in order to convey the most information to human stakeholders and to downstream analyses alike is another goal of exploratory analysis.
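Trying out different bucketing strategies is cheap once the raw values are available. The sketch below groups the same months-of-tenure values by quarter and by year using integer division. The tenure values and the `tenure_bucket` helper are invented for illustration.

```python
from collections import Counter


def tenure_bucket(months, months_per_bucket):
    """Zero-based bucket index for a tenure given in whole months."""
    return months // months_per_bucket


# Synthetic tenures (in months) of customers who churned.
churned_tenure_months = [1, 2, 2, 3, 5, 7, 13, 14, 26]

by_quarter = Counter(tenure_bucket(m, 3) for m in churned_tenure_months)
by_year = Counter(tenure_bucket(m, 12) for m in churned_tenure_months)
```

The quarterly view shows churn concentrated in the first two quarters, while the yearly view only shows that most churn happens in the first year; which granularity is more useful depends on the audience and on the downstream analysis.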
Performance improvements and future work
In our previous installment, we saw how the RAPIDS Accelerator for Apache Spark could execute data federation workloads on NVIDIA GPUs. While some organizations may execute federation pipelines regularly, others may execute federation pipelines only when the pipelines themselves or the source data materially change. Exploratory analytics and reporting are more performance-sensitive, though: exploratory analytics workloads are typically interactive and human time is precious, and reporting workloads may be batch scheduled but run regularly. These workloads are also typically more complex than data federation workloads and thus also more amenable to performance improvement. By executing our analytics workloads on NVIDIA GPUs with the RAPIDS Accelerator for Apache Spark, we were able to achieve speedups of nearly 700% relative to CPU execution.
In future installments of this series, we’ll provide more details on our overall system as well as show how we improved the performance of our federation and analytics workloads. Stay tuned for more!
tf.data.dataset pipeline. Practical part.
submitted by /u/Denis_Vo
submitted by /u/nbortolotti
From real-time ray tracing, to streaming from the cloud, find out more about the breakthroughs that are helping organizations across industries enhance their XR workflows.
NVIDIA technology has powered some of the most stunning extended reality experiences across all industries. This year at GTC, several sessions showcased how the latest advancements are driving the future of XR — and all of these sessions are now available on NVIDIA On-Demand.
Check out some of the most popular XR sessions you might have missed at GTC ’21 (note: some sessions may require a free NVIDIA Developer Program membership).
Autodesk VRED with NVIDIA CloudXR and Varjo XR3: Unparalleled XR Quality and Data Complexity
See a beautifully detailed car presented by Autodesk VRED, and learn how the car was streamed to a mobile device using NVIDIA CloudXR. This was shown using a single NVIDIA RTX 8000 GPU and a Varjo XR-3 headset.
Look, Mum, No Computer! How the Cloud can Revolutionize VR Experiences (Presented by Google Cloud)
Learn how to connect a VR headset to an instance running in Google Cloud using NVIDIA CloudXR. Explore the advantages and limitations of this solution, and see a number of use cases to test performance.
Making 3D Content Creation Fast and Easy for Creatives by using VR and ML
Check out how NVIDIA CloudXR helps creatives use high-performance software on a lightweight mobile headset. This marks a major milestone on the path to democratizing 3D content creation.
The Technology Empowering Lucid Motors’ Luxury Automotive Purchase Experience
Get an inside look at how Lucid Motors partnered with ZeroLight. The two companies launched a cloud-powered purchase journey for customers interested in exploring, customizing, or buying the new Lucid Air pure-electric luxury vehicle. Learn how this experience reflects the need to provide a digital shopping experience — linking the virtual and physical worlds.
NVIDIA CloudXR and XR Streaming 101
Explore the various streaming approaches and strategies being developed, and dive into the pros and cons vis-à-vis different devices and use cases. Experience the NVIDIA CloudXR SDK, and learn more about how it works and how you can use it.
Collaborative Virtual Workspaces: Pop-Up XR Experiences with NVIDIA CloudXR
See how flexible, no-setup XR experiences can support manufacturing use cases, such as virtual 3P Workshops (Production Preparation Process). Learn how XR can enhance exploring, validating, and confirming designed manufacturing processes against the physical factory layout and assets on location, as well as improve workflows for digitally designed products, human-centric assembly processes and worker ergonomics.
NVIDIA CloudXR Client for iOS, Creating AR Applications using the CloudXR SDK
NVIDIA CloudXR continues to add client device support. With the NVIDIA CloudXR SDK Release 2.1, iOS devices can now use NVIDIA GPUs for advanced AR rendering, inferencing and real-time graphics. This session provided a step-by-step walkthrough of the NVIDIA CloudXR client build, deploying to a device, and testing with advanced real-time visualization tools.
Did you miss GTC? All of the AR/VR sessions are available at no charge on NVIDIA On-Demand.
Learning good visual and vision-language representations is critical to solving computer vision problems — image retrieval, image classification, video understanding — and can enable the development of tools and products that change people’s daily lives. For example, a good vision-language matching model can help users find the most relevant images given a text description or an image input and help tools such as Google Lens find more fine-grained information about an image.
To learn such representations, current state-of-the-art (SotA) visual and vision-language models rely heavily on curated training datasets that require expert knowledge and extensive labels. For vision applications, representations are mostly learned on large-scale datasets with explicit class labels, such as ImageNet, OpenImages, and JFT-300M. For vision-language applications, popular pre-training datasets, such as Conceptual Captions and Visual Genome Dense Captions, all require non-trivial data collection and cleaning steps, limiting the size of datasets and thus hindering the scale of the trained models. In contrast, natural language processing (NLP) models have achieved SotA performance on GLUE and SuperGLUE benchmarks by utilizing large-scale pre-training on raw text without human labels.
In “Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision”, to appear at ICML 2021, we propose bridging this gap with publicly available image alt-text data (written copy that appears in place of an image on a webpage if the image fails to load on a user’s screen) in order to train larger, state-of-the-art vision and vision-language models. To that end, we leverage a noisy dataset of over one billion image and alt-text pairs, obtained without the expensive filtering or post-processing steps used for the Conceptual Captions dataset. We show that the scale of our corpus can make up for noisy data and leads to SotA representations that achieve strong performance when transferred to classification tasks such as ImageNet and VTAB. The aligned visual and language representations also set new SotA results on Flickr30K and MS-COCO benchmarks, even when compared with more sophisticated cross-attention models. The representations also enable zero-shot image classification and cross-modality search with complex text and text + image queries.
Creating the Dataset
Alt-texts usually provide a description of what the image is about, but the dataset is “noisy” because some text may be partly or wholly unrelated to its paired image.
Example image-text pairs randomly sampled from the training dataset of ALIGN. One clearly noisy text label is marked in italics.
In this work, we follow the methodology of constructing the Conceptual Captions dataset to get a version of raw English alt-text data (image and alt-text pairs). While the Conceptual Captions dataset was cleaned by heavy filtering and post-processing, this work scales up visual and vision-language representation learning by relaxing most of the cleaning steps in the original work. Instead, we only apply minimal frequency-based filtering. The result is a much larger but noisier dataset of 1.8B image-text pairs.
ALIGN: A Large-scale ImaGe and Noisy-Text Embedding
For the purpose of building larger and more powerful models easily, we employ a simple dual-encoder architecture that learns to align visual and language representations of the image and text pairs. Image and text encoders are learned via a contrastive loss (formulated as normalized softmax) that pushes the embeddings of matched image-text pairs together while pushing those of non-matched image-text pairs (within the same batch) apart. The large-scale dataset makes it possible for us to scale up the model size to be as large as EfficientNet-L2 (image encoder) and BERT-large (text encoder) trained from scratch. The learned representation can be used for downstream visual and vision-language tasks.
Figure credits: ImageNet figure from Krizhevsky et al. (2012); VTAB figure from Zhai et al. (2019).
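The in-batch contrastive objective can be illustrated with a small pure-Python sketch. It L2-normalizes both sets of embeddings, treats the matched pair in each row as the correct class of a softmax over the batch, and sums the image-to-text and text-to-image terms. The fixed `temperature` value, the symmetric sum, and the toy embeddings are assumptions made for this example; this is not the training code used for ALIGN.

```python
import math


def l2_normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax_cross_entropy(logits, target):
    """Negative log-probability of `target` under a softmax over `logits`."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[target]


def contrastive_loss(image_embeddings, text_embeddings, temperature=0.1):
    """In-batch contrastive loss; row k of each input is a matched pair."""
    images = [l2_normalize(v) for v in image_embeddings]
    texts = [l2_normalize(v) for v in text_embeddings]
    n = len(images)
    # similarity[i][j]: scaled cosine similarity of image i and text j.
    similarity = [[dot(i, t) / temperature for t in texts] for i in images]
    image_to_text = sum(
        softmax_cross_entropy(similarity[k], k) for k in range(n)
    ) / n
    text_to_image = sum(
        softmax_cross_entropy([similarity[j][k] for j in range(n)], k)
        for k in range(n)
    ) / n
    return image_to_text + text_to_image


images = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = contrastive_loss(images, [[2.0, 0.0], [0.0, 3.0]])
mismatched_loss = contrastive_loss(images, [[0.0, 3.0], [2.0, 0.0]])
```

The loss is near zero when each image is closest to its own text and large when the pairs are swapped, which is the pressure that pulls matched embeddings together and pushes non-matched ones apart.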
The resulting representation can be used for vision-only or vision-language task transfer. Without any fine-tuning, ALIGN powers cross-modal search: image-to-text search, text-to-image search, and even search with joint image+text queries, as in the examples below.
Evaluating Retrieval and Representation
The learned ALIGN model with BERT-Large and EfficientNet-L2 as text and image encoder backbones achieves SotA performance on multiple image-text retrieval tasks (Flickr30K and MS-COCO) in both zero-shot and fine-tuned settings, as shown below.
Setting | Model | Flickr30K (1K test set) R@1, image → text | Flickr30K (1K test set) R@1, text → image | MS-COCO (5K test set) R@1, image → text | MS-COCO (5K test set) R@1, text → image |
Zero-shot | ImageBERT | 70.7 | 54.3 | 44.0 | 32.3 |
Zero-shot | UNITER | 83.6 | 68.7 | – | – |
Zero-shot | CLIP | 88.0 | 68.7 | 58.4 | 37.8 |
Zero-shot | ALIGN | 88.6 | 75.7 | 58.6 | 45.6 |
Fine-tuned | GPO | 88.7 | 76.1 | 68.1 | 52.7 |
Fine-tuned | UNITER | 87.3 | 75.6 | 65.7 | 52.9 |
Fine-tuned | ERNIE-ViL | 88.1 | 76.7 | – | – |
Fine-tuned | VILLA | 87.9 | 76.3 | – | – |
Fine-tuned | Oscar | – | – | 73.5 | 57.5 |
Fine-tuned | ALIGN | 95.3 | 84.9 | 77.0 | 59.9 |
Image-text retrieval results (recall@1) on Flickr30K and MS-COCO datasets (both zero-shot and fine-tuned). ALIGN significantly outperforms existing methods including the cross-modality attention models that are too expensive for large-scale retrieval applications.
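For readers unfamiliar with the metric, recall@1 is the fraction of queries whose top-ranked result is a correct match. The sketch below assumes exactly one correct item per query, which is a simplification (for example, an image can have several valid captions); the similarity scores are made up.

```python
def recall_at_1(similarity, ground_truth):
    """Fraction of queries whose highest-scoring candidate is the right one.

    similarity[q][c] is the score of candidate c for query q, and
    ground_truth[q] is the index of the single correct candidate.
    """
    hits = 0
    for query_index, scores in enumerate(similarity):
        best = max(range(len(scores)), key=scores.__getitem__)
        if best == ground_truth[query_index]:
            hits += 1
    return hits / len(similarity)


# Three queries scored against three candidates (synthetic scores).
similarity = [
    [0.9, 0.1, 0.2],  # correct: candidate 0 ranks first
    [0.3, 0.8, 0.1],  # correct: candidate 1 ranks first
    [0.7, 0.2, 0.4],  # wrong: candidate 0 outranks the correct candidate 2
]
score = recall_at_1(similarity, ground_truth=[0, 1, 2])
```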
ALIGN is also a strong image representation model. As shown below, with frozen features ALIGN slightly outperforms CLIP and achieves a SotA result of 85.5% top-1 accuracy on ImageNet. With fine-tuning, ALIGN achieves higher accuracy than most generalist models, such as BiT and ViT, and is outperformed only by Meta Pseudo Labels, which requires deeper interaction between ImageNet training and large-scale unlabeled data.
Model (backbone) | Acc@1 w/ frozen features | Acc@1 (fine-tuned) | Acc@5 (fine-tuned) |
WSL (ResNeXt-101 32x48d) | 83.6 | 85.4 | 97.6 |
CLIP (ViT-L/14) | 85.4 | – | – |
BiT (ResNet152 x 4) | – | 87.54 | 98.46 |
NoisyStudent (EfficientNet-L2) | – | 88.4 | 98.7 |
ViT (ViT-H/14) | – | 88.55 | – |
Meta-Pseudo-Labels (EfficientNet-L2) | – | 90.2 | 98.8 |
ALIGN (EfficientNet-L2) | 85.5 | 88.64 | 98.67 |
ImageNet classification results comparison with supervised training (fine-tuning).
Zero-Shot Image Classification
Traditionally, image classification treats each class as an independent ID, and people have to train the classification layers with at least a few shots of labeled data per class. The class names are actually also natural language phrases, so we can naturally extend the image-text retrieval capability of ALIGN to image classification without any training data.
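In other words, zero-shot classification reduces to retrieval: embed each class name as text, embed the image, and pick the class whose text embedding is most similar. The sketch below shows only that selection step. The three-dimensional embeddings are hand-written stand-ins, not outputs of ALIGN, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_embedding, class_text_embeddings):
    """Return the class name whose text embedding best matches the image."""
    return max(
        class_text_embeddings,
        key=lambda name: cosine_similarity(
            image_embedding, class_text_embeddings[name]
        ),
    )


# Stand-in text embeddings for three class names (made-up values).
class_text_embeddings = {
    "dog": [0.9, 0.1, 0.0],
    "cat": [0.1, 0.9, 0.0],
    "car": [0.0, 0.1, 0.9],
}

# Stand-in embedding of an image that should look most like "cat".
predicted = zero_shot_classify([0.2, 0.8, 0.1], class_text_embeddings)
```

Adding a new class only requires a new text embedding; no classification layer is trained.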
On the ImageNet validation dataset, ALIGN achieves 76.4% top-1 zero-shot accuracy and shows great robustness in different variants of ImageNet with distribution shifts, similar to the concurrent work CLIP. We also use the same text prompt engineering and ensembling as in CLIP.
Model | ImageNet | ImageNet-R | ImageNet-A | ImageNet-V2 |
CLIP | 76.2 | 88.9 | 77.2 | 70.1 |
ALIGN | 76.4 | 92.2 | 75.8 | 70.1 |
Top-1 accuracy of zero-shot classification on ImageNet and its variants.
Application in Image Search
To illustrate the quantitative results above, we build a simple image retrieval system with the embeddings trained by ALIGN and show the top 1 text-to-image retrieval results for a handful of text queries from a 160M image pool. ALIGN can retrieve precise images given detailed descriptions of a scene, or fine-grained or instance-level concepts like landmarks and artworks. These examples demonstrate that the ALIGN model can align images and texts with similar semantics, and that ALIGN can generalize to novel complex concepts.
Image retrieval with fine-grained text queries using ALIGN’s embeddings.
Multimodal (Image+Text) Query for Image Search
A surprising property of word vectors is that word analogies can often be solved with vector arithmetic. A common example is “king – man + woman = queen”. Such linear relationships between image and text embeddings also emerge in ALIGN.
Specifically, given a query image and a text string, we add their ALIGN embeddings together and use the sum to retrieve relevant images using cosine similarity, as shown below. These examples not only demonstrate the compositionality of ALIGN embeddings across vision and language domains, but also show the feasibility of searching with a multi-modal query. For instance, one could now look for the “Australia” or “Madagascar” equivalent of pandas, or turn a pair of black shoes into identical-looking beige shoes. It is also possible to remove objects or attributes from a scene by performing subtraction in the embedding space, as shown below.
Image retrieval with image + text queries. By adding or subtracting a text query embedding, ALIGN retrieves relevant images.
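The arithmetic behind such a multimodal query can be sketched as follows: add the image and text embeddings, then rank a pool of candidate images by cosine similarity to the sum. The embeddings below are hand-written toy vectors rather than outputs of ALIGN, and the helper names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def retrieve(query, pool):
    """Name of the pooled image whose embedding is closest to `query`."""
    return max(pool, key=lambda name: cosine_similarity(query, pool[name]))


# Toy embeddings (made-up values in a shared three-dimensional space).
panda_image = [1.0, 0.0, 0.2]
australia_text = [0.0, 1.0, 0.0]

image_pool = {
    "panda": [1.0, 0.0, 0.3],
    "koala": [0.7, 0.8, 0.0],
    "kangaroo": [0.1, 1.0, 0.0],
}

image_only_result = retrieve(panda_image, image_pool)
multimodal_result = retrieve(add(panda_image, australia_text), image_pool)
```

With the image alone the nearest neighbor is another panda; adding the text embedding shifts the query toward candidates that match both inputs. Subtracting a text embedding instead would move the query away from the named attribute.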
Social Impact and Future Work
While this work shows promising results from a methodology perspective with a simple data collection method, additional analysis of the data and the resulting model is necessary before the model can be used responsibly in practice. For instance, the potential for harmful text in alt-texts to reinforce such harms should be considered. With regard to fairness, data balancing efforts may be required to prevent reinforcing stereotypes from the web data. Additional testing and training around sensitive religious or cultural items should be undertaken to understand and mitigate the impact of possibly mislabeled data.
Further analysis should also be undertaken to ensure that the demographic distribution of humans and related cultural items, such as clothing, food, and art, does not cause skewed model performance. Analysis and balancing would be required if such models are to be used in production.
Conclusion
We have presented a simple method of leveraging large-scale noisy image-text data to scale up visual and vision-language representation learning. The resulting model, ALIGN, is capable of cross-modal retrieval and significantly outperforms SotA models. In visual-only downstream tasks, ALIGN is also comparable to or outperforms SotA models trained with large-scale labeled data.
Acknowledgement
We would like to thank our co-authors in Google Research: Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig. This work was also done with invaluable help from other colleagues from Google. We would like to thank Jan Dlabal and Zhe Li for continuous support in training infrastructure, Simon Kornblith for building the zero-shot & robustness model evaluation on ImageNet variants, Xiaohua Zhai for help on conducting VTAB evaluation, Mingxing Tan and Max Moroz for suggestions on EfficientNet training, Aleksei Timofeev for the early idea of multimodal query retrieval, Aaron Michelony and Kaushal Patel for their early work on data generation, and Sergey Ioffe, Jason Baldridge and Krishna Srinivasan for the insightful feedback and discussion.