Categories
Misc

Announcing Latest Nsight Graphics 2021.4 – Download Now

Learn about the latest release of Nsight Graphics 2021.4, an all-in-one graphics debugger and profiler to help game developers get the most out of NVIDIA hardware.

Nsight Graphics 2021.4 is an all-in-one graphics debugger and profiler to help game developers get the most out of NVIDIA hardware. From analyzing API setup, to solving nasty bugs, to providing deep insight into how applications use the GPU for better performance, Nsight Graphics is the ultimate tool.

The latest release is available to download now >>

Key features include:

  • GPU Trace with One-shot capture feature
  • GPU Trace now supports applications that utilize Vulkan-CUDA interop
  • Analysis view for GPU Trace
  • Resizable BAR capabilities

GPU Trace

GPU Trace introduces a new capture type called One-shot. The One-shot capture type supports profiling applications that do not have a specific frame beginning and ending. This makes it easier to profile and optimize tools that rely on compute workloads, such as generating normal maps or optimizing geometry/LODs. One-shot captures are supported for D3D12 and Vulkan applications using compute or ray tracing features. Ray tracing with DirectML and WinML is also supported.

Figure 1. GPU Trace

Trace Analysis helps identify work regimes with the most potential for performance improvement. Select the “Analyze” button after taking a GPU Trace, and the advanced analysis engine will provide a new report with explanations and suggestions on how to improve GPU utilization. 

Figure 2. Trace Analysis

In March 2021, NVIDIA introduced new Resizable BAR capabilities with Game Ready GeForce drivers. Users with a compatible motherboard and GPU can enable all of the GPU memory to be accessed by the CPU at once. GPU Trace also reveals if BAR memory transfers are happening efficiently. View more information >> 

Figure 3. Resizable BAR

Using VK_NV_cuda_kernel_launch, it is now possible to launch CUDA kernels from a Vulkan graphics application without the overhead of the context switch. GPU Trace now supports this capability.

Figure 4. CUDA kernels

C++ Captures

When working with C++ Captures, it can be useful to open an integrated development environment with a project that allows for code browsing or modification. In this release, the added button in the C++ Capture document opens a Visual Studio environment with the associated project, taking advantage of Visual Studio’s native CMake support.

Figure 5. Added C++ Capture button

Read the Nsight Graphics 2021.4 release notes >>
Check out the GDC session on DevTools for Harnessing Ray Tracing in Games >>

Please continue to use the integrated feedback button that lets you send comments, feature requests, and bugs directly. You can send feedback anonymously or provide an email address for follow-up.

Just click on the little speech bubble at the top right of the window. 

Figure 6. Feedback form


Categories
Misc

New Machine Learning Model Taps into the Problem-Solving Potential of Satellite Data

New research creates a low-cost and easy-to-use machine learning model to analyze streams of data from earth-imaging satellites.

New research from a group of scientists at UC Berkeley is giving data-poor regions across the globe the power to analyze data-rich satellite imagery. The study, published in Nature Communications, develops a machine learning model that resource-constrained organizations and researchers can use to draw out regional socioeconomic and environmental information. Being able to evaluate local resources remotely could help guide effective interventions and benefit communities globally.

“We saw that many researchers—ourselves included—were passing up on this valuable data source because of the complexities and upfront costs associated with building computer vision pipelines to translate raw pixel values into useful information. We thought that there might be a way to make this information more accessible while maintaining the predictive skill offered by state-of-the-art approaches. So, we set about constructing a way to do this,” said coauthor Ian Bolliger, who worked on the study while pursuing a PhD in Energy and Resources at UC Berkeley.

At any given time, hundreds of image-collecting satellites circle the earth, sending massive amounts of information to databases daily. This data holds valuable insight into global challenges, including health, economic, and environmental conditions—even offering a look into data-poor and remote regions.

Combining satellite imagery with machine learning (SIML) has become an effective tool for turning these raw data streams into usable information. Researchers have used SIML in a broad range of studies, covering topics from poverty rates to water availability to educational access. However, most SIML projects capture information on a narrow topic, creating data tailored to a specific study and location.

The researchers sought to create an accessible system capable of analyzing and organizing satellite images from multiple sources while lowering compute requirements. The tool they created, called Multi-Task Observation using Satellite Imagery & Kitchen Sinks (MOSAIKS), does this by using a relatively simple and efficient unsupervised machine learning algorithm.

“We designed MOSAIKS keeping in mind that a single satellite image simultaneously holds information about many different prediction variables (like forest cover or population density). We chose to use an unsupervised embedding of the imagery to create a statistical summary of each image. The unsupervised nature of the featurization step makes the learning and prediction steps of the pipeline very fast, while the specifics of how those features are computed from imagery are well suited to satellite image data,” said coauthor Esther Rolf, a PhD student in computer science at Berkeley.
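One way to picture an unsupervised featurization of this kind is with random convolutional features: untrained filters are slid over the image, passed through a ReLU, and averaged into one number per filter. The sketch below is not the MOSAIKS code; the choice of Gaussian random filters, the pooling, and every function name are illustrative assumptions.

```python
import random

def random_filters(num_filters, size, seed=0):
    """Draw `num_filters` random size x size filters (assumption: no training involved)."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
            for _ in range(num_filters)]

def featurize(image, filters):
    """Summarize a 2D grayscale image as one non-negative number per filter.

    Each filter is slid over the image, the responses are passed through
    a ReLU, and the results are averaged over all positions.
    """
    size = len(filters[0])
    rows, cols = len(image), len(image[0])
    features = []
    for filt in filters:
        total, count = 0.0, 0
        for r in range(rows - size + 1):
            for c in range(cols - size + 1):
                response = sum(filt[i][j] * image[r + i][c + j]
                               for i in range(size) for j in range(size))
                total += max(response, 0.0)  # ReLU
                count += 1
        features.append(total / count)
    return features

# Hypothetical 6x6 "image" standing in for a satellite tile.
image = [[((r * c) % 7) / 6.0 for c in range(6)] for r in range(6)]
filters = random_filters(num_filters=8, size=3)
features = featurize(image, filters)
```

Because the features do not depend on any label, the same vector could be reused for several prediction tasks, with only a simple regression fitted per task.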

To develop the model, the researchers used CUDA-accelerated NVIDIA V100 Tensor Core GPUs on AWS. The publicly available CodeOcean capsule, which provides code, compute, and storage, for anyone to interactively run, uses NVIDIA GPUs.

Figure 1. Training data (left) and predictions using a single featurization of daytime imagery (right). Insets (far right) are marked by black squares in the global maps. The training sample is a uniform random sampling of 1,000,000 land grid cells, for 498,063 of which imagery was available and could be matched to task labels.

“We want policymakers in resource-constrained settings and without specialized computational expertise to be able to painlessly gather satellite imagery, build a model of a variable they care about (say, the presence of adequate sanitation systems), and test whether this model is actually performing well. If they can do this, it will dramatically improve the usefulness of this information in implementing policy objectives,” Bolliger said.

Currently, the team is developing and testing a public-facing web interface that makes it easy for people to query MOSAIKS features for user-specified locations. The team encourages interested researchers to sign up for the beta version.


Read the full article in Nature Communications >>
Read more >>   

Categories
Offsites

Discovering Anomalous Data with Self-Supervised Learning

Anomaly detection (sometimes called outlier detection or out-of-distribution detection) is one of the most common machine learning applications across many domains, from defect detection in manufacturing to fraudulent transaction detection in finance. It is most often used when it is easy to collect a large amount of known-normal examples but where anomalous data is rare and difficult to find. As such, one-class classification, such as one-class support vector machine (OC-SVM) or support vector data description (SVDD), is particularly relevant to anomaly detection because it assumes the training data are all normal examples, and aims to identify whether an example belongs to the same distribution as the training data. Unfortunately, these classical algorithms do not benefit from the representation learning that makes machine learning so powerful. On the other hand, substantial progress has been made in learning visual representations from unlabeled data via self-supervised learning, including rotation prediction and contrastive learning. As such, combining one-class classifiers with these recent successes in deep representation learning is an under-explored opportunity for the detection of anomalous data.

In “Learning and Evaluating Representations for Deep One-class Classification”, presented at ICLR 2021, we outline a 2-stage framework that makes use of recent progress on self-supervised representation learning and classic one-class algorithms. The algorithm is simple to train and results in state-of-the-art performance on various benchmarks, including CIFAR, f-MNIST, Cat vs Dog and CelebA. We then follow up on this in “CutPaste: Self-Supervised Learning for Anomaly Detection and Localization”, presented at CVPR 2021, in which we propose a new representation learning algorithm under the same framework for a realistic industrial defect detection problem. The framework achieves a new state-of-the-art on the MVTec benchmark.

A Two-Stage Framework for Deep One-Class Classification
While end-to-end learning has demonstrated success in many machine learning problems, including deep learning algorithm designs, such an approach for deep one-class classifiers often suffers from degeneration, in which the model outputs the same results regardless of the input.

To combat this, we apply a two-stage framework. In the first stage, the model learns deep representations with self-supervision. In the second stage, we adopt one-class classification algorithms, such as OC-SVM or kernel density estimator, using the learned representations from the first stage. This 2-stage algorithm is not only robust to degeneration, but also enables one to build more accurate one-class classifiers. Furthermore, the framework is not limited to specific representation learning and one-class classification algorithms; that is, one can easily plug and play different algorithms, which is useful if any advanced approaches are developed.

A deep neural network is trained to generate the representations of input images via self-supervision. We then train one-class classifiers on the learned representations.
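The two stages can be illustrated with a minimal sketch. Here `embed` is a hypothetical fixed feature map standing in for a network trained with self-supervision, and the second stage is a simple Gaussian kernel density estimator; the bandwidth and the toy data are assumptions chosen for illustration only.

```python
import math

def embed(x):
    """Stage 1 stand-in: a fixed feature map in place of a trained network (assumption)."""
    return (x[0] + x[1], x[0] - x[1])

def kde_score(z, train_reps, bandwidth=0.5):
    """Stage 2: anomaly score as the negative mean Gaussian kernel density.

    Higher scores mean the representation lies farther from the normal data.
    """
    density = 0.0
    for t in train_reps:
        sq_dist = sum((a - b) ** 2 for a, b in zip(z, t))
        density += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return -density / len(train_reps)

# Toy "known-normal" training examples clustered near the origin.
normal_train = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.05), (0.05, -0.1)]
train_reps = [embed(x) for x in normal_train]

score_normal = kde_score(embed((0.02, 0.03)), train_reps)
score_anomaly = kde_score(embed((3.0, -2.0)), train_reps)
```

Because the classifier only sees the output of `embed`, either stage could be swapped out without touching the other, which is the plug-and-play property described above.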

Semantic Anomaly Detection
We test the efficacy of our 2-stage framework for anomaly detection by experimenting with two representative self-supervised representation learning algorithms, rotation prediction and contrastive learning.

Rotation prediction refers to a model’s ability to predict the rotated angles of an input image. Due to its promising performance in other computer vision applications, the end-to-end trained rotation prediction network has been widely adopted for one-class classification research. The existing approach typically reuses the built-in rotation prediction classifier for learning representations to conduct anomaly detection, which is suboptimal because those built-in classifiers are not trained for one-class classification.

In contrastive learning, a model learns to pull together representations from transformed versions of the same image, while pushing representations of different images away. During training, as images are drawn from the dataset, each is transformed twice with simple augmentations (e.g., random cropping or color changing). We minimize the distance of the representations from the same image to encourage consistency and maximize the distance between different images. However, usual contrastive learning converges to a solution where all the representations of normal examples are uniformly spread out on a sphere. This is problematic because most of the one-class algorithms determine the outliers by checking the proximity of a tested example to the normal training examples, but when all the normal examples are uniformly distributed in an entire space, outliers will always appear close to some normal examples.

To resolve this, we propose distribution augmentation (DA) for one-class contrastive learning. The idea is that instead of learning representations from the training data only, the model learns from the union of the training data plus augmented training examples, where the augmented examples are considered to be different from the original training data. We employ geometric transformations, such as rotation or horizontal flip, for distribution augmentation. With DA, the training data is no longer uniformly distributed in the representation space because some areas are occupied by the augmented data.

Left: Illustrated examples of perfect uniformity from the standard contrastive learning. Right: The reduced uniformity by the proposed distribution augmentation (DA), where the augmented data occupy the space to avoid the uniform distribution of the inlier examples (blue) throughout the whole sphere.
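As a rough sketch of how the augmented training set could be assembled, the snippet below builds the union of the original images and their rotated copies, giving each copy a distinct identifier so that a contrastive loss would treat it as a different example. The function names and the `(index, rotation)` identifier scheme are illustrative assumptions, not the paper's implementation.

```python
def rotate90(image):
    """Rotate a 2D list-of-lists image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def distribution_augment(images, num_rotations=4):
    """Return (image, source_id) pairs for originals plus rotated copies.

    Each rotation gets its own id, so rotated copies count as examples
    that are different from the original training image.
    """
    augmented = []
    for idx, image in enumerate(images):
        current = image
        for k in range(num_rotations):
            augmented.append((current, (idx, k)))
            current = rotate90(current)
    return augmented

# Hypothetical 2x2 "image" to keep the result easy to inspect.
images = [[[1, 2], [3, 4]]]
augmented = distribution_augment(images)
```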

We evaluate the performance of one-class classification in terms of the area under the receiver operating characteristic curve (AUC) on datasets commonly used in computer vision, including CIFAR-10 and CIFAR-100, Fashion MNIST, and Cat vs Dog. Images from one class are given as inliers and those from the remaining classes are given as outliers. For example, we see how well cat images are detected as anomalies when dog images are inliers.

Method                                    CIFAR-10   CIFAR-100  f-MNIST    Cat vs. Dog
Ruff et al. (2018)                        64.8       -          -          -
Golan and El-Yaniv (2018)                 86.0       78.7       93.5       88.8
Bergman and Hoshen (2020)                 88.2       -          94.1       -
Hendrycks et al. (2019)                   90.1       -          -          -
Huang et al. (2019)                       86.6       78.8       93.9       -
2-stage framework: rotation prediction    91.3±0.3   84.1±0.6   95.8±0.3   86.4±0.6
2-stage framework: contrastive (DA)       92.5±0.6   86.5±0.7   94.8±0.3   89.6±0.5
Performance comparison of one-class classification methods. Values are the mean AUCs and their standard deviation over 5 runs. AUC ranges from 0 to 100, where 100 is perfect detection. A dash marks a value that is not listed.
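For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen outlier receives a higher anomaly score than a randomly chosen inlier. The sketch below computes it by counting pairs and scales it to the 0 to 100 range used in the table; the scores are made up, and the convention that a higher score means "more anomalous" is an assumption.

```python
def auc(scores_inlier, scores_outlier):
    """AUC on a 0-100 scale via pairwise comparison (ties count as half a win)."""
    wins = 0.0
    for o in scores_outlier:
        for i in scores_inlier:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return 100.0 * wins / (len(scores_inlier) * len(scores_outlier))

# Hypothetical anomaly scores: one inlier (0.6) outranks one outlier (0.5).
inlier = [0.1, 0.2, 0.3, 0.6]
outlier = [0.5, 0.7, 0.9]
result = auc(inlier, outlier)
```

In this toy case 11 of the 12 inlier/outlier pairs are ordered correctly, so the AUC is 11/12 on the 0 to 1 scale.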

Existing rotation prediction approaches typically rely on the built-in rotation classifier, which is suboptimal for one-class classification. It is notable that simply replacing this built-in classifier with a one-class classifier at the second stage of the proposed framework significantly boosts the performance, from 86.0 to 91.3 AUC on CIFAR-10. More generally, the 2-stage framework achieves state-of-the-art performance on all of the above benchmarks.

With classic OC-SVM, which learns the area boundary of representations of normal examples, the 2-stage framework results in higher performance than existing works as measured by image-level AUC.

Texture Anomaly Detection for Industrial Defect Detection
In many real-world applications of anomaly detection, the anomaly is often defined by localized defects instead of entirely different semantics (i.e., being different in general). For example, the detection of texture anomalies is useful for detecting various kinds of industrial defects.

The examples of semantic anomaly detection and defect detection. In semantic anomaly detection, the inlier and outlier are different in general, (e.g., one is a dog, the other a cat). In defect detection, the semantics for inlier and outlier are the same (e.g., they are both tiles), but the outlier has a local anomaly.

While representations learned with rotation prediction and distribution-augmented contrastive learning have demonstrated state-of-the-art performance on semantic anomaly detection, those algorithms do not perform well on texture anomaly detection. Instead, we explored different representation learning algorithms that better fit the application.

In our second paper, we propose a new self-supervised learning algorithm for texture anomaly detection. The overall anomaly detection follows the 2-stage framework, but the first stage, in which the model learns deep image representations, is specifically trained to predict whether the image is augmented via a simple CutPaste data augmentation. The idea of CutPaste augmentation is simple — a given image is augmented by randomly cutting a local patch and pasting it back to a different location of the same image. Learning to distinguish normal examples from CutPaste-augmented examples encourages representations to be sensitive to local irregularity of an image.

An illustration of learning representations by predicting CutPaste augmentations. Given an example, the CutPaste augmentation crops a local patch, then pastes it to a randomly selected area of the same image. We then train a binary classifier to distinguish the original image from the CutPaste-augmented image.
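The augmentation itself can be sketched in a few lines. The version below works on a 2D list of pixel values and uses a fixed patch size; the function name, the patch size, and the resampling loop for the destination are illustrative assumptions rather than the paper's exact procedure.

```python
import random

def cutpaste(image, patch_h, patch_w, rng):
    """Return a copy of `image` with one patch cut out and pasted at a different location."""
    rows, cols = len(image), len(image[0])
    if patch_h >= rows and patch_w >= cols:
        raise ValueError("patch must be smaller than the image")
    src_r = rng.randrange(rows - patch_h + 1)
    src_c = rng.randrange(cols - patch_w + 1)
    # Resample the destination until it differs from the source location.
    while True:
        dst_r = rng.randrange(rows - patch_h + 1)
        dst_c = rng.randrange(cols - patch_w + 1)
        if (dst_r, dst_c) != (src_r, src_c):
            break
    patch = [row[src_c:src_c + patch_w] for row in image[src_r:src_r + patch_h]]
    out = [list(row) for row in image]  # copy so the original stays untouched
    for i in range(patch_h):
        out[dst_r + i][dst_c:dst_c + patch_w] = patch[i]
    return out

# Hypothetical 6x6 image with unique pixel values, so any paste is visible.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
augmented = cutpaste(image, patch_h=2, patch_w=2, rng=random.Random(0))

# Training pairs for the binary classifier: 0 = original, 1 = CutPaste-augmented.
dataset = [(image, 0), (augmented, 1)]
```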

We use MVTec, a real-world defect detection dataset with 15 object categories, to evaluate the approach above.

Method                               Image-level AUC
DOCC (Ruff et al., 2020)             87.9
U-Student (Bergmann et al., 2020)    92.5
Rotation Prediction                  86.3
Contrastive (DA)                     86.5
CutPaste                             95.2
Image-level anomaly detection performance (in AUC) on the MVTec benchmark.

Besides image-level anomaly detection, we use the CutPaste method to locate where the anomaly is, i.e., “patch-level” anomaly detection. We aggregate the patch anomaly scores via upsampling with Gaussian smoothing and visualize them in heatmaps that show where the anomaly is. Interestingly, this provides decently improved localization of anomalies. The table below shows the pixel-level AUC for localization evaluation.

Method                                 Pixel-level AUC
Autoencoder (Bergmann et al., 2019)    86.0
FCDD (Ruff et al., 2020)               92.0
Rotation Prediction                    93.0
Contrastive (DA)                       90.4
CutPaste                               96.0
Pixel-level anomaly localization performance (in AUC) comparison between different algorithms on the MVTec benchmark.
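The aggregation of patch scores into a heatmap can be sketched as follows. This is a simplified illustration, assuming nearest-neighbor upsampling followed by a separable Gaussian blur with clamped edges; the grid of patch scores, the upsampling factor, and the kernel parameters are all hypothetical.

```python
import math

def upsample_nearest(grid, factor):
    """Repeat each patch score over a factor x factor block of pixels."""
    return [[grid[r // factor][c // factor] for c in range(len(grid[0]) * factor)]
            for r in range(len(grid) * factor)]

def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_rows(grid, kernel):
    """Blur each row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    out = []
    for row in grid:
        n = len(row)
        out.append([sum(kernel[k + radius] * row[min(max(c + k, 0), n - 1)]
                        for k in range(-radius, radius + 1))
                    for c in range(n)])
    return out

def transpose(grid):
    return [list(col) for col in zip(*grid)]

def heatmap(patch_scores, factor, sigma=1.0, radius=2):
    """Upsample patch scores to pixel resolution, then blur rows and columns."""
    up = upsample_nearest(patch_scores, factor)
    kernel = gaussian_kernel(sigma, radius)
    blurred = smooth_rows(up, kernel)
    return transpose(smooth_rows(transpose(blurred), kernel))

# Hypothetical 3x3 grid of patch anomaly scores with a defect in the center patch.
patch_scores = [[0.1, 0.1, 0.1],
                [0.1, 0.9, 0.1],
                [0.1, 0.1, 0.1]]
heat = heatmap(patch_scores, factor=4)
```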

Conclusion
In this work we introduce a novel 2-stage deep one-class classification framework and emphasize the importance of decoupling building classifiers from learning representations so that the classifier can be consistent with the target task, one-class classification. Moreover, this approach permits applications of various self-supervised representation learning methods, attaining state-of-the-art performance on various applications of visual one-class classification from semantic anomaly to texture defect detection. We are extending our efforts to build more realistic anomaly detection methods under the scenario where training data is truly unlabeled.

Acknowledgements
We gratefully acknowledge the contribution from other co-authors, including Jinsung Yoon, Minho Jin and Tomas Pfister. We release the code in our GitHub repository.

Categories
Misc

Can NOT fit LSTM layer with default parameters?

I have a problem using LSTM from Keras. When I try to train the model, training stops at “Epoch 1/50” and never progresses. The program just exits with a “Process finished” code and shows no error message explaining why training did not run.

The problem only occurs when I use LSTM’s default parameters. If I pass a different activation argument, for example “relu”, it works fine.

The problem seems to be specific to my local computer, as the code runs on Colab both with and without the default parameters, but locally I cannot train the model at all. This is especially frustrating as no error messages are displayed.

I really hope there is a skilled person who can help or guide me in the right direction with this problem.

Thanks 🙂

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM, InputLayer

# Load MNIST and scale pixel values to [0, 1]; labels stay as integer class ids.
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train / 255.0
x_test = x_test / 255.0

# Each 28x28 image is treated as a sequence of 28 rows with 28 features each.
model = Sequential()
model.add(InputLayer(x_train.shape[1:]))
model.add(LSTM(32))  # <-- the problem occurs here
model.add(Dense(10, activation='softmax'))

opt = tf.keras.optimizers.Adam(learning_rate=1e-3, decay=1e-5)
model.compile(loss='sparse_categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
model.fit(x_train, y_train, epochs=50, validation_data=(x_test, y_test), verbose='auto')
print("Model is trained.")

The output from my console:

2021-08-27 10:20:22.799484: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-08-27 10:20:26.443509: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2021-08-27 10:20:26.488972: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.2GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 245.91GiB/s
2021-08-27 10:20:26.489580: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-08-27 10:20:26.532650: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2021-08-27 10:20:26.533109: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2021-08-27 10:20:26.559639: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2021-08-27 10:20:26.565224: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2021-08-27 10:20:26.637251: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2021-08-27 10:20:26.660612: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2021-08-27 10:20:26.661815: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2021-08-27 10:20:26.662201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-08-27 10:20:26.662770: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-27 10:20:26.664656: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.2GHz coreCount: 30 deviceMemorySize: 6.00GiB deviceMemoryBandwidth: 245.91GiB/s
2021-08-27 10:20:26.665660: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-08-27 10:20:27.284225: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-27 10:20:27.284551: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-08-27 10:20:27.284738: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-08-27 10:20:27.285152: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3961 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5)
2021-08-27 10:20:28.198186: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/50
2021-08-27 10:20:29.513381: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
Process finished with exit code -1073740791 (0xC0000409)

  • TensorFlow version: 2.5.0
  • Python version: 3.9.6

submitted by /u/j_whoooo
[visit reddit] [comments]

Categories
Misc

[Question] Anyone interested in an AI Computer Vision Drone Workshop?

Would you be interested in building a drone from scratch and programming it using Python, TensorFlow and OpenCV?

submitted by /u/NickFortez06
[visit reddit] [comments]

Categories
Misc

Down to a Science: How Johnson & Johnson Boosts Its Business With MLOps

Healthcare giant Johnson & Johnson is injecting data science across its business to improve its manufacturing, clinical trial enrollment, forecasting and more. “I actually like to call it decision science,” said Jim Swanson, the company’s executive vice president and enterprise chief information officer, in a panel discussion at the most recent NVIDIA GPU Technology Conference. Read article >

The post Down to a Science: How Johnson & Johnson Boosts Its Business With MLOps appeared first on The Official NVIDIA Blog.

Categories
Misc

RAPIDS Accelerator for Apache Spark Release v21.08


Introduction

The August release (21.08) of RAPIDS Accelerator for Apache Spark is now available. It has been a little over a year since the first release at NVIDIA GTC 2020. We have improved in many ways, particularly in ease of use, with minimal to no code changes required for Apache Spark applications. Over the last year, the team has focused on adding functionality and continuously improving performance. As a testament to that, we periodically measure performance and functionality with the NVIDIA Decision Support (NDS) benchmark at a scale factor of 3,000 (3 TB uncompressed). In this release, apart from adding new features, we are extremely proud to make progress on improving end-to-end speed for all passing queries and lowering the total cost of ownership for NVIDIA EGX servers.

Benchmark updates

NVIDIA Decision Support (NDS) is our adaptation of an industry-standard data science benchmark often used in the Apache Spark community. NDS consists of the same 105 SQL queries as the industry standard benchmark TPC-DS, but has modified parts for dataset generation and execution scripts. In our GTC 2021 update, we had 95 queries passing. With the 21.08 release, with new features such as out-of-core group by, window rank, and dense_rank, we have enabled all of the 105 queries to run on the GPU.

Benchmark setup

  • Scale Factor — 3K (3TB Dataset with floats)
  • Systems: 4x NVIDIA Certified EGX Server
  • EGX Server Hardware Spec: 4-node Dell R740xd, each with (2) 24-core CPUs, 512GB RAM, HDFS on NVMe, (1) CX-6 Dx 25/100Gb NIC, 2x NVIDIA A30 GPU
  • CPU Hardware Spec: 4-node Dell R740xd, each with (2) 24-core CPUs, 512GB RAM, HDFS on NVMe, (1) CX-6 Dx 25/100Gb NIC
  • Software: RAPIDS Accelerator v21.08.0, cuDF 21.08.0, Apache Spark 3.1.1, UCX 1.10.1

Results summary

A bar chart showing the GPU speed-up for each NDS query compared to a CPU cluster. Overlaid lines show (1) the total cost of ownership of the GPU cluster used for this benchmark, which is 1.29 times that of the CPU cluster, and (2) the average speed-up across all queries. Per-query speed-ups vary from 0.2x to 18x in this chart.
Figure 1: NDS Queries Speed-up on EGX Servers: GPU vs CPU.

Based on this release, we are excited to show that all the 105 queries can now run without any code change on the GPU.

  • The servers used for these benchmarks cost a little under $170,000 for four servers without GPUs and $220,000 to include one NVIDIA A100 GPU in each server.
  • In simple terms, the benchmark GPU servers cost 1.29 times as much as the CPU servers.
  • As shown by the chart above (Figure 1), more than 95 queries now run over 1.29x faster on the GPU and are therefore cheaper to run there.
  • Some of the queries that are slower on GPU are currently being addressed and we are relentlessly working to improve those queries as well as improve the overall speed-ups.
  • Users can easily deduce that GPU speed-up varies from 1x to 18x and therefore it’s suggested that users qualify the right use cases for GPUs.
  • The Qualification Tool would be a handy asset if users are unsure about the right use case for GPU. For more information about the Qualification Tool, refer to the section below.
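The cost argument in the list above can be worked through with a few lines of arithmetic. The sketch assumes that the cost of running a query scales with hardware price multiplied by runtime, which is a simplification; the prices are the approximate figures quoted above, and the speed-up values in the loop are hypothetical.

```python
cpu_cluster_cost = 170_000  # four servers without GPUs (approximate, from the text)
gpu_cluster_cost = 220_000  # the same four servers with GPUs added

cost_ratio = gpu_cluster_cost / cpu_cluster_cost  # about 1.29

def relative_query_cost(speedup):
    """Cost of a query on the GPU cluster relative to the CPU cluster.

    Assumes cost is proportional to hardware price times runtime.
    Values below 1.0 mean the GPU cluster is cheaper for that query.
    """
    return cost_ratio / speedup

for speedup in (0.5, 1.29, 4.0, 18.0):  # hypothetical per-query speed-ups
    print(f"{speedup:>5}x speed-up -> relative cost {relative_query_cost(speedup):.2f}")
```

Under this assumption, a query breaks even at a speed-up of roughly 1.29x and becomes cheaper on the GPU cluster beyond that point.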

Profiling & qualification tool

The Profiling & Qualification tool, released in 21.06, saw positive feedback from the user community as well as requests for new features. In 21.08 the qualification tool now has the ability to handle event logs generated by Apache Spark 2.x versions. The tool will also support event logs generated by AWS EMR 6.3.0, Google Dataproc 2.0, Microsoft Azure Synapse, and the Databricks 7.3 and 8.2 runtimes. The qualification tool will no longer require a Spark runtime. Users can now use the qualification tool with just Apache Spark 3.x jars on their machine. The latest version also has new filtering capabilities to choose event logs. The tool also looks for read data formats and types that the plugin doesn’t support and removes these from the score (based on the total task time in SQL Dataframe operations). The output will be reported in a concise format on the terminal and a detailed analysis of each of the processed event logs will be stored as a csv output.

New functionality

This release adds more functionality for arrays and structs. We can now do a union on multi-level struct data types and can also write array data types in Parquet format. We have added rank and dense_rank window functions to the existing lead, lag and row_number functionality. With this added functionality, the RAPIDS Accelerator can now support the most commonly used window operators in SQL. For the timestamp operators, we have added support for LEGACY timestamps. With this functionality, users can read legacy timestamp formats supported in Spark 2.0. For Databricks users, we have added the ability to cache data in GPU (this was already supported for all other platforms).
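For readers less familiar with these window functions, the difference between row_number, rank, and dense_rank shows up only when values tie. The following plain-Python sketch mirrors the standard SQL semantics for a single partition ordered ascending; it is an illustration of the functions' behavior, not of how the RAPIDS Accelerator implements them, and the function name is hypothetical.

```python
def window_ranks(values):
    """Return (value, row_number, rank, dense_rank) tuples for values sorted ascending."""
    ordered = sorted(values)
    rows = []
    rank = dense = 0
    previous = object()  # sentinel that never equals a real value
    for row_number, value in enumerate(ordered, start=1):
        if value != previous:
            rank = row_number  # rank jumps to the current row position
            dense += 1         # dense_rank increases by exactly one
            previous = value
        rows.append((value, row_number, rank, dense))
    return rows

rows = window_ranks([10, 20, 20, 30])
for row in rows:
    print(row)
```

With the tie on 20, rank leaves a gap after the tied rows while dense_rank does not.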

We continue to make the user experience better with the ability to handle datasets that spill out of GPU memory for group by and windowing operations. This improvement will save users time creating partitions to avoid out-of-memory errors on the GPU. Similarly, the adoption of UCX 1.11 has improved error handling for RAPIDS Spark Accelerated Shuffle Manager.

Growing community

Join us for “Accelerate Data Pipelines with NVIDIA RAPIDS Accelerator for Spark” to learn how Informatica is removing the barrier to using GPUs, unlocking dramatic performance improvements in operationalizing machine learning projects at scale. You can read more about this online seminar and register here.

Coming Soon

As we noted in the last release (21.06), we have moved to CalVer and a bi-monthly release cadence. The upcoming versions will add expanded support for additional decimal types and continue to add more nested data type support for multi-level structs and maps. In addition, look out for micro-benchmarks with code samples and notebooks that will highlight operations best suited for GPUs. We want to hear from you, the users. Reach out to us on GitHub and let us know how we can continue to improve your experience using RAPIDS Spark.

Categories
Misc

Streaming in September: GFN Thursday Welcomes 16 Day-and-Date Game Launches, Including ‘Life Is Strange: True Colors’

GFN Thursday is here to wake you up when September begins because there are a bunch of awesome day-and-date launch games coming to GeForce NOW this month. September brings 16 new day-and-date games to the cloud — including the anticipated Life is Strange: True Colors. They’re part of the 34 games being added throughout the Read article >

The post Streaming in September: GFN Thursday Welcomes 16 Day-and-Date Game Launches, Including ‘Life Is Strange: True Colors’ appeared first on The Official NVIDIA Blog.

Categories
Misc

PyCharm Installation

Has anyone using PyCharm had any luck installing the TensorFlow library? What guides did you follow? How can I approach installing and using the library? The traditional package installation via PyCharm doesn’t seem to work for me, unfortunately. I’m using Python 3.8 and trying to install version 2.6.0.

Thank you!

submitted by /u/JonesJohnson3000
[visit reddit] [comments]

Categories
Misc

Pyinstaller to turn .py (that uses tensorflow object detection) into .exe?

Is this possible? I wrote a python script that uses tensorflow object detection and I got everything working correctly but now I want to turn this into a .exe so that I can bring it on a flash drive easily and show it to friends.

submitted by /u/Simshaffer
[visit reddit] [comments]