
Deploying XR Applications in Private Networks on a Server Platform

Learn how servers can be built to deliver immersive workloads and take advantage of compute power to combine streaming XR applications with AI and other computing functions.

The current distribution of extended reality (XR) experiences is limited to desktop setups and local workstations, which contain the high-end GPUs necessary to meet computing requirements. For XR solutions to scale past their currently limited user base and support higher-end functionality such as AI services integration and on-demand collaboration, we need a purpose-built platform.

NVIDIA Project Aurora is a hardware and software platform that simplifies the deployment of enterprise XR applications onto corporate on-premises networks. This platform is also designed to support the integration of AI services within XR workloads.

Project Aurora leverages the XR streaming backbone of NVIDIA CloudXR and NVIDIA RTX Virtual Workstation (RTX vWS), bringing the horsepower of NVIDIA RTX A6000 and NVIDIA A40 to the edge to stream rich, real-time graphics from a machine room over a private 5G network.

Opportunities ahead

For XR developers, Project Aurora’s streamlined delivery of NVIDIA CloudXR streaming dramatically broadens the customer base from users tethered to high-power workstations to anyone with a simple headset or handheld device.

Users no longer need to leave their work areas to use a dedicated, attended, tethered “VR room” setup to experience high-power, high-quality immersion. Collaborators around the world, using their own separate local on-prem networks, can access and alter the same virtual environments at the same time.

Current use cases

Powerful XR use cases exist across multiple industries, allowing you to optimize workflows.

  • Medical professionals can explore immersive representations of anatomical models or even real patient data to train and work.
  • Manufacturing designers and engineers can shorten project lifecycles by leveraging digital twins of parts, assemblies, or entire manufacturing floors and plants.
  • Those same personnel can train in a virtual manufacturing environment without the need for physical resources or floor downtime.
Figure 1. A woman using a VR headset to explore an immersive virtual experience

In collaboration with an ever-growing list of partners, NVIDIA has shown the value of moving these use cases into Project Aurora for virtualized distribution of XR over high-performance networks.

BT and Ericsson

BT and Ericsson deployed a VR digital twin manufacturing solution on a 5G mobile private network using the world’s first 5G-enabled VR headset powered by the Qualcomm Snapdragon XR2 Platform. The experience runs on Masters of Pie’s Radical SDK, enabling cloud-based virtual reality within computer-aided design (CAD) software.

By seamlessly integrating the high-performance edge rendering provided by the Project Aurora platform, existing factory operations have benefited from a high-fidelity VR experience that is available on the manufacturing floor.

EE 5G

Using augmented reality, The Green Planet AR Experience, powered by EE 5G, takes guests on an immersive journey into the secret kingdom of plants. Visitors travel through changing seasons on six digitally enhanced worlds, including rainforests, deserts, freshwater, and saltwater. The worlds are all powered by an Ericsson 5G standalone private network. Audiences engage and interact with the plant life by using a handheld mobile device, which acts as a window into the natural world.

The compute power of a handheld device on its own would not be nearly enough to render these volumetric models under normal conditions, but with Project Aurora, the experience is delivered seamlessly.

AT&T

AT&T recently teamed with Warner Bros., Ericsson, Qualcomm, Dreamscape, NVIDIA, and Wevr on an immersive, location-based VR experience, Chaos at Hogwarts. This proof-of-concept offers a peek into how Project Aurora combined with 5G can enhance future user-generated experiences.

By using the high-bandwidth and low-latency characteristics of 5G paired with Project Aurora’s XR infrastructure, we can change today’s architecture to one that is more comfortable for fans, more productive for creators, and more profitable for venue operators.

Project Aurora components

Project Aurora is an EGX-certified server powered by a GPU/CPU configuration optimized for XR, with a virtualization system using RTX vWS. The build is available through multiple OEMs (HPE, Dell) and supports multiple orchestration tools, including Linux KVM and VMware vSphere.

NVIDIA CloudXR is layered on this virtualized workstation server to form the Project Aurora base platform. Although it is built specifically to stream XR applications, the Project Aurora server also supports any graphics-based workloads natively supported by RTX vWS.

Figure 2. Project Aurora architecture: each layer is built on the layer below it, starting with the EGX server platform

The Project Aurora hardware and software stack is a scalable design built with the help of multiple partners. The base unit of an Aurora build consists of four highly optimized servers from Dell or HPE that support multiple NVIDIA A40 GPUs with RTX Virtual Workstation software and high-performance, low-latency NVIDIA networking components.

Network traffic has been carefully separated. It runs across multiple NVIDIA ConnectX network interface cards (NICs) and can be broken down into the following main areas:

  • Internal
  • External
  • Storage

These traffic flows are distributed across the three NICs to provide performance, security, and redundancy in each area in the event of a NIC failure. Project Aurora is designed with a “performance first” approach and scales from the base configuration to client deployments of any size.

Project Aurora partners

Project Aurora is more than just a scalable hardware and software platform. To make the Project Aurora platform easy for any IT crew to implement, NVIDIA has partnered with NVIDIA Partner Network (NPN) integration experts to ensure that delivery and installation are simple, repeatable, and fully “white-glove.” Our sales distribution channel was built with the following partners:

  • The Grid Factory: Immersive technology integrator specializing in NVIDIA-based technologies, such as NVIDIA CloudXR, vGPU, Omniverse, and the EGX Server architecture.
  • Enterprise Integration: Value-added reseller coordinating between clients and various partners, including OEMs such as Dell and HPE and distributors such as Arrow.
  • NVIDIA Pro Services (NVPS): Delivers white-glove service to facilitate the standup of durable XR experiences.

Get started with Project Aurora

To activate Project Aurora’s white-glove service, just contact one of our team members. With the support of the NPN partners, the Project Aurora team works with you from application sizing and site survey through the ‘last-mile’ integration steps (authentication, profiling, server and network setup, parameter tuning, and so on) to deliver a successful XR distribution system.

Figure 3. Project Aurora sales distribution partners: The Grid Factory, Enterprise Integration, Dell, Arrow, and Hewlett Packard Enterprise

Project Aurora deployments are actively scaling toward hundreds, even thousands, of users. If you are interested in learning more about Project Aurora or would like to get involved with our distribution channel to evaluate a new use case, contact Aurora_Outreach@nvidia.com.


Boosting Application Performance with GPU Memory Prefetching

This CUDA post examines the effectiveness of methods to hide memory latency using explicit prefetching.

NVIDIA GPUs have enormous compute power and typically must be fed data at high speed to deploy that power. That is possible, in principle, because GPUs also have high memory bandwidth, but sometimes they need your help to saturate that bandwidth.

In this post, we examine one specific method to accomplish that: prefetching. We explain the circumstances under which prefetching can be expected to work well, and how to find out whether these circumstances apply to your workload.

Context

NVIDIA GPUs derive their power from massive parallelism. Many warps of 32 threads can be placed on a streaming multiprocessor (SM), awaiting their turn to execute. When one warp is stalled for whatever reason, the warp scheduler switches to another with zero overhead, making sure the SM always has work to do.

On the high-performance NVIDIA Ampere Architecture A100 GPU, up to 64 active warps can share an SM, each with its own resources. On top of that, A100 has 108 SMs that can all execute warp instructions simultaneously.

Most instructions must operate on data, and that data almost always originates in the device memory (DRAM) attached to the GPU. One of the main reasons why even the abundance of warps on an SM can run out of work is because they are waiting for data to arrive from memory.

If this happens, and the bandwidth to memory is not fully utilized, it may be possible to reorganize the program to improve memory access and reduce warp stalls, which in turn makes the program complete faster. This is called latency hiding.

Prefetching

A technology commonly supported in hardware on CPUs is called prefetching. The CPU observes the stream of memory requests, figures out the access pattern, and starts fetching data before it is actually needed. While that data travels to the execution units of the CPU, other instructions can be executed, effectively hiding the travel costs (memory latency).

Prefetching is a useful technique but expensive in terms of silicon area on the chip. These costs would be even higher, relatively speaking, on a GPU, which has many more execution units than the CPU. Instead, the GPU uses excess warps to hide memory latency. When that is not enough, you may employ prefetching in software. It follows the same principle as hardware-supported prefetching but requires explicit instructions to fetch the data.

To determine if this technique can help your program run faster, use a GPU profiling tool such as NVIDIA Nsight Compute to check the following:

  1. Confirm that not all memory bandwidth is being used.
  2. Confirm that the main reason warps are stalled is Stall Long Scoreboard, which means that the SMs are waiting for data from DRAM.
  3. Confirm that these stalls are concentrated in sizeable loops whose iterations do not depend on each other.

Unrolling

Consider the simplest possible optimization of such a loop, called unrolling. If the loop is short enough, you can tell the compiler to unroll it completely and the iterations are expanded explicitly. Because the iterations are independent, the compiler can issue all requests for data (“loads”) upfront, provided that it assigns distinct registers to each load.

These requests can be overlapped with each other, so that the whole set of loads experiences only a single memory latency, not the sum of all individual latencies. Even better, part of the single latency is hidden by the succession of load instructions itself. This is a near-optimal situation, but it may require a lot of registers to receive the results of the loads.

If the loop is too long, it could be unrolled partially. In that case, batches of iterations are expanded, and then you follow the same general strategy as before. Work on your part is minimal (but you may not be that lucky).
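For reference, the compiler can be nudged to unroll with a pragma. The snippet below is a minimal sketch of full versus partial unrolling on a generic loop of independent iterations; the names arr, base, stride, imax, and sum are placeholders, not taken from the application discussed in this post.

// Full unrolling: the trip count (8) is known at compile time, so the
// compiler can issue all eight independent loads upfront into distinct registers.
#pragma unroll
for (int j = 0; j < 8; j++) {
    sum += arr[base + j*stride];
}

// Partial unrolling: expand iterations in batches of 4.
#pragma unroll 4
for (int i = threadIdx.x; i < imax; i += stride) {
    sum += arr[i];
}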

If the loop contains many other instructions whose operands need to be stored in registers, even just partial unrolling may not be an option. In that case, and after you have confirmed that the earlier conditions are satisfied, you must make some decisions based on further information.

Prefetching means bringing data closer to the SMs’ execution units. Registers are closest of all. If enough are available, which you can find out using the Nsight Compute occupancy view, you can prefetch directly into registers.

Consider the following loop, where array arr is stored in global memory (DRAM). It implicitly assumes that just a single, one-dimensional thread block is being used, which is not the case for the motivating application from which it was derived. However, it reduces code clutter and does not change the argument.

In all code examples in this post, uppercase variables are compile-time constants. BLOCKDIMX assumes the value of the predefined variable blockDim.x. For some purposes, it must be a constant known at compile time whereas for other purposes, it is useful for avoiding computations at run time.
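As a minimal sketch of that convention, such a constant might be defined and used at launch time as follows; the value 256 and the kernel and argument names are hypothetical, not taken from the application:

#define BLOCKDIMX 256  // compile-time copy of the runtime value blockDim.x

// The kernel must then be launched with a matching block size (hypothetical names):
mykernel<<<numBlocks, BLOCKDIMX>>>(arr, imax);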

for (i=threadIdx.x; i<imax; i+=BLOCKDIMX) {
  double locvar = arr[i];
  // lots of instructions using locvar, for example, transcendentals
}

Imagine that you have eight registers to spare for prefetching. This is a tuning parameter. The following code fetches four double-precision values occupying eight 4-byte registers at the start of each fourth iteration and uses them one by one, until the batch is depleted, at which time you fetch a new batch.

To keep track of the batches, introduce a counter (ctr) that increments with each successive iteration executed by a thread. For convenience, assume that the number of iterations per thread is divisible by 4.

double v0, v1, v2, v3;
for (i=threadIdx.x, ctr=0; i<imax; i+=BLOCKDIMX, ctr++) {
  ctr_mod = ctr%4;
  if (ctr_mod==0) { // only fill the batch every 4th iteration
    v0 = arr[i+0*BLOCKDIMX];
    v1 = arr[i+1*BLOCKDIMX];
    v2 = arr[i+2*BLOCKDIMX];
    v3 = arr[i+3*BLOCKDIMX];
  }
  switch (ctr_mod) { // pull one value out of the prefetched batch
    case 0: locvar = v0; break;
    case 1: locvar = v1; break;
    case 2: locvar = v2; break;
    case 3: locvar = v3; break;
  }
  // lots of instructions using locvar, for example, transcendentals
}

Typically, the more values that can be prefetched, the more effective the method is. While the preceding example is not complex, it is a little cumbersome. If the number of prefetched values (PDIST, or prefetch distance) changes, you have to add or delete lines of code.

It is easier to store the prefetched values in shared memory, because you can use array notation and vary the prefetch distance without any effort. However, shared memory is not as close to the execution units as registers. It requires an extra instruction to move the data from there into a register when it is ready for use. For convenience, we introduce macro vsmem to simplify indexing the array in shared memory:

#define vsmem(index)  v[index+PDIST*threadIdx.x]
__shared__ double v[PDIST*BLOCKDIMX];
for (i=threadIdx.x, ctr=0; i<imax; i+=BLOCKDIMX, ctr++) {
  ctr_mod = ctr%PDIST;
  if (ctr_mod==0) { // fill the prefetch buffer every PDIST iterations
    for (k=0; k<PDIST; ++k) vsmem(k) = arr[i+k*BLOCKDIMX];
  }
  locvar = vsmem(ctr_mod);
  // lots of instructions using locvar, for example, transcendentals
}

Instead of prefetching in batches, you can also do a “rolling” prefetch. In that case, you fill the prefetch buffer before entering the main loop and subsequently prefetch exactly one value from memory during each loop iteration, to be used PDIST iterations later. The next example implements rolling prefetching, using array notation and shared memory.

__shared__ double v[PDIST*BLOCKDIMX];
for (k=0; k<PDIST; ++k) vsmem(k) = arr[threadIdx.x+k*BLOCKDIMX]; // fill the prefetch buffer
for (i=threadIdx.x, ctr=0; i<imax; i+=BLOCKDIMX, ctr++) {
  ctr_mod = ctr%PDIST;
  locvar = vsmem(ctr_mod); // value prefetched PDIST iterations earlier
  if (i<imax-PDIST*BLOCKDIMX) vsmem(ctr_mod) = arr[i+PDIST*BLOCKDIMX]; // prefetch one new value
  // lots of instructions using locvar, for example, transcendentals
}

Contrary to the batched method, rolling prefetch no longer suffers any memory latencies during the execution of the main loop, provided the prefetch distance is sufficiently large. It also uses the same amount of shared memory or register resources, so it would appear to be preferred. However, a subtle issue may limit its effectiveness.

A synchronization within the loop—for example, __syncthreads()—constitutes a memory fence and forces the loading of arr to complete at that point within the same iteration, not PDIST iterations later. The fix is to use asynchronous loads into shared memory, the simplest version of which is explained in the Pipeline interface section of the CUDA Programming Guide. These asynchronous loads do not need to complete at a synchronization point, but only when they are explicitly waited on.

Here’s the corresponding code:

#include <cuda_pipeline_primitives.h>
__shared__ double v[PDIST*BLOCKDIMX];
for (k=0; k<PDIST; ++k) { // fill the prefetch buffer asynchronously
  __pipeline_memcpy_async(&vsmem(k), &arr[threadIdx.x+k*BLOCKDIMX], 8);
  __pipeline_commit();
}
for (i=threadIdx.x, ctr=0; i<imax; i+=BLOCKDIMX, ctr++) {
  __pipeline_wait_prior(PDIST-1); // wait until the needed prefetched value has arrived
  ctr_mod = ctr%PDIST;
  locvar = vsmem(ctr_mod);
  if (i<imax-PDIST*BLOCKDIMX) { // prefetch one new value
    __pipeline_memcpy_async(&vsmem(ctr_mod), &arr[i+PDIST*BLOCKDIMX], 8);
    __pipeline_commit();
  }
  // lots of instructions using locvar, for example, transcendentals
}

As each __pipeline_wait_prior instruction must be matched by a __pipeline_commit instruction, we put the latter inside the loop that prefills the prefetch buffer, before entering the main computational loop, to keep bookkeeping of matching instruction pairs simple.

Performance results

Figure 1 shows, for various prefetch distances, the performance improvement of a kernel taken from a financial application under the five algorithmic variations described earlier.

  • Batched prefetch into registers (scalar batched)
  • Batched prefetch into shared memory (smem batched)
  • Rolling prefetch into registers (scalar rolling)
  • Rolling prefetch into shared memory (smem rolling)
  • Rolling prefetch into shared memory using asynchronous memory copies (smem rolling async)
Figure 1. Kernel speedups for different prefetch strategies

Clearly, the rolling prefetching into shared memory with asynchronous memory copies gives good benefit, but it is uneven as the prefetch buffer size grows.

A closer inspection of the results, using Nsight Compute, shows that bank conflicts occur in shared memory, which cause a warp’s worth of asynchronous loads to be split into more successive memory requests than strictly necessary. The classical optimization approach of padding the array size in shared memory to avoid bad strides works in this case. The value of PADDING is chosen such that the sum of PDIST and PADDING equals a power of two plus 1. Apply it to all variations that use shared memory:

#define vsmem(index) v[index+(PDIST+PADDING)*threadIdx.x]
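For completeness, the shared-memory array itself grows by the same padding. A sketch follows; the values PDIST = 6 and PADDING = 3 are illustrative, chosen so the per-thread stride is 9, a power of two plus one, as described above:

// Illustrative values: with PDIST = 6, PADDING = 3 gives a per-thread
// stride of 9 = 2^3 + 1, which avoids shared-memory bank conflicts.
#define PADDING 3
__shared__ double v[(PDIST+PADDING)*BLOCKDIMX];  // padded prefetch buffer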

This leads to the improved shared memory results shown in Figure 2. A prefetch distance of just 6, combined with asynchronous memory copies in a rolling fashion, is sufficient to obtain optimal performance at almost 60% speedup over the original version of the code. We could actually have arrived at this performance improvement without resorting to padding by changing the indexing scheme of the array in shared memory, which is left as an exercise for the reader.

Figure 2. Kernel speedups for different prefetch strategies with shared memory padding. Scalar rolling alone slows performance by ~60%, while the other rolling and batched strategies show speedups of 20-30%.

A variation of prefetching not yet discussed moves data from global memory to the L2 cache, which may be useful if space in shared memory is too small to hold all data eligible for prefetching. This type of prefetching is not directly accessible in CUDA and requires programming at the lower PTX level.
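For illustration only, such an L2 prefetch can be issued from CUDA C++ through inline PTX. The sketch below is an assumption-laden example, not one of the measured variants above; the helper name is hypothetical, and ptr is assumed to point a fixed distance ahead in the global array.

// Hypothetical helper: hint the hardware to stage a global-memory address in L2.
// Uses the PTX "prefetch.global.L2" instruction; "ptr" is assumed to point
// PDIST*BLOCKDIMX elements ahead of the current position in arr.
__device__ __forceinline__ void prefetch_to_l2(const void* ptr) {
    asm volatile("prefetch.global.L2 [%0];" :: "l"(ptr));
}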

Summary

In this post, we showed you examples of localized changes to source code that may speed up memory accesses. These do not change the amount of data being moved from memory to the SMs, only their timing. You may be able to optimize more by rearranging memory accesses such that data is reused many times after it arrives on the SM.


Supercharging AI-Accelerated Cybersecurity Threat Detection

NVIDIA Morpheus, now available for download, enables you to use AI to achieve up to 1000x improved performance.

Cybercrime worldwide is costing as much as the gross domestic product of countries like Mexico or Spain, hitting more than $1 trillion annually. And global trends point to it only getting worse. 

Data centers face staggering increases in users, data, devices, and apps, which increase the threat surface amid ever more sophisticated attack vectors.

Stop emerging threats

NVIDIA Morpheus enables cybersecurity developers and independent software vendors to build high-performance pipelines for security workflows with minimal development effort.

You can easily leverage the benefits of back pressure, reactive programming, and fibers to build cybersecurity solutions. The higher-level API enables you to program traditionally but gain the benefits of accelerated computing, allowing you to achieve orders of magnitude improvements in throughput. These optimizations don’t exist in any other streaming framework. Morpheus now enables building custom pipelines with Python and C++ abstraction layers.

You might typically have had to choose between writing something quickly in Python with minimal lines of code or writing something that doesn’t have the performance ceiling that Python does. With Morpheus, you get both.

You can write orders of magnitude less code and get an unbounded performance ceiling. This enables better results in less time, translating to cost savings and superior outcomes.

F5 malware detection

NVIDIA partner F5 used a Morpheus-based machine learning model for their malware detection use case. With its highly scalable, customizable, and accelerated data processing, training and inference capabilities, Morpheus enabled a 200x performance improvement to the F5 pipeline over their CPU-based implementation.

The Morpheus pipeline helps you quickly create highly performant code and workflows, which can incorporate innovative models, with minimal development friction. As a result, you extract better performance from GPUs, boosting processing of the logs required to find domain generation algorithms (DGAs).

For F5, this meant going from processing 1,013 DGA logs per second to 20,833 logs per second, all with just 136 lines of code. For more information, see the Detection of DGA-based malicious domain names using real-time ML techniques F5 GTC session.

Scaling the pipeline

Morpheus makes it easy to build and scale cybersecurity applications that harness adaptive pipelines supporting a wider range of model complexity than previously possible. Beyond just hardware acceleration, the programming model plays a critical role in performance. Morpheus uses reactive programming methods, which means that it can adapt and automatically redirect resources on the fly to any portion of the pipeline under pressure.

Figure 1. AI-Based, Real-Time Threat Detection at Scale

If part of the pipeline sees a dramatic increase in data, Morpheus can adapt and create additional paths for the data to continue processing. The depth of these buffers is monitored, and additional segments can be added as necessary. Just as easily, Morpheus removes these when they’re no longer necessary.

Using fibers, Morpheus can take work from other processes, if they’re being underused. You don’t have to spin up anything; just borrow the work available on those underused portions of the pipeline.

All this comes together to enable Morpheus to adapt intelligently to the high variability in cybersecurity data streams. It provides complete visibility into what’s happening on your network in real time and enables you to write sequential code that Morpheus scales out automatically.

With Morpheus, you can analyze up to 100% of your data in real time, for more accurate detection and faster remediation of threats as they occur. Morpheus also uses AI to adjust to threats and compensate on the fly.

Real-time fraud detection at scale

The Morpheus cybersecurity AI framework for developers is a first-of-its-kind offering for creating AI-accelerated, real-time fraud detection at massive scale.

Unleashing streaming graph neural networks (GNNs) for fraud detection, it unlocks capabilities that were previously out of reach for independent software vendors and security developers who lack hefty sums of labeled data.

GNNs achieve next-generation breakthroughs in fraud detection because they are uniquely designed to identify and analyze relationships between seemingly unconnected pieces of data to make predictions and do this at massive scale. It’s also why GNNs have historically been used for applications such as recommender systems and optimizing delivery routes for drivers.

Morpheus GNNs enable feature engineering for fraud detection with far less training data. With traditional approaches, experts identify important pieces of data, such as geolocation information, and label them with their significance.

Because GNNs require less training data, you reduce the need for human expertise. You also enable the detection of threats that might not be otherwise recognized due to the amount of labeled training data required to train other models. Even with less data, you can improve the accuracy of fraud detection, which could potentially represent hundreds of millions of dollars to an organization.

Halt ransomware at the point of entry

Brazen global ransomware threats, like the high-profile shutdown of the Colonial Pipeline gas network, were an increased concern in 2021. Organizations are struggling to keep up with the volume and velocity of new threats. Costs of a data breach for an organization can run in the tens of millions per security breach and continue to rise.

The Morpheus AI application framework is built on NVIDIA RAPIDS and NVIDIA AI, together with NVIDIA GPUs. It enables the creation of powerful tools for implementing cybersecurity for this challenging era. When combined with the NVIDIA BlueField DPU accelerators and NVIDIA DOCA telemetry, this ushers in new standards for security development.

Figure 2. Morpheus architecture: inputs from SIEM/SOAR systems, app logs, cloud logs, BlueField DPUs, and converged cards feed the Morpheus layer, which is built on RAPIDS, cyber log accelerators, Triton Inference Server, and TensorRT

Use cases for Morpheus include natural language processing (NLP) for phishing detection. Digital fingerprinting is another use case, as it analyzes the behavior of every human and machine across the enterprise to detect anomalies.

Join us at NVIDIA GTC to hear about how NVIDIA partners are integrating NVIDIA-accelerated AI with their cybersecurity solutions. NVIDIA Morpheus is open-source and available in April for download through GitHub and NGC.



Model runs fine as a directory, but throws an error when used as an .h5 file: ValueError: All `axis` values to be kept must have known shape. Got axis: (-1,), input shape: [None, None], with unknown axis at index: 1

Interesting issue that I can’t quite wrap my head around.

We have a working Python project using TensorFlow to create and then use a model. This works great when we output the model as a directory, but if we output the model as an .h5 file, we run into the following error whenever we try to use the model:

ValueError: All `axis` values to be kept must have known shape. Got axis: (-1,), input shape: [None, None], with unknown axis at index: 1 

Here is how we were and how we are currently saving the model:

# this technique works (saves model to a directory)
tf.keras.models.save_model(
    dnn_model, filepath='./true_overall', overwrite=True,
    include_optimizer=True, save_format=None, signatures=None,
    options=None, save_traces=True
)

# this saves the file, but throws an error when the file is used
tf.keras.models.save_model(
    dnn_model, filepath='./true_overall.h5', overwrite=True,
    include_optimizer=True, save_format=None, signatures=None,
    options=None, save_traces=True
)

This is how we’re importing the model for use:

dnn_model = tf.keras.models.load_model('./neural_network/true_overall')     # works
dnn_model = tf.keras.models.load_model('./neural_network/true_overall.h5')  # doesn't work

What would cause a model to work when saved as a directory but have issues when saved as an h5 file?

submitted by /u/jengl


How to convert keras tensor to numpy array?

I am working on a project in which I am using layer-wise relevance propagation (LRP) to get the relevances of each input. But the output of LRP is a Keras tensor. Is there any way to convert it to a NumPy array?

submitted by /u/jer_in_


Added data augmentation through contrast and brightness with poor results, why?


Hi there, I am a couple of weeks in with learning ML, and trying to get a decent image classifier. I have 60 or so labels, and only about 175-300 images each. I found that augmentation via flips and rotations suits the data and has helped bump up the accuracy a bit (maybe 7-10%).

The images have mostly white backgrounds, but some are not (greys, some darker), and this is not evenly distributed. I think it was causing issues when making predictions from test photos: some incorrect labels came up frequently despite little visual similarity. I thought perhaps the background was involved, as the darker background/shadows matched my photos. I figured adding contrast/brightness variation would nullify this behavior, so I followed a tutorial that adds a layer to randomize contrast and brightness of images in the training dataset. Snippet below:

# defaults: contrast_range=[0.5, 1.5], brightness_delta=[-0.2, 0.2]
contrast = np.random.uniform(self.contrast_range[0], self.contrast_range[1])
brightness = np.random.uniform(self.brightness_delta[0], self.brightness_delta[1])
images = tf.image.adjust_contrast(images, contrast)
images = tf.image.adjust_brightness(images, brightness)
images = tf.clip_by_value(images, 0, 1)
return images

With slight adjustments to contrast and brightness, I reviewed the output and it looks exactly how I wanted it. I figured it would at least help, but it appears to cause a trainwreck! Does this make sense? Where should I look to improve on this?

(Training result screenshots: without contrast/brightness vs. with contrast/brightness.)

As well, most tutorials focus on two labels. For 60 or so labels with 200-300 images each, in projects that deal with plants/nature/geology for example, what accuracy is typically attainable?

submitted by /u/m1g33


"kernel driver does not appear to be running on this host"

I looked the problem up but didn’t find any solutions, and the only threads I found were from people who wanted to use TensorFlow with a GPU, so here I post:

My situation:

I know the basics of Python and a little bit about virtual environments, and I’m using the TensorFlow Object Detection API without a GPU on Ubuntu 18.04.

I installed the TensorFlow Object Detection API with the Anaconda guide at https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/, though I’m not sure if I activated the tensorflow environment (“conda activate tensorflow”) while doing this. It worked fine and I wrote various programs with Spyder 5.2.3 using TensorFlow and object detection.

Then I made a terrible rookie mistake and updated Anaconda, and I believe conda too, because I was pretty much mindlessly copying some pip commands, and everything stopped working because of a dependency mess.

I tried to revert the update with conda revisions, but it wasn’t working, so I tried deleting Anaconda with

conda install anaconda-clean

anaconda-clean --yes

rm -rf ~/anaconda3

and uninstalling TensorFlow with

pip uninstall tensorflow

I tried reinstalling the whole thing twice, but since then I get the classic error/hint about not using a GPU, plus an additional error message like “kernel driver does not appear to be running on this host” and UNKNOWN ERROR: 303, with some libcuda files reported missing, which are associated with CUDA, but I don’t use CUDA since I have no GPU.

Does it have something to do with a virtual environment I don’t use, or did I not uninstall TensorFlow or Anaconda properly, or is it something else?

would appreciate some help if possible

submitted by /u/Mumm13


can someone please tell me how to upload training data I’m trying to find nums 0-9 on a page

submitted by /u/Living-Aardvark-952

First Wave of Startups Harnesses UK’s Most Powerful Supercomputer to Power Digital Biology Breakthroughs

Four NVIDIA Inception members have been selected as the first cohort of startups to access Cambridge-1, the U.K.’s most powerful supercomputer. The system will help British companies Alchemab Therapeutics, InstaDeep, Peptone and Relation Therapeutics enable breakthroughs in digital biology. Officially launched in July, Cambridge-1 — an NVIDIA DGX SuperPOD cluster powered by NVIDIA DGX A100…



NVIDIA Launches Omniverse for Developers: A Powerful and Collaborative Game Creation Environment

Enriching its game developer ecosystem, NVIDIA today announced the launch of new NVIDIA Omniverse™ features that make it easier for developers to share assets, sort asset libraries, collaborate and deploy AI to animate characters’ facial expressions in a new game development pipeline.