Categories
Misc

NVIDIA Recommends Stockholders Reject ‘Mini-Tender’ Offer by Tutanota LLC

SANTA CLARA, Calif., May 27, 2022 — NVIDIA today announced that it recently became aware of an unsolicited “mini-tender” offer by Tutanota LLC to purchase up to 215,000 …

Categories
Misc

Prototyping Faster with the Newest UDF Enhancements in the NVIDIA cuDF API

This post highlights helpful new cuDF features that allow you to think about a single row of data and write code faster.

Over the past few releases, the NVIDIA cuDF team has added several new features to user-defined functions (UDFs) that can streamline the development process while improving overall performance. In this post, I walk through the new UDF enhancements and show how you can take advantage of them within your own applications:

  • The cuDF Series.apply API and how to use it
  • The cuDF DataFrame.apply API and how to write a UDF in terms of “rows”
  • Enhanced support for missing data using both apply APIs
  • A real-world use case example with timing
  • Practical considerations, limitations, and future plans

apply API for cuDF Series

For those unfamiliar with pandas, Series.apply is the main entry point for mapping an arbitrary Python function onto a single series of data. For example, you might want to convert temperature in Celsius to Fahrenheit using a formula already written as a Python function.

Here is a quick refresher.

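A minimal sketch of this pattern in pandas might look like the following; the function f and the example values are illustrative:

import pandas as pd

def f(c):
    # convert a Celsius temperature to Fahrenheit
    return 9 / 5 * c + 32

sr = pd.Series([0, 10, 20, 30])
print(sr.apply(f))
# 0    32.0
# 1    50.0
# 2    68.0
# 3    86.0
# dtype: float64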

Technically, you can write any valid Python code within the function f, and pandas runs the function in a loop over the series. This makes apply extremely flexible in the context of pandas: any UDF can be applied as long as it can handle all of the input data, even UDFs that rely on external libraries or ones that expect or return arbitrary Python objects.

But this flexibility comes at the cost of performance. Running a Python function in a long loop is not an efficient strategy for a variety of reasons (for example, the overhead of the Python interpreter). As a result, this performance constraint can be frustrating if your UDFs are simple, such as those composed of purely mathematical operations on scalar values.

Luckily, these use cases are what cuDF was built for. Recent cuDF improvements to UDF support have motivated the introduction of an equivalent apply API:

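A sketch of the cuDF equivalent, assuming the same illustrative function as the pandas refresher above:

import cudf

def f(c):
    # same Celsius-to-Fahrenheit conversion as in the pandas example
    return 9 / 5 * c + 32

sr = cudf.Series([0, 10, 20, 30])
print(sr.apply(f))
# 0    32.0
# 1    50.0
# 2    68.0
# 3    86.0
# dtype: float64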

For numeric values, this produces the same results as pandas. The only notable difference is that the result always has a specific cuDF dtype rather than the object dtype that pandas often falls back to.

The function f can contain any Python UDF that is composed of pure math or Python operations. cuDF deduces an appropriate return dtype by inspecting the function through Numba, then compiles and runs an equivalent function on the GPU.

Functions can also be written to accept an arbitrary number of scalar arguments. In the following code example, you can see that args= is supported:

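A sketch of how args= might be used; the function g and the constant are illustrative:

import cudf

def g(x, const):
    # add a scalar passed through args= to every element
    return x + const

sr = cudf.Series([1, 2, 3])
print(sr.apply(g, args=(42,)))
# 0    43
# 1    44
# 2    45
# dtype: int64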

While there are other ways of accomplishing the same goal in cuDF using custom kernels and other methods, this method of writing UDFs helps to abstract the GPU away from the process, which can cut down on development time for data scientists working on fast-paced, real-world projects.

So far, I’ve covered only the case of Series-based data. That is, I’ve shown you how to write a UDF with a single input and output. Many use cases require multi-column input, however, and this requires slightly different thinking.

DataFrame UDFs and thinking in terms of rows

UDFs that expect multiple columns as input and produce a single column as output are the set of functions supported by the pandas DataFrame apply API.

In these cases, the first function argument represents a row of data rather than just one value from a single input column. By row, I mean some kind of data structure that is keyable to obtain values, where the keys are the column names and the values are the scalars corresponding to the values of those columns in that row. It is conceptually what you get when you use iloc in pandas:

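For instance, indexing a row with iloc in pandas (illustrative data):

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
row = df.iloc[0]          # the first row, keyable by column name
print(row['A'], row['B'])
# 1 4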

The following code example shows how you would write and use a UDF in pandas that consumes this kind of row object:

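A minimal sketch of such a row-wise UDF, using the same illustrative DataFrame as above:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

def f(row):
    # combine two columns from the same row
    return row['A'] + row['B']

df['C'] = df.apply(f, axis=1)
print(df)
#    A  B  C
# 0  1  4  5
# 1  2  5  7
# 2  3  6  9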

cuDF now enables you to do the exact same thing without rewriting your UDF.

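A sketch of the cuDF version, reusing the same illustrative UDF unchanged:

import cudf

def f(row):
    # same row-wise UDF as in the pandas example
    return row['A'] + row['B']

gdf = cudf.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
gdf['C'] = gdf.apply(f, axis=1)
print(gdf)
#    A  B  C
# 0  1  4  5
# 1  2  5  7
# 2  3  6  9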

When applying these functions, it is important to note that even though the cuDF API expects you to write the functions in terms of rows, no actual row objects are involved when the function executes.

cuDF avoids the use of a for-loop and instead executes CUDA kernels that “pretend” rows of data exist. With a little magic, Numba knows how to write a proper kernel to get the same result as pandas. Because there is no loop, you should see higher performance when executing functions through this API.

Support for missing values using Series and DataFrame apply

Historically, UDFs in cuDF have not provided full support for missing values. This is due to architectural choices inside cuDF that relate to the way cuDF records which elements are null, specifically its use of a null mask to conserve memory.

The looping design of pandas apply APIs just works if the data contains null values. If a null is encountered in the data, the UDF receives the special value pd.NA. As a result, if the special value does not trigger an error, the execution proceeds as normal. However, cuDF does not work this way, and it requires a little extra machinery to support the same functionality. If you use the cuDF apply API, you should find that your UDFs treat null values in a natural manner:

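A sketch of this behavior with illustrative values: a null input simply propagates to a null output.

import cudf

sr = cudf.Series([1, None, 3])

def f(x):
    return x + 1   # <NA> + 1 evaluates to <NA>, so nulls pass through

print(sr.apply(f))
# 0       2
# 1    <NA>
# 2       4
# dtype: int64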

You can even condition on the cudf.NA singleton and get the expected answer, or return it directly from the function:

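A sketch of conditioning on cudf.NA, assuming the same illustrative series as above:

import cudf

sr = cudf.Series([1, None, 3])

def f(x):
    if x is cudf.NA:
        return 0          # replace missing values
    elif x > 2:
        return cudf.NA    # or return a missing value directly
    else:
        return x

print(sr.apply(f))
# 0       1
# 1       0
# 2    <NA>
# dtype: int64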

The same is true here as with rows: cuDF does not actually run the Python function the way pandas does. Instead, it uses more Numba magic to translate this class of functions into an equivalent CUDA kernel and then returns the result of that instead.

In the next section, I look at a real-world example and perform some rough timing.

Real-world example using apply

Consider this scenario: An online streaming service is investigating which segments of its subscribers tend to hold their subscriptions the longest. Additionally, leadership has requested a specific segmentation scheme that breaks subscribers up by age:

  • 18–19
  • 20–29
  • 30–39
  • 40–49
  • 50–59
  • 60–69
  • 70+

The provided data only has two fields: age and days_subscribed.

Here’s how a UDF can solve the problem. First, write the row-wise custom function that assigns each subscriber to a group. Next, take the results, group by the group ID, and average days_subscribed within each group.

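A sketch of this workflow; the group boundaries follow the scheme above, but the data generation, names, and sizes are illustrative:

import cudf
import numpy as np

n = 1_000_000
df = cudf.DataFrame({
    'age': np.random.randint(18, 90, n),
    'days_subscribed': np.random.randint(1, 1000, n),
})

def f(row):
    # map an age onto the requested segment IDs 0-6
    age = row['age']
    if age < 20:
        return 0
    elif age < 30:
        return 1
    elif age < 40:
        return 2
    elif age < 50:
        return 3
    elif age < 60:
        return 4
    elif age < 70:
        return 5
    else:
        return 6

df['group'] = df.apply(f, axis=1)
print(df.groupby('group')['days_subscribed'].mean())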

In this code example, the data is randomly generated, so your mileage may vary on the actual answer. However, it demonstrates the process. To time the UDF section of the code, create a pandas copy with pdf = df.to_pandas() and run a rough comparison using IPython:

%timeit df.apply(f, axis=1)
# 1.64 ms ± 34.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%timeit pdf.apply(f, axis=1)
# 19.2 s ± 63.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Although this is not an official benchmark, the CUDA kernel is over four orders of magnitude faster on average in this particular case, which was run on a 32 GB V100 GPU.

Practical considerations, limitations, and the future

While these cuDF improvements represent significantly broader capabilities than previous iterations, there is always room to grow. Here is a list of key items to consider when writing UDFs for apply in cuDF:

  • JIT compilation. The first time a function is executed against a cuDF object, you incur the overhead of compiling the appropriate CUDA kernel. Subsequent uses of the function do not require recompilation unless the dtypes of the target dataset change (see the sketch after this list).
  • dtype support. So far, only numeric dtypes are supported in apply. However, support for additional types is on the roadmap, starting with strings.
  • External libraries. A common pattern is performing data prep in pandas and then using an external library for processing inside the UDF for each row. Because you cannot map external code onto the GPU arbitrarily, this is not currently supported.
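As a rough illustration of the JIT compilation point, the compilation cost shows up only on the first call. This is an IPython sketch with an illustrative function; actual timings depend on your setup:

import cudf

sr = cudf.Series([1.0, 2.0, 3.0])

def f(x):
    return x + 1

%time sr.apply(f)   # first call: includes CUDA kernel compilation (slow)
%time sr.apply(f)   # second call: reuses the compiled kernel (fast)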

Summary

UDFs are an easy way of solving particular problems quickly. They help you think in terms of a single datum when designing the logic of your pipeline. With these new cuDF UDF enhancements, the aim is to expedite the development of workflows involving cuDF and allow you to quickly prototype solutions, as well as reuse existing business logic. In addition, null support lets you be explicit about how to handle missing values without needing extra processing steps.

As a reminder, UDFs are an area of active development in cuDF and updates are ongoing. If you choose to try these new UDF enhancements out, as always, I’d love to hear about your experience in the comments section.

Categories
Misc

Can anyone help me with this code? requests.post is not working. When I check in Postman, the page just keeps loading, as if it is still sending the request.

from flask import Flask, request
import requests

app = Flask(__name__)

@app.route('/myapp/detectObjects', methods=['POST'])
def detect():
    imge = request.files.getlist("image")
    img = imge[0].filename
    req = requests.post('http://127.0.0.1:5000/give', json={"image": img})
    return req.json()

submitted by /u/Savings-Stop-3300
[visit reddit] [comments]

Categories
Misc

Jetson nano Ubuntu 18.04 AARCH64 architecture

I’m a beginner so please be patient with my ignorance.

I’m trying to install TensorFlow, but there doesn’t seem to be a method that is supported by the aarch64 architecture.

Has anyone dealt with and solved this issue? If so…. Wtf do I do.. I’m about 25 hours into attempting this install

submitted by /u/DamnitName
[visit reddit] [comments]

Categories
Misc

Cats and Dogs Kaggle dataset corrupt jpeg images

I’m just trying to train the Kaggle Cats and Dogs dataset but I keep getting the error

tensorflow.python.framework.errors_impl.InvalidArgumentError: Input size should match (header_size + row_size * abs_height) but they differ by 2

[[{{node decode_image/DecodeImage}}]] [Op:IteratorGetNext]

Along with:

Corrupt JPEG data: 242 extraneous bytes before marker 0xd9

Corrupt JPEG data: 217 extraneous bytes before marker 0xd9

Corrupt JPEG data: 133 extraneous bytes before marker 0xd9

This happens even though I implemented three different methods to check for and remove corrupt image files, so I have no idea why I’m still getting this error. Please help!

CODE:

# check for corrupted files (method 1)
import os
import imghdr
from pathlib import Path
from PIL import Image

classes = ['Cat', 'Dog']
image_extensions = [".png", ".jpg"]  # add all your image file extensions here
img_type_accepted_by_tf = ["bmp", "gif", "jpeg", "png"]

for item in classes:
    data_dir = f'kagglecatsanddogs_5340/PetImages/{item}'
    for filepath in Path(data_dir).rglob("*"):
        if filepath.suffix.lower() in image_extensions:
            img_type = imghdr.what(filepath)
            if img_type is None:
                print(f"{filepath} is not an image")
                # remove image
                os.remove(filepath)
                print(f"successfully removed {filepath}")
                continue  # the file is gone; skip the PIL check below
            elif img_type not in img_type_accepted_by_tf:
                print(f"{filepath} is a {img_type}, not accepted by TensorFlow")
            try:
                img = Image.open(filepath)  # open the image file
                img.verify()  # verify that it is, in fact, an image
            except Exception:
                print('Bad file:', filepath)

# method 2
import glob
import tensorflow as tf

# point the glob at the directory containing the label folders
img_paths = glob.glob(os.path.join('kagglecatsanddogs_5340/PetImages', '*/*.*'))
bad_paths = []
for image_path in img_paths:
    try:
        img_bytes = tf.io.read_file(image_path)
        decoded_img = tf.io.decode_image(img_bytes)  # was tf.decode_image, which does not exist
        print(f"{image_path}: OK")
    except Exception:
        print(f"Found bad path {image_path}")
        bad_paths.append(image_path)

print("BAD PATHS:")
for bad_path in bad_paths:
    print(f"{bad_path}")

# method 3
num_skipped = 0
for folder_name in ("Cat", "Dog"):
    folder_path = os.path.join("kagglecatsanddogs_5340/PetImages", folder_name)
    for fname in os.listdir(folder_path):
        fpath = os.path.join(folder_path, fname)
        try:
            fobj = open(fpath, "rb")
            is_jfif = tf.compat.as_bytes("JFIF") in fobj.peek(10)
        finally:
            fobj.close()
        if not is_jfif:
            num_skipped += 1
            # delete corrupted image
            os.remove(fpath)
print("Deleted %d images" % num_skipped)

import numpy as np
import matplotlib.pyplot as plt

def normalize(x, y):
    x = tf.cast(x, tf.float32) / 255.0
    return x, y

def convert_to_categorical(input):
    if input == 1:
        return "Dog"
    else:
        return "Cat"

def to_list(ds):
    ds_list = []
    for sample in ds:
        image, label = sample
        ds_list.append((image, label))
    return ds_list

# load dataset
directory = 'archive/PetImages'
ds_train = tf.keras.utils.image_dataset_from_directory(
    directory,
    labels='inferred',
    label_mode='binary',
    color_mode='rgb',
    batch_size=1,
    shuffle=False,
    validation_split=0.3,
    subset='training',
    image_size=(180, 180)
)
ds_test = tf.keras.utils.image_dataset_from_directory(
    directory,
    labels='inferred',
    label_mode='binary',
    color_mode='rgb',
    batch_size=1,
    shuffle=False,
    validation_split=0.3,
    subset='validation',
    image_size=(180, 180)
)

# normalize data; Dataset.map returns a new dataset, so reassign the result
ds_train = ds_train.map(normalize)
ds_test = ds_test.map(normalize)

# plot 10 random images from the training set
num = len(ds_train)
ds_train_list = to_list(ds_train)
for i in range(1, 11):
    random_index = np.random.randint(num)
    img, label = ds_train_list[random_index]
    label = convert_to_categorical(np.array(label))
    img = np.reshape(img, (180, 180, 3))  # must match image_size above
    plt.subplot(2, 5, i)
    plt.imshow(img)
    plt.title(label)
plt.savefig('figures/example_images.png')

submitted by /u/berimbolo21
[visit reddit] [comments]

Categories
Misc

Plotting example images from TensorFlow dataset not working

I’m trying to plot a few example images using Matplotlib from a cats and dogs Kaggle dataset that I loaded into the script using TensorFlow’s image_dataset_from_directory, but the images aren’t displaying correctly. The Matplotlib plots are either empty or contain some speckled blue or yellow dots. Does anyone know how to fix this? (code below)

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

def normalize(x, y):
    x = tf.cast(x, tf.float32) / 255.0
    return x, y

def convert_to_categorical(input):
    if input == 1:
        return "Dog"
    else:
        return "Cat"

def to_list(ds):
    ds_list = []
    for sample in ds:
        image, label = sample
        ds_list.append((image, label))
    return ds_list

# load dataset
directory = 'train'
ds_train = tf.keras.utils.image_dataset_from_directory(
    directory,
    labels='inferred',
    label_mode='binary',
    batch_size=1,
    shuffle=False,
    validation_split=0.3,
    subset='training',
    image_size=(300, 300)
)
ds_test = tf.keras.utils.image_dataset_from_directory(
    directory,
    labels='inferred',
    label_mode='binary',
    batch_size=1,
    shuffle=False,
    validation_split=0.3,
    subset='validation',
    image_size=(300, 300)
)

# normalize data; Dataset.map returns a new dataset, so reassign the result
ds_train = ds_train.map(normalize)
ds_test = ds_test.map(normalize)

# plot 10 random images from the training set
num = len(ds_train)
ds_train_list = to_list(ds_train)
for i in range(1, 11):
    random_index = np.random.randint(num)
    img, label = ds_train_list[random_index]
    label = convert_to_categorical(np.array(label))
    img = np.reshape(img, (300, 300, 3))
    plt.subplot(2, 5, i)
    plt.imshow(img)
    plt.title(label)
plt.savefig('figures/example_images.png')

submitted by /u/berimbolo21
[visit reddit] [comments]

Categories
Misc

I can’t manage to make training work… please help me out

I’m a newbie when it comes to deep learning, but I am trying to use code from git and train the network on my own data. However, it takes forever: it took 80 minutes for 1 epoch, and the number of epochs is 1000. I also tried reducing the batch size and using Google Colab. Please, I don’t get what I am doing wrong. At first I tried running on CPU, then on GPU, but I get an OOM error even when changing parameters. Any help is appreciated. This is the code: https://github.com/markusaksli/ai-music

submitted by /u/pitic1
[visit reddit] [comments]

Categories
Misc

The Fluid Dynamics Revolution Driven by GPU Acceleration

Computational fluid dynamics tools, such as those used in vehicle aerodynamics, can be used to evaluate the drag produced by a designed surface, which has direct implications on vehicle performance. The end of 2021 and beginning of 2022 saw the two largest commercial CFD tool vendors, Ansys and Siemens, both launch versions of their flagship CFD tools with support for GPU acceleration.

When a technology reaches the required level of maturity, adoption transitions from those considered visionaries to early majority adopters. Now is such a critical and transitional moment for the largest single segment of industrial high-performance computing (HPC). 

The end of 2021 and beginning of 2022 saw the two largest commercial computational fluid dynamics (CFD) tool vendors, Ansys and Siemens, both launch versions of their flagship CFD tools with support for GPU acceleration. This fact alone is enough proof to show the new age of CFD has arrived.

Evolution of engineering applications for CFD

The past decade saw wider adoption of CFD as a critical tool for engineers and equipment designers to study or predict the behavior of their designs. However, CFD isn’t only an analysis tool; it is now used to make design improvements without having to resort to time-consuming and expensive physical testing for every design or operating point being evaluated. This ubiquity is part of why so many CFD tools, both commercial and open source, are available today.

The growing need for accuracy in simulations to help minimize testing led to the incorporation of multi-physics capabilities into CFD tools, such as the inclusion of heat transfer, mass transfer, chemical reactions, particulate flows, and more. The other reason for the proliferation of CFD tools is that a single tool capturing all the relevant physics for every type of use case would be time-consuming to build and validate.

For instance, in the use case of vehicle aerodynamics, a digital wind tunnel can be used to study and evaluate the flow over the geometry and to evaluate the drag produced by the designed surface which has direct implications on vehicle performance. Depending on the intended purpose of the simulation, users get to pick if they want to run a steady or a transient simulation using the traditional Navier-Stokes formulation for fluid flow or use alternative frameworks like the lattice Boltzmann method.

Even within the realm of Navier-Stokes solutions, one has a variety of turbulence models and methodologies to choose from for the simulations, such as which scales are resolved and which are modeled. Model complexity grows quickly when additional physics are considered in making design choices, such as studying automotive aeroacoustics, which influences customer perception and passenger safety and comfort, or studying road vehicle platooning.

All the tools used for modeling different flow situations take a staggering amount of compute processing power. As organizations incorporate CFD earlier in their design cycles while simultaneously growing the complexity of their models, both in terms of model size and representative physics, to increase the fidelity of their simulations, the industry has reached a tipping point.

Parallelism equates with performance

It is no longer uncommon for a single simulation to require thousands of CPU core hours to provide a result, and a single design product can require 10,000 to 1,000,000 simulations or more.

Just recently, an NVIDIA partner, Resolved Analytics, published a survey on CFD users and tools. One of the statistics shown is the level of parallelism commonly used by CFD users today. In CFD, parallel execution refers to dividing the domain or grid into sub-grids and assigning a processing unit to each sub-grid. At each numerical iteration, the sub-grids communicate boundary information with the adjacent sub-grids and the CFD solution advances toward convergence.

The survey finishes with the conclusion that hardware and software costs continue to limit the parallelization of CFD.

Figure 1. Resolved Analytics survey of CFD users: only 12% use more than 256 processors.

Resolved Analytics surveyed CFD users and found that the overwhelming majority are using fewer than 257 processors, limiting parallel processing capacity:

  • 25% of CFD users use fewer than nine processors.
  • 34% use 9–32 processors.
  • 29% use 33–256 processors.
  • 8% use 257–1,028 processors.
  • 4% use more than 1,028 processors.

Another way to think about this is that parallelism equates to performance, and more parallelism means shorter runtimes. You could push performance further than you do today if you were not limited by hardware and software licenses.

Getting to higher levels of performance is the right thing to do, because it optimizes the most expensive resource: engineer and researcher time. Often skilled personnel time can be 5–10x the cost of the next most expensive resource, which is software licenses or computing hardware. Logic dictates allocating funding to remove bottlenecks caused by these lower-cost resources.

Another NVIDIA partner, Rescale, stated this perspective in a similar way:

Most HPC economic models ignore engineering time or engineering productivity, and it is the most valuable and expensive resource that needs to be optimized first. Assuring that hardware and software assets keep researchers generating IP at a maximum rate is the most rational way to treat the core value generators of an organization.

NVIDIA is pleased to share with the CFD user community that the hardware limitation is lifting. Recently, the two most popular CFD tools—Simcenter STAR-CCM+ from Siemens Digital Industries Software and Ansys Fluent—have made available software versions that support specific physics. Those simulations can take significant advantage of the extreme speed of GPU-accelerated computing.

At the time of this post, the Simcenter STAR-CCM+ 2022.1 GPU-accelerated version is generally available, currently supporting vehicle external aerodynamics applications for steady and unsteady simulations. The Ansys Fluent release is currently in public beta.

Figure 2. Simcenter STAR-CCM+ 2022.1 performance for the LeMans 104M cell model on GPUs vs. CPU-only execution. The top-performing platform, eight NVIDIA A100 PCIe 80GB GPUs, delivers a 20.2x speedup over the AMD EPYC 7763 and Intel Platinum 8380.

Figure 2 shows the performance of the first release of Simcenter STAR-CCM+ 2022.1 against commonly available CPU-only servers. For the tested benchmark, an NVIDIA GPU-equipped server delivers results almost 20x faster than over 100 cores of CPU.

Relative to the CPU baseline, the AMD EPYC 7763 achieved a modest 1.1x speedup, compared to 9.6x for six NVIDIA V100 GPUs, 12.4x for eight V100s, 15.9x for six NVIDIA A100s, and 20.2x for eight A100s.

To put that into more practical terms, this means a simulation that takes a full day on a CPU server could be done in a little over an hour with a single node and eight NVIDIA A100 GPUs.

With the Simcenter STAR-CCM+ team continuing to work on improving and optimizing their GPU offering, you can expect even better performance in upcoming releases.

Figure 3. Simcenter STAR-CCM+ results for the mean of pressure coefficient on a Corvette C6 ZR1, compared between (left) GPU-based and (right) CPU-based runs, showing little difference between the two simulations.

Corvette C6 ZR1 external aerodynamics: a pseudo-steady simulation of 110M cells run with SST-DDES and a Moving Reference Frame (MRF) for the wheels. GPU runs used a DGX Station with four A100 GPUs.

GPU-accelerated runs deliver results consistent with CPU-only runs, and Siemens delivered a product that can be moved seamlessly from CPUs to GPUs to get results faster and effortlessly. You can now run simulations on-premises or in the cloud, as A100 GPU instances are available from all the major cloud service providers.

Siemens showed similar results in their announcement of GPU support in version 2022.1 when comparing CPU-only servers on-premises and in the cloud for both previous-generation V100 GPUs and current generation A100 GPUs. They also showed the performance of a large, industrial-scale model and the equivalent number of CPU cores required to get similar run times as that of a single node with eight GPUs on it.

Never to be left behind on technology trends, NVIDIA and Ansys announced the public beta availability of a GPU-accelerated, limited-functionality Fluent at the 2021 GTC Fall keynote.

Figure 4. Performance of Ansys Fluent 2022 beta1 for a 105M cell car model on a GPU server vs. CPU-only servers.

This comparison is based on 100-iteration timings of a steady-state run with the GEKO turbulence model.

The performance of the Ansys Fluent 2022 beta1 server compared to CPU-only servers shows that Intel Xeon, AMD Rome, and AMD Milan had ~1.1x speedups compared to the NVIDIA A100 PCIe 80GB, which had speedups from 5.2x (one GPU) to an impressive 33x (eight GPUs).

The Ansys Fluent numbers drove some major excitement. They showed that, for the selected benchmark and associated physics, a single GPU-accelerated server could deliver nearly 33x the performance of the standard Intel processor-only servers common today.

Such fast turnaround times are due to GPU acceleration of the two most used commercial CFD applications. This means that design engineers can not only incorporate simulations earlier into their design cycles but also explore several design iterations within a single day. They can make informed decisions about product performance quickly instead of having to wait for weeks.

Other options for GPU acceleration

At such speeds, other bottlenecks in the product research process can emerge. Sometimes a major consumer of engineering time is preprocessing, or the manual process of building the models to be run.

It is especially important to address this problem because it takes engineering person-time to solve. This is different from other factors, like simulation run time, that leave the researcher free to concentrate on other tasks. This is an active area of focus recently highlighted in CFD Mesh Generators: Top 3 Reasons They Slow Analysis and How to Fix Them.

All that said though, GPU acceleration is not an entirely new phenomenon. Some of the more niche tools have either been born in a GPU-accelerated world or have come to it sooner rather than later:

  • Altair CFD (NanoFluidX and UltraFluidX)
  • Cascade Technologies, CharLES
  • ESS Rocky
  • CPFD Barracuda
  • Dassault, XFlow
  • M-STAR CFD
  • NASA, FUN3D

NVIDIA has featured exciting and visually stunning results from NASA’s FUN3D tool, including the time Jensen Huang shared a simulation of a Mars lander entering the atmosphere.

Figure 5. A 72-84x improvement shown for CPU vs. GPU performance of FUN3D version 14.0 (unreleased), courtesy of the NASA FUN3D team.

Hardware access provided by ORNL Summit using IBM AC922 Dual Power9 CPUs with 6x NVIDIA V100 SXM2 16 GB 2x EDR InfiniBand.

The most recent Supercomputing Conference featured research by a team that studied algorithmic changes that reduce the floating-point atomic updates required in large-scale parallel GPU computing environments. The runtime of several kernels is dominated by these update speeds, so efficiencies found in this area have the potential for large benefits. Also, though FUN3D is a NASA and United States government-only tool, the discussion in this paper has applicability to other unstructured Reynolds-averaged Navier-Stokes CFD tools.

Beyond savings and removing roadblocks, maybe the most exciting part of mainstream CFD tools becoming GPU-accelerated is the new science and engineering enabled by runtimes cut by factors of 15–30x. Until now, without access to leadership-class supercomputing capabilities, investigations into these areas have been too difficult from both a runtime and a problem-size standpoint:

  • Vehicle underhood modeling: Turbulent flow with heat transfer
  • Large eddy and combustion: Needed for detailed environmental emissions modeling
  • Magneto-hydrodynamics: Flows influenced by magnetic fields important to modeling fusion energy generators, internals of stars and gas giant planets
  • Machine learning training: Automatic generation of models and solutions that are used to train machine learning algorithms to estimate flow initial conditions, model turbulence, mixing, and so on

For more information about accelerated computing being used for other fluids or industrial simulations, watch the recent GTC 2022 sessions focused on manufacturing and HPC.

Categories
Misc

Ready, Set, Game: GFN Thursday Brings 10 New Titles to GeForce NOW

It’s a beautiful day to play video games. And it’s GFN Thursday, which means we’ve got those games. Ten total titles join the GeForce NOW library of over 1,300 games, starting with the release of Roller Champions – a speedy, free-to-play roller skating title launching with competitive season 0. Read article >

The post Ready, Set, Game: GFN Thursday Brings 10 New Titles to GeForce NOW appeared first on NVIDIA Blog.

Categories
Misc

A Devotion to Emotion: Hume AI’s Alan Cowen on the Intersection of AI and Empathy

Can machines experience emotions? They might, according to Hume AI, an AI research lab and technology company that aims to “ensure artificial intelligence is built to serve human goals and emotional well-being.” So how can AI genuinely understand how we are feeling, and respond appropriately? On this episode of NVIDIA’s AI Podcast, host Noah Kravitz… Read article >

The post A Devotion to Emotion: Hume AI’s Alan Cowen on the Intersection of AI and Empathy appeared first on NVIDIA Blog.