Categories
Misc

NASA and NVIDIA Collaborate to Accelerate Scientific Data Science Use Cases, Part 2

Over the past couple of years, NVIDIA and NASA have been working closely on accelerating data science workflows using RAPIDS and integrating these GPU-accelerated libraries with scientific use cases. This is the second post in a series that discusses the results from an air pollution monitoring use case conducted during the COVID-19 pandemic and shares code snippets for porting existing CPU workflows to RAPIDS on NVIDIA GPUs. In the first post of this series, we covered Accelerated Simulation of Air Pollution.

Monitoring the Decline of Air Pollution Across the Globe During the COVID-19 Pandemic

Another air quality application leveraging XGBoost and RAPIDS is the live monitoring of air quality by combining surface monitoring data with near real-time model output from the NASA GEOS-CF model. This approach is particularly useful for detecting and quantifying air pollution anomalies, i.e., patterns in air quality observations that the model cannot explain. The most prominent (and extreme) example is the decline of air pollution in the wake of the COVID-19 pandemic. As a result of stay-at-home orders, traffic emissions of air pollutants such as nitrogen dioxide (NO2) decreased significantly, as is apparent from both satellite observations and surface monitoring data. However, quantifying the exact impact of COVID-19 restrictions on surface air quality from these atmospheric observations alone is very difficult, because many other factors affect surface air pollution, including weather, chemistry, and wildfires.

Video 1. Daily nitrogen dioxide (NO2) measurements and percentage difference from a baseline model. Credits: Christoph Keller and NASA’s Scientific Visualization Studio.

The study conducted by Christoph Keller and his colleagues fuses millions of observations, taken at 4,778 monitoring sites in 47 countries, with co-located model output produced by GEOS-CF. The sample code in the covid_no2 repo demonstrates the application for 10 selected cities (New York, Washington DC, San Francisco, Los Angeles, Beijing, Wuhan, London, Paris, Madrid, and Milan). The air quality observations for 2018 through 2020 at these cities were obtained from the OpenAQ database (https://openaq.org/#/) and the European Environment Agency (EEA) (https://discomap.eea.europa.eu/map/fme/AirQualityExport.htm) and preprocessed into a single file for convenience.

Similarly, we preprocessed the GEOS-CF model output by subsampling the gridded native output (available in netCDF format at https://portal.nccs.nasa.gov/datashare/gmao/geos-cf/v1/das/) to the observation locations, and saved the corresponding data as a text table that can be read in the same way as the observation data.
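
The subsampling step can be pictured as a nearest-grid-cell lookup. This is a minimal sketch, not the repo's preprocessing code: it assumes a regular global 0.25° latitude/longitude grid with 721 × 1440 cells starting at 90°S, 180°W, assumes nearest-neighbor selection (the actual preprocessing may interpolate), and the helper names `nearest_cell` and `subsample` are hypothetical. The field is a nested list indexed `[lat][lon]`; in practice it would be read from the netCDF files.

```python
# Hypothetical sketch of subsampling a gridded model field to station locations.
# Assumption: a regular global 0.25-degree grid with 721 latitudes (90S..90N)
# and 1440 longitudes (180W..179.75E); check the actual netCDF coordinates.
GRID_RES = 0.25
LAT0, LON0 = -90.0, -180.0
NLAT, NLON = 721, 1440


def nearest_cell(lat, lon):
    """Return (lat_index, lon_index) of the grid cell nearest to (lat, lon)."""
    lon = (lon + 180.0) % 360.0 - 180.0              # wrap longitude to [-180, 180)
    i_lat = round((lat - LAT0) / GRID_RES)
    i_lat = min(max(i_lat, 0), NLAT - 1)             # clamp at the poles
    i_lon = round((lon - LON0) / GRID_RES) % NLON    # wrap across the date line
    return i_lat, i_lon


def subsample(field, stations):
    """Pick field[lat_index][lon_index] for each (name, lat, lon) station."""
    values = {}
    for name, lat, lon in stations:
        i_lat, i_lon = nearest_cell(lat, lon)
        values[name] = field[i_lat][i_lon]
    return values
```

For example, `nearest_cell(40.71, -74.01)` (a point in New York City) selects the cell at index `(523, 424)` under these grid assumptions.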

The model data contains not only model-predicted NO2 concentrations but also a number of ancillary model variables, such as the local weather and atmospheric composition (taken from GEOS-CF) as well as calendar information.

The model data is then combined with the surface observation data to build an XGBoost bias-correction model that relates the model NO2 prediction to the observations. This accounts for the fact that the model prediction can differ systematically from the observations, e.g., because the model output represents the average over a 25×25 km² domain, while a surface observation is typically much more local in nature.

To train the XGBoost model, the model data is merged with the observations, and the model bias calculated from the merged data set provides the label for training (see the sample code in the repository mentioned above for the full example).
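
A minimal sketch of the merge-and-label step with pandas follows; cuDF mirrors the pandas `DataFrame.merge` API, so the same calls should carry over with `import cudf`. The column names (`location`, `time`, `no2_obs`, `no2_model`) and the sign convention of the label (model minus observation) are assumptions for illustration, not taken from the covid_no2 repo.

```python
import pandas as pd


def build_training_table(obs: pd.DataFrame, model: pd.DataFrame) -> pd.DataFrame:
    """Merge observations with co-located model output and add a bias label.

    Assumed (hypothetical) columns:
      obs:   location, time, no2_obs
      model: location, time, no2_model, plus ancillary predictors
    """
    # Inner join: keep only hours at which both an observation and model output exist.
    merged = obs.merge(model, on=["location", "time"], how="inner")
    # Training label: model bias relative to the observation (sign convention assumed).
    merged["bias"] = merged["no2_model"] - merged["no2_obs"]
    return merged


if __name__ == "__main__":
    obs = pd.DataFrame({
        "location": ["NYC_1", "NYC_1"],
        "time": pd.to_datetime(["2019-01-01 00:00", "2019-01-01 01:00"]),
        "no2_obs": [30.0, 28.0],
    })
    model = pd.DataFrame({
        "location": ["NYC_1", "NYC_1", "NYC_1"],
        "time": pd.to_datetime(["2019-01-01 00:00", "2019-01-01 01:00", "2019-01-01 02:00"]),
        "no2_model": [35.0, 27.0, 40.0],
        "zpbl": [400.0, 380.0, 350.0],
    })
    print(build_training_table(obs, model))
```

The inner join drops model hours without a matching observation (here, 02:00), so every training row has a label.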

Using the trained model, we can also calculate the SHAP values on the GPU to analyze which factors contribute most to the bias correction.

As shown in Figure 2, the SHAP values for New York indicate that the most important predictor of the NO2 model bias (relative to the actual observation) is NO2 itself, followed by the hour of the day, the 10-meter northward wind component (V10M), and the height of the planetary boundary layer (ZPBL). (Note: to output the SHAP values with the example code, set the input argument shap to 1; to run the accelerated code on the GPU, also set the gpu argument.)

Figure 2: Distribution of the 20 most important SHAP values for the bias correction model for New York City.

After training the XGBoost model on the 2018 and 2019 data (using 8-fold cross-validation), we apply the NO2 bias correction to the model data produced for 2020. This yields a time series of the NO2 concentrations expected at a given observation site had there been no mobility restrictions due to the pandemic (the NO2 ‘baseline’). The difference between the actual observations and these bias-corrected model predictions offers an estimate of the impact of COVID-19 restrictions on NO2 concentrations. Figure 3 shows the difference between observations and model predictions for New York City from Jan 2019 to Jan 2021. The solid green line shows the best estimate, defined as the 21-day rolling average across all four observation sites available for New York City, and the dark and light shaded areas show two uncertainty estimates derived from the time-averaged and hourly model-observation samples, respectively.
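
The baseline and the rolling anomaly can be sketched in pandas as follows. This assumes hypothetical columns `time`, `location`, `no2_obs`, `no2_model`, and `bias_pred` (the bias predicted by the trained model), with the bias defined as model minus observation, so the baseline is `no2_model - bias_pred`. It averages all sites per day, expresses the observation relative to the baseline in percent, and applies a centered 21-day rolling mean; the uncertainty bands are not reproduced.

```python
import pandas as pd


def no2_anomaly(df: pd.DataFrame, window_days: int = 21) -> pd.Series:
    """Percent difference between observed NO2 and the bias-corrected baseline."""
    df = df.copy()
    # Baseline: model prediction minus the predicted bias (bias = model - obs, assumed).
    df["baseline"] = df["no2_model"] - df["bias_pred"]
    # Average all samples from all sites in the city for each calendar day.
    daily = df.groupby(df["time"].dt.floor("D"))[["no2_obs", "baseline"]].mean()
    # Negative values mean less NO2 was observed than the baseline expects.
    pct = 100.0 * (daily["no2_obs"] - daily["baseline"]) / daily["baseline"]
    # Centered rolling mean over window_days rows; assumes one row per day, no gaps.
    # min_periods=1 keeps values at the start and end of the series.
    return pct.rolling(window_days, center=True, min_periods=1).mean()
```

With a constant baseline of 40 and observations dropping from 40 to 24, the smoothed series moves from 0% to −40%, the size of the decline reported for New York City.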

Throughout 2019, the bias-corrected model mean estimate agrees closely with the observations. Coinciding with the outbreak of the pandemic, the observed NO2 over New York City declines by up to 40% and only gradually recovers to the expected value by the end of the year.

Figure 3: Difference between observed NO2 concentration and model-predicted values for New York City. Negative anomalies indicate a reduction in observed NO2 concentration relative to the expected ‘business-as-usual’ scenario. Dark and light shaded areas indicate low and high uncertainty estimates.

Conducting this analysis at 4,778 locations across the world enables us to identify regional patterns in air quality anomalies, and our study shows that these patterns tend to be related to differences in timing and intensity of COVID-19 restrictions.

The approach described here is not only useful for analyzing the impact of COVID-19 on air pollution; it can also be used to monitor air pollution across the world in general, since both the observations and the model data are available in near real-time. Given the growing number of available air quality observations, fast data processing becomes ever more critical for such an application. The sample code available at https://github.com/GEOS-CF/covid_no2 demonstrates that conducting the analysis on a V100 GPU using cuDF offers an overall speed-up of up to 5x for each city compared with a 20-core Intel Xeon E5-2689 CPU.

References:

Keller, C. A., Evans, M. J., Knowland, K. E., Hasenkopf, C. A., Modekurty, S., Lucchesi, R. A., Oda, T., Franca, B. B., Mandarino, F. C., Díaz Suárez, M. V., Ryan, R. G., Fakes, L. H., and Pawson, S.: Global impact of COVID-19 restrictions on the surface concentrations of nitrogen dioxide and ozone, Atmos. Chem. Phys., 21, 3555–3592, https://doi.org/10.5194/acp-21-3555-2021, 2021.

NASA Model Reveals How Much COVID-related Pollution Levels Deviated from the Norm

A Data Science Series (Part 1): NASA and NVIDIA Collaborate to Accelerate Scientific Data Science Use Cases

Machine Learning with ML.NET – NLP with BERT

submitted by /u/RubiksCodeNMZ

Using feature columns on 3D data: How to avoid having 750+ input layers?

Hello,

I’m trying to use TensorFlow to predict the outcome of a sports contest. For every sample, I have the context of the competition (weather, type of stadium, …) and the competition history of every competing team.

Here is an overview of the data of every sample:

Context: CompetitionData
Teams history: [[CompetitionData], [CompetitionData]] (for every team, the past competition data and their result (win/lose/ranking))

I’m going to try to develop a learning-to-rank system that, given the context and every team’s history, predicts the final ranking.

I think feature columns are useful in this case, as they can ease the processing of the competition data. However, I can’t find a way to reuse the feature column code across all competition data dimensions. Ideally I would reuse the DenseFeatures layers across all the competition data, but that doesn’t seem to work: TF requires the data fed to DenseFeatures layers to be a dict, and each entry has to be passed one by one through an Input layer.

I have also tried this:

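import tensorflow as tf
from tensorflow.keras.layers import Input

# info_columns, info_columns2, info_columns3 and age are feature columns defined elsewhere;
# NB_RACE_HISTORY and MAX_NUMBER_PLAYERS are constants defined elsewhere.
history_input = tf.keras.layers.DenseFeatures([info_columns, info_columns2, info_columns3, age])

statics_hist = {
    'rapport': Input((1,), dtype=tf.dtypes.int32, name='rapport'),
    'weight': Input((1,), dtype=tf.dtypes.int32, name='weight'),
    'age': Input((1,), name='age'),
    'first': Input((1,), dtype=tf.dtypes.int32, name='first'),
    'stadium': Input((1,), name='stadium', dtype=tf.dtypes.string),
}
# One DenseFeatures output per past race, stacked along a "history" axis,
# then one such block per player, stacked along a "player" axis.
test = [tf.stack([history_input(statics_hist) for _ in range(NB_RACE_HISTORY)], axis=1)
        for __ in range(MAX_NUMBER_PLAYERS)]
test = tf.stack(test, axis=1)

But since I have 15 players, each with a history of 10 competitions spanning several columns, this gives me 750+ input layers, which can’t be the right way to go.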

I have thought about flattening the data beforehand, but then I would lose the ability to run an LSTM through a player’s history, which is important for modelling his current performance.
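
To make the count concrete, here is a small NumPy sketch. It assumes the five entries of `statics_hist` are the per-race features and treats them all as numeric placeholders (the real `stadium` feature is a string and would need encoding first); the helper names are hypothetical. One scalar input per player, race, and feature gives 15 × 10 × 5 = 750 inputs, whereas the same data fits in one array of shape `(batch, MAX_NUMBER_PLAYERS, NB_RACE_HISTORY, N_FEATURES)`. Flattening such an array is reversible with a reshape, so the per-player sequence order is preserved.

```python
import numpy as np

MAX_NUMBER_PLAYERS = 15  # from the question
NB_RACE_HISTORY = 10     # from the question
N_FEATURES = 5           # keys of statics_hist, assumed numeric here


def n_scalar_inputs(players=MAX_NUMBER_PLAYERS, history=NB_RACE_HISTORY,
                    features=N_FEATURES):
    """Number of Input layers needed with one input per player/race/feature."""
    return players * history * features


def flatten_history(batch):
    """(batch, players, history, features) -> (batch, players * history * features)."""
    return batch.reshape(batch.shape[0], -1)


def unflatten_history(flat):
    """Inverse of flatten_history: restores the per-player race sequences."""
    return flat.reshape(flat.shape[0], MAX_NUMBER_PLAYERS, NB_RACE_HISTORY, N_FEATURES)


if __name__ == "__main__":
    batch = np.random.default_rng(0).normal(
        size=(32, MAX_NUMBER_PLAYERS, NB_RACE_HISTORY, N_FEATURES)).astype("float32")
    flat = flatten_history(batch)
    print(n_scalar_inputs(), flat.shape)
    assert np.array_equal(unflatten_history(flat), batch)
```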

I’m not really sure what the right approach is. Could anyone point me in the right direction?

submitted by /u/Wats0ns

Infrared Image is better than visible light image for machine navigation in such environments

submitted by /u/Z_future1

AI and 5G to Fuel Next Wave of IoT Services, Says GTC Panel of Telecom Experts

The rollout of 5G for edge AI services promises to fuel a magic carpet ride into the future for everything from autonomous vehicles, to supply chains and education. That was a key takeaway from a panel of five 5G experts speaking at NVIDIA’s GPU Technology Conference this week. With speed boosts up to 10x that …

The post AI and 5G to Fuel Next Wave of IoT Services, Says GTC Panel of Telecom Experts appeared first on The Official NVIDIA Blog.

Deep Learning Classifies Largest-Ever Catalog of Distant Galaxies

University of Pennsylvania researchers have used convolutional neural networks to catalog the morphology of 27 million galaxies, giving astronomers a massive dataset for studying the evolution of the universe.

“Galaxy morphology is one of the key aspects of galaxy evolution,” said study author Helena Domínguez Sánchez, former postdoc at Penn. “The shape and structure of galaxies has a lot of information about the way they were formed, and knowing their morphologies gives us clues as to the likely pathways for the formation of the galaxies.”

While past research projects have focused on classifying images of bright, nearby galaxies, the team focused their neural network on fainter, more distant galaxies captured by the Dark Energy Survey, an international project to image an eighth of the sky.

The further away a galaxy is from the Milky Way, the longer it takes for light to reach our corner of the universe. So images from the Dark Energy Survey, which contains more images of distant galaxies than previous studies, “show us what galaxies looked like more than 6 billion years ago,” said Mariangela Bernardi, professor in the Department of Physics and Astronomy at Penn. 

While the researchers already had a CNN that could categorize galaxies as spiral or elliptical, the model had been trained on nearby galaxies captured in the Sloan Digital Sky Survey. To teach the neural network to process the more distant, more pixelated images from the Dark Energy Survey, the team collected a labeled dataset of 20,000 galaxies from both astronomical surveys whose morphological classifications were already known.

They then created a synthetic dataset that simulated how the images would look if they depicted galaxies that were further away.
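
The idea of degrading images of nearby galaxies to mimic more distant ones can be illustrated with a toy NumPy function. This is a generic sketch, not the procedure used in the study: the dimming factor, the block-averaging used to coarsen the pixels, the Gaussian noise level, and the function name are all hypothetical choices for illustration.

```python
import numpy as np


def simulate_more_distant(image, flux_scale=0.1, bin_factor=2, noise_sigma=0.01, seed=0):
    """Toy degradation of a 2D galaxy image to mimic a fainter, more distant source.

    flux_scale  -- multiplies all pixel values (dimming); hypothetical value
    bin_factor  -- averages bin_factor x bin_factor pixel blocks (coarser sampling)
    noise_sigma -- std. dev. of additive Gaussian background noise; hypothetical
    """
    dimmed = np.asarray(image, dtype=float) * flux_scale
    h, w = dimmed.shape
    h2, w2 = h // bin_factor, w // bin_factor
    # Crop so the image divides evenly into blocks, then average each block.
    cropped = dimmed[: h2 * bin_factor, : w2 * bin_factor]
    binned = cropped.reshape(h2, bin_factor, w2, bin_factor).mean(axis=(1, 3))
    rng = np.random.default_rng(seed)
    return binned + rng.normal(0.0, noise_sigma, size=binned.shape)
```

Each step removes information in a different way (less signal, fewer pixels, more noise), which is the kind of gap between the two surveys the synthetic dataset was meant to bridge.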

Simulated spiral and elliptical galaxy images illustrate how fainter and more distant galaxies would look in the Dark Energy Survey dataset.

Once trained on a combination of simulated and real galaxy images, the CNN was applied to the massive Dark Energy Survey dataset, cataloging 27 million galaxies as either early-type or late-type galaxies, and as face-on or edge-on images. 

The team used NVIDIA GPUs on Amazon Web Services for training and inference of their neural network. They found the model was 97 percent accurate at classifying the morphology of even faint galaxies too difficult to categorize by eye. 

The resulting collection is the largest multi-band catalog of automated galaxy morphologies to date.

“We pushed the limits by three orders of magnitude, to objects that are 1,000 times fainter than the original ones,” said lead author Jesús Vega-Ferrero. “That is why we were able to include so many more galaxies in the catalog.”

The researchers are next combining the morphological classification predictions with additional factors including the age, mass, distance, star-formation rate, and chemical composition of the galaxies to enable a better understanding of the relationship between galaxy morphology and star formation.

Find the full study in Monthly Notices of the Royal Astronomical Society. A preprint of the paper is available on arXiv.

I Made This Using TF Lite On a RPI 4

submitted by /u/NathanielF478

I published a tutorial where I build a preprocessing pipeline for audio data

I often get a ton of questions from programmers and data scientists about audio data preprocessing:

– How can I extract spectrograms?

– How can I normalise the signal?

– What if I have files of different lengths?

To answer these questions and more, I published a tutorial where you can learn how to build an audio preprocessing pipeline for AI applications. The pipeline batch-preprocesses audio files, applying the Short-Time Fourier Transform, zero-padding, and normalisation all in one go!
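
A compact NumPy sketch of such a pipeline, for readers who want the idea in code. It is not the tutorial's code: the fixed length of 22,050 samples (one second at an assumed 22.05 kHz), the 512-sample Hann-windowed frames with a hop of 256, the log-magnitude (dB) scaling, and min-max normalisation to [0, 1] are illustrative choices. Files longer than the fixed length are truncated here rather than only padded.

```python
import numpy as np


def pad_to_length(signal, num_samples):
    """Zero-pad (or truncate) a 1D signal to exactly num_samples."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) >= num_samples:
        return signal[:num_samples]
    return np.pad(signal, (0, num_samples - len(signal)), mode="constant")


def log_spectrogram(signal, frame_size=512, hop_length=256):
    """Hann-windowed magnitude STFT in dB; shape (frames, frame_size // 2 + 1)."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop_length
    frames = np.stack([signal[i * hop_length: i * hop_length + frame_size] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return 20.0 * np.log10(magnitude + 1e-6)  # small offset avoids log(0)


def min_max_normalise(x, lo=0.0, hi=1.0):
    """Rescale x linearly into [lo, hi]."""
    span = x.max() - x.min()
    if span == 0:
        return np.full_like(x, lo)
    return (x - x.min()) / span * (hi - lo) + lo


def preprocess_batch(signals, num_samples=22050):
    """Pad/truncate, transform and normalise every signal in one pass."""
    return [min_max_normalise(log_spectrogram(pad_to_length(s, num_samples)))
            for s in signals]
```

Padding first means every file yields a spectrogram of the same shape (85 × 257 with these settings), which is what a neural network with a fixed input size needs.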

This video is a new installment of the series “Generating sound with neural nets”, where you can learn to generate sound using Variational AutoEncoders.

Here’s the video:

https://www.youtube.com/watch?v=O04v3cgHNeM&list=PL-wATfeyAMNpEyENTc-tVH5tfLGKtSWPp&index=12

submitted by /u/diabulusInMusica

EV Technology Goes into Hyperdrive with Mercedes-Benz EQS

Mercedes-Benz is calling on its long heritage of luxury to accelerate electric vehicle technology with the new EQS sedan. The premium automaker lifted the wraps off the long-awaited flagship EV during a digital event today. The focal point of the revolutionary vehicle is the MBUX Hyperscreen, a truly intuitive and personalized AI cockpit, powered by …

The post EV Technology Goes into Hyperdrive with Mercedes-Benz EQS appeared first on The Official NVIDIA Blog.

Knight Rider Rides a GAN: Bringing KITT to Life with AI, NVIDIA Omniverse

Fasten your seatbelts. NVIDIA Research is revving up a new deep learning engine that creates 3D object models from standard 2D images, and can bring iconic cars like the Knight Rider’s AI-powered KITT to life, in NVIDIA Omniverse. Developed by the NVIDIA AI Research Lab in Toronto, the GANverse3D application inflates flat images …

The post Knight Rider Rides a GAN: Bringing KITT to Life with AI, NVIDIA Omniverse appeared first on The Official NVIDIA Blog.