Categories
Offsites

Music Conditioned 3D Dance Generation with AIST++

Dancing is a universal language found in nearly all cultures, and is an outlet many people use to express themselves on contemporary media platforms today. The ability to dance by composing movement patterns that align to music beats is a fundamental aspect of human behavior. However, dancing is a form of art that requires practice. In fact, professional training is often required to equip a dancer with a rich repertoire of dance motions needed to create expressive choreography. While this process is difficult for people, it is even more challenging for a machine learning (ML) model, because the task requires the ability to generate a continuous motion with high kinematic complexity, while capturing the non-linear relationship between the movements and the accompanying music.

In “AI Choreographer: Music-Conditioned 3D Dance Generation with AIST++”, presented at ICCV 2021, we propose a full-attention cross-modal Transformer (FACT) model that can mimic and understand dance motions, and can even enhance a person’s ability to choreograph dance. Together with the model, we released a large-scale, multi-modal 3D dance motion dataset, AIST++, which contains 5.2 hours of 3D dance motion in 1408 sequences, covering 10 dance genres, each including multi-view videos with known camera poses. Through extensive user studies on AIST++, we find that the FACT model outperforms recent state-of-the-art methods, both qualitatively and quantitatively.

We present a novel full-attention cross-modal transformer (FACT) network that can generate realistic 3D dance motion (right) conditioned on music and a new 3D dance dataset, AIST++ (left).

We generate the proposed 3D motion dataset from the existing AIST Dance Database — a collection of videos of dance with musical accompaniment, but without any 3D information. AIST contains 10 dance genres: Old School (Break, Pop, Lock and Waack) and New School (Middle Hip-Hop, LA-style Hip-Hop, House, Krump, Street Jazz and Ballet Jazz). Although it contains multi-view videos of dancers, these cameras are not calibrated.

For our purposes, we recovered the camera calibration parameters and the 3D human motion in terms of parameters used by the widely used SMPL 3D model. The resulting database, AIST++, is a large-scale, 3D human dance motion dataset that contains a wide variety of 3D motion, paired with music. Each frame includes extensive annotations:

  • 9 views of camera intrinsic and extrinsic parameters;
  • 17 COCO-format human joint locations in both 2D and 3D;
  • 24 SMPL pose parameters along with the global scaling and translation.

The motions are equally distributed among all 10 dance genres, covering a wide variety of music tempos in beats per minute (BPM). Each genre of dance contains 85% basic movements and 15% advanced movements (longer choreographies freely designed by the dancers).

The AIST++ dataset also contains multi-view synchronized image data, making it useful for other research directions, such as 2D/3D pose estimation. To our knowledge, AIST++ is the largest 3D human dance dataset with 1408 sequences, 30 subjects and 10 dance genres, and with both basic and advanced choreographies.

An example of a 3D dance sequence in the AIST++ dataset. Left: Three views of the dance video from the AIST database. Right: Reconstructed 3D motion visualized in 3D mesh (top) and skeletons (bottom).

Because AIST is an instructional database, it records multiple dancers following the same choreography for different music with varying BPM, a common practice in dance. This poses a unique challenge in cross-modal sequence-to-sequence generation, as the model needs to learn the one-to-many mapping between audio and motion. We carefully construct non-overlapping train and test subsets on AIST++ to ensure neither choreography nor music is shared across the subsets.

Full Attention Cross-Modal Transformer (FACT) Model
Using this data, we train the FACT model to generate 3D dance from music. The model begins by encoding seed motion and audio inputs using separate motion and audio transformers. The embeddings are then concatenated and sent to a cross-modal transformer, which learns the correspondence between both modalities and generates N future motion sequences. These sequences are then used to train the model in a self-supervised manner. All three transformers are jointly learned end-to-end. At test time, we apply this model in an autoregressive framework, where the predicted motion serves as the input to the next generation step. As a result, the FACT model is capable of generating long range dance motion frame-by-frame.

The FACT network takes in a music piece (Y) and a 2-second sequence of seed motion (X), then generates long-range future motions that correlate with the input music.
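To make test-time generation concrete, below is a minimal sketch of the autoregressive loop described above. The three callables are hypothetical stand-ins for the jointly trained motion, audio, and cross-modal transformers, and the 120-frame window assumes a 2-second seed at 60 FPS; the released code is the authoritative implementation.

import numpy as np

def generate_dance(seed_motion, music_features, motion_encoder, audio_encoder,
                   cross_modal, num_steps, window=120):
    # Autoregressive inference: each predicted frame is appended to the motion
    # history and fed back in as part of the next step's input.
    motion = list(seed_motion)                          # seed frames, e.g. 2 s at 60 FPS
    for t in range(num_steps):
        m_emb = motion_encoder(np.asarray(motion[-window:]))
        a_emb = audio_encoder(music_features[t:t + window])
        fused = np.concatenate([m_emb, a_emb], axis=0)  # early fusion of the two modalities
        future = cross_modal(fused)                     # predicts N future frames
        motion.append(future[0])                        # keep only the first predicted frame
    return np.asarray(motion)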

FACT involves three key design choices that are critical for producing realistic 3D dance motion from music.

  1. All of the transformers use a full-attention mask, which can be more expressive than typical causal models because internal tokens have access to all inputs (see the mask sketch after this list).
  2. We train the model to predict N futures beyond the current input, instead of just the next motion. This encourages the network to pay more attention to the temporal context, and helps prevent the generated motion from freezing or diverging after a few generation steps.
  3. We fuse the two embeddings (motion and audio) early and employ a deep 12-layer cross-modal transformer module, which is essential for training a model that actually pays attention to the input music.
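As a small illustration of the first design choice, the sketch below contrasts the usual causal mask with a full-attention mask (boolean masks only; layer and training details are omitted):

import numpy as np

def causal_mask(n):
    # Typical autoregressive masking: token i may attend only to tokens 0..i.
    return np.tril(np.ones((n, n), dtype=bool))

def full_attention_mask(n):
    # Full attention as in FACT: every token may attend to every input token.
    return np.ones((n, n), dtype=bool)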

Results
We evaluate the performance based on three metrics:

Motion Quality: We calculate the Frechet Inception Distance (FID) between the real dance motion sequences in the AIST++ test set and 40 model-generated motion sequences, each with 1200 frames (20 seconds). We denote the FID based on the geometric and kinetic features as FIDg and FIDk, respectively.

Generation Diversity: Similar to prior work, to evaluate the model’s ability to generate diverse dance motions, we calculate the average Euclidean distance in the feature space across 40 motions generated on the AIST++ test set, again comparing distances in the geometric feature space (Distg) and the kinetic feature space (Distk).
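As a rough sketch, the diversity metric reduces to an average pairwise Euclidean distance over per-sequence feature vectors (the feature extraction itself is omitted here):

import numpy as np

def average_pairwise_distance(features):
    # features: array of shape (num_sequences, feature_dim), one row per generated motion
    n = len(features)
    dists = [np.linalg.norm(features[i] - features[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))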

Four different dance choreographies (right) generated using different music, but the same two-second seed motion (left). The genres of the conditioning music are: Break, Ballet Jazz, Krump and Middle Hip-hop. The seed motion comes from hip-hop dance.

Motion-Music Correlation: Because there is no well-designed metric to measure the correlation between input music (music beats) and generated 3D motion (kinematic beats), we propose a novel metric, called Beat Alignment Score (BeatAlign).

Kinetic velocity (blue curve) and kinematic beats (green dotted line) of the generated dance motion, as well as the music beats (orange dotted line). The kinematic beats are extracted by finding local minima from the kinetic velocity curve.
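A hedged sketch of this metric, following the description above (kinematic beats as local minima of the kinetic velocity curve, each scored against the nearest music beat with a Gaussian kernel; the paper's exact normalization may differ, and sigma is a placeholder):

import numpy as np
from scipy.signal import argrelextrema

def kinematic_beats(kinetic_velocity, order=3):
    # Kinematic beats are the local minima of the kinetic velocity curve.
    return argrelextrema(np.asarray(kinetic_velocity), np.less, order=order)[0]

def beat_align(kinematic_beat_times, music_beat_times, sigma=3.0):
    # For each kinematic beat, score how close the nearest music beat is,
    # then average the scores over all kinematic beats.
    music = np.asarray(music_beat_times)
    scores = [np.exp(-np.min(np.abs(music - t)) ** 2 / (2 * sigma ** 2))
              for t in kinematic_beat_times]
    return float(np.mean(scores))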

Quantitative Evaluation
We compare the performance of FACT on each of these metrics to that of other state-of-the-art methods.

Compared to three recent state-of-the-art methods (Li et al., Dancenet, and Dance Revolution), the FACT model generates motions that are more realistic, better correlated with input music, and more diversified when conditioned on different music. *Note that the Li et al. generated motions are discontinuous, making the average kinetic feature distance abnormally high.

We also perceptually evaluate the motion-music correlation with a user study in which each participant is asked to watch 10 videos showing one of our results and one random counterpart, and then select which dancer is more in sync with the music. The study consisted of 30 participants, ranging from professional dancers to people who rarely dance. Compared to each baseline, 81% preferred the FACT model output to that of Li et al., 71% preferred FACT to Dancenet, and 77% preferred it to Dance Revolution. Interestingly, 75% of participants preferred the unpaired AIST++ dance motion to that generated by FACT, which is unsurprising since the original dance captures are highly expressive.

Qualitative Results
Compared with prior methods like DanceNet (left) and Li et al. (middle), 3D dance generated using the FACT model (right) is more realistic and better correlated with input music.

More generated 3D dances using the FACT model.

Conclusion and Discussion
We present a model that can not only learn the audio-motion correspondence, but can also generate high-quality 3D motion sequences conditioned on music. Because generating 3D movement from music is a nascent area of study, we hope our work will pave the way for future cross-modal audio-to-3D-motion generation. We are also releasing AIST++, the largest 3D human dance dataset to date. This multi-view, multi-genre, cross-modal 3D motion dataset can help not only conditional 3D motion generation research but also human understanding research in general. We are releasing the code in our GitHub repository and the trained model here.

While our results show a promising direction in this problem of music-conditioned 3D motion generation, there is more to explore. First, our approach is kinematic-based and we do not reason about physical interactions between the dancer and the floor. Therefore the global translation can lead to artifacts, such as foot sliding and floating. Second, our model is currently deterministic. Exploring how to generate multiple realistic dances per piece of music is an exciting direction.

Acknowledgements
We gratefully acknowledge the contribution of other co-authors, including Ruilong Li and David Ross. We thank Chen Sun, Austin Myers, Bryan Seybold and Abhijit Kundu for helpful discussions. We thank Emre Aksan and Jiaman Li for sharing their code. We also thank Kevin Murphy for the early attempts in this direction, as well as Peggy Chi and Pan Chen for the help on user study experiments.

Categories
Misc

How to Use NVIDIA Highlights, Freestyle and Montage in GeForce NOW

Imagine you’re sitting in Discord chat, telling your buddies about the last heroic round of your favorite game, where you broke through the enemy’s defenses and cinched the victory on your own. Your friends think you’re bluffing and demand proof. With GeForce NOW’s content capture tools running automatically in the cloud, you’ll have all the Read article >

The post How to Use NVIDIA Highlights, Freestyle and Montage in GeForce NOW appeared first on The Official NVIDIA Blog.

Categories
Misc

Trouble installing Tensorflow-Lite on a Raspberry Pi 4

Hi all, I am struggling to get Tensorflow-Lite running on a Raspberry Pi 4. The problem is that the model (BirdNET-Lite on GitHub) uses one special operator from Tensorflow (RFFT) which has to be included. I would rather use a prebuilt bin than compile one myself. I have found the prebuilt bins from PINTO0309 on GitHub but don’t understand if they would be usable or if I have to look somewhere else. BirdNET is a software to identify birds by their sounds, and also a really cool (and free) app. Many thanks!
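One quick way to check whether a particular prebuilt tflite_runtime wheel covers the model's operators is simply to try loading the model; a minimal sketch, with the model path as a placeholder:

from tflite_runtime.interpreter import Interpreter

try:
    # Loading and allocating tensors fails with an error naming any
    # operator (e.g. RFFT) that the installed runtime does not support.
    interpreter = Interpreter(model_path="BirdNET-Lite-model.tflite")  # placeholder path
    interpreter.allocate_tensors()
    print("All operators resolved by this runtime build.")
except (ValueError, RuntimeError) as err:
    print("This runtime build is missing an operator:", err)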

submitted by /u/FalsePlatinum
[visit reddit] [comments]

Categories
Misc

Question about TF Lite models for mobile

Hi, I thought this would be the right place to ask.

I have a Python program that uses audio and image classification models; would I be able to convert them and use them on mobile?

If so, what language would be best for the mobile application?
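For context, a Keras classifier is typically converted for mobile with the TensorFlow Lite converter, after which the .tflite file can be used from Java/Kotlin on Android or Swift on iOS. A minimal sketch, assuming the classifiers are saved Keras models (paths are placeholders):

import tensorflow as tf

# Convert a saved Keras classifier to a .tflite file that can be bundled
# into a mobile app.
model = tf.keras.models.load_model("image_classifier.h5")  # placeholder path
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("image_classifier.tflite", "wb") as f:
    f.write(tflite_model)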

Thanks.

submitted by /u/why________________
[visit reddit] [comments]

Categories
Misc

AutoDeploy – an automated model deployment library!!

What is AutoDeploy?

A one-liner: For the DevOps nerds, AutoDeploy allows configuration-based MLOps.

For the rest: So you’re a data scientist and have the greatest model on planet Earth to classify dogs and cats! :) What next? It’s a steep learning curve from building your model to getting it to production: MLOps, Docker, Kubernetes, async serving, Prometheus, logging, monitoring, versioning, and much more. The immediate next thoughts and tasks are:

  • How do you get it out to your consumer to use as a service.
  • How do you monitor its use?
  • How do you test your model once deployed? And it can get trickier once you have multiple versions of your model. How do you perform A/B testing?
  • Can I configure custom metrics and monitor them?
  • What if my data distribution changes in production – how can I monitor data drift?
  • My models use different frameworks. Am I covered? … and many more.

What if you could configure just a single file and get up and running with a single command? That is what AutoDeploy is!

Read our documentation to know how to get setup and get to serving your models.

Feature Support.

  • Single Configuration file support.
  • Production Deployment.
  • Logging.
  • Model Monitoring.
  • Custom Metrics.
  • Visual Dashboard.
  • Docker.
  • Docker Compose.
  • Custom Exception Handler.
  • Pydantic Validators.
  • Dynamic Database.
  • Data Drift Monitoring.
  • Async API Server.
  • Async Model Monitoring.
  • Production Architecture.
  • Kubernetes.
  • Batch Prediction.
  • Preprocess configuration.
  • Postprocess configuration.

submitted by /u/kartik4949
[visit reddit] [comments]

Categories
Misc

The Bright Continent: AI Fueling a Technological Revolution in Africa

AI is at play on a global stage, and local developers are stealing the show. Grassroots communities are essential to driving AI innovation, according to Kate Kallot, head of emerging areas at NVIDIA. On its opening day, Kallot gave a keynote speech at the largest AI Expo Africa to date, addressing a virtual crowd of Read article >

The post The Bright Continent: AI Fueling a Technological Revolution in Africa appeared first on The Official NVIDIA Blog.

Categories
Misc

Autonomy, Electrification, Sustainability Take Center Stage at Germany’s IAA Auto Show

The transportation industry is adding more torque toward realizing autonomy, electrification and sustainability. That was a key takeaway from Germany’s premier auto show, IAA Mobility 2021 (Internationale Automobil-Ausstellung), which took place this week in Munich. The event brought together leading automakers, as well as execs at companies that deliver mobility solutions spanning from electric vehicles Read article >

The post Autonomy, Electrification, Sustainability Take Center Stage at Germany’s IAA Auto Show appeared first on The Official NVIDIA Blog.

Categories
Misc

Streamline Your Model Builds with PyCaret + RAPIDS on NVIDIA GPUs

Running PyCaret on GPU not only streamlines model building but also offsets the time cost.

PyCaret is a low-code Python machine learning library based on the popular Caret library for R. It automates the data science process from data preprocessing to insights, such that short lines of code can accomplish each step with minimal manual effort. In addition, the ability to compare and tune many models with simple commands streamlines efficiency and productivity with less time spent in the weeds of creating useful models.

The PyCaret team added NVIDIA GPU support in version 2.2, including all the latest and greatest from RAPIDS. With GPU acceleration, PyCaret modeling times can be between 2 and 200 times faster depending on the workload.

This post will go over how to use PyCaret on GPUs to save both development and computation costs by an order of magnitude.

All benchmarks were run with nearly identical code on a machine with a 32-core CPU and four NVIDIA Tesla T4s. For simplicity, GPU code was written to run on a single GPU.

Getting started with PyCaret

Using PyCaret is as simple as importing the library and executing a set-up statement. The setup() function creates the environment and offers a host of pre-processing features all in one go.

from pycaret.regression import *
exp_reg = setup(data = df, target = 'Year', session_id = 123, normalize = True)

After a simple setup, a data scientist can develop the rest of their pipeline, including data preprocessing/preparation, model training, ensembling, analysis, and deployment. After the data is prepared, a great place to start is by comparing models.

True to PyCaret’s ethos of simplicity, we can compare a host of standard models to see which are best for our data with a single line of code. The compare_models command trains all the models in PyCaret’s model library using default hyperparameters and evaluates performance metrics using cross-validation. A data scientist can then select the models they’d like to use, tune, and ensemble based on this info.

top3 = compare_models(exclude = ['ransac'], n_select=3)

Comparing Models

Figure 1: Output of the compare_models command in PyCaret.

**Models are sorted best to worst, and PyCaret highlights the top results in each metric category for ease of use.
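From here, a typical follow-up is to tune and ensemble the selected models; a brief sketch using the same regression module and the top3 list from above (PyCaret defaults assumed throughout):

tuned = [tune_model(m) for m in top3]            # hyperparameter tuning with cross-validation
blended = blend_models(estimator_list = tuned)   # simple ensemble of the tuned models
final_model = finalize_model(blended)            # retrain on the full dataset before deployment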

Accelerating PyCaret with RAPIDS cuML

PyCaret is a great tool for any data scientist to have in their arsenal, as it streamlines model building and makes running many models easy. PyCaret can be made even better with GPUs. Since PyCaret does so much work behind the scenes, seemingly simple commands can take a long time. For example, we ran the preceding commands on a dataset with roughly half a million instances and over 90 attributes (UC Irvine’s Year Prediction MSD dataset). On the CPU, it took over 3 hours. On a GPU, it took less than half that.

In the past, using PyCaret on a GPU would have required a lot of manual coding, but thankfully, the PyCaret team has integrated the RAPIDS machine learning library (cuML), meaning you can use the same simple API that makes PyCaret so effective while also using the computational ability of your GPU.

Running PyCaret on a GPU tends to be much faster, meaning you can make full use of everything PyCaret has to offer without balancing time costs. Using the same dataset just mentioned, we tested PyCaret ML functionality on both a CPU and a GPU, including comparing, creating, tuning, and ensembling models. Switching to the GPU is simple; we set use_gpu to True in the setup function:

exp_reg = setup(data = df, target = 'Year', session_id = 123, normalize = True, use_gpu = True)

With PyCaret set to run on GPU, it uses cuML to train all of the following models:

  • Logistic Regression
  • Ridge Classifier
  • Random Forest
  • K Neighbors Classifier
  • K Neighbors Regressor
  • Support Vector Machine
  • Linear Regression
  • Ridge Regression
  • Lasso Regression
  • K-Means Clustering
  • Density-Based Spatial Clustering

Running the same compare_models code solely on GPU was over 2.5 times as fast.

The impact was even greater on a model-by-model basis with popular but computationally expensive models. The K Neighbors Regressor, for example, was 265 times as fast on GPU.
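For instance, that same model can be created and tuned individually within the GPU-enabled setup, where 'knn' is PyCaret's model ID for the K Neighbors Regressor (a brief sketch):

knn = create_model('knn')      # trained with cuML when use_gpu = True in setup
tuned_knn = tune_model(knn)    # tune hyperparameters with cross-validation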

Figure 2: Comparison of common PyCaret actions run on CPU versus GPU.

Impact

The simplicity of PyCaret’s API frees up time that would otherwise be spent coding so data scientists can do more experiments and fine-tune their experiments. When paired with GPUs, this impact is even greater, as the computation costs of taking full advantage of PyCaret’s suite of evaluation and comparison tools are significantly lower.

Conclusion

Extensively comparing and evaluating models can help improve the quality of your results, and doing so efficiently is exactly what PyCaret is for. PyCaret on GPU offsets the time costs that go along with so much processing.

The goal of RAPIDS is to accelerate your data science, and PyCaret is among a growing list of libraries whose compatibility with the RAPIDS suite can help bring a new layer of efficiency to your machine learning pursuits.

**Code used for this notebook can be found here.

Categories
Misc

Tips for Creating a Meaningful and Successful Virtual Hackathon

Combining mentoring, socializing, and specialized training proved key for the virtual 2021 KISTI GPU Hackathon.

Due to the coronavirus, the 2021 Korea Institute of Science and Technology Information (KISTI) GPU Hackathon was held virtually, under the guidance of expert mentors from KISTI, NVIDIA, and the OpenACC Organization. With the goal of inspiring possibilities for scientists to accelerate their AI research or HPC codes, the hackathon provided opportunities for solving research problems and expanding expertise using NVIDIA GPU parallel computing technology. 

Traditionally a face-to-face event, the hackathon in its virtual form posed its own challenges for both attendees and hosts. The new format also required juggling a diverse set of teams: three HPC and AI teams, four higher education and research teams, and two industry teams.

The event team found the following recipe helped create a meaningful and successful experience for the participants:

Mentoring 

Based on their expertise in specific domains or programming languages, dedicated mentors were paired with teams for guidance in setting goals and considering different approaches. The mentors collaboratively worked to solve problems and troubleshoot obstacles the teams encountered. Daily mentor sync-up calls kept everyone focused and working toward the best strategy for meeting their goals.

Image of participants in a virtual meeting.
Figure 1. KISTI GPU Hackathon 2021.

Socializing 

Everyone knows that all work and no play can actually hinder a team’s productivity. The hackathon provided a TGIF social hour session for participants and mentors. Using the Metaverse Gather Town Space, mentors and teams shared experiences, recharged their batteries, and developed connections that helped them continue forward for the duration of the event.

Image of people gathering virtually for a happy hour.
Figure 2. The TGIF social hour.

Resources and Live Seminars

Another important ingredient for success was making specialized training and resources available to attendees. For example, an NVIDIA Deep Learning Institute (DLI) workshop covering CUDA C/C++ topics was presented by a DLI ambassador and mentor. Other mentors provided team-dedicated tech sessions focused on TensorRT and NVIDIA Triton, OpenACC, and Nsight Systems for profiling, parallel computing, and optimization.

Image of people working at computer.
Figure 3. PaScaL team working on their project.

Hard Work Pays Off

The PaScaL team from Yonsei University is developing a thermal fluid solver that efficiently calculates the thermal motion of turbulence. At this hackathon, the team converted existing CPU-based code to a multi-GPU environment through OpenACC and the cuFFT library. This accelerated the RHS (right-hand side, fractional step) calculation, one of the most time-consuming subroutines, by 4.84 times.

The Amore Opt team from the AmorePacific cosmetics company worked on GPU optimization of a DeepLabV3+ segmentation model. By applying what they learned about the TensorRT inference optimizer and NVIDIA Triton Inference Server, they improved inference speed by 26 times while maintaining the accuracy of the AI models they use to detect skin problems, in preparation for future large-scale customer service.

Video 1. TFC Team interview for KISTI Hackathon.

The TFC team from Seoul National University joined a project to accelerate a CPU-based, in-house Fortran fluid calculation code. By using NVIDIA GPUs at KISTI, the team accelerated the time-consuming Tri-Diagonal Matrix Algorithm (TDMA) used in the thermal and momentum solvers and the Fast Fourier Transform (FFT) used in the pressure solver. They achieved an 11.15-times speedup on a single V100 GPU.

NVIDIA Inception member Nota and Hanyang University teamed up to optimize the Nota model compression engine by leveraging the Tensor Cores in NVIDIA GPUs for INT4 quantization. Named NOTA-HYU, the team learned to use the NVIDIA profiling tools Nsight Systems and Nsight Compute. They then applied the NVIDIA CUTLASS library to achieve an overall 1.85-times speedup for their residual block with CUDA optimization.

For more information on GPU Hackathons and future events, visit https://www.gpuhackathons.org/

Also don’t miss out on the OpenACC Summit 2021, scheduled for September 14-15, 2021.

Categories
Misc

Trying to experiment with the concept of fine-tuning and transfer learning but am getting very low accuracy, would someone be willing to take a quick look at a PDF version of my Jupyter Notebook?

Hello,

I just started using Tensorflow and Keras not long ago, and I really like the field of deep learning. Right now I am doing it as more of a hobby than anything, and I recently learned about the concepts of transfer learning and fine-tuning. I tried to apply them to a dataset of microscopic images using the tutorial here: https://www.tensorflow.org/tutorials/images/transfer_learning.

I am using ResNet50 with the ImageNet weights, but am far from getting good results. I think it might be because of the learning rate OR because of the activation function in my last layer OR because of the fact that I use the Adam optimizer and not SGD.
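For reference, the linked tutorial's recipe applied to ResNet50 looks roughly like the sketch below; the image size, class count, and learning rates are placeholders to adapt to the microscopy dataset:

import tensorflow as tf

IMG_SIZE = (224, 224)   # placeholder input size
NUM_CLASSES = 2         # placeholder; set to the number of classes in the dataset

base = tf.keras.applications.ResNet50(include_top=False, weights='imagenet',
                                       input_shape=IMG_SIZE + (3,))
base.trainable = False  # first stage: train only the new classification head

inputs = tf.keras.Input(shape=IMG_SIZE + (3,))
x = tf.keras.applications.resnet50.preprocess_input(inputs)  # ResNet50 expects its own preprocessing
x = base(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Second stage (fine-tuning): unfreeze the top of the backbone and re-compile
# with a much lower learning rate (e.g. 1e-5) before continuing training.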

Would someone be willing to look into my code to see what’s wrong? I have uploaded it as a pdf here: https://www.mediafire.com/file/vteka9uje8lthnb/NNonNema.pdf/file

Please note that the document is long because I printed model.summary() at some point, which showed all the layers!

submitted by /u/ignoreorchange
[visit reddit] [comments]