Categories
Misc

How XSplit Delivers Rich Content for Live Streaming with NVIDIA Broadcast

In this interview, Miguel Molina, Director of Developer Relations at SplitmediaLabs, the makers of XSplit, discusses how the team was able to easily integrate NVIDIA Broadcast into their hugely popular streaming software.

For those who may not know you, tell us about yourself.

My name is Miguel Molina, currently the Director of Developer Relations at SplitmediaLabs, the makers of XSplit. I’ve been with the company since before its inception, starting out as a software engineer, moving onto product management, and finally landing in business development where I work with our industry partners to find integrations and opportunities that bring value to our customers.

Tell us about XSplit and the success of the company thus far.

XSplit is the brand that got us to where we are now, and XSplit Broadcaster is the hero product behind it all. It’s simple yet powerful live streaming and recording software for producing and delivering rich video content, and it powers countless live streams and recordings around the world.

What excited you most about NVIDIA Broadcast Engine?

Being able to add value to our products is a priority for us and the NVIDIA Broadcast Engine gives us just that in a straightforward package. With features that improve video, audio, and augmented reality, the SDK has the potential to massively improve the output of different types of media, vastly improving the user experience for various use cases.

Why were you interested in integrating the Audio Effects SDK?

We were looking for an alternative to CPU-based background noise removal, and NVIDIA’s demo videos showing off the noise removal feature sold us on the idea. After receiving a sample, we decided to commit to integrating it into XSplit Broadcaster.

How was the experience integrating the SDK?

It was as simple as looking at the sample code, putting the relevant code segments in their proper places, and hitting compile. The initial integration itself just took a few hours and a working build was available the same day we started on it.

Any surprises or unexpected challenges?

We initially saw massive CUDA utilization in an early alpha build of the SDK, but NVIDIA’s engineers were very responsive: they quickly isolated the issue on their end and provided an updated build that fixed the problem.

How have your users responded to the improved experience?

Our users love the fact that they are able to utilize NVIDIA’s noise removal natively within XSplit Broadcaster. It’s as simple as turning it on and it just works.

What new features or SDKs from NVIDIA are you looking forward to now?

We are looking to update our NVIDIA Video Codec SDK implementation so we can provide more granular preset control over quality versus performance with NVENC.

Which of the NBX SDKs are you most interested in beyond Audio?

Definitely the Video Effects SDK, as its Virtual Background and Super Resolution features would be quite useful with people mostly staying at home these days.

+++

Developers can download XSplit Broadcaster here.

To learn more about NVIDIA Broadcast, or to get started, visit our page here.

Categories
Misc

How do I identify matching objects in a pair of stereo images?


Left and Right images

So, for instance, I have a pair of stereo images (as an example, here I have duplicated the photo to represent the left and right images) of certain objects (in this case dogs and cats). I want to match the dogs in the two images, i.e., if there’s a ‘Dog 1’ in the left image, the network should identify which dog in the right image is the corresponding match for ‘Dog 1’. And similarly for the other objects as well.

I can perform instance segmentation on the images and get the object boundaries and masks for both the left and right images, but how do I match the objects in the stereo image pair?

I was thinking of using Siamese networks to get a similarity score, but I’m pretty clueless about how to proceed with that.
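
For what it’s worth, here is the rough shape of what I had in mind with the Siamese idea. It’s only a sketch: the crop size, layer sizes, and matching function are placeholders, and the encoder would still need to be trained (e.g., with a triplet or contrastive loss) on matching/non-matching crops.

import numpy as np
import tensorflow as tf

# Shared (Siamese) encoder: left and right crops go through the same network,
# so corresponding objects should land close together in embedding space.
encoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 128, 3)),   # placeholder crop size
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(64),
])

def embed(crops):
    # crops: (N, 128, 128, 3) masked-and-resized instance crops from one image
    z = encoder(np.asarray(crops, dtype=np.float32), training=False)
    return tf.math.l2_normalize(z, axis=1).numpy()

def match_instances(left_crops, right_crops):
    # Cosine similarity between every left/right pair; pick the best right match
    # for each left object (a Hungarian assignment would be a stricter option).
    left_z, right_z = embed(left_crops), embed(right_crops)
    similarity = left_z @ right_z.T
    return similarity.argmax(axis=1), similarity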

Any help would be great! TIA!

submitted by /u/chinmaygrg


Categories
Misc

Amid CES, NVIDIA Packs Flying, Driving, Gaming Tech News into a Single Week

Flying, driving, gaming, racing… amid the first-ever virtual Consumer Electronics Show this week, NVIDIA-powered technologies spilled out in all directions. In automotive, Chinese automakers SAIC and NIO announced they’ll use NVIDIA DRIVE in future vehicles. In gaming, NVIDIA on Tuesday led off a slew of gaming announcements by revealing the affordable new RTX 3060 GPU. Read article >

The post Amid CES, NVIDIA Packs Flying, Driving, Gaming Tech News into a Single Week appeared first on The Official NVIDIA Blog.

Categories
Misc

I published a step-by-step tutorial on how to save autoencoders with Python/Keras

I published a tutorial where I explain how to save an autoencoder with Python + Keras. In particular, in this video you’ll learn how to save/load the Autoencoder class parameters with pickle and the model weights with methods native to the Keras API.
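
In rough outline, the pattern covered in the video looks something like this (the Autoencoder class below is a simplified stand-in with a toy dense model, not the exact code from the series):

import os
import pickle
import tensorflow as tf

class Autoencoder:
    def __init__(self, input_dim, latent_dim):
        self.input_dim = input_dim
        self.latent_dim = latent_dim
        # Minimal dense autoencoder as a stand-in for the architecture built in the series.
        self.model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(input_dim,)),
            tf.keras.layers.Dense(latent_dim, activation="relu"),
            tf.keras.layers.Dense(input_dim),
        ])

    def save(self, folder="model"):
        os.makedirs(folder, exist_ok=True)
        # Class parameters go through pickle...
        with open(os.path.join(folder, "parameters.pkl"), "wb") as f:
            pickle.dump([self.input_dim, self.latent_dim], f)
        # ...while the weights use the Keras API directly.
        self.model.save_weights(os.path.join(folder, "weights.h5"))

    @classmethod
    def load(cls, folder="model"):
        with open(os.path.join(folder, "parameters.pkl"), "rb") as f:
            parameters = pickle.load(f)
        autoencoder = cls(*parameters)
        autoencoder.model.load_weights(os.path.join(folder, "weights.h5"))
        return autoencoder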

This video is part of a series called “Generating Sound with Neural Networks”. In this series, you’ll learn how to generate sound from audio files and spectrograms 🎧 🎧 using Variational Autoencoders 🤖 🤖

Here’s the video:


https://www.youtube.com/watch?v=UIC0Irq-Eok&list=PL-wATfeyAMNpEyENTc-tVH5tfLGKtSWPp&index=7

submitted by /u/diabulusInMusica


Categories
Misc

How do I visualize data from my Chat Bot?

I made a chatbot using TensorFlow, following Tech With Tim’s tutorial, and adapted it into a Discord bot with Flask. For my project I want to somehow show any data at all in visual form: graphs, pie charts, bars. I don’t know how to use TensorBoard to visualize my chatbot data.

This is my code:
https://github.com/hootloot/Tensorflow-Question/blob/main/main.py
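
For reference, this is roughly the kind of thing I’ve been sketching out, assuming a tf.keras model (the layer sizes and dummy data below are placeholders, not my actual bot):

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

# Stand-ins for the real bag-of-words training data (replace with your own arrays).
x_train = np.random.rand(100, 50).astype("float32")
y_train = tf.keras.utils.to_categorical(np.random.randint(0, 6, size=100), num_classes=6)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(50,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(6, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# The TensorBoard callback writes logs you can open with: tensorboard --logdir logs
history = model.fit(x_train, y_train, epochs=20,
                    callbacks=[tf.keras.callbacks.TensorBoard(log_dir="logs")])

# Or skip TensorBoard and chart the training history directly.
plt.plot(history.history["loss"], label="loss")
plt.plot(history.history["accuracy"], label="accuracy")
plt.xlabel("epoch")
plt.legend()
plt.show()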

Thank you

submitted by /u/chopchopstiicks


Categories
Offsites

Recognizing Pose Similarity in Images and Videos

Everyday actions, such as jogging, reading a book, pouring water, or playing sports, can be viewed as a sequence of poses, consisting of the position and orientation of a person’s body. An understanding of poses from images and videos is a crucial step for enabling a range of applications, including augmented reality display, full-body gesture control, and physical exercise quantification. However, a 3-dimensional pose captured in two dimensions in images and videos appears different depending on the viewpoint of the camera. The ability to recognize similarity in 3D pose using only 2D information will help vision systems better understand the world.

In “View-Invariant Probabilistic Embedding for Human Pose” (Pr-VIPE), a spotlight paper at ECCV 2020, we present a new algorithm for human pose perception that recognizes similarity in human body poses across different camera views by mapping 2D body pose keypoints to a view-invariant embedding space. This ability enables tasks, such as pose retrieval, action recognition, action video synchronization, and more. Compared to existing models that directly map 2D pose keypoints to 3D pose keypoints, the Pr-VIPE embedding space is (1) view-invariant, (2) probabilistic in order to capture 2D input ambiguity, and (3) does not require camera parameters during training or inference. Trained with in-lab setting data, the model works on in-the-wild images out of the box, given a reasonably good 2D pose estimator (e.g., PersonLab, BlazePose, among others). The model is simple, results in compact embeddings, and can be trained (in ~1 day) using 15 CPUs. We have released the code on our GitHub repo.

Pr-VIPE can be directly applied to align videos from different views.

Pr-VIPE
The input to Pr-VIPE is a set of 2D keypoints, from any 2D pose estimator that produces a minimum of 13 body keypoints, and the output is the mean and variance of the pose embedding. The distances between embeddings of 2D poses correlate to their similarities in absolute 3D pose space. Our approach is based on two observations:

  • The same 3D pose may appear very different in 2D as the viewpoint changes.
  • The same 2D pose can be projected from different 3D poses.

The first observation motivates the need for view-invariance. To accomplish this, we define the matching probability, i.e., the likelihood that different 2D poses were projected from the same, or similar 3D poses. The matching probability predicted by Pr-VIPE for matching pose pairs should be higher than for non-matching pairs.

To address the second observation, Pr-VIPE utilizes a probabilistic embedding formulation. Because many 3D poses can project to the same or similar 2D poses, the model input exhibits an inherent ambiguity that is difficult to capture with a deterministic point-to-point mapping in embedding space. Therefore, we map a 2D pose through a probabilistic mapping to an embedding distribution, using the variance to represent the uncertainty of the input 2D pose. As an example, in the figure below the third 2D view of the 3D pose on the left is similar to the first 2D view of a different 3D pose on the right, so we map them into a similar location in the embedding space with large variances.
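
As an illustrative sketch (not the released implementation), the embedding model can be thought of as a small network that maps the flattened 2D keypoints to an embedding mean and a positive per-dimension variance; the layer sizes and embedding dimension below are placeholders:

import tensorflow as tf

NUM_KEYPOINTS = 13   # minimum number of body keypoints expected as input
EMBED_DIM = 16       # placeholder; the learned embeddings are compact

def build_embedder():
    # Flattened (x, y) keypoint coordinates go in; the mean and a positive
    # per-dimension variance of the embedding distribution come out.
    inputs = tf.keras.layers.Input(shape=(NUM_KEYPOINTS * 2,))
    hidden = tf.keras.layers.Dense(256, activation="relu")(inputs)
    hidden = tf.keras.layers.Dense(256, activation="relu")(hidden)
    mean = tf.keras.layers.Dense(EMBED_DIM)(hidden)
    variance = tf.keras.layers.Dense(EMBED_DIM, activation="softplus")(hidden)
    return tf.keras.Model(inputs, [mean, variance])

embedder = build_embedder()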

Pr-VIPE enables vision systems to recognize 2D poses across views. We embed 2D poses using Pr-VIPE such that the embeddings are (1) view-invariant (2D projections of similar 3D poses are embedded close together) and (2) probabilistic. By embedding detected 2D poses, Pr-VIPE enables direct retrieval of pose images from different views, and can also be applied to action recognition and video alignment.

View-Invariance
During training, we use 2D poses from two sources: multi-view images and projections of groundtruth 3D poses. Triplets of 2D poses (anchor, positive, and negative) are selected from a batch, where the anchor and positive are two different projections of the same 3D pose, and the negative is a projection of a non-matching 3D pose. Pr-VIPE then estimates the matching probability of 2D pose pairs from their embeddings.
During training, we push the matching probability of positive pairs to be close to 1 with a positive pairwise loss, which minimizes the embedding distance between positive pairs, and we push the matching probability of negative pairs to be small with a triplet ratio loss, which maximizes the ratio of matching probabilities between positive and negative pairs.

Overview of the Pr-VIPE model. During training, we apply three losses (triplet ratio loss, positive pairwise loss, and a prior loss that applies a unit Gaussian prior to our embeddings). During inference, the model maps an input 2D pose to a probabilistic, view-invariant embedding.
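
To make the training objective concrete, here is a hedged sketch of how the two matching-probability losses described above could be written; the exact formulation in the paper differs in details, and matching_probability itself is sketched in the next section:

import tensorflow as tf

def pr_vipe_losses(p_positive, p_negative, beta=2.0):
    # p_positive: matching probabilities of (anchor, positive) pairs, shape [batch]
    # p_negative: matching probabilities of (anchor, negative) pairs, shape [batch]
    # beta: ratio margin (placeholder value)

    # Triplet ratio loss: require the positive-pair matching probability to be at
    # least beta times the negative-pair one, expressed as a hinge in log space.
    triplet_ratio_loss = tf.nn.relu(
        tf.math.log(beta)
        - (tf.math.log(p_positive + 1e-8) - tf.math.log(p_negative + 1e-8)))

    # Positive pairwise loss: push the matching probability of positive pairs
    # toward 1, which pulls their embeddings together.
    positive_pairwise_loss = -tf.math.log(p_positive + 1e-8)

    return tf.reduce_mean(triplet_ratio_loss) + tf.reduce_mean(positive_pairwise_loss)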

Probabilistic Embedding
Pr-VIPE maps a 2D pose to a probabilistic embedding as a multivariate Gaussian distribution using a sampling-based approach for similarity score computation between two distributions. During training, we use a Gaussian prior loss to regularize the predicted distribution.
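
A hedged sketch of what such a sampling-based similarity and unit-Gaussian prior could look like (the constants and exact forms here are illustrative, not the released code):

import tensorflow as tf

def matching_probability(mean_a, var_a, mean_b, var_b, num_samples=20):
    # mean_* / var_*: mean and variance of two pose embeddings, shape [embed_dim].
    mean_a = tf.convert_to_tensor(mean_a, tf.float32)
    var_a = tf.convert_to_tensor(var_a, tf.float32)
    mean_b = tf.convert_to_tensor(mean_b, tf.float32)
    var_b = tf.convert_to_tensor(var_b, tf.float32)
    dim = mean_a.shape[-1]
    # Draw samples from each Gaussian embedding and compare them pairwise.
    z_a = mean_a + tf.random.normal([num_samples, dim]) * tf.sqrt(var_a)
    z_b = mean_b + tf.random.normal([num_samples, dim]) * tf.sqrt(var_b)
    squared_dist = tf.reduce_sum(tf.square(z_a - z_b), axis=-1)
    # Map distances to a (0, 1) match score; the scale and offset stand in for
    # parameters that would normally be learned.
    return tf.reduce_mean(tf.sigmoid(-squared_dist + 1.0))

def prior_loss(mean, var):
    # KL divergence from N(mean, var) to the unit Gaussian prior.
    mean = tf.convert_to_tensor(mean, tf.float32)
    var = tf.convert_to_tensor(var, tf.float32)
    return 0.5 * tf.reduce_sum(
        var + tf.square(mean) - 1.0 - tf.math.log(var + 1e-8), axis=-1)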

Evaluation
We propose a new cross-view pose retrieval benchmark to evaluate the view-invariance property of the embedding. Given a monocular pose image, cross-view retrieval aims to retrieve the same pose from different views without using camera parameters. The results demonstrate that Pr-VIPE retrieves poses more accurately across views compared to baseline methods in both evaluated datasets (Human3.6M, MPI-INF-3DHP).

Pr-VIPE retrieves poses across different views more accurately relative to the baseline method (3D pose estimation).

Common 3D pose estimation methods (such as the simple baseline used for comparison above, SemGCN, and EpipolarPose, amongst many others), predict 3D poses in camera coordinates, which are not directly view-invariant. Thus, rigid alignment between every query-index pair is required for retrieval using estimated 3D poses, which is computationally expensive due to the need for singular value decomposition (SVD). In contrast, Pr-VIPE embeddings can be directly used for distance computation in Euclidean space, without any post-processing.
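
To make the computational difference concrete, here is an illustrative sketch of the two retrieval paths (array shapes and helper names are ours, not from the released code):

import numpy as np

def retrieve_by_embedding(query_embedding, index_embeddings):
    # Pr-VIPE-style retrieval: a plain nearest-neighbor search in Euclidean space.
    distances = np.linalg.norm(index_embeddings - query_embedding, axis=1)
    return int(np.argmin(distances))

def procrustes_distance(pose_a, pose_b):
    # pose_a, pose_b: (num_joints, 3) poses in camera coordinates.
    a = pose_a - pose_a.mean(axis=0)
    b = pose_b - pose_b.mean(axis=0)
    u, _, vt = np.linalg.svd(a.T @ b)      # SVD gives the optimal rotation
    rotation = u @ vt                      # (reflection handling omitted)
    return np.linalg.norm(a @ rotation - b)

def retrieve_by_3d_pose(query_pose, index_poses):
    # One SVD per query-index pair is what makes this path expensive.
    return int(np.argmin([procrustes_distance(query_pose, p) for p in index_poses]))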

Applications
View-invariant pose embedding can be applied to many image and video related tasks. Below, we show Pr-VIPE applied to cross-view retrieval on in-the-wild images without using camera parameters.


We can retrieve in-the-wild images from different views without using camera parameters by embedding the detected 2D pose using Pr-VIPE. Using the query image (top row), we search for a matching pose from a different camera view and we show the nearest neighbor retrieval (bottom row). This enables us to search for matching poses across camera views more easily.

The same Pr-VIPE model can also be used for video alignment. To do so, we stack Pr-VIPE embeddings within a small time window, and use the dynamic time warping (DTW) algorithm to align video pairs.

Manual video alignment is difficult and time-consuming. Here, Pr-VIPE is applied to automatically align videos of the same action repeated from different views.

The video alignment distance calculated via DTW can then be used for action recognition by classifying videos using nearest neighbor search. We evaluate the Pr-VIPE embedding using the Penn Action dataset and demonstrate that using the Pr-VIPE embedding without fine-tuning on the target dataset yields highly competitive recognition accuracy. In addition, we show that Pr-VIPE even achieves relatively accurate results using only videos from a single view in the index set.

Pr-VIPE recognizes action across views using pose inputs only, and is comparable to or better than methods using pose only or with additional context information (such as Iqbal et al., Liu and Yuan, Luvizon et al., and Du et al.). When action labels are only available for videos from a single view, Pr-VIPE (1-view only) can still achieve relatively accurate results.
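
As a rough illustration of these two steps, here is a generic sketch of DTW over stacked embeddings and the nearest-neighbor classification on top of it (data shapes and function names are placeholders, not the paper's exact setup):

import numpy as np

def dtw_distance(sequence_a, sequence_b):
    # sequence_a: (Ta, D) and sequence_b: (Tb, D) stacked per-frame embeddings.
    ta, tb = len(sequence_a), len(sequence_b)
    cost = np.full((ta + 1, tb + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, ta + 1):
        for j in range(1, tb + 1):
            frame_dist = np.linalg.norm(sequence_a[i - 1] - sequence_b[j - 1])
            cost[i, j] = frame_dist + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[ta, tb]

def classify_by_nearest_neighbor(query_sequence, index_sequences, index_labels):
    # Assign the action label of the index video with the smallest alignment distance.
    distances = [dtw_distance(query_sequence, seq) for seq in index_sequences]
    return index_labels[int(np.argmin(distances))]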

Conclusion
We introduce the Pr-VIPE model for mapping 2D human poses to a view-invariant probabilistic embedding space, and show that the learned embeddings can be directly used for pose retrieval, action recognition, and video alignment. Our cross-view retrieval benchmark can be used to test the view-invariant property of other embeddings. We look forward to hearing about what you can do with pose embeddings!

Acknowledgments
Special thanks to Jiaping Zhao, Liang-Chieh Chen, Long Zhao (Rutgers University), Liangzhe Yuan, Yuxiao Wang, Florian Schroff, Hartwig Adam, and the Mobile Vision team for the wonderful collaboration and support.

Categories
Misc

IM AI: China Automaker SAIC Unveils EV Brand Powered by NVIDIA DRIVE Orin

There’s a new brand of automotive intelligence equipped with the brains — and the battery — to go the distance. SAIC, the largest automaker in China, joined forces with e-tail giant Alibaba to unveil a new premium EV brand, dubbed IM, or “intelligence in motion.” The long-range electric vehicles will feature AI capabilities powered by NVIDIA DRIVE Orin. Read article >

The post IM AI: China Automaker SAIC Unveils EV Brand Powered by NVIDIA DRIVE Orin appeared first on The Official NVIDIA Blog.

Categories
Misc

Glassdoor Ranks NVIDIA No. 2 in Latest Best Places to Work List

NVIDIA is the second-best place to work in the U.S., according to a ranking released today by Glassdoor. The site’s Best Places to Work in 2021 list rates the 100 best U.S. companies with more than 1,000 employees, based on how their own employees rate career opportunities, company culture and senior management. The survey’s top… Read article >

The post Glassdoor Ranks NVIDIA No. 2 in Latest Best Places to Work List appeared first on The Official NVIDIA Blog.

Categories
Misc

Question regarding an error in my NumPy array.

# Imports assumed from context (not shown in the original post):
from os import listdir
import cv2
import numpy as np
from skimage.transform import resize
from sklearn.utils import shuffle

def load_data(dir_list, image_size):
    X = []
    Y = []
    image_width, image_height = image_size
    for directory in dir_list:
        for filename in listdir(directory):
            image = cv2.imread(directory + '//' + filename)
            imgres = resize(image, (240, 240, 3))
            img_resized = cv2.resize(imgres, dsize=(image_width, image_height),
                                     interpolation=cv2.INTER_CUBIC)
            X.append(image)
            if directory[-3:] == 'yes':
                Y.append([1])
            else:
                Y.append([0])
    X = np.array(X)
    Y = np.array(Y)
    X, Y = shuffle(X, Y)
    print(f'Number of examples is: {len(X)}')
    print(f'X shape is: {X.shape}')
    print(f'y shape is: {Y.shape}')
    return X, Y

———————————————–

yes = 'yes'
no = 'no'
IMG_WIDTH, IMG_HEIGHT = (240, 240)

X, Y = load_data([yes, no], (IMG_WIDTH, IMG_HEIGHT))

——————————————————–

OUTPUT:

Number of examples is: 253

X shape is: (253,)

Y shape is: (253, 1)

——————————————————-

The X shape should be (253, 240, 240, 3), but I do not know why the other dimensions are missing. Thank you for helping.
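
One thing I’m now suspecting (not yet verified): X.append(image) stores the original, differently sized images instead of img_resized, so NumPy can only build a ragged object array of shape (253,). A minimal illustration of the difference:

import numpy as np

# Mixing image sizes forces NumPy to build a ragged object array of shape (n,):
ragged = np.array([np.zeros((300, 400, 3)), np.zeros((240, 240, 3))], dtype=object)
print(ragged.shape)    # (2,)

# Stacking uniformly sized images gives the expected 4-D array:
uniform = np.array([np.zeros((240, 240, 3)), np.zeros((240, 240, 3))])
print(uniform.shape)   # (2, 240, 240, 3)

If that is the issue, appending img_resized instead of image should give the (253, 240, 240, 3) shape.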

submitted by /u/-KingKrazy-


Categories
Misc

Machine Learning Metadata (MLMD): A Library to Track the Full Lineage of Machine Learning Workflows

Version control is used to keep track of modifications made to software code. Similarly, when building machine learning (ML) systems, it is essential to track things such as the datasets used to train the model, the hyperparameters and pipeline used, the version of TensorFlow used to create the model, and more.

The history and lineage of ML artifacts are far more complicated than a simple, linear log. Git can track the code to an extent, but the complexity of ML artifacts such as models and datasets requires a similar approach to track them as well.
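
To give a flavor of the API, here is a minimal sketch following the library’s getting-started pattern (exact calls and property names may differ between ml-metadata versions):

from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

# In-memory store for experimentation; a real setup would point at SQLite or MySQL.
config = metadata_store_pb2.ConnectionConfig()
config.fake_database.SetInParent()
store = metadata_store.MetadataStore(config)

# Register an artifact type for datasets, then record one training dataset.
dataset_type = metadata_store_pb2.ArtifactType()
dataset_type.name = "DataSet"
dataset_type.properties["version"] = metadata_store_pb2.STRING
dataset_type_id = store.put_artifact_type(dataset_type)

dataset = metadata_store_pb2.Artifact()
dataset.type_id = dataset_type_id
dataset.uri = "path/to/training/data"
dataset.properties["version"].string_value = "v1"
[dataset_id] = store.put_artifacts([dataset])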

Article:
https://www.marktechpost.com/2021/01/12/machine-learning-metadata-mlmd-a-library-to-track-full-lineage-of-machine-learning-workflow/

Github: https://github.com/google/ml-metadata

submitted by /u/ai-lover
