Categories
Misc

NVIDIA’s Liila Torabi Talks the New Era of Robotics Through Isaac Sim

Robots are not just limited to the assembly line. At NVIDIA, Liila Torabi works on making the next generation of robotics possible. Torabi is the senior product manager for Isaac Sim, a robotics and AI simulation platform powered by NVIDIA Omniverse. Torabi spoke with NVIDIA AI Podcast host Noah Kravitz about the new era of Read article >

The post NVIDIA’s Liila Torabi Talks the New Era of Robotics Through Isaac Sim appeared first on The Official NVIDIA Blog.

Categories
Misc

Tensorflow working in python2 not python3

Hey All, I’m facing a very weird issue with tensorflow on my remote GPU machine.

When i do import tensorflow in python2, it seems to work. However, it doesn’t work in python3.

How is this happening? Has someone faced this before and knows the fix to this?

submitted by /u/Conanobrain
[visit reddit] [comments]

Categories
Misc

Strange training results: why is a batch size of 1 more efficient than larger batch sizes, despite using a GPU/TPU?

Hey all!

I’m currently doing some tests in preparation for my first real bit of training. I’m using Google Cloud AI Platform to train, and am trying to find the optimal machine setup. It’s a work in progress, but here’s a table I’m putting together to get a sense of the efficiency of each setup. On the left you’ll see the accelerator type, ordered from least to most expensive. Here you’ll also find the number of accelerator’s used, the cost per hour, and the batch size. To the right are the average time it took to complete an entire training iteration and how long it took to complete the minimization step. You’ll notice that the values are almost identical for each setup; I’m using Google Research’s SEED RL, so I thought to record both values since I’m not sure exactly everything that happens between iterations. Turns out it’s not much. There’s also a calculation of the the time it takes to complete a single “step” (aka, a single observation from a single environment), as well as the average cost per step.

So, the problem. I was under the assumption that batching with a GPU or TPU would increase the efficiency of the training. Instead, it turns out that a batch size of 1 is the most efficient both in time per step and cost per step. I’m still fairly new to ML so maybe it’s just a matter of me being uninformed, but this goes against everything I thought I knew about ML training. Since I’m using Google’s own SEED codebase I would assume that it’s not a problem with the code, but I can’t be sure about that.

Is this just a matter of me misunderstanding how training works, or am I right in thinking something is really off?

submitted by /u/EdvardDashD
[visit reddit] [comments]

Categories
Misc

What’s New in Optical Flow SDK 3.0

NVIDIA Optical Flow SDK exposes the APIs to use this Optical Flow hardware (also referred to as NVOFA) to accelerate applications.

The NVIDIA Turing architecture introduced a new hardware functionality for computing optical flow between a pair of images with very high performance. NVIDIA Optical Flow SDK exposes the APIs to use this Optical Flow hardware (also referred to as NVOFA) to accelerate applications. We are excited to announce the availability of Optical Flow SDK 3.0 with the following new features: 

  1. DirectX 12 Optical Flow API 
  2. Forward-Backward Optical Flow via a single API 
  3. Global Flow vector 

DirectX 12 Optical Flow API

DirectX 12 is a low-level programming API from Microsoft which reduces driver overhead in comparison to its predecessor DirectX 11. DirectX 12 provides more flexibility and fine-grained control to the developer. Developers can now take advantage of the low-level programming APIs in DirectX 12 and optimize their applications to give better performance over earlier DirectX versions – at the same time, the client application, on its own, must take care of resource management, synchronization, etc. DirectX 12 has rapidly grown amongst game titles and other graphics applications.  

Optical Flow SDK 3.0 enables DirectX 12 applications to use the NVIDIA Optical Flow engine. The computed optical flow can be used to increase frame rate in games and videos for smoother experience or in object tracking. To increase the frame rate, Frame Rate Up Conversion (FRUC) techniques are used by inserting interpolated frames between original frames. Interpolation algorithms use the flow between frame pair(s) to generate the intermediate frame.

All generations of Optical Flow Hardware support DirectX 12 Optical Flow interface. The Optical Flow SDK package contains header(s), sample applications that demonstrate the usage, C++ wrapper classes which can be re-used or modified as required and documentation. All the other components for accessing the Optical Flow hardware are included in the NVIDIA display driver. DirectX 12 Optical Flow API is supported on Windows 20H1 or later operating system.

Barring the explicit synchronization, DirectX 12 Optical Flow API is designed to be close to the other interfaces that are already available in the SDK (CUDA and DirectX 11). The DirectX 12 Optical Flow API consists of three core functions: initialization, flow estimation and destruction.

typedef NV_OF_STATUS(NVOFAPI* PFNNVOFINIT) (NvOFHandle hOf, const NV_OF_INIT_PARAMS* initParams);

typedef NV_OF_STATUS(NVOFAPI* PFNNVOFEXECUTED3D12) (NvOFHandle hOf, const NV_OF_EXECUTE_INPUT_PARAMS_D3D12* executeInParams, NV_OF_EXECUTE_OUTPUT_PARAMS_D3D12* executeOutParams);

typedef NV_OF_STATUS(NVOFAPI* PFNNVOFDESTROY) (NvOFHandle hOf);

Initialization and destroy APIs are same across all interfaces but Execute API differs between DirectX 12 and other interfaces i.e., DirectX 11 and CUDA. Even though most of the parameters passed to the Execute API in DirectX 12 are same as those in other two interfaces, there are some functional differences. Synchronization in DirectX 11 and CUDA interfaces is automatically taken care by OS runtime and driver. However, in DirectX 12, additional information about fence and fence values are required as input parameters to the Execute API. These fence objects will be used to synchronize the CPU↔GPU and GPU↔GPU operations.  For more details, please refer to the programming guide included with the Optical Flow SDK.

Buffer management API interface in DirectX 12 also needs fence objects for synchronization.

The Optical Flow output quality is same across all interfaces. Performance in DirectX 12 should be very close compared to the other two interfaces.

Forward-Backward Optical Flow (FB flow)

No Optical Flow algorithm can give 100% accurate flow. The flow is typically distorted in occluded regions. Sometimes, the cost provided by the NVOFA may not also represent true confidence of the flow. One simple check usually employed is to compare the forward and backward flow. If the Euclidean distance between forward flow and backward flow exceeds a threshold, the flow can be marked as invalid.

To estimate flow in both directions, client must call the Execute API twice: one call with input and reference images and second call after reversing the input and reference images. Calling the Optical Flow Execute API two times like this can result in suboptimal performance due to overheads such as context switching, thread switching etc. Optical Flow SDK 3.0 exposes a new API to generate flow in both directions in a single Execute call. This feature can be enabled by setting NV_OF_INIT_PARAMS::predDirection to NV_OF_PRED_DIRECTION_BOTH in initialization and providing necessary buffers to receive backward flow and/or cost in NV_OF_EXECUTE_OUTPUT_PARAMS/NV_OF_EXECUTE_OUTPUT_PARAMS_D3D12::bwdOutputBuffer, NV_OF_EXECUTE_OUTPUT_PARAMS/NV_OF_EXECUTE_OUTPUT_PARAMS_D3D12::bwdOutputCostBuffer.  

Figure 1. Source Images (http://ultravideo.fi/#testsequences)
NVIDIA Optical Flow
Figure 2. (b) Forward Flow, (c) Backward Flow, (d) Consistency Check. Black pixels in the image shows inconsistent flow, (e) Nearest Neighbor Infill 

Once the flow is generated in both directions, client application can compare the flow vectors of both directions, discard the inaccurate ones based on a suitable criteria (e.g. Euclidian distance between forward and backward flow vectors), and use hole filling algorithms to fill such discarded flow vectors.

Note that the output quality of FB flow could be different from unidirectional flow due to some optimizations.

Sample code that demonstrates FB flow API programming and consistency check:

// Initialization of API
NV_OF_INIT_PARAMS initParams = { 0 };
...
initParams.predDirection = NV_OF_PRED_DIRECTION_BOTH;
...
NvOFAPI->nvOFInit(hNvOF, &initParams);
// Estimation of forward and backward flow
NV_OF_EXECUTE_INPUT_PARAMS executeInParams = { 0 };
...
NV_OF_EXECUTE_OUTPUT_PARAMS executeOutParams = { 0 };
...
executeOutParams.outputBuffer = forwardFlowBuffer;
executeOutParams.outputCostBuffer = forwardFlowCostBuffer;
executeOutParams.bwdOutputBuffer = backwardFlowBuffer;
executeOutParams.bwdOutputCostBuffer = backwardFlowCostBuffer;

NvOFAPI->nvOFExecute(hNvOF, &executeInparams, &executeOutParams)


// Invalidating flow vectors
for (int y = 0; y  width - 1 || y2  thresh) {
            SetFlowInvalid(forwardFlowBuffer, x, y);
        }
    }
}

Global Flow Estimation

Global flow in a video sequence or game is caused by camera panning motion. Global flow estimation is an important tool widely used in image segmentation, video stitching or motion-based video analysis applications.

Global Flow vector can also be heuristically used in calculating background motion. Once background motion is estimated, this can be used to fill the flow vectors in occluded regions.  It can also be used to handle collisions of warped pixels in interpolated frames.

Global flow is calculated on the forward flow vectors, based on frequency of the occurrence and a few other heuristics.

To enable generation of global flow, initialization API needs to set the flag NV_OF_INIT_PARAMS:: enableGlobalFlow and provide the additional buffer NV_OF_EXECUTE_OUTPUT_PARAMS/NV_OF_EXECUTE_OUTPUT_PARAMS_D3D12::globalFlowBuffer in Execute API.

References

  1. NVIDIA Optical Flow SDK
  2. Developer Blog: An Introduction to the NVIDIA Optical Flow SDK
  3. Developer Blog: Accelerate OpenCV: Optical Flow Algorithms with NVIDIA Turing GPUs
Categories
Misc

NVIDIA Inception Partners Won Veterans Affairs AI Tech Sprint Awards with Latest AI Technologies

Hosted by the Department of Veterans Affairs (VA), the sprint is designed to foster collaboration with industry and academic partners on AI-enabled tools that leverage federal data to address a need for veterans.

Five NVIDIA Inception partners were named finalists at the 2020-2021 Artificial Intelligence Tech Sprint, a competition aimed at improving healthcare for veterans using the latest AI technology. Hosted by the Department of Veterans Affairs (VA), the sprint is designed to foster collaboration with industry and academic partners on AI-enabled tools that leverage federal data to address a need for veterans. See the official news release for more details about the competition. 

Participating teams gave presentations and demonstrations judged by panels of Veterans and other experts. 44 teams from industry and universities participated, addressing a range of health care challenges such as chronic conditions management, cancer screening, rehabilitation, patient experiences and more. 

Majority of the solutions from NVIDIA Inception partners were powered by NVIDIA Clara Guardian, a smart hospital solution for patient monitoring, mental health, and patient care.  

  • JumpStartCSR created ‘Holmz’ – a data source agnostic explainable AI (XAI) that provides a digital physical therapy to prevent and treat overexertion/repetitive stress injuries. Holmz can predict plantar fasciitis and identify root causes two weeks in advance of its occurrence with 97% accuracy and 99% accuracy one week in advance of occurrence. Holmz is also capable of predicting fatigue related falls and identifying their root causes 15 minutes in advance of their occurrence with 99% accuracy. The trained solutions are 4 to 5x faster using TensorFlow with NVIDIA T4 GPUs on AWS. They are piloting their XAI with the VA’s Physical Medicine and Rehabilitation group and expanding their digital physical therapy solution to include performance improvement and prediction for the US Army. 
  • Ouva is an Autonomous Remote Monitoring and Ambient Intelligence platform that is designed to improve patient safety and operational efficiency. Their solution leverages cuDNN, TensorRT, Transfer Learning Toolkit in NVIDIA Clara Guardian and runs on an NVIDIA RTX 6000 GPU. During the course of VA tech sprint, Ouva was able to predict sepsis infection 24 hours ahead of clinical diagnosis by looking at nine biomarkers from EMR collected from 40,000 patients. By processing data from telehealth cameras with AI, Ouva allows nurses to keep an eye on more patients at remote or isolation conditions. They have started deploying their remote care monitoring solution in Europe and piloting it in the US. Their near-term goal is to combine Ouva’s unique patient activity data with Electronic Medical Records (EMR) to uniquely predict more cases like Sepsis with increased accuracy.
  • PATH Decision Support Software, a GPU-accelerated expert system that recommends drug regimens for patients with type 2 diabetes. Their solution was developed on CUDA using NVIDIA GPUs on AWS and achieved 20x speedup. The software’s recommendations have shown a 2.1-point reduction in HbA1c and a $770/patient/year reduction in unplanned diabetes-related Medicaid claims. The team is currently working on EMR integration and expanding the list of medicines it can evaluate.
  • Dialisa is a digital nutritional Intervention platform that detects onset and monitors progression of chronic kidney disease (CKD) and delivers digital nutritional intervention to delay the progress of the disease. They use TensorFlow and CUDA optimized on NVIDIA GPUs on AWS, achieving an accuracy of 93.2% in detecting onset of CKD using generic longitudinal dataset provided by the National Artificial Intelligence Institute (NAII) at the Veterans Administrations. Dialisa looks to demonstrate the feasibility of their algorithms and remote monitoring platform in the veterans community, and welcome additional clinical partners to conduct pilot study in the general population.
  • KELLS, an AI-powered dental diagnostic platform, used TensorFlow and PyTorch optimized on NVIDIA T4 GPUs on AWS P3 instances and achieved 3x higher efficiency in training than previously used AWS G4 instances. KELLS’ technology has demonstrated to improve accuracy of detecting common dental pathologies up to 30% for average dentists, especially for detecting early signs of oral disease, enabling more timely and effective preventative care. They continue to expand their platform to support more clinical findings across different data modalities and increase quality and accessibility of dental care for patients.

“We were overwhelmed with the overall quality of proposals in this very competitive cycle, it’s a great tribute to our mission to serve our veterans,” said Artificial Intelligence Tech Sprint lead Rafael Fricks. “Very sophisticated AI capabilities are more accessible than ever; a few years ago these proposals wouldn’t have been possible outside the realm of very specialized high performance computing.” 

NVIDIA looks forward to providing continued support to these winning Inception Partners in the coming Pilot Implementation program phase, and to the contributions their AI solutions will make to the important VA/VHA healthcare mission serving our Nation’s veterans. 

Build your patient care solutions on Clara Guardian >

Categories
Misc

Fighting Disease-Carrying Mosquitoes with Neural Networks

Targeting areas populated with disease-carrying mosquitoes just got easier thanks to a new study. The research, recently published in IEEE Explore, uses deep learning to recognize tiger mosquitoes from images taken by citizen scientists with near perfect accuracy. “Identifying the mosquitoes is fundamental, as the diseases they transmit continue to be a major public health … Continued

Targeting areas populated with disease-carrying mosquitoes just got easier thanks to a new study. The research, recently published in IEEE Explore, uses deep learning to recognize tiger mosquitoes from images taken by citizen scientists with near perfect accuracy.

“Identifying the mosquitoes is fundamental, as the diseases they transmit continue to be a major public health issue,” said lead author Gereziher Adhane.

The study, from researchers at the Scene understanding and artificial intelligence (SUNAI) research group, of the Universitat Oberta de Catalunya’s (UOC) Faculty of Computer Science, Multimedia and Telecommunications and of the eHealth Center, uses images from the Mosquito Alert app. Developed in Spain and currently expanding globally, the platform brings together citizens, entomologists, public health authorities, and mosquito control services to reduce mosquito-borne diseases. 

Anyone in the world can upload geo-tagged images of mosquitoes to the app. Once submitted, three expert entomologists inspect and validate the images before they are added to the database, classified, and mapped. 

Travel and migration, along with climate change and urbanization has broadened the range and habitat of mosquitoes. Quick identification of species such as the tiger mosquito –  known to transmit dengue, Zika, chikungunya, and yellow fever – remains a key step in assisting relevant authorities to curb their spread. 

A comparison of tiger mosquito and other mosquito images.
Sample of tiger [first row] and non-tiger [second row] mosquitoes from the Mosquito Alert data set. Credit: G. Adhane et al/IEEE Explore

“This type of analysis depends largely on human expertise and requires the collaboration of professionals, is typically time-consuming, and is not cost-effective because of the possible rapid propagation of invasive species,” said Adhane. “This is where neural networks can play a role as a practical solution for controlling the spread of mosquitoes.”

The research team developed a deep convolutional neural network (CNN) that distinguishes between mosquito species. Starting with a pre-trained model, they fine-tuned it using the hand-labeled Mosquito Alert dataset. Using NVIDIA GPUs and the cuDNN-accelerated PyTorch deep learning framework, the classification models were taught to pinpoint tiger mosquitoes based on identifiable morphological features such as white stripes on the legs, abdominal patches, head, and thorax shape.

Deep learning models typically rely on millions of samples. However, using only 6,378 images of both tiger and non-tiger mosquitoes from Mosquito Alert, the researchers were able to train the model with about 94% accuracy. 

“The neural network we have developed can perform as well or nearly as well as a human expert and the algorithm is sufficiently powerful to process massive amounts of images,” said Adhane.

According to the researchers, as Mosquito Alert scales up, the study can be expanded to classify multiple species of mosquitoes and their breeding sites across the globe. 

“The model we have developed could be used in practical applications with small modifications to work with mobile apps. Using this trained network it is possible to make predictions about images of mosquitoes taken using smartphones efficiently and in real time,” Adhane said. 

The GPU used in the research was a donation provided by the NVIDIA Academic Hardware Grant Program.

Read the full article in IEEE Explore >>

Read more >>  

Categories
Misc

Object Detection Dependencies take hours to install

I’m kind of a noob at tensorflow, built a few classifieds but that’s it. I’m now trying to use the tensorflow object detection API, but the command to install the dependencies takes hours to install everything.

This would be fine if it weren’t for the fact that I’m in a research setting that requires me to run in a docker container. I could create a custom docker image and solve it that way, but does anyone know how to make this process go faster?

submitted by /u/prinse4515
[visit reddit] [comments]

Categories
Misc

QuickDraw – an online game developed by Google, combined with AirGesture – a simple gesture recognition application

QuickDraw - an online game developed by Google, combined with AirGesture - a simple gesture recognition application submitted by /u/1991viet
[visit reddit] [comments]
Categories
Offsites

Reducing the Computational Cost of Deep Reinforcement Learning Research

It is widely accepted that the enormous growth of deep reinforcement learning research, which combines traditional reinforcement learning with deep neural networks, began with the publication of the seminal DQN algorithm. This paper demonstrated the potential of this combination, showing that it could produce agents that could play a number of Atari 2600 games very effectively. Since then, there have been several approaches that have built on and improved the original DQN. The popular Rainbow algorithm combined a number of these recent advances to achieve state-of-the-art performance on the ALE benchmark. This advance, however, came at a very high computational cost, which has the unfortunate side effect of widening the gap between those with ample access to computational resources and those without.

In “Revisiting Rainbow: Promoting more Insightful and Inclusive Deep Reinforcement Learning Research”, to be presented at ICML 2021, we revisit this algorithm on a set of small- and medium-sized tasks. We first discuss the computational cost associated with the Rainbow algorithm. We explore how the same conclusions regarding the benefits of combining the various algorithmic components can be reached with smaller-scale experiments, and further generalize that idea to how research done on a smaller computational budget can provide valuable scientific insights.

The Cost of Rainbow
A major reason for the computational cost of Rainbow is that the standards in academic publishing often require evaluating new algorithms on large benchmarks like ALE, which consists of 57 Atari 2600 games that reinforcement learning agents may learn to play. For a typical game, it takes roughly five days to train a model using a Tesla P100 GPU. Furthermore, if one wants to establish meaningful confidence bounds, it is common to perform at least five independent runs. Thus, to train Rainbow on the full suite of 57 games required around 34,200 GPU hours (or 1425 days) in order to provide convincing empirical performance statistics. In other words, such experiments are only feasible if one is able to train on multiple GPUs in parallel, which can be prohibitive for smaller research groups.

Revisiting Rainbow
As in the original Rainbow paper, we evaluate the effect of adding the following components to the original DQN algorithm: double Q-learning, prioritized experience replay, dueling networks, multi-step learning, distributional RL, and noisy nets.

We evaluate on a set of four classic control environments, which can be fully trained in 10-20 minutes (compared to five days for ALE games):

Upper left: In CartPole, the task is to balance a pole on a cart that the agent can move left and right. Upper right: In Acrobot, there are two arms and two joints, where the agent applies force to the joint between the two arms in order to raise the lower arm above a threshold. Lower left: In LunarLander, the agent is meant to land the spaceship between the two flags. Lower right: In MountainCar, the agent must build up momentum between two hills to drive to the top of the rightmost hill.

We investigated the effect of both independently adding each of the components to DQN, as well as removing each from the full Rainbow algorithm. As in the original Rainbow paper, we find that, in aggregate, the addition of each of these algorithms does improve learning over the base DQN. However, we also found some important differences, such as the fact that distributional RL — commonly thought to be a positive addition on its own — does not always yield improvements on its own. Indeed, in contrast to the ALE results in the Rainbow paper, in the classic control environments, distributional RL only yields an improvement when combined with another component.

Each plot shows the training progress when adding the various components to DQN. The x-axis is training steps,the y-axis is performance (higher is better).
Each plot shows the training progress when removing the various components from Rainbow. The x-axis is training steps,the y-axis is performance (higher is better).

We also re-ran the Rainbow experiments on the MinAtar environment, which consists of a set of five miniaturized Atari games, and found qualitatively similar results. The MinAtar games are roughly 10 times faster to train than the regular Atari 2600 games on which the original Rainbow algorithm was evaluated, but still share some interesting aspects, such as game dynamics and having pixel-based inputs to the agent. As such, they provide a challenging mid-level environment, in between the classic control and the full Atari 2600 games.

When viewed in aggregate, we find our results to be consistent with those of the original Rainbow paper — the impact resulting from each algorithmic component can vary from environment to environment. If we were to suggest a single agent that balances the tradeoffs of the different algorithmic components, our version of Rainbow would likely be consistent with the original, in that combining all components produces a better overall agent. However, there are important details in the variations of the different algorithmic components that merit a more thorough investigation.

Beyond the Rainbow
When DQN was introduced, it made use of the Huber loss and the RMSProp Optimizer. It has been common practice for researchers to use these same choices when building on DQN, as most of their effort is spent on other algorithmic design decisions. In the spirit of reassessing these assumptions, we revisited the loss function and optimizer used by DQN on a lower-cost, small-scale classic control and MinAtar environments. We ran some initial experiments using the Adam optimizer, which has lately been the most popular optimizer choice, combined with a simpler loss function, the mean-squared error loss (MSE). Since the selection of optimizer and loss function is often overlooked when developing a new algorithm, we were surprised to see that we observed a dramatic improvement on all the classic control and MinAtar environments.

We thus decided to evaluate the different ways of combining the two optimizers (RMSProp and Adam) with the two losses (Huber and MSE) on the full ALE suite (60 Atari 2600 games). We found that Adam+MSE is a superior combination than RMSProp+Huber.

Measuring the improvement Adam+MSE gives over the default DQN settings (RMSProp + Huber); higher is better.

Additionally, when comparing the various optimizer-loss combinations, we find that when using RMSProp, the Huber loss tends to perform better than MSE (illustrated by the gap between the solid and dotted orange lines).

Normalized scores aggregated over all 60 Atari 2600 games, comparing the different optimizer-loss combinations.

Conclusion
On a limited computational budget we were able to reproduce, at a high-level, the findings of the Rainbow paper and uncover new and interesting phenomena. Evidently it is much easier to revisit something than to discover it in the first place. Our intent with this work, however, was to argue for the relevance and significance of empirical research on small- and medium-scale environments. We believe that these less computationally intensive environments lend themselves well to a more critical and thorough analysis of the performance, behaviors, and intricacies of new algorithms.

We are by no means calling for less emphasis to be placed on large-scale benchmarks. We are simply urging researchers to consider smaller-scale environments as a valuable tool in their investigations, and reviewers to avoid dismissing empirical work that focuses on smaller-scale environments. By doing so, in addition to reducing the environmental impact of our experiments, we will get both a clearer picture of the research landscape and reduce the barriers for researchers from diverse and often underresourced communities, which can only help make our community and scientific advances stronger.

Acknowledgments
Thank you to Johan, the first author of this paper, for his hard work and persistence in seeing this through! We would also like to thank Marlos C. Machado, Sara Hooker, Matthieu Geist, Nino Vieillard, Hado van Hasselt, Eleni Triantafillou, and Brian Tanner for their insightful comments on this work.

Categories
Misc

Metropolis Spotlight: Viisights Uses AI to Understand Behavior and Predict What Might Happen Next

Developers can use the new viisights wise application powered by GPU-accelerated AI technologies to help organizations and municipalities worldwide avoid operational hazards and risk in their most valuable spaces.

Developers can use the new viisights wise application powered by GPU-accelerated AI technologies to help organizations and municipalities worldwide avoid operational hazards and risk in their most valuable spaces. 

With conventional imaging and analytics solutions, AI vision is used to detect the presence of people and objects and their basic attributes to better understand how a physical space is being used. For example, is there a pedestrian, cyclist, or vehicle on a roadway? While understanding what’s happening on a road or in a retail store is important, predicting what might happen next is critical to taking this vision to the next level and building even smarter spaces. 

viisights — an NVIDIA Metropolis partner — helps cities and enterprises make better decisions faster by offering an AI-enabled platform that analyzes complex behavior within video content. viisights uses the NVIDIA Deepstream SDK to accelerate their development workflow for its video analytics pipeline, cutting development time by half. DeepStream is a path to developing highly optimized video analysis pipelines including video decoding, multi-stream processing and accelerated image transformations in real-time, critical for high performance video analytics applications. For deployment, viisights uses NVIDIA TensorRT to further accelerate their application’s inference throughput at the edge with INT8 calibration. 

The viisights Wise application recognizes and classifies over 200 different types of objects and dozens of different types of actions and events. The ability to recognize that a person is running, falling, or throwing an object provides a far richer and more holistic understanding of how spaces are being used. By using viisights Wise, operations teams can gain deeper insights and identify behavioral patterns that can predict what might happen next. viisights Wise can process over 20 video streams on a single GPU in near real-time.

viisights technology protects public privacy, as it only analyzes general behavior patterns of individuals and groups and does not identify specifics like faces or license plates. Because the application only analyzes behavior patterns, viisights technology is incredibly easy to use, further increasing ROI.

Real-world situations and use cases for viisights Wise include: 

Hazard Detection: Detect operational hazards in real-time. These include large objects that might block human or vehicle mobility in both outdoor and indoor surroundings such as streets, sidewalks, production floors, and operational areas in transportation hubs. It can even detect environmental hazards like smoke or fire in outdoor areas where smoke detectors cannot be installed.

Traffic Management: Cities can make informed decisions on how to best modify roadways, traffic patterns, public transit infrastructure, and more. Real-time data also lets city traffic agencies take immediate action to mitigate traffic congestion and address other issues such as vehicles colliding, occupying restricted areas, driving the wrong direction, and blocking roads or junctions.

Crowd Insights: Large groups of people can behave in a wide variety of ways. viisights Wise can recognize the action of a crowd and determine whether it’s moving, running, gathering, or dispersing. This real-time detection of escalated behavior can help officials predict how quickly an event is growing and decide if there’s a reason for concern or a need to take action. In addition to analyzing behavior, viisights can identify three classifications of human groupings—“groups”, “gatherings”, and “crowds”—based on a user-specified size for each category. These user-configurable classifications reduce false alarms by configuring grouping alerts to suit the context. For example, a small group of six to eight people will be categorized as ordinary in the context of a playground but as a cause for concern in an airport setting. 

Group Density Determination: Determine group density (classified as “low”, “medium” or “high”) and count people crossing a line or getting into and out from a zone. Cities can use this insight to enforce social distancing mandates during public health emergencies and pandemics. Retail stores or entertainment facilities like stadiums or cinemas can also use the technology to estimate queue wait times and determine spacing between people in queues to preserve public health. This helps businesses, cities, and authorities assess staff allocation and work more efficiently. Counting people can also be used in operational scenarios, such as at train stations, where train operators can improve train line scheduling in real time based on the number of people entering and exiting the trains.

Sense Body Movements: Their AI enabled video analytics solution notices if people are falling or lying on the floor. Implications include medical distress, accidents, or even violence. Real-time behavioral recognition analytics can alert operations teams where every second of latency matters.

Abnormal Behavior: Identify actions that are a potential cause for operational team attention such as when running is being observed where walking is normal (ex. department store aisles, shopping malls, etc.) or climbing or jumping over fences. It can also understand contextual loitering and detect if individuals are spending excessive amounts of time near ATMs, parks, building lobbies, etc., or are moving irregularly from object to object (i.e., from car door to car door in a parking garage). viisights behavioral analytics apply the intelligence and power of AI-driven contextual understanding to detect and identify these incidents as they unfold so they can be addressed accordingly and issue alerts in real time. 

As one example, viisights is helping to manage operations in the city of Eilat, one of Israel’s largest tourist destinations. Their application has helped identify and manage events in densely populated entertainment areas. The application has helped in maintaining transportation safety. In one incident, a motorcycle was detected driving in parks where only bicycles are allowed and the situation was quickly handled and safety maintained. In another example, the city of Ashdod is using viisights to manage traffic efficiently. viisights traffic management capabilities monitor several dozen intersections in real-time for congestion and accidents. The data collected by viisights enables a quick response to ease traffic jams and helps improve traffic light scheduling at intersections. The value and impact of behavior analytics are significant and incredibly unique for many industries.

Want to learn more about computer vision and video analytics? Click here to read more Metropolis Spotlight posts.