Categories
Offsites

From Vision to Language: Semi-supervised Learning in Action…at Scale

Supervised learning, the machine learning task of training predictive models using data points with known outcomes (i.e., labeled data), is generally the preferred approach in industry because of its simplicity. However, supervised learning requires accurately labeled data, the collection of which is often labor intensive. In addition, as model efficiency improves with better architectures, algorithms, and hardware (GPUs / TPUs), training large models to achieve better quality becomes more accessible, which, in turn, requires even more labeled data for continued progress.

To mitigate such data acquisition challenges, semi-supervised learning, a machine learning paradigm that combines a small amount of labeled data with a large amount of unlabeled data, has recently seen success with methods such as UDA, SimCLR, and many others. In our previous work, we demonstrated for the first time that a semi-supervised learning approach, Noisy Student, can achieve state-of-the-art performance on ImageNet, a large-scale academic benchmark for image classification, by utilizing many more unlabeled examples.

Inspired by these results, today we are excited to present semi-supervised distillation (SSD), a simplified version of Noisy Student, and demonstrate its successful application to the language domain. We apply SSD to language understanding within the context of Google Search, resulting in high performance gains. This is the first successful instance of semi-supervised learning applied at such a large scale and demonstrates the potential impact of such approaches for production-scale systems.

Noisy Student Training
Prior to our development of Noisy Student, there was a large body of research into semi-supervised learning. In spite of this extensive research, however, such systems typically worked well only in the low-data regime, e.g., CIFAR, SVHN, and 10% ImageNet. When labeled data were abundant, such models were unable to compete with fully supervised learning systems, which prevented semi-supervised approaches from being applied to important applications in production, such as search engines and self-driving cars. This shortcoming motivated our development of Noisy Student Training, a semi-supervised learning approach that worked well in the high-data regime, and at the time achieved state-of-the-art accuracy on ImageNet using 130M additional unlabeled images.

Noisy Student Training has 4 simple steps:

  1. Train a classifier (the teacher) on labeled data.
  2. Use the teacher to infer pseudo-labels on a much larger unlabeled dataset.
  3. Train a larger classifier (the student) on the combined labeled and pseudo-labeled data, while also adding noise (noisy student).
  4. (Optional) Returning to step 2, the student may be used as the new teacher.
An illustration of Noisy Student Training through four simple steps. We use two types of noise: model noise (Dropout, Stochastic Depth) and input noise (data augmentation, such as RandAugment).
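
In code, the loop above can be sketched roughly as follows (illustrative Python pseudocode, not the released implementation; train, predict, and make_larger_model are hypothetical helpers, and the input and model noise are assumed to be applied inside train):

# Illustrative sketch of Noisy Student Training; helper functions are hypothetical.
def noisy_student(labeled_data, unlabeled_images, base_model, iterations=3):
    # Step 1: train the initial teacher on labeled data.
    teacher = train(base_model, labeled_data, noise=False)
    for _ in range(iterations):
        # Step 2: the teacher infers pseudo-labels on the much larger unlabeled set.
        pseudo_labeled = [(x, predict(teacher, x)) for x in unlabeled_images]
        # Step 3: train an equal-or-larger student on labeled + pseudo-labeled data,
        # with input noise (e.g., RandAugment) and model noise (dropout, stochastic depth).
        student = train(make_larger_model(teacher), labeled_data + pseudo_labeled, noise=True)
        # Step 4 (optional): the student becomes the teacher for the next round.
        teacher = student
    return teacher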

One can view Noisy Student as a form of self-training, because the model generates pseudo-labels with which it retrains itself to improve performance. A surprising property of Noisy Student Training is that the trained models work extremely well on robustness test sets for which they were not optimized, including ImageNet-A, ImageNet-C, and ImageNet-P. We hypothesize that the noise added during training not only helps with the learning, but also makes the model more robust.

Examples of images that are classified incorrectly by the baseline model, but correctly by Noisy Student. Left: An unmodified image from ImageNet-A. Middle and Right: Images with noise added, selected from ImageNet-C. For more examples including ImageNet-P, please see the paper.

Connections to Knowledge Distillation
Noisy Student is similar to knowledge distillation, which is a process of transferring knowledge from a large model (i.e., the teacher) to a smaller model (the student). The goal of distillation is to improve speed in order to build a model that is fast to run in production without sacrificing much in quality compared to the teacher. The simplest setup for distillation involves a single teacher and uses the same data, but in practice, one can use multiple teachers or a separate dataset for the student.

Simple illustrations of Noisy Student and knowledge distillation.

Unlike Noisy Student, knowledge distillation does not add noise during training (e.g., data augmentation or model regularization) and typically involves a smaller student model, whereas Noisy Student uses a student that is equal to or larger than the teacher. In that sense, one can think of Noisy Student as a process of “knowledge expansion” rather than compression.
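
For reference, the standard soft-target distillation objective can be written in a few lines. The sketch below is generic PyTorch, not the specific loss used in this work: the student is trained to match the teacher's temperature-softened output distribution.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with the temperature, then match them with KL divergence.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # The temperature**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2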

Semi-Supervised Distillation
Another strategy for training production models is to apply Noisy Student training twice: first to get a larger teacher model T’ and then to derive a smaller student S. This approach produces a model that is better than either training with supervised learning or with Noisy Student training alone. Specifically, when applied to the vision domain for a family of EfficientNet models, ranging from EfficientNet-B0 with 5.3M parameters to EfficientNet-B7 with 66M parameters, this strategy achieves much better performance for each given model size (see Table 9 of the Noisy Student paper for more details).

Noisy Student training needs data augmentation, e.g., RandAugment (for vision) or SpecAugment (for speech), to work well. But in certain applications, e.g., natural language processing, such types of input noise are not readily available. For those applications, Noisy Student Training can be simplified to have no noise. In that case, the above two-stage process becomes a simpler method, which we call Semi-Supervised Distillation (SSD). First, the teacher model infers pseudo-labels on the unlabeled dataset from which we then train a new teacher model (T’) that is of equal-or-larger size than the original teacher model. This step, which is essentially self-training, is then followed by knowledge distillation to produce a smaller student model for production.

An illustration of Semi-Supervised Distillation (SSD), a 2-stage process that self-trains an equal-or-larger teacher (T’) before distilling to a student (S).
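
Putting the two stages together, SSD can be summarized with the rough sketch below (illustrative Python pseudocode; predict, train, make_equal_or_larger, and distill are hypothetical helpers standing in for a full pipeline):

# Illustrative sketch of Semi-Supervised Distillation (SSD); helpers are hypothetical.
def semi_supervised_distillation(labeled_data, unlabeled_data, teacher, student):
    # Stage 1 (self-training): pseudo-label the unlabeled data with the teacher, then
    # train an equal-or-larger teacher T' on labeled + pseudo-labeled data, without noise.
    pseudo_labeled = [(x, predict(teacher, x)) for x in unlabeled_data]
    t_prime = train(make_equal_or_larger(teacher), labeled_data + pseudo_labeled)
    # Stage 2 (knowledge distillation): distill T' into a smaller student S for production.
    return distill(teacher=t_prime, student=student, data=labeled_data + pseudo_labeled)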

Improving Search
Having succeeded in the vision domain, we saw the language understanding domain, and Google Search in particular, as a logical next step with broader user impact. Here, we focus on an important ranking component in Search, which builds on BERT to better understand language. This task turned out to be well-suited for SSD. Indeed, applying SSD to the ranking component to better understand the relevance of candidate search results to queries achieved one of the highest performance gains among top launches at Search in 2020. Below is an example of a query where the improved model demonstrates better language understanding.

With the implementation of SSD, Search is able to find documents that are more relevant to user queries.

Future Research & Challenges
We have presented a successful instance of semi-supervised distillation (SSD) in the production-scale setting of Search. We believe SSD will continue changing the landscape of machine learning usage in the industry from predominantly supervised learning to semi-supervised learning. While our results are promising, there is still much research needed on how to efficiently utilize unlabeled examples in the real world, which are often noisy, and apply them to various domains.

Acknowledgements
Zhenshuai Ding, Yanping Huang, Elizabeth Tucker, Hai Qian, and Steve He contributed immensely to this successful launch. The project would not have succeeded without contributions from members of both the Brain and Search teams: Shuyuan Zhang, Rohan Anil, Zhifeng Chen, Rigel Swavely, Chris Waterson, Avinash Atreya. Thanks to Qizhe Xie and Zihang Dai for feedback on the work. Also, thanks to Quoc Le, Yonghui Wu, Sundeep Tirumalareddy, Alexander Grushetsky, Pandu Nayak for their leadership support.

Categories
Misc

eCommerce and Open Ethernet: Criteo Clicks with SONiC

When you see a browser ad for a new restaurant, or the perfect gift for that hard-to-please family member, you probably aren’t thinking about the infrastructure used to deliver that ad. However, that infrastructure is what allows advertising companies like Criteo to provide these insights. The NVIDIA networking portfolio is essential to Criteo’s technology stack.

Criteo is an online advertising platform, the tier between digital advertisers and publishers. This business requires Criteo to solve problems at “web scale”: Criteo processes hundreds of billions of dollars in sales, driven by billions of ads a day across tens of thousands of servers, thousands of networking devices, and terabits of east-west traffic per second. The communication within and between Criteo’s 10 data centers (across three continents) is of paramount importance, with the network taking center stage.

Moving away from lock-in

Starting in 2014, Criteo embarked on an initiative to completely overhaul their networking strategy, modernize infrastructure, and reduce costs. By multisourcing hardware from different vendors, Criteo would be able to reduce costs, gain more flexibility in the procurement process, and become less dependent on individual vendor supply chains.

Criteo's networking journey including monolithic era, multi-vendor, and network agility.
Figure 1. Criteo’s journey to change their networking approach started in 2014 and continues today

With a new hardware approach in place, software came next. Criteo needed an OS compatible with their networking automation stack, consisting of in-house, hardware-agnostic tooling built mostly in Python. However, each new OS added to the mix would require unique updates to the rest of the stack to support it. Additionally, while vendor hardware was often affordable, the proprietary software attached to it ballooned the budget.

Picking one OS for all the platforms solved both problems. Enter SONiC: after attending the Open Compute Project (OCP) Global Summit, Criteo began to evaluate the NOS in early 2018. As an open-source OS conceived by Microsoft and the OCP to meet the needs of the hyperscalers, SONiC had the design and functionality to meet Criteo’s needs. Moreover, SONiC’s openness meshed perfectly with Criteo’s flexible hardware sourcing strategy and would fully unlock their networking stack.

Turning over a New Leaf with NVIDIA

Criteo views NVIDIA as more than just a vendor: the two companies partner on SONiC, with NVIDIA maintaining and developing SONiC’s feature set and Criteo helping provide input. This comes from the way NVIDIA offers SONiC to customers. Rather than building a proprietary branch from the community releases, NVIDIA supports the community release of the OS as “pure SONiC,” without any add-ons. As one of the leading contributors to the SONiC codebase, NVIDIA is uniquely positioned to influence SONiC’s roadmap and make Criteo’s visions a reality.

Additionally, with NVIDIA providing ASIC-to-Protocol (A2P) support, the network team can fully rely on NVIDIA to offload and triage networking issues at any level with minimal interruption. Criteo also benefits from the reach of NVIDIA in the space. NVIDIA develops the features and uploads them into the community main branch, maintaining a pure SONiC commitment and allowing Criteo freedom of choice.

Timeline of Criteo evaluation and achievements with SONiC.
Figure 2. Criteo was an early adopter of SONiC in 2018, working through early challenges on the way to full data center rollout

Summary

Measured against the original mission, Criteo’s 2014 project is on target: costs have been brought under control, deployment flexibility is growing, and the network team has picked up some handy DevOps + CI/CD skills. But the goal remains a work in progress; Criteo sees a day when all infrastructure, including their management network, is running SONiC, with truly one NOS to rule them all. So next time you see that killer ad, maybe you’ll also think about the network fabric that makes it possible.

Categories
Misc

NVIDIA’s Liila Torabi Talks the New Era of Robotics Through Isaac Sim

Robots are not just limited to the assembly line. At NVIDIA, Liila Torabi works on making the next generation of robotics possible. Torabi is the senior product manager for Isaac Sim, a robotics and AI simulation platform powered by NVIDIA Omniverse. Torabi spoke with NVIDIA AI Podcast host Noah Kravitz about the new era of robotics.

Categories
Misc

Tensorflow working in python2 not python3

Hey All, I’m facing a very weird issue with tensorflow on my remote GPU machine.

When I do import tensorflow in python2, it seems to work. However, it doesn’t work in python3.

How is this happening? Has someone faced this before and knows the fix to this?

submitted by /u/Conanobrain

Categories
Misc

Strange training results: why is a batch size of 1 more efficient than larger batch sizes, despite using a GPU/TPU?

Hey all!

I’m currently doing some tests in preparation for my first real bit of training. I’m using Google Cloud AI Platform to train, and am trying to find the optimal machine setup. It’s a work in progress, but here’s a table I’m putting together to get a sense of the efficiency of each setup. On the left you’ll see the accelerator type, ordered from least to most expensive. Here you’ll also find the number of accelerators used, the cost per hour, and the batch size. To the right are the average time it took to complete an entire training iteration and how long it took to complete the minimization step. You’ll notice that the values are almost identical for each setup; I’m using Google Research’s SEED RL, so I thought to record both values since I’m not sure exactly what happens between iterations. Turns out it’s not much. There’s also a calculation of the time it takes to complete a single “step” (aka, a single observation from a single environment), as well as the average cost per step.

So, the problem. I was under the assumption that batching with a GPU or TPU would increase the efficiency of the training. Instead, it turns out that a batch size of 1 is the most efficient both in time per step and cost per step. I’m still fairly new to ML so maybe it’s just a matter of me being uninformed, but this goes against everything I thought I knew about ML training. Since I’m using Google’s own SEED codebase I would assume that it’s not a problem with the code, but I can’t be sure about that.

Is this just a matter of me misunderstanding how training works, or am I right in thinking something is really off?

submitted by /u/EdvardDashD

Categories
Misc

What’s New in Optical Flow SDK 3.0

The NVIDIA Turing architecture introduced a new hardware functionality for computing optical flow between a pair of images with very high performance. NVIDIA Optical Flow SDK exposes the APIs to use this Optical Flow hardware (also referred to as NVOFA) to accelerate applications. We are excited to announce the availability of Optical Flow SDK 3.0 with the following new features: 

  1. DirectX 12 Optical Flow API 
  2. Forward-Backward Optical Flow via a single API 
  3. Global Flow vector 

DirectX 12 Optical Flow API

DirectX 12 is a low-level programming API from Microsoft that reduces driver overhead compared to its predecessor, DirectX 11, and gives developers more flexibility and fine-grained control. Developers can now take advantage of the low-level programming APIs in DirectX 12 and optimize their applications for better performance than earlier DirectX versions; at the same time, the client application must take care of resource management, synchronization, and other low-level details on its own. DirectX 12 adoption has grown rapidly among game titles and other graphics applications.

Optical Flow SDK 3.0 enables DirectX 12 applications to use the NVIDIA Optical Flow engine. The computed optical flow can be used to increase the frame rate in games and videos for a smoother experience, or in object tracking. To increase the frame rate, Frame Rate Up Conversion (FRUC) techniques insert interpolated frames between original frames. Interpolation algorithms use the flow between frame pair(s) to generate the intermediate frame.
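
As a rough illustration of the idea only (not NVIDIA's interpolation algorithm), an intermediate frame can be approximated by forward-warping one frame along a fraction of the computed flow; production interpolators additionally handle occlusions, holes, and colliding pixels:

import numpy as np

def interpolate_frame(frame0, flow, t=0.5):
    # Very simplified FRUC sketch: move each pixel of frame0 along a fraction t of its
    # flow vector (flow in pixel units). Unfilled destinations remain holes (zeros) here.
    h, w = flow.shape[:2]
    mid = np.zeros_like(frame0)
    ys, xs = np.mgrid[0:h, 0:w]
    xd = np.clip(np.round(xs + t * flow[..., 0]).astype(int), 0, w - 1)
    yd = np.clip(np.round(ys + t * flow[..., 1]).astype(int), 0, h - 1)
    mid[yd, xd] = frame0[ys, xs]
    return mid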

All generations of Optical Flow hardware support the DirectX 12 Optical Flow interface. The Optical Flow SDK package contains headers, sample applications that demonstrate usage, C++ wrapper classes that can be reused or modified as required, and documentation. All other components for accessing the Optical Flow hardware are included in the NVIDIA display driver. The DirectX 12 Optical Flow API is supported on Windows 20H1 and later operating systems.

Apart from the explicit synchronization, the DirectX 12 Optical Flow API is designed to be close to the other interfaces already available in the SDK (CUDA and DirectX 11). The DirectX 12 Optical Flow API consists of three core functions: initialization, flow estimation, and destruction.

// Initialization
typedef NV_OF_STATUS(NVOFAPI* PFNNVOFINIT) (NvOFHandle hOf, const NV_OF_INIT_PARAMS* initParams);

// Flow estimation (DirectX 12)
typedef NV_OF_STATUS(NVOFAPI* PFNNVOFEXECUTED3D12) (NvOFHandle hOf, const NV_OF_EXECUTE_INPUT_PARAMS_D3D12* executeInParams, NV_OF_EXECUTE_OUTPUT_PARAMS_D3D12* executeOutParams);

// Destruction
typedef NV_OF_STATUS(NVOFAPI* PFNNVOFDESTROY) (NvOFHandle hOf);

The initialization and destroy APIs are the same across all interfaces, but the Execute API differs between DirectX 12 and the other interfaces (DirectX 11 and CUDA). Even though most of the parameters passed to the Execute API in DirectX 12 are the same as in the other two interfaces, there are some functional differences. Synchronization in the DirectX 11 and CUDA interfaces is handled automatically by the OS runtime and driver. In DirectX 12, however, additional information about fences and fence values is required as input to the Execute API. These fence objects are used to synchronize CPU↔GPU and GPU↔GPU operations. For more details, please refer to the programming guide included with the Optical Flow SDK.

The buffer management API in DirectX 12 also needs fence objects for synchronization.

The optical flow output quality is the same across all interfaces, and performance in DirectX 12 should be very close to that of the other two interfaces.

Forward-Backward Optical Flow (FB flow)

No optical flow algorithm can give 100% accurate flow. The flow is typically distorted in occluded regions, and the cost provided by the NVOFA may not always represent the true confidence of the flow. One simple check usually employed is to compare the forward and backward flow: if the Euclidean distance between the forward flow and the backward flow exceeds a threshold, the flow can be marked as invalid.

To estimate flow in both directions, the client must call the Execute API twice: once with the input and reference images, and a second time with the input and reference images swapped. Calling the Optical Flow Execute API twice like this can result in suboptimal performance due to overheads such as context switching and thread switching. Optical Flow SDK 3.0 exposes a new API to generate flow in both directions in a single Execute call. This feature can be enabled by setting NV_OF_INIT_PARAMS::predDirection to NV_OF_PRED_DIRECTION_BOTH during initialization and providing the necessary buffers to receive the backward flow and/or cost in NV_OF_EXECUTE_OUTPUT_PARAMS/NV_OF_EXECUTE_OUTPUT_PARAMS_D3D12::bwdOutputBuffer and NV_OF_EXECUTE_OUTPUT_PARAMS/NV_OF_EXECUTE_OUTPUT_PARAMS_D3D12::bwdOutputCostBuffer.

Figure 1. Source Images (http://ultravideo.fi/#testsequences)
Figure 2. (b) Forward Flow, (c) Backward Flow, (d) Consistency Check. Black pixels in the image shows inconsistent flow, (e) Nearest Neighbor Infill 

Once the flow is generated in both directions, the client application can compare the flow vectors of the two directions, discard the inaccurate ones based on a suitable criterion (e.g., the Euclidean distance between the forward and backward flow vectors), and use hole-filling algorithms to fill in the discarded flow vectors.

Note that the output quality of FB flow could be different from unidirectional flow due to some optimizations.

Sample code that demonstrates FB flow API programming and consistency check:

// Initialization of API
NV_OF_INIT_PARAMS initParams = { 0 };
...
initParams.predDirection = NV_OF_PRED_DIRECTION_BOTH;
...
NvOFAPI->nvOFInit(hNvOF, &initParams);
// Estimation of forward and backward flow
NV_OF_EXECUTE_INPUT_PARAMS executeInParams = { 0 };
...
NV_OF_EXECUTE_OUTPUT_PARAMS executeOutParams = { 0 };
...
executeOutParams.outputBuffer = forwardFlowBuffer;
executeOutParams.outputCostBuffer = forwardFlowCostBuffer;
executeOutParams.bwdOutputBuffer = backwardFlowBuffer;
executeOutParams.bwdOutputCostBuffer = backwardFlowCostBuffer;

NvOFAPI->nvOFExecute(hNvOF, &executeInParams, &executeOutParams);


// Invalidating flow vectors that fail the forward-backward consistency check.
// GetFlow, EuclideanDistance and SetFlowInvalid are illustrative helpers (flow in pixel units).
for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
        auto fwd = GetFlow(forwardFlowBuffer, x, y);
        float x2 = x + fwd.x, y2 = y + fwd.y;
        if (x2 < 0 || x2 > width - 1 || y2 < 0 || y2 > height - 1 ||
            EuclideanDistance(fwd, GetFlow(backwardFlowBuffer, x2, y2)) > thresh) {
            SetFlowInvalid(forwardFlowBuffer, x, y);
        }
    }
}

Global Flow Estimation

Global flow in a video sequence or game is caused by camera panning motion. Global flow estimation is an important tool widely used in image segmentation, video stitching or motion-based video analysis applications.

The global flow vector can also be used heuristically to estimate background motion. Once background motion is estimated, it can be used to fill in flow vectors in occluded regions. It can also be used to handle collisions of warped pixels in interpolated frames.

Global flow is calculated from the forward flow vectors, based on the frequency of occurrence of the vectors and a few other heuristics.
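
The idea can be pictured with a simple sketch (only an illustration of the concept; the NVOFA uses its own internal heuristics): quantize the forward flow vectors and pick the most frequently occurring one as the global flow.

import numpy as np

def estimate_global_flow(flow, bin_size=1.0):
    # flow: array of shape (H, W, 2) with (x, y) flow components in pixel units.
    # Quantize flow vectors into bins and return the center of the most common bin.
    quantized = np.round(flow.reshape(-1, 2) / bin_size).astype(int)
    bins, counts = np.unique(quantized, axis=0, return_counts=True)
    return bins[np.argmax(counts)] * bin_size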

To enable generation of global flow, the initialization API needs to set the NV_OF_INIT_PARAMS::enableGlobalFlow flag, and the additional buffer NV_OF_EXECUTE_OUTPUT_PARAMS/NV_OF_EXECUTE_OUTPUT_PARAMS_D3D12::globalFlowBuffer must be provided to the Execute API.

References

  1. NVIDIA Optical Flow SDK
  2. Developer Blog: An Introduction to the NVIDIA Optical Flow SDK
  3. Developer Blog: Accelerate OpenCV: Optical Flow Algorithms with NVIDIA Turing GPUs
Categories
Misc

NVIDIA Inception Partners Won Veterans Affairs AI Tech Sprint Awards with Latest AI Technologies

Five NVIDIA Inception partners were named finalists at the 2020-2021 Artificial Intelligence Tech Sprint, a competition aimed at improving healthcare for veterans using the latest AI technology. Hosted by the Department of Veterans Affairs (VA), the sprint is designed to foster collaboration with industry and academic partners on AI-enabled tools that leverage federal data to address a need for veterans. See the official news release for more details about the competition. 

Participating teams gave presentations and demonstrations judged by panels of Veterans and other experts. A total of 44 teams from industry and universities participated, addressing a range of health care challenges such as chronic condition management, cancer screening, rehabilitation, patient experience, and more.

The majority of the solutions from NVIDIA Inception partners were powered by NVIDIA Clara Guardian, a smart hospital solution for patient monitoring, mental health, and patient care.

  • JumpStartCSR created ‘Holmz’, a data-source-agnostic explainable AI (XAI) that provides digital physical therapy to prevent and treat overexertion and repetitive stress injuries. Holmz can predict plantar fasciitis and identify its root causes two weeks in advance of occurrence with 97% accuracy, and one week in advance with 99% accuracy. It is also capable of predicting fatigue-related falls and identifying their root causes 15 minutes in advance of their occurrence with 99% accuracy. The trained solutions are 4 to 5x faster using TensorFlow with NVIDIA T4 GPUs on AWS. The team is piloting their XAI with the VA’s Physical Medicine and Rehabilitation group and expanding their digital physical therapy solution to include performance improvement and prediction for the US Army.
  • Ouva is an autonomous remote monitoring and ambient intelligence platform designed to improve patient safety and operational efficiency. The solution leverages cuDNN, TensorRT, and the Transfer Learning Toolkit in NVIDIA Clara Guardian and runs on an NVIDIA RTX 6000 GPU. During the course of the VA tech sprint, Ouva was able to predict sepsis infection 24 hours ahead of clinical diagnosis by looking at nine biomarkers from EMR data collected from 40,000 patients. By processing data from telehealth cameras with AI, Ouva allows nurses to keep an eye on more patients in remote or isolation conditions. The team has started deploying their remote care monitoring solution in Europe and piloting it in the US. Their near-term goal is to combine Ouva’s unique patient activity data with Electronic Medical Records (EMR) to predict more cases like sepsis with increased accuracy.
  • PATH Decision Support Software is a GPU-accelerated expert system that recommends drug regimens for patients with type 2 diabetes. The solution was developed on CUDA using NVIDIA GPUs on AWS and achieved a 20x speedup. The software’s recommendations have shown a 2.1-point reduction in HbA1c and a $770/patient/year reduction in unplanned diabetes-related Medicaid claims. The team is currently working on EMR integration and expanding the list of medicines it can evaluate.
  • Dialisa is a digital nutritional intervention platform that detects the onset and monitors the progression of chronic kidney disease (CKD), delivering digital nutritional intervention to delay the progress of the disease. The team uses TensorFlow and CUDA optimized on NVIDIA GPUs on AWS, achieving an accuracy of 93.2% in detecting the onset of CKD using a generic longitudinal dataset provided by the National Artificial Intelligence Institute (NAII) at the VA. Dialisa looks to demonstrate the feasibility of their algorithms and remote monitoring platform in the veterans community, and welcomes additional clinical partners to conduct a pilot study in the general population.
  • KELLS, an AI-powered dental diagnostic platform, used TensorFlow and PyTorch optimized on NVIDIA T4 GPUs on AWS P3 instances and achieved 3x higher training efficiency than the previously used AWS G4 instances. KELLS’ technology has been demonstrated to improve the accuracy of detecting common dental pathologies by up to 30% for average dentists, especially for detecting early signs of oral disease, enabling more timely and effective preventative care. The team continues to expand their platform to support more clinical findings across different data modalities and to increase the quality and accessibility of dental care for patients.

“We were overwhelmed with the overall quality of proposals in this very competitive cycle, it’s a great tribute to our mission to serve our veterans,” said Artificial Intelligence Tech Sprint lead Rafael Fricks. “Very sophisticated AI capabilities are more accessible than ever; a few years ago these proposals wouldn’t have been possible outside the realm of very specialized high performance computing.” 

NVIDIA looks forward to providing continued support to these winning Inception Partners in the coming Pilot Implementation program phase, and to the contributions their AI solutions will make to the important VA/VHA healthcare mission serving our Nation’s veterans. 

Build your patient care solutions on Clara Guardian >

Categories
Misc

Fighting Disease-Carrying Mosquitoes with Neural Networks

Targeting areas populated with disease-carrying mosquitoes just got easier thanks to a new study. The research, recently published in IEEE Explore, uses deep learning to recognize tiger mosquitoes from images taken by citizen scientists with near perfect accuracy.

“Identifying the mosquitoes is fundamental, as the diseases they transmit continue to be a major public health issue,” said lead author Gereziher Adhane.

The study, from researchers in the Scene Understanding and Artificial Intelligence (SUNAI) research group of the Universitat Oberta de Catalunya’s (UOC) Faculty of Computer Science, Multimedia and Telecommunications and of its eHealth Center, uses images from the Mosquito Alert app. Developed in Spain and currently expanding globally, the platform brings together citizens, entomologists, public health authorities, and mosquito control services to reduce mosquito-borne diseases.

Anyone in the world can upload geo-tagged images of mosquitoes to the app. Once submitted, three expert entomologists inspect and validate the images before they are added to the database, classified, and mapped. 

Travel and migration, along with climate change and urbanization, have broadened the range and habitat of mosquitoes. Quick identification of species such as the tiger mosquito, known to transmit dengue, Zika, chikungunya, and yellow fever, remains a key step in helping relevant authorities curb their spread.

A comparison of tiger mosquito and other mosquito images.
Sample of tiger [first row] and non-tiger [second row] mosquitoes from the Mosquito Alert data set. Credit: G. Adhane et al/IEEE Explore

“This type of analysis depends largely on human expertise and requires the collaboration of professionals, is typically time-consuming, and is not cost-effective because of the possible rapid propagation of invasive species,” said Adhane. “This is where neural networks can play a role as a practical solution for controlling the spread of mosquitoes.”

The research team developed a deep convolutional neural network (CNN) that distinguishes between mosquito species. Starting with a pre-trained model, they fine-tuned it using the hand-labeled Mosquito Alert dataset. Using NVIDIA GPUs and the cuDNN-accelerated PyTorch deep learning framework, the classification models were taught to pinpoint tiger mosquitoes based on identifiable morphological features such as white stripes on the legs, abdominal patches, head, and thorax shape.
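
A typical fine-tuning setup along these lines might look like the following sketch (hypothetical PyTorch code; the ResNet-50 backbone, two-class head, and hyperparameters are placeholders, not the exact architecture or settings reported in the paper):

import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(pretrained=True)        # start from a pre-trained backbone
model.fc = nn.Linear(model.fc.in_features, 2)   # tiger vs. non-tiger mosquito
model = model.cuda()                            # cuDNN-accelerated training on an NVIDIA GPU

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def train_one_epoch(loader):
    # loader yields batches of hand-labeled Mosquito Alert images and labels.
    model.train()
    for images, labels in loader:
        images, labels = images.cuda(), labels.cuda()
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()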

Deep learning models typically rely on millions of samples. However, using only 6,378 images of tiger and non-tiger mosquitoes from Mosquito Alert, the researchers were able to train the model to about 94% accuracy.

“The neural network we have developed can perform as well or nearly as well as a human expert and the algorithm is sufficiently powerful to process massive amounts of images,” said Adhane.

According to the researchers, as Mosquito Alert scales up, the study can be expanded to classify multiple species of mosquitoes and their breeding sites across the globe. 

“The model we have developed could be used in practical applications with small modifications to work with mobile apps. Using this trained network it is possible to make predictions about images of mosquitoes taken using smartphones efficiently and in real time,” Adhane said. 

The GPU used in the research was a donation provided by the NVIDIA Academic Hardware Grant Program.

Read the full article in IEEE Explore >>

Read more >>  

Categories
Misc

Object Detection Dependencies take hours to install

I’m kind of a noob at tensorflow, built a few classifiers but that’s it. I’m now trying to use the tensorflow object detection API, but the command to install the dependencies takes hours to install everything.

This would be fine if it weren’t for the fact that I’m in a research setting that requires me to run in a docker container. I could create a custom docker image and solve it that way, but does anyone know how to make this process go faster?

submitted by /u/prinse4515

Categories
Misc

QuickDraw – an online game developed by Google, combined with AirGesture – a simple gesture recognition application

submitted by /u/1991viet