Categories
Misc

Model was constructed with shape (None, 1061, 4) for input … but it was called on an input with incompatible shape (None, 4).

EDIT: SOLVED. Thank you all so much!

I’m building a neural network where my inputs are 2d arrays, each representing one day of data.

I have a container array that holds 7 days’ arrays, each of which has 1,061 4×1 arrays. That sounds very confusing to me so here’s a diagram:

container array [
    matrix 1 [ vector 1 [a, b, c, d] ... vector 1061 [e, f, g, h] ]
    ...
    matrix 7 [ vector 1 [i, j, k, l] ... vector 1061 [m, n, o, p] ]
]

In other words, the container’s shape is (7, 1061, 4).

That container array is what I pass to the fit method for “x”. And here’s how I construct the network:

input_shape = (1061, 4)

network = Sequential()
network.add(Input(shape=input_shape))
network.add(Dense(2**6, activation="relu"))
network.add(Dense(2**3, activation="relu"))
network.add(Dense(2, activation="linear"))
network.compile(
    loss="mean_squared_error",
    optimizer="adam",
)

The network compiles and trains, but I get the following warning while training:

WARNING:tensorflow:Model was constructed with shape (None, 1061, 4) for input KerasTensor(type_spec=TensorSpec(shape=(None, 1061, 4), dtype=tf.float32, name='input_1'), name='input_1', description="created by layer 'input_1'"), but it was called on an input with incompatible shape (None, 4).

I double-checked my inputs, and indeed there are 7 arrays of shape (1061, 4). What am I doing wrong here?
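For what it’s worth, one common way to reproduce this exact warning (not necessarily what happened here) is calling the model on a single day’s (1061, 4) array: Keras treats axis 0 as the batch axis, so the model is called on batches of shape (None, 4). A minimal sketch, assuming container is the (7, 1061, 4) array described above and np is NumPy:

one_day = container[0]                     # shape (1061, 4)
# network.predict(one_day)                 # triggers the warning: the 1,061 rows are read as a batch of 4-element samples
network.predict(one_day[np.newaxis, ...])  # shape (1, 1061, 4), matching the declared (None, 1061, 4) input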

Thank you in advance for the help!

submitted by /u/bens_scraper

Categories
Misc

Siege the Day as Stronghold Series Headlines GFN Thursday

It’s Thursday, which means it’s GFN Thursday — when GeForce NOW members can learn what new games and updates are streaming from the cloud. This GFN Thursday, we’re checking in on one of our favorite gaming franchises, the Stronghold series from Firefly Studios. We’re also sharing some sales Firefly is running on the Stronghold franchise.

The post Siege the Day as Stronghold Series Headlines GFN Thursday appeared first on The Official NVIDIA Blog.

Categories
Misc

NVIDIA’s Shalini De Mello Talks Self-Supervised AI, NeurIPS Successes

Shalini De Mello, a principal research scientist at NVIDIA who’s made her mark inventing computer vision technology that contributes to driver safety, finished 2020 with a bang — presenting two posters at the prestigious NeurIPS conference in December. A 10-year NVIDIA veteran, De Mello works on self-supervised and few-shot learning, 3D reconstruction, viewpoint estimation and …

The post NVIDIA’s Shalini De Mello Talks Self-Supervised AI, NeurIPS Successes appeared first on The Official NVIDIA Blog.

Categories
Offsites

Announcing the 2021 Research Scholar Program Recipients

In March 2020 we introduced the Research Scholar Program, an effort focused on developing collaborations with new professors and encouraging the formation of long-term relationships with the academic community. In November we opened the inaugural call for proposals for this program, which was received with enthusiastic interest from faculty who are working on cutting edge research across many research areas in computer science, including machine learning, human computer interaction, health research, systems and more.

Today, we are pleased to announce that in this first year of the program we have granted 77 awards, which included 86 principal investigators representing 15+ countries and over 50 universities. Of the 86 award recipients, 43% identify as an historically marginalized group within technology. Please see the full list of 2021 recipients on our web page, as well as in the list below.

We offer our congratulations to this year’s recipients, and look forward to seeing what they achieve!

Algorithms and Optimization
Alexandros Psomas, Purdue University
Auction Theory Beyond Independent, Quasi-Linear Bidders
Julian Shun, Massachusetts Institute of Technology
Scalable Parallel Subgraph Finding and Peeling Algorithms
Mary Wootters, Stanford University
The Role of Redundancy in Algorithm Design
Pravesh K. Kothari, Carnegie Mellon University
Efficient Algorithms for Robust Machine Learning
Sepehr Assadi, Rutgers University
Graph Clustering at Scale via Improved Massively Parallel Algorithms

Augmented Reality and Virtual Reality
Srinath Sridhar, Brown University
Perception and Generation of Interactive Objects

Geo
Miriam E. Marlier, University of California, Los Angeles
Mapping California’s Compound Climate Hazards in Google Earth Engine
Suining He, The University of Connecticut
Fairness-Aware and Cross-Modality Traffic Learning and Predictive Modeling for Urban Smart Mobility Systems

Human Computer Interaction
Arvind Satyanarayan, Massachusetts Institute of Technology
Generating Semantically Rich Natural Language Captions for Data Visualizations to Promote Accessibility
Dina EL-Zanfaly, Carnegie Mellon University
In-the-making: An intelligence mediated collaboration system for creative practices
Katharina Reinecke, University of Washington
Providing Science-Backed Answers to Health-related Questions in Google Search
Misha Sra, University of California, Santa Barbara
Hands-free Game Controller for Quadriplegic Individuals
Mohsen Mosleh, University of Exeter Business School
Effective Strategies to Debunk False Claims on Social Media: A large-scale digital field experiments approach
Tanushree Mitra, University of Washington
Supporting Scalable Value-Sensitive Fact-Checking through Human-AI Intelligence

Health Research
Catarina Barata, Instituto Superior Técnico, Universidade de Lisboa
DeepMutation – A CNN Model To Predict Genetic Mutations In Melanoma Patients
Emma Pierson, Cornell Tech, the Jacobs Institute, Technion-Israel Institute of Technology, and Cornell University
Using cell phone mobility data to reduce inequality and improve public health
Jasmine Jones, Berea College
Reachout: Co-Designing Social Connection Technologies for Isolated Young Adults
Mojtaba Golzan, University of Technology Sydney, Jack Phu, University of New South Wales
Autonomous Grading of Dynamic Blood Vessel Markers in the Eye using Deep Learning
Serena Yeung, Stanford University
Artificial Intelligence Analysis of Surgical Technique in the Operating Room

Machine Learning and Data Mining
Aravindan Vijayaraghavan, Northwestern University, Sivaraman Balakrishnan, Carnegie Mellon University
Principled Approaches for Learning with Test-time Robustness
Cho-Jui Hsieh, University of California, Los Angeles
Scalability and Tunability for Neural Network Optimizers
Golnoosh Farnadi, University of Montreal, HEC Montreal/MILA
Addressing Algorithmic Fairness in Decision-focused Deep Learning
Harrie Oosterhuis, Radboud University
Search and Recommendation Systems that Learn from Diverse User Preferences
Jimmy Ba, University of Toronto
Model-based Reinforcement Learning with Causal World Models
Nadav Cohen, Tel-Aviv University
A Dynamical Theory of Deep Learning
Nihar Shah, Carnegie Mellon University
Addressing Unfairness in Distributed Human Decisions
Nima Fazeli, University of Michigan
Semi-Implicit Methods for Deformable Object Manipulation
Qingyao Ai, University of Utah
Metric-agnostic Ranking Optimization
Stefanie Jegelka, Massachusetts Institute of Technology
Generalization of Graph Neural Networks under Distribution Shifts
Virginia Smith, Carnegie Mellon University
A Multi-Task Approach for Trustworthy Federated Learning

Mobile
Aruna Balasubramanian, State University of New York – Stony Brook
AccessWear: Ubiquitous Accessibility using Wearables
Tingjun Chen, Duke University
Machine Learning- and Optical-enabled Mobile Millimeter-Wave Networks

Machine Perception
Amir Patel, University of Cape Town
WildPose: 3D Animal Biomechanics in the Field using Multi-Sensor Data Fusion
Angjoo Kanazawa, University of California, Berkeley
Practical Volumetric Capture of People and Scenes
Emanuele Rodolà, Sapienza University of Rome
Fair Geometry: Toward Algorithmic Debiasing in Geometric Deep Learning
Minchen Wei, The Hong Kong Polytechnic University
Accurate Capture of Perceived Object Colors for Smart Phone Cameras
Mohsen Ali, Information Technology University of the Punjab, Pakistan, Izza Aftab, Information Technology University of the Punjab, Pakistan
Is Economics From Afar Domain Generalizable?
Vineeth N Balasubramanian, Indian Institute of Technology Hyderabad
Bridging Perspectives of Explainability and Adversarial Robustness
Xin Yu, University of Technology Sydney, Linchao Zhu, University of Technology Sydney
Sign Language Translation in the Wild

Networking
Aurojit Panda, New York University
Bertha: Network APIs for the Programmable Network Era
Cristina Klippel Dominicini, Instituto Federal do Espirito Santo
Polynomial Key-based Architecture for Source Routing in Network Fabrics
Noa Zilberman, University of Oxford
Exposing Vulnerabilities in Programmable Network Devices
Rachit Agarwal, Cornell University
Designing Datacenter Transport for Terabit Ethernet

Natural Language Processing
Danqi Chen, Princeton University
Improving Training and Inference Efficiency of NLP Models
Derry Tanti Wijaya, Boston University, Anietie Andy, University of Pennsylvania
Exploring the evolution of racial biases over time through framing analysis
Eunsol Choi, University of Texas at Austin
Answering Information Seeking Questions In The Wild
Kai-Wei Chang, University of California, Los Angeles
Certified Robustness against language differences in Cross-Lingual Transfer
Mohohlo Samuel Tsoeu, University of Cape Town
Corpora collection and complete natural language processing of isiXhosa, Sesotho and South African Sign languages
Natalia Diaz Rodriguez, University of Granada (Spain) + ENSTA, Institut Polytechnique Paris, Inria. Lorenzo Baraldi, University of Modena and Reggio Emilia
SignNet: Towards democratizing content accessibility for the deaf by aligning multi-modal sign representations

Other Research Areas
John Dickerson, University of Maryland – College Park, Nicholas Mattei, Tulane University
Fairness and Diversity in Graduate Admissions
Mor Nitzan, Hebrew University
Learning representations of tissue design principles from single-cell data
Nikolai Matni, University of Pennsylvania
Robust Learning for Safe Control

Privacy
Foteini Baldimtsi, George Mason University
Improved Single-Use Anonymous Credentials with Private Metadata Bit
Yu-Xiang Wang, University of California, Santa Barbara
Stronger, Better and More Accessible Differential Privacy with autodp

Quantum Computing
Ashok Ajoy, University of California, Berkeley
Accelerating NMR spectroscopy with a Quantum Computer
John Nichol, University of Rochester
Coherent spin-photon coupling
Jordi Tura i Brugués, Leiden University
RAGECLIQ – Randomness Generation with Certification via Limited Quantum Devices
Nathan Wiebe, University of Toronto
New Frameworks for Quantum Simulation and Machine Learning
Philipp Hauke, University of Trento
ProGauge: Protecting Gauge Symmetry in Quantum Hardware
Shruti Puri, Yale University
Surface Code Co-Design for Practical Fault-Tolerant Quantum Computing

Structured Data, Extraction, Semantic Graph, and Database Management
Abolfazl Asudeh, University Of Illinois, Chicago
An end-to-end system for detecting cherry-picked trendlines
Eugene Wu, Columbia University
Interactive training data debugging for ML analytics
Jingbo Shang, University of California, San Diego
Structuring Massive Text Corpora via Extremely Weak Supervision

Security
Chitchanok Chuengsatiansup, The University of Adelaide, Markus Wagner, The University of Adelaide
Automatic Post-Quantum Cryptographic Code Generation and Optimization
Elette Boyle, IDC Herzliya, Israel
Cheaper Private Set Intersection via Advances in “Silent OT”
Joseph Bonneau, New York University
Zeroizing keys in secure messaging implementations
Yu Feng , University of California, Santa Barbara, Yuan Tian, University of Virginia
Exploit Generation Using Reinforcement Learning

Software Engineering and Programming Languages
Kelly Blincoe, University of Auckland
Towards more inclusive software engineering practices to retain women in software engineering
Fredrik Kjolstad, Stanford University
Sparse Tensor Algebra Compilation to Domain-Specific Architectures
Milos Gligoric, University of Texas at Austin
Adaptive Regression Test Selection
Sarah E. Chasins, University of California, Berkeley
If you break it, you fix it: Synthesizing program transformations so that library maintainers can make breaking changes

Systems
Adwait Jog, College of William & Mary
Enabling Efficient Sharing of Emerging GPUs
Heiner Litz, University of California, Santa Cruz
Software Prefetching Irregular Memory Access Patterns
Malte Schwarzkopf, Brown University
Privacy-Compliant Web Services by Construction
Mehdi Saligane, University of Michigan
Autonomous generation of Open Source Analog & Mixed Signal IC
Nathan Beckmann, Carnegie Mellon University
Making Data Access Faster and Cheaper with Smarter Flash Caches
Yanjing Li, University of Chicago
Resilient Accelerators for Deep Learning Training Tasks

Categories
Misc

Elevate Game Content Creation and Collaboration with NVIDIA Omniverse

With everyone shifting to a remote work environment, game development and professional visualization teams around the world need a solution for real-time collaboration and more efficient workflows.

To boost creativity and innovation, developers need access to powerful technology accelerated by GPUs and easy access to secure datasets, no matter where they’re working from. And as many developers concurrently work on a project, they need to be able to manage version control of a dataset to ensure everyone is working on the latest assets.

NVIDIA Omniverse addresses these challenges. It is an open, multi-GPU enabled platform that makes it easy to accelerate development workflows and collaborate in real time. 

The primary goal of Omniverse is to support universal interoperability across various applications and 3D ecosystems. Using Pixar’s Universal Scene Description and NVIDIA RTX technology, Omniverse allows people to easily work with leading 3D applications and collaborate simultaneously with colleagues and customers, wherever they may be.

USD is the foundation of Omniverse. The open-source 3D scene description, originally developed to simplify content creation and facilitate frictionless interchange of assets between disparate software tools, is easily extensible.
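As a rough illustration of what that interchange format looks like in practice, here is a minimal, generic USD sketch using Pixar’s pxr Python bindings (plain USD rather than any Omniverse-specific API; the file name and prim paths are arbitrary):

from pxr import Usd, UsdGeom

# Create a new USD stage and describe a tiny scene in it.
stage = Usd.Stage.CreateNew("hello_omniverse.usda")
world = UsdGeom.Xform.Define(stage, "/World")            # a transform prim at the root
sphere = UsdGeom.Sphere.Define(stage, "/World/Sphere")   # a sphere prim beneath it
sphere.GetRadiusAttr().Set(2.0)
stage.GetRootLayer().Save()

Any USD-aware tool, including the Omniverse Connectors described below, can open and extend a file like this.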

The Omniverse platform comprises multiple components designed to help developers connect 3D applications and transform workflows:

  • Move assets throughout your pipeline seamlessly. Omniverse Connect opens the portals that allow content creation tools to connect to the Omniverse platform. With Omniverse Connect, users can work in their favorite industry software applications. 
  • Manage and store assets. Omniverse Nucleus allows users to store, share, and collaborate on project data and provides the unique ability to collaborate live across multiple applications. Nucleus works on a local machine, on-premises, or in the cloud. 
  • Quickly build tools. Omniverse Kit is a powerful toolkit for developers to create new Omniverse apps and extensions.
  • Access new technology. Omniverse provides access to Omniverse Simulation technologies, including PhysX 5, Flow, and Blast, plus NVIDIA AI SDKs and apps such as Omniverse Audio2Face, Omniverse Deep Search, and many more. 

Learn more about NVIDIA Omniverse, which is currently in open beta. 

In addition to Omniverse, there are several other SDKs that enable developers to create richer, more lifelike content.

NVIDIA OptiX Ray Tracing Engine

OptiX provides a programmable, GPU-accelerated ray-tracing pipeline that is scalable across multiple NVIDIA GPU architectures. Developers can easily use this framework with other existing NVIDIA tools, and OptiX has already been successfully deployed in a broad range of commercial applications.

NanoVDB

NanoVDB accelerates OpenVDB, the industry standard for motion picture visual effects. It is fully optimized for high performance and quality in real time on NVIDIA GPUs and is completely compatible with OpenVDB structures, which allows for efficient creation and visualization.

Texture Tools Exporter

Texture Tools Exporter enables both the creation of highly compressed texture files, which saves memory in applications, and the processing of complex, high-quality images. It supports all modern compression algorithms, making it a seamless and versatile tool for developers.

At GTC, starting on Monday, April 12, there will be over 70 technical sessions that dive into NVIDIA Omniverse. Register for free and experience its impact on the future of game development.

And don’t forget to check out all the game development sessions at GTC.

Categories
Misc

Speedy Model Training With RAPIDS + Determined AI

Model developers no longer face a steep learning curve to accelerate model training. By utilizing two open-source software projects, Determined AI’s Deep Learning Training Platform and the RAPIDS accelerated data science toolkit, they can easily achieve up to 10x speedups in data preprocessing and train models at scale. 

Making GPUs accessible

As the field of deep learning advances, practitioners are increasingly expected to make a significant investment in GPUs, either on-prem or from the cloud. Hardware is only half the story behind the proliferation of AI, though. NVIDIA’s success in powering data science has as much to do with software as hardware: widespread GPU adoption would be very difficult without convenient software abstractions that make GPUs easy for model developers to use. RAPIDS is a software suite that bridges the gap from CUDA primitives to data-hungry analytics and machine learning use cases.

Similarly, Determined AI’s deep learning training platform frees the model developer from hassles: operational hassles they are guaranteed to hit in a cluster setting, and model development hassles as they move from toy prototype to scale.  On the operational side, the platform handles distributed systems concerns like training job orchestration, storage layer integration, centralized logging, and automatic fault tolerance for long-running jobs.  On the model development side, machine learning engineers only need to maintain one version of code from the model prototype phase to more advanced tasks like multi-GPU (and multi-node) distributed training and hyperparameter tuning.  Further, the platform handles the boilerplate engineering required to track workload dependencies, metrics, and checkpoints.

At their core, both Determined AI and RAPIDS make the GPU accessible to machine learning engineers via intuitive APIs: Determined as the platform for accelerating and tracking deep learning training workflows, and RAPIDS as the suite of libraries speeding up parts of those training workflows.

For the remainder of this post, we’ll examine a model development process in which RAPIDS accelerates training data set construction within a Determined cluster, at which point Determined handles scaled out, fault-tolerant model training and hyperparameter tuning.

The RAPIDS experience will look familiar to ML engineers who are accustomed to tackling data manipulation with pandas or NumPy, and model training with PyTorch or TensorFlow. (RAPIDS is not alone in offering familiar interfaces atop GPU acceleration: CuPy is NumPy-compatible, and OpenCV’s GPU module API interface is “kept similar with the CPU interface where possible.”)

Getting started

To use Determined and RAPIDS to accelerate model training, there are a few requirements to meet upfront. On the RAPIDS side, OS and CUDA version requirements are listed in the RAPIDS documentation. One is worth calling out explicitly: RAPIDS requires NVIDIA P100 or later generation GPUs, ruling out the NVIDIA K80 in AWS P2 instances.

After satisfying these prerequisites, making RAPIDS available to tasks running on Determined is simple. Because Determined supports custom Docker images for running training workloads, we can create an image that contains the appropriate version of RAPIDS installed via conda. This is as simple as specifying the RAPIDS dependency in a Conda environment file:

name: Rapids
channels:
 - rapidsai
 - nvidia
 - conda-forge
dependencies:
 - rapids=0.14

And updating the base Conda environment in your custom image Dockerfile:

FROM determinedai/environments:cuda-10.0-pytorch-1.4-tf-1.15-gpu-0.7.0 as base
COPY environment.yml /tmp/
RUN conda --version && \
   conda env update --name base --file /tmp/environment.yml && \
   conda clean --all --force-pkgs-dirs --yes
RUN eval "$(conda shell.bash hook)" && conda activate base

After building and pushing this image to a Docker repository, you can run experiments, notebooks, or shell sessions by configuring the environment image that these tasks should use.
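As a hedged sketch of that last step (the image tag below is hypothetical, standing in for whatever tag the image above was pushed under), the relevant excerpt of a Determined experiment configuration looks like this:

environment:
  image: my-registry/determined-rapids:0.14   # hypothetical tag for the custom RAPIDS image

The same environment.image setting can be supplied in notebook and shell task configurations.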

The model

To showcase the potency of integrating RAPIDS and Determined, we picked a tabular learning task that would typically benefit from nontrivial data preprocessing, based on the TabNet architecture and the pytorch-tabnet library implementing it. TabNet brings the power of deep learning to tabular data-driven use cases and offers some nice interpretability properties to boot.  One benchmark explored in the TabNet paper is the Rossman store sales prediction task of building a model to predict revenue across thousands of stores based on tabular data describing the stores, promotions, and nearby competitors.  Since Rossman dataset access requires signing off on an agreement, we train our model on generated data of a similar schema and scale so that users can more easily run this example.  All assets for this experiment are available on GitHub.
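For readers unfamiliar with the library, the standalone pytorch-tabnet interface (used here outside of Determined, with feature preprocessing omitted) looks roughly like the following sketch, where X_train, y_train, X_valid, and y_valid are NumPy arrays derived from the tabular data:

from pytorch_tabnet.tab_model import TabNetRegressor

model = TabNetRegressor()                        # TabNet with default architecture settings
model.fit(
    X_train, y_train.reshape(-1, 1),             # the regressor expects 2D targets
    eval_set=[(X_valid, y_valid.reshape(-1, 1))],
    max_epochs=50,
)
preds = model.predict(X_valid)

In the setup described below, the model is instead defined through Determined’s Trial interface so the platform can handle distributed training and hyperparameter tuning.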

Data prep with RAPIDS, training with Determined

With multiple CSVs to ingest and denormalize, the Rossman revenue prediction task is ripe for RAPIDS.  The high level flow to develop a revenue prediction model looks like this:

  • Read location and historical sales CSVs into cuDF DataFrames residing in GPU memory.
  • Join these data sets into a denormalized DataFrame. This GPU-accelerated join is handled by cuDF.
  • Construct a PyTorch Dataset from the denormalized DataFrame.
  • Train with Determined!

RAPIDS cuDF’s familiar pandas-esque interface makes data ingest and manipulation a breeze:

df_store = cudf.read_csv(STORE_CSV)
df_train = cudf.read_csv(TRAIN_CSV).join(df_store,
                                        how='left',
                                        on='store_id',
                                        rsuffix='store')
df_valid = cudf.read_csv(VAL_CSV).join(df_store,
                                      how='left',
                                      on='store_id',
                                      rsuffix='store')

We then use CuPy to get from a cuDF DataFrame to a PyTorch Dataset and DataLoader to expose via Determined’s Trial interface.
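As a rough sketch of that hand-off (the sales label column and the feature selection are placeholders for whatever the generated schema actually uses, and it assumes a cuDF version whose .values accessor returns a CuPy array):

import cupy as cp
import torch
from torch.utils.data import DataLoader, TensorDataset

label_col = "sales"                                    # placeholder label column
feature_cols = [c for c in df_train.columns if c != label_col]

# cuDF -> CuPy keeps the data on the GPU; torch.as_tensor consumes CuPy arrays
# through __cuda_array_interface__ without a round trip through host memory.
features = cp.asarray(df_train[feature_cols].values, dtype=cp.float32)
labels = cp.asarray(df_train[label_col].values, dtype=cp.float32)

train_dataset = TensorDataset(torch.as_tensor(features), torch.as_tensor(labels))
train_loader = DataLoader(train_dataset, batch_size=1024, shuffle=True)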

Given that RAPIDS cuDF is a drop-in replacement for pandas, it’s trivial to toggle between the two libraries and compare the performance of their analogous APIs. In this simplified case, cuDF showed a 10x speedup over pandas: a join that took a minute on the vCPU completed in only 6 seconds on a single NVIDIA V100 GPU.
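As an illustration of that toggle (a sketch, not the exact benchmark behind the numbers above), the same ingest-and-join step can be timed against either library simply by swapping the module:

import time

import cudf
import pandas as pd

def timed_ingest_and_join(lib):
    # Same pattern as the cuDF snippet above, parameterized over the library module.
    store = lib.read_csv(STORE_CSV)
    start = time.perf_counter()
    lib.read_csv(TRAIN_CSV).join(store, how="left", on="store_id", rsuffix="store")
    return time.perf_counter() - start

print(f"pandas: {timed_ingest_and_join(pd):.1f} s")
print(f"cuDF:   {timed_ingest_and_join(cudf):.1f} s")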

  Another option is to use DLPack as the intermediate format that both cuDF and PyTorch support, either directly, or using NVTabular’s Torch Dataloader which does the same under the covers.

Figure 1: Faster enterprise AI with RAPIDS and Determined.

On an absolute scale, this might not seem like a big deal: whether the overall training job takes 20 or 21 minutes doesn’t seem to matter much. However, given the iterative nature of deep learning model tuning, the time and cost savings quickly add up. For a hyperparameter tuning experiment training hundreds or thousands of models, on data larger than the couple of gigabytes used here, and perhaps with more complex data transformations, savings on the order of GPU-minutes per trained model can translate to savings on the order of GPU-days or GPU-weeks at scale, netting your organization hundreds or thousands of dollars in infrastructure cost.

Determined and the broader RAPIDS toolkit

The RAPIDS library suite goes far beyond the data frame manipulation we leveraged in this example. To name a couple:

  • RAPIDS cuML offers GPU-accelerated ML algorithms mirroring sklearn.
  • NVTabular, which sits atop RAPIDS, offers high-level abstractions for feature engineering and building recommenders.

If you’re using these libraries, you’ll soon be able to train on a Determined cluster and get the platform’s resource management, experiment tracking, and hyperparameter tuning capabilities. We’ve heard from our users that the need for these tools isn’t limited to deep learning, so we are pushing into the broader ML space and making Determined not only the platform for PyTorch and TensorFlow model development, but for any Python-based model development. Stay tuned, and in the meantime you can learn more about this development from our 2021 roadmap discussion during our most recent community meetup.

Get started

If you’d like to learn more about (and test drive!) RAPIDS and Determined, check out the RAPIDS quick start and Determined’s quick start documentation. We’d love to hear your feedback on the RAPIDS and Determined community Slack channels. Happy training!

Categories
Misc

NVIDIA Deepens Commitment to Streamlining Recommender Workflows with GTC Spring Sessions

Ensuring recommenders are meaningful, personalized, and relevant to a single customer is not easy. Scaling a personalized recommender experience to hundreds of thousands, or millions of customers, comes with unique challenges that data scientists and machine learning engineers tackle every day. Scaling challenges often provide obstacles to effective ETL, training, retraining, or deploying models into production.

To tackle these challenges, machine learning engineers and data scientists within the industry utilize a combination, or hybrid, of tools, techniques, and algorithms. NVIDIA is committed to helping streamline recommender workflows with Merlin, an open-source framework that is interoperable and designed to support machine learning engineers and data scientists with preprocessing, feature engineering, training, and inference. Merlin supports industry leaders who are tackling common recommender challenges as they provide relevant, impactful, and fresh recommenders at scale.

Here are a few key sessions from industry leaders in media, delivery-on-demand, and retail at GTC Spring 2021.

  • AI-First Social Media Feeds: A View From the Trenches
    Discusses efficient training of extremely large recommender models with billions of parameters distributed across multiple GPUs and workers, as well as the importance of continual model updates in near-real time to deal with the key challenge of concept drift. The session also covers how solutions in NVIDIA’s Merlin stack resolve key bottlenecks faced in general-purpose deep learning frameworks.

Registration is free; visit the GTC website for more information.

Categories
Misc

Completing a TensorFlow Android app

Hello everyone!

I have some questions on finishing the implementation of my TensorFlow application. I need advice on how to optimize my model.

Background

I have been working on an object detection Android app based on the one provided by TensorFlow. I have added Bluetooth capabilities and implemented my own standalone Simple Online and Realtime Tracking algorithm (just so I could understand the code better in case I have to tune things). I do not want to get into the specifics of my application of the Android app, but the simplest analogy is an Android app looking down at a conveyor belt. When the Android app sees a specific object on the conveyor belt at a certain location, it sends a Bluetooth signal for some mechanism to take action on the specific object at the certain location (this probably describes half the possible apps here haha).

My application has been tested and works successfully when using one of the default tflite models in a simulation environment. However, the objects I plan to track are not in the standard tflite models. Therefore I need to create my own custom model. This is the final step of my app development.

I have (with much pain) figured out how to create a model generation pipeline: TFRecords > train > convert to TFLite > test on the Android app. I have not studied machine learning, but I realize that with my technical/programming/math skills I can kind of brute force a basic model and then learn the theory in more detail once my prototype is working. I have spent a fair bit of time browsing TensorFlow’s GitHub issues to produce a model that can somewhat detect my objects, but not well enough, and it runs slower than the example tflite model (on my phone, inference time is now 150 ms instead of an average of 50 ms). I am now looking to decrease the inference time and increase the accuracy of my model.

My current model generation pipeline uses ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8 (as I couldn’t get ssd_mobilenet_v2_320x320_coco17_tpu-8 to work): it takes my TFRecords, trains on the data, converts the result to TFLite (with the tf.lite.Optimize.DEFAULT optimization flag), and finally attaches metadata. I plug this into the Android app and then test.
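For reference, the convert-to-TFLite step described above typically looks like the following sketch, assuming saved_model_dir points at the exported detection model:

import tensorflow as tf

# Convert the exported SavedModel to a TFLite flatbuffer with default optimizations.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)

Beyond tf.lite.Optimize.DEFAULT, supplying a representative dataset for full integer quantization is one common lever for shrinking a model and reducing on-device inference time, though the right trade-off depends on the detector and the hardware.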

My computer is slow, so I eventually plan on renting an EC2 instance, going through a bunch of parameters in ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8’s pipeline.config, generating a bunch of TFLite models, and rating their accuracy. As a final test step, I will test the models for speed on my phone. The fastest and most accurate combination will be the TFLite model of choice.

Questions

In ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8’s pipeline.config what parameters are good to vary to get a good parameter sweep?

What parameters are good to vary so that the resultant TFLite model is faster (a 5 MB TFLite model gives 50 ms inference time, while a 10 MB model gives 150 ms)?

What EC2 machine do you recommend using? I understand that Amazon has machine learning tools, but with the time I spent creating my model and generation pipeline, I am very hesitant to jump into additional exploratory work.

I’ll add the ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8’s pipeline.config file in the comments.

submitted by /u/tensorpipelinetest

Categories
Misc

What Will NVIDIA CEO Jensen Huang Cook Up This Time at NVIDIA GTC?

Don’t blink. Accelerated computing is moving innovation forward faster than ever. And there’s no way to get smarter, quicker, about how it’s changing your world than to tune in to NVIDIA CEO Jensen Huang’s GTC keynote Monday, April 12, starting at 8:30 a.m. PT. The keynote, delivered again from the kitchen in Huang’s home, will …

The post What Will NVIDIA CEO Jensen Huang Cook Up This Time at NVIDIA GTC? appeared first on The Official NVIDIA Blog.

Categories
Misc

GTC 21: 5 Data Center Networking and Ecosystem Sessions You Shouldn’t Miss!

As NVIDIA CEO Jensen Huang stated in last year’s GTC, “the data center is the new unit of computing.” The traditional way of using the server as the unit of computing is fading away quickly. More and more applications are moving to data centers that are located at the edge, in different availability zones, or in private enterprise clouds. Modern workloads such as AI/ML, edge computing, cloud-native microservices, and 5G services are becoming increasingly disaggregated, distributed, and data-hungry. These applications demand efficient, secure, and accelerated computing and data processing across all the layers of the application stack.

Computing accelerated by NVIDIA GPUs and data processing units (DPUs) is at the heart of modern data centers. DPUs are a game-changing new technology that accelerates GPU and CPU access to data while enabling software-defined, hardware-accelerated infrastructure. With DPUs, organizations can efficiently and effectively deploy networking, cyber security, and storage in virtualized as well as containerized environments.

We at NVIDIA are on a mission to bring the next generation data center vision to reality. Join us at NVIDIA GTC’21 (Apr 12-16, 2021) to witness the data center innovation we are pioneering.  Register for the top DPU sessions at GTC to learn how NVIDIA Networking solutions are powering the next generation data centers.

Palo Alto Networks and NVIDIA present  Accelerated 5G Security: DPU-Based Acceleration of Next-Generation Firewalls [S31671]

5G offers many new capabilities such as lower latency, higher reliability and throughput, agile service deployment through cloud-native architectures, and greater device density. A new approach is needed to achieve L7 security at these rates with software-based firewalls. Integrating the Palo Alto Networks next-generation firewall with the NVIDIA DPU enables industry-leading high-performance security. NVIDIA’s BlueField-2 DPU provides a rich set of network offload engines designed to address the acceleration needs of security-focused network functions in today’s most demanding markets such as 5G and the cloud.

Speakers:

  • Sree Koratala, VP Product Management Mobility Security, Palo Alto Networks
  • John McDowall, Senior Distinguished Engineer, Palo Alto Networks
  • Ash Bhalgat, Senior Director, Cloud, Telco & Security Market Development, NVIDIA

China Mobile, Nokia and NVIDIA present Turbocharge Cloud-Native Applications with Virtual Data Plane Accelerated Networking [S31563]

Great progress has been made leveraging hardware to accelerate cloud networking for a wide range of cloud-based applications. This talk will examine how cloud networking for public cloud infrastructure as a service can be accelerated using NVIDIA’s new open-standards technology called virtual data plane acceleration (vDPA). In addition, this presentation will examine the early validation results and acceleration benefits of deploying NVIDIA ASAP2 vDPA technology in China Mobile’s BigCloud cloud service.

Speakers:

  • Sharko Cheng, Senior Network Architect, Cloud Networking Products Department, CMCC
  • Mark Iskra, Director, Nokia/Nuage Networks
  • Ash Bhalgat, Senior Director, Cloud, Telco & Security Market Development, NVIDIA

Red Hat and NVIDIA present  Implementing Virtual Network Offloading Using Open Source tools on BlueField-2 [S31380]

NVIDIA and Red Hat have been working together to provide an elegant and 100% open-source solution using the BlueField DPU for hardware offloading of the software-defined networking tasks in cloud-native environments. With BlueField DPUs, we can encrypt, encapsulate, switch, and route packets right on the DPU, effectively dedicating all the server’s processing capacity to running business applications. This talk will discuss typical use cases and demonstrate the performance advantages of using BlueField’s hardware offload capabilities with Red Hat Enterprise Linux and the Red Hat OpenShift container platform.

Speakers:

  • Rashid Khan, Director of Networking, Red Hat
  • Rony Efraim, Networking Software and Systems Architect, NVIDIA

NVIDIA DPU team presents  Program Data Center Infrastructure Acceleration with the Release of DOCA and the Latest DPU Software [S32205]

NVIDIA is releasing the first version of DOCA, a set of libraries, SDKs, and tools for programming the NVIDIA BlueField DPU, as well as the new version 3.6 of the DPU software. Together, these enable new infrastructure acceleration and management features in BlueField and simplify programming and application integration. DPU developers can offload and accelerate networking, virtualization, security, and storage features including VirtIO for NFV/VNFs, BlueField SNAP for elastic storage virtualization, regular expression matching for malware detection, and deep packet inspection to enable sophisticated routing, firewall, and load-balancing applications.

Speakers:

  • Ami Badani, VP Marketing, NVIDIA
  • Ariel Kit, Director of Product Marketing for Networking, NVIDIA

F5/NGINX and NVIDIA present  kTLS Hardware Offload Performance Benchmarking for NGINX Web Server [S31551]

Encrypted communication usage is steadily growing across most internet services. TLS is the leading security protocol implemented on top of TCP, and kTLS (kernel TLS) provides TLS operations support in the Linux kernel. It was introduced in kernel v4.13 as a software offload for user-space TLS libraries and was extended in kernel v4.18 with an infrastructure for performing hardware-accelerated encryption/decryption in SmartNICs and DPUs. This session will review the life cycle of a hardware-offloaded kTLS connection and the driver-hardware interaction that supports it, while demonstrating and analyzing the significant performance gain from offloading kTLS operations to the hardware, using NGINX as the target workload and NVIDIA’s mlx5e driver on top of a ConnectX-6 Dx SmartNIC.

Speakers:

  • Damian Curry, Business Development Technical Director, F5
  • Bar Tuaf, Software engineer, NVIDIA

Register today for free and start building your schedule. 

Once you are signed in, you can explore all GTC conference topics here. Topics include areas of interest such as data center networking and virtualization, HPC, deep learning, data science, and autonomous machines, or industries including healthcare, public sector, retail, and telecommunications.

See you at GTC’21!