Categories
Misc

Digital Renaissance: NVIDIA Neuralangelo Research Reconstructs 3D Scenes

Neuralangelo, a new AI model by NVIDIA Research for 3D reconstruction using neural networks, turns 2D video clips into detailed 3D structures — generating lifelike virtual replicas of buildings, sculptures and other real-world objects. Like Michelangelo sculpting stunning, life-like visions from blocks of marble, Neuralangelo generates 3D structures with intricate details and textures.

Categories
Misc

A New Age: ‘Age of Empires’ Series Joins GeForce NOW, Part of 20 Games Coming in June

The season of hot sun and longer days is here, so stay inside this summer with 20 games joining GeForce NOW in June. Or stream across devices by the pool, from grandma’s house or in the car — whichever way, GeForce NOW has you covered. Titles from the Age of Empires series are the next to join.

Categories
Misc

A New Frontier for 5G Network Security

Wireless technology has evolved rapidly, and 5G deployments have made good progress around the world. Until recently, wireless RAN was deployed using closed-box appliance solutions from traditional RAN vendors. This closed-box approach is not scalable and not effective for the 5G era: it underuses the infrastructure and does not deliver optimal RAN total cost of ownership (TCO).

As a result, the telecoms industry has come together to promote and build virtualized and cloud-native RAN solutions on commercial-off-the-shelf (COTS) hardware platforms with open and standard interfaces. This enables a larger ecosystem and flexible solutions on general-purpose server platforms, leveraging the virtues of virtualization and cloud-native technologies.

Such an approach has many benefits: lower cost, a larger ecosystem with more vendor choices, faster innovation cycles, automation, and scalability. One area of concern, however, is that the open RAN architecture may present a larger attack surface and introduce new security risks.

As a technology leader in accelerated computing platforms, NVIDIA has been working closely with the standards community (3GPP and O-RAN Alliance), partners, and customers to define and deliver a robust set of security capabilities for vRAN platforms.

Our vision is to drive faster innovation at the confluence of cloud, AI, and 5G to enable the applications of tomorrow. We will ensure that the platforms at the foundation of such innovations are built inherently with the utmost security principles in mind.

Tackling the security challenges of open RAN architecture

The introduction of new standard interfaces in open RAN architecture, along with the decoupling of hardware and software, expands the threat surface of RAN systems. Some examples:

  • New open interfaces for disaggregated RAN, such as Open Fronthaul (OFH), A1, and E2.
  • Near-real-time RIC and vendor-provided xApps that could exploit the RAN system.
  • Decoupling of hardware from software, which increases threats to the trust chain.
  • Management interfaces such as OFH M-plane, O1, and O2, which may introduce new vulnerabilities.

The strict latency requirements on the RAN must be considered when you implement security features, such as accelerated cryptographic operations, on the OFH interface. The increased reliance on open-source software also increases the open RAN dependence on secure development practices within open-source communities. Finally, the dramatic growth in the number of IoT devices requires all RAN deployments to protect themselves against the increasing likelihood of attacks by compromised devices.

CSRIC Council

To promote the security, reliability, and resiliency of the United States’ communications systems, the Federal Communications Commission (FCC) established the Communications Security, Reliability, and Interoperability (CSRIC) Council VIII.

The council published a detailed report on the challenges to developing secure open RAN technology, along with a series of recommendations for how the industry can overcome them. The report also recommends that the open RAN industry adopt the security requirements standardized by O-RAN Alliance Working Group 11. The next sections discuss these recommendations and requirements.

CSRIC Council architectural recommendations

The key set of security-centric architectural recommendations for the open RAN industry stemming from FCC CSRIC VIII’s report are as follows:

  • Digital signing of production software should apply to open RAN workloads, including network functions and applications (a minimal signing sketch follows this list).
  • Ethernet-based fronthaul networks should be segmented to isolate fronthaul traffic from other traffic flows.
  • Port-based authentication should be used to enable the authorization of network elements attached to the FH network.
  • Secure protocols providing mutual authentication should be used when deploying radio units (RUs) with Ethernet-based fronthaul in US production networks.
  • IEEE 802.1X port-based network access control should be implemented for all network elements that connect to the FH network deployed in hybrid mode.
  • Open RAN implementations should be based on the principles of zero trust architecture (ZTA).
  • Open RAN software should be deployed on secure server hardware. The credentials and keys used by the open RAN software should be encrypted and stored securely.
  • Open RAN architectures should implement defenses to prevent adversarial machine learning (AML) attacks. Industry should work within the O-RAN Alliance to drive security specifications that mitigate AML attacks.
  • MNOs (mobile network operators) should use secure boot based on a hardware root of trust (RoT), with credentials securely stored (for example, in a hardware security module (HSM)), and software signing to establish an end-to-end chain of trust.
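
To make the first recommendation concrete, here is a minimal sketch of digitally signing and verifying a software artifact with an Ed25519 key, using the Python `cryptography` package. Real production signing would use a PKI with keys held in an HSM; the artifact name here is illustrative only.

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# In production, the private key lives in an HSM; generated here for illustration.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

artifact = b"contents of ran-du-workload.tar.gz"  # illustrative workload image

signature = private_key.sign(artifact)  # shipped alongside the artifact

# At deployment time, the platform verifies the signature before running the workload:
try:
    public_key.verify(signature, artifact)
    print("signature valid: workload may be deployed")
except InvalidSignature:
    print("signature invalid: reject workload")
```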

O-RAN WG11 defines robust security requirements

The O-RAN Alliance Security Working Group (WG11) is responsible for security guidelines that span the entire O-RAN architecture. The security analysis and specifications are being developed in close coordination with other O-RAN working groups, as well as with regulators and standards development organizations. This work complements the security recommendations published by FCC CSRIC VIII, discussed earlier.

The fundamental security principles (SP) of open RAN systems, as per the O-RAN WG11 specification, are based on 16 pillars (Table 1).

Each principle below is listed with its key requirements and the NVIDIA features that support O-RAN WG11.

1. SP-AUTH: Mutual authentication
   Requirements: Detect fake base stations and unauthorized users or applications.
   NVIDIA features: Supports access lists and access control list (ACL)-based filtering, as well as per-port authentication.

2. SP-ACC: Access control
   Requirements: Forbid unauthorized access, anytime and anywhere.
   NVIDIA features: Supports access-list and ACL-based filtering.

3. SP-CRYPTO: Secure cryptographic, key management, and PKI (public key infrastructure)
   Requirements: Advanced cryptographic schemes and protocols; secure key management and PKI.
   NVIDIA features: Supports IPsec/TLS and cryptographic protocols for secure handshaking.

4. SP-TCOMM: Trusted communication
   Requirements: Integrity, confidentiality, availability, authenticity, and replay protection in transit.
   NVIDIA features: Supports timestamps and encapsulation to protect data in flight.

5. SP-SS: Secure storage
   Requirements: Integrity, confidentiality, and availability protection at rest.
   NVIDIA features: Supports data-at-rest encryption and isolation from the host.

6. SP-SB: Secure boot and self-configuration
   Requirements: Secure boot process; signature verification; self-configuration.
   NVIDIA features: Supports secure boot and root of trust for verification.

7. SP-UPDT: Secure update
   Requirements: Secure update management process for software updates or new software integration.
   NVIDIA features: Supports root of trust (RoT) and a local secured BMC.

8. SP-RECO: Recoverability and backup
   Requirements: Recover and reset under malicious attack (for example, denial of service, or DoS).
   NVIDIA features: Supports secure boot from a trusted source and isolation to contain DDoS.

9. SP-OPNS: Security management of risks in open-source components
   Requirements: Software bill of materials (SBOM); security analysis (audits, vulnerability scans, and so on).
   NVIDIA features: Supported by the partner ecosystem.

10. SP-ASSU: Security assurance
    Requirements: Risk assessment; secure code review; penetration testing.
    NVIDIA features: Supported by the partner ecosystem.

11. SP-PRV: Privacy
    Requirements: Data privacy, identity privacy, and personal information privacy of end users.
    NVIDIA features: Supports isolation and tagging, as well as encryption and encapsulation.

12. SP-SLC: Continuous security development, testing, logging, monitoring, and vulnerability handling
    Requirements: Continuous integration and continuous development (CI/CD); software security auditing; security event logging and analysis in real time.
    NVIDIA features: Supported by the partner ecosystem.

13. SP-ISO: Robust isolation
    Requirements: Intra-domain host isolation.
    NVIDIA features: Supports complete isolation from the host.

14. SP-PHY: Physical security
    Requirements: Physically secure environment for sensitive data storage, sensitive function execution, and execution of boot and update processes.
    NVIDIA features: Has hardened and ruggedized versions.

15. SP-CLD: Secure cloud computing and virtualization
    Requirements: Trust in the end-to-end stack, from hardware and firmware to virtualized software.
    NVIDIA features: Supported by the partner ecosystem.

16. SP-ROB: Robustness
    Requirements: Robustness of software and hardware resources.
    NVIDIA features: Supported as part of our software development, validation, QA, and release practices.

Table 1. Security principles of open RAN systems, key requirements, and how NVIDIA delivers these requirements

Security is built into NVIDIA platforms

A key NVIDIA goal is to provide robust security capabilities that span all aspects of security:

  • Zero-trust platform security for data in execution on the platform.
  • Network security to protect all data traversing across the air interface, fronthaul, and backhaul.
  • Storage security for all data stored on the platform, and protection of all management interfaces.
Workflow diagram shows accelerated RAN compute platforms with NVIDIA DPU and GPU, for fully virtualized, cloud-native, and end-to-end secure networks and applications.
Figure 1. Comprehensive E2E security for O-RAN-based virtualized RAN and the 5G core
Table 2. Delivering comprehensive E2E security for O-RAN-based virtualized RAN and the 5G core

Fronthaul gNB (DU/CU), CRAN:

  • MACsec
  • Port-based authentication and network access control
  • IPsec
  • HW root of trust
  • Secure boot
  • Infra/tenant isolation
  • DDoS protection
  • Firewall
  • Physical security countermeasures

Transport (midhaul/backhaul):

  • Overlay networks (VXLAN, EVPN)
  • Full transport security (IPsec, SSL/TLS)
  • Network ACLs
  • IDS/IPS

5G Edge/Core:

  • HW root of trust
  • Secure boot
  • Infra/tenant isolation
  • DDoS protection
  • NG-Firewall
  • Cloud SDN (OVN/OVS)
  • GTP-U tunnel classification, acceleration, and security

O&M:

  • Kubernetes security
  • Kubernetes API authentication
  • Pod security
  • Role-based access control (RBAC)
  • All software PKI-authenticated
  • Zero trust architecture (ZTA)

Multi-tenancy/AI:

  • MIG (Multi-Instance GPU): workload isolation, guaranteed latency, and performance
  • Predictive service assurance: AI to identify, capture, and act on threats
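
As one concrete example of the O&M controls listed above, role-based access control (RBAC) in Kubernetes restricts which identities can read or modify cluster resources. The following is a minimal sketch using the official Kubernetes Python client to create a namespaced Role and bind it to a service account; the namespace, role, and account names are illustrative only.

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
rbac = client.RbacAuthorizationV1Api()

# Role: read-only access to pods in the "ran-ops" namespace (names illustrative).
role = client.V1Role(
    metadata=client.V1ObjectMeta(name="pod-reader", namespace="ran-ops"),
    rules=[client.V1PolicyRule(
        api_groups=[""],           # "" is the core API group
        resources=["pods"],
        verbs=["get", "list", "watch"],
    )],
)
rbac.create_namespaced_role(namespace="ran-ops", body=role)

# Bind the role to one service account, granting it only those verbs.
binding = client.V1RoleBinding(
    metadata=client.V1ObjectMeta(name="pod-reader-binding", namespace="ran-ops"),
    subjects=[client.RbacV1Subject(  # named V1Subject in older client releases
        kind="ServiceAccount", name="monitoring-sa", namespace="ran-ops")],
    role_ref=client.V1RoleRef(
        api_group="rbac.authorization.k8s.io", kind="Role", name="pod-reader"),
)
rbac.create_namespaced_role_binding(namespace="ran-ops", body=binding)
```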

To achieve this, we rely on the industry’s best security practices. NVIDIA ConnectX SmartNICs and NVIDIA BlueField data processing units (DPUs) were developed with far-edge, near-edge, and cloud security in mind. They implement all the requirements necessary for the edge and cloud providers and security vendors to shape their solutions based on NVIDIA platform features.

The ConnectX SmartNIC includes engines that offload and support MACsec, IPsec (as well as other encryption-based solutions), TLS, rule-based filtering, and precision time-stamping, all at line rate.

The DPU adds more to the list with a completely isolated platform (a server within a server), which includes:

  • Secure BMC
  • Secure boot
  • Root of trust
  • Deep packet inspection (DPI)
  • Additional engines for custom crypto operations and data plane pipeline processing

These features enable you to deploy a network that is encrypted, secured, and completely isolated from the hosts. The DPU creates a secure cloud RAN architecture, connecting directly to the GPU and delivering post-screened packets without involving the host.

We have implemented O-RAN WG11 requirements in our hardware and software platforms, with NVIDIA Aerial 5G vRAN running on converged accelerators that combine the NVIDIA BlueField DPU and the NVIDIA A100 GPU. The NVIDIA Aerial software implements a full inline offload of RAN Layer 1, with the key security capabilities outlined earlier.

Summary

At NVIDIA, security is top of mind as we transform and virtualize the RAN with open and standards-based architecture. NVIDIA also supports key security capabilities for 5G transport network, 5G Core, orchestration and management layers, and edge AI applications.

When planning your 5G security for open RAN or vRAN, speak to our experts and consider using NVIDIA architectures.

For more information, see the following resources:

Categories
Misc

Protecting Sensitive Data and AI Models with Confidential Computing

Rapid digital transformation has led to an explosion of sensitive data being generated across the enterprise. That data has to be stored and processed in data centers on-premises, in the cloud, or at the edge. Examples of activities that generate sensitive and personally identifiable information (PII) include credit card transactions, medical imaging or other diagnostic tests, insurance claims, and loan applications.

This wealth of data presents an opportunity for enterprises to extract actionable insights, unlock new revenue streams, and improve the customer experience. Harnessing the power of AI enables a competitive edge in today’s data-driven business landscape.

However, the complex and evolving nature of global data protection and privacy laws can pose significant barriers to organizations seeking to derive value from AI:

  • General Data Protection Regulation (GDPR) in Europe
  • Health Insurance Portability and Accountability Act (HIPAA) and Gramm-Leach-Bliley Act (GLBA) in the United States

Customers in healthcare, financial services, and the public sector must adhere to a multitude of regulatory frameworks and also risk incurring severe financial losses associated with data breaches.

Three figures show the costs of data breaches: the global average total cost of a data breach is $4.35M; the average cost of a breach in healthcare is $10.10M; and the average cost of a breach in finance is $5.97M.
Figure 1. Cost of data breaches (Source: IBM Cost of Data Breaches report)

In addition to data, the AI models themselves are valuable intellectual property (IP). They are the result of significant resources invested by the model owners in building, training, and optimizing. When not adequately protected in use, AI models face the risk of exposing sensitive customer data, being manipulated, or being reverse-engineered. This can lead to incorrect results, loss of intellectual property, erosion of customer trust, and potential legal repercussions.

Data and AI IP are typically safeguarded through encryption and secure protocols when at rest (storage) or in transit over a network (transmission). But during use, such as when they are processed and executed, they become vulnerable to potential breaches due to unauthorized access or runtime attacks.
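
The distinction is easy to demonstrate. The sketch below, a minimal illustration using the Python `cryptography` package, encrypts a record for storage and transmission, then shows that it must be decrypted into ordinary memory before any computation can touch it; that in-use plaintext is exactly the exposure confidential computing addresses.

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()
f = Fernet(key)

record = b"patient_id=123, diagnosis=..."  # illustrative sensitive record

at_rest = f.encrypt(record)   # protected in storage
in_transit = at_rest          # protected on the wire (same ciphertext here)

# To actually process the data, it must be decrypted into ordinary memory:
in_use = f.decrypt(in_transit)
result = in_use.count(b",")   # any computation sees the plaintext

# At this point the plaintext is visible to anything that can read this
# process's memory (host OS, hypervisor, administrators) -- the gap that
# confidential computing closes by running the computation inside a TEE.
```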

Diagram shows the three states in which data exists: Data at rest or in storage is secure; data in transit or moving across a network is secure; data in use or while being processed is not secure.
Figure 2. Security gap in the end-to-end data lifecycle without confidential computing

Preventing unauthorized access and data breaches

Confidential computing addresses this gap of protecting data and applications in use by performing computations in a secure and isolated environment inside a computer’s processor, also known as a trusted execution environment (TEE).

The TEE acts like a locked box that safeguards the data and code within the processor from unauthorized access or tampering and proves that no one can view or manipulate it. This provides an added layer of security for organizations that must process sensitive data or IP.

Figure shows data in use or being processed as secure with confidential computing.
Figure 3. Confidential computing addresses the security gap

The TEE blocks access to the data and code from the hypervisor, the host OS, infrastructure owners such as cloud providers, and anyone with physical access to the servers. Confidential computing reduces the surface area of attacks from internal and external threats. It secures data and IP at the lowest layer of the computing stack and provides the technical assurance that the hardware and the firmware used for computing are trustworthy.
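
The practical consequence of this hardware-enforced trust is attestation: before releasing secrets to a workload, a client verifies a signed report proving the code is running inside a genuine TEE. The sketch below illustrates that flow; `tee_session` and its methods are hypothetical placeholders, not a real TEE API, and only the structure of the checks is meant to carry over.

```python
import secrets

def load_model_key() -> bytes:
    # Placeholder: in practice this comes from an HSM or key management service.
    return b"example-model-key"

def provision_secret(tee_session, expected_measurement: bytes) -> None:
    """Release a secret to a workload only after TEE attestation succeeds.

    `tee_session` and its methods are hypothetical stand-ins for a real
    attestation API (for example, one built on SEV-SNP or TDX report formats).
    """
    nonce = secrets.token_bytes(32)  # freshness: prevents replay of old reports
    report = tee_session.get_attestation_report(nonce)

    # 1. The report must be signed by a key rooted in the hardware vendor's CA.
    if not tee_session.verify_report_signature(report):
        raise RuntimeError("attestation report signature invalid")

    # 2. The measurement (hash of code/config loaded into the TEE) must match
    #    the build we expect, and our nonce must be echoed back.
    if report.measurement != expected_measurement or report.nonce != nonce:
        raise RuntimeError("TEE is not running the expected workload")

    # 3. Only now is the secret released, encrypted to the TEE's public key.
    tee_session.send_encrypted(report.public_key, load_model_key())
```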

Today’s CPU-only confidential computing solutions are not sufficient for AI workloads that demand acceleration for faster time-to-solution, better user experience, and real-time response times. Extending the TEE of CPUs to NVIDIA GPUs can significantly enhance the performance of confidential computing for AI, enabling faster and more efficient processing of sensitive data while maintaining strong security measures.

Secure AI on NVIDIA H100

Confidential computing is a built-in hardware-based security feature introduced in the NVIDIA H100 Tensor Core GPU that enables customers in regulated industries like healthcare, finance, and the public sector to protect the confidentiality and integrity of sensitive data and AI models in use.

Figure shows AI model, data, and application running on NVIDIA H100 GPU being secure with confidential computing.
Figure 4. Confidential computing on NVIDIA H100 GPUs

With security from the lowest level of the computing stack down to the GPU architecture itself, you can build and deploy AI applications using NVIDIA H100 GPUs on-premises, in the cloud, or at the edge. No unauthorized entities can view or modify the data and AI application during execution. This protects both sensitive customer data and AI intellectual property.

Diagram compares legacy VMs without confidential computing (full access, unencrypted transfers) and fully isolated VMs with confidential computing turned on (no read/write access, encrypted transfers).
Figure 5. Full isolation of virtual machines with confidential computing

For a quick overview of the confidential computing feature in NVIDIA H100, see the following video.

Video 1. What is NVIDIA Confidential Computing?

Unlocking secure AI: Use cases empowered by confidential computing on NVIDIA H100

Accelerated confidential computing with NVIDIA H100 GPUs offers enterprises a solution that is performant, versatile, scalable, and secure for AI workloads. It unlocks new possibilities to innovate with AI while maintaining security, privacy, and regulatory compliance.

  • Confidential AI training
  • Confidential AI inference
  • AI IP protection for ISVs and enterprises
  • Confidential federated learning

Confidential AI training

In industries like healthcare, financial services, and the public sector, the data used for AI model training is sensitive and regulated. This includes PII, personal health information (PHI), and confidential proprietary data, all of which must be protected from unauthorized internal or external access during the training process.

With confidential computing on NVIDIA H100 GPUs, you get the computational power required to accelerate the time to train and the technical assurance that the confidentiality and integrity of your data and AI models are protected.

Figure shows that confidential computing during training provides proof of regulatory compliance, protection from adversarial attacks or tampering, and unauthorized access or modifications from the platform provider.
Figure 6. Protection for training data and AI models being trained

For AI training workloads done on-premises within your data center, confidential computing can protect the training data and AI models from viewing or modification by malicious insiders or other unauthorized personnel. When you are training AI models in a hosted or shared infrastructure like the public cloud, access to the data and AI models is blocked from the host OS and hypervisor. This includes server administrators who typically have access to the physical servers managed by the platform provider.

Confidential AI inference

When trained, AI models are integrated within enterprise or end-user applications and deployed on production IT systems—on-premises, in the cloud, or at the edge—to infer things about new user data.

End-user inputs provided to the deployed AI model can often be private or confidential information, which must be protected for privacy or regulatory compliance reasons and to prevent any data leaks or breaches.

The AI models themselves are valuable IP developed by the owner of the AI-enabled products or services. They are at risk of being viewed, modified, or stolen during inference computations, resulting in incorrect results and loss of business value.

Figure shows AI inference workflow and how confidential computing protects the deployed AI model, application and user data provided as input
Figure 7. Protection for the deployed AI model, AI-enabled apps, and user input data

Deploying AI-enabled applications on NVIDIA H100 GPUs with confidential computing provides the technical assurance that both the customer input data and AI models are protected from being viewed or modified during inference. This provides an added layer of trust for end users to adopt and use the AI-enabled service and also assures enterprises that their valuable AI models are protected during use.

AI IP protection for ISVs and enterprises

Independent software vendors (ISVs) invest heavily in developing proprietary AI models for a variety of application-specific or industry-specific use cases. Examples include fraud detection and risk management in financial services or disease diagnosis and personalized treatment planning in healthcare.

ISVs must protect their IP from tampering or stealing when it is deployed in customer data centers on-premises, in remote locations at the edge, or within a customer’s public cloud tenancy. In addition, customers need the assurance that the data they provide as input to the ISV application cannot be viewed or tampered with during use.

Figure shows AI ISVs protecting their AI IP during deployment across multiple deployment options like on-premises, cloud, edge, and colocation.
Figure 8. Secure deployment of AI IP

Confidential computing on NVIDIA H100 GPUs enables ISVs to scale customer deployments from cloud to edge while protecting their valuable IP from unauthorized access or modifications, even from someone with physical access to the deployment infrastructure. ISVs can also provide customers with the technical assurance that the application can’t view or modify their data, increasing trust and reducing the risk for customers using the third-party ISV application.

Confidential federated learning

Building and improving AI models for use cases like fraud detection, medical imaging, and drug development requires diverse, carefully labeled datasets for training. This demands collaboration between multiple data owners without compromising the confidentiality and integrity of the individual data sources.

Confidential computing on NVIDIA H100 GPUs unlocks secure multi-party computing use cases like confidential federated learning. Federated learning enables multiple organizations to work together to train or evaluate AI models without having to share each group’s proprietary datasets.
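
To make the idea concrete, here is a minimal sketch of the federated averaging step at the heart of federated learning, written with NumPy. Each site trains locally and sends only model weights, never raw data, to the aggregator; names like `local_update` are illustrative, and a confidential-computing deployment would additionally run each step inside a TEE.

```python
import numpy as np

def local_update(weights: np.ndarray, X: np.ndarray, y: np.ndarray,
                 lr: float = 0.1, epochs: int = 5) -> np.ndarray:
    """One site's local training: a few gradient steps on a linear model.

    Only the updated weights leave the site; the raw data (X, y) never does.
    """
    w = weights.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

def federated_average(global_w: np.ndarray, site_data: list) -> np.ndarray:
    """Aggregator: average the sites' updates, weighted by dataset size."""
    updates = [local_update(global_w, X, y) for X, y in site_data]
    sizes = np.array([len(y) for _, y in site_data], dtype=float)
    return np.average(updates, axis=0, weights=sizes)

# Three hypothetical hospitals with private datasets of different sizes.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
sites = []
for n in (50, 80, 120):
    X = rng.normal(size=(n, 2))
    sites.append((X, X @ true_w + rng.normal(scale=0.1, size=n)))

w = np.zeros(2)
for _ in range(20):               # federated training rounds
    w = federated_average(w, sites)
print(w)  # approaches true_w without any site sharing its raw data
```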

Figure shows federated learning use case in healthcare being protected by confidential computing. Confidential computing protects the local AI model training at each participating hospital and protects the model aggregation on the central server.
Figure 9. An added layer of protection for multi-party AI workloads like federated learning

Confidential federated learning with NVIDIA H100 provides an added layer of security that ensures that both data and the local AI models are protected from unauthorized access at each participating site.

When deployed at the federated servers, it also protects the global AI model during aggregation and provides an additional layer of technical assurance that the aggregated model is protected from unauthorized access or modification.

This helps drive the advancement of medical research, expedite drug development, mitigate insurance fraud, and trace money laundering globally while maintaining security, privacy, and regulatory compliance for all parties involved.

NVIDIA platforms for accelerated confidential computing on-premises

Getting started with confidential computing on NVIDIA H100 GPUs requires a CPU that supports a virtual machine (VM)–based TEE technology, such as AMD SEV-SNP or Intel TDX. Extending the VM-based TEE from the supported CPU to the H100 GPU enables all VM memory to be encrypted, and the running application does not require any code changes.
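
If you want to check whether a host CPU advertises the required TEE capability before provisioning, a quick way on Linux is to inspect the CPU feature flags. This is a minimal sketch; flag names vary by vendor and kernel version, and full confidential-VM support also depends on firmware, kernel, and hypervisor configuration.

```python
# Minimal sketch: detect AMD SEV-SNP / Intel TDX CPU flags on a Linux host.
# A flag indicates CPU capability only; BIOS/firmware, kernel, and hypervisor
# support must also be enabled before confidential VMs will run.

TEE_FLAGS = {
    "sev": "AMD Secure Encrypted Virtualization",
    "sev_es": "AMD SEV Encrypted State",
    "sev_snp": "AMD SEV Secure Nested Paging",
    "tdx_guest": "Intel Trust Domain Extensions (inside a guest)",
}

def detect_tee_flags(path: str = "/proc/cpuinfo") -> dict:
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                flags = set(line.split(":", 1)[1].split())
                return {k: v for k, v in TEE_FLAGS.items() if k in flags}
    return {}

if __name__ == "__main__":
    found = detect_tee_flags()
    for flag, desc in found.items():
        print(f"{flag}: {desc}")
    if not found:
        print("No TEE CPU flags found (or running on an unsupported platform).")
```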

Top OEM partners are now shipping accelerated platforms for confidential computing, powered by NVIDIA H100 Tensor Core GPUs. These confidential computing–compatible systems combine the NVIDIA H100 PCIe Tensor Core GPUs with AMD Milan or AMD Genoa CPUs that support the AMD SEV-SNP technology.

The following partners are delivering the first wave of NVIDIA platforms for enterprises to secure their data, AI models, and applications in use in data centers on-premises:

  • ASRock Rack
  • ASUS
  • Cisco
  • Dell Technologies
  • GIGABYTE
  • Hewlett Packard Enterprise
  • Lenovo
  • Supermicro
  • Tyan

The NVIDIA H100 GPU ships with a VBIOS (firmware) that supports all confidential computing features in the first production release. The NVIDIA confidential computing software stack, to be released this summer, will support single-GPU configurations first, with multi-GPU and Multi-Instance GPU support following in subsequent releases.

For more information about the NVIDIA confidential computing software stack and availability, see the Unlock the Potential of AI with Confidential Computing on NVIDIA GPUs session (Track 2) at the Confidential Computing Summit 2023 on June 29.

For more information about how confidential computing on NVIDIA H100 GPUs works, see the following videos:

Categories
Offsites

Large sequence models for software development activities

Software isn’t created in one dramatic step. It improves bit by bit, one little step at a time — editing, running unit tests, fixing build errors, addressing code reviews, editing some more, appeasing linters, and fixing more errors — until finally it becomes good enough to merge into a code repository. Software engineering isn’t an isolated process, but a dialogue among human developers, code reviewers, bug reporters, software architects and tools, such as compilers, unit tests, linters and static analyzers.

Today we describe DIDACT (Dynamic Integrated Developer ACTivity), which is a methodology for training large machine learning (ML) models for software development. The novelty of DIDACT is that it uses the process of software development as the source of training data for the model, rather than just the polished end state of that process, the finished code. By exposing the model to the contexts that developers see as they work, paired with the actions they take in response, the model learns about the dynamics of software development and is more aligned with how developers spend their time. We leverage instrumentation of Google’s software development to scale up the quantity and diversity of developer-activity data beyond previous works. Results are extremely promising along two dimensions: usefulness to professional software developers, and as a potential basis for imbuing ML models with general software development skills.

DIDACT is a multi-task model trained on development activities that include editing, debugging, repair, and code review.

We built and deployed internally three DIDACT tools, Comment Resolution (which we recently announced), Build Repair, and Tip Prediction, each integrated at different stages of the development workflow. All three of these tools received enthusiastic feedback from thousands of internal developers. We see this as the ultimate test of usefulness: do professional developers, who are often experts on the code base and who have carefully honed workflows, leverage the tools to improve their productivity?

Perhaps most excitingly, we demonstrate how DIDACT is a first step towards a general-purpose developer-assistance agent. We show that the trained model can be used in a variety of surprising ways, via prompting with prefixes of developer activities, and by chaining together multiple predictions to roll out longer activity trajectories. We believe DIDACT paves a promising path towards developing agents that can generally assist across the software development process.

A treasure trove of data about the software engineering process

Google’s software engineering toolchains store every operation related to code as a log of interactions among tools and developers, and have done so for decades. In principle, one could use this record to replay in detail the key episodes in the “software engineering video” of how Google’s codebase came to be, step-by-step — one code edit, compilation, comment, variable rename, etc., at a time.

Google code lives in a monorepo, a single repository of code for all tools and systems. A software developer typically experiments with code changes in a local copy-on-write workspace managed by a system called Clients in the Cloud (CitC). When the developer is ready to package a set of code changes together for a specific purpose (e.g., fixing a bug), they create a changelist (CL) in Critique, Google’s code-review system. As with other types of code-review systems, the developer engages in a dialog with a peer reviewer about functionality and style. The developer edits their CL to address reviewer comments as the dialog progresses. Eventually, the reviewer declares “LGTM!” (“looks good to me”), and the CL is merged into the code repository.

Of course, in addition to a dialog with the code reviewer, the developer also maintains a “dialog” of sorts with a plethora of other software engineering tools, such as the compiler, the testing framework, linters, static analyzers, fuzzers, etc.

An illustration of the intricate web of activities involved in developing software: small actions by the developer, interactions with a code reviewer, and invocations of tools such as compilers.

A multi-task model for software engineering

DIDACT utilizes interactions among engineers and tools to power ML models that assist Google developers, by suggesting or enhancing actions developers take — in context — while pursuing their software-engineering tasks. To do that, we have defined a number of tasks about individual developer activities: repairing a broken build, predicting a code-review comment, addressing a code-review comment, renaming a variable, editing a file, etc. We use a common formalism for each activity: it takes some State (a code file), some Intent (annotations specific to the activity, such as code-review comments or compiler errors), and produces an Action (the operation taken to address the task). This Action is like a mini programming language, and can be extended for newly added activities. It covers things like editing, adding comments, renaming variables, marking up code with errors, etc. We call this language DevScript.

The DIDACT model is prompted with a task, code snippets, and annotations related to that task, and produces development actions, e.g., edits or comments.

This state-intent-action formalism enables us to capture many different tasks in a general way. What’s more, DevScript is a concise way to express complex actions, without the need to output the whole state (the original code) as it would be after the action takes place; this makes the model more efficient and more interpretable. For example, a rename might touch a file in dozens of places, but a model can predict a single rename action.
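
The state-intent-action formalism is easy to picture as a typed interface. The following Python sketch is our own illustration of the idea, not Google’s actual DevScript definition: a model consumes a (state, intent) pair and emits a compact action instead of rewriting the whole file.

```python
from dataclasses import dataclass

@dataclass
class State:
    """What the developer sees: a code file (plus, in general, cursor and history)."""
    path: str
    content: str

@dataclass
class Intent:
    """Activity-specific annotation, e.g. a review comment or compiler error."""
    kind: str      # "review_comment", "build_error", "rename", ...
    payload: str

@dataclass
class RenameAction:
    """A compact action: one rename, even if it touches dozens of locations."""
    old_name: str
    new_name: str

    def apply(self, state: State) -> State:
        # Naive textual illustration; a real tool would rename semantically.
        return State(state.path, state.content.replace(self.old_name, self.new_name))

# Hypothetical usage: a model maps (state, intent) -> action; applying the
# single compact action realizes the many concrete text edits.
state = State("metrics.py", "def comp(x):\n    return x + 1\n")
intent = Intent("review_comment", "Please use a more descriptive name than `comp`.")
action = RenameAction("comp", "compute_metrics")  # what the model would predict
print(action.apply(state).content)
```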

An ML peer programmer

DIDACT does a good job on individual assistive tasks. For example, below we show DIDACT doing code clean-up after functionality is mostly done. It looks at the code along with some final comments by the code reviewer (marked with “human” in the animation), and predicts edits to address those comments (rendered as a diff).

Given an initial snippet of code and the comments that a code reviewer attached to that snippet, the Pre-Submit Cleanup task of DIDACT produces edits (insertions and deletions of text) that address those comments.

The multimodal nature of DIDACT also gives rise to some surprising capabilities, reminiscent of behaviors emerging with scale. One such capability is history augmentation, which can be enabled via prompting. Knowing what the developer did recently enables the model to make a better guess about what the developer should do next.

An illustration of history-augmented code completion in action.

A powerful task exemplifying this capability is history-augmented code completion. In the figure below, the developer adds a new function parameter (1), and moves the cursor into the documentation (2). Conditioned on the history of developer edits and the cursor position, the model completes the line (3) by correctly predicting the docstring entry for the new parameter.

An illustration of edit prediction, over multiple chained iterations.

In an even more powerful history-augmented task, edit prediction, the model can choose where to edit next in a fashion that is historically consistent. If the developer deletes a function parameter (1), the model can use history to correctly predict an update to the docstring (2) that removes the deleted parameter (without the human developer manually placing the cursor there) and to update a statement in the function (3) in a syntactically (and — arguably — semantically) correct way. With history, the model can unambiguously decide how to continue the “editing video” correctly. Without history, the model wouldn’t know whether the missing function parameter is intentional (because the developer is in the process of a longer edit to remove it) or accidental (in which case the model should re-add it to fix the problem).

The model can go even further. For example, we started with a blank file and asked the model to successively predict what edits would come next until it had written a full code file. The astonishing part is that the model developed code in a step-by-step way that would seem natural to a developer: It started by first creating a fully working skeleton with imports, flags, and a basic main function. It then incrementally added new functionality, like reading from a file and writing results, and added functionality to filter out some lines based on a user-provided regular expression, which required changes across the file, like adding new flags.
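
Conceptually, rolling out a longer activity trajectory is just iterating the single-step predictor: feed the model the current state plus history, apply the predicted edit, append it to the history, and repeat. The sketch below shows that loop; `model.predict_action` and its "done" signal are hypothetical stand-ins for DIDACT’s actual interface.

```python
def roll_out(model, state, history, max_steps: int = 50):
    """Chain single-step edit predictions into a longer trajectory.

    `model.predict_action(state, history)` is a hypothetical stand-in for the
    trained model; it returns an action object, or None once the model decides
    the file is complete.
    """
    trajectory = []
    for _ in range(max_steps):
        action = model.predict_action(state, history)
        if action is None:           # model signals "done"
            break
        state = action.apply(state)  # realize the predicted edit
        history.append(action)       # condition the next step on what was done
        trajectory.append(action)
    return state, trajectory
```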

Conclusion

DIDACT turns Google’s software development process into training demonstrations for ML developer assistants, and uses those demonstrations to train models that construct code in a step-by-step fashion, interactively with tools and code reviewers. These innovations are already powering tools enjoyed by Google developers every day. The DIDACT approach complements the great strides taken by large language models at Google and elsewhere, towards technologies that ease toil, improve productivity, and enhance the quality of work of software engineers.

Acknowledgements

This work is the result of a multi-year collaboration among Google Research, Google Core Systems and Experiences, and DeepMind. We would like to acknowledge our colleagues Jacob Austin, Pascal Lamblin, Pierre-Antoine Manzagol, and Daniel Zheng, who join us as the key drivers of this project. This work could not have happened without the significant and sustained contributions of our partners at Alphabet (Peter Choy, Henryk Michalewski, Subhodeep Moitra, Malgorzata Salawa, Vaibhav Tulsyan, and Manushree Vijayvergiya), as well as the many people who collected data, identified tasks, built products, strategized, evangelized, and helped us execute on the many facets of this agenda (Ankur Agarwal, Paige Bailey, Marc Brockschmidt, Rodrigo Damazio Bovendorp, Satish Chandra, Savinee Dancs, Matt Frazier, Alexander Frömmgen, Nimesh Ghelani, Chris Gorgolewski, Chenjie Gu, Vincent Hellendoorn, Franjo Ivančić, Marko Ivanković, Emily Johnston, Luka Kalinovcic, Lera Kharatyan, Jessica Ko, Markus Kusano, Kathy Nix, Sara Qu, Marc Rasi, Marcus Revaj, Ballie Sandhu, Michael Sloan, Tom Small, Gabriela Surita, Maxim Tabachnyk, David Tattersall, Sara Toth, Kevin Villela, Sara Wiltberger, and Donald Duo Zhao) and our extremely supportive leadership (Martín Abadi, Joelle Barral, Jeff Dean, Madhura Dudhgaonkar, Douglas Eck, Zoubin Ghahramani, Hugo Larochelle, Chandu Thekkath, and Niranjan Tulpule). Thank you!

Categories
Misc

Decentralizing AI with a Liquid-Cooled Development Platform by Supermicro and NVIDIA

AI is the topic of conversation around the world in 2023. It is rapidly being adopted by all industries including media, entertainment, and broadcasting. To be successful in 2023 and beyond, companies and agencies must embrace and deploy AI more rapidly than ever before. The capabilities of new AI programs like video analytics, ChatGPT, recommenders, speech recognition, and customer service are far surpassing anything thought possible just a few years ago.

However, according to research, fewer than half of companies and agencies successfully deploy AI applications, largely due to cost. The rest are scrambling to determine how exactly they can harness this somewhat mysterious new software that promises to provide a competitive advantage in every industry in the world.

In April 2023, Supermicro launched a new system to help expedite the deployment of AI workloads for developers, new adopters, and established users. The liquid-cooled AI development platform is called the Supermicro SYS-751GE-TNRT-NV1, and there is nothing else like it available today.

The hardware and software system comes with the full suite of NVIDIA AI Enterprise frameworks, models, and tools, plus the Ubuntu 22.04 operating system. The beauty of this new and revolutionary system is that it decentralizes AI development at an entry-level price point far below that of a large supercomputer.

Consider the typical workflow with a large shared supercomputer:

  • Researchers must book time slots to use the supercomputer and wait in the queue.
  • They run an application (machine learning training, for example) and receive results.
  • When they make changes in the software, they must run the training again by booking another time slot and waiting in the queue.

This is all too time-consuming. It takes too long to get the desired results and increases the overall total cost of ownership (TCO).

With the new AI development platform, all these issues are resolved and the TCO goes down substantially. You can run ML tests, get the results quickly, and run the tests again without waiting. With the proximity of the new system to the actual AI developer, latency is lowered, which can be critically important for many AI workloads.

Optimized hardware for AI enterprise software

What makes this Supermicro product unique is its liquid cooling. The internal closed-loop radiator and cooling system is super-quiet, extremely energy-efficient, and less expensive than most AI hardware, and it puts out virtually no heat.

In addition to this new revolutionary hardware technology, the AI development platform is perfectly optimized for the included downloadable NVIDIA AI Enterprise software programs. This includes over 50 workflows, frameworks, pretrained models, and infrastructure optimization that can run on VMware vSphere.

Most importantly, this AI development platform is literally plug-and-play. Take it out of the box, turn it on, connect to the network, download the included AI software of your choice, and start running those AI applications! 

The technical advancement here is the perfect pairing and optimization of hardware systems to specific NVIDIA AI Enterprise software applications, maximizing the software capabilities to capitalize on the intrinsic advantages of AI.

Optimizing the Supermicro hardware to the unique requirements of NVIDIA AI Enterprise software applications removes all questions about how much memory you need, how many GPUs are needed, or what kind of processors must be installed. Frankly, this system just works, right out of the box. 

Here are some of the resulting customer benefits:

  • Cost-effectiveness: With the price point closer to a workstation, you can deploy AI more cost-effectively than ever before, without trying to figure out what technical hardware components are required to run your applications.
  • Whisper-quiet system: Quieter than many household appliances, it’s perfect for use in a data closet, a remote location, under your desk, or even in your home.
  • Super-powerful system: The platform includes four NVIDIA A100 GPUs and two Intel CPUs that can run any AI application available today.
  • Lower TCO with significant energy savings: The self-contained liquid cooling system almost completely cools itself, without needing external AC or a connection to a building chilled-water system.
  • Increased security: The platform can be operated in a local data center, with or without relying on the cloud, and it’s secure in either location.
  • Significant time savings: You can have a dedicated, decentralized system that enables you to run ML tests, get results, and re-run without waiting.

Energy-efficient and quiet cooling

The new AI development platform from Supermicro features a novel liquid-cooling solution offering unmatched performance and customer experience.

The liquid cooling solution is self-contained and invisible to the user. This system can be used like any other air-cooled system and offers a problem-free, liquid-cooling experience for any type of user.

The optimized Supermicro cold plates deliver efficient cooling to two 4th Gen Intel Xeon Scalable CPUs (270 W TDP) and up to four NVIDIA A100 GPUs (300 W TDP).

An N+1 redundant pumping module moves the liquid through the cold plates to cool the CPUs and GPUs. Its redundancy enables continuous operation in case of a pump failure, for high system uptime.

The heat is transferred to the surrounding air with a high-performance radiator coupled with low-power fans.

The innovative liquid cooling system designed by Supermicro effectively cools the system using less than 3% of its total power consumption, compared with about 15% for standard air-cooled products.
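
To put those percentages in perspective: for a system drawing 3 kW under load, a cooling overhead of less than 3% amounts to under 90 W, versus roughly 450 W at 15% for a standard air-cooled design.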

Finally, the system operates at an extremely low noise level (~30 dB) at idle, making it perfect for a quiet office environment.

For more information, see Liquid Cooled AI Development Platform.

Categories
Misc

NVIDIA Launches Accelerated Ethernet Platform for Hyperscale Generative AI

NVIDIA today announced NVIDIA Spectrum-X™, an accelerated networking platform designed to improve the performance and efficiency of Ethernet-based AI clouds.

Categories
Misc

NVIDIA Brings Advanced Autonomy to Mobile Robots With Isaac AMR

As mobile robot shipments surge to meet the growing demands of industries seeking operational efficiencies, NVIDIA is launching a new platform to enable the next generation of autonomous mobile robot (AMR) fleets. Isaac AMR brings advanced mapping, autonomy and simulation to mobile robots and will soon be available for early customers.

Categories
Misc

Electronics Giants Tap Into Industrial Automation With NVIDIA Metropolis for Factories

The $46 trillion global electronics manufacturing industry spans more than 10 million factories worldwide, where much is at stake in producing defect-free products. To drive product excellence, leading electronics manufacturers are adopting NVIDIA Metropolis for Factories, among them more than 50 manufacturing giants and industrial automation providers including Foxconn Industrial Internet, Pegatron, Quanta, Siemens and Wistron.

Categories
Misc

NVIDIA Collaborates With SoftBank Corp. to Power SoftBank’s Next-Gen Data Centers Using Grace Hopper Superchip for Generative AI and 5G/6G

NVIDIA and SoftBank Corp. today announced they are collaborating on a pioneering platform for generative AI and 5G/6G applications that is based on the NVIDIA GH200 Grace Hopper™ Superchip and which SoftBank plans to roll out at new, distributed AI data centers across Japan.