AI and the metaverse are revolutionizing every aspect of the way we live, work and play — including how we move. Leaders in the automotive and technology industries will come together at NVIDIA GTC to discuss the newest breakthroughs driving intelligent vehicles, whether in the real world or in simulation.
Class is in session this GFN Thursday as GeForce NOW makes the up-grade with support for higher resolutions and frame rates in the Chrome browser on PC. It’s the easiest way to spice up a boring study session. When the lecture is over, dive into the six games joining the GeForce NOW library this week.
Mapping the immune system could lead to the creation of drugs that help our bodies win the fight against cancer and other diseases. That’s the big idea behind immunotherapy. The problem: the immune system is incredibly complex. Enter Immunai, a biotech company that’s using cutting-edge genomics and machine learning technology to map the human immune system.
Beyond the unimaginable prices for monkey pictures, NFTs’ underlying technology provides companies with a new avenue to directly monetize their online engagements. Major brands such as Adidas, the NBA, and TIME have already begun experimenting with these revenue streams using NFTs, and we are still early in this trend.
As data practitioners, we are positioned to provide valuable insights into these revenue streams, given that all transactions are public on the blockchain. This post provides a guided project to access, analyze, and identify potential fraud in blockchain data using Python. It covers the following topics:
The basics of blockchain, NFTs, and network graphs.
How to pull NFT data using the open-source package NFT Analyst Starter Pack from a16z.
How to interpret Ethereum blockchain data.
The fraudulent practice of wash trading NFTs.
Constructing network graphs to visualize potential wash trading on the NFT project Bored Ape Yacht Club.
The accompanying Jupyter notebook provides a more detailed, step-by-step guide for writing the Python code in this walkthrough, while this post provides additional context. It assumes that you have a basic understanding of pandas, data preparation, and data visualization.
What is blockchain data?
Cutting through the media frenzy of coins named after dogs and pixelated pictures selling for hundreds of thousands of dollars reveals fascinating technology: the blockchain.
The following excerpt best describes this decentralized data source:
“At a very high level, blockchains are ledgers of transactions utilizing cryptography that can only add information and thus can’t be changed (i.e., immutable). What separates blockchains from ledgers you find at banks is a concept called ‘decentralization’— where every computer connected to a respective blockchain must ‘agree’ on the same state of the blockchain and subsequent data added to it.”
Central to this technology is that all the data (for example, logs, metadata, and so on) must be public and accessible. I highly recommend Stanford professor Dan Boneh’s lecture.
What is an NFT?
NFT stands for non-fungible token: a crypto asset on a blockchain (such as Ethereum) that represents a unique, one-of-one token that can be digitally owned. For example, gold bars are fungible, as multiple bars can exist and represent the same thing, while the original Mona Lisa is non-fungible in that only one exists.
Contrary to popular belief, NFTs are not just art and JPEGs but digital representations of ownership on a blockchain ledger for a unique item such as art, music, or whatever else the NFT creator wants to put into the metadata. For this post, however, we use the NFT project Bored Ape Yacht Club (BAYC), an artwork NFT.
What is a network graph, and why is it well suited to blockchain data?
Networks are a method of organizing relationship data using nodes and edges. Nodes represent an entity, such as an email address or social media account, while the edges show the connection between nodes.
Furthermore, you can store metadata for nodes and edges to represent different aspects of a relationship. Metadata can range from weights to labels. Figure 1 shows the steps of taking an entire network and zooming into a use case with helpful labels from metadata.
What makes network graphs ideal for representing blockchain transactions is that there is always a to and from blockchain address, as well as significant metadata (for example, timestamps, coin amounts, and so on.) for each transaction. Furthermore, as blockchain data is public by design through decentralization, you can use network graphs to visualize economic behaviors on a respective blockchain.
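As a toy illustration of that mapping, the following sketch stores two transfers between made-up wallet addresses as edges of a directed NetworkX graph, keeping price and timestamp as edge metadata (all addresses and values here are hypothetical):
import networkx as nx
# Nodes are wallet addresses (made up here); edges are transactions carrying metadata.
G = nx.MultiDiGraph()  # a directed graph that allows repeated trades between the same wallets
G.add_edge("0xWalletA", "0xWalletB", price_usd=95_000, timestamp="2021-10-01")
G.add_edge("0xWalletB", "0xWalletA", price_usd=166_000, timestamp="2021-10-03")
# Inspect the metadata attached to each transaction edge
for seller, buyer, data in G.edges(data=True):
    print(seller, "->", buyer, data)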
In this example, I want to demonstrate identifying the fraudulent behavior of wash trading, where an individual intentionally sells an asset to themselves across multiple accounts to artificially inflate the asset’s price.
Chainalysis wrote an excellent report on the phenomenon, where they identified over 260 Ethereum crypto wallets potentially engaging in wash trading with a collective profit of over $8.4 million in 2021 alone.
Pulling data from the Ethereum blockchain
Though all blockchain data is publicly available to anyone, it is still tough to access and prepare for analysis. Following are some options to access blockchain data:
Create your own blockchain node (for example, become a miner) to read the rawest data available.
Use a third-party tool to create your own blockchain node.
Use a third-party API to read raw data from their own blockchain node.
Use a third-party API to read cleaned and aggregated blockchain data from their service.
Although all are viable options, each has a tradeoff between reliability, trust, and convenience.
For example, I worked on an NFT analytics project where we wanted to create a reliable NFT market dashboard. Unfortunately, having our own blockchain node was cost-prohibitive, many third-party data sources had various data-quality issues that we couldn’t control, and it became challenging to track transactions across multiple blockchains. That project ultimately required bringing together high-quality data from numerous third-party APIs.
For this project, however, you want maximum convenience so you can focus on learning, which is why I recommend the NFT Analyst Starter Pack from a16z. Think of this package as a convenient wrapper around the third-party blockchain API from Alchemy that creates easy-to-use CSVs for your desired NFT contract.
Preparing data and creating network graphs
The NFT Analyst Starter Pack results in three separate CSV files for the BAYC NFT project:
BAYC Metadata: Information regarding a specific NFT, where asset_id is the unique identifier within this NFT token.
BAYC Sales: Logs and metadata related to a specific transaction represented by its transaction hash where a seller and buyer inform you of the wallets involved.
BAYC Transfers: The same as BAYC Sales data, but no money is transferred from one wallet to another.
For this project, most of the data preparation involves the following two steps, sketched in code after the list:
Reorganizing BAYC Sales and BAYC Transfers to enable a clean union of the two datasets.
Deleting duplicate logs of transfer transactions already represented in sales.
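A minimal pandas sketch of those two steps might look like the following. The file and column names (sales.csv, transfers.csv, transaction_hash, seller, buyer, price, from_address, to_address) are assumptions based on the descriptions above, so check them against the CSVs the starter pack actually produces:
import pandas as pd
# Assumed file and column names; verify against the starter pack output.
sales = pd.read_csv("sales.csv")          # transaction_hash, seller, buyer, price, ...
transfers = pd.read_csv("transfers.csv")  # transaction_hash, from_address, to_address, ...
# Step 1: align the transfer columns with the sales columns so the union is clean
transfers = transfers.rename(columns={"from_address": "seller", "to_address": "buyer"})
transfers["price"] = 0  # transfers move the NFT without a payment
# Step 2: drop transfer rows whose transaction is already logged as a sale
transfers = transfers[~transfers["transaction_hash"].isin(sales["transaction_hash"])]
# Union the two datasets into a single transaction log
transactions = pd.concat([sales, transfers], ignore_index=True)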
Given that the goal is to learn, don’t worry about whether the blockchain data is accurate, but you can always check for yourself by searching the transaction_hash value on Etherscan.
After preparing the data, use the NetworkX package to generate a network graph data structure of your NFT transactions. There are multiple ways to construct a graph, but in my opinion, the most straightforward approach is to use the function from_pandas_edgelist, where you just provide a pandas DataFrame, the to and from values to represent the nodes, and any metadata for edges and labeling.
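Here is a rough sketch of that call, continuing from the transactions DataFrame prepared above and again assuming seller, buyer, price, and timestamp columns (adjust the names to match your prepared data):
import networkx as nx
# Column names are assumptions; match them to your prepared DataFrame.
G = nx.from_pandas_edgelist(
    transactions,
    source="seller",                    # "from" wallet address
    target="buyer",                     # "to" wallet address
    edge_attr=["price", "timestamp"],   # metadata stored on each edge
    create_using=nx.MultiDiGraph(),     # directed; keeps repeated trades between the same wallets
)
print(G.number_of_nodes(), G.number_of_edges())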
From this prepared data, the NetworkX package makes visualizing network graphs as easy as nx.draw, but with over 40k transactions in the DataFrame, visualizing the whole graph returns only a useless blob. Therefore, you must be specific about exactly what to visualize among your transactions to create a captivating data story.
Visualizing potential wash trading
Rather than scouring through the transactions of 10,000 NFTs, you can instead validate what others are stating within the market. Notably, the NFT Wash Trading – Is it possible to protect against it? post calls out BAYC token 8099 as potentially being subjected to the fraudulent practice of wash trading.
If you follow along in the accompanying notebook, you carry out the following steps (a condensed code sketch appears after the list):
Filter prepared NFT data to only rows containing logs for asset_id 8099.
Relabel the to and from wallet addresses as sequential uppercase letters, ordered by when each wallet address first appears among the NFT asset’s transactions.
Generate network graph data with the prepared asset 8099 data using the NetworkX package.
Plot the network graph with desired labels, edge arrows, and node positioning.
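Here is a condensed sketch of these four steps, continuing from the transactions DataFrame above and assuming asset_id, seller, buyer, and price columns; the accompanying notebook remains the authoritative version:
import string
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd
# Step 1: keep only the rows for the asset under investigation
asset = transactions[transactions["asset_id"] == 8099].copy()
# Step 2: relabel wallets as A, B, C, ... in order of first appearance (assumes fewer than 27 wallets)
order = pd.unique(asset[["seller", "buyer"]].values.ravel())
labels = {wallet: string.ascii_uppercase[i] for i, wallet in enumerate(order)}
asset["seller"] = asset["seller"].map(labels)
asset["buyer"] = asset["buyer"].map(labels)
# Step 3: build a directed graph of this asset's transactions
G = nx.from_pandas_edgelist(
    asset, source="seller", target="buyer",
    edge_attr=["price"], create_using=nx.MultiDiGraph(),
)
# Step 4: plot with labels and arrows so loops back to earlier wallets stand out
pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True, node_color="lightblue", arrows=True)
plt.show()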
Did BAYC 8099 NFT experience wash trading?
The data plotted in Figure 2 enables you to visualize the transactions corresponding to asset 8099. Starting at node H, you can see that this wallet first increases the price from $95k to $166k in trades between H and I, then adds more transactions through transfers between H and J. Finally, node H sells the NFT, at its potentially artificially inflated price, to node K.
Though this graph can’t definitively state that node H engaged in wash trading—as you don’t know whether nodes H, I, and J are wallets owned by the same person—seeing loops around a node where the price keeps increasing should flag the need for more due diligence. For example, you could review the transactions between these wallets on Etherscan.
Perhaps node H had seller’s remorse and wanted their NFT back, as it’s not uncommon for investors to get attached to their beloved NFTs. But numerous transactions between the wallets associated with nodes H, I, and J could indicate further red flags for the NFT asset.
Next steps
By following along with this post and accompanying notebook, you have seen how to access Ethereum blockchain data and analyze NFTs through network graphs. If you enjoyed this analysis and are interested in doing more projects like this, chat with me in the CharlieDAO Discord, where I hang out. We are a collective of software engineers, data scientists, and crypto natives exploring web3!
Disclaimer
This content is for educational purposes only and is not financial advice. I do not have any financial ties to the analyzed NFTs at the moment of releasing the notebook, and here is my crypto wallet address on Etherscan for reference. This analysis only highlights potential fraud to investigate further but does not prove fraud has taken place. Finally, if you own crypto, never share your “secret recovery phrase” or “private keys” with anyone.
With the latest Memgraph Advanced Graph Extensions (MAGE) release, you can now run GPU-powered graph analytics from Memgraph in seconds, while working in Python. Powered by NVIDIA cuGraph, the following graph algorithms will now execute on GPU:
PageRank (graph analysis)
Louvain (community detection)
Balanced Cut (clustering)
Spectral Clustering (clustering)
HITS (hubs versus authorities analytics)
Leiden (community detection)
Katz centrality
Betweenness centrality
This tutorial will show you how to use PageRank graph analysis and Louvain community detection to analyze a Facebook dataset containing 1.3 million relationships.
By the end of the tutorial, you will know how to:
Import data inside Memgraph using Python
Run analytics on large scale graphs and get fast results
Run analytics on NVIDIA GPUs from Memgraph
Tutorial prerequisites
To follow this graph analytics tutorial, you will need an NVIDIA GPU, driver, and container toolkit. Once you have successfully installed the NVIDIA GPU driver and container toolkit, you must also install four more tools: Docker, JupyterLab, GQLAlchemy, and Memgraph Lab.
The next sections walk you through installing and setting up these tools for the tutorial.
Docker
Docker is used to install and run the mage-cugraph Docker image. There are three steps involved in setting up and running the Docker image:
Download Docker
Download the tutorial data
Run the Docker image, giving it access to the tutorial data
1. Download Docker
You can install Docker by visiting the Docker webpage and following the instructions for your operating system.
2. Download the tutorial data
Before running the mage-cugraph Docker image, first download the data that will be used in the tutorial. This allows you to give the Docker image access to the tutorial dataset when run.
To download the data, use the following commands to clone the jupyter-memgraph-tutorials GitHub repo and move into the jupyter-memgraph-tutorials/cugraph-analytics folder:
git clone https://github.com/memgraph/jupyter-memgraph-tutorials.git
cd jupyter-memgraph-tutorials/cugraph-analytics
3. Run the Docker image
You can now use the following command to run the Docker image and mount the workshop data to the /samples folder:
docker run -it -p 7687:7687 -p 7444:7444 --volume /data/facebook_clean_data/:/samples mage-cugraph
When you run the Docker container, you should see the following message:
You are running Memgraph vX.X.X
To get started with Memgraph, visit https://memgr.ph/start
With the mount command executed, the CSV files needed for the tutorial will be located inside the /samples folder within the running container, where Memgraph will find them when needed.
Jupyter notebook
Now that Memgraph is running, install Jupyter. This tutorial uses JupyterLab, and you can install it with the following command:
pip install jupyterlab
Once JupyterLab is installed, launch it with the following command:
jupyter lab
GQLAlchemy
Use GQLAlchemy, an Object Graph Mapper (OGM), to connect to Memgraph and execute Cypher queries in Python. You can think of Cypher as SQL for graph databases: it supports many of the same operations, such as create, update, and delete.
Install CMake on your system, and then install GQLAlchemy with pip:
pip install gqlalchemy
Memgraph Lab
The last prerequisite you need to install is Memgraph Lab. You will use it to create data visualizations after connecting to Memgraph. You can install Memgraph Lab as a desktop application for your operating system.
First, open the Jupyter notebook. The first three lines of code import gqlalchemy, connect to the Memgraph database instance at host 127.0.0.1 and port 7687, and clear the database so that you start with a clean slate.
from gqlalchemy import Memgraph
memgraph = Memgraph("127.0.0.1", 7687)
memgraph.drop_database()
Next, you will import the dataset from CSV files and then run PageRank and Louvain community detection using Python.
Import data
The Facebook dataset consists of eight CSV files, each having the following structure:
node_1,node_2
0,1794
0,3102
0,16645
Each record represents an edge connecting two nodes. Nodes represent the pages, and relationships are mutual likes among them.
There are eight distinct types of pages (Government, Athletes, and TV shows, for example). Pages have been reindexed for anonymity, and all pages have been verified for authenticity by Facebook.
Since Memgraph imports data faster when indices exist, create one for all nodes with the label Page on the id property.
memgraph.execute(
"""
CREATE INDEX ON :Page(id);
"""
)
The Docker container already has access to the data used in this tutorial, so you can list the local files in the ./data/facebook_clean_data/ folder. By concatenating each file name with the /samples/ folder, you can determine its path inside the container. Use these concatenated file paths to load the data into Memgraph.
import os
from os import listdir
from os.path import isfile, join
csv_dir_path = os.path.abspath("./data/facebook_clean_data/")
csv_files = [f"/samples/{f}" for f in listdir(csv_dir_path) if isfile(join(csv_dir_path, f))]
Load all CSV files using the following query:
for csv_file_path in csv_files:
memgraph.execute(
f"""
LOAD CSV FROM "{csv_file_path}" WITH HEADER AS row
MERGE (p1:Page {{id: row.node_1}})
MERGE (p2:Page {{id: row.node_2}})
MERGE (p1)-[:LIKES]->(p2);
"""
)
For more information about importing CSV files with LOAD CSV, see the Memgraph documentation.
Next, use PageRank and Louvain community detection algorithms with Python to determine which pages in the network are most important, and to find all the communities in a network.
PageRank importance analysis
To identify important pages in the Facebook dataset, execute PageRank. You can configure different algorithm settings when calling PageRank.
Note that you will also find other algorithms integrated within MAGE, and Memgraph is designed to help you run graph analytics on large-scale graphs. You can find other Memgraph tutorials on how to run these analytics.
MAGE makes executing PageRank straightforward. The following query will first execute the algorithm, and then create and set the rank property of each node to the value that the cugraph.pagerank algorithm returns.
The value of that property will then be saved as a variable rank. Note that this test (and all tests presented here) was executed on an NVIDIA GeForce GTX 1650 Ti GPU and an Intel Core i5-10300H CPU at 2.50 GHz with 16 GB of RAM, and returned results in around four seconds.
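The snippet itself is not reproduced here, but a minimal sketch with GQLAlchemy might look like the following. It assumes the cuGraph PageRank procedure is exposed as cugraph.pagerank.get() and yields node and rank fields; check the MAGE reference for the exact procedure name and outputs:
# Assumed procedure name and output fields; see the MAGE reference for the exact signature.
results = memgraph.execute_and_fetch(
    """
    CALL cugraph.pagerank.get() YIELD node, rank
    SET node.rank = rank
    RETURN node.id AS node_id, rank
    ORDER BY rank DESC
    LIMIT 10;
    """
)
for result in results:
    print(result["node_id"], result["rank"])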
This code returns 10 nodes with the highest rank score. Results are available in a dictionary form.
Now, it is time to visualize results with Memgraph Lab. In addition to creating beautiful visualizations powered by D3.js and our Graph Style Script language, you can use Memgraph Lab to:
Query the graph database and write your own graph algorithms in Python, C++, or even Rust
Check the Memgraph database logs
Visualize the graph schema
Memgraph Lab comes with a variety of pre-built datasets to help you get started. Open the Execute Query view in Memgraph Lab and run the following query:
MATCH (n)
WITH n
ORDER BY n.rank DESC
LIMIT 3
MATCH (n)<-[e]-(m)
RETURN *;
The first part of this query will MATCH all the nodes and ORDER them by their rank in descending order. For the first three nodes, the second part of the query obtains all pages connected to them. The WITH clause connects the two parts of the query. Figure 1 shows the PageRank query results.
The next step is learning how to use Louvain community detection to find communities present in the graph.
Community detection with Louvain
The Louvain algorithm measures the extent to which the nodes within a community are connected, compared to how connected they would be in a random network.
It also recursively merges communities into a single node and executes the modularity clustering on the condensed graphs. This is one of the most popular community detection algorithms.
Using Louvain, you can find the number of communities within the graph. First execute Louvain and save the cluster_id as a property for every node:
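The exact call is omitted here; a minimal sketch follows, assuming the cuGraph Louvain procedure is exposed as cugraph.louvain.get() and yields node and cluster_id fields (check the MAGE reference for the exact name and outputs):
# Assumed procedure name and output fields; see the MAGE reference for the exact signature.
memgraph.execute(
    """
    CALL cugraph.louvain.get() YIELD node, cluster_id
    SET node.cluster_id = cluster_id;
    """
)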
To find the number of communities, run the following code:
results = memgraph.execute_and_fetch(
"""
MATCH (n)
WITH DISTINCT n.cluster_id as cluster_id
RETURN count(cluster_id) as num_of_clusters;
"""
)
# we will get only 1 result
result = list(results)[0]
# don't forget that results are saved in a dict
print(f"Number of clusters: {result['num_of_clusters']}")
Number of clusters: 2664
Next, take a closer look at some of these communities. For example, you may find nodes that belong to one community but are connected to a node that belongs to a different community. Louvain attempts to minimize the number of such nodes, so you should not see many of them. In Memgraph Lab, execute the following query:
MATCH (n2)<-[e1]-(n1)-[e]->(m1)
WHERE n1.cluster_id != m1.cluster_id AND n1.cluster_id = n2.cluster_id
RETURN *
LIMIT 1000;
This query will MATCH node n1 and its relationships to two other nodes, n2 and m1, with the patterns (n2)<-[e1]-(n1) and (n1)-[e]->(m1), respectively. Then, it will keep only those nodes WHERE the cluster_id of n1 and n2 is the same but differs from the cluster_id of node m1.
Use LIMIT 1000 to show only 1,000 such relationships, to keep the visualization simple.
Using Graph Style Script in Memgraph Lab, you can style your graphs to, for example, represent different communities with different colors. Figure 2 shows the Louvain query results.
Summary
And there you have it: millions of nodes and relationships imported using Memgraph and analyzed using cuGraph PageRank and Louvain graph analytics algorithms. With GPU-powered graph analytics from Memgraph, powered by NVIDIA cuGraph, you are able to explore massive graph databases and carry out inference without having to wait for results. You can find more tutorials covering a variety of techniques on the Memgraph website.
Unlocking the full potential of artificial intelligence (AI) in financial services is often hindered by the inability to ensure data privacy during machine learning (ML). For instance, traditional ML methods assume all data can be moved to a central repository.
This is an unrealistic assumption when dealing with data sovereignty and security considerations or sensitive data like personally identifiable information. More practically, it ignores data egress challenges and the considerable cost of creating large pooled datasets.
Massive internal datasets that would be valuable for training ML models remain unused. How can companies in the financial services industry leverage their own data while ensuring privacy and security?
This post introduces federated learning and explains its benefits for businesses handling sensitive datasets. We present three ways federated learning can be used in financial services and provide tips on getting started today.
What is federated learning?
Federated learning is an ML technique that enables the extraction of insights from multiple isolated datasets—without needing to share or move that data into a central repository or server.
For example, assume you have multiple datasets you want to use to train an AI model. Today’s standard ML approach requires first gathering all the training data in one place. However, this approach is not feasible for much of the world’s sensitive data. This leaves many datasets and use cases off-limits for applying AI techniques.
On the other hand, federated learning does not assume that one unified dataset can be created. The distributed training datasets are instead left where they are.
The approach involves creating multiple versions of the model and sending one to each server or device where the datasets live. Each site trains the model locally on its subset of the data, and then sends only the model parameters back to a central server. This is the key feature of federated learning: only the model updates or parameters are shared, not the training data itself. This preserves data privacy and sovereignty.
Finally, the central server collects all the updates from each site and intelligently aggregates the “mini-models” into one global model. This global model can capture the insights from the entire dataset, even when actual data cannot be combined.
Note that these local sites could be servers, edge devices like smartphones, or any machine that can train locally and send back the model updates to the central server.
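As a framework-agnostic illustration of the aggregation step, here is a toy NumPy sketch of federated averaging. The train_locally function is a hypothetical stand-in for whatever local training each site performs; a production system would use a framework such as NVFLARE rather than this hand-rolled loop:
import numpy as np

def train_locally(global_weights, local_data):
    # Hypothetical stand-in: each site updates the global model on its own data
    # and returns new weights plus the number of local examples it trained on.
    local_weights = [w + 0.01 * np.random.randn(*w.shape) for w in global_weights]
    return local_weights, len(local_data)

def federated_round(global_weights, sites):
    # One round of federated averaging: only model weights leave each site, never raw data.
    updates, sizes = [], []
    for local_data in sites:
        weights, n = train_locally(global_weights, local_data)
        updates.append(weights)
        sizes.append(n)
    total = sum(sizes)
    # Weighted average of each layer across sites (FedAvg-style aggregation)
    return [
        sum((n / total) * site_weights[layer] for site_weights, n in zip(updates, sizes))
        for layer in range(len(global_weights))
    ]

# Two hypothetical sites with different amounts of data; the "model" is one weight matrix.
sites = [np.zeros((100, 3)), np.zeros((300, 3))]
global_weights = [np.zeros((3, 1))]
global_weights = federated_round(global_weights, sites)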
Advantages of privacy-preserving technology
Large-scale collaborations in healthcare have demonstrated the real-world viability of using federated learning for multiple independent parties to jointly train an AI model. However, federated learning is not just about collaborating with external partners.
In financial institutions, we see an incredible opportunity for federated learning to bridge internal data silos. Company-wide ROI can increase as businesses gather all viable data for new products, including recommender systems, fraud detection systems, and call center analytics.
Privacy concerns, however, aren’t limited to financial data. The wave of data privacy legislation being enacted worldwide today (starting with GDPR in Europe and CCPA in California, with many similar laws coming soon) will only accelerate the need for privacy-preserving ML techniques in all industries.
Expect federated learning to become an essential part of the AI toolset in the years ahead.
Practical business use cases
ML algorithms are hungry for data. Furthermore, the real-world performance of your ML model depends not only on the amount of data but also on the relevance of the training data.
Many organizations could improve current AI models by incorporating new datasets that cannot be easily accessed without sacrificing privacy. This is where federated learning comes in.
Federated learning enables companies to leverage new data resources without requiring data sharing.
Broadly, three types of use cases are enabled by federated learning:
Intra-company: Bridging internal data silos
Inter-company: Facilitating collaboration between organizations
Edge computing: Learning across thousands of edge devices
Intra-company use case: Leverage siloed internal data
There are many reasons a single company might rely on multiple data storage solutions. For example:
Data governance rules such as GDPR may require that data be kept in specific geolocations and may specify retention and privacy policies.
Mergers and acquisitions come with new data from the partner company. Still, the arduous task of integrating that data into existing storage systems often leaves the data dispersed for long periods.
Both on-premises and hybrid cloud storage solutions are used, and moving large amounts of data is costly.
Federated learning enables your company to leverage ML across isolated datasets in different business organizations, geographic regions, or data warehouses while preserving privacy and security.
Inter-company use case: Collaborate with external partners
Gathering enough quantitative data to build powerful AI models is difficult for a single company. Consider an insurance company building an effective fraud detection system. The company can only collect data from observed events such as customers filing a claim. Yet, this data may not be representative of the entire population and can therefore potentially contribute to AI model bias.
To build an effective fraud detection system, the company needs larger datasets with more diverse data points to train robust, generalizable models. Many organizations can benefit from pooling data with others. Practically, most organizations will not share their proprietary datasets on a common supercomputer or cloud server.
Enabling this type of collaboration for industry-wide challenges could bring massive benefits.
For example, in one of the largest real-world federated collaborations, we saw 20 independent hospitals across five continents train an AI model for predicting the oxygen needs of COVID-19 infected patients. On average, the hospitals saw a 38% improvement in generalizability and a 16% improvement in the model performance by participating in the federated system.
Likewise, there is a real opportunity to maintain customer privacy while credit card networks reduce fraudulent activity and banks employ anti-money laundering initiatives. Federated learning increases the data available to a single bank, which can help address issues such as money laundering activities in correspondent banking.
Edge computing: Smartphones and IoT
Google originally introduced federated learning in 2017 to train AI models on personal data distributed across billions of mobile devices. In 2022, many more devices are connected to the internet, including smartwatches, home assistants, alarm systems, thermostats, and even cars.
Federated learning is useful for all kinds of edge devices that are continuously collecting valuable data for ML models, but this data is often privacy sensitive, large in quantity, or both, which prevents logging to the data center.
How does federated learning fit into an existing workflow?
It is important to note that federated learning is a general technique. Federated learning is not just about training neural networks; rather, it applies to data analysis, more traditional ML methods, or any other distributed workflow.
Very few assumptions are built into federated learning, and perhaps only two are worth mentioning: 1) local sites can connect to a central server, and 2) each site has the minimum computational resources to train locally.
Beyond that, you are free to design your own application with custom local and global aggregation behavior. You can decide how much trust to place in different parties and how much is shared with the central server. The federated system is configurable to your specific business needs.
For example, federated learning can be paired with other privacy-preserving techniques like differential privacy (to add noise) and homomorphic encryption (to encrypt model updates and obscure what the central server sees).
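As a toy illustration of the first idea, the sketch below clips a local model update and adds Gaussian noise before it leaves the site. The clipping norm and noise scale are illustrative values only, not a calibrated differential-privacy guarantee:
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_std=0.1):
    # Clip the update's L2 norm, then add Gaussian noise before sending it to the server.
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + np.random.normal(scale=noise_std, size=update.shape)

local_update = np.array([0.8, -1.5, 0.3])
print(privatize_update(local_update))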
Get started with federated learning
We have developed a federated_learning code sample that shows you how to train a global fraud prediction model on two different splits of a credit card transactions dataset, corresponding to two different geographic regions.
Although federated learning by definition enables training across multiple machines, this example is designed to simulate an entire federated system on a single machine for you to get up and running in under an hour. The system is implemented with NVFLARE, an NVIDIA open-source framework to enable federated learning.
Acknowledgments
We would like to thank Patrick Hogan and Anita Weemaes for their contributions to this post.
Imagine driving a car — one without self-driving capabilities — to a mall, airport or parking garage, and using an app to have the car drive off to park itself. Software company Seoul Robotics is using NVIDIA technology to make this possible — turning non-autonomous cars into self-driving vehicles.
In the fast-paced field of making the world’s tech devices, Pegatron Corp. initially harnessed AI to gain an edge. Now, it’s on the cusp of creating digital twins to further streamline its efficiency.
Posted by Brian Ichter and Karol Hausman, Research Scientists, Google Research, Brain Team
Over the last several years, we have seen significant progress in applying machine learning to robotics. However, robotic systems today are capable of executing only very short, hard-coded commands, such as “Pick up an apple,” because they tend to perform best with clear tasks and rewards. They struggle with learning to perform long-horizon tasks and reasoning about abstract goals, such as a user prompt like “I just worked out, can you get me a healthy snack?”
Meanwhile, recent progress in training language models (LMs) has led to systems that can perform a wide range of language understanding and generation tasks with impressive results. However, these language models are inherently not grounded in the physical world due to the nature of their training process: a language model generally does not interact with its environment nor observe the outcome of its responses. This can result in it generating instructions that may be illogical, impractical or unsafe for a robot to complete in a physical context. For example, when prompted with “I spilled my drink, can you help?” the language model GPT-3 responds with “You could try using a vacuum cleaner,” a suggestion that may be unsafe or impossible for the robot to execute. When the FLAN language model is asked the same question, it apologizes for the spill with “I’m sorry, I didn’t mean to spill it,” which is not a very useful response. Therefore, we asked ourselves, is there an effective way to combine advanced language models with robot learning algorithms to leverage the benefits of both?
In “Do As I Can, Not As I Say: Grounding Language in Robotic Affordances”, we present a novel approach, developed in partnership with Everyday Robots, that leverages advanced language model knowledge to enable a physical agent, such as a robot, to follow high-level textual instructions for physically-grounded tasks, while grounding the language model in tasks that are feasible within a specific real-world context. We evaluate our method, which we call PaLM-SayCan, by placing robots in a real kitchen setting and giving them tasks expressed in natural language. We observe highly interpretable results for temporally-extended complex and abstract tasks, like “I just worked out, please bring me a snack and a drink to recover.” Specifically, we demonstrate that grounding the language model in the real world nearly halves errors over non-grounded baselines. We are also excited to release a robot simulation setup where the research community can test this approach.
With PaLM-SayCan, the robot acts as the language model’s “hands and eyes,” while the language model supplies high-level semantic knowledge about the task.
A Dialog Between User and Robot, Facilitated by the Language Model
Our approach uses the knowledge contained in language models (Say) to determine and score actions that are useful towards high-level instructions. It also uses an affordance function (Can) that enables real-world grounding and determines which actions are possible to execute in a given environment. Using the PaLM language model, we call this PaLM-SayCan.
Our approach selects skills based on what the language model scores as useful to the high-level instruction and what the affordance model scores as possible.
Our system can be seen as a dialog between the user and robot, facilitated by the language model. The user starts by giving an instruction that the language model turns into a sequence of steps for the robot to execute. This sequence is filtered using the robot’s skillset to determine the most feasible plan given its current state and environment. The model determines the probability of a specific skill successfully making progress toward completing the instruction by multiplying two probabilities: (1) task-grounding (i.e., a skill language description) and (2) world-grounding (i.e., skill feasibility in the current state).
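In rough Python pseudocode, the selection rule reduces to picking the skill with the highest product of the two scores. The lm_score and affordance_score callables below are hypothetical placeholders for the language model and the learned value functions, not the actual PaLM-SayCan implementation:
def select_next_skill(instruction, history, skills, lm_score, affordance_score):
    # Pick the skill whose combined score (usefulness x feasibility) is highest.
    best_skill, best_score = None, float("-inf")
    for skill in skills:
        task_grounding = lm_score(instruction, history, skill)  # p(skill is useful | instruction, history)
        world_grounding = affordance_score(skill)               # p(skill can succeed | current state)
        combined = task_grounding * world_grounding
        if combined > best_score:
            best_skill, best_score = skill, combined
    return best_skill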
There are additional benefits of our approach in terms of its safety and interpretability. First, by allowing the LM to score different options rather than generate the most likely output, we effectively constrain the LM to only output one of the pre-selected responses. In addition, the user can easily understand the decision making process by looking at the separate language and affordance scores, rather than a single output.
PaLM-SayCan is also interpretable: at each step, we can see the top options it considers based on their language score (blue), affordance score (red), and combined score (green).
Training Policies and Value Functions
Each skill in the agent’s skillset is defined as a policy with a short language description (e.g., “pick up the can”), represented as embeddings, and an affordance function that indicates the probability of completing the skill from the robot’s current state. To learn the affordance functions, we use sparse reward functions set to 1.0 for a successful execution, and 0.0 otherwise.
We use image-based behavioral cloning (BC) to train the language-conditioned policies and temporal-difference-based (TD) reinforcement learning (RL) to train the value functions. To train the policies, we collected data from 68,000 demos performed by 10 robots over 11 months and added 12,000 successful episodes, filtered from a set of autonomous episodes of learned policies. We then learned the language conditioned value functions using MT-Opt in the Everyday Robots simulator. The simulator complements our real robot fleet with a simulated version of the skills and environment, which is transformed using RetinaGAN to reduce the simulation-to-real gap. We bootstrapped simulation policies’ performance by using demonstrations to provide initial successes, and then continuously improved RL performance with online data collection in simulation.
Given a high-level instruction, our approach combines the probabilities from the language model with the probabilities from the value function (VF) to select the next skill to perform. This process is repeated until the high-level instruction is successfully completed.
Performance on Temporally-Extended, Complex, and Abstract Instructions
To test our approach, we use robots from Everyday Robots paired with PaLM. We place the robots in a kitchen environment containing common objects and evaluate them on 101 instructions to test their performance across various robot and environment states, instruction language complexity and time horizon. Specifically, these instructions were designed to showcase the ambiguity and complexity of language rather than to provide simple, imperative queries, enabling queries such as “I just worked out, how would you bring me a snack and a drink to recover?” instead of “Can you bring me water and an apple?”
We use two metrics to evaluate the system’s performance: (1) the plan success rate, indicating whether the robot chose the right skills for the instruction, and (2) the execution success rate, indicating whether it performed the instruction successfully. We compare two language models, PaLM and FLAN (a smaller language model fine-tuned on instruction answering) with and without the affordance grounding as well as the underlying policies running directly with natural language (Behavioral Cloning in the table below). The results show that the system using PaLM with affordance grounding (PaLM-SayCan) chooses the correct sequence of skills 84% of the time and executes them successfully 74% of the time, reducing errors by 50% compared to FLAN and compared to PaLM without robotic grounding. This is particularly exciting because it represents the first time we can see how an improvement in language models translates to a similar improvement in robotics. This result indicates a potential future where robotics is able to ride the wave of progress that we have been observing in language models, bringing these subfields of research closer together.
Algorithm             Plan     Execute
PaLM-SayCan           84%      74%
PaLM                  67%      –
FLAN-SayCan           70%      61%
FLAN                  38%      –
Behavioral Cloning     0%       0%
PaLM-SayCan halves errors compared to PaLM without affordances and compared to FLAN over 101 tasks.
SayCan demonstrated successful planning for 84% of the 101 test instructions when combined with PaLM.
If you’re interested in learning more about this project from the researchers themselves, please check out the video below:
Conclusion and Future Work
We’re excited about the progress that we’ve seen with PaLM-SayCan, an interpretable and general approach to leveraging knowledge from language models that enables a robot to follow high-level textual instructions to perform physically-grounded tasks. Our experiments on a number of real-world robotic tasks demonstrate the ability to plan and complete long-horizon, abstract, natural language instructions at a high success rate. We believe that PaLM-SayCan’s interpretability allows for safe real-world user interaction with robots. As we explore future directions for this work, we hope to better understand how information gained via the robot’s real-world experience could be leveraged to improve the language model and to what extent natural language is the right ontology for programming robots. We have open-sourced a robot simulation setup, which we hope will provide researchers with a valuable resource for future research that combines robotic learning with advanced language models. The research community can visit the project’s GitHub page and website to learn more.
Acknowledgements
We’d like to thank our coauthors Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Kelly Fu, Keerthana Gopalakrishnan, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan, and Andy Zeng. We’d also like to thank Yunfei Bai, Matt Bennice, Maarten Bosma, Justin Boyd, Bill Byrne, Kendra Byrne, Noah Constant, Pete Florence, Laura Graesser, Rico Jonschkowski, Daniel Kappler, Hugo Larochelle, Benjamin Lee, Adrian Li, Suraj Nair, Krista Reymann, Jeff Seto, Dhruv Shah, Ian Storz, Razvan Surdulescu, and Vincent Zhao for their help and support in various aspects of the project. And we’d like to thank Tom Small for creating many of the animations in this post.