I need help coming up with a technique to determine when an object is not in the list of already-classified objects.
I am sorting parts that are identical in shape but differ in the text written on them. Unfortunately the text is written in a circle, so I can’t simply read it with OCR. (At least I couldn’t make it work.)
I have about 20 different labels on the parts I currently have. Some parts I have 1,000 pictures of, others I only have 100 pictures.
I discovered while sorting that there are actually more than 20 different labels: there are parts I didn’t know about that only occur once or twice in five hundred parts. I would like to be able to identify and handle parts like this separately.
I am using keras.resnet50 to classify the images (parts). I get about 96% of parts identified correctly when I use two different classifiers on the same image and then check that they agree. I do this because I would rather reject parts than have them sorted incorrectly. Classifier two just has a different number of epochs than classifier one.
My images are 300×300.
I look at the predicted probability, and 95% of my predictions are almost certain, say 0.99999, so I haven’t been able to do the simple thing of just thresholding the strength of the prediction.
In case this is relevant, the ResNet50 classifier that I have trained might be overfit: within three epochs it is at about 99.7% training accuracy with a val_accuracy of 98%.
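For context, the agreement check I’m describing looks roughly like this (a minimal sketch; the model and image file names are placeholders and the 0.99 threshold is illustrative):

import numpy as np
import tensorflow as tf

# Two ResNet50-based classifiers trained on the same data for different numbers of epochs.
model_a = tf.keras.models.load_model("classifier_a.h5")
model_b = tf.keras.models.load_model("classifier_b.h5")

# Load and preprocess one 300x300 part image.
img = tf.keras.preprocessing.image.load_img("part.jpg", target_size=(300, 300))
x = tf.keras.preprocessing.image.img_to_array(img)[np.newaxis, ...]
x = tf.keras.applications.resnet50.preprocess_input(x)

probs_a = model_a.predict(x)[0]
probs_b = model_b.predict(x)[0]
label_a, label_b = int(np.argmax(probs_a)), int(np.argmax(probs_b))

# Accept only if both classifiers agree and both are confident; otherwise reject the part.
if label_a == label_b and min(probs_a[label_a], probs_b[label_b]) > 0.99:
    print("accept as class", label_a)
else:
    print("reject for manual sorting")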
I have a multi-class neural-network text classifier. The input is a sequence of words that are looked up in a vocabulary and replaced with their embeddings, so the model actually sees a sequence of embedding vectors. Since the domain of this model is a closed set (the vocabulary and the sequence length), I’m thinking it should be possible to try a small sample of all possible inputs (random phrases) to get an idea of which words would maximize a given class.
My question: is there a more efficient algorithm to do this, other than trying random sequences of words from the vocabulary?
I’m thinking there has to be a way to do this backwards – looking at the weights in the various units in the last layers relative to the embedding values.
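I’m wondering whether something like the following gradient-based sketch is the right direction (assuming a simple Keras model whose first layer is an Embedding layer; the file name, fixed sequence length, and layer assumptions are mine):

import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("text_classifier.h5")   # placeholder path
embedding_matrix = model.layers[0].get_weights()[0]        # (vocab_size, embed_dim)
seq_len = 20                                               # assumed fixed sequence length

# Rebuild the network on top of embedded inputs instead of token ids.
embedded_in = tf.keras.Input(shape=(seq_len, embedding_matrix.shape[1]))
x = embedded_in
for layer in model.layers[1:]:
    x = layer(x)
scores = tf.keras.Model(embedded_in, x)

def word_scores(token_ids, target_class):
    # First-order estimate of how much each vocab word (at each position) raises the class score.
    emb = tf.constant(embedding_matrix[np.asarray(token_ids)][np.newaxis, ...])
    with tf.GradientTape() as tape:
        tape.watch(emb)
        score = scores(emb)[0, target_class]
    grad = tape.gradient(score, emb)[0].numpy()            # (seq_len, embed_dim)
    return embedding_matrix @ grad.T                       # (vocab_size, seq_len)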
This post details how Filesystem Spec’s (fsspec’s) new Parquet module provides a format-aware byte-caching optimization.
As datasets continue to grow in size, the adoption of cloud-storage platforms like Amazon S3 and Google Cloud Storage (GCS) is becoming more popular. Although node-local storage is likely to result in better I/O performance, this approach can become impractical once the dataset exceeds the single-terabyte scale.
For cases where remote storage is the only practical solution, much of the PyData ecosystem has already adopted Filesystem Spec (fsspec) as a universal file-system API. While fsspec has provided users with adequate remote storage access since the introduction of s3fs and gcsfs, the internal byte-transfer and caching algorithms have not yet leveraged format-specific optimizations for high-performance file formats like Parquet.
Key learnings
In this post, we introduce the fsspec.parquet module, which provides a format-aware, byte-caching optimization for remote Parquet files. This module is both experimental and limited in scope to a single public API: open_parquet_file.
This API provides faster remote-file access. When compared with the default read_parquet behavior in the pandas and cuDF DataFrame libraries, there is a consistent performance boost in overall throughput for partial I/O (column-chunk and row-group selection) from large Parquet files.
We also discuss the optimizations used within this new module and present the preliminary performance results.
What is fsspec?
Filesystem Spec (fsspec) is an open-source project providing a unified Python interface to a variety of backend storage systems. The name fsspec refers both to a specific Python library and to a larger GitHub organization containing many system-specific repositories (for example, s3fs and gcsfs).
The advantage of using fsspec within a Python-based library or application is that the same POSIX-like file API can write and read from remote objects and local files alike. When a cloud-based object is opened by an fsspec-compatible file system, the underlying application is given an AbstractBufferedFile object, or some other file-like object designed to duck-type with native Python file objects. Now, these file-like objects can be treated the same way as local Python-file handles.
One obvious difference is that the AbstractBufferedFile object must use file system-specific commands internally to put and get bytes to and from the remote storage system. Because internal data-transfer operations typically have higher latency than local disk accesses, fsspec offers several caching strategies to prefetch data while avoiding many small requests.
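As a quick illustration (the bucket and object names below are placeholders), a caching strategy can typically be chosen per file handle through the cache_type argument when the remote file is opened:

import fsspec

fs = fsspec.filesystem("s3", anon=True)          # placeholder: any fsspec-compatible backend
with fs.open("s3://my-bucket/my-data.parquet", mode="rb",
             cache_type="readahead",             # fsspec's read-ahead prefetching strategy
             block_size=5 * 2**20) as f:         # prefetch in 5 MiB blocks
    magic = f.read(4)                            # Parquet files begin with b"PAR1"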
Later in this post, we explain why the default caching strategy is often suboptimal for large Parquet files, and how a new KnownPartsOfAFile (“parts”) option can dramatically reduce read latency.
What is Parquet?
Parquet is a binary column-oriented data-storage format, designed with performance, compression, and partial I/O in mind. Due to its dramatic performance advantages over text-formatted files like CSV, the open-source format has experienced rapid growth in popularity since its initial release in 2013.
To understand the optimizations used within the fsspec.parquet module, it is useful to have a high-level understanding of the Parquet specification. Figure 1 shows that all Parquet files consist of a collection of contiguously stored row-groups, and each of these row-groups includes a collection of contiguously stored column chunks.
Figure 1. High-level visual representation of the Parquet file format
Altogether, the majority of the bytes in a Parquet file correspond to these column chunks. The remaining bytes are used to store file metadata within a Thrift-formatted footer. This footer metadata includes byte offsets and optional statistics (min, max, valid count, null count) for every column chunk in the file.
Because this vital information is consolidated in the footer, it is only necessary to sample the end of a Parquet file to determine the specific location of each column-chunk in the file.
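As a rough illustration (the file path is a placeholder), those byte offsets can be inspected from the footer metadata with a Parquet engine such as pyarrow:

import pyarrow.parquet as pq

md = pq.ParquetFile("my-data.parquet").metadata   # parses only the footer metadata
for rg in range(md.num_row_groups):
    for col in range(md.num_columns):
        chunk = md.row_group(rg).column(col)
        # Location and compressed size of this column chunk, straight from the footer.
        print(rg, chunk.path_in_schema, chunk.file_offset, chunk.total_compressed_size)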
What is the purpose of fsspec’s new Parquet module?
fsspec is already the most common means of loading Parquet files under Python. The primary purpose of the new fsspec.parquet module is to provide an optimized utility for exactly this task.
Internally, this new utility (open_parquet_file) essentially wraps a conventional open call in Parquet-specific logic that is designed to begin the transfer of all relevant data from the remote file into local memory before the user has even initiated an explicit read operation.
Although reads of any size may benefit from this new utility, the most significant improvements are seen when the read operation targets only a subset of all columns and row-groups. For example, when the remote read is using column projection, the same list of columns can be passed directly to open_parquet_file:
from fsspec.parquet import open_parquet_file
import pandas as pd

path = "s3://my-bucket/my-data"  # placeholder; any fsspec-supported URL (for example, gcs://...)
columns = ["col1", "col2"]
options = {"necessary": "credentials"}

with open_parquet_file(
    path,
    columns=columns,
    storage_options=options,
) as f:
    df = pd.read_parquet(f, columns=columns)
As we mentioned earlier, the primary purpose of fsspec’s new open_parquet_file function is to improve read performance from large Parquet files by using a format-aware caching strategy within AbstractBufferedFile.open.
To understand why the specific optimizations used by the Parquet module are beneficial, it is helpful to start with an understanding of the simple cache-free approach, as well as the default read-ahead strategy.
Understanding cache-free and read-ahead approaches
Figure 2 shows how cache-free file access is likely to map onto remote get calls within a read_parquet operation (after the remote file is already opened) for both an ideal and naive sequence of read calls.
Figure 2. Comparison of ideal and naive data access patterns from a remote Parquet file
In this particular example, assume the Parquet file is large enough (~400MB+) to comprise four distinct row-groups, and the read_parquet call is targeting two out of the four available columns. This means the AbstractBufferedFile object must ultimately transfer eight distinct column-chunk ranges, along with the footer metadata, into local memory.
When caching is disabled, it is possible for a well-tuned Parquet library to move only the necessary data into local memory using five distinct requests. However, because fsspec’s file-like interface is stateful, these five read calls are always serial, and each one incurs the full request latency.
This ideal-access scenario is incapable of leveraging concurrency to minimize latency, even though the strategy is likely to minimize the number of file-system requests and produce high overall throughput. For Parquet libraries that take a naive-access approach and do not explicitly minimize the number of read calls, the read_parquet operation is likely to suffer from high latency and low overall throughput!
At this point, we’ve established that the I/O library is unable to take advantage of file-system concurrency after the file is already opened for reading. What is even more important to consider is that the default read-ahead caching strategy can degrade the observed performance even further when partial I/O is involved. This is because the inherent assumption of read-ahead caching is that sequential file accesses are likely to be contiguous.
Figure 3 shows that read-ahead caching is likely to transfer about 20 MB of unnecessary data, 5 MB of read-ahead for each row-group.
Figure 3. Comparison of the read-ahead and cache-free file-access strategies in fsspec
As you see in the following performance results, the benefit of disabled caching over read-ahead caching is dependent on both the engine used within read_parquet and the specific storage system.
If the engine already includes format-aware optimizations to move the necessary byte ranges into local memory (that is, pyarrow and fastparquet), then the no-caching option is likely the better choice. When the engine assumes local file access is fast (cuDF), some form of fsspec-level caching may be critical.
In either case, the engine is unable to begin the transfer of the required byte ranges until after the fsspec file object is created. It is limited by the additive latency of sequential data transfer.
Now that we’ve established a high-level understanding of both default and cache-free AbstractBufferedFile behavior within a typical read_parquet call, we can explain the two general optimizations used by open_parquet_file to improve overall read throughput.
Optimization 1: Use the “parts” caching strategy
Instead of the format-agnostic caching strategies used within AbstractBufferedFile by default, you use the new KnownPartsOfAFile (“parts”) option to cache exactly what you need before the file is even opened.
In other words, begin by using an external Parquet engine (either fastparquet or pyarrow) to parse the footer metadata up front. Then, transfer the necessary byte ranges into local memory, and use the “parts” cache to ensure that downstream applications never have to wait for remote data after an open fsspec file object is acquired.
Figure 4 shows that there is no longer any relationship between the logic used to access data within read_parquet and the number or size of get calls used for remote-data access. From the perspective of the Parquet engine, any read operation is near-instantaneous, because all necessary data is already cached in memory.
Figure 4. fsspec’s “parts” caching strategy. All necessary data must be cached in local memory before the file-like object is even opened.
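A rough sketch of this idea, assuming the needed (start, stop) byte ranges have already been identified from the footer metadata (the path and ranges below are placeholders):

import fsspec

fs = fsspec.filesystem("s3")                              # placeholder backend
path = "s3://my-bucket/my-data.parquet"                   # placeholder path
ranges = [(4, 10_000_000), (250_000_000, 260_000_000)]    # placeholder column-chunk ranges

# Fetch the ranges up front, then seed the "parts" cache so downstream reads
# never have to touch remote storage after the file is opened.
blocks = fs.cat_ranges([path] * len(ranges),
                       [start for start, _ in ranges],
                       [stop for _, stop in ranges])
data = {rng: blk for rng, blk in zip(ranges, blocks)}

f = fs.open(path, mode="rb", cache_type="parts", cache_options={"data": data})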
Optimization 2: Transfer “parts” asynchronously and in parallel
Although the previous optimization enables you to avoid many small get operations and unnecessary data transfer within read_parquet, you must still populate the KnownPartsOfAFile cache before AbstractBufferedFile can be initialized.
To do this as efficiently as possible, use cat_ranges to fetch all the required column-chunks at one time, both asynchronously and in parallel using asyncio. Because the total number of transferred column chunks can be large for files containing many fields or multiple row-groups, you also aggregate adjacent byte-range requests as long as the size of the aggregated request stays below an upper limit.
Figure 5 shows that this approach ultimately leads to the concurrent transfer of multiple byte ranges of optimal size.
Figure 5. Data-transfer optimizations used in fsspec.parquet
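The aggregation step itself can be pictured with a small, self-contained sketch (the 64 KB gap and 256 MB block limits below are illustrative, not a definitive implementation):

def aggregate_ranges(ranges, max_gap=64_000, max_block=256_000_000):
    # Merge (start, stop) byte ranges separated by small gaps, keeping each request bounded.
    merged = []
    for start, stop in sorted(ranges):
        if (merged
                and start - merged[-1][1] <= max_gap
                and stop - merged[-1][0] <= max_block):
            merged[-1] = (merged[-1][0], max(stop, merged[-1][1]))
        else:
            merged.append((start, stop))
    return merged

print(aggregate_ranges([(0, 100), (150, 300), (1_000_000, 2_000_000)]))
# -> [(0, 300), (1000000, 2000000)]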
Preliminary fsspec.parquet benchmark results
To compare the performance of open_parquet_file with other fsspec- and pyarrow-based file-handling methods, we used a Python benchmark script with the following package versions:
pyarrow-6.0.1
fastparquet-0.8.0
cudf-22.04
fsspec/s3fs/gcsfs-2022.2.0
Using this script, we read a single column from a 789 MB Parquet file containing a total of 10 row-groups and 30 columns with Snappy compression, requiring the transfer of roughly 27 MB of data. The results for both S3 and GCS, summarized in Figures 6-7, clearly show a significant performance boost of 85% or more when moving from the default caching strategy to open_parquet_file.
In fact, the new function effectively matches the performance of PyArrow’s native S3FileSystem implementation, which is specifically designed for optimal performance with its Parquet engine. In this post, we compare it only with a native PyArrow filesystem for this single Amazon S3 benchmark, because PyArrow only offers publicly supported implementations for Amazon S3 and Hadoop at the time of publication.
Figure 6. General benchmark results for S3 storage
Figure 7. General benchmark results for GCS storage
To illustrate that the benefit of using open_parquet_file scales with the overall size of the file, we also read a single column from a 12 GB Parquet file containing the first day of the Criteo dataset, stored in public GCS storage with compression disabled. Figure 8 shows that the new fsspec function can offer a 10x speedup or more over default caching.
Figure 8. Criteo-dataset benchmark results for GCS storage
Test it out yourself
In this post, we introduced the fsspec.parquet module, which provides a format-aware, byte-caching optimization for opening remote Parquet files, open_parquet_file. Benchmarking clearly suggests that the new optimization can offer significant performance improvements over default file-opening behavior in fsspec, and may even approach the performance of optimized C++ file-system implementations in PyArrow for partial I/O.
Since being officially released in the 2021.11.0 version of fsspec, the open_parquet_file utility has already been adopted by both the RAPIDS cuDF library and Dask-Dataframe.
Due to the dramatic and consistent improvements available for cuDF-based workflows, this new feature has already been adopted as the default file-opening approach within cudf.read_parquet and dask_cudf.read_parquet.
For Dask users without GPU resources, the optimized caching approach can now be opted into by passing an open_file_options argument to read_parquet. For example, the following code (with a placeholder dataset path) instructs Dask to open all Parquet data files with the “parquet” pre-caching method:
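import dask.dataframe as dd

# Placeholder dataset path; "method": "parquet" selects the fsspec.parquet pre-caching logic.
ddf = dd.read_parquet(
    "s3://my-bucket/my-dataset/",
    columns=["col1", "col2"],
    open_file_options={"precache_options": {"method": "parquet"}},
)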
Given this early success, we are hoping to expand and simplify the available pre-caching options in fsspec, and establish a clear precache_options API within all file-opening functions.
Community feedback here is critical. Please engage on GitHub or comment below, and let us know what you think!
On Windows, the run gets past this point and loads the CUDA libraries:
- weight name: discriminator/gan/conv6/bias:0, shape: [256], size: 256
Trigger callback: Total counts of trainable weights: 33579064. Total size of trainable weights: 0G 32M 24K 56B (Assuming 32-bit data type.)
2022-05-05 11:29:14.865680: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2022-05-05 11:29:15.131382: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2022-05-05 11:29:15.901900: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.
However, on Linux the run stays at
Total size of trainable weights: 0G 32M 24K 56B (Assuming 32-bit data type.)
forever. I assume this means Linux wasn’t able to open cuDNN?
cuDNN is installed through conda; conda list shows:
cudatoolkit   10.0.130    hf841e97_10   conda-forge
cudnn         7.6.5.32    ha8d7eb6_1    conda-forge
These versions seem fine for tensorflow-gpu 1.15.
CUDA was also installed with the system package manager, but there is another cudatoolkit 10, plus the cuDNN that comes with tensorflow-gpu. I did not restart the machine; does this matter? Here is the Linux log:
2022-05-05 12:46:14.496348: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2022-05-05 12:46:14.520365: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2601325000 Hz
2022-05-05 12:46:14.521958: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5565bcb72f40 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-05-05 12:46:14.521982: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2022-05-05 12:46:14.523737: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-05-05 12:46:14.526821: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_SYSTEM_DRIVER_MISMATCH: system has unsupported display driver / cuda driver combination
2022-05-05 12:46:14.526889: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 510.60.2
2022-05-05 12:46:14.526904: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 495.44.0
2022-05-05 12:46:14.526916: E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:313] kernel version 495.44.0 does not match DSO version 510.60.2 -- cannot find working devices in this configuration
Is this because the CUDA 11 installed by the system is not compatible with the cudatoolkit 10 installed by conda?
Teaching the AI brains of autonomous vehicles to understand the world as humans do requires billions of miles of driving experience. The road to achieving this astronomical level of driving leads to the virtual world. On the latest episode of the AI Podcast, Waabi CEO and founder Raquel Urtasun joins NVIDIA’s Katie Burke Washabaugh.
This Embedded Vision Summit session will showcase a low-code approach to develop fully optimized and accelerated vision AI applications using DeepStream SDK and graph composer.
Learn to build and deploy a deep learning model for automated flood detection using satellite imagery in this new self-paced course from the NVIDIA Deep Learning Institute.
Posted by Jimmy (Tsung-Yen) Yang, Student Researcher, Robotics at Google
The promise of deep reinforcement learning (RL) in solving complex, high-dimensional problems autonomously has attracted much interest in areas such as robotics, game playing, and self-driving cars. However, effectively training an RL policy requires exploring a large set of robot states and actions, including many that are not safe for the robot. This is a considerable risk, for example, when training a legged robot. Because such robots are inherently unstable, there is a high likelihood of the robot falling during learning, which could cause damage.
The risk of damage can be mitigated to some extent by learning the control policy in computer simulation and then deploying it in the real world. However, this approach usually requires addressing the difficult sim-to-real gap, i.e., the policy trained in simulation cannot be readily deployed in the real world for various reasons, such as sensor noise in deployment or the simulator not being realistic enough during training. Another approach to solve this issue is to directly learn or fine-tune a control policy in the real world. But again, the main challenge is ensuring safety during learning.
In “Safe Reinforcement Learning for Legged Locomotion”, we introduce a safe RL framework for learning legged locomotion while satisfying safety constraints during training. Our goal is to learn locomotion skills autonomously in the real world without the robot falling during the entire learning process. Our learning framework adopts a two-policy safe RL framework: a “safe recovery policy” that recovers robots from near-unsafe states, and a “learner policy” that is optimized to perform the desired control task. The safe learning framework switches between the safe recovery policy and the learner policy to enable robots to safely acquire novel and agile motor skills.
The Proposed Framework
Our goal is to ensure that during the entire learning process, the robot never falls, regardless of the learner policy being used. Similar to how a child learns to ride a bike, our approach teaches an agent a policy while using “training wheels”, i.e., a safe recovery policy. We first define a set of states, which we call a “safety trigger set”, where the robot is close to violating safety constraints but can still be saved by a safe recovery policy. For example, the safety trigger set can be defined as a set of states with the height of the robot being below a certain threshold and the roll, pitch, and yaw angles being too large, which is an indication of falling. When the learner policy results in the robot being within the safety trigger set (i.e., where it is likely to fall), we switch to the safe recovery policy, which drives the robot back to a safe state. We determine when to switch back to the learner policy by leveraging an approximate dynamics model of the robot to predict the future robot trajectory. For example, based on the position of the robot’s legs and the current angle of the robot from its roll, pitch, and yaw sensors, is it likely to fall in the future? If the predicted future states are all safe, we hand control back to the learner policy; otherwise, we keep using the safe recovery policy.
The state diagram of the proposed approach. (1) If the learner policy violates the safety constraint, we switch to the safe recovery policy. (2) If the learner policy cannot ensure safety in the near future after switching to the safe recovery policy, we keep using the safe recovery policy. This allows the robot to explore more while ensuring safety.
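As a rough, hypothetical sketch of this switching logic (the function names below are placeholders, not the authors’ implementation):

def choose_action(state, learner_policy, safe_recovery_policy,
                  in_safety_trigger_set, predict_future_states, all_safe):
    # If the robot is already close to violating a safety constraint, recover first.
    if in_safety_trigger_set(state):
        return safe_recovery_policy(state)
    # Otherwise, check the learner's proposed action against the approximate dynamics model.
    action = learner_policy(state)
    if all_safe(predict_future_states(state, action)):
        return action
    # The predicted trajectory is not safe: keep using the safe recovery policy.
    return safe_recovery_policy(state)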
This approach ensures safety in complex systems without resorting to opaque neural networks that may be sensitive to distribution shifts in application. In addition, the learner policy is able to explore states that are near safety violations, which is useful for learning a robust policy.
Because we use “approximated” dynamics to predict the future trajectory, we also examine how much safer a robot would be if we used a much more accurate model for its dynamics. We provide a theoretical analysis of this problem and show that our approach can achieve minimal safety performance loss compared to one with a full knowledge about the system dynamics.
Legged Locomotion Tasks
To demonstrate the effectiveness of the algorithm, we consider learning three different legged locomotion skills:
Efficient Gait: The robot learns how to walk with low energy consumption and is rewarded for consuming less energy.
Catwalk: The robot learns a catwalk gait pattern, in which the left and right two feet are close to each other. This is challenging because by narrowing the support polygon, the robot becomes less stable.
Two-leg Balance: The robot learns a two-leg balance policy, in which the front-right and rear-left feet are in stance, and the other two are lifted. The robot can easily fall without delicate balance control because the contact polygon degenerates into a line segment.
Locomotion tasks considered in the paper. Top: efficient gait. Middle: catwalk. Bottom: two-leg balance.
Implementation Details
We use a hierarchical policy framework that combines RL and a traditional control approach for the learner and safe recovery policies. This framework consists of a high-level RL policy, which produces gait parameters (e.g., stepping frequency) and feet placements, and pairs it with a low-level process controller called model predictive control (MPC) that takes in these parameters and computes the desired torque for each motor in the robot. Because we do not directly command the motors’ angles, this approach provides more stable operation, streamlines the policy training due to a smaller action space, and results in a more robust policy. The input of the RL policy network includes the previous gait parameters, the height of the robot, base orientation, linear and angular velocities, and feedback to indicate whether the robot is approaching the safety trigger set. We use the same setup for each task.
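As a purely illustrative sketch of this hierarchy (every name below is a hypothetical placeholder, not the authors’ code):

def control_loop(robot, rl_policy, mpc, num_steps):
    gait_params = robot.default_gait_parameters()
    for _ in range(num_steps):
        obs = (gait_params,
               robot.base_height(), robot.base_orientation(),
               robot.linear_velocity(), robot.angular_velocity(),
               robot.near_safety_trigger_set())
        # The high-level RL policy outputs gait parameters and foot placements...
        gait_params, foot_placements = rl_policy(obs)
        # ...and the low-level MPC converts them into per-motor torques.
        torques = mpc.compute_torques(gait_params, foot_placements, robot.state())
        robot.apply_torques(torques)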
We train a safe recovery policy with a reward for reaching stability as soon as possible. Furthermore, we design the safety trigger set with inspiration from capturability theory. In particular, the initial safety trigger set is defined to ensure that the robot’s feet cannot fall outside of the positions from which the robot can safely recover using the safe recovery policy. We then fine-tune this set on the real robot with a random policy to prevent the robot from falling.
Real-World Experiment Results
We report the real-world experimental results showing the reward learning curves and the percentage of safe recovery policy activations on the efficient gait, catwalk, and two-leg balance tasks. To ensure that the robot can learn to be safe, we add a penalty when triggering the safe recovery policy. Here, all the policies are trained from scratch, except for the two-leg balance task, which was pre-trained in simulation because it requires more training steps.
Overall, we see that on these tasks, the reward increases, and the percentage of uses of the safe recovery policy decreases over policy updates. For instance, the percentage of uses of the safe recovery policy decreases from 20% to near 0% in the efficient gait task. For the two-leg balance task, the percentage drops from near 82.5% to 67.5%, suggesting that the two-leg balance is substantially harder than the previous two tasks. Still, the policy does improve the reward. This observation implies that the learner can gradually learn the task while avoiding the need to trigger the safe recovery policy. In addition, this suggests that it is possible to design a safe trigger set and a safe recovery policy that does not impede the exploration of the policy as the performance increases.
The reward learning curve (blue) and the percentage of safe recovery policy activations (red) using our safe RL algorithm in the real world.
In addition, the following video shows the learning process for the two-leg balance task, including the interplay between the learner policy and the safe recovery policy, and the reset to the initial position when an episode ends. We can see that the robot tries to catch itself when falling by putting the lifted legs (front left and rear right) down and outward, creating a support polygon. After the learning episode ends, the robot walks back to the reset position automatically. This allows us to train the policy autonomously and safely without human supervision.
Early training stage.
Late training stage.
Without a safe recovery policy.
Finally, we show clips of the learned policies. First, in the catwalk task, the distance between the two sides of the legs is 0.09 m, which is 40.9% smaller than the nominal distance. Second, in the two-leg balance task, the robot can maintain balance by jumping up to four times on two legs, compared to one jump from the policy pre-trained in simulation.
Final learned two-leg balance.
Conclusion
We presented a safe RL framework and demonstrated how it can be used to train a robotic policy with no falls and without the need for a manual reset during the entire learning process for the efficient gait and catwalk tasks. This approach even enables training of a two-leg balance task with only four falls. The safe recovery policy is triggered only when needed, allowing the robot to more fully explore the environment. Our results suggest that learning legged locomotion skills autonomously and safely is possible in the real world, which could unlock new opportunities including offline dataset collection for robot learning.
No model is without limitation. We currently ignore the model uncertainty from the environment and non-linear dynamics in our theoretical analysis. Including these would further improve the generality of our approach. In addition, some hyper-parameters of the switching criteria are currently being heuristically tuned. It would be more efficient to automatically determine when to switch based on the learning progress. Furthermore, it would be interesting to extend this safe RL framework to other robot applications, such as robot manipulation. Finally, designing an appropriate reward when incorporating the safe recovery policy can impact learning performance. We use a penalty-based approach that obtained reasonable results in these experiments, but we plan to investigate this in future work to make further performance improvements.
Acknowledgements
We would like to thank our paper co-authors: Tingnan Zhang, Linda Luu, Sehoon Ha, Jie Tan, and Wenhao Yu. We would also like to thank the team members of Robotics at Google for discussions and feedback.