This post describes different ways to compile an application using various development environments for the BlueField DPU.
Step-A
Step-B
Go get a cup of coffee…
Step-C
How often have you seen “Go get a coffee” in the instructions? As a developer, I found early on that this pesky quip is the bane of my life. Context switches, no matter the duration, are a high cost to pay in the application development cycle. Of all the steps that require you to step away, waiting for an application to compile is the hardest to shake off.
As we all enter the new world of NVIDIA BlueField DPU application development, it is important to set up the build step efficiently so that you can {code => compile => unit-test} seamlessly. In this post, I go over different ways to compile an application for the DPU.
Free Range Routing with the DOCA dataplane plugin
In the DPU application development series, I talked about creating a DOCA dataplane plugin in FRR for offloading policies. FRR's code base is close to a million lines (789,678 SLOC), which makes it a great candidate for measuring build times.
Developing directly on the BlueField DPU
The DPU has an Arm64 architecture, and one quick way to get started on DPU applications is to develop directly on the DPU. This test was done on an NVIDIA BlueField-2 with 8 GB of RAM and eight Cortex-A72 CPUs.
I installed the BlueField boot file (BFB), which provides the Ubuntu 20.04.3 OS image for the DPU. It also includes the libraries for DOCA-1.2 and DPDK-20.11.3. To build an application with the DOCA libraries, I added the DPDK pkgconfig location to the PKG_CONFIG_PATH.
root@dpu-arm:~# export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/opt/mellanox/dpdk/lib/aarch64-linux-gnu/pkgconfig
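To confirm that the libraries are now discoverable, you can query pkg-config; libdpdk is the package name provided by DPDK's .pc files, and this is a sanity check rather than a required step:
root@dpu-arm:~# pkg-config --modversion libdpdk
20.11.3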
Next, I set up my code workspace on the DPU by cloning FRR and switching to the DOCA dataplane plugin branch.
root@dpu-arm:~/code# git clone https://github.com/AnuradhaKaruppiah/frr.git
root@dpu-arm:~/code# cd frr
root@dpu-arm:~/code/frr# git checkout dp-doca
FRR requires a constantly evolving list of prerequisites, which is enumerated in the FRR community docs.
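For Ubuntu 20.04, installing the dependencies looks roughly like the following; treat this package list as a snapshot and defer to the FRR docs for the current set:
root@dpu-arm:~# apt install git autoconf automake libtool make libreadline-dev texinfo pkg-config libpam0g-dev libjson-c-dev bison flex libc-ares-dev python3-dev python3-pytest python3-sphinx install-info build-essential libsnmp-dev perl libcap-dev libelf-dev
With those dependencies installed, I configured FRR to include the DPDK and DOCA dataplane plugins.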
root@dpu-arm:~/code/frr# ./bootstrap.sh
root@dpu-arm:~/code/frr# ./configure --build=aarch64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-silent-rules --libdir=${prefix}/lib/aarch64-linux-gnu --libexecdir=${prefix}/lib/aarch64-linux-gnu --disable-maintainer-mode --disable-dependency-tracking --enable-exampledir=/usr/share/doc/frr/examples/ --localstatedir=/var/run/frr --sbindir=/usr/lib/frr --sysconfdir=/etc/frr --with-vtysh-pager=/usr/bin/pager --libdir=/usr/lib/aarch64-linux-gnu/frr --with-moduledir=/usr/lib/aarch64-linux-gnu/frr/modules "LIBTOOLFLAGS=-rpath /usr/lib/aarch64-linux-gnu/frr" --disable-dependency-tracking --disable-dev-build --enable-systemd=yes --enable-rpki --with-libpam --enable-doc --enable-doc-html --enable-snmp --enable-fpm --disable-zeromq --enable-ospfapi --disable-bgp-vnc --enable-multipath=128 --enable-user=root --enable-group=root --enable-vty-group=root --enable-configfile-mask=0640 --enable-logfile-mask=0640 --disable-address-sanitizer --enable-cumulus=yes --enable-datacenter=yes --enable-bfdd=no --enable-sharpd=yes --enable-dp-doca=yes --enable-dp-dpdk=yes
As I used the DPU as my development environment, I built and installed the FRR binaries in place:
root@dpu-arm:~/code# make -j12 all; make install
Here’s how the build times fared. I measured them in two ways:
- Time to build and install the binaries using make -j12 all and make install
- Time to build the same binaries and assemble them into a Debian package using dpkg-buildpackage -j12 -uc -us
The first method is used for coding and unit testing. The second method, generating debs, is needed to compare against build times in external development environments.
| DPU-Arm build times | Real | User | Sys |
| --- | --- | --- | --- |
| DPU Arm (complete make) | 2min 40.529sec | 16min 29.855sec | 2min 1.534sec |
| DPU Arm (Debian package) | 5min 23.067sec | 20min 33.614sec | 2min 49.628sec |
The difference in times is expected. Generating a package involves several additional steps.
There are some clear advantages to using the DPU as your development environment.
- You can code, build and install, and then unit-test without leaving your workspace.
- You can optimize the build for incremental code changes.
Incremental builds are usually a massive reduction in build time compared to a complete build. For example, I modified the DOCA dataplane code in FRR and rebuilt, with these results:
root@dpu-arm:~/code/frr# time make -j12
>>>>>>>>>>>>> snipped make output >>>>>>>>>>>>
real 0m3.119s
user 0m2.794s
sys 0m0.479s
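In practice, the inner loop on the DPU is just edit, rebuild, reinstall, and retest. A sketch of one iteration follows; the source file name here is purely illustrative:
root@dpu-arm:~/code/frr# vi zebra/dplane_doca.c
root@dpu-arm:~/code/frr# time make -j12
root@dpu-arm:~/code/frr# make install
root@dpu-arm:~/code/frr# systemctl restart frr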
While that may make things easier, it requires reserving a DPU indefinitely for every developer, solely for application development or maintenance. Your development environment may also require more memory and horsepower than the DPU provides, making this a less viable option long term.
Developing on an x86 server
My BlueField-2 DPU was hosted in an x86-64 Ubuntu 20.04 server, and I used this server for my development environment.
root@server1-x86:~# lscpu | grep -E "CPU\(s\):|Model name"
CPU(s): 32
Model name: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
root@server1-x86:~# grep MemTotal /proc/meminfo
MemTotal: 131906300 kB
In this case, the build machine is x86 and the host machine where the app runs is the Arm64 DPU. There are several ways to handle this:
- Use Arm emulation on the x86 build machine. A DOCA development container is available as part of the DOCA packages.
- Use a cross-compilation toolchain.
In this test, I used the first option as it was the easiest. The second option can yield better build performance, but creating that toolchain has its challenges.
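A note on the emulation: running an aarch64 container on an x86 host relies on qemu user-mode emulation being registered through the kernel’s binfmt_misc facility. If your Docker installation does not already handle this, a common one-time setup uses the multiarch/qemu-user-static image:
root@server1-x86:~# apt install qemu-user-static
root@server1-x86:~# docker run --rm --privileged multiarch/qemu-user-static --reset -p yes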
I downloaded and loaded the bfb_builder_doca_ubuntu_20.04 container on my x86 server and fired it up.
root@server1-x86:~# sudo docker load -i bfb_builder_doca_ubuntu_20.04-mlnx-5.4.tar
root@server1-x86:~# docker run -v ~/code:/code --privileged -it -e container=docker doca_v1.11_bluefield_os_ubuntu_20.04-mlnx-5.4:latest
The DOCA and DPDK libraries come preinstalled in this container; I just had to add them to the PKG_CONFIG_PATH.
root@86b87b0ab0c2:/code # export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/opt/mellanox/dpdk/lib/aarch64-linux-gnu/pkgconfig
I set up the workspace and FRR prerequisites within the container, same as with the previous option.
root@86b87b0ab0c2:/code # git clone https://github.com/AnuradhaKaruppiah/frr.git
root@86b87b0ab0c2:/code # cd frr
root@86b87b0ab0c2:/code/frr # git checkout dp-doca
I could build my application within this DOCA container, but I couldn’t test it in place. So, the FRR binaries had to be built and packaged into debs, which I then copied over to the BlueField DPU for testing. I set up the FRR Debian rules to match the FRR build configuration used in the previous option and generated the package:
root@86b87b0ab0c2:/code/frr # dpkg-buildpackage -j12 -uc -us
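dpkg-buildpackage leaves the .deb files in the parent directory, which is the ~/code bind mount on the host. From there, a minimal copy-and-install sketch looks like this; the DPU hostname dpu-arm is a placeholder for your setup:
root@server1-x86:~/code# scp frr_*.deb root@dpu-arm:/tmp/
root@dpu-arm:~# dpkg -i /tmp/frr_*.deb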
Table 2 shows how the build time compares with previous methods.
| DPU-Arm & x86 build times | Real | User | Sys |
| --- | --- | --- | --- |
| DPU Arm (complete make) | 2min 40.529sec | 16min 29.855sec | 2min 1.534sec |
| DPU Arm (Debian package) | 5min 23.067sec | 20min 33.614sec | 2min 49.628sec |
| x86 + DOCA dev container (Debian package) | 24min 19.051sec | 139min 39.286sec | 3min 58.081sec |
The giant jump in build time surprised me, because I have an amply stocked x86 server and no Docker limits. So, it seems throwing CPUs and RAM at a problem doesn’t always help! This performance degradation is caused by the cross-architecture emulation, as you can see with the next option.
Developing in an AWS Graviton instance
Next, I tried building my app natively on Arm but this time on an external server with more horsepower. I used an Amazon EC2 Graviton instance for this purpose with specs comparable to my x86 server.
- Arm64 arch, Ubuntu 20.04 OS
- 128 GB RAM
- 32 vCPUs
root@ip-172-31-28-243:~# lscpu | grep -E "CPU\(s\):|Model name"
CPU(s): 32
Model name: Neoverse-N1
root@ip-172-31-28-243:~# grep MemTotal /proc/meminfo
MemTotal: 129051172 kB
To set up the DOCA and DPDK libraries in this instance, I installed the DOCA SDK repo meta package.
root@ip-172-31-28-243:~# dpkg -i doca-repo-aarch64-ubuntu2004-local_1.1.1-1.5.4.2.4.1.3.bf.3.7.1.11866_arm64.deb
root@ip-172-31-28-243:~# apt update
root@ip-172-31-28-243:~# apt install doca-sdk
The remaining steps for cloning and building the FRR Debian package are the same as the previous option.
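Condensed, the workflow on the Graviton instance mirrors those earlier steps:
root@ip-172-31-28-243:~# export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/opt/mellanox/dpdk/lib/aarch64-linux-gnu/pkgconfig
root@ip-172-31-28-243:~/code# git clone https://github.com/AnuradhaKaruppiah/frr.git
root@ip-172-31-28-243:~/code# cd frr
root@ip-172-31-28-243:~/code/frr# git checkout dp-doca
root@ip-172-31-28-243:~/code/frr# dpkg-buildpackage -j12 -uc -us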
Table 3 shows how the build fared on the AWS Arm instance.
| DPU-Arm, x86 & AWS-Arm build times | Real | User | Sys |
| --- | --- | --- | --- |
| DPU Arm (complete make) | 2min 40.529sec | 16min 29.855sec | 2min 1.534sec |
| DPU Arm (Debian package) | 5min 23.067sec | 20min 33.614sec | 2min 49.628sec |
| x86 + DOCA dev container (Debian package) | 24min 19.051sec | 139min 39.286sec | 3min 58.081sec |
| AWS-Arm (Debian package) | 1min 30.480sec | 6min 6.056sec | 0min 35.921sec |
This is a clear winner, no coffee needed.
Figure 1 shows the compile times in these environments.
Summary
In this post, I discussed several development environments for DPU applications:
- BlueField DPU
- DOCA dev container on an x86 server
- AWS Graviton compute instance
You can prototype your app directly on the DPU, experiment with developing in the x86 DOCA development container, and grab an AWS Graviton instance with DOCA to punch it into hyperspeed!