Categories
Misc

sparklyr 1.6: weighted quantile summaries, power iteration clustering, spark_write_rds(), and more

Sparklyr 1.6 is now available on CRAN!

To install sparklyr 1.6 from CRAN, run

install.packages("sparklyr")

In this blog post, we shall highlight the following features and enhancements from sparklyr 1.6:

Weighted quantile summaries

Apache Spark is well-known for supporting approximate algorithms that trade off marginal amounts of accuracy for greater speed and parallelism. Such algorithms are particularly beneficial for performing preliminary data explorations at scale, as they enable users to quickly query certain estimated statistics within a predefined error margin, while avoiding the high cost of exact computations. One example is the Greenwald-Khanna algorithm for online computation of quantile summaries, as described in Greenwald and Khanna (2001). This algorithm was originally designed for efficient ε-approximation of quantiles within a large dataset without the notion of data points carrying different weights, and the unweighted version of it has been implemented as approxQuantile() since Spark 2.0. However, the same algorithm can be generalized to handle weighted inputs, and as sparklyr user @Zhuk66 mentioned in this issue, a weighted version of this algorithm makes for a useful sparklyr feature.

To properly explain what weighted-quantile means, we must clarify what the weight of each data point signifies. For example, if we have a sequence of observations (1, 1, 1, 1, 0, 2, -1, -1) and would like to approximate the median of all data points, then we have the following two options:

  • Either run the unweighted version of approxQuantile() in Spark to scan through all 8 data points

  • Or alternatively, “compress” the data into 4 tuples of (value, weight): ((1, 0.5), (0, 0.125), (2, 0.125), (-1, 0.25)), where the second component of each tuple represents how often a value occurs relative to the rest of the observed values, and then find the median by scanning through the 4 tuples using the weighted version of the Greenwald-Khanna algorithm
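
The equivalence of the two options above can be checked in a few lines of plain base R. The `weighted_quantile()` helper below is a hypothetical illustration for intuition only, not sparklyr's (or Spark's) actual implementation: it simply scans values in sorted order until the cumulative weight reaches the requested probability.

```r
# Base-R sketch (illustrative only): a naive weighted quantile that scans
# values in sorted order until the cumulative weight reaches probability p.
weighted_quantile <- function(values, weights, p) {
  ord <- order(values)
  cum_w <- cumsum(weights[ord]) / sum(weights)
  values[ord][which(cum_w >= p)[1]]
}

# Unweighted median over all 8 observations ...
median(c(1, 1, 1, 1, 0, 2, -1, -1))
## [1] 1

# ... agrees with the weighted median over the 4 compressed (value, weight) tuples
weighted_quantile(
  values  = c(1, 0, 2, -1),
  weights = c(0.5, 0.125, 0.125, 0.25),
  p       = 0.5
)
## [1] 1
```

Both computations return 1, which is why the compressed weighted representation can stand in for the raw observations.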

We can also run through a contrived example involving the standard normal distribution to illustrate the power of weighted quantile estimation in sparklyr 1.6. Suppose we cannot simply run qnorm() in R to evaluate the quantile function of the standard normal distribution at p = 0.25 and p = 0.75. How can we get some rough idea about the first and third quartiles of this distribution? One way is to sample a large number of data points from this distribution, and then apply the Greenwald-Khanna algorithm to our unweighted samples, as shown below:

library(sparklyr)

sc <- spark_connect(master = "local")

num_samples <- 1e6
samples <- data.frame(x = rnorm(num_samples))

samples_sdf <- copy_to(sc, samples, name = random_string())

samples_sdf %>%
  sdf_quantile(
    column = "x",
    probabilities = c(0.25, 0.75),
    relative.error = 0.01
  ) %>%
  print()
##        25%        75%
## -0.6629242  0.6874939

Notice that because we are working with an approximate algorithm, and have specified relative.error = 0.01, the estimated value of -0.6629242 from above could be anywhere between the 24th and the 26th percentile of all samples. In fact, it falls in the 25.36896-th percentile:

pnorm(-0.6629242)
## [1] 0.2536896

Now how can we make use of weighted quantile estimation from sparklyr 1.6 to obtain similar results? Simple! We can sample a large number of x values uniformly at random from (−∞, ∞) (or, alternatively, just select a large number of values evenly spaced within (−M, M), where M is a sufficiently large number), and assign each x value a weight of (1/√(2π)) · e^(−x²/2), the standard normal distribution’s probability density at x. Finally, we run the weighted version of sdf_quantile() from sparklyr 1.6, as shown below:

library(sparklyr)

sc <- spark_connect(master = "local")

num_samples <- 1e6
M <- 1000
samples <- tibble::tibble(
  x = M * seq(-num_samples / 2 + 1, num_samples / 2) / num_samples,
  weight = dnorm(x)
)

samples_sdf <- copy_to(sc, samples, name = random_string())

samples_sdf %>%
  sdf_quantile(
    column = "x",
    weight.column = "weight",
    probabilities = c(0.25, 0.75),
    relative.error = 0.01
  ) %>%
  print()
##    25%    75%
## -0.696  0.662

Voilà! The estimates are not too far off from the 25th and 75th percentiles (relative to our aforementioned maximum permissible error of 0.01):

pnorm(-0.696)
## [1] 0.2432144
pnorm(0.662)
## [1] 0.7460144

Power iteration clustering

Power iteration clustering (PIC), a simple and scalable graph clustering method presented in Lin and Cohen (2010), first finds a low-dimensional embedding of a dataset, using truncated power iteration on a normalized pairwise-similarity matrix of all data points, and then uses this embedding as the “cluster indicator”, an intermediate representation of the dataset that leads to fast convergence when used as input to k-means clustering. This process is very well illustrated in figure 1 of Lin and Cohen (2010) (reproduced below)

in which the leftmost image is the visualization of a dataset consisting of 3 circles, with points colored in red, green, and blue indicating clustering results, and the subsequent images show the power iteration process gradually transforming the original set of points into what appears to be three disjoint line segments, an intermediate representation that can be rapidly separated into 3 clusters using k-means clustering with (k = 3).
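
For intuition, the core of the PIC procedure described above can be sketched in a few lines of base R. This is an illustrative toy, not Spark's implementation, and the `pic_sketch()` helper is hypothetical: it builds a Gaussian affinity matrix, row-normalizes it, runs a truncated power iteration, and hands the resulting one-dimensional embedding to k-means.

```r
# Toy PIC sketch in base R (illustrative only; not Spark's implementation).
pic_sketch <- function(pts, k, iters = 30) {
  # pairwise Gaussian affinities between all points
  A <- exp(-as.matrix(dist(pts))^2 / 2)
  diag(A) <- 0
  W <- A / rowSums(A)            # row-normalized pairwise-similarity matrix
  v <- rowSums(A) / sum(A)       # "degree" initialization
  for (i in seq_len(iters)) {
    v <- as.vector(W %*% v)      # one power-iteration step
    v <- v / sum(abs(v))         # rescale to keep values in range
  }
  # the truncated iterate serves as a 1-D cluster indicator; k-means finishes
  kmeans(v, centers = k, nstart = 10)$cluster
}

# two well-separated groups of points
pts <- rbind(
  cbind(c(0, 0.1, 0, -0.1, 0, 0.1), c(0, 0, 0.1, 0, -0.1, 0.1)),
  cbind(c(10, 10.1, 10), c(10, 10, 10.1))
)
pic_sketch(pts, k = 2)
```

Running this assigns the first six points to one cluster and the last three to the other: after the truncated iteration, each group's entries of the embedding settle onto a distinct plateau that k-means separates trivially.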

In sparklyr 1.6, ml_power_iteration() was implemented to make the PIC functionality in Spark accessible from R. It expects as input a 3-column Spark dataframe that represents a pairwise-similarity matrix of all data points. Two of the columns in this dataframe should contain 0-based row and column indices, and the third column should hold the corresponding similarity measure. In the example below, we will see a dataset consisting of two circles being easily separated into two clusters by ml_power_iteration(), with the Gaussian kernel being used as the similarity measure between any 2 points:

gen_similarity_matrix <- function() {
  # Gaussian similarity measure
  gaussian_similarity <- function(pt1, pt2) {
    exp(-sum((pt2 - pt1) ^ 2) / 2)
  }
  # generate evenly distributed points on a circle centered at the origin
  gen_circle <- function(radius, num_pts) {
    seq(0, num_pts - 1) %>%
      purrr::map_dfr(
        function(idx) {
          theta <- 2 * pi * idx / num_pts
          radius * c(x = cos(theta), y = sin(theta))
        })
  }
  # generate points on both circles
  pts <- rbind(
    gen_circle(radius = 1, num_pts = 80),
    gen_circle(radius = 4, num_pts = 80)
  )
  # populate the pairwise similarity matrix (stored as a 3-column dataframe)
  similarity_matrix <- data.frame()
  for (i in seq(2, nrow(pts)))
    similarity_matrix <- similarity_matrix %>%
      rbind(seq(i - 1L) %>%
        purrr::map_dfr(~ list(
          src = i - 1L, dst = .x - 1L,
          similarity = gaussian_similarity(pts[i, ], pts[.x, ])
        ))
      )

  similarity_matrix
}

library(sparklyr)

sc <- spark_connect(master = "local")
sdf <- copy_to(sc, gen_similarity_matrix())
clusters <- ml_power_iteration(
  sdf, k = 2, max_iter = 10, init_mode = "degree",
  src_col = "src", dst_col = "dst", weight_col = "similarity"
)

clusters %>% print(n = 160)
## # A tibble: 160 x 2
##        id cluster
##     <dbl>   <int>
##   1     0       1
##   2     1       1
##   3     2       1
##   4     3       1
##   5     4       1
##   ...
##   157   156       0
##   158   157       0
##   159   158       0
##   160   159       0

The output shows points from the two circles being assigned to separate clusters, as expected, after only a small number of PIC iterations.

spark_write_rds() + collect_from_rds()

spark_write_rds() and collect_from_rds() are implemented as a less memory-consuming alternative to collect(). Unlike collect(), which retrieves all elements of a Spark dataframe through the Spark driver node, hence potentially causing slowness or out-of-memory failures when collecting large amounts of data, spark_write_rds(), when used in conjunction with collect_from_rds(), can retrieve all partitions of a Spark dataframe directly from Spark workers rather than through the Spark driver node. First, spark_write_rds() distributes the tasks of serializing Spark dataframe partitions in RDS version 2 format among Spark workers. Spark workers can then process multiple partitions in parallel, each handling one partition at a time and persisting the RDS output directly to disk rather than sending dataframe partitions to the Spark driver node. Finally, the RDS outputs can be re-assembled into R dataframes using collect_from_rds().

Shown below is an example of spark_write_rds() + collect_from_rds() usage, where RDS outputs are first saved to HDFS, then downloaded to the local filesystem with hadoop fs -get, and finally, post-processed with collect_from_rds():

library(sparklyr)
library(nycflights13)

num_partitions <- 10L
sc <- spark_connect(master = "yarn", spark_home = "/usr/lib/spark")
flights_sdf <- copy_to(sc, flights, repartition = num_partitions)

# Spark workers serialize all partitions in RDS format in parallel and write
# RDS outputs to HDFS
spark_write_rds(
  flights_sdf,
  dest_uri = "hdfs://<namenode>:8020/flights-part-{partitionId}.rds"
)

# Run `hadoop fs -get` to download RDS files from HDFS to local file system
for (partition in seq(num_partitions) - 1)
  system2(
    "hadoop",
    c("fs", "-get", sprintf("hdfs://<namenode>:8020/flights-part-%d.rds", partition))
  )

# Post-process RDS outputs
partitions <- (seq(num_partitions) - 1) %>%
  lapply(function(partition) collect_from_rds(sprintf("flights-part-%d.rds", partition)))

# Optionally, call `rbind()` to combine data from all partitions into a single R dataframe
flights_df <- do.call(rbind, partitions)

dplyr-related improvements

Similar to other recent sparklyr releases, sparklyr 1.6 comes with a number of dplyr-related improvements, such as

  • Support for where() predicate within select() and summarize(across(…)) operations on Spark dataframes
  • Addition of if_all() and if_any() functions
  • Full compatibility with dbplyr 2.0 backend API

select(where(…)) and summarize(across(where(…)))

The dplyr where(…) construct is useful for applying a selection or aggregation function to multiple columns that satisfy some boolean predicate. For example,

library(dplyr)

iris %>% select(where(is.numeric))

returns all numeric columns from the iris dataset, and

library(dplyr)

iris %>% summarize(across(where(is.numeric), mean))

computes the average of each numeric column.

In sparklyr 1.6, both types of operations can be applied to Spark dataframes, e.g.,

library(dplyr)
library(sparklyr)

sc <- spark_connect(master = "local")
iris_sdf <- copy_to(sc, iris, name = random_string())

iris_sdf %>% select(where(is.numeric))

iris_sdf %>% summarize(across(where(is.numeric), mean))

if_all() and if_any()

if_all() and if_any() are two convenience functions from dplyr 1.0.4 (see here for more details) that effectively[1] combine the results of applying a boolean predicate to a tidy selection of columns using the logical and/or operators.

Starting from sparklyr 1.6, if_all() and if_any() can also be applied to Spark dataframes, e.g.,

library(dplyr)
library(sparklyr)

sc <- spark_connect(master = "local")
iris_sdf <- copy_to(sc, iris, name = random_string())

# Select all records with Petal.Width > 2 and Petal.Length > 2
iris_sdf %>% filter(if_all(starts_with("Petal"), ~ .x > 2))

# Select all records with Petal.Width > 5 or Petal.Length > 5
iris_sdf %>% filter(if_any(starts_with("Petal"), ~ .x > 5))

Compatibility with dbplyr 2.0 backend API

Sparklyr 1.6 is fully compatible with the newer dbplyr 2.0 backend API (implementing all interface changes recommended here), while still maintaining backward compatibility with the previous edition of the dbplyr API, so that sparklyr users will not be forced to switch to any particular version of dbplyr.

This should be a mostly non-user-visible change as of now. In fact, the only discernible behavior change is that the following code

library(dbplyr)
library(sparklyr)

sc <- spark_connect(master = "local")

print(dbplyr_edition(sc))

outputting

[1] 2

if sparklyr is working with dbplyr 2.0+, and

[1] 1

otherwise.

Acknowledgements

In chronological order, we would like to thank the following contributors for making sparklyr 1.6 awesome:

We would also like to give a big shout-out to the wonderful open-source community behind sparklyr, without whom we would not have benefitted from numerous sparklyr-related bug reports and feature suggestions.

Finally, the author of this blog post also very much appreciates the highly valuable editorial suggestions from @skeydan.

If you wish to learn more about sparklyr, we recommend checking out sparklyr.ai, spark.rstudio.com, and also some previous sparklyr release posts such as sparklyr 1.5 and sparklyr 1.4.

That is all. Thanks for reading!

Greenwald, Michael, and Sanjeev Khanna. 2001. “Space-Efficient Online Computation of Quantile Summaries.” SIGMOD Rec. 30 (2): 58–66. https://doi.org/10.1145/376284.375670.

Lin, Frank, and William Cohen. 2010. “Power Iteration Clustering.” In Proceedings of the 27th International Conference on Machine Learning (ICML), 655–62.

  1. modulo possible implementation-dependent short-circuit evaluations↩︎

Categories
Misc

Come Sale Away with GFN Thursday

GFN Thursday means more games for GeForce NOW members, every single week. This week’s list includes the day-and-date release of Spacebase Startopia, but first we want to share the scoop on some fantastic sales available across our digital game store partners that members will want to take advantage of this very moment.

The post Come Sale Away with GFN Thursday appeared first on The Official NVIDIA Blog.

Categories
Misc

NIH, NVIDIA Use AI to Trace COVID-19 Disease Progression in Chest CT Images

Researchers from the U.S. National Institutes of Health have collaborated with NVIDIA experts on an AI-accelerated method to monitor COVID-19 disease severity over time from patient chest CT scans.

Published today in Scientific Reports, this work studied the progression of lung opacities in chest CT images of COVID patients, and extracted insights about the temporal relationships between CT features and lab measurements. 

Quantifying CT opacities can tell doctors how severe a patient’s condition is. A better understanding of the progression of lung opacities in COVID patients could help inform clinical decisions in patients with pneumonia, and yield insights during clinical trials for therapies to treat the virus. 

Selecting a dataset of more than 100 sequential chest CTs from 29 COVID patients from China and Italy, the researchers used an NVIDIA Clara AI segmentation model to automate the time-consuming task of segmenting the total lung in each CT scan. Expert radiologists reviewed the total lung segmentations, and manually segmented the lung opacities. 

To track disease progression, the researchers used generalized temporal curves, which correlated the CT imaging data with lab measurements such as white blood cell count and procalcitonin levels. They then used 3D visualizations to reconstruct the evolution of COVID opacities in one of the patients. 

The team found that lung opacities appeared between one and five days before symptom onset, and peaked a day after symptoms began. They also analyzed two opacity subtypes — ground glass opacity and consolidation — and discovered that ground glass opacities appeared earlier in the disease, and persisted for a time after the resolution of the consolidation.  

In the paper, the researchers showed how CT dynamic curves could be used as a clinical reference tool for mild COVID-19 cases, and might help spot cases that grow more severe over time. These curves could also assist clinicians in identifying chronic lung effects by flagging cases where patients have residual opacities visible in CT scans long after other symptoms dissipate. 

This paper follows research published in Nature Communications, in which the team used deep learning to distinguish COVID-19 associated pneumonia from non-COVID pneumonia in chest scans. The deep learning models were developed using the NVIDIA Clara application framework for medical imaging, and are available for research use in the NGC catalog.

Read the full paper in Scientific Reports. Download the models from NGC and visit our COVID-19 research hub for more. 

Learn more about NVIDIA’s work in healthcare at the GPU Technology Conference, April 12-16. Registration is free. The healthcare track includes 16 live webinars, 18 special events, and over 100 recorded sessions.

Subscribe to NVIDIA healthcare news

Categories
Offsites

Recursive Classification: Replacing Rewards with Examples in RL

A general goal of robotics research is to design systems that can assist in a variety of tasks with the potential to improve daily life. Most reinforcement learning algorithms for teaching agents to perform new tasks require a reward function, which provides positive feedback to the agent for taking actions that lead to good outcomes. However, actually specifying these reward functions can be quite tedious, and they can be very difficult to define for situations without a clear objective, such as whether a room is clean or a door is sufficiently shut. Even for tasks that are easy to describe, measuring whether the task has been solved can be difficult and may require adding many sensors to a robot’s environment.

Alternatively, training a model using examples, called example-based control, has the potential to overcome the limitations of approaches that rely on traditional reward functions. This new problem statement is most similar to prior methods based on “success detectors”, and efficient algorithms for example-based control could enable non-expert users to teach robots to perform new tasks, without the need for coding expertise, knowledge of reward function design, or the installation of environmental sensors.

In “Replacing Rewards with Examples: Example-Based Policy Search via Recursive Classification,” we propose a machine learning algorithm for teaching agents how to solve new tasks by providing examples of success (e.g., if “success” examples show a nail embedded into a wall, the agent will learn to pick up a hammer and knock nails into the wall). This algorithm, recursive classification of examples (RCE), does not rely on hand-crafted reward functions, distance functions, or features, but rather learns to solve tasks directly from data, requiring the agent to learn how to solve the entire task by itself, without requiring examples of any intermediate states. Using a version of temporal difference learning — similar to Q-learning, but replacing the typical reward function term using only examples of success — RCE outperforms prior approaches based on imitation learning on simulated robotics tasks. Coupled with theoretical guarantees similar to those for reward-based learning, the proposed method offers a user-friendly alternative for teaching robots new tasks.

Top: To teach a robot to hammer a nail into a wall, most reinforcement learning algorithms require that the user define a reward function. Bottom: The example-based control method uses examples of what the world looks like when a task is completed to teach the robot to solve the task, e.g., examples where the nail is already hammered into the wall.

Example-Based Control vs Imitation Learning
While the example-based control method is similar to imitation learning, there is an important distinction — it does not require expert demonstrations. In fact, the user can actually be quite bad at performing the task themselves, as long as they can look back and pick out the small fraction of states where they did happen to solve the task.

Additionally, whereas previous research used a stage-wise approach in which the model first uses success examples to learn a reward function and then applies that reward function with an off-the-shelf reinforcement learning algorithm, RCE learns directly from the examples and skips the intermediate step of defining the reward function. Doing so avoids potential bugs and bypasses the process of defining the hyperparameters associated with learning a reward function (such as how often to update the reward function or how to regularize it) and, when debugging, removes the need to examine code related to learning the reward function.

Recursive Classification of Examples
The intuition behind the RCE approach is simple: the model should predict whether the agent will solve the task in the future, given the current state of the world and the action that the agent is taking. If there were data that specified which state-action pairs lead to future success and which state-action pairs lead to future failure, then one could solve this problem using standard supervised learning. However, when the only data available consists of success examples, the system doesn’t know which states and actions led to success, and while the system also has experience interacting with the environment, this experience isn’t labeled as leading to success or not.

Left: The key idea is to learn a future success classifier that predicts for every state (circle) in a trajectory whether the task will be solved in the future (thumbs up/down). Right: In the example-based control approach, the model is provided only with unlabeled experience (grey circles) and success examples (green circles), so one cannot apply standard supervised learning. Instead, the model uses the success examples to automatically label the unlabeled experience.

Nonetheless, one can piece together what these data would look like, if they were available. First, by definition, a successful example must be one that solves the given task. Second, even though it is unknown whether an arbitrary state-action pair will lead to success in solving a task, it is possible to estimate how likely it is that the task will be solved if the agent started at the next state. If the next state is likely to lead to future success, it can be assumed that the current state is also likely to lead to future success. In effect, this is recursive classification, where the labels are inferred based on predictions at the next time step.

The underlying algorithmic idea of using a model’s predictions at a future time step as a label for the current time step closely resembles existing temporal-difference methods, such as Q-learning and successor features. The key difference is that the approach described here does not require a reward function. Nonetheless, we show that this method inherits many of the same theoretical convergence guarantees as temporal difference methods. In practice, implementing RCE requires changing only a few lines of code in an existing Q-learning implementation.
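The recursive update described above can be sketched in a few lines of NumPy. Everything below is a hypothetical illustration, not the paper's implementation: the five-state chain environment, the one-hot featurisation, the linear-sigmoid classifier, and the learning rate are all made up for the sketch, and for clarity the classifier scores states rather than state-action pairs. The two update rules are the point: success examples are trained toward label 1, while unlabeled transitions are trained toward gamma times the classifier's own prediction at the next state.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-D chain: states 0..4, where state 4 is the "success" state.
def features(s):
    x = np.zeros(5)          # hypothetical one-hot featurisation
    x[s] = 1.0
    return x

success_examples = [features(4)] * 8                  # user-provided success outcomes
transitions = [(features(s), features(s + 1))         # unlabeled experience
               for s in range(4) for _ in range(20)]

w = np.zeros(5)              # weights of a linear-sigmoid classifier
gamma, lr = 0.9, 0.5
for _ in range(500):
    # (1) Success examples are, by definition, labelled 1.
    for x in success_examples:
        p = sigmoid(w @ x)
        w += lr * (1.0 - p) * x            # log-loss gradient toward label 1
    # (2) Unlabeled transitions borrow their label from the classifier's own
    #     prediction at the next state, discounted by gamma: the recursive step.
    for x, x_next in transitions:
        target = gamma * sigmoid(w @ x_next)
        p = sigmoid(w @ x)
        w += lr * (target - p) * x

# Predicted probability of future success rises monotonically toward the goal.
scores = [sigmoid(w @ features(s)) for s in range(5)]
```

Because each state's label bootstraps from its successor, states far from the goal acquire correctly discounted success estimates even though only the goal state was ever labelled; this mirrors how temporal-difference methods propagate value, but without a reward function.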

Evaluation
We evaluated the RCE method on a range of challenging robotic manipulation tasks. For example, in one task we required a robotic hand to pick up a hammer and hit a nail into a board. Previous research on this task [1, 2] has used a complex reward function (with terms corresponding to the distance between the hand and the hammer, the distance between the hammer and the nail, and whether the nail has been knocked into the board). In contrast, the RCE method requires only a few observations of what the world would look like if the nail were hammered into the board.

We compared the performance of RCE to a number of prior methods, including those that learn an explicit reward function and those based on imitation learning, all of which struggle to solve this task. This experiment highlights how example-based control makes it easy for users to specify even complex tasks, and demonstrates that recursive classification can successfully solve these sorts of tasks.

The RCE approach solves the task of hammering a nail into a board more reliably than prior approaches based on imitation learning [SQIL, DAC] and those that learn an explicit reward function [VICE, ORIL, PURL].

Conclusion
We have presented a method to teach autonomous agents to perform tasks by providing them with examples of success, rather than meticulously designing reward functions or collecting first-person demonstrations. An important aspect of example-based control, which we discuss in the paper, is what assumptions the system makes about the capabilities of different users. Designing variants of RCE that are robust to differences in users’ capabilities may be important for applications in real-world robotics. The code is available, and the project website contains additional videos of the learned behaviors.

Acknowledgements
We thank our co-authors, Ruslan Salakhutdinov and Sergey Levine. We also thank Surya Bhupatiraju, Kamyar Ghasemipour, Max Igl, and Harini Kannan for feedback on this post, and Tom Small for helping to design figures for this post.

Categories
Misc

GTC 21: Top 5 Arm Computing and Ecosystem Sessions

From powering the world’s largest supercomputers and cloud data centers, to edge devices on factory floors and city streets, the NVIDIA accelerated computing platform is used to help solve the world’s most challenging computational problems. 

NVIDIA and Arm are working together to open new opportunities for partners, users, and developers, driving a new wave of computing around the world. 

Explore all the Arm accelerated computing and ecosystem sessions at GTC. Here are a few key sessions you may be interested in. 

  1. A Vision for the Next Decade of Computing 

    AI, 5G, and the internet of things are sparking the world’s potential. And for many hardware engineers and software developers, these technologies will also become the challenge of their careers. The question is how to invisibly integrate the new intelligence everywhere by creating more responsive infrastructure that links people, processes, devices, and data seamlessly. Getting there will require architectural leaps, new partnerships, and plenty of creativity. Arm President Rene Haas will discuss the forces pushing these advances and how Arm’s global developer ecosystem will react to drive the next wave of compute.

    Speaker: Rene Haas, President, IP Products Group, Arm

  2. Introducing Developer Tools for Arm and NVIDIA Systems

    NVIDIA GPUs on Arm servers are here. In migrating to, or developing on, Arm servers with NVIDIA GPUs, developers using native code, CUDA, and OpenACC continue to need tools and toolchains to succeed and to get the most out of applications. We’ll explore the role of key tools and toolchains on Arm servers, from Arm, NVIDIA and elsewhere — and show how each tool fits in the end-to-end journey to production science and simulation.

    Speaker: Daniel Owens, Product Director, Infrastructure Software, Arm

  3. The Arm HPC User Group: An Open Community for Arm-Based Research and Engagement

    We’ll introduce the newly created Arm HPC User Group, which provides a forum for application developers, system integrators, tool vendors, and implementers to share their experiences. Learn about the history of Arm for HPC and see what plans the Arm HPC User Group has to engage with users and researchers over the coming year. You don’t need an in-depth technical knowledge of either Arm systems or HPC to attend or appreciate this talk.

    Speaker: Jeffrey Young, Senior Research Scientist, Georgia Tech

  4. HPC Applications on Arm and NVIDIA A100

    By design, HPC applications have radically different performance characteristics across domains of expertise. Achieving a balanced computing platform that addresses a breadth of HPC applications is a fundamental advance in the HPC state of the art. We demonstrate that Arm-based CPUs (such as the Ampere Altra), paired with NVIDIA GPUs (such as the NVIDIA A100), comprise a balanced, performant, and scalable supercomputing platform for any HPC application, whether CPU-bound, GPU-accelerated, or GPU-bound. We present the runtime performance profiles of representative applications from genomics.

    Speakers:
    Thomas Bradley, Director of Developer Technology at NVIDIA
    John Linford, Director of HPC Applications, Arm

  5. Scalable, Efficient, Software-Defined 5G-Enabled Edge Based on NVIDIA GPUs and Arm Servers

    We’ll demonstrate a scalable, performance-optimized 5G-enabled edge cloud that’s based on Arm servers with NVIDIA GPUs. We’ll focus on fully software-defined 5G Distributed Unit (DU) with an NVIDIA GPU/Aerial-based PHY layer with the upper layers based on Ampere Altra server based on Arm Neoverse N1 CPU. We’ll cover the performance, scale, and power benefits of this architecture for a centralized radio access network architecture.

    Speakers:
    Anupa Kelkar, Product Manager, NVIDIA
    Mo Jabbari, Senior Segment Marketing Manager, Arm

Register today for free and start building your schedule. Once you are signed in, you can view all Arm sessions here. 

You can also explore all GTC conference topics here. Topics include areas of interest such as GPU programming, HPC, deep learning, data science, and autonomous machines, or industries including healthcare, public sector, retail, and telecommunications.

Categories
Misc

Learn How Industry Leaders Are Developing, Training and Testing AVs at GTC 2021

Autonomous vehicles are born in the data center, and at GTC 2021, attendees can learn exactly how high-performance compute is vital to developing, training, testing and validating the next generation of transportation.

The NVIDIA GPU Technology Conference returns to the virtual stage April 12-16, featuring autonomous vehicle leaders in a range of talks, panels and virtual networking events. Attendees will also have access to hands-on training for self-driving development and other deep learning topics. Registration is free of charge.

During GTC, NVIDIA experts as well as those from companies such as Ford, General Motors, Toyota, Uber and Lyft will be hosting sessions on developing and leveraging an AI infrastructure for safe autonomous vehicle development.

Data Center Development

The array of redundant and diverse deep neural networks that run in autonomous vehicles all begin development in the data center and continue to be iterated upon as the car learns new features and capabilities.

Bryan Goodman, senior technical leader of Ford’s AI Advancement center, Wadim Kehl, senior machine learning engineer at Toyota, and Norm Marks, global director of automotive business development at NVIDIA, will come together for a panel discussion on the challenges and best practices for scaling data center infrastructure for this type of comprehensive DNN development.

Additionally, experts will discuss how to use GPUs to enable the scale necessary for autonomous vehicle development. Sammy Sidhu, perception engineer at Lyft, Travis Addair, senior software engineer at Uber, Michael Del Balso, founder of Tecton.ai, and Manish Harsh, manager of developer relations at NVIDIA, will cover their experience in building the machine learning operations platforms for self-driving cars.

Validation in the Virtual World

Once self-driving DNNs are developed, they must undergo exhaustive testing and validation before they can operate in the real world. With simulation, these algorithms can experience millions of miles of eventful driving data in a fraction of the time and cost it would take to drive in the real world.

Nicolas Orand, senior director at R&D Autonomy, Klaus Lamberg, strategic product manager at dSpace, Blake Gasca, senior director of business development at SmartDrive, and Justyna Zander, head of AV verification and validation at NVIDIA, will discuss the simulation toolchain, from scenario databases and sensor modeling to full system validation.

GTC will also feature the latest in simulation technology, with Gavriel State, senior director of system software at NVIDIA, showcasing the NVIDIA DRIVE Sim platform on Omniverse, generating synthetic data to comprehensively train deep neural networks for autonomous vehicle applications.

AI at the Edge

Data center operations don’t end once algorithms are validated. These DNNs are continuously improving to deliver cutting edge capabilities.

Alexandra Baleta and Thomas Schmitt of VMware will join Christophe Couvreur, vice president of product at Cerence, Sunil Samil, vice president of products at Akridata, and Dean Harris, automotive business development manager at NVIDIA, to share how they enable AI applications in autonomous vehicles, leveraging near-edge compute infrastructure for scale and cost optimization.

Florian Baumann, CTO of automotive and AI at Dell, will cover the ways autonomous vehicle developers can leverage enterprise AI, data science, and big data analytics to optimize the self-driving car experience. 

Finally, General Motors’ Jayaraman Sivakumar and Brian Roginski, Tata Consultancy Services’ head of cognitive business Sivakumar Shanmugam, and Sean Young, NVIDIA’s director of business development for manufacturing, will discuss how GM has adopted virtualization to meet new demands for advanced engineering and design.

GTC will also include NVIDIA DRIVE Developer Days, running from April 20-22, which will consist of deep dive sessions on DRIVE end-to-end solutions.

Don’t miss out on the opportunity to learn from the premier experts in autonomous vehicle development — take advantage of free GTC registration today.

Categories
Misc

Explore Deploying and Optimizing Industrial-Scale AI at GTC

Industrial-Scale AI content is at GTC. From April 12-16, 1,400 live and on-demand sessions will be at your fingertips. Many topics will be covered including solutions in computational fluid dynamics, predictive maintenance, inspection, and factory logistics across Industrial Manufacturing, Aerospace, Oil and Gas, Electronic Design Automation (EDA), Engineering Simulation (CAE), and more. 

Free registration provides access to topic experts, meet-and-greet networking events, and a keynote loaded with announcements from NVIDIA CEO Jensen Huang. 

Here are some of the top sessions for Industrial-Scale AI at GTC21:

  • GE Renewable Energy: Advances in Renewable Energy: Enabling Our Decarbonized Energy Future with Technology Innovations and Smart Operations
  • Synopsys: GPU-Powered Order-of-Magnitude Speedup for IC Simulation
  • Cadence Design Systems: Accelerating PCB Layout Editor Using Modern GPU Architecture for Complex Designs
  • C3.ai: Transformer-Based Deep Learning for Asset Predictive Maintenance
  • Siemens: Physics-Informed Neural Network for Fluid-Dynamics Simulation and Design
  • BMW: A Simulation-First Approach in Leveraging Collaborative Robots for Inspecting BMW Vehicles
  • Data Monsters: Industrial Edge AI Challenges: Is Scaling Impossible?
  • Cascade Technologies: Leveraging GPUs for High-Throughput, High-Fidelity Flow Simulations

See more featured speakers and events on the Manufacturing topic page. If you have already registered, there are pre-selected playlists for you to scroll through and build out your GTC schedule. 

>> Register for free on the GTC website

Categories
Misc

Robotics at GTC: Jetson tutorials, AI in STEM, and Commercial Apps

From Jetson 101 fundamental walk-throughs, to technical deep dive tutorials, GTC is hosting over 1,400 sessions for all technical abilities and applications. Free registration provides access to topic experts, meet-and-greet networking events, and a keynote loaded with breakthrough announcements from NVIDIA CEO Jensen Huang. 

If you’re looking for a curated list of Edge AI sessions at GTC, we’ve put together top sessions in each Robotics category. 

Special Events

  • [CWES1134] Connect with Experts: All Things Jetson
    Join Jetson experts from various teams, including product, system software, hardware, and AI/deep learning, for an engaging discussion.
  • [SE3283/SE3258] Ask and Learn about NVIDIA Jetson with Us
    Do you have questions about DLI topics on Getting Started with Jetson Nano, Jetbot, and Hello AI World? Come to these office hours.
  • [CWES1963] Connect with Experts (EMEA): AI at the Edge for Autonomous Machines, Robotics, and IVA
    Developing solutions for vision, autonomous machines, or robotics? Share your questions with our experts.

NVIDIA-Run DIY Maker Sessions

  • [S32700] Jetson 101: Learning Edge AI Fundamentals
  • [S32750] Build Edge AI Projects with the Jetson Community
  • [S32354] Optimizing for Edge AI on Jetson
  • [S31824] Sim-to-Real in Isaac Sim

Robotics in Education and Research

  • [S32637] Duckietown on NVIDIA Jetson: Hands-on AI in the classroom. (ETH Zurich)
  • [S32702] Hands-on deep learning robotics curriculum in high schools with Jetson Nano (CAVEDU)
  • Using Deep Learning and Simulation to Teach Robots Manipulation in Complex Environments (Dieter Fox, NVIDIA)
  • [S31905] Deep Learning Warm-Starts Grasp-Optimized Motion Planning (UC Berkeley)
  • [S31238] Robot Manipulator Joint Space Control via Deep Reinforcement Learning (NVIDIA)
  • [S31221] Improving Reinforcement Learning for Robot Manipulation via Composing Hierarchical Object-Centric Controllers (Carnegie Mellon University)

Commercial AI Applications

  • [S32588] A Mask-Detecting Smart Camera Using the Jetson Nano: The Developer Journey (Berkeley Design Technology, Inc)
  • [S31824] Sim-to-Real in Isaac Sim (NVIDIA)
  • [S32641] A New Kind of Collaboration: AI-Enabled Robotics and Humans Work Together to Automate Real-World Warehouse Tasks (Plus One Robotics)
  • [S32250] How AI is Revolutionizing Recycling: Practical Robotics at Scale (AMP Robotics)
  • [S31530] A Digital-Twin Use Case for Industrial Collaborative Robotics Applications Using Isaac Sim (Mondragon Unibertsitatea)

See more featured speakers and events on the Autonomous Machines/Robotics topic page. If you’re already registered, check out the pre-packaged playlists to get your schedule started. 

>> Register for free on the GTC website

Categories
Misc

AI-Powered Video Analytics at GTC: Making Physical Spaces Smarter And Safer

Find out how to make our important physical spaces smarter using the most widely deployed IoT devices – video cameras.

NVIDIA GTC will be hosted on April 12-16. With over 1,400 breakthrough sessions for all technical levels, those registered have access to topic experts, networking events, and a front-row seat to NVIDIA CEO Jensen Huang’s keynote.

There’s a deep lineup of Intelligent Video Analytics sessions covering applications in smart spaces such as airports, railway transit hubs, smart traffic systems, and autonomous machines, with developer sessions for vision-AI optimization with Pre-trained models, DeepStream SDK, and Transfer Learning Toolkit.

Here are a few spotlight sessions to look out for:

  • [S32797] Train Smarter not Harder with NVIDIA Pre-trained models and Transfer Learning Toolkit 3.0
    Learn how the world’s top AI teams combine pre-trained models and transfer learning to supercharge their AI vision development.
  • [S32798] Bringing Scale and Optimization to Video Analytics Pipelines with NVIDIA DeepStream SDK
    This talk provides a sneak peek at the next version of DeepStream. With an all-new intuitive GUI and development tools, it offers a zero-coding paradigm that further simplifies application development.
  • [CWES1127] Transfer Learning Toolkit and DeepStream SDK for Vision AI/Intelligent Video Analytics
    Get your questions answered on how to build and deploy vision AI applications for traffic engineering, parking management, sports analytics, retail, or smart workspaces for occupancy analytics and more.
  • [S31869] How Cities are Turning AI into Cost Savings
    Learn how the City of Raleigh, North Carolina, is building new AI-powered video analytics capabilities with ESRI’s ArcGIS into their traffic operations and turning real-time roadway insights into cost savings.
  • [S32032] Accelerating Azure Edge AI Vision Deployments
    Explore how GPU-accelerated model training and inference can span from the cloud to the edge, and how to leverage Azure Machine Learning and Live Video Analytics to create compelling solutions.
  • [S31845] AI-Enabled Video Analytics Improves Airline Operational Efficiency
    Get insights on how Seattle-Tacoma International Airport (SEA-TAC) is implementing AI video analytics to help improve overall airport operations.
  • [E31902] How AI Enabled Video Analytics Saves Lives and Money at Metropolitan Rail Networks
    Learn how AI-based video analytics solutions can be used to save money and increase safety and operational efficiency in metro rail networks, with a case study from the UK rail industry.
  • [SS32770] Driving Operational Efficiency with NVIDIA Transfer Learning Toolkit, Pre-trained Models, and DeepStream SDK
    Learn how to build business value from Vision AI deployments using NVIDIA TLT, pre-trained models, and DeepStream SDK with ADLINK, including examples such as detecting loitering and intrusion.
  • [SS33151] Designing AI Enabled Real-time Video Analytics at Scale
    Join experts from Quantiphi to learn how to address several engineering and costing challenges faced when going from an intelligent video analytics pilot to large-scale implementation.
  • [SS33127] Building Efficient and Intelligent Networks Using Network Edge AI Platform
    Lanner will partner with Tensor Network to discuss how NVIDIA AI can be structured in a networked approach where AI workloads can be distributed within the edge networks.

Check out additional speakers and sessions on the Intelligent Video Analytics topic page. Or, if you’re already registered, check out the pre-packaged playlists to get your schedule started.

>> Register for free on the GTC website

Image credit: Datafromsky