Categories
Misc

How to Build a Winning Recommendation System – Part 1

Recommender systems (RecSys) have become a key component of many online services, such as e-commerce, social media, news services, and online video streaming. However, as they have grown in importance, industry datasets have grown in scale and models have become more sophisticated, so the bar for the computational resources that recommendation systems require has risen.

NVIDIA introduced Merlin, a framework for deep recommender systems, to meet the computational demands of large-scale deep learning (DL) recommender systems. After an NVIDIA team won the ACM RecSys Challenge 2020, an NVIDIA team has now also won the WSDM WebTour 21 Challenge organized by Booking.com. The Booking.com challenge focused on predicting the last city destination of a traveler's trip, given their previous booking history within the trip. NVIDIA's interdisciplinary team included colleagues from NVIDIA's KGMON (Kaggle Grandmasters), RAPIDS (Data Science), and Merlin (Recommender Systems) groups, who collaborated on the winning solution.

This post is the first of a three-part series on the NVIDIA team's first-place solution for the Booking.com challenge. This first post gives an overview of recommender system concepts. The second post will discuss deep learning for recommender systems. The third post will discuss the winning solution, the steps involved, and what made a difference in the outcome.

What is a Recommendation System?

Recommender systems are trained to understand the preferences, previous decisions, and characteristics of people and products, using data gathered about their interactions, which include impressions, clicks, likes, and purchases. They ease information overload by surfacing personalized, relevant products from a wide range of selections. Because they can predict consumer interests and desires at a highly personalized level, recommender systems are a favorite with content and product providers: they drive consumers to just about any product or service that interests them, from books to videos to health classes to clothing.

The image shows a user, items, and a question mark representing which item to show the user.
Figure 1: A recommendation system filters items and shows only those most likely to induce an interaction.

Types of Recommendation Systems

Traditionally, recommender system approaches could be divided into three broad categories: collaborative filtering, content filtering, and hybrid recommender systems. More recently, variations have been proposed that explicitly leverage the user context (context-aware recommendation), the sequence of user interactions (sequential recommendation), and the interactions of the current user session for next-click prediction (session-based recommendation).

Collaborative filtering algorithms recommend items (this is the filtering part) based on preference information from many users (this is the collaborative part). This approach relies on the similarity of user preference behavior: given previous interactions between users and items, recommender algorithms learn to predict future interactions. These recommender systems build a model from a user's past behavior, such as items purchased previously or ratings given to those items, and from similar decisions by other users. The idea is that if some people have made similar decisions and purchases in the past, like a movie choice, then there is a high probability they will agree on additional future selections. For example, if a collaborative filtering recommender knows you and another user share similar tastes in movies, it might recommend a movie to you that it knows this other user already likes.

The image shows a movie watched by similar users being recommended.
Figure 2: Collaborative filtering recommends items based on how similar users liked the item.
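
As a rough illustration of the idea, here is a minimal user-based collaborative filtering sketch in plain Python. The users, movies, ratings, and the `recommend` helper are all made up for this example, and cosine similarity over co-rated items is just one of several possible similarity measures.

```python
from math import sqrt

# Hypothetical ratings: user -> {movie: rating on a 1-5 scale}
ratings = {
    "you":   {"Movie A": 5, "Movie B": 4},
    "alice": {"Movie A": 5, "Movie B": 4, "Movie C": 5},
    "bob":   {"Movie A": 1, "Movie B": 2, "Movie D": 5},
}

def cosine(u, v):
    """Cosine similarity computed over the movies both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    norm_u = sqrt(sum(u[m] ** 2 for m in common))
    norm_v = sqrt(sum(v[m] ** 2 for m in common))
    return dot / (norm_u * norm_v)

def recommend(target, ratings, min_rating=4):
    """Recommend movies the most similar user rated highly and the target has not seen."""
    others = [u for u in ratings if u != target]
    best = max(others, key=lambda u: cosine(ratings[target], ratings[u]))
    return sorted(
        movie for movie, rating in ratings[best].items()
        if movie not in ratings[target] and rating >= min_rating
    )

print(recommend("you", ratings))
```

Here "alice" rates the shared movies exactly as "you" do, so her highly rated unseen movie is the one suggested.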

Content filtering, by contrast, uses the attributes or features of an item (this is the content part) to recommend other items similar to the user's preferences. This approach is based on the similarity of item and user features: given information about a user and the items they have interacted with (e.g. a user's demographics, like age or gender, the category of a restaurant's cuisine, the average review for a movie), it models the likelihood of a new interaction. For example, if a content filtering recommender sees you liked the movies "You've Got Mail" and "Sleepless in Seattle," it might recommend another movie to you with the same genres and/or cast, such as "Joe Versus the Volcano."

The image shows a movie with features similar to what the user has watched before being recommended.
Figure 3: Content filtering recommends items with features similar to the users’ preferences.
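
The movie example can be sketched in a few lines of Python. The tag sets below are a simplified, hand-written stand-in for real item features, and scoring candidates by tag overlap with the user's liked movies is an assumption made for illustration, not a description of any particular production system.

```python
# Hypothetical item features: movie -> set of genre and cast tags
features = {
    "You've Got Mail":        {"romance", "comedy", "Tom Hanks", "Meg Ryan"},
    "Sleepless in Seattle":   {"romance", "comedy", "Tom Hanks", "Meg Ryan"},
    "Joe Versus the Volcano": {"romance", "comedy", "Tom Hanks", "Meg Ryan"},
    "Alien":                  {"sci-fi", "horror", "Sigourney Weaver"},
}

def recommend(liked, features):
    """Return the unseen movie whose tags best match the tags of the liked movies."""
    # The user profile is simply the union of tags across liked movies.
    profile = set().union(*(features[m] for m in liked))
    candidates = [m for m in features if m not in liked]

    def score(movie):
        tags = features[movie]
        return len(tags & profile) / len(tags | profile)

    return max(candidates, key=score)

liked = ["You've Got Mail", "Sleepless in Seattle"]
print(recommend(liked, features))
```

Note that no other users appear anywhere in this sketch; the recommendation depends only on item features and this one user's history.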

Collaborative filtering is straightforward to apply, as it only requires as input the user id and item id for each interaction. However, it requires a minimum number of interactions by user and by item before starting to provide meaningful recommendations, which is characterized as the cold-start problem. On the other hand, as content-based filtering only leverages the interactions of each user, it deals nicely with the user cold-start problem. But it tends to create a filter bubble, recommending only items very similar to those the user has interacted with before.

Hybrid recommender systems combine the advantages of the types above to create a more comprehensive recommending system.

Session-based or sequence-based recommender systems use the sequence of user-item interactions within a session in the recommendation process. Examples include predicting the next item in an online shopping cart, the next video to watch, or, in the Booking.com example, the next travel destination of a traveler.

Netflix spoke at NVIDIA GTC about making better recommendations by framing a recommendation as a contextual sequence prediction. Their approach uses a sequence of user actions, plus the current context, to predict the probability of the next action. In the Netflix example, given one sequence for each user—the country, device, date, and time when they watched a movie—they trained a model to predict what to watch next. 

The image shows a sequence of Netflix user contexts and movies watched, with a question mark for the next movie watched.
Figure 4: Netflix uses a sequence of contextual user actions, plus the current context, to predict the probability of the next movie a user will want to watch.

How Recommenders Work

Recommender systems are trained using data gathered about the users, items, and their interactions, which include impressions, clicks, likes, mentions, and so on. How a recommender model makes recommendations will depend on the type of data you have.  If you only have data about which interactions have occurred in the past, you’ll probably be interested in collaborative filtering. If you have data describing the user and items they have interacted with (e.g. a user’s age, the category of a restaurant’s cuisine, the average review for a movie), you can model the likelihood of a new interaction given these properties at the current moment by adding content and context filtering.

The image shows a recommender function that uses user and product data to rank products by user preference and to propose new products by product similarity or by user similarity, in order to predict a user rating.
Figure 5: Recommenders use data gathered about the users, items, and their interactions to rank products by user preference, and then propose new products by product similarity and/or by user similarity.

Matrix Factorization for Recommendation

Matrix factorization (MF) techniques are the core of many popular algorithms, including word embedding and topic modeling, and have become a dominant methodology within the collaborative-filtering-based recommendations. MF can be used to calculate the similarity in user’s ratings or interactions to provide recommendations. In the simple user-item matrix below, Ted and Carol like movies B and C. Bob likes movie B. To recommend a movie to Bob, matrix factorization calculates that users who liked B also liked C, so C is a possible recommendation for Bob.

The image shows a user-item matrix with users as rows, items as columns, and a user rating for an item as the cell value.
Figure 6: A user-item matrix with users as rows, items as columns, and a user rating for an item as the cell value.

Matrix factorization using the alternating least squares (ALS) algorithm approximates the sparse u-by-i user-item rating matrix as the product of two dense matrices, the user and item factor matrices of size u × f and f × i (where u is the number of users, i the number of items, and f the number of latent features). The factor matrices represent latent or hidden features that the algorithm tries to discover. One matrix describes the latent features of each user, and the other describes the latent properties of each item (here, each movie). For each user and for each item, the ALS algorithm iteratively learns f numeric "factors" that represent the user or item. In each iteration, the algorithm alternately holds one factor matrix fixed and optimizes the other by minimizing the loss function with respect to it. This process continues until it converges.

The image shows three matrices: a sparse u-by-i user-item rating matrix as the product of two dense matrices, the user and item factor matrices of size u × f and f × i.
Figure 7: Matrix factorization factors a sparse ratings matrix R (u-by-i) into a u-by-f matrix (U) and an f-by-i matrix (I).
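
To make the alternating updates concrete, here is a small pure-Python sketch of regularized ALS on observed ratings only. It is a teaching sketch under stated assumptions, not how a production library implements ALS: the ratings, the regularization weight `lam`, the rank `f=2`, and all function names are invented for this example, and the item factors are stored as one row per item (the transpose of the f × i layout described above).

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[k][n] / M[k][k] for k in range(n)]

def update(fixed, observed, f, lam):
    """Re-fit every row by regularized least squares while `fixed` stays constant.

    observed[k] lists (index_into_fixed, rating) pairs for row k.
    """
    rows = []
    for pairs in observed:
        A = [[lam if a == b else 0.0 for b in range(f)] for a in range(f)]
        rhs = [0.0] * f
        for j, rating in pairs:
            v = fixed[j]
            for a in range(f):
                rhs[a] += rating * v[a]
                for b in range(f):
                    A[a][b] += v[a] * v[b]
        rows.append(solve(A, rhs))
    return rows

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def loss(R, U, V, lam):
    """Squared error on observed ratings plus L2 regularization."""
    err = sum((r - dot(U[u], V[i])) ** 2 for (u, i), r in R.items())
    reg = lam * sum(x * x for M in (U, V) for row in M for x in row)
    return err + reg

def als(R, n_users, n_items, f=2, lam=0.1, iters=20, seed=0):
    rng = random.Random(seed)
    U = [[rng.random() for _ in range(f)] for _ in range(n_users)]
    V = [[rng.random() for _ in range(f)] for _ in range(n_items)]
    by_user = [[(i, r) for (u, i), r in R.items() if u == k] for k in range(n_users)]
    by_item = [[(u, r) for (u, i), r in R.items() if i == k] for k in range(n_items)]
    history = [loss(R, U, V, lam)]
    for _ in range(iters):
        U = update(V, by_user, f, lam)  # hold item factors fixed, solve for users
        V = update(U, by_item, f, lam)  # hold user factors fixed, solve for items
        history.append(loss(R, U, V, lam))
    return U, V, history

# Hypothetical ratings. Users: 0=Ted, 1=Carol, 2=Bob. Items: 0=A, 1=B, 2=C.
R = {(0, 1): 5, (0, 2): 5, (1, 0): 1, (1, 1): 5, (1, 2): 5, (2, 0): 1, (2, 1): 5}
U, V, history = als(R, n_users=3, n_items=3)
print("predicted rating of Bob for movie C:", dot(U[2], V[2]))
print("loss before and after:", history[0], history[-1])
```

Because each half-step solves its least-squares subproblem exactly, the regularized loss never increases from one iteration to the next, which is what makes the alternating scheme converge.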

Conclusion

In this post, we gave an overview of recommender system concepts and matrix factorization. In part two, we will go over deep learning models for recommender systems, and in part three, we will go over the winning Booking.com solution.

Categories
Misc

GAN for All Seasons: AI-Generated Art Accompanies Pandemic Poetry in The Washington Post

A recent National Poetry Month feature in The Washington Post presented AI-generated artwork alongside five original poems reflecting on seasons of the past year. 

Created by the Lede Lab — an experimental news team at The Post dedicated to exploring emerging technologies and new storytelling techniques — the artwork combined the output of machine learning models including NVIDIA StyleGAN2. Developed by NVIDIA Research, StyleGAN is a popular AI for high-res image generation that’s been adopted for art exhibits, manga illustrations and reimagined historical portraits.

Running on NVIDIA GPUs in the cloud, StyleGAN2 was trained on scanned images of brush strokes and palette knife textures painted by the group’s designer, Shikha Subramaniam.

The team also used the open-source AttnGAN model to create generative art that responded line by line to each of the five commissioned poems in the piece. Combined, the outputs from both models created a series of abstract videos to accompany the text.

As readers scroll through the interactive feature, the dynamic AI-generated artwork morphs to reflect each line of the poems — in one case shifting from colorful to monochrome and back again.

“Anxiously watching the coronavirus spread across the globe, we missed sharing so much with others, including the four seasons with their shifts in color and temperature,” wrote Suzette Moyer, senior design editor at The Post. The poems — authored by Mary Szybist, Dorianne Laux, Ada Limón, Kazim Ali and Willie Perdomo — are “hopeful works about the seasons we missed and the days we can look forward to.”

View the interactive piece in The Post >>

For more AI-inspired artwork, visit the AI Art Gallery featured at the recent NVIDIA GPU Technology Conference.

 

Categories
Misc

Implementing backprop in python and comparing it to tensorflow

submitted by /u/jben_hun

Categories
Misc

Crop bounding box from an image

import tensorflow as tf
physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
from absl import app, flags, logging
from absl.flags import FLAGS
import core.utils as utils
from core.yolov4 import filter_boxes
from tensorflow.python.saved_model import tag_constants
from PIL import Image
import cv2
import numpy as np
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

flags.DEFINE_string('framework', 'tf', '(tf, tflite, trt')
flags.DEFINE_string('weights', './checkpoints/yolov4-416',
                    'path to weights file')
flags.DEFINE_integer('size', 416, 'resize images to')
flags.DEFINE_boolean('tiny', False, 'yolo or yolo-tiny')
flags.DEFINE_string('model', 'yolov4', 'yolov3 or yolov4')
flags.DEFINE_string('image', './data/kite.jpg', 'path to input image')
flags.DEFINE_string('output', 'result.png', 'path to output image')
flags.DEFINE_float('iou', 0.45, 'iou threshold')
flags.DEFINE_float('score', 0.25, 'score threshold')

def main(_argv):
    config = ConfigProto()
    config.gpu_options.allow_growth = True
    session = InteractiveSession(config=config)
    STRIDES, ANCHORS, NUM_CLASS, XYSCALE = utils.load_config(FLAGS)
    input_size = FLAGS.size
    image_path = FLAGS.image

    original_image = cv2.imread(image_path)
    original_image = cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)

    # image_data = utils.image_preprocess(np.copy(original_image), [input_size, input_size])
    image_data = cv2.resize(original_image, (input_size, input_size))
    image_data = image_data / 255.
    # image_data = image_data[np.newaxis, ...].astype(np.float32)

    images_data = []
    for i in range(1):
        images_data.append(image_data)
    images_data = np.asarray(images_data).astype(np.float32)

    if FLAGS.framework == 'tflite':
        interpreter = tf.lite.Interpreter(model_path=FLAGS.weights)
        interpreter.allocate_tensors()
        input_details = interpreter.get_input_details()
        output_details = interpreter.get_output_details()
        print(input_details)
        print(output_details)
        interpreter.set_tensor(input_details[0]['index'], images_data)
        interpreter.invoke()
        pred = [interpreter.get_tensor(output_details[i]['index']) for i in range(len(output_details))]
        if FLAGS.model == 'yolov3' and FLAGS.tiny == True:
            boxes, pred_conf = filter_boxes(pred[1], pred[0], score_threshold=0.25, input_shape=tf.constant([input_size, input_size]))
        else:
            boxes, pred_conf = filter_boxes(pred[0], pred[1], score_threshold=0.25, input_shape=tf.constant([input_size, input_size]))
    else:
        saved_model_loaded = tf.saved_model.load(FLAGS.weights, tags=[tag_constants.SERVING])
        infer = saved_model_loaded.signatures['serving_default']
        batch_data = tf.constant(images_data)
        pred_bbox = infer(batch_data)
        for key, value in pred_bbox.items():
            boxes = value[:, :, 0:4]
            pred_conf = value[:, :, 4:]

    boxes, scores, classes, valid_detections = tf.image.combined_non_max_suppression(
        boxes=tf.reshape(boxes, (tf.shape(boxes)[0], -1, 1, 4)),
        scores=tf.reshape(
            pred_conf, (tf.shape(pred_conf)[0], -1, tf.shape(pred_conf)[-1])),
        max_output_size_per_class=50,
        max_total_size=50,
        iou_threshold=FLAGS.iou,
        score_threshold=FLAGS.score
    )
    pred_bbox = [boxes.numpy(), scores.numpy(), classes.numpy(), valid_detections.numpy()]
    image = utils.draw_bbox(original_image, pred_bbox)
    image = Image.fromarray(image.astype(np.uint8))
    image.show()
    image = cv2.cvtColor(np.array(image), cv2.COLOR_BGR2RGB)
    cv2.imwrite(FLAGS.output, image)

if __name__ == '__main__':
    try:
        app.run(main)
    except SystemExit:
        pass

Please help me crop the bounding box so that I can run Tesseract on the crop and read the digits inside it. This is my attempt, but it doesn't crop:

import tensorflow as tf
physical_devices = tf.config.experimental.list_physical_devices('GPU')
if len(physical_devices) > 0:
    tf.config.experimental.set_memory_growth(physical_devices[0], True)
from absl import app, flags, logging
from absl.flags import FLAGS
import core.utils as utils
from core.yolov4 import filter_boxes
from tensorflow.python.saved_model import tag_constants
from PIL import Image
import cv2
import numpy as np
import os
import colorsys  # used by the local draw_bbox below
import random    # used by the local draw_bbox below
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

flags.DEFINE_string('framework', 'tf', '(tf, tflite, trt')
flags.DEFINE_string('weights', './checkpoints/yolov4-416',
                    'path to weights file')
flags.DEFINE_integer('size', 416, 'resize images to')
flags.DEFINE_boolean('tiny', False, 'yolo or yolo-tiny')
flags.DEFINE_string('model', 'yolov4', 'yolov3 or yolov4')
flags.DEFINE_string('image', './data/kite.jpg', 'path to input image')
flags.DEFINE_string('output', 'result.png', 'path to output image')
flags.DEFINE_float('iou', 0.45, 'iou threshold')
flags.DEFINE_float('score', 0.25, 'score threshold')
flags.DEFINE_boolean('crop', False, 'crop detections from images')

def crop_objects(img, data, path):
    # data is [boxes, scores, classes, num_objects], as built in main()
    boxes, scores, classes, num_objects = data
    class_name = "Compteur"
    img_h, img_w = img.shape[:2]
    for i in range(int(num_objects)):
        # get box coords
        xmin, ymin, xmax, ymax = boxes[i]
        # crop detection from image (take an additional 5 pixels around all edges),
        # clamped to the image so a negative index cannot produce an empty crop
        y0, y1 = max(int(ymin) - 5, 0), min(int(ymax) + 5, img_h)
        x0, x1 = max(int(xmin) - 5, 0), min(int(xmax) + 5, img_w)
        cropped_img = img[y0:y1, x0:x1]
        # construct image name (one file per detection) and join it to path
        img_name = class_name + '_' + str(i + 1) + '.png'
        img_path = os.path.join(path, img_name)
        # save image
        cv2.imwrite(img_path, cropped_img)

# helper function to convert bounding boxes from normalized ymin, xmin, ymax, xmax -> xmin, ymin, xmax, ymax
def format_boxes(bboxes, image_height, image_width):
    for box in bboxes:
        ymin = int(box[0] * image_height)
        xmin = int(box[1] * image_width)
        ymax = int(box[2] * image_height)
        xmax = int(box[3] * image_width)
        box[0], box[1], box[2], box[3] = xmin, ymin, xmax, ymax
    return bboxes

# Note: main() calls utils.draw_bbox, not this local copy. The class-name lookup assumes
# core.utils exposes read_class_names and cfg; recognize_plate is not defined in this
# script and is only reached when read_plate=True.
def draw_bbox(image, bboxes, info=False, counted_classes=None, show_label=True, allowed_classes=None, read_plate=False):
    classes = utils.read_class_names(utils.cfg.YOLO.CLASSES)
    if allowed_classes is None:
        allowed_classes = list(classes.values())
    num_classes = len(classes)
    image_h, image_w, _ = image.shape
    hsv_tuples = [(1.0 * x / num_classes, 1., 1.) for x in range(num_classes)]
    colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
    colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), colors))

    random.seed(0)
    random.shuffle(colors)
    random.seed(None)

    out_boxes, out_scores, out_classes, num_boxes = bboxes
    for i in range(int(num_boxes)):
        if int(out_classes[i]) < 0 or int(out_classes[i]) > num_classes: continue
        coor = out_boxes[i]
        fontScale = 0.5
        score = out_scores[i]
        class_ind = int(out_classes[i])
        class_name = classes[class_ind]
        if class_name not in allowed_classes:
            continue
        if read_plate:
            height_ratio = int(image_h / 25)
            plate_number = recognize_plate(image, coor)
            if plate_number is not None:
                cv2.putText(image, plate_number, (int(coor[0]), int(coor[1] - height_ratio)),
                            cv2.FONT_HERSHEY_SIMPLEX, 1.25, (255, 255, 0), 2)

        bbox_color = colors[class_ind]
        bbox_thick = int(0.6 * (image_h + image_w) / 600)
        # OpenCV drawing functions expect integer pixel coordinates
        c1, c2 = (int(coor[0]), int(coor[1])), (int(coor[2]), int(coor[3]))
        cv2.rectangle(image, c1, c2, bbox_color, bbox_thick)

        if info:
            print("Object found: {}, Confidence: {:.2f}, BBox Coords (xmin, ymin, xmax, ymax): {}, {}, {}, {} ".format(class_name, score, coor[0], coor[1], coor[2], coor[3]))

        if show_label:
            bbox_mess = '%s: %.2f' % (class_name, score)
            t_size = cv2.getTextSize(bbox_mess, 0, fontScale, thickness=bbox_thick // 2)[0]
            c3 = (c1[0] + t_size[0], c1[1] - t_size[1] - 3)
            cv2.rectangle(image, c1, c3, bbox_color, -1)  # filled
            cv2.putText(image, bbox_mess, (c1[0], c1[1] - 2), cv2.FONT_HERSHEY_SIMPLEX,
                        fontScale, (0, 0, 0), bbox_thick // 2, lineType=cv2.LINE_AA)

        if counted_classes is not None:
            height_ratio = int(image_h / 25)
            offset = 15
            for key, value in counted_classes.items():
                cv2.putText(image, "{}s detected: {}".format(key, value), (5, offset),
                            cv2.FONT_HERSHEY_COMPLEX_SMALL, 1, (0, 255, 0), 2)
                offset += height_ratio
    return image

def main(_argv):
    config = ConfigProto()
    config.gpu_options.allow_growth = True
    session = InteractiveSession(config=config)
    STRIDES, ANCHORS, NUM_CLASS, XYSCALE = utils.load_config(FLAGS)
    input_size = FLAGS.size
    image_path = FLAGS.image
    # base name of the input file, used to name the crop folder
    image_name = os.path.splitext(os.path.basename(image_path))[0]

    original_image = cv2.imread(image_path)
    original_image = cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB)

    # image_data = utils.image_preprocess(np.copy(original_image), [input_size, input_size])
    image_data = cv2.resize(original_image, (input_size, input_size))
    image_data = image_data / 255.
    # image_data = image_data[np.newaxis, ...].astype(np.float32)

    images_data = []
    for i in range(1):
        images_data.append(image_data)
    images_data = np.asarray(images_data).astype(np.float32)

    if FLAGS.framework == 'tflite':
        interpreter = tf.lite.Interpreter(model_path=FLAGS.weights)
        interpreter.allocate_tensors()
        input_details = interpreter.get_input_details()
        output_details = interpreter.get_output_details()
        print(input_details)
        print(output_details)
        interpreter.set_tensor(input_details[0]['index'], images_data)
        interpreter.invoke()
        pred = [interpreter.get_tensor(output_details[i]['index']) for i in range(len(output_details))]
        if FLAGS.model == 'yolov3' and FLAGS.tiny == True:
            boxes, pred_conf = filter_boxes(pred[1], pred[0], score_threshold=0.25, input_shape=tf.constant([input_size, input_size]))
        else:
            boxes, pred_conf = filter_boxes(pred[0], pred[1], score_threshold=0.25, input_shape=tf.constant([input_size, input_size]))
    else:
        saved_model_loaded = tf.saved_model.load(FLAGS.weights, tags=[tag_constants.SERVING])
        infer = saved_model_loaded.signatures['serving_default']
        batch_data = tf.constant(images_data)
        pred_bbox = infer(batch_data)
        for key, value in pred_bbox.items():
            boxes = value[:, :, 0:4]
            pred_conf = value[:, :, 4:]

    boxes, scores, classes, valid_detections = tf.image.combined_non_max_suppression(
        boxes=tf.reshape(boxes, (tf.shape(boxes)[0], -1, 1, 4)),
        scores=tf.reshape(
            pred_conf, (tf.shape(pred_conf)[0], -1, tf.shape(pred_conf)[-1])),
        max_output_size_per_class=50,
        max_total_size=50,
        iou_threshold=FLAGS.iou,
        score_threshold=FLAGS.score
    )

    # format bounding boxes from normalized ymin, xmin, ymax, xmax -> xmin, ymin, xmax, ymax
    original_h, original_w, _ = original_image.shape
    bboxes = format_boxes(boxes.numpy()[0], original_h, original_w)

    # hold all detection data in one variable
    pred_bbox = [bboxes, scores.numpy()[0], classes.numpy()[0], valid_detections.numpy()[0]]

    # draw on a copy so the crops taken below do not contain the drawn boxes and labels
    image = utils.draw_bbox(original_image.copy(), pred_bbox)
    # image = utils.draw_bbox(image_data*255, pred_bbox)
    image = Image.fromarray(image.astype(np.uint8))
    image.show()
    image = cv2.cvtColor(np.array(image), cv2.COLOR_BGR2RGB)
    cv2.imwrite(FLAGS.output, image)

    # if crop flag is enabled (run with --crop), crop each detection and save it as new image
    if FLAGS.crop:
        crop_path = os.path.join(os.getcwd(), 'detections', 'crop', image_name)
        # makedirs also creates the missing parent folders, which os.mkdir does not
        os.makedirs(crop_path, exist_ok=True)
        # original_image is RGB here; cv2.imwrite expects BGR
        crop_objects(cv2.cvtColor(original_image, cv2.COLOR_RGB2BGR), pred_bbox, crop_path)

if __name__ == '__main__':
    try:
        app.run(main)
    except SystemExit:
        pass

submitted by /u/artificialYolov4

Categories
Misc

Why could I be getting high validation loss after loading a model?

I'm basically following the fine-tuning instructions for EfficientNet here. I use the model without the top layers, freeze the rest, and train for a while; training and validation loss both ended around 1.7, and both accuracies were around 0.34. Then I saved the model, loaded it again, and unfroze the top 20 layers except for the batch normalization ones. When I start training again, training loss goes down, staying between 1 and 2, and training accuracy goes up. But validation loss is in the hundreds, going up and down, and validation accuracy oscillates between 0.01 and 0.3. Sometimes the validation accuracy goes down while the loss also goes down. Any ideas why the validation loss would be so high? I'm using ImageDataGenerator with a validation split of 0.2, and train_datagen.flow_from_dataframe with the training and validation subsets, for each training run. I saved the model in h5 format.

submitted by /u/Sea_Ad5023

Categories
Misc

cuSPARSELt v0.1.0 Now Available: Arm and Windows Support

Today, NVIDIA is announcing the availability of cuSPARSELt version 0.1.0. This software can be downloaded now free for members of the NVIDIA Developer Program.

What’s New

  • Support for Windows 10 (x86_64)
  • Support for Linux ARM
  • Introduced SM 8.6 Compatibility
  • Support for TF32 compute type
  • Better performance for SM 8.0 kernels (up to 90% SOL)
  • Position independent sparseA / sparseB
  • New APIs for compression and pruning
    • Decoupled from cusparseLtMatmulPlan_t

See the cuSPARSELt Release Notes for more information

About cuSPARSELt

NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:

$$D = \alpha \, op(A) \cdot op(B) + \beta \, op(C)$$

In this formula, op(A) and op(B) refer to in-place operations such as transpose/non-transpose.

The cuSPARSELt APIs allow flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.

Key features:

  • NVIDIA Sparse MMA tensor core support
  • Mixed-precision computation support:
    • FP16 input/output, FP32 Tensor Core accumulate
    • BFLOAT16 input/output, FP32 Tensor Core accumulate
    • INT8 input/output, INT32 Tensor Core compute
    • FP32 input/output, TF32 Tensor Core compute
    • TF32 input/output, TF32 Tensor Core compute
  • Matrix pruning and compression functionalities
  • Auto-tuning functionality (see cusparseLtMatmulSearch())


Categories
Misc

Similarity in Graphs: Jaccard Versus the Overlap Coefficient

This post was originally published on the RAPIDS AI Blog.

There is a wide range of graph applications and algorithms that I hope to discuss through this series of blog posts, all with a bias toward what is in RAPIDS cuGraph. I am assuming that the reader has a basic understanding of graph theory and graph analytics. If there is interest in a graph analytic primer, please leave me a comment below. It should also be noted that I approach graph analysis from a social network perspective and tend to use the social science theory and terms, but I have been trying to use ‘vertex’ rather than ‘node.’

The RAPIDS cuGraph 0.7 release adds new features. Of interest to this discussion are the expansion of the Jaccard Similarity metric to allow comparisons of any pair of vertices, and the addition of the Overlap Coefficient algorithm. Both algorithms fall into the category of similarity metrics and lead to the topic of this post, which is to discuss the difference between the two and why I think one is better than the other. Let's start with a quick introduction to the similarity metrics (warning: math ahead).

The Jaccard Similarity, also called the Jaccard Index or Jaccard Similarity Coefficient, is a classic measure of similarity between two sets that was introduced by Paul Jaccard in 1901. Given two sets, A and B, the Jaccard Similarity is defined as the size of the intersection of set A and set B (i.e. the number of common elements) over the size of the union of set A and set B (i.e. the number of unique elements).

Figure 1. The Jaccard Similarity Metric.
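
In code, the definition above is a one-liner over Python sets. This is a plain-Python sketch for illustration only; the function name `jaccard` and the choice to return 0.0 for two empty sets are assumptions of this example, not part of the cuGraph API.

```python
def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)

# Two shared elements (2, 3) out of four unique elements (1, 2, 3, 4).
print(jaccard({1, 2, 3}, {2, 3, 4}))
```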

The Overlap Coefficient, also known as the Szymkiewicz–Simpson coefficient, is defined as the size of the intersection of set A and set B over the size of the smaller of the two sets.

Figure 2. The Overlap Coefficient Metric.
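
The Szymkiewicz–Simpson coefficient divides the size of the intersection by the size of the smaller set, which can be sketched in plain Python as follows. The function name `overlap` and the 0.0 result for an empty set are choices made for this example only.

```python
def overlap(a, b):
    """Size of the intersection divided by the size of the smaller set."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        return 0.0  # convention chosen here when either set is empty
    return len(a & b) / smaller

# {1, 2} is entirely contained in {1, 2, 3, 4}, so the coefficient is at its maximum.
print(overlap({1, 2}, {1, 2, 3, 4}))
```

A value of 1.0 means the smaller set is a subset of the larger one, a property the Jaccard Similarity does not expose directly.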

When applying either similarity metric in a graph setting, the sets typically consist of the neighbors of the two vertices being compared. The neighbors of a vertex v in a graph G = (V, E) are defined as the set U of vertices connected to v by an edge: N(v) = U, where v ∈ V and ∀ u ∈ U ∃ edge(v, u) ∈ E. Computing the size of the union, |A ∪ B|, can be computationally inexpensive since we only want the size and not the actual elements: |A ∪ B| = |A| + |B| − |A ∩ B|.

Figure 3. Efficient Jaccard Computation.
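
As a sketch of how this looks for a graph, the snippet below builds neighbor sets from an undirected edge list and computes the Jaccard score from set sizes alone, never materializing the union. The four-vertex graph and the helper names are made up for illustration, and this is plain Python rather than the cuGraph implementation.

```python
from collections import defaultdict

def neighbors(edges):
    """Build N(v) for every vertex of an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def jaccard_vertices(adj, u, v):
    """Jaccard of N(u) and N(v), with |A ∪ B| = |A| + |B| - |A ∩ B|."""
    inter = len(adj[u] & adj[v])
    union = len(adj[u]) + len(adj[v]) - inter
    return inter / union if union else 0.0

# Hypothetical 4-cycle: A-B, A-C, B-D, C-D
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
adj = neighbors(edges)

print(jaccard_vertices(adj, "A", "B"))  # connected, but no shared neighbors
print(jaccard_vertices(adj, "A", "D"))  # not connected, identical neighbor sets
```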

There is a wide range of applications for similarity scoring, and it is important to cover a few of them before comparing the two algorithms. Let's start with something that I'm sure many readers are familiar with: recommending people to connect with on social media. What I am going to present is a very simplistic approach to the problem; most social networking sites use a much more advanced version that usually includes some type of community detection. But first, even more background …

Figure 4: Triadic Closure.

Within the field of social network analysis, there is the concept of Triadic Closure, which was first introduced by sociologist Georg Simmel in 1908. Given three people, A, B, and C (see Figure 4), if A and C are friends (connected), and B and C are friends, then there is a high probability that A and B will connect. That probability is so high that Granovetter, in his 1973 work on weak ties, called the triad with the missing link the "forbidden triad," which meant that for Granovetter's application you could infer a connection between A and B. For our recommendation application, this means that we need to find those unconnected A-B pairs, since there is a high chance that those users will become friends. It is always good to recommend something that the user will accept.

Now back to the application: the basic process starts by computing the similarity metric (Jaccard or Overlap Coefficient) for all vertex pairs connected by an edge. Then, for a given vertex (for example, vertex A), find the neighbor with the highest similarity score (call it B) and recommend the neighbors of B that are not already neighbors of A. As mentioned, this is a very simple view, since there is a range of options for applying additional weights, like only looking within community clusters, that could be added to better select recommendations that will be accepted. The application of triadic closure and similarity should be apparent to anyone who uses social media, since those tools constantly remind you that you should connect to a friend of a friend. Note that this approach does not work for vertices with a single connection, also called satellites, since their similarity score will be zero. But for satellites it is easy to just recommend all connections of their sole neighbor.
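
The process just described can be sketched in plain Python. The graph, the helper names, and the tie-breaking by vertex name are assumptions made for this illustration; a real system would add the extra weighting mentioned above.

```python
from collections import defaultdict

def build_adjacency(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_friends(adj, v):
    """Recommend neighbors of v's most similar neighbor that v is not yet connected to."""
    nbrs = adj[v]
    if not nbrs:
        return set()
    if len(nbrs) == 1:
        # Satellite: recommend every other connection of the sole neighbor.
        (only,) = nbrs
        return adj[only] - {v}
    # Ties are broken by vertex name so the result is deterministic.
    best = max(sorted(nbrs), key=lambda n: jaccard(adj[v], adj[n]))
    return adj[best] - nbrs - {v}

# Hypothetical friendship graph
edges = [("A", "B"), ("A", "C"), ("A", "F"), ("B", "C"), ("B", "F"), ("B", "D")]
adj = build_adjacency(edges)

print(recommend_friends(adj, "A"))  # B is A's most similar neighbor
print(recommend_friends(adj, "D"))  # D is a satellite of B
```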

The problem with the previous approach is that not all recommendations can come from looking solely at directly connected vertex pairs. Therefore, being able to compute similarity scores between any pair of vertices is important. This was a limitation of the initial cuGraph Jaccard implementation and has been addressed in cuGraph release 0.7. Consider Figure 5 below. Looking at the Jaccard similarity score between connected vertices A and B, the neighbors of A are {B, C}, and the neighbors of B are {A, D}. Hence the Jaccard score is js(A, B) = 0 / 4 = 0.0. The Overlap Coefficient also yields a similarity of zero, since the size of the intersection is zero. Now look at the similarity between A and D, which share the exact same set of neighbors. The Jaccard Similarity between A and D is 2 / 2 = 1.0 (100%); likewise, the Overlap Coefficient is 1.0, since in this case the intersection size is the same as the minimal set size.

Figure 5: Non-connected Vertex Pair Similarity.

Figure 6: Bipartite Graph.

Continuing with the social network recommender example, the application should recommend that vertices A and D connect since they share the same set of neighbors. But non-connected vertex pair similarity is used in other applications as well. Consider a product recommendation system that uses a bipartite graph. One vertex type is Users and the other type is Products. The goal is not to find similarities between Users and Products, as that is a different analytic, but to find similarities between Users and other Users so that additional products can be recommended. The process is similar to that mentioned above; the vertices being compared are just not directly connected. Look at the example (Figure 6) and focus on User 1. The neighbors of User 1 are {A, B}, of User 2 are {A, B, D}, and of User 3 are {B, D}. The Jaccard Similarity between Users 1 and 2 is 2 / 3 = 0.67, and between Users 1 and 3 it is 1 / 3 = 0.33. Since Users 1 and 2 both purchased products A and B, the application should recommend to User 1 that they also purchase product D.
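The bipartite example can be reproduced with plain Python sets. This is a minimal sketch, not the cuGraph API: the `purchases` dictionary encodes the user-to-product edges described above, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_products(purchases, user):
    """Return the most similar other user and the products only they bought."""
    others = sorted(u for u in purchases if u != user)
    best = max(others, key=lambda u: jaccard(purchases[user], purchases[u]))
    return best, sorted(purchases[best] - purchases[user])


# User -> purchased products, as in the bipartite example in the text.
purchases = {
    "User 1": {"A", "B"},
    "User 2": {"A", "B", "D"},
    "User 3": {"B", "D"},
}

print(round(jaccard(purchases["User 1"], purchases["User 2"]), 2))  # 0.67
print(round(jaccard(purchases["User 1"], purchases["User 3"]), 2))  # 0.33
print(recommend_products(purchases, "User 1"))  # ('User 2', ['D'])
```

Note that the users are never connected to each other directly; the comparison runs entirely over the product sets they link to.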

Why I prefer the overlap coefficient

If the Jaccard similarity score is so useful, why introduce the Overlap Coefficient in cuGraph? Let’s look at the same example but using the Overlap Coefficient. Comparing User 1 to User 2 gives 2 / 2 = 1.0, and comparing User 1 to User 3 gives 1 / 2 = 0.5. The similarity between User 1 and User 2 is still the highest, but the fact that the score is 1.0 indicates that the set of neighbors of User 1 is a complete subset of the neighbors of User 2. That type of insight is one of the benefits of the Overlap Coefficient.

In my opinion, the Jaccard Similarity is a very powerful analysis technique, but it has a major drawback when the two sets being compared have different sizes. Consider two sets, A and B, where both sets contain 100 elements. Now assume that 50 of those elements are common across the two sets. The Jaccard Similarity is js(A, B) = 50 / (100 + 100 - 50) = 0.33. Now if we increase set A by 10 elements and decrease set B by the same amount, all while maintaining 50 elements in common, the Jaccard Similarity remains the same: 50 / (110 + 90 - 50) = 0.33. And there is where I think Jaccard fails: it has no sensitivity to how the sizes of the two sets differ. The following figure highlights how the Jaccard and Overlap Coefficient change as the set sizes change but the intersection size remains the same.

Figure 7: Similarity Scores as Set Sizes Change.
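The effect can be checked numerically. The short sketch below works from set sizes alone (the function names are hypothetical) and shifts elements from B to A while keeping 50 elements in common, following the thought experiment in the text.

```python
def jaccard_from_sizes(size_a, size_b, common):
    """Jaccard similarity from set sizes: |A & B| / (|A| + |B| - |A & B|)."""
    return common / (size_a + size_b - common)


def overlap_from_sizes(size_a, size_b, common):
    """Overlap Coefficient from set sizes: |A & B| / min(|A|, |B|)."""
    return common / min(size_a, size_b)


common = 50
rows = []
for shift in range(0, 51, 10):
    # Grow A and shrink B by the same amount; the intersection stays fixed.
    size_a, size_b = 100 + shift, 100 - shift
    rows.append((size_a, size_b,
                 jaccard_from_sizes(size_a, size_b, common),
                 overlap_from_sizes(size_a, size_b, common)))

for size_a, size_b, js, oc in rows:
    print(f"|A|={size_a:3d} |B|={size_b:3d} jaccard={js:.2f} overlap={oc:.2f}")
```

Jaccard stays at 0.33 on every row because the union size never changes, while the Overlap Coefficient climbs from 0.50 to 1.00 as B shrinks toward the 50 shared elements.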

The use of the smaller set size as the denominator means that the score indicates how much of the smaller set is within the larger. That provides insight into whether one set is an exact subset of the larger set. Looking back at the example described and illustrated above, the Overlap Coefficient makes it easy to see to what degree set B is contained within set A.

Conclusion

Jaccard might be better known than the Overlap Coefficient, and that might play into why Jaccard is more widely used. The unfamiliarity with the Overlap Coefficient might explain why it is not in the NetworkX package. Nevertheless, in my opinion, the Overlap Coefficient can provide better insight into how similar two vertices are, or really, how similar their sets of neighbors are. By knowing the sizes of each set, an analyst can easily know if one set is a subset of (fully contained in) the other set, which is something that is not apparent using Jaccard.

I also think there is a fundamental flaw in how we derive the sets for the similarity computation when the vertex pairs are connected by an edge. Consider the 5-clique shown below in Figure 8. Since every vertex is connected to every other vertex, one would assume that the similarity scores would be 1.0, exact similarity. However, because of the way the sets are created, for both Jaccard and the Overlap Coefficient, the score for vertex pairs connected by an edge can never be equal to 1.0.

Figure 8: Five Clique.

Let’s compare vertex 1 to vertex 2. The neighbors of 1 are {2, 3, 4, 5} and the neighbors of 2 are {1, 3, 4, 5}. The Jaccard score is then 3 / 5 or 0.6. The Overlap Coefficient is 3 / 4 or 0.75.

While those similarity scores are mathematically correct according to the algorithm, the resulting similarity scores do not match what I would expect for a clique. The issue is that the vertices being compared are reflected in the neighborhood sets. Rephrasing that statement, vertex 1 appears in vertex 2’s neighbor set. Likewise, vertex 2 appears in vertex 1’s neighbor set. The fact that the vertices being compared appear in each other’s sets prevents the sets from ever matching. In my opinion, the vertices being compared should not be part of the sets being evaluated.

A solution to this problem would be to build the sets differently or modify the similarity algorithm. The algorithm modification might be computationally easier. The change would be to subtract 1 from the size of each set when computing the denominator (the union for Jaccard, the smaller set for the Overlap Coefficient), while leaving the intersection unchanged.

Figure 9: Modified Jaccard and Overlap Coefficient for Connected Vertex Pairs.

For the clique example, the similarity scores would then be:

js(1, 2) = 3 / (5 - 2) = 3 / 3 = 1.0, and

oc(1, 2) = 3 / (4 - 1) = 3 / 3 = 1.0.
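The clique numbers can be verified with a small sketch. It takes the "build the sets differently" route: each compared vertex is dropped from the other's neighbor set, which for a connected pair has the same effect as subtracting 1 from each set size. The `drop_endpoints` flag and the function names are hypothetical and are not part of cuGraph.

```python
def neighbor_sets(adj, u, v, drop_endpoints):
    """Return copies of the neighbor sets of u and v, optionally without v and u."""
    nu, nv = set(adj[u]), set(adj[v])
    if drop_endpoints:
        nu.discard(v)  # v is in u's set only because of the u-v edge
        nv.discard(u)
    return nu, nv


def jaccard(adj, u, v, drop_endpoints=False):
    nu, nv = neighbor_sets(adj, u, v, drop_endpoints)
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def overlap(adj, u, v, drop_endpoints=False):
    nu, nv = neighbor_sets(adj, u, v, drop_endpoints)
    smaller = min(len(nu), len(nv))
    return len(nu & nv) / smaller if smaller else 0.0


# 5-clique: every vertex is connected to every other vertex.
clique = {v: set(range(1, 6)) - {v} for v in range(1, 6)}

print(jaccard(clique, 1, 2), overlap(clique, 1, 2))  # 0.6 0.75
print(jaccard(clique, 1, 2, drop_endpoints=True),
      overlap(clique, 1, 2, drop_endpoints=True))    # 1.0 1.0
```

For vertex pairs that are not connected by an edge, `discard` removes nothing, so the modified scores equal the standard ones.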

Now, it could be that the results are correct and that my expectations are wrong. But this is just my opinion 🙂

About me

Brad Rees leads the RAPIDS cuGraph team at NVIDIA where he directs and develops graph analytic solutions. He has been designing, implementing, and supporting a variety of advanced software and hardware systems for over 30 years. Brad specializes in complex analytic systems, primarily using graph analytic techniques for social and cyber network analysis. His technical interests are in HPC, machine learning, deep learning, and graph. Brad has a Ph.D. in Computer Science from the Florida Institute of Technology.

Some references

M. S. Granovetter, “The strength of weak ties,” The American Journal of Sociology, vol. 78, no. 6, pp. 1360–1380, 1973.

Thanks to Corey Nolet.

Categories
Offsites

HDR+ with Bracketing on Pixel Phones

We’re continuously working to improve the Pixel — making it more helpful, more capable, and more fun — with regular updates, such as the recent V8.2 update to the Camera app. One such improvement (launched on Pixel 5 and Pixel 4a 5G in October) is a feature that operates “under the hood”, HDR+ with Bracketing. This feature works by merging images taken with different exposure times to improve image quality (especially in shadows), resulting in more natural colors, improved details and texture, and reduced noise.

Why Are HDR Scenes Hard to Capture?
The original HDR+ burst photography system is the engine behind high-quality mobile photography, which captures a rapid series of deliberately underexposed images, then combines and renders them in a way that preserves detail across the range of tones. But this system had one limitation: scenes with high dynamic range (HDR) like the one below were noisy in the shadows because all images captured are underexposed.

The same photo using HDR+ (red outline) and HDR+ with Bracketing (green outline). While the characteristic HDR+ look remains the same, bracketing improves image quality, especially in shadows, with more natural colors, improved details and texture, and reduced noise.

Capturing HDR scenes is difficult because of the physical constraints of image sensors combined with limited signal in the shadows. We can correctly expose either the shadows or the highlights, but not both at the same time.

The same scene shot with different exposure settings and tonemapped to similar overall brightness. Left/Top: Exposure set for the highlights. The bright blue sky is preserved, but the shadows are very noisy. Right/Bottom: Exposure set for the shadows. Noise in the shadows is reduced, but the sky is clipped (white).

Photographers sometimes work around these limitations by taking two different exposures and combining them. This approach, known as exposure bracketing, can deliver the best of both worlds, but it is time-consuming to do by hand. It is also challenging in computational photography because it requires:

  1. Capturing additional long exposure frames while maintaining the fast, predictable capture experience of the Pixel camera.
  2. Taking advantage of long exposure frames while avoiding ghosting artifacts caused by motion between frames.

To avoid these challenges, the original HDR+ system used a different approach to handle high dynamic range scenes.

The Limits of HDR+
The capture strategy used by HDR+ is based on underexposure, which avoids loss of detail in the highlights. While this strategy comes at the expense of noise in the shadows, HDR+ offsets the increased noise through the use of burst photography.

Using bursts to improve image quality. HDR+ starts from a burst of full-resolution raw images (left). Depending on conditions, between 2 and 15 images are aligned and merged into a computational raw image (middle). The merged image has reduced noise and increased dynamic range, leading to a higher quality final result (right).

This approach works well for scenes with moderate dynamic range, but breaks down for HDR scenes. To understand why, we need to take a closer look at how two types of noise get into an image.

Noise in Burst Photography
One important type of noise is called shot noise, which depends only on the total amount of light captured — the sum of N frames, each with E seconds of exposure time has the same amount of shot noise as a single frame exposed for N × E seconds. If this were the only type of noise present in captured images, burst photography would be as efficient as taking longer exposures. Unfortunately, a second type of noise, read noise, is introduced by the sensor every time a frame is captured. Read noise doesn’t depend on the amount of light captured but instead depends on the number of frames taken — that is, with each frame taken, an additional fixed amount of read noise is added.

This is why using burst photography to reduce total noise isn’t as efficient as simply taking longer exposures: taking multiple frames can reduce the effect of shot noise, but will also increase read noise. Even though read noise increases with the number of frames, it is still possible to reduce the overall noisiness with burst photography, but it becomes less efficient. If one were to break a long exposure into N shorter exposures, the ratio of signal to noise in the final image would be lower because of the additional read noise. In this case, to get back to the signal-to-noise ratio in the single long exposure, one would need to merge N² short-exposure frames. In the example below, if a long exposure were divided into 12 short exposures, we’d have to capture 144 (12 × 12) short frames to match the signal-to-noise ratio in the shadows! Capturing and processing this many frames would be much more time consuming: burst capture and processing could take over a minute and result in a poor user experience. Instead, with bracketing one can capture both short and long exposures, combining highlight protection and noise reduction.

Left: The result of merging 12 short-exposure frames in Night Sight mode. Right: A single frame whose exposure time is 12 times longer than an individual short exposure. The longer exposure has significantly less noise in the shadows but sacrifices the highlights.
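The N² relationship can be checked with a simple noise model. The sketch below is an illustration under stated assumptions, not a description of the Pixel pipeline: signal is measured in electrons, shot noise is Poisson (variance equal to the signal), read noise is an independent fixed amount per frame, and the numeric values are made up to represent a dark, read-noise-dominated shadow region.

```python
import math


def merged_snr(signal_per_frame, n_frames, read_noise):
    """Signal-to-noise ratio after summing n_frames identical exposures.

    Assumed model: shot-noise variance equals the total signal, and each
    frame adds an independent read-noise variance of read_noise ** 2.
    """
    total_signal = n_frames * signal_per_frame
    variance = total_signal + n_frames * read_noise ** 2
    return total_signal / math.sqrt(variance)


# Hypothetical shadow region: 12 electrons in the long exposure,
# 10 electrons of read noise per frame.
long_signal, read_noise, n = 12.0, 10.0, 12

snr_long = merged_snr(long_signal, 1, read_noise)
snr_burst_n = merged_snr(long_signal / n, n, read_noise)
snr_burst_n2 = merged_snr(long_signal / n, n * n, read_noise)

print(f"1 long frame:      {snr_long:.2f}")
print(f"{n} short frames:   {snr_burst_n:.2f}")
print(f"{n * n} short frames:  {snr_burst_n2:.2f}")
```

With these values, 12 short frames fall well short of the single long exposure, and it takes roughly 144 of them to catch up. Setting `read_noise` to zero makes the 12-frame burst and the long exposure equivalent, matching the statement that shot noise depends only on the total light captured.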

Solving with Bracketing
While the challenges of bracketing prevented the original HDR+ system from using it, incremental improvements since then, plus a recent concentrated effort, have made it possible in the Camera app. To start, adding bracketing to HDR+ required redesigning the capture strategy. Capturing is complicated by zero shutter lag (ZSL), which underpins the fast capture experience on Pixel. With ZSL, the frames displayed in the viewfinder before the shutter press are the frames we use for HDR+ burst merging. For bracketing, we capture an additional long exposure frame after the shutter press, which is not shown in the viewfinder. Note that holding the camera still for half a second after the shutter press to accommodate the long exposure can help improve image quality, even with a typical amount of handshake.

Capture strategy. Top: The original HDR+ method captures short exposures before the shutter press, six in this example. Bottom: HDR+ with Bracketing captures five short exposures before the shutter press and one long exposure after the shutter press.

For Night Sight, the capture strategy isn’t constrained by the viewfinder — because all frames are captured after the shutter press while the viewfinder is stopped, this mode easily accommodates capturing longer exposure frames. In this case, we capture three long exposures to further reduce noise.

Capture strategy for Night Sight. Top: The original Night Sight captured 15 short exposure frames. Bottom: Night Sight with bracketing captures 12 short and 3 long exposures.

The Merging Algorithm
When merging bracketed shots, we choose one of the short frames as the reference frame to avoid potentially clipped highlights and motion blur. All other frames are aligned to this frame before they are merged. This introduces a challenge — for complex scene motion or occluded regions, it is impossible to find exactly matching regions and a naïve merge algorithm would produce ghosting artifacts in these cases.

Left: Ghosting artifacts are visible around the silhouette of a moving person, when deghosting is disabled.
Right: Robust merging produces a clean image.

To address this, we designed a new spatial merge algorithm, similar to the one used for Super Res Zoom, that decides per pixel whether image content should be merged or not. This deghosting is more complicated for frames with different exposures. Long exposure frames have different noise characteristics, clipped highlights, and different amounts of motion blur, which makes comparisons with the short exposure reference frame more difficult. In addition, ghosting artifacts are more visible in bracketed shots, because noise that would otherwise mask these errors is reduced. Despite those challenges, our algorithm is as robust to these issues as the original HDR+ and Super Res Zoom and doesn’t produce ghosting artifacts. At the same time, it merges images 40% faster than its predecessors. Because it merges RAW images early in the photographic pipeline, we were able to achieve all of those benefits while keeping the rest of processing and the signature HDR+ look unchanged. Furthermore, users who prefer to use computational RAW images can take advantage of those image quality and performance improvements.
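To make the idea of a per-pixel merge decision concrete, here is a deliberately simplified toy. It is not the HDR+ algorithm, whose details are not given here. It assumes frames are already aligned and represented as flat lists of linear pixel values, and every name and threshold in it (`robust_merge`, `exposure_ratio`, `clip_level`, `k`) is hypothetical.

```python
def robust_merge(reference, long_frame, exposure_ratio, noise_sigma,
                 clip_level, k=3.0):
    """Merge an aligned long exposure into a short reference, pixel by pixel.

    A long-exposure pixel is used only if it is not clipped and, after being
    scaled to the reference exposure, agrees with the reference to within
    k * noise_sigma. Otherwise the reference pixel is kept unchanged.
    """
    merged = []
    for ref_px, long_px in zip(reference, long_frame):
        if long_px >= clip_level:
            merged.append(ref_px)  # clipped highlight: no usable information
            continue
        scaled = long_px / exposure_ratio  # normalize to reference brightness
        if abs(scaled - ref_px) <= k * noise_sigma:
            merged.append((ref_px + scaled) / 2)  # consistent content: average
        else:
            merged.append(ref_px)  # likely motion or occlusion: reject
    return merged


reference = [10.0, 20.0, 30.0, 40.0]
long_frame = [84.0, 400.0, 1000.0, 328.0]  # exposed 8x longer
result = robust_merge(reference, long_frame, exposure_ratio=8.0,
                      noise_sigma=1.0, clip_level=1000.0)
print(result)  # [10.25, 20.0, 30.0, 40.5]
```

The second pixel is rejected because it disagrees with the reference (as a moving object would), and the third because it is clipped. A real implementation would also have to account for the different noise levels and motion blur of the two exposures, which is exactly what makes the comparison hard.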

Bracketing on Pixel
HDR+ with Bracketing is available to users of Pixel 4a (5G) and 5 in the default camera, as well as in Night Sight and Portrait modes. For users of Pixel 4 and 4a, the Google Camera app supports bracketing in Night Sight mode. No user interaction is needed to activate HDR+ with Bracketing — depending on the dynamic range of the scene, and the presence of motion, HDR+ with bracketing chooses the best exposures to maximize image quality (examples).

Acknowledgements
HDR+ with Bracketing is the result of a collaboration across several teams at Google. The project would not have been possible without the joint efforts of Sam Hasinoff, Dillon Sharlet, Kiran Murthy, Mike Milne, Andy Radin, Nicholas Wilson, Navin Sarma, Gabriel Nava, Emily To, Sushil Nath, Alexander Schiffhauer, Isaac Reynolds, Bill Strathearn, Marius Renn, Alex Hong, Jose Ricardo Lima, Bob Hung, Ying Chen Lou, Joy Hsu, Blade Chiu, David Massoud, Jean Hsu, Ellie Yang, and Marc Levoy.

Categories
Misc

VAE won’t learn color

I found the CVAE tutorial on tensorflow.org and repurposed it for new data. However, it only reconstructs the outline as a nearly grayscale image and ignores the colour. The loss is 14000. Please help…

#below: model

import numpy as np
import tensorflow as tf


class VAE(tf.keras.Model):
    """Convolutional variational autoencoder for 128x128 RGB images."""

    def __init__(self, latent_dim):
        super(VAE, self).__init__()
        self.latent_dim = latent_dim
        # Encoder: three stride-2 convolutions, then a dense layer that
        # outputs the mean and log-variance of the latent distribution.
        self.encoder = tf.keras.Sequential(
            [
                tf.keras.layers.InputLayer(input_shape=(128, 128, 3)),
                tf.keras.layers.Conv2D(
                    filters=32, kernel_size=3, strides=(2, 2), activation='relu'),
                tf.keras.layers.Conv2D(
                    filters=64, kernel_size=3, strides=(2, 2), activation='relu'),
                tf.keras.layers.Conv2D(
                    filters=128, kernel_size=3, strides=(2, 2), activation='relu'),
                # tf.keras.layers.Conv2D(
                #     filters=256, kernel_size=3, strides=(2, 2), activation='relu'),
                tf.keras.layers.Flatten(),
                # No activation
                tf.keras.layers.Dense(latent_dim + latent_dim),
            ]
        )
        # Decoder: 16x16 feature map upsampled three times to 128x128,
        # ending in 3 output channels (RGB logits).
        self.decoder = tf.keras.Sequential(
            [
                tf.keras.layers.InputLayer(input_shape=(latent_dim,)),
                tf.keras.layers.Dense(units=16*16*256, activation=tf.nn.relu),
                tf.keras.layers.Reshape(target_shape=(16, 16, 256)),
                tf.keras.layers.Conv2DTranspose(
                    filters=128, kernel_size=3, strides=2, padding='same',
                    activation='relu'),
                tf.keras.layers.Conv2DTranspose(
                    filters=64, kernel_size=3, strides=2, padding='same',
                    activation='relu'),
                tf.keras.layers.Conv2DTranspose(
                    filters=32, kernel_size=3, strides=2, padding='same',
                    activation='relu'),
                # No activation
                tf.keras.layers.Conv2DTranspose(
                    filters=3, kernel_size=3, strides=1, padding='same'),
            ]
        )

    @tf.function
    def sample(self, eps=None):
        if eps is None:
            eps = tf.random.normal(shape=(100, self.latent_dim))
        return self.decode(eps, apply_sigmoid=True)

    def encode(self, x):
        mean, logvar = tf.split(self.encoder(x), num_or_size_splits=2, axis=1)
        return mean, logvar

    def reparameterize(self, mean, logvar):
        eps = tf.random.normal(shape=mean.shape)
        return eps * tf.exp(logvar * .5) + mean

    def decode(self, z, apply_sigmoid=False):
        logits = self.decoder(z)
        if apply_sigmoid:
            probs = tf.sigmoid(logits)
            return probs
        return logits

optimizer = tf.keras.optimizers.Adam(1e-3)


def log_normal_pdf(sample, mean, logvar, raxis=1):
    log2pi = tf.math.log(2. * np.pi)
    return tf.reduce_sum(
        -.5 * ((sample - mean) ** 2. * tf.exp(-logvar) + logvar + log2pi),
        axis=raxis)


def compute_loss(model, x):
    mean, logvar = model.encode(x)
    z = model.reparameterize(mean, logvar)
    x_logit = model.decode(z)
    cross_ent = tf.nn.sigmoid_cross_entropy_with_logits(logits=x_logit, labels=x)
    # Reconstruction term, summed over height, width and colour channels.
    logpx_z = -tf.reduce_sum(cross_ent, axis=[1, 2, 3])
    logpz = log_normal_pdf(z, 0., 0.)
    logqz_x = log_normal_pdf(z, mean, logvar)
    return -tf.reduce_mean(logpx_z + logpz - logqz_x)

#above: loss

def train_step(model, x, optimizer):
    """Executes one training step and returns the loss.

    This function computes the loss and gradients, and uses the latter to
    update the model's parameters.
    """
    with tf.GradientTape() as tape:
        loss = compute_loss(model, x)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss


epochs = 10
latent_dim = 64 # up or 8
num_examples_to_generate = 32
model = VAE(latent_dim)

submitted by /u/much_bad_gramer

Categories
Misc

Reconstructing thousands of particles in one go at the CERN LHC with TensorFlow

submitted by /u/nbortolotti