The latest Merlin .5 update includes a data generator for training, a multi-GPU dataloader, and initial support for session-based recommenders.
Billions of people are online. They spend many discrete moments browsing, shopping, streaming entertainment, or engaging with social media. Each discrete moment, or session, is an opportunity for recommenders to make informed decisions a bit easier, faster, and more personalized for an individual person. At scale, however, this means recommenders may need to support billions of people interacting with trillions of things online.
At GTC Spring 2021, NVIDIA shared how retail, entertainment, on-demand, and social companies, including early adopters of NVIDIA Merlin, are building and using recommenders at scale. Merlin open source components include NVTabular for ETL, HugeCTR for training, and Triton for inference. The NVIDIA Merlin team continues to incorporate feedback from early adopters to streamline recommender workflows for machine learning engineers. The latest Merlin .5 update includes a data generator for training, a multi-GPU dataloader, and initial support for session-based recommenders. The update also reaffirms NVIDIA’s commitment to democratizing and streamlining recommender workflows.
Supporting Experimentation and Streamlining Recommender Workflows
Ongoing experimentation is vital for fine-tuning recommender model performance before models are deployed to production. Merlin HugeCTR’s new configurable data generator produces synthetic data and lets machine learning engineers specify whether categorical features follow a uniform or power-law probability distribution, without modifying the configuration file. The generator is particularly helpful for benchmarking and research purposes.
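To make the difference between the two distributions concrete, here is a minimal sketch in plain Python. It is not the HugeCTR data generator or its API; the function name `generate_categorical` and the parameters `vocab_size`, `alpha`, and `seed` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def generate_categorical(num_samples, vocab_size, distribution="powerlaw",
                         alpha=1.2, seed=0):
    """Return a list of num_samples category ids in [0, vocab_size).

    Hypothetical helper for illustration only, not part of HugeCTR.
    """
    rng = random.Random(seed)
    if distribution == "uniform":
        # Every category id is equally likely.
        return [rng.randrange(vocab_size) for _ in range(num_samples)]
    if distribution == "powerlaw":
        # Zipf-like weights: id k is drawn with probability
        # proportional to 1 / (k + 1) ** alpha, so low ids dominate.
        weights = [1.0 / (k + 1) ** alpha for k in range(vocab_size)]
        return rng.choices(range(vocab_size), weights=weights, k=num_samples)
    raise ValueError(f"unknown distribution: {distribution!r}")


uniform_ids = generate_categorical(10_000, 100, "uniform")
powerlaw_ids = generate_categorical(10_000, 100, "powerlaw")

# Compare how concentrated each sample is on its most frequent categories.
print("uniform top 3: ", Counter(uniform_ids).most_common(3))
print("powerlaw top 3:", Counter(powerlaw_ids).most_common(3))
```

Under a power-law distribution a handful of categories account for most of the samples, which resembles the skew often seen in real item and user IDs. Uniform data spreads samples evenly, which can be useful as a simpler baseline for benchmarking.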
Merlin .5’s inclusion of a multi-GPU dataloader was based on feedback from Merlin early adopters and also helps streamline workflows. Machine learning engineers are able to use the Merlin NVTabular TensorFlow (TF) dataloader for multi-GPU training on a single node using TF Distributed. Merlin NVTabular uses Dask and Dask-cuDF to scale easily to multi-GPU and multi-node, and to provide a high-performance, recommender-specific ETL pipeline.
Merlin Session-Based Recommenders Support: Just A Beginning
Data scientists and machine learning engineers at the forefront of e-commerce, news, and social media recommender work have added, or are considering adding, session-based recommenders. While collaborative filtering and content-based filtering are established recommender methods, session-based recommenders are gaining attention because they can potentially increase prediction accuracy when users’ interests are dynamic and specific to a shorter time frame (i.e., within a session). With Merlin .5, NVTabular provides new preprocessing functionality needed to transform and group data for session-based recommenders.
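As a rough illustration of what grouping data for a session-based recommender can look like, the sketch below turns a flat interaction log into one time-ordered item sequence per session. It uses only the Python standard library and does not show NVTabular's API; the function name `group_into_sessions` and the `(session_id, timestamp, item_id)` record layout are assumptions made for this example.

```python
from collections import defaultdict


def group_into_sessions(interactions):
    """Group (session_id, timestamp, item_id) records into per-session item lists.

    Hypothetical helper for illustration only, not part of NVTabular.
    Items within each session are ordered by timestamp.
    """
    sessions = defaultdict(list)
    # Sort by session, then by time, so each list is built in click order.
    for session_id, _timestamp, item_id in sorted(
        interactions, key=lambda record: (record[0], record[1])
    ):
        sessions[session_id].append(item_id)
    return dict(sessions)


# A small, made-up interaction log with interleaved sessions.
log = [
    ("s1", 3, "shoes"),
    ("s2", 1, "phone"),
    ("s1", 1, "socks"),
    ("s2", 2, "case"),
    ("s1", 2, "laces"),
]

sessions = group_into_sessions(log)
for session_id, items in sessions.items():
    print(session_id, items)
```

A session-based model would then typically consume each sequence to predict the next item in that session, rather than relying on a user's long-term history.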
Download and Try Merlin’s Latest Update
The latest preprocessing and training enhancements to NVIDIA Merlin reaffirm NVIDIA’s commitment to democratizing and accelerating recommender workflows. Because machine learning engineers and data scientists use a hybrid of libraries, packages, tools, and techniques to create effective and impactful recommenders, Merlin components are designed to be easy to use and interoperable with existing recommender workflows.
To discover hands-on how Merlin components streamline recommender workflows, download and try Merlin NVTabular for ETL, HugeCTR for training, and Triton for inference.