I have been at this for ages now. My GPU is a GTX 1650 SUPER, and I have Python 3.9.5 with TensorFlow 2.7.0, CUDA 11.2, and cuDNN 8.1.0. Here is the issue: sometimes my models train, but most of the time training fails with:
E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
or some other variants (again, they kinda take turns fighting to be errors):
E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_MISALIGNED_ADDRESS: misaligned address
E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_INSTRUCTION: an illegal instruction was encountered
I have absolutely no idea why it’s doing this. I don’t think it’s my code, because training works 100% of the time on CPU, but that takes about half an hour, while the GPU takes only 3 minutes, 10 times faster! Has anyone else seen this error, and why does it occur randomly? Also, could MSI Afterburner interfere with it?
I have a small data set of images (seven images) that I annotated by drawing bounding boxes on the objects I want to detect using VGG Image Annotator (VIA). When I exported the annotations, it only gave me one JSON file for all the annotations.
How would I go about importing that JSON file, along with the pictures into TensorFlow? I tried to follow the documentation, but got lost.
I also tried the RoboFlow service, which can turn my images and annotations into TFRecords and TensorFlow CSV files, but I was also lost on how to use those.
I am using Python 3.6.9 in my JupyterLab environment.
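One possible starting point is to read the VIA export with Python's standard `json` module and collect the rectangles per image before handing them to any TensorFlow input pipeline. The sketch below assumes the usual VIA layout, in which each entry carries a `filename` and a `regions` list whose items hold `shape_attributes` with `name`, `x`, `y`, `width`, and `height`. The function name `load_via_annotations` and the sample data are hypothetical, and the exact keys should be checked against the exported file.

```python
import json


def load_via_annotations(json_text):
    """Parse a VIA JSON export into {filename: [(x, y, width, height), ...]}."""
    data = json.loads(json_text)
    # Assumption: a VIA project file nests the per-image entries under
    # "_via_img_metadata"; a plain annotation export is the mapping itself.
    entries = data.get("_via_img_metadata", data)
    boxes = {}
    for entry in entries.values():
        regions = entry.get("regions", [])
        # Assumption: older VIA versions store regions as a dict keyed by index.
        if isinstance(regions, dict):
            regions = list(regions.values())
        rects = []
        for region in regions:
            shape = region.get("shape_attributes", {})
            if shape.get("name") != "rect":
                continue  # skip polygons, points, and other shapes
            rects.append((shape["x"], shape["y"], shape["width"], shape["height"]))
        boxes[entry["filename"]] = rects
    return boxes


# Hypothetical one-image export, used only to exercise the parser.
sample = json.dumps({
    "cat.jpg12345": {
        "filename": "cat.jpg",
        "size": 12345,
        "regions": [
            {
                "shape_attributes": {
                    "name": "rect", "x": 10, "y": 20, "width": 100, "height": 50
                },
                "region_attributes": {"label": "cat"},
            }
        ],
        "file_attributes": {},
    }
})

annotations = load_via_annotations(sample)
print(annotations)
```

For a real export, the text would come from `open(path).read()` instead of the inline sample, and the resulting boxes can then be paired with the decoded images.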
I trained a custom image classification model using Python and now want to deploy it in a native application that I’ve made using React Native. My initial approach was to use tfjs. However, in my experience, tfjs has poor support, unresolved complaints on the forums, and a ton of dependency issues when it comes to using custom models on React Native. I couldn’t figure out any of those by myself. So I switched approaches and am now trying to load the model in a Python REST API, which I plan to call from my React Native app. Is this the best way to approach this problem? I’m new to machine learning and to programming in general, and would love to hear some insight. Is it sensible to upload an image from the React Native front end and then get the prediction using the model on the back end? Also, if I’m wrong about the problems with tfjs and someone has successfully used it, I’d love to hear how you did it. Any and everything helpful is much appreciated, thank you very much.
Posted by Daniel Suo, Software Engineer and Elad Hazan, Research Scientist, Google Research, on behalf of the Google AI Princeton Team
Mechanical ventilators provide critical support for patients who have difficulty breathing or are unable to breathe on their own. They see frequent use in scenarios ranging from routine anesthesia, to neonatal intensive care and life support during the COVID-19 pandemic. A typical ventilator consists of a compressed air source, valves to control the flow of air into and out of the lungs, and a “respiratory circuit” that connects the ventilator to the patient. In some cases, a sedated patient may be connected to the ventilator via a tube inserted through the trachea to their lungs, a process called invasive ventilation.
A mechanical ventilator takes breaths for patients who are not fully capable of doing so on their own. In invasive ventilation, a controllable, compressed air source is connected to a sedated patient via tubing called a respiratory circuit.
In both invasive and non-invasive ventilation, the ventilator follows a clinician-prescribed breathing waveform based on a respiratory measurement from the patient (e.g., airway pressure, tidal volume). In order to prevent harm, this demanding task requires both robustness to differences or changes in patients’ lungs and adherence to the desired waveform. Consequently, ventilators require significant attention from highly-trained clinicians in order to ensure that their performance matches the patients’ needs and that they do not cause lung damage.
Example of a clinician-prescribed breathing waveform (orange) in units of airway pressure and the actual pressure (blue), given some controller algorithm.
In “Machine Learning for Mechanical Ventilation Control”, we present exploratory research into the design of a deep learning–based algorithm to improve medical ventilator control for invasive ventilation. Using signals from an artificial lung, we design a control algorithm that measures airway pressure and computes necessary adjustments to the airflow to better and more consistently match prescribed values. Compared to other approaches, we demonstrate improved robustness and better performance while requiring less manual intervention from clinicians, which suggests that this approach could reduce the likelihood of harm to a patient’s lungs.
Current Methods
Today, ventilators are controlled with methods belonging to the PID family (i.e., Proportional, Integral, Differential), which control a system based on the history of errors between the observed and desired states. A PID controller uses three characteristics for ventilator control: proportion (“P”), a comparison of the measured and target pressure; integral (“I”), the sum of previous measurements; and differential (“D”), the difference between two previous measurements. Variants of PID have been used since the 17th century and today form the basis of many controllers in both industrial (e.g., controlling heat or fluids) and consumer (e.g., controlling espresso pressure) applications.
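To make the three terms concrete, here is a minimal sketch of a discrete PID controller in the textbook form, where all three terms act on the error between target and measured pressure. It is not the ventilator's actual control code: the class name, the gains, and the toy first-order "lung" in `simulate` are assumptions chosen only for illustration.

```python
class PIDController:
    """Textbook discrete PID controller acting on the tracking error."""

    def __init__(self, kp, ki, kd, dt=1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0     # running sum of past errors (I term state)
        self.prev_error = 0.0   # last error, used for the difference (D term)

    def step(self, target, measured):
        error = target - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(controller, target=20.0, steps=400, dt=0.05):
    """Drive a hypothetical first-order plant toward a constant target pressure."""
    pressure = 0.0
    history = []
    for _ in range(steps):
        control = controller.step(target, pressure)
        # Toy dynamics (assumption): inflow raises pressure, a leak lowers it.
        pressure += dt * (control - 0.5 * pressure)
        history.append(pressure)
    return history


history = simulate(PIDController(kp=2.0, ki=1.0, kd=0.05, dt=0.05))
print(round(history[-1], 3))
```

Raising `kp` in this toy setup makes the response faster but more prone to overshoot, while relying mostly on `ki` gives a slower rise, which mirrors the trade-off operators face when tuning.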
PID control forms a solid baseline, relying on the sharp reactivity of P control to rapidly increase lung pressure when breathing in and the stability of I control to hold the breath in before exhaling. However, operators must tune the ventilator for specific patients, often repeatedly, to balance the “ringing” of overzealous P control against the ineffectually slow rise in lung pressure of dominant I control.
Current PID methods are prone to over- and then under-shooting their target (ringing). Because patients differ in their physiology and may even change during treatment, highly-trained clinicians must constantly monitor and adjust existing methods to ensure such violent ringing as in the above example does not occur.
To more effectively balance these characteristics, we propose a neural network–based controller to create a set of control signals that are more broad and adaptable than PID-generated controls.
A Machine-Learned Ventilator Controller
While one could tune the coefficients of a PID controller (either manually or via an exhaustive grid search) through a limited number of repeated trials, it is impossible to apply such a direct approach to a deep controller, as deep neural networks (DNNs) are often parameter-rich and require significant training data. Similarly, popular model-free approaches, such as Q-Learning or Policy Gradient, are data-intensive and therefore unsuitable for the physical system at hand. Further, these approaches don’t take into account the intrinsic differentiability of the ventilator dynamical system, which is deterministic, continuous and contact-free.
We therefore adopt a model-based approach, where we first learn a DNN-based simulator of the ventilator-patient dynamical system. An advantage of learning such a simulator is that it provides a more accurate data-driven alternative to physics-based models, and can be more widely distributed for controller research.
To train a faithful simulator, we built a dataset by exploring the space of controls and the resulting pressures, while balancing against physical safety, e.g., not over-inflating a test lung and causing damage. Though PID control can exhibit ringing behavior, it performs well enough to use as a baseline for generating training data. To safely explore and to faithfully capture the behavior of the system, we use PID controllers with varied control coefficients to generate the control-pressure trajectory data for simulator training. Further, we add random deviations to the PID controllers to capture the dynamics more robustly.
We collect data for training by running mechanical ventilation tasks on a physical test lung using an open-source ventilator designed by Princeton University’s People’s Ventilator Project. We built a ventilator farm housing ten ventilator-lung systems on a server rack, which captures multiple airway resistance and compliance settings that span a spectrum of patient lung conditions, as required for practical applications of ventilator systems.
We use a rack-based ventilator farm (10 ventilators / artificial lungs) to collect training data for a ventilator-lung simulator. Using this simulator, we train a DNN controller that we then validate on the physical ventilator farm.
The true underlying state of the dynamical system is not available to the model directly, but rather only through observations of the airway pressure in the system. In the simulator we model the state of the system at any time as a collection of previous pressure observations and the control actions applied to the system (up to a limited lookback window). These inputs are fed into a DNN that predicts the subsequent pressure in the system. We train this simulator on the control-pressure trajectory data collected through interactions with the test lung.
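The state construction described above can be sketched as a sliding window over the recorded trajectory. In the sketch below, `make_windows` is a hypothetical helper, the trajectory comes from made-up linear dynamics, and a least-squares linear fit stands in for the DNN purely to keep the example small; none of this is the authors' code.

```python
import numpy as np


def make_windows(pressures, controls, lookback):
    """Build (features, targets) pairs for next-pressure prediction.

    Each feature row holds the previous `lookback` pressures followed by the
    previous `lookback` controls; the target is the pressure that came next.
    """
    pressures = np.asarray(pressures, dtype=float)
    controls = np.asarray(controls, dtype=float)
    features, targets = [], []
    for t in range(lookback, len(pressures)):
        window = np.concatenate(
            [pressures[t - lookback:t], controls[t - lookback:t]]
        )
        features.append(window)
        targets.append(pressures[t])
    return np.array(features), np.array(targets)


# Toy trajectory from hypothetical linear dynamics (not real ventilator data).
rng = np.random.default_rng(0)
controls = rng.uniform(0.0, 1.0, size=200)
pressures = np.zeros(200)
for t in range(199):
    pressures[t + 1] = 0.9 * pressures[t] + 0.5 * controls[t]

X_sim, y_sim = make_windows(pressures, controls, lookback=3)

# Linear least squares as a stand-in for the DNN simulator (assumption).
weights, *_ = np.linalg.lstsq(X_sim, y_sim, rcond=None)
predictions = X_sim @ weights
print(X_sim.shape, float(np.max(np.abs(predictions - y_sim))))
```

A real simulator would replace the linear fit with a neural network and would be evaluated by feeding its own predictions back in as inputs, rather than by one-step error alone.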
The performance of the simulator is measured via the sum of deviations of the simulator’s predictions (under self-simulation) from the ground truth.
While it is infeasible to compare real dynamics with their simulated counterparts over all possible trajectories and control inputs, we measure the distance between simulation and the known safe trajectories. We introduce some random exploration around these safe trajectories for robustness.
Having learned an accurate simulator, we then use it to train a DNN-based controller completely offline. This approach allows us to rapidly apply updates during controller training. Furthermore, the differentiable nature of the simulator allows for the stable use of the direct policy gradient, where we analytically compute the gradient of the loss with respect to the DNN parameters. We find this method to be significantly more efficient than model-free approaches.
Results
To establish a baseline, we run an exhaustive grid of PID controllers for multiple lung settings and select the best-performing PID controller as measured by average absolute deviation between the desired pressure waveform and the actual pressure waveform. We compare these to our controllers and provide evidence that our DNN controllers are better performing and more robust.
Breathing waveform tracking performance:
We compare the best PID controller for a given lung setting against our controller trained on the learned simulator for the same setting. Our learned controller shows a 22% lower mean absolute error (MAE) between target and actual pressure waveforms.
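For readers who want to see how such a comparison is computed, the sketch below evaluates the MAE of two controllers against the same target waveform and the relative reduction between them. The waveform values are invented for illustration and are unrelated to the reported 22% figure; the function names are likewise hypothetical.

```python
import numpy as np


def mean_absolute_error(target, actual):
    """Average absolute deviation between target and actual pressure."""
    return float(np.mean(np.abs(np.asarray(target) - np.asarray(actual))))


def relative_improvement(baseline_mae, candidate_mae):
    """Fraction by which the candidate lowers the baseline's MAE."""
    return (baseline_mae - candidate_mae) / baseline_mae


# Made-up pressure samples, for illustration only.
target = np.array([5.0, 20.0, 20.0, 20.0, 5.0, 5.0])
pid_pressure = np.array([5.0, 14.0, 24.0, 19.0, 9.0, 5.0])
learned_pressure = np.array([5.0, 16.0, 22.0, 20.0, 8.0, 5.0])

pid_mae = mean_absolute_error(target, pid_pressure)
learned_mae = mean_absolute_error(target, learned_pressure)
improvement = relative_improvement(pid_mae, learned_mae)
print(pid_mae, learned_mae, improvement)
```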
Comparison of the MAE between target and actual pressure waveforms (lower is better) for the best PID controller (orange) for a given lung setting (shown for two settings, R=5 and R=20) against our controller (blue) trained on the learned simulator for the same setting. The learned controller performs up to 22% better.
Robustness:
Further, we compare the performance of the single best PID controller across the entire set of lung settings with our controller trained on a set of learned simulators over the same settings. Our controller performs up to 32% better in MAE between target and actual pressure waveforms, suggesting that it could require less manual intervention between patients or even as a patient’s condition changes.
As above, but comparing the single best PID controller across the entire set of lung settings against our controller trained over the same settings. The learned controller performs up to 32% better, suggesting that it may require less manual intervention.
Finally, we investigated the feasibility of using model-free and other popular RL algorithms (PPO, DQN), in comparison to a direct policy gradient trained on the simulator. We find that the simulator-trained direct policy gradient achieves slightly better scores and does so with a more stable training process that uses orders of magnitude fewer training samples and a significantly smaller hyperparameter search space.
In the simulator, we find that model-free and other popular algorithms (PPO, DQN) perform approximately as well as our method.
However, these other methods take an order of magnitude more episodes to train to similar levels.
Conclusions and the Road Forward
We have described a deep-learning approach to mechanical ventilation based on simulated dynamics learned from a physical test lung. However, this is only the beginning. To make an impact on real-world ventilators, there are numerous other considerations and issues to take into account. Most important among them are non-invasive ventilators, which are significantly more challenging due to the difficulty of discerning pressure from lungs and mask pressure. Other directions include handling spontaneous breathing and coughing. To learn more and become involved in this important intersection of machine learning and health, see an ICML tutorial on control theory and learning, and consider participating in one of our Kaggle competitions for creating better ventilator simulators!
Acknowledgements
The primary work was based in the Google AI Princeton lab, in collaboration with Cohen lab at the Mechanical and Aerospace Engineering department at Princeton University. The research paper was authored by contributors from Google and Princeton University, including: Daniel Suo, Naman Agarwal, Wenhan Xia, Xinyi Chen, Udaya Ghai, Alexander Yu, Paula Gradu, Karan Singh, Cyril Zhang, Edgar Minasyan, Julienne LaChance, Tom Zajdel, Manuel Schottdorf, Daniel Cohen, and Elad Hazan.
Posted by Joshua Greaves, Software Engineer and Pablo Samuel Castro, Staff Software Engineer, Google Research, Brain Team
Benchmark challenges have been a driving force in the advancement of machine learning (ML). In particular, difficult benchmark environments for reinforcement learning (RL) have been crucial for the rapid progress of the field by challenging researchers to overcome increasingly difficult tasks. The Arcade Learning Environment, Mujoco, and others have been used to push the envelope in RL algorithms, representation learning, exploration, and more.
In “Autonomous Navigation of Stratospheric Balloons Using Reinforcement Learning”, published in Nature, we demonstrated how deep RL can be used to create a high-performing flight agent that can control stratospheric balloons in the real world. This research confirmed that deep RL can be successfully applied outside of simulated environments, and contributed practical knowledge for integrating RL algorithms with complex dynamical systems. Today we are excited to announce the open-source release of the Balloon Learning Environment (BLE), a new benchmark emulating the real-world problem of controlling stratospheric balloons. The BLE is a high-fidelity simulator, which we hope will provide researchers with a valuable resource for deep RL research.
Station-Keeping Stratospheric Balloons
Stratospheric balloons are filled with a buoyant gas that allows them to float for weeks or months at a time in the stratosphere, about twice as high as a passenger plane’s cruising altitude. Though there are many potential variations of stratospheric balloons, the kind emulated in the BLE are equipped with solar panels and batteries, which allow them to adjust their altitude by controlling the weight of air in their ballast using an electric pump. However, they have no means to propel themselves laterally, which means that they are subject to wind patterns in the air around them.
By changing its altitude, a stratospheric balloon can surf winds moving in different directions.
The goal of an agent in the BLE is to station-keep (i.e., to control a balloon to stay within 50 km of a fixed ground station) by changing its altitude to catch winds that it finds favorable. We quantify how successful an agent is at station-keeping by the fraction of time the balloon is within the specified radius, denoted TWR50 (i.e., the time within a radius of 50 km).
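As a rough illustration of the metric, TWR50 can be computed from a logged trajectory as the share of samples that fall inside the 50 km radius. The sketch below assumes equally spaced samples and positions expressed in kilometres relative to the station; `time_within_radius` is a hypothetical helper, not part of the BLE API.

```python
import numpy as np


def time_within_radius(x_km, y_km, radius_km=50.0):
    """Fraction of equally spaced samples within `radius_km` of the station.

    Positions are offsets from the station, which sits at the origin.
    """
    distances = np.hypot(np.asarray(x_km, dtype=float),
                         np.asarray(y_km, dtype=float))
    return float(np.mean(distances <= radius_km))


# Made-up balloon positions (km east and north of the station).
x = [0, 30, 40, 60, 10]
y = [0, 30, 40, 0, -20]

twr50 = time_within_radius(x, y)
print(twr50)
```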
A station-seeking balloon must navigate a changing wind field to stay above a ground station. Left: Side elevation of a station-keeping balloon. Right: Bird’s-eye view of the same balloon.
The Challenges of Station-Keeping
To create a realistic simulator (without including copious amounts of historical wind data), the BLE uses a variational autoencoder (VAE) trained on historical data to generate wind forecasts that match the characteristics of real winds. A wind noise model is then used to make the windfields more realistic to match what a balloon would encounter in real-world conditions.
Navigating a stratospheric balloon through a wind field can be quite challenging. The winds at any given altitude rarely remain ideal for long, and a good balloon controller will need to move up and down through its wind column to discover more suitable winds. In RL parlance, the problem of station-keeping is partially observable because the agent only has access to forecasted wind data to make those decisions. An agent has access to wind forecasts at every altitude and the true wind at its current altitude. The BLE returns an observation which includes a notion of wind uncertainty.
A stratospheric balloon must explore winds at different altitudes in order to find favorable winds. The observation returned by the BLE includes wind predictions and a measure of uncertainty, made by mixing a wind forecast and winds measured at the balloon’s altitude.
In some situations, there may not be suitable winds anywhere in the balloon’s wind column. In this case, an expert agent is still able to fly towards the station by taking a more circuitous route through the wind field (a common example is when the balloon moves in a zig-zag fashion, akin to tacking on a sailboat). Below we demonstrate that even just remaining in range of the station usually requires significant acrobatics.
An agent must handle long planning horizons to succeed in station-keeping. In this case, StationSeeker (an expert-designed controller) heads directly to the center of the station-keeping area and is pushed out, while Perciatelli44 (an RL agent) is able to plan ahead and stay in range longer by hugging the edge of the area.
Night-time adds a fresh element of difficulty to station-keeping in the BLE, which reflects the reality of night-time changes in physical conditions and power availability. While during the day the air pump is powered by solar panels, at night the balloon relies on its on-board batteries for energy. Using too much power early in the night typically results in limited maneuverability in the hours preceding dawn. This is where RL agents can discover quite creative solutions — such as reducing altitude in the afternoon in order to store potential energy.
An agent needs to balance the station-keeping objective with a finite energy allowance at night.
Despite all these challenges, our research demonstrates that agents trained with reinforcement learning can learn to perform better than expert-designed controllers at station-keeping. Along with the BLE, we are releasing the main agents from our research: Perciatelli44 (an RL agent) and StationSeeker (an expert-designed controller). The BLE can be used with any reinforcement learning library, and to showcase this we include Dopamine’s DQN and QR-DQN agents, as well as Acme’s QR-DQN agent (supporting both standalone and distributed training with Launchpad).
Evaluation performance by the included benchmark agents on the BLE. “Finetuned” is a fine-tuned Perciatelli44 agent, and Acme is a QR-DQN agent trained with the Acme library.
The BLE source code contains information on how to get started with the BLE, including training and evaluating agents, documentation on the various components of the simulator, and example code. It also includes the historical windfield data (as a TensorFlow DataSet) used to train the VAE to allow researchers to experiment with their own models for windfield generation. We are excited to see the progress that the community will make on this benchmark.
Acknowledgements
We would like to thank the Balloon Learning Environment team: Sal Candido, Marc G. Bellemare, Vincent Dumoulin, Ross Goroshin, and Sam Ponda. We’d also like to thank Tom Small for his excellent animation in this blog post and graphic design help, along with our colleagues, Bradley Rhodes, Daniel Eisenberg, Piotr Staczyk, Anton Raichuk, Nikola Momchev, Geoff Hinton, Hugo Larochelle, and the rest of the Google Brain team in Montreal.
Real-time rendering and photorealistic graphics used to be tall tales, but NVIDIA Omniverse has made them fact from fiction. NVIDIA’s own artists are writing new chapters in Omniverse, an accelerated 3D design platform that connects and enhances 3D apps and creative workflows, to showcase these stories. Combined with the NVIDIA Studio platform, Omniverse and Studio-validated
Take a Robotics Deep Dive with Jetson Developer Day at GTC
NVIDIA Jetson Developer Day is led by world-renowned experts in robotics, edge AI, and deep learning. This one-day event, at the start of GTC on Monday, March 21, gives a unique deep-dive into building next-gen AI-powered applications and autonomous machines.
Whether you are new to the Jetson platform or an advanced user, you will be among the first to hear about the latest hardware and software developments for robotics, computer vision, deep learning, and more. Attendees will learn about the newest addition to the NVIDIA Jetson family, AGX Orin. It’s the ideal solution for deploying AI at the edge for advanced robotics and autonomous machines in fields such as manufacturing, logistics, retail, agriculture, and beyond.
You are also invited to join the special developer breakout session at the end of the day. Interact directly with the Jetson product team and get answers to all your questions.
Get Hands-on DLI Training at GTC with Early-Bird Pricing
The Deep Learning Institute (DLI) is hosting 45 full-day workshops for developers at GTC.
The workshops are available in multiple languages and time zones, and participants can earn a DLI certificate of competency upon completion. Workshop topics include deep learning, cybersecurity, recommender systems, NLP, and more.
Early-bird pricing for full-day workshops is just $99 until Feb. 28, 2022. The regular workshop rate is $149. The 2-hour DLI sessions are free with registration.
The newly announced AI racer, Gran Turismo Sophy, uses deep reinforcement learning to beat human Gran Turismo Sport drivers in real-time competitions.
Gran Turismo (GT) Sport competitors are facing a new, AI-supercharged contender thanks to the latest collaborative effort from Sony AI, Sony Interactive Entertainment (SIE), and Polyphony Digital Inc., the developers behind GT Sport.
The autonomous AI racing agent, known as Gran Turismo Sophy (GT Sophy), recently beat the world’s best drivers in GT Sport. Published in Nature, the work introduces a novel deep reinforcement-learning platform used to create GT Sophy and could spur new AI-powered experiences for players across the globe.
“Sony’s purpose is to ‘fill the world with emotion, through the power of creativity and technology,’ and Gran Turismo Sophy is a perfect embodiment of this,” Kenichiro Yoshida, Chairman, President and CEO of Sony Group Corporation, said in a press release.
“This group collaboration in which we have built a game AI for gamers is truly unique to Sony as a creative entertainment company. It signals a significant leap in the advancement of AI while also offering enhanced experiences to GT fans around the world.”
Smart gaming
AI is not new to gaming. In 2017, the AlphaZero program from DeepMind made news when it learned to play and conquer chess, shogi (Japanese chess), and Go using deep reinforcement learning (deep RL).
An offshoot of machine learning, deep RL in basic terms uses a computational RL agent that makes decisions by trial and error to solve a problem. With the introduction of deep learning into the algorithm, the agent makes decisions from very large datasets and decides on actions to reach its goal efficiently.
The AlphaZero program used an algorithm where an untrained neural network played millions of games against itself, adjusting play based on its outcome.
Racing AI, however, poses more complicated inference needs with innumerable variables from different cars, tracks, drivers, weather, and opponents. As one of the most realistic driving simulators, GT Sport uses authentic race car and track dimensions, reproducing racing environments by also accounting for factors such as air resistance and tire friction.
Reinforcing good behavior
To create a racing agent capable of adjusting to real-time factors, the team trained GT Sophy on three specific skills (race car control, racing tactics, and racing etiquette) using a newly developed deep RL algorithm. According to the project’s website, the algorithm uses the latest in reinforcement-learning techniques to train a racing agent with rewards or penalties based on its actions.
“One of the advantages of using deep RL to develop a racing agent is that it eliminates the need for engineers to program how and when to execute the skills needed to win the race—as long as it is exposed to the right conditions, the agent learns to do the right thing by trial and error,” the researchers write in the study.
The team custom-built a web-based Distributed, Asynchronous Rollouts and Training (DART) platform to train GT Sophy on PlayStation 4 consoles using SIE’s worldwide cloud infrastructure. Researchers then used DART for collecting training data and evaluating versions of the agent.
Using this system, the researchers specify an experiment, run it automatically, and view data in a web browser. Each experiment uses a single trainer on a compute node with the cuDNN-accelerated TensorFlow deep learning framework and an NVIDIA V100 GPU, or half of an NVIDIA A100 GPU coupled with around eight vCPUs and 55 GiB of memory.
“The system allows Sony AI’s research team to seamlessly run hundreds of simultaneous experiments while they explore techniques that would take GT Sophy to the next level,” according to the project’s website.
Supercharged GT Sophy
In 2021, four of the world’s best GT Sport drivers competed against GT Sophy in two separate events. These competitions featured three racecourses, and four GT Sophy agents and cars. In its debut, GT Sophy excelled in timed trials but didn’t perform as well when challenging racers on the same track.
The team made improvements based on the results of the first race, upgrading the training regime, increasing network size, adjusting features and rewards, and enhancing the opponents.
The result was a racing agent that could pass a human driver around a sharp corner, handle crowded starts, make slingshot passes out of the slipstream, and execute defensive maneuvers. The agent did this all while abiding by the subtle sportsmanship considerations human drivers understand and practice. It also bested top human drivers in timed trials and in an FIA-Certified Gran Turismo championship series.
The paper reports that GT Sophy learns to get around a track in just a few hours. In about 2 days, it can beat about 95% of human players. Give it 10 to 12 days, about 45,000 driving hours, and GT Sophy equals or exceeds the top drivers in the world.
With its racing prowess, the aim of GT Sophy is to make GT Sport more enjoyable, competitive, and educational. Some of the experts that competed against GT Sophy reported learning new approaches to turns and driving techniques.
The researchers also see the potential for deep RL to improve real-world applications of systems such as collaborative robotics, drones, or autonomous vehicles.
The approximate Python code is available in the supplementary information section of the study.
Recent strides in the efficacy of AI, the adoption of IoT devices and the power of edge computing have come together to unlock the power of edge AI. This has opened new opportunities for edge AI that were previously unimaginable, from helping radiologists identify pathologies in the hospital, to driving cars down the freeway,
Learn three ways to encode time-related information: dummy variables, cyclical encoding with sine/cosine transformations, and radial basis functions.
Imagine you have just started a new data science project. The goal is to build a model predicting Y, the target variable. You have already received some data from the stakeholders/data engineers, done a thorough EDA, and selected some variables you believe are relevant for the problem at hand. Then you finally built your first model. The score is acceptable, but you believe you can do much better. What do you do?
There are many ways in which you could follow up. One possibility would be to increase the complexity of the machine-learning model you have used. Alternatively, you can try to come up with some more meaningful features and continue to use the current model (at least for the time being).
For many projects, both enterprise data scientists and participants of data science competitions like Kaggle agree that it is the latter – identifying more meaningful features from the data – that can often make the most improvement to model accuracy for the least amount of effort.
You are effectively shifting the complexity from the model to the features. The features do not have to be very complex. But, ideally, we find features that have a strong yet simple relationship with the target variable.
Many data science projects contain some information about the passage of time. And this is not restricted to time series forecasting problems. For example, you can often find such features in traditional regression or classification tasks. This article investigates how to create meaningful features using date-related information. We present three approaches, but we need some preparation first.
Setup and data
For this article, we mostly use well-known Python packages, plus a relatively unknown one, scikit-lego, a library containing numerous useful functionalities that extend scikit-learn’s capabilities. We import the required libraries as follows:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import date
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import FunctionTransformer
from sklearn.metrics import mean_absolute_error
from sklego.preprocessing import RepeatingBasisFunction
To keep things simple, we generate the data ourselves. In this example, we work with an artificial time series. We start by creating an empty DataFrame with an index spanning four calendar years (using pd.date_range). Then, we create two columns:
day_nr – a numeric index representing the passage of time
day_of_year – the ordinal day of the year
Lastly, we have to create the time series itself. To do so, we combine two transformed sine curves and some random noise. The code used for generating the data is based on the code included in scikit-lego’s documentation.
# for reproducibility
np.random.seed(42)
# generate the DataFrame with dates
range_of_dates = pd.date_range(start="2017-01-01",
                               end="2020-12-30")
X = pd.DataFrame(index=range_of_dates)
# create a sequence of day numbers
X["day_nr"] = range(len(X))
X["day_of_year"] = X.index.day_of_year
# generate the components of the target
signal_1 = 3 + 4 * np.sin(X["day_nr"] / 365 * 2 * np.pi)
signal_2 = 3 * np.sin(X["day_nr"] / 365 * 4 * np.pi + 365/2)
noise = np.random.normal(0, 0.85, len(X))
# combine them to get the target series
y = signal_1 + signal_2 + noise
# plot
y.plot(figsize=(16,4), title="Generated time series");
Then, we create a new DataFrame, in which we store the generated time series. This DataFrame will be used for comparison of the models’ performance using the different approaches to feature engineering.
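The step described above could look like the following minimal sketch. It rebuilds the target series so that it runs on its own; the column name "actuals" is an assumption made for illustration, not something the text prescribes.

```python
import numpy as np
import pandas as pd

# Rebuild the target series so the sketch is self-contained
np.random.seed(42)
dates = pd.date_range(start="2017-01-01", end="2020-12-30")
day_nr = np.arange(len(dates))
signal_1 = 3 + 4 * np.sin(day_nr / 365 * 2 * np.pi)
signal_2 = 3 * np.sin(day_nr / 365 * 4 * np.pi + 365 / 2)
noise = np.random.normal(0, 0.85, len(dates))
y = pd.Series(signal_1 + signal_2 + noise, index=dates)

# Store the series in a DataFrame; the predictions of each model can
# later be added as extra columns ("actuals" is an assumed name)
results_df = y.to_frame(name="actuals")
```

Keeping the actual values and all predictions in one DataFrame makes it easy to plot them together later.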
In this section, we describe the three considered approaches to generating time-related features.
Before we dive right into it, we should define an evaluation framework. Our simulated data contains observations from a period of four years. We will use the first three years of generated data as the training set and evaluate on the fourth year. We will use the Mean Absolute Error (MAE) as the evaluation metric.
Below we define a variable that will serve us for cutting off the two sets:
TRAIN_END = 3 * 365
Approach #1: dummy variables
We start with something that you are most likely already familiar with, at least to some degree. The easiest way to encode time-related information is to use dummy variables (also known as one-hot encoding). Let’s look at an example.
First, we extracted the information about the month (encoded as an integer in the range of 1 to 12) from the DatetimeIndex. Then, we used the pd.get_dummies function to create the dummy variables. Each column contains information on whether the observation (row) comes from the given month or not.
As you might have noticed, we have dropped one level and only have 11 columns now. We have done that in order to avoid the infamous dummy variable trap (perfect multicollinearity), which can be an issue when working with linear models.
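A minimal sketch of the month dummies described above, assuming the same date range as in the setup. The name X_1 is chosen here for illustration only.

```python
import pandas as pd

dates = pd.date_range(start="2017-01-01", end="2020-12-30")

# Extract the month number (1 to 12) from the DatetimeIndex
X_1 = pd.DataFrame(index=dates)
X_1["month"] = X_1.index.month

# One indicator column per month; drop_first=True drops the first level
# (January), which leaves 11 columns and avoids the dummy variable trap
X_1 = pd.get_dummies(X_1, columns=["month"], drop_first=True)
```

With January dropped, an observation from January is the one for which all 11 indicator columns are zero.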
In our example, we used the dummy variable approach to capture the month in which the observation was recorded. However, the same approach could be used to encode a range of other information from the DatetimeIndex. For example, the day/week/quarter of the year, a flag indicating whether a given day is a weekend, the first/last day of a period, and much, much more. A list of all the features we can extract from the index is available in the pandas documentation at pandas.pydata.org.
Bonus tip: This is outside of the scope of this simple exercise, but in real-life scenarios, we can also use information about special days (think national holidays, Christmas, Black Friday, and so on) to create features. holidays is a nice Python library containing past and future information about special days per country.
As described in the introduction, the goal of feature engineering is to shift complexity from the model side to the feature side. That is why we will use one of the simplest ML models – linear regression – to see how well we can fit the time series using only the created dummies.
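The fit could be sketched as follows. This is a self-contained approximation that regenerates the data from the setup section; the names model_1 and y_pred are assumptions, and the model is fitted on the first three years only, in line with the evaluation framework defined earlier.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Regenerate the artificial series from the setup section
np.random.seed(42)
dates = pd.date_range(start="2017-01-01", end="2020-12-30")
day_nr = np.arange(len(dates))
y = pd.Series(
    3 + 4 * np.sin(day_nr / 365 * 2 * np.pi)
    + 3 * np.sin(day_nr / 365 * 4 * np.pi + 365 / 2)
    + np.random.normal(0, 0.85, len(dates)),
    index=dates,
)
TRAIN_END = 3 * 365

# Month dummies with the first level dropped
X_1 = pd.DataFrame(index=dates)
X_1["month"] = X_1.index.month
X_1 = pd.get_dummies(X_1, columns=["month"], drop_first=True).astype(float)

# Fit on the training period only, then predict the whole series
model_1 = LinearRegression().fit(X_1.iloc[:TRAIN_END], y.iloc[:TRAIN_END])
y_pred = pd.Series(model_1.predict(X_1), index=dates)
```

Because every day within a month has an identical feature row, the model can only output one value per month, which is where the step-like shape of the fitted line comes from.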
We can see that the fitted line already follows the time series quite well, though it is a bit jagged (step-like) – caused by the discontinuity of the dummy features. And that is what we will try to solve with the next two approaches.
But before proceeding, it might be worth mentioning that when using non-linear models such as decision trees (or ensembles thereof), we do not explicitly encode features such as month number or day of the year as dummies. Those models are capable of learning non-monotonic relationships between ordinal input features and the target.
Approach #2: cyclical encoding with sine/cosine transformation
As we have seen previously, the fitted line resembles steps. That is because each dummy is treated separately with no continuity. However, there is a clear cyclical continuity present with variables such as time. What does that mean?
Imagine we are working with energy consumption data. When we include the information about the month of the observed consumption, it makes sense that there is a stronger connection between two consecutive months. Using this logic, the connection between December and January and between January and February is strong. In comparison, the connection between January and July is not that strong. The same applies to other time-related information as well.
So how can we incorporate this knowledge into feature engineering? Trigonometric functions come to the rescue. We can encode a cyclical time feature x with period P (for example, 12 for months or 365 for the day of the year) into two features using sine/cosine transformations: sin(2πx / P) and cos(2πx / P).
In the snippet below, we copy the initial DataFrame, add the column with month numbers, and then encode both the month and day_of_year columns using the sine/cosine transformations. Then, we plot both pairs of curves.
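A minimal sketch of that encoding, without the plotting part. The helper encode_cyclical and the column names are hypothetical and written with plain NumPy; the same transformation could be wrapped in scikit-learn's FunctionTransformer, which the imports in the setup section suggest.

```python
import numpy as np
import pandas as pd


def encode_cyclical(values, period):
    """Return the sine and cosine encodings of a cyclical feature."""
    angle = 2 * np.pi * values / period
    return np.sin(angle), np.cos(angle)


dates = pd.date_range(start="2017-01-01", end="2020-12-30")
X_2 = pd.DataFrame(index=dates)
X_2["month"] = X_2.index.month
X_2["day_of_year"] = X_2.index.day_of_year

# Monthly granularity: only 12 distinct inputs, hence step-wise curves
X_2["month_sin"], X_2["month_cos"] = encode_cyclical(X_2["month"], 12)

# Daily granularity: 365 distinct inputs, hence much smoother curves
X_2["day_sin"], X_2["day_cos"] = encode_cyclical(X_2["day_of_year"], 365)
```

Each (sine, cosine) pair lies on the unit circle, which is why the two columns together identify a point in the cycle even though either one alone is ambiguous.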
There are two insights we can draw from the transformed data, which is plotted in Figure 3. The first is that the curves are step-wise when using the months for encoding, whereas with daily frequency they are much smoother. The second is that we can see why we must use two curves instead of one. Due to the repetitive nature of the curves, if you drew a straight horizontal line through the plot for a single year, you would cross the curve in two places. This would not be enough for the model to understand the observation’s time point. But with the two curves, there is no such issue, and every single time point can be identified. This is clearly visible when we plot the values of the sine/cosine functions on a scatter plot. In Figure 4 we can see the circular pattern, with no overlapping values.
Let’s fit the same linear regression model using only the newly created features coming from the daily frequency.
Figure 5 shows that the model is able to pick up the general trend of the data, identifying periods with higher and lower values. However, it appears that the magnitude of the predictions is less accurate, and at a glance, this fit appears worse than the one achieved using dummy variables (Figure 2).
Before we discuss the third feature engineering technique, it is worth mentioning that there is a serious drawback of this approach, which is apparent when using tree-based models. By design, tree-based models make a split based on a single feature at a time. And as we have mentioned before, the sine/cosine features should be considered simultaneously in order to properly identify the time points within a period.
Approach #3: radial basis functions
The last approach uses radial basis functions. We will not go into much detail on what they actually are, but you can read a bit more on the topic here. Essentially, we again want to solve the issue we encountered with the first approach, that is, that the dummy variables ignore the continuity of our time features.
We use the handy scikit-lego library, which offers the RepeatingBasisFunction class, and specify the following parameters:
The number of basis functions we want to create (we chose 12).
Which column to use for indexing the RBFs. In our case, that is the column containing information on which day of the year the given observation comes from.
The range of the input – in our case, the range is from 1 to 365.
What to do with the remaining columns of the DataFrame we will use for fitting the estimator. "drop" will only keep the created RBF features, "passthrough" will keep both the old and new features.
Figure 6 shows the 12 radial basis functions that we have created using the day of the year as input. Each curve contains information about how close we are to a certain day of the year (because we chose that column). For example, the first curve measures distance from January 1, so it peaks on the first day of every year and decreases symmetrically as we move away from that date.
By design, the basis functions are equally spaced over the input range. We chose 12 as we wanted the RBFs to resemble months. This way, each function shows approximately (because of the months’ unequal length) the distance to the first day of the month.
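To make the idea of equally spaced, repeating bell curves concrete, here is a hand-rolled NumPy sketch. It is not scikit-lego's implementation: the function repeating_rbf, its default width of 1 / n_periods, and the Gaussian form are assumptions made for illustration. In the article's setup, the features come from RepeatingBasisFunction with the parameters listed above.

```python
import numpy as np
import pandas as pd


def repeating_rbf(values, n_periods, input_range, width=None):
    """Hand-rolled periodic Gaussian features (illustrative only)."""
    low, high = input_range
    # Scale the input to [0, 1] so that distances can wrap around
    scaled = (np.asarray(values, dtype=float) - low) / (high - low)
    # Equally spaced centres; the last point is dropped because, on a
    # circle, it would coincide with the first one
    centres = np.linspace(0, 1, n_periods + 1)[:-1]
    width = 1 / n_periods if width is None else width
    # Circular distance between every observation and every centre
    dist = np.abs(scaled[:, None] - centres[None, :])
    dist = np.minimum(dist, 1 - dist)
    # Bell curve: 1 at the centre, decaying symmetrically around it
    return np.exp(-((dist / width) ** 2))


dates = pd.date_range(start="2017-01-01", end="2020-12-30")
rbf_features = repeating_rbf(dates.day_of_year, n_periods=12,
                             input_range=(1, 365))
X_3 = pd.DataFrame(rbf_features, index=dates)
```

Using the circular distance is what makes the first curve high both at the start and at the very end of a year, so late December and early January end up close together in feature space.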
Similar to the previous approaches, let’s fit the linear regression model using the 12 RBF features.
Figure 7 shows that the model is able to accurately capture the real data when using the RBF features.
There are two key parameters that we can tune when using radial basis functions:
the number of the radial basis functions,
the shape of the bell curves – it can be modified with the width argument of RepeatingBasisFunction.
One method for tuning these parameter values would be to use grid search to identify the optimal values for a given data set.
Final comparison
We can execute the following snippet to generate a visual comparison of the different approaches to encoding time-related information.
results_df.plot(title="Comparison of fits using different time-based features",
                figsize=(16,4),
                color = ["c", "k", "b", "r"])
plt.axvline(date(2020, 1, 1), c="m", linestyle="--");
Figure 8 illustrates that the radial basis functions resulted in the closest fit among the considered approaches. The sine/cosine features allowed the model to pick up the main patterns but were not enough to capture the dynamics of the series entirely.
Using the snippet below, we calculate the Mean Absolute Error for each of the models, over both training and test sets. We expect the scores to be very similar between training and test sets, as the generated series is almost perfectly cyclical – the only difference between the years is the random component.
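A self-contained sketch of such a calculation is shown here for the dummy and sine/cosine features only. The names feature_sets and scores_df are assumptions; RBF features could be added as one more entry in the dictionary.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Regenerate the artificial series from the setup section
np.random.seed(42)
dates = pd.date_range(start="2017-01-01", end="2020-12-30")
day_nr = np.arange(len(dates))
y = (3 + 4 * np.sin(day_nr / 365 * 2 * np.pi)
     + 3 * np.sin(day_nr / 365 * 4 * np.pi + 365 / 2)
     + np.random.normal(0, 0.85, len(dates)))
TRAIN_END = 3 * 365

# Feature set 1: month dummies with the first level dropped
months = pd.DataFrame({"month": np.asarray(dates.month)}, index=dates)
dummies = pd.get_dummies(months, columns=["month"], drop_first=True)

# Feature set 2: sine/cosine encoding of the day of the year
angle = 2 * np.pi * np.asarray(dates.day_of_year) / 365
sin_cos = pd.DataFrame({"sin": np.sin(angle), "cos": np.cos(angle)},
                       index=dates)

feature_sets = {"dummies": dummies, "sine_cosine": sin_cos}

scores = {}
for name, features in feature_sets.items():
    values = features.to_numpy(dtype=float)
    # Fit on the first three years, score both periods separately
    model = LinearRegression().fit(values[:TRAIN_END], y[:TRAIN_END])
    pred = model.predict(values)
    scores[name] = {
        "train_mae": mean_absolute_error(y[:TRAIN_END], pred[:TRAIN_END]),
        "test_mae": mean_absolute_error(y[TRAIN_END:], pred[TRAIN_END:]),
    }

scores_df = pd.DataFrame(scores).T
print(scores_df.round(3))
```

Fitting on the training slice only and scoring both slices with the same model is what makes the train and test columns comparable.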
Naturally, that would not be the case in a real-life situation, in which we would encounter much more variability between the same periods over time. However, in such cases, we would also use many other features (for example, some measure of trend or the passage of time) to account for those changes.
As before, we can see that the model using RBF features resulted in the best fit, while the sine/cosine features performed the worst. Our assumption about the similarity of the scores between the training and test sets was also confirmed.
Takeaways
We showed three approaches to encoding time-related information as features for machine learning models.
Aside from the most popular dummy-encoding, there are approaches that are better suited for encoding the cyclical nature of time.
When using those approaches, the granularity of the time interval greatly matters for the shape of the newly created features.
Using the radial basis functions, we can decide on the number of functions we want to use, as well as the width of the bell curves.
You can find the code used for this article on my GitHub. In case you have any feedback, I would be happy to discuss it on Twitter.