The importance of time series data in ML

Most machine learning platforms can only handle limited forms of data, and they often discard time-series information entirely. Here, we look at why time-series data matters so much for accurate ML models.

Time is a fundamental concept for understanding the world and how to react to it. Our world is four-dimensional. We can move freely in three of these dimensions, but, outside of science fiction, we can only move forwards in time. Without an understanding of time, we are unable to understand the order or meaning of many events. If we want to know what happens next, we need to know the sequence of events that already happened. And yet, many machine learning platforms throw away information about time and simply process numerical data.

Why time matters

Imagine that you have been asked to predict how much energy will be consumed in your neighborhood tomorrow. What factors might affect your prediction? And what data might you need to know? Clearly, the weather will have a big role to play. In hot weather, people need their power-hungry A/C. In cold weather, they may turn on electric heaters. The day of the week also has an impact. Will people be at home because it is a weekend? Or out at the office as it's a working day? Or is it even a public holiday? Other key factors include any big events (the Super Bowl, for instance). As you can see, many of these factors are related to time.
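Most of these factors can be read straight off a timestamp. Here is a minimal sketch (not Sonasoft code; the dates and the event calendar are purely illustrative) of deriving such calendar features with pandas:

```python
# Illustrative sketch: deriving time-based features from timestamps with pandas.
import pandas as pd

# One week of hourly timestamps (the date range is invented for illustration)
idx = pd.date_range("2024-07-01", periods=7 * 24, freq="h")
features = pd.DataFrame(index=idx)
features["hour"] = idx.hour              # time of day drives A/C and heating use
features["day_of_week"] = idx.dayofweek  # 0 = Monday ... 6 = Sunday
features["is_weekend"] = idx.dayofweek >= 5

# A hand-maintained event calendar (here, a single public holiday) as a flag
events = {pd.Timestamp("2024-07-04")}
features["is_event_day"] = [ts.normalize() in events for ts in idx]
```

Features like these only exist if the timestamps survive the import step; a platform that strips them can never recover the weekend or holiday signal.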

[Figure: nuclear electricity production in the USA as a monthly time series]

So, what data might you use to build your prediction? Well, one key thing will be a detailed history of electricity use over the previous couple of years. You also need historical weather data and a calendar that includes all significant events. All this data will be in the form of a time series. The usage data may be incredibly fine-grained, with figures for each second or averages over each minute. Weather data will probably be hour-by-hour. And the events will vary; some, like the Super Bowl, will last for a couple of hours; others, like Thanksgiving, may have an impact over several days.

Building your model

Now you are ready to start building your prediction model. Naively, you might start by looking for obvious correlations, such as those with temperature and day of the week. Then you might start modeling the impact of events. In each case, you need to know how electricity demand correlates with the factor in question. And the effect isn't always immediate. On a hot weekday, you may only turn up your A/C when you get home that evening.
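One standard way to surface such a delayed effect is to scan the correlation at different time lags. A hedged sketch with synthetic data (the six-hour delay and all the numbers are invented for illustration, not measured):

```python
# Illustrative sketch: a delayed effect shows up as correlation at a time lag.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
hours = pd.date_range("2024-07-01", periods=500, freq="h")
# Synthetic temperature with a daily cycle
temp = pd.Series(25 + 5 * np.sin(np.arange(500) * 2 * np.pi / 24), index=hours)

# Demand responds to temperature ~6 hours later (people get home, turn on A/C)
demand = temp.shift(6) + rng.normal(0, 0.5, 500)

# Correlation at each candidate lag; the peak reveals the delay
corrs = {lag: demand.corr(temp.shift(lag)) for lag in range(13)}
best_lag = max(corrs, key=corrs.get)  # recovers the 6-hour delay
```

This kind of lag scan only works on data that still carries its timestamps; once the series is collapsed into daily totals, the six-hour offset is invisible.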

As you can probably tell, this process rapidly starts to become too complex for simple statistical models. Indeed, you are now entering the realms of big data. And ultimately, you will need to use machine learning to create your prediction model.

Turning to an ML platform

Most people lack the skills needed to create machine learning models. Yes, there are thousands of tutorials online, but most assume you have a math major and are a competent programmer. Creating a good ML model requires feature engineering, model selection, training, and verification. All these are hard if you lack the skills and experience. This is where ML platforms come into play. 

An ML platform is designed to simplify the task of creating ML models. They range from ones that help with feature engineering and model training, to ones that create complete models from scratch. However, most ML platforms are unable to cope with time-series data. Indeed, often they will only import simple tabular data.

What happens when you throw away the time series?

So, let’s see how the lack of time-series data might affect your prediction model. Firstly, you will lose the direct link between cause and effect. You may be able to partially recover it through careful data engineering; for instance, you can create daily summary data for each type of data. However, this loses a lot of the subtleties. Secondly, you will no longer be able to infer delayed effects in your model. Thirdly, you will probably need to discard some of the data during feature engineering, which will add bias to your results.
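As a toy illustration of what a daily summary hides, consider an hourly load series with a sharp evening peak (all figures are invented for the example):

```python
# Illustrative sketch: daily averages can hide an intraday demand peak.
import pandas as pd

hours = pd.date_range("2024-07-01", periods=48, freq="h")
# Flat base load of 1.0 kW, with a sharp 3.0 kW peak from 18:00 to 20:59
load = pd.Series(1.0, index=hours)
load[(hours.hour >= 18) & (hours.hour < 21)] = 3.0

# The daily summary a tabular-only platform would see
daily = load.resample("D").mean()
# The 3x evening spike is flattened into a modest bump in the daily figure
```

A model trained on `daily` sees two identical, unremarkable numbers; the tripling of demand every evening, exactly the subtlety a grid operator cares about, is gone.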

What happens when you try to engineer the data?

Data and feature engineering refer to modifying your raw data to make it easier to create ML models. If you can’t directly import time-series data, then you need to engineer it to remove the direct time stamps. Instead, you will need to transform all the data into a single timeframe, possibly hourly. Features that have data points for every second will need to be averaged over each hour. Features that have coarser time stamps may need intermediate data points added by interpolation. Both these processes add noise and error to the eventual model. Worse, they also add bias, as you decide what timeframe to normalize to and how to average or interpolate the data. So, however good your platform is at creating models, the result will be a model that is less accurate.
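A rough sketch of that normalization step using pandas resampling (the frequencies and values are illustrative assumptions, not a prescribed pipeline):

```python
# Illustrative sketch: forcing mixed-frequency data onto one hourly grid.
import numpy as np
import pandas as pd

# Per-minute usage readings, averaged down to hourly (fine detail is lost)
minutes = pd.date_range("2024-07-01", periods=6 * 60, freq="min")
usage = pd.Series(
    np.random.default_rng(1).normal(1.0, 0.1, len(minutes)), index=minutes
)
usage_hourly = usage.resample("h").mean()

# Three-hourly weather readings, upsampled to hourly by linear interpolation
# (the intermediate points are guesses, which adds error and bias)
coarse = pd.date_range("2024-07-01", periods=3, freq="3h")
temp = pd.Series([20.0, 26.0, 23.0], index=coarse)
temp_hourly = temp.resample("h").interpolate()
```

Every choice here (hourly grid, mean aggregation, linear interpolation) is a judgment call, which is exactly where the bias the text describes creeps in.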

The Sonasoft NuGene approach

Sonasoft NuGene is a universal ML platform that is designed to create AI bots with little or no human assistance. NuGene takes a very different approach to conventional ML platforms. For starters, NuGene will accept raw data in almost any format. This includes audio, video, IoT sensor readings, numerical data, and, of course, time-series data. This ability to process such mixed data is one of NuGene’s strengths. 

Learning by itself

NuGene is designed to explore and learn from your data in the same way a human data scientist would. It uses unsupervised deep learning to extract interesting patterns and artifacts from the data. It then examines these patterns, looking for correlations that might be relevant, and develops a set of hypotheses to try to explain them. At this point, just as any good scientist would, NuGene tests these correlations for causation. After all, everyone knows correlation does not equal causation.
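The article doesn't describe NuGene's internals, but one classic way to probe whether a correlated series carries predictive information is a Granger-style check: does the past of x improve prediction of y beyond what y's own past explains? A self-contained sketch on synthetic data (all of it invented, and not Sonasoft's method):

```python
# Illustrative Granger-style check: x "helps cause" y if past x improves
# prediction of y beyond y's own past. Synthetic data, numpy only.
import numpy as np

rng = np.random.default_rng(2)
n = 400
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    # y depends on its own past AND on x one step earlier
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal(scale=0.1)

def fit_sse(features, target):
    """Least-squares fit; return the sum of squared residuals."""
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    resid = target - features @ coef
    return float(resid @ resid)

target = y[1:]
own_past = np.column_stack([np.ones(n - 1), y[:-1]])          # restricted model
with_x = np.column_stack([np.ones(n - 1), y[:-1], x[:-1]])    # augmented model

sse_restricted = fit_sse(own_past, target)
sse_full = fit_sse(with_x, target)
# A large drop in prediction error is evidence that x carries real
# predictive (Granger) information, not just a spurious correlation.
```

Note that this test is only possible because the order of observations is preserved; shuffle the rows into a plain table and the notion of "past" disappears.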

Creating working bots

NuGene starts to build ML models once it is confident it understands your data fully. It will create a large number of different models and check which ones give the most reliable results. Machine learning is an active research area, and we are constantly adding new models to NuGene’s library. Finally, once NuGene has a working model, it makes it easy to package this into a bot. This bot can then be inserted into your system and get on with whatever job it is designed for, be it predicting electricity demand, looking for anomalies, or scoring credit risks.

Conclusions

As we saw, machine learning models often require time-series data to be accurate. However, many ML platforms simplify this away, and the result is models that are less accurate. In effect, you have traded accuracy for simplicity. NuGene provides the best of both worlds: the accuracy that comes from true time-series data, with the simplicity of a hands-off AI bot factory.

TRY IT FOR FREE

We are confident that our product will exceed your expectations, and we want you to try it for free.