
AI Workflow for Time Series Forecasting Tutorial

SEDIMARK · December 10, 2024
Pipeline AI Example tutorial

1. Introduction

As part of the SEDIMARK toolbox, which users can employ to configure AI and data processing pipelines for their use cases, AI tasks such as forecasting are made readily available for inference on Data Assets. Time series forecasting has a wide range of applications across many fields, including financial market prediction, weather forecasting, and traffic flow prediction.

In this tutorial, we will use Python to demonstrate the basic AI workflow for time series forecasting, specifically focusing on temperature forecasting for agriculture use cases. Accurate temperature forecasting is crucial for agriculture as it helps farmers plan their activities, manage crops, and optimize yields.

The Jupyter notebook containing the content of this tutorial can be downloaded from GitHub.

2. Environment Setup

We need to install a few libraries for this experiment. Copy and run the command below in your terminal:

pip install numpy pandas matplotlib scikit-learn torch
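
Once installation finishes, a quick import check confirms the environment is ready (a minimal sketch; the version numbers printed will depend on your setup):

import numpy, pandas, sklearn, torch, matplotlib

# Print each library's installed version to confirm the imports succeed
for lib in (numpy, pandas, sklearn, torch, matplotlib):
    print(lib.__name__, lib.__version__)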

3. Data Preprocessing

In this section, we generate simulated temperature data and apply basic preprocessing.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler

# Generate simulated daily temperature data (integers from 20 to 34)
date_rng = pd.date_range(start='2023-01-01', end='2023-06-30', freq='D')
df = pd.DataFrame(date_rng, columns=['date'])
df['temperature'] = np.random.randint(20, 35, size=len(date_rng))

# Set date as index
df.set_index('date', inplace=True)

# Visualize data
df['temperature'].plot(figsize=(12, 6), title='Temperature Time Series')
plt.show()

# Normalize data
scaler = MinMaxScaler(feature_range=(0, 1))
df['temperature_scaled'] = scaler.fit_transform(df['temperature'].values.reshape(-1, 1))

# Split into training and testing sets
train_size = int(len(df) * 0.8)
train, test = df[:train_size], df[train_size:]

# Build sliding windows for the Transformer: each sample is `time_step`
# consecutive values, and the target is the value that immediately follows
def create_dataset(data, time_step=1):
    X, Y = [], []
    for i in range(len(data) - time_step - 1):
        X.append(data[i:(i + time_step), 0])
        Y.append(data[i + time_step, 0])
    return np.array(X), np.array(Y)

time_step = 10
# create_dataset indexes a 2-D array, so reshape the series to (n, 1)
X_train, y_train = create_dataset(train['temperature_scaled'].values.reshape(-1, 1), time_step)
X_test, y_test = create_dataset(test['temperature_scaled'].values.reshape(-1, 1), time_step)

# Convert to PyTorch tensors
import torch
X_train = torch.tensor(X_train.reshape(X_train.shape[0], time_step, 1), dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32)
X_test = torch.tensor(X_test.reshape(X_test.shape[0], time_step, 1), dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32)
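
Before training, it is worth sanity-checking the tensor shapes. With the 181-day series, the 80/20 split (144/37 rows), and the 10-step window, create_dataset yields 133 training and 26 test samples:

# Each input is a window of 10 scaled values; each target is the value that follows
print(X_train.shape, y_train.shape)  # torch.Size([133, 10, 1]) torch.Size([133])
print(X_test.shape, y_test.shape)    # torch.Size([26, 10, 1]) torch.Size([26])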

4. Build a Simple Transformer Model

We use the library support provided by PyTorch to create a simple, basic Transformer model (encoder-decoder). Since each input sample carries a single feature (the scaled temperature), we first project it up to the model dimension d_model before feeding it to the encoder.

import torch.nn as nn
import torch.optim as optim

class TransformerModel(nn.Module):
    def __init__(self, num_heads, d_model, num_encoder_layers, num_decoder_layers, dff):
        super(TransformerModel, self).__init__()
        # Project the single input feature up to the model dimension
        self.input_projection = nn.Linear(1, d_model)
        self.encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=num_heads, dim_feedforward=dff, batch_first=True)
        self.transformer_encoder = nn.TransformerEncoder(self.encoder_layer, num_layers=num_encoder_layers)
        self.decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=num_heads, dim_feedforward=dff, batch_first=True)
        self.transformer_decoder = nn.TransformerDecoder(self.decoder_layer, num_layers=num_decoder_layers)
        self.flatten = nn.Flatten()
        self.dense1 = nn.Linear(d_model * time_step, dff)
        self.dense2 = nn.Linear(dff, 1)

    def forward(self, src):
        src = self.input_projection(src)               # (batch, time_step, d_model)
        encoder_output = self.transformer_encoder(src)
        # Reuse the encoder output as the decoder's target sequence for one-step forecasting
        decoder_output = self.transformer_decoder(encoder_output, encoder_output)
        flatten_output = self.flatten(decoder_output)  # (batch, time_step * d_model)
        dense_output = self.dense1(flatten_output)
        output = self.dense2(dense_output)
        return output

# Hyperparameters
num_heads = 2
d_model = 64
num_encoder_layers = 2
num_decoder_layers = 2
dff = 128

# Create model
model = TransformerModel(num_heads, d_model, num_encoder_layers, num_decoder_layers, dff)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
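
# Sanity check before training: run one forward pass on a dummy batch of 4
# windows to confirm the model produces one prediction per sample
# (assumes time_step = 10 as defined in the preprocessing step)
with torch.no_grad():
    dummy = torch.zeros(4, time_step, 1)
    print(model(dummy).shape)  # torch.Size([4, 1])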

# Train model
num_epochs = 50
batch_size = 64
train_loader = torch.utils.data.DataLoader(torch.utils.data.TensorDataset(X_train, y_train), batch_size=batch_size, shuffle=True)

for epoch in range(num_epochs):
    model.train()
    for batch_X, batch_y in train_loader:
        optimizer.zero_grad()
        outputs = model(batch_X)
        loss = criterion(outputs.squeeze(), batch_y)
        loss.backward()
        optimizer.step()
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
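
After training, the learned weights can be persisted for later reuse, e.g. by an inference pipeline (a minimal sketch; the file name transformer_forecaster.pt is an arbitrary choice):

# Save only the learned parameters (the state dict), the usual PyTorch practice
torch.save(model.state_dict(), 'transformer_forecaster.pt')

# To reload later, rebuild the architecture and restore the weights
restored = TransformerModel(num_heads, d_model, num_encoder_layers, num_decoder_layers, dff)
restored.load_state_dict(torch.load('transformer_forecaster.pt'))
restored.eval()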

5. Model Evaluation

We evaluate the trained model on the training and test sets, inverting the scaling so the error is reported in the original temperature units.

import math
from sklearn.metrics import mean_squared_error

model.eval()
with torch.no_grad():
    train_predict = model(X_train).squeeze().numpy()
    test_predict = model(X_test).squeeze().numpy()

# Inverse transform the predictions and targets back to the original scale
train_predict = scaler.inverse_transform(train_predict.reshape(-1, 1))
test_predict = scaler.inverse_transform(test_predict.reshape(-1, 1))
y_train = scaler.inverse_transform(y_train.numpy().reshape(-1, 1))
y_test = scaler.inverse_transform(y_test.numpy().reshape(-1, 1))

# Calculate RMSE
train_score = math.sqrt(mean_squared_error(y_train, train_predict))
test_score = math.sqrt(mean_squared_error(y_test, test_predict))
print(f'Train Score: {train_score:.4f} RMSE')
print(f'Test Score: {test_score:.4f} RMSE')

# Visualize predictions
plt.figure(figsize=(12, 6))
plt.plot(df['temperature'], label='Actual Data')
# Align each prediction with the date of the value it forecasts
plt.plot(df.index[time_step:train_size-1], train_predict, label='Train Predict')
plt.plot(df.index[train_size+time_step:len(df)-1], test_predict, label='Test Predict')
plt.legend()
plt.show()
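
Finally, the trained model can produce an actual forecast: feed it the most recent time_step observations and invert the scaling (a usage sketch, assuming the model, df, and scaler defined above):

# Take the last window of scaled temperatures as the model input
last_window = df['temperature_scaled'].values[-time_step:].reshape(1, time_step, 1)
last_window = torch.tensor(last_window, dtype=torch.float32)

model.eval()
with torch.no_grad():
    next_scaled = model(last_window).item()

# Map the scaled prediction back to the original temperature units
next_temp = scaler.inverse_transform([[next_scaled]])[0, 0]
print(f'Forecast for the next day: {next_temp:.2f}')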

6. Conclusion

This tutorial demonstrated how to use a basic Transformer model for time series forecasting, specifically temperature prediction for agriculture use cases. Accurate temperature forecasting is essential for agricultural planning and decision-making, helping farmers optimize crop management and improve yields. Through this example, readers can gain a fundamental understanding of applying Transformers to time series forecasting, and can go on to refine and optimize the model for better predictive performance.



