
AI Workflow for Time Series Forecasting Tutorial

SEDIMARK · December 10, 2024
Pipeline AI Example tutorial

1. Introduction

As part of the SEDIMARK toolbox, which users can employ to configure AI and data processing pipelines for their use cases, AI tasks such as forecasting are made readily available for inference on Data Assets. Time series forecasting has a wide range of applications across many fields, including financial market prediction, weather forecasting, and traffic flow prediction.

In this tutorial, we will use Python to demonstrate the basic AI workflow for time series forecasting, specifically focusing on temperature forecasting for agriculture use cases. Accurate temperature forecasting is crucial for agriculture as it helps farmers plan their activities, manage crops, and optimize yields.

The Jupyter notebook containing the content of this tutorial can be downloaded from GitHub.

2. Environment Setup

We need to install a few libraries for this experiment. Copy and run the command below in your terminal:

pip install numpy pandas matplotlib scikit-learn torch
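
Once installation finishes, a quick import check confirms the environment is ready (a minimal sketch; the version numbers printed will depend on your setup):

import numpy, pandas, sklearn, torch, matplotlib

# Print each library's installed version to confirm the imports succeed
for lib in (numpy, pandas, sklearn, torch, matplotlib):
    print(lib.__name__, lib.__version__)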

3. Data Preprocessing

In this section, we generate simulated temperature data and apply basic preprocessing.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler

# Generate simulated daily temperature data (integers from 20 to 34)
date_rng = pd.date_range(start='2023-01-01', end='2023-06-30', freq='D')
df = pd.DataFrame(date_rng, columns=['date'])
df['temperature'] = np.random.randint(20, 35, size=len(date_rng))

# Set date as index
df.set_index('date', inplace=True)

# Visualize data
df['temperature'].plot(figsize=(12, 6), title='Temperature Time Series')
plt.show()

# Normalize data
scaler = MinMaxScaler(feature_range=(0, 1))
df['temperature_scaled'] = scaler.fit_transform(df['temperature'].values.reshape(-1, 1))

# Split into training and testing sets
train_size = int(len(df) * 0.8)
train, test = df[:train_size], df[train_size:]

# Build sliding windows for the Transformer: each sample is `time_step`
# consecutive values, and the target is the value that immediately follows
def create_dataset(data, time_step=1):
    X, Y = [], []
    for i in range(len(data) - time_step - 1):
        X.append(data[i:(i + time_step), 0])
        Y.append(data[i + time_step, 0])
    return np.array(X), np.array(Y)

time_step = 10
# create_dataset indexes a 2-D array, so reshape the series to (n, 1)
X_train, y_train = create_dataset(train['temperature_scaled'].values.reshape(-1, 1), time_step)
X_test, y_test = create_dataset(test['temperature_scaled'].values.reshape(-1, 1), time_step)

# Convert to PyTorch tensors
import torch
X_train = torch.tensor(X_train.reshape(X_train.shape[0], time_step, 1), dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32)
X_test = torch.tensor(X_test.reshape(X_test.shape[0], time_step, 1), dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32)
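
Before training, it is worth sanity-checking the tensor shapes. With the 181-day series, the 80/20 split (144/37 rows), and the 10-step window, create_dataset yields 133 training and 26 test samples:

# Each input is a window of 10 scaled values; each target is the value that follows
print(X_train.shape, y_train.shape)  # torch.Size([133, 10, 1]) torch.Size([133])
print(X_test.shape, y_test.shape)    # torch.Size([26, 10, 1]) torch.Size([26])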

4. Build a Simple Transformer Model

We use the library support provided by PyTorch to create a simple, basic Transformer model (encoder-decoder). Since each input sample carries a single feature (the scaled temperature), we first project it up to the model dimension d_model before feeding it to the encoder.

import torch.nn as nn
import torch.optim as optim

class TransformerModel(nn.Module):
    def __init__(self, num_heads, d_model, num_encoder_layers, num_decoder_layers, dff):
        super(TransformerModel, self).__init__()
        # Project the single input feature up to the model dimension
        self.input_projection = nn.Linear(1, d_model)
        self.encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=num_heads, dim_feedforward=dff, batch_first=True)
        self.transformer_encoder = nn.TransformerEncoder(self.encoder_layer, num_layers=num_encoder_layers)
        self.decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=num_heads, dim_feedforward=dff, batch_first=True)
        self.transformer_decoder = nn.TransformerDecoder(self.decoder_layer, num_layers=num_decoder_layers)
        self.flatten = nn.Flatten()
        self.dense1 = nn.Linear(d_model * time_step, dff)
        self.dense2 = nn.Linear(dff, 1)

    def forward(self, src):
        src = self.input_projection(src)               # (batch, time_step, d_model)
        encoder_output = self.transformer_encoder(src)
        # Reuse the encoder output as the decoder's target sequence for one-step forecasting
        decoder_output = self.transformer_decoder(encoder_output, encoder_output)
        flatten_output = self.flatten(decoder_output)  # (batch, time_step * d_model)
        dense_output = self.dense1(flatten_output)
        output = self.dense2(dense_output)
        return output

# Hyperparameters
num_heads = 2
d_model = 64
num_encoder_layers = 2
num_decoder_layers = 2
dff = 128

# Create model
model = TransformerModel(num_heads, d_model, num_encoder_layers, num_decoder_layers, dff)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
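
# Sanity check before training: run one forward pass on a dummy batch of 4
# windows to confirm the model produces one prediction per sample
# (assumes time_step = 10 as defined in the preprocessing step)
with torch.no_grad():
    dummy = torch.zeros(4, time_step, 1)
    print(model(dummy).shape)  # torch.Size([4, 1])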

# Train model
num_epochs = 50
batch_size = 64
train_loader = torch.utils.data.DataLoader(torch.utils.data.TensorDataset(X_train, y_train), batch_size=batch_size, shuffle=True)

for epoch in range(num_epochs):
    model.train()
    for batch_X, batch_y in train_loader:
        optimizer.zero_grad()
        outputs = model(batch_X)
        loss = criterion(outputs.squeeze(), batch_y)
        loss.backward()
        optimizer.step()
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
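
After training, the learned weights can be persisted for later reuse, e.g. by an inference pipeline (a minimal sketch; the file name transformer_forecaster.pt is an arbitrary choice):

# Save only the learned parameters (the state dict), the usual PyTorch practice
torch.save(model.state_dict(), 'transformer_forecaster.pt')

# To reload later, rebuild the architecture and restore the weights
restored = TransformerModel(num_heads, d_model, num_encoder_layers, num_decoder_layers, dff)
restored.load_state_dict(torch.load('transformer_forecaster.pt'))
restored.eval()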

5. Model Evaluation

We evaluate the trained model on the training and test sets, inverting the scaling so the error is reported in the original temperature units.

import math
from sklearn.metrics import mean_squared_error

model.eval()
with torch.no_grad():
    train_predict = model(X_train).squeeze().numpy()
    test_predict = model(X_test).squeeze().numpy()

# Inverse transform the predictions and targets back to the original scale
train_predict = scaler.inverse_transform(train_predict.reshape(-1, 1))
test_predict = scaler.inverse_transform(test_predict.reshape(-1, 1))
y_train = scaler.inverse_transform(y_train.numpy().reshape(-1, 1))
y_test = scaler.inverse_transform(y_test.numpy().reshape(-1, 1))

# Calculate RMSE
train_score = math.sqrt(mean_squared_error(y_train, train_predict))
test_score = math.sqrt(mean_squared_error(y_test, test_predict))
print(f'Train Score: {train_score:.4f} RMSE')
print(f'Test Score: {test_score:.4f} RMSE')

# Visualize predictions
plt.figure(figsize=(12, 6))
plt.plot(df['temperature'], label='Actual Data')
# Align each prediction with the date of the value it forecasts
plt.plot(df.index[time_step:train_size-1], train_predict, label='Train Predict')
plt.plot(df.index[train_size+time_step:len(df)-1], test_predict, label='Test Predict')
plt.legend()
plt.show()
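
Finally, the trained model can produce an actual forecast: feed it the most recent time_step observations and invert the scaling (a usage sketch, assuming the model, df, and scaler defined above):

# Take the last window of scaled temperatures as the model input
last_window = df['temperature_scaled'].values[-time_step:].reshape(1, time_step, 1)
last_window = torch.tensor(last_window, dtype=torch.float32)

model.eval()
with torch.no_grad():
    next_scaled = model(last_window).item()

# Map the scaled prediction back to the original temperature units
next_temp = scaler.inverse_transform([[next_scaled]])[0, 0]
print(f'Forecast for the next day: {next_temp:.2f}')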

6. Conclusion

This tutorial demonstrated how to use a basic Transformer model for time series forecasting, specifically temperature prediction for agriculture use cases. Accurate temperature forecasting is essential for agricultural planning and decision-making, helping farmers optimize crop management and improve yields. Through this example, readers can gain a fundamental understanding of applying Transformers to time series forecasting, and can go on to refine and optimize the model for better predictive performance.



