S-WinStat


Proposal

Transformer-based models, highly successful in natural language processing and computer vision, are increasingly being applied to time series analysis tasks such as forecasting and anomaly detection [1]. However, the unique structure of temporal data (cycles, seasonality) differs significantly from that of language, raising questions about whether standard positional encoding mechanisms are truly suitable for this domain.
Previous research already suggests that their effectiveness in time series applications may be limited [2].

This project aims to analyze published experiments to validate (or refute) these findings across different application domains — such as finance, energy, and industrial monitoring — using multiple accuracy and anomaly detection metrics.

We also explore alternative forms of positional encoding specifically designed for time series. Several approaches are studied, including conventional positional encodings and specialized architectures that integrate these principles. Their impact on prediction quality and early anomaly detection is assessed against standard methods, analyzing their advantages and limitations in terms of interpretability, generalization capacity, and computational cost.


Objectives

  • Conduct an exhaustive review of the state of the art on positional encoding and its application in Transformer models for time series.
  • Systematically evaluate the effectiveness of standard positional encodings in forecasting and anomaly detection tasks using benchmark time series datasets.
  • Study novel proposals for positional encodings or complete architectures adapted to time series, exploring alternative approaches beyond the current state of the art.

Running the Experiments

To compare the different methods, the script run_exp.py accepts multiple command-line arguments that configure the model, including the type of positional encoding (PE) and its associated hyperparameters.
It modifies the behavior of a base Informer model [3].


General Configuration

--model               Type of model to use (informer)
--ex_name             Experiment name
--folder              Directory where the model (e.g., InformerVanilla or InformerRope) is located
--data                Dataset name
--root_path           Root path where the dataset is located
--data_path           Data file name
--features            Type of prediction: M (multi→multi), S (uni→uni), MS (multi→uni)
--target              Target variable for S or MS tasks
--freq                Temporal frequency for encoding (hours: h; minutes: t; seconds: s)
--checkpoints         Path to save model checkpoints

Input and Output Lengths

--seq_len             Input sequence length for the encoder
--label_len           Length of the decoder’s start token
--pred_len            Length of the sequence to predict
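
In the original Informer pipeline these three lengths interact: the decoder input is built from the last label_len steps of the encoder window followed by pred_len placeholder values. The sketch below illustrates that convention; it is illustrative only and assumes this repository follows the same scheme as the base Informer.

import torch

# Illustrative values; the actual lengths are set via --seq_len, --label_len, --pred_len
seq_len, label_len, pred_len = 96, 48, 24
batch, n_vars = 8, 7

enc_input = torch.randn(batch, seq_len, n_vars)           # encoder input: seq_len steps
start_token = enc_input[:, -label_len:, :]                # last label_len steps act as the decoder start token
placeholder = torch.zeros(batch, pred_len, n_vars)        # zeros for the horizon to be predicted
dec_input = torch.cat([start_token, placeholder], dim=1)  # shape: (batch, label_len + pred_len, n_vars)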

Model Configuration

--enc_in              Number of input variables to the encoder
--dec_in              Number of input variables to the decoder
--c_out               Number of model outputs
--d_model             Model dimension
--n_heads             Number of attention heads
--e_layers            Number of encoder layers
--d_layers            Number of decoder layers
--s_layers            Stacked encoder layers (stack mode only)
--d_ff                Inner feed-forward dimension
--factor              Reduction factor for probabilistic attention
--padding             Padding type (0: none, 1: same)
--distil              Disable distilling if included
--dropout             Dropout rate
--attn                Type of attention in encoder (use 'full' to avoid information loss)

Temporal Encoding

--time_encoding       Type of positional/temporal encoding (see below)
--embed               Type of temporal embedding (timeF, fixed, learned)
--activation          Activation function (e.g., gelu, relu)
--window              Window size for statistics
--output_attention    Displays encoder-generated attention
--cols                Specific dataset columns to use

Training and Execution

--num_workers           Number of DataLoader workers
--itr                   Number of experiment repetitions
--train_epochs          Number of training epochs
--batch_size            Batch size for training (default: 32)
--patience              Patience for early stopping (default: 3)
--learning_rate         Learning rate
--des                   Experiment description
--loss                  Loss function (mse, mae, etc.)
--lradj                 Learning rate adjustment strategy
--use_amp               Use mixed-precision training (AMP)
--inverse               Invert output transformation
--shuffle_decoder_input Shuffle decoder inputs during testing

GPU Configuration

--use_gpu           Use GPU if available
--gpu               GPU index to use
--use_multi_gpu     Enable multi-GPU support
--devices           IDs of GPUs to use

PE Types: --time_encoding

Value                 Description
no_pe                 No positional encoding; only the raw input data are used.
informer              Original temporal encoding from Informer.
stats                 WinStat base: encoding based on sliding-window statistics (mean, standard deviation, and extrema).
stats_lags            WinStatLag: same as stats, but adds lag features as local context.
all_pe_weighted       WinStatFlex: weighted combination of the above plus fixed and learnable PEs (LPE), with the weights normalized via Softmax.
tpe                   WinStatTPE: Temporal Positional Encoding (t-PE), integrating lag, window, and fixed-PE information through learned, Softmax-normalized weights.
tupe                  Transformer with Untied Positional Encoding (TUPE), which decouples word and positional correlations in the self-attention module, enhancing model expressiveness.
rope                  Rotary Positional Encoding (RoPE), which encodes position by applying rotation matrices to the query and key vectors in self-attention, preserving relative distances and improving sequence modeling. Requires --folder InformerRope.
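
To make the stats family more concrete, the sketch below shows one way a sliding-window statistics encoding can be implemented in PyTorch: each position is described by the mean, standard deviation, minimum, and maximum of a causal window of past values, projected to the model dimension. The class name, window handling, and projection are illustrative assumptions, not the repository's exact implementation (see run_exp.py and the model folders for the real code).

import torch
import torch.nn as nn
import torch.nn.functional as F

class WindowStatsEncoding(nn.Module):
    # Hypothetical sketch of a sliding-window statistics encoding; not the repo's exact code.
    def __init__(self, n_vars, d_model, window=25):
        super().__init__()
        self.window = window                        # corresponds conceptually to --window
        self.proj = nn.Linear(4 * n_vars, d_model)  # 4 statistics per input variable

    def forward(self, x):
        # x: (batch, seq_len, n_vars)
        padded = F.pad(x.transpose(1, 2), (self.window - 1, 0), mode="replicate")  # causal left padding
        windows = padded.unfold(2, self.window, 1)                                 # (batch, n_vars, seq_len, window)
        stats = torch.cat(
            [windows.mean(-1), windows.std(-1), windows.amin(-1), windows.amax(-1)],
            dim=1,
        )                                                                          # (batch, 4*n_vars, seq_len)
        return self.proj(stats.transpose(1, 2))                                    # (batch, seq_len, d_model)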

PE variants that performed below the no_pe baseline were omitted, as were encodings such as SPE that disrupted the attention mechanism and performed poorly across the datasets [4].

An example execution can be found in the slurm_task.sh file.
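
For orientation, an invocation of the following form runs the Informer base with the WinStat encoding. The dataset, paths, and hyperparameter values are purely illustrative and are not taken from slurm_task.sh.

python run_exp.py \
    --model informer --folder InformerVanilla --ex_name winstat_demo \
    --data ETTh1 --root_path ./data/ --data_path ETTh1.csv \
    --features M --freq h \
    --seq_len 96 --label_len 48 --pred_len 24 \
    --time_encoding stats --window 25 \
    --train_epochs 10 --batch_size 32 --patience 3 --learning_rate 1e-4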


Environment

To run this project, an up-to-date environment with Python 3.12 and PyTorch is required.
You can create it from the requirements.txt file (replace <env_name> with a name of your choice):

conda create --name <env_name> --file requirements.txt


References

[1] Wen, Q., Zhou, T., Zhang, C., Chen, W., Ma, Z., Yan, J., & Sun, L. (2022). Transformers in time series: A survey. arXiv preprint arXiv:2202.07125.

[2] Zeng, A., Chen, M., Zhang, L., & Xu, Q. (2023). Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 37, No. 9, pp. 11121–11128).

[3] Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., & Zhang, W. (2021). Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 12, pp. 11106–11115).

[4] Irani, H., & Metsis, V. (2025). Positional Encoding in Transformer-Based Time Series Models: A Survey. arXiv preprint arXiv:2502.12370. https://arxiv.org/abs/2502.12370

About

Learnable positional encoding for transformers in time series forecasting.
