Transformers and Attention Mechanisms for Additive Manufacturing
Self-attention architectures for sequence modeling, time series, and process control in AM
- Research Papers: 520+
- Primary Focus: Time series, sequences
- Key Applications: Monitoring, prediction
- Emerging Since: 2020
- Key Advantage: Long-range dependencies
- Growth Rate: +120% since 2022
Transformers, the architecture behind GPT and modern language models, are revolutionizing sequential data processing in additive manufacturing. Their self-attention mechanism captures long-range dependencies in time series data, process parameter sequences, and toolpath coordinates that traditional RNNs and LSTMs struggle with.
In AM applications, transformers excel at modeling the temporal evolution of process signatures (acoustic, thermal, optical), predicting build outcomes from parameter sequences, and generating optimal scan strategies. The attention weights themselves provide interpretability, revealing which process events most influence quality outcomes.
Accuracy gain vs. LSTM baselines: 15-20%
Attention Fundamentals
Self-attention computes relationships between all positions in a sequence simultaneously, unlike RNNs which process sequentially:
Attention(Q, K, V) = softmax(QKᵀ / √d_k) V
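A minimal sketch of this computation in PyTorch; the tensor shapes and the self-attention call at the end are illustrative assumptions, not tied to any specific AM dataset:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (batch, seq_len, d_k) tensors; returns outputs and attention weights."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5   # similarity between every pair of positions
    weights = F.softmax(scores, dim=-1)             # each row is an attention distribution
    return weights @ V, weights                     # weighted sum of values

# Self-attention over one embedded sensor sequence: Q = K = V
x = torch.randn(1, 128, 64)                         # (build, time steps, feature dim)
out, attn = scaled_dot_product_attention(x, x, x)   # attn: (1, 128, 128)
```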
Why Attention for AM?
- Long-range dependencies: Layer N quality depends on layers 1...N-1
- Parallel processing: Faster training than sequential RNNs
- Interpretability: Attention weights show influential time steps
- Transfer learning: Pre-trained models adapt to AM domains
Key Components
- Multi-head attention: Parallel attention with different learned projections
- Positional encoding: Inject sequence order information
- Feed-forward layers: Non-linear transformations between attention
- Layer normalization: Stabilize training dynamics
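A compact sketch of how these components fit together for AM sensor sequences, assuming PyTorch; the class name, feature count, and hyperparameters are illustrative assumptions rather than tuned settings:

```python
import torch
import torch.nn as nn

class SensorEncoder(nn.Module):
    """Transformer encoder for per-time-step sensor features (hypothetical example)."""
    def __init__(self, n_features=8, d_model=128, n_heads=4, n_layers=3, max_len=4096):
        super().__init__()
        self.input_proj = nn.Linear(n_features, d_model)   # sensor features -> model dimension
        self.pos_emb = nn.Embedding(max_len, d_model)       # learned positional encoding
        layer = nn.TransformerEncoderLayer(                  # multi-head attention + feed-forward
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True)               # layer norm before each sublayer
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x):                                    # x: (batch, seq_len, n_features)
        positions = torch.arange(x.size(1), device=x.device)
        h = self.input_proj(x) + self.pos_emb(positions)     # inject sequence order information
        return self.encoder(h)                               # (batch, seq_len, d_model)

encoder = SensorEncoder()
features = encoder(torch.randn(2, 1024, 8))                  # two windows of 1024 time steps
```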
Time Series Analysis
AM processes generate rich time series data from multiple sensor modalities:
| Signal Type | Sampling Rate | Transformer Application | Prediction Target |
| --- | --- | --- | --- |
| Acoustic emission | 100 kHz - 1 MHz | Anomaly detection | Porosity, cracking |
| Melt pool thermal | 1-10 kHz | Temperature forecasting | Overheating, lack of fusion |
| Photodiode intensity | 10-100 kHz | Process stability | Keyholing, balling |
| Layer-wise images | Per layer | Sequence classification | Build success/failure |
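As a concrete example of preparing one of these streams for a transformer, the sketch below slices a raw acoustic-emission signal into fixed-length windows; the window and stride values are assumptions, not recommended settings:

```python
import numpy as np

def make_windows(signal: np.ndarray, window: int = 4096, stride: int = 2048) -> np.ndarray:
    """Slice a 1-D sensor stream into overlapping windows, shape (n_windows, window)."""
    starts = range(0, len(signal) - window + 1, stride)
    return np.stack([signal[s:s + window] for s in starts])

stream = np.random.randn(1_000_000)     # stand-in for ~10 s of 100 kHz acoustic emission
windows = make_windows(stream)          # each row becomes one input sequence for the model
```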
Process Monitoring
Transformers enable real-time monitoring by processing streaming sensor data:
Anomaly Detection
- Reconstruction-based: Autoencoder transformers learn normal patterns
- Prediction-based: Forecast next values, flag large residuals
- Attention-based: Unusual attention patterns indicate anomalies
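A hedged sketch of the prediction-based approach above: any one-step-ahead forecaster (for example, an encoder like the one sketched earlier with a linear output head) predicts the next sensor vector, and windows with large residuals are flagged. The threshold is an assumption that would normally be calibrated on defect-free builds.

```python
import torch

@torch.no_grad()
def flag_anomalies(forecaster, x, threshold=0.5):
    """x: (batch, seq_len, n_features) sensor windows.
    `forecaster` is a hypothetical model that maps the first seq_len - 1 steps to a
    prediction of the final step, returning a (batch, n_features) tensor."""
    pred = forecaster(x[:, :-1, :])                       # forecast the last time step
    residual = (pred - x[:, -1, :]).abs().mean(dim=-1)    # mean absolute error per window
    return residual > threshold                           # True where the forecast misses badly
```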
| Monitoring Task | Architecture | Performance | Latency |
| --- | --- | --- | --- |
| Defect detection | Vision Transformer (ViT) | F1 > 0.95 | 10-50 ms |
| Process state classification | BERT-style encoder | Accuracy > 98% | 5-20 ms |
| Quality prediction | Encoder-decoder | R² > 0.92 | Per-layer |
| Remaining useful life | Temporal transformer | MAPE < 10% | Real-time |
Multi-Sensor Fusion
Cross-attention mechanisms fuse information from heterogeneous sensors (thermal cameras, acoustic, optical) with different sampling rates and data formats, learning optimal combinations for prediction.
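A minimal cross-attention sketch under assumed stream names and dimensions (a slower thermal stream querying a higher-rate acoustic stream), using PyTorch's multi-head attention:

```python
import torch
import torch.nn as nn

class CrossSensorFusion(nn.Module):
    """Fuse a slow thermal stream with a fast acoustic stream via cross-attention."""
    def __init__(self, d_thermal=64, d_acoustic=16, d_model=128, n_heads=4):
        super().__init__()
        self.thermal_proj = nn.Linear(d_thermal, d_model)
        self.acoustic_proj = nn.Linear(d_acoustic, d_model)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, thermal, acoustic):
        # thermal: (batch, T_thermal, d_thermal); acoustic: (batch, T_acoustic, d_acoustic)
        q = self.thermal_proj(thermal)               # queries from the slower stream
        kv = self.acoustic_proj(acoustic)            # keys/values from the faster stream
        fused, weights = self.cross_attn(q, kv, kv)  # each thermal step attends over acoustic steps
        return fused                                 # (batch, T_thermal, d_model)

fusion = CrossSensorFusion()
out = fusion(torch.randn(2, 100, 64), torch.randn(2, 10_000, 16))
```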
Toolpath Generation
Treating toolpaths as sequences enables transformer-based optimization and generation:
Scan Strategy Optimization
- Sequence-to-sequence: Input geometry → output optimal scan path
- Autoregressive generation: Generate scan vectors one at a time (see the sketch after the table below)
- Constraint satisfaction: Encode boundary and overlap rules
| Toolpath Task | Input | Output | Method |
| --- | --- | --- | --- |
| Path generation | STL geometry | Scan vectors | Encoder-decoder |
| Parameter optimization | Path + material | P, v, h per segment | Regression head |
| Thermal balancing | Path sequence | Reordered sequence | Pointer networks |
| Error correction | Faulty G-code | Corrected G-code | Seq2seq + beam search |
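Below is a hedged sketch of the autoregressive generation approach from the list above. `geometry_encoder` and `vector_decoder` are hypothetical modules standing in for an encoder-decoder toolpath model, and each scan vector is encoded as (x_start, y_start, x_end, y_end):

```python
import torch

@torch.no_grad()
def generate_scan_vectors(geometry_encoder, vector_decoder, slice_geometry, max_vectors=500):
    """Generate scan vectors one at a time, conditioned on an encoded slice geometry.
    Assumed interfaces: the encoder returns (1, n_tokens, d_model) memory; the decoder
    maps (prefix, memory) to per-step predictions of shape (1, steps, 4)."""
    memory = geometry_encoder(slice_geometry)
    vectors = torch.zeros(1, 1, 4)                            # dummy start token
    for _ in range(max_vectors):
        next_vec = vector_decoder(vectors, memory)[:, -1:]    # predict the next (x1, y1, x2, y2)
        vectors = torch.cat([vectors, next_vec], dim=1)       # append and feed back in
    return vectors[:, 1:]                                     # drop the start token
```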
Transformer Architectures
| Architecture | Description | AM Application | Advantage |
| --- | --- | --- | --- |
| Vision Transformer (ViT) | Patch-based image attention | Layer/melt pool images | Global context |
| Swin Transformer | Shifted window attention | High-res inspection | Efficient for large images |
| Informer | Sparse attention for long sequences | Full-build time series | O(L log L) complexity |
| Autoformer | Auto-correlation attention | Periodic process signals | Captures seasonality |
| Crossformer | Cross-dimension attention | Multi-sensor fusion | Variable interactions |
| PatchTST | Patching for time series | Sensor forecasting | State-of-the-art accuracy |
Efficiency Considerations
- Linear attention: Reduce O(n²) to O(n) for long sequences
- Sparse attention: Only attend to relevant positions
- Knowledge distillation: Compress large models for edge deployment
- Quantization: INT8 inference for real-time monitoring
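As an example of the quantization route, here is a minimal post-training dynamic quantization sketch in PyTorch; `QualityHead` is a stand-in for the dense part of a trained monitoring model, not a published architecture:

```python
import torch
import torch.nn as nn

class QualityHead(nn.Module):
    """Small dense head standing in for the trained model to be deployed at the edge."""
    def __init__(self, d_model=128, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                 nn.Linear(d_model, n_classes))

    def forward(self, x):
        return self.net(x)

model = QualityHead().eval()
# Swap nn.Linear layers for INT8 dynamically quantized versions (CPU inference).
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

with torch.no_grad():
    logits = quantized(torch.randn(8, 128))   # batch of pooled sequence embeddings
```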
AM Applications
Process-Structure-Property Linkage
Transformers model the causal chain from process parameters through microstructure evolution to final properties:
- Input: Parameter sequences (P, v, h over time/space)
- Hidden: Thermal history → cooling rates → solidification
- Output: Mechanical properties, defect probability
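A hedged sketch of this linkage as a sequence-to-property regressor: a transformer encoder reads a per-segment parameter sequence (P, v, h) and a pooled regression head predicts properties. The architecture and dimensions are illustrative assumptions, with the thermal-history stage left implicit in the learned representation:

```python
import torch
import torch.nn as nn

class Seq2Property(nn.Module):
    """Map (P, v, h) sequences to predicted properties (hypothetical example)."""
    def __init__(self, n_params=3, d_model=64, n_heads=4, n_layers=2, n_outputs=2):
        super().__init__()
        self.embed = nn.Linear(n_params, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_outputs)       # e.g. density, defect-probability logit

    def forward(self, params):                          # params: (batch, n_segments, n_params)
        h = self.encoder(self.embed(params))
        return self.head(h.mean(dim=1))                 # pool over segments -> (batch, n_outputs)

model = Seq2Property()
pred = model(torch.randn(4, 200, 3))                    # 4 builds, 200 scan segments each
```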
Foundation Models for AM
Pre-trained transformer models on large AM datasets enable:
- Few-shot learning for new materials/processes
- Zero-shot anomaly detection
- Cross-machine transfer learning
- Natural language interfaces for process control
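A minimal sketch of the cross-machine transfer-learning idea: freeze a pre-trained backbone and train only a small head on data from the new machine or material. `pretrained_backbone` is a hypothetical module (no public AM foundation model is assumed) taken to output (batch, seq_len, d_model) features:

```python
import torch.nn as nn

class TransferClassifier(nn.Module):
    """Frozen pre-trained backbone + small trainable head (hypothetical setup)."""
    def __init__(self, pretrained_backbone, d_model=128, n_classes=2):
        super().__init__()
        self.backbone = pretrained_backbone
        for p in self.backbone.parameters():
            p.requires_grad = False                # keep pre-trained weights fixed
        self.head = nn.Linear(d_model, n_classes)  # only this layer is fine-tuned

    def forward(self, x):
        features = self.backbone(x)                # (batch, seq_len, d_model) assumed
        return self.head(features.mean(dim=1))     # pool over time, then classify
```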
Key References
Attention Is All You Need
Vaswani et al. | NeurIPS | 2017 | 90,000+ citations
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Dosovitskiy et al. | ICLR | 2021 | 25,000+ citations
Transformer-based deep learning for predicting melt pool geometry in laser powder bed fusion
Chen et al. | Additive Manufacturing | 2023 | 65+ citations
Vision Transformer for in-situ monitoring of laser powder bed fusion
Zhang et al. | Journal of Manufacturing Processes | 2023 | 45+ citations
Temporal Fusion Transformers for interpretable multi-horizon time series forecasting
Lim et al. | International Journal of Forecasting | 2021 | 2,500+ citations