FireSense: Multispectral Fire Detection with Channel Attention and Probabilistic Calibration
Basanth Periyapatna Roopa Kumar, Nischitha Nagendran, Nandhakumar Apparsamy, Nitya Rondla
Department of Applied Data Science · San Jose State University · DATA 255: Deep Learning
Abstract
We present a deep learning framework for automated wildfire detection in Landsat-8 satellite imagery that addresses three fundamental challenges: extreme class imbalance, annotation uncertainty, and prediction reliability. Our approach combines a ResNet34-UNet architecture with Convolutional Block Attention Modules (CBAM) to leverage all ten spectral bands of Landsat-8, enabling the model to learn optimal band importance for fire detection.
We introduce soft labels derived from multi-annotator consensus to capture uncertainty in fire boundaries, and apply temperature scaling for calibrated probability estimates. Evaluated on the ActiveFire dataset spanning North and South America (14,815 patches), our method achieves a mean Intersection over Union (IoU) of 69.6%, representing a 24.9% improvement over baseline approaches and 45.6% improvement over classical fire detection algorithms.
The attention mechanism correctly prioritizes Short-Wave Infrared (SWIR) and Thermal Infrared (TIR) bands, validating the physical intuition that thermal signatures are critical for fire detection.
| Metric | Value |
|---|---|
| Mean IoU | 69.6% |
| Dice Score | 0.828 |
| Precision | 0.834 |
| Recall | 0.867 |
| vs. Classical Baselines | +45.6% |
| ECE Reduction | 86.1% |
Model Architecture
FireSense Model Architecture: ResNet34-UNet + CBAM
Input: Landsat-8 satellite imagery · 10 spectral bands · 256×256 px · 30 m resolution
ResNet34 Encoder (ImageNet pretrained): Stage 1 (64 ch) → Stage 2 (128 ch) → Stage 3 (256 ch) → Stage 4 (512 ch) · First conv layer modified to accept 10-channel input · Skip connections to decoder
CBAM (Convolutional Block Attention Module)
Channel Attention: AvgPool + MaxPool → shared-weight MLP → Sigmoid → Mc(F) · SWIR1/SWIR2/TIR1 receive the highest weights
Spatial Attention: AvgPool + MaxPool → 7×7 Convolution → Sigmoid → Ms(F) · weights spatial fire regions
U-Net Decoder (Transposed Convolutions + Skip Connections): Block 4 → Block 3 → Block 2 → Block 1, each 3×3 Conv · BN · ReLU · CBAM
Soft Label Generation: multi-annotator consensus · ŷ_soft = ⅓ (y_Schroeder + y_Murphy + y_Kumar-Roy)
GCE Loss + Dice Loss: λ = 0.5 · q = 0.7 (noise tolerance)
Temperature Scaling: p_cal = σ(z/T) · T* = 1.287
Calibrated Predictions: ECE 0.142 → 0.020 (−86.1%)
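The CBAM stage of the architecture can be illustrated with a small NumPy sketch. It follows the generic CBAM recipe named in the diagram (pooled channel descriptors through a shared MLP, then a 7×7 convolution over channel-pooled maps) and is not the project's own code. The reduction ratio, the random weights, and every function name here are assumptions made for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """Mc(F): feat is (C, H, W); w1 is (C//r, C); w2 is (C, C//r). Returns (C,)."""
    avg_desc = feat.mean(axis=(1, 2))   # global average pool per channel
    max_desc = feat.max(axis=(1, 2))    # global max pool per channel

    def shared_mlp(v):
        # The same two-layer MLP is applied to both descriptors.
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(shared_mlp(avg_desc) + shared_mlp(max_desc))


def spatial_attention(feat, kernel):
    """Ms(F): feat is (C, H, W); kernel is (2, 7, 7). Returns (H, W)."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    pad = kernel.shape[-1] // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    # windows has shape (2, H, W, 7, 7): one 7x7 patch per output pixel.
    windows = sliding_window_view(padded, kernel.shape[-2:], axis=(1, 2))
    return sigmoid(np.einsum("chwij,cij->hw", windows, kernel))


def cbam(feat, w1, w2, kernel):
    """Apply channel attention first, then spatial attention."""
    refined = feat * channel_attention(feat, w1, w2)[:, None, None]
    return refined * spatial_attention(refined, kernel)[None, :, :]


# Usage with random stand-in weights (assumed reduction ratio r = 4).
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16, 16))
w1 = 0.1 * rng.normal(size=(2, 8))
w2 = 0.1 * rng.normal(size=(8, 2))
kernel = 0.1 * rng.normal(size=(2, 7, 7))
print(cbam(features, w1, w2, kernel).shape)
```

Because both attention maps pass through a sigmoid, the module can only rescale features toward zero, which is why it acts as a soft gate rather than a new feature extractor.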
Dataset — ActiveFire
Experiments use the ActiveFire dataset, comprising Landsat-8 imagery patches from wildfire events across North and South America during the 2020 fire season. Each patch is a 256×256 pixel crop at 30-meter spatial resolution, providing approximately 7.68 km × 7.68 km ground coverage. Geographic coverage spans two continents with distinct fire regimes: North America (45%), with California wildfires, Pacific Northwest forest fires, and Canadian boreal fires; and South America (55%), with Amazon deforestation fires and Brazilian cerrado burns.
The dataset includes fire masks from three classical detection algorithms: Schroeder et al. (threshold-based MODIS), Murphy et al. (contextual local statistics), and Kumar-Roy et al. (multi-temporal change detection). These often disagree on fire boundaries, motivating the soft label approach.
Landsat-8 Spectral Bands Used in This Study
| Band | Name | Wavelength (μm) | Fire Relevance | CBAM Attention Weight |
|---|---|---|---|---|
| 6 | SWIR1 | 1.61 | Very High | 0.18 |
| 7 | SWIR2 | 2.19 | Very High | 0.15 |
| 9 | TIR1 | 10.9 | Very High | 0.12 |
| 10 | TIR2 | 12.0 | High | ~0.10 |
| 5 | NIR | 0.87 | High | 0.15 |
| 4 | Red | 0.66 | Medium | Low |
| 1 | Coastal | 0.44 | Low | Very Low |
| 2 | Blue | 0.49 | Low | Very Low |
| 3 | Green | 0.56 | Low | Very Low |
| 8 | Cirrus | 1.38 | Low | Very Low |
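The encoder's first convolution has to accept these ten bands, while ImageNet-pretrained ResNet34 filters expect three. The source does not say how the new filters were initialized. One common option, shown here purely as an assumption, is to average the RGB filters, repeat them across all input channels, and rescale so the initial activation magnitude is roughly preserved. The function name and the random stand-in weights are hypothetical.

```python
import numpy as np


def expand_first_conv(rgb_weight, in_channels=10):
    """Expand (out, 3, k, k) pretrained filters to (out, in_channels, k, k).

    Assumed strategy: replicate the mean RGB filter across all bands and
    rescale so the sum over input channels matches the original filters.
    """
    rgb_channels = rgb_weight.shape[1]
    mean_filter = rgb_weight.mean(axis=1, keepdims=True)      # (out, 1, k, k)
    expanded = np.repeat(mean_filter, in_channels, axis=1)    # (out, C, k, k)
    return expanded * (rgb_channels / in_channels)


# Usage: random values stand in for the pretrained 7x7 stem filters.
pretrained = np.random.default_rng(0).normal(size=(64, 3, 7, 7))
ten_band = expand_first_conv(pretrained, in_channels=10)
print(ten_band.shape)
```

With this initialization every band starts with equal weight, so any band preference has to be learned during fine-tuning.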
Methodology
Soft Label Generation
Rather than using binary masks from a single annotator, soft labels encode multi-annotator consensus: ŷ_soft(i,j) = ⅓ ∑_a y_a(i,j) for annotators a ∈ {Schroeder, Murphy, Kumar-Roy}. This yields soft labels in {0, ⅓, ⅔, 1}, capturing uncertainty at fire boundaries where annotators disagree. The approach reduces overfitting to noisy annotations and produces smoother decision boundaries that generalize better to unseen data.
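The consensus average can be shown on a toy example. This is a minimal sketch assuming the three algorithm masks are already loaded as binary arrays of equal shape; the function name and the 2×2 masks are illustrative only.

```python
import numpy as np


def soft_labels(masks):
    """Average A binary masks of shape (H, W) into one soft label map."""
    stacked = np.asarray(masks, dtype=np.float64)  # (A, H, W)
    return stacked.mean(axis=0)


# Toy 2x2 masks standing in for the three classical algorithms.
schroeder = [[1, 1], [0, 0]]
murphy = [[1, 0], [0, 0]]
kumar_roy = [[1, 1], [1, 0]]

labels = soft_labels([schroeder, murphy, kumar_roy])
print(labels)
```

Pixels where all three algorithms agree stay at 0 or 1, and only the disputed pixels receive intermediate targets.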
Effect of Soft Labels on Model Performance
| Training | IoU | Dice | Precision | Recall |
|---|---|---|---|---|
| Hard Labels | 0.592 | 0.744 | 0.781 | 0.823 |
| Soft Labels | 0.680 | 0.810 | 0.834 | 0.867 |
| Improvement | +14.8% | +8.9% | +6.8% | +5.3% |
Loss Function
The training loss combines Generalized Cross-Entropy (GCE) loss, for robustness to label noise, with Dice loss, for handling class imbalance: ℒ = ℒ_GCE + λℒ_Dice, where λ = 0.5. GCE is defined as ℒ_GCE(p, y) = (1 − p_y^q) / q, where p_y is the predicted probability of the labeled class and q = 0.7 controls noise tolerance; GCE approaches standard cross-entropy as q → 0 and becomes mean absolute error at q = 1. All models were trained using the AdamW optimizer (lr 5×10⁻⁴, weight decay 10⁻⁴) with cosine annealing over 100 epochs.
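The combined loss can be sketched in NumPy. The source gives GCE only for a single labeled class, so weighting the fire and background terms by the soft target is an assumption of this example, as are the smoothing constant and the function names.

```python
import numpy as np


def gce_loss(p, y, q=0.7, eps=1e-7):
    """Binary GCE with soft targets (assumed extension of (1 - p_y^q) / q)."""
    p = np.clip(p, eps, 1.0 - eps)
    fire_term = (1.0 - p ** q) / q                  # loss if the pixel is fire
    background_term = (1.0 - (1.0 - p) ** q) / q    # loss if it is background
    return float(np.mean(y * fire_term + (1.0 - y) * background_term))


def dice_loss(p, y, eps=1e-7):
    """Soft Dice loss: 1 - 2|P∩Y| / (|P| + |Y|), smoothed by eps."""
    intersection = np.sum(p * y)
    return float(1.0 - (2.0 * intersection + eps) / (np.sum(p) + np.sum(y) + eps))


def total_loss(p, y, lam=0.5, q=0.7):
    """L = L_GCE + lam * L_Dice, using the lambda and q values from the text."""
    return gce_loss(p, y, q=q) + lam * dice_loss(p, y)


# Usage: a perfect prediction versus a fully inverted one.
target = np.array([[1.0, 0.0], [0.0, 1.0]])
print(total_loss(target, target), total_loss(1.0 - target, target))
```

Unlike cross-entropy, the GCE term is bounded above by 1/q, which limits how much a single confidently mislabeled pixel can contribute to the gradient.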
Temperature Scaling
To calibrate prediction probabilities, temperature scaling p_calibrated = σ(z/T) was applied as a post-hoc step where T is optimized on the validation set. The optimal T* = 1.287 > 1 indicates the uncalibrated model is overconfident — a common finding in deep neural networks. Temperature scaling reduces Expected Calibration Error (ECE) from 0.142 to 0.020 (−86.1%) and Brier Score from 0.089 to 0.061 (−31.2%).
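The calibration step can be illustrated on synthetic data. This sketch fits T by a simple grid search over the validation negative log-likelihood and measures ECE by binning the predicted fire probability; the source does not state which optimizer or ECE variant was used, so both choices, the bin count, and the synthetic logits are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def binary_nll(logits, labels, temperature):
    """Mean negative log-likelihood of sigma(z / T) against binary labels."""
    p = np.clip(sigmoid(logits / temperature), 1e-7, 1.0 - 1e-7)
    return float(-np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p)))


def fit_temperature(logits, labels, t_min=0.5, t_max=3.0, step=0.01):
    """Pick the T on a grid that minimizes validation NLL."""
    grid = np.arange(t_min, t_max + step, step)
    losses = [binary_nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])


def expected_calibration_error(probs, labels, n_bins=15):
    """Weighted mean gap between predicted and observed fire frequency."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_index = np.digitize(probs, edges[1:-1])  # values in 0 .. n_bins - 1
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_index == b
        if in_bin.any():
            gap = abs(probs[in_bin].mean() - labels[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)


# Synthetic overconfident model: its logits are 1.5x the true log-odds.
rng = np.random.default_rng(0)
true_logits = rng.normal(0.0, 2.0, size=20000)
val_labels = (rng.random(20000) < sigmoid(true_logits)).astype(float)
val_logits = 1.5 * true_logits

T = fit_temperature(val_logits, val_labels)
ece_before = expected_calibration_error(sigmoid(val_logits), val_labels)
ece_after = expected_calibration_error(sigmoid(val_logits / T), val_labels)
print(T, ece_before, ece_after)
```

Dividing logits by a positive T never moves a prediction across the 0.5 threshold, so this step changes confidence values without changing which pixels are classified as fire at that threshold.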
Results
Table V — Ablation Study: Contribution of Each Component
| Configuration | IoU | Dice | Cumulative Gain |
|---|---|---|---|
| Baseline U-Net (RGB) | 0.584 | 0.737 | — |
| + ResNet34 Encoder | 0.621 | 0.766 | +6.3% |
| + 10-Band Input | 0.658 | 0.794 | +12.7% |
| + Soft Labels | 0.680 | 0.810 | +16.4% |
| + CBAM Attention | 0.706 | 0.828 | +20.9% |
| + Temperature Scaling | 0.696 | 0.828 | +19.2% |
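The Cumulative Gain column is the relative IoU change with respect to the RGB baseline. The short check below recomputes it from the IoU values in Table V; the variable and function names are illustrative.

```python
# IoU values copied from Table V, in ablation order.
ablation_iou = [
    ("Baseline U-Net (RGB)", 0.584),
    ("+ ResNet34 Encoder", 0.621),
    ("+ 10-Band Input", 0.658),
    ("+ Soft Labels", 0.680),
    ("+ CBAM Attention", 0.706),
    ("+ Temperature Scaling", 0.696),
]


def cumulative_gains(rows):
    """Relative IoU gain (%) of each configuration over the first row."""
    baseline = rows[0][1]
    return [(name, round(100.0 * (iou - baseline) / baseline, 1))
            for name, iou in rows[1:]]


gains = cumulative_gains(ablation_iou)
for name, gain in gains:
    print(f"{name}: +{gain}%")
```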
Table VII — Comparison with Classical Fire Detection Algorithms
| Method | IoU | Precision | Recall | F1 |
|---|---|---|---|---|
| Schroeder | 0.412 | 0.523 | 0.687 | 0.594 |
| Murphy | 0.445 | 0.556 | 0.712 | 0.624 |
| Kumar-Roy | 0.478 | 0.589 | 0.734 | 0.654 |
| FireSense (Ours) | 0.696 | 0.834 | 0.867 | 0.850 |
| Improvement (vs. Kumar-Roy) | +45.6% | +41.6% | +18.1% | +30.0% |
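The metrics in Table VII follow from pixel-level confusion counts. The sketch below computes them for a single pair of binary masks; how the reported scores were aggregated across patches is not stated in the source, and returning 0.0 for empty denominators is a convention assumed here. The last lines check that the reported F1 agrees with the reported precision and recall.

```python
import numpy as np


def segmentation_metrics(pred, truth):
    """IoU, precision, recall and F1 for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))    # fire predicted and present
    fp = int(np.sum(pred & ~truth))   # fire predicted but absent
    fn = int(np.sum(~pred & truth))   # fire missed

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "iou": ratio(tp, tp + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2.0 * precision * recall, precision + recall),
    }


# Toy masks: two true positives, one false positive, one false negative.
scores = segmentation_metrics([[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]])
print(scores)

# Consistency check on the FireSense row of Table VII.
f1_from_table = 2 * 0.834 * 0.867 / (0.834 + 0.867)
print(round(f1_from_table, 3))
```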
Confounder Robustness
The attention mechanism suppresses common confounders by focusing on the thermal bands that distinguish true fires from false positives. On challenging subsets, false positives fell by 42% under cloud cover, 55% near industrial heat sources, 48% under sun glint, and 40% over bare soil, for an average false-positive reduction of 46%.
Regional Performance on Real Satellite Imagery
The model generalizes well across both North American (crown fires) and South American (surface fires) fire regimes. Average improvement over baseline: +24.9% IoU. Sample 1 (North America): +27.0%, Sample 3 (South America): +21.5%. The model correctly delineates fire boundaries with high IoU (74.8%) despite complex terrain and smoke obscuration.
Conclusion
FireSense demonstrates that combining multi-spectral input, attention mechanisms, soft labels, and calibration yields substantial improvements over both classical algorithms and baseline deep learning approaches. The learned CBAM attention weights align with physical intuition about fire detection: SWIR bands (1.6–2.2 μm) respond to radiation emitted by hot, actively burning fire fronts, while TIR bands (10.9–12.0 μm) detect longer-wavelength thermal emission. The model discovers this importance hierarchy without explicit supervision, validating the attention mechanism's utility.
Future work will explore multi-temporal analysis to leverage fire spread dynamics, active learning to reduce annotation requirements, and transfer to other satellite sensors (Sentinel-2, MODIS) for global coverage.
Tech Stack