Deep Learning · SJSU · DATA 255 · 2025

FireSense: Multispectral Fire Detection with Channel Attention and Probabilistic Calibration

Basanth Periyapatna Roopa Kumar, Nischitha Nagendran, Nandhakumar Apparsamy, Nitya Rondla

Department of Applied Data Science · San Jose State University · DATA 255: Deep Learning

Abstract

We present a deep learning framework for automated wildfire detection in Landsat-8 satellite imagery that addresses three fundamental challenges: extreme class imbalance, annotation uncertainty, and prediction reliability. Our approach combines a ResNet34-UNet architecture with Convolutional Block Attention Modules (CBAM) to exploit ten Landsat-8 spectral bands, so the model can learn how much each band contributes to fire detection.

We introduce soft labels derived from multi-annotator consensus to capture uncertainty in fire boundaries, and apply temperature scaling for calibrated probability estimates. Evaluated on the ActiveFire dataset spanning North and South America (14,815 patches), our method achieves a mean Intersection over Union (IoU) of 69.6%, a 24.9% relative improvement over baseline approaches and a 45.6% relative improvement over the strongest classical fire detection algorithm.

The attention mechanism correctly prioritizes Short-Wave Infrared (SWIR) and Thermal Infrared (TIR) bands, validating the physical intuition that thermal signatures are critical for fire detection.

Mean IoU: 69.6%

Dice Score: 0.828

Precision: 0.834

Recall: 0.867

vs. Classical Baselines: +45.6%

ECE Reduction: 86.1%

Model Architecture

FireSense Model Architecture — ResNet34-UNet + CBAM

Landsat-8 Satellite Imagery

10 spectral bands · 256×256 px · 30m resolution

ResNet34 Encoder (ImageNet pretrained)

Stage 1: 64 ch · Stage 2: 128 ch · Stage 3: 256 ch · Stage 4: 512 ch

First conv layer modified to accept 10-channel input · Skip connections to decoder

CBAM — Convolutional Block Attention Module

Channel Attention: AvgPool + MaxPool → MLP (shared weights) → Sigmoid → Mc(F) · SWIR1/SWIR2/TIR1 receive the highest weights

Spatial Attention: AvgPool + MaxPool → 7×7 Convolution → Sigmoid → Ms(F) · spatial fire region weights

U-Net Decoder (Transposed Convolutions + Skip Connections)

Blocks 4, 3, 2, 1 (each): 3×3 Conv · BN · ReLU · CBAM

Soft Label Generation: multi-annotator consensus · ŷ_soft = ⅓(y_Schroeder + y_Murphy + y_Kumar-Roy)

GCE Loss + Dice Loss: λ=0.5 · q=0.7 noise tolerance

Temperature Scaling: p_cal = σ(z/T) · T*=1.287

Calibrated Predictions: ECE 0.142→0.020 (−86.1%)
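The channel-attention branch of CBAM shown in the architecture above (average and max pooling per channel, a shared MLP, then a sigmoid producing Mc(F)) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's PyTorch module: the function name `channel_attention`, the toy 4-channel input, and the weight matrices `w1` and `w2` are all hypothetical.

```python
import math


def channel_attention(feature_map, w1, w2):
    """Return one attention weight in (0, 1) per channel.

    feature_map: list of C channels, each a list of rows (H x W floats).
    w1: hidden x C matrix, w2: C x hidden matrix (the shared MLP).
    """
    # Global average and max pooling squeeze each channel to one number.
    avg = [sum(map(sum, ch)) / sum(len(row) for row in ch) for ch in feature_map]
    mx = [max(max(row) for row in ch) for ch in feature_map]

    def mlp(vec):
        # The same weights are applied to both pooled descriptors.
        hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    summed = [a + m for a, m in zip(mlp(avg), mlp(mx))]
    return [1.0 / (1.0 + math.exp(-s)) for s in summed]


# Toy input: 4 channels of 2x2 pixels; the last channel is the "hot" one.
features = [
    [[0.1, 0.2], [0.1, 0.0]],
    [[0.0, 0.1], [0.2, 0.1]],
    [[0.3, 0.2], [0.1, 0.2]],
    [[2.0, 3.0], [2.5, 1.5]],
]
w1 = [[0.5, 0.5, 0.5, 0.5], [0.0, 0.0, 0.0, 1.0]]      # hypothetical weights
w2 = [[0.1, 0.0], [0.1, 0.0], [0.1, 0.0], [0.1, 1.0]]  # hypothetical weights
weights = channel_attention(features, w1, w2)
```

In the full module these per-channel weights would rescale the feature map before the spatial-attention step; with the toy weights above, the last channel receives the largest weight.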

Dataset — ActiveFire

Experiments utilize the ActiveFire dataset, comprising Landsat-8 imagery patches from wildfire events across North and South America during the 2020 fire season. Each patch is a 256×256 pixel crop at 30-meter spatial resolution, providing approximately 7.68 km × 7.68 km ground coverage. Geographic coverage spans two continents with distinct fire regimes: North America (45%) — California wildfires, Pacific Northwest forest fires, Canadian boreal fires — and South America (55%) — Amazon deforestation fires and Brazilian cerrado burns.

The dataset includes fire masks from three classical detection algorithms: Schroeder et al. (threshold-based MODIS), Murphy et al. (contextual local statistics), and Kumar-Roy et al. (multi-temporal change detection). These often disagree on fire boundaries, motivating the soft label approach.

Landsat-8 Spectral Bands Used in This Study

Band	Name	Wavelength (μm)	Fire Relevance	CBAM Attention Weight
6	SWIR1	1.61	Very High	0.18
7	SWIR2	2.19	Very High	0.15
10	TIR1	10.9	Very High	0.12
11	TIR2	12.0	High	~0.10
5	NIR	0.87	High	0.15
4	Red	0.66	Medium	Low
1	Coastal	0.44	Low	Very Low
2	Blue	0.49	Low	Very Low
3	Green	0.56	Low	Very Low
9	Cirrus	1.38	Low	Very Low

Methodology

Soft Label Generation

Rather than using binary masks from a single annotator, soft labels encode multi-annotator consensus: ŷ_soft(i,j) = ⅓ ∑ y_a(i,j) for annotators a ∈ {Schroeder, Murphy, Kumar-Roy}. This yields soft labels in {0, 0.33, 0.67, 1.0}, capturing uncertainty at fire boundaries where annotators disagree. The approach reduces overfitting to noisy annotations and produces smoother decision boundaries that generalize better to unseen data.
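As a small worked example of the averaging rule above, the sketch below builds soft labels from three toy binary masks. The function name `soft_labels` and the 2×3 masks are made up for illustration; real masks would be 256×256.

```python
def soft_labels(masks):
    """Average binary masks of equal shape, pixel by pixel."""
    n = len(masks)
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [sum(m[i][j] for m in masks) / n for j in range(width)]
        for i in range(height)
    ]


# Toy masks standing in for the three annotators (1 = fire, 0 = no fire).
schroeder = [[1, 1, 0], [0, 0, 0]]
murphy = [[1, 1, 1], [0, 0, 0]]
kumar_roy = [[1, 0, 0], [0, 1, 0]]

y_soft = soft_labels([schroeder, murphy, kumar_roy])
```

Pixels where all three masks agree come out as exactly 0 or 1, while pixels with one or two votes land at 1/3 or 2/3.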

Effect of Soft Labels on Model Performance

Training	IoU	Dice	Precision	Recall
Hard Labels	0.592	0.744	0.781	0.823
Soft Labels	0.680	0.810	0.834	0.867
Improvement	+14.8%	+8.9%	+6.8%	+5.3%

Loss Function

The training loss combines Generalized Cross-Entropy (GCE) loss, for robustness to label noise, with Dice loss, for handling class imbalance: ℒ = ℒ_GCE + λℒ_Dice, where λ=0.5. GCE is defined as ℒ_GCE(p,y) = (1 − p_y^q) / q, where p_y is the predicted probability of the labeled class and q=0.7 controls noise tolerance (q → 0 recovers standard cross-entropy, while q = 1 behaves like mean absolute error). All models were trained using the AdamW optimizer (lr 5×10⁻⁴, weight decay 10⁻⁴) with cosine annealing over 100 epochs.
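The combined loss can be sketched in plain Python for a flat list of pixels. Two things here are assumptions rather than details from the text: the soft-target form of GCE (each pixel's fire and background terms weighted by the soft label) and the smoothing constant `eps` in the Dice term. The function name `gce_dice_loss` is also hypothetical.

```python
def gce_dice_loss(probs, targets, q=0.7, lam=0.5, eps=1e-6):
    """L = L_GCE + lam * L_Dice over flat lists of pixel values.

    probs: predicted fire probabilities in (0, 1).
    targets: soft labels in [0, 1].
    """
    gce_terms = []
    for p, t in zip(probs, targets):
        fire = (1.0 - p ** q) / q                # GCE if the pixel is fire
        background = (1.0 - (1.0 - p) ** q) / q  # GCE if it is background
        # Assumed soft-label extension: weight both terms by the target.
        gce_terms.append(t * fire + (1.0 - t) * background)
    gce = sum(gce_terms) / len(gce_terms)

    # Soft Dice loss; eps avoids division by zero on empty patches.
    overlap = sum(p * t for p, t in zip(probs, targets))
    dice = 1.0 - (2.0 * overlap + eps) / (sum(probs) + sum(targets) + eps)
    return gce + lam * dice


targets = [1.0, 2 / 3, 0.0, 0.0]
good = gce_dice_loss([0.9, 0.7, 0.1, 0.05], targets)  # predictions near targets
bad = gce_dice_loss([0.2, 0.3, 0.8, 0.9], targets)    # predictions far off
```

Predictions that track the soft targets give a lower loss than predictions that invert them, which is the behavior the optimizer relies on.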

Temperature Scaling

To calibrate prediction probabilities, temperature scaling p_calibrated = σ(z/T) was applied as a post-hoc step, where z is the model's output logit and T is a scalar optimized on the validation set. The optimal T* = 1.287 > 1 indicates that the uncalibrated model is overconfident, a common finding in deep neural networks. Temperature scaling reduces Expected Calibration Error (ECE) from 0.142 to 0.020 (−86.1%) and Brier Score from 0.089 to 0.061 (−31.2%).
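The calibration step can be illustrated with a short stdlib-only sketch. It assumes T is chosen by a grid search over validation negative log-likelihood (the text does not say which optimizer was used) and uses a binary ECE variant that bins pixels by predicted fire probability. The logits, labels, and grid are made-up values, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def calibrate(logits, temperature):
    """Apply p_cal = sigmoid(z / T) to each logit."""
    return [sigmoid(z / temperature) for z in logits]


def nll(logits, labels, temperature):
    """Mean binary negative log-likelihood at a given temperature."""
    total = 0.0
    for p, y in zip(calibrate(logits, temperature), labels):
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def fit_temperature(logits, labels, candidates):
    """Pick the candidate T with the lowest validation NLL (grid search)."""
    return min(candidates, key=lambda t: nll(logits, labels, t))


def expected_calibration_error(probs, labels, n_bins=10):
    """Bin by predicted fire probability; compare mean prob with fire rate."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    ece = 0.0
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            fire_rate = sum(y for _, y in members) / len(members)
            ece += len(members) / len(probs) * abs(mean_p - fire_rate)
    return ece


# Hypothetical validation logits: confident everywhere, wrong twice.
val_logits = [4.0, 3.5, -4.0, -3.0, 3.0, -3.5, 4.0, -4.0]
val_labels = [1, 1, 0, 0, 0, 1, 1, 0]
grid = [0.5 + 0.05 * k for k in range(71)]  # T from 0.5 to 4.0

t_star = fit_temperature(val_logits, val_labels, grid)
ece_before = expected_calibration_error(calibrate(val_logits, 1.0), val_labels)
ece_after = expected_calibration_error(calibrate(val_logits, t_star), val_labels)
```

Because the toy model is confidently wrong on two pixels, the fitted temperature comes out above 1, which softens every probability toward 0.5 without changing which side of 0.5 it falls on.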

Results

Table V — Ablation Study: Contribution of Each Component

Configuration	IoU	Dice	Cumulative IoU Gain vs. Baseline
Baseline U-Net (RGB)	0.584	0.737	—
+ ResNet34 Encoder	0.621	0.766	+6.3%
+ 10-Band Input	0.658	0.794	+12.7%
+ Soft Labels	0.680	0.810	+16.4%
+ CBAM Attention	0.706	0.828	+20.9%
+ Temperature Scaling	0.696	0.828	+19.2%
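The gain column in the ablation table is the relative IoU change against the RGB baseline, which the short check below reproduces from the IoU values. The variable names are illustrative only.

```python
# IoU values copied from the ablation table, in row order.
ablation_iou = [
    ("Baseline U-Net (RGB)", 0.584),
    ("+ ResNet34 Encoder", 0.621),
    ("+ 10-Band Input", 0.658),
    ("+ Soft Labels", 0.680),
    ("+ CBAM Attention", 0.706),
    ("+ Temperature Scaling", 0.696),
]

baseline = ablation_iou[0][1]

# Relative gain in percent: 100 * (IoU / IoU_baseline - 1), one decimal.
gains = {
    name: round(100 * (iou / baseline - 1), 1)
    for name, iou in ablation_iou[1:]
}

for name, gain in gains.items():
    print(f"{name}: +{gain}%")
```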

Table VII — Comparison with Classical Fire Detection Algorithms

Method	IoU	Precision	Recall	F1
Schroeder	0.412	0.523	0.687	0.594
Murphy	0.445	0.556	0.712	0.624
Kumar-Roy	0.478	0.589	0.734	0.654
FireSense (Ours)	0.696	0.834	0.867	0.850
Improvement (vs. Kumar-Roy)	+45.6%	+41.6%	+18.1%	+30.0%
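For reference, the metrics in this table can be computed from a predicted and a reference binary mask as sketched below. The helper name `segmentation_metrics` and the toy masks are hypothetical, and the sketch assumes at least one fire pixel is present in both masks so that no denominator is zero.

```python
def segmentation_metrics(pred, truth):
    """IoU, precision, recall, and F1 for two flat binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)      # fire in both
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)  # false alarm
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)  # missed fire
    return {
        "iou": tp / (tp + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


pred = [1, 1, 1, 0, 0, 1, 0, 0]
truth = [1, 1, 0, 0, 1, 1, 0, 0]
metrics = segmentation_metrics(pred, truth)

# Consistency check on the reported FireSense row: F1 = 2PR / (P + R).
f1_reported = 2 * 0.834 * 0.867 / (0.834 + 0.867)
```

The last line confirms that the reported precision and recall are consistent with the reported F1 of 0.850.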

Confounder Robustness

The attention mechanism suppresses common confounders by focusing on the thermal bands that distinguish true fires from false positives. On challenging subsets, false positives fell by 42% for cloud cover, 55% for industrial heat, 48% for sun glint, and 40% for bare soil, an average reduction of 46%.

Regional Performance on Real Satellite Imagery

The model generalizes well across both North American (crown fires) and South American (surface fires) fire regimes. Average improvement over baseline: +24.9% IoU. Sample 1 (North America): +27.0%, Sample 3 (South America): +21.5%. The model correctly delineates fire boundaries with high IoU (74.8%) despite complex terrain and smoke obscuration.

Conclusion

FireSense demonstrates that combining multispectral input, attention mechanisms, soft labels, and calibration yields substantial improvements over both classical algorithms and baseline deep learning approaches. The learned CBAM attention weights align with physical intuition about fire detection: SWIR bands (1.6–2.2 μm) respond to the intense radiation emitted by actively flaming fronts, while TIR bands (10.9–12.0 μm) detect longer-wavelength thermal emission. The model learns this importance hierarchy without explicit band-level supervision, which supports the attention mechanism's utility.

Future work will explore multi-temporal analysis to leverage fire spread dynamics, active learning to reduce annotation requirements, and transfer to other satellite sensors (Sentinel-2, MODIS) for global coverage.

Tech Stack

PyTorch · ResNet34 · UNet · CBAM · Landsat-8 · ActiveFire Dataset · AdamW · GCE Loss · Dice Loss · Temperature Scaling · Grad-CAM · Apple M4 (MPS)
