Diffusion-Based Low-Light Image Enhancement
with Color and Luminance Priors

Xuanshuo Fu, Lei Kang, Javier Vazquez-Corral
Computer Vision Center (CVC), Universitat Autònoma de Barcelona, Spain
† corresponding author

Abstract

Low-light images often suffer from low contrast, noise, and color distortion, degrading visual quality and impairing downstream vision tasks. We propose a conditional diffusion framework for low-light image enhancement that incorporates a Structured Control Embedding Module (SCEM). SCEM decomposes a low-light image into four informative components—illumination, illumination-invariant features, shadow priors, and color-invariant cues—which serve as control signals to condition a U-Net–based diffusion model trained with a simplified noise-prediction loss. The resulting SCEM-equipped diffusion method enforces structured enhancement guided by physical priors. Our model is trained only on LOLv1 and evaluated without fine-tuning on LOLv2-real, LSRW, DICM, MEF, and LIME, achieving state-of-the-art results on both quantitative and perceptual metrics and strong cross-dataset generalization. Code will be released upon acceptance.

Fig. 1: Quantitative comparisons.

(a) PSNR/SSIM/LPIPS/FID radar chart on LOLv1, LOLv2-real, and LSRW (with ground truth).

(b) NIQE/BRISQUE/PI radar chart on DICM, MEF, and LIME (no ground truth).

Fig. 1 reports both fidelity metrics (PSNR/SSIM) and perceptual metrics (LPIPS/FID, NIQE/BRISQUE/PI) across datasets with and without ground truth. Our SCEM-equipped diffusion model remains consistently strong across all axes, highlighting robust generalization beyond the training distribution.

Method

We condition a U-Net diffusion backbone with structured priors extracted by the Structured Control Embedding Module (SCEM). Given a low-light input, SCEM produces four complementary control signals: (1) illumination, (2) illumination-invariant features, (3) shadow priors, and (4) color-invariant cues. These priors provide physically meaningful guidance during denoising, enabling robust enhancement under complex lighting and color shifts.
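The paper does not specify how each prior is computed; the sketch below uses common hand-crafted proxies (Retinex-style max-channel illumination, reflectance, a luminance threshold for shadows, and chromaticity ratios) purely to illustrate how four such control maps could be extracted and stacked into a conditioning tensor. All operators and the threshold value are illustrative assumptions, not the authors' actual SCEM.

```python
import numpy as np

def extract_control_priors(img):
    """Illustrative SCEM-style prior extraction from an RGB image in [0, 1].

    Returns an (H, W, 8) control tensor: 1 illumination channel,
    3 illumination-invariant channels, 1 shadow channel, 3 color channels.
    """
    eps = 1e-6
    # (1) Illumination: per-pixel max over channels (Retinex-style proxy).
    illumination = img.max(axis=-1, keepdims=True)
    # (2) Illumination-invariant features: reflectance = image / illumination.
    reflectance = img / (illumination + eps)
    # (3) Shadow prior: dark regions flagged by thresholding the luminance
    #     (threshold 0.2 is an arbitrary illustrative choice).
    luminance = img @ np.array([0.299, 0.587, 0.114])
    shadow = (luminance < 0.2).astype(np.float32)[..., None]
    # (4) Color-invariant cue: per-pixel chromaticity (channel ratios).
    chroma = img / (img.sum(axis=-1, keepdims=True) + eps)
    # Concatenate along the channel axis to form the control signal.
    return np.concatenate([illumination, reflectance, shadow, chroma], axis=-1)

control = extract_control_priors(np.random.rand(64, 64, 3).astype(np.float32))
print(control.shape)  # (64, 64, 8)
```

In a ControlNet-like setup, such a tensor would be fed through a small encoder and injected into the U-Net's feature maps at each denoising step.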

Fig. 2: Structured Control Embedding Module (SCEM) and diffusion pipeline. SCEM decomposes a low-light image into illumination, illumination-invariant features, shadow priors, and color-invariant cues to condition the diffusion model.

Fig. 2 illustrates the overall pipeline: the SCEM controls are concatenated and injected into the diffusion model to steer the enhancement process with a simplified noise-prediction objective.
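The "simplified noise-prediction objective" referenced above is, in standard DDPM training, the mean-squared error between the sampled Gaussian noise and the network's prediction of it on the noised image. The sketch below illustrates that loss with a linear beta schedule and a placeholder in place of the conditioned U-Net; the schedule values and the zero-predicting stand-in are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear beta schedule and cumulative alpha-bar, as in standard DDPM training.
T = 1000
betas = np.linspace(1e-4, 2e-2, T)
alpha_bar = np.cumprod(1.0 - betas)

def simple_loss(x0, eps_pred_fn, t, rng):
    """Simplified noise-prediction objective:
    MSE between sampled noise eps and the model's prediction of it,
    evaluated on x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
    `eps_pred_fn` stands in for the prior-conditioned U-Net.
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return np.mean((eps - eps_pred_fn(x_t, t)) ** 2)

# Placeholder "network" that predicts zero noise, for illustration only:
# the loss then reduces to the mean squared norm of the sampled noise (~1).
loss = simple_loss(rng.standard_normal((8, 8, 3)),
                   lambda x, t: np.zeros_like(x), t=500, rng=rng)
print(loss > 0)
```

In the actual method, `eps_pred_fn` would be the U-Net taking the SCEM control tensor as an additional conditioning input.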

Results

We train only on LOLv1 and evaluate without fine-tuning on LOLv2-real, LSRW, DICM, MEF, and LIME. Fig. 1 summarizes quantitative comparisons across datasets with and without ground truth, and Fig. 3 shows qualitative results. The proposed method achieves state-of-the-art performance on both fidelity (PSNR/SSIM) and perceptual metrics (LPIPS/FID, NIQE/BRISQUE/PI), demonstrating strong cross-dataset generalization.

Fig. 3: Qualitative comparison


BibTeX

@inproceedings{fu2026diffusionllie,
  title     = {Diffusion-Based Low-Light Image Enhancement with Color and Luminance Priors},
  author    = {Fu, Xuanshuo and Kang, Lei and Vazquez-Corral, Javier},
  booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
  year      = {2026}
}