CycleISP Paper Reading

CycleISP: Real Image Restoration via Improved Data Synthesis


  1. CNNs perform well on datasets built with additive white Gaussian noise (AWGN) but poorly on real noise, because they do not account for the fact that the ISP transforms the noise so that it no longer matches the assumed Gaussian distribution.
  2. Noise in the RAW sensor space is signal-dependent; after demosaicking, it becomes spatio-chromatically correlated; and after passing through the rest of the pipeline, its probability distribution does not necessarily remain Gaussian. This implies that the camera ISP heavily transforms the sensor noise, so synthesizing realistic noise requires more sophisticated models that account for the imaging pipeline, rather than a uniform AWGN model.

Although CNNs achieve impressive results on these synthetic datasets, they do not perform well when applied to real camera images, as reported in recent benchmark datasets.

This is mainly because AWGN is not adequate for modeling real camera noise, which is signal-dependent and heavily transformed by the camera imaging pipeline.
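To make the difference between AWGN and signal-dependent noise concrete, here is a minimal NumPy sketch. It assumes the common heteroscedastic Gaussian approximation of sensor noise (variance = shot * signal + read); the function names and the numeric values of `sigma`, `shot` and `read` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_awgn(image, sigma=0.05):
    """Signal-independent noise: the same sigma at every pixel (illustrative value)."""
    return image + rng.normal(0.0, sigma, size=image.shape)

def add_signal_dependent_noise(image, shot=0.01, read=0.0005):
    """Heteroscedastic Gaussian approximation: variance = shot * signal + read.

    The shot/read values are assumptions chosen only for this demo.
    """
    variance = shot * image + read
    return image + rng.normal(0.0, 1.0, size=image.shape) * np.sqrt(variance)

# Two flat patches in a linear (RAW-like) intensity range [0, 1].
dark = np.full((256, 256), 0.05)
bright = np.full((256, 256), 0.80)

# Measured noise standard deviation per patch for each noise model.
awgn_std = [float(np.std(add_awgn(p) - p)) for p in (dark, bright)]
sd_std = [float(np.std(add_signal_dependent_noise(p) - p)) for p in (dark, bright)]

print("AWGN std (dark, bright):", awgn_std)
print("signal-dependent std (dark, bright):", sd_std)
```

Under AWGN the two patches show nearly the same noise level, while under the signal-dependent model the bright patch is clearly noisier. A denoiser trained only on the first kind of noise never sees this dependence.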

On synthetic datasets, existing deep learning-based denoising models yield impressive results, but they exhibit poor generalization to real camera data as compared to conventional methods [8, 15]. This trend is also demonstrated in recent benchmarks [1, 44].



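The authors propose using two symmetric neural networks to model the ISP in both directions: RGB2RAW and RAW2RGB. RGB2RAW reconstructs the RAW image corresponding to an sRGB input. Once the RAW image is available, generating training data also requires converting it back to sRGB, which yields sRGB images that carry the injected noise.
Training a denoising network on this synthesized dataset (spatially aware, ISP-like noise) rather than on AWGN (spatially independent noise) lets it achieve good results on real noise.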
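The data flow described above can be sketched as follows. This is only an illustration of the idea under stated assumptions: `toy_rgb2raw` and `toy_raw2rgb` are hypothetical stand-ins (a plain gamma curve and its inverse) for the learned RGB2RAW and RAW2RGB networks, `toy_raw_noise` is an assumed shot/read noise model, and the toy RAW keeps three channels instead of a Bayer mosaic. Using the input sRGB image as the clean sRGB target is also an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_rgb2raw(rgb):
    """Hypothetical stand-in for the learned RGB2RAW network (inverse gamma only)."""
    return np.clip(rgb, 0.0, 1.0) ** 2.2

def toy_raw2rgb(raw):
    """Hypothetical stand-in for the learned RAW2RGB network (gamma only)."""
    return np.clip(raw, 0.0, 1.0) ** (1.0 / 2.2)

def toy_raw_noise(raw, shot=0.01, read=0.0005):
    """Assumed signal-dependent noise injected in RAW space (illustrative values)."""
    std = np.sqrt(shot * raw + read)
    return raw + rng.normal(size=raw.shape) * std

def synthesize_pairs(clean_rgb, rgb2raw, raw2rgb, add_raw_noise):
    """sRGB -> RAW, inject noise in RAW, then map the noisy RAW back to sRGB."""
    raw_clean = rgb2raw(clean_rgb)
    raw_noisy = add_raw_noise(raw_clean)
    rgb_noisy = raw2rgb(raw_noisy)
    return {
        "raw": (raw_noisy, raw_clean),   # pair for a RAW denoiser
        "rgb": (rgb_noisy, clean_rgb),   # pair for an sRGB denoiser
    }

clean = rng.uniform(0.1, 0.9, size=(64, 64, 3))
pairs = synthesize_pairs(clean, toy_rgb2raw, toy_raw2rgb, toy_raw_noise)
rgb_noisy, rgb_clean = pairs["rgb"]
print("mean abs sRGB noise:", float(np.mean(np.abs(rgb_noisy - rgb_clean))))
```

Because the noise is injected before the RAW-to-sRGB mapping, the noise seen in sRGB has already been reshaped by the (here, toy) pipeline, which is the property the synthesized dataset is meant to capture.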



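  • Train the RGB2RAW and RAW2RGB networks independently:
    The MIT-Adobe FiveK dataset is used, which contains 5000 RAW images; the LibRaw library is used to render the sRGB images, forming RAW-sRGB pairs. Of these, 4850 are used for training and 150 for validation.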

    The two networks of CycleISP are independently trained for 1200 epochs with a batch size of 4. The initial learning rate is $10^{-4}$, which is decreased to $10^{-5}$ after 800 epochs.

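  • Fine-tune CycleISP, with both networks fine-tuned jointly:

    1. Joint fine-tuning: on the dataset from the first step, using the joint loss $\mathcal{L}_{joint}$.
    2. Fine-tuning on SIDD. The input is the SIDD clean sRGB image. The noise added is the ground-truth residual provided by SIDD, $RAW_{noisy} - RAW_{clean}$, which is added to the output of RGB2RAW. Passing the result through RAW2RGB gives $\hat{RGB}_{noisy}$, and the loss is computed against the ground-truth noisy image. In short: given a clean SIDD image, the predicted RAW should match the SIDD ground-truth RAW, and after adding the real noise $RAW_{noisy} - RAW_{clean}$, the RAW converted back to sRGB should match the SIDD noisy image. The loss is again the joint loss $\mathcal{L}_{joint}$.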
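  • Train the denoiser:
    The dataset is the 1 million image MIR Flickr dataset, split 90:5:5 for training, validation and testing. Each image is preprocessed with a Gaussian kernel. The training data is then generated with CycleISP. Note that the added noise (whether in RAW or sRGB) is synthetic noise, not the real SIDD noise used during fine-tuning: "We use the same procedure for sampling shot/read noise factors as in" Unprocessing Images for Learned Raw Denoising.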
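For the denoiser's synthetic noise, a shot/read sampling procedure in the style of the Unprocessing work can be sketched as below. This is a NumPy sketch, not the authors' code: the function names are hypothetical, and the numeric constants (the shot-noise range and the linear relation between log shot and log read noise) are written as recalled from the publicly released Unprocessing reference code, so they should be treated as assumptions and checked against that code before use.

```python
import numpy as np

def random_noise_levels(rng):
    """Sample (shot, read) noise factors; all constants are assumptions to verify."""
    log_min_shot = np.log(0.0001)
    log_max_shot = np.log(0.012)
    # Shot noise factor is sampled log-uniformly.
    log_shot = rng.uniform(log_min_shot, log_max_shot)
    shot = np.exp(log_shot)
    # Read noise is tied to shot noise by a line in log space plus Gaussian jitter.
    log_read = 2.18 * log_shot + 1.20 + rng.normal(0.0, 0.26)
    read = np.exp(log_read)
    return float(shot), float(read)

def add_shot_read_noise(raw, shot, read, rng):
    """Add zero-mean Gaussian noise with variance = shot * raw + read.

    Expects a non-negative linear RAW image so the variance stays non-negative.
    """
    variance = raw * shot + read
    return raw + rng.normal(size=raw.shape) * np.sqrt(variance)

rng = np.random.default_rng(0)
raw_clean = rng.uniform(0.0, 1.0, size=(32, 32))
shot, read = random_noise_levels(rng)
raw_noisy = add_shot_read_noise(raw_clean, shot, read, rng)
print("shot:", shot, "read:", read)
```

Sampling a fresh (shot, read) pair per image exposes the denoiser to a range of noise levels instead of a single fixed one.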