Shani Gamrian1, Hila Barel1, Feiran Li1, Masakazu Yoshimura2, Daisuke Iso1
1Sony AI, 2Sony Group Corporation
The official implementation of RAM (Raw Adaptation Module).
Object detection models are typically applied to standard RGB images processed through Image Signal Processing (ISP) pipelines, which are designed to enhance sensor-captured RAW images for human vision. However, these ISP functions can discard information that is critical for computer vision tasks such as object detection. In this work, we introduce the Raw Adaptation Module (RAM), a module designed to replace the traditional ISP, with parameters optimized specifically for RAW object detection. Inspired by the parallel processing mechanisms of the human visual system, RAM departs from existing learned ISP methods by applying multiple ISP functions in parallel rather than sequentially, allowing for a more comprehensive capture of image features. These processed representations are then fused in a specialized module, which dynamically integrates and optimizes the information for the target task. This novel approach not only leverages the full potential of RAW sensor data but also enables task-specific pre-processing, resulting in superior object detection performance. Our approach outperforms RGB-based methods and achieves state-of-the-art results across diverse RAW image datasets under varying lighting conditions and dynamic ranges.
In our paper, we trained and evaluated our method on the following datasets:
- ROD: RAW HDR driving-scene datasets (ROD-Day, ROD-Night).
- NOD: Low-light RAW datasets (NOD-Sony, NOD-Nikon).
- LOD: Long-exposure and short-exposure datasets (LOD-Dark, LOD-Normal).
- PASCALRAW: Daylight RAW dataset.
All RAM experiments were performed on RAW data, which was preprocessed by reshaping it into the RGGB representation and saving it as `.npy` files before training.
The code used for converting the RAW images into numpy files is provided here.
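For reference, a minimal conversion sketch is shown below. This is not the repository's script: it assumes an RGGB Bayer pattern, even sensor dimensions, files that rawpy can decode, and an illustrative function name.

```python
import numpy as np
import rawpy  # assumed here for RAW decoding; the repo's converter may differ

def raw_to_rggb_npy(raw_path, out_path):
    """Pack a Bayer RAW file into a 4-channel RGGB array and save it as .npy."""
    with rawpy.imread(raw_path) as raw:
        bayer = raw.raw_image_visible.astype(np.float32)
    # Split the 2x2 RGGB mosaic into four half-resolution channel planes.
    rggb = np.stack([bayer[0::2, 0::2],   # R
                     bayer[0::2, 1::2],   # G
                     bayer[1::2, 0::2],   # G
                     bayer[1::2, 1::2]],  # B
                    axis=0)               # shape: (4, H/2, W/2)
    np.save(out_path, rggb)

raw_to_rggb_npy('image1.nef', 'image1.npy')
```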
For MMDetection training, please arrange the data as follows:
```
ROD_dataset/
├── images/
│   ├── image1.npy
│   ├── image2.npy
│   └── ...
└── annotations/
    ├── train.json
    ├── val.json
    └── ...
```
Then modify the `data_root` path in the dataset config file to point to the `ROD_dataset` (or any other dataset) directory.
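For illustration, the dataset portion of a config might then look like this (a sketch in MMDetection 3.x style; the keys and paths are placeholders, not the repo's actual config):

```python
data_root = '/path/to/ROD_dataset/'

train_dataloader = dict(
    dataset=dict(
        data_root=data_root,
        ann_file='annotations/train.json',
        data_prefix=dict(img='images/')))

val_dataloader = dict(
    dataset=dict(
        data_root=data_root,
        ann_file='annotations/val.json',
        data_prefix=dict(img='images/')))
```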
We also provide the train/val split annotation files in MMDetection format here.
We evaluate our pipeline on various RAW datasets.
PASCALRAW (12-bit)

| Model | Resolution | Epochs | mAP | mAP50 | mAP75 | Download |
|---|---|---|---|---|---|---|
| Faster R-CNN R50 | 600x400 | 50 | 68.7 | 93.2 | 82.3 | model |
| YOLOX-T | 600x400 | 170 | 73.8 | 95.1 | 87.1 | model |

NOD-Nikon (14-bit)

| Model | Resolution | Epochs | mAP | mAP50 | mAP75 | Download |
|---|---|---|---|---|---|---|
| Faster R-CNN R50 | 600x400 | 30 | 34.2 | 59.4 | 34.2 | model |
| YOLOX-T | 600x400 | 65 | 37.1 | 62.1 | 38.2 | model |

NOD-Sony (14-bit)

| Model | Resolution | Epochs | mAP | mAP50 | mAP75 | Download |
|---|---|---|---|---|---|---|
| Faster R-CNN R50 | 600x400 | 35 | 36.7 | 62.4 | 37.4 | model |
| YOLOX-T | 600x400 | 210 | 39.7 | 63.1 | 41.9 | model |

LOD-Dark (14-bit)

| Model | Resolution | Epochs | mAP | mAP50 | mAP75 | Download |
|---|---|---|---|---|---|---|
| Faster R-CNN R50 | 600x400 | 95 | 38.1 | 61.7 | 40.6 | model |
| YOLOX-T | 600x400 | 130 | 43.9 | 64.8 | 46.8 | model |

ROD - Day & Night (24-bit)

| Model | Resolution | Epochs | Day mAP | Day mAP50 | Night mAP | Night mAP50 | Download |
|---|---|---|---|---|---|---|---|
| Faster R-CNN R50 | 620x400 | 80 | 33.1 | 57.1 | 46.8 | 70.6 | model |
| YOLOX-T | 620x400 | 290 | 30.8 | 46.3 | 52.9 | 78.3 | model |
The code files in this repository can be easily integrated into any codebase by adding them to your project and instantiating the `RawAdaptationModule` class before the backbone:
```python
ram = RawAdaptationModule(functions=['wb', 'ccm', 'gamma', 'brightness'],
                          ffm_params=dict(ffm_type='BN_HG'),
                          clamp_values=True)
```
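As a usage sketch (the tensor shape, value range, and `backbone` call are illustrative assumptions, not requirements documented by the repo), the module is applied to the RAW input before the backbone:

```python
import torch

# Illustrative batch of 4-channel RGGB RAW inputs: (N, C, H, W), values in [0, 1].
raw_batch = torch.rand(2, 4, 400, 600)
processed = ram(raw_batch)      # parallel ISP functions applied and fused by RAM
features = backbone(processed)  # `backbone` stands in for your detector's backbone
```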
The following instructions are specifically for running our code with MMDetection, as reported in the paper.
Unfortunately, due to licensing restrictions, we're unable to share the full code, including the MMDetection files. However, the instructions below will guide you through integrating our code into the MMDetection framework.
For MMDetection installation, please follow the steps below:
- Clone the MMDetection repository and follow their installation instructions.
- Install the PyTorch version (2.0.1) appropriate for your GPU/CUDA, together with the pinned mmcv commit:

```shell
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118
pip install -e git+https://github.com/open-mmlab/mmcv.git@987d34b0cf8d6cd8725258332fcfc8c54529b1ab#egg=mmcv
```
For integration:
- Add `ram.py`, `loading.py`, and the dataset files to your current code.
- Make sure the pre-processing module supports RAW inputs.
- Define RAM on your desired backbone by adding the following code (a config sketch for enabling it follows this list):
```python
def __init__(self,
             ...,
             preproc_transform=None):
    # (... init code)
    if preproc_transform:
        preproc_type = preproc_transform.pop("type")
        if preproc_type == "ram":
            self.preproc_transform = RawAdaptationModule(**preproc_transform)
        else:
            raise ValueError(f"Unknown learnable transform type: {preproc_type}")
    else:
        self.preproc_transform = None

def forward(self, x):
    # Apply RAM to the RAW input before the backbone's own forward pass.
    if self.preproc_transform:
        x = self.preproc_transform(x)
    # (... forward code)
```
- Test the model using the provided pre-trained weights to verify that it's working correctly.
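With the hook above in place, RAM can be enabled from the model config. A minimal sketch, assuming a ResNet-50 backbone and the parameters from the earlier `RawAdaptationModule` example (the exact config layout depends on your MMDetection version and the repo's config files):

```python
model = dict(
    backbone=dict(
        type='ResNet',
        depth=50,
        # Consumed by the __init__ hook shown above; 'ram' selects RawAdaptationModule.
        preproc_transform=dict(
            type='ram',
            functions=['wb', 'ccm', 'gamma', 'brightness'],
            ffm_params=dict(ffm_type='BN_HG'),
            clamp_values=True)))
```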
⭐ If you run into any issues while using the code, feel free to open an issue and we'll do our best to help.
Training

- Make sure the `data_root` in your config file is correct.
- Run, e.g.:

```shell
python tools/train.py configs/faster-rcnn/faster-rcnn-r50-fpn-nod_nikon-ram.py
```
Testing

- Make sure the `data_root` in your config file is correct.
- Download the trained model to a `/home/checkpoint` directory.
- Run, e.g.:

```shell
python tools/test.py configs/faster-rcnn/faster-rcnn-r50-fpn-nod_nikon-ram.py /home/checkpoint/faster_rcnn-nod_nikon-h400-epoch_30.pth
```
Please open an issue or contact Shani Gamrian for any questions.