Domain Matching

A collection of utilities for matching pixelwise features between datasets.

The library, lexset_dataset_bridge, provides image enhancement and manipulation utilities for dataset augmentation and improvement. It has three core functions:

  1. adaptive_histogram_matching utilizes the Adaptive Histogram Equalization technique to adjust the brightness and color of synthetic images, aligning them with real images.

  2. color_transfer_mean harmonizes the colors of synthetic images to match real images by transferring the mean and standard deviation of color values.

  3. transfer_shot_noise adds realistic shot noise, derived from real images, to synthetic images.

All three functions aim to narrow the gap between synthetic and real image datasets, optimizing the former for training machine learning models with improved real-world accuracy.

Installation

The latest version is 1.5.0.

Note: The use of lexset_dataset_bridge is subject to legal terms and conditions which may be found in the license file contained within the package.

pip install lexset-dataset-bridge

Adaptive Histogram Matching

  • Purpose: This function is designed to refine the brightness and color distribution of synthetic images, making them closely resemble real images.

  • Mechanism: The function first converts the images to the HSV color space, which separates image intensity (Value) from color information. It computes the average brightness and color values of both the synthetic and the real images, then uses these averages to derive scaling factors that bring the brightness and color of the synthetic images in line with the real ones. Additionally, the function applies Adaptive Histogram Equalization, which enhances contrast based on local regions rather than the whole image. This gives finer control over contrast adjustments than global equalization; limiting the over-amplification of noise in near-uniform regions is typically handled by capping the local histograms, as the contrast-limited variant (CLAHE) does.

  • Applications: Such a function can be invaluable in scenarios where synthetic images generated by computer graphics or simulations need to be harmonized with real photographs, ensuring a seamless blend of both in datasets.

import lexset_dataset_review as data_review

synthetic_data_dir = "E:/path"
real_data_dir = "E:/path"
output_data_dir = "E:/path"

data_review.adaptive_histogram_matching(synthetic_data_dir, real_data_dir, output_data_dir)
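The mechanism described in this section can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the function names match_hsv_means and equalize_tiles are hypothetical, the inputs are assumed to be HSV float arrays already scaled to [0, 1], and the tile equalization is a crude form of adaptive equalization with no interpolation between tiles and no contrast limit.

```python
import numpy as np


def match_hsv_means(synth_hsv, real_hsv, eps=1e-8):
    """Scale saturation and value of a synthetic image to match real means.

    Hypothetical helper: both inputs are (H, W, 3) HSV arrays in [0, 1].
    Hue is left untouched because it is an angle and does not scale well.
    """
    synth = np.asarray(synth_hsv, dtype=np.float64)
    real = np.asarray(real_hsv, dtype=np.float64)
    out = synth.copy()
    for ch in (1, 2):  # 1 = saturation, 2 = value
        scale = real[..., ch].mean() / (synth[..., ch].mean() + eps)
        out[..., ch] = np.clip(synth[..., ch] * scale, 0.0, 1.0)
    return out


def equalize_tiles(value, tiles=2, bins=256):
    """Histogram-equalize each tile of a [0, 1] value channel on its own.

    Assumes the image is at least `tiles` pixels wide and tall.
    """
    value = np.asarray(value, dtype=np.float64)
    out = np.empty_like(value)
    h, w = value.shape
    ys = np.linspace(0, h, tiles + 1, dtype=int)
    xs = np.linspace(0, w, tiles + 1, dtype=int)
    for y0, y1 in zip(ys[:-1], ys[1:]):
        for x0, x1 in zip(xs[:-1], xs[1:]):
            tile = value[y0:y1, x0:x1]
            hist, edges = np.histogram(tile, bins=bins, range=(0.0, 1.0))
            cdf = hist.cumsum() / tile.size  # local cumulative distribution
            mapped = np.interp(tile.ravel(), edges[1:], cdf)
            out[y0:y1, x0:x1] = mapped.reshape(tile.shape)
    return out
```

In this sketch the scaling step would run first, and equalize_tiles would then be applied to the value channel of the result, for example out[..., 2] = equalize_tiles(out[..., 2]).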

Color Transfer (mean)

  • Purpose: The primary goal of this function is to make the color distribution of synthetic images akin to real ones.

  • Mechanism: The function calculates the mean and standard deviation of the colors in both the synthetic and the real images. It then shifts and scales the colors of the synthetic image so that these statistics match those of the real image, which transfers the overall color feel from the real image to the synthetic one. The process is efficient and preserves the texture and content of the synthetic image, altering only its color palette.

  • Applications: This is particularly useful for scenarios like style transfer, where one might want to impose the color characteristics of one image onto another. It can also be used in dataset augmentation to ensure that synthetic datasets are colored in ways consistent with real-world data.

import lexset_dataset_review as data_review

synthetic_data_dir = "E:/path"
real_data_dir = "E:/path"
output_data_dir = "E:/path"

data_review.color_transfer_mean(synthetic_data_dir, real_data_dir, output_data_dir)
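The mean and standard deviation transfer described in this section can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the library's code: transfer_mean_std is a hypothetical name, the statistics are taken per channel in whatever color space the arrays are already in, and the source does not say which color space the library itself uses.

```python
import numpy as np


def transfer_mean_std(synth, real, eps=1e-8, clip=True):
    """Match per-channel mean and standard deviation of synth to real.

    Hypothetical helper: inputs are (H, W, C) arrays with values in 0..255.
    """
    synth = np.asarray(synth, dtype=np.float64)
    real = np.asarray(real, dtype=np.float64)
    s_mean, s_std = synth.mean(axis=(0, 1)), synth.std(axis=(0, 1))
    r_mean, r_std = real.mean(axis=(0, 1)), real.std(axis=(0, 1))
    # Normalize the synthetic channels, then rescale to the real statistics.
    out = (synth - s_mean) / (s_std + eps) * r_std + r_mean
    if clip:
        out = np.clip(out, 0, 255)
    return out
```

Clipping keeps the result in a displayable range but can shift the statistics slightly when many pixels saturate; passing clip=False returns the unclipped values.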

Transfer Shot Noise

  • Purpose: To introduce a touch of realistic noise to synthetic images by borrowing shot noise characteristics from real images.

  • Mechanism: The function first estimates the shot noise present in the real images. Shot noise arises from the discrete nature of electric charge and, in image sensors, of arriving photons; it follows a Poisson distribution, so its variance grows with signal intensity. The estimated noise is then superimposed onto the synthetic images, which thereby inherit some of the natural imperfections of real photographs and appear more lifelike.

  • Applications: In the domain of machine learning, introducing such noise can be beneficial. Models trained on noisy data often generalize better to real-world scenarios as they become accustomed to handling natural imperfections in the data. This function aids in achieving such a training environment, especially when the initial dataset is predominantly synthetic and lacks real-world imperfections.

import lexset_dataset_review as data_review

synthetic_data_dir = "E:/path"
real_data_dir = "E:/path"
output_data_dir = "E:/path"

data_review.transfer_shot_noise(synthetic_data_dir, real_data_dir, output_data_dir)
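One possible way to estimate shot noise from a real image and apply it to a synthetic one is sketched below. The source does not describe how the library performs the estimate, so everything here is an assumption: the helper names box_blur3, estimate_gain, and add_shot_noise are hypothetical, images are assumed to be single-channel arrays in 0..255, and the noise is modeled as Poisson with variance equal to a gain times the mean intensity.

```python
import numpy as np


def box_blur3(img):
    """3x3 mean filter with edge replication for a 2-D float array."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    shifted = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return sum(shifted) / 9.0


def estimate_gain(real_gray):
    """Estimate k in var(noise) ~ k * mean(signal), the Poisson relation."""
    real = np.asarray(real_gray, dtype=np.float64)
    residual = real - box_blur3(real)
    # For independent noise, subtracting a 3x3 mean that includes the pixel
    # itself keeps 8/9 of the noise variance, so rescale by 9/8.
    noise_var = residual.var() * 9.0 / 8.0
    return noise_var / max(real.mean(), 1e-8)


def add_shot_noise(synth_gray, gain, rng=None):
    """Resample each pixel as gain * Poisson(pixel / gain)."""
    synth = np.asarray(synth_gray, dtype=np.float64)
    if gain <= 0:
        return synth.copy()  # no measurable noise to transfer
    rng = np.random.default_rng() if rng is None else rng
    photons = rng.poisson(synth / gain)
    return np.clip(photons * gain, 0, 255)
```

The residual-based estimate treats all fine detail in the real image as noise, so in this sketch it works best on smooth regions; textured images would inflate the estimated gain.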
