Merge Datasets

Combine multiple synthetic datasets into one

Merges multiple COCO (Common Objects in Context) datasets stored in different directories into a single, new dataset. The function randomly samples a given percentage of images (and their associated annotations) from each directory to be included in the merged dataset. The merged dataset will include the images, annotations, and categories from each of the original datasets.
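To make the sampling step concrete, here is a minimal sketch of how this kind of merge could work on COCO data already loaded into Python dictionaries. It is not the Lexset implementation. The helper name `merge_coco_dicts`, rounding each sample size down, matching categories by name, and renumbering image and annotation IDs to avoid collisions are all assumptions made for illustration. Copying image files is left out.

```python
import random
from typing import Any, Dict, List, Optional

CocoDict = Dict[str, Any]


def merge_coco_dicts(
    datasets: List[CocoDict],
    percentages: List[float],
    seed: Optional[int] = None,
) -> CocoDict:
    """Merge COCO-style dicts, keeping a random percentage of images from each.

    Hypothetical helper for illustration only; not the Lexset implementation.
    """
    if len(datasets) != len(percentages):
        raise ValueError("datasets and percentages must have the same length")

    rng = random.Random(seed)
    merged: CocoDict = {"images": [], "annotations": [], "categories": []}
    category_ids: Dict[str, int] = {}  # category name -> id in the merged dataset
    next_image_id = 1
    next_ann_id = 1

    for data, pct in zip(datasets, percentages):
        # Assumption: categories with the same name are the same class.
        local_cat: Dict[int, int] = {}
        for cat in data["categories"]:
            if cat["name"] not in category_ids:
                category_ids[cat["name"]] = len(category_ids) + 1
                merged["categories"].append({**cat, "id": category_ids[cat["name"]]})
            local_cat[cat["id"]] = category_ids[cat["name"]]

        # Randomly sample pct% of this dataset's images (rounded down; an assumption).
        k = int(len(data["images"]) * pct / 100)
        sampled = rng.sample(data["images"], k)

        # Renumber sampled images so IDs from different datasets cannot collide.
        local_img: Dict[int, int] = {}
        for img in sampled:
            local_img[img["id"]] = next_image_id
            merged["images"].append({**img, "id": next_image_id})
            next_image_id += 1

        # Keep only annotations of sampled images, remapping their IDs.
        for ann in data["annotations"]:
            if ann["image_id"] in local_img:
                merged["annotations"].append({
                    **ann,
                    "id": next_ann_id,
                    "image_id": local_img[ann["image_id"]],
                    "category_id": local_cat[ann["category_id"]],
                })
                next_ann_id += 1

    return merged
```

With percentages of [50, 60], a first dataset of 4 images and a second of 10 would contribute 2 and 6 images respectively under this sketch's rounding rule.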

Arguments:

  • json_dirs (type: list of strings)

    • Description: A list of directory paths where each directory contains a COCO-formatted JSON annotation file named "annotations.json" and associated image files.

    • Example: ["path/to/dataset1", "path/to/dataset2"]

  • percentages (type: list of integers or floats)

    • Description: A list of percentages specifying how much data to keep from each directory in json_dirs. The list should have the same length as json_dirs, and each value should be between 0 and 100.

    • Example: [50, 60] (This will keep 50% of the data from the first directory and 60% from the second.)

  • output_json_path (type: string)

    • Description: The full file path, including the file name, where the merged COCO JSON annotation file will be saved.

    • Example: "path/to/merged/annotations.json"

  • output_img_dir (type: string)

    • Description: The directory where the merged image files will be saved. If the directory does not exist, it will be created.

    • Example: "path/to/merged/images"
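The argument rules above can be checked before a merge is started. The sketch below is a hypothetical helper (`validate_merge_args` is not part of the Lexset package). It assumes each input directory's annotation file is named exactly "annotations.json" and treats 0 and 100 as valid percentages.

```python
import os
from typing import List


def validate_merge_args(
    json_dirs: List[str],
    percentages: List[float],
    output_img_dir: str,
) -> None:
    """Check merge_datasets-style arguments before running a merge.

    Hypothetical helper; the checks mirror the argument descriptions above.
    """
    # One percentage is expected per input directory.
    if len(json_dirs) != len(percentages):
        raise ValueError(
            f"got {len(json_dirs)} directories but {len(percentages)} percentages"
        )
    # Each percentage is expected to lie between 0 and 100.
    for pct in percentages:
        if not 0 <= pct <= 100:
            raise ValueError(f"percentage {pct} is outside 0-100")
    # Each directory is expected to contain an annotations.json file.
    for d in json_dirs:
        ann_path = os.path.join(d, "annotations.json")
        if not os.path.isfile(ann_path):
            raise FileNotFoundError(f"missing COCO file: {ann_path}")
    # Create the image output directory if it does not exist yet.
    os.makedirs(output_img_dir, exist_ok=True)
```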

Example usage:

```python
from lexset.LexsetManager import merge_datasets

# Define the directories containing your COCO JSON files and images
# (add as many as you like)
json_dirs = ["D:/<path 1>", "D:/<path 2>"]

# Define the percentage of data to keep from each directory
percentages = [50, 60]  # 50% from the first directory, 60% from the second

# Define the output JSON file path and the output image directory
output_json_path = "D:/<path 3>/coco_annotations.json"
output_img_dir = "D:/<path 3>/"

# Merge the datasets
merge_datasets(json_dirs, percentages, output_json_path, output_img_dir)
```
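After the merge finishes, it can be worth sanity-checking the output file. The following hypothetical helper, `summarize_coco`, uses only the Python standard library and assumes the output follows the standard COCO layout with top-level "images", "annotations", and "categories" lists. It reports basic counts and flags duplicate image IDs or annotations that point to missing images or categories.

```python
import json
from collections import Counter
from typing import Dict


def summarize_coco(path: str) -> Dict[str, int]:
    """Load a COCO JSON file and count images, annotations, and broken references."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

    image_ids = Counter(img["id"] for img in data["images"])
    category_ids = {cat["id"] for cat in data["categories"]}

    return {
        "images": len(data["images"]),
        "annotations": len(data["annotations"]),
        "categories": len(data["categories"]),
        # Number of image IDs that appear more than once.
        "duplicate_image_ids": sum(1 for n in image_ids.values() if n > 1),
        # Annotations whose image_id has no matching image entry.
        "orphan_annotations": sum(
            1 for a in data["annotations"] if a["image_id"] not in image_ids
        ),
        # Annotations whose category_id has no matching category entry.
        "unknown_category_annotations": sum(
            1 for a in data["annotations"] if a["category_id"] not in category_ids
        ),
    }
```

For the example above, this would be called as `summarize_coco("D:/<path 3>/coco_annotations.json")`. Nonzero values in the last three fields would suggest that IDs were not remapped consistently during the merge.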
