Combine multiple synthetic datasets into one
Merges multiple COCO (Common Objects in Context) datasets stored in different directories into a single new dataset. The function randomly samples a given percentage of images, along with their associated annotations, from each directory. The merged dataset contains the sampled images and their annotations, together with the categories from each of the original datasets.
json_dirs
(type: list of strings)
Description: A list of directory paths where each directory contains a COCO-formatted JSON annotation file named "annotations.json" and associated image files.
Example: ["path/to/dataset1", "path/to/dataset2"]
percentages
(type: list of integers or floats)
Description: A list of percentages specifying how much data to keep from each directory in json_dirs. The list length should match the length of json_dirs. Each percentage is between 0 and 100.
Example: [50, 60]
(This will keep 50% of the data from the first directory and 60% from the second.)
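The per-directory sampling described above can be sketched as follows. This is a minimal illustration, not the function's actual implementation: the helper name sample_coco_subset, the seed parameter, and the choice to round the image count down are all assumptions.

```python
import random


def sample_coco_subset(coco, percentage, seed=None):
    """Keep roughly `percentage` percent of the images in a loaded COCO dict,
    plus every annotation that belongs to a kept image.

    Hypothetical helper: the name, the `seed` argument, and rounding down
    are assumptions made for this sketch.
    """
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")

    rng = random.Random(seed)
    images = coco.get("images", [])

    # Number of images to keep, rounded down (assumption).
    keep_count = int(len(images) * percentage / 100)
    kept_images = rng.sample(images, keep_count)

    # Annotations reference their image through "image_id" in the COCO format.
    kept_ids = {img["id"] for img in kept_images}
    kept_annotations = [
        ann for ann in coco.get("annotations", []) if ann["image_id"] in kept_ids
    ]
    return kept_images, kept_annotations
```

With percentages set to [50, 60], a helper like this would be called once per directory, with 50 for the first and 60 for the second.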
output_json_path
(type: string)
Description: The full path, including the file name, where the merged JSON annotation file will be saved. The file is expected to be named "annotations.json", as in the example below.
Example: "path/to/merged/annotations.json"
output_img_dir
(type: string)
Description: The directory where the merged image files will be saved. If the directory does not exist, it will be created.
Example: "path/to/merged/images"
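To show how the four parameters fit together, here is a hedged sketch of a merge routine. The function name merge_coco_datasets, the seed argument, matching categories by name, renumbering image and annotation ids, and prefixing copied file names with the dataset index are all assumptions made for illustration; the real function may handle id and file-name collisions differently.

```python
import json
import os
import random
import shutil


def merge_coco_datasets(json_dirs, percentages, output_json_path, output_img_dir, seed=None):
    """Hypothetical sketch: sample and merge several COCO datasets."""
    if len(json_dirs) != len(percentages):
        raise ValueError("json_dirs and percentages must have the same length")

    rng = random.Random(seed)
    os.makedirs(output_img_dir, exist_ok=True)
    json_parent = os.path.dirname(output_json_path)
    if json_parent:
        os.makedirs(json_parent, exist_ok=True)

    merged = {"images": [], "annotations": [], "categories": []}
    category_ids = {}  # category name -> id in the merged dataset
    next_image_id = 1
    next_annotation_id = 1

    for index, (json_dir, percentage) in enumerate(zip(json_dirs, percentages)):
        if not 0 <= percentage <= 100:
            raise ValueError("each percentage must be between 0 and 100")
        with open(os.path.join(json_dir, "annotations.json")) as f:
            coco = json.load(f)

        # Map this dataset's category ids to merged ids, matching by name (assumption).
        local_to_merged = {}
        for cat in coco.get("categories", []):
            if cat["name"] not in category_ids:
                category_ids[cat["name"]] = len(category_ids) + 1
                merged["categories"].append({**cat, "id": category_ids[cat["name"]]})
            local_to_merged[cat["id"]] = category_ids[cat["name"]]

        # Randomly keep the requested share of images, rounded down (assumption).
        images = coco.get("images", [])
        kept = rng.sample(images, int(len(images) * percentage / 100))

        image_id_map = {}  # original image id -> new id in the merged dataset
        for img in kept:
            # Prefix with the dataset index to avoid file-name clashes (assumption).
            new_name = f"{index}_{os.path.basename(img['file_name'])}"
            shutil.copy(
                os.path.join(json_dir, img["file_name"]),
                os.path.join(output_img_dir, new_name),
            )
            image_id_map[img["id"]] = next_image_id
            merged["images"].append({**img, "id": next_image_id, "file_name": new_name})
            next_image_id += 1

        # Keep only annotations whose image was sampled, with remapped ids.
        for ann in coco.get("annotations", []):
            if ann["image_id"] in image_id_map:
                merged["annotations"].append({
                    **ann,
                    "id": next_annotation_id,
                    "image_id": image_id_map[ann["image_id"]],
                    "category_id": local_to_merged[ann["category_id"]],
                })
                next_annotation_id += 1

    with open(output_json_path, "w") as f:
        json.dump(merged, f)
    return merged
```

A call using the example values from the parameter descriptions would look like merge_coco_datasets(["path/to/dataset1", "path/to/dataset2"], [50, 60], "path/to/merged/annotations.json", "path/to/merged/images"). Renumbering ids matters because separately generated datasets often reuse the same image and annotation ids.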