
# Merge Datasets

Merges multiple COCO (Common Objects in Context) datasets stored in different directories into a single, new dataset. For each directory, the function randomly samples the given percentage of images, together with their associated annotations, for inclusion in the merged dataset. The merged dataset contains the sampled images and their annotations, along with the categories from each of the original datasets.
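To make the sampling behavior concrete, here is a minimal sketch of the merge logic operating on COCO dictionaries that are already loaded into memory. It is not the lexset implementation: `merge_coco_dicts` and its `seed` parameter are hypothetical. The sketch assumes that categories with the same `name` should be unified, renumbers image and annotation IDs so that IDs from different sources cannot collide, rounds each sample size down, and leaves out copying the image files.

```python
from __future__ import annotations

import random
from typing import Any


def merge_coco_dicts(
    datasets: list[dict[str, Any]],
    percentages: list[float],
    seed: int | None = None,
) -> dict[str, Any]:
    """Merge in-memory COCO dicts, keeping a random percentage of images from each.

    Hypothetical helper for illustration only; not the lexset implementation.
    """
    if len(datasets) != len(percentages):
        raise ValueError("datasets and percentages must have the same length")

    rng = random.Random(seed)
    merged: dict[str, Any] = {"images": [], "annotations": [], "categories": []}
    category_ids: dict[str, int] = {}  # category name -> merged category id
    next_image_id = 1
    next_ann_id = 1

    for coco, pct in zip(datasets, percentages):
        if not 0 <= pct <= 100:
            raise ValueError(f"percentage out of range: {pct}")

        # Assumption: categories that share a name are the same class.
        cat_map: dict[int, int] = {}
        for cat in coco.get("categories", []):
            if cat["name"] not in category_ids:
                category_ids[cat["name"]] = len(category_ids) + 1
                merged["categories"].append({**cat, "id": category_ids[cat["name"]]})
            cat_map[cat["id"]] = category_ids[cat["name"]]

        # Randomly sample the requested share of images (rounded down).
        images = coco.get("images", [])
        k = int(len(images) * pct / 100)
        sampled = rng.sample(images, k)

        # Renumber the sampled images so IDs from different sources cannot collide.
        image_map: dict[int, int] = {}
        for img in sampled:
            image_map[img["id"]] = next_image_id
            merged["images"].append({**img, "id": next_image_id})
            next_image_id += 1

        # Keep only annotations that belong to a sampled image, remapping their IDs.
        for ann in coco.get("annotations", []):
            if ann["image_id"] in image_map:
                merged["annotations"].append({
                    **ann,
                    "id": next_ann_id,
                    "image_id": image_map[ann["image_id"]],
                    "category_id": cat_map[ann["category_id"]],
                })
                next_ann_id += 1

    return merged
```

Because image IDs are renumbered, each kept annotation is pointed at the new `image_id` of its image. A full merge would also need to handle two source datasets that use the same image `file_name` when copying files into one output directory.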

### Arguments:

* **`json_dirs`** (type: list of strings)
  * **Description**: A list of directory paths where each directory contains a COCO-formatted JSON annotation file named "annotations.json" and associated image files.
  * **Example**: `["path/to/dataset1", "path/to/dataset2"]`
* **`percentages`** (type: list of integers or floats)
  * **Description**: A list of percentages specifying how much data to keep from each directory in `json_dirs`. The list should have the same length as `json_dirs`, and each percentage should be between 0 and 100.
  * **Example**: `[50, 60]` (This will keep 50% of the data from the first directory and 60% from the second.)
* **`output_json_path`** (type: string)
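  * **Description**: The full path, including the file name, where the merged JSON annotation file will be saved. Naming the file "annotations.json" keeps the output compatible with `json_dirs`, so the merged dataset can itself be used as an input to a later merge.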
  * **Example**: `"path/to/merged/annotations.json"`
* **`output_img_dir`** (type: string)
  * **Description**: The directory where the merged image files will be saved. If the directory does not exist, it will be created.
  * **Example**: `"path/to/merged/images"`
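Because a mismatched list length or a missing "annotations.json" may only surface once the merge runs, a quick pre-flight check can help. The sketch below is a hypothetical helper, `check_merge_args`, which is not part of the lexset package; it only tests the constraints listed above, using the standard library.

```python
from __future__ import annotations

from pathlib import Path


def check_merge_args(json_dirs: list[str], percentages: list[float]) -> list[str]:
    """Return a list of problems with merge_datasets() arguments (empty if none).

    Hypothetical pre-flight helper; it only checks the documented constraints.
    """
    problems = []
    # The two lists should line up one-to-one.
    if len(json_dirs) != len(percentages):
        problems.append(
            f"{len(json_dirs)} directories but {len(percentages)} percentages"
        )
    # Each percentage should be between 0 and 100.
    for pct in percentages:
        if not 0 <= pct <= 100:
            problems.append(f"percentage {pct} is outside 0-100")
    # Each input directory should contain an annotations.json file.
    for d in json_dirs:
        if not (Path(d) / "annotations.json").is_file():
            problems.append(f"{d} has no annotations.json")
    return problems
```

For example, call `check_merge_args(json_dirs, percentages)` before `merge_datasets` and stop if it returns any messages. `output_img_dir` is deliberately not checked, since it is created if it does not exist.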

### Example Use:

```python
from lexset.LexsetManager import merge_datasets

# Define the directories containing your COCO JSON files and images
# add as many as you like
json_dirs = ["D:/<path 1>", "D:/<path 2>"]

# Define the percentage of data to keep from each directory
percentages = [50, 60]  # 50% from the first directory, 60% from the second

# Define paths to output JSON and image directory
output_json_path = "D:/<path 3>/annotations.json"
output_img_dir = "D:/<path 3>/"

# Merge the datasets
merge_datasets(json_dirs, percentages, output_json_path, output_img_dir)
```

