pylabel package

Submodules

pylabel.analyze module

The analyze module analyzes the contents of the dataset and provides summary statistics such as the count of images and classes.

class pylabel.analyze.Analyze(dataset=None)[source]

Bases: object

ShowClassSplits(normalize=True)[source]

Show the distribution of classes across train, val, and test splits of the dataset.

Parameters:

normalize (bool) – Defaults to True. If True, returns the relative frequency of each class as a value between 0 and 1. If False, returns the absolute count of each class.

Returns:

Pandas Dataframe

Examples

>>> dataset.analyze.ShowClassSplits(normalize=True)
cat_name  all  train  test  dev
squirrel  .66  .64    .65   .63
nut       .34  .34    .35   .37
>>> dataset.analyze.ShowClassSplits(normalize=False)
cat_name  all  train  test  dev
squirrel  66   64     65    63
nut       34   34     35    37
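
The effect of normalize can be sketched in plain Python. This is an illustration only, not PyLabel's implementation: the rows list and the class_split_table helper are hypothetical, and the sketch assumes that normalization is done within each column, as the example output above suggests.

```python
from collections import Counter

# Hypothetical annotation rows: (class name, split). Not real PyLabel data.
rows = [
    ("squirrel", "train"), ("squirrel", "train"), ("nut", "train"),
    ("squirrel", "test"), ("nut", "test"),
]

def class_split_table(rows, normalize=True):
    """Return {column: {class: value}} with an 'all' column over every row."""
    columns = {"all": Counter(name for name, _ in rows)}
    for name, split in rows:
        columns.setdefault(split, Counter())[name] += 1
    if not normalize:
        return {col: dict(counts) for col, counts in columns.items()}
    # Divide each count by its column total so that every column sums to 1.
    return {
        col: {name: n / sum(counts.values()) for name, n in counts.items()}
        for col, counts in columns.items()
    }

counts = class_split_table(rows, normalize=False)
shares = class_split_table(rows, normalize=True)
```

Here counts holds absolute counts per split and shares holds the same table as fractions of each column.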
property class_counts

Counts the numbers of instances of each class in the dataset. Uses the Pandas value_counts method and returns a Pandas Series.

Returns:

Pandas Series

Example

>>> dataset.analyze.class_counts
squirrel  50
nut       100
property class_ids

Returns a sorted list of all cat ids in the dataset.

Returns:

List

Example

>>> dataset.analyze.class_ids
[0,1]
property class_name_id_map

Returns a dict where the class name is the key and class id is the value.

Returns:

Dict

Example

>>> dataset.analyze.class_name_id_map
{'Squirrel': 0, 'Nut': 1}
property classes

Returns a list of all category names in the dataset, sorted by cat_id value.

Returns:

List

Example

>>> dataset.analyze.classes
["Squirrel", "Nut"]
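
How the class properties relate to each other can be sketched with a plain dict. This is a hedged illustration, not the library's code; the dict literal is made-up sample data that mirrors the examples in this section.

```python
# Hypothetical name -> id mapping, deliberately written out of id order.
class_name_id_map = {"Nut": 1, "Squirrel": 0}

# Class names ordered by their id value rather than alphabetically.
classes = sorted(class_name_id_map, key=class_name_id_map.get)

# Sorted ids and the number of distinct classes.
class_ids = sorted(class_name_id_map.values())
num_classes = len(class_ids)
```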
property num_classes

Counts the number of unique classes in the dataset.

Returns:

Int

Example

>>> dataset.analyze.num_classes
2
property num_images

Counts the number of images in the dataset.

Returns:

Int

Example

>>> dataset.analyze.num_images
100

pylabel.dataset module

The dataset is the primary object that you will interact with when using PyLabel. All other modules are sub-modules of the dataset object.

class pylabel.dataset.Dataset(df)[source]

Bases: object

ReindexCatIds(cat_id_index=0)[source]

Reindex the values of the cat_id column so that they start from an int (usually 0 or 1) and then increase by one for each category. This is useful if the cat_ids are not contiguous, especially for subsets of a dataset or for several datasets that have been combined. Some models like Yolo require starting from 0 and others like Detectron require starting from 1.

Parameters:

cat_id_index (int) – Defaults to 0. The cat ids will increment sequentially from the cat_id_index value. For example, if there are 10 classes and cat_id_index is 0, then the cat_ids will range from 0 to 9.

Example

>>> dataset.analyze.class_ids
    [1,2,4,5,6,7,8,9,11,12]
>>> dataset.ReindexCatIds(cat_id_index=0)
>>> dataset.analyze.class_ids
    [0,1,2,3,4,5,6,7,8,9]
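
The reindexing idea can be sketched without PyLabel. The reindex_cat_ids helper below is hypothetical and assumes that the new ids follow the sorted order of the old ids; the library's actual ordering may differ.

```python
def reindex_cat_ids(cat_ids, cat_id_index=0):
    """Map each distinct id to a contiguous id starting at cat_id_index."""
    distinct = sorted(set(cat_ids))
    mapping = {old: new for new, old in enumerate(distinct, start=cat_id_index)}
    return [mapping[c] for c in cat_ids], mapping

old_ids = [1, 2, 4, 5, 6, 7, 8, 9, 11, 12]
new_ids, mapping = reindex_cat_ids(old_ids, cat_id_index=0)

# Starting from 1 instead, for models that expect 1-based ids.
one_based, _ = reindex_cat_ids([5, 5, 9], cat_id_index=1)
```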
analyze

See pylabel.analyze module.

df

The dataframe where annotations are stored. This dataframe can be read directly to query the contents of the dataset. You can also edit this dataframe to filter records or edit the annotations directly.

Example

>>> dataset.df
Type:

Pandas Dataframe

export

See pylabel.export module.

labeler

See pylabel.labeler module.

name

Default is ‘dataset’. A friendly name for your dataset that is used as part of the filename(s) when exporting annotation files.

Type:

string

path_to_annotations

Default is ‘’. The path to the annotation files associated with the dataset. When importing, this will be the path to the directory where the annotations are stored. By default, annotations will be exported to the same directory. Changing this value will change where the annotations are exported to.

Type:

string

splitter

See pylabel.splitter module.

visualize

See pylabel.visualize module.

pylabel.exporter module

PyLabel currently supports exporting annotations in COCO, YOLO, and PASCAL VOC formats.

class pylabel.exporter.Export(dataset=None)[source]

Bases: object

ExportToCoco(output_path=None, cat_id_index=None)[source]

Writes COCO annotation files to disk (in JSON format) and returns the paths to the files.

Parameters:
  • output_path (str) – This is where the annotation files will be written. If not specified, then the path will be derived from the path_to_annotations and name properties of the dataset object.

  • cat_id_index (int) – Reindex the cat_id values so that they start from an int (usually 0 or 1) and then increase by one for each category. This is useful if the cat_ids are not contiguous in the original dataset. Some models like Yolo require starting from 0 and others like Detectron require starting from 1.

Returns:

A list with 1 or more paths (strings) to annotations files.

Example

>>> dataset.export.ExportToCoco()
['data/labels/dataset.json']
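
For orientation, the general shape of a COCO annotation file can be sketched with the standard library. This is a minimal hand-written example of the COCO layout, not output produced by PyLabel; the file name, ids, and box values are hypothetical.

```python
import json

coco = {
    "images": [
        {"id": 0, "file_name": "frame_0002.jpg", "width": 640, "height": 480},
    ],
    "categories": [
        {"id": 1, "name": "squirrel"},
        {"id": 2, "name": "nut"},
    ],
    "annotations": [
        # bbox is [x_min, y_min, width, height] in pixels.
        {"id": 0, "image_id": 0, "category_id": 1,
         "bbox": [100.0, 120.0, 50.0, 40.0], "area": 2000.0, "iscrowd": 0},
    ],
}

# Serialize to JSON text and read it back to confirm nothing is lost.
text = json.dumps(coco, indent=2)
round_trip = json.loads(text)
```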
ExportToVoc(output_path=None, segmented_=False, path_=False, database_=False, folder_=False, occluded_=False)[source]

Writes annotation files to disk in VOC XML format and returns the paths to the files.

By default, tags with empty values will not be included in the XML output. You can optionally choose to include them if they are required for your solution.

Parameters:
  • output_path (str) – This is where the annotation files will be written. If not specified, then the path will be derived from the .path_to_annotations and .name properties of the dataset object.

  • segmented_ (bool) – Defaults to False. Set to True to include this field in the XML schema of the output files.

  • path_ (bool) – Defaults to False. Set to True to include this field in the XML schema of the output files.

  • database_ (bool) – Defaults to False. Set to True to include this field in the XML schema of the output files.

  • folder_ (bool) – Defaults to False. Set to True to include this field in the XML schema of the output files.

  • occluded_ (bool) – Defaults to False. Set to True to include this field in the XML schema of the output files.

Returns:

A list with 1 or more paths (strings) to annotations files.

Example

>>> dataset.export.ExportToVoc()
['data/voc_annotations/000000000322.xml', ...]
ExportToYoloV5(output_path='training/labels', yaml_file='dataset.yaml', copy_images=False, use_splits=False, cat_id_index=None, segmentation=False, keypoints=False)[source]

Writes annotation files to disk in YOLOv5 format and returns the paths to files.

Parameters:
  • output_path (str) – This is where the annotation files will be written. If not specified, then the path will be derived from the .path_to_annotations and .name properties of the dataset object. If you are exporting images to train a model, the recommended path to use is ‘training/labels’.

  • yaml_file (str) – If a file name (string) is provided, a YOLOv5 YAML file will be created with entries for the files and classes in this dataset. It will be created in the parent of the output_path directory. The recommended name for the YAML file is ‘dataset.yaml’.

  • copy_images (boolean) – If True, then the annotated images will be copied into a directory named ‘images’ next to the labels directory. This will prepare your labels and images to be used as inputs to train a YOLOv5 model.

  • use_splits (boolean) – If True, then the images and annotations will be moved into directories based on the values in the split column. For example, if a row has the value split = “train” then the annotations for that row will be moved to directory /train. If a YAML file is specified, then the YAML file will use the splits to specify the folders used for the train, val, and test datasets.

  • cat_id_index (int) – Reindex the cat_id values so that they start from an int (usually 0 or 1) and then increase by one for each category. This is useful if the cat_ids are not contiguous in the original dataset. Yolo requires the set of annotations to start at 0 when training a model.

  • segmentation (boolean) – If true, then segmentation annotations will be exported instead of bounding box annotations. If there are no segmentation annotations, then the exported annotations will be empty.

  • keypoints (boolean) – If true, then keypoint annotations will be exported as well as bounding box annotations. It is not possible to export both segmentation and keypoint annotations at the same time in YOLO format. Each bounding box within a dataset should have the same number of keypoints defined, e.g. 17 for COCO. Keypoints are a triplet of (x, y, visibility); see e.g. https://cocodataset.org/#format-data. If some images have no keypoint annotations, then the bounding boxes will be followed by a series of delimiting spaces. If some bounding boxes within an image have no keypoint annotations, those keypoints will be a series of zeroes, denoting x=0, y=0, visibility=0.

Returns:

A list with 1 or more paths (strings) to annotations files. If a YAML file is created then the first item in the list will be the path to the YAML file.

Examples

>>> dataset.export.ExportToYoloV5(output_path='training/labels',
...     yaml_file='dataset.yaml', cat_id_index=0)
['training/dataset.yaml', 'training/labels/frame_0002.txt', ...]
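
The layout of a single YOLO label line, including the zero-filled keypoints described above, can be sketched as follows. The yolo_line helper is hypothetical and is not PyLabel's exporter. It assumes the common YOLO convention of a class id followed by a normalized center-based box, and it assumes that keypoint coordinates are normalized in the same way.

```python
def yolo_line(cat_id, bbox, img_w, img_h, keypoints=None, num_keypoints=0):
    """Format one label line; bbox is (x_min, y_min, width, height) in pixels."""
    x, y, w, h = bbox
    # Convert the corner-based pixel box to a normalized center-based box.
    values = [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]
    fields = [str(cat_id)] + [f"{v:.6f}" for v in values]
    if num_keypoints:
        # A box without keypoints gets zeros: x=0, y=0, visibility=0.
        kps = keypoints or [(0, 0, 0)] * num_keypoints
        for kx, ky, vis in kps:
            fields += [f"{kx / img_w:.6f}", f"{ky / img_h:.6f}", str(vis)]
    return " ".join(fields)

box_only = yolo_line(0, (100, 120, 50, 40), img_w=640, img_h=480)
with_kps = yolo_line(0, (100, 120, 50, 40), 640, 480,
                     keypoints=[(125, 140, 2)], num_keypoints=1)
no_kps = yolo_line(0, (100, 120, 50, 40), 640, 480, num_keypoints=2)
```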

pylabel.importer module

This module includes the commands to import an existing dataset. PyLabel currently supports importing labels from COCO, YOLO, and VOC formats. You can also import a set of images that do not have labels yet and label them manually using the PyLabel labelling tool.

pylabel.importer.ImportCoco(path, path_to_images=None, name=None, encoding='utf-8')[source]

This function takes the path to a JSON file in COCO format as input. It returns a PyLabel dataset object that contains the annotations.

Returns:

PyLabel dataset object.

Parameters:
  • path (str) – The path to the JSON file with the COCO annotations.

  • path_to_images (str) – The path to the images relative to the json file. If the images are in the same directory as the JSON file then omit this parameter. If the images are in a different directory on the same level as the annotations then you would set path_to_images=’../images/’

  • name (str) – This will set the dataset.name property for this dataset. If not specified, the filename (without extension) of the COCO annotation file will be used as the dataset name.

  • encoding (str) – Default is ‘utf-8’. Encoding of the annotations file(s).

Example

>>> from pylabel import importer
>>> dataset = importer.ImportCoco("coco_annotations.json")
pylabel.importer.ImportImagesOnly(path, name='dataset')[source]

Import a directory of images as a dataset with no annotations. Then use PyLabel to annotate the images. Will import images with these extensions: (‘.png’, ‘.jpg’, ‘.jpeg’, ‘.tiff’, ‘.bmp’, ‘.gif’)

Parameters:
  • path (str) – The path to the directory with the images.

  • name (str) – Default is ‘dataset’. Descriptive name, which is used when outputting files.

Returns:

A dataset object with one row for each image and no annotations.

Example

>>> from pylabel import importer
>>> dataset = importer.ImportImagesOnly(path="images/")
pylabel.importer.ImportVOC(path, path_to_images=None, name='dataset', encoding='utf-8')[source]

Provide the path to a directory with annotations in VOC Pascal XML format and it returns a PyLabel dataset object that contains the annotations.

Returns:

PyLabel dataset object.

Parameters:
  • path (str) – The path to the directory with the annotations in VOC Pascal XML format.

  • path_to_images (str) – The path to the images relative to the annotations. If the images are in the same directory as the annotation files then omit this parameter. If the images are in a different directory on the same level as the annotations then you would set path_to_images=’../images/’

  • name (str) – Default is ‘dataset’. This will set the dataset.name property for this dataset.

  • encoding (str) – Default is ‘utf-8’. Encoding of the annotations file(s).

Example

>>> from pylabel import importer
>>> dataset = importer.ImportVOC(path="annotations/", path_to_images="../images/")
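
To show what a VOC annotation file contains, here is a minimal sketch that reads one with the standard library. It is not PyLabel's importer: the parse_voc helper and the sample XML are hypothetical, and the sketch reads only the filename, class name, and box corners.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the Pascal VOC layout; values are made up.
VOC_XML = """<annotation>
  <filename>000000000322.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>squirrel</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>150</xmax><ymax>160</ymax></bndbox>
  </object>
</annotation>"""

def parse_voc(xml_text):
    """Return one dict per <object> element in a VOC annotation."""
    root = ET.fromstring(xml_text)
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        corners = tuple(int(box.findtext(tag))
                        for tag in ("xmin", "ymin", "xmax", "ymax"))
        rows.append({
            "img_filename": root.findtext("filename"),
            "cat_name": obj.findtext("name"),
            "bbox": corners,
        })
    return rows

rows = parse_voc(VOC_XML)
```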
pylabel.importer.ImportYoloV5(path, img_ext='jpg,jpeg,png,webp', cat_names=[], path_to_images='', name='dataset', encoding='utf-8')[source]

Provide the path to a directory with annotations in YOLO format and it returns a PyLabel dataset object that contains the annotations. The Yolo format does not store much information about the images, such as the height and width. When you import a Yolo dataset PyLabel will extract this information from the images.

Returns:

PyLabel dataset object.

Parameters:
  • path (str) – The path to the directory with the annotations in YOLO format.

  • img_ext (str, comma separated) – Specify the file extension(s) of the images used in your dataset: .jpeg, .png, etc. This is required because the YOLO format does not store the filename of the images. It could be any of the image formats supported by YoloV5. PyLabel will iterate through the file extensions specified until it finds a match.

  • cat_names (list) – YOLO annotations only store a class number, not the name. You can provide a list of class names whose positions correspond to the int used to represent each class in the annotations. For example [‘Squirrel’, ‘Nut’]. If you have the class names already stored in a YOLO YAML file then use the ImportYoloV5WithYaml method to automatically read the class names from that file.

  • path_to_images (str) – The path to the images relative to the annotations. If the images are in the same directory as the annotation files then omit this parameter. If the images are in a different directory on the same level as the annotations then you would set path_to_images=’../images/’

  • name (str) – Default is ‘dataset’. This will set the dataset.name property for this dataset.

  • encoding (str) – Default is ‘utf-8’. Encoding of the annotations file(s).

Example

>>> from pylabel import importer
>>> dataset = importer.ImportYoloV5(path="labels/", path_to_images="../images/")
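
The reason the importer needs both cat_names and the image dimensions can be seen in a small sketch. The parse_yolo_line helper is hypothetical and is not PyLabel's code. It assumes the usual YOLO line layout of class id, x center, y center, width, and height, all normalized to the image size, and falling back to the id as a string when no name is available is an assumption of this sketch.

```python
def parse_yolo_line(line, cat_names, img_w, img_h):
    """Turn one YOLO label line into a class name and a pixel box."""
    cat_id, xc, yc, w, h = line.split()[:5]
    cat_id = int(cat_id)
    # Scale the normalized box back to pixels using the image size.
    w_px, h_px = float(w) * img_w, float(h) * img_h
    x_min = float(xc) * img_w - w_px / 2
    y_min = float(yc) * img_h - h_px / 2
    # Look the name up by position; fall back to the id if it is missing.
    name = cat_names[cat_id] if cat_id < len(cat_names) else str(cat_id)
    return {"cat_id": cat_id, "cat_name": name,
            "bbox": (x_min, y_min, w_px, h_px)}

row = parse_yolo_line("1 0.5 0.5 0.25 0.5", ["Squirrel", "Nut"],
                      img_w=640, img_h=480)
unnamed = parse_yolo_line("3 0.5 0.5 0.25 0.5", [], img_w=640, img_h=480)
```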
pylabel.importer.ImportYoloV5WithYaml(yaml_file, image_ext='jpg', name_of_annotations_folder='labels', path_to_annotations=None, encoding='utf-8')[source]

Import a YOLO dataset by reading the YAML file to extract the class names and the image and label locations, and to preserve whether each image belongs to the train, test, or val split.

Returns:

PyLabel dataset object.

Parameters:
  • yaml_file (str) – Path to the yaml file that describes the dataset to be imported.

  • image_ext (str) – The image file extension.

  • path_to_annotations (str) – The path to the annotations. If path_to_annotations is None, the path is derived from the image path in the YAML file by replacing the name of the images folder with the name of the annotations folder.

  • name_of_annotations_folder (str) – Default is “labels”. Change this to “annotations” if your folder is called “annotations”

  • encoding (str) – Default is ‘utf-8’. Encoding of the annotations file(s).

Example

>>> from pylabel import importer
>>> dataset = importer.ImportYoloV5WithYaml(yaml_file='data/dataset.yaml')

pylabel.labeler module

class pylabel.labeler.Labeler(dataset=None)[source]

Bases: object

StartPyLaber(new_classes=None, image=None, yolo_model=None)[source]

Display the bbox widget loaded with images and annotations from this dataset.

pylabel.shared module

pylabel.splitter module

class pylabel.splitter.Split(dataset=None)[source]

Bases: object

GroupShuffleSplit(train_pct=0.5, test_pct=0.25, val_pct=0.25, group_col='img_filename', random_state=None)[source]

This function uses the GroupShuffleSplit command from sklearn. It can split into 3 groups (train, test, and val) by applying the command twice. If you want to split into only 2 groups (train and test), then set val_pct to 0.
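
The point of splitting by group is that all rows sharing a group value, such as the same img_filename, end up in the same split. The sketch below illustrates that idea with the standard library only. It is not sklearn's GroupShuffleSplit and not PyLabel's implementation; the group_shuffle_split function and the rounding of group counts are assumptions made for illustration.

```python
import random

def group_shuffle_split(groups, train_pct=0.5, test_pct=0.25, val_pct=0.25,
                        random_state=None):
    """Return one split label per row, keeping each group in a single split."""
    unique = sorted(set(groups))
    random.Random(random_state).shuffle(unique)
    total = train_pct + test_pct + val_pct
    n_train = round(len(unique) * train_pct / total)
    n_test = round(len(unique) * test_pct / total)
    split_of = {}
    for i, g in enumerate(unique):
        # Groups left over after train and test go to val.
        split_of[g] = ("train" if i < n_train
                       else "test" if i < n_train + n_test
                       else "val")
    return [split_of[g] for g in groups]

# Hypothetical image filenames, one entry per annotation row.
files = ["a.jpg", "a.jpg", "b.jpg", "c.jpg", "c.jpg", "d.jpg"]
splits = group_shuffle_split(files, random_state=42)
```

Passing a fixed random_state makes the assignment repeatable, which helps when comparing models trained on the same split.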

StratifiedGroupShuffleSplit(train_pct=0.7, test_pct=0.3, val_pct=0.0, weight=0.01, group_col='img_filename', cat_col='cat_name', batch_size=1)[source]

This function will ‘split’ the dataframe by setting the split column equal to train, test, or val. When a split dataset is exported, the annotations will be split into separate groups so that they can be used in model training, testing, and validation.

UnSplit()[source]

Unsplit the dataset by setting all values of the split column to null.

pylabel.visualize module

class pylabel.visualize.Visualize(dataset=None)[source]

Bases: object

ShowBoundingBoxes(img_id: int = 0, img_filename: str = '') → PIL.Image[source]

Enter a filename or index number and return the image with the bounding boxes drawn.
