pylabel package¶
Submodules¶
pylabel.analyze module¶
The analyze module analyzes the contents of the dataset and provides summary statistics such as the count of images and classes.
- class pylabel.analyze.Analyze(dataset=None)[source]¶
Bases:
object
- ShowClassSplits(normalize=True)[source]¶
Show the distribution of classes across train, val, and test splits of the dataset.
- Parameters:
normalize (bool) – Defaults to True. If True, then will return the relative frequencies of the classes between 0 and 1. If False, then will return the absolute counts of each class.
- Returns:
Pandas Dataframe
Examples
>>> dataset.analyze.ShowClassSplits(normalize=True)
cat_name  all  train  test  dev
squirrel  .66  .64    .65   .63
nut       .34  .34    .35   .37
>>> dataset.analyze.ShowClassSplits(normalize=False)
cat_name  all  train  test  dev
squirrel  66   64     65    63
nut       34   34     35    37
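The difference between normalize=True and normalize=False can be sketched without PyLabel at all. The snippet below is a minimal illustration using only the standard library. The `rows` data and the `class_splits` helper are hypothetical and are not part of the PyLabel API, which returns a Pandas Dataframe instead of nested dicts.

```python
from collections import Counter

# Hypothetical annotation rows as (class name, split) pairs; not PyLabel data.
rows = [
    ("squirrel", "train"), ("squirrel", "train"), ("nut", "train"),
    ("squirrel", "test"), ("nut", "test"),
]


def class_splits(rows, normalize=True):
    """Return {split: {class: value}} holding counts or relative frequencies."""
    counts = {}
    for cat_name, split in rows:
        # Count each row once for its own split and once for the "all" column.
        counts.setdefault(split, Counter())[cat_name] += 1
        counts.setdefault("all", Counter())[cat_name] += 1
    if not normalize:
        return {split: dict(c) for split, c in counts.items()}
    # Relative frequency: each class count divided by the total for that split.
    return {
        split: {name: n / sum(c.values()) for name, n in c.items()}
        for split, c in counts.items()
    }


print(class_splits(rows, normalize=False)["all"])
print(class_splits(rows, normalize=True)["train"])
```

With normalize=True the values in each split sum to 1, which makes splits of different sizes directly comparable.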
- property class_counts¶
Counts the numbers of instances of each class in the dataset. Uses the Pandas value_counts method and returns a Pandas Series.
- Returns:
Pandas Series
Example
>>> dataset.analyze.class_counts
squirrel    50
nut         100
- property class_ids¶
Returns a sorted list of all cat ids in the dataset.
- Returns:
List
Example
>>> dataset.analyze.class_ids
[0,1]
- property class_name_id_map¶
Returns a dict where the class name is the key and class id is the value.
- Returns:
Dict
Example
>>> dataset.analyze.class_name_id_map
{'Squirrel': 0, 'Nut': 1}
- property classes¶
Returns a list of all cat names in the dataset, sorted by cat_id value.
- Returns:
List
Example
>>> dataset.analyze.classes
["Squirrel", "Nut"]
- property num_classes¶
Counts the number of unique classes in the dataset.
- Returns:
Int
Example
>>> dataset.analyze.num_classes
2
- property num_images¶
Counts the number of images in the dataset.
- Returns:
Int
Example
>>> dataset.analyze.num_images
100
pylabel.dataset module¶
The dataset is the primary object that you will interact with when using PyLabel. All other modules are sub-modules of the dataset object.
- class pylabel.dataset.Dataset(df)[source]¶
Bases:
object
- ReindexCatIds(cat_id_index=0)[source]¶
Reindex the values of the cat_id column so that they start from an int (usually 0 or 1) and then increment the cat_ids to index + number of categories. This is useful if the cat_ids are not continuous, especially for subsets of a dataset or for several datasets that have been combined. Some models like Yolo require starting from 0 and others like Detectron require starting from 1.
- Parameters:
cat_id_index (int) – Defaults to 0. The cat ids will increment sequentially from the cat_id_index value. For example, if there are 10 classes and cat_id_index is 0, then the cat_ids will be a range from 0-9.
Example
>>> dataset.analyze.class_ids
[1,2,4,5,6,7,8,9,11,12]
>>> dataset.ReindexCatIds(cat_id_index=0)
>>> dataset.analyze.class_ids
[0,1,2,3,4,5,6,7,8,9]
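The reindexing described above amounts to building a mapping from each existing id to a contiguous one. The following is a minimal sketch of that idea in plain Python. The `reindex_cat_ids` helper is hypothetical, and the assumption that ids keep their ascending order is ours, not something the documentation states.

```python
def reindex_cat_ids(cat_ids, cat_id_index=0):
    """Map each distinct id to a contiguous id starting at cat_id_index.

    Assumption: existing ids keep their relative (ascending) order.
    """
    distinct = sorted(set(cat_ids))
    mapping = {old: new for new, old in enumerate(distinct, start=cat_id_index)}
    return [mapping[c] for c in cat_ids], mapping


# One entry per annotation; ids 3 and 6-11 are unused, so the ids are not continuous.
new_ids, mapping = reindex_cat_ids([1, 2, 4, 5, 4, 12], cat_id_index=0)
print(new_ids)
print(mapping)
```

Passing cat_id_index=1 instead would produce ids starting at 1, which matches the Detectron-style convention mentioned above.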
- analyze¶
See pylabel.analyze module.
- df¶
The dataframe where annotations are stored. This dataframe can be read directly to query the contents of the dataset. You can also edit this dataframe to filter records or edit the annotations directly.
Example
>>> dataset.df
- Type:
Pandas Dataframe
- export¶
See pylabel.exporter module.
- labeler¶
See pylabel.labeler module.
- name¶
Default is ‘dataset’. A friendly name for your dataset that is used as part of the filename(s) when exporting annotation files.
- Type:
string
- path_to_annotations¶
Default is ‘’. The path to the annotation files associated with the dataset. When importing, this will be path to the directory where the annotations are stored. By default, annotations will be exported to the same directory. Changing this value will change where the annotations are exported to.
- Type:
string
- splitter¶
See pylabel.splitter module.
- visualize¶
See pylabel.visualize module.
pylabel.exporter module¶
PyLabel currently supports exporting annotations in COCO, YOLO, and VOC PASCAL formats.
- class pylabel.exporter.Export(dataset=None)[source]¶
Bases:
object
- ExportToCoco(output_path=None, cat_id_index=None)[source]¶
Writes COCO annotation files to disk (in JSON format) and returns the paths to the files.
- Parameters:
output_path (str) – This is where the annotation files will be written. If not specified, then the path will be derived from the path_to_annotations and name properties of the dataset object.
cat_id_index (int) – Reindex the cat_id values so that they start from an int (usually 0 or 1) and then increment the cat_ids to index + number of categories continuously. It’s useful if the cat_ids are not continuous in the original dataset. Some models like Yolo require starting from 0 and others like Detectron require starting from 1.
- Returns:
A list with 1 or more paths (strings) to annotations files.
Example
>>> dataset.export.ExportToCoco()
['data/labels/dataset.json']
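For orientation, the snippet below sketches the general shape of a COCO object detection annotation file using only the standard library. It illustrates the COCO format itself, not the exact output of ExportToCoco. The file name, sizes, and ids are made-up values, and PyLabel may write additional fields.

```python
import json

# A hand-built COCO-style structure with one image and one bounding box.
coco = {
    "images": [
        {"id": 1, "file_name": "frame_0002.jpg", "width": 640, "height": 480}
    ],
    "categories": [
        {"id": 0, "name": "Squirrel"},
        {"id": 1, "name": "Nut"},
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,       # refers to images[].id
            "category_id": 0,    # refers to categories[].id
            "bbox": [100, 120, 50, 40],  # [x_min, y_min, width, height] in pixels
            "area": 50 * 40,
            "iscrowd": 0,
        }
    ],
}

text = json.dumps(coco, indent=2)
print(text)
```

The category_id values here are the ids that the cat_id_index parameter would renumber.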
- ExportToVoc(output_path=None, segmented_=False, path_=False, database_=False, folder_=False, occluded_=False)[source]¶
Writes annotation files to disk in VOC XML format and returns the paths to the files.
By default, tags with empty values will not be included in the XML output. You can optionally choose to include them if they are required for your solution.
- Parameters:
output_path (str) – This is where the annotation files will be written. If not specified, then the path will be derived from the .path_to_annotations and .name properties of the dataset object.
segmented_ (bool) – Defaults to False. Set to True to include this field in the XML schema of the output files.
path_ (bool) – Defaults to False. Set to True to include this field in the XML schema of the output files.
database_ (bool) – Defaults to False. Set to True to include this field in the XML schema of the output files.
folder_ (bool) – Defaults to False. Set to True to include this field in the XML schema of the output files.
occluded_ (bool) – Defaults to False. Set to True to include this field in the XML schema of the output files.
- Returns:
A list with 1 or more paths (strings) to annotations files.
Example
>>> dataset.export.ExportToVoc()
['data/voc_annotations/000000000322.xml', ...]
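The effect of omitting empty tags by default can be illustrated with a small standalone sketch built on xml.etree.ElementTree. The `build_voc_xml` helper and its `empty_tags` argument are hypothetical stand-ins for the boolean flags above. The snippet writes only a subset of the VOC fields and is not PyLabel's implementation.

```python
import xml.etree.ElementTree as ET


def build_voc_xml(filename, width, height, objects, empty_tags=()):
    """Build a minimal VOC-style annotation; optional empty tags are opt-in."""
    root = ET.Element("annotation")
    # Tags with no value are skipped unless the caller explicitly requests them.
    for tag in ("folder", "path"):
        if tag in empty_tags:
            ET.SubElement(root, tag)
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    for name, box in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        bndbox = ET.SubElement(obj, "bndbox")
        # VOC boxes are stored as corner coordinates in pixels.
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), box):
            ET.SubElement(bndbox, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")


objects = [("Squirrel", (100, 120, 150, 160))]
default_xml = build_voc_xml("000000000322.jpg", 640, 480, objects)
with_folder = build_voc_xml("000000000322.jpg", 640, 480, objects, empty_tags=("folder",))
print(default_xml)
print(with_folder)
```

Some downstream tools validate against a fixed schema and reject files with missing tags, which is the situation the optional flags are meant for.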
- ExportToYoloV5(output_path='training/labels', yaml_file='dataset.yaml', copy_images=False, use_splits=False, cat_id_index=None, segmentation=False, keypoints=False)[source]¶
Writes annotation files to disk in YOLOv5 format and returns the paths to files.
- Parameters:
output_path (str) – This is where the annotation files will be written. If not specified, then the path will be derived from the .path_to_annotations and .name properties of the dataset object. If you are exporting images to train a model, the recommended path to use is ‘training/labels’.
yaml_file (str) – If a file name (string) is provided, a YOLOv5 YAML file will be created with entries for the files and classes in this dataset. It will be created in the parent of the output_path directory. The recommended name for the YAML file is ‘dataset.yaml’.
copy_images (boolean) – If True, then the annotated images will be copied to a directory next to the labels directory into a directory named ‘images’. This will prepare your labels and images to be used as inputs to train a YOLOv5 model.
use_splits (boolean) – If True, then the images and annotations will be moved into directories based on the values in the split column. For example, if a row has the value split = “train” then the annotations for that row will be moved to directory /train. If a YAML file is specified, then the YAML file will use the splits to specify the folders used for the train, val, and test datasets.
cat_id_index (int) – Reindex the cat_id values so that they start from an int (usually 0 or 1) and then increment the cat_ids to index + number of categories continuously. It’s useful if the cat_ids are not continuous in the original dataset. Yolo requires the set of annotations to start at 0 when training a model.
segmentation (boolean) – If True, then segmentation annotations will be exported instead of bounding box annotations. If there are no segmentation annotations, then the exported annotations will be empty.
keypoints (boolean) – If True, then keypoint annotations will be exported as well as bounding box annotations. It is not possible to export both segmentation and keypoint annotations at the same time in YOLO format. Each bounding box within a dataset should have the same number of keypoints defined, e.g. 17 for COCO. Keypoints are triplets of (x, y, visibility); see e.g. https://cocodataset.org/#format-data. If some images have no keypoint annotations, then the bounding boxes will be followed by a series of delimiting spaces. If some bounding boxes within an image have no keypoint annotations, those keypoints will be written as a series of zeroes, denoting x=0, y=0, visibility=0.
- Returns:
A list with 1 or more paths (strings) to annotations files. If a YAML file is created then the first item in the list will be the path to the YAML file.
Examples
>>> dataset.export.ExportToYoloV5(output_path='training/labels',
...     yaml_file='dataset.yaml', cat_id_index=0)
['training/dataset.yaml', 'training/labels/frame_0002.txt', ...]
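Each line of a YOLO label file describes one bounding box as a class id followed by the box center and size, all normalized by the image dimensions. The sketch below shows that conversion for a box given in pixels. The `to_yolo_line` helper and the choice of (x_min, y_min, width, height) as input are assumptions made for illustration, not PyLabel's internal representation.

```python
def to_yolo_line(cat_id, x_min, y_min, box_w, box_h, img_w, img_h):
    """Format one pixel-space box as 'class x_center y_center width height'."""
    # YOLO stores the box center, not the top-left corner, scaled to 0-1.
    x_center = (x_min + box_w / 2) / img_w
    y_center = (y_min + box_h / 2) / img_h
    return (
        f"{cat_id} {x_center:.6f} {y_center:.6f} "
        f"{box_w / img_w:.6f} {box_h / img_h:.6f}"
    )


# A 64x48 pixel box with top-left corner (96, 120) in a 640x480 image.
line = to_yolo_line(0, 96, 120, 64, 48, 640, 480)
print(line)  # 0 0.200000 0.300000 0.100000 0.100000
```

Because the class id is the first field on every line, training with class ids that do not start at 0 is what cat_id_index=0 is meant to prevent.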
pylabel.importer module¶
This module includes the commands to import an existing dataset. PyLabel currently supports importing labels from COCO, YOLO, and VOC formats. You can also import a set of images that do not have labels yet and label them manually using the PyLabel labelling tool.
- pylabel.importer.ImportCoco(path, path_to_images=None, name=None, encoding='utf-8')[source]¶
This function takes the path to a JSON file in COCO format as input. It returns a PyLabel dataset object that contains the annotations.
- Returns:
PyLabel dataset object.
- Parameters:
path (str) – The path to the JSON file with the COCO annotations.
path_to_images (str) – The path to the images relative to the JSON file. If the images are in the same directory as the JSON file then omit this parameter. If the images are in a different directory on the same level as the annotations then you would set path_to_images='../images/'.
name (str) – This will set the dataset.name property for this dataset. If not specified, the filename (without extension) of the COCO annotation file will be used as the dataset name.
encoding (str) – Default is ‘utf-8’. Encoding of the annotations file(s).
Example
>>> from pylabel import importer
>>> dataset = importer.ImportCoco("coco_annotations.json")
- pylabel.importer.ImportImagesOnly(path, name='dataset')[source]¶
Import a directory of images as a dataset with no annotations. Then use PyLabel to annotate the images. Will import images with these extensions: (‘.png’, ‘.jpg’, ‘.jpeg’, ‘.tiff’, ‘.bmp’, ‘.gif’)
- Parameters:
path (str) – The path to the directory with the images.
name (str) – Default is ‘dataset’. Descriptive name, which is used when outputting files.
- Returns:
A dataset object with one row for each image and no annotations.
Example
>>> from pylabel import importer
>>> dataset = importer.ImportImagesOnly(path="images/")
- pylabel.importer.ImportVOC(path, path_to_images=None, name='dataset', encoding='utf-8')[source]¶
Provide the path to a directory with annotations in VOC Pascal XML format and it returns a PyLabel dataset object that contains the annotations.
- Returns:
PyLabel dataset object.
- Parameters:
path (str) – The path to the directory with the annotations in VOC Pascal XML format.
path_to_images (str) – The path to the images relative to the annotations. If the images are in the same directory as the annotation files then omit this parameter. If the images are in a different directory on the same level as the annotations then you would set path_to_images='../images/'.
name (str) – Default is ‘dataset’. This will set the dataset.name property for this dataset.
encoding (str) – Default is ‘utf-8’. Encoding of the annotations file(s).
Example
>>> from pylabel import importer
>>> dataset = importer.ImportVOC(path="annotations/", path_to_images="../images/")
- pylabel.importer.ImportYoloV5(path, img_ext='jpg,jpeg,png,webp', cat_names=[], path_to_images='', name='dataset', encoding='utf-8')[source]¶
Provide the path to a directory with annotations in YOLO format and it returns a PyLabel dataset object that contains the annotations. The Yolo format does not store much information about the images, such as the height and width. When you import a Yolo dataset PyLabel will extract this information from the images.
- Returns:
PyLabel dataset object.
- Parameters:
path (str) – The path to the directory with the annotations in YOLO format.
img_ext (str, comma separated) – Specify the file extension(s) of the images used in your dataset: .jpeg, .png, etc. This is required because the YOLO format does not store the filename of the images. It could be any of the image formats supported by YoloV5. PyLabel will iterate through the file extensions specified until it finds a match.
cat_names (list) – YOLO annotations only store a class number, not the name. You can provide a list of class names, ordered so that each name's position corresponds to the int used to represent that class in the annotations. For example [‘Squirrel’, ‘Nut’]. If you have the class names already stored in a YOLO YAML file then use the ImportYoloV5WithYaml method to automatically read the class names from that file.
path_to_images (str) – The path to the images relative to the annotations. If the images are in the same directory as the annotation files then omit this parameter. If the images are in a different directory on the same level as the annotations then you would set path_to_images='../images/'.
name (str) – Default is ‘dataset’. This will set the dataset.name property for this dataset.
encoding (str) – Default is ‘utf-8’. Encoding of the annotations file(s).
Example
>>> from pylabel import importer
>>> dataset = importer.ImportYoloV5(path="labels/", path_to_images="../images/")
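Because a YOLO label file does not record the image filename, the importer has to look for an image that shares the label file's base name. The sketch below shows one way such a lookup over the img_ext list could work. The `find_image` helper is hypothetical and only approximates the behavior described for img_ext above.

```python
import tempfile
from pathlib import Path


def find_image(label_path, images_dir, img_ext="jpg,jpeg,png,webp"):
    """Return the first image that matches the label's base name, or None."""
    stem = Path(label_path).stem
    # Try each extension in the order given until a file exists.
    for ext in img_ext.split(","):
        candidate = Path(images_dir) / f"{stem}.{ext.strip().lstrip('.')}"
        if candidate.exists():
            return candidate
    return None


# Demonstrate with a temporary directory that holds a single PNG image.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "frame_0002.png").touch()
    found = find_image("labels/frame_0002.txt", tmp)
    found_name = found.name if found else None
    missing = find_image("labels/frame_0003.txt", tmp)

print(found_name)
print(missing)
```

Listing the most common extension first keeps the number of file system checks low on large datasets.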
- pylabel.importer.ImportYoloV5WithYaml(yaml_file, image_ext='jpg', name_of_annotations_folder='labels', path_to_annotations=None, encoding='utf-8')[source]¶
Import a YOLO dataset by reading the YAML file to extract the class names and the image and label locations, and to preserve whether an image should be in the train, test, or val split.
- Returns:
PyLabel dataset object.
- Parameters:
yaml_file (str) – Path to the yaml file that describes the dataset to be imported.
image_ext (str) – The image file extension.
path_to_annotations (str) – The path to the annotation files. If None, the path is derived from the image path in the YAML file by replacing the name of the images folder with the name of the annotations folder.
name_of_annotations_folder (str) – Default is “labels”. Change this to “annotations” if your folder is called “annotations”.
encoding (str) – Default is ‘utf-8’. Encoding of the annotations file(s).
Example
>>> from pylabel import importer
>>> dataset = importer.ImportYoloV5WithYaml(yaml_file='data/dataset.yaml')
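When path_to_annotations is not given, the label location has to be inferred from the image location listed in the YAML file. The following sketch shows one plausible way to do that by swapping the folder name. The `derive_annotations_path` helper is hypothetical, and it assumes the image folder is literally named "images", which the documentation does not confirm.

```python
from pathlib import PurePosixPath


def derive_annotations_path(path_to_images,
                            name_of_annotations_folder="labels",
                            path_to_annotations=None):
    """Return the explicit annotations path, or infer it from the image path."""
    if path_to_annotations is not None:
        return path_to_annotations
    # Assumption: the image folder is named "images"; swap it for the labels folder.
    parts = [
        name_of_annotations_folder if part == "images" else part
        for part in PurePosixPath(path_to_images).parts
    ]
    return str(PurePosixPath(*parts))


print(derive_annotations_path("training/images/train"))
print(derive_annotations_path("training/images/val", name_of_annotations_folder="annotations"))
```

An explicit path_to_annotations always takes precedence, which is useful when labels and images do not share a parent directory.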
pylabel.labeler module¶
pylabel.splitter module¶
- class pylabel.splitter.Split(dataset=None)[source]¶
Bases:
object
- GroupShuffleSplit(train_pct=0.5, test_pct=0.25, val_pct=0.25, group_col='img_filename', random_state=None)[source]¶
This function uses the GroupShuffleSplit command from sklearn. It can split into 3 groups (train, test, and val) by applying the command twice. If you want to split into only 2 groups (train and test), then set val_pct to 0.
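The key property of a group-based split is that all rows sharing a group value (by default the image filename) land in the same split, so annotations from one image never leak across train, test, and val. The sketch below demonstrates that property with the standard library only. It is not the sklearn implementation: the `group_shuffle_split` helper is hypothetical and simply shuffles the groups and slices them by percentage, with the remainder going to val.

```python
import random


def group_shuffle_split(rows, train_pct=0.5, test_pct=0.25,
                        group_col="img_filename", random_state=None):
    """Assign a split to every row so that each group stays together."""
    groups = sorted({row[group_col] for row in rows})
    random.Random(random_state).shuffle(groups)
    n_train = round(len(groups) * train_pct)
    n_test = round(len(groups) * test_pct)
    split_of = {}
    for i, group in enumerate(groups):
        if i < n_train:
            split_of[group] = "train"
        elif i < n_train + n_test:
            split_of[group] = "test"
        else:
            split_of[group] = "val"  # whatever is left over
    # Copy each row and attach the split chosen for its group.
    return [dict(row, split=split_of[row[group_col]]) for row in rows]


# Eight hypothetical images with two annotations each.
rows = [
    {"img_filename": f"img_{i}.jpg", "cat_name": name}
    for i in range(8)
    for name in ("squirrel", "nut")
]
result = group_shuffle_split(rows, random_state=42)
print(result[0])
```

Setting train_pct and test_pct so that they sum to 1 leaves no groups for val, which mirrors the two-way split described above.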
- StratifiedGroupShuffleSplit(train_pct=0.7, test_pct=0.3, val_pct=0.0, weight=0.01, group_col='img_filename', cat_col='cat_name', batch_size=1)[source]¶
This function will ‘split’ the dataframe by setting the split column equal to train, test, or val. When a split dataset is exported, the annotations will be split into separate groups so that they can be used in model training, testing, and validation.