Additional Resources

We’ve compiled some additional resources to assist you with model training. On this page, you’ll find Frequently Asked Questions (FAQs), troubleshooting support, a command cheat sheet for CLI help, and a glossary of training terms.


This page answers FAQs from four categories

Please visit our Discord channel or send us an email to ask questions that are not answered on this page. You can find helpful tutorials and additional reading material on our Blog page.


What is an epoch?

An epoch is one complete pass through every image in the training dataset.

What is batch size?

The batch size is the number of images processed before the model is updated. Batch size is largely dependent on how much memory you have available for training: the more memory you can use, the larger the possible batch size.
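To see how epochs and batch size interact, the following minimal sketch counts how many times the model is updated during training. The helper name updates_per_epoch is hypothetical, the figures (951 images, batch size 16, 1,300 epochs) are borrowed from the license plate example further down this page, and the sketch assumes that a final, smaller batch still triggers an update.

```python
import math


def updates_per_epoch(num_images: int, batch_size: int) -> int:
    """Number of model updates in one epoch (hypothetical helper).

    Assumes the last, possibly smaller, batch still triggers an update.
    """
    if num_images < 0 or batch_size <= 0:
        raise ValueError("num_images must be >= 0 and batch_size must be > 0")
    return math.ceil(num_images / batch_size)


steps = updates_per_epoch(num_images=951, batch_size=16)
total_updates = steps * 1300  # 1,300 epochs

print(steps)          # 60
print(total_updates)  # 78000
```

A larger batch size means fewer updates per epoch, but each update needs more memory.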

What does “overfitting” mean?

A model is overfit when it performs very well on the training dataset but poorly on data it has not seen before. Even though the performance metrics may look very good for the training dataset, the model does not generalize to new data. For instance, suppose you want to train a model to detect sporting equipment, and for the label ‘ball’ your dataset includes only green tennis balls. Even if precision and recall are very high and loss is very low, the model probably won’t generalize to basketballs or baseballs, or maybe even to non-green tennis balls. This is an extreme example, but the same thing happens if you train on any dataset for too long: the model learns that dataset so well that it fails to recognize new data as instances of the desired labels.

What is “loss”?

In general, loss measures how far the model’s predictions are from the correct answers, so it is always a value that we want to minimize. There are numerous algorithms for measuring loss, and the measurement differs between machine learning tasks. There are two types of loss: training loss and validation loss. Training loss measures the model’s error on the training data. Validation loss measures the model’s error on validation data, which is annotated data that the model was not trained on.

What are “precision” and “recall”?

Precision describes how many of the detected objects are what we actually wanted to detect. It is calculated by dividing the number of correctly identified objects, the true positives, by the total identified objects (both the true and false positives).

Recall describes how many of the objects of interest we managed to detect. It is calculated by dividing the true positives by the sum of the true positives and the false negatives (objects of interest that the model missed).

Say we have a model that is supposed to detect dogs, and in a picture there are three dogs and two cats. If the model detects all entities in the picture as dogs, it would have low precision, because only 3 of the 5 objects were what we wanted to detect. It would have high recall, however, because it managed to detect all the dogs. We want our model to have both high precision and high recall. This would mean we want a model to correctly identify dogs as dogs, and not identify any cats as dogs.
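The dog example can be worked through in a few lines. This is only an illustrative sketch: the function names are hypothetical, and the counts come from the picture described above (three dogs correctly detected, two cats wrongly detected as dogs, no dogs missed). Recall here divides by the true positives plus the false negatives, that is, the objects of interest that were missed.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of detections that were actually objects of interest."""
    detected = true_positives + false_positives
    return true_positives / detected if detected else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of the objects of interest that were detected."""
    actual = true_positives + false_negatives
    return true_positives / actual if actual else 0.0


# Three dogs and two cats; the model labels all five as dogs.
tp = 3  # dogs detected as dogs
fp = 2  # cats detected as dogs
fn = 0  # dogs that were missed

print(precision(tp, fp))  # 0.6
print(recall(tp, fn))     # 1.0
```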

What is “data augmentation”?

It can sometimes be challenging to collect sufficient images to train your model. You can augment your dataset by taking the images you do have and creating additional images by rotating, cropping, brightening, darkening, blurring, etc. them. One Python library you can use to do this is imgaug.
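As a rough illustration of what augmentation does, the sketch below applies a flip, a rotation, and brightness changes to a tiny grayscale image stored as nested lists. It uses plain Python so that it runs without any dependencies; all function names are hypothetical, and in practice a library such as imgaug would apply equivalent operations to real image files.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale pixel values, 0-255


def flip_horizontal(image: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def rotate_90_clockwise(image: Image) -> Image:
    """Rotate the image a quarter turn clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def adjust_brightness(image: Image, factor: float) -> Image:
    """Scale every pixel by factor, clamping the result to 0-255."""
    return [[max(0, min(255, round(pixel * factor))) for pixel in row]
            for row in image]


original = [[10, 20, 30],
            [40, 50, 60]]

# One original image becomes four additional training images.
augmented = [
    flip_horizontal(original),
    rotate_90_clockwise(original),
    adjust_brightness(original, 1.5),  # brighter
    adjust_brightness(original, 0.5),  # darker
]
print(len(augmented))  # 4
```

Note that geometric changes such as flips and rotations also move the objects in the image, so the bounding boxes in the annotations need the same transformation.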

How do I know when I should stop training?

Generally, you want to stop training when loss no longer decreases and mAP no longer increases. If you visually test your model, using it in an application, and you notice certain instances of labels are no longer being picked up, you may have overfit your model. If instead some objects are being mis-identified, your model may need more training.
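One common way to turn "loss no longer decreases" into a concrete rule is to stop once the validation loss has not improved for several epochs in a row. The sketch below shows that idea only; the function name and the patience and min_delta parameters are assumptions made for illustration, not settings of the training apps, and it ignores mAP for brevity.

```python
from typing import Sequence


def should_stop(val_losses: Sequence[float],
                patience: int = 3,
                min_delta: float = 0.0) -> bool:
    """Return True if the last `patience` epochs brought no improvement.

    An improvement means beating the best earlier validation loss by
    more than `min_delta`.
    """
    if len(val_losses) <= patience:
        return False  # not enough history to judge yet
    best_before = min(val_losses[:-patience])
    best_recent = min(val_losses[-patience:])
    return best_recent >= best_before - min_delta


# Validation loss per epoch (made-up numbers for illustration).
still_improving = [2.0, 1.5, 1.2, 1.1]
plateaued = [2.0, 1.5, 1.2, 1.1, 1.15, 1.12, 1.13]

print(should_stop(still_improving))  # False
print(should_stop(plateaued))        # True
```

If validation loss starts rising while training loss keeps falling, that gap is a typical sign of overfitting.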

Data Collection

What is the format for my training data?

Input for training is expected to be in either Pascal VOC or COCO format. For both of these formats, images should be JPEGs or PNGs stored in a folder named ‘JPEGImages’. Each format represents annotations differently: for Pascal VOC, annotations are in XML format, whereas in COCO the annotations are in JSON. In both cases annotations should be stored in a folder named ‘Annotations’; you can see an example of both formats in our Data Collection guide. Every image should correspond to an annotation, either as a file (e.g. the file ‘0.jpg’ corresponds to ‘0.xml’) or as a JSON object. Every dataset consists of the ‘Annotations’ and ‘JPEGImages’ folders zipped together. Zip the folders by selecting the individual ‘Annotations’ and ‘JPEGImages’ folders, not a parent directory.
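If you prefer to script the packaging step, the following sketch checks a Pascal VOC dataset for images without a matching XML file and then zips the two folders so that they sit at the top level of the archive. It is one possible approach rather than an official tool: the function names and the example paths are hypothetical, and the pairing check assumes one XML file per image, so it does not apply to COCO.

```python
import zipfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_unannotated_images(dataset_dir: Path) -> list:
    """Pascal VOC only: names of images that lack a same-named .xml file."""
    annotated = {p.stem for p in (dataset_dir / "Annotations").glob("*.xml")}
    images = (dataset_dir / "JPEGImages").iterdir()
    return sorted(p.name for p in images
                  if p.suffix.lower() in IMAGE_SUFFIXES
                  and p.stem not in annotated)


def zip_dataset(dataset_dir: Path, zip_path: Path) -> None:
    """Zip 'Annotations' and 'JPEGImages' without the parent directory."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder in ("Annotations", "JPEGImages"):
            for path in sorted((dataset_dir / folder).rglob("*")):
                if path.is_file():
                    # Store paths relative to dataset_dir so both folders
                    # end up at the root of the archive.
                    archive.write(path,
                                  path.relative_to(dataset_dir).as_posix())


# Example usage (the paths are placeholders):
# missing = find_unannotated_images(Path("my_dataset"))
# zip_dataset(Path("my_dataset"), Path("my_dataset.zip"))
```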

What if I have multiple datasets, do I need to combine them before training?

Yes, the CLI has an aai dataset merge command that will combine any input datasets, provided they are in zip format.

How much input data do I need?

A minimum of approximately 300 images per label is recommended; however, more data will almost surely produce better results. See the Data Capture Guidelines document for more details.


Do I need to annotate all objects in an image?

No; however, you should be careful not to include too many images that contain objects of interest without annotations, and you should be mindful of what other objects are in your images. For more details on data collection and annotation, please refer to our blogs on these subjects.

How much of an object needs to be showing before I annotate it?

In general, about 20% of the object should be visible for you to annotate it. Additionally, if more than 20% of the object is covered, you can mark the annotation as ‘truncated’.

How can I test my model?

You can use your model in an app to visually assess the model’s ability to detect the desired objects. The web and desktop apps both publish to your catalog automatically. To add a model to an app, use

$ aai app models add <username/modelname>

If you have done some more training and the model is already added to your application, to update the version of the model, simply run

$ aai app models update

Can I test multiple versions of my model side by side?

Yes! However, you must use different training names for the two models, e.g. ‘my_model1’ and ‘my_model2’. You can train and publish two versions of your model, using these different IDs, and test each model’s performance on the same input stream. See this tutorial for more details.

Do I need to train on all of my labels?

Yes, at the moment you must train on all the labels in your dataset. Additionally, if a label is specified in the training command, it must be present in your dataset! You do not need to enter labels manually; they will be detected automatically in your dataset.


How long does it take to train a model?

This depends largely on the size of your dataset, the number of epochs you are running, the batch size, and whether you are training on a CPU or GPU. Generally, training on a GPU will be much faster (approximately 3-5 times faster) than on a CPU. For reference, our license plate detection model was trained on a GPU for 1,300 epochs with a batch size of 16 using a dataset that contained 951 images, and this training took approximately 20 hours. (Note that this model may not have needed to be trained for this many epochs; we offer it as an example of how these different components may affect one another.) As another example, running 4 epochs on a CPU using a dataset of 592 images took about 20 minutes with a batch size of 4.

What if I accidentally close my session?

There is a Sessions tab on the top right of the training output interface that displays all of your sessions, past and current. Simply click the tab, find your session, then click the reconnect link on the far right.

What hardware do I need to train a model?

Because all of our training is done in the cloud, all you need is a computer with an internet connection and a browser! You can also use the desktop app on Windows and Mac.

Is there a limit on how much data I can train?

You can train on up to 2 GB of data. Training on datasets that exceed this size, either individually or combined, may produce inconsistent results and should not be attempted.
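Before merging or uploading, you can add up the sizes of your dataset archives to see whether they stay under the limit. The sketch below is a convenience check under two assumptions that this page does not confirm: that the limit applies to the size of the zip files, and that 2 GB means 2 x 1024^3 bytes. The function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable

# Assumption: the 2 GB limit is interpreted as 2 * 1024**3 bytes.
SIZE_LIMIT_BYTES = 2 * 1024 ** 3


def total_size_bytes(paths: Iterable) -> int:
    """Sum the on-disk sizes of the given files."""
    return sum(Path(p).stat().st_size for p in paths)


def within_limit(paths: Iterable, limit: int = SIZE_LIMIT_BYTES) -> bool:
    """True if the combined size of the files does not exceed the limit."""
    return total_size_bytes(paths) <= limit


# Example usage (the file names are placeholders):
# print(within_limit(["dataset_a.zip", "dataset_b.zip"]))
```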

Is there a place I can see logs of the training data if I lost my console output?

Yes, when using the web app, you can click the Sessions tab, click on a session, then click the SESSION_RECORDS link at the bottom of the interface. This will show all of the logs for that session. When using the desktop app, they will appear in the .alwaysai directory.

Can I see performance metrics while the model is training?

When training using the web and desktop applications, you can see a plot of the loss at each step, as well as the validation loss every epoch. You can also see logs of these values by clicking the logs tab at the top right of the graph.

What if I want to revert to an older version of my model?

You can continue training from an older version of a model by checking the ‘continue training from a previous version’ checkbox and selecting an older version from the dropdown. You can use an older version of a model in an app by specifying the desired model name and using

$ aai app models add <username/modelname> --local-version <version>

Do I have to train all at once, or can I pick up where I left off?

On the web or desktop apps, you can continue training your model from a previous version by checking the ‘continue training from a previous version’ checkbox and specifying the version you would like to continue from in the dropdown. Before you click the checkbox, make sure the model ID is the same as the one you used to train previously, so the app can fetch the correct versions for your particular model. If the checkbox is not checked and the model name is the same as one that already exists, training will begin from scratch and a new model version will be made, incrementing from the last version.

Do I have to keep using the same training settings if I continue training?

No! You can change any settings between iterations of training.

What types of models can I train?

Currently, we train object detection models by transfer-learning from either a pre-trained MobileNet-SSD or YOLOv3 model that has been trained on the COCO Dataset.


Trouble with Annotations

Error 1

Error: Annotations directory not found
alwaysAI retraining expects a PascalVOC datset. The zipped directory must have Pascal VOC
formatted annotation files with a "JPEGImages" directory containing the images corresponding to the annotation
files found in "Annotations"

- Annotations
    - annotation xml or json files
- JPEGImages
    - corresponding image files in jpeg format (.jpg or .jpeg)

This error occurs when you are attempting to merge datasets (aai dataset merge) that have not been compressed properly. Select the ‘Annotations’ folder and ‘JPEGImages’ folder for one annotation set and compress these (do not compress the parent directory containing these folders). Repeat for all annotation sets (you can rename after compression if need be), and then use these compressed files as input for the command.

Command Cheat Sheet

Data Commands

Run Data Collection Starter App

$ aai app start

Merge Datasets

You can merge and train multiple datasets, so long as the aggregate file size is 2 GB or less.

$ aai dataset merge <> <>

Resize a Dataset

Resizing is not necessary, but it may speed up training, especially if you train more than once using the --continue-from-version flag.

$ aai dataset resize --target-dir <>

Post-Training Commands

Publish the Model

$ aai model publish <username/modelname>

Add (published) Model to App

$ aai app models add <username/modelname>

Add (unpublished) Model to App

$ aai app models add <username/modelname> --local-version <version>

Update a Model (already added to app)

$ aai app models update

Glossary of Terms


Annotation

(Verb). The process of labeling data by defining which areas of an image contain the relevant object(s).

(Noun). The actual files that contain the information regarding the areas of interest for a particular image. Annotations are sometimes referred to as the ground truth and they are used in supervised learning; the model repeatedly compares predictions against annotations in order to improve.


Augmentation

The process of altering images to create new images that are sufficiently different from the originals. Augmentation can include blurring, cropping, brightening, darkening, rotating, and more. Augmentation is used to increase the size of a dataset.


Batch Size

The number of images trained on before the model is updated.


Difficult

In annotation, ‘difficult’ is set to 1 when the object is not easily recognized; otherwise it is set to 0.


Epoch

Training on each image in the dataset one time.


Loss

A quantification of how different the model’s prediction is from the ground truth.

Learning Rate

How much the weights in the model are adjusted each time the model is updated. A larger learning rate means bigger changes per update.
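In gradient-based training, the learning rate scales the size of each weight update. The toy sketch below illustrates this on a single weight and a made-up loss, (w - 3)^2, whose minimum is at w = 3. It is not how the training apps are implemented; the function name and the numbers are chosen purely for illustration.

```python
def gradient_descent(learning_rate: float,
                     steps: int = 20,
                     start: float = 0.0) -> float:
    """Minimize the toy loss (w - 3)**2, whose gradient is 2 * (w - 3)."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - 3)
        # The learning rate controls how far each update moves the weight.
        w -= learning_rate * gradient
    return w


for lr in (0.01, 0.1, 1.1):
    print(lr, gradient_descent(lr))
```

After 20 steps, the small learning rate (0.01) is still far from 3, the moderate one (0.1) is close to 3, and the overly large one (1.1) overshoots on every step and moves further away.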


Overfitting

When the model performs well on the training dataset, but poorly on new test data.

Train, Validation, Test Split

There are three components to training a neural network: the actual training, the tuning of hyperparameters such as the learning rate, and the testing of the model. To accomplish this, the original dataset is typically split into a training and a testing dataset, usually with an 80/20 split, respectively. The training dataset is then split again into a training and a validation dataset. As the model trains, it compares its predictions to the annotations in the training data and adjusts its weights accordingly, while performance on the validation data is used to tune hyperparameters. When the model is done training, it is tested against the test data to see how well it performs on data it has never seen.
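A split like this can be sketched in a few lines. The 80/20 train/test ratio comes from the description above; the 20% validation fraction, the fixed random seed, and the function name are assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence[str],
                  test_fraction: float = 0.2,
                  val_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle items, hold out a test set, then carve a validation set
    out of the remaining training data."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for repeatable splits

    n_test = round(len(shuffled) * test_fraction)
    test, remaining = shuffled[:n_test], shuffled[n_test:]

    n_val = round(len(remaining) * val_fraction)
    val, train = remaining[:n_val], remaining[n_val:]
    return train, val, test


image_names = [f"{i}.jpg" for i in range(100)]
train, val, test = split_dataset(image_names)
print(len(train), len(val), len(test))  # 64 16 20
```

Every image lands in exactly one of the three sets, so the test set stays unseen until the final evaluation.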


Truncated

When annotating, this describes whether the object being annotated is completely visible. If the object is completely visible (i.e. not truncated), this value is set to 0; otherwise it is set to 1. Typically, if 20% or more of the object is obscured, it should be marked as truncated.
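In Pascal VOC annotations, the truncated and difficult flags are stored as 0 or 1 inside each object element. The sketch below reads them with Python's standard XML parser; the sample annotation is made up for illustration, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# A made-up Pascal VOC annotation for a single image.
SAMPLE_ANNOTATION = """
<annotation>
  <filename>0.jpg</filename>
  <object>
    <name>ball</name>
    <truncated>1</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>120</ymax>
    </bndbox>
  </object>
</annotation>
"""


def read_object_flags(xml_text: str) -> list:
    """Return the label and the truncated/difficult flags of each object."""
    root = ET.fromstring(xml_text.strip())
    objects = []
    for obj in root.findall("object"):
        objects.append({
            "label": obj.findtext("name"),
            # Missing flags are treated as 0 (not truncated, not difficult).
            "truncated": obj.findtext("truncated", default="0").strip() == "1",
            "difficult": obj.findtext("difficult", default="0").strip() == "1",
        })
    return objects


print(read_object_flags(SAMPLE_ANNOTATION))
# [{'label': 'ball', 'truncated': True, 'difficult': False}]
```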


Underfitting

When the model performs poorly on the training data as well as on the validation and test data.


Weights

Weights are a way of quantifying how important a given input is for a neural network and how much it contributes to the output.