Data Collection¶
This page outlines how to collect video or image data and also includes some helpful tips for dataset generation.
Data Collection¶
Collecting data is the first step in model building, and it can be accomplished in many ways. You can find more information about the different ways to gather data in the Data Capture Guidelines document. The Data Collection app simplifies data collection and allows you to capture data directly from the edge device and cameras on which you will be inferencing.
Note: The Data Collection workflow is only available for Enterprise users.
1. Add a Data Collection Device¶
Follow the device provisioning workflow to provision a device. Once your device shows up on the “Devices” page, you’re ready to move to the next step.
2. Deploy the Application To Your Device¶
Navigate to the “Projects” page on the alwaysAI Dashboard. Find and select the “alwaysAI Data Collection” project. The application versions list should have one or more versions available to deploy to your devices. Click the “Deploy” button on the latest release, select your devices, and deploy!
On the device page, the application should be listed in the “Application Status” table, and clicking the “Details” button lets you view logs and view or modify configuration and environment variables. The application won’t work correctly without updating the configuration, which is described in the next section.
3. Configure the Application¶
The app configuration is managed through various parameters in the "app_configurations" section of alwaysai.app.json, which can be updated on the “Config” tab for the application. Below is a list of these configurations, divided into required and optional sections:
Required Configurations¶
upload_mode: Specifies the upload behavior for the collected data. Possible values:
"WHEN_IDLE": Starts uploading only when outside the collection period and stops once collection restarts or all uploads are finished.
"ALWAYS": Uploads a video as soon as it has finished writing, based on collection_video_duration_s.
"NEVER": Only stores the video on the local device and never uploads it.
"ONLY": Only uploads any saved files to the provided S3 bucket.
Example: "upload_mode": "WHEN_IDLE"
capture_mode: Specifies the mode of capture. Possible values:
"VIDEO": Capture video.
"IMAGE": Capture images.
Example: "capture_mode": "VIDEO"
frame_rate: Sets the frame rate for data collection. For videos, this is the output video frame rate. For images, the app will attempt to capture images at this rate.
"frame_rate": 15
video_streams: A list of video streams to collect data from. Each stream is configured with:
mode: Type of video stream. Possible values:
"IP": IP camera.
"USB": USB camera.
"TEST": Test stream.
config: The configuration corresponding to the mode:
For "IP": URL of the stream (e.g., rtsp://).
For "USB": Camera ID (e.g., 0 for the first connected USB camera).
For "TEST": "".
Example: "video_streams": [{"mode": "IP", "config": "rtsp://"}, {"mode": "USB", "config": 0}, {"mode": "TEST", "config": ""}]
collection_video_duration_s: The duration in seconds of each video chunk. For example, if set to 300, a new video file will be created every 300 seconds.
Example: "collection_video_duration_s": 300
collection_video_format: The format to use for the collection video. Possible values:
"MP4"
"AVI"
Example: "collection_video_format": "MP4"
timezone: Specifies the timezone for scheduling the data collection, e.g. "US/Pacific".
Example: "timezone": "US/Pacific"
opening_time: The start time for data collection in HHMM format, e.g. "1455" for 2:55 PM.
Example: "opening_time": "1455"
closing_time: The stop time for data collection in HHMM format, e.g. "1457" for 2:57 PM.
Example: "closing_time": "1457"
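Putting the required keys together, the "app_configurations" section of alwaysai.app.json might look something like the following sketch (the values shown are illustrative, and the rest of the file is omitted):

"app_configurations": {
    "upload_mode": "WHEN_IDLE",
    "capture_mode": "VIDEO",
    "frame_rate": 15,
    "video_streams": [
        {"mode": "USB", "config": 0}
    ],
    "collection_video_duration_s": 300,
    "collection_video_format": "MP4",
    "timezone": "US/Pacific",
    "opening_time": "0900",
    "closing_time": "1700"
}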
Optional Configurations¶
frame_dimensions: An optional parameter that specifies the dimensions [width, height] to which the images/videos will be resized. If not set, the dimensions of the input video stream are maintained.
Example: "frame_dimensions": [1920, 1080]
stack_mode: An optional parameter to stack video streams. NOTE: The images will be stacked using the edgeIQ safe_hstack and safe_vstack functions and may undergo resizing/padding. Possible values:
"HSTACK": Stack streams horizontally.
"VSTACK": Stack streams vertically.
If not set, each video source will have its own output files.
Example: "stack_mode": "HSTACK"
Additional Guides¶
FFmpeg¶
We recommend you download and install FFmpeg to assist you in generating your dataset. It is great for generating sample images from a wide variety of video formats, or for changing the format of a video. Detailed installation instructions are available on the FFmpeg website, and many sites provide command instructions and examples, e.g. https://www.labnol.org/internet/useful-ffmpeg-commands/28490/
A sample command that gives you high-quality samples at 2 frames per second is:
$ ffmpeg -i movie_name.mov -r 2 -q:v 1 image_name_%4d.png
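If you also need to change the format of a video before sampling it, a command along these lines should work (assuming your FFmpeg build includes the libx264 encoder; the file names are placeholders):
$ ffmpeg -i movie_name.avi -c:v libx264 movie_name.mp4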
Data Capture Guidelines¶
Capturing data is the first step in the model training process, and one of the most important. A model is only as good as the dataset it is trained on; as the saying goes: garbage in, garbage out. Keeping that in mind, we have compiled a list of considerations that will help you to ensure the data you capture gives you the best chance at an accurate, robust model.
There are three ways to generate a dataset:
Collect the dataset yourself
Acquire data from outside sources
Use a digitally generated dataset.
The scope of this document is the first method; however, the considerations we discuss apply to the other ways of generating a dataset as well.
Data Source¶
You can collect your dataset in either image or video format. The main, overarching theme to keep in mind is: Collect as you will inference. In computer vision, “inference” is the term we use for applying a trained model to an input to infer an outcome. If your target application will be analyzing random images from the internet, your dataset should be images pulled from the internet. If your target application will be running on security camera video footage collected from a camera in a high corner of a building lobby, then your data should be from a similar, or preferably the same, video camera. Model training is done on images, and inference is technically done on images as well, even if it is analyzing a video stream; the point is to train on data that resembles the real-world application. Once your video data is collected, you can easily sample the videos to create images to use for training. While you may be able to find a ready-made dataset, or generate one using images or video collected by someone else, if you have full control of the source of your data, you can ensure a better quality dataset.
Class/Label Balance¶
The term label balance refers to having a roughly equivalent number of example images for each class (or label, as they are often called) that you are training your model to recognize. If there is a large discrepancy in the number of images across classes, e.g. you are training a model to recognize bottles and cans and have 2000 images of bottles but only 50 images of cans, the model will not be balanced. This could result in a disparity in accuracy or precision across classes and will generally be detrimental to your model.
Lighting¶
The optimal lighting for your dataset depends on the lighting of your target application. To return to the security camera example, the lighting will vary greatly depending on whether your security camera is inside or outside, or whether it runs during the night, the day, or both. A camera that is inside may have consistent lighting throughout the day, and even the night, whereas a camera outside is subject to changes in lighting due to things like weather and time of day. None of these things are an impediment; however, you’ll want to take them into consideration and ideally have examples of all the lighting conditions your model will be exposed to in your training data.
Angle¶
The angle that objects are viewed from can drastically change their shape. An umbrella from above or below is an octagon, but from the side it is a crescent with a line. When collecting data, consider the angle you will be inferencing from, whether your camera will be high or low, as well as the direction that your targets will be crossing the frame, if it is relevant.
Distance¶
When humans look at a scene, our brains perform a lot of processing to interpret whether an object is close to us or far away. The main factor in this is size: the closer an object is to us, the bigger it appears. We need to take that into consideration when training a computer vision model. In order to teach the model to recognize an object consistently regardless of how close it is to the camera, we need images of the object from a wide range of distances in our dataset. That means we need images where our target class takes up most of the frame, as well as images where the target class takes up very little of the frame. Try to capture the target object from a variety of distances, especially if the object will be moving towards or away from the camera in the target application.
Resolution¶
Resolution will play a role in the quality of the model if there is a large discrepancy between the resolution of the training images and the inferencing images. For example, if the training images are high-definition, the model will have trouble finding the same shape in grainy, low-resolution images. Typically, resolution is close enough across most devices; however, it is good to take this into consideration in general.
Scale¶
Most likely, the framework on which you train the model will re-scale all images so that they are consistent for training. However, if the raw images that you gather for training have a wide range of scales, this re-scaling will affect all of them differently, which will have a negative impact on your model. Try to use training images that are at roughly the same scale.
Occlusion¶
Your occlusion tolerance is something you have to decide on when gathering data. How much of an object do you want to be visible before your model detects it? 50%? 80%? 20%? Keep in mind that if you want a partially visible object to be detected by your model, you need a large number of examples of the object being occluded in your dataset. In addition, the more an object is occluded, the fewer defining features can be detected, and as such, you may introduce false positives or reduce the accuracy of inferences if you use images containing occluded objects as your training data.
Weather¶
If you are going to be inferencing in a location that has weather, i.e. outside, try to account for that when gathering data. If all the data you collect is from a bright sunny day, what happens when it rains or snows? Clouds will reduce light, rain will add an artifact over the entire inference area that needs to be accounted for, and snow will completely change the background. Think about whether you will be inferencing in various weather conditions, and try to incorporate that as best you can into your data collection. You may not be able to make it rain when you are capturing data, but you could simulate some of the variation by using images from times with different amounts of light, like early morning or evening.
Background¶
The background of your training dataset can drastically change how your target classes are recognized. If you collect all your data in a controlled environment, say with a white background, it will be easy for the model to recognize the objects you are training on, but this won’t translate to an accurate model in the real world. In the real world, your object may be camouflaged by the background, or have half blend in and half stand out, or any number of situations. To generate a robust model that is accurate in many situations, vary the background of the training dataset as much as possible.
Foreground¶
What happens if your target class is in the background and there are other things in the foreground? It will affect the focus of your hardware, the clarity of your object, and the overall visibility of your target. This is a very likely situation when you deploy your model in the real world. There is no guarantee that what you are training for will be front and center in your image, so try to include images that have things other than your target class as the foreground.