Auto Data Labeling is a feature that is mentioned constantly these days, and some even consider it the solution to manual annotation, which consumes a great deal of time and resources.
Because manual data labeling (also known as manual data annotation) can take many hours for a single dataset, Auto Labeling technology offers a simpler, faster, and more advanced way to process data by using AI itself.
How we normally handle datasets
The most common and simplest approach to data labeling is, of course, a fully manual one. A human user is presented with a series of raw, unlabeled data (such as images or videos), and is tasked with labeling it according to a set of rules.
For example, when processing image data, the most common types of annotations are classification tags, bounding boxes, polygon segmentation, and key points.
Classification tags, the easiest and cheapest annotations, may take as little as a few seconds, whereas fine-grained polygon segmentation could take a few minutes per object instance.
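To make the four annotation types more concrete, here is a minimal Python sketch of what each one might record. The class and field names are hypothetical and not tied to any particular tool's schema; the polygon area uses the standard shoelace formula.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical record types; field names are illustrative only.

@dataclass
class ClassificationTag:
    image_id: str
    label: str  # one class for the whole image

@dataclass
class BoundingBox:
    image_id: str
    label: str
    x: float  # top-left corner, in pixels
    y: float
    width: float
    height: float

    def area(self) -> float:
        return self.width * self.height

@dataclass
class PolygonSegmentation:
    image_id: str
    label: str
    points: List[Tuple[float, float]]  # outline vertices, in order

    def area(self) -> float:
        # Shoelace formula over the closed outline (last vertex joins the first).
        total = 0.0
        n = len(self.points)
        for i in range(n):
            x1, y1 = self.points[i]
            x2, y2 = self.points[(i + 1) % n]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0

@dataclass
class Keypoints:
    image_id: str
    label: str
    points: List[Tuple[str, float, float]]  # (name, x, y), e.g. ("left_eye", 120.0, 80.0)

tag = ClassificationTag("img_001", "street")
box = BoundingBox("img_001", "car", 40, 60, 120, 50)
poly = PolygonSegmentation("img_001", "car", [(40, 60), (160, 60), (160, 110), (40, 110)])
print(box.area(), poly.area())
```

A bounding box is just four numbers, while a real object outline can need dozens of polygon vertices, which is one reason polygon segmentation takes so much longer per instance.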
To estimate the impact of AI automation on data labeling time, let’s assume that it takes a user 10 seconds to draw a bounding box around an object and select the object class from a given list. This assumption is backed by our own empirical observations.
In this case, labeling a typical dataset of 100,000 images with 5 objects per image would take about 1,400 man-hours (500,000 boxes × 10 seconds ≈ 1,389 hours), which is equivalent to spending around $10K on data labeling alone.
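The arithmetic behind this estimate can be written out as a small Python helper. Multiplying out the figures gives 500,000 boxes and about 1,389 hours. The hourly rate of $7.20 is an assumption, chosen only because it is roughly what the ~$10K figure implies; substitute your own labor cost.

```python
def labeling_hours(num_images: int, objects_per_image: int, seconds_per_box: float) -> float:
    """Annotator hours needed to draw and classify every bounding box."""
    total_boxes = num_images * objects_per_image
    return total_boxes * seconds_per_box / 3600  # seconds -> hours

def labeling_cost(hours: float, hourly_rate: float) -> float:
    """Labor cost; hourly_rate is an assumed wage, not a figure from the article."""
    return hours * hourly_rate

hours = labeling_hours(num_images=100_000, objects_per_image=5, seconds_per_box=10)
cost = labeling_cost(hours, hourly_rate=7.20)  # assumed rate
print(f"{hours:,.0f} hours, ${cost:,.0f}")  # 1,389 hours, $10,000
```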
Adding a layer of quality control to manually verify each piece of labeled data also adds time to delivery. It would take a trained user about one second to check off each bounding box annotation, which increases labeling costs by about 10% (one second of review for every ten seconds of labeling).
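The 10% figure follows directly from the ratio of review time to labeling time. A minimal sketch, assuming review time is spent once per box and cost is proportional to labor hours, using the 500,000 boxes from the example dataset:

```python
def review_overhead(seconds_per_box: float, review_seconds_per_box: float) -> float:
    """Fractional increase in labor when a reviewer checks every box once."""
    return review_seconds_per_box / seconds_per_box

def hours_with_review(num_boxes: int, seconds_per_box: float, review_seconds_per_box: float) -> float:
    """Total labeling plus review hours, assuming review time scales per box."""
    return num_boxes * (seconds_per_box + review_seconds_per_box) / 3600

print(f"{review_overhead(10, 1):.0%}")  # 10%
print(round(hours_with_review(500_000, 10, 1)))  # 1528
```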
Some workflows adopt consensus-based quality control, in which multiple users annotate the same piece of data and the results are consolidated or compared for quality control. With consensus-based workflows, the time and money spent are proportional to the number of users working on overlapping tasks. Simply put, if three users label the same image, you pay for all three annotations.
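The article does not say how consensus results are consolidated, so the sketch below assumes the simplest rule: a majority vote over class labels, with the agreement ratio kept so that low-agreement items can be flagged. It also shows why cost grows linearly with the number of overlapping annotators. Function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

def consolidate(labels: List[str]) -> Tuple[str, float]:
    """Majority vote over the class labels several annotators gave one item.

    Returns the winning label and the fraction of annotators who chose it.
    """
    if not labels:
        raise ValueError("need at least one annotation")
    label, votes = Counter(labels).most_common(1)[0]
    return label, votes / len(labels)

def consensus_cost(cost_per_annotation: float, num_items: int, annotators_per_item: int) -> float:
    """Every overlapping annotation is paid for, so cost scales with annotators_per_item."""
    return cost_per_annotation * num_items * annotators_per_item

label, agreement = consolidate(["car", "car", "truck"])
print(label, round(agreement, 2))  # car 0.67
```

Box-level consensus would need an extra matching step (for example, pairing boxes by overlap) before voting; majority voting on tags is only the simplest case.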
All this shows that the two most expensive steps in data labeling are:
- The data labeling itself
- Reviewing and verifying it for quality control.
Therefore, the main purpose of Auto-Label technology is to reduce the time spent on both data labeling and verification.
Thankfully, with advances in AI and machine learning, Auto-Label technology has come a long way. However, not all Auto-Label technologies are created equal, and in many instances, naive attempts to use AI end up requiring more human input to correct errors introduced by the AI. Therefore, one has to be very aware of how the selected AI affects the overall data workflow.
We will now dive into what auto labeling is, the purpose behind this technology, and the advancements being made in this field.
The advantages of Auto Labeling
Auto Labeling is still a fairly new term in the field, but the technology behind it is developing quickly, as the large number of tools now on the market shows. So what is auto labeling, and what are its benefits?
What’s auto labeling?
Auto labeling is a feature found in data annotation tools that apply artificial intelligence (AI) to enrich, annotate, or label a dataset. Tools with this feature augment the work of humans in the loop to save time and money on data labeling for machine learning.
Most tools allow you to load pre-annotated data into the tool. More advanced tools, which are evolving into platforms (e.g., tool plus Software Development Kit or SDK), allow you to leverage AI or bring your own algorithm to the tool to improve the data enrichment process by auto labeling data.
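As a rough illustration of bringing your own algorithm, the sketch below runs a model over a set of images and writes its predictions in the COCO detection results format (a list of records with `image_id`, `category_id`, `bbox` as `[x, y, width, height]`, and `score`), a widely used interchange format. Whether a given annotation tool imports this format is an assumption to check against that tool's documentation; `fake_detector` is a hypothetical placeholder for your own model.

```python
import json
from typing import Callable, Dict, List, Tuple

# A prediction as (class_name, score, x_min, y_min, x_max, y_max).
Prediction = Tuple[str, float, float, float, float, float]

def to_coco_results(
    image_ids: List[int],
    predict: Callable[[int], List[Prediction]],
    category_ids: Dict[str, int],
) -> List[dict]:
    """Run `predict` on each image and convert corner boxes to COCO [x, y, w, h]."""
    results = []
    for image_id in image_ids:
        for name, score, x1, y1, x2, y2 in predict(image_id):
            results.append({
                "image_id": image_id,
                "category_id": category_ids[name],
                "bbox": [x1, y1, x2 - x1, y2 - y1],
                "score": score,
            })
    return results

# Hypothetical stand-in for "your own algorithm"; a real detector would go here.
def fake_detector(image_id: int) -> List[Prediction]:
    return [("car", 0.92, 10, 20, 110, 80)]

pre_annotations = to_coco_results([1, 2], fake_detector, {"car": 3})
print(json.dumps(pre_annotations[0]))
```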
Other tools offer prediction models that suggest annotations so workers can validate them. Some features leverage embedded neural networks that can learn from every annotation made. All of these features can save time and resources for machine learning teams and will have a profound effect on data annotation workflows.
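The idea of a model that suggests annotations and learns from every one a worker confirms can be sketched without a neural network. The toy below deliberately swaps in a nearest-centroid classifier that updates after each annotation; it assumes each item has already been turned into a numeric feature vector by some upstream extractor (hypothetical here), and it is not how any particular tool implements this feature.

```python
from typing import Dict, List, Optional

class OnlineCentroidSuggester:
    """Toy suggestion model that updates after every confirmed annotation.

    Each class keeps a running mean of the feature vectors labeled with it;
    a new item is suggested the class whose mean is closest (squared distance).
    """

    def __init__(self) -> None:
        self.sums: Dict[str, List[float]] = {}
        self.counts: Dict[str, int] = {}

    def suggest(self, features: List[float]) -> Optional[str]:
        best, best_dist = None, float("inf")
        for label, total in self.sums.items():
            n = self.counts[label]
            dist = sum((f - t / n) ** 2 for f, t in zip(features, total))
            if dist < best_dist:
                best, best_dist = label, dist
        return best  # None until at least one annotation has been seen

    def learn(self, features: List[float], label: str) -> None:
        # Called with the worker's confirmed or corrected label.
        if label not in self.sums:
            self.sums[label] = [0.0] * len(features)
            self.counts[label] = 0
        self.sums[label] = [t + f for t, f in zip(self.sums[label], features)]
        self.counts[label] += 1

model = OnlineCentroidSuggester()
model.learn([0.9, 0.1], "car")
model.learn([0.1, 0.8], "person")
print(model.suggest([0.8, 0.2]))
```

Each worker decision feeds `learn`, so later suggestions reflect every earlier annotation, which is the property the paragraph describes.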
Outstanding benefits of auto labeling
In our work with organizations using tools to annotate images for machine learning, we find auto labeling can be helpful when it is applied in a data annotation workflow in two ways:
- Pre-annotate some or all of your dataset. Workers come behind the automation to review, correct, and complete the annotations. Automation cannot annotate everything; there will be exceptions and edge cases. It’s also far from perfect, so you must plan for people to make reviews and corrections as necessary.
- Reduce the amount of work sent to people. An auto-labeling model can assign a confidence level based on the use case, task difficulty, and other factors. It enriches the dataset with annotations, and sends annotations with lower confidence scores to a person for review or correction.
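The second approach amounts to a confidence threshold. A minimal sketch, assuming the model attaches a score between 0 and 1 to each annotation; the 0.9 threshold and the sample records are made up, and in practice the threshold would be tuned per use case and task difficulty.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Annotation:
    item_id: str
    label: str
    confidence: float  # model's score in [0, 1]

def route(annotations: List[Annotation], threshold: float) -> Tuple[List[Annotation], List[Annotation]]:
    """Split model output into auto-accepted annotations and a human review queue."""
    accepted = [a for a in annotations if a.confidence >= threshold]
    needs_review = [a for a in annotations if a.confidence < threshold]
    return accepted, needs_review

preds = [
    Annotation("img_001", "car", 0.97),
    Annotation("img_002", "person", 0.41),
    Annotation("img_003", "truck", 0.88),
]
accepted, needs_review = route(preds, threshold=0.9)
print([a.item_id for a in needs_review])
```

Raising the threshold sends more work to people but lets fewer model errors through unreviewed; lowering it does the opposite.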
We’ve run time experiments in which one team used tools with an automation feature while another team manually annotated the same data. In some cases, we’ve seen auto labeling produce low-quality results that increased the time required per annotation task. In other cases, it provided a helpful starting point and reduced task time.
In one image annotation experiment, auto labeling combined with human-powered review and improvements was 10% faster than the 100% manual labeling process. The time savings grew to 40% to 50% as the automation learned over time.
In the same experiment, auto labeling also had a margin of error of more than five pixels for vehicles and missed the objects farthest from the camera. As you can see in the image, the auto-labeling feature tagged a garbage bin as a person. It’s important to keep in mind that pre-annotation predictions are based on existing models, and any misses in the auto labeling reflect the accuracy of those models.
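Errors like these can be measured by comparing pre-annotations against a small hand-labeled sample. The sketch below matches boxes by intersection over union (IoU) and reports missed objects, wrong classes, and the largest per-coordinate pixel error. The 0.5 IoU threshold and greedy matching are common conventions assumed here rather than anything from the experiment, and the boxes are made up to mimic the failure modes above.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def max_corner_error(a: Box, b: Box) -> float:
    """Largest absolute difference between matching box coordinates, in pixels."""
    return max(abs(p - q) for p, q in zip(a, b))

def audit(ground_truth: List[Tuple[str, Box]],
          predictions: List[Tuple[str, Box]],
          min_iou: float = 0.5) -> Dict[str, list]:
    """Greedily match each ground-truth object to its best unused prediction by IoU."""
    report: Dict[str, list] = {"missed": [], "wrong_class": [], "pixel_error": []}
    used = set()
    for gt_label, gt_box in ground_truth:
        best_j, best_iou = -1, min_iou
        for j, (_, p_box) in enumerate(predictions):
            score = iou(gt_box, p_box)
            if j not in used and score >= best_iou:
                best_j, best_iou = j, score
        if best_j < 0:
            report["missed"].append(gt_label)  # no prediction overlapped enough
            continue
        used.add(best_j)
        p_label, p_box = predictions[best_j]
        if p_label != gt_label:
            report["wrong_class"].append((gt_label, p_label))
        report["pixel_error"].append(max_corner_error(gt_box, p_box))
    return report

# Made-up boxes that mimic the failure modes described above.
ground_truth = [
    ("car", (100, 100, 200, 160)),
    ("garbage_bin", (300, 200, 330, 250)),
    ("car", (500, 50, 520, 62)),  # small, distant object
]
predictions = [
    ("car", (104, 98, 206, 160)),
    ("person", (301, 199, 331, 251)),
]
print(audit(ground_truth, predictions))
```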
Data annotation tools can include automation, also called auto labeling, which uses artificial intelligence to label data; workers then confirm or correct those labels, saving time in the process.
This screenshot of a street scene shows auto labeling enriching an image with a bounding box around a garbage can. It was a mistake: the object was labeled as a person. While auto labeling is not perfect, it can provide a helpful starting point and reduce task time for teams of data labelers.
Some tasks are ripe for pre-annotation. In our experiment, for example, you could use pre-annotation to label the images and have a team of data labelers decide whether to resize or delete the labels (bounding boxes).
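The review step described here can be modeled as a keep, resize, or delete decision per pre-annotated box. This is a minimal sketch with hypothetical type and function names; real tools record these decisions through their own interfaces.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Dict, List, Optional, Tuple

BoxCoords = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

class Action(Enum):
    KEEP = "keep"
    RESIZE = "resize"
    DELETE = "delete"

@dataclass(frozen=True)
class PreAnnotation:
    box_id: int
    label: str
    box: BoxCoords

@dataclass
class Decision:
    action: Action
    new_box: Optional[BoxCoords] = None  # required for RESIZE

def apply_review(pre: List[PreAnnotation], decisions: Dict[int, Decision]) -> List[PreAnnotation]:
    """Apply a labeler's keep/resize/delete decision to each pre-annotated box.

    Boxes without a decision are kept unchanged.
    """
    final = []
    for ann in pre:
        d = decisions.get(ann.box_id, Decision(Action.KEEP))
        if d.action is Action.DELETE:
            continue
        if d.action is Action.RESIZE:
            if d.new_box is None:
                raise ValueError(f"box {ann.box_id}: RESIZE needs new_box")
            ann = replace(ann, box=d.new_box)
        final.append(ann)
    return final

pre = [PreAnnotation(1, "car", (10, 10, 50, 40)), PreAnnotation(2, "person", (60, 5, 80, 45))]
reviewed = apply_review(pre, {1: Decision(Action.RESIZE, (12, 10, 48, 40)),
                              2: Decision(Action.DELETE)})
print(reviewed)
```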
This reduction in labeling time can be especially helpful for teams that need to annotate images with pixel-level segmentation.
Our takeaway from the experiments is that applying auto labeling requires creativity. We find that our clients who use it successfully are willing to experiment, fail, and pivot their process as necessary.
Auto Labeling is one of the breakthroughs improving the outlook for AI technology, specifically machine learning, and there is still a lot to discover about it.
If you want to hear from our experts about Auto Labeling, please contact us for further details.
- Website: https://www.lotus-qa.com/
- Tel: (+84) 24-6660-7474
- Fanpage: https://www.facebook.com/LotusQualityAssurance