This is the second video in the series 5 Essentials of AI Training Data Labeling Work. In it, Ngoc talks about scale in data annotation and the steps for scaling data labeling for AI development.
You can watch our video here, or read the transcription below. Turn on subtitles for English, Japanese, Korean and Vietnamese.
Hello everyone, I’m Ngoc from Lotus QA.
In the previous video of this annotation series, I talked about the first essential for AI training data annotation: data quality. In today’s video, I will talk about the second essential: SCALE.
Labeling data is a time-consuming process, but we must focus on this task to improve data quality and model performance.
“Does labeling work take that much time?”
I’m sure some of you will think so, but let’s look at an example of video annotation. At 30 to 60 frames per second, a 10-minute video contains between 18,000 and 36,000 frames. One hour of collected video data takes about 5 man-months to annotate. When you scale data annotation, the size of your team grows in direct proportion to the volume of your training data.
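The arithmetic behind these figures can be sketched in a few lines of Python. The function names and the linear-scaling assumption are illustrative and not part of the talk; only the frame rates and the figure of about 5 man-months per hour of video come from it.

```python
import math

# Rule of thumb quoted in the talk: about 5 man-months of annotation per hour of video.
MAN_MONTHS_PER_VIDEO_HOUR = 5


def frame_count(duration_seconds: int, fps: int) -> int:
    """Number of frames in a clip of the given length and frame rate."""
    return duration_seconds * fps


def team_size(video_hours: float, deadline_months: float) -> int:
    """Labelers needed to finish by the deadline.

    Assumption (not from the talk): effort scales linearly with video length
    and can be split evenly across labelers.
    """
    effort_man_months = video_hours * MAN_MONTHS_PER_VIDEO_HOUR
    return math.ceil(effort_man_months / deadline_months)


ten_minutes = 10 * 60  # seconds
print(frame_count(ten_minutes, 30))                 # 18000
print(frame_count(ten_minutes, 60))                 # 36000
print(team_size(video_hours=4, deadline_months=2))  # 10
```

Under these assumptions, doubling the amount of video or halving the deadline doubles the team you need, which is the direct relationship described above.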
Scaling Data Annotation
You can scale your annotation workforce by hiring an internal team of labelers or by outsourcing. Hiring an internal team is expensive, but sometimes it is the only option. For example, difficult training data such as CT scans needs to be labeled by radiologists, who have the medical expertise to interpret it properly.
The concern with outsourcing annotation work that requires subject matter expertise is that a BPO (business process outsourcing) provider will not be able to supply specialized labelers.
However, business process outsourcing covers a surprisingly wide range of expertise, and with a little research, you can find annotation services that can label your data for a fraction of the cost of hiring an internal team. Moreover, a BPO annotation team gives you the elastic capacity to scale your workforce up or down according to your project and business needs, without compromising data quality.
How do I know when it’s time to scale and hire a data labeling service?
If your most expensive resources like data scientists or engineers are spending significant time wrangling data for machine learning or data analysis, you’re ready to consider scaling with a data labeling service.
Increases in data labeling volume, whether they happen over weeks or months, will become increasingly difficult to manage in-house.
They also drain the time and focus of some of your most expensive people: data scientists and machine learning engineers. It’s better to free up these high-value team members for more strategic and analytical work that extracts business value from your data.
4 Steps to Scale Data Labeling
Design for data labeling workforce capacity
A data labeling service can provide access to a large pool of workers.
Your best bet is working with the same team of labelers because as their familiarity with your business rules, context, and edge cases increases, data quality improves over time.
This is especially helpful with data labeling for machine learning projects, where quality and flexibility to iterate are essential.
Look for elasticity
You may have to label data in real time, as the incoming data is generated.
Perhaps your business has seasonal spikes in purchase volume over certain weeks of the year.
We have also found that product launches can generate spikes in data labeling volume.
You will want a workforce that can adjust scale based on your needs.
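As a rough illustration of what elasticity means in practice, the sketch below turns a weekly forecast of incoming items into a weekly headcount. The volumes, the throughput of 2,000 items per labeler per week, and the function name are all hypothetical.

```python
import math


def labelers_needed(weekly_items: list[int], items_per_labeler_per_week: int) -> list[int]:
    """Headcount required each week to keep up with incoming volume.

    Assumes every labeler has the same, constant weekly throughput,
    which is a simplification.
    """
    return [math.ceil(items / items_per_labeler_per_week) for items in weekly_items]


# Hypothetical forecast: week 3 is a seasonal spike or a product launch.
weekly_volume = [20_000, 22_000, 60_000, 21_000]
print(labelers_needed(weekly_volume, items_per_labeler_per_week=2_000))
# [10, 11, 30, 11]
```

A fixed in-house team sized for an ordinary week would fall behind in week 3, while a team sized for the spike would sit idle the rest of the month.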
Choose smart tooling
Whether you buy it or build it yourself, the data enrichment tool you choose will significantly influence your ability to scale data labeling.
You’ll need a tool that gives you the flexibility to make changes to your data features and labeling process.
Measure labelers’ productivity
Productivity can be measured in a variety of ways, but in our experience, we’ve found that three measures, in particular, provide a helpful view into worker productivity:
- The volume of completed work
- Quality of the work (accuracy plus consistency)
- Worker engagement.
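The first two measures are straightforward to compute. The sketch below is one possible way to do it, not the method used in the talk: accuracy is taken as the match rate against a trusted "gold" set, and consistency as simple percent agreement between two labelers on the same items. The labels and names are made up for illustration, and worker engagement is left out because it is not easily reduced to a single formula.

```python
def match_rate(labels_a: list[str], labels_b: list[str]) -> float:
    """Fraction of positions where two label sequences agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    if not labels_a:
        raise ValueError("label sequences must not be empty")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical labels for the same five images.
gold = ["car", "car", "bike", "person", "car"]    # trusted reference answers
alice = ["car", "car", "bike", "person", "bike"]  # labeler A
bob = ["car", "bike", "bike", "person", "bike"]   # labeler B

volume = len(alice)                   # completed items: 5
accuracy = match_rate(alice, gold)    # 4 of 5 match the gold set: 0.8
consistency = match_rate(alice, bob)  # 4 of 5 match each other: 0.8

print(volume, accuracy, consistency)
```

Percent agreement does not correct for agreement that happens by chance, so a chance-corrected statistic such as Cohen's kappa may be a better fit when there are only a few label classes.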
On the worker side, strong processes lead to greater productivity. Combining technology, workers, and coaching shortens labeling time and minimizes downtime. We have found data quality is higher when we place data labelers in small teams, train them on your tasks and business rules, and show them what quality work looks like.
If you are interested in our videos, please click “like” and subscribe to our channel.