Ipyannotator in The Journal of Open Source Software

Author

Ítalo Epifânio

Published

November 25, 2022

In August 2022, Ipyannotator was published in The Journal of Open Source Software, an important step in promoting the tool within the AI community. In the paper we describe the ‘statement of need’ and the overall architecture of the solution.

Data-centric AI is a hot topic in the AI community because it can provide a competitive advantage both to companies and to Machine Learning engineers in their careers. Ipyannotator is an open source, highly customizable annotation framework that helps teams improve, explore and create high-quality data.

Data labeling can be a painful step in any supervised Machine Learning (ML) project. Most tooling is not flexible enough to handle the wide variety of domains, applications and data types, which limits what users can do. Ipyannotator was designed to address those limitations by providing built-in annotators while also allowing customization.

Even though Ipyannotator’s built-in annotators are focused on computer vision applications, e.g. bounding box and image classification, the framework is flexible enough to allow users to extend the features to other domains like Natural Language Processing (NLP).

To promote customization, Ipyannotator was developed entirely in Python, including the UI interactions that would commonly rely on JavaScript. Any software engineer with Python knowledge can therefore customize the solution, and data science teams can work with tooling they already know, with a minimal learning curve.

Ipyannotator abstracts HTML and JavaScript interactions, allowing data science teams to customize the framework according to their requirements.
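To make the idea of Python-only customization more concrete, here is a minimal sketch of what a custom annotator for a non-vision domain could look like. It does not use Ipyannotator’s API: the `TextLabelAnnotator` class, its methods and the sample data are all hypothetical, and the callbacks only stand in for the role a UI layer would play.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class TextLabelAnnotator:
    """Hypothetical annotator (not part of Ipyannotator): one label per text."""

    items: List[str]
    labels: List[str]
    annotations: Dict[str, str] = field(default_factory=dict)
    index: int = 0
    # Observers are notified on every annotation, as a UI layer would be.
    observers: List[Callable[[str, str], None]] = field(default_factory=list)

    def on_annotate(self, callback: Callable[[str, str], None]) -> None:
        """Register a callback receiving (item, label) after each annotation."""
        self.observers.append(callback)

    @property
    def current(self) -> Optional[str]:
        """Return the item waiting for a label, or None when all are done."""
        return self.items[self.index] if self.index < len(self.items) else None

    def annotate(self, label: str) -> None:
        """Label the current item, notify observers, and move to the next one."""
        if label not in self.labels:
            raise ValueError(f"unknown label: {label!r}")
        item = self.current
        if item is None:
            raise IndexError("no items left to annotate")
        self.annotations[item] = label
        for callback in self.observers:
            callback(item, label)
        self.index += 1


# Usage with made-up sample data.
annotator = TextLabelAnnotator(
    items=["great tool", "too slow"],
    labels=["positive", "negative"],
)
log = []
annotator.on_annotate(lambda item, label: log.append((item, label)))
annotator.annotate("positive")
annotator.annotate("negative")
print(annotator.annotations)
```

Keeping the annotation state separate from whatever renders it is one possible design choice: the same state object could then be driven from a notebook widget or from a web front end.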

Ipyannotator follows the trend of tooling focused on data science teams, like PyScript, by removing the need to learn a new programming language (JavaScript). Beyond the language abstraction, Ipyannotator runs on top of Jupyter notebooks, an environment that data science teams are very familiar with.

The framework is not restricted to data scientists, as it can also run on a web server, making its features available to professional annotators who don’t know how to program.

Ipyannotator use cases are listed in the tutorial section of the documentation. If you are curious about customizing and extending the annotation tool, check the build annotator tutorial.