Tagging photographic images: showcasing the magnificent history of Victoria
How can we enable researchers to tag images of digitised records from photographic collections?
Team 2155
Jakrapun Sangchan | jakrapun.san@gmail.com
Ajay Krishnan Jayakumar Usha | ajaykrishnan.ju@gmail.com
Kristian Guinto | guintojames8@gmail.com
Project Title | Tagging photographic images: showcasing the magnificent history of Victoria
State | Victoria |
Sponsor | Public Record Office Victoria (PROV) |
Project objective | Use new technologies, tools, crowdsourcing, or other methods to tag images so that they can be used on the PROV website.
Solution Summary | Use generative AI to automatically generate captions and tags, with humans in the loop to refine the model's output.
Dataset | Images from: https://prov.vic.gov.au/explore-collection/photographic-collections | Metadata from: https://prov.vic.gov.au/prov-collection-api
Repository | Github |
Our proposed solution can be divided into four components:
Using public APIs to access image data
This component is a data pipeline, written in Python, that uses the PROV API to collect image links and descriptions. The pipeline allows the user to query by either serial number or keyword.
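A minimal sketch of this pipeline is shown below. The exact endpoint path, query parameters, and response field names are assumptions; consult the PROV Collection API documentation (linked above) for the real ones.

```python
import requests

# Assumed Solr-style endpoint for the PROV Collection API; check
# https://prov.vic.gov.au/prov-collection-api for the exact path,
# parameters, and response fields.
PROV_API = "https://api.prov.vic.gov.au/search/query"

def search_images(keyword: str, rows: int = 20) -> list[dict]:
    """Query the PROV Collection API for image links and descriptions."""
    params = {"q": keyword, "rows": rows, "wt": "json"}
    response = requests.get(PROV_API, params=params, timeout=30)
    response.raise_for_status()
    docs = response.json().get("response", {}).get("docs", [])
    # Field names below are illustrative; substitute the fields the
    # API actually returns for titles, descriptions, and image URLs.
    return [
        {
            "title": doc.get("title"),
            "description": doc.get("description"),
            "image_url": doc.get("thumbnail_url"),
        }
        for doc in docs
    ]

if __name__ == "__main__":
    for record in search_images("Melbourne tram"):
        print(record["title"], "->", record["image_url"])
```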
Using state-of-the-art generative AI models to automatically generate image captions and tags
For our machine learning component, we chose to work with BLIP, an open-source vision-language model from Salesforce. We focused on two main tasks: caption generation and tag generation.
Using BLIP, we generated captions for images retrieved from the PROV API. For tags, we applied BLIP's Visual Question Answering (VQA) capability: the model answers a set of predefined questions, and its answers become the tags. Future iterations of this project can use more context-specific questions to generate more context-specific tags.
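The sketch below shows both tasks using the Hugging Face transformers implementation of BLIP. The checkpoint names are Salesforce's public releases; the questions are placeholder examples rather than the exact set used in our app.

```python
import requests
import torch
from PIL import Image
from transformers import (
    BlipForConditionalGeneration,
    BlipForQuestionAnswering,
    BlipProcessor,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Captioning model (Salesforce's public BLIP checkpoint).
cap_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
cap_model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
).to(device)

# VQA model, reused to turn predefined questions into tags.
vqa_processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
vqa_model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base").to(device)

# Placeholder questions; a production run would use PROV-specific ones.
QUESTIONS = [
    "What is the main subject of this photo?",
    "Is this scene indoors or outdoors?",
]

def caption_and_tag(image_url: str) -> tuple[str, list[str]]:
    image = Image.open(requests.get(image_url, stream=True).raw).convert("RGB")

    # Caption: unconditional generation from the image alone.
    inputs = cap_processor(image, return_tensors="pt").to(device)
    out = cap_model.generate(**inputs, max_new_tokens=40)
    caption = cap_processor.decode(out[0], skip_special_tokens=True)

    # Tags: one short answer per predefined question.
    tags = []
    for question in QUESTIONS:
        inputs = vqa_processor(image, question, return_tensors="pt").to(device)
        out = vqa_model.generate(**inputs, max_new_tokens=10)
        tags.append(vqa_processor.decode(out[0], skip_special_tokens=True))
    return caption, tags
```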
In our implementation, we refrained from using third-party APIs. Instead, we downloaded the model and executed it locally within our environment. This approach ensured that no data was transmitted outside the system, thereby preventing potential data leakage to the public.
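With Hugging Face transformers, one way to guarantee fully offline execution after the initial one-off download of the weights is the local_files_only flag, as sketched here:

```python
from transformers import BlipForConditionalGeneration, BlipProcessor

# local_files_only=True forces loading from the local cache and fails
# rather than reaching the network, so inference never transmits data
# outside the environment.
processor = BlipProcessor.from_pretrained(
    "Salesforce/blip-image-captioning-base", local_files_only=True
)
model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base", local_files_only=True
)
```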
An app that puts humans in the loop to enhance the model’s outputs
Our app lets users view each image alongside its generated caption and tags, edit or approve them, and save the corrections for reuse; a sketch of one possible front-end follows.
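As an illustration of the review loop only, the sketch below assumes a Streamlit front-end and a hypothetical model_outputs.json file produced by the captioning pipeline; our actual implementation may differ.

```python
import json

import streamlit as st

# Hypothetical review queue: records produced by the captioning pipeline,
# each shaped like {"image_url": ..., "caption": ..., "tags": [...]}.
with open("model_outputs.json") as f:
    records = json.load(f)

st.title("PROV image tagging review")

idx = st.session_state.get("idx", 0)
if idx >= len(records):
    st.success("All records reviewed.")
    st.stop()

record = records[idx]
st.image(record["image_url"])
caption = st.text_area("Caption", value=record["caption"])
tags = st.text_input("Tags (comma-separated)", value=", ".join(record["tags"]))

if st.button("Approve"):
    # Persist the human-corrected output for reuse as training data.
    with open("approved.jsonl", "a") as f:
        f.write(json.dumps({
            "image_url": record["image_url"],
            "caption": caption,
            "tags": [t.strip() for t in tags.split(",") if t.strip()],
        }) + "\n")
    st.session_state["idx"] = idx + 1
    st.rerun()
```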
A demonstration of how the refined data can be used to self-improve
In this component, we've adopted principles from Reinforcement Learning from Human Feedback (RLHF). The essence of this approach is to collect user feedback and use it to improve the model's performance in subsequent iterations. We've designed an intuitive user interface that makes it easy for users to provide feedback and adjust model outputs. By leveraging this feedback, we aim to accumulate more accurate descriptions and tags, setting the stage for better model performance in the future.
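Full RLHF (training a reward model and optimising against it) is heavier than our scope allows; the lightweight variant sketched below instead fine-tunes the captioning model directly on the human-corrected captions collected by the review app (the approved.jsonl file is the hypothetical output of the sketch above).

```python
import json

import requests
import torch
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-6)

# Human-approved records written by the review app.
with open("approved.jsonl") as f:
    examples = [json.loads(line) for line in f]

model.train()
for ex in examples:
    image = Image.open(requests.get(ex["image_url"], stream=True).raw).convert("RGB")
    # Supervised fine-tuning on the corrected caption: the processor builds
    # pixel inputs and token ids, and BLIP computes the language-modelling
    # loss internally when labels are supplied.
    inputs = processor(images=image, text=ex["caption"], return_tensors="pt")
    loss = model(**inputs, labels=inputs["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```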
The combination of these four components provides a complete solution along with a systematic path for self-improvement. The refined captions and tags can be added to the images' metadata, enabling better searchability. They can also be used to improve the model's performance so that it aligns more closely with an organization's objectives.
Traditional solution: humans write tags for each image by hand.
Our solution: generative AI drafts captions and tags automatically, with humans in the loop to review and refine them.
Description of Use | We source the images using PROV's API, then feed them to an AI model to generate descriptions and tags.