Improving Data Searchability with Human and AI Collaboration

Project Info

Team Name


Team 2155


Team Members


Ajay, Jack and one other member with an unpublished profile.

Project Description


Project Details

Team members

Jakrapun Sangchan jakrapun.san@gmail.com
Ajay Krishnan Jayakumar Usha ajaykrishnan.ju@gmail.com
Kristian Guinto guintojames8@gmail.com

Project overview

Project title: Tagging photographic images: showcasing the magnificent history of Victoria
State: Victoria
Sponsor: Public Record Office Victoria (PROV)
Project objective: Use new technologies and tools, crowdsourcing, or other methods to tag images so that they can be used on the PROV website.
Solution summary: Use generative AI to automatically create captions and tags, with humans in the loop to refine the model's output.
Dataset:

  • Images from: https://prov.vic.gov.au/explore-collection/photographic-collections
  • Metadata from: https://prov.vic.gov.au/prov-collection-api

Repository: GitHub

Solution description and data story

Objective

  • The project’s primary objective is to find new ways to improve PROV’s search functionality.

Solution

  • Generate new metadata that describes each image, making the image easier to find. The generated metadata combines a caption with the answers to a set of questions about the image (tags).
  • An app that combines generative AI models with human feedback to produce correct and relevant image captions and tags.
  • A dataset that enables future development through techniques such as model fine-tuning and reinforcement learning from human feedback (RLHF), a technique used to improve ChatGPT’s performance.

Technical description

Our proposed solution consists of four components:

  1. Using public APIs to access image data.
  2. Using state-of-the-art generative AI models to automatically make image captions and tags.
  3. An app that puts humans in the loop to enhance the model’s outputs.
  4. A demonstration of how the refined data can be used for self-improvement.

Using public APIs to access image data

This component is a data pipeline, written in Python, that uses the PROV API to collect image links and descriptions. Users can query by either serial number or keyword.
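A minimal sketch of such a pipeline follows. It is not the team's code. It assumes the PROV Collection API exposes a Solr-style search endpoint that returns JSON. The base URL, the `series_id` field used for serial-number queries and the `iiif-manifest` image field are assumptions to check against the API documentation. The network call is isolated in `search()`, so URL building and response parsing can be tested offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: a Solr-style search endpoint; check the PROV Collection API docs.
BASE_URL = "https://api.prov.vic.gov.au/search/query"
SERIAL_FIELD = "series_id"      # hypothetical field for serial-number queries
IMAGE_FIELD = "iiif-manifest"   # hypothetical field holding the image link


def build_query_url(keyword=None, serial_number=None, rows=20):
    """Build a search URL for either a keyword or a serial number (exactly one)."""
    if (keyword is None) == (serial_number is None):
        raise ValueError("provide exactly one of keyword or serial_number")
    if serial_number is not None:
        q = f'{SERIAL_FIELD}:"{serial_number}"'  # fielded Solr query
    else:
        q = keyword  # free-text query
    return f"{BASE_URL}?{urlencode({'q': q, 'wt': 'json', 'rows': rows})}"


def extract_records(response):
    """Return (image link, description) pairs from a Solr-style JSON response."""
    records = []
    for doc in response.get("response", {}).get("docs", []):
        link = doc.get(IMAGE_FIELD)
        if link:  # skip records without a digitised image
            records.append((link, doc.get("title", "")))
    return records


def search(keyword=None, serial_number=None, rows=20, timeout=30):
    """Query the API over the network and return (image link, description) pairs."""
    with urlopen(build_query_url(keyword, serial_number, rows), timeout=timeout) as resp:
        return extract_records(json.load(resp))
```

For example, `search(keyword="tram")` would return the first 20 matching records that have an image link.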

Using state-of-the-art generative AI models to automatically make image captions and tags

For our machine learning project, we chose to work with the BLIP model, an open-source vision-language model from Salesforce. We focused on two main tasks:

  1. Image Captioning
  2. Image Tag Generation

Using the BLIP model, we generated captions for images retrieved from the PROV API. To generate tags, we used BLIP's visual question answering (VQA) capability: the model answers a set of predefined questions, and its answers become the image's tags. Future iterations of this project could use more context-specific questions to generate more context-specific tags.
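The captioning and tagging steps can be sketched with the Hugging Face `transformers` BLIP classes. This illustrates the approach and is not the team's code. Several details are assumptions: the checkpoints (`Salesforce/blip-image-captioning-base`, `Salesforce/blip-vqa-base`), the question list, and the set of "uninformative" answers that are dropped. The model imports are deferred inside `describe_image()`, so the tag logic in `answers_to_tags()` runs without the heavy dependencies. Inference happens locally, and `local_only=True` prevents any download.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical question set; the team's actual questions are not listed in this write-up.
QUESTIONS = [
    "what is the main subject of this photo?",
    "is this photo taken indoors or outdoors?",
    "what type of vehicle is in the photo?",
]

# Assumed set of answers that add nothing useful as search tags.
UNINFORMATIVE = {"", "no", "none", "nothing", "unknown"}


def answers_to_tags(answer_fn: Callable[[str], str], questions: Iterable[str]) -> List[str]:
    """Ask each question and keep the unique, informative answers as tags."""
    tags: List[str] = []
    for question in questions:
        answer = answer_fn(question).strip().lower()
        if answer not in UNINFORMATIVE and answer not in tags:
            tags.append(answer)
    return tags


def describe_image(image_path: str, questions: Iterable[str] = QUESTIONS,
                   local_only: bool = False) -> Tuple[str, List[str]]:
    """Caption an image and derive tags via BLIP visual question answering.

    Needs `transformers`, `torch` and `Pillow`. Inference runs locally; with
    local_only=True, weights must already be in the local Hugging Face cache.
    """
    from PIL import Image
    from transformers import (BlipForConditionalGeneration,
                              BlipForQuestionAnswering, BlipProcessor)

    image = Image.open(image_path).convert("RGB")

    # 1. Captioning: unconditional generation from the image alone.
    cap_id = "Salesforce/blip-image-captioning-base"
    cap_proc = BlipProcessor.from_pretrained(cap_id, local_files_only=local_only)
    cap_model = BlipForConditionalGeneration.from_pretrained(cap_id, local_files_only=local_only)
    out = cap_model.generate(**cap_proc(images=image, return_tensors="pt"), max_new_tokens=30)
    caption = cap_proc.decode(out[0], skip_special_tokens=True)

    # 2. Tagging: answer each predefined question about the same image.
    vqa_id = "Salesforce/blip-vqa-base"
    vqa_proc = BlipProcessor.from_pretrained(vqa_id, local_files_only=local_only)
    vqa_model = BlipForQuestionAnswering.from_pretrained(vqa_id, local_files_only=local_only)

    def answer_fn(question: str) -> str:
        inputs = vqa_proc(images=image, text=question, return_tensors="pt")
        ans = vqa_model.generate(**inputs, max_new_tokens=10)
        return vqa_proc.decode(ans[0], skip_special_tokens=True)

    return caption, answers_to_tags(answer_fn, questions)
```

Usage: `caption, tags = describe_image("photo.jpg")`. Passing `answer_fn` into `answers_to_tags` keeps the tag rules independent of the model. Swapping in context-specific questions only requires a different `questions` list.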

In our implementation, we refrained from using third-party APIs. Instead, we downloaded the model and executed it locally within our environment. This approach ensured that no data was transmitted outside the system, thereby preventing potential data leakage to the public.

An app that puts humans in the loop to enhance the model’s outputs

Our app has the following functionalities:

  • Keyword search on PROV’s image repository, with images loaded through the API.
  • Interactive presentation of the model-generated description and tags.
  • Generate descriptions and tags from any image URL.
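Accepting arbitrary URLs from users calls for some care before an image reaches the models. The helper below is a hypothetical sketch, not part of the team's app. It rejects schemes such as `file:` (which would expose local files) and responses that are not images. It also caps the download size, where the 10 MB limit is an assumption. Only the standard library is used.

```python
from urllib.parse import urlparse
from urllib.request import urlopen

MAX_BYTES = 10 * 1024 * 1024  # assumed 10 MB cap on downloaded images


def fetch_image_bytes(url: str, max_bytes: int = MAX_BYTES, timeout: float = 10.0) -> bytes:
    """Download an image from a user-supplied URL, rejecting non-images and oversized files."""
    scheme = urlparse(url).scheme
    # Allow web URLs (and inline data: URLs); block file:, ftp: and others.
    if scheme not in ("http", "https", "data"):
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    with urlopen(url, timeout=timeout) as resp:
        content_type = resp.headers.get_content_type()
        if not content_type.startswith("image/"):
            raise ValueError(f"URL does not point to an image (got {content_type})")
        # Read one byte past the limit so oversized files can be detected.
        data = resp.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise ValueError("image exceeds the size limit")
    return data
```

The returned bytes can be wrapped in `io.BytesIO` and opened with Pillow before captioning and tagging.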

A demonstration of how the refined data can be used for self-improvement

This component borrows principles from reinforcement learning from human feedback (RLHF): user feedback is collected so that it can improve the model in subsequent iterations. Our user interface lets users give feedback by accepting the model's outputs or correcting them. Over time, this feedback builds up a collection of more accurate descriptions and tags, laying the groundwork for better model performance in the future.
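One way to turn that feedback into reusable training data is sketched below. The record layout and field names are hypothetical, since the write-up does not describe how feedback is stored. Every reviewed output becomes a target for supervised fine-tuning. When a reviewer edits a caption, the pair (edited caption, original caption) is kept as a "chosen vs. rejected" comparison, which is the kind of data used to train RLHF reward models.

```python
import json
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class Feedback:
    """One human review of a model output (field names are illustrative)."""
    image_url: str
    model_caption: str
    model_tags: List[str]
    final_caption: str  # caption after the reviewer accepted or edited it
    final_tags: List[str]

    @property
    def accepted(self) -> bool:
        """True when the reviewer kept the model output unchanged."""
        return self.final_caption == self.model_caption and self.final_tags == self.model_tags


def to_training_rows(items: Iterable[Feedback]) -> Tuple[List[Dict], List[Dict]]:
    """Split feedback into fine-tuning targets and caption preference pairs."""
    sft_rows, preference_rows = [], []
    for fb in items:
        # Every reviewed output is a trusted target for supervised fine-tuning.
        sft_rows.append({"image_url": fb.image_url,
                         "caption": fb.final_caption,
                         "tags": fb.final_tags})
        # An edited caption yields a comparison: human version preferred over model version.
        if fb.final_caption != fb.model_caption:
            preference_rows.append({"image_url": fb.image_url,
                                    "chosen": fb.final_caption,
                                    "rejected": fb.model_caption})
    return sft_rows, preference_rows


def to_jsonl(rows: Iterable[Dict]) -> str:
    """Serialise rows as JSON Lines, one record per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)
```

JSON Lines is used here because it can be appended to as reviews arrive and is read directly by most fine-tuning tooling.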

Together, these four components form a complete solution with a systematic path to self-improvement. The refined captions and tags can be added to the images' metadata, which makes the images easier to find. They can also be used to improve the model so that its outputs align more closely with an organization's objectives.
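To illustrate why enriched metadata improves searchability, the toy inverted index below compares keyword search over a record before and after a caption and tags are added. The index, the AND-query semantics and the sample record are all hypothetical. This is not how the PROV website implements search.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records: Dict[str, dict]) -> Dict[str, Set[str]]:
    """Map every word in a record's title, caption and tags to the ids of records containing it."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        text = " ".join([rec.get("title", ""), rec.get("caption", ""), " ".join(rec.get("tags", []))])
        for token in tokenize(text):
            index[token].add(rec_id)
    return index


def keyword_search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return ids of records that contain every word of the query (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical record: before enrichment, only a catalogue-style title exists.
before = {"img1": {"title": "Photograph, Unit 4, Item 12"}}
after = {"img1": {**before["img1"], "caption": "a tram on a city street", "tags": ["tram", "outdoors"]}}
print(keyword_search(build_index(before), "tram"))         # set()
print(keyword_search(build_index(after), "tram street"))   # {'img1'}
```

The query "tram" matches nothing until the generated caption and tags are indexed alongside the original title.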


Data Story



Traditional solution: humans write tags for each image.

Our solution:

  • AI generates baseline descriptions and tags for each image.
  • Humans give feedback by either accepting or updating the AI-generated content.
  • The enhanced descriptions and tags are used to improve the image metadata.
  • [Bonus] The enhanced descriptions and tags are used to improve the AI.

Evidence of Work

Video

Homepage

Project Image

Team DataSets

Photographic collections - PROV

Description of use: We source the images using PROV's API, then feed them to an AI model to generate descriptions and tags.

Data Set

Challenge Entries

Tagging photographic images: showcasing the magnificent history of Victoria

How can we enable researchers to tag images of digitised records from photographic collections?

6 teams have entered this challenge.