TGI Brief: Talking Sense to AI Image Retrieval

Photo collage of 2023 TGI Fellows.

by Bob Grant

A TGI Fellow and research collaborators are embarking on a project to incorporate natural language and text prompts into AI image retrieval models to better aid investigators.

Taylor Geospatial Institute Fellow Abby Stylianou, Ph.D., TGI Research Council member Nathan Jacobs, Ph.D., and George Washington University computer scientist Robert Pless, Ph.D., have just won a grant from the North Carolina State University Laboratory for Analytic Sciences to improve the ways that investigators can use artificial intelligence to sift through vast databases of images, searching for a particular area, location, or feature. Stylianou is a Saint Louis University assistant professor of computer science. Jacobs is on the computer science faculty at Washington University in St. Louis. Pless served as doctoral advisor to both Stylianou and Jacobs.

Together, this team of researchers is looking to inject natural-language guidance into AI search models, something that is intuitive for users but challenging from a computer science perspective, especially when trawling through large volumes of data such as satellite imagery. “They want to search for a particular type of land cover or a particular type of building,” said Stylianou, describing the problem from the perspective of a National Geospatial-Intelligence Agency (NGA) analyst. “It’s really hard to say without natural language what it is you’re looking for, but often very easy to define that with language. We want to be able to build search tools that allow investigators to interact with them using language.”

Stylianou and her collaborators are working with TraffickCam, a mobile app she and Pless launched in 2015 that enables users to help combat sex trafficking by uploading photos of the hotel rooms they stay in when they travel. Those photos are fed to analysts at the National Center for Missing and Exploited Children. Deep learning models developed by Stylianou and Pless then match user photos with images gathered in investigations of human traffickers, who often post photos of their victims posed in hotel rooms for online ads.

“Let’s say the investigative picture had hardwood floors in it,” Stylianou explained. “Now they get back results, some of which have hardwood floors and some of which don’t. They might want to be able to say, ‘Only show me pictures with hardwood floors.’ That is sort of the specification that’s really easy to do in natural language, but hard to do otherwise, hard to specify what does it mean to say I want to focus on some particular thing. That’s been this motivating use case for why we want to build in text guidance into image search models.”
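
The article does not describe how the team's models work, so the following is only a minimal sketch of the general idea behind the hardwood-floor example, not the researchers' method. It assumes that images and text phrases have already been embedded into one shared vector space, and it re-ranks search results by blending image similarity with text similarity. The three-number vectors, the room names, and the `text_guided_rank` function are all hypothetical and chosen purely for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def text_guided_rank(query_image, query_text, gallery, text_weight=0.5):
    """Rank gallery items by a weighted blend of image and text similarity.

    text_weight=0.0 reproduces plain image-to-image search;
    larger values let the text prompt pull matching images upward.
    """
    scored = []
    for name, embedding in gallery.items():
        image_score = cosine(query_image, embedding)
        text_score = cosine(query_text, embedding)
        blended = (1.0 - text_weight) * image_score + text_weight * text_score
        scored.append((blended, name))
    scored.sort(reverse=True)
    return [name for _, name in scored]


# Hypothetical toy embeddings: [room layout, hardwood floor, carpet].
QUERY_IMAGE = [1.0, 0.2, 0.0]   # the investigative photo
QUERY_TEXT = [0.0, 1.0, 0.0]    # the phrase "hardwood floors"
GALLERY = {
    "room_a_carpet": [1.0, 0.0, 0.6],
    "room_b_hardwood": [0.9, 0.7, 0.0],
    "room_c_hardwood": [0.3, 1.0, 0.0],
}

if __name__ == "__main__":
    print("image only: ", text_guided_rank(QUERY_IMAGE, QUERY_TEXT, GALLERY, 0.0))
    print("with prompt:", text_guided_rank(QUERY_IMAGE, QUERY_TEXT, GALLERY, 0.5))
```

In this toy setup, the image-only search ranks the carpeted room second because its layout resembles the query, while adding the text prompt moves both hardwood rooms ahead of it. A real system would obtain the embeddings from a trained vision-language model rather than hand-written numbers.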

Under the grant, Stylianou and her collaborators have three main aims, each addressing a distinct question:

1 – “How you can take the capabilities that already exist in text-guided image analysis and make them work on parts of images and regions of images instead of full images?”

2 – “How to decompose text annotations into different parts and make text guided image retrieval work with combinations of different concepts?”

3 – “What’s the right user interface? What are the specific ways that a human might articulate their language that are different than what we are envisioning when we’re doing the research?”

The first two questions, Stylianou added, are platform agnostic, but the third entails applying those insights to the TraffickCam platform. She said she is optimistic that she and her colleagues can advance the state of text-guided investigative image retrieval. “I 100% think that we will have something that is better than what we have today for text-guided image retrieval,” she said, noting that the Laboratory for Analytic Sciences can choose to extend the award to subsequent years.

The eventual goal is to use the insights they gather in the course of this project to improve TraffickCam’s utility in actual human trafficking investigations and to more broadly apply those insights across other image retrieval projects. “Imagine you have the entire Earth’s worth of Planet imagery that’s been captured every day,” Stylianou said. “That’s too much for somebody to sit down and actually look at. I want to search for an anomalous event, or I want to search for a particular land use type, or I want to search for building that has particular properties, because that’s relevant. And that’s the sort of thing that being able to incorporate natural language into the image search that you’re doing is potentially really, really powerful.”

As geospatial artificial intelligence (GeoAI) continues to grow, research like the project that Stylianou and her partners are starting, combined with emerging applications and technologies and growing amounts of geospatial data, will help power its evolution as a field. To help tackle problems and introduce new opportunities for collaboration, TGI has launched a GeoAI Working Group, with its first meeting happening on March 6th, 2023. If you or your organization are researching or applying AI technologies to geospatial questions, join us for this kickoff event happening online and in St. Louis.
