
Presenter: Morgan Strong
The Queensland Art Gallery | Gallery of Modern Art (QAGOMA) embarked on an ambitious project to leverage AI to make their digital content easily accessible on personal devices. By pointing a mobile device at an artwork – 2D or 3D – the artwork is identified rapidly to return related digital content, such as descriptions, colour and shape analyses, interactive components and interactive questions. Morgan Strong explores the selection process, challenges, onboarding visitors, model training and design considerations for the app.
Technology, language, history and creativity converged in Canberra for four days as cultural leaders gathered for the world's first in-depth exploration of the opportunities and challenges of AI for the cultural sector.
Want to learn more about this event?
Visit the Fantastic Futures 2024 Hub
This transcript was generated by NFSA Bowerbird and may contain errors.
Thank you. First of all, I'd like to acknowledge that I'm presenting on the unceded lands of the Ngunnawal and Ngambri people. I pay respects to Aboriginal and Torres Strait Islander Elders past and present, and in the spirit of reconciliation I acknowledge the immense creative contribution Indigenous people make to the art and culture of this country.

That's me. I'm head of digital at the Queensland Art Gallery | Gallery of Modern Art. I'm resting wistfully in the knowledge that I'm giving the third talk about computer vision so far today, but this one's a bit of a case study, so hopefully we can have a bit of fun and learn from what I learned through this project. It's going to be in three parts: who we are as an institution and the journey we took to get here; ArtSeeker, the app that uses AI and is now in production; and lessons learned, what went well and what didn't go so well when we implemented it.

A little bit of context. That beautifully lit-up building on the banks of Maiwar, that's my office. I don't see any of that view; I'm right in the middle there. We've got a beautiful Turrell lighting it up. For our international friends, Brisbane is about halfway up the eastern coast, and we are aiming to be Australia's most inspiring and welcoming gallery. Importantly, that vision takes in Australia, Asia and the Pacific. Very importantly for me, too, I came on board at the start of 2020 to lead a digital transformation project. That was really about upgrading all of our back-end systems, digitising the collection, and finding ways to make our digital content and interpretation easier to access. That's where this project comes in. We made a lot of digital content through that process. We went from, I think, around 50% of the collection digitised to about 94% now. But I wanted to look at ways we could make that content much easier to access, rather than just searching online and getting it all, and whether or not it was possible to leverage AI to make this a bit easier.

So I started this off back in 2021 by bringing on a paid intern from Queensland University of Technology. Our intern had been working on self-driving cars, which is the natural progression. I wanted to look at how we could use AI in different ways, so we started with a project that didn't really work so well; it would probably work a lot better now. We wanted to see if we could find better representations within our collection and stories and make them a bit easier to find. You can see some Japanese prints here. This part worked really well: we could find the characters, as you can see in the middle square there, and put them in a queue for translation, and after that we could find objects of interest that you could then see as themes across the collection. However, that only really worked in parts of the collection that were fairly homogeneous, where we could validate the model against really big, open sets of data. When we started looking at some of our Pacific collections (remember our vision), the hyperlocal nature of them and the fact that there isn't a huge amount digitised meant it just wasn't the right approach. We weren't doing the work justice. So we gave up on that pretty quickly.
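For anyone curious what a detection pass of this kind looks like in practice, here is a rough, generic sketch: it runs a pre-trained torchvision detector over a folder of collection images and groups the detections into rough themes. It is not QAGOMA's model (their experiment targeted characters in Japanese prints and collection-specific objects of interest); the folder name, the confidence threshold and the generic COCO labels are all assumptions for illustration.

```python
# Illustrative only: NOT QAGOMA's model. It shows the general shape of the
# experiment described above: run a detector over collection images and group
# what it finds into rough themes. Folder name, the 0.8 confidence threshold
# and the generic COCO labels are assumptions.
from collections import defaultdict
from pathlib import Path

import torch
from PIL import Image
from torchvision.models.detection import (
    FasterRCNN_ResNet50_FPN_Weights,
    fasterrcnn_resnet50_fpn,
)
from torchvision.transforms.functional import to_tensor

weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT
model = fasterrcnn_resnet50_fpn(weights=weights).eval()
categories = weights.meta["categories"]          # human-readable label names

themes = defaultdict(list)                       # label -> image file names

with torch.no_grad():
    for path in Path("collection_images").glob("*.jpg"):     # hypothetical folder
        image = to_tensor(Image.open(path).convert("RGB"))
        (prediction,) = model([image])           # one dict of boxes/labels/scores
        for label, score in zip(prediction["labels"], prediction["scores"]):
            if float(score) > 0.8:               # keep only confident detections
                themes[categories[int(label)]].append(path.name)

# Most common "objects of interest" across the digitised works
for label, files in sorted(themes.items(), key=lambda kv: -len(kv[1])):
    print(f"{label}: {len(files)} images")
```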
This was another approach, where I'm sure Jorge, our intern, really enjoyed the day I came in and said I'd like to see if we could search by vibe. I do have a personal belief that we're not very welcoming to casual visitors; we kind of expect them to know way too much to start their journey. The idea is that if you see something you like, that's the entry point, and from there I want them to discover the rest of the collection and the art we have on display. So we started some experiments where we applied all kinds of filters that would look for the balance of colours, which you can see on the right there, or for different shapes, or for flow in the works, or for different kinds of contrast, and then built generalised models that say, okay, I visually like this, can you suggest some unexpected things I might like as well? That was quite promising. The issue is that we don't have all the time in the world to really make this work, so it was hard to determine which results were noise and which were signal. The approach yielded some pretty cool results, but there are probably better ways to look at it.

This next one was almost a bit of an accident, but it worked really well: we noticed that no matter how badly I took a photo of an artwork, it could be recognised from some reference photos. Even with really, really similar works, it could pretty much determine which work I was trying to look at. So they're the background experiments we had going into this.

I'm now going to play a little video, which is the app itself. So now let's do a little bit of a demonstration of ArtSeeker in action. I've got the web app up. I activate the camera, get the work inside the crosshairs, take a quick photo, and it pretty much instantly recognises it. Yes, this is the work. It then comes back and starts returning some information. We can get a bit of a description about it. The colour profile is really cool: what happens is it simplifies the artwork to find the five most dominant colours within it, and then it searches the rest of the collection. It takes a couple of seconds to find any other works that have that balance. So it's not looking for a single colour; it's looking for a balance of colours. We ask, how does this make you feel? You can leave a bit of a discussion, so we can thread some resources. Any materials that we have about that particular work, we can just browse them here. And then we can also show what's nearby, so if you don't want to go through the effort of scanning again, because you don't want to wait those 200 milliseconds, you can browse through everything else that's in the room and potentially look at those works and find some content about them too.

So that's where we ended up. I'm now going to dive into the back end of how that works, and then into how we make it sustainable, because we can't keep adding more work to keep this in production. Quite simply, there's a back end, which is our collection management system and DAMS, where we do all of our cataloguing and so on. Then there's a Drupal 10 site, which sits on top of that and is the public interface; it's a headless API source. And then we have a progressive web app, which is a React Expo app, and it talks to an AWS inference service, which I'm going to talk about in a little bit.
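As a technical aside on the colour profile feature described in the demo, here is a minimal sketch of one way to do that kind of palette matching, assuming scikit-learn, SciPy and Pillow. The reduction to five colours via k-means, the matching cost and the file names are illustrative assumptions, not QAGOMA's production implementation.

```python
# A minimal sketch of palette matching, not QAGOMA's production code: reduce each
# work to its five most dominant colours with k-means, then compare the *balance*
# of colours between two works by optimally pairing their palettes.
import numpy as np
from PIL import Image
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans


def dominant_palette(path, n_colours=5):
    """Return (colours, proportions): the n dominant colours and their share of the image."""
    pixels = np.asarray(Image.open(path).convert("RGB").resize((128, 128)), dtype=float)
    pixels = pixels.reshape(-1, 3)
    km = KMeans(n_clusters=n_colours, n_init=10, random_state=0).fit(pixels)
    counts = np.bincount(km.labels_, minlength=n_colours)
    return km.cluster_centers_, counts / counts.sum()


def palette_distance(a, b):
    """Lower is more similar. Pairs the two palettes' colours optimally and
    weights each pair by how much of each image those colours cover."""
    (colours_a, weights_a), (colours_b, weights_b) = a, b
    cost = np.linalg.norm(colours_a[:, None, :] - colours_b[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return float(np.sum(cost[rows, cols] * (weights_a[rows] + weights_b[cols]) / 2))


# Usage sketch: rank an (assumed, pre-computed) collection index against a scan.
query = dominant_palette("scanned_work.jpg")                    # hypothetical file
index = {"work_123.jpg": dominant_palette("work_123.jpg")}      # built offline
ranked = sorted(index, key=lambda name: palette_distance(query, index[name]))
```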
So there's lots to it, but because this is an AI conference, I'm just going to talk about these two parts. The journey to making it work across the whole gallery that quickly was not straightforward. We started by making a Siamese neural network. Now, that is really accurate. It's also incredibly slow. It works with a baseline vector: you put in two inputs and it compares the results, a bit like matching fingerprints against a database. The issue is that the more works we added into the recognition, the slower it got. When we started off it was two or three seconds, which is acceptable. Once we got to the size of a gallery, it was 15 seconds; maybe not. Once we started adding more galleries, 40 seconds. That's really not a very nice entry point. The idea is to make it as seamless as possible: if you like an artwork, you can just put your phone up. You don't have to look for a QR code. You see it, scan it, and you start the journey of discovery and start interacting. So it worked well at the theoretical stage, but it didn't go so well as we scaled out. It is very, very accurate, though.

If I'd known that one of the themes here was going to be slow AI, I probably wouldn't have made such a big deal about the library we did use, which is called fastai. But fastai is a lot simpler, and it's fast. It's a deep learning library that aims to be very easy to use, so if anyone's just dipping their toes in after hearing about all the cool stuff here, I highly recommend taking a look at it; just think of it as a way of achieving slow AI. It also has the bonus of being a Queensland project, so we built this in-house in Queensland using a Queensland framework. That's nice.

When we added more works to the fastai inference, which is the part that works out what work you're looking at, it scaled in a linear fashion. If we had about 1,000 works in there it was around 100 milliseconds, and when we got up to about 7,000 or 8,000 it was only up to about 200 milliseconds. And it meant the cost didn't blow out. The way we train it is we have a whole bunch of images and we tag them with the ID of the work they show as training images, and I'll get to that in a little bit. It only costs about $1 to $2 to retrain the index each time, whereas with the Siamese neural network it was $10, sometimes hundreds of dollars, per training round. So this is much, much more sustainable. It's not quite as accurate, at about 95% to 98%, but it's good enough and it does the trick. And we've still got that Siamese network if we want to use it at some point in the future.

So that's the app. But I think what's more exciting is making it sustainable. It's one thing to build an app for a show where you've got 100 works on display; you work really hard and you get 100 nice descriptions and all this rich content about those works. But you can't put in that kind of effort for a whole gallery, and this works across the entire gallery. So we had to build it into our BAU processes. We had to make it so that our regular work was actually making the app richer each time we did things. We have an architecture, and I won't dwell on it, but those are all the components we've got. I couldn't really add any processes where we'd be double-entering data into any of the systems; I really just had to use what was there. So this next part, I actually think, is probably just as impressive, if not more so.
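To make the fastai side concrete, here is a minimal sketch of training a per-artwork classifier of the kind described, assuming the training images sit in one folder per artwork ID. The folder layout, resnet34 backbone, epoch count and file names are assumptions rather than the production configuration.

```python
# A minimal fastai sketch, not the production configuration: train a classifier
# whose classes are artwork IDs, export it, and load it again for inference
# (e.g. behind the AWS endpoint the web app calls). Assumed folder layout:
#   training_images/<artwork_id>/angle_0.jpg ... angle_4.jpg
from fastai.vision.all import *

dls = ImageDataLoaders.from_folder(
    "training_images",          # one sub-folder per artwork ID (assumption)
    valid_pct=0.2, seed=42,
    item_tfms=Resize(224),
)

learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(3)              # retraining the index is cheap (the talk cites $1 to $2)
learn.export("artseeker.pkl")   # artefact shipped to the inference service

# Inference side: load the exported learner and classify a visitor's photo.
learn_inf = load_learner("artseeker.pkl")
artwork_id, _, probs = learn_inf.predict(PILImage.create("visitor_photo.jpg"))
print(artwork_id, float(probs.max()))
```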
It's the audit app that a colleague of mine worked on delivering. Effectively, the one part of our process that was still very paper-driven was the daily audit: each day, floor staff go around and check all the works to make sure they're there, they're in good condition and no one's done anything naughty to them, and at that point they can check them off. That's what you can see with this little mobile app: they see a work, check, check, check. That then validates to everyone that, yes, the work is on display, therefore it's available to be scanned and will return results. It also tells you how many training images we've taken of it. It means a process that was really clunky and manual has been automated, and we used it as a way to train ArtSeeker to recognise the works.

My colleague, who's no longer here with me, is now at State Library Victoria. He didn't die or anything. But he's going to take you through it in this video: 'So what we did is we actually built some tools to assist us with that, and I'll give you a demo of that now. If I open up this internal app, I can do the training for this work just by finding it in the list. I'll do a quick search. Here it is. Then I can go down and add training data. I'll first tap this flag to take a reference image; this is what we use to verify, against the training data, that it's the correct work. Then all I do is take five images: one from the middle, and then I step around the artwork and shoot it from slightly different angles, just to give it some coverage of the angles people might take of this artwork. And that's all we do to get the 200 millisecond response time for the inference.'

I cut him off a bit early there, sorry about that. That, to me, is what made it sustainable. Before then, I had to go around and literally take the photos with my phone, go into S3 and put them all into individual buckets for each artwork, and it was a nightmare. Now, when a new exhibition is hung, it takes about two hours to train it initially, so that's a huge difference.

Okay, now I'm going to go to my last little bit, which is lessons learned. If you're going to try something like this at home, these are my takeaways from the case study. What went well? I would say the small, agile team that we had. There was Jorge, our intern, who did the initial research, and then it was just Nick and me, and it meant we could iterate really fast. I looked after the back ends, the Drupal CMS, the DAM side and the APIs; he did the AI inference engine and the front end. It meant we could catch up on chat and just build it really fast. For me, the fact that it works within our existing workflows matters: if you do your normal work, it ends up in ArtSeeker; you don't have to do extra work for ArtSeeker. The tech works, which is nice. It's cheap; our Amazon hosting bill is very small. It's agnostic as well: right now we've got a headless Drupal source, but we could put anything in there as long as we've mapped it the right way. It was really well targeted in the pilot phase. When we first rolled it out, we did it in just one gallery, for Creative Generation, which showcases the top year 12 students in Queensland. We invited them all in and told them about the app. I can see we've got the Brat Summer lighting on in their honour at the minute. We really just got it going for that younger demographic, and they loved it. It's really funny, though: they were scanning it and going, oh, that's cool.
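For anyone replicating the capture workflow, here is a hedged sketch of the kind of upload step described: one reference image plus five angled shots pushed to S3 under a per-artwork prefix so a later retraining job can pick them up. The bucket name, key layout and manifest are hypothetical; the internal app itself isn't published.

```python
# Hedged sketch of the capture/upload step: one reference image plus five angled
# shots per artwork, pushed to S3 under a per-artwork prefix so the next retrain
# can pick them up. Bucket name, key layout and manifest fields are hypothetical.
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "artseeker-training-data"      # hypothetical bucket name


def upload_training_set(artwork_id: str, reference: str, angles: list[str]) -> None:
    """Upload the reference image plus the angled training shots for one artwork."""
    prefix = f"training/{artwork_id}"
    s3.upload_file(reference, BUCKET, f"{prefix}/reference.jpg")
    for i, local_path in enumerate(angles):
        s3.upload_file(local_path, BUCKET, f"{prefix}/angle_{i}.jpg")
    manifest = {
        "artwork_id": artwork_id,
        "image_count": len(angles) + 1,
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }
    s3.put_object(Bucket=BUCKET, Key=f"{prefix}/manifest.json",
                  Body=json.dumps(manifest).encode("utf-8"))


# e.g. called by the internal app once the five angled photos have been taken
upload_training_set("12345", "reference.jpg", [f"angle_{i}.jpg" for i in range(5)])
```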
After one scan it's just accepted that it should just work; the novelty wears off so quickly. The experimental research phase was well utilised: we went through all of those steps, and I'd say we found the best way to use a very limited amount of internship time and made it to production. I think that's kind of cool. It works across the whole gallery now. And the audit app, I think, is really useful: in the process of making this app work, we've really improved our auditing process and how we do conservation reports.

Lessons learned. This is really important. Nick and I were so taken by how well the tech worked that we kind of forgot about the content and engagement side. Because it's gallery-wide, when we showed people, sometimes they were scanning works and nothing was coming back, because there was nothing in the back-end systems. And you think, well, that's useless, I just scanned the work and got some metadata. So really, the focus has to be that if something is going on display, we need content that makes scanning it worthwhile; if someone engages with it, we need to be delivering something back. Ongoing funding is always difficult; it's just me managing it at the minute. Innovation in traditional organisations can be slow and hard, and selling the benefits as a result can be tricky. And because it's a web app, not an installed app, the React Expo APIs can be quite fragile: iOS can release something and it just breaks the next day, and then you've got to go and fix it. But I would say all of the approaches we took were the right ones, because it's working. Now, there are no questions allowed, so I'm going to end on that note, but you can come and talk to me afterwards. I hope you enjoyed it.