
Presenter: Rachel Senese Myers
Georgia State University Library has a rich collection of audiovisual material within its Special Collections and Archives department, with over 3,000 audiovisual assets available to patrons. To make current and future assets more accessible, the Library has started a three-phase project to identify, create, and expand tools for fast AI-generated transcription while balancing responsible implementation.
Technology, language, history and creativity converged in Canberra for four days as cultural leaders gathered for the world's first in-depth exploration of the opportunities and challenges of AI for the cultural sector.
This transcript was generated by NFSA Bowerbird and may contain errors.
Sorry, guys, we're going to hear about the West Coast one more time. And I will warn you, I have way too many slides, because I thought I'd talk fast, so we're going to see how fast I can actually go. As everybody does, we have our intro. Georgia State University is one of the largest universities in the state of Georgia in the United States. We have Digital Projects, which is my unit, and which handles digitization, AV processing, and our digital collections. Special Collections is a separate unit, and it has nine curatorial areas, which becomes a little more important later on.

Across those nine curatorial areas, we have conservatively about 200,000, but we actually think closer to 400,000, AV assets. It ranges from TV shows, movies, board recordings, and musical performances; you name it, we probably have it somewhere in our assets. Of those almost 400,000, only 2,500 are actually online. Traditionally, for anything that was not an oral history, we didn't create transcripts at all, which is a huge access problem. If we wanted to be generous, we would try YouTube's captions, but those don't really work out very well. We also tried an Adobe Premiere Pro transcription project, which I'll talk about in a moment.

For oral histories, we did use a transcription service, but that takes time and a lot of money. We also used OHMS, the Oral History Metadata Synchronizer, to sync those transcripts back to the audio, which seems a little silly now that we have Whisper. And we had a lot of problems with this whole process. With no transcripts, you have a huge access problem and an accessibility problem. For our oral histories, it was very expensive in both time and money: we could accession an oral history in 2012, and it might not go up until 2022 because of the time and money involved in transcribing it. OHMS transcripts also are not searchable within our digital collections platform.

So we really had to have a big ethical conversation among ourselves about what our ethical responsibility to our patrons and donors is, and what we are ethically comfortable doing. So we did an experiment. We talked to a lot of our peer organizations, took what I learned last year from the Fantastic Futures conference, and decided to experiment with Whisper. We had it run (the University of Georgia ran a couple for us), and we were like, okay, we're ready to really pursue this, because it was blowing Adobe Premiere Pro out of the water when it comes to accuracy. So we looked at our IT infrastructure and environment, and IT was like, hey, you can't do this. You really can't do this. So we asked, okay, how can we do this? We settled on the fact that we needed a UI, a user interface, to actually do this. Okay, great, that's fine; there are two out there. We experimented with HuggingFace's web UI, and we also experimented with Wishper.

With the web UI, we really liked that we could choose our settings and which Whisper model we wanted to use, there was diarization integration, and we could process our videos already on YouTube. So for everything that didn't have a transcript, instead of downloading it and re-uploading it, we could just point at the YouTube video. We didn't like that there was no database functionality, so if we accidentally refreshed our browser, everything was gone and we had to redo the whole process, which was great. There was also no progress information, so if something was stuck, we really couldn't tell. And there was no way to edit those transcripts, so we kind of got what we got. So we decided, let's try Wishper.
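As an aside on the accuracy comparison mentioned above: the standard way to put a number on transcript quality when comparing tools like Whisper and Premiere Pro is word error rate (WER). This is a minimal, self-contained sketch in plain Python; the sample reference and hypothesis strings are invented for illustration, not drawn from the actual evaluation.

```python
# Rough sketch: word error rate (WER) for comparing ASR transcripts against a
# human-corrected reference. WER = (substitutions + deletions + insertions)
# divided by the number of reference words, via word-level edit distance.

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate between a reference and a hypothesis transcript."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # Levenshtein distance over words, using a rolling DP row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,        # deletion
                          curr[j - 1] + 1,    # insertion
                          prev[j - 1] + cost) # substitution / match
        prev = curr
    return prev[len(hyp)] / max(len(ref), 1)

reference  = "we have about four hundred thousand audiovisual assets"
hypothesis = "we have about four hundred thousand audio visual assets"
print(wer(reference, hypothesis))  # → 0.25 (one substitution + one insertion / 8 words)
```

A lower WER means a closer match to the reference; running the same clip through each tool and comparing WER against a hand-corrected transcript is one simple way to make an "X blows Y out of the water" claim concrete.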
Wishper, we hoped, would answer those major problems with the web UI. We tried it, and we love the database, we love the editing interface, and it's relatively easy to use. But there's no diarization, we couldn't choose our settings or which model we wanted to use, there were no batch uploads, we couldn't really use YouTube very well, and there are no batch edits.

So, what do you do? You create your own. We took what we liked from Wishper and what we liked from the web UI, and we are creating our own. There were a few other things we wanted to incorporate as well, like those batch edits. We also wanted a frontend and an admin portal, so that on the backend I'm able to set our prioritization, and we wanted some specific outputs. So our web developers created what they call WhisperScribe. There's our GitHub, so if you want to follow along, we would love to have you. They just rolled out the first iteration, which we're currently testing; they literally gave it to us about three weeks ago, so we're working out all of the bugs. On the left is the ingest, on the right is the editing.

For future developments, we're going to be incorporating our Special Collections department. At point of ingest, or point of accession, we're going to create partitions, so each of our curators within the nine different areas will have their own private area where they can run any AV they receive, especially if it's not going to be public but we still want that transcript at the onset. We're also going to have dynamic user permissions, so if they have GRA students or volunteers doing this work, we can set permissions for them. And we're also going to be investigating syncing some of those existing transcripts that we already have. So, there we go.
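For context on the "specific outputs" mentioned: Whisper returns timed segments, and a tool like WhisperScribe would typically serialize them to a caption format such as SRT. This is a minimal sketch under that assumption; the segment dicts mirror the `{"start", "end", "text"}` shape that openai-whisper returns in `result["segments"]`, and the sample data is invented.

```python
# Minimal sketch: serializing Whisper-style segments into SRT caption text,
# the kind of output a transcription UI might export alongside plain text.

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Turn [{'start', 'end', 'text'}, ...] into numbered SRT cue blocks."""
    cues = []
    for i, seg in enumerate(segments, 1):
        cues.append(
            f"{i}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)

# Invented sample segments, shaped like openai-whisper's result["segments"].
segments = [
    {"start": 0.0, "end": 2.5, "text": " Sorry, guys."},
    {"start": 2.5, "end": 6.0, "text": " We're going to hear about the West Coast one more time."},
]
print(segments_to_srt(segments))
```

The same segment list could just as easily feed a WebVTT writer or a time-synced viewer, which is why keeping the timing data at point of ingest, rather than only flat text, matters for features like the OHMS-style syncing mentioned above.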
The National Film and Sound Archive of Australia acknowledges Australia’s Aboriginal and Torres Strait Islander peoples as the Traditional Custodians of the land on which we work and live and gives respect to their Elders both past and present.