
Presenters: Benjamin Lee, Kath Bode, Andrew Dean
From reanimating one of the earliest examples of generative language, 1960s computer-generated poetry, to using AI to investigate Irishness in Australian newspaper fiction, Benjamin Lee, Andrew Dean and Kath Bode present case studies on how academic AI projects can further scholarly inquiry and reinterpret digital collections.
Technology, language, history and creativity converged in Canberra for four days as cultural leaders gathered for the world's first in-depth exploration of the opportunities and challenges of AI for the cultural sector.
Want to learn more about this event?
Visit the Fantastic Futures 2024 Hub
This transcript was generated by NFSA Bowerbird and may contain errors.
Hey, everyone. It's wonderful to be here to be a part of this conversation. My name is Ben. This is Andrew. We've been working together on a collaboration. We're going to keep our time to as close to seven minutes as we possibly can. The timer is running. You can keep us accountable. And we want to make sure that Kath has plenty of time as well. So I realize I should probably use the clicker. There we go.

So Andrew and I are going to talk today about our work to reanimate Coetzee's archive of code that he wrote surrounding computationally generated poetry. As some very initial scaffolding for this, one of the themes of the conference here is looking at generative AI and themes of creativity in the archive. We recognize that if we are to look into the future, we ought to be able to look into the past: to understand the history of generative AI, how it intersects with questions of creativity and literary studies, and also to understand, as we have increasing volumes of code bases being preserved, whether in an analog format or now increasingly in a digital format, how we actually make these code bases usable so that we can interrogate them as historical sources later on.

A little bit about our collaboration and our different disciplinary backgrounds. I come from a training in computer science, and I'm now in an information school. Much of my work is centered on a question I call computing cultural heritage: how we might, in brief, build and develop search systems for digital collections, surfacing questions of discoverability and access; how we then use those to further collaborations in the digital humanities; and how we investigate the ethical and sociotechnical dimensions of this work. But I'm going to turn it over to Andrew here to frame the other component of this work from a literary studies perspective.

Sure, thank you very much. Yes, I work in literary studies, often quite traditional literary studies, among other things. I'm a scholar of the writing of J.M. Coetzee, the South African-Australian novelist. So, yes, as I say there on the slide, there's often been this understanding of a long-standing separation between computation and literary thought, one that relates to the nature of creativity, which we understand to be separate from reason. We hive off creativity from reason: reason is everything that creativity is not, and creativity is everything that reason is not. But that distinction is not really sustainable. All writers are engaged with the form in which they are working. They're always a reader of their own work. They're thinking with the machines that they're using, and that includes everything from word processing to pen and paper. And we're always engaged with this in all our own different ways, from post-it notes on computer screens to the notebooks that were handed out today at the beginning of the conference.

So here we have a literary magazine from the 1970s, an anti-apartheid literary magazine, with a poem there called Hero and Bad Mother by a now-famous writer, J.M. Coetzee. And it's created by a computer. And the poem is terrible. It's actually very important that it's terrible, and that we understand it to be terrible, because this is not just a subjective judgment that it is bad. The poem is a kind of dreamlike series of phrases that you can see were generated and then put into some kind of order. Now, how did he do that?
Well, he was a computer programmer in the early 1960s, at the origin of commercial computing. He coded in Fortran for IBM in the early 1960s, and then for IBM's competitor, ICL. And then, during his PhD, he used computation as a way to understand stylistics in the work of Samuel Beckett. He understood this, though, to be a failure, a profound failure, and I'll talk about that in a moment. One of the things that's important, and I say this on the slide, is that he was using these computational methods in order to escape writer's block. And so this is from his autobiography, Youth: "If his heart is not in the right state to generate poetry of his own, can he at least string together pseudo-poems made of phrases generated by machines, and thus, by going through the motions of writing, learn again to write?" So he's in dialogue with these machines that he's created, using scarce computer time, creating something that will allow him to escape the domination of reason, which is stopping him at that moment in his life from writing.

Now, as it happened, he departed from computation from the late 1970s, even though he did create the timetable at the University of Cape Town using computational methods. But nonetheless, he thought of this as a wrong turning, and for a number of reasons, one of the most important being its relationship with political oppression: in particular, the implementation of apartheid in South Africa, and also the ballistic weapons program at Aldermaston, which his own work was contributing to. And I have there at the bottom of the slide another quote from his autobiography, because he was working on one of the very first operating systems. It's also a way of thinking about what his own writing does, which is that it is constantly assessing itself. It sallies forth and then moves back. So as I say there on the slide, or as Coetzee says on the slide: "Should it read in another reel of tape, it must ask itself, or should it, on the contrary, break off and read a punched card or a strip of paper? These questions are to be answered according to the overriding principles of efficiency." And if you change the word efficiency for beauty, or something along those lines, you come very close to describing how his fiction works.

Okay, so if you visit the Harry Ransom Center at the University of Texas at Austin, you'll find boxes of Coetzee's papers, including dot-matrix printouts that contain all the code he used for his computer-generated poetry. So on the left-hand side here, we have him beginning to formulate the algorithms and actually develop some pseudorandom number generators to produce this poetry. You know, we've taken a step back in time here to Fortran. He's actually feeding in punch cards, and the resulting printouts are a physical archive that we're now considering not just digitizing, but actually reanimating. And I think this is one of my favorite bits in the archive. Here's a word bank, effectively the controlled vocabulary that Coetzee was using in order to generate these poems. And so we can see again that this is quite rudimentary, and certainly not pushing the bounds of what we would expect from large language models in terms of how the poetry is being written.
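To make the underlying technique concrete, here is a minimal sketch, in Python rather than the original Fortran, of how a seeded pseudorandom generator and a fixed word bank can produce template-based poems. The vocabulary and template below are illustrative placeholders, not Coetzee's actual word lists or code.

```python
# A minimal sketch of the general technique (not Coetzee's actual code):
# draw words from a fixed word bank with a seeded pseudorandom generator
# and slot them into a simple grammatical template.
import random

# Hypothetical controlled vocabulary, standing in for the archive's word bank.
WORD_BANK = {
    "adjective": ["bad", "pale", "iron", "slow"],
    "noun": ["hero", "mother", "machine", "sea", "stone"],
    "verb": ["returns", "breaks", "sings", "waits"],
}

def generate_line(rng: random.Random) -> str:
    """Fill one fixed template with pseudorandomly chosen words."""
    return "the {adj} {noun} {verb}".format(
        adj=rng.choice(WORD_BANK["adjective"]),
        noun=rng.choice(WORD_BANK["noun"]),
        verb=rng.choice(WORD_BANK["verb"]),
    )

def generate_poem(seed: int, lines: int = 4) -> str:
    # The same seed always reproduces the same poem; a new seed
    # yields a "new" poem from the same vocabulary and rules.
    rng = random.Random(seed)
    return "\n".join(generate_line(rng) for _ in range(lines))

print(generate_poem(seed=1963))
```

The property a reanimation like this relies on is determinism: replaying a recorded seed reproduces a poem exactly, while a fresh seed produces a new poem from the same rules, which is what the next step describes.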
But nonetheless, again, a really important opportunity for us to recognize that, going back 60 years, conversations among literary figures surrounding generative AI, or excuse me, generated poetry, creativity, and AI were already present. So our work has taken the form of reanimating the code base. In the interest of time, we're not going to show a full working demo, but we have his code fully re-implemented. We can not only regenerate the poems that he himself created, but also effectively generate new Coetzee poems by re-running the code and feeding in new seeds. Our next step here is building out a nice UI, which we'll hopefully be able to share very soon. But again, I want to emphasize that the question for us isn't just about looking at LLMs and generative AI in the present day, but understanding this history or genealogy of the work. And then also thinking about the fact that large code bases are going to be a persistent challenge for all of us. And so I want to point to all the great work happening in digital preservation and critical code studies in terms of how we make these archives usable and continue to interrogate them in meaningful ways. Andrew, I don't know if we have anything else. Okay. I think we just have some acknowledgments, in particular our collaboration with the Kluge Center at the Library of Congress and Gina Wynne, who has done much of this work with us. But with that, we'll keep to time and turn it over to Kath.

Thanks, Andrew and Ben. That was a brilliant job of keeping to time, and I will now try to do the same. I'd like to pay my respects to the Ngunnawal people, the traditional custodians of the land we're meeting on. So when we were writing the abstract... people weren't joking, were they? This isn't my slide. When we were writing the abstract in May, we felt confident that by October we'd be able to report on what we'd promised, which was to tell you all that we'd learned about Irishness in Australian literature by using Word2Vec to create a word embedding from a large corpus of digitised Australian historical newspaper fiction. And we had reason to believe that we could do this, as we're part of a team that's just been given Australian Research Council funding to do this job. We'd be working with a corpus that is, for me, a familiar and extensive one: around 52,000 publications of fiction from Trove, a very large corpus that I've been working with for about 10 years. And some of us also have some experience working on this. So Galen and I have been, oh, I can't go back. Galen and I fine-tuned GPT-2, back in the before-Kraken days, to explore whether we could create a chat interface for students writing and reading with machine learning and the different genres of newspaper fiction. So, despite our initial confidence, five months on we haven't produced a usable word embedding, let alone used it to investigate the complex intersections that constitute Irishness. So basically, in making this promise, we now think that we just didn't pay attention to the diverse and unexpected ways machine learning participates in the evolving knowledge apparatus, and in particular, to how OCR would intervene.
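For context, the promised first step looks something like the following minimal gensim sketch; the corpus file, preprocessing, and hyperparameters here are illustrative assumptions, not the project's actual configuration.

```python
# A minimal sketch of training a Word2Vec embedding on a fiction corpus.
from gensim.models import Word2Vec
from gensim.utils import simple_preprocess

# One story per line in a plain-text file (hypothetical path and format).
with open("trove_fiction_cleaned.txt", encoding="utf-8") as f:
    sentences = [simple_preprocess(line) for line in f]

model = Word2Vec(
    sentences,
    vector_size=300,  # dimensionality of the word vectors
    window=5,         # context words either side of the target word
    min_count=10,     # ignore words rarer than this across the corpus
    workers=4,
)

# The intended kind of query: nearest neighbours around a term of interest.
print(model.wv.most_similar("irish", topn=10))
```

One reason OCR matters so much for this step: each misrecognised variant of a word becomes its own vocabulary entry, fragmenting the counts the model learns from and degrading the nearest-neighbour structure the research depends on.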
So, taking Fantastic Futures back to the future with OCR. Despite not doing what we promised, we wanted to discuss what we've learned from this project so far, in the hope of connecting with others in this room who are working in the same space as us, that same space being: employing large language model technologies with historical archives; doing so in ways that are attentive to historical and social as well as technical challenges; and doing so without much money. So we'd love to know if you've had any of the experiences we've had, and what you've done.

So after OCR sank our efforts to create a word embedding from scratch, we first tried to use LLMs to create a cleaner corpus for word embedding. We anticipated that different LLMs would have different rates of success in correcting OCR, but we were unprepared for the extent of the difference. So this is a comparison of the text in Trove from one of the stories, with corrections of it by a local Llama model and by Claude, accessed through their developer API. Due to factors including a much larger model, a larger context window, and more robust system prompting, Claude's corrections are actually really impressive, and could be further improved using its multimodality. But even if we could convince ourselves to, you know, subject the world to the social and environmental harms that would come from using Claude on a large corpus, we wouldn't be able to; we don't have the money. We estimate it would cost around 100,000 US dollars for our 52,000 publications of fiction using this method.

So if we were surprised by how well Claude did, we were also surprised by how bad Llama was. We've heard that such errors are called hallucinations, and I guess if you're thinking about distorted sensory perception, then sticking a space between each letter, I guess that's a sort of hallucination. But the better description for a lot of what we found Llama doing would be perseverating: repeating an action long after the stimulus that prompted it has ceased. So what Llama's doing here is describing what the model output should be, repeatedly promising no additional errors, going on and on about a $2,000 fine. These aren't in our prompts. So just as previous jailbreaking approaches have used garbled text to circumvent model constraints by overwhelming attention mechanisms, it seems that bad OCR is having the same effect, causing the model to turn from the task given to it by the prompt to increasingly harsh continuations of the prompt itself. The implication is that some LLMs will be more likely to introduce errors into the text correction the worse the OCR is.

So next, we tried to remove the worst OCR before doing the word embedding. We couldn't use existing methods, because these tend to rely on identifying systematic errors, which are minimal in Trove for various reasons, or on learning from uncorrected and corrected samples, which we don't have. So instead, we applied multiple NLP measures that we thought might indicate OCR errors, and I can talk about them later if anyone wants, but found only trivial correlations. We eventually decided to collate four of them, for reasons that I can discuss, and remove the texts in the bottom 10% of this composite quality score. And currently, we're using this hopefully cleaner corpus to fine-tune a BERT model trained on 19th-century British books. I'm happy to discuss that later.
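As a concrete illustration of the correction step, here is a hedged sketch of per-story correction through Anthropic's Messages API. The model name, system prompt, and one-call-per-story framing are assumptions for illustration, not the team's actual setup.

```python
# A sketch of LLM-based OCR correction via the Anthropic Python SDK.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def correct_ocr(ocr_text: str) -> str:
    """Ask the model to repair OCR errors without rewriting the prose."""
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # illustrative model choice
        max_tokens=4096,
        system=(
            "You are correcting OCR errors in nineteenth-century Australian "
            "newspaper fiction. Fix character-level recognition errors only; "
            "do not modernise spelling or rewrite the prose."
        ),
        messages=[{"role": "user", "content": ocr_text}],
    )
    return message.content[0].text

# Rough scale of the cost problem cited in the talk:
# ~US$100,000 / 52,000 publications is roughly US$1.90 per publication.
```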
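And here is a minimal sketch of the filtering step, assuming the four NLP measures have already been computed per text. The normalise-and-average combination shown is one plausible way to collate them into a single quality score, not necessarily the team's method.

```python
# Collate several per-text quality measures and drop the bottom decile.
import numpy as np

def quality_scores(measures: np.ndarray) -> np.ndarray:
    """measures: shape (n_texts, 4), higher = better on every measure.
    Min-max normalise each measure so none dominates, then average."""
    mins = measures.min(axis=0)
    maxs = measures.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)  # guard constant columns
    normalised = (measures - mins) / span
    return normalised.mean(axis=1)

def keep_mask(measures: np.ndarray, drop_fraction: float = 0.10) -> np.ndarray:
    """True for texts above the bottom decile of the composite score."""
    scores = quality_scores(measures)
    cutoff = np.quantile(scores, drop_fraction)
    return scores >= cutoff
```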
But what I wanted to do to finish was to discuss a final unexpected thing that these NLP measures of OCR quality have indicated. We thought we'd find, as projects on both American and British historical digitized newspapers have, that OCR quality would improve over time as paper and printing improved. But that wasn't the case in our measures. Oh, that's not my slide either. So you can see they're sort of flat, the various measures. This is what a recent paper on the American newspapers found: you've got, in that case, a decrease in bad OCR over time. So we thought maybe in Australia, rates of OCR error would differ between metropolitan and provincial newspapers, for the same reasons of better printing and better quality paper. But in fact, we found no significant difference. And it's super interesting. I wonder, is it because, as in Zavia's presentation, we've got a fiction archive that's producing these different effects, while these other American and British projects are working with general newspaper pages? Who knows? And that's sort of the moral of the paper today: there's such a lot about the interactions of machine learning, including LLMs, and historical archives that we have to learn and understand before we can get going with the research. And I'd love to hear if you've had any similar experiences. Thanks.
The National Film and Sound Archive of Australia acknowledges Australia’s Aboriginal and Torres Strait Islander peoples as the Traditional Custodians of the land on which we work and live and gives respect to their Elders both past and present.