
Presenters: Peter Leonard, Lindsay King
Transformers-based models have shown that many complicated problems in culture beyond digital text – a human voice, a handwritten word, even a scene from a motion picture – are now tractable to computation. Added to this is the simultaneous development of 'conversational interfaces' as a way of interacting with cultural material. Peter Leonard and Lindsay King draw on their experience working with Stanford's collections to explore several non-commercial multi-modal large language models and what conversational modalities can offer GLAM patrons.
Technology, language, history and creativity converged in Canberra for four days as cultural leaders gathered for the world's first in-depth exploration of the opportunities and challenges of AI for the cultural sector.
This transcript was generated by NFSA Bowerbird and may contain errors.
Thanks so much for the introduction, and thanks for the chance to come and speak here today. I'm Lindsay King from Stanford University Libraries, and this is Peter Leonard. I'm going to start with a brief overview of what we'll cover today and the ideas that motivate us.
Transformers, of course, enable more and more complex human culture to be explored via computation. We can observe that in many of the presentations at this conference, but we also need that to be true, given the sheer number of digital cultural artifacts and the rate at which that number is increasing. We can still believe in close reading and the importance of physical objects while finding value in analyzing digital cultural heritage objects as humanities data. As we think about how people interact with our vast collections, chat-based interfaces are too popular to ignore as ways of interacting with cultural material. We may not like the ChatGPT turn, especially with its many problematic implications for higher education, but the speed and ubiquity of its adoption means we can't just ignore it and hope it goes away. How can we make it better in our library, archive, and museum context, help people understand what it can and can't do, and show that chat doesn't only mean ChatGPT? It does seem like we are at a moment, in both the development of digital cultural heritage systems and practices and the development of artificial intelligence, where transformers have quite literally transformed our possibilities, transformed what we need to think about, transformed our understanding of risk.
We tend to think of the T in ChatGPT as the most famous place where transformers occur, and that's true: generative pre-trained transformer. But the T is lurking in all sorts of other acronyms. We heard a little bit about BERT this morning, bidirectional encoder representations from transformers, and there are vision transformers as well. Without getting into the actual details of how these models work, what's interesting is simply to observe that if prior models such as convolutional or recurrent neural networks tried to focus on the details, each with its own specific lens for understanding things, transformers ask us to look at material all at once, to pay attention to almost everything at once. That ability is, I think, fundamentally what is enabling the processing of ever more complex forms of human culture. This shift comes, of course, from the 2017 paper 'Attention Is All You Need': the shift to looking at the whole thing at once, towards what we call multi-headed self-attention.
If you step back a little and consider the implications of transformers: on the left of this slide, I've placed types of human culture that were very difficult to treat or process computationally even five years ago, even three years ago; things that were not easily tractable. An audio recording of the human voice, as we heard about this morning. Video, in the institution in which we stand. Human handwriting. These are now taking their place alongside Unicode-encoded text documents, which in various forms have formed the basis of a lot of text and data mining going back to Roberto Busa in the 1940s.
We are using transformers both to recognize these materials and, increasingly, to produce embeddings, lower-dimensional representations of that data. We're storing those in vector databases, and then, through both traditional keyword search methodologies and the promising but unproven opportunity of natural language chat, we're enabling search and discovery on top of that. So today, what I'd like to do is take us through four examples of what transformers are doing for cultural heritage; you can decide whether it's for good or for ill. We're going to talk a little bit about topic modeling embeddings rather than words with BERTopic. We're going to talk about the relationship between text and image in simple multimodal networks like CLIP. We're going to talk about having conversations with an archive. And finally, we're going to extend that idea a little further into multimodal conversations enabled by multimodal models.
Let's start with the first of these. Has anybody in the room done topic modeling? Just raise your hand. That's great. This is well understood; it's a thing a lot of us do, and we all struggle with interpretation, and we have a lot of fun with it, especially those of us on the literary side. What happened with sentence transformers, and the way they were instrumentalized in BERTopic, an open-source package, is that we started working with embeddings, dimensionality reductions of textual corpora, rather than with words themselves. When we worked with MALLET or other latent Dirichlet allocation tools, we were looking at words, maybe chopping off some of the endings for stemming or lemmatization, but fundamentally we were working with linguistic tokens. Now we're working with embeddings, which some of us recognize from word embedding or vector space models: rather than, say, 100,000 dimensions for the vocabulary of American English, we have a few hundred dense dimensions. For good and for ill, BERTopic tends to treat topic modeling of embeddings as a classification or clustering task, which means that in most implementations a document can belong to only one topic. That is very different from the way LDA implementations such as MALLET have done this in the past; it has advantages and it certainly has disadvantages. What's really interesting about the current moment is that as we start working with these sentence transformers, a lot of the workflow becomes GPU-accelerated, so we can do this work at a scale that was unimaginable before. But we also have to keep in mind the models we're working with, which in the case of BERTopic are generally sentence transformers. It remains an open research question how well sentence transformers generalize to, say, a chapter in a modernist novel; I'm certainly not confident about that.
So let's look at one example. What I'm going to show you is about 5,000 journal articles from my field, Scandinavian studies. I could read all those articles, but nobody has time for that. What I really want to do is use sentence transformers to surface latent discourses within my field and understand how they work. Step by step: we create embeddings with a sentence transformer, we reduce their dimensionality, and we do some interesting clustering.
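To make that pipeline concrete, here is a minimal sketch of the workflow using the open-source BERTopic package with a sentence-transformer model. The input file name and the choice of embedding model are illustrative assumptions, not the exact setup used for the Scandinavian studies corpus.
    # Minimal BERTopic sketch: embed documents with a sentence transformer,
    # reduce dimensionality, cluster, and inspect the resulting topics.
    from sentence_transformers import SentenceTransformer
    from bertopic import BERTopic

    # Assumption: one plain-text document (e.g., an article abstract) per line.
    with open("scandinavian_studies_abstracts.txt", encoding="utf-8") as f:
        docs = [line.strip() for line in f if line.strip()]

    # Any sentence-transformer model works; all-MiniLM-L6-v2 is a common default.
    embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

    # BERTopic handles UMAP dimensionality reduction and HDBSCAN clustering internally.
    topic_model = BERTopic(embedding_model=embedding_model, verbose=True)
    topics, probs = topic_model.fit_transform(docs)

    # Inspect the discovered topics; topic -1 is BERTopic's outlier bucket.
    print(topic_model.get_topic_info().head(10))
    print(topic_model.get_topic(0))  # top terms for the largest real topic
Note that, as described above, each document lands in exactly one cluster here, unlike the topic mixtures you get from LDA tools such as MALLET.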
What we end up with after that is an interesting visualization of the semantic space of my particular field. In the image on your right, and I'll move the mouse around here a little, it's a three-dimensional space; I reduced it down to three dimensions. What I'm doing here is hovering over two clusters of topics, themes, or discourses. The first is Norwegian stage realism: Ibsen and his contemporaries. A little north of that is the Strindberg cluster, and you'll see in that cluster the word 'Julie', from the play Fröken Julie, or Miss Julie. It makes sense intuitively that the Ibsen and Strindberg stage clusters are close to each other, and the Viking and medieval material is out on the right. This is what sentence transformers can do for a kind of distant reading, a kind of topic modeling, of 5,000 articles in my particular field. The reason I'm showing you this is to get us thinking about the concept of embeddings: dimensionality reductions from all of the words in Norwegian or English, or Aboriginal languages for that matter, that still maintain much of their semantic value and allow us to understand knowledge in this way.
Now I want to move beyond the written word. Although I'm a literature person, I want to start thinking about the relationship between image and word. There has been really great work in the cultural heritage sector with CLIP models, including by many people in this room. CLIP networks, which are implicated in DALL-E and other text-to-image models but are not solely responsible for them, can best be understood as a kind of connection between the visual and the linguistic. What CLIP does is essentially create identically sized embedding spaces for linguistic data (words) and visual data (pixels), and, excitingly, it allows us to map between pixel and word, text to image. You can use CLIP models for all sorts of purposes, for projects that are generative as well as analytic. But the thing that's most interesting, in my opinion, is what they can offer us for non-explicit, more latent search: what we might call evocation and ambiguity.
As a quick example of how these networks actually work behind the scenes, I'll show you two quick pictures. The first is of a cat and a dog; I'll try to summon the cat and the dog here. There we go. Just as a sample, I've taken a CLIP network and given it this picture of a cat and a dog. If I type the word 'cat', what activates in the pixel space? Obviously, the cat. That's a pretty boring example. What if we go to something more interesting? What if we take this picture of a businessperson on the left and give it an adjective like 'formal'? What will activate in the pixel space for that linguistic token? In this Western man's suit, it's really the tie and the belt and so on. That's interesting, and it suggests that the best way to use CLIP is perhaps not to find cats, as compelling as that might be, but to search for things that are evocative: concepts, feelings, qualities that we haven't really been able to search for in our collections.
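As a sketch of what that shared embedding space looks like in code, assuming the sentence-transformers packaging of OpenAI's CLIP ViT-B/32 and a couple of illustrative image files, text and images land in the same vector space and can be compared directly:
    # Sketch: embed images and free-text queries into CLIP's shared space
    # and rank images by cosine similarity to each query.
    from PIL import Image
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("clip-ViT-B-32")  # image and text encoders

    # Illustrative file names; any local images will do.
    image_paths = ["cat_and_dog.jpg", "businessman.jpg"]
    image_embeddings = model.encode([Image.open(p) for p in image_paths],
                                    convert_to_tensor=True)

    for query in ["a photo of a cat", "formal"]:
        query_embedding = model.encode(query, convert_to_tensor=True)
        scores = util.cos_sim(query_embedding, image_embeddings)[0]
        best = int(scores.argmax())
        print(f"{query!r} -> {image_paths[best]} "
              f"(cosine similarity {scores[best].item():.3f})")
A multilingual text encoder (for example, sentence-transformers' clip-ViT-B-32-multilingual-v1, which is aligned to the same image space) can be swapped in for queries in other languages, which is the mechanism behind the cross-lingual search Lindsay demonstrates in a moment.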
And as a great example of how we're using this at Stanford, I'm going to turn it over to Lindsay.
So, even more promising for places like those many of us work for is the ability to search for things that aren't simple labels. We can search for concepts or feelings, as Peter was saying, and those associations between text and visual elements mean that we can approach something that's evoked by an image but not necessarily present in the metadata. So maybe we can work on both of those areas at the same time: the metadata and this kind of evocative search. I want to demonstrate how this can be a compelling discovery tool, especially where the metadata for these images is lacking. What we're looking at is a demo built on a sample of the Stanford Visual Resources Collection, which in total contains more than 250,000 digitized slides used in teaching over the past several decades. Most of those images were photographed from books and originally made into analog slides, back in the day when that was the only way you could teach with images. There is item-level metadata, and it's often quite detailed, because visual resources collections were the original digital collections in libraries when they began being digitized and cataloged at the item level in the 2000s. However, CLIP can also work with no metadata at all, as we've seen with projects from the National Library of Norway and the National Library of Sweden. In the case of Stanford's VRC, we can turn on the metadata display, which allows the researcher to use this as a new discovery tool, searching and browsing and then easily returning to the original image in our database.
This initial example search is for 'painterly abstraction', a phrase that has a specific meaning in art history but isn't a subject heading or the name of a movement that would appear in a record. The search results here are pretty accurate. The next one is 'serene relaxation', which brought up a lot of architectural settings, but also some sculpture and a drawing; it's interesting that it's catching multiple types of works of art, and we can recognize how the model might arrive at these results. We have a lot of architecture slides in the collection. The weather is, of course, not part of how architecture slides are cataloged, but the sky is visually a very large part of the image, so when I search on 'overcast', they come up. We've got 'festive celebration', which brings together celebrations in India, China, and Germany in this little sample of the results. Then we have words like 'desolation': a lot of landscapes in this set of results, but also some paintings and even a photograph of an installation. I searched for 'candlelit ambiance' and got, literally, some pictures of candles, but also the ambiance, the feeling of the spaces, the quality of the light, and they're all interiors, which is an interesting finding too. Then I searched on 'urban angst'. We've got multiple media here too, from photography to painting to architecture, some more abstract and some more representational, but it is, as you're seeing, evoking that feeling. 'Floral abundance' speaks for itself: again multiple media, some glass in there, paintings. And this is one of my favorites: I searched on 'waves', and I like how some of the waves are literal, as in the painting and the photograph at the lower right and the pictures of Niagara Falls.
But then many are just abstract, suggestions of waves, and you can see how the model would arrive at this through those overlapping spaces of text and image. Lastly, 'mysticism' gives us everything from contemporary to medieval images, many different traditions, and multiple media again. So it's really interesting to look at a huge collection like this and think: what are new ways we can search through the kind of information we've been collecting for a long time, and are there new ways this kind of collection can be useful when we have tools like this CLIP search?
CLIP search also works with multilingual collections and search terms. Here, for example, we're searching across a Norwegian collection that definitely has no images of bibimbap. But we can search on that term in Korean and still get results from this Norwegian set, even though the linguistic spaces don't really overlap; you simply get the closest images in that dataset. So we're able to explore the collection in very different ways, and it shows that those vectors work regardless of language. Now I'll hand back to Peter.
So far we've talked about the ways transformers can be used to generate textual embeddings, which we can then cluster and try to classify in certain ways. We've talked about the way a combination of a vision transformer and an earlier GPT-2-style text model produced CLIP, and the relationship, in a shared embedding space, between pixel distributions and linguistic distributions. And now it's finally time, reluctantly, to talk about the first chat application.
Before I agreed to talk about conversations with an archive, I promised myself I would make two points. The first is that language models are not knowledge models. There's something deeply ironic, like something out of an O. Henry short story, in the fact that we produced these amazing language models and then immediately misinterpreted them as stores of facts and truth. That's not what they are. They are capable of suggesting the next most probable token as a result of massive pre-training. They are not actually telling you the truth; they know the shape of a probable answer, and they have a propensity to hallucinate in order to fill that curve. So what could go wrong with chatting with an archive?
I think there's an interesting idea here: there is something very powerful about the natural language layers of an LLM, the fact that you can type something like 'write me code that does such-and-such' and then 'no, try again, comment it more.' Those aren't instructions in a Lisp or Python sense; they're just natural language, and that's very compelling to people. So what if we could capture the natural language instruction layers of an LLM but draw the facts, the truth, the evidence, whatever you want to call it, from a separate set of documents that you provide? How many in the room have done retrieval-augmented generation? That's great. I think we all should be playing around with it. It's not a magic bullet, and under the hood it's pretty sketchy, but I think it's really interesting as an evolution of the way we use LLMs to work with archives. The original paper came out a few years ago, and you can use all sorts of interesting packages to do this work. You can use an open-source large language model: you could choose Stanford Alpaca, you could choose Meta's Llama 3.2 release, you could choose all sorts of interesting things.
There's software that acts as a sort of middleware, like LangChain. There's a downloadable Windows executable from NVIDIA you can play around with. I won't say much about NotebookLM from Google; that's probably the right way to do it, even though it's a closed model, because it's so sophisticated.
My interest is in Silicon Valley history. Stanford sits next to Palo Alto, at the center of a lot of innovation, for good and for ill, and what I'm really interested in is the archives we have collected about the development of the Valley from the post-war period to the present day. I have a collection of about 34,000 documents, and they run only from 1987 to 1997. What I love about this corpus is that it doesn't know anything about the modern internet or modern computing; it just stops there, in the Clinton administration. So what I want to do is build a ChatGPT for the 1990s. If I connect my corpus of tech journalism from 1987 to 1997 to a retrieval-augmented generation system, I can ask it interesting, specific questions. For example, I might ask: will Apple be able to compete with Microsoft? This was actually on everybody's mind in the 90s. And the answer, which is 100% correct for 1997, is: I don't know, Microsoft could wipe the floor with Apple. They've got all the money, they're really taking over here, the future looks like it's waving a Windows flag. So I could then say, well, wait a minute, I've heard Apple is going to have a new CEO and he's going to turn things around. What can the retrieval-augmented generation system tell me about Gil Amelio? Spoiler, for those of you too young to remember: Gil Amelio did not save Apple Computer. But the next one after him did. And this is really good, because the system's knowledge stops in 1997, which is exactly what I want. This is a somewhat silly example, but you can imagine others: I've got a journal from the 1920s or the 1890s; what did we think about evolution? Give me a Lamarckian perspective, I'm tired of this Darwinian supremacy. More seriously, I think there are things you could do with this archive itself to understand more specific, situated knowledge. (Like how to advance the slides. You had good luck doing this, didn't you? See, look at that. OK, one more.)
If we ask the question 'How does Illustrator handle Bézier curves?': for those of you who have ever used Adobe Illustrator, it had an amazing ability to create object-oriented graphics, and it did so in an incredibly complicated way with control points. In fact, Adobe Illustrator shipped with a VHS videotape in the box to try to explain how to use control points to draw Bézier curves. And if you go forward just one more slide, not only is my retrieval-augmented generation system giving me the exact answer about how Bézier curves are controlled with control points in Adobe Illustrator, it's actually citing three articles in my corpus from late 1987 and early 1988 about PostScript illustration, which is exactly what it should be doing. I don't have confidence that Reddit knows about Adobe Illustrator control points, but I do have confidence in the archive I've connected it to.
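Before we move on, here is a bare-bones sketch of the retrieval-augmented generation pattern described above, not the Stanford setup: embed an archive, retrieve the passages closest to a question, and hand only those passages to a local open model as context. The corpus file, model names, and prompt wording are illustrative assumptions.
    # Sketch of retrieval-augmented generation over a small local archive.
    import numpy as np
    from sentence_transformers import SentenceTransformer
    from transformers import pipeline

    # Assumption: one article or passage per line in a plain-text file.
    with open("tech_journalism_1987_1997.txt", encoding="utf-8") as f:
        passages = [line.strip() for line in f if line.strip()]

    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    passage_vecs = embedder.encode(passages, normalize_embeddings=True)

    def retrieve(question, k=3):
        """Return the k passages most similar to the question (cosine similarity)."""
        q = embedder.encode(question, normalize_embeddings=True)
        idx = np.argsort(passage_vecs @ q)[::-1][:k]
        return [passages[i] for i in idx]

    # Any local instruction-tuned model can stand in here (this one is gated on the Hub).
    generator = pipeline("text-generation", model="meta-llama/Llama-3.2-1B-Instruct")

    question = "How does Illustrator handle Bezier curves?"
    context = "\n\n".join(retrieve(question))
    prompt = ("Answer using only the archive excerpts below.\n\n"
              f"{context}\n\nQuestion: {question}\nAnswer:")
    print(generator(prompt, max_new_tokens=300)[0]["generated_text"])
Middleware such as LangChain packages the same retrieve-then-prompt loop, with the retrieved passages doubling as citations back into the corpus.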
So what we can do now is bring together some of the concepts we've been speaking about. We've seen an early example of a multimodal network with CLIP, and we've dipped our toe into the notion of conversations with an archive. I'm going to hand things off to Lindsay for a discussion of multimodal conversations.
So we're just going to show a couple of experiments with some different multimodal models. In this test, we gave Microsoft's Florence-2 model an image from the Bernice Bing papers at Stanford, which were recently digitized. This model lets us choose from several different tasks; three different levels of detail for a caption are the ones I've picked here. You can see that the captions are generally descriptive rather than analytical, so they may be useful for automated cataloging that would later be augmented by a person. The most detailed caption starts to go in a more analytical direction, but stays pretty accurate. Building on that model, developers at Hugging Face fine-tuned Florence-2 specifically for visual question answering: again with the chat idea, asking the model a single question in text about an image or a video, as opposed to using it to generate a caption. With another Bernice Bing image, this seems to result in a lessening of performance in this particular case, when we pose the captioning task as a question, 'What is in this image?' The answer is much less detailed when we have to ask it a question and get it to do that chatting with us.
Other new multimodal models, however, are designed from the ground up to have that conversational aspect, with better performance. LLaVA and LLaVA-NeXT expand on the idea behind LLaMA: the LL still stands for large language, with a V for vision. This builds on the CLIP models we showed, which project text and image embeddings into the same space, pre-trained on millions of pairs of words and images. LLaVA performs well with still images, but it's harder to assess whether its claims of being really good with video are accurate. Taking another Bernice Bing archival photograph as our test data, LLaVA improves on the Florence-2 VQA version's performance when we ask the same basic captioning-type question, 'What is present in this image?' On the right, with Florence-2 VQA, you get one sentence; on the left, you get much more detail. With the chat interface we can go beyond that descriptive or captioning kind of request and ask more analytical questions, and we can also follow up with the model to see what it's doing and ask how it came up with something if we don't quite understand why it's saying it. Here we're following up on the model's statement that the image captures a moment of creativity and relaxation, and it explains why it thinks the image captures that moment of creativity. But before we get too excited about LLaVA's abilities, I put a red flag here, because though the model does go deeper in explaining itself, it also comes up with some very hand-wavy language that sounds like a kid padding a book report, as we often see with LLMs. As Peter pointed out earlier, it's a language model, not a knowledge model, so it's not really adding to our understanding of this image; it's just producing more words about it.
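Returning to the captioning side for a moment, here is a rough sketch of how Florence-2's three caption granularities are requested in code, assuming the Hugging Face release of the model and following the pattern on its model card; the image file name is an illustrative assumption.
    # Sketch: Florence-2 captioning at three levels of detail via task prompts.
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    model_id = "microsoft/Florence-2-large"
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    image = Image.open("bernice_bing_sketch.jpg")  # illustrative file name

    # Florence-2 is steered by special task tokens rather than free-form chat.
    for task in ("<CAPTION>", "<DETAILED_CAPTION>", "<MORE_DETAILED_CAPTION>"):
        inputs = processor(text=task, images=image, return_tensors="pt")
        generated = model.generate(input_ids=inputs["input_ids"],
                                   pixel_values=inputs["pixel_values"],
                                   max_new_tokens=256)
        raw = processor.batch_decode(generated, skip_special_tokens=False)[0]
        print(task, processor.post_process_generation(raw, task=task,
                                                      image_size=image.size))
Conversational models like LLaVA work differently: the image is attached to an open-ended chat prompt rather than a fixed task token, which is what makes the follow-up questions above possible.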
Moving on to another model: mPLUG-Owl2 is designed to be multimodal and to use the different modalities, like text and image, in something its developers call modality collaboration, to try to improve performance across the modes. So here's one result of my several experiments with mPLUG-Owl2, asking it questions about some 1920s images from a rare book dealer that I happened to have on my desktop. It's impressive to me that the model recognizes this woman playing tennis, and that it's a work of art, so it knows it's not a photograph. However, it does resort to ChatGPT-esque vagueness: it says it's a unique and creative representation and observes that art can evoke emotions or simply serve as a visually appealing piece of art. So it's more of that 'hmm, I'm not sure' quality. It's interesting to see the things it's good at and the things it's less good at. Peter used mPLUG-Owl2 on another set of images from a Stanford Libraries collection that he's going to tell you about.
That's right. Does anybody in the room recognize any of the people on this screen? This is more material from the 1990s: Michael Spindler and John Sculley, some of the executives of Apple Computer that we have in the archives at Stanford. Here's what I'm going to do with one of the multimodal models Lindsay has been talking about; this one is mPLUG-Owl2, which we should mention was trained in mainland China but works in English. I'm going to give it a prompt without telling the model anything about the picture. I'm just going to say: these people are starting a Norwegian death metal band, as you do. What's their band name and their album title? Here's what's interesting: the model is capable of reading the image. It understands these are executives. It talks about the boardroom; it talks about the dark side of corporate power. This is the kind of work I do every day, so I'm excited that this model is able to help me with it. And I think it shows visual comprehension, because I did not ask the model what these suits were going to do in their new band; I just said 'these people.'
I want to preface the next slide with a statement: it includes a representation of an Indigenous Australian man. It's from a 1970s Australian film with a British director, Walkabout. I struggled with this image, thinking about what kind of question I could ask the model that would give an interesting answer. I was a little afraid of what the model might say, and I wanted to be deliberate about the question I asked of it. So for this still from Walkabout, I decided to do what Lindsay mentioned and ask the model for something that is more than just a description: take a research question I have, such as, what is surprising or unexpected about this image? What's interesting is that the model did a relatively good job. This could have gone off the rails in so many ways, but what it fixated on was the incongruity of a boy in what you might call a British public school uniform in the middle of a desert. That was what was surprising or unexpected about the image it was given. This is an example of using the directedness of chat to our advantage: not just 'describe it,' but 'tell me what's unexpected.' A key point with these multimodal models is that we're not asking them to do a close reading of a film; we think humans should be doing close watching of films.
But the way to think about it is: imagine you had, theoretically, every single piece of film in Australia or in Norway, and you were trying to produce embeddings over all of that visual corpus, one of which might be the notion of unexpectedness or surprise. This shows that the models, even at this early stage of development, are capable of capturing that and producing such embeddings.
But clearly video presents more complex challenges for multimodal models. This is one attempt, with an admittedly challenging subject: video of a performance piece by the artist Carolee Schneemann, in which she swings from a harness while drawing on the walls around her, so I don't really expect a model to be able to get this. mPLUG-Owl2 does its best and tries to say what's happening; it decides the harness is a horse bridle, and then says the overall atmosphere is quite artistic and evokes feelings of nostalgia and simplicity, which the original work of art certainly does not. So that's another red flag: we're not quite there with video analysis beyond visual description. Peter has one more example of video.
That's right. I don't know if people in the room have ever seen The Seventh Seal by Ingmar Bergman. It's a real upper; I recommend it to everyone. What's interesting is that we just got LLaVA working locally on a GPU at Stanford, and I was curious what the model would make of film scenes. I'm spoon-feeding it clips right now; I'm using PySceneDetect to get scenes that cohere. I have yet to see a multimodal model that can process a two-hour film, and if you look at the actual code a lot of multimodal models use, they sample frames. I think we're still looking for diachronic understanding. This is actually not a horrible description of the moment where Antonius Block meets Death on the beach. He's back from the Crusades, as you do, and he decides to play a game of chess with Death to cheat his upcoming demise. It's pretty good. It's incorrect in the sense that the clip actually does contain dialogue and sound, but it's 1950s Swedish, so I forgive it for not understanding that. Again, these textual outputs are just an imperfect representation of the underlying embeddings. If I had every bit of Ingmar Bergman's filmography, or all of post-war Swedish cinema, searching for 'chess' or 'game' or 'beach' would be a very powerful way to work through that archive, rather than expecting this model to be an amazing film critic of this one particular scene.
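As a sketch of the clip-preparation step mentioned above, splitting a film into coherent scenes with PySceneDetect and sampling a few frames per scene to hand to a multimodal model, the following assumes an illustrative file name, detector threshold, and frame count.
    # Sketch: split a film into scenes, then sample a few frames per scene
    # as input for a multimodal model (which typically sees frames, not full video).
    import cv2
    from scenedetect import detect, ContentDetector

    video_path = "seventh_seal_clip.mp4"  # illustrative file name
    scenes = detect(video_path, ContentDetector(threshold=27.0))

    capture = cv2.VideoCapture(video_path)
    frames_per_scene = 4
    for i, (start, end) in enumerate(scenes):
        first, last = start.get_frames(), end.get_frames() - 1
        step = max((last - first) // frames_per_scene, 1)
        for j in range(frames_per_scene):
            # Jump to an evenly spaced frame within the scene and save it.
            capture.set(cv2.CAP_PROP_POS_FRAMES, min(first + j * step, last))
            ok, frame = capture.read()
            if ok:
                cv2.imwrite(f"scene{i:03d}_frame{j}.jpg", frame)
    capture.release()
The saved frames can then be passed, scene by scene, to a model like LLaVA, which is the spoon-feeding approach described above rather than true whole-film understanding.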
One of the themes we keep coming back to is that, as with any other chat models, ChatGPT and so on, we have to exercise healthy skepticism, because there is definitely some hallucination happening as the model tries to give me, for example, the detailed caption I've asked for. In this case we start off okay, and then we veer off into the tall grass that is mentioned in the caption but not present in this painting at all. Even more fun: I thought this might be a case of the model conflating this painting by Bernice Bing with another painting it somehow knew about, but in fact Carl Coulson, the artist it mentions, does not seem to exist, and therefore neither does the 2015 painting that Florence-2 is describing very confidently here. That gets a red flag for sure. If I wanted to auto-generate captions for all the slides in this archive, for example, I might not say this is the right way to do it. So we've shown you several directions multimodal models could take us in, and I'm going to hand it back to Peter for some concluding thoughts.
That's right. There are four things we wanted to mention at the very end. As I've been saying, we don't expect these models, whether they're textual models, text-to-image models, or multimodal models that can understand time, to produce perfect human-level understandings. What we're interested in is producing defensible embeddings at scale, in order to enable search at scale. It's important to remember that these imperfect captions are just one representation of what the model has seen or read. And even if this were only 40% good, it would still be better than the 0% good we have for most undescribed cultural heritage material.
Our last slide today is some concluding thoughts, really on three points. First, whether it's Whisper for speech-to-text, or a future Whisper that's even more inclusive and incorporates community perspectives and community data sovereignty, as we heard about this morning, or TrOCR weights for handwritten text recognition that turn manuscripts into viable future corpora for LLMs, probably making them more representative of women's writing, at least that of upper-class women in the late nineteenth century: these models are making forms of cultural heritage material that were previously very difficult to deal with, from a tractability point of view, accessible to computation. As we've heard this morning, that has upsides and downsides. The second point is simply that I don't use chat-based interfaces, but kids coming to Stanford do. So we have to think through how we can make systems that are responsible, grounded, and ethically sourced as ways of interacting with cultural heritage material, because I do think there's a generational shift going forward. And finally, I do think the future might look like these natural language chat conversations with vector databases of embeddings. As foreign as that sounds to me, it does seem like a pretty obvious possible future trajectory. The question is how we inflect that trajectory and how we bring our values into that future. Thank you so much.
The National Film and Sound Archive of Australia acknowledges Australia’s Aboriginal and Torres Strait Islander peoples as the Traditional Custodians of the land on which we work and live and gives respect to their Elders both past and present.