
Presenter: Lizabeth Johnson
The National Security Research Center within Los Alamos National Laboratory (LANL) manages analogue and digital collections of lab reports from in-house scientists and partner facilities, in addition to gifted archival collections. Lizabeth Johnson outlines the library's plan to enhance searchability across multiple repositories by implementing a vector database and a tailored ontology, with the results relayed to a large language model.
Technology, language, history and creativity converged in Canberra for four days as cultural leaders gathered at Fantastic Futures 2024 for the world's first in-depth exploration of the opportunities and challenges of AI for the cultural sector.
This transcript was generated by NFSA Bowerbird and may contain errors.
Hi, everyone. My name is Lizabeth Johnson, and I'm a librarian at the National Security Research Center, which is the classified library at Los Alamos National Laboratory in Los Alamos, New Mexico. As you can see, my presentation is entitled "Enhanced Stewardship and Data Sovereignty Through the Implementation of an Ontology-Enhanced Large Language Model." Los Alamos National Laboratory was founded in 1943, and the lab's classified library was created shortly after the lab itself. Over the past 80 years, the classified library has grown from a small library that contained only reports produced by lab scientists into a large facility that maintains the lab's reports, reports written by scientists at partner facilities, and archival collections gifted to the library over the years by retiring scientists and staff members. These collections are housed separately; some are digital, while others still exist only in physical form. The goal of the project I'm going to describe is to make all of these collections more accessible, because our material is valuable not only to scientists who want to learn more about legacy experiments at the lab, but also to historians interested in the history of the lab and some of its better-known staff.

To begin with, we envision a system in which a user writes a query that is then put through a process of vector embedding. This will help to extract keywords and phrases and to map the relationships between them. The resulting data will go into a vector database. The vector database will also be connected to our main document repository, an online database that currently holds millions of reports and other documents. Report titles, author names, and other keywords in the document repository will likewise be put through vector embedding so that they can be compared with the data from the user query.
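The query-and-repository matching described above can be illustrated with a minimal Python sketch. The talk does not name an embedding model or a vector database, so everything here is an assumption: `embed` is a hypothetical hashed bag-of-words function standing in for a learned embedding model (it captures exact word and word-pair overlap, not the semantic relationships a real model would learn), `VectorStore` is a toy in-memory substitute for a vector database, and the catalogue records are invented placeholders. Records are ranked by cosine similarity between the query vector and each record's vector.

```python
from __future__ import annotations

import hashlib
import math
import re

DIM = 1024  # width of the hashed embedding space (arbitrary illustrative choice)


def embed(text: str) -> list[float]:
    """Map text to a unit-length vector by hashing word tokens and adjacent word pairs.

    Hypothetical stand-in for a learned embedding model: it captures exact word
    and phrase overlap only, not semantic relationships.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    features = tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    vec = [0.0] * DIM
    for feat in features:
        bucket = int(hashlib.sha256(feat.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are unit length, so their dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))


class VectorStore:
    """Toy in-memory vector database holding (doc_id, vector, metadata) rows."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, list[float], dict]] = []

    def add(self, doc_id: str, text: str, metadata: dict | None = None) -> None:
        self.rows.append((doc_id, embed(text), metadata or {}))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str, dict]]:
        q = embed(query)
        scored = [(cosine(q, vec), doc_id, meta) for doc_id, vec, meta in self.rows]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    store = VectorStore()
    # Hypothetical catalogue records: title, author and keywords are embedded together.
    records = [
        ("hypothetical-001", "Neutron detector calibration methods", "A. Author", "detectors calibration"),
        ("hypothetical-002", "Phase diagrams of metal alloys", "B. Author", "metallurgy"),
    ]
    for doc_id, title, author, keywords in records:
        store.add(doc_id, f"{title} {author} {keywords}", {"title": title, "author": author})
    for score, doc_id, meta in store.search("detector calibration", k=2):
        print(f"{score:.3f}  {doc_id}  {meta['title']}")
```

A real deployment would swap `embed` for an actual embedding model and `VectorStore` for a real vector database; the ranking pattern (embed both sides, compare by cosine similarity, return the top k) stays the same.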
The third part of this system will involve communication between the vector database and an ontology created by LANL staff specifically for the kind of material the classified library maintains. Information from the ontology will likewise be made accessible to the vector database through vector embedding. The data gained by comparing the query with the material in the document repository, together with the information from the ontology, will then be passed to other, smaller repositories before being relayed to the large language model, which will produce an answer to the user's query.

Because this entire process will be developed by staff at the lab and will undoubtedly need refinement before it can be made widely available, we plan to have subject matter experts (SMEs) evaluate the initial responses produced by the LLM so that we can assess their accuracy. The SME evaluations will also help us refine our ontology and make the whole process more effective and efficient.

We expect that a system in which users can query an LLM and receive responses will enhance our stewardship of the documents in the library's collections, some of which are currently hard to find and hard to access. We also expect it to enhance data sovereignty, because some of the smaller repositories are managed by specific groups at the lab who want to maintain control over their repositories while making the data in them more accessible to other researchers. We believe that adapting an LLM as a search tool for these collections will help the classified library and its librarians meet the needs of our customers and other stakeholders at Los Alamos National Laboratory. Thank you.
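The ontology step and the hand-off to the LLM can be sketched the same way, again under stated assumptions. The talk does not describe the ontology's format, so `ONTOLOGY` is a hypothetical set of concepts, each with a label, related terms, and a broader concept (loosely in the style of a SKOS thesaurus). The sketch embeds each concept, matches the query against those vectors, expands the query with the matched terms, searches several separately held repositories, and assembles a prompt. `match_concepts`, `expand_query`, `retrieve`, `build_prompt`, the repository names, and the 0.2 similarity threshold are all illustrative; no LLM is called, and the hashed `embed` function stands in for a real embedding model.

```python
import hashlib
import math
import re

DIM = 1024  # width of the hashed embedding space (arbitrary illustrative choice)


def embed(text):
    """Hashed bag-of-words unit vector; a hypothetical stand-in for a learned embedding model."""
    vec = [0.0] * DIM
    for tok in re.findall(r"[a-z0-9]+", text.lower()):
        vec[int(hashlib.sha256(tok.encode()).hexdigest(), 16) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a, b):
    # Inputs are unit length, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))


# Hypothetical ontology: concept id -> label, related terms, broader concept id.
ONTOLOGY = {
    "C1": {"label": "radiation detection", "terms": ["dosimetry", "detector"], "broader": "C3"},
    "C2": {"label": "materials science", "terms": ["metallurgy", "alloys"], "broader": None},
    "C3": {"label": "nuclear physics", "terms": ["neutron"], "broader": None},
}

# Each concept is embedded once from its label plus related terms.
CONCEPT_VECS = {
    cid: embed(" ".join([c["label"], *c["terms"]])) for cid, c in ONTOLOGY.items()
}


def match_concepts(query, threshold=0.2):
    """Return ids of ontology concepts whose vectors are close to the query vector."""
    q = embed(query)
    return [cid for cid, vec in CONCEPT_VECS.items() if cosine(q, vec) >= threshold]


def expand_query(query):
    """Append labels and terms of matched concepts, plus each one's broader label."""
    parts = [query]
    for cid in match_concepts(query):
        concept = ONTOLOGY[cid]
        parts += [concept["label"], *concept["terms"]]
        if concept["broader"]:
            parts.append(ONTOLOGY[concept["broader"]]["label"])
    return " ".join(parts)


def retrieve(query, repositories, k=3):
    """Score records from every repository against the ontology-expanded query.

    Repositories stay separate mappings, so each owning group controls which
    records it exposes to the search.
    """
    q = embed(expand_query(query))
    hits = [
        (cosine(q, embed(rec["title"])), repo, rec["title"])
        for repo, records in repositories.items()
        for rec in records
    ]
    hits.sort(reverse=True)  # highest similarity first
    return hits[:k]


def build_prompt(query, hits):
    """Assemble the text that would be sent to an LLM; no model is called here."""
    sources = "\n".join(f"- [{repo}] {title}" for _, repo, title in hits)
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {query}"


if __name__ == "__main__":
    repos = {  # hypothetical repositories and records
        "main-repository": [{"title": "Neutron detector calibration methods"}],
        "group-repository": [{"title": "Phase diagrams of metal alloys"}],
    }
    question = "detector calibration"
    print(build_prompt(question, retrieve(question, repos)))
```

Keeping each repository as its own mapping is one way to reflect the data-sovereignty goal, since the owning group decides which records appear in it. In practice the assembled prompt would go to the chosen LLM, and SME review of its answers would feed back into refinements of the ontology.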
The National Film and Sound Archive of Australia acknowledges Australia’s Aboriginal and Torres Strait Islander peoples as the Traditional Custodians of the land on which we work and live and gives respect to their Elders both past and present.