We acknowledge Australia’s Aboriginal and Torres Strait Islander peoples as the Traditional Custodians of the land on which we work and live and give respect to their Elders, past and present.

National Film and Sound Archive of Australia

Fantastic Futures 2024 - Day 2 - Session 14

2024

    Enhanced stewardship and data sovereignty through the implementation of an ontology-enhanced large language model (LLM)

    Presenter: Lizabeth Johnson

    The National Security Research Center within LANL manages an analogue and digital collection of lab reports from in-house scientists and partner facilities, in addition to gifted archival collections. Lizabeth Johnson outlines the library's plan to enhance searchability across multiple repositories by implementing a vector database and a tailored ontology, relayed to a large language model.

    Fantastic Futures 2024

    Technology, language, history and creativity converged in Canberra for four days as cultural leaders gathered for the world's first in-depth exploration of the opportunities and challenges of AI for the cultural sector.

    Learn more about this event at the Fantastic Futures 2024 hub

    • This transcript was generated by NFSA Bowerbird and may contain errors.

      Hi, everyone. My name is Lizabeth Johnson, and I'm a librarian at the National Security Research Center, which is the classified library at Los Alamos National Laboratory in Los Alamos, New Mexico. As you can see, my presentation is entitled Enhanced Stewardship and Data Sovereignty Through the Implementation of an Ontology-Enhanced Large Language Model.

      Los Alamos National Laboratory was founded in 1943, and the lab's classified library was created shortly after the lab itself. Over the past 80 years, the classified library has gone from a small library that only contained reports produced by lab scientists to a large facility that maintains the lab's reports as well as reports written by scientists at other partner facilities and archival collections that have been gifted to the library over the years by retiring scientists and staff members. These various collections are housed separately, and some are digital while others are still in physical form only. The goal of the project I'm going to describe is to make all of these collections more accessible, as the material in our collections is of value not just to scientists who want to learn more about legacy experiments at the lab, but also to historians who want to learn more about the history of the lab and some of its more well-known staff.

      To begin with, what we envision is a system where a user can write a query, which will then be put through a process of vector embedding. This will help to extract keywords and phrases and to map the relationships between those words and phrases. The data thus acquired will go into a vector database. The vector database will also be connected to our main document repository, which is an online database that currently holds millions of reports and other documents. Report titles, author names, and other keywords in the document repository will likewise be put through a process of vector embedding so the data can be compared with the data from the user query.

      The third part of this system will entail communication between the vector database and an ontology created by LANL staff. This ontology has been created to be specific to the kind of material that the classified library maintains. The information from the ontology will similarly be made accessible to the vector database through a vector embedding process. The data gained by comparing the query with the material in the document repository and the information from the ontology will then be transmitted to other smaller repositories before being relayed to the large language model, which will produce an answer to the user's query.

      Because this entire process will be developed by staff at the lab, and will undoubtedly need to be refined before it can be made widely available, we plan to employ subject matter experts (SMEs) to evaluate the initial responses produced by the LLM so that we can assess the accuracy of those responses. The SME evaluations will also help us to refine our ontology to make the whole process more effective and efficient.

      Our expectation is that establishing a system whereby users can query an LLM and receive responses will enhance our stewardship of the documents we maintain in the library's collections, some of which are currently hard to find and hard to access. We also expect that this system will enhance data sovereignty, as some of the smaller repositories are managed by specific groups at the lab who want to maintain control over their repositories but also want to make the data in those repositories more accessible to other researchers. We believe that adapting an LLM as a search tool for these various collections will help the classified library and its librarians to meet the needs of our customers and other vested interests at Los Alamos National Lab. Thank you.
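
The flow described in the talk (embed the query, compare it with embedded repository metadata, bring in ontology terms, then hand the results to an LLM) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the lab's design: the document records, the ontology entries and all function names are hypothetical, a bag-of-words count vector stands in for a real embedding model, the ontology is applied as simple query expansion rather than being embedded itself, and the LLM call is replaced by building the prompt text that would be sent to it.

```python
import math
import re
from collections import Counter

# Hypothetical repository metadata (titles/keywords); not taken from the talk.
DOCUMENTS = {
    "R-001": "Plutonium metallurgy experiments report by A. Smith",
    "R-002": "Neutron flux measurements at the reactor facility",
    "R-003": "Oral history interview with a retired staff member",
}

# Hypothetical ontology: each term maps to related terms.
ONTOLOGY = {
    "plutonium": ["actinide", "metallurgy"],
    "neutron": ["flux", "reactor"],
}


def embed(text):
    """Toy stand-in for a vector embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand_with_ontology(query):
    """Append ontology terms related to any word in the query."""
    words = re.findall(r"[a-z]+", query.lower())
    extra = [rel for w in words for rel in ONTOLOGY.get(w, [])]
    return query + " " + " ".join(extra) if extra else query


def retrieve(query, top_k=2):
    """Rank documents by similarity to the ontology-expanded query."""
    q_vec = embed(expand_with_ontology(query))
    scored = [(cosine(q_vec, embed(text)), doc_id)
              for doc_id, text in DOCUMENTS.items()]
    scored.sort(reverse=True)
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]


def build_prompt(query):
    """Assemble the text that would be relayed to the LLM."""
    context = "\n".join(f"[{d}] {DOCUMENTS[d]}" for d in retrieve(query))
    return f"Answer using only these records:\n{context}\n\nQuestion: {query}"
```

Calling `build_prompt("neutron")` would produce a prompt whose context is drawn from the matching record, which is the point at which subject matter experts could inspect what the model was given alongside what it answered.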
