
Presenter: Janet McDougall (for Terhi Nurmikko-Fuller)
In the age of AI and large datasets, smarter data needs to be represented in ways that capture various perspectives and nuances. This has the power to simultaneously reduce historical and social biases and maximise the potential of combining mapped data and powerful algorithms. Janet McDougall reports on a linked data project in motion at the ANU, identifying all content relating to Indigenous research across its myriad repositories.
Technology, language, history and creativity converged in Canberra for four days as cultural leaders gathered for the world's first in-depth exploration of the opportunities and challenges of AI for the cultural sector.
Want to learn more about this event?
Visit the Fantastic Futures 2024 Hub
This transcript was generated by NFSA Bowerbird and may contain errors.
Hi team, so I'm Janet and I'm presenting on behalf of Terhi. We crossed paths briefly on my way back from the Philippines and she's in Austria, maybe Vienna I think. She says hi. So it's a bit of a group presentation. I believe we've got 12 minutes so I'm going to be pretty swift and I'll stick to the script.
I'd like to acknowledge that this work was carried out at the ANU on the traditional lands of the Ngunnawal and the Ngambri people, and in Queensland on the lands of the Yugara and Turrbal people. We acknowledge and celebrate the First Australians, whose traditional lands these are, and pay our respects to elders past, present, and emerging.
So we're a multidisciplinary team: linked data specialists, computer scientists, data archivists, metadata specialists, and spatial analysts. Terhi and I are from the Australian Data Archive, and the others are from the First Nations Portfolio at ANU, and Currawong AI. This presentation is broadly in three parts: underlying rationale and governance, identification of Indigenous research, and technical. And I'd like to thank the previous conversationalists. That was a really fantastic presentation, a prelude to a lot of the work I do. And how do I press here? What do I press to get my slide moving? Oh, I just point it. No, it hasn't moved. OK.
So this Indigenous Data Initiative project is funded through the ANU First Nations Portfolio, with contribution by Currawong AI to ARDC, that's the Australian Research Data Commons, and related projects, plus in-kind by ANU institutions, such as the Australian Data Archive and the library, archives, collections, et cetera, and the research support institutes. So the project aims to improve the governance of Indigenous data collected and held by the ANU. In this context, Indigenous data refers to information or knowledge, in any format or medium, which is about and may affect Indigenous peoples both collectively and individually. So that's from Maiam nayri Wingara in 2018.
At ANU, Indigenous data holdings include, but are not limited to, information about research projects and publications, archival materials, physical collections, environmental and biological samples, as well as data collected through research. So universities have a responsibility to the communities that have had their information and knowledge extracted through the process of research. Accountability means meaningful engagement in this process, along with an acknowledgement of Indigenous peoples' rights to know what information has been collected about them, in line with the UN Declaration on the Rights of Indigenous Peoples. The university's goal should ultimately be the repatriation of these data and, failing that, embarking on a process of shared governance.
So this metadata catalogue that we're building will provide opportunities to explore the relationships between ANU research and First Nations communities, to assist the university research services in improving metadata labelling. And this is quite the crux, because at the moment there is no way of knowing what research has been done on Indigenous people, about Indigenous research. And it will help us to uncover the provenance information for physical collections and archival materials. So one of our colleagues, Chip Boyega, and I went to visit the Arcanthropology archive. It's just a room full of stuff, and we asked for the catalogue, and it was a bunch of spreadsheets with really chaotic metadata. So. Went too far.
So first of all, we need to make clear that this is a process of improving the governance of Indigenous data rather than Indigenous data governance. It might seem like semantics, but the truth is a colonial institution like a university can never actually do Indigenous data sovereignty or governance. So that's just the base we work from. And to reiterate, the catalogue should not be public and will only be available to the communities whose data have been collected.
And while the content of the catalogue is publicly available legacy data, it is not easily collated, which this project shows. But even so, and I'm sure many in the audience here would understand, bringing public data together carries its own risks that the individual pieces of data, even when publicly available, don't carry. So internal access still needs to be decided through strong governance processes, and that's ongoing within the First Nations Portfolio, and I presume will take quite some time to work through.
This catalogue is an attempt towards fulfilling CARE principles around the data and metadata. It's not completely operating to full FAIR principles, but it is going in that direction while maintaining a respectful adherence to CARE. And we heard quite a bit earlier about what this really means. Adhering to CARE principles is a really complex part of the project, and we're still getting there.
So we're at the identifying Indigenous research component now. In embedding the UN Declaration on the Rights of Indigenous Peoples in its business processes, the ANU has set itself the challenge of identifying all content relating to Indigenous research across its myriad and, of course, completely disparate repositories, collections, libraries, and other information custodians. This is a massive task. It's complicated by the different schemas used to represent the heterogeneous collections. It is made even more difficult by the lack of any metadata-level record of indigeneity. And I'm just going to clarify: in this project, Indigenous research is defined as publications, theses, and objects that have an explicit connection to Indigenous languages, to Indigenous places, or that were created, analysed, or otherwise engaged with in projects where the supervisor has been an Indigenous researcher. Curse Terhi for all these slides.
Prior attempts to map Indigenous research, or research considered to be Indigenous, highlight the complexity of the situation, as there are extremes in understanding or believing what Indigenous data is.
At one extreme, some data custodians may consider that all research carried out in Australia has connections through Country to indigeneity. At the other end, there are many who struggle to see that connection, especially in fields that do not readily pertain to Indigenous knowledge, art, culture, or lived experience as a domain of study. So that's where we're starting. What is Indigenous research?
We're now discussing the... Yep. Absence of indigeneity from metadata. As noted, indigeneity is not recorded as a specific field in the metadata records of any institutional data that we've come across. So because of the lack of existing university-wide policy on recognising and representing research in terms of engagement with Indigenous data, researchers and locations, we are using linked data methodologies and the metadata catalogue to explore the relationships between ANU research and First Nations communities. A major outcome will be to assist the university research services to improve metadata labelling. So this project, First Nations Portfolio, is working very closely with, I think they're called RIS now, they've had a name change, but they're the group that manage all the publications and research-oriented work at ANU. And also to help uncover provenance information for physical collections and for archived materials.
Oh, now we're to the tech. And luckily we have our colleagues from Currawong AI, who bring the comp sci stuff for us, as well as linked data expertise. One is at the high end. Nick Carr likes the high end: ontologies and quick solutions, getting data programmatically. And Terhi comes from the other side, where she likes to get really close down to the data and do small stuff. So these are Terhi's words, for those who know her: this is a linked data project of institutional proportions. Not my words, but it actually describes the project.
It is a critical evaluation of the challenges and opportunities from the process of converting the metadata from these three internally complex knowledge repositories to the machine-readable format of RDF. And a few slides back, I mentioned they were the ARIES publications, the collections, and anyone in the audience remember? Anyway, they're the three pillars.
So these top-level ontologies, such as schema.org, as you can see, they're just the basic ones, the standard ones that Nick has used to produce this catalogue to get output. They are combined with bespoke elements to represent this data in a way that adheres to international standards but is expanded to capture implicit connections and tacit knowledge. And so where we have custom there, ANUFMP has thesis supervisor, this is where we, especially Terhi, have spent time going, well, how can we find pathways to identify Indigenous research? So we might say, well, a supervisor who has worked on Indigenous material will most likely have their students doing similar research. So it's interesting to sort of watch the linkages between supervisors and institutions and schools and colleges. So there's this whole background that we're just starting to be able to get into, to look and think about, I suppose, the power structure of research and who's had the money and where it comes from. I'll stop there because I did so in 12 minutes.
Okay, so here is our opportunity to provide large language models with not just large data sets but with knowledge graphs curated with FAIR and CARE principles, which provide not just data points but contextualised information capturing knowledge about and from Indigenous contexts. So the purpose of the catalogue is to enable Indigenous communities to find research relating to them. So it needs to support their questions. Oh.
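The supervisor pathway mentioned in the talk can be illustrated with a very small sketch. This is not the project's code or ontology: the `EX` namespace, the `hasThesisSupervisor` and `identifiedAsIndigenousResearch` property names, and all identifiers are hypothetical stand-ins, and a plain Python set of tuples stands in for an RDF store. Only `name` and `author` are real schema.org properties.

```python
# A minimal sketch, assuming metadata has already been converted to
# subject-predicate-object triples. Everything under EX is hypothetical.
SCHEMA = "https://schema.org/"
EX = "https://example.org/catalogue/"  # placeholder namespace, not the project's

triples = {
    (EX + "thesis/1", SCHEMA + "name", "Thesis A"),
    (EX + "thesis/1", EX + "hasThesisSupervisor", EX + "person/1"),
    (EX + "thesis/2", SCHEMA + "name", "Thesis B"),
    (EX + "thesis/2", EX + "hasThesisSupervisor", EX + "person/2"),
    (EX + "pub/1", SCHEMA + "author", EX + "person/1"),
    (EX + "pub/1", EX + "identifiedAsIndigenousResearch", "true"),
}


def subjects(predicate, obj):
    """All subjects that have the given predicate pointing at the given object."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def objects(subject, predicate):
    """All objects reachable from the subject through the predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def candidate_theses():
    """Theses whose supervisor authored a work already identified as Indigenous research."""
    flagged_works = subjects(EX + "identifiedAsIndigenousResearch", "true")
    supervisors = set()
    for work in flagged_works:
        supervisors |= objects(work, SCHEMA + "author")
    return sorted(
        s for s, p, o in triples
        if p == EX + "hasThesisSupervisor" and o in supervisors
    )


candidates = candidate_theses()
```

Here only the first thesis is returned, because its supervisor is the author of the flagged publication. Results like this are candidates for human review, not confirmed identifications.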
And to reiterate, we represent publications, theses, and objects that have an explicit connection to Indigenous languages, to Indigenous places, or were created, analysed, or otherwise engaged with in projects where the supervisor has been an Indigenous researcher. So just to reiterate again, this is what we're looking for. Going forward, our aspiration is to expand the underlying knowledge graph to incorporate relationships that represent Indigenous ways of knowing.
The computational workflow includes existing vocabularies, such as language names, place names, and Indigenous organisations, for example, that have come from AIATSIS, and Currawong AI have worked towards improving those vocabularies, and then ORIC for Indigenous organisations, and they've scraped a few other vocabularies. So the point of this is to contribute to the logic that identifies Indigenous material, and to pre-processing, enrichment and conversion to RDF. So as more Indigenous research is discovered and added to the knowledge graph, we continue to query at a more granular level, to search for knowledge overlooked at a higher level and to develop the ontologies further. And we're just sort of at this place now where the knowledge graph is big enough and we've collected enough data to start asking further questions.
These reproducible steps and methodological guidelines can be utilised by other institutions to begin the task of identifying Indigenous content in their heterogeneous collections, so I think Uni of Melbourne will be looking at doing this through the SIP for the same funding. And there's just the typical problem on campus, or for most campuses or institutions, that we've got these authorities, these institutional authorities of materials that we're using. But of course, through relationships in academia, people are saying, well, I've actually got these theses that are not recorded. And there are publications out there.
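The vocabulary step described above can be sketched roughly as follows. This is an illustrative assumption about how such matching might work, not the project's actual logic: the vocabulary holds only the four names from the acknowledgement earlier in the talk, the records are invented, and the function and field names are hypothetical.

```python
import re

# Illustrative entries only; a real run would load complete vocabularies
# of language names, place names and organisations.
VOCABULARIES = {
    "language_or_people": {"Ngunnawal", "Ngambri", "Yugara", "Turrbal"},
}


def match_vocabularies(text, vocabularies=VOCABULARIES):
    """Return {vocabulary name: sorted matched terms} using whole-word, case-insensitive matching."""
    matches = {}
    for name, terms in vocabularies.items():
        found = sorted(
            term for term in terms
            if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
        )
        if found:
            matches[name] = found
    return matches


# Invented metadata records, for demonstration only.
records = [
    {"id": "rec-1", "title": "Placenames on Ngunnawal country"},
    {"id": "rec-2", "title": "A survey of soil chemistry"},
]

# Keep only the records that matched at least one vocabulary term.
flagged = {}
for record in records:
    hits = match_vocabularies(record["title"])
    if hits:
        flagged[record["id"]] = hits
```

Keeping the matched terms alongside each flagged record, rather than a bare yes/no, leaves a trace of why a record was picked up, which matters when the results feed into governance decisions.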
So we actually have one with the ANU linguistics library, a tiny little library. Who knows how long it will survive, hopefully. And so that will just literally be some of us going down there with a spreadsheet and collecting the fields that we need to include in the catalogue. And within our own institution, where Terhi and I are, one of our developers scraped our website for publications to put in a new website. And we realised that there were about 900 publications. Currawong AI ran it against the catalogue, and they discovered about 600 new publications that aren't in the institutional repositories.
So basically, it's an ongoing audit, an archaeological dig, going through and looking for materials and trying to bring metadata against these materials. And if you think of a lot of the objects in the university, if we look at our anthropology, boulders and rocks and specimens and species, no one knows who collected them. They may come under one great big project, but there's a complete lack of provenance. Anyway, it's a great project and it'll be going for a couple of years, I think, and we're hoping other universities will pursue this path. Thank you.
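The audit described in the talk, where a scraped list of publications was run against the catalogue, can be sketched as a normalised comparison. The talk does not say how the comparison was done, so the title normalisation here is an assumption, and the titles and function names are invented for illustration.

```python
import re


def normalise(title):
    """Lowercase and reduce every run of non-alphanumeric characters to one space."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def find_uncatalogued(scraped_titles, catalogue_titles):
    """Return scraped titles with no normalised match in the catalogue, in original order."""
    known = {normalise(title) for title in catalogue_titles}
    return [title for title in scraped_titles if normalise(title) not in known]


# Invented titles, for demonstration only.
scraped = ["Report One: Findings", "Working Paper Two", "Survey Notes"]
catalogue = ["report one - findings", "Survey  notes."]

missing = find_uncatalogued(scraped, catalogue)
```

Exact matching on normalised titles will miss retitled or abbreviated works, so anything it reports as missing would still need checking by hand.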
The National Film and Sound Archive of Australia acknowledges Australia’s Aboriginal and Torres Strait Islander peoples as the Traditional Custodians of the land on which we work and live and gives respect to their Elders both past and present.