Home
Welcome to the Wiki for LinkedMusic, an open-source platform that combines detailed music metadata from many different databases into one searchable graph. It collects information about musical works, performances, recordings, composers, performers, and much more from various online databases and converts it into a unified RDF-based structure. This means you can access and explore relationships, history, and annotations across all these sources through a single endpoint.
To link our data, we reconcile entities to Wikidata using OpenRefine. All reconciled RDF data is then loaded into a Virtuoso quad store, which provides SPARQL querying and graph management. Eventually, this graph will be queried through a large language model (LLM)-based interface that converts user input into a SPARQL query.
There is a production Virtuoso server running at https://virtuoso.simssa.ca/ and a staging server at https://virtuoso.staging.simssa.ca/. A McGill VPN is required to access either server remotely.
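As a rough illustration of what querying the store looks like, the sketch below builds a SPARQL 1.1 Protocol GET request in Python. It assumes the endpoint is exposed at Virtuoso's default `/sparql` path on the staging server, which this page does not confirm; the query itself (listing a few named graphs) and the helper names `build_request` and `run_query` are illustrative only. See the "Accessing the Virtuoso SPARQL Endpoint" page for the actual procedure.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the SPARQL endpoint lives at Virtuoso's default /sparql path.
ENDPOINT = "https://virtuoso.staging.simssa.ca/sparql"

# Illustrative query: list up to ten named graphs in the quad store.
QUERY = "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } } LIMIT 10"


def build_request(endpoint: str, query: str) -> Request:
    """Build a GET request that carries the query as a URL parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(endpoint: str, query: str) -> list:
    """Send the query and return the result bindings (requires VPN access)."""
    with urlopen(build_request(endpoint, query), timeout=30) as response:
        return json.load(response)["results"]["bindings"]


# Only the URL is printed here; nothing is sent over the network.
print(build_request(ENDPOINT, QUERY).full_url)
```

Calling `run_query(ENDPOINT, QUERY)` while connected to the McGill VPN would return a list of binding dictionaries, one per result row, provided the path assumption above holds.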
Data Ingestion:
- Backing Up Data: how to back up Virtuoso data with Arbutus
- Data Reconciliation Guidelines: guidelines for reconciling imported data with Wikidata
- RDF Conversion Guidelines
Querying (NLQ2SPARQL):
- How to Query Across Different Databases
- Sample LinkedMusic Queries: benchmark queries for testing the NLQ2SPARQL process
- NLQ2SPARQL: information about NLQ2SPARQL, which aims to enable large language models (LLMs) to create a SPARQL query when given a natural language query (musicological question)
- NLQ2SPARQL Q&A: similar to the above
- List of Prefixes for SPARQL Queries
Virtuoso:
- Accessing the Virtuoso SPARQL Endpoint: how to run SPARQL queries for our Virtuoso graph database
- Exporting an RDF Graph from Virtuoso
- How to SSH into Virtuoso
- Importing Data to Virtuoso
- ISQL (Virtuoso): information about ISQL, which allows you to run SQL and SPARQL queries for Virtuoso through the command line
- Virtuoso Setup Guide: how to set up a new Virtuoso instance
- Visualizing the Graph Relationships: how to create the graph relationship visualizations
- Working with Virtuoso: miscellaneous advice for using Virtuoso
Other:
- Future Work
- Wikidata Uploading (Feast Day Project)
- Wikidata: Things we should add: a list of entities that could be added to Wikidata
Currently, we have finished reconciling the following databases to Wikidata:
In-progress:
Near future:
- SIMSSA Database
- CantusDB
- Weimar Jazz Database
- AcousticBrainz
- CritiqueBrainz
- ListenBrainz
- Cover Art Archive
- Digital Analysis of Chant Transmission
- Printed Sacred Music Database
- Renaissance Liturgical Imprints: A Census
For a more detailed outline of future work, see the Future Work Wiki page.
Today, music metadata is stored in over a hundred independent databases like MusicBrainz, RISM, TheSession, university archives, and more. Each system has its own set of fields, unique identifiers, and access methods. As a result, musicologists wanting to conduct in-depth research face a number of challenges:
- Fragmented Research: Scholars must switch between many interfaces and manually combine results to comprehensively study a single composer or piece.
- Varying Standards: Different naming conventions (e.g., “Mozart, W.A.” vs. “Wolfgang Amadeus Mozart”) and data formats cause confusion and mismatches.
- High Technical Barriers: Using SPARQL, parsing APIs, and handling bulk data dumps requires advanced technical skills.
- Custom Pipelines: Comparing data across sources often needs custom scripts for each pair of databases, making large-scale analysis difficult.
Our methodology embraces a flexible, linked‑data‑driven data lake architecture:
- Data Ingestion: We import complete metadata dumps and API feeds for each source into a Virtuoso‑based RDF data lake.
- Robust Entity Reconciliation: Entities are semi‑automatically aligned to Wikidata using OpenRefine reconciliation services. Each canonical URI retains provenance metadata for auditability.
- Federated Indexing with SESEMMI: Although this is not yet implemented, the reconciled RDF graphs will be indexed in SESEMMI, our open‑source metasearch engine, enabling simultaneous, federated searches across LinkedMusic and external SPARQL endpoints.
- Natural‑Language Query Layer: We will also use large language models to translate user input into SPARQL queries, supporting multilingual, culturally sensitive search and lowering the barrier for non‑technical users.
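The semi-automatic reconciliation step can be sketched in Python as follows. This is a minimal illustration, not the project's actual pipeline: it assumes the batch query format of the Reconciliation Service API that OpenRefine speaks (a `queries` form field holding JSON keyed `q0`, `q1`, ...), and the helper names `build_reconciliation_body` and `triage`, the sample response, and the placeholder candidate IDs are all hypothetical. Confident matches are accepted automatically, and everything else is set aside for a human to review.

```python
import json
from urllib.parse import urlencode


def build_reconciliation_body(names, type_id="Q5", limit=3):
    """Encode a batch of names as a form body for a reconciliation service.

    type_id restricts candidates to a Wikidata class (Q5 is "human").
    """
    queries = {
        f"q{i}": {"query": name, "type": type_id, "limit": limit}
        for i, name in enumerate(names)
    }
    return urlencode({"queries": json.dumps(queries)})


def triage(response):
    """Split results into auto-accepted matches and cases needing review."""
    accepted, needs_review = {}, {}
    for key, payload in response.items():
        candidates = payload.get("result", [])
        best = candidates[0] if candidates else None
        if best and best.get("match"):
            # The service flagged its top candidate as a confident match.
            accepted[key] = best["id"]
        else:
            # Keep all candidate IDs so a person can choose among them.
            needs_review[key] = [c["id"] for c in candidates]
    return accepted, needs_review


# Hypothetical response, written by hand for illustration only.
SAMPLE_RESPONSE = {
    "q0": {"result": [{"id": "Q254", "name": "Wolfgang Amadeus Mozart",
                       "score": 100, "match": True}]},
    "q1": {"result": [{"id": "Q_EXAMPLE_A", "name": "J. Smith",
                       "score": 55, "match": False},
                      {"id": "Q_EXAMPLE_B", "name": "John Smith",
                       "score": 52, "match": False}]},
}

accepted, needs_review = triage(SAMPLE_RESPONSE)
print(accepted)
print(needs_review)
```

Keeping the unmatched candidates rather than discarding them mirrors the "semi-automatic" nature of the process: the service proposes matches, and a person makes the final call on ambiguous entities.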
- Comprehensive Coverage: Centralized access to metadata from hundreds of heterogeneous databases.
- Data Quality & Provenance: Stable URIs and reconciliation audit trails ensure high‑fidelity, reproducible research.
- Inclusive & Multilingual Search: Natural‑language queries combined with multilingual term mapping support culturally nuanced exploration.
- Scalable Extensibility: New databases can be integrated with minimal configuration, allowing the graph to evolve alongside research needs.
With LinkedMusic, musicologists, librarians, and developers gain a unified, transparent, and user‑friendly environment for comprehensive metadata exploration and analysis.