Home

What is LinkedMusic?

Welcome to the Wiki for LinkedMusic, an open-source platform that combines detailed music metadata from many different databases into one searchable graph. It collects information about musical works, performances, recordings, composers, performers, and much more from various online databases and converts it into a unified RDF-based structure. This means you can access and explore relationships, history, and annotations across all these sources through a single endpoint.

To link our data, we reconcile entities to Wikidata using OpenRefine. All reconciled RDF data is then loaded into a Virtuoso quad store, which provides robust SPARQL querying performance and graph management. Eventually, this graph will be queried through a large language model (LLM)-based interface that converts user input into SPARQL queries.

There is a production Virtuoso server running at https://virtuoso.simssa.ca/ and a staging server at https://virtuoso.staging.simssa.ca/. A McGill VPN is required to access either server remotely.
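
As a rough illustration of what querying the graph could look like, the sketch below builds a standard SPARQL 1.1 Protocol GET request using only the Python standard library. The `/sparql` path is an assumption based on the conventional Virtuoso endpoint location, and `build_sparql_request` is a hypothetical helper name; see the "Accessing the Virtuoso SPARQL Endpoint" page for the actual details. The query is deliberately vocabulary-agnostic, since this page does not describe the graph's schema.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the SPARQL endpoint is served at the conventional Virtuoso path /sparql.
STAGING_ENDPOINT = "https://virtuoso.staging.simssa.ca/sparql"

# Vocabulary-agnostic query: fetch any five triples from the store.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Building the request does not contact the server.
request = build_sparql_request(STAGING_ENDPOINT, SAMPLE_QUERY)
print(request.full_url)
```

To actually run the query, the request could be passed to `urllib.request.urlopen` while connected to the McGill VPN, and the response parsed with `json.load`.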

Wiki Table of Contents

Data Ingestion:

Backing Up Data: how to back up Virtuoso data with Arbutus
Data Reconciliation Guidelines: guidelines for reconciling imported data with Wikidata
RDF Conversion Guidelines

Querying (NLQ2SPARQL):

How to Query Across Different Databases
Sample LinkedMusic Queries: benchmark queries for testing the NLQ2SPARQL process
NLQ2SPARQL: information about NLQ2SPARQL, which aims to enable large language models (LLMs) to create a SPARQL query when given a natural language query (musicological question)
NLQ2SPARQL Q&A: similar to the above, in question-and-answer form
List of Prefixes for SPARQL Queries

Virtuoso:

Accessing the Virtuoso SPARQL Endpoint: how to run SPARQL queries for our Virtuoso graph database
Exporting an RDF Graph from Virtuoso
How to SSH into Virtuoso
Importing Data to Virtuoso
ISQL (Virtuoso): information about ISQL, which allows you to run SQL and SPARQL queries for Virtuoso through the command line
Virtuoso Setup Guide: how to set up a new Virtuoso instance
Visualizing the Graph Relationships: how to create the graph relationship visualizations
Working with Virtuoso: miscellaneous advice for using Virtuoso

Other:

Future Work
Wikidata Uploading (Feast Day Project)
Wikidata: Things we should add: a list of entities that could be added to Wikidata

Completed Work

Currently, we have finished reconciling the following databases to Wikidata:

In-progress:

Near future:

For a more detailed outline of future work, see the Future Work Wiki page.

Why LinkedMusic?

The Challenge of Scattered Metadata

Today, music metadata is stored in over a hundred independent databases, including MusicBrainz, RISM, TheSession, and university archives. Each system has its own set of fields, unique identifiers, and access methods. As a result, musicologists wanting to conduct in-depth research face a number of challenges:

Fragmented Research: Scholars must switch between many interfaces and manually combine results to comprehensively study a single composer or piece.
Varying Standards: Different naming conventions (e.g., “Mozart, W.A.” vs. “Wolfgang Amadeus Mozart”) and data formats cause confusion and mismatches.
High Technical Barriers: Using SPARQL, parsing APIs, and handling bulk data dumps requires advanced technical skills.
Custom Pipelines: Comparing data across sources often needs custom scripts for each pair of databases, making large-scale analysis difficult.

Our Solution

Our methodology embraces a flexible, linked‑data‑driven data lake architecture:

Data Ingestion: We import complete metadata dumps and API feeds for each source into a Virtuoso‑based RDF data lake.
Robust Entity Reconciliation: Entities are semi‑automatically aligned to Wikidata using OpenRefine reconciliation services. Each canonical URI retains provenance metadata for auditability.
Federated Indexing with SESEMMI: Although not yet implemented, the reconciled RDF graphs will be indexed in SESEMMI, our open‑source metasearch engine, enabling simultaneous, federated searches across LinkedMusic and external SPARQL endpoints.
Natural‑Language Query Layer: We will also use large language models to translate user inputs into SPARQL queries, supporting multilingual, culturally sensitive search and eliminating barriers for non‑technical users.
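
The effect of the reconciliation step can be sketched with a toy example. This is not the project's actual pipeline (which uses OpenRefine); it only shows how name variants from different sources collapse onto one entity once each has been matched to the same Wikidata identifier, while the source of each label is kept for provenance. The source names and rows are invented for illustration, and `group_by_entity` is a hypothetical helper; Q254 is the Wikidata identifier for Wolfgang Amadeus Mozart.

```python
from collections import defaultdict

# Hypothetical records: (source database, local label, reconciled Wikidata QID).
# Q254 is Wolfgang Amadeus Mozart on Wikidata; the rows themselves are invented.
RECORDS = [
    ("database_a", "Mozart, W.A.", "Q254"),
    ("database_b", "Wolfgang Amadeus Mozart", "Q254"),
    ("database_b", "Unidentified composer", None),  # not yet reconciled
]


def group_by_entity(records):
    """Group (source, label) pairs under the Wikidata QID they were reconciled to."""
    grouped = defaultdict(list)
    for source, label, qid in records:
        if qid is not None:  # skip rows without a reconciliation match
            grouped[qid].append((source, label))
    return dict(grouped)


entities = group_by_entity(RECORDS)
for qid, labels in entities.items():
    print(f"http://www.wikidata.org/entity/{qid}", labels)
```

Both spellings of the composer's name end up under a single URI, so a query against the unified graph no longer needs to know each database's naming convention.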

Benefits for Researchers

Comprehensive Coverage: Centralized access to metadata from hundreds of heterogeneous databases.
Data Quality & Provenance: Stable URIs and reconciliation audit trails ensure high‑fidelity, reproducible research.
Inclusive & Multilingual Search: Natural‑language queries combined with multilingual term mapping support culturally nuanced exploration.
Scalable Extensibility: New databases can be integrated with minimal configuration, allowing the graph to evolve alongside research needs.

With LinkedMusic, musicologists, librarians, and developers gain a unified, transparent, and user‑friendly environment for comprehensive metadata exploration and analysis.
