A NodeJS-based converter for translating GEDCOM files into JSON, with Linked Data context as well.
The GEDCOM genealogy file format is text-based, but it defines a hierarchical structure (the first value on each line is that line's "level", i.e. its indent depth), so it translates very easily into a JSON structure. In this booming age of REST APIs, many more services understand JSON than understand GEDCOM files.
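Because each line starts with its level, building the nested JSON takes little more than a stack. The following is a minimal sketch of that idea, not the parser `convert.js` actually uses. The function name `parseGedcom` and the `{ tag, id, value, children }` node shape are hypothetical, and the sketch skips `CONC`/`CONT` joining and character-encoding handling.

```javascript
// Hypothetical sketch: nest GEDCOM lines by their level number.
// Assumes well-formed "LEVEL [@XREF@] TAG [VALUE]" lines.
function parseGedcom(text) {
  const root = { children: [] };
  const stack = [root]; // stack[n] is the most recent node at level n - 1
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;
    const match = line.match(/^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$/);
    if (!match) throw new Error(`Unparseable line: ${line}`);
    const [, levelText, xref, tag, value] = match;
    const level = Number(levelText);
    const node = { tag, children: [] };
    if (xref) node.id = xref;
    if (value !== undefined) node.value = value;
    // Drop deeper entries so that stack[level] is this line's parent.
    stack.length = level + 1;
    const parent = stack[level];
    if (!parent) throw new Error(`Level skips a step: ${line}`);
    parent.children.push(node);
    stack.push(node);
  }
  return root.children;
}

const tree = parseGedcom(
  ["0 @I101@ INDI", "1 NAME John /Smith/", "1 BIRT", "2 DATE 1 APR 1900"].join("\n")
);
console.log(JSON.stringify(tree, null, 2));
```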
JSON-LD is an extension of JSON that adds a context for linking the data into the Semantic Web (Linked Data). This converter maps several ontologies to GEDCOM tags:
- Friend of a Friend (`foaf`): People and common relations between them.
- Relationship (`rel`): Deeper relationship terms for relating two people.
- Biography (`bio`): Vocabulary for enumerating events in a person's life and the participants in those events (GitHub Source).
- Dublin Core (`dc`): Vocabulary for citing sources and dates.
Print the JSON output:

```
node convert.js myFamilyTree.ged
```

Save the JSON to a file:

```
node convert.js myFamilyTree.ged > myFamilyTree.json
```
The output structure of the convert.js script looks like:
```json
{
  "@context": {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rel": "http://purl.org/vocab/relationship",
    "bio": "http://purl.org/vocab/bio/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/"
  },
  "@graph": [
    {
      "@id": "_:I101",
      "@type": "foaf:Person",
      "foaf:name": "John /Smith/",
      "foaf:gender": "M",
      "bio:event": {
        "@type": "bio:Birth",
        "DATE": "1 APR 1900",
        "bio:principal": {
          "@id": "_:I101"
        }
      },
      "bio:relationship": {
        "@id": "_:F101"
      }
    },
    {
      "@id": "_:F101",
      "@type": "bio:Relationship",
      "bio:participant": [
        {
          "@id": "_:I101"
        },
        {
          "@id": "_:I102"
        }
      ]
    }
  ]
}
```

To be parsed into RDF, it will need an output structure like:
```json
[
  {
    "@context": {
      "foaf": "http://xmlns.com/foaf/0.1/",
      "rel": "http://purl.org/vocab/relationship",
      "bio": "http://purl.org/vocab/bio/0.1/",
      "dc": "http://purl.org/dc/elements/1.1/"
    },
    "@id": "_:I101",
    "@type": "foaf:Person",
    "bio:relationship": {
      "@id": "_:F101"
    },
    "foaf:gender": "F",
    "foaf:name": "Jane /Smith/"
  },
  {
    "@context": {
      "foaf": "http://xmlns.com/foaf/0.1/",
      "rel": "http://purl.org/vocab/relationship",
      "bio": "http://purl.org/vocab/bio/0.1/",
      "dc": "http://purl.org/dc/elements/1.1/"
    },
    "@id": "_:I102",
    "@type": "foaf:Person",
    "bio:child": {
      "@id": "_:I103"
    },
    "bio:relationship": {
      "@id": "_:F101"
    },
    "foaf:gender": "F",
    "foaf:name": "Betty /Smith/"
  }
]
```

That is, the converter needs to return a list of objects, each with its own `@context`. A tool such as Apache Jena's `riot --output=RDF/XML ged.jsonld` can then convert it to RDF/XML. (TODO)
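A minimal sketch of that reshaping step, assuming the `{ "@context", "@graph" }` document shown earlier as input. `flattenGraphWithContext` is a hypothetical helper name, not part of `convert.js`.

```javascript
// Hypothetical sketch: turn { "@context", "@graph": [...] } into an array in
// which every node carries its own "@context".
function flattenGraphWithContext(doc) {
  const context = doc["@context"];
  const graph = Array.isArray(doc["@graph"]) ? doc["@graph"] : [];
  // "@context" is spread first so each object reads like a standalone
  // JSON-LD document; a node's own "@context", if any, would win.
  return graph.map((node) => ({ "@context": context, ...node }));
}
```

For example, after `node convert.js myFamilyTree.ged > myFamilyTree.json`, a short Node script could `JSON.parse` that file, pass the result through this function, and write the returned array to `ged.jsonld` for `riot`.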
Take the `@graph` property of the resulting JSON, which is an array of objects. Objects whose `@type` is `foaf:Person` were `INDI` records in the original GEDCOM, and objects whose `@type` is `bio:Relationship` were `FAM` records. Between those two types, all the properties of the original data file should be present.
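As a sketch of that lookup, assuming the output shape shown above: `splitRecords` (a hypothetical helper) separates the two record types and builds an `@id` index so that references such as `bio:participant` can be followed. JSON-LD allows `@type` to be either a string or an array, so both forms are checked.

```javascript
// Hypothetical sketch: recover INDI-like and FAM-like records from "@graph".
function hasType(node, type) {
  const t = node["@type"];
  return Array.isArray(t) ? t.includes(type) : t === type;
}

function splitRecords(doc) {
  const graph = doc["@graph"] || [];
  return {
    individuals: graph.filter((n) => hasType(n, "foaf:Person")), // were INDI
    families: graph.filter((n) => hasType(n, "bio:Relationship")), // were FAM
    byId: new Map(graph.map((n) => [n["@id"], n])), // resolve { "@id": ... } refs
  };
}
```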
- `CONT` items are concatenated onto their parent items with a line break.
- `TIME` items are concatenated onto their parent `DATE` items with a space.
- Events on an `INDI` have that individual as `bio:principal`.
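The two folding rules can be sketched over a hypothetical parsed-node shape `{ tag, value, children }`. This illustrates the rules as described; it is not code from `convert.js`.

```javascript
// Hypothetical sketch: fold CONT and TIME children into their parent's value.
function foldContinuations(node) {
  const kept = [];
  for (const child of node.children || []) {
    if (child.tag === "CONT") {
      // CONT: append to the parent's value after a line break.
      node.value = (node.value ?? "") + "\n" + (child.value ?? "");
    } else if (child.tag === "TIME" && node.tag === "DATE") {
      // TIME under DATE: append to the date after a space.
      node.value = (node.value ?? "") + " " + (child.value ?? "");
    } else {
      kept.push(foldContinuations(child)); // recurse into everything else
    }
  }
  node.children = kept;
  return node;
}
```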
| GEDCOM | Linked Data | Note |
|---|---|---|
| INDI | foaf:Person | |
| INDI.NAME | foaf:name | |
| INDI.SEX | foaf:gender | |
| INDI.BIRT | bio:Birth | |
| INDI.CHR | bio:Baptism | |
| INDI.CHRA | bio:Baptism | |
| INDI.BAPM | bio:Baptism | |
| INDI.BLES | bio:Baptism | |
| INDI.DEAT | bio:Death | |
| INDI.BURI | bio:Burial | |
| INDI.CREM | bio:Cremation | |
| INDI.ADOP | bio:Adoption | |
| INDI.BARM | bio:BarMitzvah | |
| INDI.BASM | bio:BasMitzvah | |
| INDI.CONF | bio:IndividualEvent | Confirmation |
| INDI.FCOM | bio:IndividualEvent | First Communion |
| INDI.ORDN | bio:Ordination | |
| INDI.NATU | bio:Naturalization | |
| INDI.EMIG | bio:Emigration | |
| INDI.IMMI | bio:IndividualEvent | Immigration |
| INDI.CENS | bio:GroupEvent | Census |
| INDI.PROB | bio:IndividualEvent | Probate |
| INDI.WILL | bio:IndividualEvent | Will |
| INDI.GRAD | bio:Graduation | |
| INDI.RETI | bio:Retirement | |
| INDI.EVEN | bio:IndividualEvent | |
| FAM | bio:Relationship | |
| FAM.HUSB | bio:participant | Both husband and wife become bio:participants on the FAM Relationship; to find the gender, reference the related foaf:Person. |
| FAM.WIFE | bio:participant | Both husband and wife become bio:participants on the FAM Relationship; to find the gender, reference the related foaf:Person. |
| FAM.ANUL | bio:Annulment | |
| FAM.CENS | bio:GroupEvent | Census |
| FAM.DIV | bio:Divorce | |
| FAM.DIVF | bio:GroupEvent | Divorce filed |
| FAM.ENGA | bio:GroupEvent | Engagement |
| FAM.MARR | bio:Marriage | |
| FAM.MARB | bio:GroupEvent | Marriage Announcement |
| FAM.MARC | bio:GroupEvent | Marriage Contract |
| FAM.MARL | bio:GroupEvent | Marriage License |
| FAM.MARS | bio:GroupEvent | Marriage Settlement |
| FAM.EVEN | bio:GroupEvent | |
| DATE | dc:date | |
| SOUR | dc:source | Property on an object that points to the Source object |
| SOUR | dc:BibliographicResource | Class that the above points to |
| SOUR.DATA | dc:coverage | |
| SOUR.DATA.DATE | dc:temporal | |
| SOUR.AUTH | dc:creator | |
| SOUR.TITL | dc:title | |
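To show how rows of the table could turn into output, here is a sketch covering only a few `INDI` rows. The `indiEvent` helper is hypothetical. It follows the table in mapping `DATE` to `dc:date` (the sample output earlier shows a raw `DATE` key instead), and putting the Note column into `dc:description` is purely an assumption about where that label could go.

```javascript
// Hypothetical sketch: a partial INDI lookup taken from the table above.
// Each entry is [bio class, optional Note-column label].
const INDI_EVENTS = {
  BIRT: ["bio:Birth"],
  CHR: ["bio:Baptism"],
  DEAT: ["bio:Death"],
  BURI: ["bio:Burial"],
  CONF: ["bio:IndividualEvent", "Confirmation"],
  CENS: ["bio:GroupEvent", "Census"],
  EVEN: ["bio:IndividualEvent"],
};

function indiEvent(personId, tag, date) {
  const entry = INDI_EVENTS[tag];
  if (!entry) return null; // tag not covered by this partial lookup
  const [type, note] = entry;
  // Events on an INDI take that individual as bio:principal.
  const event = { "@type": type, "bio:principal": { "@id": personId } };
  if (date) event["dc:date"] = date; // DATE -> dc:date per the table
  if (note) event["dc:description"] = note; // assumption: where the Note label goes
  return event;
}
```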
The GEDCOM format links individuals through `FAM` objects, with the `HUSB`, `WIFE`, and `CHIL` references pointing to the various individuals, rather than having individuals reference each other. This is useful for drawing family tree diagrams, as the parents are usually arranged horizontally and joined to a central node from which the children's lines sprout.

But for traversing person-to-person relationships, it adds a needless step. The conversion script adds `rel:childOf`, `rel:siblingOf`, `rel:spouseOf`, and `rel:parentOf` to the individual (`foaf:Person`) objects, so `FAM`/`bio:Marriage` objects can be bypassed if desired. Where applicable, the stricter `bio:child`, `bio:father`, and `bio:mother` are used instead.
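A sketch of deriving those direct links from one family record, assuming a hypothetical `{ husb, wife, children }` input of `@id` strings and emitting `[subject, predicate, object]` triples. It treats every family as a plain marriage and ignores the `ENGA`, `ANUL`, and pedigree special cases.

```javascript
// Hypothetical sketch: person-to-person rel: links from one FAM record.
function familyLinks(family) {
  const parents = [family.husb, family.wife].filter(Boolean);
  const children = family.children || [];
  const triples = [];
  for (const child of children) {
    for (const parent of parents) {
      triples.push([child, "rel:childOf", parent]);
      triples.push([parent, "rel:parentOf", child]);
    }
    for (const sibling of children) {
      if (sibling !== child) triples.push([child, "rel:siblingOf", sibling]);
    }
  }
  if (parents.length === 2) {
    // Spouse links go both ways, so either partner can be the starting point.
    triples.push([parents[0], "rel:spouseOf", parents[1]]);
    triples.push([parents[1], "rel:spouseOf", parents[0]]);
  }
  return triples;
}
```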
- `CHIL` tags are left on the `FAM` (`bio:Relationship`) object to preserve the data of which marriage a child came from.
- If the `FAM` object has an `ANUL` tag, no `rel:spouseOf` relations are generated. (TODO)
- If the `FAM` object has an `ENGA` tag but no `MARR` tag, `rel:engagedTo` is used instead of `rel:spouseOf`.
- If the `FAM` object has neither an `ENGA` nor a `MARR` tag, no `rel:spouseOf` or `rel:engagedTo` relations are created between the parents, but any children still get the proper `rel:childOf` and `rel:siblingOf` relations.
- If the `INDI` object has a `FAMC` tag with `PEDI` set to 'natural' or 'birth', `bio:child`/`father`/`mother` tags are used instead of `rel:childOf`/`parentOf`.
- If the `FAM.CHIL` object has `_MREL` or `_FREL` attributes (used by Family Tree Maker software to indicate pedigree) set to 'natural', `bio:child`/`father`/`mother` tags are used instead of `rel:childOf`/`parentOf`.
- If an `ANUL`, `DIV`, or `DIVF` exists on a `FAM` object, the `bio:concludingEvent` of that `bio:Marriage` is set to that event. If both `DIV` and `DIVF` exist, `DIV` takes precedence as the concluding event. (TODO)
- If one of the partners in a `bio:Marriage` has a Death event (or whichever Death occurred first, if both do), that Death event is set as the `bio:concludingEvent` for the `bio:Marriage`, provided no `ANUL`, `DIV`, or `DIVF` exists. (TODO)
- If `DEAT` and either `BURI` or `CREM` exist, `bio:followingEvent` and `bio:precedingEvent` relationships are added. (TODO)
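Two of those rules, choosing the partner predicate and choosing the `bio:concludingEvent`, can be sketched as small pure functions. The `events` object keyed by GEDCOM tag, the ordering of `ANUL` ahead of `DIV` (the list above only fixes `DIV` over `DIVF`), and the ISO `isoDate` field used to find the earlier death are all assumptions; GEDCOM dates such as `1 APR 1900` would need parsing first.

```javascript
// Hypothetical sketch. `events` maps GEDCOM tags to event objects,
// e.g. { MARR: {...}, DIV: {...} }.

// Which predicate, if any, links the two partners of a FAM record.
function partnerPredicate(events) {
  if (events.ANUL) return null; // annulled: no rel:spouseOf
  if (events.MARR) return "rel:spouseOf";
  if (events.ENGA) return "rel:engagedTo"; // engaged but never married
  return null; // neither ENGA nor MARR
}

// Which event concludes the marriage.
function concludingEvent(events, partnerDeaths = []) {
  // Assumed order: ANUL, then DIV (which beats DIVF), then DIVF.
  for (const tag of ["ANUL", "DIV", "DIVF"]) {
    if (events[tag]) return events[tag];
  }
  // Otherwise the earlier death; assumes a hypothetical "YYYY-MM-DD" isoDate,
  // which sorts correctly as a plain string.
  const dated = partnerDeaths.filter((d) => d && d.isoDate);
  if (dated.length === 0) return null;
  return dated.reduce((earliest, d) => (d.isoDate < earliest.isoDate ? d : earliest));
}
```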
There are a few places where the GEDCOM structure breaks the standard linkage between nodes that an RDF graph has. Namely, the `INDI.FAMC.PEDI` (Pedigree) and `INDI.FAMC.STAT` (Status) tags break the standard `INDI.FAMC` linkage. `PEDI` and `STAT` are not attributes of the `FAM` referenced by the `FAMC` ID, but attributes of the link that individual has with that family, which doesn't work well in JSON-LD. Technically, representing them requires a reification of the link.

`SOUR` tags present the same problem: they are attached to a link to another node and modify the link, rather than either of the nodes.

So, to make this work properly, when an object (e.g. a `foaf:name` property on a `foaf:Person`) has a `SOUR` property, the parent object (the `foaf:Person` in this example) gets a `GEDREIF` property with a value of:
```json
{
  "@type": "rdf:Statement",
  "rdf:subject": "_:I101",
  "rdf:predicate": "foaf:name",
  "rdf:object": "John Smith",
  "dc:source": "_:S101"
}
```

If there are multiple `SOUR` references for that object, the `GEDREIF` property becomes an array of such objects. If multiple `SOUR` references share the same ID, the `rdf:predicate` for that `SOUR` becomes an array of the properties that source affects. (TODO)
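A sketch of how those `GEDREIF` values could be assembled, assuming a hypothetical `addSourceReification` helper that receives the citations as `{ predicate, object, sourceId }` records. Where one source covers several properties, it also turns `rdf:object` into a parallel array; the description above only says what happens to `rdf:predicate`, so that part is an assumption.

```javascript
// Hypothetical sketch of the planned (TODO) GEDREIF grouping.
function addSourceReification(node, citations) {
  // Group citations by source ID, keeping insertion order.
  const bySource = new Map();
  for (const { predicate, object, sourceId } of citations) {
    if (!bySource.has(sourceId)) bySource.set(sourceId, { predicates: [], objects: [] });
    const entry = bySource.get(sourceId);
    entry.predicates.push(predicate);
    entry.objects.push(object);
  }
  // One-element arrays collapse to plain values, as in the single-source example.
  const single = (list) => (list.length === 1 ? list[0] : list);
  const statements = [...bySource].map(([sourceId, { predicates, objects }]) => ({
    "@type": "rdf:Statement",
    "rdf:subject": node["@id"],
    "rdf:predicate": single(predicates),
    "rdf:object": single(objects), // assumption: kept parallel to rdf:predicate
    "dc:source": sourceId,
  }));
  if (statements.length > 0) node.GEDREIF = single(statements);
  return node;
}
```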
For pedigree information on an `INDI.FAMC`, the `INDI` object gets a `GEDREIF` attribute, which is set to: (TODO)

```json
{
  "@type": "rdf:Statement",
  "rdf:subject": "_:I101",
  "rdf:predicate": "FAMC",
  "rdf:object": "_:F101",
  "dc:description": "natural"
}
```

The GEDCOM specification also allows an `INDI.NAME` to be broken down into more specific parts. For example, an `INDI` whose `NAME` has additional `GIVN` and `SURN` tags:
```json
{
  "@type": "rdf:Statement",
  "rdf:subject": "_:I101",
  "rdf:predicate": "foaf:name",
  "rdf:object": "John Smith",
  "GIVN": "John",
  "SURN": "Smith"
}
```

- Pedigree tree: D3 "elbow dendrogram" using the "tree" D3 layout.
- D3 smart force labels: Adding functionality to have labels "orbit" their node and repel each other, so they stay out of each other's way.