8 posts tagged “gedcom”
I've developed a bit of Perl code (and TT2 templates) to take a GEDCOM file and output some FOAF (RDF/XML). This directory contains the result of that effort. Files named FXXX.xml are for families and IXXXX.xml files are for individuals.
With some consultation from Dan Brickley, I think I have a pretty decent start on the conversion. For each family, I've defined a foaf:Group and listed its members. Then I point to an alternate location (rdfs:seeAlso) for the FOAF data on each individual.
<foaf:Group rdf:about="http://www.alternation.net/ged2foaf/F001.xml#F001">
<foaf:member rdf:resource="http://www.alternation.net/ged2foaf/I0071.xml#I0071"/>
<foaf:member rdf:resource="http://www.alternation.net/ged2foaf/I0052.xml#I0052"/>
</foaf:Group>
<foaf:Person rdf:about="http://www.alternation.net/ged2foaf/I0071.xml#I0071">
<rdfs:seeAlso rdf:resource="http://www.alternation.net/ged2foaf/I0071.xml"/>
</foaf:Person>
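A minimal sketch of producing that family fragment in Perl, using plain string building rather than the TT2 templates the post actually uses. The base URL and the IDs are copied from the example above; `family_group_xml` is a hypothetical helper, and its output is only a fragment that still needs an enclosing rdf:RDF element with the rdf, rdfs and foaf namespace declarations.

```perl
use strict;
use warnings;

# Base URL and IDs copied from the example above; the helper name is hypothetical.
my $BASE = 'http://www.alternation.net/ged2foaf';

# Return the RDF/XML fragment for one family: a foaf:Group listing its
# members, then a foaf:Person stub per member whose rdfs:seeAlso points
# at that member's own file.
sub family_group_xml {
    my ($fam_id, @member_ids) = @_;
    my $xml = qq{<foaf:Group rdf:about="$BASE/$fam_id.xml#$fam_id">\n};
    $xml .= qq{<foaf:member rdf:resource="$BASE/$_.xml#$_"/>\n} for @member_ids;
    $xml .= "</foaf:Group>\n";
    for my $id (@member_ids) {
        $xml .= qq{<foaf:Person rdf:about="$BASE/$id.xml#$id">\n}
              . qq{<rdfs:seeAlso rdf:resource="$BASE/$id.xml"/>\n}
              . "</foaf:Person>\n";
    }
    return $xml;
}

print family_group_xml('F001', 'I0071', 'I0052');
```

Keeping each person down to a stub plus a seeAlso link is what lets the family file stay small while an aggregator follows the links for the details.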
In each person's file, I've put as much data as I can about that person, defined their relationships, and indicated which families they belong to. For each relative, I've added rdfs:seeAlso links, as in the family files.
The trick is putting as much data as I need in each file without too much duplication.
The RDF all validates, so the next step is to scutter the directory and see what I can do with it. Has anyone used Redland before? That said, Class::RDF seems neat.
I've only mentioned it once before, but I'm working on exporting FOAF from Gedcom. I hadn't touched it in a while, but I've recently revisited it.
The current implementation "works" (e.g. HTML View, FOAF View); however, it fails to split things up along family-unit lines (i.e. which children are associated with which spouse, and exactly which family this person was part of as a child).
Example:
Person X
--------
FAM 1 (C)
---------
Parent A
Parent B
FAM 2 (S)
---------
Spouse 1
Child 1
Child 2
Child 3
FAM 3 (S)
---------
Spouse 2
Child 4
FAM 4 (S)
---------
Spouse 3
FAM 5 (S)
---------
Spouse 4
Child 5
Child 6
The above example shows that Person X is part of five families: four as a spouse (S) and one as a child (C). You can also see, specifically, that Children 1-3 were with Spouse 1 (hence a family unit).
It's important to note that each person and family unit is given a unique ID.
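One way to capture family units, sketched in plain Perl: key everything by family ID, the way GEDCOM itself does (FAM records carry HUSB, WIFE and CHIL pointers, and each individual's FAMC/FAMS pointers name the families they belong to as a child or as a spouse). The data below is just the example above typed in by hand, and `unions_of` is a hypothetical helper.

```perl
use strict;
use warnings;

# The example above, keyed by family ID. 'parents' holds the couple
# (GEDCOM's HUSB/WIFE) and 'children' the CHIL pointers.
my %family = (
    F1 => { parents => ['Parent A', 'Parent B'], children => ['Person X'] },
    F2 => { parents => ['Person X', 'Spouse 1'], children => ['Child 1', 'Child 2', 'Child 3'] },
    F3 => { parents => ['Person X', 'Spouse 2'], children => ['Child 4'] },
    F4 => { parents => ['Person X', 'Spouse 3'], children => [] },
    F5 => { parents => ['Person X', 'Spouse 4'], children => ['Child 5', 'Child 6'] },
);

# Every union $person is part of, as [family ID, other spouse, children...].
sub unions_of {
    my ($person) = @_;
    my @unions;
    for my $id (sort keys %family) {
        my @parents = @{ $family{$id}{parents} };
        next unless grep { $_ eq $person } @parents;
        my ($spouse) = grep { $_ ne $person } @parents;
        push @unions, [ $id, $spouse, @{ $family{$id}{children} } ];
    }
    return @unions;
}

print join(', ', @$_), "\n" for unions_of('Person X');
```

Because children hang off the family rather than off the person, "Children 1-3 go with Spouse 1" is read straight from F2 instead of being inferred.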
I was hoping to use foaf:Group to associate each person with their respective families, though I'm not exactly sure how.
You might have noticed that I've omitted Person X's siblings from the above example. That may cause a problem, because the foaf:Group in which Person X is a child would be split over multiple files.
If foaf:Group had an inverse (see wiki entry), then I might be able to just say that Person X is a memberOf families 1, 2, 3, 4 and 5, and aggregate those files, focusing on Person X's relationships. However, I don't see a way to show relationships between people within groups.
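Even without an inverse property to lean on, an aggregator can build the reverse index itself once it has read the group files. A minimal sketch, assuming the foaf:member data has already been parsed into a hash; the membership data is invented for illustration, and `%member_of` is only a lookup table, not an RDF property.

```perl
use strict;
use warnings;

# Hypothetical parsed foaf:member data: family ID => member IDs.
my %members = (
    F001 => [qw(I0071 I0052 I0100)],
    F002 => [qw(I0071 I0200)],
);

# Reverse index: person ID => families they belong to. This is a lookup
# table built after aggregation, not a property written into the RDF.
my %member_of;
for my $family (sort keys %members) {
    push @{ $member_of{$_} }, $family for @{ $members{$family} };
}

print "$_: @{ $member_of{$_} }\n" for sort keys %member_of;
```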
Ideas anyone?
I've added a new feature to the Gedcom app. Each user's data can be exported as a FOAF (RDF/XML) file.
There isn't much in the way of user data exported (yet), but the neat part is the relationship vocabulary used. Eric has put together a small but useful set of terms describing relationships. This allows me to refine foaf:knows to show that person X is the parent of person Y, etc.
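A sketch of what an exported person might look like with the relationship terms. It assumes the vocabulary's namespace is http://purl.org/vocab/relationship/ and uses its parentOf and childOf properties; the helper name, hash fields and IDs are hypothetical, and the base URL is borrowed from the ged2foaf example.

```perl
use strict;
use warnings;

# Assumed namespace of the relationship vocabulary; rel:parentOf and
# rel:childOf are intended as more specific kinds of foaf:knows.
my $REL  = 'http://purl.org/vocab/relationship/';
my $BASE = 'http://www.alternation.net/ged2foaf';    # borrowed, illustrative
my %ENTITY = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');

# Hypothetical helper: one foaf:Person with its parent/child links.
sub person_rdf {
    my (%p) = @_;    # id, name, children => [ids], parents => [ids]
    (my $name = $p{name}) =~ s/([&<>])/$ENTITY{$1}/g;    # escape for XML text
    my $xml = qq{<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n}
            . qq{         xmlns:foaf="http://xmlns.com/foaf/0.1/"\n}
            . qq{         xmlns:rel="$REL">\n}
            . qq{<foaf:Person rdf:about="$BASE/$p{id}.xml#$p{id}">\n}
            . qq{  <foaf:name>$name</foaf:name>\n};
    $xml .= qq{  <rel:parentOf rdf:resource="$BASE/$_.xml#$_"/>\n} for @{ $p{children} || [] };
    $xml .= qq{  <rel:childOf rdf:resource="$BASE/$_.xml#$_"/>\n}  for @{ $p{parents}  || [] };
    return $xml . "</foaf:Person>\n</rdf:RDF>\n";
}

print person_rdf(id => 'I0001', name => 'Person X', children => ['I0002', 'I0003']);
```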
My hope is to use the foafnaut source to create neat interactive family-tree diagrams.
I've made several changes to the Gedcom app I've been constructing. The information page for each individual (example) now shows pictures, general info, any notes if available, and any next of kin in the database (with links to their records, of course). I really need to come up with a decent naming scheme for the TMPL_VAR names used. It sucks to write a date range (date1 - date2) like so:
<!-- TMPL_IF NAME="dates" -->(<!-- TMPL_IF NAME="BIRT" --><!-- TMPL_LOOP NAME="BIRT" --><!-- TMPL_IF NAME="__FIRST__" --><!-- TMPL_IF NAME="DATE" --><!-- TMPL_VAR NAME="DATE" --><!-- TMPL_ELSE -->Unknown<!--/TMPL_IF --><!--/TMPL_IF --><!--/TMPL_LOOP --><!-- TMPL_ELSE -->Unknown<!--/TMPL_IF --> -<!-- TMPL_IF NAME="DEAT" --><!-- TMPL_LOOP NAME="DEAT" --><!-- TMPL_IF NAME="__FIRST__" --><!-- TMPL_IF NAME="DATE" --><!-- TMPL_VAR NAME="DATE" --><!-- TMPL_ELSE -->Unknown<!--/TMPL_IF --><!--/TMPL_IF --><!--/TMPL_LOOP --><!-- TMPL_ELSE -->Unknown<!--/TMPL_IF -->)<!--/TMPL_IF -->
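One way out is to do the fallback logic in Perl and hand the template a single finished value. A sketch, assuming the record data is shaped like the template's loops (BIRT and DEAT as lists of event hashes, of which only the first is used); `first_date` and `date_range` are hypothetical names.

```perl
use strict;
use warnings;

# Assumed data shape (mirroring the template's loops): BIRT and DEAT are
# array refs of event hashes, and only the first event's DATE is used.
sub first_date {
    my ($events) = @_;
    return 'Unknown' unless $events && @$events;
    my $date = $events->[0]{DATE};
    return defined $date && length $date ? $date : 'Unknown';
}

# Hypothetical helper: build the whole "birth - death" string in Perl.
sub date_range {
    my ($person) = @_;
    return first_date($person->{BIRT}) . ' - ' . first_date($person->{DEAT});
}

print date_range({ BIRT => [ { DATE => '1 JAN 1900' } ] }), "\n";    # 1 JAN 1900 - Unknown
```

The value could then be passed with HTML::Template's `param` method, e.g. `$template->param(date_range => date_range($person))`, shrinking the markup to `<!-- TMPL_IF NAME="dates" -->(<!-- TMPL_VAR NAME="date_range" -->)<!-- /TMPL_IF -->`.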
Anywho, I tried to install mod_perl on that machine the other day and was having problems during make test: around 1/3 of all tests failed. I let it go for a day and eventually went back at it. I stumbled upon a thread on Yahoo that recommended installing Bundle::Apache. Of course, that bundle is for mod_perl 1, so I had to quit out before it got to installing mod_perl itself. However, when I ran make test again, everything was peachy. I guess I missed the part in the installation guide about module prereqs... oops.
My father's place of employment holds weekly presentations. Last week was his turn. His presentation is entitled: "Researching and Publishing one's Family History".
There's a video online, as well as the PowerPoint presentation.
If you're interested in genealogy, check it out.
I've moved red13 to a faster box.
I also took the time to add some caching to the gedcom app. I had a slight problem figuring out how to track how fresh the files in the cache are. I decided that one solution could be to append a timestamp to the end of the cache key.
My cache keys now look like: 'pagename::timestamp'. Unfortunately, I now have to grep() through the keys to find one that matches, rather than just calling $cache->get() right away.
Having just said that, I think it might be easier to use the gedcom file's timestamp to check the cache, sort of like a version number. That way I can write $cache->get( $page . '::' . $ts ) right away, and if undef is returned, a new file must be generated.
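A minimal sketch of that version-number idea, assuming a cache with get/set semantics where a miss returns undef (a plain hash stands in for the real cache object here; `cached_page` and the render callback are hypothetical). The GEDCOM file's modification time from `stat` becomes part of the key, so editing the file changes every key at once.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for the real cache object: get() returns undef on a miss.
my %store;
sub cache_get { my ($key) = @_; return $store{$key} }
sub cache_set { my ($key, $value) = @_; $store{$key} = $value; return }

# Use the GEDCOM file's modification time as a version number in the key.
sub cached_page {
    my ($page, $gedcom_file, $render) = @_;
    my $ts = (stat $gedcom_file)[9];
    die "Cannot stat $gedcom_file: $!" unless defined $ts;
    my $key = $page . '::' . $ts;
    my $out = cache_get($key);
    return $out if defined $out;    # hit: same page, same file version
    $out = $render->($page);         # miss: generate and remember
    cache_set($key, $out);
    return $out;
}

my (undef, $gedcom) = tempfile(UNLINK => 1);    # stand-in for the .ged file
print cached_page('surnames', $gedcom, sub { "<h1>$_[0]</h1>\n" });
```

Entries keyed by an old timestamp are simply never looked up again, so no grep() over the keys is needed; they linger until the cache expires or purges them.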
An ugly hack that I'd like to get rid of is how I handle cached and uncached data.
I stop the flow of the CGI::Application in cgiapp_prerun(). This sub either uses the cached data, or grabs the output of the appropriate runmode and stuffs it into a CGI::App parameter. No matter which runmode was called, it always ends up going to a "show_output" mode, which simply returns the value of said CGI::App parameter. Ah well, it works.
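An alternative that avoids the detour through a "show_output" mode is to wrap each runmode handler in a caching closure, so every mode returns its own output whether or not it came from the cache. A sketch in plain Perl, assuming a simple hash for the cache; `with_cache` and the handlers are hypothetical. CGI::Application's `run_modes()` accepts code references as well as method names, so wrapped subs like these could in principle be registered there, though real handlers also receive `$self` first, which the key would then need to skip.

```perl
use strict;
use warnings;

my %cache;    # stand-in for the real cache object

# Wrap a runmode handler so its output is cached; the key is the mode name
# plus the handler's arguments (a GEDCOM timestamp could be appended too).
sub with_cache {
    my ($mode, $handler) = @_;
    return sub {
        my $key = join '::', $mode, @_;
        $cache{$key} = $handler->(@_) unless exists $cache{$key};
        return $cache{$key};
    };
}

# Hypothetical dispatch table: callers never know whether output was cached.
my %runmodes = (
    surnames   => with_cache(surnames   => sub { "surname index\n" }),
    individual => with_cache(individual => sub { "record for $_[0]\n" }),
);

print $runmodes{individual}->('I0071');
```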
Download the code, if you want. I'll be making more changes soon.
I've set up a quick Gedcom CGI::App here.
Warning! It's DAMN slow. It's running on a P166 and reading straight out of the gedcom file.
TODO:
- generate static content when needed (i.e. if the static content is older than the gedcom file). This should allow people to simply plug in a new or updated gedcom file and not have to worry about anything.
- create a better surname index
- better individuals index
- some sort of individual index
- search facility
- improve semi-arbitrary data harvesting, including pointers (currently pointers are shown as their type, and only one pointer is shown even if multiple exist in the record). The most obvious uses are for spousal and child-parent pointers.
- charts.
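The first TODO item boils down to an mtime comparison. A minimal sketch, assuming one static HTML file per page and a render callback that produces its contents (`needs_rebuild`, `refresh` and the file names are hypothetical):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# True when the static page is missing or older than the GEDCOM file.
sub needs_rebuild {
    my ($static_file, $gedcom_file) = @_;
    return 1 unless -e $static_file;
    my $gedcom_mtime = (stat $gedcom_file)[9];
    die "Cannot stat $gedcom_file: $!" unless defined $gedcom_mtime;
    return (stat $static_file)[9] < $gedcom_mtime ? 1 : 0;
}

# Regenerate the page only when needed; returns 1 if it was rewritten.
sub refresh {
    my ($static_file, $gedcom_file, $render) = @_;
    return 0 unless needs_rebuild($static_file, $gedcom_file);
    open my $fh, '>', $static_file or die "Cannot write $static_file: $!";
    print {$fh} $render->();
    close $fh or die "Cannot close $static_file: $!";
    return 1;
}

my $dir = tempdir(CLEANUP => 1);
my $ged = "$dir/family.ged";
open my $g, '>', $ged or die "Cannot write $ged: $!";
print {$g} "0 HEAD\n0 TRLR\n";
close $g;
print refresh("$dir/index.html", $ged, sub { "<p>index</p>\n" }), "\n";    # 1 (page was missing)
```

Dropping in a new gedcom file bumps its mtime, so the next request regenerates every page that is now older than it.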
Anyway, that's in no way a complete list, but it's a start. Please post your comments/suggestions/feature requests here.
Eventually I'll move the code to CVS.
My family has a website with a fair amount of family information. My father and uncle have taken great care in storing everything in digital format. The family information is all stored in a GEDCOM file.
Unfortunately, the site is run by an extremely OLD Perl script. I plan on rewriting it, eventually adding features my father and uncle have requested (like the ability to hide personal information about people who are still living unless you've logged in, or something to that effect). Thanks to the Gedcom module on CPAN, this is all quite easy.
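A sketch of the "hide details of living people" rule, under a loudly stated assumption: with no death record and no birth year at least 100 years back, the person is treated as possibly living. The hash fields, the 100-year cutoff and both helper names are hypothetical; pulling the dates out of the Gedcom module's records is left out.

```perl
use strict;
use warnings;

# ASSUMPTION: no death record and a birth year under 100 years ago
# (or no birth year at all) means "possibly living".
my $CUTOFF_YEARS = 100;

sub may_be_living {
    my ($person, $this_year) = @_;
    return 0 if $person->{died};
    my ($birth_year) = ($person->{birth_date} // '') =~ /(\d{4})/;
    return 1 unless defined $birth_year;    # unknown: err on the side of privacy
    return $this_year - $birth_year < $CUTOFF_YEARS ? 1 : 0;
}

# Return the full record, or only the name when the viewer isn't logged in
# and the person may still be alive.
sub visible_record {
    my ($person, $logged_in, $this_year) = @_;
    return $person if $logged_in || !may_be_living($person, $this_year);
    return { name => $person->{name}, private => 1 };
}

my $this_year = (localtime)[5] + 1900;
my $shown = visible_record({ name => 'Child 1', birth_date => '3 MAR 1985' }, 0, $this_year);
print $shown->{private} ? "details hidden\n" : "details shown\n";
```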
I've already posted one GEDCOM script to perlmonks. Here's another quickie that prints a list of surnames in the database and the number of records for each:
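use strict;
use warnings;
use Gedcom;

my $file = shift or die "ERROR: No file specified\n";
die "ERROR: File not found\n" unless -e $file;

my $ged = Gedcom->new(gedcom_file => $file, read_only => 1)
    or die "ERROR: could not read $file: $!\n";

# Count individuals per surname; records without one go under "(none)".
my %seen;
for my $indi ($ged->individuals) {
    my $surname = $indi->surname;
    $seen{ defined $surname && length $surname ? $surname : '(none)' }++;
}
print "$_ [$seen{$_}]\n" for sort keys %seen;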