Familypedia

WorldConnect purports to have 385 million names on file. That number is a little deceptive. One might think that such voluminous records would be sufficiently comprehensive to cover at least famous individuals.


A spot check reveals this is not the case. Take known birth year, location and the first and last name:

The quality control? Consider Isaac Cummings (1812): his occupation is "Brooklyn, NY", his baptism is "M384", and his emigration is "Doctor". It's junk. We could say, what the heck, we are a wiki, so someone will eventually correct these errors. Will they? Since 2005 we have had articles on file that list a person's birthplace as wikipedia:location or, very unhelpfully, wikipedia:earth (as opposed to birth on what other planet?).

Familypedia has to be better than this if we are to be anything other than a clone of gedcom dumping sites like WorldConnect. So what does this mean? I can probably identify some locations, like Brooklyn, NY, but location information might be legitimate in some occupation entries, e.g. "Veterinarian in Brooklyn, NY". So to be able to toss "Brooklyn, NY" I would have to know whether anything preceding it is a valid occupation name. No, this sort of data validation would be error prone.


Assume that someone does clean the errors out of this article and removes the baptism data as unintelligible. Now, say we come across a gedcom file whose information on Isaac has been cloned from the same source. We successfully detect the duplicate and seek to add any additional information this new record has about Isaac Cummings, born in 1812. And we come across the M384 again.

It gives me an ulcer just thinking about tar baby junk data like this. By what mechanism do we know that the M384 should not be added? It seems to me that, without machine intelligence, zombie data like this is like the monster at the end of a B-grade film: it just keeps jumping up after apparently being killed. You drive a stake through it, and it jumps up; you blast it to bits, and it reassembles. 02:01, 5 July 2009 (UTC)


Possible solutions: Copy source gedcom(s) to a subpage, then do a diff with new gedcoms and only add material that has not already been added. That way, subtracted material is not re-added.
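
A rough sketch of that diff idea, in Python, might look like the following. It is only an illustration under stated assumptions: facts are reduced to simple (field, value) pairs instead of real gedcom records, the function name facts_to_add is made up for this example, and the "burial" entry is invented placeholder data. The point it shows is that the diff runs against the saved source snapshot, not against the cleaned-up article, so the M384 that an editor deleted is still recognised as already seen.

```python
# Minimal sketch: facts are simplified (field, value) pairs.
# Parsing real gedcom files is out of scope and not attempted here.

def facts_to_add(snapshot, incoming):
    """Return incoming facts that are absent from the stored source snapshot.

    Facts an editor deleted from the article are still in the snapshot,
    so they are filtered out here and are not re-added.
    """
    seen = set(snapshot)
    fresh = []
    for fact in incoming:
        if fact not in seen:
            fresh.append(fact)
            seen.add(fact)  # also drops repeats within the incoming file
    return fresh


# Snapshot of the first gedcom, saved to the subpage at import time.
snapshot = [
    ("occupation", "Brooklyn, NY"),
    ("baptism", "M384"),
    ("emigration", "Doctor"),
]

# A later gedcom cloned from the same source, plus one invented new fact.
incoming = [
    ("baptism", "M384"),
    ("emigration", "Doctor"),
    ("burial", "EXAMPLE PLACE"),  # hypothetical placeholder value
]

additions = facts_to_add(snapshot, incoming)

# Grow the snapshot so the next import is diffed against everything seen so far.
snapshot.extend(additions)

print(additions)
```

This only catches junk that comes back character for character. A variant spelling of the same bad value would slip through and would still need a human to remove it once.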
