Facts Pages


11 May

    • Form junk
      • ✓ do extended form
      • ✓ preloads
      • ✓ short descr - set wide 100%
      • article rating
      • gallery section derived from images
      • ✓ building street line too big
      • convert Form long over to international strings
        • propagate updates to the subforms
        • need the international strings fleshed out more completely.
      • #if gallery, br clear all before gallery
      • ✓ general sources- where to put the footnote?
      • ✓ boilerplate text for empty (new article)
      • ✓ siblings- maparray break this field
    • ✓ Form for image fields
    • Move over international to mediawiki namespace
  • How does the property setting procedure work? If I set a property that holds multiple values (eg street addresses), are all of the previously written values erased when the page is saved? If so, how do I iteratively build up a list of ancestors? A templates-with-parameters scheme? Or build a string, then dump it out to properties?
  • YDNA form- low priority- not too many people doing this yet.
  • set facts from info- do
    • do a quick check on countries from place fields
    • get children from other spouse check Thurstan did children
  • could set up source field as pairs, the first is the cite parameter name, the second is the value. eg cite|title=foo|author=bar = title~foo~author~bar
  • set general info missing properties:
    • fame 1-10, "notability", so that one could conceivably find notable ancestors.
    • female line mechanism? eg- the eve line thing.
    • images, documents, other files. done.
    • tags done.
  • run a bot to propagate all event properties uniformly
  • other languages Prototype (a long form?) Is a lang just a different kind of version page?
  • other stuff
    • multimedia
    • ✓ wikipedia references done
  • ✓ descriptions in alternate languages
  • Instantiation of repeated forms variants
    • wedding2-6
    • occupations1-4?
    • education1-4?
    • emigration? or immigration?
    • residences? or mixed with Census event subpage?
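
The paired source field idea noted above (cite|title=foo|author=bar stored as title~foo~author~bar) could be prototyped roughly as follows. This is only a sketch: the function names pack_cite and unpack_cite are hypothetical, and it assumes a tilde never appears inside a parameter name or value.

```python
def pack_cite(params):
    """Flatten cite parameters into name~value~name~value form."""
    parts = []
    for name, value in params.items():
        # Assumption: '~' is reserved as the separator and may not occur in data.
        if "~" in name or "~" in value:
            raise ValueError("'~' is reserved as the separator")
        parts.extend([name, value])
    return "~".join(parts)


def unpack_cite(packed):
    """Rebuild a cite template call from the packed string."""
    fields = packed.split("~") if packed else []
    if len(fields) % 2:
        raise ValueError("expected name/value pairs")
    parts = ["cite"]
    for name, value in zip(fields[0::2], fields[1::2]):
        parts.append(name + "=" + value)
    return "{{" + "|".join(parts) + "}}"
```

Storing the pairs in one flat string keeps the field compatible with a single text property; the cost is that the separator must be escaped or forbidden in values.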
tasks unrelated to smw
  • if it does exist, subst out the ifexist call, or set a maintenance property ifexists not needed with link, and bot eliminate them.

4 June

left off

  • worked through place infoboxes, through indian jurisdiction. Need to do the general German and Spanish place infoboxes in preparation for bulk xfers. I may not have indicated all infoboxes affected on the PhloxBot page. Examine history of Indian jurisdiction to verify all changes to those infoboxes are current. We probably want to extract in a different way than this scheme. I note that some interwikis have more info than the english version.
  • Atchison County, Missouri- The navboxes for places will probably look a lot different after the layout/ internationalization techniques.
  • Investigate mw:Extension:Header Tabs. This will work with semantic forms. I wonder if the Tabview extension will do the same to break up forms and deal with problem of autocompletion bug (only displays top of form).
  • didn't finish something having to do with mainpage to article name in one of the tabs or navbox templates


Sensor page

Formerly known as the "Watching Paint Dry Page"

This page executes all the time consuming calculations for an article. Periodically it is refreshed by a Bot so that updated information will be available to the main article, but the update cycle will only be as fast as wikia operations permit. If the sensor pages ever become too much of a load on the system, they can be protected so that only admins may update them.

The page makes no excuses about its slow speed. Its intention is to fully exercise SMW at the price of display speed. It might do the following tasks:

  • Derived article data:
    • suggested sex for redlink auto fill. if mate is male, suggest female & vice versa.
      • (if mother, if father, and contradiction with sex- autocat flag error)
    • County, State and Nation could be implied by lower level place names. EG: if city= Seattle, then main page property inherits from sensor page county King, state Washington, country USA. These pages would first inherit from sensor page because its names would be language specific. If none available and if the page shares data with another article, then it inherits from that article.
      Possibly, this data is picked up by javascript for use in the next editing session. Or more actively, perhaps it is formatted in a way that Autowikiabrowser can use to auto create the article or make the improvements in an automated way.
    • Wife doesn't list husband in "joined with" list, doesn't share wedding info. Suggest missing items.
    • Performs a search on the individual as participant in an event. Put a marker so that event can be incorporated into the main article
    • Suggested list could have info deposited by a bot- eg: a contributors list might be parsed from a history list- this would be used to get confirmation from user that the list is ok.
    • Tab Suggestions
      • look for all language versions
      • look for subpages.
  • Hinting the main page layout
    • detecting if there are children listed in Mate's articles. For efficiency, it currently assumes that the lists are either all declared or all not declared, not mixed as is quite possible. Also, to put up the child heading we need to know in advance if the child search will turn up hits. We can set a property recording that the last search did result in children lists.
  • Root out family connections by looking for
    • siblings lists of children, and hoisting them to the parent article, or if parent unknown, distributing to the various sibling articles.
    • Finding lost children by querying for father or mother= article name.
  • Prepare info to be used when article is started. EG: the parent is already known via sibling lists.
  • Increased automatic redlink capability by looking at all weddings, also links for all children.
  • Validation checks- quality control
    • Factual inconsistencies
      • child born over 9 months after father or mother died
      • same name places... eg family members living and dying in Moscow, Russia, but one is in Moscow, Idaho.
    • Quality control
      • Autocat Living person- birth within 120 years, no death info.
      • No sources listed
      • All caps in names
      • No date in name, no paren in name
      • wrong date modifiers- ca instead of c, before instead of bef.
      • Short name properties that include dates
      • Language of article does not match title's language suffix. (Missing suffix?)
        • Article links to many articles of language B, but article is marked as language A.
      • Wiki article name different from title (inconsistent with showfacts interwikis list)
      • Spam- Suspicious volume of external links.
      • Redirects- inbound link not the same as the current article name. (Link via father, mother, children, and so on.)
    • Orphan articles with no edits for a year and sparse properties (eg- no birth, death, or relations filled out: no father, mother, spouse or children).
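
A few of the validation checks listed above could be prototyped outside the wiki, for example in a bot. The sketch below is illustrative only: the Person record, its field names and the flag strings are hypothetical and are not the wiki's actual property names, and "9 months" is approximated as 9 x 31 days.

```python
import re
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Person:
    # Hypothetical record; field names are illustrative only.
    name: str
    birth: Optional[date] = None
    death: Optional[date] = None
    sources: List[str] = field(default_factory=list)
    date_text: str = ""  # raw date string as typed, e.g. "ca 1530"


def quality_flags(p: Person, today: date) -> List[str]:
    """Return the quality-control flags that apply to one person record."""
    flags = []
    base_name = p.name.split("(")[0]  # ignore the "(c1941)" style suffix
    letters = [ch for ch in base_name if ch.isalpha()]
    if letters and all(ch.isupper() for ch in letters):
        flags.append("all caps in name")
    if not p.sources:
        flags.append("no sources listed")
    if p.birth is not None and p.death is None:
        if 0 <= today.year - p.birth.year <= 120:
            flags.append("living person")
    # "ca" should be "c" and "before" should be "bef".
    if re.search(r"\b(ca|before)\.?\s*\d", p.date_text, re.IGNORECASE):
        flags.append("wrong date modifier")
    return flags


def born_too_late(child: Person, parent: Person) -> bool:
    """True if the child was born more than ~9 months after the parent died."""
    if child.birth is None or parent.death is None:
        return False
    return (child.birth - parent.death).days > 9 * 31
```

Keeping each check as a small independent test makes it easy for a bot to map every flag to its own maintenance category.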

random accumulating tasks

  • Kezia Obama (c1941) picks up children g3 of unknown parent as well as her own.
  • verify there is a living person message for forms.
  • Sources and notes fields- it might be a good idea to escape the braces in facts do ref. Probably { { would be easiest to remember.
  • bug children- edit facts won't work on spouse article (no declared parms)
    • showfacts children- need a family flag so you can display just one family group. (eg for half siblings list).
  • scan for unknown and not found in info pages for person links. Nuke them.
  • add form edit button if showfacts interwikis is on the page.
  • Appears to be a problem parsing names with question marks: Nest ferch Rhys (c1073-aft1136), see second husband in joined with parameter- doesn't show in display, or the property
  • There is something funky about the artifacts left over from the 2007 skin- for example this page is wide. It shouldn't be.
  • Need Mediawiki messages for all sidebar items in other languages
  • Second pass phloxbot- after big move to person-ex:
    • flags for displaying anc/ desc stuff prob a subst ifexist will do it.
    • Ifmarried -> ifmarried.
  • some folks are tempted to put [ decoration for links on names. We should strip the brackets. This was done on Wilfred_F_Schulte
  • tune up {{showfacts biography}} - dates, vector based on lang. Maybe rewrite, see notes in /doc
  • bug in 3 digit years: Clodius I of Sicambria (bef176BC-c159BC)
  • we add pseudo tags with something like Den Haag (synonym). These are all linked to sharedFacts (base) article.
  • biography needs fix on age calcs.
  • add long_name display to Showfacts person-ex
  • Tabs template needs a looksee for updating. EG: Template:Tabs person (.fr)
  • bug: set date not doing circa, before aft.
  • bug:Edit Journeys etc doesn't work on new article. Needs to switch to correct special statement if the article is new.
  • We could search for info on documents like this one: Haralson county, Georgia/doc001.
  • mirror father, mother given and surname in person record. This will simplify querying.
  • fix autoinsert of sig Javascript
  • verify correct behavior with Wikireadr PDFs
  • use floatbox iframe to preformat queries
  • Create a Showfacts Awards that displays awards like in hard coded table at Julian Messerly (1924)
  • After 3 months I still hate coparent and partner. Maybe joined_with rather than coparent, or a non english term like conjoint
  • add missing items that Thurstan enumerated here: Template talk:Showfacts person/main
  • showfacts person wedding needs to query from spouse as well.
  • showfacts person restored to mixed parm plus query using packed parms eg:
  • Refactor multilingual message end pages
  • Youtube help/ walkthrough on using SMW interfaces
  • Maybe I can launch a form in a separate tab: see sparkla note on target .js hook
  • I wonder what happens if on a form, two separate "for template" sections access the same field. Can they both share the field? Say one is hidden, but the other accesses it. (# link to the edit box for it.- what if hidden?)
  • bug interlingua- it appears that Henry I is saving an en string into articles, but then the real articles list is then saved as a separate list. This means that the es article starts out ", es" and so links to showfacts person/, es. It's the en default hack I put in. This needs to be done properly.
  • {{showfacts children}} needs to do what Showinfo children did: iterate through all family groups for which this parent is a coparent.
  • We can enumerate all coparents with a search on coparent =<this article>. Then for each pair, we enumerate all children- either in father's or mother's.
    We can't use this method using parameters. We don't know how many we have, so we could do a for loop or a brute force set of if's for each group number. Ok. Maybe we should fix For and use that.
  • Sidebar can be dynamic by querying for properties on a page. It might be possible to do filtering on the sidebar. EG: list unique surnames for the report with number of hits for each. Click to filter by that surname. This click fires a new query for the main pane with the added query items. A filter UI could be built up in this manner, similar to those of ebay or other commercial sites eg: toys R Us "Narrow by" column
  • Dates really suck.
  • Children on form need to be greyed if data is on spouse's form. Need to load the data, provide mechanism to prevent second from adding data on the children. Maybe hide away the children form on the father record. Tell them to add with the mother, but allow them to do it if they really want (situation- mother unknown data).
  • Need similar redundancy protection on the Siblings field. These are like notations fields, useful for quickly recording stuff during research, but can become stale later (inconsistent with full children list of common parents).
  • tabs that do biographies etc are not multilingual. subpage needs to be in language of basepage. /biografie, /Имиджи.
  • nav place has to be sorted, then finish the renames of subdiv, country. Maybe then import all the cities, counties and nation-subdiv1's of France/Germany/UK/Spain in de,en,es,fr.
  • Add MySql routines to AWB. Familypedians could download updates of valid person and place names for error checking and enhancement (adding coordinates, nation, etc.).
  • We aren't doing a lot about events articles independent of person articles. But we will want to do that, for example to recount a shared event- like a ship sinking, the microhistorical account of a particular Army unit in a battle. For instance, Iwo Jima, April, 1945: Company G, 2d Battalion, 24th Marines. Never mind about the big picture. We want to hear about what Frank and Bob did.
  • A common source of images is family events (trip to disneyland, to the world's fair, to yellowstone park, to the zoo, birthday parties, moving in to a new house). It may not be possible to formally recognize all these events, so perhaps we could create a few catch all events: outings, celebrations (birthdays, holiday, block party, anniversary dinner), achievement event (ballet/ musical performance/ art showing, boy scout award presentation, presentation of a professional or community service award). These events would be used to tag images as well as be used with the timeline object.
  • Property: Event article. Example value: 1939 World's fair, Sinking of the Titanic, wikinews events.
  • DanTman created a localizable Show/hide using JQuery. Details at w:c:dev:ShowHide. It may answer the language issue as well as the bugs with boxes that won't collapse on forms initialization.
  • siblings needs to be in a separate box- not showfacts person. For instance Caroline Lansdell (Abt 1839-Abt 1843)
  • An illegal date would not have three separate units divided by slashes or spaces. The first and the last, or the second and the third must be numeric.
    • This could be a subst, but it would look real ugly. Maybe bot test is best with regex, or maybe we get regex here? That would make for some arcane templates.... Maybe not.
  • Import: For Europe & most english speaking countries (Canada, India, Aus, NZ, south Africa, Kenya) :
    • I use my pywikipedia routines since I have the subroutines to pull in dependent templates and images.
    • Geo groups (use Daniel's toolserver filter):
    • In languages DE, ES
    • with commons pictures (esp. time period)
  • AWB generate Concepts that are analogs of the Born in wisconsin type autocats.
  • redlink text is not really a standard redlink, but also uses data known from the context of the source of the datalink. EG: an article that lists one article as a mother (redlink) could prefill the redlink form for child1, father, and possibly some of the siblings.
  • Add a BMD number in the pertinent locations on the form. Autogenerate something similar to this pattern:{{Cite web|url=|title=Index entry|accessmonthday=June 21|accessyear=2009|work=FreeBMD|publisher=ONS}} You probably can use currenttime on first execution of the script.
  • wikitext in code properties doesn't wrap- need to find out the CSS style he is using and set it for proper word wrap. {{Pre2}} probably has all the style options I want.
  • Need to go back to common sense to tidy up (blank) my experiments. for instance
  • add contributors and quality index to general info, possibly last edit.
  • death date is not getting time of day stripped like birth
  • really need to trial how the transcluded forms work. Transcluded sections will be central to form architecture.
  • set article person was generalized to set interlingua person, and localized strings put there, but it is more natural for death-causes (.es) to appear on the .es death form, not the interlingua page. This might be possible, since forms apparently allow the manipulation of the same property. However if interlingua also set the property, then there might be double storage. We could put a flag to check on that condition and abort interlingua save, but this error condition sophistication could be deferred as a feature for later. So we just redundantly do both.
    • Set interlingua needs to have the article setting rationalized to do it the same way as any other property. The solution doesn't have to be perfect, but the property names must be right so as not to become a blocking issue on translating info pages. On the localized form, the user does not see the property name in a list as on the interlingua page. They will simply see a label "Nombre de pila" and the form will store it into given name (.es).
  • Need to remove the comments from preload because new editor doesn't like them. Alternate mechanism for comments would be dummy templates.
  • migrations etc set date-m records 2-February because the property is not numeric.
  • date-y displays with a comma because it is not type date- but in form must be displayed with type numeric. Fixed.
  • date-m will display as Aug Sep etc instead of number if we make it a date too. Won't fix. No date type for just month alone.
  • replicate migrations and weddings or do the transclusion way. Then residences
  • Redundant copies of data problem: Maybe mimic the "shipping address same as billing address" pattern for forms. Add a copy from spouse for children, wedding details. This might just display the information and grey the box rather than redundantly copy. Maybe extend this for siblings (copy from parents) and do this for smwbasepage. This smwbasepage idea as properties held in common.
  • Reports do not aggregate multiples, so need a template to format those. EG wedding1 locality, wedding2 locality etc must be listed under Wedding locality. Similarly for date, listing partial dates with full dates.
  • It should be possible to cram several kml values into a string, then unpack them and send via MediaWiki:Smw service google earth with an accurate text label, a linkback to the familypedia article, and some annotations like time of the event for the pushpin. Maybe I could even do several points like for queen of the west. In property:coord, click globe icon for example google earth on place.
  • Synonyms: SMW recognizes redirects as synonyms.  ;-)
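
The full-date rule from the list above (three units divided by slashes or spaces, with either the first and last or the second and third numeric) is easy to try out as a bot-side test. This is a minimal sketch of that rule only; the function name is hypothetical, and punctuation such as a comma after the day is deliberately not handled.

```python
import re


def looks_like_full_date(text):
    """Apply the three-unit rule to a raw date string."""
    # Split on slashes or runs of whitespace, dropping empty pieces.
    units = [u for u in re.split(r"[/\s]+", text.strip()) if u]
    if len(units) != 3:
        return False
    first, second, third = (u.isdigit() for u in units)
    # "21 June 2009" passes on first+last; "June 21 2009" on second+third.
    return (first and third) or (second and third)
```

Partial dates such as "June 2009" or "c1530" fail this test by design, so they would need to be routed to the separate year and month fields rather than the date type.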

              • At the time of this writing, {{#ask: [[Charlemagne (747-814)]]|?father=}} produces→ null
              The guys are blowing the expensive expression quota (536/500), and so "father" is not even getting stored as a property because facts from info is at the tail of the article and is not getting fully processed. Need to provide them facsimiles of all info page templates, maybe children right away because that is what is blowing the charlemagne and william I limits.


    • create info page mirror, cleaned of any embedded wikitext.
      • upgrade any dates with embedded aft, before, c1530 to estimate field and proper year, month etc.
      • Kentucky or Tennessee -> Kentucky;Tennessee
      • extract formatted dates from burial, wedding, and baptism.
      • transfer place info to the street address field delimited by semicolons. (explicitly do runs scanning for known states and countries).
    • Clean sources field, replacing * and line breaks with semicolons.
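
The clean-up passes for the info page mirror could look something like the following in a bot script. It is a sketch under stated assumptions: the function names and the modifier table are hypothetical, BC years are not handled, and the sources pass assumes that the two things being replaced are asterisks and line breaks.

```python
import re

# Hypothetical mapping from embedded modifiers to an "estimate" field value.
_MODIFIERS = {
    "c": "circa", "ca": "circa", "abt": "circa",
    "aft": "after", "after": "after",
    "bef": "before", "before": "before",
}


def split_estimate(raw):
    """Split 'c1530' or 'bef 1066' into (estimate, year); None if unparseable."""
    m = re.fullmatch(r"\s*([A-Za-z]+)?\.?\s*(\d{1,4})\s*", raw)
    if not m:
        return None
    word = (m.group(1) or "").lower()
    if word and word not in _MODIFIERS:
        return None
    return (_MODIFIERS.get(word, ""), int(m.group(2)))


def split_alternatives(place):
    """'Kentucky or Tennessee' -> 'Kentucky;Tennessee'."""
    return ";".join(part.strip() for part in re.split(r"\s+or\s+", place))


def clean_sources(text):
    """Replace bullet asterisks and line breaks with semicolons."""
    parts = re.split(r"[*\n]+", text)
    return ";".join(p.strip() for p in parts if p.strip())
```

Returning None for anything the pattern does not recognise lets the bot skip and log odd values instead of guessing.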

other maintenance sweeps

    • create redlinked children and parent nodes.
    • correct misspellings of states

junk these

  • general info, general sources
  • blank: Template:Get grandparents & subs

wikia/ semantic engineer info requests

  • investigate whether there is a switch to turn off by default the bottom-of-page rdf/ facts field.
  • "$smwgInlineErrors = false;" so no type errors on blank fields
  • Are there metrics for measuring server load for a page, eg: NewPP limit report?
  • How do I check what the list of environment variables ($smwgInlineErrors) are?
  • What is the expansion limit for client side autocomplete? The forms docs say the default is 1000. Is that 1000 strings for each field, or 1000 for the entire page?
  • Bug autocomplete (top of screen)?
  • Is there a standard way to stick wikitext in a text field?
    • nowiki does not work, nor does pre. "not recognized" error.
  • Images stuck in properties of type page attempt display. Isn't there some way to indicate the right hand operands? EG image size, no display (prefix with :), etc? Currently, the default is |frameless|border|text-top]].

random bugs

  • templates that mess up the wiki banner. These are due to unclosed div errors and may be debugged with
    • template:documentation
    • navbox (probably tbar)

unrelated to smw

  • wikia header bugs- {{SMWintro}}- if no ::: indent, then ads display fine. With :::, you must put a new line between the noinclude and the documentation template call, or the ads go wonky. Who knows why- the divs are balanced in the documentation template, so it isn't that.
  • {{citation}} displays links wrong. There are tiny globes and the urls are displayed.

Major stuff


  • approach1: stay with the subpages thing, (this works because move takes all subpages).
    • In place of a single /info page, there are multiple /theory or /version pages. These correspond to an alternate theory of parentage for controversial individuals. They also more commonly correspond to alternate Gedcom records, many of which are for the most part redundant repetitions of the same material. Merging them becomes problematic since they may have been embellished on with some useful information prior to being re published with a new unique gedcom identifier.
      • To represent a gedcom file, a version would have links directly to the /version page of children, wife etc. Later, someone with a genealogy bot tool would be able to make assisted fixups of these records, collapsing them as possible. The gedcom ID, and possibly a signature of the key features of the file would be stored so that further copies would not be redundantly collected over and over.
      • Theory files would be directly queried instead of the main article page.
    • Every complex data item becomes a subpage, eg: events, residences, marriages, children by marriage (?? actually, it may best be an #ask for Father=X, Mother=Y, but it is unclear- there are complexities to doing it this way- see issues section.) Note: this will generate a lot of clutter for an article. No better way to do it because we don't have aggregated objects (effectively, there are no n-ary relations).
    • Actually, there are n-ary objects "many-valued properties". However, the current implementation has some limitations that make them unacceptable:
      • You cannot use the special Allows value property to limit the values of any element of a many-valued property.
      • You cannot use the special Display units property to control how a specific element appears.
      • You cannot set the layout of the values; they will always appear as a comma-separated list.
      • You cannot create a timeline query of many-valued properties.

  • variant to 1: All complex objects are subpages, but instead of putting the fields on an info page, put them directly on the main page.
    • objection- this disallows the alternate theory approach, where each /theoryN page is switchable by preference.
  • approach 2: KISS (keep it simple, stupid). The main page carries the properties for the individual. These are the ones accessed
    • These may or may not be copied from the theory/ versions subpages but in any case the field names would be identical.
      • Note, the main page is also the English version. Issue for multilingualism?
    • It doesn't really matter where the properties are stored if they are accessed with a get function. The get function could access what the dominant theory is etc. But also the theory/ version could be explicitly stated.
    • At first glance this is the most untidy engineering approach. Much nicer if each aggregate data item is stored in a separate page, then common fields like date are not redundantly replicated (birth date, death date, Occupation1 date,...) but are instead a generic date property of a birth, death, or occupation subpage.
      • Problem 1: canonically correct object orientation introduces complexity for template and tool writers, and this is not good given that many are enthusiast/hobbyists. Even simple queries are complex, for instance they will have to do indirection to get any value. Eg: (#ask (#ask for the birth page name) property from the birth desired)
      • Problem 2: bot tools will find it much easier to deal with simple data models (one page per individual), rather than have to do IO to get at a potentially huge list of separate pages.
      • Problem 3: Competing theories on different facts is an additional dimension of permutations. If our model is a separate subpage for the death event, then what happens when we have an alternate theory of the death event? Do we create a theory subpage of that death event subpage? Whoa.
      • Problem 4: Our model for multilingual is to store unique items in a /(lang prefix) subpage. This means that each of the event subpages would also have to have a language subpage. Not pretty.


  • No watchlist notification: Too much dynamic updating could present some undesirable side effects. For example, one way of doing children is to not have the parent article declare the children and the children redundantly declare the parent. Instead, have the declaration be in a single place- from the child article. That way you can query for has father and has mother and get the list for one set of children by one mother, and so on for each mother. The problem is that the article changes with no watchlist notification, so the user may not be aware that cherished articles are being changed.
  • Interaction with theories. Given the child parent model example above, does the child link to the basepage, or to one of the theory subpages? If theory page, what if some are dissimilar in unimportant respects, and the child is of theory 2, 5 and 7 versions, but not the others?
    • Maybe it is a good idea to redundantly link these, and fix inconsistencies via manually operated bot. EG: Father theory1 declares who it thinks the children are for each union, and theory2 declares another list. From the child side, they declare the father BASEPAGENAME from each theory page, enumerating any of the father theories that are excepted.
    • Social factors play into this- you want to have a site that is welcoming, but if you were ever in a bar with military guys, folks will quickly come to blows about whether one guy is being a phoney and claiming he was in a unit or participated in some action that he was nowhere near. So you have all these different versions from different perspectives. You have the Collective event stored in the Event: pseudo space, then you have all these assertions coming in from all these different articles asserting they were there. The military history buffs maintain the event integrity by listing the alternate theories etc, but have no interest in mediating these disputes. So the article declares the dominant theory of who it thinks were at the site, but also does an #ask for all the person Theory pages that claim the person was there. A bot shows where there are inconsistencies that the contributor may or may not want to resolve. The point is we make the addition of content relatively painless. Resolving such problems can be deferred if we have a good manually assisted edit tool.
      • The alternative is to have everything connected, and the military article only knows who participated by the soldiers who assert it. Similarly, the guy interested in just adding the article on his grandfather is not required to create a military battle article with all the particulars of that. He doesn't want to declare the battle article just to record the information he wants in his grandfather's article. If everything is connected, a lot of this work can't be deferred. In addition, the newbie could be subjected to all sorts of community pressure that he is mucking up articles on collective events (military battle) with unfounded assertions (that his great grandfather single handedly took the hill and saved his unit from certain annihilation).
  • Coordination with RDF- we will use accepted genealogy RDF and other data model ontologies where possible, but our focus is not on solving the database heterogeneity problems in the genealogy community. It's a very hard problem generic to all databases.
    • Background: accurate mapping of fields in databases is a hard problem that has not been solved even where the database is used by the same company, using the same software with the same syntax and the same schemas. For instance, the interpretation of the fields is oftentimes dissimilar between operating groups, so apples to apples comparisons are impossible- eg: what is/ is not included in net revenue figures? One operating group takes some operating expenses out of its net; others exclude these costs in order to inflate their apparent success in generating revenue. The field is the same, the entire record is the same, the software is the same, yet they cannot be meaningfully compared.

My thinking on this revolves around the question of which approach gets people to collaborate on high value common articles, rather than keep multiple essentially "owned" private copies of the same individual that, from the POV of the contributor, there are in fact disincentives to merge/ collaborate on. Some of the current ideas are recorded [[Forum:Google_rank#Dutchies|here]], but implementation of gedcom import has not begun so it is early.

Switchable views

  • Scope of investigation: Limited. Investigate this only to the extent of determining that we are not going to paint ourselves into a corner for future options. Find out how I would likely do it in the future.
  • The solution probably has to do with the way we do user preferences, and so the only way we have of doing that sort of thing is through css and .js - like how we do the date formatting thing. We set a state, the css sets the formatting based on the state.
  • Basic technical field of battle challenge: Due to server loading issues, you can't affect template code or otherwise generate custom articles per user. One way is to generate all views, and unhide the one that the user prefers. If the user disagrees with the dominant theory, they put in their preference of theory subpage as a property of the basepagename. The value is expressed in the html, and the client side .js code turns on or off the rendering of the data depending on which theory is preferred. That is one way of doing it.
    • The template emits span display none for the non dominant theories. So folks not logged in/ without accounts see the dominant view. Classes are assigned to these spans so they alternately have the display none overridden, and the dominant spans hidden. This is the work of the .js by consulting a hidden list of which theories are preferred by which users/ club names. Your preferences might state- show me the dominant view except in cases where this list of people think it should be something else: User1; My User; Project name; local genealogy group name; These groups might arbitrate among themselves what the more correct theory is. So subgroup collaboration can express an effective minority position.
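
The selection logic described above would live in client side .js in practice; the Python below only illustrates the decision it has to make. It is a sketch with hypothetical names: overrides stands for the hidden list of which users or groups prefer which theory for this article, and trusted is the viewer's own ordered list of names to defer to.

```python
def visible_theory(dominant, overrides, trusted):
    """Pick which theory subpage to unhide for one viewer.

    dominant  -- theory shown to readers without accounts, e.g. "theory1"
    overrides -- {user or group name: preferred theory} for this article
    trusted   -- names the viewer defers to, in priority order
    """
    for name in trusted:
        if name in overrides:
            return overrides[name]
    return dominant


def span_styles(theories, chosen):
    """Return the CSS display value each theory's span should end up with."""
    return {t: ("inline" if t == chosen else "none") for t in theories}
```

For example, a viewer whose list is User1; Project name would see theory2 whenever Project name has recorded theory2 for the article, and the dominant theory everywhere else.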


The forms code has some autotranslate thing that I didn't read up on. Have no idea what they are doing, but in any case, our problem is bigger, because now we must be polymorphic not just with respect to the user's theory preference, but also their language preference. Whoo boy.

  • approach1: Main page alternate languages can be base pages (eg for alternate scripts- Chinese, Greek, Cyrillic), but they point to the subpages of the English BASEPAGE, using the language prefix as a subpage eg /fr. Example info page approach for the Obama article[1].

RDF Ontologies

  • GEDCOM RDF mapping [2]

SMW stuff that doesn't work

  • Text input allows text values from a form that will mess up display of the page [3]
  • No way to input wikitext into a field.
    • nowiki does not work, nor does pre. "not recognized" error.
    • Hackaround: It would be possible to hack set values with a private markup, eg: ((wp>USS Monitor)) could be parsed and {{wp|USS Monitor}} could be displayed.
  • The docs say the following don't work, so don't use examples from sites that may use them:
    • Inverse properties, eg siblings[4]
    • Domain and range restrictions, eg Father[5]
    • Number restrictions and functional properties
    • Transitivity (?)
  • date type requires full date. This is lame because oftentimes we have year only, year and month, or circa type dates. (hackaround is to offer year, month boxes for partial.)
  • Forms don't support all table options. eg. background color style, see Form:Demo1, the place subbox should be background light green. Possible hackaround is to use html td code, and set the css fieldset and legend backgrounds to transparent. Fix is to look at what Yaron is doing in the php, but I won't get to that for ages, and this is sort of cosmetic stuff anyway. There are complex browser issues since IE apparently does things differently and special case code is necessary (what a shock). For instance here[6].
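
The ((wp>USS Monitor)) hackaround mentioned above amounts to a simple text substitution. A minimal sketch of that substitution, in Python for illustration only (on the wiki it would have to be done with string functions or an extension); the regular expression and function name are assumptions, not existing code:

```python
import re

# ((name>argument)) -> {{name|argument}}; the placeholder syntax is the one
# floated in the notes, the pattern itself is a hypothetical sketch.
PLACEHOLDER = re.compile(r"\(\(([A-Za-z0-9_]+)>([^()]+)\)\)")


def expand_placeholders(text):
    """Rewrite every ((template>argument)) placeholder as a template call."""
    return PLACEHOLDER.sub(
        lambda m: "{{" + m.group(1) + "|" + m.group(2).strip() + "}}", text
    )


print(expand_placeholders("served on ((wp>USS Monitor)) in 1862"))
```

The pattern deliberately refuses arguments containing parentheses, so nested placeholders are left untouched rather than half-converted.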

Missing special pages

  • No upload ontology
  • no upload vocabulary

Bug list

    • N-ary does work? Bug 11411 mentions that you can't have an n-ary relation composed of an enumerated type [8]

Bugs I ran into

  • If a nowiki /nowiki span is passed into a field that will be processed by arraymap, the page will not save. Further, on the occasion that I tried this, many other SMW person pages refused to load until the problem article was restored to normal, suggesting a high severity bug.
  • textarea doesn't pay attention to the size field in order to clip it to a smaller size. It pushes the layout past the page width boundaries unless great care and table gymnastics are resorted to. workaround is to assume the default size and to work around it with cells. See long form families table especially. The sources and notes text area fields interact severely with the side picture without great care.
  • complex layout causes form to "forget" field attributes. EG: Partial form set death does not pick up default values for death date-approx or calendar. Works fine on main form. Image width has correct values set in property, but if the field does not redeclare its property, it will not display the pulldown list.
    • workaround: declare property= on the field statements
  • declare does not work if there are spaces after (perhaps before) the equal sign.
  • date displays time 0:00:00 on second, third, fourth... forms when editing with Form:person long form. If the page is instead edited with form:set death or form:set wedding1, then the value is set properly.
    • workaround: clip the minutes seconds using the #time function in the set templates. This is a temporary patch. #time needs a lot of fixup to handle single digit years etc. The real solution is to fix whatever the extension is doing. May be a bug, or caused by it getting confused by some of the table formatting stuff I put in the form to make it look less voluminous.
  • Upload bug when loading geer jpg
PHP fatal error in /usr/wikia/source/releases_200905.1/extensions/wikia/WikiaSpecialUploadInfo/WikiaSpecialUploadInfo.php line 19:
Call to a member function getTitle() on a non-object 
  • if you add an underscore to a parameter name instead of a space in an article, the next time the article is saved, it will repeat it as a parameter to other unrelated forms with random values. Example: reload [9] as current version then do a null edit using the person form.

Limitations & workarounds

  • As of the time of this writing, for autocompletion to work, it needs pages, not strings. Setting autocompletion on property= some page property works fine, but not with some string property. Properties do not pick up subproperties, so it seems to me that categories is the best way to go about this- otherwise you have a huge flat namespace that you have no option of segmenting in the future (eg just fill in with counties in Scotland).
  • Autocompletion is fine locally when there are a small number of values. With large numbers, the page load can be very slow, and you may run into a 1000 item max (not sure this is per page or per field). Remote autocompletion has no limits, the page loads faster, but the autocomplete is slower.
    • workaround plan: autocompletion on category|remote autocompletion.
  • As of the time of this writing, autocompletion box displays at the upper right corner of the page. It is hidden if the page is scrolled. This is very bad on a large page.
    • workaround: position all autocompletion fields on the first screen full of data.
    • Do not place free text box at top, move it to end of article. This will make regular wikitext reading of the article disconcerting for typical WP users.
    • recommend users go to smaller forms for most of their editing.

The following section is obsolete after the discovery of how to use stringfunctions to crack the page value so that a string may be extracted. See "Major discovery" below.

  • If a property is a page, then it can't be used for conditional ifs. If you need it to be a page, then to get its value you have to store it as a string property as well. This makes code complicated. For example, "La Salle county (Texas)" can't be used in a line that displays city, county, state, because you duplicate Texas. So you need to do a [[La Salle county (Texas)|La Salle county]]. You can make the short name a property of that article, but if the property is a page, you can't do the decoration with square brackets. Same problem with displaying surname. The article name has (surname) appended. This shouldn't be displayed, but to strip it, you need access to the string value.
    • Never store articles as pages. Always use strings and use square brackets to display as a link.
      • Counter argument: Properties of type page allow redlinks to go to a default form input. We are doing people that way, so maybe we should do everything that way, and just use the & template to do the dereferencing.
  • My old Info pages stuff is much more efficient than SMW at query intensive operations. EG, with a version of Jan Willem te Kolstee (1830-1895), a refresh can take up to 4 minutes and in many cases will time out. Processing tree is exactly the same- the only difference is that it is doing an #ask for a parent instead of a template call to return the parent from the info page. This uses the SMW version of the showinfo ancestors code. If you have 4 minutes to blow waiting for a page, click this version of the Jan Kolstee article that uses the SMW code.

Weird/ anomalous stuff

  • {{#ask: {{{1|}}} | ?pagename = }} returns lots of pages. Does it recognize pagename as a key word? or is it undefined when first operand is null?
  • form weird syntax: {{{info|page name=<Author[First name]> <Author[Last name]>}}} in page
    • "author" is the feeding template name for the form, and first name/ last name are parameters from it.
  • Gotchas
    • if you forget to insert the = after the property name, it will print out the field before the value:
    • If you put a link in a text field, (eg: [http:blah blah]), the property will not be stored, and subsequent #ask's won't work. This may be true if there is any wikitext in the field. The template page will display the property set statement, since it is not executed. Solution may be to urlencode all text fields, then unencode them.
    • When you save a page that has properties, it nukes all previously saved properties for the page. So if you remove a line that set the property, the property won't exist anymore when the page is re-saved. So on every save, every set property statement must execute with prior (or new) values, or they go away. Weird.
  • properties with multiple values are delimited by commas which in many cases cannot be substituted (you can use sep if outputting lists though). This is a fatal problem for fields that have commas within them.
    • workaround: if the multiple values can be output as pages, then what you do is #replace string ], with ]; (or other unique delimiter in place of semicolon). Then you can process normally using arraymaptemplate. I do this for child list processing.
  • Use of semicolons as a delimiter is deprecated. If ever there is text with an html entity eg &laquo; or &#193; (entities themselves end in a semicolon), then the parse will screw up totally.
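
The child list workaround above relies on the fact that every page link ends in a square bracket, so "]," can only occur between values and never inside a title. A rough sketch of the same idea, in Python for illustration (the wiki version uses #replace and arraymaptemplate; the function name and sample titles here are made up):

```python
def split_page_list(raw, delimiter=";"):
    """Split a comma separated list of [[page links]] without breaking
    on commas that occur inside a page title (hypothetical sketch)."""
    # A comma directly after "]" can only be the separator between two links.
    marked = raw.replace("],", "]" + delimiter)
    return [item.strip() for item in marked.split(delimiter) if item.strip()]


# Invented sample value: the first title contains a comma of its own.
print(split_page_list("[[John Smith, Jr. (1800)]], [[Mary Smith (1802)]]"))
```

As the semicolon caveat above suggests, the delimiter has to be something that cannot occur in the values themselves, so a different character may be a safer default than ";".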

Cool stuff

  • Timeline output (for all subprops of event date) [10]

Complex types

  • Marriage eg. Edward Riggs (1589)/Holmes-Riggs
  • residences (big because of censuses)
  • All other events with multiples: Occupations, Education
  • Migration event emigration: (from country, ship, ports), moving: mode of transport, reason, motivation
  • Citations? whoa. That certainly is complex, but will the user have to create a separate page for every one of these jokers?

Small pages

Something goes against the grain about having all these tiny pages sitting around. The tough thing about these events is naming them.

  1. They aren't owned by one person- eg the husband. What if it turns out that the name of the person gets changed etc. Do they then have to move 20 of these bitty pages every time they rename? Hmmm. Maybe a bot can do this. And who owns them? Does the husband own the marriage event or the wife? If the naming uses both people's names aren't you just doubling the chances of a move due to a changed birth or death date? OK, maybe these events are owned by one of the parties- doesn't matter who- it's just a unique name.


  • Opening via form on redlink- To property "occupation"(s), add value has default form.
  • ??pagename should be a subpage, but user has entered the page name. So they could muck it up. OK. Maybe they enter a string, then save the form. On next form load, the template doesn't display the string, and instead creates the real field of type Page, decorating it with the proper prefix.
    • Note that for shared events like marriage or migration, say the husband already created a marriage page. Wife needs to poll the husband to see if a page already exists and link to that first.
  • Maybe there is a smarter form way of doing this.


  • Pro1: Removes a tremendous amount of clutter from main page.
  • Pro2: Tidies the Property namespace (no occupation8-locality, etc.)
  • Con1: This could potentially disrupt workflow. You have to open a new page to type in marriage info.
  • Conclusion1: Being a subpage doesn't imply anything. It has to go somewhere, and being top level means it will collide with other similar names. So keep these complex types as a subpage, use some logical naming convention, and if folks want to get fancier later, they may do so.


  • Be careful with template logic and the #declare statement. Recall that declare accepts template parameter names, not values as the right hand operand. Template:Set families checks for value=Yes, but returns parameter name.
  • Semantic forms and SMW occasionally have differences. For example, a checkbox on a boolean parameter may return "Yes/No" as the parameter value for the corresponding template parameter. However, SMW will return a boolean as either "true/false", not "Yes/No". This means that logic on the form template parameters must check for Yes/No while SMW logic must check for true/false.
  • you can pass arbitrary numbers of parameters using format=template. eg:
Template:Mycoolformatter will then be called with parameter color=blue and parameter width=100%.
  • to have a query return the page name not as a link, but as a string, add the operand "|link=none " to the query.
  • Major discovery- It is possible to derive the string from a page even if it has been returned as a link. You can use string functions on a returned page link, and it turns out that it is just wikitext for a link with bar and right hand value the same as the left. So you basically subtract the decoration characters from the total length and divide by two, and you have the substring length; the start offset is fixed by the leading decoration.
{{#sub:{{#ask: [[George Spencer Geer (1836)]] | ?father=}}|3|{{#expr:({{#len:{{#ask: [[George Spencer Geer (1836)]] | ?father=}} }}-6)/2}}}}
returns: ble class="sortable wikitable smwtable">  [[:George S
This means we can use pages with a great deal more freedom. They are never opaque, even redlinks.
Namespaces other than 0: for pages not in namespace 0 the string is at offset 2, not offset 3 as for ns:0. I don't know why you'd ever want to use type page for an image, because it tries to display it. This might make sense for a wiki with images that are already the desired display size, but they seldom are. Examining the decoration, I see that they are constrained, so this is ok to use. The decoration currently is: |frameless|border|text-top]].
  • Subproperties means you can search for groups. EG if birth date, death date, marriage date are all subprops of event date, then searching on event date picks up hits on all of them.
  • Forms have a preload option mw:Extension:Semantic_Forms#Preloading_data
    • red text with preload- passing prefilled template values. (EG all the reflexive properties- husband and wife passed to child, common parents sent to siblings, father mother passed to parents for coparent fields... ) We don't do the expensive ifexist call, but do a #show <article name>|? sex as the property. If there is null return the article doesn't exist. Then using mw:Extension:Semantic Forms#Preloading data template-name[field-name]=field-value, we pass the values via url call- to Special:AddPage. Then make the link red with span style="color:red" . The red text is not produced after the fact, but integrated with the query that is filling the table with values. This is to be integrated with the localized name mapping that is done for populating tables (Henry I -> Enrique; The Hague -> Den Haag). We do the existence check and this red link procedure at this time.
  • There are a set and declare statements: [11]
  • default form for a namespace is set at genealogy:File, genealogy:main and so on...
  • The inline query parameter "default" is somewhat nonintuitive. It does not fire when there is no text, it fires when the query portion of the statement fails. They are not the same. Consider the following example article that does not have a birth locality property set:
{{#show:Abraham Hunsberger (1786-1860)|default=unknown locality|?birth locality}} returns nothing
This query does not display "unknown locality" as one might expect. Instead it displays a null. This is because the query portion actually succeeded. There is indeed an article "Abraham Hunsberger (1786-1860)", so the default text was not invoked. To get default to fire, we must construct a query that will fail. We ask for birth locality to also have some value:
{{#ask:[[Abraham Hunsberger (1786-1860)]] [[birth locality::+]]|default=unknown locality|?birth locality}} returns "unknown locality"
This presents intimidating syntax for contributors. It would be possible to do a template such as
{{#show:Abraham Hunsberger (1786-1860)|template=if blank{{!}}return=unknown locality|?birth locality}} returns "unknown locality"
Where template {{if blank}} simply prints the value of parameter "return" if the result of the query is blank.
  • Care must be taken in query templates because hidden values may be stored in strings returned from queries. For example, apparently identical strings will not evaluate as equal in #ifeq because there are hidden property settings SMW::on and SMW::off. After these are stripped with #replace, the evaluation operates correctly. See Template:Showfact/aux1 for an instance of this.
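
The offset arithmetic behind the "Major discovery" item above can be checked outside the wiki. The sketch below is Python for illustration only and assumes the returned link has the form [[:Title|Title]], which is what the offset 3 and the "-6" in the #sub/#len expression imply for namespace 0; the function name is hypothetical.

```python
def title_from_link(link):
    """Recover the page title from "[[:Title|Title]]" (namespace 0 only).

    Decoration is "[[:" (3) + "|" (1) + "]]" (2) = 6 characters, and the
    title appears twice, so the title length is (len - 6) / 2.
    """
    start = 3                          # skip the leading "[[:"
    length = (len(link) - 6) // 2
    return link[start:start + length]


print(title_from_link("[[:George Spencer Geer (1836)|George Spencer Geer (1836)]]"))
```

This mirrors the wikitext expression term for term: start offset 3, and a length of (total length minus 6) divided by 2.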


  • July 22: Template Header is problematic. Showfacts person is always placed prior to whatever is in the free text area of the semantic form. Maybe there is a way to bypass, but for now, tabs lang goes in showperson facts, and the bio smw is moved after. Semantic forms does this with or without the free text area- it always rearranges the article, so a form either manipulates stuff at the very top, or the very end of an article. (maybe not. What about partial forms?)
  • place: locality= town, state, country, coordinates- all use Keyhole Markup Language constructs
  • date: Let's assume not datetime since only astrologers really care about the time of day
  • calendar- Julian dates are implied for the typical period prior to 1582, but we will need calendar property for Roman/Julian/Gregorian.... Chinese, Hebrew and Islamic calendars will be important when we get some folks interested in those.
  • ISO8601/ dtend stuff. I see no need to normalize all dates into the proleptic Gregorian calendar as required by ISO8601. Nor do I see any need to contort dates to fit non exclusive date requirements. microformat dtend means event ended before dtend. So death date is day of death plus one. No, I don't think any of our users will understand that reasoning one bit.
  • adopted. Ok, sensitive subject. Genes= family? Or does love/family culture = family. Is genealogy about nature or nurture. Well, both. Ok, so we need to make genetic inferences with the ydna and mtdna stuff, so we need to know birth mother, and birth father. So these are implied values from has_mother property. However a person can override. You can have_mother one person, but have_birth_mother someone else.
  • ok, all relations should go from bottom up due to recommendations from SMW folks. It makes queries efficient/simple if you say town "has location in state" Michigan, rather than enumerate all towns in the Michigan article.
    • Implies, we do children with a query for all articles that Has_father foo, or has_mother bar.
      • This means that every pretender to the throne could potentially assert a new spouse for the king, and the way that you find out spouses is you query for children, then ask for mother. Technically, I see the reasoning of not redundantly storing the information, but maybe they cache it. Like every time you save, it stores the query result into the property. This gets into the postgres triggered values problem where you would have all these queries cascading, potentially melting the server. I wonder how they protected against that. Do we do it the abstractly clean way, or the straightforward way that doesn't involve any learning curve with the data model?
    • How do we do implied queries. Is that how the implies statement works in birth date (implies birth year)?
  • Why do I want to declare as a page? I get the autofill in thing but I can't do ifeq's on it because it has the wikitext decoration on it, so the evaluator doesn't see that they are equal. See query on govwiki user page of Harry Reid. It doesn't match committee.
    • Is there some way of lifting the name property from the page?
  • What is the load on the servers with all this crap? I hope the stats in the html are accurate...
  • Ok, complex data types they recommend going to a separate page. Might try that on a few, but geez, won't that make querying a little complicated? First get the union (mating partners) node, then get the marriage date, place attributes, and the evidentiary data backing up those assertions.
  • Family tree DNA order for YSTRs. There are FTD-12, FTD-25, FTD-37 and FTD-67. Each is a subset of the next, but if we just put them as strings, then we can query for all ftd-37= 12-24-14-11-14-16-12-12-12-13-... etc.
  • has property foo: we can change all names to has_father, has wife, etc. but for what purpose. All these fields are "has a" relations, so what's the point. Why not keep it short and sweet? Why bother users with the KBMS theory? Maybe only use with stuff like "part of union" when to use the property alone would be ambiguous.
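
If the YSTR panels are stored as dash separated strings as suggested above, comparing two people on a smaller panel is a matter of comparing leading markers. A small Python sketch for illustration; it assumes that a smaller panel is simply the leading markers of a larger one, and the function name and sample values are invented, not real test results:

```python
def matches_on_panel(haplotype_a, haplotype_b, markers):
    """True if both dash separated marker strings agree on the first
    `markers` values (assumes smaller panels are prefixes of larger ones)."""
    a = haplotype_a.split("-")[:markers]
    b = haplotype_b.split("-")[:markers]
    # Both strings must actually contain that many markers.
    return len(a) == markers and len(b) == markers and a == b


# Invented sample values, truncated to a handful of markers.
print(matches_on_panel("12-24-14-11-14-16", "12-24-14-11-14-16-12-12", 6))
```

Someone tested only on a small panel can then still be matched against people tested on a larger one, at the resolution of the smaller panel.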

Potential case studies


  • very elaborate gedcom origin record [12]
  • with _UID, !LINKS, !OCCUPATION: embedded GEDCOM field names [13]
  • Raw gedcom with sources, evidence/ event types, indirection examples [14]

DNA / YSTR / Haplogroup

  • Family tree projects

Development sequence

  • First work used subpages to store generic properties. Form:SMW-test3 created an event subpage eg George Spencer Geer (1836)/death
    • Since such compound structs are not possible from the main page, this was the best "clean" approach for data aggregates eg, death.state etc. Although it is best practice from an engineering perspective, and there would have been an economy of property names, it was abandoned for many reasons not the least of which it would have been more difficult for novices to access the values. The approach inherently requires indirection, and therefore code that references data must do a couple #ask's. Further discussion may be found above in the Model issues section.
  • Main page forms and partial forms were shown to be able to edit a normal article with large volumes of properties. Forms were shown to be reformattable for more attractive UIs. Partial forms were shown to be able to produce bite sized chunks so that the user is not overwhelmed. New article form "short form" was demo'd for simplicity of initial article creation.
    • Double flushing emerged as a problem. Solution might be to merge the set variables templates into the infobox display. eg:
      1. infobox header,
      2. set birth cell template
      3. set baptism cell template
      4. set death cell template
      • This design would eliminate double flushing, but for items displayed elsewhere on the page (eg gallery section with images from birth, baptism, weddings)- these would require a double flush.
      • Re-ordering: editing with partial templates will reorder the sequence of the infobox cells. EG using form:set death would place the death cell prior to the birth cell. Bummer.
      • Alternatively, one could tie the setting of elements directly to the display of those elements. So if there were a gallery section, Wedding1 photos would not be set with the wedding1 dates and location, but in the set gallery template. If everyone agreed to a standard layout, then this might be workable. From the researcher standpoint, it makes data input workflow more haphazard, since it would not be possible given the current code to display the wedding1 data together, since they would occur in different templates.
      • As of 10 May, the idea is to tolerate the double flushing at least until the reordering problem is addressed.
  • "Everything" Form (Form:Person long form) cannot be burdened with rare and voluminous items (eg occupations 1-8, Military events 1-8, Weddings 1-8, Residences 1-8 and so on). The idea is that for the main events, indicate the first event on the main form and to use partial forms for the overflow. Less common events like Bar Mitzvah, sealing, adult baptism etc won't have any mention on the main form.
    • can forms be launched from forms?

Field renames/ changes

  • Spouse -> Partner, Children-s1 -> Children-p1 etc per Property talk:Spouse discussion
  • fm-children-S1 -> fm-children-p1 (no caps in property names)
  • nuke all fm-emigration
  • fm-attendees -> fm-people involved (generic message- must cover cases like remains)
    • fm-migration1 people (if message specific to event is needed)
  • form fields can apply type enforcement ("allows value") if specifically designating a property. This should use the superset property declaration so these may be easily changed in the future.
    • ...radiobutton|property=migration1 date-approx -> becomes date-approx
  • remove superfluous dashes eg date-approx becomes "date approx" (date modifier)
    • Actually- nuke date approx, replace with date modifier.


  • all events shall now place event at last so that numbers may be postpended. This applies to all events
    • This change is no longer proposed. Reason is ease of use for inline tagging. See section below
    • wedding1 date becomes date wedding1
    • "birth locality" becomes locality birth
  • Sex -> gender per foaf. Unnecessary. sex is three letters, and people are used to it.
  • property People depicted-> Depicts people? (FOAF Depicts) depiction? people depicted is natural and will be a synonym of foaf name


  • weight, height, DYS YSTR values each as separate fields.
  • ethnicity?
  • Property:Alternative father1, or Father-a, Father-b etc all as subprop of some "fathers" supercat? Or maybe just father? (Maybe not the latter, since you wouldn't be able to search just on the likely father, which would be the meaning of the father field.) Maybe not numbers but letters, so that dominance is not implied by numbering. Do we set Father as the dominant theory field, or do we have a dominant theory field at all? (EG not "father" if there is any controversy. Father becomes father-a, alternate becomes father-b....) Hmmm.
  • Matrilineal line eg: eve project. Name by oldest known ancestor? or make one up? or base on mitochondrial number of some sort?

SMW does not mean ancestors are reduced to statistics

  • It is not required to use forms to add structured genealogy data. This information can be added inline, rather than using forms and person infobox. For an example, see Agnes Margaret Mucha (1893-1965).
  • The norm for many genealogy sites is to reduce ancestors to lists of tabular material, and much of this is driven by the way database software works. The obvious nexus between wikis and structured databases is Infoboxes, and that naturally has been the focus for microformats. It also presents low hanging fruit for SMW, through use of the Semantic Forms extension. However, the core of SMW frees wikis from the tabular approach. It is entirely natural for family members to present the story of their family as a story, and they will prefer Familypedia on that basis. They may shun the tabular approach, and SMW entirely supports that: full narratives that happen to have structured data within them.
  • It may turn out that we want to suggest that everyone use an infobox for quality reasons (less chaotic look and feel). However, even in that circumstance we might have people putting some optional information inline so that the infobox is less cluttered.
  • Observation- this affects naming. From an engineering perspective, the data types are central. Dates are a general type, with multiple forms but they are all dates. Events simply are the variants eg date birth; date death; date wedding; date yada yada yada. But from the user's perspective, the events are central, and the various details about them are the variants. Birth date, birth county, birth state, birth people present, birth notes. Maybe we keep the naming the same. Today, 25 May, I think so. Okay. This would chuck the whole inversion thing. It would be wedding1 date etc. Hmmm. I suppose the variant can go in the middle with not that much difficulty. Code will look a little uglier, but what the heck. Ease of inline naming is more important. I just don't know that folks will do it that much, or that we want that to become a dominant way of stating things. Hmmm. If the community decides they like inline, then we would be painted into a corner if they wanted to rename all the properties because that would be really tough. So let's name assuming ease of inline use, and just accept the slightly greater complexity in the code. Face it, no one touches esoteric templates anyway, and in the grand scheme of things this "complexity" is trivial for an experienced template writer.
  • Ease of use and simplicity of template coding is possible because Properties support redirects. For example, property:birth nation-subdiv1 is technically accurate, but user hostile. We can have Property:birth state, Property:birth province and Property:département de naissance for the purposes of inline coding, and all will map to Property:birth nation-subdiv1.


I don't see any harm in keeping up to date with this specification, but it is way immature (they have first name as well as given name, surname as well as family name...), and makes some requirements (eg surname is a string, not a page) that we don't want to observe. However, we should keep up to date on these because this will relate us to the larger world, and we want to make sure our semantics of usage is as close to theirs unless there is a very good reason why not.

Descendants/ Ancestors encoding

General problem definition: Exponential expansions are generic to this problem domain, so optimizations will be necessary regardless of how strong our software engine is at any point in time. In general, we will be using local processes to offload this to the client machines in order to execute massive depth searches or network walking that would time out on the server.

As of the time of this writing, the Semantic mediawiki engine allows traversal of the tree of relationships, but after about the third generation (either from top down children tree walking, or bottom up reading of father mother links), the query response times become excessive.


  1. Ancestors: Cache the ancestors tree in a single string field for each person. For example, cache 3 generations of ancestors using the string packing method I developed for Showinfo ancestors (actually, you could do 5 or eight generations too, but 3 seems like an ok place to start out). This cache can be reset from the form, and is not reloaded every time. This way, an ancestor tree could be loaded in lightning fashion because all you are doing is string operations not hitting the disk like a query would. The way the cache reset button works is that you put a radio button on the form called change this setting to reset ancestors. User clicks it, the value is passed to the template. The template code executes, and compares the setting on or off to what the stored (#ask) setting is. If they are different, do a refresh and update the property value so the next time the function is called, the values will be the same and so the refresh won't happen. Simple.
  2. Descendants: Alternately, by storing the values as multiple value properties, it would be possible to do set operations on descendants or ancestors. For example, to see if you are 2nd cousins with someone:
    • Assuming a number convention from parents as generation 1, then a 2nd cousin would share a generation 3 ancestor. Query for ancestors with ahnentafel number from 2^3 to 2^4 minus 1. Compare with set from the second person, and you have the intersection with one query. Pretty neat.
    • Why a mirror descendants tree? Eg:
      • Inbreeding analysis: Compare the list of descendants for all brothers and sisters. eg descendants brother1 AND (descendants brother2 OR descendants sister1 or sister2, etc). Any intersections are inbreeding candidates.
      • Social networking: list all the known living descendants of a given individual living in your country, in your own city. Maybe much of this would be private names. Ok- so list all those who died in the last 50 years. You could do that by anding the death date with the descendants list. Right? You can AND a Page list with an N-ary list, right? TBD- I need to test that one to see if they implemented it....
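
The 2nd cousin check described above is a set intersection over one generation's range of ahnentafel numbers. A minimal Python sketch of that arithmetic; the dict-of-ahnentafel-numbers representation and the function names are assumptions for illustration, not how SMW would store or query the values:

```python
def generation_range(g):
    """Ahnentafel numbers belonging to generation g: 2^g .. 2^(g+1) - 1."""
    return range(2 ** g, 2 ** (g + 1))


def shared_ancestors(ancestors_a, ancestors_b, g):
    """Names that both people have in generation g.

    ancestors_a / ancestors_b: dict mapping ahnentafel number -> page name.
    """
    names_a = {ancestors_a[n] for n in generation_range(g) if n in ancestors_a}
    names_b = {ancestors_b[n] for n in generation_range(g) if n in ancestors_b}
    return names_a & names_b


# Placeholder data: generation 3 is numbers 8..15.
person_a = {8: "Adam", 9: "Eve", 10: "Fred"}
person_b = {12: "Adam", 13: "Eve", 14: "Ethyl"}
print(shared_ancestors(person_a, person_b, 3))
```

A non-empty result for generation 3 means the two people share great-grandparents, which makes them 2nd cousins or closer; generation 2 would test for 1st cousins in the same way.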

20 May Observations:

  • The ancestors list as an N-ary tree is probably what AMK and rtol need for analysis. For exhaustive reports, they probably also need it expressed as a string since it would be able to process very deep trees.
  • It definitely would be handy to have an external process walk these trees to compile these lists, since they can visit different articles and independently update them. Templates would be very challenged to do that. AWB is looking more and more necessary.

Test observations

  • Wow. Look at the expansions- this caching could be expensive. Adrianus Korver (1788-1846) has 7 children, and Pieter Korver (1817-1870) his son has 6. So if that's an average, then Adrianus has 42 records, and that is just for detecting marrying cousins. If the pattern holds then you have possibly 252 descendants in gen3 (42 × 6), and over a thousand in gen4. Let's assume that storing is not a problem- worst case for gen3 you add say 1K descendants times the 20 bytes for a page name, so 20K per person article, times 100K people is 2 gig (times 20 cents/gig = 40 cents).
  • Ok, how about processing. You could hack up some things to do this for a few generations deep but ultimately you come up against some nasty limits so you really don't have a uniform solution with any kind of future. One tractable way of going about this is to do it externally via AWB and cache intermediate results at the client side, then store it back in a descendants or ancestors fields either using template style, or using explicit double colon style. OTOH, you are tied to a tool that not many people will be able to run, and has an indeterminate future. Maybe an iterative cascading template approach would work, where each template just does a little bit of the problem- say just goes to 2nd generation of processing, then a 2nd pass picks up the results of the first pass and concatenates them, and so on...
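The back-of-envelope numbers above can be checked with a few lines. This is a rough sketch under the same assumptions as the text (7 children in the first generation, 6 per person afterwards, 20 bytes per page name, 100K person articles); the function names are made up for illustration.

```python
BYTES_PER_PAGE_NAME = 20  # assumed average length of a page name


def descendants_per_generation(first_children: int, later_children: int, generations: int) -> list:
    """Descendant count in each generation, assuming constant family sizes."""
    counts = [first_children]
    for _ in range(generations - 1):
        counts.append(counts[-1] * later_children)
    return counts


def storage_bytes(descendants_per_article: int, articles: int) -> int:
    """Bytes needed if every article caches that many page names."""
    return descendants_per_article * BYTES_PER_PAGE_NAME * articles


print(descendants_per_generation(7, 6, 4))   # [7, 42, 252, 1512]
print(storage_bytes(1_000, 100_000) / 1e9)   # 2.0 (gigabytes)
```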

Walk through 1

  • Gen0 Adam
  • Gen1 Adam 2; Eve 3; child's ancestors: = 2^1 (+1)
  • Gen2 Adam 4; Eve 5; Fred 6; Ethyl 7; Darren 2; Samantha 3 ancestors: take Gen1, father side and add 2, mother side add 4.
  • Gen3 Adam 8; Eve 9; Fred 10; Ethyl 11; Derwood 4; Samantha 5
  • Gen3 (Adam on mother's side): Adam 12; Eve 13; Fred 14; Ethyl 15. Formula:
    • If ahn <4 add 2^1 if from the father's side, 2^2 if from the mother's side
    • If ahn <8 add 2^2 if from the father's side, 2^3 if from the mother's side
    • If ahn <2^4 add 2^3 if from the father's side, 2^4 if from the mother's side
    • Refined: note the centrality of the generation#. The general rule is you always add 2^generation number if the tree is from the father, and add 2^(generation number+1) if from the mother. So how do we derive it? Finding the exponent of 2 is a log base 2 transform. #expr has the natural log (ln), so we can do this. To get log2 from ln, multiply the ln of the ahn# by 1/ln(2) and you have the log base 2. 1/ln(2) is about 1.44, so we just multiply by that and round down to the nearest integer. That's your generation number (exp2base).
      • Formula: exp2base= floor( ln(AHN)*1.44 ) btw- floor means round down to the greatest integer less than or equal to x. Caveat: 1.44 slightly undershoots 1/ln(2) (about 1.4427), so for exact powers of two (ahn = 2, 4, 8, ...) the result comes out one generation too low; dividing by ln(2) instead of multiplying by 1.44 avoids that.
        • ahn=1734, generation = 10
          • adding 1734 from a father's tree becomes 2758
          • adding 1734 from a mother's tree becomes 3782
            • calculation: {{#expr:1734 + 2^(floor( ln(1734)*1.44 )+1)}}
      • this power of 2 "exp2base" number is just another word for Generation #.
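The renumbering rule from this walk-through can be sketched outside of #expr as follows. The function names are hypothetical. Instead of the ln-based formula, this version uses the integer bit length to get the generation number, which gives the same result for 1734 and stays exact for powers of two (where ln(2)*1.44 is about 0.998 and would floor to 0).

```python
def generation(ahn: int) -> int:
    """Generation number of an ahnentafel number, i.e. floor(log2(ahn))."""
    if ahn < 1:
        raise ValueError("ahnentafel numbers start at 1")
    return ahn.bit_length() - 1


def renumber(ahn: int, from_mother: bool) -> int:
    """Move a number from a parent's tree into the child's tree.

    Add 2^generation for the father's tree, 2^(generation+1) for the mother's.
    """
    gen = generation(ahn)
    return ahn + 2 ** (gen + 1 if from_mother else gen)


print(generation(1734))                   # 10
print(renumber(1734, from_mother=False))  # 2758
print(renumber(1734, from_mother=True))   # 3782
```

The same rule also places the parents themselves: number 1 in the father's tree becomes 2, and number 1 in the mother's tree becomes 3.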

Ahnentafel observations

Volume test observations

  • Nothing seems to break. Very large Ahnen values can be stored, and very large numbers of ancestors can be crammed in the Ahnen field.
  • Querying Property:Ahnentafel (currently it is a many valued property) is problematic.
    • We'd like to be able to treat the N-ary property as if it were a normal query result. For instance we would like to process a list sorted by article name, or by Ahn value. Well, you can't do either as far as I can see. If display is all you want, then no problem- chop it up and put it in a table with class sortable.


But if you want to cull duplicates, you need to do successive string searches. If folks wanted to allow multiples eg the person is both their great grandmother on their father's side, but great great grandmother on their mother's side, then you'd do it this way. IMHO that is a rare kind of demand, and it is more desirable to keep the list as compact as possible and cull any duplicates at save time. So as we are adding, we just take each mother side ancestor and do a #pos on the father side ahnentafel list. If there is a hit, we don't add.
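A minimal sketch of culling duplicates at save time, assuming each parent's tree is available as a list of (ahnentafel number, page name) pairs. That layout and the function names are assumptions for illustration. The template version would do a #pos string search on the father-side list; here a set lookup stands in for it.

```python
def renumber(ahn: int, from_mother: bool) -> int:
    """Shift a parent's ahnentafel number into the child's numbering."""
    gen = ahn.bit_length() - 1
    return ahn + 2 ** (gen + 1 if from_mother else gen)


def merge_parent_trees(father_tree: list, mother_tree: list) -> list:
    """Build the child's list; mother-side names already present are skipped."""
    merged = [(renumber(ahn, False), name) for ahn, name in father_tree]
    seen = {name for _, name in merged}
    for ahn, name in mother_tree:
        if name in seen:
            continue  # duplicate ancestor: keep only the father-side entry
        merged.append((renumber(ahn, True), name))
        seen.add(name)
    return merged


# Hypothetical trees in which "Eve" appears on both sides.
father_tree = [(1, "Darren"), (2, "Adam"), (3, "Eve")]
mother_tree = [(1, "Samantha"), (2, "Fred"), (3, "Eve")]
print(merge_parent_trees(father_tree, mother_tree))
```

With this data the mother-side "Eve" (which would have become number 7) is dropped, so the list stays compact at the cost of losing the second line of descent.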

See interjection at Patterson (Talk) 05:12, 26 May 2009 (UTC)

Phlox Mining operations

  • Currently the good doctor is involved in paleontological expeditions in the UK, India, Oz, and the USA. The goal is to extract the placename semantics that Familypedia needs.

SMW enhanced location templates

  • coor dm and dms were modified to output Coord probably. The template does not know whether these are single coords (which in most cases they are, indicating the location of the placename that is the subject of the article), or whether they are on a page with multiple coords, eg a list of mountain locations.
  • coor title dm and dms were modified to output Property:Coord, since this template is for indicating the location of the subject of the article.

Geographic infoboxes

The goal of the modification of these infoboxes is to extract semantic placename information.

  • Extract values that correspond to our country-state-county-locality hierarchy
    • Map to Properties Property:locality of county, Property:county of subdivision and so on.
      • The purpose of this is for querying. EG: the Gedcom says country X. I have a county name and what looks like a town name, but they could be one of many different places. What is the valid set of counties for country X? What is the valid set of localities for County Y?
    • Create a clean category tree for use with Autocompletion feature in Forms. EG: Category:Valid name for county of Georgia (U.S. state). The category structure was created in an ad hoc way by wikipedians and was not intended as a disciplined structure for database error checking purposes. The Valid name structure has these rules:
      • Names of places in the category structure correspond directly to disambiguated article names eg: Georgia (U.S. state)
      • The category structure is restricted for naming use. No ancillary information such as subcategory Maps of County X. That stuff goes in the normal categories, which are intended for this purpose (Category:Counties of California etc).
      • The tree structure is uniform and globally applied, following the country-subdivision-county-locality naming convention. Note subdivision was substituted for state and has the same generalized meaning as Property:State. Variation in naming of the categories (for example for localization) should not be necessary since these are invisible categories. Such variation will not be permitted until we are sure that Autocompletion and error checking functionality will not be impaired. That is the primary mission of these structures, and if people imagine other uses that stand in the way of that goal, then that is fine, but they should implement such features in a different category tree.
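To make the error-checking use concrete, here is a small sketch of looking up a raw Gedcom county string against a valid-name set. The dictionary stands in for the members of a category such as Category:Valid name for county of Georgia (U.S. state); its contents, the variable name, and the function name are illustrative assumptions, not the actual category data.

```python
# Stand-in for category membership; the real source would be the wiki.
VALID_COUNTIES = {
    "Georgia (U.S. state)": {"Fulton County, Georgia", "Cobb County, Georgia"},
}


def matching_counties(subdivision: str, raw_name: str) -> list:
    """Valid county article names under a subdivision that contain raw_name."""
    needle = raw_name.strip().lower()
    valid = VALID_COUNTIES.get(subdivision, set())
    return sorted(name for name in valid if needle in name.lower())


print(matching_counties("Georgia (U.S. state)", "fulton"))
print(matching_counties("Georgia (U.S. state)", "Springfield"))
```

An empty result flags the Gedcom value for manual review; more than one hit means the name is ambiguous within that subdivision.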

Discussion on extraction

Our genealogy structure is simple and is based on the places we know from documents: country, the country's primary subdivision (province/state/canton), "localities" (village to city), and the entity in between these last two, usually corresponding to "county".


Clearly, we need some facility for refreshing some content from Wikipedia. In particular, during the last year there has been considerable activity geocoding places with highly accurate coordinates. We need a normal way to refresh content without destroying work that Familypedian contributors have added. It seems to me that the information that we really require to be refreshed in an automated way is the semantic information we are extracting.

The place articles need to do a few things

  • Extract SMW information
  • Provide a contact bulletin board for local resources of information like other genealogy sites. This function is now filled by the county navboxes.
  • Allow contributors to provide microhistorical accounts that relate particular experiences of their ancestors in these places during different time periods. Local photos, history of various enterprises, etc. should be included as headline material along with general historical information.
  • Historical and biographical information should be prominent in a place article.
  • Wikipedia content may have some value, but should figure less prominently.

Current thinking is that the infobox data from the wikipedia article would be split off into subpages that are transcluded into the main page. These are not intended to be navigated to separately as we do with /biography and /ahnentafel subpages. One subpage would be SMW specific. Another would have Wikipedia content that is locked because any edits to it will be lost when it is refreshed next. The main page has whatever the contributors want to put for that location. This scheme allows a maximum of contributor editing flexibility, while keeping our SMW and Wikipedia reused content fresh using automated tools.

Multilingual vs Multiple language

For visitors to take advantage of the multilingual (MediaWiki message) capability of tables, they have to be logged in, which is too high a barrier. One idea was to use subpages, but many place names are not similar to each other. How does a user find "Den Haag" as "The Hague/Den Haag"? Seems like they should be top level strings. However, that's a lot of names to crowd into the same namespace, and there will be collisions, so what we do is postpend the language code. That means we can tell Den Haag (nl) from, for example, Den Haag (de). So the naming standard shall be:

  • Names are as determined by English wikipedia.
  • Familypedia has an article named identically, and it stores the universal properties for the place: the translations to all languages, the coord, containment hierarchy, etc. The other languages must not store this data redundantly. They will store information local to that language, but properties concerning information that is true across all languages (coordinates etc) are in the English version.
  • Non-English versions must postpend their language code to the end of the article name. Placenames must be spelled exactly as they are in the Wikipedia for that language.
  • This central (English) version can easily be found by searching for the current pagename in the translated articles field. The only article where the page will appear is the English article that has that page registered in its language version property.
  • Templates have versions with the same convention of postpended language code eg {{nav place}} {{nav place (nl)}}. Text is free of the constraints of multilingual messages. Users just translate the strings as they please and reformat tables as necessary.
  • A small text language tab bar is presented along the top of an article if any alternate language articles are available.
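The naming standard above can be sketched as a pair of helpers. The function names and the suffix pattern (a space plus a lowercase two or three letter code in parentheses) are assumptions for illustration; a disambiguator that happens to look like a language code would need extra handling.

```python
import re

BASE_LANGUAGE = "en"  # the central version carries no suffix
_LANG_SUFFIX = re.compile(r"^(?P<name>.+) \((?P<lang>[a-z]{2,3})\)$")


def localized_title(name: str, lang: str) -> str:
    """Page or template title for a language, e.g. 'Den Haag (nl)'."""
    return name if lang == BASE_LANGUAGE else f"{name} ({lang})"


def split_title(title: str) -> tuple:
    """Split a title into (name, language code); no suffix means English."""
    match = _LANG_SUFFIX.match(title)
    if match:
        return match["name"], match["lang"]
    return title, BASE_LANGUAGE


print(localized_title("Den Haag", "nl"))    # Den Haag (nl)
print(localized_title("nav place", "nl"))   # nav place (nl)
print(split_title("Den Haag (nl)"))         # ('Den Haag', 'nl')
print(split_title("Georgia (U.S. state)"))  # ('Georgia (U.S. state)', 'en')
```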

Multiple wikis, single smw

It is technically possible to access a common SMW store from another wikia wiki. This would allow us to do stuff like what Jewage genealogy does with a common database and separate en, ru, and he wikis. The extension is External Data (mw:Extension:External_Data). At this point we could export all language articles into a separate wiki, and use standard interwiki links, as on the various Help wikis, eg DE: w:c:Hilfe:Beispielseite.

Ref tag problematic

<ref> is implemented as a kind of extension that is evaluated prior to wikitext being expanded. This is a bad thing for SMW forms because the parameters sources and notes evaluate as empty inside of a ref tag. If you use an SMW property, you can bypass this limitation because the query will return a good value before or after wikitext is expanded. This comes at a cost of having to double flush after a change. It doesn't end there though, because the way localization works is that I look up the base page and pass it in a parameter. Now I can't do that, so I have to query for the smwbasepage every time I want to indirect to it. I suppose I could make it less costly by saving it as a property on each of the sibling pages, but this is just begging for a database with very little integrity. The old fashioned {{ref}} and {{note}} templates could be substituted, but these require manual numbering. The templates would have to be smart enough to use correct numbers depending on which notes and sources were present.

I think we just punt and dump all automatic footnoting. Instead, we list the sources and notes at the end. This doesn't bar folks from doing manual footnoting. All I am saying is that it is problematic doing it as an automatic function.

Forms- adding copy from

Copy from option on forms

  • Wedding, should totally be copiable from mate
  • Residences of spouse, siblings
  • Sources on all above events copied, maybe offer permissive copy too (everything- easier to cull out values).

Places autofill- click an item to autofill parent locations. (Note that this feature is not possible where, for example, a city is in two counties, and these conditions would have to be recorded in the place database.)
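A minimal sketch of the autofill rule, assuming the place database can return every parent chain recorded for a locality. The place names, the dictionary, and the function name are all hypothetical.

```python
# Hypothetical place database: locality -> list of (county, subdivision, country).
PARENT_CHAINS = {
    "Smalltown": [("Example County", "Example State", "Exampleland")],
    "Border City": [
        ("East County", "Example State", "Exampleland"),
        ("West County", "Example State", "Exampleland"),
    ],
}


def autofill_parents(locality: str):
    """Return the parent chain only when it is unambiguous, else None."""
    chains = PARENT_CHAINS.get(locality, [])
    if len(chains) == 1:
        return chains[0]
    return None  # unknown place, or a city recorded in more than one county


print(autofill_parents("Smalltown"))
print(autofill_parents("Border City"))
```

Returning None for the two-county case leaves the parent fields for the contributor to fill in by hand.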