Familypedia

PhloxBot has been performing maintenance activities on Genealogy. Some of these activities can be performed as a service to other wikia, and proposals for contributions to other wikia are given serious consideration. These tasks include:

  • Any of the standard operations of pywikipedia bots. For a technical list of these operations, see m:Using the python wikipediabot#scripts. The most popular operations include:
    • Search and replace: replace all strings X with Y across a list of articles. The patterns (regular expressions) can be far more complex than those possible in word processors.
    • Category moves/removal: go to all articles in category X and change the [[Category:X]] statement to [[Category:Y]], or remove the category.
    • Template replacement: go to all articles using Template X and remove it or change it to use Template Y.
    • Mass deletes: delete a list of articles, useful for cleaning up vandalism or for articles tagged for speedy deletion. (This requires the bot account to have sysop status.)
  • Custom scripts for wikia:
    • Import a list of Wikipedia articles to a wikia, adding a template crediting WP and importing all images the articles use.
    • Scan an article for any images that are not yet uploaded to the wikia. If an image of the same name exists on Commons or Wikipedia, the bot copies the image and the contents of its description page, credits Commons or WP for the text, and refers the user to that page for the license assigned to the image.
  • Custom scripts specific to Genealogy:
    • Given an infobox-based article, copy the corresponding fields to an info page, and create the other info pages (Biography and main page) for the subject.
      • Status: currently works for wikipedia:Template:Infobox Officeholder. It has been tested on US presidents and a few vice presidents, but will probably work on all articles using the officeholder master template, including MPs, senators, judges, governors, etc. For the full list, see the WP template, which lists all of them.
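pywikipedia's own scripts already perform category moves; purely to illustrate the kind of text transformation involved, here is a minimal Python sketch that renames or removes one category link in a page's wikitext. It assumes the page text has already been fetched, treats spaces and underscores in the category name as equivalent, and does not handle MediaWiki's first-letter case folding. The function name `move_category` is hypothetical.

```python
import re
from typing import Optional


def move_category(wikitext: str, old: str, new: Optional[str]) -> str:
    """Rename [[Category:old]] to [[Category:new]], keeping any sort key.

    If new is None, the category link (and its trailing newline) is removed.
    Spaces and underscores in the old name are treated as equivalent.
    """
    name = r"[ _]+".join(re.escape(word) for word in old.split())
    pattern = re.compile(
        r"\[\[\s*[Cc]ategory\s*:\s*" + name + r"\s*(\|[^\]]*)?\]\](\n?)"
    )

    def replace(match):
        if new is None:
            return ""  # removal: drop the link and its line break
        sort_key = match.group(1) or ""  # e.g. "|Smith, John", kept as-is
        return "[[Category:" + new + sort_key + "]]" + match.group(2)

    return pattern.sub(replace, wikitext)
```

For example, `move_category(text, "Immigrants of Switzerland", "Emigrants of Switzerland")` rewrites `[[Category:Immigrants of Switzerland|Smith, John]]` to `[[Category:Emigrants of Switzerland|Smith, John]]`, while a longer name such as `[[Category:Farmers of Ohio]]` is left alone when moving `Farmers`.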


Any of the above operations can probably be performed for other wikia. Please contact User talk:Phlox directly with inquiries. Users of Genealogy should go directly to User:PhloxBot/ Proposals and make the proposal there. It is a good idea to also mention on his talk page that you have made a proposal, since he does not monitor the proposals page closely.


Status of this document: being revamped into a general set of suggestions for other bot contributors. ~ Phlox 20:18, 1 November 2007 (UTC)


Bot methodology:

The goal is full transparency and deliberation, so that contributors understand that they, not the bots, are in control. No one wants to contribute to a site where arbitrary actions are taken by faceless bots they have no say over.

Procedure: If a change might be controversial, try using the Warning method

  • Do your first pass on the set of problem articles, adding a template with a warning message. The message instructs the user to remove the template if they disagree with the proposed action, and offers several lines of communication:
    • A page exhaustively listing every page that will be affected, so that readers can quickly spot other changes they disagree with.
    • An exception/not-applicable talk page, where the user can briefly state why the article should be exempted.
    • A page presenting the guideline or policy the bot is implementing. Anyone with broader objections to the policy can debate it or present their point of view on its talk page.
  • After a period agreed upon with the admins has expired, the bot makes a second pass and performs the proposed transformation on the pages that still carry the template.
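A minimal sketch of the two passes of the Warning method, assuming the bot records the date each page was tagged and that a hypothetical template named `PhloxBot warning` carries the message; the function names are likewise invented for illustration. A page whose template has been removed is read as an objection, matching what the template tells users to do.

```python
import re
from datetime import date, timedelta

# Hypothetical name of the warning template placed on the first pass.
WARNING_TEMPLATE = "PhloxBot warning"
_WARNING_RE = re.compile(r"\{\{\s*" + re.escape(WARNING_TEMPLATE) + r"\s*[|}]")


def has_warning(wikitext: str) -> bool:
    return _WARNING_RE.search(wikitext) is not None


def first_pass(wikitext: str) -> str:
    """Pass 1: put the warning template at the top of the page (only once)."""
    if has_warning(wikitext):
        return wikitext
    return "{{" + WARNING_TEMPLATE + "}}\n" + wikitext


def second_pass_action(wikitext: str, tagged_on: date, today: date,
                       waiting_period: timedelta) -> str:
    """Pass 2: decide what to do with a page tagged on `tagged_on`.

    "skip"  - the template was removed, which signals disagreement
    "wait"  - the agreed waiting period has not expired yet
    "apply" - perform the proposed transformation
    """
    if not has_warning(wikitext):
        return "skip"
    if today - tagged_on < waiting_period:
        return "wait"
    return "apply"
```

When `second_pass_action` returns "apply", the bot would make the agreed change and remove the warning template in the same edit.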

Procedure: Requested tasks

  • A user may post a requested task on a central bot requests page. If the change appears legitimate, it is included in the next bot run. For example:
    • Change all pages linking to George X (1800-1900) to link to George X (1810-1900). Reason: birth date wrong!
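A request like this amounts to retargeting links. A minimal sketch with a hypothetical function name, assuming plain `[[Target]]` and `[[Target|label]]` links; section links, redirects, and underscore/space variants of the title are not handled.

```python
import re


def retarget_links(wikitext: str, old: str, new: str) -> str:
    """Point [[old]] and [[old|label]] links at `new`, keeping any label."""
    pattern = re.compile(r"\[\[\s*" + re.escape(old) + r"\s*(\|[^\]]*)?\]\]")
    # A function replacement avoids backslash escapes in `new` being interpreted.
    return pattern.sub(lambda m: "[[" + new + (m.group(1) or "") + "]]", wikitext)
```

Unlabeled links change their displayed text too, which suits a correction like a wrong birth year.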

Procedure: Pulling the plug (server load / berserk bot)

  • The bot runs on a separate account. If there are server-load issues or the bot goes berserk, any admin can turn it off simply by blocking the account.
  • The bot operator works out with the responsible admin, bureaucrat, or Wikia tech staff an acceptable rate of edits per minute and the best low-load times to run.
  • New bots and new bot operators must test scripts on small runs before making large unattended runs, especially runs that automatically create large numbers of articles. Such damage can be reversed easily by a bot running from an admin account with deletion rights, but it is a real nuisance.
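Both the agreed edit rate and the small-test-run rule can be enforced in the bot itself. The sketch below is a hypothetical, standalone Python illustration (pywikipedia already throttles edits by default, as noted under the bot policy); `EditThrottle`, `run_limited`, and their parameters are names invented here. The clock and sleep functions are injectable so the behavior can be checked without actually waiting.

```python
import time


class EditThrottle:
    """Space edits out so the bot never exceeds an agreed edits-per-minute rate."""

    def __init__(self, edits_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / edits_per_minute  # seconds between edits
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous edit, if any

    def wait(self):
        """Block until enough time has passed since the previous edit."""
        now = self._clock()
        if self._last is not None:
            remaining = self._last + self.interval - now
            if remaining > 0:
                self._sleep(remaining)
                now += remaining
        self._last = now


def run_limited(pages, edit, throttle, limit=None):
    """Apply edit(page) to each page; stop after `limit` edits for a small test run."""
    done = 0
    for page in pages:
        if limit is not None and done >= limit:
            break
        throttle.wait()
        edit(page)
        done += 1
    return done
```

For example, `run_limited(pages, save_page, EditThrottle(6), limit=20)` would make at most 20 edits, roughly one every ten seconds, where `save_page` stands for whatever function actually saves a page.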

Bot policy Page

  • For example, some sites require bot users to request permission before running a bot on the site.
  • Some sites prefer that only approved bot tools (e.g. pywikipedia) be used, because they have safeguards against bad behavior (e.g. default throttling of the number of edits per minute).
  • Change log flooding: MediaWiki supports a bot flag on accounts. It is assigned to trusted bot accounts so that admins can filter recent changes and RSS feeds down to edits by non-bot users.
  • Large-scale runs must be approved by someone from a designated list of users.

Tasks

Task: Living People

  • Walk all articles. For articles on individuals:
    • If the individual's birth date can be determined from the article with near certainty:
    • Do a Warning procedure pass.
    • Add Template:Living (or the category "possibly living", as appropriate) to all pages that were not exempted.
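The page does not say when Template:Living versus "possibly living" is appropriate, so the sketch below invents two age cut-offs purely for illustration (90 and 110 years are assumptions, not policy). It also assumes the birth year and the presence of a death date have already been extracted from the article; `living_tag` is a hypothetical name.

```python
from typing import Optional

# Assumed cut-offs in years; the real values are a policy decision for admins.
LIVING_MAX_AGE = 90
POSSIBLY_LIVING_MAX_AGE = 110


def living_tag(birth_year: Optional[int], has_death_date: bool,
               current_year: int) -> Optional[str]:
    """Return "Template:Living", "Category:Possibly living", or None (no tag)."""
    if has_death_date or birth_year is None:
        # Dead, or birth date not known with near certainty: leave the page alone.
        return None
    age = current_year - birth_year
    if 0 <= age <= LIVING_MAX_AGE:
        return "Template:Living"
    if LIVING_MAX_AGE < age <= POSSIBLY_LIVING_MAX_AGE:
        return "Category:Possibly living"
    return None
```

Pages that get a non-None result would then go through the Warning procedure before the tag is actually added.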

Category moves

  • Move all "Immigrants of X" categories to "Emigrants of X" for:
    • Switzerland, etc.
  • Others?

Maintenance runs

  • Walk Category:Hash mark redirected categories. Move any members of these categories to the new categories.
  • Walk the tree of images. For any red-flag issues (the list is still to be determined, e.g. an all-numeric name, image not used anywhere, no category assigned, no license template), place a friendly reminder on the contributor's page ("Thank you for uploading...", etc.).
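The red-flag checks listed above can be sketched as follows. This assumes the bot already has each image's file name, the wikitext of its description page, and how many pages use it; the set of license template names and the reminder wording are placeholders, and the function names are hypothetical.

```python
import re

# Hypothetical set of license template names; substitute the wiki's real list.
LICENSE_TEMPLATES = {"gfdl", "pd", "cc-by-sa", "fair use"}


def image_red_flags(filename, description, usage_count):
    """Return a list of red-flag issues for one uploaded image.

    filename:     e.g. "12345.jpg" (without the "Image:" prefix)
    description:  wikitext of the image description page
    usage_count:  number of pages that use the image
    """
    flags = []
    stem = filename.rsplit(".", 1)[0].replace("_", "").replace(" ", "")
    if stem.isdigit():
        flags.append("all-numeric name")
    if usage_count == 0:
        flags.append("not used anywhere")
    if not re.search(r"\[\[\s*[Cc]ategory\s*:", description):
        flags.append("no category")
    # Template names, compared case-insensitively as a rough approximation.
    used = {name.strip().lower() for name in re.findall(r"\{\{\s*([^|}]+)", description)}
    if not used & LICENSE_TEMPLATES:
        flags.append("no license template")
    return flags


def reminder_text(filename, flags):
    """Friendly note for the uploader's page (wording is only a placeholder)."""
    return ("Thank you for uploading [[:Image:" + filename + "]]! "
            "Could you take a look at the following: " + ", ".join(flags) + "?")
```

The leading colon in `[[:Image:...]]` links to the image page instead of displaying the image inline in the reminder.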

Ideas for Bots

For making proposals, please see PhloxBot/ Proposals. The following is kept for historical reasons and should be moved to the proposals page for discussion.

Privacy/defamation violations

  • Scan for phone numbers, Social Security numbers (SSNs), and credit card numbers.
  • For pages with email addresses, we might suggest that the user convert their email address to a bitmap, so that it is harder for spammers to harvest.
  • For apparently living individuals, flag articles for admin review if any word from a list of inflammatory words is found (e.g. racial epithets, generally slanderous terms, profanity). This might be useful for all articles, with the sensitivity turned much higher for living individuals.
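A rough sketch of the scans described above. The patterns are US-centric assumptions and will produce both false positives and misses; candidate card numbers are filtered with the Luhn checksum, which valid card numbers satisfy. The word list is supplied by the caller, and the "sensitivity" thresholds (one hit for living people, three otherwise) are hypothetical, as are all function names.

```python
import re

# Rough, US-centric patterns; expect false positives and misses.
PHONE = re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits


def luhn_ok(digits):
    """Luhn checksum, which valid credit card numbers satisfy."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_private(text):
    """Return (kind, matched text) pairs for possible private data."""
    hits = [("phone", m.group()) for m in PHONE.finditer(text)]
    hits += [("ssn", m.group()) for m in SSN.finditer(text)]
    for m in CARD.finditer(text):
        if luhn_ok(re.sub(r"\D", "", m.group())):
            hits.append(("card", m.group()))
    return hits


def needs_admin_review(text, flagged_words, possibly_living):
    """Flag a page if it contains enough words from a supplied list.

    The thresholds are hypothetical: one hit for living people, three otherwise.
    """
    words = re.findall(r"[a-z']+", text.lower())
    hits = sum(1 for word in words if word in flagged_words)
    return hits >= (1 if possibly_living else 3)
```

Hits would be listed for a human to review rather than removed automatically.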

Less serious but no less useful person matters

  • Last name encodings in various categories. Problem: a user wants to look up a surname in a category such as WWII veterans, but it is useless because you have to look under every single letter, e.g. J for the Joe and John Smiths, G for the George Smiths, etc. For such categories, add the assumed last name to the category link as a sort key; e.g. [[Category:Deaths in 1865|Lincoln, Abraham]] produces a much more easily browsable list.
That deaths cat is not a very good example, because our standard templates ask contributors to pipe them like that; but Wikipedia has recently been using a magic word for people who appear in more than one cat, which in effect pipes all the cats; and I think one of our contributors (AMK?) has been moving in that direction with more automation of our person templates. Robin Patterson 22:48, 25 October 2007 (UTC)
    • List of Cats that would benefit from this?
  • Time Project
    • Extract all births (and christenings?) and deaths and add categories for them. Maybe for other events too, such as marriages?
  • Geography
    • Categories: births by state/province/canton; add the category using the assumed last name as the sort key.
I had been thinking we could have birthplace cats as well as birth year cats. Robin Patterson 22:48, 25 October 2007 (UTC)
    • By request from a sign-up page.
  • WP link enhancement
    • Add a WP link to the first mention of famous people in all articles.
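Adding last-name sort keys to category links, as suggested under "Last name encodings", can be done mechanically. The following is a minimal Python sketch with hypothetical function names; it guesses the surname as the last word of the name, so its output needs review. As the comment above notes, a single magic word can do the same job for all of a page's categories: in MediaWiki that is `{{DEFAULTSORT:Lincoln, Abraham}}`, and a bot could add that one line instead of editing each category link.

```python
import re


def assumed_sort_key(full_name):
    """'Abraham Lincoln' -> 'Lincoln, Abraham'.

    Assumes the last word is the surname, which fails for names such as
    'John Smith Jr.' or 'Ludwig van Beethoven'; a human should check those.
    """
    parts = full_name.split()
    if len(parts) < 2:
        return full_name
    return parts[-1] + ", " + " ".join(parts[:-1])


# A category link with no sort key yet, e.g. [[Category:Deaths in 1865]].
CATEGORY_WITHOUT_KEY = re.compile(r"\[\[\s*([Cc]ategory\s*:[^|\]]+?)\s*\]\]")


def add_sort_keys(wikitext, sort_key):
    """Give every category link that lacks a sort key the supplied one."""
    return CATEGORY_WITHOUT_KEY.sub(
        lambda m: "[[" + m.group(1) + "|" + sort_key + "]]", wikitext)
```

Category links that already carry a sort key are left untouched.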

Uncatted Images

Death, Marriage, Birth
  • These can usually be categorized automatically (Death certificates, Surname) and assumed to be public domain (PD).
  • Maybe some military ones can be done as well, if helper strings are provided.

Spam

  • Visit all pages and remove any link to a site on a banned list. Removal raises some issues:
    • It is easy to break the link: just remove the URL and leave the display text.
    • It is harder to remove more, since you don't know how much of the text accompanying the link is promotional and should be deleted. The bot could remove the whole line if it is in a links section or looks like a bulleted item, but that is problematic if the link is embedded in the article text. Bots really shouldn't alter text unless they are highly certain of what they are doing.
    • Walk all edits by a given user and remove whatever links they added.
  • Detection: visit all pages and compile a list of linked domains. Post the list as a page giving, for each domain, the number of links, whether they point to the same page, and how the count has changed over the last week, two weeks, and month.
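The "break the link" approach and the detection report can be sketched as below. This is a minimal illustration with hypothetical function names: it only handles bracketed external links (bare URLs to banned domains are left for a human), compares domains exactly after stripping a leading `www.`, and reports link counts and distinct URLs per domain; tracking changes over time would need an earlier report to compare against.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# [http://example.com/page Display text]  (the display text is optional)
BRACKETED_LINK = re.compile(r"\[(https?://[^\s\]]+)(?:\s+([^\]]*))?\]")
BARE_URL = re.compile(r'https?://[^\s\]|<>"]+')


def domain_of(url):
    """Host name of a URL, lower-cased, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def break_banned_links(wikitext, banned):
    """Replace bracketed links to banned domains with their display text."""
    def replace(match):
        if domain_of(match.group(1)) in banned:
            return match.group(2) or ""
        return match.group(0)
    return BRACKETED_LINK.sub(replace, wikitext)


def domain_report(pages):
    """Map domain -> (number of links, number of distinct URLs).

    `pages` is an iterable of wikitext strings.
    """
    urls = defaultdict(list)
    for text in pages:
        for url in BARE_URL.findall(text):
            urls[domain_of(url)].append(url)
    return {domain: (len(found), len(set(found))) for domain, found in urls.items()}
```

A domain whose link count is much larger than its number of distinct URLs is one where many links point to the same page, which is the pattern the detection report is meant to surface.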

Harvesting Images from Commons/WP

  • This is a son of a gun of a chore to do manually, and there is a Python (pywikipedia) script that transfers images in the other direction, which can probably be modified easily to copy them into a wikia.
  • A variant would be to import a WP article into a wikia: prefix its links with wikipedia:, automatically transfer any referenced images, convert WP templates to known wikia equivalents, and remove material irrelevant to the wikia (e.g. language interwikis).
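The text side of that variant can be sketched in Python as below; actually transferring the images is left to the bot framework, so this only lists which files would need copying. All names are hypothetical, the template mapping is supplied by the caller, and the interwiki detection is approximate (any lower-case two- or three-letter prefix, optionally followed by hyphenated parts, is treated as a language code).

```python
import re

# Language interwikis such as [[fr:Abraham Lincoln]] or [[zh-min-nan:...]].
# Approximate: any lower-case 2-3 letter prefix (plus hyphenated parts) counts.
INTERWIKI = re.compile(r"\[\[[a-z]{2,3}(?:-[a-z]+)*:[^\]]*\]\]\n?")
# Ordinary article links; targets containing ':' (Image:, Category:, ...) are skipped.
ARTICLE_LINK = re.compile(r"\[\[([^\]|:#][^\]|:]*)(?:\|([^\]]*))?\]\]")
IMAGE_LINK = re.compile(r"\[\[\s*(?:Image|File)\s*:\s*([^|\]]+)")


def wikia_ify(wikitext, template_map=None):
    """Prepare Wikipedia article text for a wikia (a rough sketch).

    template_map: optional {Wikipedia template name: wikia template name}.
    """
    text = INTERWIKI.sub("", wikitext)

    def to_wikipedia(match):
        target = match.group(1).strip()
        label = match.group(2) if match.group(2) is not None else target
        return "[[wikipedia:" + target + "|" + label + "]]"

    text = ARTICLE_LINK.sub(to_wikipedia, text)
    for old, new in (template_map or {}).items():
        pattern = re.compile(r"\{\{\s*" + re.escape(old) + r"\s*(?=[|}])")
        text = pattern.sub(lambda m, new=new: "{{" + new, text)
    return text


def referenced_images(wikitext):
    """Image names the article uses, i.e. the files that would need transferring."""
    return [name.strip() for name in IMAGE_LINK.findall(wikitext)]
```

Unlabeled links get their own title as the label, so `[[Kentucky]]` still reads "Kentucky" after it becomes `[[wikipedia:Kentucky|Kentucky]]`.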