Updating the O-and-A Database

On July 25, 2020, Master Herveus d’Ormonde led a few interested heralds through an online session in which we were able to observe core portions of his workflow as Morsulus Herald, watching as he applied the changes from a recent LoAR to the O&A master database and published the changes to the public O&A web site.

I am attaching my notes from this session below in hopes that they might be of interest to other members of the community, although this is admittedly a fairly-obscure topic with a limited audience.

Desktop Environment

To perform the below workflow, you’ll need a desktop computer with a copy of the Morsulus Tools codebase. The indexer application is written in Perl TK and is set up to run on a Mac with X11, but should be portable to other operating systems.

You’ll also need a copy of the master database in SQLite form, and various other data files such as mike.hdg, which are currently not available to the public.

Morsulus Codebase

The code on GitHub at github.com/herveus/Morsulus-tools is mostly current, but doesn’t reflect some small changes made in the last year and a half.

Some of this code still looks very much like it did when Iulstan wrote it in the ‘90s. Some of the Perl is fairly old-school; for example, lots of scripts set $[ to 1, which generates a “deprecated” warning and which can’t be used under the most recent versions of Perl.

Other bits are new or have been cleaned up and rewritten in a more modern style. The indexer code (Perl, TK, X11) was written by Herveus shortly after he took over in 2000. XML extraction was rewritten to use XML::Twig module instead of regexes.

Old version of the code maintained master data in flat files; Herveus modified it to keep master data in SQLite database and export data to equivalent flat files.

Worksheet File

The worksheet is a text file containing a series of commands to be run in sequence to process an LoAR update. If you’re using a code editor like BBEdit, commands can be run from within the editor.

An old version of this worksheet is in GitHub, but Herveus has made some changes to the process, so check with him for a current example.

For every month’s LoAR, start by copying last month’s worksheet to a new dated folder.

Then change the numeric references to the current month — eg replace 20-05 with 20-06, and 2005 with 2006 — and do the same thing with the prior month (for use by commands which compare against the previous month) — eg replace 20-04 with 20-05, and 2004 with 2005.

Receive the XML Files

Emails arrive from the post-meeting clerk for PP1, PP2, and final, each with a copy of the LoAR in multiple formats including XML files, RTF, and others.

Those files are generated from OSCAR data after some processing by the post-meeting clerk.

Current post-meeting clerk is Emma de Fetherstan, soon to become Laurel.
Position will pass to Reis ap Tuder (who held this job 2013-2015).

The XML files are packaged as a zip containing:

a master file with year-month in name, like master_20-05.xml;
separate XML files for each kingdom;
a numbered errata file, like errata_127.xml.

The master.xml file contains the cover letter material and references to include each of the kingdom files.

Generate the Actions File

Run the xml_to_actions script to parse the XML files and generate a pipe-delimited “actions” file.

Occasionally this finds XML syntax errors like unescaped ampersands which need to be corrected by hand, and reported back to post-meeting clerk.

If there are saved edits from a previous pass saved in the worksheet, apply them now. For each edit, find the corresponding place in the actions file and paste the two (or three) replacement lines over it. (It’s okay if the ufo123 line numbers change a little bit from one pass to another; these are just used for reference and aren’t meaningful when applying actions.)

Apply the Actions File

When you run the apply_actions script, it takes a while and then (hopefully) outputs “all done.”

However, if it runs into problems, you’ll need to find the problem lines, edit them, apply those edits to the action file, and then re-run the apply-actions file.

Edits, Temporary and Otherwise

The worksheet is used to hold data edits that are created to address problems in the actions file.

To create an edit, find the problematic line, copy it and paste it into the worksheet twice:

For the first copy of the line, keep it as it is but prefix the action field with “ignore” so that apply_actions will skip over it.
Then for the second copy of the line, make whatever changes are necessary for it to apply cleanly.

The most common fixes are:

Tweaking something small about how the action is written
Fixing the spelling or accents on a name, such as filling in a missing ogonek on an O.

Preserve those edits between proof passes by keeping them in the worksheet; some of them can be reported back to the post-meeting clerk and hopefully will be fixed in the next pass, but others will need to be reapplied in the next pass.

There are usually just a few of these edits required every month, but sometimes there are large errata letters which require lots of manual edits.

Some edits can not be fixed by post-meeting clerk and always need manual processing at Morsulus.

Examples of these kinds of edits:

Branch name designator changes, like changing a canton to a shire.
Action “armory disposition” becomes “corrected device” [Editor’s note: this bit might be garbled.]
Action “badge for two names” becomes “badge association with first name” plus a note for the second name.

Things that often crop up in PP1:

Branch name update — these require manual processing, go to non-temporary Edits section of worksheet.
Armory being added for a name that was registered in the previous month — for PP1, edit this to be “new name and armory” so that the action will run successfully, then by the time we’re running the final release we should have finished processing the previous months’s actions and will be applying these changes to a database which includes that name.

Reading / Modifying Apply.pm

Sometimes an uncommon action is received and we need to review how the Apply.pm code works and possibly make changes to it.

The heart of Apply.pm is a big Perl hash mapping “cooked” actions to several handler methods.

First step is to “cook” the action string by forcing to lower case and replacing all quoted bits with “x” (and stashing the quoted bits in variables).
Method names in upper case run tests and throw an exception if condition is not met; for example the “register device to an existing name” action will not run unless the name in question is already registered.
Method names in lower case actually apply changes to the SQLite database.
All upper-case methods are run first, then if no exceptions all lower-case methods are run, both in alphabetical order.

There’s some legacy code in here left over from when the system used a flat-file for master data instead of SQLite.

There are some action names in the big mapping hash which are prefaced with a dash; these might depend on code which hasn’t yet been rewritten since the system was modified to use the SQLite database.
Many of the action methods still have some code after the first return call which are the old code which would export a new line of data to be appended to the master flat-file.

Index Armory with Descriptions

Blazons are handled separately from the armory registrations themselves.

Because sometimes blazons are changed during proofing, there are blazons in the database with associated descriptions which have been orphaned and are no longer connected to any registrations. These could be expunged from the database, but they don’t do any harm so they’re just left there.

Run the command to extract new blazons.

Run the indexer; this fires up a Perl TK/X11 app.

The indexer UI is divided into panels:

Top panel shows information about the blazon — its text, ID number, and associated notes.
Left side has space for current blazon’s description, and below that is a category search tool.
Remainder of the interface has tabbed section for picking categories and applying features.

At startup, the indexer reads mike.cat, the internal version of my.cat, which has the internal list of all categories, which includes which feature sets each category supports.

Indexer also reads a mike.hgg file which stores which categories appear in each heading / tab of the indexer tool.

The most-common categories appear in the MFUC (most frequently-used categories) heading / tab.

These are maintained by a simple process of counting uses within the indexer and writing the counts out to a file on request, then reading it back in again when the indexer starts up.
Could refresh those counts based on raw data from the oanda db but there’s no need, and this process is close enough.

Herveus takes two passes through — goes through all blazons picking out categories, then goes back through them all adding features to each category — but this workflow isn’t mandated by the application.

Some categories are not frequently used when conflict-checking but are still applied consistently:

Arrangement categories such as in-fess or in-saltire; these are only used for primary charges.
Demi-beasts are indexed both under the main beast type and then also under “beast, demi”

A few features are special:

The “sustained” charge-group feature is used for both maintained and sustained charges that are counted for difference — use “maintained” only for secondaries that don’t count for difference, like crowns on a beast

There are a few things in my.cat that still need to be fixed:

The “bird postures” featureset doesn’t include combined both-ways postures, like “rising” as a more general case of “rising to dexter” and “rising to sinister”. As a result, if you have two birds respectant, you need to index that armory under both “<bird-type>:rising to dexter” and also “<bird-type>:rising to sinister”.

Replacing the indexing process with something that didn’t run locally on Herveus’s computer would be useful step forward and a way to take some work off of Herveus’ plate.

When you’ve finished indexing, you need to export the new descriptions and integrate them into the database. You can leave the indexer app running while you do this in case there are problems that you need to go back and make changes.

Run the xlate script and the old_check script to look for problems.

Then run the merge_descs script to fold the new descriptions into the SQLite database.

If that runs cleanly, you’re done and you can exit the indexer.

Updates to my.cat

Changes to my.cat happen fairly regularly.

The most common types are additions of “see also” lines, specifying the name of some new charge that was registered and indicating which category it is being indexed under.

There are several different category files and they must all be updated in parallel, so sometimes if there are just a couple of changes they are held for a few months and then applied as a batch.

Hopefully we will cover this process in detail in a future session.

Run Checks and Prepare Release

Run the new_run_checks script.

That script always produces some output, but not all of it represents problems that need to be solved.
Compare the output of that command with the archived output of the same command from last month to see if anything unexpected has cropped up.

Then check the release notes to see if any special comments are required.

Run “make export.db” which will:

strip the angle brackets off of branch names.
run tempmap to rewrite categories.
run bin/accent -t Latin1 -s, which converts the internal data, all stored in Da’ud-encoded format, to Latin1 encoding, while also storing a note with the Da’ud format in cases where the Latin1 is ambiguous.

The tempmap script has a relatively simple mapping that converts some categories as used internally to the versions included in public releases.

For example, internal BIRD-PELICAN is externally coded as BIRD:pelican.
This simplifies the process of conflict checking, as you don’t have to run separate searches for CASTLE and TOWER.
Some of the entries in this file are no longer used; for example the “SWORD AND DAGGER*1A” mapping to “SWORD:odd shape” is for a category which was in use years ago but has now been expunged from the database.

That make command also generates a “db diffs” file which is created for Alexander Ravencroft of Meridies to simplify loading data into his Lotus Notes-based external index and search tool at Stentorian.us.

Publish the Release

Copy the files to the web server:

scp export.db to oanda.sca.org
Also scp the release notes and welcome.html

Then ssh to oanda.sca.org.

Stop the database server by running “ps -fu herveus” then “kill <pid>” to stop the oanda_server.pl process.

Run the push_oanda script to move the database into place.

Then edit the forms page.

Then start up the database server process, which listens on port 1066.

You can look at the log file to confirm it started and see what requests are being received.

The many “n1” requests in this file are coming from some external system which mirrors data out of the O&A; it generates a steady load of requests, but not so many that it overwhelms anything.

Finally, publish a notice of the release by copy-n-pasting the release notes to the usual places:

Compose an email to SCA-Heralds mailing list and the Morsulus mailing list.
Then post to “Heraldry Unofficial Chat” and “Baby Heralds” Facebook groups.

Congratulations! When you reach this point, you’ve successfully published the results of an LoAR to the Ordinary and Armorial web site.