Tardis:What SpellBot actually corrects: Difference between revisions

From Tardis Wiki, the free Doctor Who reference
Line 3: Line 3:
Following is the raw code of the bot routine (known as a "user-fix"), so that all users may see what exactly the bot is checking for.
Following is the raw code of the bot routine (known as a "user-fix"), so that all users may see what exactly the bot is checking for.
==Problem words==  
==Problem words==  
===Words impossible for a bot===
===Words impossible for a bot=== sepulchre sepulcre
Some words are beyond the capability of the bot, because they are valid spellings (even if of different words) in British English.  This list includes:
Some words are beyond the capability of the bot, because they are valid spellings (even if of different words) in British English.  This list includes:
*'''Check'''.  Americans use this word to mean not only a verb meaning ''to investigate'', but also a the noun, which is a financial instrument.  Because the first meaning is valid in BrEng, the bot can't be program to correct the other usage.  We'd end up with sentences like:
*'''Check'''.  Americans use this word to mean not only a verb meaning ''to investigate'', but also a the noun, which is a financial instrument.  Because the first meaning is valid in BrEng, the bot can't be program to correct the other usage.  We'd end up with sentences like:
Line 13: Line 13:
*'''Jail'''.  Yes, ''gaol'' is still correct, especially historically, but ''jail'' has largely supplanted it.  So we won't look to correct ''jail'' to ''gaol'', but neither will we try to correct ''gaol'' to ''jail''.  Because of the known presence of both spellings in DWU fiction, this one can't be decided by forum debate.  We just have to live with ''gaol'' occasionally popping up.  
*'''Jail'''.  Yes, ''gaol'' is still correct, especially historically, but ''jail'' has largely supplanted it.  So we won't look to correct ''jail'' to ''gaol'', but neither will we try to correct ''gaol'' to ''jail''.  Because of the known presence of both spellings in DWU fiction, this one can't be decided by forum debate.  We just have to live with ''gaol'' occasionally popping up.  
*'''Licence'''.  ''License'' is correct in BrEng as a noun, so the bot can't correct for Americans using ''license'' as a verb (and other parts of speech deriving from the verb form).  Other ''-ence'' words don't necessarily work this way.  ''Offence'' and ''defence'' are unambiguously correct — but then again their verb form is different — ''offend'' and ''defend'' — which means their gerunds and verbal nouns are different, too.
*'''Licence'''.  ''License'' is correct in BrEng as a noun, so the bot can't correct for Americans using ''license'' as a verb (and other parts of speech deriving from the verb form).  Other ''-ence'' words don't necessarily work this way.  ''Offence'' and ''defence'' are unambiguously correct — but then again their verb form is different — ''offend'' and ''defend'' — which means their gerunds and verbal nouns are different, too.
*'''Storey''' is the proper British spelling for a floor in a building.  Americans just spell this ''story'', as in ''a 15-story building''.  Obviously, the bot can't make this correction, because the word for a ''tale'' is spelled ''story'' on both sides of the Atlantic.
===Words requiring forum decision===
===Words requiring forum decision===
Other words are ''possible'' for a bot to correct, but the presence of two valid British spellings means that we will require a forum discussion.  Precedent for such forum debates over particular words can be found in the following threads:
Other words are ''possible'' for a bot to correct, but the presence of two valid British spellings means that we will require a forum discussion.  Precedent for such forum debates over particular words can be found in the following threads:

Revision as of 15:38, 9 June 2011

Because even the most conscientious of editors will occasionally make spelling errors, there is a need to have bot enforcement of the spelling policy. A comprehensive list of the differences between British and American spellings has been compiled, and is being coded for bot use as of the second week of June, 2011. This page will see heavy updating throughout that week as the list is fully coded.

Following is the raw code of the bot routine (known as a "user-fix"), so that all users may see what exactly the bot is checking for.

Problem words

===Words impossible for a bot=== sepulchre sepulcre Some words are beyond the capability of the bot, because they are valid spellings (even if of different words) in British English. This list includes:

  • Check. Americans use this word to mean not only a verb meaning to investigate, but also a the noun, which is a financial instrument. Because the first meaning is valid in BrEng, the bot can't be program to correct the other usage. We'd end up with sentences like:
The Doctor chequed on Sarah Jane in her hospital room before going to the pathology lab.
  • Tire. Both sides of the Atlantic use tire as a verb. It's again the noun that's problematic. Americans view tire as the correct spelling for what the British would call a tyre. The bot can't figure this one out, so it doesn't even try.
  • Draft. Americans use this spelling for all senses, the British use both for different senses. All words beginning with drafts will be converted to draughts-, and the word drafty will be converted to draughty, but the word draft itself won't be touched by the bot, as that is a valid British spelling of the word. Clear as mud? Cool. Onwards, then . . .
  • Disc. Way, way, way too screwed up a word for a simple bot to handle. Disc jockey is fine on both sides of the divide, but so is floppy disk and hard disk. This one simply depends on context.
  • Practise. The -ise version of this word is the correct spelling for the verb in BrEng; the British noun ends in -ice. Americans use -ice for everything. Thus, the bot can't be of much use, except for participles derived from the verb. So, the bot will make no attempt to change the spelling of practice, but it will change practicing and practiced to practising and practised.
  • Jail. Yes, gaol is still correct, especially historically, but jail has largely supplanted it. So we won't look to correct jail to gaol, but neither will we try to correct gaol to jail. Because of the known presence of both spellings in DWU fiction, this one can't be decided by forum debate. We just have to live with gaol occasionally popping up.
  • Licence. License is correct in BrEng as a noun, so the bot can't correct for Americans using license as a verb (and other parts of speech deriving from the verb form). Other -ence words don't necessarily work this way. Offence and defence are unambiguously correct — but then again their verb form is different — offend and defend — which means their gerunds and verbal nouns are different, too.
  • Storey is the proper British spelling for a floor in a building. Americans just spell this story, as in a 15-story building. Obviously, the bot can't make this correction, because the word for a tale is spelled story on both sides of the Atlantic.

Words requiring forum decision

Other words are possible for a bot to correct, but the presence of two valid British spellings means that we will require a forum discussion. Precedent for such forum debates over particular words can be found in the following threads:

  • Judgment. There's no agreement on either side of the Atlantic whether this word should be judgment or judgement. Oddly, most British spell-checkers will red-flag judgment, even though that's the official spelling in Commonwealth courts. We have at least one story title preferencing the version with two es — Judgement of the Judoon. But still, this word will require a special forum discussion to decide which way we want to spell it.
  • Connexion. This British spelling of connection is not universally used in Britain. Connection is correct in Britain, too, so the bot won't try to force connection into a connexion-shaped hole.
  • Simidgen, smidgeon, smidgin. They're all valid spellings for the same word, on both sides of the Atlantic. The only way the bot could be useful is if we had a forum discussion to settle on one of the three spellings.
  • Yogurt passes most British spell-checkers today, but so does yoghurt. We'll leave both well alone until a forum discussion decides the matter.
  • Almanac is universally the way it's spelt in American English, and increasingly the way Britons spell it, too. Still, some old-timers will go for almanack. Until a forum discussion decides otherwise, the bot won't enforce either spelling.
  • Gasses/gases. Both spellings are correct on both sides of the Atlantic. The bot won't correct for either until a forum discussion settles on a particular spelling.
  • Programme/program. Both spellings pass British spell-checkers, even though there's a perception that "programme" is Britsh and "program" is American. Until there's a community decision on spelling, the bot won't touch either spelling.
  • Griffin/Gryphon. Both pass British spell-checkers, so it'll take a forum discussion to decide which way we want to go.
  • Inflexion is the way the British have historically spelt inflection, but modern British spell-checkers pass both spellings. So, the bot won't enforce either without a forum decision to the contrary.
  • Instal/install. Modern British spell-checkers are cool with both, so the bot is, too. But the proper British spelling is instalment, not installment.
  • Mediaeval. Ironically this spelling is now considered archaic and is most often seen in British academic writing. Medieval, the American spelling, passes British spell checks, too. So, in the absence of a forum decision to the contrary, the bot won't try to correct either spelling.
  • Praesidium/presidium/presidiums/presidia is the archaic British spelling of presidium. It no longer passes default settings on British spell-checkers, so the bot will correct to presidium, which is also the American spelling. Note that the plural of the word is more confused. Both presidiums and presidia pass British spell-checkers. So the case here is complicated. The singular form of the verb will be corrected to presidium. The plural will be corrected from praesidiums to presidiums. But presidia will go uncorrected, unless a forum discussion decides on one or the other plural form.
  • Pizzaz/pizzazz. Both spellings pass modern British spell checkers, although historically the three-z version was British and the four-z version was American. A forum discussion will be required before the bot corrects for either.

Valid British spellings actively corrected

Thanks to the ubiquity of American spellings in pop culture, there are a few cases where valid if archaic British spellings can be corrected to standard American spellings without the need for a forum decision. In such cases, modern British usage hews closely to the American, and clearly argues against more archaic forms.

  • Primaeval. Due to the presence of the modern television ITV/BBCA television series with which many Doctor Who fans will be familiar, as well as BFA: Primeval, the unambiguously American spelling "wins" the contest. Primaeval will be actively corrected to primeval. This shouldn't ruffle too many feathers, since modern British spell-checkers fail primaeval.

Confusing words

Some words are homonyms, and look like they're archaic forms of other words, but in fact are totally different words. The bot will therefore correct in an unexpected way, which is why those ways are explained here.

  • Philtre is not the British spelling of filter, but of the American philter. It's a noun meaning "love potion", not a verb meaning "to remove impurities".
  • Pouffe is not an archaic British spelling of poof, but a current spelling for what Americans would call a pouf — that is, a nice, thick cushion you can sit on. Thus, the bot will correct pouf to pouffe. Not that we actually expect pouf to ever be used in a sentence on this wiki — except on this very page, which the bot doesn't patrol.
  • Groyne is a British spelling of groin only when it's not. It doesn't refer to that region of the human anatomy between the legs. That's groin on both sides of the Atlantic. Groyne is, instead, a civil engineering term, referring to a construction that controls erosion. Technically, it's known as a groin in the US, but almost no one calls it that. It's more commonly known as a breakwater, bulwark or seawall. Frankly, groynes are so often called by these more specific names, even in Britain, that groyne doesn't pass modern British spell-checkers. It's probably never been used at any time in the DWU, but still it's correct to say something like: "All seawalls are groynes, but not all groynes are seawalls." The bot won't correct away from it, but because groin is BrEng correct in its anatomical sense, it won't correct groin to groyne, either.

How to read the code

The code works by telling the bot to look for the word described before the comma. Then it replaces it with the word after the comma. A most basic expression would be:

{u'color',u'colour')

This looks for the American "color", then replaces it with the British "colour".

Because typing every permutation of a word, including all words that share the same root and capitalised variants, would be very time-consuming, most of the code won't work in such a simplistic way. Most of it uses a "regular expression" — or regex — to find a lot of hits with just one line. Here's an explanation of the regex used in this code:

  • The expression ([Cc]) means "look for either capitalised or lowercase versions of the letter C
  • (.?) means, "You, Mr. Fancy Computer bot thing, might find some more letters to the right of this point. Grab 'em all up to the next space only."
  • /1 means, "take whatever is in the first parentheses and put it here"
  • /2 means, "take whatever is in the second parentheses and put it here"
  • /3 means, "take whatever is in the second parentheses and put it here"

Thus, if we have the expression,

(r'([Cc])apitaliz(.?)', r'\1capitalis\2')

It means, roughly,

Look for all words, beginning with either a capital or lowercase C, which are followed by the letters "apitaliz" + any other letters you find until the next space. Then, keep the form of the letter c that you find, stick on "apitalis", and add back in any letters you orginally found after the "z".

In other words, find, Capitaliz-, keep the C capitalised, switch the z to an s, then stick on "-e', "-ing", "-ed", or "-ation", as appropriate.

Correcting all related words at once

Now let's take a look at arguably the most complicated coding here. What if I wanted to change every word that had favor as a root? How could I take care of words that had both a prefix and a suffix, like disfavorable? Putting together everything we've learned so far, it would be:

(r'(.?)([Ff])avor(.?)',r'\1\2avour\3')

The leading (.?) means check to see if there's a prefix. The ([Ff]) switch checks for capitalisation of the root letter f. The (.?) at the end checks for suffixes. Now we have three parentheses instead of just two. So \1 means the prefix, \2 puts the letter f in with proper capitalisation, and \3 adds any suffixes.

This one statement will therefore switch over: favor, favors, favored, disfavor, disfavored, unfavorable, favoring, disfavoring, favorable, and almost certainly a few more.

When correcting to British leaves an American spelling around

A few words — mostly those which have -log in them — retain the Amercian spelling even after changing to the British. For instance:

AmEng: dialogBrEng: dialogue, but dialogue still contains dialog

This means that the next time the bot is run, it will find dialog again and attempt to replace it. After several passes, you'll end up with something like dialogueueueueue, which is obviously not desirable. Thus, we must find a way to limit the search to only the case of dialog+space, dialog+puncutation mark. Here's how we do it:

r'dialog(\.|\;|\:| |\!|\,|\?+)'

The pipes (|) act as a switch. They say, look for this character | that character | or the other character. The plus sign at the very end says "at least one time". And the back slashes (\) escape the punctuation marks from their usual special meanings.

Altogether then, what this statement says is, "Look for the word dialog followed by either a period, a semi-colon, a colon, an exclamation mark, a space or a comma that's present at least once." It will therefore find only:

  • the speed of his dialog was rapid
  • how could he have forgotten his dialog?
  • dialog: the bane of the actor
  • he had far too many lines of dialog!
  • They had a fruitful dialog; however, the humans would soon kill the Silurians.

Cases where regex fails

Not every word on our list has been switched using regex expressions. Sometimes it's easier just to type up a switch of literal characters, as when a word serves as the root of no other words.

The code

The following code will change over time, as more words are added. The final word in the English language that has a British/American difference is yodelling. Once you see that word on this list, you'll know the bot is fully programmed.

fixes['spelling'] = {
    'regex': True,
    'recursive': True,
    'msg': {
        'en':u'Enforcing [[tardis:spelling policy|spelling policy]].'
        },
    'replacements': [
        (r'([Aa])ccessoriz(.?)', r'\1ccessoris\2'),
        (r'([Aa])cclimitiz(.?)',r'\1cclimatis\2'),
        (r'([Aa])ccouterments',r'\1ccoutrements'),
        (r'( +)eon( +)',r'\1aeon\2'),
        (r'( +)eons( +)',r'\1aeons\2'),
        (r'([Aa])erogram( +)',r'\1erogramme\2'),
        (r'([Aa])erograms',r'\1erogrammes'),
        (r'( +)esthete(.?)( +)',r'\1aesthete\2\3'),
        (r'( +)esthetic(.?)( +)',r'\1aesthetic\2\3'),
        (u'( +)etiology',u'\1aetiology'),
        (r'(.?)aging',r'\1ageing'),
        (r'([Aa])ggrandizement',r'\1ggrandisement'),
        (r'([Aa])goniz(.?)', r'\1gonis\2'),
        (r'([Aa])luminum', r'\1luminium'),
        (r'([Aa])mortize( +)',r'\1mortise\2'),
        (r'([Aa])mortiz(.?)',r'\1mortis\2'),
        (r'([Aa])ampitheater( +)',r'\1mphitheatre\2'),
        (r'([Aa])mpitheaters',r'\1mphitheatres'),
        (r'([Aa])mphitheater( +)',r'\1mphitheatre\2'),
        (r'([Aa])mphitheaters',r'\1mphiteatres'),
        (r'([Aa])nemi(.?)',r'\1naemi\2'),
        (r'([Aa])nesthesia',r'\1naesthesia'),
        (r'([Aa])nestheti(.?)',r'\1naestheti\2'),
        (r'([Aa])nalog( +)',r'\1nalogue\2'),
        (r'([Aa])nalogs',r'\1nalogues'),
        (r'([Aa])nalyze( +)',r'\1nalyse\2'),
        (r'([Aa])nalyz(.?)',r'\1nalys\2'),
        (r'([Aa])ngliciz(.?)',r'\1nglicis\2'),
        (r'([Aa])nnualized',r'\1nnualised'),
        (r'([Aa])ntagoniz(.?)',r'\1ntagonis\2'),
        (r'([Aa])pologiz(.?)',r'\1pologis\2'),
        (u'appall',u'appal'),
        (u'appalls',u'appals'),
        (r'([Aa])ppetiz(.?)',r'\1ppetis\2'),
        (r'([Aa])rbor(.?)',r'\1rbour\2'),
        (r'([Aa])rcheolog(.?)',r'\1rchaeolog\2'),
        (u'ardor',u'ardour'),
        (r'([Aa])rmor(.?)',r'\1rmour\2'),
        (r'([Aa])rtifact(.?)',r'\1rtefact\2'),
        (r'([Aa])uthoriz(.?)',r'\1uthoris\2'),
        (r'([Aa])x( +)',r'\1xe\2'),
        (r'([Bb])ackpedaled', r'\1ackpedalled'),
        (r'([Bb])ackpedaling', r'\1ackpedalling'),
        (r'([Bb])anister(.?)', r'\1annister\2'),
        (r'([Bb])aptiz(.?)',r'\1aptis\2'),
        (r'([Bb])astardiz(.?)',r'\1astardis\2'),
        (r'([[Bb]])attleax( +)',r'\1attleaxe\2'),
        (r'([[Bb]])alk(.?)',r'\1aulk\2'),
        (r'([[Bb]])edeviled',r'\1edevilled'),
        (r'([[Bb]])edevling',r'\1edevilling'),
        (r'(.?)([[Bb]])ehavior(.?)',r'\1\2ehaviour\3'),
        (r'([[Bb]])ehoove(.?)',r'\1ehove\2'),
        (r'([[Bb]])ejeweled',r'\1ejewelled'),
        (r'(.?)([Ll])abor(.?)',r'\1\2abour\3'),
        (r'([Bb])eveled',r'\1evelled'),
        (r'([Bb])evies',r'\1evvies'),
        (r'([Bb])evy',r'\1evvy'),
        (r'([Bb])iased',r'\1iassed'),
        (r'([Bb])iasing',r'\1iassing'),
        (r'([Bb])inging',r'\1ingeing'),
        (r'([Bb])ougainvillea(.?)',r'\1ougainvillaea\2'),
        (r'([Bb])owdleriz(.?)',r'\1owdleris\2'),
        (r'([Bb])reathalyz(.?)',r'\1reathalys\2'),
        (r'([Bb])rutaliz(.?)',r'\1rutalis\2'),
        (r'(.?)([Bb])usses',r'\1\2uses'),
        (r'([Bb])ussing',r'\1using'),
        (r'([Cc])esarean(.?)',r'\1aesarean\2'),
        (r'([Cc])aliber(.?)',r'\1alibre\2'),
        (r'([Cc])aliper(.?)',r'\1calliper\2'),
        (r'([Cc])alisthenics',r'\1allisthenics'),
        (r'([Cc])analiz(.?)',r'\1analis\2'),
        (r'([Cc])ancelation',r'\1ancellation'),
        (r'([Cc])ancelations',r'\1ancellations'),
        (r'([Cc])anceled',r'\1ancelled'),
        (r'([Cc])anceling',r'\1ancelling'),
        (r'([Cc])andor',r'\1andour'),
        (r'([Cc])annibaliz(.?)',r'\1annibalis\2'),
        (r'([Cc])anibaliz(.?)',r'\1annibalisi\2'),
        (r'([Cc])anibalis(.?)',r'\1annibalis\2'),
        (r'([Cc])anoniz(.?)',r'\1anonis\2'),
        (r'([Cc])apitaliz(.?)',r'\1apitalis\2'),
        (r'([Cc])arameliz(.?)',r'\1aramelis\2'),
        (r'([Cc])arboniz(.?)',r'\1arbonis\2'),
        (r'([Cc])aroled',r'\1arolled'),
        (r'([Cc])aroling',r'\1arolling'),
        (r'([Cc])atalog( +)',r'\1atalogue\1'),
        (r'([Cc])atalogs( +)',r'\1atalogues\2'),
        (r'([Cc])ataloged',r'\1atalogued'),
        (r'([Cc])ataloging',r'\1ataloguing'),
        (r'([Cc])atalyz(.?)',r'\1atalys\2'),
        (r'([Cc])ategoriz(.?)',r'\1ategoris\2'),
        (r'([Cc])auteriz(.?)',r'\1auteris\2'),
        (r'([Cc])avilled',r'\1avilled'),
        (r'([Cc])aviling',r'\1avilling'),
        (r'(.?)([Gg])ram( +)',r'\1gramme\2'),
        (r'(.?)([Gg])rams',r'\1grammes'),
        (r'(.?)([Ll])iter(.?)',r'\1\2itre\3'),
        (r'(.?)([Mm])eter(.?)',r'\1\2etre\3'),
        (r'([Cc])entraliz(.?)',r'\1entralis\2'),
        (r'(.?)([Cc])enter(.?)',r'\1\2entre\3'),
        (r'([Cc])hanneled',r'\1hannelled'),
        (r'([Cc])hanneling',r'\1hannelling'),
        (r'([Cc])haracteriz(.?)',r'\1haracteris\2'),
        (r'([Cc])heckbook(.?)',r'\1hequebook\2'),
        (r'([Cc])hili',r'\1hilli'),
        (r'([Cc])himera(.?)',r'\1himaera\2'),
        (r'([Cc])hiseled',r'\1hiselled'),
        (r'([Cc])hiseling',r'\1hiselling'),
        (r'([Cc])irculariz(.?)',r'\1ircularis\2'),
        (r'([Cc])iviliz(.?)',r'\1ivilis\2'),
        (r'([Cc])lamor(.?)',r'\1lamour\2'),
        (r'([Cc])langor',r'\1langour'),
        (r'([Cc])larinetist',r'\1larinettist'),
        (r'([Cc])ollectiviz(.?)',r'\1ollectivis\2'),
        (r'([Cc])oloniz(.?)',r'\1olonis\2'),
        (r'(.?)([Cc])olor(.?)',r'\1\2olour\3'),
        (r'([Cc])ommercializ(.?)',r'\1ommercialis\2'),
        (r'([Cc])ompartmentaliz(.?)',r'\1ompartmentalis\2'),
        (r'([Cc])omputeriz(.?)',r'\1omputeris\2'),
        (r'([Cc])onceptualiz(.?)',r'\1onceptualis\2'),
        (r'([Cc])ontextualize(.?)',r'\1ontextualis\2'),
        (r'([Cc])oz(.?)',r'\1os\2'),
        (r'([Cc])ouncilor(.?)',r'\1ouncillor\2'),
        (r'([Cc])ounselor(.?)',r'\1ounsellor\2'),
        (r'([Cc])ounseling',r'\1ounselling'),
        (r'([Cc])ounseled',r'\1ounselled'),
        (r'([Cc])renelated',r'\1renellated'),
        (r'([Cc])riminaliz(.?)',r'\1riminialis\2'),
        (r'([Cc])riticiz(.?)',r'\1riticis\2'),
        (r'([Cc])rueler',r'\1rueller'),
        (r'([Cc])ruelest',r'\1ruellest'),
        (r'([Cc])rystalliz',r'\1rystallis\2'),
        (r'([Cc])udgeled', r'\1udgelled'),
        (r'([Cc])udgeling',r'\1udgelling'),
        (r'([Cc])ustomiz(.?)',r'\1ustomis\2'),
        (r'([Cc])ipher(.?)',r'\1ypher\2'),
        (r'([Dd])ecentraliz(.?)',r'\1ecentralis\2'),
        (r'([Dd])ecriminaliz(.?)',r'\1ecriminalis\2'),
        (r'([Dd])efense(.?)',r'\1efence\2'),
        (r'(.?)([H,h])umaniz(.?)',r'\1\2umanis\3'),
        (r'(.?)([Dd])emeanor',r'\1\2emeanour'),
        (r'(.?)([Mm])ilitariz(.?)',r'\1\2ilitaris\3'),
        (r'(.?)([Mm])obiliz(.?)',r'\1\2obilis\3'),
        (r'([Dd])emocratiz(.?)',r'\1emocratis(.?)'),
        (r'([Dd])emoniz(.?)',r'\1emonis(.?)'),
        (r'(.?)([Mm])oraliz(.?)',r'\1\2oralis\3'),
        (r'(.?)([Nn])ationaliz(.?)',r'\1\2ationalis\3'),
        (r'([Dd])eodoriz(.?)',r'\1eodoris\2'),
        (r'(.?)([Pp])ersonaliz(.?)',r'\1\2ersonalis\3'),
        (r'([Dd])eputiz(.?)',r'\1eputis\2'),
        (r'(.?)([Ss])ensitiz(.?)',r'\1\2ensitis\3'),
        (r'(.?)([Ss])tabliz(.?)',r'\1\2tablis\3'),
        (r'([Dd])ialed',r'\1ialled'),
        (r'([Dd])ialing',r'\1ialling'),
        (r'([Dd])ialog( +)',r'\1ialogue\2'),
        (r'([Dd])ialogs( +)',r'\1ialogues\2'),
        (r'([Dd])iarrhea',r'\1iarrhoea'),
        (r'([Dd])igitiz(.?)',r'\1igitis\2'),
        (r'([Dd])isemboweled',r'\1isembowelled'),
        (r'([Dd])isemboweling',r'\1isembowelling'),
        (r'(.?)([Ff])avor(.?)',r'\1\2avour\3'),
        (r'([D,d])isheveled',r'\1ishevelled'),
        (r'(.?)([Hh])onor(.?)',r'\1\2\onour\3'),
        (r'(.?)([Oo])rganization(.?)',r'\1\2rganisation\3'),
        (r'([Dd])istil( +)',r'\1istill\2'),
        (r'([Dd])istils',r'\1istills'),
        (r'([Dd])ramatiz(.?)',r'\1ramatis\2'),
        (r'([Dd])rafts(.?)',r'\1raughts\2'),
        (r'([Dd])rafty',r'\1raughty'),
        (r'([Dd])rafti(.?)',r'\1raughti\2'),
        (r'([Dd])riveled',r'\1rivelled'),
        (r'([Dd])riveling',r'\1rivelling'),
        (r'([Dd])ueled',r'\1uelled'),
        (r'([Dd])ueling',r'\1uelling'),
        (r'([Ee])conomiz(.?)',r'\1conomis'),
        (r'([Ee])dema',r'\1doema'),
        (r'([Ee])ditorializ(.?)',r'\1ditorialis\2'),
        (r'([Ee])mpathiz(.?)',r'\1mpathis\2'),
        (r'([Ee])nameled',r'\1namelled'),
        (r'([Ee])nameling',r'\1namelling'),
        (r'([Ee])namor(.?)',r'\1namour\2'),
        (r'([Ee])ncyclopedi(.?)',r'\1ncyclopaedi\2'),
        (r'([Ee])ndeavor(.?)',r'\1ndeavour\2'),
        (r'(.?)([Ee])nergiz(.?)',r'\1\2nergis\3'),
        (r'([Ee])nroll(.?)',r'\1nrol\2'),
        (r'([Ee])nthrall(.?)',r'\1nthral\2'),
        (r'([Ee])paulet( +)',r'\1paulette\2'),
        (r'([Ee])paulets',r'\1paulettes'),
        (r'([Ee])pilog( +)',r'\1pilogue'),
        (r'([Ee])pilogs',r'\1pilogues'),
        (r'([Ee])pitomiz(.?)',r'\1pitomis\2'),
        (r'([Ee])qualiz(.?)',r'\1qualis\2'),
        (r'([Ee])ulogiz(.?)',r'\1ulogis\2'),
        (r'([Ee])vangeliz(.?)',r'\1vangelis\2'),
        (r'([Ee])xorciz(.?)',r'\1xorcis\2'),
        (r'([Ee])xtemporiz(.?)',r'\1xtemporis\2'),
        (r'([Ee])xternaliz(.?)',r'\1xternalis\2'),
        (r'([Ff])actoriz(.?)',r'\1actoris\2'),
        (r'([Ff])eces',r'\1aeces'),
        (r'([Ff])ecal',r'\1aecal'),
        (r'([Ff])amiliariz(.?)',r'\1amiliaris\2'),
        (r'([Ff])antasiz(.?)',r'\1antasis\2'),
        (r'([Ff])eminiz(.?)',r'\1eminis\2'),
        (r'([Ff])ertiliz(.?)',r'\1ertilis\2'),
        (r'([Ff])ervor',r'\1ervor'),
        (r'([Ff])iber(.?)',r'\1ibre\2'),
        (r'([Ff])ictionaliz(.?)',r'\1ictionalis\2'),
        (r'([Ff])ilet(.?)',r'\1illet\2'),
        (r'([Ff])inaliz(.?)',r'\1inalis\2'),
        (r'(.?)([Ff])lavor(.?)',r'\1\2\lavour\3'),
        (r'([Ff])etal',r'\1oetal'),
        (r'([Ff])etus(.?)',r'\1oetus\2'),
        (r'([Ff])etid',r'\1oetid'),
        (r'([Ff])ormaliz(.?)',r'\1ormalis\2'),
        (r'([Ff])ossiliz(.?)',r'\1ossilis\2'),
        (r'([Ff])raterniz(.?)',r'\1raternis\2'),
        (r'([Ff])ulfill( +)',r'\1ulfil\2'),
        (r'([Ff])ulfillment',r'\1ulfilment'),
        (r'([Ff])unnelled',r'\1unneled'),
        (r'([Ff])unnelling',r'\1unneling'),
        (r'([Gg])alvaniz(.?)',r'\1alvanis\2'),
        (r'([Gg])amboled',r'\1ambolled'),
        (r'([Gg])amboling',r'\1amboling'),
        (r'([Gg])eneraliz(.?)',r'\1eneralis\2'),
        (r'([Gg])hettoiz(.?)',r'\1hettois\2'),
        (r'([Gg])lamoriz(.?)',r'\1lamoris\2'),
        (r'([Gg])lamor( +)',r'\1lamour\2'),
        (r'([Gg])lobaliz(.?)',r'\1lobalis\2'),
        (r'([Gg])luing',r'\1ueing'),
        (r'([Gg])oiter(.?)',r'\1oitre\2'),
        (r'([Gg])onorrhea',r'\1onorrhoea'),
        (r'([Gg])raveled',r'\1ravelled'),
        (r'([Gg])ray( +)',r'\1rey\2'),
        (r'([Gg])ray(.?)',r'\1rey\2'),
        (r'([Gg])roveled',r'\1rovelled'),
        (r'([Gg])roveling',r'\1rovelling'),
        (r'([Gg])royne(.?)',r'\1roin\2'),
        (r'([Gg])rueling(.?)',r'\1ruelling\2'),
        (r'([Gg])ynacol(.?)',r'\1ynaecol\2'),
        (r'([Hh])ematolog(.?)',r'\1aematolog\2'),
        (r'([Hh])emo(.?)',r'\1aemo\2'),
        (r'([Hh])arbor(.?)',r'\1arbour\2'),
        (r'([Hh])armoniz(.?)',r'\1armonis\2'),
        (r'([Hh])omeopath(.?)',r'\1omoeopath\2'),
        (r'([Hh])omogeniz(.?)',r'\1omogenis\2'),
        (r'([Hh])ospitaliz(.?)',r'\1ospitalis\2'),
        (r'([Hh])umor(.?)',r'\1umour\2'),
        (r'([Hh])ybridiz(.?)',r'\1ybridis\2'),
        (r'([Hh])ypnotiz(.?)',r'\1ypnotis\2'),
        (r'([Hh])ypothesiz(.?)',r'\1ypothesis\2'),
        (r'([Ii])dealiz(.?)',r'\1dealis\2'),
        (r'([Ii])doliz(.?)',r'\1dolis\2'),
        (r'(.?)([Mm])obiliz(.?)',r'\1\2obilis\3'),
        (r'([Ii])mmortaliz(.?)',r'\1mmortalis\2'),
        (r'([Ii])mmuniz(.?)',r'\1mmunis\2'),
        (r'([Ii])mpaneled',r'\1mpanelled'),
        (r'([Ii])mpaneling',r'\1mpanelling'),
        (r'([Ii])mperiled',r'\1mperilled'),
        (r'([Ii])mperiling',r'\1mperilling'),
        (r'([Ii])ndividualiz(.?)',r'\1ndividualis\2'),
        (r'([Ii])ndustrializ(.?)',r'\1ndustrialis\2'),
        (r'([Ii])nstill(.?)',r'\1nstil\2'),
        (r'([Ii])nitialed',r'\1nitialled'),
        (r'([Ii])nitialing',r'\1nitialling'),
        (r'([Ii])nstallment(.?)',r'\1nstalment\2'),
        (r'([Ii])nstitutionaliz(.?)',r'\1nstitutionalis\2'),
        (r'([Ii])ntellectualiz(.?)',r'\1ntellectualis\2'),
        (r'(.?)([Nn])ationaliz(.?)',r'\1ationalis\2'),
        (r'([Ii])nternaliz(.?)',r'\1nternalis\2'),
        (r'([Ii])oniz(.?)',r'\1onis\2'),
        (r'([Ii])taliciz(.?)',r'\1talicis\2'),
        (r'([Ii])temiz(.?)',r'\1temis\2'),
        (r'([Jj])eopardiz(.?)',r'\1eopardis\2'),
        (r'([Jj])eweler(.?)',r'\1eweller\2'),
        (r'([Ll])abeled',r'\1abelled'),
        (r'([Ll])abeling',r'\1abelling'),
        (r'([Ll])ackluster',r'\1acklustre'),
        (r'(.?)([Ll])egaliz(.?)',r'\1\2egalis\3'),
        (r'(.?)([Ll])egitimiz(.?)',r'\1\2egitimis\3'),
        (r'([Ll])ukemia',r'\1eukaemia'),
        (r'(.?)([Ll])evele(.?)',r'\1\2evelle\3'),
        (r'(.?)([Ll])eveling',r'\1\2evelling'),
        (r'([Ll])ibeled',r'\1ibelled'),
        (r'([Ll])ibelous',r'\1ibellous'),
        (r'([Ll])ibeling',r'\1ibelling'),
        (r'([Ll])iberaliz(.?)',r'\1iberalis\2'),
        (r'([Ll])ioniz(.?)',r'\1ionis\2'),
        (r'([Ll])iquidiz(.?)',r'\1iquidis\2'),
        (r'([Ll])ocaliz(.?)',r'\1ocalis\2'),
        (r'([Ll])ouver(.?)',r'\1ouvre\2'),
        (r'([Ll])uster',r'\1ustre'),
        (r'(.?)([Mm])agnetiz(.?)',r'\1\2agnetis\3'),
        (r'([Mm])aneuver(.?)',r'\1anoeuvr\2'),
        (r'([Mm])arginiliz(.?)',r'\1arginilis\2'),
        (r'([Mm])arshaled',r'\1arshalled'),
        (r'([Mm])arshaling',r'\arshalling'),
        (r'([Mm])arveled',r'\1arvelled'),
        (r'([Mm])arveling',r'\1arvelling'),
        (r'([Mm])arvelo(.?)',r'\1arvello\2'),
        (r'(.?)([Mm])aterializ(.?)',r'\1\2aterialis\3'),
        (r'([Mm])aximiz(.?)',r'\1aximis\2'),
        (r'([Mm])eager',r'\1eager'),
        (r'([Mm])echaniz(.?)',r'\1echanis\2'),
        (r'([Mm])emorializ(.?)',r'\1emorialis\2'),
        (r'([Mm])emoriz(.?)',r'\1emoris\2'),
        (r'([Mm])esmeriz(.?)',r'\1esmoris\2'),
        (r'([Mm])etaboliz(.?)',r'\1etabolis\2'),
        (r'([Mm])iniaturiz(.?)',r'\1iniaturis\2'),
        (r'([Mm])inimiz(.?)',r'\1inimis\2'),
        (r'([Mm])iter(.?)',r'\1itre\2'),
        (r'([Mm])odele(.?)',r'\1odelle\2'),
        (r'([Mm])odeling',r'\1odelling'),
        (r'([Mm])oderniz(.?)',r'\1odernis\2'),
        (r'([Mm])oisturiz(.?)',r'\1oisturis\2'),
        (r'([Mm])onolog( +)',r'\1onologue\2'),
        (r'([Mm])onologs',r'\1onologues'),
        (r'([Mm])onopoliz(.?)',r'\1onopolis\2'),
        (r'([Mm])old(.?)',r'\1ould\2'),
        (r'([Mm])olt(.?)',r'\1oult\2'),
        (r'([Mm])ustache(.?)',r'\1oustache\2'),
        (r'([Nn])aturaliz(.?)',r'\1aturalis\2'),
        (r'([Nn])eighbor(.?)',r'\1eighbour\2'),
        (r'([Nn])aturaliz(.?)',r'\1aturalis\2'),
        (r'([Nn])eutraliz(.?)',r'\1eutralis\2'),
        (r'([Nn])ormaliz(.?)',r'\1ormalis\2'),
        (r'(.?)([Oo])dor(.?)',r'\1\2dour\3'),
        (r'esophagus(.?)',r'oesophagus\1'),
        (r'Esophagus(.?)',r'Oesophagus\1'),
        (u'estrogen',u'oestrogen'),
        (u'Estrogen',u'Oestrogen'),
        (r'([Oo])ffense(.?)',r'\1ffence\2'),
        (r'([Oo])melet( +)',r'\1melette\2'),
        (r'([Oo])melets',r'\1melettes'),
        (r'(.?)([Oo])ptimiz(.?)',r'\1\2ptimis\3'),
        (r'(.?)([Oo])rganiz(.?)',r'\1\2rganis\3'),
        (r'([Oo])rthopedic(.?)',r'\1rthopaedic\2'),
        (r'([Oo])straciz(.?)',r'\1stracis\2'),
        (r'([Oo])xidiz(.?)',r'\1xidis\2'),
        (r'([Pp])ederast(.?)',r'\1aederast\2'),
        (r'([Pp])ediatric(.?)',r'\1aediatric\2'),
        (r'([Pp])edo( +)',r'\1aedo\2'),
        (r'([Pp])edophil(.?)',r'\1aedophil\2'),
        (r'([Pp])aleo(.?)',r'\1alaeo\2'),
        (r'([Pp])anelist(.?)',r'\1anellist\2'),
        (r'([Pp])araliz(.?)',r'\1aralys\2'),
        (r'([Pp])arceled',r'\1arcelled'),
        (r'([Pp])arceling',r'\1arcelling'),
        (r'([Pp])arlor(.?)',r'\1arlour\2'),
        (r'([Pp])articulariz(.?)',r'\1articularis\2'),
        (r'([Pp])assiviz(.?)',r'\1assivis\2'),
        (r'([Pp])asteuriz(.?)',r'\1asteuris\2'),
        (r'([Pp])atroniz(.?)',r'\1atronis\2'),
        (r'([Pp])edestrianiz(.?)',r'\1edestrianis\2'),
        (r'([Pp])enaliz(.?)',r'\1enalis\2'),
        (r'([Pp])enciled',r'\1encilled'),
        (r'([Pp])enciling',r'\1encilling'),
        (r'([Pp])harmacopeia(.?)',r'\1harmacopoeia\2'),
        (r'([Pp])hilosophiz(.?)',r'\1hilosophis\2'),
        (r'([Pp])hilter(.?)',r'\1hiltre\2'),
        (r'([Pp])lagiariz(.?)',r'\1lagiaris\2'),
        (r'([Pp])low( +)',r'\1lough\2'),
        (r'([Pp])low(.?)',r'\1lough\2'),
        (r'(.?)([Pp])olariz(.?)',r'\1\2olaris\3'),
        (r'(.?)([Pp])oliticiz(.?)',r'\1\2oliticis\3'),
        (r'([Pp])opulariz(.?)',r'\1opularis\2'),
        (r'([Pp])ouf( +)',r'\1ouffe\2'),
        (r'([Pp])oufs',r'\1ouffes'),
        (r'([Pp])racticed',r'\1ractised'),
        (r'([Pp])racticing',r'\1ractising'),
        (r'([Pp])raesidium(.?)',r'\1residium\2'),
        (r'(.?)([Pp])ressuriz(.?)',r'\1ressuris\1'),
        (r'([Pp])retenc(.?)',r'\1retens\2'),
        (r'([Pp])rimaeval',r'\1rimeval'), #Correcting in favour of American spelling#
        (r'(.?)([Pp])rioritiz(.?)',r'\1\2rioritis\3'),
        (r'(.?)([Pp])rivatiz(.?)',r'\1\2rivatis\3'),
        (r'([Pp])roffesionaliz(.?)',r'\1roffesionalis\2'),
        (r'([Pp])rolog( +)',r'\1rologue\2'),
        (r'([Pp])rologs',r'\1rologues'),
        (r'([Pp])ropagandiz(.?)',r'\1ropagandis\2'),
        (r'([Pp])roselytiz(.?)',r'\1roselytis\2'),
        (r'([Pp])ubliciz(.?)',r'\1ublicis\2'),
        (r'([Pp])ulveriz(.?)',r'\1ulveris\2'),
        (r'([Pp])ummeled',r'\1ummelled'),
        (r'([Pp])ummeling',r'\1ummelling'),
        (r'([Pp])ajama(.?)',r'\1yjama\2'),
        (r'([Qq])uarreled',r'\1uarrelled'),
        (r'([Qq])uarreling',r'\1uqarrelling'),
        (r'([Rr])adicaliz(.?)',r'\1adicalis\2'),
        (r'([Rr])ancor(.?)',r'\1ancour\2'),
        (r'([Rr])andomiz(.?)',r'\1andomis\2'),
        (r'([Rr])ationaliz(.?)',r'\1ationalis\2'),
        (r'(.?)([Rr])aveled',r'\1\2avelled'),
        (r'(.?)([Rr])aveling',r'\1\2avelling'),
        (r'(.?)([Rr])ealiz(.?)',r'\1\2ealis\3'),
        (r'(.?)([Rr])ecogniz(.?)',r'\1\2ecognis\3'),
        (r'([Rr])econnoiter(.?)',r'\1\2econnoitre\3'),
        (r'([Rr])efueled','\1efuelled'),
        (r'([Rr])efueling','\1efuelling'),
        (r'(.?)([Rr])egulariz(.?)',r'\1\2\egularis\3'),
        (r'([Rr])evele(.?)',r'\1evelle\2'),
        (r'([Rr])eveling',r'\1evelling'),
        (r'(.?)([Vv])italiz(.?)',r'\1\2evitalis\3'),
        (r'([Rr])evolutioniz(.?)',r'\1evolutionis\2'),
        (r'([Rr])hapodiz(.?)',r'\1hapodis\2'),
        (r'([Rr])igor(.?)',r'\1igour\2'),
        (r'([Rr])itualiz(.?)',r'\1itualis\2'),
        (r'([Rr])ivaled',r'\1ivalled'),
        (r'([Rr])ivaling',r'\1ivalling'),
        (r'([Rr])omanticiz(.?)',r'\1omanticis\2'),
        (r'([Rr])umor(.?)',r'\1umour\2'),
        ],
    'exceptions': {
        'inside-tags': [
            'pre',
            'code',
            'nowiki',
            'hyperlink',
            'link',
            'comment',
            ]
        'category': [
            'spelling',
            ]
        }
    }