Tardis:What SpellBot actually corrects

From Tardis Wiki, the free Doctor Who reference
[[category:spelling]]Because even the most conscientious of editors will occasionally make spelling errors, there is a need to have '''bot enforcement''' of the spelling policy. A comprehensive list of the differences between British and American spellings has been compiled, and is being coded for bot use as of the second week of June, 2011.  This page will see heavy updating throughout that week as the list is fully coded.
{{lock}}{{mosnav|p=Use British English|Spelling|Spelling cheat card|Spell checking|Spell checking with a Mac|Spell checking with Opera|Spell checking with Chrome|Spell checking with Firefox|SpellBot|c=British English|}}
 
{{summ|This is the master list of everything for which [[T:SBOT|SpellBot]] corrects, along with detailed notes about the rationale for some of the coding decisions and explanations of some of SpellBot's key limitations.  [[#The list|The list]] itself is written in {{w|regex|regex}}, so if you're unfamiliar with that language, it may take you a moment to get used to the symbology.}}
{{sc|T:SBOT LIST}}
Because even the most conscientious of editors will occasionally make spelling errors, there is a need to have '''bot enforcement''' of the spelling policy. A comprehensive list of the differences between British and American spellings has been compiled into a bot routine known as a "user-fix". Its raw code is published here so that all users may see exactly what the bot is checking for.
== Problem words ==
=== Words impossible for a bot ===
Some words are beyond the capability of the bot, because they are valid spellings (even if of different words) in British English. This list includes:
* '''Check'''. Americans use this word to mean not only the verb ''to inquire after'' or ''to investigate'', but also the noun, which is a financial instrument. Because BrEng spells the verb that way, too, the bot can't be programmed to correct the other usage. We'd end up with sentences like:
::The Doctor chequed on Sarah Jane in her hospital room before going to the pathology lab.
* '''Tire'''. Both sides of the Atlantic use '''tire''' as a verb. It's again the noun that's problematic. Americans view ''tire'' as the correct spelling for what the British would call ''a tyre''. The bot can't figure this one out, so it doesn't even try.  
* '''Draft'''. Americans use this spelling for all senses, whereas the British use both ''draft'' and ''draught'' for different senses. All words beginning with ''drafts-'' will be converted to ''draughts-'', and the word ''drafty'' will be converted to ''draughty'', but the word ''draft'' itself won't be touched by the bot, as that is a valid British spelling of the word. Clear as mud? Cool. Onwards, then . . .
* '''Disc'''. Way, way, way too screwed up a word for a simple bot to handle. ''Disc jockey'' is fine on both sides of the divide, but so is ''floppy disk'' and ''hard disk''. This one simply depends on context.
* '''Practise'''. The ''-ise'' version of this word is the correct spelling for the verb in BrEng; the British noun ends in ''-ice''. Americans use ''-ice'' for everything. Thus, the bot can't be of much use, except for participles derived from the verb. So, the bot will make no attempt to change the spelling of ''practice'', but it will change ''practicing'' and ''practiced'' to ''practising'' and ''practised''.
* '''Jail'''. Yes, ''gaol'' is still correct, especially historically, but ''jail'' has largely supplanted it. So we won't look to correct ''jail'' to ''gaol'', but neither will we try to correct ''gaol'' to ''jail''. Because of the known presence of both spellings in DWU fiction, this one can't be decided by forum debate. We just have to live with ''gaol'' occasionally popping up.  
* '''Licence'''. In BrEng the noun is ''licence'' and the verb is ''license'', while Americans use ''license'' for both. Because ''license'' is a valid BrEng spelling of the verb, the bot can't correct for Americans using ''license'' as a noun. Other ''-ence'' words don't necessarily work this way. ''Offence'' and ''defence'' are unambiguously correct, but then again their verb forms are different (''offend'' and ''defend''), which means their gerunds and verbal nouns are different, too.
* '''Storey''' is the proper British spelling for a floor in a building. Americans just spell this ''story'', as in ''a 15-story building''. Obviously, the bot can't make this correction, because the word for a ''tale'' is spelled ''story'' on both sides of the Atlantic.
* '''Chimaera/chimera'''. Standard British spelling in the Cambridge Online Dictionary is '''chimera''', but there are other sources which state that the British prefer '''chimaera'''. '''Chimaera''', however, is the universal spelling for a certain kind of fish, and the mountain from which the legend of the chimera got its name. The bot's going to steer well clear of all this. It's probably best to spell however the particular source spells this word.
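As a rough illustration of the ''practise'' rule in the list above, here's a minimal Python sketch. The pattern and the function name are assumptions made up for this example, not lines copied from the bot's real list. It rewrites only the participles and leaves ''practice'' alone.

```python
import re

# Hypothetical rule (an assumption for illustration): only the participles
# "practicing" and "practiced" are unambiguous, so only they are rewritten.
PRACTISE_FIX = (r'([Pp])ractic(ing|ed)\b', r'\1ractis\2')


def fix_practise(text):
    """Rewrite practicing/practiced, but never touch 'practice' itself."""
    pattern, replacement = PRACTISE_FIX
    return re.sub(pattern, replacement, text)


print(fix_practise("She practiced daily; practice makes perfect."))
```

The noun and the bare verb both look like ''practice'' to a bot, which is why the sketch leaves them alone.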
=== Words requiring forum decision ===
Other words are ''possible'' for a bot to correct, but the presence of two valid British spellings means that we will require a forum discussion. Precedent for such forum debates over particular words can be found in the following threads:
<dpl>
category=spelling debates
columns=2
</dpl>
* '''Judgment'''. There's no agreement on either side of the Atlantic whether this word should be ''judgment'' or ''judgement''. Oddly, most British spell-checkers will red-flag ''judgment'', even though that's the official spelling in Commonwealth courts. We have at least one story title preferencing the version with two ''e''s — ''[[Judgement of the Judoon]]''. But still, this word will require a special forum discussion to decide which way we want to spell it.
* '''Connexion'''. This British spelling of ''connection'' is not ''universally'' used in Britain. ''Connection'' is correct in Britain, too, so the bot won't try to force ''connection'' into a ''connexion''-shaped hole.  
* '''Smidgen, smidgeon, smidgin'''. They're all valid spellings for the same word, on ''both'' sides of the Atlantic. The only way the bot could be useful is if we had a forum discussion to settle on one of the three spellings.
* '''Yogurt''' passes most British spell-checkers today, but so does ''yoghurt''. We'll leave both well alone until a forum discussion decides the matter.
* '''Almanac''' is universally the way it's spelt in American English, and increasingly the way Britons spell it, too. Still, some old-timers will go for ''almanack''. Until a forum discussion decides otherwise, the bot won't enforce either spelling.
* '''Gasses/gases'''. Both spellings are correct on both sides of the Atlantic. The bot won't correct for either until a forum discussion settles on a particular spelling.
* '''Programme/program'''. Both spellings pass British spell-checkers, even though there's a ''perception'' that "programme" is British and "program" is American. Until there's a community decision on spelling, the bot won't touch either spelling.
* '''Griffin/Gryphon'''. Both pass British spell-checkers, so it'll take a forum discussion to decide which way we want to go.
* '''Inflexion''' is the way the British have historically spelt ''inflection'', but modern British spell-checkers pass both spellings. So, the bot won't enforce either without a forum decision to the contrary.
* '''Instal/install'''. Modern British spell-checkers are cool with both, so the bot is, too. But the proper British spelling is ''instalment'', not ''installment''.
* '''Mediaeval'''. Ironically this spelling is now considered archaic and is most often seen in British academic writing. ''Medieval'', the American spelling, passes British spell checks, too. So, in the absence of a forum decision to the contrary, the bot won't try to correct ''either'' spelling.  
* '''Praesidium/presidium/presidiums/presidia'''. ''Praesidium'' is the archaic British spelling of ''presidium''. It no longer passes default settings on British spell-checkers, so the bot will correct to ''presidium'', which is also the American spelling. Note that the plural of the word is more confused. Both ''presidiums'' and ''presidia'' pass British spell-checkers. So the case here is complicated. The singular form of the noun will be corrected to ''presidium''. The plural will be corrected from ''praesidiums'' to ''presidiums''. But ''presidia'' will go uncorrected, unless a forum discussion decides on one or the other plural form.
* '''Pizzaz/pizzazz'''. Both spellings pass modern British spell checkers, although historically the three-z version was British and the four-z version was American. A forum discussion will be required before the bot corrects for either.  
* '''Siphon/syphon'''. Both pass most modern British spell-checkers (and American spell-checkers, for that matter).  
* '''Ton/tonne'''. Both pass British spell-checkers, so the bot won't touch either until forum decision to the contrary.
* '''Tranquility/tranquillity'''. Both pass British spell-checkers, though it's unclear whether this is because the simple noun is actively spelled both ways in Britain, or if it's because of Tranquility Base (see below).
* Notwithstanding the general fact that the bot will not correct for either '''chimera''' or '''chimaera''' (see above), there probably should be a forum discussion on the proper spelling of [[Space Station Chimera]], since we're technically inventing that spelling. The novelisation doesn't actually offer us a spelling, so it could equally be [[Space Station Chimaera]].  
=== Valid British spellings actively corrected ===
Thanks to the ubiquity of American spellings in pop culture, there are a few cases where valid if archaic British spellings can be corrected to standard American spellings without the need for a forum decision. In such cases, modern British usage hews closely to the American, and clearly argues ''against'' more archaic forms.
* '''Primaeval'''. Due to the presence of the modern ITV/BBCA television series with which many ''[[Doctor Who]]'' fans will be familiar, as well as [[AUDIO]]: ''[[Primeval]]'', the unambiguously American spelling "wins" the contest. ''Primaeval'' will be actively corrected to ''primeval''. This shouldn't ruffle too many feathers, since modern British spell-checkers fail ''primaeval''.
* '''Tranquillize/tranquillise/tranquilise/tranquilize'''. There are four different ways to spell this one damned word (and all words deriving from it). Ridiculous. The ''-ll'' versions are both okay in BrEng; the ''-l'' versions are both okay in AmEng. However, only ''one'' spelling passes ''modern'' British spell-checkers. Therefore ''tranquillise'' shall be deemed correct, and the bot will correct the other three spellings.  
* '''Tranquility Base'''. The bot will actively correct ''Tranquillity Base'' to the IAU-standard ''Tranquility Base''.  
=== Confusing words ===
Some words are homonyms, and ''look'' like they're archaic forms of other words, but in fact are ''totally'' different words. Other words behave one way as a root, but are spelt differently once suffixes are added. The bot will therefore correct in an unexpected way, which is why those ways are explained here.  
* '''Philtre''' is not the British spelling of ''filter'', but of the American ''philter''. It's a noun meaning "love potion", not a verb meaning "to remove impurities".  
* '''Pouffe''' is not an archaic British spelling of ''poof'', but a current spelling for what Americans would call a ''pouf'' — that is, a nice, thick cushion you can sit on. Thus, the bot will correct ''pouf'' to ''pouffe''. Not that we actually expect ''pouf'' to ever be used in a sentence on this wiki — except on this very page, which the bot doesn't patrol.  
* '''Groyne''' ''is'' a British spelling of ''groin'' only when it's not. It doesn't refer to that region of the human anatomy between the legs. That's ''groin'' on both sides of the Atlantic. ''Groyne'' is, instead, a civil engineering term, referring to a construction that controls erosion. Technically, it's known as a ''groin'' in the US, but almost no one calls it that. It's more commonly known as a ''breakwater'', ''bulwark'' or ''seawall''. Frankly, groynes are so often called by these more specific names, even in Britain, that ''groyne'' doesn't pass modern British spell-checkers. It's probably never been used at any time in the DWU, but still it's correct to say something like: "All seawalls are groynes, but not all groynes are seawalls." The bot won't correct ''away'' from it, but because ''groin'' is BrEng correct in its anatomical sense, it won't correct ''groin'' to ''groyne'', either.
* '''Vapour'''/'''vaporise'''. Yeah, this one's a beauty. As a plain noun, Americans always spell this one ''vapor'', while Britons always go for ''vapour''. That's pretty much the definitional British/American spelling difference. What's weird is what happens when you turn the noun into a verb by adding ''-ize'', or, as the British would have it, ''-ise''. Suddenly the cosmetic ''u'' is gone. So, the bot will correct ''vapor'' to ''vapour'', but ''vaporize'' only to ''vaporise''. ''Vapourise'' is incorrect on both sides of the Atlantic.
* '''Odour/Deodorise'''. This pair works the same way as ''vapour/vaporise''. ''Deodourise'' is wrong; deodorise is right.
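The ''vapour/vaporise'' behaviour described in the list above can be sketched in Python roughly as follows. Both rules here are assumptions written for illustration, and the real list may well handle this differently. The idea is to fix the ''-ize'' verb first, then add the ''u'' to the noun only when no ''-is-'' or ''-iz-'' suffix follows.

```python
import re

# Hypothetical pair of rules, applied in order (assumed, not the bot's own).
VAPOUR_RULES = [
    # vaporize / Vaporized / vaporizing -> vaporise / Vaporised / vaporising
    (r'([Vv])aporiz', r'\1aporis'),
    # vapor -> vapour, unless an -is-/-iz- suffix follows (no "vapourise")
    (r'([Vv])apor(?!i[sz])', r'\1apour'),
]


def fix_vapour(text):
    """Apply each rule in turn to the whole text."""
    for pattern, replacement in VAPOUR_RULES:
        text = re.sub(pattern, replacement, text)
    return text


print(fix_vapour("The vapor will vaporize anything it touches."))
```

The negative lookahead <code>(?!i[sz])</code> is what stops the second rule from producing ''vapourise''.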


== How to read the code ==
The code works by telling the bot to look for the word described before the comma. Then it replaces it with the word after the comma. A most basic expression would be:
:(u'color',u'colour')
This looks for the American "color", then replaces it with the British "colour".  


Because typing every permutation of a word, including all words that share the same root and capitalised variants, would be ''very'' time-consuming, most of the code won't work in such a simplistic way. Most of it uses a "regular expression" — or [[wikipedia:regex|regex]] — to find a lot of hits with just one line. Here's an explanation of the regex used in this code:  
* The expression ([Cc]) means "look for either capitalised or lowercase versions of the letter C".
* (.?) means, "You, Mr. Fancy Computer bot thing, might find some more letters to the right of this point. Grab 'em up." Strictly speaking, (.?) grabs at most one character; anything further along in the word is simply left where it is, so the word still comes out whole.


* \1 means, "take whatever is in the first set of parentheses and put it here"
* \2 means, "take whatever is in the second set of parentheses and put it here"
* \3 means, "take whatever is in the third set of parentheses and put it here"


Thus, if we have the expression,
:(r'([Cc])apitaliz(.?)', r'\1apitalis\2')
It means, roughly,
:Look for all words, beginning with either a capital or lowercase C, which are followed by the letters "apitaliz" + any other letters you find until the next space. Then, keep the form of the letter c that you find, stick on "apitalis", and add back in any letters you originally found after the "z".


In other words, find ''Capitaliz-'', keep the C capitalised, switch the z to an s, then stick on "-e", "-ing", "-ed", or "-ation", as appropriate.
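Here's a minimal Python sketch of that find-and-replace, using the standard <code>re</code> module. The function name <code>to_british</code> is made up for this example, and the sketch assumes the bot applies each pair in much the way <code>re.sub</code> does. The replacement keeps the captured C and sticks "apitalis" on after it.

```python
import re

# One (search, replace) pair in the style described above.
PATTERN = r'([Cc])apitaliz(.?)'
REPLACEMENT = r'\1apitalis\2'


def to_british(text):
    """Apply the single capitaliz -> capitalis rule to a piece of text."""
    return re.sub(PATTERN, REPLACEMENT, text)


print(to_british("Capitalize the title, then stop capitalizing."))
```

Because <code>\1</code> puts back whichever form of the letter C was found, the same line handles ''Capitalize'' and ''capitalize'' alike.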


=== Correcting all related words at once ===
Now let's take a look at arguably the most complicated coding here. What if I wanted to change every word that had ''favor'' as a root? How could I take care of words that had both a prefix and a suffix, like ''disfavorable''? Putting together everything we've learned so far, it would be:
:(r'(.?)([Ff])avor(.?)',r'\1\2avour\3')
The leading (.?) means check to see if there's a prefix. The ([Ff]) switch checks for capitalisation of the root letter f. The (.?) at the end checks for suffixes. Now we have three parentheses instead of just two. So \1 means the prefix, \2 puts the letter f in with proper capitalisation, and \3 adds any suffixes.  


This one statement will therefore switch over: favor, favors, favored, disfavor, disfavored, unfavorable, favoring, disfavoring, favorable, and almost certainly a few more.
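To try the ''favor'' statement out, here's a short Python sketch. The helper name <code>apply_fix</code> is an assumption for illustration, and it simply hands the pair to <code>re.sub</code>, which may differ in detail from how the bot framework runs it.

```python
import re

# The three-group statement from above: prefix, root letter, suffix.
FAVOUR_FIX = (r'(.?)([Ff])avor(.?)', r'\1\2avour\3')


def apply_fix(fix, text):
    """Run one (search, replace) pair over the text."""
    pattern, replacement = fix
    return re.sub(pattern, replacement, text)


for word in ["favor", "Favorable", "disfavored", "unfavorable"]:
    print(word, "->", apply_fix(FAVOUR_FIX, word))
```

Running the fix a second time changes nothing, since ''favour'' no longer contains the letters ''avor''.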


=== When correcting to British leaves an American spelling around ===
A few words, mostly those which have ''-log'' in them, retain the American spelling even after changing to the British. For instance:
 
:''AmEng'': '''dialog''' → ''BrEng'': '''dialogue''', but ''dialogue'' still contains ''dialog''
This means that the next time the bot is run, it will find ''dialog'' again and attempt to replace it. After several passes, you'll end up with something like ''dialogueueueueue'', which is obviously not desirable. Thus, we must find a way to limit the search to ''only'' the cases of ''dialog+space'' or ''dialog+punctuation mark''. Here's how we do it:<pre>r'dialog(\.|\;|\:| |\!|\,|\?+)'</pre> The pipes (|) act as a switch. They say, look for ''this character'' | ''that character'' | or ''the other character''. The plus sign at the very end says "at least one time"; as written, it attaches only to the question mark just before it. And the back slashes (\) escape the punctuation marks from their usual special meanings.
 
Altogether then, what this statement says is, "Look for the word ''dialog'' followed by either a period, a semi-colon, a colon, a space, an exclamation mark, a comma, or a question mark that's present at least once." It will therefore find ''only'':
* the speed of his dialog was rapid
* how could he have forgotten his dialog?
* dialog: the bane of the actor
* he had far too many lines of dialog!
* They had a fruitful dialog; however, the humans would soon kill the Silurians.
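The difference the guard makes can be seen in a small Python sketch. The search pattern is the one given above, but the replacement string <code>r'dialogue\1'</code> and both function names are assumptions for illustration, since only the search half is shown on this page.

```python
import re

SEARCH = r'dialog(\.|\;|\:| |\!|\,|\?+)'
REPLACE = r'dialogue\1'  # assumed: put back whatever punctuation was captured


def guarded(text):
    """Replace 'dialog' only when a space or punctuation mark follows."""
    return re.sub(SEARCH, REPLACE, text)


def naive(text):
    """Replace every 'dialog', including the one inside 'dialogue'."""
    return re.sub(r'dialog', 'dialogue', text)


line = "how could he have forgotten his dialog?"
print(guarded(guarded(line)))  # two passes with the guard
print(naive(naive(line)))      # two passes without it
```

One limitation of the guarded pattern as written is that a ''dialog'' sitting at the very end of the text, with nothing after it, won't be matched.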
 
== Cases where regex fails ==
Not every word on our list has been switched using regular expressions. Sometimes it's easier just to type up a switch of literal characters, as when a word serves as the root of no other words.
 
== The list ==
The following code is what's at the heart of our automated BrEng spelling enforcement. It was tested throughout 2011 and eventually completed its first run through the main [[help:namespaces|namespace]] on 31 October 2011, with a secondary confirming run on 1 November 2011.
 
The exceptions bit at the very end is particularly important to its function on Tardis. The list of exceptions is not currently organised in any way, but the bot really wouldn't work properly without exceptions.
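To see why the exceptions matter, consider the ''meter'' rule together with a few entries from the exceptions list (''diameter'', ''perimeter'', ''cemetery''). The sketch below is a simplified stand-in written with Python's <code>re</code> module. It is ''not'' how the bot framework implements exceptions, and the function <code>apply_rule</code> is hypothetical: it simply leaves any match alone if it overlaps an occurrence of an excepted string.

```python
import re

# Rule copied from the list; exception strings taken from its 'inside' section.
METER_RULE = (r'(.?)([Mm])eter(.?)', r'\1\2etre\3')
EXCEPTIONS = ['diameter', 'perimeter', 'cemetery']

def apply_rule(text, rule, exceptions):
    """Apply a rule, skipping matches that overlap an excepted string.

    Simplified illustration only; the real bot's exception handling differs.
    """
    pattern, replacement = rule
    # Record where every excepted string occurs in the text.
    protected = [m.span()
                 for exc in exceptions
                 for m in re.finditer(re.escape(exc), text)]

    def substitute(match):
        start, end = match.span()
        if any(start < p_end and p_start < end for p_start, p_end in protected):
            return match.group(0)          # overlaps an exception: leave as-is
        return match.expand(replacement)   # otherwise apply the rule

    return re.sub(pattern, substitute, text)

sentence = "The diameter was one meter."
without_exceptions = apply_rule(sentence, METER_RULE, [])
with_exceptions = apply_rule(sentence, METER_RULE, EXCEPTIONS)
```

Without the exceptions, the rule that correctly turns ''meter'' into ''metre'' also mangles ''diameter''. With them, only the stand-alone word is changed.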


:''AmEng'': '''dialog''' &rarr; ''BrEng'': '''dialogue''', but ''dialogue'' still contains ''dialog''
The most common American spelling that SpellBot 3.0 will not correct is "Honor" with a capital ''H''. It's fine with lower case ''honor'', but it's not yet been determined how to make an exception for a name with a diacritic in it. Thus [[Honoré Lechasseur]] can't currently be excepted unless the bot simply ignores "Honor". This is doubly useful since, for reasons equally unclear, the exception for "[[Honor Blackman]]" is not being, well, ''honoured''. Since ''honor'' does not typically begin a sentence, this compromise is believed acceptable for the time being.
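The effect of restricting the rule to a lower-case ''h'' can be checked with a short sketch. It uses Python's <code>re</code> module directly rather than the bot, and the helper name <code>fix_honor</code> and the sample sentences are invented for illustration. The pattern and replacement are the lower-case ''honor'' entry from the list.

```python
import re

# Lower-case-only 'honor' rule, as it appears in the list.
HONOR = (r'(.?)honor(.?)', r'\1honour\2')

def fix_honor(text: str) -> str:
    """Apply only the lower-case 'honor' rule (hypothetical helper)."""
    pattern, replacement = HONOR
    return re.sub(pattern, replacement, text)

# Capitalised names are untouched because the pattern needs a lower-case 'h'.
names = fix_honor("Honor Blackman and Honoré Lechasseur")
prose = fix_honor("Honor Blackman received an honor.")
derived = fix_honor("a dishonorable act")
```

The trade-off is the one described above: a sentence-initial ''Honor'' used as an ordinary word would also be left uncorrected.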
This means that the next time the bot is run, it will find ''dialog'' again and attempt to replace it.  After several passes, you'll end up with something like ''dialogueueueueue'', which is obviously not desirable.  Thus, we must find a way to limit the search to ''only'' the cases of ''dialog+space'' and ''dialog+punctuation mark''.   Here's how we do it:<pre>r'dialog(\.|\;|\:| |\!|\,|\?+)'</pre> The pipes (|) act as a switch.  They say, look for ''this character'' | ''that character'' | or ''the other character''.  The plus sign at the very end says "at least one time"; as written, it applies only to the question mark immediately before it.  And the backslashes (\) escape the punctuation marks from their usual special meanings.  


Altogether then, what this statement says is, "Look for the word ''dialog'' followed by a period, a semi-colon, a colon, a space, an exclamation mark, a comma, or one or more question marks."  It will therefore find ''only'':
This is the state of the code as it existed during the SpellBot run by [[User:SV7|SV7]] in the week of 12 March 2023.
*the speed of his dialog was rapid
<syntaxhighlight lang="python" line>
*how could he have forgotten his dialog?
*dialog: the bane of the actor
*he had far too many lines of dialog!
*They had a fruitful dialog; however, the humans would soon kill the Silurians.


==Cases where regex fails==
#SBOT version 3.0
Not every word on our list has been switched using regex expressions. Sometimes it's easier just to type up a switch of literal characters, as when a word serves as the root of no other words.
#Enforces BrEng spelling, with exceptions
#relevant to the Doctor Who universe
#and usage found on tardis.wiki
#released under CC-BY-SA 3.0 license
#by User:CzechOut
#Originally published: 1 November 2011 (CzechOut)
#Current version: 17 March 2023 (SOTO)


==The code==
The following code will change over time, as more words are added.  The final word in the English language that has a British/American difference is ''yodelling''.  Once you see that word on this list, you'll know the bot is fully programmed. 
<pre>
fixes['spelling'] = {
fixes['spelling'] = {
     'regex': True,
     'regex': True,
     'recursive': True,
     'recursive': True,
     'msg': {
     'msg': {
         'en':u'Enforcing [[tardis:spelling policy|spelling policy]].'
         'en':u'Enforcing [[T:SPELL]]'
         },
         },
     'replacements': [
     'replacements': [
Line 126: Line 143:
         (r'([Aa])nesthesia',r'\1naesthesia'),
         (r'([Aa])nesthesia',r'\1naesthesia'),
         (r'([Aa])nestheti(.?)',r'\1naestheti\2'),
         (r'([Aa])nestheti(.?)',r'\1naestheti\2'),
(r'([Aa])na?esthetiz(.?)',r'\1naesthetis\2'), #SOTO bug fix
         (r'([Aa])nalog( +)',r'\1nalogue\2'),
         (r'([Aa])nalog( +)',r'\1nalogue\2'),
         (r'([Aa])nalogs',r'\1nalogues'),
         (r'([Aa])nalogs',r'\1nalogues'),
Line 134: Line 152:
         (r'([Aa])ntagoniz(.?)',r'\1ntagonis\2'),
         (r'([Aa])ntagoniz(.?)',r'\1ntagonis\2'),
         (r'([Aa])pologiz(.?)',r'\1pologis\2'),
         (r'([Aa])pologiz(.?)',r'\1pologis\2'),
         (u'appall',u'appal'),
         (r'([Aa])ppall( +)',r'\1ppal\2'),  
         (u'appalls',u'appals'),
         (r'([Aa])ppalls',r'\1ppals'),
         (r'([Aa])ppetiz(.?)',r'\1ppetis\2'),
         (r'([Aa])ppetiz(.?)',r'\1ppetis\2'),
         (r'([Aa])rbor(.?)',r'\1rbour\2'),
         (r'([Aa])rbor(.?)',r'\1rbour\2'),
Line 143: Line 161:
         (r'([Aa])rtifact(.?)',r'\1rtefact\2'),
         (r'([Aa])rtifact(.?)',r'\1rtefact\2'),
         (r'(.?)([Aa])uthoriz(.?)',r'\1\2uthoris\3'),
         (r'(.?)([Aa])uthoriz(.?)',r'\1\2uthoris\3'),
         (r'([Aa])x( +)',r'\1xe\2'),
         (r'( +)([Aa])x( +)',r'\1\2xe\3'),
         #BBBB#
         #BBBB#
         (r'(.?)([Pp])edaled', r'\1\2edalled'),
         (r'(.?)([Pp])edaled', r'\1\2edalled'),
         (r'(.?)([Pp])edaling', r'\1\2edalling'),
         (r'(.?)([Pp])edaling', r'\1\2edalling'),
        (r'([Bb])anister(.?)', r'\1annister\2'),
         (r'([Bb])aptiz(.?)',r'\1aptis\2'),
         (r'([Bb])aptiz(.?)',r'\1aptis\2'),
         (r'([Bb])astardiz(.?)',r'\1astardis\2'),
         (r'([Bb])astardiz(.?)',r'\1astardis\2'),
         (r'([[Bb]])attleax( +)',r'\1attleaxe\2'),
         (r'([Bb])attleax( +)',r'\1attleaxe\2'),
         (r'([[Bb]])alk(.?)',r'\1aulk\2'),
         (r'([Bb])alk(.?)',r'\1aulk\2'),
         (r'([[Bb]])edeviled',r'\1edevilled'),
         (r'([Bb])edeviled',r'\1edevilled'),
         (r'([[Bb]])edevling',r'\1edevilling'),
         (r'([Bb])edevling',r'\1edevilling'),
         (r'(.?)([[Bb]])ehavior(.?)',r'\1\2ehaviour\3'),
         (r'(.?)([Bb])ehavior(.?)',r'\1\2ehaviour\3'),
         (r'([[Bb]])ehoove(.?)',r'\1ehove\2'),
         (r'([Bb])ehoove(.?)',r'\1ehove\2'),
         (r'([[Bb]])ejeweled',r'\1ejewelled'),
         (r'([Bb])ejeweled',r'\1ejewelled'),
         (r'(.?)([Ll])abor(.?)',r'\1\2abour\3'),
         (r'(.?)([Ll])abor( +)',r'\1\2abour\3'),
        (r'(.?)([Ll])abored',r'\1\2aboured'),
         (r'([Bb])eveled',r'\1evelled'),
         (r'([Bb])eveled',r'\1evelled'),
         (r'([Bb])evies',r'\1evvies'),
         (r'([Bb])evies',r'\1evvies'),
Line 173: Line 191:
         (r'([Cc])esarean(.?)',r'\1aesarean\2'),
         (r'([Cc])esarean(.?)',r'\1aesarean\2'),
         (r'([Cc])aliber(.?)',r'\1alibre\2'),
         (r'([Cc])aliber(.?)',r'\1alibre\2'),
         (r'([Cc])aliper(.?)',r'\1calliper\2'),
         (r'([Cc])aliper(.?)',r'\1alliper\2'),
         (r'([Cc])alisthenics',r'\1allisthenics'),
         (r'([Cc])alisthenics',r'\1allisthenics'),
         (r'([Cc])analiz(.?)',r'\1analis\2'),
         (r'([Cc])analiz(.?)',r'\1analis\2'),
Line 190: Line 208:
         (r'([Cc])aroled',r'\1arolled'),
         (r'([Cc])aroled',r'\1arolled'),
         (r'([Cc])aroling',r'\1arolling'),
         (r'([Cc])aroling',r'\1arolling'),
         (r'([Cc])atalog( +)',r'\1atalogue\1'),
         (r'([Cc])atalog( +)',r'\1atalogue\2'),
         (r'([Cc])atalogs( +)',r'\1atalogues\2'),
         (r'([Cc])atalogs( +)',r'\1atalogues\2'),
         (r'([Cc])ataloged',r'\1atalogued'),
         (r'([Cc])ataloged',r'\1atalogued'),
Line 199: Line 217:
         (r'([Cc])avilled',r'\1avilled'),
         (r'([Cc])avilled',r'\1avilled'),
         (r'([Cc])aviling',r'\1avilling'),
         (r'([Cc])aviling',r'\1avilling'),
         (r'(.?)([Gg])ram( +)',r'\1\2gramme\3'),
         (r'(.?)([Gg])ram( +)',r'\1\2ramme\3'),
         (r'(.?)([Gg])rams',r'\1\2grammes'),
         (r'(.?)([Gg])rams',r'\1\2rammes'),
         (r'(.?)([Ll])iter(.?)',r'\1\2itre\3'),
         (r'(.?)([Ll])iter( +)',r'\1\2itre\3'),
        (r'(.?)([Ll])iters',r'\1\2itres'),       
         (r'(.?)([Mm])eter(.?)',r'\1\2etre\3'),
         (r'(.?)([Mm])eter(.?)',r'\1\2etre\3'),
         (r'([Cc])entraliz(.?)',r'\1entralis\2'),
         (r'([Cc])entraliz(.?)',r'\1entralis\2'),
         (r'(.?)([Cc])enter(.?)',r'\1\2entre\3'),
         (r'(.?)([Cc])enter( +)',r'\1\2entre\3'),
        (r'([Cc])enters',r'\1entres'),
        (r'([Cc])entered',r'\1entred'),
        (r'([Cc])entering',r'\1entring'),
         (r'([Cc])hanneled',r'\1hannelled'),
         (r'([Cc])hanneled',r'\1hannelled'),
         (r'([Cc])hanneling',r'\1hannelling'),
         (r'([Cc])hanneling',r'\1hannelling'),
Line 210: Line 232:
         (r'([Cc])heckbook(.?)',r'\1hequebook\2'),
         (r'([Cc])heckbook(.?)',r'\1hequebook\2'),
         (r'([Cc])hili',r'\1hilli'),
         (r'([Cc])hili',r'\1hilli'),
        (r'([Cc])himera(.?)',r'\1himaera\2'),
         (r'([Cc])hiseled',r'\1hiselled'),
         (r'([Cc])hiseled',r'\1hiselled'),
         (r'([Cc])hiseling',r'\1hiselling'),
         (r'([Cc])hiseling',r'\1hiselling'),
Line 220: Line 241:
         (r'([Cc])ollectiviz(.?)',r'\1ollectivis\2'),
         (r'([Cc])ollectiviz(.?)',r'\1ollectivis\2'),
         (r'([Cc])oloniz(.?)',r'\1olonis\2'),
         (r'([Cc])oloniz(.?)',r'\1olonis\2'),
         (r'(.?)([Cc])olor(.?)',r'\1\2olour\3'),
        (r'([Cc])olor(.?)',r'\1olour\2'),
         (r'(.?)([Cc])olored',r'\1\2oloured'),
        (r'(.?)([Cc])oloring',r'\1\2olouring'),
        (r'(.?)([Cc])oloriz(.?)',r'\1\2olouris\3'),
         (r'([Cc])ommercializ(.?)',r'\1ommercialis\2'),
         (r'([Cc])ommercializ(.?)',r'\1ommercialis\2'),
         (r'([Cc])ompartmentaliz(.?)',r'\1ompartmentalis\2'),
         (r'([Cc])ompartmentaliz(.?)',r'\1ompartmentalis\2'),
         (r'([Cc])omputeriz(.?)',r'\1omputeris\2'),
         (r'([Cc])omputeriz(.?)',r'\1omputeris\2'),
         (r'([Cc])onceptualiz(.?)',r'\1onceptualis\2'),
         (r'([Cc])onceptualiz(.?)',r'\1onceptualis\2'),
         (r'([Cc])ontextualize(.?)',r'\1ontextualis\2'),
         (r'([Cc])ontextualiz(.?)',r'\1ontextualis\2'),
         (r'([Cc])oz(.?)',r'\1os\2'),
         (r'([Cc])oz(.?)',r'\1os\2'),
         (r'([Cc])ouncilor(.?)',r'\1ouncillor\2'),
         (r'([Cc])ouncilor(.?)',r'\1ouncillor\2'),
Line 240: Line 264:
         (r'([Cc])udgeling',r'\1udgelling'),
         (r'([Cc])udgeling',r'\1udgelling'),
         (r'([Cc])ustomiz(.?)',r'\1ustomis\2'),
         (r'([Cc])ustomiz(.?)',r'\1ustomis\2'),
         (r'([Cc])ipher(.?)',r'\1ypher\2'),
         (r'( +)([Cc])ipher(.?)',r'\1\2ypher\3'),
         #DDDD#
         #DDDD#
         (r'([Dd])ecentraliz(.?)',r'\1ecentralis\2'),
         (r'([Dd])ecentraliz(.?)',r'\1ecentralis\2'),
Line 268: Line 292:
         (r'(.?)([Ff])avor(.?)',r'\1\2avour\3'),
         (r'(.?)([Ff])avor(.?)',r'\1\2avour\3'),
         (r'([D,d])isheveled',r'\1ishevelled'),
         (r'([D,d])isheveled',r'\1ishevelled'),
         (r'(.?)([Hh])onor(.?)',r'\1\2\onour\3'),
         (r'(.?)honor(.?)',r'\1honour\2'), #making this recognize only lower-case h because of Honor Blackman and Honore
         (r'(.?)([Oo])rganization(.?)',r'\1\2rganisation\3'),
         (r'(.?)([Oo])rganization(.?)',r'\1\2rganisation\3'),
         (r'([Dd])istil( +)',r'\1istill\2'),
         (r'([Dd])istil( +)',r'\1istill\2'),
         (r'([Dd])istils',r'\1istills'),
         (r'([Dd])istils',r'\1istills'),
         (r'([Dd])ramatiz(.?)',r'\1ramatis\2'),
         (r'([Dd])ramatiz(.?)',r'\1ramatis\2'),
         (r'([Dd])rafts(.?)',r'\1raughts\2'),
         #(r'([Dd])rafts(.+)',r'\1raughts\2'), will need to do something else for draughtman, people
         (r'([Dd])rafty',r'\1raughty'),
         (r'([Dd])rafty',r'\1raughty'),
         (r'([Dd])rafti(.?)',r'\1raughti\2'),
         (r'([Dd])rafti(.?)',r'\1raughti\2'),
Line 285: Line 309:
         (r'([Ee])ditorializ(.?)',r'\1ditorialis\2'),
         (r'([Ee])ditorializ(.?)',r'\1ditorialis\2'),
         (r'([Ee])mpathiz(.?)',r'\1mpathis\2'),
         (r'([Ee])mpathiz(.?)',r'\1mpathis\2'),
         (r'(.?)([Ee])mphasiz(.?)',r'\1\2mphasis\2'),
         (r'(.?)([Ee])mphasiz(.?)',r'\1\2mphasis\3'),
         (r'([Ee])nameled',r'\1namelled'),
         (r'([Ee])nameled',r'\1namelled'),
         (r'([Ee])nameling',r'\1namelling'),
         (r'([Ee])nameling',r'\1namelling'),
         (r'([Ee])namor(.?)',r'\1namour\2'),
         (r'([Ee])namor(.?)',r'\1namour\2'),
         (r'([Ee])ncyclopedi(.?)',r'\1ncyclopaedi\2'),
         # (r'([Ee])ncyclopedi(.?)',r'\1ncyclopaedi\2'),
         (r'([Ee])ndeavor(.?)',r'\1ndeavour\2'),
         (r'([Ee])ndeavor(.?)',r'\1ndeavour\2'),
         (r'(.?)([Ee])nergiz(.?)',r'\1\2nergis\3'),
         (r'(.?)([Ee])nergiz(.?)',r'\1\2nergis\3'),
         (r'([Ee])nroll(.?)',r'\1nrol\2'),
         (r'([Ee])nroll(.?)',r'\1nrol\2'),
         (r'([Ee])nthrall(.?)',r'\1nthral\2'),
(r'([Ee])nrol(ed|ing)',r'\1nroll\2'), #tense exceptions -SOTO
         (r'([Ee])nthrall( +)',r'\1nthral\2'), #only enthrall is one l
         (r'([Ee])paulet( +)',r'\1paulette\2'),
         (r'([Ee])paulet( +)',r'\1paulette\2'),
         (r'([Ee])paulets',r'\1paulettes'),
         (r'([Ee])paulets',r'\1paulettes'),
Line 318: Line 343:
         (r'([Ff])ilet(.?)',r'\1illet\2'),
         (r'([Ff])ilet(.?)',r'\1illet\2'),
         (r'([Ff])inaliz(.?)',r'\1inalis\2'),
         (r'([Ff])inaliz(.?)',r'\1inalis\2'),
         (r'(.?)([Ff])lavor(.?)',r'\1\2\lavour\3'),
         (r'(.?)([Ff])lavor(.?)',r'\1\2lavour\3'),
         (r'([Ff])etal',r'\1oetal'),
         (r'([Ff])etal',r'\1oetal'),
         (r'([Ff])etus(.?)',r'\1oetus\2'),
         (r'([Ff])etus(.?)',r'\1oetus\2'),
Line 338: Line 363:
         (r'([Gg])lamor( +)',r'\1lamour\2'),
         (r'([Gg])lamor( +)',r'\1lamour\2'),
         (r'([Gg])lobaliz(.?)',r'\1lobalis\2'),
         (r'([Gg])lobaliz(.?)',r'\1lobalis\2'),
         (r'([Gg])luing',r'\1ueing'),
         (r'([Gg])luing',r'\1lueing'),
         (r'([Gg])oiter(.?)',r'\1oitre\2'),
         (r'([Gg])oiter(.?)',r'\1oitre\2'),
         (r'([Gg])onorrhea',r'\1onorrhoea'),
         (r'([Gg])onorrhea',r'\1onorrhoea'),
         (r'([Gg])raveled',r'\1ravelled'),
         (r'([Gg])raveled',r'\1ravelled'),
         (r'([Gg])ray( +)',r'\1rey\2'),
         (r'gray( +)',r'grey\1'), #probably shouldn't include cap G#
         (r'([Gg])ray(.?)',r'\1rey\2'),
         (r'gray(.?)',r'grey\1'),
         (r'([Gg])roveled',r'\1rovelled'),
         (r'([Gg])roveled',r'\1rovelled'),
         (r'([Gg])roveling',r'\1rovelling'),
         (r'([Gg])roveling',r'\1rovelling'),
Line 372: Line 397:
         (r'([Ii])ndividualiz(.?)',r'\1ndividualis\2'),
         (r'([Ii])ndividualiz(.?)',r'\1ndividualis\2'),
         (r'([Ii])ndustrializ(.?)',r'\1ndustrialis\2'),
         (r'([Ii])ndustrializ(.?)',r'\1ndustrialis\2'),
         (r'([Ii])nstill(.?)',r'\1nstil\2'),
         (r'([Ii])nstill( +)',r'\1nstil\2'),
         (r'([Ii])nitialed',r'\1nitialled'),
         (r'([Ii])nitialed',r'\1nitialled'),
         (r'([Ii])nitialing',r'\1nitialling'),
         (r'([Ii])nitialing',r'\1nitialling'),
Line 405: Line 430:
         (r'([Ll])ocaliz(.?)',r'\1ocalis\2'),
         (r'([Ll])ocaliz(.?)',r'\1ocalis\2'),
         (r'([Ll])ouver(.?)',r'\1ouvre\2'),
         (r'([Ll])ouver(.?)',r'\1ouvre\2'),
         (r'([Ll])uster',r'\1ustre'),
         (r'( +)([Ll])uster',r'\1\2ustre'),
         #MMMM#
         #MMMM#
         (r'(.?)([Mm])agnetiz(.?)',r'\1\2agnetis\3'),
         (r'(.?)([Mm])agnetiz(.?)',r'\1\2agnetis\3'),
         (r'(.?)([Mm])aneuver(.?)',r'\1\2anoeuvr\3'),
         (r'(.?)([Mm])aneuver(.?)',r'\1\2anoeuvre\3'),
(r'(.?)([Mm])anoeuvreed(.?)',r'\1\2anoeuvred\3'), #catching exception -SOTO#
         (r'([Mm])arginiliz(.?)',r'\1arginilis\2'),
         (r'([Mm])arginiliz(.?)',r'\1arginilis\2'),
         (r'([Mm])arshaled',r'\1arshalled'),
         (r'([Mm])arshaled',r'\1arshalled'),
Line 434: Line 460:
         (r'([Mm])onopoliz(.?)',r'\1onopolis\2'),
         (r'([Mm])onopoliz(.?)',r'\1onopolis\2'),
         (r'(.?)([Mm])old(.?)',r'\1\2ould\3'),
         (r'(.?)([Mm])old(.?)',r'\1\2ould\3'),
         (r'([Mm])olt(.?)',r'\1oult\2'),
        (r'([Mm])olted',r'\1oulted'),
        (r'([Mm])olting',r'\1oulting'),
         (r'([Mm])olt( +)',r'\1oult\2'),
         (r'([Mm])ustache(.?)',r'\1oustache\2'),
         (r'([Mm])ustache(.?)',r'\1oustache\2'),
         #NNNN#
         #NNNN#
Line 490: Line 518:
         (r'([Pp])racticing',r'\1ractising'),
         (r'([Pp])racticing',r'\1ractising'),
         (r'([Pp])raesidium(.?)',r'\1residium\2'),
         (r'([Pp])raesidium(.?)',r'\1residium\2'),
         (r'(.?)([Pp])ressuriz(.?)',r'\1ressuris\1'),
         (r'(.?)([Pp])ressuriz(.?)',r'\1\2ressuris\3'),
         (r'([Pp])retens(.?)',r'\1retenc\2'),
         (r'([Pp])retens(.?)',r'\1retenc\2'),
         (r'([Pp])rimaeval',r'\1rimeval'), #Correcting in favour of American spelling#
         (r'([Pp])rimaeval',r'\1rimeval'), #Correcting in favour of American spelling#
Line 518: Line 546:
         (r'(.?)([Rr])ecogniz(.?)',r'\1\2ecognis\3'),
         (r'(.?)([Rr])ecogniz(.?)',r'\1\2ecognis\3'),
         (r'([Rr])econnoiter(.?)',r'\1econnoitre\2'),
         (r'([Rr])econnoiter(.?)',r'\1econnoitre\2'),
         (r'([Rr])efueled','\1efuelled'),
         (r'([Rr])efueled',r'\1efuelled'),
         (r'([Rr])efueling','\1efuelling'),
         (r'([Rr])efueling',r'\1efuelling'),
         (r'(.?)([Rr])egulariz(.?)',r'\1\2egularis\3'),
         (r'(.?)([Rr])egulariz(.?)',r'\1\2egularis\3'),
         (r'([Rr])evele(.?)',r'\1evelle\2'),
         (r'([Rr])evele(.?)',r'\1evelle\2'),
         (r'([Rr])eveling',r'\1evelling'),
         (r'([Rr])eveling',r'\1evelling'),
         (r'(.?)([Vv])italiz(.?)',r'\1\2vitalis\3'),
         (r'(.?)([Vv])italiz(.?)',r'\1\2italis\3'),
         (r'([Rr])evolutioniz(.?)',r'\1evolutionis\2'),
         (r'([Rr])evolutioniz(.?)',r'\1evolutionis\2'),
         (r'([Rr])hapodiz(.?)',r'\1hapodis\2'),
         (r'([Rr])hapodiz(.?)',r'\1hapodis\2'),
         (r'([Rr])igor(.?)',r'\1igour\2'),
         (r'( +)([Rr])igor( +)',r'\1\2igour\3'),
         (r'([Rr])itualiz(.?)',r'\1itualis\2'),
         (r'([Rr])itualiz(.?)',r'\1itualis\2'),
         (r'(.?)([Rr])ivaled',r'\1\2ivalled'),
         (r'(.?)([Rr])ivaled',r'\1\2ivalled'),
Line 533: Line 561:
         (r'([Rr])umor(.?)',r'\1umour\2'),
         (r'([Rr])umor(.?)',r'\1umour\2'),
         #SSSS#
         #SSSS#
         (r'([Ss])aber(.?)',r'\1sabre\2'),
         (r'([Ss])aber(.?)',r'\1abre\2'),
         (r'([Ss])altpeter',r'\1altpetre'),
         (r'([Ss])altpeter',r'\1altpetre'),
         (r'(.?)([Ss])anitiz(.?)',r'\1\2anitis\3'),
         (r'(.?)([Ss])anitiz(.?)',r'\1\2anitis\3'),
         (r'([Ss])atiriz(.?)',r'\1atiris\2'),
         (r'([Ss])atiriz(.?)',r'\1atiris\2'),
         (r'([Ss])avior(.?)',r'\1aviour\2'),
         (r'([Ss])avior(.?)',r'\1aviour\2'),
         (r'(.?)([Ss])avor(.?)',r'\1\2avour\3'),
         (r'(.?)savor(.?)',r'\1savour\2'), #recognizes only lower-case s, because of Gerald Savory
         (r'([Ss])candaliz(.?)',r'\1candalis\2'),
         (r'([Ss])candaliz(.?)',r'\1candalis\2'),
         (r'([Ss])keptic(.?)',r'\1ceptic\2'),
         (r'([Ss])keptic(.?)',r'\1ceptic\2'),
         (r'([Ss])cepter(.?)',r'\1sceptre\2'),
         (r'([Ss])cepter(.?)',r'\1ceptre\2'),
         (r'([Ss])crutiniz(.?)',r'\1crutinis\2'),
         (r'([Ss])crutiniz(.?)',r'\1crutinis\2'),
         (r'([Ss])eculariz(.?)',r'\1ecularis\2'),
         (r'([Ss])eculariz(.?)',r'\1ecularis\2'),
Line 566: Line 594:
         (r'([Ss])omber',r'\1ombre'),
         (r'([Ss])omber',r'\1ombre'),
         (r'([Ss])pecializ(.?)',r'\1pecialis\2'),
         (r'([Ss])pecializ(.?)',r'\1pecialis\2'),
         (r'([Ss])pecter(.?)',r'\1pectre\2'),
         (r'( +)([Ss])pecter(.?)',r'\1\2pectre\3'),
         (r'([Ss])piraled',r'\1piralled'),
         (r'([Ss])piraled',r'\1piralled'),
         (r'([Ss])piraling',r'\1piralling'),
         (r'([Ss])piraling',r'\1piralling'),
Line 587: Line 615:
         (r'([Ss])wiveling',r'\1wivelling'),
         (r'([Ss])wiveling',r'\1wivelling'),
         (r'([Ss])ymboliz(.?)',r'\1ymbolis\2'),
         (r'([Ss])ymboliz(.?)',r'\1ymbolis\2'),
         (r'([Ss])ympathiz(.?)',r'\1ympathasis\2'),
         (r'([Ss])ympathiz(.?)',r'\1ympathis\2'),
         (r'(.?)([Ss])ynchroniz(.?)',r'\1\2ynchronis\3'),
         (r'(.?)([Ss])ynchroniz(.?)',r'\1\2ynchronis\3'),
         (r'(.?)([Ss])ynthesiz(.?)',r'\1\2ynthesis\3'),
         (r'(.?)([Ss])ynthesiz(.?)',r'\1\2ynthesis\3'),
Line 601: Line 629:
         (r'([Tt])oxemia',r'\1oxaemia'),
         (r'([Tt])oxemia',r'\1oxaemia'),
         (r'([Tt])ranquiliz(.?)',r'\1ranquillis\2'),
         (r'([Tt])ranquiliz(.?)',r'\1ranquillis\2'),
         (r'([Tt])ranquilis(.?)',r'\1tranquillis\2'),
         (r'([Tt])ranquilis(.?)',r'\1ranquillis\2'),
         (r'([Tt])ranquilliz(.?)',r'\1ranquillis\2'), #correcting archaic BrEng form to modern BrEng#
         (r'([Tt])ranquilliz(.?)',r'\1ranquillis\2'), #correcting archaic BrEng form to modern BrEng#
         (r'([Tt])ranquillity ([Bb])ase',r'Tranquility Base'), #correcting to IAU standard#
         (r'([Tt])ranquillity ([Bb])ase',r'Tranquility Base'), #correcting to IAU standard#
Line 615: Line 643:
         (r'([Tt])yraniz(.?)',r'\1yranis\2'),
         (r'([Tt])yraniz(.?)',r'\1yranis\2'),
         #UUUU#
         #UUUU#
        (r'(.?)([Uu])tiliz(.?)',r'\1utilis\2'),
         (r'([Uu])nioniz(.?)',r'\1nionis\2'),
         (r'([Uu])nioniz(.?)',r'\1nionis\2'),
         (r'([Uu])ntrameled',r'\1ntramelled'),
         (r'([Uu])ntrameled',r'\1ntramelled'),
Line 622: Line 649:
         #VVVV#
         #VVVV#
         (r'([Vv])alor',r'\1alour'),
         (r'([Vv])alor',r'\1alour'),
         (r'([Vv])andaliz(.?)',r'\1andalis'),
         (r'([Vv])andaliz(.?)',r'\1andalis\2'),
         (r'(.?)([Vv])aporiz(.?)',r'\1\2apouris\3'),
         (r'(.?)([Vv])aporiz(.?)',r'\1\2aporis\3'),
         (r'([Vv])apor( +)',r'\1apour\2'),
         (r'([Vv])apor( +)',r'\1apour\2'),
         (r'([Vv])apors',r'\1apours'),
         (r'([Vv])apors',r'\1apours'),
Line 629: Line 656:
         (r'(.?)erbaliz(.?)',r'\1erbalis\2'),
         (r'(.?)erbaliz(.?)',r'\1erbalis\2'),
         (r'([Vv])ictimiz(.?)',r'\1ictimis\2'),
         (r'([Vv])ictimiz(.?)',r'\1ictimis\2'),
         (r'([Vv])igor',r'\1igour'),
         (r'([Vv])igor( +)',r'\1igour\2'),
         (r'([Vv])isualiz(.?)',r'\1isualis\2'),
         (r'([Vv])isualiz(.?)',r'\1isualis\2'),
         (r'([Vv])ocaliz(.?)',r'\1ocalis\2'),
         (r'([Vv])ocaliz(.?)',r'\1ocalis\2'),
Line 639: Line 666:
         (r'([Ww])esterniz(.?)',r'\1esternis\2'),
         (r'([Ww])esterniz(.?)',r'\1esternis\2'),
         (r'([Ww])omaniz(.?)',r'\1omanis\2'),
         (r'([Ww])omaniz(.?)',r'\1omanis\2'),
         (r'([Ww])oolen(.?)',r'\1ollen\2'),
         (r'([Ww])oolen(.?)',r'\1oollen\2'),
         (r'([Ww])oolies',r'\1oollies'),
         (r'([Ww])oolies',r'\1oollies'),
         (r'([Ww])ooly',r'\1oolly'),
         (r'([Ww])ooly',r'\1oolly'),
Line 658: Line 685:
             'link',
             'link',
             'comment',
             'comment',
             ]
            'center',
            'color',
            'captiontextcolor',
            'gallery',
            'syntaxhighlight'
             ],
         'category': [
         'category': [
             'spelling',
             'spelling',
             ]
             ],
        'inside': [
            'Similarities in Proto-Cultural Artifacts',
            'Honor_Blackman',
            'Savory',
            'tachometer',
            'mileometer',
            'spectrometer',
            'diameter',
            'diameters',
            'pentameter',
            'pentameters',
            'chronometer',
            'chronometers',
            'geometer',
            'geometers',
            'rateometer',
            'rateometers',
            'Rateometer',
            'Rateometers',
            'interferometer',
            'EMF meter',
            'parameter',
            'altimeter',
            'altimeters',
            'parameters',
            'Graystark',
            'perimeter',
            'pretension',
            'Good Neighbors',
            'stingray',
            'stingrays',
            'anagram',
            'Anagram',
            'anagrams',
            'Anagrams',
            'Previsualization}}',
            'behemoth',
            'Behemoth',
            'behemoths',
            'Yourfavoritemartian',
            'Music-a-grams',
            'anagrams',
            'Anagrams',
            'hologram',
            'Hologram',
            'Holograms',
            'holograms',
            'electrocardiogram',
            'electrocardiograms',
            'pentagram',
            'pentagrams',
            'telegram',
            'telegrams',
            'transgram',
            'transgrams',
            'Transgram',
            'Transgrams',
            'diagram',
            'diagrams',
            'Diagram',
            'Diagrams',
            'engram',
            'engrams',
            'Grigory',
            'Unauthorized Guide',
            'Honor Blackman', #this isn't being excpted and i don't know why#
            'Medal of Honor',
            'Arborge Quince',
            'program', #need a forum discussion here#
            'programs',
            'reprogram',
            'deprogram',
            'pictogram',
            'pictograms',
            'Pictogram',
            'Pictograms',
            'phonogram',
            'phonograms',
            'background-color',
            'color:',
            'color :',
            'border-color',
            'text-align: center;',
            'text-align:center;',
            'align=center',
            'align = center',
            'align= center',
            'align =center',
            'position=center',
            '</ center>',
'{{color', #SOTO
'{{Color', #SOTO
            'Encyclopedia of Fantastic',
            'themonster',
            'arboreal',
            'Moldova',
            'Fun at the Funeral Parlor',
            'humorous',
            'Humorous',
            'limiter',
            'appalling',
            'appalled',
            'Splendorosa',
            'Demeter',
            'cemetery',
            'Cemetery',
            'Gerald Savory',
            'Savory',
            'Johnson Space Center',
            'Kennedy Space Center',
            'Center',
            'Catalog',
            'Chilitern',
            'chemotherapy',
            'Chemothreapy',
            'Colorado',
            'previsualization',
            'Scarborough',
            'Akoshemon',
            'Plowman',
            'torpedo',
            'torpedos',
            'Torpedo',
            'Torpedos',
            'stingray',
            'stingrays',
            'Stingray',
            'Stingrays',
            'Beccy Armory',
            'Honore', #None of these attempts to
            'Honoré', #except Honoré Lechasseur works
            'Honoré', #The bot has been made to not correct
            'Honore', #for capital-H Honor
            'lightsaber',
            'Polygram',
            'Majestic Theater',
            'Taplow',
            'Fyodor',
            'Target Practice',
            'target practice',
            'Synthesizing Starfields', #doesn't appear to work
            'Pearl Harbor',
            'Mercury Theater',
            'Event Synthesizer',
            'bgcolor',
            'blasphemous', #dunno why this is being triggered as blasphaemous
            'grams operator', #not sure this is a real word, but it appears on DMP
'Parallelogram', #Work by SOTO from here on out...#
'parallelogram',
'Color Assists',
'Colorist: ',
'colorsport.co.uk',
'The Armored Creature of 004X',
'-an-unauthorized-guide-to',
'-the-unauthorized-guide-to',
'Department of Defense',
'instagram',
'Instagram',
'Cozens',
'thecozens',
'Dougray',
'Plowman',
'plowmanal',
' smiter',
' Smiter',
            ],
         }
         }
     }
     }
</pre>
</syntaxhighlight>

Latest revision as of 20:24, 21 April 2024

This is the master list of everything for which SpellBot corrects, along with detailed notes about the rationale for some of the coding decisions and explanations of some of SpellBot's key limitations. The list itself is written in regex, so if you're unfamiliar with that language, it may take you a moment to get used to the symbology.

Because even the most conscientious of editors will occasionally make spelling errors, there is a need to have bot enforcement of the spelling policy. A comprehensive list of the differences between British and American spellings has been compiled into a bot routine known as a "user-fix", so that all users may see what exactly the bot is checking for.

Problem words

Words impossible for a bot

Some words are beyond the capability of the bot, because they are valid spellings (even if of different words) in British English. This list includes:

  • Check. Americans use this word to mean not only the verb to inquire after or to investigate, but also the noun, which is a financial instrument. Because BrEng spells the verb that way, too, the bot can't be programmed to correct the other usage. We'd end up with sentences like:
The Doctor chequed on Sarah Jane in her hospital room before going to the pathology lab.
  • Tire. Both sides of the Atlantic use tire as a verb. It's again the noun that's problematic. Americans view tire as the correct spelling for what the British would call a tyre. The bot can't figure this one out, so it doesn't even try.
  • Draft. Americans use this spelling for all senses; the British use both draft and draught for different senses. All words beginning with drafts will be converted to draughts-, and the word drafty will be converted to draughty, but the word draft itself won't be touched by the bot, as that is a valid British spelling of the word. Clear as mud? Cool. Onwards, then . . .
  • Disc. Way, way, way too screwed up a word for a simple bot to handle. Disc jockey is fine on both sides of the divide, but so is floppy disk and hard disk. This one simply depends on context.
  • Practise. The -ise version of this word is the correct spelling for the verb in BrEng; the British noun ends in -ice. Americans use -ice for everything. Thus, the bot can't be of much use, except for participles derived from the verb. So, the bot will make no attempt to change the spelling of practice, but it will change practicing and practiced to practising and practised.
  • Jail. Yes, gaol is still correct, especially historically, but jail has largely supplanted it. So we won't look to correct jail to gaol, but neither will we try to correct gaol to jail. Because of the known presence of both spellings in DWU fiction, this one can't be decided by forum debate. We just have to live with gaol occasionally popping up.
  • Licence. License is correct in BrEng as a verb, so the bot can't correct for Americans using license as a noun. Other -ence words don't necessarily work this way. Offence and defence are unambiguously correct, but then again their verb forms are different (offend and defend), which means their gerunds and verbal nouns are different, too.
  • Storey is the proper British spelling for a floor in a building. Americans just spell this story, as in a 15-story building. Obviously, the bot can't make this correction, because the word for a tale is spelled story on both sides of the Atlantic.
  • Chimaera/chimera. Standard British spelling in the Cambridge Online Dictionary is chimera, but there are other sources which state that the British prefer chimaera. Chimaera, however, is the universal spelling for a certain kind of fish, and the mountain from which the legend of the chimera got its name. The bot's going to steer well clear of all this. It's probably best to spell however the particular source spells this word.
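The practise/practice case shows how the bot sidesteps an ambiguous word by correcting only its unambiguous forms. As a minimal sketch, the snippet below applies the ''practicing'' rule from the list using Python's re module. The helper name fix_practising and the sample sentence are made up for illustration, and the companion rule for practiced is not reproduced here.

```python
import re

# Participle rule copied from the list; bare 'practice' has no rule at all.
PRACTISING = (r'([Pp])racticing', r'\1ractising')

def fix_practising(text: str) -> str:
    """Apply only the 'practicing' rule (hypothetical helper)."""
    pattern, replacement = PRACTISING
    return re.sub(pattern, replacement, text)

sample = "She was practicing daily; practice makes perfect."
result = fix_practising(sample)
print(result)
```

The participle can only come from the verb, so it is safe to change. The bare word practice could be the British noun and is therefore left alone.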

Words requiring forum decision

Other words are possible for a bot to correct, but the presence of two valid British spellings means that we will require a forum discussion. Precedent for such forum debates over particular words can be found in the following threads:

  • Judgment. There's no agreement on either side of the Atlantic whether this word should be judgment or judgement. Oddly, most British spell-checkers will red-flag judgment, even though that's the official spelling in Commonwealth courts. We have at least one story title preferencing the version with two es — Judgement of the Judoon. But still, this word will require a special forum discussion to decide which way we want to spell it.
  • Connexion. This British spelling of connection is not universally used in Britain. Connection is correct in Britain, too, so the bot won't try to force connection into a connexion-shaped hole.
  • Smidgen, smidgeon, smidgin. They're all valid spellings for the same word, on both sides of the Atlantic. The only way the bot could be useful is if we had a forum discussion to settle on one of the three spellings.
  • Yogurt passes most British spell-checkers today, but so does yoghurt. We'll leave both well alone until a forum discussion decides the matter.
  • Almanac is universally the way it's spelt in American English, and increasingly the way Britons spell it, too. Still, some old-timers will go for almanack. Until a forum discussion decides otherwise, the bot won't enforce either spelling.
  • Gasses/gases. Both spellings are correct on both sides of the Atlantic. The bot won't correct for either until a forum discussion settles on a particular spelling.
  • Programme/program. Both spellings pass British spell-checkers, even though there's a perception that "programme" is British and "program" is American. Until there's a community decision on spelling, the bot won't touch either spelling.
  • Griffin/Gryphon. Both pass British spell-checkers, so it'll take a forum discussion to decide which way we want to go.
  • Inflexion is the way the British have historically spelt inflection, but modern British spell-checkers pass both spellings. So, the bot won't enforce either without a forum decision to the contrary.
  • Instal/install. Modern British spell-checkers are cool with both, so the bot is, too. But the proper British spelling is instalment, not installment.
  • Mediaeval. Ironically this spelling is now considered archaic and is most often seen in British academic writing. Medieval, the American spelling, passes British spell checks, too. So, in the absence of a forum decision to the contrary, the bot won't try to correct either spelling.
  • Praesidium/presidium/presidiums/presidia. Praesidium is the archaic British spelling of presidium. It no longer passes default settings on British spell-checkers, so the bot will correct to presidium, which is also the American spelling. The plural is more confused: both presidiums and presidia pass British spell-checkers. So the singular form of the noun will be corrected to presidium, and the plural will be corrected from praesidiums to presidiums. But presidia will go uncorrected, unless a forum discussion decides on one or the other plural form.
  • Pizzaz/pizzazz. Both spellings pass modern British spell checkers, although historically the three-z version was British and the four-z version was American. A forum discussion will be required before the bot corrects for either.
  • Siphon/syphon. Both pass most modern British spell-checkers (and American spell-checkers, for that matter).
  • Ton/tonne. Both pass British spell-checkers, so the bot won't touch either until forum decision to the contrary.
  • Tranquility/tranquillity. Both pass British spell-checkers, though it's unclear whether this is because the simple noun is actively spelled both ways in Britain, or if it's because of Tranquility Base (see below).
  • Notwithstanding the general fact that the bot will not correct for either chimera or chimaera (see above), there probably should be a forum discussion on the proper spelling of Space Station Chimera, since we're technically inventing that spelling. The novelisation doesn't actually offer us a spelling, so it could equally be Space Station Chimaera.

Valid British spellings actively corrected

Thanks to the ubiquity of American spellings in pop culture, there are a few cases where valid if archaic British spellings can be corrected to standard American spellings without the need for a forum decision. In such cases, modern British usage hews closely to the American, and clearly argues against more archaic forms.

  • Primaeval. Due to the presence of the modern ITV/BBCA television series with which many Doctor Who fans will be familiar, as well as AUDIO: Primeval, the unambiguously American spelling "wins" the contest. Primaeval will be actively corrected to primeval. This shouldn't ruffle too many feathers, since modern British spell-checkers fail primaeval.
  • Tranquillize/tranquillise/tranquilise/tranquilize. There are four different ways to spell this one damned word (and all words deriving from it). Ridiculous. The -ll versions are both okay in BrEng; the -l versions are both okay in AmEng. However, only one spelling passes modern British spell-checkers. Therefore tranquillise shall be deemed correct, and the bot will correct the other three spellings.
  • Tranquility Base. The bot will actively correct Tranquillity Base to the IAU-standard Tranquility Base.

Confusing words

Some words are homonyms, and look like they're archaic forms of other words, but in fact are totally different words. Other words behave one way as a root, but are spelt differently once suffixes are added. The bot will therefore correct in an unexpected way, which is why those ways are explained here.

  • Philtre is not the British spelling of filter, but of the American philter. It's a noun meaning "love potion", not a verb meaning "to remove impurities".
  • Pouffe is not an archaic British spelling of poof, but a current spelling for what Americans would call a pouf — that is, a nice, thick cushion you can sit on. Thus, the bot will correct pouf to pouffe. Not that we actually expect pouf to ever be used in a sentence on this wiki — except on this very page, which the bot doesn't patrol.
  • Groyne is a British spelling of groin only when it's not. It doesn't refer to that region of the human anatomy between the legs. That's groin on both sides of the Atlantic. Groyne is, instead, a civil engineering term, referring to a construction that controls erosion. Technically, it's known as a groin in the US, but almost no one calls it that. It's more commonly known as a breakwater, bulwark or seawall. Frankly, groynes are so often called by these more specific names, even in Britain, that groyne doesn't pass modern British spell-checkers. It's probably never been used at any time in the DWU, but still it's correct to say something like: "All seawalls are groynes, but not all groynes are seawalls." The bot won't correct away from it, but because groin is BrEng correct in its anatomical sense, it won't correct groin to groyne, either.
  • Vapour/vaporise. Yeah, this one's a beauty. As a plain noun, Americans always spell this one vapor, while Britons always go for vapour. That's pretty much the definitional British/American spelling difference. What's weird is what happens when you turn the noun into a verb by adding -ize, or, as the British would have it, -ise. Suddenly the cosmetic u is gone. So, the bot will correct vapor to vapour, but vaporize only to vaporise. Vapourise is incorrect on both sides of the Atlantic.
  • Odour/Deodorise. This pair works the same way as vapour/vaporise. Deodourise is wrong; deodorise is right.

How to read the code

The code works by telling the bot to look for the word described before the comma, and then to replace it with the word after the comma. The most basic expression would be:

(u'color',u'colour')

This looks for the American "color", then replaces it with the British "colour".

Because typing every permutation of a word, including all words that share the same root and capitalised variants, would be very time-consuming, most of the code won't work in such a simplistic way. Most of it uses a "regular expression" — or regex — to find a lot of hits with just one line. Here's an explanation of the regex used in this code:

  • The expression ([Cc]) means "look for either capitalised or lowercase versions of the letter C"
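  • (.?) means, "You, Mr. Fancy Computer bot thing, might find another character to the right of this point. Grab it if it's there." It captures at most one character; any letters beyond that are simply left where they are, so suffixes survive the swap.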
  • \1 means, "take whatever is in the first parentheses and put it here"
  • \2 means, "take whatever is in the second parentheses and put it here"
  • \3 means, "take whatever is in the third parentheses and put it here"

Thus, if we have the expression,

(r'([Cc])apitaliz(.?)', r'\1apitalis\2')

It means, roughly,

Look for all words, beginning with either a capital or lowercase C, which are followed by the letters "apitaliz" + the next character, if there is one. Then, keep the form of the letter c that you find, stick on "apitalis", and add back in the character you originally found after the "z". Any letters further along are left untouched.

In other words, find Capitaliz-, keep the C capitalised, switch the z to an s, then stick on "-e", "-ing", "-ed", or "-ation", as appropriate.
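As a rough illustration of the walkthrough above, here is a minimal sketch that applies the pair `([Cc])apitaliz(.?)` → `\1apitalis\2` to a few words. It assumes the bot's replacements behave like Python's `re.sub`; the helper name `britishise` and the sample words are invented for the demonstration.

```python
import re

# One (pattern, replacement) pair in the same shape the list uses.
PATTERN = r'([Cc])apitaliz(.?)'
REPLACEMENT = r'\1apitalis\2'

def britishise(text):
    """Hypothetical helper: apply the single pair above to some text."""
    return re.sub(PATTERN, REPLACEMENT, text)

# Sample words chosen for illustration only.
samples = ["Capitalize", "capitalizing", "capitalization"]
results = [britishise(word) for word in samples]
print(results)
```

The capital or lowercase C is carried over by `\1`, and the character after the z is carried over by `\2`; the rest of each word is never touched by the match, which is why the suffixes come through intact.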

Correcting all related words at once

Now let's take a look at arguably the most complicated coding here. What if I wanted to change every word that had favor as a root? How could I take care of words that had both a prefix and a suffix, like disfavorable? Putting together everything we've learned so far, it would be:

(r'(.?)([Ff])avor(.?)',r'\1\2avour\3')

The leading (.?) means check to see if there's a prefix. The ([Ff]) switch checks for capitalisation of the root letter f. The (.?) at the end checks for suffixes. Now we have three parentheses instead of just two. So \1 means the prefix, \2 puts the letter f in with proper capitalisation, and \3 adds any suffixes.

This one statement will therefore switch over: favor, favors, favored, disfavor, disfavored, unfavorable, favoring, disfavoring, favorable, and almost certainly a few more.
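The prefix-and-suffix behaviour can be checked with a short sketch. As before, this assumes `re.sub` semantics for the replacement; the word list is only a sample, and `apply_fix` is a hypothetical helper, not part of the bot.

```python
import re

# The favor pair as given above: optional prefix character, the root, optional suffix character.
FAVOR_FIX = (r'(.?)([Ff])avor(.?)', r'\1\2avour\3')

def apply_fix(fix, text):
    """Hypothetical helper: apply one (pattern, replacement) pair."""
    pattern, replacement = fix
    return re.sub(pattern, replacement, text)

# Sample words for illustration only.
words = ["favor", "Favor", "favored", "disfavorable", "unfavorable"]
converted = [apply_fix(FAVOR_FIX, word) for word in words]
print(converted)
```

Because "favour" does not itself contain "favor", running the same pair a second time leaves the converted words alone.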

When correcting to British leaves an American spelling around

A few words (mostly those which have -log in them) retain the American spelling even after changing to the British. For instance:

AmEng: dialog
BrEng: dialogue, but dialogue still contains dialog

This means that the next time the bot is run, it will find dialog again and attempt to replace it. After several passes, you'll end up with something like dialogueueueueue, which is obviously not desirable. Thus, we must find a way to limit the search to only the cases of dialog+space or dialog+punctuation mark. Here's how we do it:

r'dialog(\.|\;|\:| |\!|\,|\?+)'

The pipes (|) act as a switch. They say, look for this character | that character | or the other character. The plus sign at the very end says "at least one time"; as written, it applies only to the question mark immediately before it. And the backslashes (\) escape the punctuation marks from their usual special meanings.

Altogether then, what this statement says is, "Look for the word dialog followed by either a period, a semi-colon, a colon, a space, an exclamation mark, a comma, or a question mark that's present at least once." It will therefore find only cases like these:

  • the speed of his dialog was rapid
  • how could he have forgotten his dialog?
  • dialog: the bane of the actor
  • he had far too many lines of dialog!
  • They had a fruitful dialog; however, the humans would soon kill the Silurians.
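The runaway "dialogueue" problem, and the way the guarded pattern avoids it, can be reproduced in a few lines. This is a sketch under stated assumptions: the replacement string `dialogue\1` is assumed here (it is not shown alongside the pattern above), each bot run is modelled as one `re.sub` pass, and the sample sentence is invented.

```python
import re

# Naive pair: every pass finds "dialog" inside the "dialogue" it just wrote.
NAIVE_FIX = (r'dialog', r'dialogue')

# Guarded pair: the pattern from the text, with an assumed replacement that
# puts back whichever space or punctuation mark was captured.
GUARDED_FIX = (r'dialog(\.|\;|\:| |\!|\,|\?+)', r'dialogue\1')

def run_passes(fix, text, passes=3):
    """Apply one (pattern, replacement) pair repeatedly, as repeated bot runs would."""
    pattern, replacement = fix
    for _ in range(passes):
        text = re.sub(pattern, replacement, text)
    return text

sentence = "He forgot his dialog, then his dialog?"
naive_result = run_passes(NAIVE_FIX, sentence)
guarded_result = run_passes(GUARDED_FIX, sentence)
print(naive_result)
print(guarded_result)
```

After the first pass the guarded pattern no longer matches, because "dialog" is now followed by a "u" rather than by a space or punctuation mark.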

Cases where regex fails

Not every word on our list has been switched using regular expressions. Sometimes it's easier just to type up a switch of literal characters, as when a word serves as the root of no other words.

The list

The following code is what's at the heart of our automated BrEng spelling enforcement. It was tested throughout 2011 and eventually completed its first run through the main namespace on 31 October 2011, with a secondary confirming run on 1 November 2011.

The exceptions bit at the very end is particularly important to its function on Tardis. The list of exceptions is not currently organised in any way, but the bot really wouldn't work properly without exceptions.

The most common American spelling that SpellBot 3.0 will not correct is "Honor" with a capital H. It's fine with lower case honor, but it's not yet been determined how to except for a name with a diacritic in it. Thus Honoré Lechasseur can't currently be excepted unless the bot simply ignores "Honor". This is doubly useful since, for reasons equally unclear, the exception for "Honor Blackman" is not being, well, honoured. Since honor does not typically begin a sentence, this compromise is believed acceptable for the time being.
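The lower-case-only compromise can be sketched as follows. This is an illustration rather than the bot itself: it assumes the replacement behaves like Python's `re.sub`, `apply_fix` is a hypothetical helper, and the sample sentences are invented.

```python
import re

# The honor pair, matching a lower-case h only.
HONOR_FIX = (r'(.?)honor(.?)', r'\1honour\2')

def apply_fix(fix, text):
    """Hypothetical helper: apply one (pattern, replacement) pair."""
    pattern, replacement = fix
    return re.sub(pattern, replacement, text)

# Invented sample text, for illustration only.
names = "Honor Blackman and Honoré Lechasseur"
prose = "It was a dishonorable act, but he honored the treaty."

names_after = apply_fix(HONOR_FIX, names)
prose_after = apply_fix(HONOR_FIX, prose)
print(names_after)
print(prose_after)
```

The names pass through unchanged because the pattern never matches a capital H, which is also why a sentence-initial "Honor" would be missed.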

This is the state of the code as it existed during the SpellBot run by SV7 in the week of 12 March 2023.

#SBOT version 3.0
#Enforces BrEng spelling, with exceptions
#relevant to the Doctor Who universe 
#and usage found on tardis.wiki
#released under CC-BY-SA 3.0 license 
#by User:CzechOut
#Originally published: 1 November 2011 (CzechOut)
#Current version: 17 March 2023 (SOTO)

fixes['spelling'] = {
    'regex': True,
    'recursive': True,
    'msg': {
        'en':u'Enforcing [[T:SPELL]]'
        },
    'replacements': [
        #AAAA#
        (r'([Aa])ccessoriz(.?)', r'\1ccessoris\2'),
        (r'([Aa])cclimatiz(.?)',r'\1cclimatis\2'),
        (r'([Aa])ccouterments',r'\1ccoutrements'),
        (r'( +)eon( +)',r'\1aeon\2'),
        (r'( +)eons( +)',r'\1aeons\2'),
        (r'([Aa])erogram( +)',r'\1erogramme\2'),
        (r'([Aa])erograms',r'\1erogrammes'),
        (r'( +)esthete(.?)( +)',r'\1aesthete\2\3'),
        (r'( +)esthetic(.?)( +)',r'\1aesthetic\2\3'),
        (r'( +)etiology',r'\1aetiology'), #raw string so \1 is a backreference, not a control character
        (r'( +)aging',r'\1ageing'),
        (r'([Dd])e(.?)aging',r'\1e\2ageing'),
        (r'([Aa])ggrandizement',r'\1ggrandisement'),
        (r'([Aa])goniz(.?)', r'\1gonis\2'),
        (r'([Aa])luminum', r'\1luminium'),
        (r'([Aa])mortize( +)',r'\1mortise\2'),
        (r'([Aa])mortiz(.?)',r'\1mortis\2'),
        (r'(.?)([Tt])heater(.?)',r'\1\2heatre\3'),
        (r'([Aa])nemi(.?)',r'\1naemi\2'),
        (r'([Aa])nesthesia',r'\1naesthesia'),
        (r'([Aa])nestheti(.?)',r'\1naestheti\2'),
        (r'([Aa])na?esthetiz(.?)',r'\1naesthetis\2'), #SOTO bug fix
        (r'([Aa])nalog( +)',r'\1nalogue\2'),
        (r'([Aa])nalogs',r'\1nalogues'),
        (r'(.?)([Aa])nalyze( +)',r'\1\2nalyse\3'),
        (r'(.?)([Aa])nalyz(.?)',r'\1\2nalys\3'),
        (r'([Aa])ngliciz(.?)',r'\1nglicis\2'),
        (r'([Aa])nnualized',r'\1nnualised'),
        (r'([Aa])ntagoniz(.?)',r'\1ntagonis\2'),
        (r'([Aa])pologiz(.?)',r'\1pologis\2'),
        (r'([Aa])ppall( +)',r'\1ppal\2'), 
        (r'([Aa])ppalls',r'\1ppals'),
        (r'([Aa])ppetiz(.?)',r'\1ppetis\2'),
        (r'([Aa])rbor(.?)',r'\1rbour\2'),
        (r'([Aa])rcheolog(.?)',r'\1rchaeolog\2'),
        (u'ardor',u'ardour'),
        (r'([Aa])rmor(.?)',r'\1rmour\2'),
        (r'([Aa])rtifact(.?)',r'\1rtefact\2'),
        (r'(.?)([Aa])uthoriz(.?)',r'\1\2uthoris\3'),
        (r'( +)([Aa])x( +)',r'\1\2xe\3'),
        #BBBB#
        (r'(.?)([Pp])edaled', r'\1\2edalled'),
        (r'(.?)([Pp])edaling', r'\1\2edalling'),
        (r'([Bb])aptiz(.?)',r'\1aptis\2'),
        (r'([Bb])astardiz(.?)',r'\1astardis\2'),
        (r'([Bb])attleax( +)',r'\1attleaxe\2'),
        (r'([Bb])alk(.?)',r'\1aulk\2'),
        (r'([Bb])edeviled',r'\1edevilled'),
        (r'([Bb])edeviling',r'\1edevilling'),
        (r'(.?)([Bb])ehavior(.?)',r'\1\2ehaviour\3'),
        (r'([Bb])ehoove(.?)',r'\1ehove\2'),
        (r'([Bb])ejeweled',r'\1ejewelled'),
        (r'(.?)([Ll])abor( +)',r'\1\2abour\3'),
        (r'(.?)([Ll])abored',r'\1\2aboured'),
        (r'([Bb])eveled',r'\1evelled'),
        (r'([Bb])evies',r'\1evvies'),
        (r'([Bb])evy',r'\1evvy'),
        (r'([Bb])iased',r'\1iassed'),
        (r'([Bb])iasing',r'\1iassing'),
        (r'([Bb])inging',r'\1ingeing'),
        (r'([Bb])ougainvillea(.?)',r'\1ougainvillaea\2'),
        (r'([Bb])owdleriz(.?)',r'\1owdleris\2'),
        (r'([Bb])reathalyz(.?)',r'\1reathalys\2'),
        (r'([Bb])rutaliz(.?)',r'\1rutalis\2'),
        (r'(.?)([Bb])usses',r'\1\2uses'),
        (r'([Bb])ussing',r'\1using'),
        #CCCC#
        (r'([Cc])esarean(.?)',r'\1aesarean\2'),
        (r'([Cc])aliber(.?)',r'\1alibre\2'),
        (r'([Cc])aliper(.?)',r'\1alliper\2'),
        (r'([Cc])alisthenics',r'\1allisthenics'),
        (r'([Cc])analiz(.?)',r'\1analis\2'),
        (r'([Cc])ancelation',r'\1ancellation'),
        (r'([Cc])ancelations',r'\1ancellations'),
        (r'([Cc])anceled',r'\1ancelled'),
        (r'([Cc])anceling',r'\1ancelling'),
        (r'([Cc])andor',r'\1andour'),
        (r'([Cc])annibaliz(.?)',r'\1annibalis\2'),
        (r'([Cc])anibaliz(.?)',r'\1annibalis\2'),
        (r'([Cc])anibalis(.?)',r'\1annibalis\2'),
        (r'([Cc])anoniz(.?)',r'\1anonis\2'),
        (r'([Cc])apitaliz(.?)',r'\1apitalis\2'),
        (r'([Cc])arameliz(.?)',r'\1aramelis\2'),
        (r'([Cc])arboniz(.?)',r'\1arbonis\2'),
        (r'([Cc])aroled',r'\1arolled'),
        (r'([Cc])aroling',r'\1arolling'),
        (r'([Cc])atalog( +)',r'\1atalogue\2'),
        (r'([Cc])atalogs( +)',r'\1atalogues\2'),
        (r'([Cc])ataloged',r'\1atalogued'),
        (r'([Cc])ataloging',r'\1ataloguing'),
        (r'([Cc])atalyz(.?)',r'\1atalys\2'),
        (r'([Cc])ategoriz(.?)',r'\1ategoris\2'),
        (r'([Cc])auteriz(.?)',r'\1auteris\2'),
        (r'([Cc])aviled',r'\1avilled'),
        (r'([Cc])aviling',r'\1avilling'),
        (r'(.?)([Gg])ram( +)',r'\1\2ramme\3'),
        (r'(.?)([Gg])rams',r'\1\2rammes'),
        (r'(.?)([Ll])iter( +)',r'\1\2itre\3'),
        (r'(.?)([Ll])iters',r'\1\2itres'),        
        (r'(.?)([Mm])eter(.?)',r'\1\2etre\3'),
        (r'([Cc])entraliz(.?)',r'\1entralis\2'),
        (r'(.?)([Cc])enter( +)',r'\1\2entre\3'),
        (r'([Cc])enters',r'\1entres'),
        (r'([Cc])entered',r'\1entred'),
        (r'([Cc])entering',r'\1entring'),
        (r'([Cc])hanneled',r'\1hannelled'),
        (r'([Cc])hanneling',r'\1hannelling'),
        (r'([Cc])haracteriz(.?)',r'\1haracteris\2'),
        (r'([Cc])heckbook(.?)',r'\1hequebook\2'),
        (r'([Cc])hili',r'\1hilli'),
        (r'([Cc])hiseled',r'\1hiselled'),
        (r'([Cc])hiseling',r'\1hiselling'),
        (r'([Cc])irculariz(.?)',r'\1ircularis\2'),
        (r'(.?)([Cc])iviliz(.?)',r'\1\2ivilis\3'),
        (r'([Cc])lamor(.?)',r'\1lamour\2'),
        (r'([Cc])langor',r'\1langour'),
        (r'([Cc])larinetist',r'\1larinettist'),
        (r'([Cc])ollectiviz(.?)',r'\1ollectivis\2'),
        (r'([Cc])oloniz(.?)',r'\1olonis\2'),
        (r'([Cc])olor(.?)',r'\1olour\2'),
        (r'(.?)([Cc])olored',r'\1\2oloured'),
        (r'(.?)([Cc])oloring',r'\1\2olouring'),
        (r'(.?)([Cc])oloriz(.?)',r'\1\2olouris\3'),
        (r'([Cc])ommercializ(.?)',r'\1ommercialis\2'),
        (r'([Cc])ompartmentaliz(.?)',r'\1ompartmentalis\2'),
        (r'([Cc])omputeriz(.?)',r'\1omputeris\2'),
        (r'([Cc])onceptualiz(.?)',r'\1onceptualis\2'),
        (r'([Cc])ontextualiz(.?)',r'\1ontextualis\2'),
        (r'([Cc])oz(.?)',r'\1os\2'),
        (r'([Cc])ouncilor(.?)',r'\1ouncillor\2'),
        (r'([Cc])ounselor(.?)',r'\1ounsellor\2'),
        (r'([Cc])ounseling',r'\1ounselling'),
        (r'([Cc])ounseled',r'\1ounselled'),
        (r'([Cc])renelated',r'\1renellated'),
        (r'([Cc])riminaliz(.?)',r'\1riminalis\2'),
        (r'([Cc])riticiz(.?)',r'\1riticis\2'),
        (r'([Cc])rueler',r'\1rueller'),
        (r'([Cc])ruelest',r'\1ruellest'),
        (r'([Cc])rystalliz(.?)',r'\1rystallis\2'),
        (r'([Cc])udgeled', r'\1udgelled'),
        (r'([Cc])udgeling',r'\1udgelling'),
        (r'([Cc])ustomiz(.?)',r'\1ustomis\2'),
        (r'( +)([Cc])ipher(.?)',r'\1\2ypher\3'),
        #DDDD#
        (r'([Dd])ecentraliz(.?)',r'\1ecentralis\2'),
        (r'([Dd])ecriminaliz(.?)',r'\1ecriminalis\2'),
        (r'([Dd])efense(.?)',r'\1efence\2'),
        (r'(.?)([Hh])umaniz(.?)',r'\1\2umanis\3'),
        (r'(.?)([Dd])emeanor',r'\1\2emeanour'),
        (r'(.?)([Mm])ilitariz(.?)',r'\1\2ilitaris\3'),
        (r'(.?)([Mm])obiliz(.?)',r'\1\2obilis\3'),
        (r'([Dd])emocratiz(.?)',r'\1emocratis\2'),
        (r'([Dd])emoniz(.?)',r'\1emonis\2'),
        (r'(.?)([Mm])oraliz(.?)',r'\1\2oralis\3'),
        (r'(.?)([Nn])ationaliz(.?)',r'\1\2ationalis\3'),
        (r'([Dd])eodoriz(.?)',r'\1eodoris\2'),
        (r'(.?)([Pp])ersonaliz(.?)',r'\1\2ersonalis\3'),
        (r'([Dd])eputiz(.?)',r'\1eputis\2'),
        (r'(.?)([Ss])ensitiz(.?)',r'\1\2ensitis\3'),
        (r'(.?)([Ss])tabiliz(.?)',r'\1\2tabilis\3'),
        (r'([Dd])ialed',r'\1ialled'),
        (r'([Dd])ialing',r'\1ialling'),
        (r'([Dd])ialog( +)',r'\1ialogue\2'),
        (r'([Dd])ialogs( +)',r'\1ialogues\2'),
        (r'([Dd])iarrhea',r'\1iarrhoea'),
        (r'([Dd])igitiz(.?)',r'\1igitis\2'),
        (r'([Dd])isemboweled',r'\1isembowelled'),
        (r'([Dd])isemboweling',r'\1isembowelling'),
        (r'(.?)([Ff])avor(.?)',r'\1\2avour\3'),
        (r'([Dd])isheveled',r'\1ishevelled'),
        (r'(.?)honor(.?)',r'\1honour\2'), #making this recognize only lower-case h because of Honor Blackman and Honore
        (r'(.?)([Oo])rganization(.?)',r'\1\2rganisation\3'),
        (r'([Dd])istil( +)',r'\1istill\2'),
        (r'([Dd])istils',r'\1istills'),
        (r'([Dd])ramatiz(.?)',r'\1ramatis\2'),
        #(r'([Dd])rafts(.+)',r'\1raughts\2'), will need to do something else for draughtman, people
        (r'([Dd])rafty',r'\1raughty'),
        (r'([Dd])rafti(.?)',r'\1raughti\2'),
        (r'([Dd])riveled',r'\1rivelled'),
        (r'([Dd])riveling',r'\1rivelling'),
        (r'([Dd])ueled',r'\1uelled'),
        (r'([Dd])ueling',r'\1uelling'),
        #EEEE#
        (r'([Ee])conomiz(.?)',r'\1conomis\2'),
        (r'( +)edema',r'\1oedema'), #split by case, as with esophagus
        (r'( +)Edema',r'\1Oedema'),
        (r'([Ee])ditorializ(.?)',r'\1ditorialis\2'),
        (r'([Ee])mpathiz(.?)',r'\1mpathis\2'),
        (r'(.?)([Ee])mphasiz(.?)',r'\1\2mphasis\3'),
        (r'([Ee])nameled',r'\1namelled'),
        (r'([Ee])nameling',r'\1namelling'),
        (r'([Ee])namor(.?)',r'\1namour\2'),
        # (r'([Ee])ncyclopedi(.?)',r'\1ncyclopaedi\2'),
        (r'([Ee])ndeavor(.?)',r'\1ndeavour\2'),
        (r'(.?)([Ee])nergiz(.?)',r'\1\2nergis\3'),
        (r'([Ee])nroll(.?)',r'\1nrol\2'),
        (r'([Ee])nrol(ed|ing)',r'\1nroll\2'), #tense exceptions -SOTO
        (r'([Ee])nthrall( +)',r'\1nthral\2'), #only enthrall is one l
        (r'([Ee])paulet( +)',r'\1paulette\2'),
        (r'([Ee])paulets',r'\1paulettes'),
        (r'([Ee])pilog( +)',r'\1pilogue\2'),
        (r'([Ee])pilogs',r'\1pilogues'),
        (r'([Ee])pitomiz(.?)',r'\1pitomis\2'),
        (r'([Ee])qualiz(.?)',r'\1qualis\2'),
        (r'([Ee])ulogiz(.?)',r'\1ulogis\2'),
        (r'([Ee])vangeliz(.?)',r'\1vangelis\2'),
        (r'([Ee])xorciz(.?)',r'\1xorcis\2'),
        (r'(.?)([Tt])emporiz(.?)',r'\1\2emporis\3'),
        (r'([Ee])xternaliz(.?)',r'\1xternalis\2'),
        #FFFF#
        (r'([Ff])actoriz(.?)',r'\1actoris\2'),
        (r'([Ff])eces',r'\1aeces'),
        (r'([Ff])ecal',r'\1aecal'),
        (r'([Ff])amiliariz(.?)',r'\1amiliaris\2'),
        (r'([Ff])antasiz(.?)',r'\1antasis\2'),
        (r'([Ff])eminiz(.?)',r'\1eminis\2'),
        (r'([Ff])ertiliz(.?)',r'\1ertilis\2'),
        (r'([Ff])ervor',r'\1ervour'),
        (r'([Ff])iber(.?)',r'\1ibre\2'),
        (r'([Ff])ictionaliz(.?)',r'\1ictionalis\2'),
        (r'([Ff])ilet(.?)',r'\1illet\2'),
        (r'([Ff])inaliz(.?)',r'\1inalis\2'),
        (r'(.?)([Ff])lavor(.?)',r'\1\2lavour\3'),
        (r'([Ff])etal',r'\1oetal'),
        (r'([Ff])etus(.?)',r'\1oetus\2'),
        (r'([Ff])etid',r'\1oetid'),
        (r'([Ff])ormaliz(.?)',r'\1ormalis\2'),
        (r'([Ff])ossiliz(.?)',r'\1ossilis\2'),
        (r'([Ff])raterniz(.?)',r'\1raternis\2'),
        (r'([Ff])ulfill( +)',r'\1ulfil\2'),
        (r'([Ff])ulfillment',r'\1ulfilment'),
        (r'([Ff])unneled',r'\1unnelled'),
        (r'([Ff])unneling',r'\1unnelling'),
        #GGGG#
        (r'([Gg])alvaniz(.?)',r'\1alvanis\2'),
        (r'([Gg])amboled',r'\1ambolled'),
        (r'([Gg])amboling',r'\1ambolling'),
        (r'([Gg])eneraliz(.?)',r'\1eneralis\2'),
        (r'([Gg])hettoiz(.?)',r'\1hettois\2'),
        (r'([Gg])lamoriz(.?)',r'\1lamoris\2'),
        (r'([Gg])lamor( +)',r'\1lamour\2'),
        (r'([Gg])lobaliz(.?)',r'\1lobalis\2'),
        (r'([Gg])luing',r'\1lueing'),
        (r'([Gg])oiter(.?)',r'\1oitre\2'),
        (r'([Gg])onorrhea',r'\1onorrhoea'),
        (r'([Gg])raveled',r'\1ravelled'),
        (r'gray( +)',r'grey\1'), #probably shouldn't include cap G#
        (r'gray(.?)',r'grey\1'),
        (r'([Gg])roveled',r'\1rovelled'),
        (r'([Gg])roveling',r'\1rovelling'),
        (r'([Gg])rueling(.?)',r'\1ruelling\2'),
        (r'([Gg])ynecol(.?)',r'\1ynaecol\2'),
        #HHHH#
        (r'([Hh])ematolog(.?)',r'\1aematolog\2'),
        (r'([Hh])emo(.?)',r'\1aemo\2'),
        (r'([Hh])arbor(.?)',r'\1arbour\2'),
        (r'([Hh])armoniz(.?)',r'\1armonis\2'),
        (r'([Hh])omeopath(.?)',r'\1omoeopath\2'),
        (r'([Hh])omogeniz(.?)',r'\1omogenis\2'),
        (r'([Hh])ospitaliz(.?)',r'\1ospitalis\2'),
        (r'([Hh])umor(.?)',r'\1umour\2'),
        (r'([Hh])ybridiz(.?)',r'\1ybridis\2'),
        (r'([Hh])ypnotiz(.?)',r'\1ypnotis\2'),
        (r'([Hh])ypothesiz(.?)',r'\1ypothesis\2'),
        #IIII#
        (r'([Ii])dealiz(.?)',r'\1dealis\2'),
        (r'([Ii])doliz(.?)',r'\1dolis\2'),
        (r'(.?)([Mm])obiliz(.?)',r'\1\2obilis\3'),
        (r'([Ii])mmortaliz(.?)',r'\1mmortalis\2'),
        (r'([Ii])mmuniz(.?)',r'\1mmunis\2'),
        (r'(.?)([Pp])aneled',r'\1\2anelled'),
        (r'(.?)([Pp])aneling',r'\1\2anelling'),
        (r'([Ii])mperiled',r'\1mperilled'),
        (r'([Ii])mperiling',r'\1mperilling'),
        (r'([Ii])ndividualiz(.?)',r'\1ndividualis\2'),
        (r'([Ii])ndustrializ(.?)',r'\1ndustrialis\2'),
        (r'([Ii])nstill( +)',r'\1nstil\2'),
        (r'([Ii])nitialed',r'\1nitialled'),
        (r'([Ii])nitialing',r'\1nitialling'),
        (r'([Ii])nstallment(.?)',r'\1nstalment\2'),
        (r'([Ii])nstitutionaliz(.?)',r'\1nstitutionalis\2'),
        (r'([Ii])ntellectualiz(.?)',r'\1ntellectualis\2'),
        (r'(.?)([Nn])ationaliz(.?)',r'\1\2ationalis\3'),
        (r'([Ii])nternaliz(.?)',r'\1nternalis\2'),
        (r'([Ii])oniz(.?)',r'\1onis\2'),
        (r'([Ii])taliciz(.?)',r'\1talicis\2'),
        (r'([Ii])temiz(.?)',r'\1temis\2'),
        #JJJJ
        (r'([Jj])eopardiz(.?)',r'\1eopardis\2'),
        (r'([Jj])eweler(.?)',r'\1eweller\2'),
        #KKKK#
        #None known#
        #LLLL#
        (r'([Ll])abeled',r'\1abelled'),
        (r'([Ll])abeling',r'\1abelling'),
        (r'([Ll])ackluster',r'\1acklustre'),
        (r'(.?)([Ll])egaliz(.?)',r'\1\2egalis\3'),
        (r'(.?)([Ll])egitimiz(.?)',r'\1\2egitimis\3'),
        (r'([Ll])eukemia',r'\1eukaemia'),
        (r'(.?)([Ll])evele(.?)',r'\1\2evelle\3'),
        (r'(.?)([Ll])eveling',r'\1\2evelling'),
        (r'([Ll])ibeled',r'\1ibelled'),
        (r'([Ll])ibelous',r'\1ibellous'),
        (r'([Ll])ibeling',r'\1ibelling'),
        (r'([Ll])iberaliz(.?)',r'\1iberalis\2'),
        (r'([Ll])ioniz(.?)',r'\1ionis\2'),
        (r'([Ll])iquidiz(.?)',r'\1iquidis\2'),
        (r'([Ll])ocaliz(.?)',r'\1ocalis\2'),
        (r'([Ll])ouver(.?)',r'\1ouvre\2'),
        (r'( +)([Ll])uster',r'\1\2ustre'),
        #MMMM#
        (r'(.?)([Mm])agnetiz(.?)',r'\1\2agnetis\3'),
        (r'(.?)([Mm])aneuver(.?)',r'\1\2anoeuvre\3'),
        (r'(.?)([Mm])anoeuvreed(.?)',r'\1\2anoeuvred\3'), #catching exception -SOTO#
        (r'([Mm])arginaliz(.?)',r'\1arginalis\2'),
        (r'([Mm])arshaled',r'\1arshalled'),
        (r'([Mm])arshaling',r'\1arshalling'),
        (r'([Mm])arveled',r'\1arvelled'),
        (r'([Mm])arveling',r'\1arvelling'),
        (r'([Mm])arvelo(.?)',r'\1arvello\2'),
        (r'(.?)([Mm])aterializ(.?)',r'\1\2aterialis\3'),
        (r'([Mm])aximiz(.?)',r'\1aximis\2'),
        (r'([Mm])eager',r'\1eagre'),
        (r'([Mm])echaniz(.?)',r'\1echanis\2'),
        (r'([Mm])emorializ(.?)',r'\1emorialis\2'),
        (r'([Mm])emoriz(.?)',r'\1emoris\2'),
        (r'([Mm])esmeriz(.?)',r'\1esmeris\2'),
        (r'([Mm])etaboliz(.?)',r'\1etabolis\2'),
        (r'([Mm])iniaturiz(.?)',r'\1iniaturis\2'),
        (r'([Mm])inimiz(.?)',r'\1inimis\2'),
        (r'([Mm])iter(.?)',r'\1itre\2'),
        (r'(.?)([Mm])odele(.?)',r'\1\2odelle\3'),
        (r'(.?)([Mm])odeling',r'\1\2odelling'),
        (r'([Mm])oderniz(.?)',r'\1odernis\2'),
        (r'([Mm])oisturiz(.?)',r'\1oisturis\2'),
        (r'([Mm])onolog( +)',r'\1onologue\2'),
        (r'([Mm])onologs',r'\1onologues'),
        (r'([Mm])onopoliz(.?)',r'\1onopolis\2'),
        (r'(.?)([Mm])old(.?)',r'\1\2ould\3'),
        (r'([Mm])olted',r'\1oulted'),
        (r'([Mm])olting',r'\1oulting'),
        (r'([Mm])olt( +)',r'\1oult\2'),
        (r'([Mm])ustache(.?)',r'\1oustache\2'),
        #NNNN#
        (r'([Nn])aturaliz(.?)',r'\1aturalis\2'),
        (r'([Nn])eighbor(.?)',r'\1eighbour\2'),
        (r'([Nn])aturaliz(.?)',r'\1aturalis\2'),
        (r'([Nn])eutraliz(.?)',r'\1eutralis\2'),
        (r'([Nn])ormaliz(.?)',r'\1ormalis\2'),
        #OOOO#
        (r'([Oo])dor( +)',r'\1dour\2'),
        (r'([Oo])dors',r'\1dours'),
        (r'( +)esophagus(.?)',r'\1oesophagus\2'),
        (r'( +)Esophagus(.?)',r'\1Oesophagus\2'),
        (r'( +)estrogen',r'\1oestrogen'), #raw strings so \1 is a backreference
        (r'( +)Estrogen',r'\1Oestrogen'),
        (r'([Oo])ffense(.?)',r'\1ffence\2'),
        (r'([Oo])melet( +)',r'\1melette\2'),
        (r'([Oo])melets',r'\1melettes'),
        (r'(.?)([Oo])ptimiz(.?)',r'\1\2ptimis\3'),
        (r'(.?)([Oo])rganiz(.?)',r'\1\2rganis\3'),
        (r'([Oo])rthopedic(.?)',r'\1rthopaedic\2'),
        (r'([Oo])straciz(.?)',r'\1stracis\2'),
        (r'([Oo])xidiz(.?)',r'\1xidis\2'),
        #PPPP#
        (r'([Pp])ederast(.?)',r'\1aederast\2'),
        (r'([Pp])ediatric(.?)',r'\1aediatric\2'),
        (r'([Pp])edo( +)',r'\1aedo\2'),
        (r'([Pp])edophil(.?)',r'\1aedophil\2'),
        (r'([Pp])aleo(.?)',r'\1alaeo\2'),
        (r'([Pp])anelist(.?)',r'\1anellist\2'),
        (r'([Pp])aralyz(.?)',r'\1aralys\2'),
        (r'([Pp])arceled',r'\1arcelled'),
        (r'([Pp])arceling',r'\1arcelling'),
        (r'([Pp])arlor(.?)',r'\1arlour\2'),
        (r'([Pp])articulariz(.?)',r'\1articularis\2'),
        (r'([Pp])assiviz(.?)',r'\1assivis\2'),
        (r'([Pp])asteuriz(.?)',r'\1asteuris\2'),
        (r'([Pp])atroniz(.?)',r'\1atronis\2'),
        (r'([Pp])edestrianiz(.?)',r'\1edestrianis\2'),
        (r'([Pp])enaliz(.?)',r'\1enalis\2'),
        (r'([Pp])enciled',r'\1encilled'),
        (r'([Pp])enciling',r'\1encilling'),
        (r'([Pp])harmacopeia(.?)',r'\1harmacopoeia\2'),
        (r'([Pp])hilosophiz(.?)',r'\1hilosophis\2'),
        (r'([Pp])hilter(.?)',r'\1hiltre\2'),
        (r'([Pp])lagiariz(.?)',r'\1lagiaris\2'),
        (r'([Pp])low( +)',r'\1lough\2'),
        (r'([Pp])low(.?)',r'\1lough\2'),
        (r'(.?)([Pp])olariz(.?)',r'\1\2olaris\3'),
        (r'(.?)([Pp])oliticiz(.?)',r'\1\2oliticis\3'),
        (r'([Pp])opulariz(.?)',r'\1opularis\2'),
        (r'([Pp])ouf( +)',r'\1ouffe\2'),
        (r'([Pp])oufs',r'\1ouffes'),
        (r'([Pp])racticed',r'\1ractised'),
        (r'([Pp])racticing',r'\1ractising'),
        (r'([Pp])raesidium(.?)',r'\1residium\2'),
        (r'(.?)([Pp])ressuriz(.?)',r'\1\2ressuris\3'),
        (r'([Pp])retense(.?)',r'\1retence\2'), #matches pretense only, so pretension is left alone
        (r'([Pp])rimaeval',r'\1rimeval'), #Correcting in favour of American spelling#
        (r'(.?)([Pp])rioritiz(.?)',r'\1\2rioritis\3'),
        (r'(.?)([Pp])rivatiz(.?)',r'\1\2rivatis\3'),
        (r'([Pp])rofessionaliz(.?)',r'\1rofessionalis\2'),
        (r'([Pp])rolog( +)',r'\1rologue\2'),
        (r'([Pp])rologs',r'\1rologues'),
        (r'([Pp])ropagandiz(.?)',r'\1ropagandis\2'),
        (r'([Pp])roselytiz(.?)',r'\1roselytis\2'),
        (r'([Pp])ubliciz(.?)',r'\1ublicis\2'),
        (r'([Pp])ulveriz(.?)',r'\1ulveris\2'),
        (r'([Pp])ummeled',r'\1ummelled'),
        (r'([Pp])ummeling',r'\1ummelling'),
        (r'([Pp])ajama(.?)',r'\1yjama\2'),
        #QQQQ#
        (r'([Qq])uarreled',r'\1uarrelled'),
        (r'([Qq])uarreling',r'\1uarrelling'),
        #RRRR#
        (r'([Rr])adicaliz(.?)',r'\1adicalis\2'),
        (r'([Rr])ancor(.?)',r'\1ancour\2'),
        (r'([Rr])andomiz(.?)',r'\1andomis\2'),
        (r'([Rr])ationaliz(.?)',r'\1ationalis\2'),
        (r'(.?)([Rr])aveled',r'\1\2avelled'),
        (r'(.?)([Rr])aveling',r'\1\2avelling'),
        (r'(.?)([Rr])ealiz(.?)',r'\1\2ealis\3'),
        (r'(.?)([Rr])ecogniz(.?)',r'\1\2ecognis\3'),
        (r'([Rr])econnoiter(.?)',r'\1econnoitre\2'),
        (r'([Rr])efueled',r'\1efuelled'),
        (r'([Rr])efueling',r'\1efuelling'),
        (r'(.?)([Rr])egulariz(.?)',r'\1\2egularis\3'),
        (r'([Rr])evele(.?)',r'\1evelle\2'),
        (r'([Rr])eveling',r'\1evelling'),
        (r'(.?)([Vv])italiz(.?)',r'\1\2italis\3'),
        (r'([Rr])evolutioniz(.?)',r'\1evolutionis\2'),
        (r'([Rr])hapsodiz(.?)',r'\1hapsodis\2'),
        (r'( +)([Rr])igor( +)',r'\1\2igour\3'),
        (r'([Rr])itualiz(.?)',r'\1itualis\2'),
        (r'(.?)([Rr])ivaled',r'\1\2ivalled'),
        (r'([Rr])ivaling',r'\1ivalling'),
        (r'([Rr])omanticiz(.?)',r'\1omanticis\2'),
        (r'([Rr])umor(.?)',r'\1umour\2'),
        #SSSS#
        (r'([Ss])aber(.?)',r'\1abre\2'),
        (r'([Ss])altpeter',r'\1altpetre'),
        (r'(.?)([Ss])anitiz(.?)',r'\1\2anitis\3'),
        (r'([Ss])atiriz(.?)',r'\1atiris\2'),
        (r'([Ss])avior(.?)',r'\1aviour\2'),
        (r'(.?)savor(.?)',r'\1savour\2'), #recognizes only lower-case s, because of Gerald Savory
        (r'([Ss])candaliz(.?)',r'\1candalis\2'),
        (r'([Ss])keptic(.?)',r'\1ceptic\2'),
        (r'([Ss])cepter(.?)',r'\1ceptre\2'),
        (r'([Ss])crutiniz(.?)',r'\1crutinis\2'),
        (r'([Ss])eculariz(.?)',r'\1ecularis\2'),
        (r'([Ss])ensationaliz(.?)',r'\1ensationalis\2'),
        (r'([Ss])entimentaliz(.?)',r'\1entimentalis\2'),
        (r'([Ss])epulcher(.?)',r'\1epulchre\2'),
        (r'([Ss])erializ(.?)',r'\1erialis\2'),
        (r'([Ss])ermoniz(.?)',r'\1ermonis\2'),
        (r'([Ss])hoveled',r'\1hovelled'),
        (r'([Ss])hoveling',r'\1hovelling'),
        (r'([Ss])hriveled',r'\1hrivelled'),
        (r'([Ss])hriveling',r'\1hrivelling'),
        (r'([Ss])ignaliz(.?)',r'\1ignalis\2'),
        (r'([Ss])ignaled',r'\1ignalled'),
        (r'([Ss])ignaling',r'\1ignalling'),
        (r'([Ss])molder(.?)',r'\1moulder\2'),
        (r'([Ss])niveled',r'\1nivelled'),
        (r'([Ss])niveling',r'\1nivelling'),
        (r'([Ss])norkeled',r'\1norkelled'),
        (r'([Ss])norkeling',r'\1norkelling'),
        (r'(.?)([Ss])ocializ(.?)',r'\1\2ocialis\3'),
        (r'([Ss])odomiz(.?)',r'\1odomis\2'),
        (r'(.?)([Ss])olemniz(.?)',r'\1\2olemnis\3'),
        (r'([Ss])omber',r'\1ombre'),
        (r'([Ss])pecializ(.?)',r'\1pecialis\2'),
        (r'( +)([Ss])pecter(.?)',r'\1\2pectre\3'),
        (r'([Ss])piraled',r'\1piralled'),
        (r'([Ss])piraling',r'\1piralling'),
        (r'([Ss])plendor(.?)',r'\1plendour\2'),
        (r'([Ss])quirreled',r'\1quirrelled'),
        (r'([Ss])quirreling',r'\1quirrelling'),
        (r'(.?)([Ss])tabiliz(.?)',r'\1\2tabilis\3'),
        (r'(.?)([Ss])tandardiz(.?)',r'\1\2tandardis\3'),
        (r'([Ss])tenciled',r'\1tencilled'),
        (r'([Ss])tenciling',r'\1tencilling'),
        (r'(.?)([Ss])teriliz(.?)',r'\1\2terilis\3'),
        (r'(.?)([Ss])tigmatiz(.?)',r'\1\2tigmatis\3'),
        (r'(.?)([Ss])ubsidiz(.?)',r'\1\2ubsidis\3'),
        (r'([Ss])uccor(.?)',r'\1uccour\2'),
        (r'([Ss])ulfa(.?)',r'\1ulpha\2'),
        (r'([Ss])ulfi(.?)',r'\1ulphi\2'),
        (r'([Ss])ulfu(.?)',r'\1ulphu\2'),
        (r'([Ss])ummariz(.?)',r'\1ummaris\2'),
        (r'([Ss])wiveled',r'\1wivelled'),
        (r'([Ss])wiveling',r'\1wivelling'),
        (r'([Ss])ymboliz(.?)',r'\1ymbolis\2'),
        (r'([Ss])ympathiz(.?)',r'\1ympathis\2'),
        (r'(.?)([Ss])ynchroniz(.?)',r'\1\2ynchronis\3'),
        (r'(.?)([Ss])ynthesiz(.?)',r'\1\2ynthesis\3'),
        (r'(.?)([Ss])ystematiz(.?)',r'\1\2ystematis\3'),
        #TTTT#
        (r'([Tt])antaliz(.?)',r'\1antalis\2'),
        (r'([Tt])asseled',r'\1asselled'),
        (r'([Tt])enderiz(.?)',r'\1enderis\2'),
        (r'([Tt])erroriz(.?)',r'\1erroris\2'),
        (r'([Tt])heoriz(.?)',r'\1heoris\2'),
        (r'([Tt])oweled',r'\1owelled'),
        (r'([Tt])oweling',r'\1owelling'),
        (r'([Tt])oxemia',r'\1oxaemia'),
        (r'([Tt])ranquiliz(.?)',r'\1ranquillis\2'),
        (r'([Tt])ranquilis(.?)',r'\1ranquillis\2'),
        (r'([Tt])ranquilliz(.?)',r'\1ranquillis\2'), #correcting archaic BrEng form to modern BrEng#
        (r'([Tt])ranquillity ([Bb])ase',r'Tranquility Base'), #correcting to IAU standard#
        (r'([Tt])ransistoriz(.?)',r'\1ransistoris\2'),
        (r'([Tt])raumatiz(.?)',r'\1raumatis\2'),
        (r'([Tt])ravelers',r'\1ravellers'), #other forms under "ravelled" above#
        (r'([Tt])ravelog( +)',r'\1ravelogue\2'),
        (r'([Tt])ravelogs',r'\1ravelogues'),
        (r'([Tt])rivializ(.?)',r'\1rivialis\2'),
        (r'([Tt])umor(.?)',r'\1umour\2'),
        (r'([Tt])unneled',r'\1unnelled'),
        (r'([Tt])unneling',r'\1unnelling'),
        (r'([Tt])yranniz(.?)',r'\1yrannis\2'),
        #UUUU#
        (r'([Uu])nioniz(.?)',r'\1nionis\2'),
        (r'([Uu])ntrammeled',r'\1ntrammelled'),
        (r'(.?)([Uu])rbaniz(.?)',r'\1\2rbanis\3'),
        (r'(.?)([Uu])tiliz(.?)',r'\1\2tilis\3'),
        #VVVV#
        (r'([Vv])alor',r'\1alour'),
        (r'([Vv])andaliz(.?)',r'\1andalis\2'),
        (r'(.?)([Vv])aporiz(.?)',r'\1\2aporis\3'),
        (r'([Vv])apor( +)',r'\1apour\2'),
        (r'([Vv])apors',r'\1apours'),
        (r'([Vv])aporiz(.?)',r'\1aporis\2'), #Weirdly, words that have vapour as a root lose the cosmetic 'u' #
        (r'(.?)erbaliz(.?)',r'\1erbalis\2'),
        (r'([Vv])ictimiz(.?)',r'\1ictimis\2'),
        (r'([Vv])igor( +)',r'\1igour\2'),
        (r'([Vv])isualiz(.?)',r'\1isualis\2'),
        (r'([Vv])ocaliz(.?)',r'\1ocalis\2'),
        (r'([Vv])ulcaniz(.?)',r'\1ulcanis\2'),
        (r'([Vv])ulgariz(.?)',r'\1ulgaris\2'),
        #WWWW#
        (r'([Ww])easeled',r'\1easelled'),
        (r'([Ww])easeling',r'\1easelling'),
        (r'([Ww])esterniz(.?)',r'\1esternis\2'),
        (r'([Ww])omaniz(.?)',r'\1omanis\2'),
        (r'([Ww])oolen(.?)',r'\1oollen\2'),
        (r'([Ww])oolies',r'\1oollies'),
        (r'([Ww])ooly',r'\1oolly'),
        #XXXX#
        #None known#
        #YYYY#
        (r'([Yy])odeled',r'\1odelled'),
        (r'([Yy])odeling',r'\1odelling'),
        #ZZZZ#
        #None known#
        ],
    'exceptions': {
        'inside-tags': [
            'pre',
            'code',
            'nowiki',
            'hyperlink',
            'link',
            'comment',
            'center',
            'color',
            'captiontextcolor',
            'gallery',
            'syntaxhighlight'
            ],
        'category': [
            'spelling',
            ],
        'inside': [
            'Similarities in Proto-Cultural Artifacts',
            'Honor_Blackman',
            'Savory',
            'tachometer',
            'mileometer',
            'spectrometer',
            'diameter',
            'diameters',
            'pentameter',
            'pentameters',
            'chronometer',
            'chronometers',
            'geometer',
            'geometers',
            'rateometer',
            'rateometers',
            'Rateometer',
            'Rateometers',
            'interferometer',
            'EMF meter',
            'parameter',
            'altimeter',
            'altimeters',
            'parameters',
            'Graystark',
            'perimeter',
            'pretension',
            'Good Neighbors',
            'stingray',
            'stingrays',
            'anagram',
            'Anagram',
            'anagrams',
            'Anagrams',
            'Previsualization}}',
            'behemoth',
            'Behemoth',
            'behemoths',
            'Yourfavoritemartian',
            'Music-a-grams',
            'anagrams',
            'Anagrams',
            'hologram',
            'Hologram',
            'Holograms',
            'holograms',
            'electrocardiogram',
            'electrocardiograms',
            'pentagram',
            'pentagrams',
            'telegram',
            'telegrams',
            'transgram',
            'transgrams',
            'Transgram',
            'Transgrams',
            'diagram',
            'diagrams',
            'Diagram',
            'Diagrams',
            'engram',
            'engrams',
            'Grigory',
            'Unauthorized Guide',
            'Honor Blackman', #this isn't being excepted and I don't know why#
            'Medal of Honor',
            'Arborge Quince',
            'program', #need a forum discussion here#
            'programs',
            'reprogram',
            'deprogram',
            'pictogram',
            'pictograms',
            'Pictogram',
            'Pictograms',
            'phonogram',
            'phonograms',
            'background-color',
            'color:',
            'color :',
            'border-color',
            'text-align: center;',
            'text-align:center;',
            'align=center',
            'align = center',
            'align= center',
            'align =center',
            'position=center',
            '</ center>',
            '{{color', #SOTO
            '{{Color', #SOTO
            'Encyclopedia of Fantastic',
            'themonster',
            'arboreal',
            'Moldova',
            'Fun at the Funeral Parlor',
            'humorous',
            'Humorous',
            'limiter',
            'appalling',
            'appalled',
            'Splendorosa',
            'Demeter',
            'cemetery',
            'Cemetery',
            'Gerald Savory',
            'Savory',
            'Johnson Space Center',
            'Kennedy Space Center',
            'Center',
            'Catalog',
            'Chilitern',
            'chemotherapy',
            'Chemotherapy',
            'Colorado',
            'previsualization',
            'Scarborough',
            'Akoshemon',
            'Plowman',
            'torpedo',
            'torpedos',
            'Torpedo',
            'Torpedos',
            'stingray',
            'stingrays',
            'Stingray',
            'Stingrays',
            'Beccy Armory',
            'Honore', #None of these attempts to 
            'Honoré', #except Honoré Lechasseur works
            'Honoré', #The bot has been made to not correct
            'Honore', #for capital-H Honor
            'lightsaber',
            'Polygram',
            'Majestic Theater',
            'Taplow',
            'Fyodor',
            'Target Practice',
            'target practice',
            'Synthesizing Starfields', #doesn't appear to work
            'Pearl Harbor',
            'Mercury Theater',
            'Event Synthesizer',
            'bgcolor',
            'blasphemous', #dunno why this is being triggered as blasphaemous
            'grams operator', #not sure this is a real word, but it appears on DMP
            'Parallelogram', #Work by SOTO from here on out...#
            'parallelogram',
            'Color Assists',
            'Colorist: ',
            'colorsport.co.uk',
            'The Armored Creature of 004X',
            '-an-unauthorized-guide-to',
            '-the-unauthorized-guide-to',
            'Department of Defense',
            'instagram',
            'Instagram',
            'Cozens',
            'thecozens',
            'Dougray',
            'Plowman',
            'plowmanal',
            ' smiter',
            ' Smiter',
            ],
        }
    }