Tardis:What SpellBot actually corrects

From Tardis Wiki, the free Doctor Who reference
Revision as of 17:42, 6 June 2011 by CzechOut (talk | contribs)

Because even the most conscientious of editors will occasionally make spelling errors, a bot is needed to enforce the spelling policy. A comprehensive list of the differences between British and American spellings has been compiled, and is being coded for bot use as of the second week of June 2011. This page will see heavy updating throughout that week as the list is fully coded.

Following is the raw code of that bot routine, so that all users may see exactly what the bot is checking for.

How to read the code

A few words on regular expressions for the uninitiated:

  • The expression ([Cc]) means "look for either the capitalised or the lowercase version of the letter C".
  • (.?) means, "You, Mr. Fancy Computer bot thing, might find one more character to the right of this point. If it's there, grab it." Any letters beyond that one aren't part of the match, so the bot leaves them exactly as they were.
  • \1 means, "take whatever is in the first parentheses and put it here"
  • \2 means, "take whatever is in the second parentheses and put it here"
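To make those pieces concrete, here is a minimal sketch using Python's standard re module. It assumes the bot's patterns follow ordinary Python regular-expression rules; the helper name captured is made up for this illustration and is not part of the bot.

```python
import re

# A C of either case, the stem "apitaliz", then at most one more character.
pattern = re.compile(r'([Cc])apitaliz(.?)')

def captured(word):
    """Hypothetical helper: return what the first and second parentheses grabbed."""
    m = pattern.search(word)
    return m.group(1), m.group(2)

for word in ["Capitalize", "capitalizing", "capitaliz"]:
    first, second = captured(word)
    # `first` is what \1 stands for; `second` is what \2 stands for.
    print(word, "->", repr(first), repr(second))
```

Note that the second parentheses never hold more than one character, and hold nothing at all when the word ends right after the "z".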

Thus, if we have the expression,

(r'([Cc])apitaliz(.?)', r'\1apitalis\2')

It means, roughly,

Look for a capital or lowercase C followed by the letters "apitaliz", plus the one character (if any) that comes right after the "z". Then keep the form of the letter C that you found, stick on "apitalis", and add back the character you originally found after the "z". The rest of the word is never touched, so whatever suffix was there stays there.

In other words, find Capitaliz-, keep the C a capital, swap the "z" for an "s", and leave "-e", "-ing", "-ed", or "-ation" in place, as appropriate.
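The same rule can be tried out with Python's re.sub. This is only a sketch of how one (pattern, replacement) pair behaves, not the bot's own machinery: apply_rule is a hypothetical helper, and the replacement is written as \1apitalis\2, the form the rule takes in the code list further down this page.

```python
import re

# One (pattern, replacement) pair, in the same shape as the entries in the list.
rule = (r'([Cc])apitaliz(.?)', r'\1apitalis\2')

def apply_rule(text, rule=rule):
    """Hypothetical helper: apply a single pair to a piece of text."""
    pattern, replacement = rule
    return re.sub(pattern, replacement, text)

sample = "Capitalize the C; capitalization matters when capitalizing."
print(apply_rule(sample))
```

The case of each C is preserved and every suffix survives, because only the "z" ends up being swapped.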

Many differences in British/American spelling come down to just this sort of one-letter-before-the-suffix switch. Some are more complicated, and have to be dealt with on a more individual, and less automated, basis.
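Both kinds of case can be sketched in Python. The two helpers below, ize_rule and whole_word_rule, are hypothetical and are not part of the bot. The sketch also assumes that patterns are applied with plain Python re semantics, with no word boundaries added automatically, which this page does not confirm. Under that assumption, a short pattern such as "ax" would also match inside "tax", so a whole-word pattern using \b is one way to handle it individually.

```python
import re

def ize_rule(stem):
    """Hypothetical helper: build the z-to-s pair for a stem such as 'capitaliz'."""
    first, rest = stem[0], stem[1:]
    pattern = '([%s%s])%s(.?)' % (first.upper(), first.lower(), rest)
    replacement = r'\1' + rest[:-1] + r's\2'   # drop the final z, put s in its place
    return pattern, replacement

def whole_word_rule(american, british):
    """Hypothetical helper: match `american` only when it is a complete word."""
    return r'\b' + re.escape(american) + r'\b', british

def apply_rule(text, rule):
    pattern, replacement = rule
    return re.sub(pattern, replacement, text)

print(ize_rule('capitaliz'))

text = "He paid tax on the ax."
print(apply_rule(text, ('ax', 'axe')))                  # unanchored: also hits "tax"
print(apply_rule(text, whole_word_rule('ax', 'axe')))   # whole word only
```

The patterns are written as raw strings (r'...') because in an ordinary Python string \b means a backspace character and \1 means a control character, neither of which is what a regular expression needs.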

The code

The following code will change over time, as more words are added. Alphabetically, the last word on the compiled list of British/American differences is yogurts. Once you see that word on this list, you'll know the bot is fully programmed.

fixes['spelling'] = {
   'regex': True,
   'recursive': True,
   'msg': {
       'en':u'Enforcing spelling policy.'
       },
   'replacements': [
       (u'accessorize', u'accessorise'),
       (u'accessorized', u'accessorised'),
       (u'accessorizes', u'accessorises'),
       (u'accessorizing', u'accessorising'),
       (u'acclimatization',u'acclimatisation'),
       (u'acclimatize',u'acclimatise'),
       (u'acclimatized',u'acclimatised'),
       (u'acclimatizes',u'acclimatises'),
       (u'acclimatizing',u'acclimatising'),
       (u'accouterments',u'accoutrements'),
       # \b limits these to whole words, so "eon" is not matched inside "aeon" or "pigeon"
       (r'\beon\b',u'aeon'),
       (r'\beons\b',u'aeons'),
       (r'\baerogram\b',u'aerogramme'),
       (r'\baerograms\b',u'aerogrammes'),
       (r'\besthete\b',u'aesthete'),
       (r'\besthetes\b',u'aesthetes'),
       (r'\besthetic\b',u'aesthetic'),
       (r'\besthetically\b', u'aesthetically'),
       (r'\besthetics\b', u'aesthetics'),
       (r'\betiology\b',u'aetiology'),
       (r'\baging\b',u'ageing'),
       (u'aggrandizement',u'aggrandisement'),
       (u'agonize', u'agonise'),
       (u'agonized',u'agonised'),
       (u'agonizes',u'agonises'),
       (u'agonizing',u'agonising'),
       (u'agonizingly',u'agonisingly'),
       (r'\balmanac\b',u'almanack'),
       (r'\balmanacs\b',u'almanacks'),
       (u'aluminum', u'aluminium'),
       (u'amortizable',u'amortisable'),
       (u'amortization',u'amortisation'),
       (u'amortizations',u'amortisations'),
       (u'amortize',u'amortise'),
       (u'amortized',u'amortised'),
       (u'amortizes',u'amortises'),
       (u'amortizing',u'amortising'),
       (u'amphitheater',u'amphitheatre'),
       (u'amphitheaters',u'amphitheatres'),
       (u'anemia',u'anaemia'),
       (u'anemic',u'anaemic'),
       (u'anesthesia',u'anaesthesia'),
       (u'anesthetic',u'anaesthetic'),
       (u'anesthetics',u'anaesthetics'),
       (u'anesthetize',u'anaesthetise'),
       (u'anesthetized',u'anaesthetised'),
       (u'anesthetizes',u'anaesthetises'),
       (u'anesthetizing',u'anaesthetising'),
       (u'anesthetist',u'anaesthetist'),
       (u'anesthetists',u'anaesthetists'),
       # \b stops "analog" from matching inside "analogue" or "analogy"
       (r'\banalog\b',u'analogue'),
       (r'\banalogs\b',u'analogues'),
       (u'analyze',u'analyse'),
       (u'analyzed',u'analysed'),
       (u'analyzes',u'analyses'),
       (u'analyzing',u'analysing'),
       (u'anglicize',u'anglicise'),
       (u'anglicized',u'anglicised'),
       (u'anglicizes',u'anglicises'),
       (u'anglicizing',u'anglicising'),
       (u'annualized',u'annualised'),
       (u'antagonize',u'antagonise'),
       (u'antagonized',u'antagonised'),
       (u'antagonizes',u'antagonises'),
       (u'antagonizing',u'antagonising'),
       (u'apologize',u'apologise'),
       (u'apologized',u'apologised'),
       (u'apologizes',u'apologises'),
       (u'apologizing',u'apologising'),
       # \b leaves "appalled" and "appalling" alone
       (r'\bappall\b',u'appal'),
       (r'\bappalls\b',u'appals'),
       (u'appetizer',u'appetiser'),
       (u'appetizers',u'appetisers'),
       (u'appetizing',u'appetising'),
       (u'appetizingly',u'appetisingly'),
       # \b leaves words such as "arboreal" alone
       (r'\barbor\b',u'arbour'),
       (r'\barbors\b',u'arbours'),
       (u'archeological',u'archaeological'),
       (u'archeologically',u'archaeologically'),
       (u'archeologist',u'archaeologist'),
       (u'archeologists',u'archaeologists'),
       (u'archeology',u'archaeology'),
       (u'ardor',u'ardour'),
       (u'armor',u'armour'),
       (u'armored',u'armoured'),
       (u'armorer',u'armourer'),
       (u'armorers',u'armourers'),
       (u'armories',u'armouries'),
       (u'armory',u'armoury'),
       (u'artifact',u'artefact'),
       (u'artifacts',u'artefacts'),
       (u'authorize',u'authorise'),
       (u'authorized',u'authorised'),
       (u'authorizes',u'authorises'),
       (u'authorizing',u'authorising'),
       # \b stops "ax" from matching inside "axe", "tax" or "Max"
       (r'\bax\b',u'axe'),
       (u'backpedaled', u'backpedalled'),
       (u'backpedaling', u'backpedalling'),
       (u'banister', u'bannister'),
       (u'banisters',u'bannisters'),
       (u'baptize',u'baptise'),
       (u'baptized',u'baptised'),
       (u'baptizes',u'baptises'),
       (u'baptizing',u'baptising'),
       (u'bastardize',u'bastardise'),
       (u'bastardized',u'bastardised'),
       (u'bastardizes',u'bastardises'),
       (u'bastardizing',u'bastardising'),
       (r'\bbattleax\b',u'battleaxe'),
       (u'balk',u'baulk'),
       (u'balked',u'baulked'),
       (u'balking',u'baulking'),
       (u'balks',u'baulks'),
       (u'bedeviled',u'bedevilled'),
       (u'bedeviling',u'bedevilling'),
       (u'behavior',u'behaviour'),
       (u'behavioral',u'behavioural'),
       (u'behaviorism',u'behaviourism'),
       (u'behaviorist',u'behaviourist'),
       (u'behaviorists',u'behaviourists'),
       (u'behaviors',u'behaviours'),
       (u'behoove',u'behove'),
       (u'behooved',u'behoved'),
       (u'behooves',u'behoves'),
       (u'bejeweled',u'bejewelled'),
       (u'belabor',u'belabour'),
       (u'belabored',u'belaboured'),
       (u'belaboring',u'belabouring'),
       (u'belabors',u'belabours'),
       (u'beveled',u'bevelled'),
       (u'bevies',u'bevvies'),
       (u'bevy',u'bevvy'),
       (u'biased',u'biassed'),
       (u'biasing',u'biassing'),
       (u'binging',u'bingeing'),
       (u'bougainvillea',u'bougainvillaea'),
       (u'bougainvilleas',u'bougainvillaeas'),
       (u'bowdlerize',u'bowdlerise'),
       (u'bowdlerized',u'bowdlerised'),
       (u'bowdlerizes',u'bowdlerises'),
       (u'bowdlerizing',u'bowdlerising'),
       (u'breathalyze',u'breathalyse'),
       (u'breathalyzed',u'breathalysed'),
       (u'breathalyzer',u'breathalyser'),
       (u'breathalyzers',u'breathalysers'),
       (u'breathalyzes',u'breathalyses'),
       (u'breathalyzing',u'breathalysing'),
       (u'brutalize',u'brutalise'),
       (u'brutalized',u'brutalised'),
       (u'brutalizes',u'brutalises'),
       (u'brutalizing',u'brutalising'),
       (u'busses',u'buses'),
       (u'bussing',u'busing'),
       (u'cesarean',u'caesarean'),
       (u'cesareans',u'caesareans'),
       (u'caliber',u'calibre'),
       (u'calibers',u'calibres'),
       # Raw strings are needed here: in a plain string \1 is a control character, not a backreference
       (r'([Cc])aliper(.?)',r'\1alliper\2'),
       (r'([Cc])alisthenics',r'\1allisthenics'),
       (u'canalize',u'canalise'),
       (u'canalized',u'canalised'),
       (u'canalizes',u'canalises'),
       (u'canalizing',u'canalising'),
       (r'([Cc])ancelation',r'\1ancellation'),
       (r'([Cc])ancelations',r'\1ancellations'),
       (r'([Cc])anceled',r'\1ancelled'),
       (r'([Cc])anceling',r'\1ancelling'),
       (r'([Cc])andor',r'\1andour'),
       (r'([Cc])annibaliz(.?)',r'\1annibalis\2'),
       (r'([Cc])anibaliz(.?)',r'\1annibalis\2'),
       (r'([Cc])anibalis(.?)',r'\1annibalis\2'),
       (r'([Cc])anoniz(.?)',r'\1anonis\2'),
       (r'([Cc])apitaliz(.?)',r'\1apitalis\2'),
       (r'([Cc])arameliz(.?)',r'\1aramelis\2'),
       (r'([Cc])arboniz(.?)',r'\1arbonis\2'),
       (r'([Cc])aroled',r'\1arolled'),
       (r'([Cc])aroling',r'\1arolling'),
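       # Replacements must be raw strings so that \1 and \2 are backreferences;
       # \b stops "catalog" from matching inside "catalogue"
       (r'([Cc])atalog\b',r'\1atalogue'),
       (r'([Cc])atalogs',r'\1atalogues'),
       (r'([Cc])ataloged',r'\1atalogued'),
       (r'([Cc])ataloging',r'\1ataloguing'),
       (r'([Cc])atalyz(.?)',r'\1atalys\2'),
       (r'([Cc])ategoriz(.?)',r'\1ategoris\2'),
       (r'([Cc])auteriz(.?)',r'\1auteris\2'),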
       # Only the single-l inflections are doubled; "cavil" and "cavilled" are left alone
       (r'([Cc])avil(ed|er|ing)',r'\1avill\2'),