m (CzechOut moved page Tardis:Spelling/user-fixes.py to Tardis:What SpellBot actually corrects) |
Bongolium500 (talk | contribs) |
||
(25 intermediate revisions by 4 users not shown) | |||
Line 1: | Line 1: | ||
Because even the most conscientious of editors will occasionally make spelling errors, there is a need to have '''bot enforcement''' of the spelling policy. A comprehensive list of the differences between British and American spellings has been compiled | {{lock}}{{mosnav|p=Use British English|Spelling|Spelling cheat card|Spell checking|Spell checking with a Mac|Spell checking with Opera|Spell checking with Chrome|Spell checking with Firefox|SpellBot|c=British English|}} | ||
{{summ|This is the master list of everything for which [[T:SBOT|SpellBot]] corrects, along with detailed notes about the rationale for some of the coding decisions and explanations of some of SpellBot's key limitations. [[#The list|The list]] itself is written in {{w|regex|regex}}, so if you're unfamiliar with that language, it may take you a moment to get used to the symbology.}} | |||
{{sc|T:SBOT LIST}} | |||
Because even the most conscientious of editors will occasionally make spelling errors, there is a need to have '''bot enforcement''' of the spelling policy. A comprehensive list of the differences between British and American spellings has been compiled into a bot routine known as a "user-fix", so that all users may see what exactly the bot is checking for. | |||
== Problem words == | == Problem words == | ||
=== Words impossible for a bot === | === Words impossible for a bot === | ||
Line 40: | Line 41: | ||
=== Valid British spellings actively corrected === | === Valid British spellings actively corrected === | ||
Thanks to the ubiquity of American spellings in pop culture, there are a few cases where valid if archaic British spellings can be corrected to standard American spellings without the need for a forum decision. In such cases, modern British usage hews closely to the American, and clearly argues ''against'' more archaic forms. | Thanks to the ubiquity of American spellings in pop culture, there are a few cases where valid if archaic British spellings can be corrected to standard American spellings without the need for a forum decision. In such cases, modern British usage hews closely to the American, and clearly argues ''against'' more archaic forms. | ||
* '''Primaeval'''. Due to the presence of the modern ITV/BBCA television series with which many ''[[Doctor Who]]'' fans will be familiar, as well as [[ | * '''Primaeval'''. Due to the presence of the modern ITV/BBCA television series with which many ''[[Doctor Who]]'' fans will be familiar, as well as [[AUDIO]]: ''[[Primeval]]'', the unambiguously American spelling "wins" the contest. ''Primaeval'' will be actively corrected to ''primeval''. This shouldn't ruffle too many feathers, since modern British spell-checkers fail ''primaeval''. | ||
* '''Tranquillize/tranquillise/tranquilise/tranquilize'''. There are four different ways to spell this one damned word (and all words deriving from it). Ridiculous. The ''-ll'' versions are both okay in BrEng; the ''-l'' versions are both okay in AmEng. However, only ''one'' spelling passes ''modern'' British spell-checkers. Therefore ''tranquillise'' shall be deemed correct, and the bot will correct the other three spellings. | * '''Tranquillize/tranquillise/tranquilise/tranquilize'''. There are four different ways to spell this one damned word (and all words deriving from it). Ridiculous. The ''-ll'' versions are both okay in BrEng; the ''-l'' versions are both okay in AmEng. However, only ''one'' spelling passes ''modern'' British spell-checkers. Therefore ''tranquillise'' shall be deemed correct, and the bot will correct the other three spellings. | ||
* '''Tranquility Base'''. The bot will actively correct ''Tranquillity Base'' to the IAU-standard ''Tranquility Base''. | * '''Tranquility Base'''. The bot will actively correct ''Tranquillity Base'' to the IAU-standard ''Tranquility Base''. | ||
Line 62: | Line 63: | ||
* \1 means, "take whatever is in the first parentheses and put it here" | * \1 means, "take whatever is in the first parentheses and put it here" | ||
* \2 means, "take whatever is in the second parentheses and put it here" | * \2 means, "take whatever is in the second parentheses and put it here" | ||
* \3 means, "take whatever is in the | * \3 means, "take whatever is in the third parentheses and put it here" | ||
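The numbered placeholders described above are regex backreferences, written `\1`, `\2` and `\3` in the Python code of the list. As a minimal sketch, the snippet below runs two rules copied from the list through Python's standard `re.sub`. The helper name `apply_rule` is an assumption made for illustration and is not part of SpellBot.

```python
import re

def apply_rule(rule, text):
    """Apply one (pattern, replacement) pair using the standard re.sub."""
    pattern, replacement = rule
    return re.sub(pattern, replacement, text)

# Both rules are copied from the list; only apply_rule is hypothetical.
anaesthesia_rule = (r'([Aa])nesthesia', r'\1naesthesia')
behaviour_rule = (r'(.?)([Bb])ehavior(.?)', r'\1\2ehaviour\3')

# \1 re-inserts the captured first letter, so capitalisation survives.
print(apply_rule(anaesthesia_rule, "Anesthesia and anesthesia"))
# Anaesthesia and anaesthesia

# \1 and \3 re-insert whatever surrounded the root, \2 the first letter.
print(apply_rule(behaviour_rule, "Misbehavior and behaviors"))
# Misbehaviour and behaviours
```

Capturing the first letter in its own group is what lets a single rule handle both the capitalised and the lower-case form of a word.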
Thus, if we have the expression, | Thus, if we have the expression, | ||
Line 94: | Line 95: | ||
Not every word on our list has been switched using regex expressions. Sometimes it's easier just to type up a switch of literal characters, as when a word serves as the root of no other words. | Not every word on our list has been switched using regex expressions. Sometimes it's easier just to type up a switch of literal characters, as when a word serves as the root of no other words. | ||
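To illustrate the difference, here is a small sketch contrasting a rule that ends in an optional capture group, which lets it cover every word built on the same root, with a rule that simply swaps one fixed string of letters for another. Both rules are copied from the list. The helper `fix` is a hypothetical name, and the saltpetre rule is merely the closest thing to a literal switch in this excerpt.

```python
import re

def fix(rule, word):
    """Apply one (pattern, replacement) pair to a single word."""
    pattern, replacement = rule
    return re.sub(pattern, replacement, word)

# A root rule: the trailing (.?) carries over whatever letter follows the root.
root_rule = (r'([Bb])aptiz(.?)', r'\1aptis\2')

# A near-literal rule: the word has no derived forms, so no trailing group.
literal_rule = (r'([Ss])altpeter', r'\1altpetre')

derived = [fix(root_rule, w) for w in ("baptize", "baptized", "Baptizing")]
print(derived)                         # ['baptise', 'baptised', 'Baptising']
print(fix(literal_rule, "saltpeter"))  # saltpetre
```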
== The | == The list == | ||
The following code is what's at the heart of our automated BrEng spelling enforcement. It was tested throughout 2011 and eventually completed its first run through the main [[namespace]] on 31 October 2011, with a secondary confirming run on 1 November 2011. | The following code is what's at the heart of our automated BrEng spelling enforcement. It was tested throughout 2011 and eventually completed its first run through the main [[help:namespaces|namespace]] on 31 October 2011, with a secondary confirming run on 1 November 2011. | ||
The exceptions bit at the very end is particularly important to its function on | The exceptions bit at the very end is particularly important to its function on Tardis. The list of exceptions is not currently organised in any way, but the bot really wouldn't work properly without exceptions. | ||
The most common American spelling that SpellBot | The most common American spelling that SpellBot 3.0 will not correct is "Honor" with a capital ''H''. It does correct lower-case ''honor'', but it's not yet been determined how to write an exception for a name with a diacritic in it. Thus [[Honoré Lechasseur]] can't currently be excepted unless the bot simply ignores "Honor". This is doubly useful since, for reasons equally unclear, the exception for "[[Honor Blackman]]" is not being, well, ''honoured''. Since ''honor'' does not typically begin a sentence, this compromise is believed acceptable for the time being. | ||
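The role of the exceptions can be sketched roughly as follows. This is a simplified stand-in written for illustration and is not Pywikibot's actual implementation: the names `FIXES`, `EXCEPTIONS`, `protected_spans` and `apply_fixes` are assumptions, while the two rules and the two exception strings are copied from the list. The idea is that any match overlapping an excepted phrase is left untouched.

```python
import re

# Rules and exception strings copied from the list; the rest is hypothetical.
FIXES = [
    (r'([Cc])olor(.?)', r'\1olour\2'),
    (r'([Rr])umor(.?)', r'\1umour\2'),
]
EXCEPTIONS = ['Colorado', 'Pearl Harbor']

def protected_spans(text, exceptions):
    """Return (start, end) positions of every excepted phrase in text."""
    spans = []
    for phrase in exceptions:
        for m in re.finditer(re.escape(phrase), text):
            spans.append(m.span())
    return spans

def apply_fixes(text, fixes=FIXES, exceptions=EXCEPTIONS):
    """Apply each rule in turn, skipping matches that touch an exception."""
    for pattern, replacement in fixes:
        # Recompute after every rule, since earlier rules can shift positions.
        spans = protected_spans(text, exceptions)

        def swap(match):
            start, end = match.span()
            if any(start < s_end and s_start < end for s_start, s_end in spans):
                return match.group(0)         # inside an exception: keep as-is
            return match.expand(replacement)  # otherwise apply the rule

        text = re.sub(pattern, swap, text)
    return text

print(apply_fixes("The color of Colorado"))  # The colour of Colorado
```

Without the exception, the colour rule would turn ''Colorado'' into ''Colourado'', which is why the bot cannot work properly on real articles without such a list.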
This is the state of the code as it existed | This is the state of the code as it existed during the SpellBot run by [[User:SV7|SV7]] in the week of 12 March 2023. | ||
< | <syntaxhighlight lang="python" line> | ||
# | #SBOT version 3.0 | ||
#Enforces BrEng spelling, with exceptions | #Enforces BrEng spelling, with exceptions | ||
#relevant to the Doctor Who universe | #relevant to the Doctor Who universe | ||
#and usage found on tardis. | #and usage found on tardis.wiki | ||
#released under CC-BY-SA 3.0 license | #released under CC-BY-SA 3.0 license | ||
#by User:CzechOut | #by User:CzechOut | ||
#1 November 2011 | #Originally published: 1 November 2011 (CzechOut) | ||
#Current version: 17 March 2023 (SOTO) | |||
fixes['spelling'] = { | fixes['spelling'] = { | ||
Line 141: | Line 143: | ||
(r'([Aa])nesthesia',r'\1naesthesia'), | (r'([Aa])nesthesia',r'\1naesthesia'), | ||
(r'([Aa])nestheti(.?)',r'\1naestheti\2'), | (r'([Aa])nestheti(.?)',r'\1naestheti\2'), | ||
(r'([Aa])na?esthetiz(.?)',r'\1naesthetis\2'), #SOTO bug fix | |||
(r'([Aa])nalog( +)',r'\1nalogue\2'), | (r'([Aa])nalog( +)',r'\1nalogue\2'), | ||
(r'([Aa])nalogs',r'\1nalogues'), | (r'([Aa])nalogs',r'\1nalogues'), | ||
Line 164: | Line 167: | ||
(r'([Bb])aptiz(.?)',r'\1aptis\2'), | (r'([Bb])aptiz(.?)',r'\1aptis\2'), | ||
(r'([Bb])astardiz(.?)',r'\1astardis\2'), | (r'([Bb])astardiz(.?)',r'\1astardis\2'), | ||
(r'( | (r'([Bb])attleax( +)',r'\1attleaxe\2'), | ||
(r'( | (r'([Bb])alk(.?)',r'\1aulk\2'), | ||
(r'( | (r'([Bb])edeviled',r'\1edevilled'), | ||
(r'( | (r'([Bb])edeviling',r'\1edevilling'), | ||
(r'(.?)( | (r'(.?)([Bb])ehavior(.?)',r'\1\2ehaviour\3'), | ||
(r'( | (r'([Bb])ehoove(.?)',r'\1ehove\2'), | ||
(r'( | (r'([Bb])ejeweled',r'\1ejewelled'), | ||
(r'(.?)([Ll])abor( +)',r'\1\2abour\3'), | (r'(.?)([Ll])abor( +)',r'\1\2abour\3'), | ||
(r'(.?)([Ll])abored',r'\1\2aboured'), | (r'(.?)([Ll])abored',r'\1\2aboured'), | ||
Line 188: | Line 191: | ||
(r'([Cc])esarean(.?)',r'\1aesarean\2'), | (r'([Cc])esarean(.?)',r'\1aesarean\2'), | ||
(r'([Cc])aliber(.?)',r'\1alibre\2'), | (r'([Cc])aliber(.?)',r'\1alibre\2'), | ||
(r'([Cc])aliper(.?)',r'\ | (r'([Cc])aliper(.?)',r'\1alliper\2'), | ||
(r'([Cc])alisthenics',r'\1allisthenics'), | (r'([Cc])alisthenics',r'\1allisthenics'), | ||
(r'([Cc])analiz(.?)',r'\1analis\2'), | (r'([Cc])analiz(.?)',r'\1analis\2'), | ||
Line 238: | Line 241: | ||
(r'([Cc])ollectiviz(.?)',r'\1ollectivis\2'), | (r'([Cc])ollectiviz(.?)',r'\1ollectivis\2'), | ||
(r'([Cc])oloniz(.?)',r'\1olonis\2'), | (r'([Cc])oloniz(.?)',r'\1olonis\2'), | ||
(r' | (r'([Cc])olor(.?)',r'\1olour\2'), | ||
(r'(.?)([Cc])olored',r'\1\2oloured'), | (r'(.?)([Cc])olored',r'\1\2oloured'), | ||
(r'(.?)([Cc])oloring',r'\1\2olouring'), | (r'(.?)([Cc])oloring',r'\1\2olouring'), | ||
Line 246: | Line 249: | ||
(r'([Cc])omputeriz(.?)',r'\1omputeris\2'), | (r'([Cc])omputeriz(.?)',r'\1omputeris\2'), | ||
(r'([Cc])onceptualiz(.?)',r'\1onceptualis\2'), | (r'([Cc])onceptualiz(.?)',r'\1onceptualis\2'), | ||
(r'([Cc]) | (r'([Cc])ontextualiz(.?)',r'\1ontextualis\2'), | ||
(r'([Cc])oz(.?)',r'\1os\2'), | (r'([Cc])oz(.?)',r'\1os\2'), | ||
(r'([Cc])ouncilor(.?)',r'\1ouncillor\2'), | (r'([Cc])ouncilor(.?)',r'\1ouncillor\2'), | ||
Line 310: | Line 313: | ||
(r'([Ee])nameling',r'\1namelling'), | (r'([Ee])nameling',r'\1namelling'), | ||
(r'([Ee])namor(.?)',r'\1namour\2'), | (r'([Ee])namor(.?)',r'\1namour\2'), | ||
(r'([Ee])ncyclopedi(.?)',r'\1ncyclopaedi\2'), | # (r'([Ee])ncyclopedi(.?)',r'\1ncyclopaedi\2'), | ||
(r'([Ee])ndeavor(.?)',r'\1ndeavour\2'), | (r'([Ee])ndeavor(.?)',r'\1ndeavour\2'), | ||
(r'(.?)([Ee])nergiz(.?)',r'\1\2nergis\3'), | (r'(.?)([Ee])nergiz(.?)',r'\1\2nergis\3'), | ||
(r'([Ee])nroll(.?)',r'\1nrol\2'), | (r'([Ee])nroll(.?)',r'\1nrol\2'), | ||
(r'([Ee])nrol(ed|ing)',r'\1nroll\2'), #tense exceptions -SOTO | |||
(r'([Ee])nthrall( +)',r'\1nthral\2'), #only enthrall is one l | (r'([Ee])nthrall( +)',r'\1nthral\2'), #only enthrall is one l | ||
(r'([Ee])paulet( +)',r'\1paulette\2'), | (r'([Ee])paulet( +)',r'\1paulette\2'), | ||
Line 430: | Line 434: | ||
(r'(.?)([Mm])agnetiz(.?)',r'\1\2agnetis\3'), | (r'(.?)([Mm])agnetiz(.?)',r'\1\2agnetis\3'), | ||
(r'(.?)([Mm])aneuver(.?)',r'\1\2anoeuvre\3'), | (r'(.?)([Mm])aneuver(.?)',r'\1\2anoeuvre\3'), | ||
(r'(.?)([Mm])anoeuvreed(.?)',r'\1\2anoeuvred\3'), #catching exception -SOTO# | |||
(r'([Mm])arginaliz(.?)',r'\1arginalis\2'), | (r'([Mm])arginaliz(.?)',r'\1arginalis\2'), | ||
(r'([Mm])arshaled',r'\1arshalled'), | (r'([Mm])arshaled',r'\1arshalled'), | ||
Line 546: | Line 551: | ||
(r'([Rr])evele(.?)',r'\1evelle\2'), | (r'([Rr])evele(.?)',r'\1evelle\2'), | ||
(r'([Rr])eveling',r'\1evelling'), | (r'([Rr])eveling',r'\1evelling'), | ||
(r'(.?)([Vv])italiz(.?)',r'\1\ | (r'(.?)([Vv])italiz(.?)',r'\1\2italis\3'), | ||
(r'([Rr])evolutioniz(.?)',r'\1evolutionis\2'), | (r'([Rr])evolutioniz(.?)',r'\1evolutionis\2'), | ||
(r'([Rr])hapsodiz(.?)',r'\1hapsodis\2'), | (r'([Rr])hapsodiz(.?)',r'\1hapsodis\2'), | ||
Line 556: | Line 561: | ||
(r'([Rr])umor(.?)',r'\1umour\2'), | (r'([Rr])umor(.?)',r'\1umour\2'), | ||
#SSSS# | #SSSS# | ||
(r'([Ss])aber(.?)',r'\ | (r'([Ss])aber(.?)',r'\1abre\2'), | ||
(r'([Ss])altpeter',r'\1altpetre'), | (r'([Ss])altpeter',r'\1altpetre'), | ||
(r'(.?)([Ss])anitiz(.?)',r'\1\2anitis\3'), | (r'(.?)([Ss])anitiz(.?)',r'\1\2anitis\3'), | ||
Line 564: | Line 569: | ||
(r'([Ss])candaliz(.?)',r'\1candalis\2'), | (r'([Ss])candaliz(.?)',r'\1candalis\2'), | ||
(r'([Ss])keptic(.?)',r'\1ceptic\2'), | (r'([Ss])keptic(.?)',r'\1ceptic\2'), | ||
(r'([Ss])cepter(.?)',r'\ | (r'([Ss])cepter(.?)',r'\1ceptre\2'), | ||
(r'([Ss])crutiniz(.?)',r'\1crutinis\2'), | (r'([Ss])crutiniz(.?)',r'\1crutinis\2'), | ||
(r'([Ss])eculariz(.?)',r'\1ecularis\2'), | (r'([Ss])eculariz(.?)',r'\1ecularis\2'), | ||
Line 610: | Line 615: | ||
(r'([Ss])wiveling',r'\1wivelling'), | (r'([Ss])wiveling',r'\1wivelling'), | ||
(r'([Ss])ymboliz(.?)',r'\1ymbolis\2'), | (r'([Ss])ymboliz(.?)',r'\1ymbolis\2'), | ||
(r'([Ss])ympathiz(.?)',r'\ | (r'([Ss])ympathiz(.?)',r'\1ympathis\2'), | ||
(r'(.?)([Ss])ynchroniz(.?)',r'\1\2ynchronis\3'), | (r'(.?)([Ss])ynchroniz(.?)',r'\1\2ynchronis\3'), | ||
(r'(.?)([Ss])ynthesiz(.?)',r'\1\2ynthesis\3'), | (r'(.?)([Ss])ynthesiz(.?)',r'\1\2ynthesis\3'), | ||
Line 644: | Line 649: | ||
#VVVV# | #VVVV# | ||
(r'([Vv])alor',r'\1alour'), | (r'([Vv])alor',r'\1alour'), | ||
(r'([Vv])andaliz(.?)',r'\1andalis'), | (r'([Vv])andaliz(.?)',r'\1andalis\2'), | ||
(r'(.?)([Vv])aporiz(.?)',r'\1\2aporis\3'), | (r'(.?)([Vv])aporiz(.?)',r'\1\2aporis\3'), | ||
(r'([Vv])apor( +)',r'\1apour\2'), | (r'([Vv])apor( +)',r'\1apour\2'), | ||
Line 661: | Line 666: | ||
(r'([Ww])esterniz(.?)',r'\1esternis\2'), | (r'([Ww])esterniz(.?)',r'\1esternis\2'), | ||
(r'([Ww])omaniz(.?)',r'\1omanis\2'), | (r'([Ww])omaniz(.?)',r'\1omanis\2'), | ||
(r'([Ww])oolen(.?)',r'\ | (r'([Ww])oolen(.?)',r'\1oollen\2'), | ||
(r'([Ww])oolies',r'\1oollies'), | (r'([Ww])oolies',r'\1oollies'), | ||
(r'([Ww])ooly',r'\1oolly'), | (r'([Ww])ooly',r'\1oolly'), | ||
Line 681: | Line 686: | ||
'comment', | 'comment', | ||
'center', | 'center', | ||
'color', | |||
'captiontextcolor', | |||
'gallery', | |||
'syntaxhighlight' | |||
], | ], | ||
'category': [ | 'category': [ | ||
Line 750: | Line 759: | ||
'Grigory', | 'Grigory', | ||
'Unauthorized Guide', | 'Unauthorized Guide', | ||
'Honor Blackman', #this isn't being excepted and I don't know why | 'Honor Blackman', #this isn't being excepted and I don't know why# | ||
'Medal of Honor', | 'Medal of Honor', | ||
'Arborge Quince', | 'Arborge Quince', | ||
'program', #need a forum discussion here | 'program', #need a forum discussion here# | ||
'programs', | 'programs', | ||
'reprogram', | 'reprogram', | ||
Line 773: | Line 782: | ||
'align= center', | 'align= center', | ||
'align =center', | 'align =center', | ||
' | 'position=center', | ||
'</ center>', | '</ center>', | ||
'{{color', #SOTO | |||
'{{Color', #SOTO | |||
'Encyclopedia of Fantastic', | 'Encyclopedia of Fantastic', | ||
'themonster', | 'themonster', | ||
Line 786: | Line 796: | ||
'appalling', | 'appalling', | ||
'appalled', | 'appalled', | ||
'Splendorosa', | |||
'Demeter', | 'Demeter', | ||
'cemetery', | 'cemetery', | ||
Line 794: | Line 805: | ||
'Kennedy Space Center', | 'Kennedy Space Center', | ||
'Center', | 'Center', | ||
' | 'Catalog', | ||
'Chilitern', | 'Chilitern', | ||
'chemotherapy', | 'chemotherapy', | ||
'Chemothreapy', | 'Chemothreapy', | ||
'Colorado', | |||
'previsualization', | |||
'Scarborough', | 'Scarborough', | ||
'Akoshemon', | 'Akoshemon', | ||
Line 819: | Line 832: | ||
'Taplow', | 'Taplow', | ||
'Fyodor', | 'Fyodor', | ||
'Target Practice', | |||
'target practice', | |||
'Synthesizing Starfields', #doesn't appear to work | 'Synthesizing Starfields', #doesn't appear to work | ||
'Pearl Harbor', | 'Pearl Harbor', | ||
'Mercury Theater', | 'Mercury Theater', | ||
'Event Synthesizer', | 'Event Synthesizer', | ||
'bgcolor', | |||
'blasphemous', #dunno why this is being triggered as blasphaemous | 'blasphemous', #dunno why this is being triggered as blasphaemous | ||
'grams operator', #not sure this is a real word, but it appears on DMP | 'grams operator', #not sure this is a real word, but it appears on DMP | ||
'Parallelogram', #Work by SOTO from here on out...# | |||
'parallelogram', | |||
'Color Assists', | |||
'Colorist: ', | |||
'colorsport.co.uk', | |||
'The Armored Creature of 004X', | |||
'-an-unauthorized-guide-to', | |||
'-the-unauthorized-guide-to', | |||
'Department of Defense', | |||
'instagram', | |||
'Instagram', | |||
'Cozens', | |||
'thecozens', | |||
'Dougray', | |||
'Plowman', | |||
'plowmanal', | |||
' smiter', | |||
' Smiter', | |||
], | ], | ||
} | } | ||
} | } | ||
</ | </syntaxhighlight> | ||