User:CzechOut/Bot tricks

The following is a list of tricks I've learned while using pywikipedia.

add_text.py

One of the harder things to do with bots is to work on pages that have no categories. This is because bots depend upon categories for many of their functions. However, bots can be used on pages without categories, as long as you go about things creatively.

If you have a user who is constantly uploading pictures without licenses, it may be easiest just to look for their work, to the exclusion of other people. Here's a run that'll look for only their additions to the file namespace:

python add_text.py -text:"{{bbcvidcover}}" -namespace:6 -usercontribs:"Doctor Who 63"  -except:"\{\{[Bb]bcvidcover"

Note that this goes through all their work in namespace 6. So it doesn't look at only their unlicensed work in that namespace. Note that the parameter -uncatfiles doesn't actually help, here. It doesn't hurt, but it doesn't actually confine the search to just those things in namespace 6 modified by Doctor Who 63 which are also uncategorised.
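
To see what the -except pattern is guarding against, here's a minimal Python sketch using the standard re module. It only mimics the check; it is not how add_text.py works internally, and the page titles and texts are made up for illustration.

```python
import re

# Same pattern as the -except argument in the command above.
EXCEPT_PATTERN = re.compile(r"\{\{[Bb]bcvidcover")

def needs_template(page_text):
    """Return True when the page does not already carry the template."""
    return EXCEPT_PATTERN.search(page_text) is None

# Hypothetical file pages, for illustration only.
samples = {
    "File:Already tagged.jpg": "Cover art\n{{Bbcvidcover}}",
    "File:Untagged.jpg": "Cover art with no licence",
}

to_tag = [title for title, text in samples.items() if needs_template(text)]
print(to_tag)
```

Only the second page would be tagged, because the first already matches the exception pattern.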

However, -uncatfiles is helpful if you don't have that many files to look after. This is what you use if you just want to add {{bbcvidcover}} to pages that aren't categorised.

python add_text.py -text:"{{bbcvidcover}}" -uncatfiles

Course, this is a slow way to go about things, because you probably won't want to add a single template to all the uncategorised files. If you want to filter things a bit, you can instead try to find patterns in the titles of the uncategorised files.

-titleregex:

allows you to make up your own matching rules. But if you can see a quick and dirty pattern at the beginning of a filename, you might try this instead:

-prefixindex:"File:<whatever>"

This method is perfect for quickly licensing achievement badges, because they'll start with the term "File:badge".

What if you want to replace something about a title that has both exclamation points and single quotes for italics? This is pretty dicey, because the exclamation point has to be escaped, and you've got to figure out a way to get around the single quotes. Here's a useful expression:

python replace.py -summary:"see [[forum:Prefix war: Doctor Who Adventures vs. Doctor Who Annuals]]: DWAN --> DWS" -regex "\[\[DWAN\]\]\: ''\[\[Grand Theft Planet(\!\]\]'')" "[[DWS]]: ''[[Grand Theft Planet\1" -ref:'Grand Theft Planet!'

Note what's going on here. The -ref value must be in single quotes (inside double quotes, a shell like bash will try to treat the exclamation point as history expansion). The regex for the original term must have parentheses around the part of the page name that's causing the most difficulty, so that it can just be dumped into the replacement term as \1. After trying for a bit, I couldn't find anything else that worked in command-line operation of the bot. Of course, my guess is that you might well need something like this even if using a user-fix.
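
Here's a small Python sketch of what that capture group is doing, using the same pattern and replacement with the standard re module. The sample sentence is made up, and this only reproduces the regex step, not the page fetching or saving that replace.py does.

```python
import re

# Pattern and replacement copied from the replace.py command above.
PATTERN = r"\[\[DWAN\]\]\: ''\[\[Grand Theft Planet(\!\]\]'')"
REPLACEMENT = r"[[DWS]]: ''[[Grand Theft Planet\1"

def fix_prefix(text):
    """Swap the DWAN prefix for DWS, reusing the awkward tail via group 1."""
    return re.sub(PATTERN, REPLACEMENT, text)

# Hypothetical page text, for illustration only.
before = "See [[DWAN]]: ''[[Grand Theft Planet!]]'' for details."
after = fix_prefix(before)
print(after)
```

The exclamation point, closing brackets and italic quotes never appear in the replacement string; they ride along inside \1.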

Cleanup after pagefromfile.py

python replace.py "\<(.*)(\[\[.*\]\])(.*)\>" "{{hidecat}}\2" -regex -subcatsr:"Articles containing potentially dated statements from 2015"

This would keep any categories you have in the sea of code that's unfortunately generated by pagefromfile.py.

Update: Actually, it turns out that the code is only generated when using an .xml file. If you instead just use an .rtf or, better, a regular .txt file (encoded as UTF-8), things work out nicely.

Update again:

.txt files encoded as UTF-8 are the very best option. If your text includes symbols like curly braces (as with template calls), you absolutely need a plain .txt file.

Regex snippets

This gets rid of empty sections, in this case External link/links:

-regex "\=\= External .*\n''.*''"

This replaces a multi-line series variable with a singular one, while at the same time preserving spaces between "series" and the = sign:

python replace.py -regex 'series( *)=.*' "series\1=[[DWM comic stories|''DWM'' comic stories]]|" -summary:"Only one linked item per series variable.  Otherwise, it's VERY unclear what the previous/next line refers to" -cat:'Fourth Doctor DWM comic stories'

To automatically add brackets around things, generally use bracket. However, for dab pages, where you have a list of things all starting with the same letters, use this:
```
python replace.py -regex -page:"Pagename" "StartingString(.*)\r" "* [[StartingString\1]]"
```

Pagesfromfile: creating pages based on dabbed titles

Strip the dab with

python replace.py -regex "(.*)\((.*)\)\r" "\1" -page:User:CzechOut/Sandbox10

Add on the "right side" of coding necessary to use pagefromfile.py:

python replace.py -regex "\r" " comic story images'''\n[[Category:TVC comic story images]]\nyyyy\nxxxx" -page:"User:CzechOut/Sandbox10"

Add to the "left side":

python replace.py -regex "\n(.*)'''" "'''Category:\1'''" -page:"User:CzechOut/Sandbox10"

If starting with a list of names of stories, the results will go from:

Story name

to

'''Category:Story name comic story images'''

yyyy

xxxx
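
The three passes above can also be sketched in one go in Python. This is a rough equivalent, not the replace.py runs themselves: it assumes the list is already in memory as plain strings, it trims the space left behind by the stripped dab term (the regex above doesn't), and it treats yyyy and xxxx as whatever end and start markers you pass to pagefromfile.py.

```python
import re

def strip_dab(title):
    """Drop a trailing parenthetical such as ' (comic story)'."""
    return re.sub(r"\s*\([^()]*\)\s*$", "", title)

def category_block(title):
    """Build one pagefromfile-style entry for a story title."""
    name = strip_dab(title)
    return (
        f"'''Category:{name} comic story images'''\n"
        "[[Category:TVC comic story images]]\n"
        "yyyy\n"
        "xxxx"
    )

# Hypothetical story list, for illustration only.
titles = ["Story name (comic story)", "Another story"]
output = "\n".join(category_block(t) for t in titles)
print(output)
```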

Simple duplication of an entry on a list

To create a duplicate on a list

 python replace.py -regex "(.*)\n" "\1\n\1 (comic story)" -page:User:CzechOut/Sandbox10

Log file --> something usable by movepages.py

Paste logfile onto a page, like user:CzechOut/Sandbox10

Get rid of the "Getting" statements with

python replace.py -regex "Getting.*\n" "" -page:User:CzechOut/Sandbox10

Get rid of everything that's already disambigged with

python replace.py -regex "\n(.*)\)\r" "" -page:User:CzechOut/Sandbox10

Create duplicates of each name, then add (disambiguation term) with

 python replace.py -regex "(.*)\r" "\1\n\1 (comic story)" -page:User:CzechOut/Sandbox10

Put brackets on the right side with

python replace.py -regex "(.*)\r" "\1]]" -page:User:CzechOut/Sandbox10

Put brackets on the left side with

python replace.py -regex "(.*)\]\]" "[[\1]]" -page:User:CzechOut/Sandbox10

Depending on the number of items on your list, the last two steps can take a long time. It'll look like the bot is frozen, but it's not.
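
If the list is long enough that the bot crawls, the same clean-up can be sketched offline in Python. This is only an approximation of the steps above: the wording of the "Getting" line is invented, the dab term is a parameter I've added, and you should check what layout your copy of movepages.py actually expects before feeding it the result.

```python
def move_pairs(log_lines, dab="(comic story)"):
    """Turn pasted log lines into bracketed old-title / new-title lines."""
    pairs = []
    for line in log_lines:
        name = line.strip()
        if not name or name.startswith("Getting"):
            continue  # skip blanks and the bot's progress messages
        if name.endswith(")"):
            continue  # already has a disambiguation term
        pairs.append(f"[[{name}]]")
        pairs.append(f"[[{name} {dab}]]")
    return pairs

# Hypothetical log excerpt, for illustration only.
log = [
    "Getting 3 pages from the wiki...",
    "The Iron Legion",
    "City of the Damned (comic story)",
    "The Star Beast",
]
result = move_pairs(log)
print("\n".join(result))
```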

HTML bullet stripper

To strip HTML tags do this:

python replace.py -regex "<ul>|<\/ul>|<li>|<\/li>" "" -cat:"Doctor Who (2005) television stories" -summary:"getting rid of bulleting in infobox"

This will then leave you with a series of links directly abutting each other.

python replace.py -regex -summary:"putting commas between links" "\]\[" "], [" -subcat:"Doctor Who (2005) television stories"

This will put a comma and a space between two abutting links.

python replace.py -regex -summary:"putting commas between links" "\)\[" "), [" -subcat:"Doctor Who (2005) television stories"

This will take care of those few instances of a parenthesis abutting a link.
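
Chained together, the three replacements behave like this Python sketch. The patterns are the ones from the commands above; the infobox fragment is made up, and nothing here touches the wiki.

```python
import re

def flatten_list(text):
    """Apply the three passes in order: strip tags, then add commas."""
    text = re.sub(r"<ul>|<\/ul>|<li>|<\/li>", "", text)  # remove list markup
    text = re.sub(r"\]\[", "], [", text)                 # link abutting link
    text = re.sub(r"\)\[", "), [", text)                 # parenthesis abutting link
    return text

# Hypothetical infobox value, for illustration only.
sample = (
    "<ul><li>[[Rose Tyler]]</li>"
    "<li>[[Ninth Doctor]] (voice)</li>"
    "<li>[[Jackie Tyler]]</li></ul>"
)
flattened = flatten_list(sample)
print(flattened)
```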

Stripping a variable of its link

Many times it's better to have an unlinked variable than a linked one. To strip an existing variable of its linkage, do the following:

python replace.py -regex -summary:"stripping prev/next story, adding dab for better link" 'previous story( *)=(.*)\[\[(.*)\]\]' "previous story\1=\2\3 (TV story)" -cat:"Doctor Who (1963) television stories"

That works fine, as long as people have actually built the infobox in the "correct" way, i.e. one variable per line. But if they squash it all down so that the infobox and entire text of the article is on one line, the regex is far too greedy and will create unexpected replacements. The following is much better:

python replace.py -regex -summary:"stripping prev/next story, adding dab for better link" 'next story( *?)=(.*?)\[\[(.*?)\]\]' "next story\1=\2\3 (TV story)" -subcat:"television stories"
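
To make the greedy problem concrete, here's a Python sketch that runs both styles of pattern over a squashed, one-line article. For a fair comparison I've used the "next story" field in both patterns, and the sample text is invented.

```python
import re

GREEDY = r"next story( *)=(.*)\[\[(.*)\]\]"
LAZY = r"next story( *?)=(.*?)\[\[(.*?)\]\]"
REPLACEMENT = r"next story\1=\2\3 (TV story)"

# Hypothetical article with the infobox and body text on a single line.
squashed = (
    "{{Infobox|next story = [[Rose]]|writer = [[Russell T Davies]]}} "
    "It opens in [[London]]."
)

greedy_result = re.sub(GREEDY, REPLACEMENT, squashed)
lazy_result = re.sub(LAZY, REPLACEMENT, squashed)

print(greedy_result)
print(lazy_result)
```

The greedy pattern runs to the last link on the line and unlinks London instead; the lazy one stops at the first link after the variable, which is the one you wanted.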

The quick and nasty way to build huge lists of stories

Let's say you have a list of stories with improper disambiguation terms. Or maybe a list without disambiguation terms at all. Instead of typing everything out by hand, like ya did with the British spell checker, use regex to instantly deliver a list that you can immediately plug into a user-fix.

python replace.py -page:user:CzechOut/Sandbox13 -regex "(.*?)\(comic story\)" "u'\1(short story)', u'\1(comic story)',\n"

What this does is take a raw dump of unlinked text, in this case things ending in (comic story). It then strips (comic story) and adds the basic structure for user-fixes.py replacements. This will then correct every instance where a story has been misidentified as a (short story) and convert it to a proper (comic story). Obviously here, we're using u instead of r 'cause there's no regex to this replacement. It's totally literal, allowing us to use u.
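
Here's the same transformation as a Python sketch, so you can see the generated lines before committing anything to a sandbox page. The pattern and replacement are the ones from the command above; the story list is just an example, and titles containing a single quote would need extra escaping that this doesn't attempt.

```python
import re

PATTERN = r"(.*?)\(comic story\)"
REPLACEMENT = r"u'\1(short story)', u'\1(comic story)',\n"

# Hypothetical raw list, one dabbed title per line.
raw = "The Iron Legion (comic story)\nThe Star Beast (comic story)\n"

generated = re.sub(PATTERN, REPLACEMENT, raw)
# The original line breaks survive the substitution, so drop the empty lines.
pairs = [line for line in generated.splitlines() if line]
print("\n".join(pairs))
```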

Creating mass categories

python replace.py -regex -page:User:CzechOut/Sandbox14 "\n(.*?)\r" "'''Category:\1'''\n{{ImageLink}}\n{{TitleSort}}\n[[Category:CON images]]\nyyyy\nxxxx\n"

Converting {{{appearances}}} to {{{only}}}

You'll have to go on semi-automatic, but this'll do the job:

python replace.py -regex -summary:"converting {{{appearances}}} to {{{only}}}" "appearances( *?)=( *?)\[\[(.*)\]\]\:( *?)''\[\[(.*)\]\]''\r" "only\1=\2\5"  -page:"Dave Finn"

Getting rid of whole sections

Here's an example of how to get rid of a whole section. It depends on knowing the format the section is in, however. Any sections called "Timeline" that deviate from this pattern won't be affected.

python replace.py -regex "\=\= Timeline \=\=\r.*\n\*.*\r.*\n\*.*\n" "" -summary:"Getting rid of timeline sections per [[Forum:Timeline sections on pages]]" -catr:"stories"
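
A quick Python sketch shows how picky that pattern is. It assumes, as the \r in the pattern implies, that the page text arrives with \r\n line endings; the two sample pages are invented.

```python
import re

# Pattern copied from the command above: heading plus exactly two bullets.
TIMELINE = r"\=\= Timeline \=\=\r.*\n\*.*\r.*\n\*.*\n"

def drop_timeline(text):
    """Remove a Timeline section that fits the expected two-bullet shape."""
    return re.sub(TIMELINE, "", text)

# Hypothetical pages, for illustration only.
two_bullets = (
    "Intro\r\n== Timeline ==\r\n* After ''Story A''\r\n"
    "* Before ''Story B''\r\n== Notes ==\r\n"
)
one_bullet = "Intro\r\n== Timeline ==\r\n* After ''Story A''\r\n== Notes ==\r\n"

print(repr(drop_timeline(two_bullets)))
print(repr(drop_timeline(one_bullet)))
```

The two-bullet section disappears; the one-bullet section doesn't fit the pattern and is left alone.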

Using API to generate quick lists

Derive a list of things. This example will give you a list of all user blog comments:

http://tardis.wikia.com/api.php?action=query&list=allpages&apnamespace=501&default=500&aplimit=1000

Then, cut and paste results over at User:CzechOut/API. Then, run the following two strippers:

LEFT SIDE STRIP 

python replace.py -regex '( +?)\<p pageid\=\"(.*?)\" ns\=\"501\" title\=' '' -page:User:CzechOut/API

RIGHT SIDE STRIP

python replace.py -regex '\" \/\>' '' -page:User:CzechOut/API

You'll end up with a list to put into TextEdit. Convert to UTF-8 and save as a .txt file. That then lets you do the following final step:

python delete.py -file:Filename.txt
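
If you'd rather skip the two regex strips, the titles can be pulled straight out of the API's XML with Python's standard library. This is a different technique from the one above, not what I actually ran: it assumes you've saved the API response as XML with one p element per page, and the sample response and page IDs here are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down API response for illustration only.
SAMPLE = """<api>
  <query>
    <allpages>
      <p pageid="101" ns="501" title="User blog comment:Example/First post/@comment-1" />
      <p pageid="102" ns="501" title="User blog comment:Example/First post/@comment-2" />
    </allpages>
  </query>
</api>"""

def titles_from_api_xml(xml_text):
    """Return the title attribute of every <p> element in the response."""
    root = ET.fromstring(xml_text)
    return [p.get("title") for p in root.iter("p")]

titles = titles_from_api_xml(SAMPLE)
file_body = "\n".join(titles)  # one title per line, ready to save as UTF-8 text
print(file_body)
```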

Stripping double vertical spaces

python replace.py "(\n\r)(\n\r)" "" -page:"The End (audio story)" -regex

Fixing specific stories' prefixes

Begin by getting rid of the junk that pagegenerators.py creates:

python replace.py -regex "Getting.*\n" "" -page:User:CzechOut/Sandbox10

Then move on to create your user-fix. This takes into account citations that have the dab term, and those that don't (but it leaves behind a dab-termed reference).

python replace.py -page:user:CzechOut/Sandbox10 -regex "(.*?)( +?)\((comic story)\)" "(r'\[\[DWM\]\]\: \'\'\[\[\1\2\\(\3\)\|\1\]\]\'\'', r'[[COMIC]]: ''[[\1\2(\3)|\1]]'''),\n(r'\[\[DWM\]\]\: \'\'\[\[\1\]\]\'\'', r'[[COMIC]]: ''[[\1\2(\3)|\1]]'''),\n"

Then you need to make the single quotes on the replacement expression turn into double quotes, or the replacement won't be able to replace the single quotes used to denote italics.

LEFT SIDE

python replace.py -page:user:CzechOut/Sandbox10  "r'[[" 'r"[['

RIGHT SIDE

python replace.py -page:user:CzechOut/Sandbox10  "')," '"),'

Cut and paste the results of user:CzechOut/Sandbox10 into user-fixes.py, and you're off to the races.

Switching wikilinks for templates

python replace.py -regex "\[\[[Tt]he Master \(UNIT years\)(.*?)\]\]" "{{Delgado}}" -ref:"The Master (UNIT years)"
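
In Python terms, that pattern catches both the bare link and any piped version of it. This is just the regex step on an invented sentence, using the standard re module.

```python
import re

# Pattern copied from the command above; (.*?) soaks up any pipe and label.
PATTERN = r"\[\[[Tt]he Master \(UNIT years\)(.*?)\]\]"

# Hypothetical article text, for illustration only.
text = (
    "[[The Master (UNIT years)|The Master]] escaped, "
    "and [[the Master (UNIT years)]] returned."
)
result = re.sub(PATTERN, "{{Delgado}}", text)
print(result)
```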

Making sure that stubs have tags

Some people like to put stubs directly into a category rather than using a proper stub template. To fix this problem, first add the stub to everything in the right category.

python add_text.py -regex -text:"{{TV cast stub}}" -except:"\{\{[Tt]V cast stub\}\}" -category:"TV cast stubs" -summary:"adding stub tag"

Then, you need to go back and strip the category that was mistakenly put on the page:

python replace.py -regex "\[\[[Cc]ategory:[Tt]V cast stubs\]\]\r\n" "" -category:"TV cast stubs"

Creating date pages the right way

In addition to other things that need to be dumped on date pages, don't forget to make sure that {{DayNav}} only appears on the day page in question. You don't want it transcluding over on to Transmat pages.

{{#ifeq:{{PAGENAME}}|{{subst:PAGENAME}}|{{DayNav}}|}}

{{DayNav}} will probably also need a little rejiggering to ignore dab terms. It might have to process page names through {{StoryTitle}}, or something very like {{StoryTitle}}.

Refreshing pages

Not strictly a bot trick, but it is a Terminal thing: to refresh a page that's just not serving properly (like a CSS file), go into Terminal and perform the following curl:

curl -X purge "http://url.url.com"

That's a capital X. If you want to get headers, use -I instead.

Fix to js

This is what I'm using:

$('#WikiaRail').bind('DOMNodeInserted', function(event) { // fires after lazy-loading takes place
  if ($('#WikiaRecentActivity').size()) { // check that #WikiaRecentActivity has been loaded
    if (!$('#mosbox').size()) { // check to make sure it hasn't already been added
      $('#WikiaRecentActivity').before( **add your stuff here** );
    }
  }
}); // end of DOMNodeInserted block

So, where I said "**add your stuff here**", this will work:

$('#WikiaRecentActivity').before(comboString2);

Obviously, you can just stick a second block in there for your Twitter feed too.
