dabbling, frivolling, idling, loafing, loitering, playing and procrastinating
30 Sep
I have previously written about vandalism on Wikipedia, back when I was building Wikipedia Vandalism Watch. After more watching and reverting on Wikipedia I started to notice trends in the vandalism: malicious edits almost always include certain phrases and symbols. Some of these are hard to separate from genuine edits; others are not. The most common and easiest to detect is the good old “!!!!!”. I toyed with the idea of adding a Recent Changes RSS feed reader with some text checks, but Wikipedia Vandalism Watch is a read-only client (it just shows you edits from flagged users), so I didn’t think this would be much use there. Instead I set about creating a new application: an unattended robot that scans for vandalism and removes it automatically.
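To give a rough idea of what a text check like that could look like, here is a minimal Python sketch. It is not the bot's actual code: the function name `looks_like_vandalism` and the threshold of five exclamation marks are my own assumptions for illustration, and a real check would need more patterns than this one.

```python
import re

# Assumption: a run of five or more '!' characters is treated as suspicious.
# The threshold is illustrative, not the rule the bot actually uses.
EXCLAMATION_RUN = re.compile(r"!{5,}")


def looks_like_vandalism(added_text: str) -> bool:
    """Return True if the text added by an edit contains a long run of '!'."""
    return EXCLAMATION_RUN.search(added_text) is not None
```

A check this simple will still produce false positives, so it only makes sense as one signal among several.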
Admittedly this sounded much easier when I started than it turned out to be, with various hurdles getting in the way. One example is Wikipedia’s edit tokens, which mean you can’t just send a POST request to the server to edit an article. Once things like that had been solved, everything slowly fell into place. About halfway through development I started the Request for Approval process. This is run by the Bot Approvals Group, and it lets them regulate the standard bots must meet before they are allowed to run on Wikipedia. But of course, I can see you asking already, “it’s Wikipedia, a public web site, how could they stop me?”. The simple answer is that if a bot is spotted running without approval, both the bot and your account will be banned straight away. Bots without approval are not welcome on Wikipedia in any shape or form, because an unapproved bot could go off the rails at any moment and destroy many articles before someone could turn it off or ban it. The rule for approved bots is that if your bot does go off the rails, the owner is responsible and should clean up the mess.
After some discussion with members of the Bot Approvals Group about how the bot worked and how it detected vandalism, I was granted a trial of 50 edits in the main namespace. The trial went quite slowly because of the restrictions applied to my bot: 2 edits per minute for the first 40 reverts, after which it could run as fast as it could find vandalism for the last 10. After a few false positives and some conflicts along the way, the trial was completed.
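For the curious, the trial restriction can be sketched as a small throttle. This is a Python illustration rather than the bot's real code: the class name `TrialThrottle` is hypothetical, and I am assuming that “2 edits per minute” means waiting at least 30 seconds between reverts.

```python
import time


class TrialThrottle:
    """Pace reverts like the trial: throttled at first, then unrestricted."""

    def __init__(self, throttled_reverts=40, total_reverts=50, per_minute=2,
                 clock=time.monotonic, sleep=time.sleep):
        self.throttled_reverts = throttled_reverts
        self.total_reverts = total_reverts
        # Assumption: N edits per minute means a minimum gap of 60/N seconds.
        self.min_gap = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.done = 0
        self.last_revert = None

    def wait_for_slot(self) -> bool:
        """Block until the next revert is allowed.

        Returns False once the trial allowance has been used up.
        """
        if self.done >= self.total_reverts:
            return False
        if self.done < self.throttled_reverts and self.last_revert is not None:
            remaining = self.min_gap - (self.clock() - self.last_revert)
            if remaining > 0:
                self.sleep(remaining)
        self.last_revert = self.clock()
        self.done += 1
        return True
```

The bot would call `wait_for_slot()` before each revert and stop as soon as it returns False. The clock and sleep functions are passed in only so the pacing can be tested without actually waiting.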
At the time of writing, the bot has completed the trial and is awaiting news on how to proceed. If you want some more technical information, or just want to see what the bot is currently up to, check out its User page on Wikipedia.
28 Aug
Wow.. been a while since I have posted on my blog… I wonder if anyone actually reads it. Anyway, the reasons why I haven’t been posting will be addressed in another article, which will be… sometime :P.
So, anyone who reads or uses Wikipedia knows that the problem with allowing anyone to edit articles is exactly that: anyone can edit them. It is plagued by idiots editing, defacing and blanking pages at an almost guaranteed rate of once a minute. Recently, while trying to read an article, instead of the information I wanted I got some bastardised version with various spews of wisdom from a random individual. After working out how to undo this and revert the article to its last normal state, I got into the habit of fixing random articles as they were vandalised. Soon enough I noticed a pattern among the people vandalising pages: if someone has defaced one article, chances are they have done (or will do) the same to another.
Now on Wikipedia this is easy enough to check, as you just watch the user’s “Contributions” page and see what they change. However, if you are monitoring (or patrolling, as the Wikipedians like to call it) the Recent Changes page, you may end up with about 10-20 editors to watch in the space of 30 minutes, and keeping track of all their changes is problematic, as refreshing that many contribution pages is just a waste of time. By the time you have refreshed them all you’d have to start again instantly just to keep up.
With Wikipedia now approaching 2 million English articles, this problem of people thinking it’s smart or clever to randomly change and deface articles will only increase. So I have decided to try and do something to help out…
Wikipedia Vandalism Watch v1.0.0.1
This is a Windows application written in Visual C#, and its sole purpose is to monitor specified users’ contributions pages. If it detects a top edit (i.e. the revision currently being displayed to the world) made within the past 10 minutes, it will flag it for you and let you open the diffs for that article directly. All you need to do is add users who you think will deface an article and it will monitor them for you. When it scans for edits it presents the results in a table so you can easily see what has changed.
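The flagging rule itself is simple enough to sketch in a few lines. The application is written in C#, so the Python below is only an illustration of the idea, and the names `Contribution` and `flag_recent_top_edits` are made up for this example. It also assumes the contributions have already been fetched and parsed into records.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Contribution:
    user: str
    title: str
    timestamp: datetime
    is_top: bool  # True when this edit is the page's current revision


def flag_recent_top_edits(contributions, now, window=timedelta(minutes=10)):
    """Return the top edits made within `window` before `now`."""
    cutoff = now - window
    return [
        c for c in contributions
        if c.is_top and cutoff <= c.timestamp <= now
    ]
```

Anything this returns is what would end up in the results table, one row per flagged edit.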
If you have no-one to monitor and just want to see how the program works, add the user “ClueBot” and select “Force Scan” from the menu in the top left. ClueBot will usually have at least 1 top edit every 10 minutes, so this will give you some results to look at. ClueBot is an automated bot which attempts to fix various kinds of obvious vandalism, such as page blanking.
Thanks go to…
Murray-Mint for some tips and pointers on how to start coding in C#
Cheez for helping with the regular expression of doom for finding edits
Download
This application comes with no guarantees and therefore, you use this software at your own personal risk.
Download (78kb) - Wikipedia Vandalism Watch - wikipedia-vandalism-watch_v1.0.0.1.zip