Forum:Bot proposal: Automatic cleanup of Visual Editor failures

Archive
This page or section is an archive.
Please do not edit the contents of this page.
This thread was archived on 6 November 2012 by Iiii I I I.

I would like to propose a fully automated unsupervised bot that would clean up various Visual Editor failures.

Among the failures that would be fixed by the bot would be:

Common failures that the bot would not fix include the Visual Editor's tendency to move one letter into or out of a link, or to drop the space after a wikilink. For example, after successive Visual Editor edits:

[[Bandos chestplate]] is a
[[Bandos chestplate]]is a
[[Bandos chestplate]]isa

or

[[Bandos chestplate]] is a
[[Bandos chestplate|Bandos chestplat]]e is a

I do not know how to detect and correct these automatically without false positives.

The bot would watch Special:Recentchanges and, when it finds an article edit, wait from 4 to 15 minutes before coming in and making its cleanup on the same article, resetting the timer if another edit is seen. This delay ensures that users don't get more edit conflicts than they should, that articles in construction are not edited too fast, and that rollback still works for counter-vandalism purposes.
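A minimal sketch of the wait-and-reset behaviour described above, assuming the bot keeps one deadline per page. The class and method names are hypothetical, and reading Special:Recentchanges is left out; timestamps are plain seconds passed in by the caller.

```python
MIN_DELAY = 4 * 60   # seconds, lower bound from the proposal
MAX_DELAY = 15 * 60  # seconds, upper bound from the proposal


class CleanupScheduler:
    """Tracks, per page, the earliest time a cleanup edit may be made."""

    def __init__(self, delay=MAX_DELAY):
        self.delay = delay
        self.due = {}  # page title -> timestamp at which cleanup is allowed

    def edit_seen(self, title, now):
        # Any new edit to the page restarts the wait for that page.
        self.due[title] = now + self.delay

    def pages_ready(self, now):
        # Return the pages whose wait has elapsed, and forget them.
        ready = sorted(t for t, when in self.due.items() if when <= now)
        for title in ready:
            del self.due[title]
        return ready
```

Calling `edit_seen` again for the same title simply overwrites the deadline, which is what resets the timer when another edit arrives.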

The delay could be adjusted automatically, looking for the swiftness of vandalism reversion in the Recentchanges of the preceding hour: 4 minutes if reversions are swift, and 15 minutes if they aren't or there were no reversions.
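One possible reading of the automatic adjustment, as a sketch only. The proposal does not say what counts as "swift", so the 120-second threshold and the use of a median are assumptions, as is the function name.

```python
def choose_delay(reversion_lags, swift_threshold=120):
    """Pick the cleanup delay in seconds.

    reversion_lags: seconds between each vandal edit and its reversion,
    collected from the preceding hour of recent changes.
    swift_threshold: assumed cut-off for a "swift" reversion.
    """
    if not reversion_lags:
        # No reversions seen in the last hour: use the long delay.
        return 15 * 60
    lags = sorted(reversion_lags)
    median = lags[len(lags) // 2]  # upper median for even-length lists
    return 4 * 60 if median <= swift_threshold else 15 * 60
```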

Full source would be provided under the GPLv2 or GPLv3, of course.

Did I miss any other edits I could do automatically? What about the edit delay? Is this even viable or desirable? Discuss below.

 a proofreader ▸  06:53, October 18, 2012 (UTC)

Discussion

Support - We should just burn visual editor until nothing is left, and then burn it some more. Unfortunately, I guess we'll have to stick with mopping up after whatever mess it creates. I approve this bot! 222 talk 07:19, October 18, 2012 (UTC)

Support - New wiki editors are probably using the visual editor to edit pages, but they may not notice that this will mess up the edits in source mode. Still, some kind of bot is needed to keep correcting their mistakes until these new editors understand how to edit in source mode. We should show compassion to these potential editors. Explore and enjoy the world! TIMMMO Work it with all my heart!++Discuss Sign 15:12, October 18, 2012 (UTC)

Support burning of Visual Editor and proposal - Off the top of my head, I can't think of anything you've missed. 15 minutes sounds like a good time to wait for the bot to fix up stuff. Blaze_fire.png12.png 16:04, October 18, 2012 (UTC)

Support - I'd prefer it if we could just lose visual altogether, but I understand that isn't a solution. A bot to clean up after it seems like a good alternative solution. AnselaJonla 16:08, October 18, 2012 (UTC)

Another failure - As seen in Captured Temple r6437155, two consecutive links pointing to the same page could be merged.

 a proofreader ▸  18:39, October 18, 2012 (UTC)
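A rough sketch of how consecutive links to the same page might be merged, assuming plain `[[Target]]` and `[[Target|label]]` links with nothing between them. The function name is hypothetical, and MediaWiki's first-letter case folding of titles is not handled.

```python
import re

# Two adjacent links whose targets are textually identical.
ADJACENT = re.compile(
    r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]"
    r"\[\[\1(?:\|([^\[\]]*))?\]\]"
)


def merge_adjacent_links(text):
    def merge(match):
        target = match.group(1)
        # A link without a label displays its target.
        first = match.group(2) if match.group(2) is not None else target
        second = match.group(3) if match.group(3) is not None else target
        label = first + second
        if label == target:
            return "[[" + target + "]]"
        return "[[" + target + "|" + label + "]]"

    # Repeat until stable so that runs of three or more links collapse.
    previous = None
    while previous != text:
        previous = text
        text = ADJACENT.sub(merge, text)
    return text
```

Only the links are joined; a stray letter left outside the link, as in the successive-edit examples in the proposal, is not touched.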

Comment - I don't know about whatever language this bot is going to be written in, but you might want to use this JS code for reference to base your code on. It is flawless at fixing the things (but does need regex). The script can be seen in action at http://jsfiddle.net/aVZQs/. If you are able to view the old revision too, you could use the script at http://jsfiddle.net/FMjCu/2/ for reference. That is an (almost) flawless script which checks if any link that has non-whitespace characters immediately after it occurs in the old revision too, but then with a space between the closing ]s and the text after it. See the demo for an example on line 1 of the example text boxes. I hope this helps you develop your bot, and gives you an idea of how it could be possible to do. I don't know Java or any other bot-making languages, so I'm afraid JS is the closest to a programming language in which I can explain what would work. Good luck :D JOEYTJE50TALKpull my finger 19:46, October 18, 2012 (UTC)
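The old-revision check described in this comment could look roughly like the following in Python. This is not a port of the linked jsfiddle scripts, whose contents are not reproduced here; the function name and the regular expression are assumptions.

```python
import re

# A wikilink immediately followed by word characters, e.g. "[[Page]]is".
GLUED = re.compile(r"(\[\[[^\[\]]+\]\])(\w+)")


def restore_lost_spaces(old_text, new_text):
    """Re-insert a space after a link only if the old revision had one."""

    def fix(match):
        link, tail = match.group(1), match.group(2)
        spaced = link + " " + tail
        # The same link and word must appear in the old revision with a
        # space between them, and the word must end there.
        if re.search(re.escape(spaced) + r"(?!\w)", old_text):
            return spaced
        return match.group(0)

    return GLUED.sub(fix, new_text)
```

Legitimate link trails such as `[[Cat]]s` are left alone because the old revision never contained `[[Cat]] s`. A case where two spaces were lost in succession (`[[Bandos chestplate]]isa`) is not repaired by this check.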

Support - So many of those little Cs or long internal links go unnoticed, and with this, we'd finally have a way to crack down on them. If it can work well (and not accidentally remove information in the process), this could help enormously! And if anyone could make it, it's you. Good luck, Proofie! ɳex undique 19:58, October 18, 2012 (UTC)

Support - This will help us remove all those span tags when editing. I like this idea — Jr Mime (talk) 20:01, October 18, 2012 (UTC)

Little span tags? I've seen some of those around, now that you mention it, but I don't remember whether they followed a certain pattern. Can you explain?  a proofreader ▸  20:10, October 18, 2012 (UTC)
Look for rgb (0,0,0); any real editor would use #000 MolMan 20:12, October 18, 2012 (UTC)
That's not strictly true, although admittedly I've only noticed rgba rather than rgb --Henneyj 23:07, October 18, 2012 (UTC)
I seen a couple of those, they are rare and annoying to remove. — Jr Mime (talk) 23:08, October 18, 2012 (UTC)
They contain a font-family as well. MolMan 23:09, October 18, 2012 (UTC)

Comment - What about the random p/span tags? Iirc the span tags tend to have nothing in them, whilst the p tags use the default setting for line height, font-size, etc. I would imagine scanning new user contribs would throw up a few of these RTE errors. cqm 21:32, 18/10/2012 (UTC)

All I see in newbie contribs is a [[File:Placeholder|300px]]. If I come across an odd p, I'll report the diff here. Can you do the same please?  a proofreader ▸  21:52, October 18, 2012 (UTC)
Placeholder is the default new page (sadly) not visual editor. MolMan 23:09, October 18, 2012 (UTC)
http://runescape.wikia.com/wiki/Hidden_updates?diff=6437343&oldid=6437334 Like so? --Henneyj 23:13, October 18, 2012 (UTC)
That's honestly the most random glitch, only seen it maybe twice. MolMan 23:15, October 18, 2012 (UTC)
I've seen it way more often than twice. It used to be a very common bug, but even though it's less common now, it still occurs. JOEYTJE50TALKpull my finger 16:24, October 19, 2012 (UTC)
That may be because most occurrences of it are eaten by the AbuseFilter. See abuse log entry 41116 (which apparently only sysops can see?) for one that didn't occur on 2012 Hallowe'en event but could have occurred without an AbuseFilter. There are probably way more.  a proofreader ▸  06:48, November 3, 2012 (UTC)
Skulgrimen r6451864 has one of the odd spans.  a proofreader ▸  23:31, October 21, 2012 (UTC)
Ya, that's an example. I still haven't seen one where a font-family is identified as well; when I do I'll post it here, assuming no one else has by then. MolMan 23:34, October 21, 2012 (UTC)
2012 Hallowe'en event r6496724 has one of the odd spans with an explicit line-height.  a proofreader ▸  23:07, October 29, 2012 (UTC)
blurp MolMan 23:22, October 29, 2012 (UTC)

Support - I [[Suppor|Suppo]]rt this. Hair 23:49, October 18, 2012 (UTC)

Support - You mean [[Support|Sup]][[Support|o]][[Support|r]]t? MolMan 16:58, October 19, 2012 (UTC)
You forgot a p. Haidro 02:13, October 20, 2012 (UTC)
No, I meant "suport"; Hairr is bad spella. MolMan 02:15, October 20, 2012 (UTC)

Support - You've done it again... Haidro 06:04, October 19, 2012 (UTC)

Support - {C if you can fix {C it. ajr 12:42, October 19, 2012 (UTC)

Question - Why do we even have the RTE enabled? I think we should consider whether or not we want it in the first place. In source mode there are guides and buttons above and below to help wikimarkup newbs, and I've asked -- it's possible to ask to have it turned off. Or, we could use AWB to put __NOWYSIWYG__ on a list from Special:AllPages. Michagogo (talk) 12:55, October 19, 2012 (UTC)

Since when is it possible to ask for our wiki to only use source mode? I was under the impression that the RTE, along with other site changes (nav bar, oasis), was not open to change. sssSp7p.pngIjLCqFF.png 14:24, October 19, 2012 (UTC)
I think visual mode is much friendlier to new editors and is probably more likely to encourage new people to contribute. Source may be better if you want to knuckle down to the serious stuff, but if you're a reader who doesn't edit and wants to correct a short passage of text, visual is more approachable and manageable. --Henneyj 16:51, October 19, 2012 (UTC)
There was a thread about this, I can't find it, but it was closed mainly because new editors would have no idea how to edit. Haidro 02:13, October 20, 2012 (UTC)
This might be what you're looking for. As Liquidhelium said, "Second of all, disabling the RTE requires that all editors have some knowledge of wiki markup, which is certainly not the case. While I dislike the RTE and have disabled it on my account, I realize that it helps new users by providing a more user-friendly method for editing". Blaze_fire.png12.png 01:16, October 27, 2012 (UTC)

Support - Aha, so that's what that random crap is. What about double/triple etc. categories? Fswe1 07:30, October 27, 2012 (UTC)

That can be the case sometimes. But often, it's the fault of the abuse filter. How it does that, I cbf to explain; just trust me ;) MolMan 17:02, October 27, 2012 (UTC)
The double categories bug is because the abusefilter warns them about something, and the rich text editor renders the categories again (because the categories are rendered separately), causing them to show up multiple times. This is still an RTE bug, but it does happen when the abusefilter warns the user. JOEYTJE50TALKpull my finger 15:10, October 29, 2012 (UTC)
Not to be a dick, but the abusefilter may add categories regardless of using RTE or not. It's done so to me. MolMan 22:43, November 1, 2012 (UTC)
According to the most recent staff bug fix blog, this has now been fixed anyway. cqm 01:26, 2 Nov 2012 (UTC)

Support - Yes, yes, a thousand times yes. User:Exor Solieve 02:19, October 29, 2012 (UTC)

Comment - What about the RTE converting things like spaces to their HTML equivalents, e.g. &gt; as opposed to >? I would draw the line at characters not found on a 'normal' keyboard, such as ¿, ö and å, but that's just personal preference and very much based on your perception of a normal keyboard. A Spanish user would likely have a ¿ on their keyboard where an American would likely not. In the case of ö, an unsuspecting editor might be confused by Sköll if they saw &ouml; when using source mode, so perhaps it's better if they get converted too.

On the other hand, this may be completely outside the scope of the bot, and AWB should be able to do that particular task with little or no effort. cqm 17:32, 31 Oct 2012 (UTC)

I don't know about &lt;, &gt; and &amp; to be honest; the last time I tried to replace those automatically, it made a tag example – say "<includeonly>" – into the includeonly tag itself, invoking it on the wiki. As of late, though, I see more and more stray &apos; and &nbsp; entities...  a proofreader ▸  22:15, October 31, 2012 (UTC)
I'm running Cåmdroid to test whether replacing these is a viable option. The list of codes it will change into characters can be seen here. cqm 23:40, 31 Oct 2012 (UTC)
It seems to run good. But, god, those RTE editors are changing all ' to amps or ampos... — Jr Mime (talk) 23:42, October 31, 2012 (UTC)
I've fixed a fair few of the &apos; errors. Curiously, I seem to be running into a duplication of other language links with HTML entities in. With AWB's general fixes activated the duplication is removed in mainspace, but for some reason not in Beta pages. I'm also picking up on what I think is a relic of Ajrbot - something like Movario&#39;s notes - usually found in infoboxes, which seems to come from converting {{PAGENAME}} to text not dependent on the page name. cqm 01:26, 2 Nov 2012 (UTC)

Comment - Drop rate, as of r6232510, contains a legitimate {C} ... in a math tag. I'll need to require that a {C} be contained in no tag or only a certain list of whitelisted HTML tags for it to be replaced. Keeping track of a blacklist (nowiki, pre, source, math, Saradomin knows what else) will be much more work and testing!

 a proofreader ▸  21:46, November 1, 2012 (UTC)
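The whitelist idea above might be sketched as follows: a marker is removed only when every tag enclosing it is on a list of harmless tags. The tag list, the function name and the default marker string are all assumptions, and the tag matching is deliberately naive.

```python
import re

# Opening, closing or self-closing tag; group 1 is "/" for a closing tag,
# group 3 is "/" for a self-closing one.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*?(/?)>")

# Hypothetical whitelist of tags inside which the marker may be removed.
SAFE_TAGS = {"b", "i", "u", "s", "span", "div", "p", "br", "small", "big", "sup", "sub"}


def strip_marker(text, marker="{C}", safe_tags=SAFE_TAGS):
    out, stack, pos = [], [], 0

    def clean(chunk):
        # Remove the marker only if all enclosing tags are whitelisted.
        if all(tag in safe_tags for tag in stack):
            return chunk.replace(marker, "")
        return chunk

    for match in TAG.finditer(text):
        out.append(clean(text[pos:match.start()]))
        out.append(match.group(0))
        name = match.group(2).lower()
        if match.group(1):            # closing tag
            if name in stack:
                while stack.pop() != name:
                    pass
        elif not match.group(3):      # opening tag that is not self-closing
            stack.append(name)
        pos = match.end()
    out.append(clean(text[pos:]))
    return "".join(out)
```

The failure mode is conservative: an unknown or unclosed tag stays on the stack and blocks removal for the rest of the page, so a marker inside math, nowiki, pre or source is never touched.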

Proposal - Not that I use IRC or anything… but how about a way to sic the bot onto a page that's already been checked for vandalism, will not be a large edit, and is ready for the cleanup that no one wants to do? MolMan 22:43, November 1, 2012 (UTC)

Something like {{compress}}? Assuming the bot doesn't discriminate between lazy, experienced editors and those using visual, I don't see the need, as it has a time after which it edits anyway. Something like Hallowe'en event 2012 poses more of a problem (with its current level of activity). There could be all sorts of errors introduced by successive anon edits that we are too lazy or inactive to fix, and there is no discernible opening for the bot to fix the errors. cqm 01:26, 2 Nov 2012 (UTC)
Well, for 2012 Hallowe'en event there is a window of opportunity about 1 hour after the start of every wave, when all the locations have been written. Until then, there will be odd spans and things.  a proofreader ▸  01:48, November 2, 2012 (UTC)
I was thinking more like !FIXNAOYOUFOOL <page name> MolMan 01:50, November 2, 2012 (UTC)

Testing - Since 1 November I have been coding the bot, without making edits, to test how well it works: detecting undos and reversions and telling me what it would fix in pages. Using a worst-case test on the sandbox, r6530423, I determined that the code works, but the category duplication code could use a bit of logic to remove superfluous whitespace left between removed categories. The recent changes monitor code also kept crashing with null reference exceptions, but I believe I have that working better now.

I would like to ask the readers of this thread to stop removing superfluous spans, URLs that are too long, and duplicated categories in articles so the test can go more smoothly. I will request closure in a few days.

 a proofreader ▸  19:35, November 4, 2012 (UTC)
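The whitespace issue mentioned in the testing notes could be handled by removing each duplicate category together with the whitespace in front of it. This is only a sketch, not the bot's actual code; the function names and the title normalisation are assumptions.

```python
import re

# Leading whitespace, then a category link with an optional sort key.
CATEGORY = re.compile(
    r"(\s*)\[\[\s*Category\s*:\s*([^\[\]|]+?)\s*(?:\|[^\[\]]*)?\]\]",
    re.IGNORECASE,
)


def normalise(title):
    # Underscores and spaces are equivalent; the first letter is case-insensitive.
    title = " ".join(title.replace("_", " ").split())
    return title[:1].upper() + title[1:]


def dedupe_categories(text):
    seen = set()

    def fix(match):
        key = normalise(match.group(2))
        if key in seen:
            # Drop the duplicate and the whitespace that preceded it.
            return ""
        seen.add(key)
        return match.group(0)

    return CATEGORY.sub(fix, text)
```

Keeping the first occurrence means a sort key on a later duplicate is lost, which may or may not be the desired behaviour.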


This request for closure is complete. A user has requested closure for Bot proposal: Automatic cleanup of Visual Editor failures. The reason given was: Fries are done.  a proofreader ▸  03:21, November 6, 2012 (UTC)

Closed - Fries are done. --Iiii I I I 03:39, November 6, 2012 (UTC)