Forum:RSW image renamer: Tweaks

From the RuneScape Wiki, the wiki for all things RuneScape
Forums: Yew Grove > RSW image renamer: Tweaks
Archive
This page or section is an archive.
Please do not edit the contents of this page.
This thread was archived on 10 April 2012 by TyA.

Introduction

It has come to my attention that User:RSW image renamer is still creating many temporary red links due to its move-edit pattern. I would like to have a discussion on the optimal edit rate here.

Background information

The bot, when faced with a request on User:RSW image renamer/Requests, currently does the following:

  • Parse the requests out of the revision, placing the valid requests in a list.
  • Split the request into chunks of 100 files.
  • Move files and update references from each chunk separately. Files are moved, then references are gathered. 1 edit is made per page for a chunk, in which references to all images in the chunk are updated.
  • Post on the requestor's talk page.
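The steps above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the bot's actual code. `move_file`, `pages_referencing` and `edit_page` are hypothetical callbacks standing in for the real wiki API calls. Posting on the requestor's talk page is left out. Following the description above, references are gathered only after the chunk's files have been moved.

```python
from typing import Callable, Dict, Iterable

def process_request(
    renames: Dict[str, str],  # old file name -> new file name
    move_file: Callable[[str, str], None],  # hypothetical: moves one file
    pages_referencing: Callable[[str], Iterable[str]],  # hypothetical: pages using a file
    edit_page: Callable[[str, Dict[str, str]], None],  # hypothetical: one edit, many renames
    chunk_size: int = 100,
) -> None:
    """Sketch of the chunked move-then-update pattern (not the bot's real code)."""
    items = list(renames.items())
    for start in range(0, len(items), chunk_size):
        chunk = dict(items[start:start + chunk_size])

        # 1. Move every file in the chunk. The red-link window described
        #    under Issue 1 opens here and closes only after step 3.
        for old, new in chunk.items():
            move_file(old, new)

        # 2. Gather references and group this chunk's renames by page.
        per_page: Dict[str, Dict[str, str]] = {}
        for old, new in chunk.items():
            for page in pages_referencing(old):
                per_page.setdefault(page, {})[old] = new

        # 3. One edit per page updates every file of this chunk that it uses.
        for page, replacements in per_page.items():
            edit_page(page, replacements)
```

With `chunk_size=1`, a page that uses ten of the renamed files receives ten edits. With a single chunk covering the whole request, it receives one edit, but every file stays red-linked until the last page has been edited.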

Unfortunately, the bot faces 3 competing issues when editing, and improving one of them makes another worse.

Issue 1: Duration of temporary red links

Between the start of a chunk of moves, and the end of its reference updating, there is a window of time during which there are red links for all images in the chunk.

If the chunk is allowed to be the entire request, and the request is especially long (say 3000 images), this window can be a bit more than a day, and the probability that someone runs into a red link while viewing a page goes up with the request's size.

If the chunk is one image, then this window can be seconds to minutes, depending on how many pages reference an image, and will only be noticed by people viewing or editing a page referencing the one image being renamed.

When users who perform image maintenance edit a page, they may not know whether a red link comes from the bot's moves and pending reference updates or from a human mistyping a file name. They may also not know that the window is temporary and that the red links will likely be fixed at the end of the chunk. Users who read wiki pages see more and more red links on all pages, until the pages become unreadable because the images are missing.

Issue 2: Revision creation and number of pages

If a chunk is allowed to be the entire request, then the bot makes 1 edit per page, guaranteed, in which all of the images referenced on the page are updated at once. The number of pages, and therefore the number of edits, cannot exceed the number of pages listed at Special:Allpages, so the time spent making edits has a well-defined maximum. For example, if a request is 3,000 images and Special:Allpages lists 45,000 pages, then the worst-case request has to make 45,000 edits.

If a chunk is one image, then the number of edits to a page equals the number of renamed images on it (which may be above 10 for some pages). The number of pages edited for each image still cannot exceed the number of pages at Special:Allpages, but in the worst case this bound is multiplied by the number of images in the request. For example, if a request is 3,000 images and Special:Allpages lists 45,000 pages, then the worst-case request has to make 135,000,000 edits. (Using an average case of 10 renamed images on a page, that number is a more reasonable 450,000 edits, but that is still well above 45,000.)
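The edit counts above follow from a simple bound: within one chunk, a page is edited at most once. A request split into ⌈files / chunk size⌉ chunks therefore edits each page at most that many times. The Python sketch below shows this arithmetic. It assumes, as the worst case in the text does, that every page could reference every renamed image. The function names are illustrative only.

```python
import math

def worst_case_edits(num_pages: int, num_files: int, chunk_size: int) -> int:
    """Upper bound on page edits for a request.

    Each chunk edits a given page at most once, so the bound is the number
    of pages times the number of chunks the request is split into.
    """
    chunks = math.ceil(num_files / chunk_size)
    return num_pages * chunks

def one_at_a_time_edits(num_pages: int, renamed_images_per_page: int) -> int:
    """Edits when each image is handled separately and every page uses the
    same number of renamed images (one edit per image per page)."""
    return num_pages * renamed_images_per_page

print(worst_case_edits(45_000, 3_000, 3_000))  # 45000: whole request in one chunk
print(worst_case_edits(45_000, 3_000, 1))      # 135000000: one file per chunk
print(worst_case_edits(45_000, 3_000, 100))    # 1350000: chunks of 100
print(one_at_a_time_edits(45_000, 10))         # 450000: average case from the text
```

Under the same assumptions, the current 100-file chunks give a worst case of 1,350,000 edits, which falls between the two extremes.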

Adding more revisions clutters up the history for a page and increases the size of Wikia's database. It also makes it more likely that counter-vandalism is broken and that users will receive edit conflicts. For example, rollbackers may spuriously fail to roll back vandalism if an edit by RSW image renamer appears on the same page between the vandal's edit and theirs.

Issue 3: Total request time and other requestors

From issue 2, it follows that the time spent processing a request depends on the number of page edits, because the time spent moving images is the same whatever the chunk size.

If a chunk is allowed to be the entire request, then the bot makes fewer edits. A requestor's request is fulfilled more quickly.

If a chunk is one image, then the bot makes more edits. A requestor's request is fulfilled much more slowly.

A second person may place a request, not knowing that the bot will be busy for weeks, and some of the images in his/her request may get deleted or swapped with another image during the first person's request. For example, the first requestor may ask that File:Example.png be moved to File:Example old.png. The second requestor asks that File:Example old.png be moved to File:Example 2007.png to make way for other years, but unfortunately, 4 days later, another image is moved into File:Example old.png manually. The request is now going to move the wrong image.

The trade-off (Summary of issues)

So then, if a chunk is allowed to be the entire request, then the images in it will stay red links throughout the request, there will be 1 more revision in the history of each page referencing the image, and the request will be fulfilled quickly.

However, if a chunk is one image, then the images will stay red links for mere minutes, there will be on average 10 more revisions in the history of each page because it references 10 images that are being renamed, and the request will be fulfilled very slowly.

Chunks can also be anywhere between half of a request and 2 files. Half of a request would be closer to 1 chunk per request; 2 files would be closer to 1 file per chunk. I have chosen 100 after discussing it on user talk pages: User talk:Joeytje50/A14, User talk:Hofmic.

Proposals

1. The bot does all of a request in a chunk.
2. The bot does more than 100 files in a chunk. Discuss below.
3. The bot continues doing 100 files in a chunk.
4. The bot does fewer than 100 files in a chunk. Discuss below.
5. The bot does 1 file at a time.
6. The bot is shut down.

Discussion

Neutral as thread creator.  a proofreader ▸  21:54, March 13, 2012 (UTC)

Neutral - As thread reader. --LiquidTalk 21:57, March 13, 2012 (UTC)

But seriously, I don't really see that big of an issue with the current method. I'm fine with waiting about a day for all the files to update. --LiquidTalk 21:57, March 13, 2012 (UTC)

The bot does 1 file at a time - The above discussion is not really accurate. In practice, "there is a window of time during which there are red links for all images in the chunk" means redlinks left for hours. I don't think I've seen one as old as a day, but is it really okay to have redlinks for a day? The wiki is a live system with continuously active users. No redlinks is a have-to-have. There is no maintenance window where admins can take the wiki down and run day-long jobs on it. Having files end in png instead of PNG is a nice-to-have. They already work with PNG. Between have-to-have and nice-to-have, have-to-have should always win. --Saftzie (talk) 22:15, March 13, 2012 (UTC)

"PNG to png" is an issue with the request, not the bot. Later on, the bot will make name changes for RS:IMG#NAME, such as moving File:Example-wielded-detail.png to File:Example equipped.png.  a proofreader ▸  22:23, March 13, 2012 (UTC)
I don't think the reason for the rename matters so much. Making all the names conform to a convention is a reasonable task. The bot, however, should go about the task in a reasonable manner, whether it's one file or every file in the wiki. --Saftzie (talk) 07:57, March 14, 2012 (UTC)
My small reply was only to your "nice-to-have versus have-to-have" comment. It was not meant to invalidate your issue with the bot or anything; it is quite valid.  a proofreader ▸  08:02, March 14, 2012 (UTC)
List of books has been like that for a couple of days now. As far as I can tell it edited the links, but moved the files to another name entirely. cqm talk 00:32, March 14, 2012 (UTC)
I'll look into the list of books soon. Maybe a log entry shows the bot failing to edit that page. That's the kind of bug I want to know about, so thanks!  a proofreader ▸  00:47, March 14, 2012 (UTC)
The bot has never failed to edit List of books. However, as you said on IRC, it did edit the Clockwork book.PNG into Clockwork Book.png instead of Clockwork book.png in List of books.  a proofreader ▸  02:07, March 14, 2012 (UTC)
The request that contained Clockwork book.PNG also included a rename of Book.PNG to Book.png. This file's name appears inside Clockwork book.PNG once the first character of Book.PNG is matched case-insensitively (File:book.PNG refers to the same file), so Clockwork book.PNG was renamed to Clockwork Book.png. This is a very strong argument for replacing files one at a time in pages.  a proofreader ▸  02:13, March 14, 2012 (UTC)
Scratch that, it's just a special case of the "only require punctuation around the file name" code. File:Someother book.png is embedded as File:Someother book.png in that page, but could just as easily be embedded as Someother book.png in another page's template invocation, like {{Infobox item}}, and I only have the {{ }} and = to go along with those. I can't really fix that, unless I make more code to look at context. The bot then becomes AI, and AI is not my cup of tea.  a proofreader ▸  03:17, March 14, 2012 (UTC)
Scratch that too, {{Infobox Item}} takes full image links. Is there anything on the wiki that does not take full image links? Drop line maybe?  a proofreader ▸  03:50, March 14, 2012 (UTC)
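The Clockwork book.PNG problem can be reproduced with a short Python sketch. Following the description above, it assumes the bot accepts a file name whenever it is surrounded by punctuation or whitespace, and that it treats the first letter case-insensitively. The `loose_rename` and `prefixed_rename` names are hypothetical. The stricter variant assumes every reference is a full `File:` or `Image:` link, which is the open question raised in the comment above.

```python
import re

def name_pattern(name: str) -> str:
    """Regex for a file name whose first letter may be either case,
    since File:book.PNG and File:Book.PNG refer to the same file."""
    first, rest = name[0], re.escape(name[1:])
    return f"[{re.escape(first.upper())}{re.escape(first.lower())}]{rest}"

# Assumed loose boundaries: whitespace, '=', '|' or ':' before the name;
# whitespace, '|', ']' or '}' (or end of text) after it.
LOOSE_BEFORE = r"(?<=[\s=|:])"
LOOSE_AFTER = r"(?=[\s|\]}]|$)"

def loose_rename(text: str, old: str, new: str) -> str:
    """Hypothetical: replace any punctuation-delimited occurrence of old."""
    pattern = LOOSE_BEFORE + name_pattern(old) + LOOSE_AFTER
    return re.sub(pattern, lambda m: new, text)

def prefixed_rename(text: str, old: str, new: str) -> str:
    """Hypothetical: only replace names that directly follow File: or Image:."""
    pattern = r"((?:File|Image):[ _]*)" + name_pattern(old) + LOOSE_AFTER
    return re.sub(pattern, lambda m: m.group(1) + new, text)

page = "[[File:Clockwork book.PNG]]\n[[File:Book.PNG]]"
print(loose_rename(page, "Book.PNG", "Book.png"))
# [[File:Clockwork Book.png]]   <- the wrong file was changed
# [[File:Book.png]]
print(prefixed_rename(page, "Book.PNG", "Book.png"))
# [[File:Clockwork book.PNG]]
# [[File:Book.png]]
```

The prefixed variant would miss template parameters that hold a bare file name, such as the {{ }} and = cases mentioned above. It is therefore only safe if no template on the wiki takes bare file names.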
I've come across a couple of instances where the .PNG part was embedded into the template. It seems to be a small issue though.
I think I've tracked down another issue with userboxes. Whilst the bot edits the userbox itself, it doesn't edit userboxes that are grouped together, such as Template:Userbox/guild. You can normally spot them from the relevant usage instruction page, in this case Runescape:Userboxes/Guild membership. cqm talk 19:22, March 14, 2012 (UTC)

Users who perform image maintenance - Take a look at Special:WantedFiles. 4-5 days ago it was at 34, and it currently stands at 203. I stopped manually changing the files when Joey essentially swore blind that the bot makes all the necessary changes automatically, at which point I thought I'd leave it to see what happens. 4-5 days later and... well, it's kind of self-explanatory.

On a side note, is there a system for identifying pages that use the file but have not been changed because of spam filters? Take RuneScape:User of the Month/August 2007, which I manually changed a day after File:Scythe.png was moved; the file which I missed had still not been altered until I noticed it just now. I assume the VSTF spam filter blocked the bot's edits because of an existing symbol on the page. Judging by the lack of edits for 2 years before mine, I guess that symbol dates from the original discussion, and it stopped the bot from completing the task. cqm talk 00:18, March 14, 2012 (UTC)

If it's at 203, and Hofmic's request has had about 800 images done now, that means the additional 169 images are from my interruption of Hofmic's request because it was doing them all in 1 chunk (and redlinking for days). After that, its behaviour was correct. It was interrupted in alphabetical order at F, so if any image later than F is in Special:Wantedfiles, it's still buggy. That's the kind of bug I want to know about, so thanks! I'll watch this more closely for a bit.
Regarding the VSTF spam filter and that page, it should put a failure message on the requestor's talk page saying something like the following:

References to some files in your request could not be updated in pages. Below is a list of failures.

  • File:Example.png could not be updated to point to File:New example.png on:
    • Spamfiltered page (org.mediawiki.MediaWiki$MediaWikiException: hookaborted: The modification you tried to make was aborted by an extension hook)

 a proofreader ▸  00:47, March 14, 2012 (UTC)
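A failure report in the format above could be built from a mapping of failed renames to the pages that could not be edited. The Python sketch below shows one way to do this. The data layout and the `format_failures` name are assumptions, and the wikitext list markup (`*`, `**`) may differ from what the bot actually posts.

```python
from typing import Dict, List, Tuple

# Assumed layout: (old file, new file) -> [(page title, error message), ...]
Failures = Dict[Tuple[str, str], List[Tuple[str, str]]]

def format_failures(failures: Failures) -> str:
    """Build a talk-page message in the style quoted above (hypothetical helper)."""
    if not failures:
        return ""
    lines = [
        "References to some files in your request could not be updated "
        "in pages. Below is a list of failures.",
        "",
    ]
    for (old, new), pages in failures.items():
        # [[:File:...]] links to the file page instead of embedding the image.
        lines.append(f"* [[:File:{old}]] could not be updated to point to "
                     f"[[:File:{new}]] on:")
        for title, error in pages:
            lines.append(f"** [[{title}]] ({error})")
    return "\n".join(lines)

report = format_failures({
    ("Example.png", "New example.png"): [
        ("Spamfiltered page",
         "hookaborted: The modification you tried to make was aborted "
         "by an extension hook"),
    ],
})
print(report)
```

The report could be posted once at the end of the request. It could also be posted after each chunk, covering only that chunk's failures, which would provide the interim reporting suggested further down the thread.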
Hmm. There's an odd Wikia bug. Special:Wantedfiles is saying that File:Multicombat.PNG has 237 links, but following through to Special:Whatlinkshere/File:Multicombat.PNG gives one link in a possibly-protected page. Some more are correctly "wanted", though...  a proofreader ▸  01:18, March 14, 2012 (UTC)
That's because the wanted files is a cache from ~5.00am UTC of whatever day we're on. If the wanted files get removed after that time then the cache will get updated the following day to reflect that. Images used in templates also cause some issues with what links here, so take what you see with a pinch of salt. Purging the page clears the wanted file (providing you've altered the template correctly), but won't take it off what links here. It's confusing, I know. cqm talk 01:24, March 14, 2012 (UTC)


Support option 3: One hundred files per chunk - I think we've found the balance here. 100 files should not have red links for more than 10 minutes, while still being fairly reasonable in terms of overall speed and number of edits. To those who are watching the number of wanted pages, the count is due to Wikia's horrid caching, so if the count was created in the middle of a request, there could be thousands of "wanted pages" even though by the time anyone notices, they could easily be fixed. That's a limitation of Wikia's software, not the bot, and applies to manual moves as well (no sane person would want to manually move a file that appears on a hundred pages, though).

One limitation is that humans won't learn of errors in moving until the bot has finished the entire request and posted on the requester's talk page. Perhaps the bot could also maintain another list (on a subpage of its own user page?) that it updates every time it finds an error. Otherwise, if the link to image A can't be fixed and the bot still has two days of moving ahead of it, we won't actually receive that error message until the bot is totally done. Alternatively, errors could be reported on the requester's talk page after every chunk of 100, with success reported only once the entire request is complete.

One possibly far too difficult, time-consuming or ambitious option would be to create chunks based on the number of pages that must be edited (say, 250), rather than the number of files to move. If the requested images each appear on a large number of pages, the bot's overall speed would still be too slow. For the far more common images that appear on one or two pages, though, it could be an effective way of balancing red-link time against overall time. Ideally, this would work by creating a chunk, checking how many pages an image appears on, and adding the image to the chunk while also adding those pages to a list. This would continue until the list of pages is equal to or greater than the chosen number (e.g. 250). This system is ambitious, but it could prevent a chunk in which every image appears on more than 100 pages (in a worst-case scenario, one hundred images each appearing on 100 pages could need up to 10,000 edits).

At any rate, for now, the current 100-images-a-chunk is the best balance of all the options, as 1 at a time is far too slow overall and more than 100 can cause a large number of red links to be left for too long. Hofmic Talk 02:20, March 14, 2012 (UTC)
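The page-count chunking proposed in the comment above could work greedily. Files are added to the current chunk while the set of pages they appear on is collected, and the chunk is closed once that set reaches the chosen budget (250 in the example). The minimal Python sketch below takes the reference lists as input. `chunks_by_page_budget` is a hypothetical name. In practice, references would have to be gathered before splitting, which is the resuming problem raised in the reply below.

```python
from typing import Dict, Iterable, List, Set

def chunks_by_page_budget(
    usage: Dict[str, Iterable[str]],  # file name -> pages that reference it
    budget: int = 250,
) -> List[List[str]]:
    """Hypothetical greedy split: close a chunk once the number of distinct
    pages it references is equal to or greater than the budget.

    A single heavily used file can exceed the budget on its own and then
    forms a chunk by itself.
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    pages: Set[str] = set()
    for name, referencing in usage.items():
        current.append(name)
        pages.update(referencing)
        if len(pages) >= budget:
            chunks.append(current)
            current, pages = [], set()
    if current:  # leftover files that never reached the budget
        chunks.append(current)
    return chunks

usage = {
    "Big.png": ["P1", "P2", "P3", "P4"],
    "A.png": ["P5"],
    "B.png": ["P5", "P6"],
    "C.png": ["P7"],
}
print(chunks_by_page_budget(usage, budget=3))
# [['Big.png'], ['A.png', 'B.png', 'C.png']]
```

Because pages are counted once per chunk, files that share pages (A.png and B.png above) add little to the budget. This matches the goal of making one edit per page per chunk.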

The idea of keeping a log of failures as a subpage edited every so often is nice.
I also like the idea of making chunks based on the number of referencing pages, but that could complicate task resuming when the bot or computer is killed: the request needs to gather references first, then make the splits, instead of the reverse. All of the splits could be lost that way. It could make requestors unsure of what's happening and request again, not knowing if the request data is lost on the bot or if the request is just lagging.  a proofreader ▸  02:38, March 14, 2012 (UTC)
Chunks of 100 are still keeping redlinks around for hours. For a bot, minutes would be unacceptable. Hours should be completely out of the question for a live system. --Saftzie (talk) 04:37, March 14, 2012 (UTC)
Just to clarify, when I say there are redlinks for hours, I'm not looking at a list of wanted pages. I see a redlink, then I look at when the original target was moved. And it really has been hours. You say it shouldn't be more than 10 minutes, but yet it is. --Saftzie (talk) 08:06, March 14, 2012 (UTC)
An example: The bot renamed the image File:Neck slot.png at 05:06, 13 March 2012. At 18:29, 13 March 2012, I manually edited Template:Infobox Bonuses to use the new name, because the bot hadn't yet. That's 13 hours, 23 minutes. --Saftzie (talk) 09:11, March 14, 2012 (UTC)
Not forgetting the use of it on Template:Equipment, among 7 other files, which I corrected 20 hours after Saftzie corrected the other use mentioned above. I would agree 100 is not a good thing if it does indeed take hours for the bot to correct every link. For what it's worth, I originally thought the bot didn't correct templates because it took so long to update them. cqm talk 18:06, March 14, 2012 (UTC)
Regarding Proof in chat and the reply below, it appears that the red links in question are not related to the number of files per chunk. Instead, they come from the fact that there's no known way to tell whether an image is used conditionally in a template (I blame Wikia). However, I stand by my support, and doubt we'll often have to do moves like this. The few places where images aren't moved would have the same result if I moved them manually, and in the future there'll likely be far fewer uppercase extensions (since we've eliminated a good several thousand). Hofmic Talk 02:36, March 15, 2012 (UTC)
Focusing on the blame is not the thing to do. What you're saying otherwise, though, is that the bot should continue to run as it is, because of all the times it hasn't failed, because it only fails some of the time. I think the bar needs to be higher. I support the idea of what this bot is supposed to be doing, but, for whatever the reason, it's not free of errors yet. --Saftzie (talk) 03:34, March 15, 2012 (UTC)
<tab reset> Good point. But what do you suggest we do about "error management"? Even when doing one move followed by all of its edits, there are still errors. I think a "live" list of errors, kept as the bot goes along in addition to posting them all at the end of the task, would help with the catchable errors. The aforementioned template issue would not be covered by that, though (I wonder if those would appear in wanted files, even though they don't appear in WhatLinksHere). Hofmic Talk 03:48, March 15, 2012 (UTC)

Support 5 - It would be most beneficial to all of us if the bot moved one file, fixed all of its links, then repeated that for all the other files. That is how many editors used to do it when we were manually moving files. This would be slightly slower, but would mean maintenance users would not have to waste time checking for false positives and/or interrogating users about whether the files are really red-linking everywhere. 222 talk 05:09, March 14, 2012 (UTC)

Comment - The issue with images on templates, transcluded according to the value of a parameter, continues to bite. Template:Infobox Bonuses was completely ignored by the bot, despite containing an instance of File:Weapon slot.png (formerly File:Weapon slot.PNG), because it gets transcluded on a page only if slot=weapon. These conditional references do not appear in the API.  a proofreader ▸  02:22, March 15, 2012 (UTC)

Is there any way we can know when it fails? Is there any way to find out how many times it's already failed that hasn't been caught by someone manually? --Saftzie (talk) 03:34, March 15, 2012 (UTC)
The only way to know when the bot fails is to examine pages. The API, Special:Whatlinkshere and Special:Wantedfiles are all broken if one of them is, which is what's happening right now for those files (except File:Whatever Book.PNG; that's a case-insensitive first-letter match bug). As for the failures that leave a trace in Special:Wantedfiles, those are usually caught in a second pass through the pages by the bot, but can be verified at Special:Wantedfiles. Some of them can also appear on the requestor's talk page. A few people have caught some pages that the bot couldn't edit due to an abuse filter that way.  a proofreader ▸  21:21, March 25, 2012 (UTC)

Support 5 - Individual jobs are most reliable and considering bots don't spam Recent Changes, there's no real downside in the first place. Smuff [citation provided] 20:37, March 15, 2012 (UTC)

Is taking, say, ten times longer a "real" downside? Do you have any idea how long the extension moves took? I made the request 23:51, 11 March 2012 and it was posted as completed 23:47, 14 March 2012. That's almost three full days. If each image appears on an average of ten pages, it could take ten times longer (one month!) to complete the request. Images that appear on a hundred pages? Let's not even start. Hofmic Talk 00:40, March 16, 2012 (UTC)
Also, that was only 2905 files. In a hypothetical request with 8000+ images as Edmyg considers below, it could take considerably longer. And while the bot is handling a long request, NO other requests can be handled, so if someone wants to move their own file that appears on 20 pages (because nobody wants to do that manually, really), they'll have to wait for the bot to finish, and waiting a month seems completely impractical. Hofmic Talk 00:45, March 16, 2012 (UTC)
If doing batch moves takes a while, then it takes a while. It's more important to retain the integrity of the wiki. Also, if someone needs a single file moved, they can post on RS:AR. --Saftzie (talk) 01:21, March 16, 2012 (UTC)

Support 4 - Doing it one at a time seems like a somewhat pointless exercise for the moment as the bot is designed to make this easier. I'd be happy with requests of 10, considering how many pages the images could be used on. I should stress these would be temporary measures, but with people putting in 8000+ images to be moved in one go, you cannot seriously expect there not to be problems. No disrespect to Proof, but every program beyond a certain complexity has bugs in it, it just takes time to find them all. cqm talk 21:17, March 15, 2012 (UTC)

To clarify, due to the below, I support a low number, and would be fine with one at a time. cqm talk 23:37, March 24, 2012 (UTC)

Comment of being annoyed - Everyone is supporting something different. We need some consistency if this has any hope of being closed properly. sssSp7p.pngIjLCqFF.png 15:58, March 24, 2012 (UTC)

To be honest, I am currently caring less and less about the choice provided we can actually get this closed. The inability to use the bot while the thread is open is worse than doing one at a time (heck, doing one at a time for the entire duration that the thread was open could probably finish off an entire request of the remaining files). I still stand by my opinion that a small number, be it one or ten, is incredibly inefficient and could cause tasks to take far too long, but would rather see this thread closed with any resolution than leaving the bot inactive while we (very slowly) squabble. Hofmic Talk 03:13, March 25, 2012 (UTC)

Support 5 - Brains nailed it perfectly. User:Exor Solieve 21:13, March 25, 2012 (UTC)

Support 4/5 - Anywhere from 1-20 would be fine with me; 100 is still too many. I prioritise the function of the wiki over the bot's operation. The negative effects the bot has with larger chunks (images off the page for a while, editor confusion) are not satisfactory. Smaller chunks have disadvantages too, but they are not visible to an average user reading an article. --Henneyj 16:50, March 26, 2012 (UTC)

Support 5 - Correct me if I'm wrong, please. But in the grand scheme of things, wouldn't doing a large request one file at a time take the same amount of time as doing it in chunks? If so, #5 would be the logical choice, as it would keep our face pretty. sssSp7p.pngIjLCqFF.png 19:32, March 26, 2012 (UTC)

I think the argument in favor of batches is that a page that includes, for example, 10 renamed images would be edited ten times, once for each image, instead of once for all ten. It's true that ten edits would take longer than one. One post (not here, but elsewhere) asserted that a recent job that took a day as a batch would have taken a month one-by-one. Even if the assertion is accurate, I'd say "So what?" Consistency is important, and a bot is still faster (and more tireless) than a human. --Saftzie (talk) 20:51, March 26, 2012 (UTC)
Suppose we renamed, say, 1000 images, then went through Special:AllPages (which I believe is roughly how the bot works), correcting each use of any of the 1000 images. I will admit that would take significantly less time than renaming one at a time and scanning through all the pages as each was renamed. It is worth noting that Special:WantedFiles is a cache, and thus inaccurate most of the time. The bot also cannot find conditional uses of any image, templated or otherwise, or images used in some complex templates. Then there's the point that purging the page may or may not get rid of the redlink. Overall, I can't see how the outcome of this will make any difference. The images used in GE pages are the most widespread, and in my experience it was rare to catch them midway through a renaming spree, although I am not all users and the wiki is accessed 24 hours a day.
The bot creates much more work for those who perform image maintenance. However, most of the corrections we need to make are to signatures and rarely used templates, which makes it debatable how absolutely necessary the job is. Saftzie is indeed correct that, with a database like ours, redlinked images that persist for minutes are unacceptable. Still, when the bot runs 24/7 it is very difficult to say whether it has failed, or indeed how long ago the move was. All I can say is that the existence of the bot makes us put more work into something we previously did not have to do, which kind of defeats the point of having a bot. cqm talk 10:22, March 27, 2012 (UTC)


This request for closure is complete. A user has requested closure for RSW image renamer: Tweaks. The reason given was: Discussion has not taken place for 14 days.  a proofreader ▸  07:35, April 10, 2012 (UTC)

Closed - RSW image renamer will go with option 5, one file at a time. svco4bY.png3Gf5N2F.png 16:03, April 10, 2012 (UTC)