User:A proofreader/Adventurer's Log Gathering Project

From the RuneScape Wiki, the wiki for all things RuneScape

This project has concluded.

The project has accumulated 303,465 Adventurer's Log feeds from 28 August 2012 to 18 September 2012. The corpus is available for download here, as well as information resulting from data-mining here. Monster drops are listed on a separate page.

This page is kept as an archive of the introduction for the project, in case it is needed again. Please do not edit it.

Cook Me Plox's Adventurer's Log Gathering Project aims to download all of the public Adventurer's Logs on RuneScape to mine data in them for the good of the wiki.

Jagex has been known to temporarily ban people who request too many pages in a row from its website. Implementing a 30-second cooldown between requests to stay clear of this would mean that the downloading alone would take a single computer over a year. However, this restriction is per IP address, so it is possible, with some crafty distributed computing, to have a proofreader's computer tell yours what to download!
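The arithmetic behind that estimate can be sketched as follows. The figure of 1,500,000 names and the count of 25 volunteers are hypothetical values chosen for illustration; this page does not state how many names had to be checked or how many people took part.

```python
import math

COOLDOWN_SECONDS = 30      # request cooldown described on this page
SECONDS_PER_DAY = 86_400


def days_to_download(total_names: int, clients: int = 1) -> float:
    """Days needed if every client (one per IP address) makes one request per cooldown."""
    requests_per_client = math.ceil(total_names / clients)
    return requests_per_client * COOLDOWN_SECONDS / SECONDS_PER_DAY


# One request every 30 seconds is 2,880 requests per day per IP address,
# so a single computer can cover 1,051,200 names in a 365-day year.
names_per_year = 365 * SECONDS_PER_DAY // COOLDOWN_SECONDS

# Hypothetical workload (an assumption, not a project figure).
print(days_to_download(1_500_000))       # a single computer
print(days_to_download(1_500_000, 25))   # 25 volunteers on separate IP addresses
```

Because the restriction is per IP address, the time needed shrinks roughly in proportion to the number of participating addresses.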

Introduction

What?

This Adventurer's Log Gathering Project aims to download all of the public Adventurer's Logs on RuneScape to mine data in them for the good of the wiki.

Initially, the data will be used to establish the drop ratios between the various sigils. After this, various drop rates could be extracted from boss kills reported on the Adventurer's Log. Other data could feasibly be extracted from these logs later as well.

Why?

Hard data is always better than speculation. With this project, the wiki would have more hard data to back up drop ratios. This body of evidence could be cited as "Adventurer's Log data, gathered between 28 August 2012 and to be determined".

When?


Who?

Cook Me Plox for the idea, and a proofreader for the coding. But also, all of you who are reading this page and want to help!

How?

The short answer is that you will run a program that connects to a proofreader's computer, which tells yours what to do. Your computer will download the assigned Adventurer's Logs automatically for as long as you leave its command prompt window open, and will stop if you press Ctrl+C to interrupt the program.

The long answer is that, using distributed computing, your computer will connect to a proofreader's computer to receive a list of Adventurer's Logs to download, download them, and send them back to that computer in compressed form in exchange for another assignment. This continues until the program is stopped, either by closing the command prompt or by pressing Ctrl+C. In this way, a proofreader will end up with all public Adventurer's Logs, to be zipped to form the final corpus.
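The cycle described above can be sketched roughly as follows. This is an illustration only, not the project's client (which is compiled with the JDK; see /Source). The function names, the tab-separated record layout and the use of zlib are all assumptions made for the example, and the network calls are passed in as parameters so the loop can be read and tried without contacting any server.

```python
import time
import zlib
from typing import Callable, List, Optional

COOLDOWN_SECONDS = 30  # pause after each request to Jagex, as described on this page


def download_batch(
    names: List[str],
    fetch_feed: Callable[[str], Optional[bytes]],
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Download every assigned log and return the batch in compressed form."""
    records = []
    for name in names:
        feed = fetch_feed(name)  # None stands for a private or missing log (HTTP 404)
        body = feed if feed is not None else b""
        # Hypothetical record layout: "name<TAB>length<NEWLINE>" followed by the feed.
        header = f"{name}\t{len(body)}\n".encode("utf-8")
        records.append(header + body)
        sleep(COOLDOWN_SECONDS)
    return zlib.compress(b"".join(records))


def run_client(
    get_assignment: Callable[[], List[str]],
    fetch_feed: Callable[[str], Optional[bytes]],
    submit: Callable[[bytes], None],
    sleep: Callable[[float], None] = time.sleep,
    max_batches: Optional[int] = None,
) -> int:
    """Repeat: receive names, download them, send them back. Returns batches done."""
    completed = 0
    while max_batches is None or completed < max_batches:
        names = get_assignment()
        submit(download_batch(names, fetch_feed, sleep))
        completed += 1
    return completed
```

With `max_batches` left at `None` the loop runs until the process is interrupted, which corresponds to closing the window or pressing Ctrl+C.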

The distributed program

Instructions to run the program can be found in /Source.

You can join the IRC channel to request a proofreader's IP address, which is needed to run the program.

Is this safe?

You can, of course, audit the source code before compiling it. A proofreader does not guarantee anything regarding it, including, but not limited to, implied guarantees of merchantability and fitness for a general or particular purpose. However, the code should be safe enough from cracking.

The program connects only to Jagex's website and the work distribution server.

What's the resource usage?

  • Hard drive usage is as much as the Java Development Kit plus the size of source and compiled files. As of JDK 7 Update 6, that would be 200 MB + 50 KB.
  • Memory usage is as much as a regular Java process. As of JDK 7 Update 6, that would be about 100 MB.
  • CPU usage should be under 2% at all times on most processors.
  • Bandwidth usage is about 50 megabytes per day, broken down as follows:
    • Every 30 seconds, an upload of 512 bytes to Jagex to request an RSS feed, and a download of up to 28 KB for the response. About half of the feeds are private or belong to free players and return an HTTP 404 instead, which is a download of up to 1 KB.
    • Every 3 minutes, an upload of up to 12 KB to a proofreader's computer to return results for a batch of 6, and a download of 76 bytes for an assignment.
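The daily bandwidth figure can be checked from the per-request sizes in the list above. The sketch below takes every "up to" size at its maximum and treats exactly half of the requests as HTTP 404 responses, so it gives an upper estimate; taking 1 KB as 1,024 bytes is an assumption.

```python
KB = 1024
SECONDS_PER_DAY = 86_400

requests_per_day = SECONDS_PER_DAY // 30        # one feed request every 30 seconds
batches_per_day = SECONDS_PER_DAY // (3 * 60)   # one batch of 6 every 3 minutes

request_upload = requests_per_day * 512                  # requests sent to Jagex
feed_download = (requests_per_day // 2) * 28 * KB        # half return a full feed
not_found_download = (requests_per_day // 2) * 1 * KB    # half return an HTTP 404
result_upload = batches_per_day * 12 * KB                # results returned to the server
assignment_download = batches_per_day * 76               # new assignments received

total_bytes = (
    request_upload
    + feed_download
    + not_found_download
    + result_upload
    + assignment_download
)
print(f"{total_bytes / 1_000_000:.1f} MB per day")  # roughly 50 MB
```

Feed downloads account for about four fifths of the total; traffic to and from the work distribution server is comparatively small.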

Can I run the client more than once?

You could, but your requests would then be more likely to be banned by Jagex.

Can I run the client on more than one IP?

You can start a client, telling it which IP addresses you want to use to make requests to Jagex. This is only useful if you have multiple public IP addresses on one computer, like on a dedicated server. See /Source#Running the client for more information.

However, if you have multiple computers on your home network and they share the same public IP address, you should not run this program on more than one computer. To check your public IP address from your computers, go to [1].
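For the dedicated-server case, choosing which local address a request leaves from can be sketched with Python's standard library. This is only an illustration of the idea; the actual client's options are documented in /Source, and the host name and addresses in the usage below are placeholders.

```python
import http.client
import itertools
from typing import Iterable, Iterator


def rotating_connections(
    host: str, local_ips: Iterable[str]
) -> Iterator[http.client.HTTPConnection]:
    """Yield unopened HTTP connections, cycling through the given local addresses."""
    for ip in itertools.cycle(list(local_ips)):
        # Port 0 lets the operating system pick any free local port.
        yield http.client.HTTPConnection(host, source_address=(ip, 0))


# Placeholder host and documentation-range addresses, for illustration only.
pool = rotating_connections("example.invalid", ["192.0.2.1", "192.0.2.2"])
first, second, third = next(pool), next(pool), next(pool)
print(first.source_address, second.source_address, third.source_address)
```

Nothing is sent until a request is made on one of the connections. With n addresses, pausing 30/n seconds between consecutive requests keeps each address at one request per 30 seconds.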

Can people submit forged data?

Of course. There is an element of trust.

A user could modify the program to replace the compressed RSS feeds with huge files that compress to under 10 KB, or garbage files of any size, or valid RSS feed data with obviously-changed Adventurer's Log events, or valid RSS feed data with subtly changed or rearranged Adventurer's Log events.

It is up to the server operator to place trust in certain users.
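The first two kinds of forgery (oversized or garbage uploads) can at least be rejected mechanically; altered but well-formed feeds cannot. Below is a minimal sketch of a size-capped decompression, assuming the uploads are zlib streams, which this page does not specify. The limit of 6 feeds of 28 KB each is derived from the batch size and maximum feed size given above.

```python
import zlib

MAX_BATCH_BYTES = 6 * 28 * 1024  # 6 feeds of up to 28 KB each


def safe_decompress(blob: bytes, limit: int = MAX_BATCH_BYTES) -> bytes:
    """Decompress an upload, refusing garbage and anything larger than limit."""
    decompressor = zlib.decompressobj()
    try:
        # Ask for at most limit + 1 bytes so an oversized stream is detected
        # without ever expanding it fully in memory.
        data = decompressor.decompress(blob, limit + 1)
    except zlib.error as exc:
        raise ValueError("upload is not valid compressed data") from exc
    if len(data) > limit:
        raise ValueError("upload expands beyond the allowed size")
    if not decompressor.eof:
        raise ValueError("upload is truncated")
    return data
```

A check like this protects the server's memory and disk only. It says nothing about whether the events inside a valid feed are genuine, which is where the element of trust remains.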

What happens if...

... I turn my computer off?

You can turn your computer off without any impact on this project, even if you don't close the command prompt window first. The names whose Adventurer's Logs you were assigned to download will simply be passed to another computer after 15 minutes.
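One way a server could implement this hand-over is to attach a 15-minute deadline to every assignment and return the names to the queue once the deadline passes. The sketch below shows that idea only; the class and method names are hypothetical and it is not the project's server code.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List, Tuple

LEASE_SECONDS = 15 * 60  # reassignment delay described above


class AssignmentQueue:
    """Hands out batches of names and reclaims those not returned in time."""

    def __init__(
        self,
        names: Iterable[str],
        batch_size: int = 6,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._pending: Deque[str] = deque(names)
        self._leases: Dict[int, Tuple[float, List[str]]] = {}
        self._batch_size = batch_size
        self._clock = clock
        self._next_id = 0

    def _reclaim_expired(self) -> None:
        now = self._clock()
        for lease_id, (deadline, batch) in list(self._leases.items()):
            if deadline <= now:
                del self._leases[lease_id]
                # Put the names back at the front, keeping their original order.
                self._pending.extendleft(reversed(batch))

    def assign(self) -> Tuple[int, List[str]]:
        """Return (lease id, names); the list is empty when nothing is pending."""
        self._reclaim_expired()
        count = min(self._batch_size, len(self._pending))
        batch = [self._pending.popleft() for _ in range(count)]
        lease_id = self._next_id
        self._next_id += 1
        if batch:
            self._leases[lease_id] = (self._clock() + LEASE_SECONDS, batch)
        return lease_id, batch

    def complete(self, lease_id: int) -> bool:
        """Accept results only if the lease is still live."""
        self._reclaim_expired()
        return self._leases.pop(lease_id, None) is not None
```

A client that is switched off never calls `complete`, so its names go to whichever client next asks for work after the deadline.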

... my Internet connection cuts me off in the middle of communicating with the server?

This has no impact on the project. Your client will hang for up to 45 seconds, then retry connecting normally.

... my Internet connection cuts off for a few minutes?

This has no impact on the project. Your client will get connection timeout errors and try to reconnect every minute.
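The behaviour in the two answers above resembles a socket timeout combined with a fixed retry interval. The sketch below shows that pattern, reading the 45 seconds as a socket timeout and the one-minute wait as a sleep between attempts; both readings are assumptions, and the function name and parameters are made up for the example.

```python
import socket
import time
from typing import Callable, Optional, Tuple

SOCKET_TIMEOUT_SECONDS = 45   # how long a stalled connection may hang
RETRY_INTERVAL_SECONDS = 60   # wait between reconnection attempts


def connect_with_retry(
    address: Tuple[str, int],
    connect: Callable = socket.create_connection,
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: Optional[int] = None,
):
    """Keep trying to reach the server, pausing a minute after each failure."""
    attempt = 0
    while True:
        attempt += 1
        try:
            # The timeout also applies to later reads and writes on the socket.
            return connect(address, timeout=SOCKET_TIMEOUT_SECONDS)
        except OSError:  # includes timeouts and refused or unreachable hosts
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(RETRY_INTERVAL_SECONDS)
```

With `max_attempts` left at `None` the client simply keeps retrying until the connection comes back.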

... my Internet connection cuts off for a few hours?

No impact on the project. Your assignment, if any, will have been given to another computer to download, and yours will get more names when it reconnects.

Where are stats?

There are no stats. The server knows only IPs, and having a top contributor list by IP would create privacy and security issues.

What's your privacy policy?

Connecting to the work distribution server gives it your IP, which is logged in its console for monitoring. The server operator can use these IPs to look at the number of users currently submitting returns, problems with the server's connections or the clients' connections, as well as users submitting more than once per 30 seconds from the same IP (see #Can I run the client on more than one IP?). A proofreader will not disclose these IPs in their entirety; only the first byte will be disclosed, for identification purposes.
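Disclosing only the first byte can be illustrated as follows. The `x` placeholders and the function name are choices made for this example, not a format used by the project, and the address shown is from a range reserved for documentation.

```python
import ipaddress


def mask_ip(address: str) -> str:
    """Keep only the first byte of an IPv4 address for public identification."""
    ip = ipaddress.ip_address(address)
    if ip.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    first_byte = ip.packed[0]
    return f"{first_byte}.x.x.x"


print(mask_ip("203.0.113.7"))  # prints 203.x.x.x
```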