Forum:DATA namespace

From the RuneScape Wiki, the wiki for all things RuneScape
Archive
This page or section is an archive.
This thread was archived on 14 June 2020 by Liquid.

Whether it's charm drops, jadinko seeds, birds' nests, pickpockets or soil screening, RuneScape is filled with processes that are essentially weighted random samples. At various times, the wiki has taken an interest in crowd-sourcing data about these processes, with an eye towards displaying our best guess of the underlying rates.

In the past, we've done this in a hodgepodge of different ways:

  • Charm logs – Starting in 2009, charm logs were initially at (e.g.) Abyssal demon/Charm log and the submitter needed to add their results to the running totals, which were heavily vandalized. In late 2010, we moved the aggregate amounts to Charm:Abyssal demon, kept the submissions at /Charm log, and had a bot (that ran like once a week, and did some statistical tests) move the results over to the Charm page. This was replaced in 2015 with an entirely JavaScript-based system where you can submit data from the monster page, and depending on how far away it is from the existing data, we either disallow it, tag it, or allow it freely. Charm log submissions are fairly uncommon now – we get about one per day.
  • Herblore Habitat seeds – We had a thread in 2011 that resulted in the creation of subpages (e.g. Draconic jadinko/Data) for tracking seed drop rates for the jadinkos. This still essentially uses the original charm log formulation, where you need to add your numbers to the running totals, and it's not used very much. (Amusingly this thread also proposed a Data namespace, but it was not seriously considered, probably because Liquid proposed it).
  • Bird's nests – We have these at Bird's nest/Mole nest log (and others) and they work similarly to jadinko seeds.
  • Soil screening – This is all the rage now because of Archaeology, but we've mostly collected the data off-site (on Google forms) because the distribution depends on your Archaeology level.
  • Miscellaneous – We have tons of other one-off data logs on the wiki, like 1000 Caskets from a YouTube video, 100,000 Prifddinas crystal chests from a fella on Discord, an apparently-unused mechanism for charm sprites, gem rocks on talk pages, a userpage with Miscellania data... the list goes on.
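
As a rough illustration of the 2015 charm log behaviour described above (disallow, tag, or allow a submission depending on how far it is from the existing data), here is a minimal sketch. It assumes a single yes/no outcome per kill and a binomial z-score; the function name, the field names and both thresholds are hypothetical and are not the wiki's actual code.

```javascript
// Hypothetical thresholds; the real cut-offs are not stated in this thread.
const TAG_Z = 2;     // further than this many standard deviations: accept but tag
const REJECT_Z = 4;  // further than this many standard deviations: disallow

// Compare a submission ({ kills, drops }) against the running totals.
// Returns "allow", "tag" or "disallow".
function classifySubmission(existing, submission) {
	// Reject submissions that cannot be real counts.
	if (submission.kills <= 0 || submission.drops < 0 || submission.drops > submission.kills) {
		return "disallow";
	}
	// With no prior data there is nothing to compare against.
	if (existing.kills === 0) {
		return "allow";
	}

	const p = existing.drops / existing.kills;             // current best-guess rate
	const expected = p * submission.kills;                 // drops we would expect
	const sd = Math.sqrt(submission.kills * p * (1 - p));  // binomial standard deviation

	// A rate of exactly 0 or 1 gives no spread; tag anything that disagrees.
	if (sd === 0) {
		return submission.drops === expected ? "allow" : "tag";
	}

	const z = Math.abs(submission.drops - expected) / sd;
	if (z > REJECT_Z) return "disallow";
	if (z > TAG_Z) return "tag";
	return "allow";
}
```

For example, against running totals of 1,000 drops in 10,000 kills, a submission of 18 drops in 100 kills sits about 2.7 standard deviations out and would be tagged rather than rejected.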

The individual things we're gathering data on change as content becomes more or less popular, but at the end of the day, we will always need some sort of mechanism to crowd-source this data. Right now we legitimately have like 8 different mechanisms for this. Some of them allow additional contributions (with varying degrees of complexity required to edit), some have anti-data-spoofing built in, some do the aggregation for you...it's kind of a mess.

I propose that we unify all of these different data-gathering mechanisms under a single Data: namespace, which will have a slick submissions UI, built-in countervandalism tools, and data aggregation.

BlackHawk has built a fantastic implementation of this, which you can start playing around with by going to User:BlackHawk/log_test and loading the JavaScript:

// Load BlackHawk's log-submission script once mediawiki.util is available
mw.loader.using( "mediawiki.util", function() {
	mw.loader.load( mw.util.getUrl( "User:BlackHawk/JS/logAdd.js", { action: "raw", ctype: "text/javascript" } ) );
} );

The data gets stored at Module:Sandbox/BlackHawk/data, and you can see the results at User:BlackHawk/log table. There is almost no additional code needed to create an entirely new type of data log: all you need to do is define a new schema (as seen on Module:Sandbox/BlackHawk/schema, which contains schemas for 11 different gathering projects).
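
To give a feel for what "define a new schema" could mean, here is a minimal sketch of a schema registry and a check of one submission against it. The real schema format on Module:Sandbox/BlackHawk/schema is not reproduced in this thread, so the object shape, the "example-log" name, the field names and the rule that outcome counts add up to the total are all assumptions made for illustration.

```javascript
// Hypothetical schema registry; not the format used by the actual module.
const schemas = {
	"example-log": {
		total: "trials",                 // field holding the number of attempts
		outcomes: ["a", "b", "nothing"]  // placeholder outcome names
	}
};

// Check one submission against a named schema.
// Assumes every attempt yields exactly one outcome, so counts must sum to the total.
function validateSubmission(schemaName, submission) {
	if (!Object.prototype.hasOwnProperty.call(schemas, schemaName)) {
		return { ok: false, error: "unknown schema: " + schemaName };
	}
	const schema = schemas[schemaName];
	const fields = [schema.total].concat(schema.outcomes);

	// Every schema field must be present as a non-negative integer.
	for (const field of fields) {
		const value = submission[field];
		if (!Number.isInteger(value) || value < 0) {
			return { ok: false, error: "field must be a non-negative integer: " + field };
		}
	}
	// Fields the schema does not know about are refused.
	for (const key of Object.keys(submission)) {
		if (!fields.includes(key)) {
			return { ok: false, error: "unexpected field: " + key };
		}
	}
	const sum = schema.outcomes.reduce((acc, name) => acc + submission[name], 0);
	if (sum !== submission[schema.total]) {
		return { ok: false, error: "outcome counts must add up to " + schema.total };
	}
	return { ok: true };
}
```

Under this shape, adding a new gathering project would only mean adding another entry to `schemas`; the validation code stays the same.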

There are probably a couple more things to add to this (like automatic aggregation with naive statistical techniques, and allowing trusted users to mark fishy submissions as okay), but it's pretty much directly usable today.
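
The two follow-ups mentioned above (naive aggregation, and trusted users marking fishy submissions as okay) could look something like the sketch below. It is an assumption-laden illustration: the `flagged` and `approved` fields and the function name are hypothetical, each submission is reduced to a single drop count, and the Wilson score interval is just one simple choice of "naive statistical technique", not something the thread specifies.

```javascript
// Sum up usable submissions and estimate the drop rate with a ~95% Wilson interval.
// Each submission is { trials, drops, flagged?, approved? } (hypothetical shape).
function aggregate(submissions) {
	// Count clean entries, plus flagged ones that a trusted user has approved.
	const usable = submissions.filter(s => !s.flagged || s.approved);
	const trials = usable.reduce((n, s) => n + s.trials, 0);
	const drops = usable.reduce((n, s) => n + s.drops, 0);
	if (trials === 0) {
		return { trials: 0, drops: 0, rate: null, low: null, high: null };
	}

	const rate = drops / trials;
	const z = 1.96; // normal quantile for roughly 95% coverage
	const denom = 1 + (z * z) / trials;
	const centre = (rate + (z * z) / (2 * trials)) / denom;
	const half = (z / denom) * Math.sqrt(
		(rate * (1 - rate)) / trials + (z * z) / (4 * trials * trials)
	);
	return { trials, drops, rate, low: centre - half, high: centre + half };
}
```

A flagged submission that nobody has approved is simply left out of the totals, so one wild entry cannot move the displayed rate until someone has looked at it.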

Assuming this passes, the first step will be to convert all of the existing data collections to fit this format – this shouldn't be very complicated, and the only big piece will be converting Charm:Abyssal demon to Data:Charm/Abyssal demon (I'm open to other naming conventions if people have strong opinions). It should have minimal outward impact on how charm logs operate (since they were by far the most advanced data gathering project on the wiki, and a lot of this unification project was about taking those features and sharing them elsewhere). Realistically I think we'd probably just rename the Charm namespace, rather than creating a new one. I think BlackHawk is interested in leading a lot of the technical push here.

From there, we can start making new data gathering projects with new schemas, and raise awareness in the community (sitenotice/Reddit) about the new crowdsourcing, especially for soil screening.

Contributing to charm logs doesn't do this as much these days, but in the past it was a great way to get new editors to dip their toes into the editing process. I'd like to see that happen again.

That's about it. We'd love feedback on the technical proposal, and input on what additional types of data gathering projects could be started from this. Thanks!

Discussion

Support - ʞooɔ 22:58, 1 June 2020 (UTC)

Big support - the wiki is in such a good position to be the place to collect this data. The success of charm logs and soil screening as an OSWF has shown that players are more than willing to contribute their data for us. Excited for this project to be extended to all the different things we could crowdsource data on. IsobelJ (Talk page) 08:55, 2 June 2020 (UTC)

Support - This is a fantastic idea, and let's be honest, who doesn't love data? Shauny (Talk to me, My contributions) 09:14, 2 June 2020 (UTC)

Support - I am super excited to be a part of this project and 100% support it. Yes, there are a few little things that could do with updating in the modules, but the demo gives a great visual of how it will work. Crowdsourcing the data in this way will be huge for the wiki, with such a supportive userbase that will undoubtedly help to collect the data. The current schema only supports a select number of datapoints, but due to the dynamic nature of the modules, this can easily be extended to support anything. BlackHawk (Talk) 09:19, 2 June 2020 (UTC)

Support >DATA< - Badassiel 09:33, 2 June 2020 (UTC)

Support - Finally the age old question will be answered: Where's the REAL data, Cook? :Wowee: Salix of Prifddinas (Talk) 09:54, 2 June 2020 (UTC)

Support - Some questions and considerations (not necessarily looking for replies to all, some are just things to take into account):

  1. I see there is no mention of monster drop tables, is this not intended for use with them?
  2. Are schemas going to be defined directly in a module? Can we expect new users to be able to edit schemas (especially for new content, when drops may not be known yet)? Or are schemas only going to be defined when all drops are known?
  3. Is Jagex-provided data still going to have crowdsourced data pages? I assume so, since it allows us to check for inconsistencies and pinpoint possible (hidden) in-game changes to drop tables, but would both data sources be displayed on the page? Perhaps a link to view the crowdsourced data is enough.
  4. There are one-time items that, when obtained, change the drop rates of other items. Similar to how there would be an input field for, say, a skill level that gives different drop rates, there could be an input field to state whether the player had the item before gathering the data. If they obtained it over the course of data gathering, it's possible for the data to be skewed (unless they submitted the data upon obtaining the item).
    Ideally, we'd want the player to separate and submit the data at every point where drop rates change (eg: at every skill level, before/after obtaining a one-time item), but this is not always going to be the case (though I assume it'd be a minority, mostly users submitting data for the first time).
    Perhaps we want a checkbox so the player can say if the item was obtained at the end of the logged data, and otherwise a field to input at which number of 'cases' the item was obtained, for example: obtained one-time item at 1200 out of 2000 cases. This is assuming that the rest of the data would be usable, that is, taking into account that the data is submitted in bulk and 1200 cases have items with one drop rate and 800 with another. I guess it's possible but someone else must know better.
  5. Vandalism checks should consider that a drop table may receive (hidden) updates. This could unintentionally mark subsequent submissions as vandalism
  6. There can be different drop tables in the same page
  7. It's possible for drop tables to have an unknown variable that modifies it (complete a certain quest or do a specific action)
  8. An item can have more than one "drop slot", with different quantities and/or rarity

Habblet (talk) 10:07, 2 June 2020 (UTC)
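
On point 4, a bulk submission that straddles a rate change is still usable in principle, provided the rate before the change is already known from other data. With 1200 cases at the old rate and 800 at the new one, the expected number of drops is 1200 × (old rate) + 800 × (new rate), which can be solved for the new rate. The sketch below shows that arithmetic only; the function and parameter names are hypothetical, and it assumes the "before" rate is estimated independently.

```javascript
// Back out the drop rate after a change from one bulk submission.
// totalCases: all logged cases, casesBefore: cases logged before the change,
// totalDrops: drops across the whole submission, rateBefore: known earlier rate.
function estimateRateAfterChange(totalCases, casesBefore, totalDrops, rateBefore) {
	const casesAfter = totalCases - casesBefore;
	if (casesAfter <= 0) {
		throw new RangeError("no cases were logged after the change");
	}
	// Drops we would attribute to the earlier portion of the log.
	const expectedBefore = casesBefore * rateBefore;
	const estimate = (totalDrops - expectedBefore) / casesAfter;
	// Noise can push the raw estimate outside [0, 1]; clamp it to a valid rate.
	return Math.min(1, Math.max(0, estimate));
}
```

With 2000 cases, the item obtained at case 1200, 100 drops in total and a known earlier rate of 0.05, the earlier portion accounts for about 60 drops, leaving about 40 over 800 cases, i.e. an estimated later rate of about 0.05. The estimate is noisier than one from cleanly separated data, which is why splitting submissions at each change remains preferable.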

The idea for this data collection was not necessarily for monster drops, but for things along the lines of charm drops, drops from geodes and nests, etc. The schemas are currently defined within a master module, which could simply be updated, but this does require all items to be known before they are added. The form is generated dynamically from the fields provided in the schema.
Different tables within one page will be handled with no problem. The link to add data to the log will be contained in a span with an attr-schema property, which defines which schema to use for that table.
Countervandalism is not currently built in, but before going live it will be fully implemented with ideas that come out of this discussion, with features along the lines of tagging data as potential vandalism and requiring manual approval before it is included in aggregation. BlackHawk (Talk) 10:19, 2 June 2020 (UTC)
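
For readers wondering how several tables on one page could each pick their own schema, here is a minimal sketch of the lookup described above: starting from the "add data" link, walk up to the enclosing span that carries the attr-schema attribute and read its value. Only the attribute name comes from this thread; the function name is hypothetical and this is not the code in logAdd.js.

```javascript
// Find which schema applies to a clicked "add data" link by walking up
// to the nearest enclosing <span> that has an attr-schema attribute.
function findSchemaName(linkElement) {
	for (let node = linkElement; node; node = node.parentElement) {
		if (node.tagName === "SPAN" && node.hasAttribute("attr-schema")) {
			return node.getAttribute("attr-schema");
		}
	}
	// No enclosing span declares a schema.
	return null;
}
```

In a browser, `linkElement.closest("span[attr-schema]")` does the same walk in one call; the explicit loop is used here only to make the steps visible.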

Big Support for Big Data - Aescopalus (talk) 23:53, 2 June 2020 (UTC)

Support - Velhart2 (talk) 01:14, 4 June 2020 (UTC)

Support - Curious about the implementation and results; will be watching this for potential use on the OSW, perhaps. Legaia2Pla · ʟ · 00:42, 13 June 2020 (UTC)

Closed - Data can be consolidated into a Data namespace. --LiquidTalk 02:10, 14 June 2020 (UTC)