Whether it's charm drops, jadinko seeds, birds' nests, pickpockets or soil screening, RuneScape is filled with processes that are essentially weighted random samples. At various times, the wiki has taken an interest in crowd-sourcing data about these processes, with an eye towards displaying our best guess of the underlying rates.
In the past, we've done this in a hodgepodge of different ways:
- Herblore Habitat seeds – We had a thread in 2011 that resulted in the creation of subpages (e.g. [[Draconic jadinko/Data]]) for tracking seed drop rates for the jadinkos. This still essentially uses the original charm log formulation, where you need to add your numbers to the running totals, and it's not used very much. (Amusingly this thread also proposed a Data namespace, but it was not seriously considered, probably because Liquid proposed it).
- Bird's nests – We have these at [[Bird's nest/Mole nest log]] (and others) and they work similarly to jadinko seeds.
- Soil screening – This is all the rage now because of Archaeology, but we've mostly collected the data off-site (on Google Forms) because the distribution depends on your Archaeology level.
- Miscellaneous – We have tons of other one-off data logs on the wiki, like 1000 Caskets from a YouTube video, 100,000 Prifddinas crystal chests from a fella on Discord, an apparently-unused mechanism for charm sprites, gem rocks on talk pages, a userpage with Miscellania data... the list goes on.
The individual things we're gathering data on change as content becomes more or less popular, but at the end of the day, we will always need some sort of mechanism to crowd-source this data. Right now we legitimately have like 8 different mechanisms for this. Some of them allow additional contributions (with varying degrees of complexity required to edit), some have anti-data-spoofing built in, some do the aggregation for you...it's kind of a mess.
I propose that we unify all of these different data-gathering mechanisms under a single Data: namespace, which will have a slick submissions UI, built-in countervandalism tools, and data aggregation.
The data gets stored at Module:Sandbox/BlackHawk/data, and you can see the results at User:BlackHawk/log table. There is almost no additional code needed to create an entirely new type of data log: all you need to do is define a new schema (as seen on Module:Sandbox/BlackHawk/schema, which contains schemas for 11 different gathering projects).
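As a rough illustration of the schema-driven idea (this is not the contents of Module:Sandbox/BlackHawk/schema, which is a Lua module), here's a small Python sketch. The schema names, field names, and the `validate_submission` helper are all made up for the example; the point is just that one generic function can handle any log type once a schema exists for it.

```python
# Hypothetical schemas: each maps a field name to the type a submission must provide.
# These names are illustrative, not the ones in Module:Sandbox/BlackHawk/schema.
SCHEMAS = {
    "charm": {"kills": int, "gold": int, "green": int, "crimson": int, "blue": int},
    "soil_screening": {"archaeology_level": int, "soil_screened": int, "materials": int},
}


def validate_submission(schema_name, submission):
    """Return a list of problems; an empty list means the submission fits the schema."""
    schema = SCHEMAS.get(schema_name)
    if schema is None:
        return [f"unknown schema: {schema_name}"]
    problems = []
    for field, expected_type in schema.items():
        if field not in submission:
            problems.append(f"missing field: {field}")
            continue
        value = submission[field]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected_type):
            problems.append(f"wrong type for {field}")
        elif value < 0:
            problems.append(f"negative value for {field}")
    for field in submission:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems
```

Adding a new gathering project in this sketch means adding one entry to `SCHEMAS`; the validation code stays untouched.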
There are probably a couple more things to add to this (like automatic aggregation with naive statistical techniques, and allowing trusted users to mark fishy submissions as okay), but it's pretty much directly usable today.
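To give a feel for what "naive" aggregation could look like, here's a Python sketch. It is an assumption about one possible approach, not what the modules do today: the `flagged` and `approved` keys are hypothetical, and the Wilson score interval is just one simple choice for putting error bars on a rate. It also assumes each case either gives the item or doesn't (a binomial model).

```python
import math


def aggregate(submissions):
    """Sum kills and drops over submissions that are unflagged or approved by a trusted user."""
    kills = drops = 0
    for sub in submissions:
        if sub.get("flagged") and not sub.get("approved"):
            continue  # fishy and not yet reviewed: leave it out of the totals
        kills += sub["kills"]
        drops += sub["drops"]
    return kills, drops


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials == 0:
        return (0.0, 1.0)  # no data: the rate could be anything
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


# Example with made-up numbers: the second submission is flagged and unreviewed,
# so only the first and third count towards the totals.
example = [
    {"kills": 1000, "drops": 100},
    {"kills": 500, "drops": 400, "flagged": True},
    {"kills": 1000, "drops": 120, "flagged": True, "approved": True},
]
total_kills, total_drops = aggregate(example)
low, high = wilson_interval(total_drops, total_kills)
```

The interval narrows as more submissions come in, which is a handy way to show readers how much to trust a displayed rate.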
Assuming this passes, the first step will be to convert all of the existing data collections to fit this format – this shouldn't be very complicated, and the only big piece will be converting Charm:Abyssal demon to Data:Charm/Abyssal demon (I'm open to other naming conventions if people have strong opinions). It should have minimal outward impact on how charm logs operate (since they were by far the most advanced data gathering project on the wiki, and a lot of this unification project was about taking those features and sharing them elsewhere). Realistically I think we'd probably just rename the Charm namespace, rather than creating a new one. I think BlackHawk is interested in leading a lot of the technical push here.
From there, we can start making new data gathering projects with new schemas, and raise awareness in the community (sitenotice/Reddit) about the new crowdsourcing, especially for soil screening.
It hasn't happened so much lately, but in the past, contributing to charm logs has been a great way to get new editors to dip their toes into the editing process. I'd like to see that happen again.
That's about it. We'd love feedback on the technical proposal, and input on what additional types of data gathering projects could be started from this. Thanks!
Big support - the wiki is in such a good position to be the place to collect this data. The success of charm logs and soil screening as an OSWF has shown that players are more than willing to contribute their data for us. Excited for this project to be extended to all the different things we could crowdsource data on. 08:55, 2 June 2020 (UTC)
Support - This is a fantastic idea, and let's be honest, who doesn't love data? 09:14, 2 June 2020 (UTC)
Support - I am super excited to be a part of this project and 100% support it. Yes, there are a few little things that could do with updating in the modules, but the demo gives a great visual on how it will work. Crowdsourcing the data in this way will be huge for the wiki, with such a supportive userbase that will undoubtedly help to collect the data. The current schema only supports a select number of datapoints, but due to the dynamic nature of the modules, this can easily be extended to support anything. 09:19, 2 June 2020 (UTC)
Support >DATA< - Badassiel 09:33, 2 June 2020 (UTC)
Support - Finally the age old question will be answered: Where's the REAL data, Cook? :Wowee: 09:54, 2 June 2020 (UTC)
Support - Some questions and considerations (not necessarily looking for replies to all, some are just things to take into account):
- I see there is no mention of monster drop tables, is this not intended for use with them?
- Are schemas going to be defined directly in a module? Can we expect new users to be able to edit schemas (especially for new content, when drops may not be known yet)? Or are schemas only going to be defined when all drops are known?
- Is Jagex-provided data still going to have crowdsourced data pages? I assume so, since it allows us to check for inconsistencies and pinpoint possible (hidden) in-game changes to drop tables, but would both data sources be displayed in the page? Perhaps a link to view the crowdsourced data is enough.
- There are one-time items that, when obtained, change the drop rates of other items. Similar to how there would be an input field for, say, a skill level that gives different drop rates, there could be an input field to state whether the player had the item before gathering the data. If they obtained it over the course of data gathering, it's possible for the data to be skewed (unless they submitted the data upon obtaining the item).
Ideally, we'd want the player to separate and submit the data at every point where drop rates change (e.g. at every skill level, before/after obtaining a one-time item), but this is not always going to be the case (though I assume it'd be a minority, mostly users submitting data for the first time).
Perhaps we want a checkbox so the player can say if the item was obtained at the end of the logged data, and otherwise a field to input at which number of 'cases' the item was obtained, for example: obtained one-time item at 1200 out of 2000 cases. This is assuming that the rest of the data would be usable, that is, taking into account that the data is submitted in bulk and 1200 cases have items with one drop rate and 800 with another. I guess it's possible, but someone else must know better.
- Vandalism checks should consider that a drop table may receive (hidden) updates. This could unintentionally mark subsequent submissions as vandalism
- There can be different drop tables in the same page
- It's possible for drop tables to have an unknown variable that modifies it (complete a certain quest or do a specific action)
- An item can have more than one "drop slot", with different quantities and/or rarity
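To make the one-time-item concern above concrete, here's a tiny Python sketch. The two drop rates are invented purely for illustration. It shows that a bulk submission of 2000 cases, with 1200 at one rate and 800 at another, only tells you the weighted average of the two rates; getting each rate separately would need the drop counts for each segment, not just the point where the item was obtained.

```python
def blended_rate(segments):
    """Expected overall drop rate for a submission made of (cases, rate) segments."""
    total_cases = sum(cases for cases, _ in segments)
    expected_drops = sum(cases * rate for cases, rate in segments)
    return expected_drops / total_cases


# Hypothetical rates: 1/100 before obtaining the one-time item, 1/50 after.
mixed = blended_rate([(1200, 1 / 100), (800, 1 / 50)])
# (1200 * 0.01 + 800 * 0.02) / 2000 = 28 / 2000, which matches neither underlying rate
```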
- The idea for this data collection was not necessarily for monster drops, but for things along the lines of charm drops, drops from geodes and nests, etc. The schemas are currently defined within a master module, which can simply be updated; however, this does require all items to be known before a schema is added. The form is generated dynamically based on the fields provided in the schema.
Different tables within one page will be handled with no problem. The link to add data to the log will be contained by a span with an attr-schema property which defines which schema to use for that table.
Currently countervandalism is not built in, but before going live it will be fully implemented with ideas that come out of this discussion, along with features such as tagging data as potential vandalism and requiring manual approval before it is included in aggregation. 10:19, 2 June 2020 (UTC)
Big Support for Big Data - 23:53, 2 June 2020 (UTC)