OniGalore:Current events: Difference between revisions

→‎Coming features: oops, MW's API cannot tell you which revision a URL was added in -- that requires a binary search of all page revisions -- but this IA API meets our needs on its own
(revising external links section, linking to generated report)
(→‎Coming features: oops, MW's API cannot tell you which revision a URL was added in -- that requires a binary search of all page revisions -- but this IA API meets our needs on its own)
Line 38: Line 38:
*Detection of "external internal" links, which occur when someone puts the full URL to another page on this same wiki, like <nowiki>"[http://wiki.oni2.net/OniGalore:Current_events]"</nowiki> instead of <nowiki>"[[OniGalore:Current events]]"</nowiki>.
*Detection of "external internal" links, which occur when someone puts the full URL to another page on this same wiki, like <nowiki>"[http://wiki.oni2.net/OniGalore:Current_events]"</nowiki> instead of <nowiki>"[[OniGalore:Current events]]"</nowiki>.
*Detection of external interwiki links, where the editor fails to take advantage of an [[Help:Editing#Interwiki_links|interwiki prefix]] which would have made their link shorter and more resistant to rot.
*Detection of external interwiki links, where the editor fails to take advantage of an [[Help:Editing#Interwiki_links|interwiki prefix]] which would have made their link shorter and more resistant to rot.
*It can be tedious to find a valid version of an old page on the Internet Archive. Borrowing a trick from Wikipedia's [[metawikipedia:InternetArchiveBot|InternetArchiveBot]], the script should be able to use the MediaWiki API to [https://www.mediawiki.org/wiki/API:Revisions find the revision] where a URL was added to a page, and ask the Archive for a snapshot from that time period. The link to this snapshot can then be recommended by the script for review by a human editor.
*It can be tedious to find a valid version of an old page on the Internet Archive by browsing in the Wayback Machine. Fortunately, the Archive offers an API for finding valid snapshots, built for use by Wikipedia's [[metawikipedia:InternetArchiveBot|InternetArchiveBot]]. [http://archive.org/wayback/available?url=http://www.pbs.org/wnet/religionandethics/week622/hedges.html&statuscodes=200&statuscodes=203&statuscodes=206 Here] is a sample query that asks the Archive for the latest snapshot of a given URL where the server holding the original page returned an OK code to the Archive's web crawler. Note that a server returning "OK" does not guarantee that the page it returned actually has the desired content; see next point.
*Once we have knocked down the low-hanging fruit of pages that return NG codes, the screenshot feature in the script will be activated, and we will begin checking that the "OK" links are actually loading the intended page. At a cursory glance, I can see that many pages no longer display the desired content even though they don't return a "not found" or redirection code, so we will have to do visual inspections of the pages in order to catch these issues.
*At a cursory glance, I can see that many external links no longer display the content that they were intended to display. In many cases, web sites are silently redirecting the user to their main page without using the appropriate code that indicates the content was not found. We will have to do visual inspections of the pages in order to catch these issues. Once we have dealt with the low-hanging fruit of pages that return NG codes, the screenshot feature in the script will be activated, and we will begin checking that the "OK" links (and "OK" Archive snapshots) are actually loading the intended page.
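
As a rough illustration of the Archive query described in the list above, here is a minimal Python sketch. It is not the script's actual code. The endpoint and the <code>url</code> and <code>statuscodes</code> parameters are taken from the sample query; the function names are hypothetical, and the shape of the JSON reply (an <code>archived_snapshots</code> object holding a <code>closest</code> entry with <code>available</code> and <code>url</code> fields) is an assumption that should be checked against a live response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and status codes copied from the sample query in the list above.
API_ENDPOINT = "http://archive.org/wayback/available"
OK_CODES = (200, 203, 206)


def build_query(page_url, status_codes=OK_CODES):
    """Return the availability query URL for page_url (hypothetical helper)."""
    params = [("url", page_url)]
    # The statuscodes parameter is repeated once per accepted code.
    params += [("statuscodes", str(code)) for code in status_codes]
    return API_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response):
    """Pull the snapshot URL out of a decoded reply, or return None.

    Assumes the reply looks like
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}.
    """
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_snapshot(page_url, timeout=30):
    """Ask the Archive for a snapshot of page_url (performs a network request)."""
    with urlopen(build_query(page_url), timeout=timeout) as reply:
        return closest_snapshot(json.load(reply))
```

A snapshot found this way is only a recommendation for a human editor to review, since an "OK" code does not prove that the archived page shows the intended content.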


==PlayStation 2 port==
==PlayStation 2 port==