User:Iritscen/PageCountAudit

This page looks into the math that is used by MediaWiki to give various page counts.
 
==Magic words==
These first two magic words give quick answers to the question of how much content is on the wiki, but are they accurate and useful?

'''NUMBEROFPAGES''': {{NUMBEROFPAGES}}

MW code says: Simply count all entries in page table.

Iritscen says: Though problematic in the past, this magic word now matches the grand total (see below) of all PAGESINNS counts, including files and redirects. However, we don't really want to display that catch-all number on our main page.

'''NUMBEROFARTICLES''': {{NUMBEROFARTICLES}}

MW code says: If the count method global ($wgArticleCountMethod) is set to 'link', the software gets a distinct count of the entries in the pagelinks table's "pl_from" field that match the IDs of non-redirect pages in the content namespaces. In other words, it filters out redirects as well as pages that do not link to other pages (the reasoning presumably being that "those aren't real wiki pages" if they're not connecting to anything else). If the method is set to 'comma', it counts all non-blank, non-redirect pages in the content namespaces (yes, really).

Iritscen says: Okay, $wgArticleCountMethod has now been set to 'comma'. (Note: As of MW 1.31, "The 'comma' value for $wgArticleCountMethod is no longer supported for performance reasons, and installations with this setting will now work as if it was configured with 'any'." It appears that 'any' will return the same result, counting all non-redirect pages in the content namespaces.)

'''PAGESINNS''', AKA PAGESINNAMESPACE: These counts agree with the number of pages displayed for each namespace on the [[Special:AllPages]] page, which provides some much-needed verifiability. However, since AllPages includes redirect pages, PAGESINNS must include them too. Therefore, we can't use a straight sum of PAGESINNS results as our page count. See the final section for the adjusted number.

PAGESINNS breakdown:
{| class="wikitable"
! Namespace
! ID
! Page count
|-
| Media
| -2
| <not available>
|-
| Special
| -1
| <not available>
|-
| Main
| 0
| {{PAGESINNS:0}}
|-
| Talk
| 1
| {{PAGESINNS:1}}
|-
| User
| 2
| {{PAGESINNS:2}}
|-
| User talk
| 3
| {{PAGESINNS:3}}
|-
| OniGalore
| 4
| {{PAGESINNS:4}}
|-
| OniGalore talk
| 5
| {{PAGESINNS:5}}
|-
| File
| 6
| {{PAGESINNS:6}}
|-
| File talk
| 7
| {{PAGESINNS:7}}
|-
| MediaWiki
| 8
| {{PAGESINNS:8}}
|-
| MediaWiki talk
| 9
| {{PAGESINNS:9}}
|-
| Template
| 10
| {{PAGESINNS:10}}
|-
| Template talk
| 11
| {{PAGESINNS:11}}
|-
| Help
| 12
| {{PAGESINNS:12}}
|-
| Help talk
| 13
| {{PAGESINNS:13}}
|-
| Category
| 14
| {{PAGESINNS:14}}
|-
| Category talk
| 15
| {{PAGESINNS:15}}
|-
| BSL
| 100
| {{PAGESINNS:100}}
|-
| BSL talk
| 101
| {{PAGESINNS:101}}
|-
| OBD
| 102
| {{PAGESINNS:102}}
|-
| OBD talk
| 103
| {{PAGESINNS:103}}
|-
| AE
| 104
| {{PAGESINNS:104}}
|-
| AE talk
| 105
| {{PAGESINNS:105}}
|-
| Oni2
| 108
| {{PAGESINNS:108}}
|-
| Oni2 talk
| 109
| {{PAGESINNS:109}}
|-
| XML
| 110
| {{PAGESINNS:110}}
|-
| XML talk
| 111
| {{PAGESINNS:111}}
|-
|}
 
'''All articlespaces''' (without File) totalled using PAGESINNS: {{#expr:{{PAGESINNS:0}}+{{PAGESINNS:2}}+{{PAGESINNS:4}}+{{PAGESINNS:8}}+{{PAGESINNS:10}}+{{PAGESINNS:12}}+{{PAGESINNS:14}}+{{PAGESINNS:100}}+{{PAGESINNS:102}}+{{PAGESINNS:104}}+{{PAGESINNS:108}}+{{PAGESINNS:110}}}}

'''All talkspaces''' (without File talk) totalled using PAGESINNS: {{#expr:{{PAGESINNS:1}}+{{PAGESINNS:3}}+{{PAGESINNS:5}}+{{PAGESINNS:9}}+{{PAGESINNS:11}}+{{PAGESINNS:13}}+{{PAGESINNS:15}}+{{PAGESINNS:101}}+{{PAGESINNS:103}}+{{PAGESINNS:105}}+{{PAGESINNS:109}}+{{PAGESINNS:111}}}}

'''All contentspaces''' (as currently defined in $wgContentNamespaces = {0, 2, 100, 102, 104, 108, 110}) totalled using PAGESINNS: {{#expr:{{PAGESINNS:0}}+{{PAGESINNS:2}}+{{PAGESINNS:100}}+{{PAGESINNS:102}}+{{PAGESINNS:104}}+{{PAGESINNS:108}}+{{PAGESINNS:110}}}}

The '''grand total''' for all namespaces (including File) is: {{#expr:{{PAGESINNS:0}}+{{PAGESINNS:1}}+{{PAGESINNS:2}}+{{PAGESINNS:3}}+{{PAGESINNS:4}}+{{PAGESINNS:5}}+{{formatnum:{{PAGESINNS:6}}|R}}+{{PAGESINNS:7}}+{{PAGESINNS:8}}+{{PAGESINNS:9}}+{{PAGESINNS:10}}+{{PAGESINNS:11}}+{{PAGESINNS:12}}+{{PAGESINNS:13}}+{{PAGESINNS:14}}+{{PAGESINNS:15}}+{{PAGESINNS:100}}+{{PAGESINNS:101}}+{{PAGESINNS:102}}+{{PAGESINNS:103}}+{{PAGESINNS:104}}+{{PAGESINNS:105}}+{{PAGESINNS:108}}+{{PAGESINNS:109}}+{{PAGESINNS:110}}+{{PAGESINNS:111}}}}

==Redirects==
There were 353 redirects as of 2021-12-13 according to [[Special:ListRedirects]].

Redirect breakdown:
{| class="wikitable"
! Namespace
! Redirects
|-
| Main
| 280
|-
| Talk
| 0
|-
| User
| 0
|-
| User talk
| 0
|-
| OniGalore
| 2
|-
| OniGalore talk
| 0
|-
| File
| 0
|-
| File talk
| 0
|-
| MediaWiki
| 0
|-
| MediaWiki talk
| 0
|-
| Template
| 0
|-
| Template talk
| 0
|-
| Help
| 1
|-
| Help talk
| 1
|-
| Category
| 0
|-
| Category talk
| 0
|-
| BSL
| 2
|-
| BSL talk
| 0
|-
| OBD
| 18
|-
| OBD talk
| 0
|-
| AE
| 0
|-
| AE talk
| 0
|-
| Oni2
| 2
|-
| Oni2 talk
| 0
|-
| XML
| 46
|-
| XML talk
| 0
|}
 
==Conclusion==
NUMBEROFPAGES is too broad to be useful, but now that the page-count method is 'comma' (or 'any'), I am able to reconcile NUMBEROFARTICLES with PAGESINNS. PAGESINNS in turn reconciles with AllPages, which lists each page onscreen and is thus verifiable by a direct count (which I have done in the past). So to see how the math works out, we can get the directly verifiable count by using PAGESINNS on all "content" namespaces and then manually subtracting the redirects counted above.
 
Namespaces '''Main, User, BSL, OBD, XML, AE, and Oni2''' totaled using PAGESINNS: {{#expr:{{PAGESINNS:0}}+{{PAGESINNS:2}}+{{PAGESINNS:100}}+{{PAGESINNS:102}}+{{PAGESINNS:104}}+{{PAGESINNS:108}}+{{PAGESINNS:110}}}}

Same total minus redirects in those namespaces: {{#expr:{{PAGESINNS:0}}+{{PAGESINNS:2}}+{{PAGESINNS:100}}+{{PAGESINNS:102}}+{{PAGESINNS:104}}+{{PAGESINNS:108}}+{{PAGESINNS:110}}-280-0-2-18-46-0-2}}

At the time of this writing (2021-12-13), this gives a "content minus redirects" total of 919, which is 29 higher than the value (890) returned by NUMBEROFARTICLES. That's an acceptable margin of error, though one that could bear some investigation (I would be curious whether the page-count method omits blank or very short pages, of which we have a number). Thus NUMBEROFARTICLES is fit for use on the main page.

[[Category:Userspace]]

Latest revision as of 21:17, 13 December 2021

This page looks into the math that is used by MediaWiki to give various page counts.

Magic words

These first two magic words give quick answers to the question of how much content is on the wiki, but are they accurate and useful?

NUMBEROFPAGES: 4,783

MW code says: Simply count all entries in page table.

Iritscen says: Though problematic in the past, this magic word now matches the grand total (see below) of all PAGESINNS counts, including files and redirects. However, we don't really want to display that catch-all number on our main page.
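The same two site-wide figures can also be read outside the wiki through the MediaWiki Action API: an `action=query&meta=siteinfo&siprop=statistics` request returns a `statistics` object whose `pages` and `articles` fields should correspond to NUMBEROFPAGES and NUMBEROFARTICLES. The sketch below only builds the request URL and parses a response body. The api.php address is a placeholder assumption, and the sample body is illustrative: it was hand-written with the figures shown on this page, not fetched from the wiki.

```python
import json
from typing import Tuple
from urllib.parse import urlencode

# Placeholder endpoint (assumption): substitute the wiki's real api.php URL.
API_URL = "https://wiki.example.org/w/api.php"


def stats_url(api_url: str = API_URL) -> str:
    """Build a siteinfo request for the site statistics."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return f"{api_url}?{urlencode(params)}"


def page_and_article_counts(response_text: str) -> Tuple[int, int]:
    """Return (pages, articles), i.e. the NUMBEROFPAGES and NUMBEROFARTICLES values."""
    stats = json.loads(response_text)["query"]["statistics"]
    return stats["pages"], stats["articles"]


# Illustrative response body (not fetched); numbers are this page's snapshot values.
sample = '{"batchcomplete": "", "query": {"statistics": {"pages": 4783, "articles": 904}}}'
pages, articles = page_and_article_counts(sample)
print(stats_url())
print(pages, articles)  # 4783 904
```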

NUMBEROFARTICLES: 904

MW code says: If the count method global ($wgArticleCountMethod) is set to 'link', the software gets a distinct count of the entries in the pagelinks table's "pl_from" field that match the IDs of non-redirect pages in the content namespaces. In other words, it filters out redirects as well as pages that do not link to other pages (the reasoning presumably being that "those aren't real wiki pages" if they're not connecting to anything else). If the method is set to 'comma', it counts all non-blank, non-redirect pages in the content namespaces (yes, really).

Iritscen says: Okay, $wgArticleCountMethod has now been set to 'comma'. (Note: As of MW 1.31, "The 'comma' value for $wgArticleCountMethod is no longer supported for performance reasons, and installations with this setting will now work as if it was configured with 'any'." It appears that 'any' will return the same result, counting all non-redirect pages in the content namespaces.)
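To make the settings concrete, here is a small model of the rules as the MW code notes above describe them: every method starts from non-redirect pages in the content namespaces, 'link' additionally requires at least one outgoing link, and 'comma' (in the recount query) only requires the page to be non-blank. It is a sketch, not MediaWiki's code: the `Page` fields are hypothetical stand-ins for the page and pagelinks columns, and it does not model MW 1.31+, where 'comma' is treated as 'any'.

```python
from dataclasses import dataclass

# Content namespaces as listed on this page ($wgContentNamespaces).
CONTENT_NAMESPACES = {0, 2, 100, 102, 104, 108, 110}


@dataclass
class Page:
    """Hypothetical, simplified stand-in for a row of the page table."""
    namespace: int
    is_redirect: bool
    length: int          # page length in bytes (page_len)
    outgoing_links: int  # rows in pagelinks whose pl_from is this page


def counts_as_article(page: Page, method: str) -> bool:
    """Model of the article-count rules described above (not MediaWiki code)."""
    # Every method starts from non-redirect pages in content namespaces.
    if page.namespace not in CONTENT_NAMESPACES or page.is_redirect:
        return False
    if method == "any":
        return True
    if method == "comma":
        # As described above, the recount only checks that the page is non-blank.
        return page.length > 0
    if method == "link":
        return page.outgoing_links > 0
    raise ValueError(f"unknown count method: {method!r}")


sample = [
    Page(namespace=0, is_redirect=False, length=1200, outgoing_links=5),
    Page(namespace=0, is_redirect=False, length=300, outgoing_links=0),  # no links
    Page(namespace=0, is_redirect=False, length=0, outgoing_links=0),    # blank page
    Page(namespace=0, is_redirect=True, length=25, outgoing_links=1),    # redirect
    Page(namespace=1, is_redirect=False, length=800, outgoing_links=3),  # Talk page
]
for method in ("any", "comma", "link"):
    print(method, sum(counts_as_article(p, method) for p in sample))
```

With this sample, 'any' counts three pages, 'comma' two (the blank page drops out) and 'link' one, while the redirect and the Talk page never count under any method.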

PAGESINNS, AKA PAGESINNAMESPACE: These counts agree with the number of pages displayed for each namespace on the Special:AllPages page, which provides some much-needed verifiability. However, since AllPages includes redirect pages, PAGESINNS must include them too. Therefore, we can't use a straight sum of PAGESINNS results as our page count. See the final section for the adjusted number.

PAGESINNS breakdown:

Namespace ID Page count
Media -2 <not available>
Special -1 <not available>
Main 0 756
Talk 1 114
User 2 124
User talk 3 51
OniGalore 4 18
OniGalore talk 5 2
File 6 2,772
File talk 7 18
MediaWiki 8 42
MediaWiki talk 9 1
Template 10 135
Template talk 11 6
Help 12 3
Help talk 13 2
Category 14 202
Category talk 15 6
BSL 100 60
BSL talk 101 6
OBD 102 189
OBD talk 103 37
AE 104 23
AE talk 105 15
Oni2 108 37
Oni2 talk 109 18
XML 110 127
XML talk 111 19

All articlespaces (without File) totalled using PAGESINNS: 1716

All talkspaces (without File talk) totalled using PAGESINNS: 277

All contentspaces (as currently defined in $wgContentNamespaces = {0, 2, 100, 102, 104, 108, 110}) totalled using PAGESINNS: 1316

The grand total for all namespaces (including File) is: 4783
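The four totals above can be reproduced offline from the table's snapshot figures. A minimal Python sketch, with the counts copied by hand from the table. It relies on MediaWiki's convention that talk namespaces have odd IDs, and it makes explicit that the talkspace total leaves out File talk (7) in the same way the articlespace total leaves out File (6).

```python
# PAGESINNS snapshot copied from the table above (namespace ID -> page count).
PAGES_IN_NS = {
    0: 756, 1: 114, 2: 124, 3: 51, 4: 18, 5: 2, 6: 2772, 7: 18,
    8: 42, 9: 1, 10: 135, 11: 6, 12: 3, 13: 2, 14: 202, 15: 6,
    100: 60, 101: 6, 102: 189, 103: 37, 104: 23, 105: 15,
    108: 37, 109: 18, 110: 127, 111: 19,
}
CONTENT_NAMESPACES = {0, 2, 100, 102, 104, 108, 110}
FILE, FILE_TALK = 6, 7


def total(namespaces) -> int:
    """Sum the snapshot page counts for the given namespace IDs."""
    return sum(PAGES_IN_NS[ns] for ns in namespaces)


# Even IDs are subject ("article") namespaces, odd IDs are their talk namespaces.
articlespaces = [ns for ns in PAGES_IN_NS if ns % 2 == 0 and ns != FILE]
talkspaces = [ns for ns in PAGES_IN_NS if ns % 2 == 1 and ns != FILE_TALK]

print(total(articlespaces))       # 1716
print(total(talkspaces))          # 277
print(total(CONTENT_NAMESPACES))  # 1316
print(total(PAGES_IN_NS))         # 4783
```

The grand total decomposes as 1716 + 277 + 2772 (File) + 18 (File talk) = 4783.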

Redirects

There were 353 redirects as of 2021-12-13 according to Special:ListRedirects.
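Special:ListRedirects gives one global list, which is why the per-namespace breakdown below had to be tallied by hand. One possible alternative, sketched here under assumptions: the Action API's `list=allpages` module accepts `apnamespace` and `apfilterredir=redirects`, so redirects can be counted per namespace by paging through the results with the `continue` values the API returns. The `fetch` callable is hypothetical (any function that takes a parameter dict and returns the decoded JSON), which keeps the sketch testable without network access; `fake_fetch` is an offline stand-in, not real wiki data.

```python
from typing import Callable, Dict, Iterable

# Hypothetical: fetch(params) performs GET <api.php>?<params> and returns the
# decoded JSON as a dict. Injecting it keeps the counting logic testable.
Fetch = Callable[[dict], dict]


def count_pages(fetch: Fetch, namespace: int, filterredir: str = "redirects") -> int:
    """Count pages in one namespace with list=allpages, following continuation."""
    params = {
        "action": "query",
        "list": "allpages",
        "apnamespace": namespace,
        "apfilterredir": filterredir,  # "all", "redirects" or "nonredirects"
        "aplimit": "max",
        "format": "json",
    }
    total = 0
    while True:
        data = fetch(params)
        total += len(data.get("query", {}).get("allpages", []))
        if "continue" not in data:
            return total
        # Merge the returned continuation values (e.g. apcontinue) into the
        # next request so it resumes where the previous batch ended.
        params = {**params, **data["continue"]}


def redirect_breakdown(fetch: Fetch, namespaces: Iterable[int]) -> Dict[int, int]:
    """Map each namespace ID to its number of redirects."""
    return {ns: count_pages(fetch, ns) for ns in namespaces}


def fake_fetch(params: dict) -> dict:
    """Offline stand-in: namespace 0 has three redirects served in two batches."""
    if params["apnamespace"] != 0:
        return {"batchcomplete": "", "query": {"allpages": []}}
    if "apcontinue" not in params:
        return {
            "continue": {"apcontinue": "M", "continue": "-||"},
            "query": {"allpages": [{"title": "A"}, {"title": "B"}]},
        }
    return {"batchcomplete": "", "query": {"allpages": [{"title": "M"}]}}


print(redirect_breakdown(fake_fetch, [0, 102]))  # {0: 3, 102: 0}
```

Against the real wiki, `fetch` could wrap `urllib.request.urlopen` and `json.load`. Passing `filterredir="nonredirects"` should return the non-redirect count for a namespace directly, which would make the manual subtraction unnecessary.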

Redirect breakdown:

Main 280
Talk 0
User 0
User talk 0
OniGalore 2
OniGalore talk 0
File 0
File talk 0
MediaWiki 0
MediaWiki talk 0
Template 0
Template talk 0
Help 1
Help talk 1
Category 0
Category talk 0
BSL 2
BSL talk 0
OBD 18
OBD talk 0
AE 0
AE talk 0
Oni2 2
Oni2 talk 0
XML 46
XML talk 0
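A quick arithmetic check of the breakdown, with the figures copied by hand from the list above. Summing them gives 352, one fewer than the 353 that Special:ListRedirects reported; this sketch only does the arithmetic and does not establish where the difference comes from.

```python
# Redirect counts per namespace, copied from the breakdown above.
REDIRECTS = {
    "Main": 280, "Talk": 0, "User": 0, "User talk": 0,
    "OniGalore": 2, "OniGalore talk": 0, "File": 0, "File talk": 0,
    "MediaWiki": 0, "MediaWiki talk": 0, "Template": 0, "Template talk": 0,
    "Help": 1, "Help talk": 1, "Category": 0, "Category talk": 0,
    "BSL": 2, "BSL talk": 0, "OBD": 18, "OBD talk": 0,
    "AE": 0, "AE talk": 0, "Oni2": 2, "Oni2 talk": 0,
    "XML": 46, "XML talk": 0,
}
LIST_REDIRECTS_TOTAL = 353  # figure reported by Special:ListRedirects

breakdown_total = sum(REDIRECTS.values())
print(breakdown_total)                         # 352
print(LIST_REDIRECTS_TOTAL - breakdown_total)  # 1
```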

Conclusion

NUMBEROFPAGES is too broad to be useful, but now that the page-count method is 'comma' (or 'any'), I am able to reconcile NUMBEROFARTICLES with PAGESINNS. PAGESINNS in turn reconciles with AllPages, which lists each page onscreen and is thus verifiable by a direct count (which I have done in the past). So to see how the math works out, we can get the directly verifiable count by using PAGESINNS on all "content" namespaces and then manually subtracting the redirects counted above.

Namespaces Main, User, BSL, OBD, XML, AE, and Oni2 totaled using PAGESINNS: 1316

Same total minus redirects in those namespaces: 968
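The subtraction above pairs each content namespace with its redirect count from the breakdown. Spelled out with the snapshot figures (a sketch; the numbers are copied by hand from the tables on this page), it makes clear which redirect count belongs to which namespace:

```python
# Content namespaces: (PAGESINNS page count, redirect count), copied from above.
CONTENT = {
    "Main": (756, 280),
    "User": (124, 0),
    "BSL": (60, 2),
    "OBD": (189, 18),
    "AE": (23, 0),
    "Oni2": (37, 2),
    "XML": (127, 46),
}
content_pages = sum(pages for pages, _ in CONTENT.values())
content_redirects = sum(redirects for _, redirects in CONTENT.values())
print(content_pages, content_redirects, content_pages - content_redirects)  # 1316 348 968
```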

At the time of this writing (2021-12-13), this gives a "content minus redirects" total of 919, which is 29 higher than the value (890) returned by NUMBEROFARTICLES. That's an acceptable margin of error, though one that could bear some investigation (I would be curious whether the page-count method omits blank or very short pages, of which we have a number). Thus NUMBEROFARTICLES is fit for use on the main page.