Like almost all large websites, Wikipedia suffers from the phenomenon known as link rot, where external links go stale after a period of time. As of the November 6, 2006 database dump, Wikipedia contained 2,578,134 external links, and roughly 10% of these links were broken in some manner.
Dead links are unprofessional and should be fixed on a regular basis. You can try to find the current location of the resource using a Google search. Dead links to online newspaper articles can be converted to references to off-line sources. Do not simply remove dead links; they often contain valuable information. If the search is unsuccessful, tag the link with {{dead link}}, which notifies other editors that the link is dead and can optionally provide a link to the Internet Archive. See Wikipedia:Citing sources#What to do when a reference link "goes dead" and Wikipedia:Using the Wayback Machine.
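The tagging and archive lookup described above can be sketched in Python. This is only an illustration: the helper names `wayback_url` and `tag_dead_link` are hypothetical, the `web.archive.org/web/<timestamp>/<url>` address pattern is an assumption about the Wayback Machine rather than something stated on this page, and the template is emitted without parameters because none are documented here.

```python
def wayback_url(dead_url: str, timestamp: str = "*") -> str:
    """Return a Wayback Machine address for a dead link.

    Assumption: timestamp is a YYYYMMDDhhmmss prefix such as "2006" or
    "20061106"; "*" asks the archive for its list of captures.
    """
    return f"https://web.archive.org/web/{timestamp}/{dead_url}"


def tag_dead_link(link_wikitext: str) -> str:
    """Append the {{dead link}} template after an external link."""
    # Doubled braces in an f-string produce literal braces.
    return f"{link_wikitext} {{{{dead link}}}}"


if __name__ == "__main__":
    print(wayback_url("http://example.com/page", "20061106"))
    print(tag_dead_link("[http://example.com/page Example]"))
```

The archive address still needs to be opened and checked by hand before it replaces the dead link.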
This page is intended to be a clearing house for all such external links.
If you make corrections to the source article to fix a broken link, please indicate so below to prevent duplication of effort. Using the following edit summary can also help increase awareness of the problem:
Fixed broken links to external websites; [[Wikipedia:Dead external links|you can help too!]]
Although the sections below contain a short description of the status code in question, please see the list of HTTP status codes for a more complete description.
The 200 status code indicates that the link is correctly formed and retrievable. Although such links do not need correction, they are included here for completeness. Wikipedia currently contains 2,171,863 of these links. Due to the sheer number of links that correctly resolve, these are not available for download.
The 300 status code indicates that the website requested more information from the bot so that it could make an appropriate presentation of the content. Although such links are most likely correct, they should probably be double-checked. Wikipedia currently contains 143 of these links.
The 301 status code indicates that the content has been moved permanently, and that the link inside Wikipedia should probably be updated to reflect the new location. This should not be done for all sites, however, as some sites use 301 redirects for pages whose destination changes often. Wikipedia currently contains 84,303 of these links.
The 302, 303, and 307 status codes indicate that the content has been temporarily moved, and that the client should continue to use the original link. Although these links should be correct in theory, they are often used by link farms, and should probably be checked. Wikipedia currently contains 146,643 status 302 links, 1,567 status 303 links, and 88 status 307 links.
The 400 status code indicates that the site in question could not understand the bot's request. Although these should hopefully diminish with future revisions of the bot, it may be useful to test them anyway (low priority). Wikipedia currently contains 1,604 of these links. Note: links with anchors and HTML entities should be ignored (see talk page).
The page required authorization, which the bot does not support. The page in question may have included login information, but the bot has no way of knowing this. Such links should be fixed if the page does not contain login information. Wikipedia currently contains 672 status 401 links.
Although 402 is not an active status code, some servers used it anyway. It indicates that the server requested payment (in theory) from the client. Such links should be fixed. Wikipedia currently contains 4 of these links.
The 403 status code ("Forbidden") generally indicates that the server software itself cannot access the location where the file would be found, or that access to that location is not permitted from the internet under any circumstance; login or authorization information will not change things. Some for-pay reference sites, such as http://www.jstor.org/, might give partial access in the response (e.g. display the first page), which might still be useful. This status is often a symptom of link rot. Such links should be fixed. Wikipedia currently contains 7,984 status 403 links.
The 404 error is the most common symptom of link rot and indicates that the page has not been found. The 410 status code is similar, but indicates that the file is permanently gone. Such links are required by policy to be repaired, perhaps with a link to the Internet Archive or WebCite, or by finding the current location of the page if it has been moved without a forwarding redirect. Wikipedia currently contains 92,808 status 404 links and 229 status 410 links.
The 406 status code occurs for a number of reasons and indicates that the client request was unacceptable in some manner. Such links should probably be fixed. Wikipedia currently contains 1,521 of these links.
The 409 status code indicates some sort of error that the client needs to resolve. Such links should probably be fixed. Wikipedia currently contains 1 of these links.
Although 423 is not an active status code, servers use it to indicate some sort of "Locked" error. Wikipedia currently contains 6 of these links.
The 425 status code is another non-active status code, returned by a single server, http://www.worldofspectrum.org/. The message it returned at that time was "Mirroring Denied", but those links work OK now. See also the Apache docs, which give a message of "No code", indicating a server misconfiguration.
The 500-series status codes indicate that there was some sort of internal server error. This could be the result of a malformed HTTP request from the bot, or of numerous other causes. These links should be examined to determine whether the site is suffering from some sort of permanent problem with the link in question. Wikipedia currently contains 17,625 status 500 links, 22 status 501 links, 481 status 502 links, and 714 status 503 links.
NA - Unsupported protocol
Indicates that the link used a protocol such as IRC, Gopher, etc. that the bot is not capable of resolving. Such links should be checked to confirm that the scheme is correct (e.g. htttp://www.wikipedia.org instead of http). Wikipedia currently contains 331 of these links.
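A check for this class of problem can be sketched with Python's standard library. This is not the bot's code: the function name `check_scheme`, the set of supported schemes, and the 0.8 similarity cutoff are all assumptions chosen for illustration.

```python
from difflib import get_close_matches
from urllib.parse import urlsplit

# Assumption: the schemes a link checker is able to resolve.
SUPPORTED = ("http", "https")


def check_scheme(url):
    """Return (status, suggestion) for the scheme of an external link."""
    scheme = urlsplit(url).scheme.lower()
    if scheme in SUPPORTED:
        return ("supported", None)
    # A near miss such as "htttp" is more likely a typo than a real protocol.
    guess = get_close_matches(scheme, SUPPORTED, n=1, cutoff=0.8)
    if guess:
        return ("possible typo", guess[0])
    return ("unsupported", None)


if __name__ == "__main__":
    for link in ("htttp://www.wikipedia.org", "gopher://gopher.example.org/"):
        print(link, check_scheme(link))
```

Links reported as a possible typo are the ones worth correcting in the article; genuinely different protocols such as IRC or Gopher may be intentional.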
NA - Unknown error
Indicates that the bot had some sort of difficulty resolving the link in question. This could be caused by a number of errors: DNS lookup failures, socket timeouts, etc. The default socket timeout was set to 30 seconds, which may be too low for some very slow sites. Such links should probably be tested. Wikipedia currently contains 48,600 of these links.
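The guidance in the status sections above can be condensed into a small lookup, sketched here in Python. The action labels are paraphrases of this page, not official terms, and the name `recommended_action` is hypothetical; codes for which the page gives no explicit instruction (423 and 425) fall through to a default.

```python
from typing import Optional

# Coarse actions paraphrased from the status descriptions on this page.
ACTIONS = {
    200: "none",
    300: "double-check",
    301: "update",
    302: "check", 303: "check", 307: "check",
    400: "test (low priority)",
    401: "fix unless login is required",
    402: "fix", 403: "fix",
    404: "repair", 410: "repair",
    406: "probably fix", 409: "probably fix",
    500: "examine", 501: "examine", 502: "examine", 503: "examine",
}


def recommended_action(status: Optional[int]) -> str:
    """Map an HTTP status (or None for the NA categories) to an action."""
    if status is None:
        # Unsupported protocol or unknown error: the bot got no status code.
        return "test manually"
    return ACTIONS.get(status, "no guidance on this page")


if __name__ == "__main__":
    for code in (200, 301, 404, 423, None):
        print(code, "->", recommended_action(code))
```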
Below are links to download tab-separated text files (gzip compressed) containing the links. Each line has the form:
Article title, [tab], URL, [tab], further description (as in [http://www.wikipedia.org/ Wikipedia] links), [tab], error code, [tab], server response.
These files should probably be relocated to somewhere more permanent in the future.
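A file in the five-field layout described above could be read with a sketch like the following. The field names in `LinkRecord`, the function name `read_link_dump`, the UTF-8 encoding, and the example file name are assumptions; the page only specifies the column order and the gzip compression.

```python
import csv
import gzip
from typing import Iterator, NamedTuple


class LinkRecord(NamedTuple):
    article: str
    url: str
    description: str
    error_code: str
    server_response: str


def read_link_dump(path: str) -> Iterator[LinkRecord]:
    """Yield one LinkRecord per well-formed line of a gzip-compressed dump."""
    # Assumption: the files are UTF-8 encoded.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as fh:
        # QUOTE_NONE keeps any quote characters in titles as literal text.
        reader = csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) != 5:
                continue  # skip lines that do not match the documented layout
            yield LinkRecord(*row)


if __name__ == "__main__":
    # "404.txt.gz" is a hypothetical local file name.
    for record in read_link_dump("404.txt.gz"):
        print(record.article, record.url, record.error_code)
```

The error code is kept as a string because the NA categories have no numeric code.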
200 (not available)
300 - 301 - 302 - 303 - 307
400 - 401 - 402 - 403 - 404 - 406 - 409 - 410 - 423 - 425
500 - 501 - 502 - 503
NA (Unsupported protocol) - NA (Unknown error)
The 404 errors have pages to themselves. These have now been updated to reflect the November 6, 2006 database update:
- misc, 2964 entries
- a, 5987 entries
- b, 4723 entries
- c, 6298 entries
- d, 4179 entries
- e, 3013 entries
- f, 2939 entries
- g, 3322 entries
- h, 3770 entries
- i, 2179 entries
- j, 4467 entries
- k, 2312 entries
- l, 6347 entries
- m, 6672 entries
- n, 3375 entries
- o, 1806 entries
- p, 4295 entries
- q, 224 entries
- r, 3808 entries
- s, 7540 entries
- t, 5535 entries
- u, 1592 entries
- v, 1195 entries
- w, 2686 entries
- x, 48 entries
- y, 481 entries
- z, 328 entries
Please indicate your correction status in the form "123: ABC - XYZ", e.g. "404: African Academy of Sciences - anonymous remailer".
External link status as of April 2007
-->