One thing that is rather common, especially on websites whose content is not in English, is URLs that contain unencoded characters such as a space, å, ä, or ö. While this works most of the time, it can cause problems.
Looking at RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax, the characters that are allowed to be used unencoded in URLs are either reserved or unreserved. The unreserved characters are the letters a-z and A-Z, the digits 0-9, and the four symbols - . _ ~ (hyphen, period, underscore, and tilde). The reserved characters are used as delimiters. The RFC splits them into the "gen-delims" : / ? # [ ] @ and the "sub-delims" ! $ & ' ( ) * + , ; = (separated by spaces here for readability).
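As a rough illustration, the two character sets can be written down directly in Python. The names UNRESERVED, GEN_DELIMS, SUB_DELIMS, RESERVED and classify() are hypothetical, made up for this sketch; they are not part of any library.

```python
import string

# Character sets from RFC 3986, sections 2.2 (reserved) and 2.3 (unreserved).
# These names are illustrative only, not a library API.
UNRESERVED = frozenset(string.ascii_letters + string.digits + "-._~")
GEN_DELIMS = frozenset(":/?#[]@")
SUB_DELIMS = frozenset("!$&'()*+,;=")
RESERVED = GEN_DELIMS | SUB_DELIMS


def classify(ch: str) -> str:
    """Return 'unreserved', 'reserved' or 'other' for a single character."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in RESERVED:
        return "reserved"
    # Anything else (space, å, ä, ö, ...) has to be percent-encoded.
    return "other"


for ch in ["a", "~", "/", "=", " ", "å"]:
    print(repr(ch), classify(ch))
```

Note that string.ascii_letters only covers a-z and A-Z, so letters like å fall through to "other".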
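In essence, this means that the only characters you can reliably use in the actual name parts of a URL (such as path segments) are the letters a-z and A-Z, the digits 0-9, and - . _ ~ (hyphen, period, underscore, and tilde). Any other character needs to be percent-encoded: each byte of the character is written as % followed by two hexadecimal digits, so a space becomes %20 and å, encoded as UTF-8, becomes %C3%A5.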
Posted in Browsers, Quick Tips, Usability, Web Standards.