Internationalized Resource Identifier
On the Internet, the Internationalized Resource Identifier (IRI) is a generalization of the Uniform Resource Identifier (URI), which is in turn a generalization of the Uniform Resource Locator (URL). While URIs are limited to a subset of the ASCII character set, IRIs may contain characters from the Universal Character Set (Unicode/ISO 10646), including Chinese, Japanese, Korean, and Cyrillic characters, and so forth.
It is defined by RFC 3987.
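Because many protocols and programs still accept only ASCII URIs, an IRI usually has to be mapped to a URI before it is used. The following Python sketch illustrates the general idea under simplifying assumptions: non-ASCII characters in the path, query, and fragment are encoded as UTF-8 and percent-encoded, and the host name is converted with Python's built-in "idna" codec (which produces Punycode labels). The function name iri_to_uri and the example.com addresses are made up for illustration, any userinfo part is ignored, and the code is not a complete implementation of the RFC 3987 mapping.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched by quote(); "%" is kept so that existing
# percent-escapes are not encoded a second time.
SAFE = "/?%:@!$&'()*+,;=~"


def iri_to_uri(iri: str) -> str:
    """Map an IRI to an ASCII-only URI (simplified sketch)."""
    parts = urlsplit(iri)

    # Host: convert each label to its ASCII (Punycode) form.
    host = parts.hostname or ""
    netloc = host.encode("idna").decode("ascii") if host else ""
    if parts.port is not None:
        netloc += f":{parts.port}"
    # Assumption: userinfo ("user:password@") is dropped for brevity.

    # Other components: UTF-8 encode, then percent-encode non-ASCII bytes.
    path = quote(parts.path, safe=SAFE)
    query = quote(parts.query, safe=SAFE)
    fragment = quote(parts.fragment, safe=SAFE)

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


# "\u00e9" is "é" and "\u00fc" is "ü", written as escapes to keep the source ASCII.
print(iri_to_uri("http://example.com/r\u00e9sum\u00e9"))
# → http://example.com/r%C3%A9sum%C3%A9
print(iri_to_uri("http://b\u00fccher.example/"))
# → http://xn--bcher-kva.example/
```

A URI that is already pure ASCII passes through this function unchanged, which reflects the fact that every URI is also a valid IRI.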
Advantages
There are reasons to display URIs in different languages. Chiefly, it helps users who are unfamiliar with the Roman alphabet. Assuming that it is not too difficult for anyone to enter arbitrary Unicode characters on their keyboard, this can make the URI system more international and accessible.
Disadvantages
Mixing IRIs and ASCII URIs can make phishing attacks much easier, because a user can be tricked into believing they are on a site that they are not actually on. For example, an attacker can replace the "a" in www.ebay.com or www.paypal.com with a look-alike "a" character from another script and point the resulting IRI to a malicious site.
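The look-alike substitution described above can be made concrete with a short Python sketch. It assumes, for illustration, that the substituted character is the Cyrillic small letter "а" (U+0430), and it uses the first word of each character's Unicode name as a rough stand-in for its script. The function names are hypothetical, and this heuristic is far cruder than the checks that browsers actually apply.

```python
import unicodedata


def scripts_in_label(label: str) -> set[str]:
    """Return a rough script tag (first word of the Unicode name) per letter."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch.isalpha()
    }


def looks_mixed_script(hostname: str) -> bool:
    """Flag a host name if any single label mixes letters from several scripts."""
    return any(len(scripts_in_label(label)) > 1 for label in hostname.split("."))


genuine = "www.paypal.com"
spoofed = "www.p\u0430yp\u0430l.com"  # U+0430 CYRILLIC SMALL LETTER A

print(genuine == spoofed)           # → False
print(looks_mixed_script(genuine))  # → False
print(looks_mixed_script(spoofed))  # → True
```

The two strings can render identically in many fonts, yet they compare as different and would resolve to different hosts. A label written entirely in one non-Latin script would not be flagged by this check, so it only catches the mixed case.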
Additionally, the extended character set could ironically fracture the universality of URIs (the "U" in URI stands for "uniform", and originally stood for "universal"). It can become very difficult for those with keyboards designed for one language to access web resources named in other languages. By analogy, open-source programming projects (and most programs) are almost exclusively written using the Roman alphabet to avoid this type of encoding incompatibility.
See also
- XRI (Extensible Resource Identifier)
- IDN (Internationalized Domain Name)
- Punycode