What is a Uniform Resource Identifier (URI)?
A Uniform Resource Identifier (URI) is a character sequence that identifies a logical (abstract) or physical resource — usually, but not always, connected to the internet. A URI distinguishes one resource from another.
URIs enable internet protocols to facilitate interactions between and among these resources. A URI is built from components, such as a scheme name and a path, that together serve as the identifier.
The path component of a URI may be empty.
A Uniform Resource Locator (URL), or web address, is the most common form of URI. It is used for unambiguously identifying and locating websites or other web-connected resources.
How Uniform Resource Identifiers work
A URI provides a simple, extensible way to identify internet resources. Thanks to the uniformity that URIs provide, different types of resource identifiers can be used in the same context, regardless of the mechanisms used to access those resources.
The resource identifiers can also be reused in different contexts.
URIs can identify different types of resources, including:
- electronic documents
- information sources with a consistent purpose
URIs and their generic syntax are defined in the Internet Engineering Task Force (IETF) Request for Comments (RFC) 3986. According to these specifications, these resources do not have to be accessible on the internet.
They are also summarized and extended in a W3C document for the W3C’s “World Wide Web project,” authored by Tim Berners-Lee.
Uniform Resource Identifier syntax
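The generic form of a URI, following RFC 3986, is
scheme:[//authority]path[?query][#fragment]
A URI may consist of the following elements:
- Scheme
- Authority
- Path
- Query
- Fragment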
Within the URI, the first element is the scheme name. Schemes are case-insensitive and separated from the rest of the object by a colon. The scheme establishes the concrete syntax and associated protocols for the URI.
Ideally, URI schemes should be registered with the Internet Assigned Numbers Authority (IANA) although nonregistered schemes can also be used.
If the URI is telnet://192.0.2.16:80, the scheme name is “telnet.”
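As a minimal sketch of the scheme rules above, the following uses Python's standard urllib.parse module to pull the scheme out of the telnet example. The helper name scheme_of is hypothetical and chosen only for illustration.

```python
from urllib.parse import urlsplit

def scheme_of(uri: str) -> str:
    """Return the scheme of a URI; urlsplit normalizes it to lowercase."""
    return urlsplit(uri).scheme

print(scheme_of("telnet://192.0.2.16:80"))  # telnet
print(scheme_of("TELNET://192.0.2.16:80"))  # telnet (schemes are case-insensitive)
```

Because schemes are case-insensitive, both spellings yield the same scheme name.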
The URI’s authority component is made up of multiple parts: a host consisting of either a registered name or an IP address, an optional authentication section and an optional port number.
The authentication section contains the username and password, separated by a colon, and is followed by the at sign (@). After the @ comes the host and then, if a port is specified, a colon and the port number. IPv4 addresses are commonly written in dot-decimal notation, and IPv6 addresses, which must be enclosed in brackets, are typically written in hexadecimal form.
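The parts of the authority described above can be illustrated with a short sketch based on Python's urllib.parse. The URIs, credentials and the helper name authority_parts are made-up values for illustration only.

```python
from urllib.parse import urlsplit

def authority_parts(uri: str) -> dict:
    """Split the authority of a URI into its optional and required parts."""
    parts = urlsplit(uri)
    return {
        "username": parts.username,  # None when no authentication section is given
        "password": parts.password,
        "host": parts.hostname,      # brackets around IPv6 addresses are removed
        "port": parts.port,          # None when no port is given
    }

print(authority_parts("ftp://alice:secret@example.com:2121/pub"))
# {'username': 'alice', 'password': 'secret', 'host': 'example.com', 'port': 2121}

print(authority_parts("http://[2001:db8::1]:8080/"))
# {'username': None, 'password': None, 'host': '2001:db8::1', 'port': 8080}
```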
The path contains data notated as a sequence of segments separated by slashes. These slashes imply a hierarchical structure. When an authority is present, the path must either be empty or begin with a single slash. When no authority is present, the path cannot start with a double slash. This part of the syntax may closely resemble a file system path but does not always imply a relation to one.
In the previous URI example (telnet://192.0.2.16:80), a scheme name is present. The characters after the double slash (the IP address and port number) constitute the authority. Because nothing follows the authority, the path is empty.
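A small sketch, again assuming Python's urllib.parse, shows the empty path of the telnet example next to a hierarchical path. The example.com URI and the helper name path_segments are hypothetical.

```python
from urllib.parse import urlsplit

def path_segments(uri: str) -> list:
    """Return the non-empty path segments of a URI, in order."""
    path = urlsplit(uri).path
    # Empty segments (such as the one before the leading slash) are dropped here
    # for readability.
    return [segment for segment in path.split("/") if segment]

print(repr(urlsplit("telnet://192.0.2.16:80").path))  # '' (the path is empty)
print(path_segments("telnet://192.0.2.16:80"))         # []
print(path_segments("http://example.com/docs/guide/intro.html"))
# ['docs', 'guide', 'intro.html']
```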
The query contains a string of non-hierarchical data. It is often a sequence of attribute-value pairs separated by a delimiter, such as an ampersand (&) or semicolon. A question mark separates the query from the part that comes before it.
The string represents an operation to be applied to the “queryable” object that the URI identifies.
In a URI that ends with ?name=parrot#beak, the query is name=parrot. The #beak that follows is the fragment, not part of the query.
However, because this part of the syntax is optional, it may not always be present.
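To make the query rules concrete, the sketch below splits a URI and decodes its attribute-value pairs with Python's urllib.parse. The host example.com, the path /bird and the extra color parameter are assumptions added for illustration.

```python
from urllib.parse import urlsplit, parse_qs

uri = "http://example.com/bird?name=parrot&color=green#beak"
parts = urlsplit(uri)

query = parts.query          # everything between "?" and "#"
pairs = parse_qs(query)      # attribute-value pairs, split on "&"
fragment = parts.fragment    # everything after "#", not part of the query

print(query)     # name=parrot&color=green
print(pairs)     # {'name': ['parrot'], 'color': ['green']}
print(fragment)  # beak

# A URI without a question mark simply has an empty query.
print(repr(urlsplit("http://example.com/bird").query))  # ''
```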
The fragment contains an identifier that provides direction to a secondary resource. It is separated from the preceding part of the URI by a hash (#).
If the primary resource is an HTML document or article, the fragment may be an ID attribute of a specific element of that resource. In this case, a web browser will scroll this particular element into view.
However, if the fragment ID is void, it indicates that the URI refers to the whole object. In this case, the hash sign may be omitted.
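As a rough illustration of how the fragment separates from the primary resource, the following sketch uses urldefrag from Python's urllib.parse. The example.com URIs and the helper name split_fragment are hypothetical.

```python
from urllib.parse import urldefrag

def split_fragment(uri: str) -> tuple:
    """Return (URI of the primary resource, fragment identifier)."""
    result = urldefrag(uri)
    return result.url, result.fragment

print(split_fragment("http://example.com/article.html#summary"))
# ('http://example.com/article.html', 'summary')

# An empty fragment and an omitted hash sign both refer to the whole object.
print(split_fragment("http://example.com/article.html#"))
# ('http://example.com/article.html', '')
print(split_fragment("http://example.com/article.html"))
# ('http://example.com/article.html', '')
```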
Types of Uniform Resource Identifiers
Uniform Resource Locator (URL)
A URL is used to identify and locate webpages and other network-accessible resources.
A URI identifies a resource but does not imply or guarantee access to it. A URL, however, not only identifies the resource, but also specifies how it can be accessed or where it is located. This is why a URL contains unique components, such as the protocol, domain and/or subdomain, in addition to other URI components.
A URL is a subset of URIs. This means all URLs are URIs.
However, not all URIs are URLs.
A URL begins by stating the protocol that should be used to access and locate the logical or physical resource on a particular network.
- If the resource is a webpage, the URL starts with http or https.
- If the resource is a file transferred over the File Transfer Protocol, the URL begins with ftp.
- For an email address, the URL starts with the scheme name “mailto.”
A URL is a location-dependent URI and is not guaranteed to be persistent. If the resource’s location changes, the URL must also change to point to the new location.
Uniform Resource Name (URN)
Like a URL, a URN identifies a resource. However, unlike a URL, a URN is location-independent and persistent, meaning that it always identifies the same resource over time. A URN continues to persist even when the resource no longer exists or becomes unavailable.
A URN does not state which protocol should be used to locate and access the resource. Instead, it labels the resource with a persistent, location-independent and unique identifier.
A URN has three components:
- The label “urn”
- A colon
- A character string as the unique identifier, made up of a namespace identifier and a namespace-specific string separated by a colon
URN examples (provided by IETF RFC 3986):
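The URN structure described above can be sketched in a few lines of Python. This is a simplified illustration, not a full validator: the function name parse_urn is hypothetical, the split of the identifier into a namespace identifier and a namespace-specific string follows the URN syntax in RFC 8141, and the sample URNs are commonly cited illustrations (the second appears in RFC 3986).

```python
def parse_urn(urn: str) -> tuple:
    """Split a URN into (label, namespace identifier, namespace-specific string)."""
    label, first_colon, identifier = urn.partition(":")
    if label.lower() != "urn" or not first_colon:
        raise ValueError(f"not a URN: {urn!r}")
    # Only the first colon of the identifier separates the namespace from the rest.
    namespace, second_colon, specific = identifier.partition(":")
    if not namespace or not second_colon or not specific:
        raise ValueError(f"incomplete URN: {urn!r}")
    return label.lower(), namespace, specific

print(parse_urn("urn:isbn:0451450523"))
# ('urn', 'isbn', '0451450523')
print(parse_urn("urn:oasis:names:specification:docbook:dtd:xml:4.1.2"))
# ('urn', 'oasis', 'names:specification:docbook:dtd:xml:4.1.2')
```

Note that nothing in the URN says which protocol to use or where the resource lives. The parser only recovers a name.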
URI vs. URL
Although often used interchangeably, the terms “URI” and “URL” are different. A URI is an identifier of a specific resource, while a URL is a special type of identifier that both identifies a resource and specifies how it can be accessed.
The analogy of a person’s name and address can explain this difference. In this case, the name is the URI because it identifies the person. However, it doesn’t explain how the person can be found or where they live. For this, the address or URL is required.
Moreover, a URI can be used to identify and differentiate various types of resources, including HTML and XML documents, from each other. A URL, by contrast, only identifies resources that can be located through an access mechanism. If a URI includes a protocol, such as FTP or HTTPS, that describes how to reach the resource, it is called a URL, even though it is also a URI.
Uniform Resource Identifier resolution and references
Two additional aspects of Uniform Resource Identifiers are resolution and references.
URI resolution is one of a few common operations performed on URIs that are also URLs. It involves determining the proper data access method and parameters needed to locate and retrieve the resource that the URI represents.
A URI reference denotes the most common usage of a resource identifier. It may appear as a full URI, part of a full URI or an empty string. If there is a fragment identifier, it identifies part of the resource referred to by the rest of the URI.
A URI reference can be a URI, but it can also be a relative reference. In this case, the URI reference’s prefix does not match the syntax of a scheme followed by its colon separator. To determine which components are present and whether the reference is relative, the reference is first parsed into its major components, and each component is then parsed into its subparts and validated.
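The distinction between a full URI and a relative reference, and the resolution of a relative reference against a base, can be sketched with Python's urllib.parse. The helper name is_relative_reference is hypothetical, and the base URI follows the style of the resolution examples in RFC 3986.

```python
from urllib.parse import urlsplit, urljoin

def is_relative_reference(reference: str) -> bool:
    """A reference is relative when it does not start with a scheme and colon."""
    return urlsplit(reference).scheme == ""

BASE = "http://a/b/c/d;p?q"  # base URI used to resolve relative references

print(is_relative_reference("http://a/b/g"))  # False (a full URI)
print(is_relative_reference("../g"))          # True
print(is_relative_reference(""))              # True (an empty reference)

print(urljoin(BASE, "g"))     # http://a/b/c/g
print(urljoin(BASE, "../g"))  # http://a/b/g
print(urljoin(BASE, "#s"))    # http://a/b/c/d;p?q#s
print(urljoin(BASE, ""))      # http://a/b/c/d;p?q
```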