Uniform Resource Locator (URL)

Jai Ganesh · 2023-12-12 17:46:07

Gist

A Uniform Resource Locator (URL), colloquially known as an address on the Web, is a reference to a resource that specifies its location on a computer network and a mechanism for retrieving it.

Summary

Along with hypertext and HTTP, the URL is one of the key concepts of the Web. It is the mechanism browsers use to retrieve any published resource on the Web.

URL stands for Uniform Resource Locator. A URL is nothing more than the address of a given unique resource on the Web. In theory, each valid URL points to a unique resource. Such resources can be an HTML page, a CSS document, an image, etc. In practice, there are some exceptions, the most common being a URL pointing to a resource that no longer exists or that has moved. As the resource represented by the URL and the URL itself are handled by the Web server, it is up to the owner of the web server to carefully manage that resource and its associated URL.

Scheme

The first part of the URL is the scheme, which indicates the protocol that the browser must use to request the resource (a protocol is a set method for exchanging or transferring data around a computer network). Usually for websites the protocol is HTTPS or HTTP (its unsecured version). Addressing web pages requires one of these two, but browsers also know how to handle other schemes such as mailto: (to open a mail client), so don't be surprised if you see other protocols.
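As a minimal sketch of the idea above, the scheme can be read off a URL with Python's standard urllib.parse module. The helper name scheme_of and the sample addresses are illustrative assumptions, not part of the original text.

```python
from urllib.parse import urlsplit


def scheme_of(url: str) -> str:
    """Return the scheme of a URL in lowercase, or '' if none is present."""
    return urlsplit(url).scheme


# Hypothetical sample addresses, one per scheme mentioned in the text.
samples = (
    "https://www.example.com/",
    "http://www.example.com/",
    "mailto:someone@example.com",
)

for url in samples:
    print(f"{scheme_of(url):7} <- {url}")
```

Note that the mailto: address has no // after the colon, yet the scheme is still recognized, because only the colon ends the scheme.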

Authority

Next follows the authority, which is separated from the scheme by the character pattern ://. If present, the authority includes the domain (e.g. www.example.com) and, optionally, the port (e.g. 80), separated from each other by a colon:

The domain indicates which Web server is being requested. Usually this is a domain name, but an IP address may also be used, although this is rare because it is much less convenient.

The port indicates the technical "gate" used to access the resources on the web server. It is usually omitted if the web server uses the standard ports of the HTTP protocol (80 for HTTP and 443 for HTTPS) to grant access to its resources. Otherwise it is mandatory.
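The rule about omitted ports can be sketched as follows, assuming Python's urllib.parse. The DEFAULT_PORTS table and the host_and_port helper are hypothetical names introduced only for this illustration.

```python
from urllib.parse import urlsplit

# Standard ports named in the text; other schemes are left out on purpose.
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_and_port(url: str):
    """Return (host, port), filling in the scheme's standard port if omitted."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.hostname, port


print(host_and_port("https://www.example.com/"))      # port omitted, HTTPS default
print(host_and_port("http://www.example.com:8080/"))  # non-standard port given
print(host_and_port("http://192.0.2.1/"))             # IP address instead of a name
```

The explicit port wins whenever it is present; the table is consulted only when the authority carries no port at all.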

Details

A Uniform Resource Locator (URL), colloquially known as an address on the Web, is a reference to a resource that specifies its location on a computer network and a mechanism for retrieving it. A URL is a specific type of Uniform Resource Identifier (URI), although many people use the two terms interchangeably. URLs are most commonly used to reference web pages (HTTP/HTTPS) but are also used for file transfer (FTP), email (mailto), database access (JDBC), and many other applications.

Most web browsers display the URL of a web page above the page in an address bar. A typical URL could have the form http://www.example.com/index.html, which indicates a protocol (http), a hostname (www.example.com), and a file name (index.html).

History

Uniform Resource Locators were defined in RFC 1738 in 1994 by Tim Berners-Lee, the inventor of the World Wide Web, and the URI working group of the Internet Engineering Task Force (IETF), as an outcome of collaboration started at the IETF Living Documents birds of a feather session in 1992.

The format combines the pre-existing system of domain names (created in 1985) with file path syntax, where slashes are used to separate directory and filenames. Conventions already existed where server names could be prefixed to complete file paths, preceded by a double slash (//).

Berners-Lee later expressed regret at the use of dots to separate the parts of the domain name within URIs, wishing he had used slashes throughout, and also said that, given the colon following the first component of a URI, the two slashes before the domain name were unnecessary.

Early WorldWideWeb collaborators including Berners-Lee originally proposed the use of UDIs: Universal Document Identifiers. An early (1993) draft of the HTML Specification referred to "Universal" Resource Locators. This was dropped some time between June 1994 (RFC 1630) and October 1994 (draft-ietf-uri-url-08.txt). In his book Weaving the Web, Berners-Lee emphasizes his preference for the original inclusion of "universal" in the expansion rather than the word "uniform", to which it was later changed, and he gives a brief account of the contention that led to the change.

Additional Information

A URL (Uniform Resource Locator) is a unique identifier used to locate a resource on the Internet. It is also referred to as a web address. URLs consist of multiple parts -- including a protocol and domain name -- that tell a web browser how and where to retrieve a resource.

End users use URLs by typing them directly into the address bar of a browser or by clicking a hyperlink found on a webpage, bookmark list, in an email or from another application.

A URL is a compact string of numbers, letters, and symbols that a computer uses to find a resource on a network and act upon it. URLs are often colloquially referred to as Web addresses, or simply addresses, since Web pages are the most common resources that users employ URLs to find. However, all files storable on a server have their own unique URLs, from Web pages and applications to documents, videos, and images.

The format of a URL was standardized in 1994 by the network working group of the Internet Engineering Task Force (IETF), which included World Wide Web inventor Tim Berners-Lee. Initially, URLs were expressible only in the characters of the American Standard Code for Information Interchange (ASCII). This limitation has now been surmounted with software that automatically converts URLs written in other languages (Internationalized Resource Identifiers, or IRIs) into ASCII text.
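The conversion to ASCII can be sketched roughly as below. This is a simplified illustration, not a full implementation of the IRI rules: it assumes Python's built-in idna codec for the host name and percent-encoding of UTF-8 bytes for everything else. The function name iri_to_ascii and the sample address are assumptions made for the example.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_ascii(iri: str) -> str:
    """Convert an address containing non-ASCII characters to an ASCII-only URL."""
    parts = urlsplit(iri)
    if parts.hostname is None:
        raise ValueError("this sketch only handles addresses with a host name")

    # Host name: each non-ASCII label becomes an 'xn--' (Punycode) label.
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"

    # Other parts: non-ASCII characters become %XX escapes of their UTF-8 bytes.
    # '%' is kept as safe so that existing escapes are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, host, path, query, fragment))


print(iri_to_ascii("https://bücher.example/straße"))
# https://xn--bcher-kva.example/stra%C3%9Fe
```

An address that is already pure ASCII passes through unchanged, which is why browsers can apply this kind of conversion to every address a user types.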

URLs can be quite long, but only four segments are typically referenced by users, all of which are on display in the URL https://www.britannica.com/technology/url. Those segments are, in order: the scheme (or protocol) used to access the resource (https), an optional subdomain name (www), the domain or Internet Protocol (IP) address of the server (britannica.com), and, if necessary, the path (/technology/url).

The scheme represents the method by which the files are to be exchanged or transferred. A standard protocol used today is the Hypertext Transfer Protocol Secure (HTTPS), which a Web browser uses to request a Web page, typically written in the Hypertext Markup Language (HTML). Other common schemes are ftp, for transferring files with the File Transfer Protocol (FTP), and mailto, for addressing e-mail (which is delivered using the Simple Mail Transfer Protocol, SMTP). For Web and FTP addresses, the scheme is followed by a colon and two forward slashes; a mailto address uses only the colon.

The protocol is sometimes followed by a subdomain name, which means the URL is the address of a subsection of the main website. If the subdomain name is www, standing for the World Wide Web, the subdomain should send the site’s visitor directly to the main site or homepage. Many subdomain names reference the type of content that a visitor can expect from the subdomain, for example play.google.com.

The domain name (again, such as Britannica) is the unique identifier of the website. A domain name is followed by a domain extension or top-level domain (TLD), which theoretically specifies the site’s purpose. Examples include .biz for business, .gov for government agencies, and .mil for military sites. The .com extension originally designated websites made for commercial use but is now considered generic. An extension may also indicate the country in which the domain name is registered—for example, www.royal.uk. More than one extension may be used as well, as in the case of www.news.com.au.
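The breakdown into subdomain, domain name, and extension can be sketched by splitting the host name on its dots. This is a deliberately naive illustration; the split_hostname helper and its key names are assumptions. Guessing that the last two labels form the registered domain works for www.britannica.com but not for www.news.com.au, where more than one extension is in use. Handling such cases properly requires a list of known suffixes, which this sketch does not include.

```python
def split_hostname(hostname: str) -> dict:
    """Naively split a host name into labels, reading from right to left."""
    labels = hostname.lower().rstrip(".").split(".")
    return {
        "tld": labels[-1],                         # rightmost label
        "registered_guess": ".".join(labels[-2:]), # naive: last two labels
        "subdomains_guess": labels[:-2],           # everything to the left
    }


print(split_hostname("www.britannica.com"))
# {'tld': 'com', 'registered_guess': 'britannica.com', 'subdomains_guess': ['www']}

# The naive guess is wrong here: the registered name is news.com.au, not com.au.
print(split_hostname("www.news.com.au"))
# {'tld': 'au', 'registered_guess': 'com.au', 'subdomains_guess': ['www', 'news']}
```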

Finally, a user might add a path onto the end of the URL—that is, the path through the structure of the website that the computer will have to take to find the desired file. Each additional step that the computer must take is bracketed by forward slashes. This Web page’s address of www.britannica.com/technology/url identifies it as residing within the /technology subdirectory.
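The steps of a path can be listed with a short sketch like the following, assuming Python's urllib.parse. The helper name path_segments is hypothetical.

```python
from urllib.parse import urlsplit


def path_segments(url: str) -> list:
    """Return the non-empty steps of a URL's path, in order."""
    return [segment for segment in urlsplit(url).path.split("/") if segment]


print(path_segments("https://www.britannica.com/technology/url"))
# ['technology', 'url']
```

A URL with no path, or with only a trailing slash, yields an empty list.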

For a fuller example of a URL that might appear in a browser after a user has searched for a desired file, consider the URL https://www.domainname.com:80/subdirectory1/subdirectory2/file.html?key1=value1&key2=value2#bookmark.

The number 80 in the longer URL above is the number of the port used to access the desired resource. Ports are technical “gates” reserved for different purposes, such as file servers or Web servers. Web browsers must connect to the appropriate port in order to access a server’s resources. However, a user rarely needs to specify the port when requesting a Web page, because the browser falls back to the standard port for the scheme: 80 for HTTP and 443 for HTTPS. An explicit :80 on an https URL, as in the example, overrides the HTTPS default of 443.

The section of the example URL following the question mark is the query string. A query string can be composed of additional search parameters beyond the base URL, such as the specific words input into a search engine. These parameters appear as key/value pairs separated by ampersand (&) symbols.
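Splitting a query string into its key/value pairs, and building one from search words, might look like this with Python's urllib.parse. The query text follows the key1/key2 pattern of the example; the search parameter name q is an assumption chosen for illustration.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode

query = "key1=value1&key2=value2"

# Ordered list of (key, value) pairs, split on '&' and then on '='.
pairs = parse_qsl(query)
print(pairs)    # [('key1', 'value1'), ('key2', 'value2')]

# Dictionary form; each value is a list because a key may repeat.
as_dict = parse_qs(query)
print(as_dict)  # {'key1': ['value1'], 'key2': ['value2']}

# Going the other way: encode search words into a query string.
search = urlencode({"q": "uniform resource locator"})
print(search)   # q=uniform+resource+locator
```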

Finally, #bookmark in the above example is a URI (Uniform Resource Identifier) fragment. The number sign, known in this context as an anchor, acts like a bookmark within the resource, instructing the Web browser to show the content at that particular point. For example, a number sign followed by a word is an anchor at that word in an online document.
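To tie the pieces together, here is a sketch that takes apart a URL shaped like the longer example, again assuming Python's urllib.parse. The fragment is separated out last, since browsers use it locally to scroll to the anchor and do not send it to the server.

```python
from urllib.parse import urldefrag, urlsplit

URL = ("https://www.domainname.com:80/subdirectory1/subdirectory2/file.html"
       "?key1=value1&key2=value2#bookmark")

parts = urlsplit(URL)
print(parts.scheme)    # https
print(parts.hostname)  # www.domainname.com
print(parts.port)      # 80
print(parts.path)      # /subdirectory1/subdirectory2/file.html
print(parts.query)     # key1=value1&key2=value2
print(parts.fragment)  # bookmark

# Strip the fragment to get the address that is actually requested.
without_fragment, fragment = urldefrag(URL)
print(without_fragment)
```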

Math Is Fun Forum
