URI (Uniform resource identifier)

A string that identifies a resource according to a schema. super-type of:

  • URL: defines the means to access a resource(e.g webisite page).
  • URN: defines a unique resource in a namespace(e.g. isbn)

Syntax

ALPHA = a-z / A-Z
DIGIT = 0-9

Reserved Characters:

  reserved    = gen-delims / sub-delims

  gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

Characters used to delimit parts of a URI and shouldnt be encoded, if used as delimiters. Application should (percent)-encode these characters, unless allowed by the uri scheme.

Unreserved Characters:

unreserved = a-z / A-Z / 0-9 / "-" / "." / "_" / "~"

Characters allowed in an uri that do not have a special purpose

URI Parts

  foo://example.com:8042/over/there?name=ferret#nose
  \_/   \______________/\_________/ \_________/ \__/
  |           |            |            |        |
  scheme     authority       path        query   fragment

Scheme

scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

defines the specification of the uri, schemes should be registerd to IANA.

Authority

authority = [ userinfo "@" ] host [ ":" port ]

the authority component is preceded by a double slash ("//") and is terminated by the next slash ("/"), question mark ("?"), or number sign ("#") character, or by the end of the URI.

userinfo

The userinfo subcomponent may consist of a user name and, optionally, scheme-specific information about how to gain authorization to access the resource. The user information, if present, is followed by a commercial at-sign ("@") that delimits it from the host.

host

host = IP-literal / IPv4address / reg-name

Name registered for DNS

reg-name    = *( unreserved / pct-encoded / sub-delims )

port

port = DIGIT

a sequence of digits that defines the socket port

Path

The path component contains data, usually organized in hierarchical form, that, along with data in the non-hierarchical query component, serves to identify a resource within the scope of the URI's scheme and naming authority (if any). The path is terminated by the first question mark ("?") or number sign ("#") character, or by the end of the URI.

  path          = path-abempty    ; begins with "/" or is empty
                / path-absolute   ; begins with "/" but not "//"
                / path-noscheme   ; begins with a non-colon segment
                / path-rootless   ; begins with a segment
                / path-empty      ; zero characters

  path-abempty  = *( "/" segment )
  path-absolute = "/" [ segment-nz *( "/" segment ) ]
  path-noscheme = segment-nz-nc *( "/" segment )
  path-rootless = segment-nz *( "/" segment )
  path-empty    = 0<pchar>

  segment       = *pchar
  segment-nz    = 1*pchar
  segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
                ; non-zero-length segment without any colon ":"

  pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

A path consists of a sequence of path segments separated by a slash ("/") character. A path is always defined for a URI, though the defined path may be empty (zero length). Use of the slash character to indicate hierarchy is only required when a URI will be used as the context for relative references.

Query

The query component contains non-hierarchical data that, along with data in the path component, serves to identify a resource within the scope of the URI's scheme and naming authority (if any). The query component is indicated by the first question mark ("?") character and terminated by a number sign ("#") character or by the end of the URI.

 query = *( pchar / "/" / "?" )

Fragment

The fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information. Follows the path or query,separated by the "#" sign.

fragment = *( pchar / "/" / "?" )

Reference