Saturday, August 26, 2017

2017-08-26: rel="bookmark" also does not mean what you think it means

Extending our previous discussion about how the proposed rel="identifier" is different from rel="canonical" (spoiler alert: "canonical" is only for pages with duplicative text), here I summarize various discussions about why we can't use rel="bookmark" for the proposed scenarios.  We've already given a brief review of why rel="bookmark" won't work (spoiler alert: it is explicitly prohibited for HTML <link> elements or HTTP Link: headers) but here we more deeply explore the likely original semantics. 

I say "likely original semantics" because:
  1. the short phrases in the IANA link relations registry ("Gives a permanent link to use for bookmarking purposes") and the HTML5 specification ("Gives the permalink for the nearest ancestor section") are not especially clear, nor is the example in the HTML5 specification.
  2.  rel="bookmark" exists to address a problem, anonymous content, that has been so thoroughly solved that the original motivation is hard to appreciate. 
In our Signposting work, we had originally hoped we could use rel="bookmark" to mean "please use this other URI when you press control-D".  For example, we hoped the HTML at http://www.sciencedirect.com/science/article/pii/S038800011400151X could have:

<link rel="bookmark">http://dx.doi.org/10.1016/j.langsci.2014.12.003</link>

And when the user hit "control-D" (the typical keyboard sequence for bookmarking), the user agent would use the doi.org URI instead of the current URI at sciencedirect.com.  But alas, that's not why rel="bookmark" was created, and the original intention is likely why rel="bookmark" is prohibited from <link> elements.  I say likely because the motivation is not well documented and I'm inferring it from the historical evidence and context.

In the bad old days of the early web, newsfeeds, blogs, forums, and the like did not universally support deep links, or permalinks, to their content.  A blog would consist of multiple posts displayed within a single page.  For example page 1 of a blog would have the seven most recent posts, page 2 would have the previous seven posts, etc.  The individual posts were effectively anonymous: you could link to the "top" of a blog (e.g., blog.dshr.org), but links to individual posts were not supported; for example this individual post from 2015 is no longer on page 1 of the blog and without the ability to link directly to its permalink, one would have click backwards through many pages to discover it.

Of course, now we take such functionality for granted -- we fully expect to have direct links to individual posts, comments, etc.  The earliest demonstration I can find is from this blog post from 2000 (the earliest archived version is from 2003, here's the 2003 archived version of top-level link to the blog where you can see the icon the post mentions).   This early mention of a permalink does not use the term "permalink" or relation rel="bookmark"; those would follow later. 

The implicit model with permalinks appears to be that there would be > 1 rel="bookmark" assertions within a single page, thus the relation is restricted to <a> and <area> elements.  This is because <link> elements apply to the entire context URI (i.e., "the page") and not to specific links, so having > 1 <link> elements with rel="bookmark" would not allow agents to understand the proper scoping of which element "contains" the content that has the stated permalink (e.g., this bit of javascript promotes rel="bookmark" values into <link> elements, but scoping is lost).  An ASCII art figure is order here:

+----------------------------+
|                            |
|  <A href="blog.html"       |
|     rel=bookmark>          |
|  Super awesome alphabet    |
|  blog! </a>                |
|  Each day is a diff letter!|
|                            |
|  +---------------------+   |
|  | A is awesome!!!!    |   |
|  | <a href="a.html"    |   |
|  |    rel=bookmark>    |   |
|  | permalink for A </a>|   |
|  +---------------------+   |
|                            |
|  +---------------------+   |
|  | B is better than A! |   |
|  | <a href="b.html"    |   |
|  |    rel=bookmark>    |   |
|  | permalink for B </a>|   |
|  +---------------------+   |
|                            |
|  +---------------------+   |
|  | C is not so great.  |   |
|  | <a href="c.html"    |   |
|  |    rel=bookmark>    |   |
|  | permalink for C </a>|   |
|  +---------------------+   |
|                            |
+----------------------------+

$ curl blog.html
Super awesome alphabet blog!
Each day is a diff letter!
A is awesome!!!!
permalink for A
B is better than A!
permalink for B 
C is not so great.
permalink for C
$ curl a.html
A is awesome!!!!
permalink for A
$ curl b.html
B is better than A!
permalink for B 
$ curl c.html
C is not so great.
permalink for C


In the example above, the blog has a rel="bookmark" to itself ("blog.html") and since the <a> element appears at the "top level" of the HTML, it is understood that the scope of the element applies to the entire page.  In the subsequent posts, the scope of the link is bound to some ancestor element (perhaps a <div> element) and thus it does not apply to the entire page.  The rel="bookmark" to "blog.html" is perhaps unnecessary, the user agent already knows its own context URI (in other words, a user agent typically knows the URL of the page it is currently displaying (but might not in some conditions, like being the response to a POST request), but surfacing the link with an <a> element makes it easy for the user to right-click, copy-n-paste, etc.  If "blog.html" had four <link rel="bookmark" > elements, the links would not be easily available for user interaction and scoping information would be lost.

And it's not just for external content ("a.html", "b.html", "c.html") like the example above.  In the example below, rel="bookmark" is used to provide permalinks for individual comments contained within a single post.

+----------------------------+
|                            |
|  <A href="a.html"          |
|     rel=bookmark>          |
|  A is awesome!!!!</a>      |
|                            |
|  +---------------------+   |
|  | <a name="1"></a>    |   |
|  | Boo -- I hate A.    |   |
|  | <a href="a.html#1"  |   |
|  |    rel=bookmark>    |   |
|  | 2017-08-01 </a>     |   |
|  +---------------------+   |
|                            |
|  +---------------------+   |
|  | <a name="2"></a>    |   |
|  | a series of tubes!  |   |
|  | <a href="a.html#2"  |   |
|  |    rel=bookmark>    |   |
|  | 2017-08-03 </a>     |   |
|  +---------------------+   |
|                            |
+----------------------------+


This style exposes the direct links of the individual comments, and in this case the anchor text for the permalink is the datestamp of when the post was made (by convention, permalinks often have anchor text or title attributes of "permalink", "permanent link", datestamps, the title of the target page, or variations of these approaches).  Again, it would not make sense to have three separate <link rel="bookmark" > elements here, obscuring scoping information and inhibiting user interaction. 

So why prohibit <link rel="bookmark" > elements?  Why not allow just a single <link rel="bookmark" > element in the <head> of the page, which would by definition enforce the scope to apply to the entire document?  I'm not sure, but I guess it stems from 1) the intention of surfacing the links to the user, 2) the assumption that a user-agent already knows the URI of the current page, and 3) the assumption that there would be > 1 bookmarks per page.  I suppose uniformity was valued over expressiveness. A 1999 HTML specification does not explicitly mention the <link> prohibition, but it does mention having several bookmarks per page.

An interesting side note is that while typing self-referential, scoped links with rel="bookmark" to differentiate them from just regular links to other pages seemed like a good idea ca. 1999, such links are now so common that many links with the anchor text "permalink" or "permanent link" often do not bother to use rel="bookmark" (e.g., Wikipedia pages all have "permanent link" in the left-hand column, but do not use rel="bookmark" in the HTML source, but the blogger example captured in the image above does use bookmark).  The extra semantics are no longer novel and are contextually obvious. 

In summary, in much the same way there is confusion about rel="canonical",  which is better understood as rel="hey-google-index-this-url-instead", perhaps a better name for rel="bookmark" would have been rel="right-click".  If you s/bookmark/right-click/g, the specifications and examples make a lot more sense. 

--Michael & Herbert



N.B. This post is a summary of discussions in a variety of sources, including this WHATWG issue, this tweet storm, and this IETF email thread

No comments:

Post a Comment