A Twitter card often appears in a tweet when a user shares a URL. We found that if the page being shared on Twitter contains an HTML META redirect, Twitter will not follow it to gather information for the card.
On October 26, we (Mat Kelly and Shawn Jones) were confused by a tweet from Tamara Munzner (memento) about IEEE VIS 2021: the card in the tweet displays the text "IEEE VIS 2020" instead of "IEEE VIS 2021".
A screenshot of Tamara Munzner's tweet promoting IEEE VIS 2021. We use a screenshot instead of an embed in case Twitter fixes this issue.
As mentioned in Shawn's recent ACM/IEEE JCDL 2021 and ACM Web Science 2021 papers, his dissertation, and Twitter's Card documentation, whenever a web page is shared on Twitter, Twitter searches for META tags inside the page. If these META tags have values for the fields twitter:card, twitter:title, twitter:description, and twitter:image, then Twitter will use these values for the title, description, and striking image of the card that is displayed whenever someone shares that web page. If the Facebook equivalents of these fields (og:title, og:description, og:image) are present instead of their corresponding twitter:* fields, then Twitter will apply their values as long as twitter:card is still in the page's metadata.
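The lookup rules above can be sketched in Python using only the standard library's html.parser. This is a minimal sketch based on our reading of Twitter's Card documentation, not Twitter's actual code. A twitter:* field takes precedence, the matching og:* field is the fallback, and no card is produced without twitter:card. The names MetaCollector and extract_card are our own.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects name/property -> content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # twitter:* fields usually use name=, og:* fields usually use property=
        key = attrs.get("name") or attrs.get("property")
        if key and attrs.get("content") is not None:
            # Keep the first occurrence of each field.
            self.meta.setdefault(key.lower(), attrs["content"])


def extract_card(html):
    """Return the card fields for a page, or None if no card would be shown."""
    parser = MetaCollector()
    parser.feed(html)
    meta = parser.meta
    # Without twitter:card there is no card, even if og:* fields are present.
    if "twitter:card" not in meta:
        return None
    card = {"card": meta["twitter:card"]}
    for field in ("title", "description", "image"):
        # Prefer twitter:<field>; fall back to og:<field>.
        value = meta.get("twitter:" + field, meta.get("og:" + field))
        if value is not None:
            card[field] = value
    return card


page = (
    '<meta name="twitter:card" content="summary_large_image">'
    '<meta property="og:title" content="Example Title">'
    '<meta name="twitter:description" content="Example description">'
)
print(extract_card(page))
```

In this example, the title comes from og:title because there is no twitter:title. The sketch returns no image because neither image field is present.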
If we visit the page for IEEE VIS 2021 and view its source, we find no mention of 2020 in its HTML, and the page's metadata refers to IEEE VIS 2021. So, Mat asked Shawn, why is the card still displaying IEEE VIS 2020? Shawn clicked the link in the tweet, copied the resulting URL from his browser into the Twitter Card Validator, and it displayed a different card than the one above.
The URL shared in the tweet is https://virtual.ieeevis.org/, but the browser had redirected Shawn to https://virtual.ieeevis.org/year/2021/index.html. Mat turned to curl for an answer.
Curl helped us see that the actual content at https://virtual.ieeevis.org/ was an HTML META redirect. Shawn's browser had followed this redirect to the 2021 version of the page, which explained why Shawn could not reproduce the card. But why was https://virtual.ieeevis.org/ displaying a card at all when the HTML containing the META redirect had no card metadata? IEEE VIS 2021 clearly intended for its information to be current, so why was Twitter showing content from a 2020 version of the page? We discovered a 2020 memento that contains the twitter:* fields in its META tags. These fields do contain the title, description, and image seen in the incorrect card at the beginning of this post. However, people were not sharing this memento's URI-M. They were sharing the current page's URI-R, so the memento alone did not explain the issue.
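Curl only shows the raw HTML. The redirect itself lives in a <meta http-equiv="refresh"> tag. The following Python sketch uses only the standard library, and the names RefreshFinder and meta_refresh_target are ours. It finds such a tag and splits its content attribute into a delay and a target URL. It assumes the common "N; url=..." syntax and does not implement every detail of the HTML specification's refresh parsing rules. The input in the usage line is hypothetical.

```python
import re
from html.parser import HTMLParser


class RefreshFinder(HTMLParser):
    """Remembers the content attribute of the first <meta http-equiv="refresh">."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if (tag == "meta" and self.content is None
                and (attrs.get("http-equiv") or "").lower() == "refresh"):
            self.content = attrs.get("content") or ""


# "<delay>" optionally followed by "; url=<target>" (or ", <target>").
REFRESH_CONTENT = re.compile(r"\s*(\d+)\s*(?:[;,]\s*(?:url\s*=\s*)?(.*))?$", re.IGNORECASE)


def meta_refresh_target(html):
    """Return (delay_seconds, target_or_None) for a META refresh, or None if absent."""
    finder = RefreshFinder()
    finder.feed(html)
    if finder.content is None:
        return None
    match = REFRESH_CONTENT.match(finder.content)
    if match is None:
        return None
    # Drop surrounding whitespace and optional quotes around the target.
    target = (match.group(2) or "").strip().strip("'\"")
    return int(match.group(1)), target or None


print(meta_refresh_target('<meta http-equiv="refresh" content="0; url=new-page.html">'))
```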
Thus, we constructed a hypothesis of two parts:
- Twitter's card generator will not follow a redirect issued by HTML META refresh
- Twitter is using cached stale metadata to continue to generate a card for https://virtual.ieeevis.org/
To test our two-part hypothesis, Shawn created a test page at https://www.shawnmjones.org/experiments/test-redirect.html with the following content:
Shawn tested this page with the Twitter card validator and it produced a card with the values from the page's metadata, as shown below.
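For readers who want to run similar experiments, the following Python sketch generates test pages of this general shape. It is not Shawn's actual page. The card type summary_large_image, the helper name build_test_page, and the example values are our assumptions. The optional redirect_to parameter adds an immediate META refresh.

```python
from html import escape


def build_test_page(title, description, image_url, redirect_to=None):
    """Build a minimal HTML page with Twitter card metadata (hypothetical helper).

    If redirect_to is given, also add an immediate META refresh to that URL.
    """
    head = [
        '<meta charset="utf-8">',
        '<meta name="twitter:card" content="summary_large_image">',
        f'<meta name="twitter:title" content="{escape(title)}">',
        f'<meta name="twitter:description" content="{escape(description)}">',
        f'<meta name="twitter:image" content="{escape(image_url)}">',
    ]
    if redirect_to is not None:
        # A delay of 0 makes browsers redirect immediately.
        head.append(f'<meta http-equiv="refresh" content="0; url={escape(redirect_to)}">')
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        + "\n".join(head)
        + f"\n<title>{escape(title)}</title>\n</head>\n<body></body>\n</html>\n"
    )


print(build_test_page("Redirect test", "Testing Twitter cards",
                      "https://example.com/striking-image.png"))
```

html.escape keeps quotes in titles or descriptions from breaking the attribute values.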
Our test page without redirection successfully generates a card.
Shawn posted a tweet for this page.
The properly rendered card, in a tweet, before we add the redirect.
Shawn then changed the content of this page to the following, using relative URIs to mimic what IEEE VIS 2021 had done:
Shawn also created the page redirected-page.html, containing the following:
The page at redirected-page.html does contain the correct metadata fields with values, so if Twitter had followed the redirect, we should have seen a card containing these values. With the redirect in place, we created another tweet with https://www.shawnmjones.org/experiments/test-redirect.html. Because Twitter had no metadata to use for the page, it did not render a card in the new tweet. Because it failed to produce a card at all, we can conclude that Twitter did not follow the HTML META redirect.
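The reasoning behind this test can be modeled with a toy card generator. In the Python sketch below, an in-memory dictionary of hypothetical pages stands in for HTTP fetches. The flag follow_meta_refresh switches between a generator that follows META refresh and one that does not. Relative targets are resolved against the current URL with urllib.parse.urljoin, as a browser would. Only the non-following generator returns no card for test-redirect.html, which matches what we saw. The names HeadMeta and card_for are our own. The page contents are illustrative assumptions, not our actual test pages. The refresh parsing here is also deliberately simplified.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HeadMeta(HTMLParser):
    """Collects <meta> fields and any META refresh target from one page."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self.refresh_target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        content = a.get("content") or ""
        if (a.get("http-equiv") or "").lower() == "refresh":
            # Simplified parsing: take whatever follows "url=".
            idx = content.lower().find("url=")
            if idx >= 0:
                self.refresh_target = content[idx + 4:].strip()
            return
        key = a.get("name") or a.get("property")
        if key:
            self.fields.setdefault(key.lower(), content)


def card_for(url, pages, follow_meta_refresh, max_hops=5):
    """Return the twitter:* fields used for url's card, or None for no card.

    pages maps absolute URLs to HTML and stands in for real HTTP fetches.
    """
    for _ in range(max_hops):
        page = HeadMeta()
        page.feed(pages[url])
        # The page's own card metadata wins, even if it also has a refresh.
        if "twitter:card" in page.fields:
            return {k: v for k, v in page.fields.items() if k.startswith("twitter:")}
        if not (follow_meta_refresh and page.refresh_target):
            return None
        # Resolve a relative target against the page that contained it.
        url = urljoin(url, page.refresh_target)
    return None


# Hypothetical pages mirroring the experiment's structure.
base = "https://www.shawnmjones.org/experiments/"
pages = {
    base + "test-redirect.html":
        '<meta http-equiv="refresh" content="0; url=redirected-page.html">',
    base + "redirected-page.html":
        '<meta name="twitter:card" content="summary">'
        '<meta name="twitter:title" content="Redirected page">',
}
print(card_for(base + "test-redirect.html", pages, follow_meta_refresh=True))
# {'twitter:card': 'summary', 'twitter:title': 'Redirected page'}
print(card_for(base + "test-redirect.html", pages, follow_meta_refresh=False))
# None
```

The sketch checks a page's own metadata before its refresh. As a result, a page carrying both returns its own card. That is consistent with what the Card Validator showed when Shawn combined the two in a later experiment.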
When we shared a link containing an HTML META redirect, Twitter did not follow the redirect to load the metadata from the redirect's target page.
Strangely, the previous tweet still renders the card containing the metadata values that were present when that tweet was shared. As seen below, we have two tweets that shared the same URL at different points in time. The bottom one is our first tweet, from when the page contained card metadata. The top one is from when the page contained only the HTML META redirect and no card metadata.
Two tweets, two minutes apart, sharing the same URL whose card metadata changed in between. The older tweet still contains the card even though the current state of the page has changed.
Thus, Twitter does cache card metadata for old tweets, and it will not follow META redirects. In Munzner's case, Twitter is likely reusing cached page data. Shawn could not reproduce this caching issue because we do not have access to Twitter's cache. Shawn was further perplexed when he checked back an hour later and found that the original test tweet had lost its card.
Where did my card go?
As a further experiment, Shawn created a page that contained both an HTML META redirect and Twitter card metadata fields.
This page generates a card with Twitter's card validator. As we see below, the card generated by the card validator contains the values Shawn had specified in the metadata above. Even though the META redirect exists in the HTML, it appears that Twitter still ignores it.
Combining META redirect with Twitter card metadata produces a card with those values.
But when we tweeted this URL, we got no card. We left the tooltip in place to show the full URL.
Did the card validator lie? Is this a cache of the site without any card metadata?
A few minutes later, we tweeted the URL again and the card appeared. So, it is possible that Twitter had cached that URL as having no card metadata and then, upon subsequent views of the tweet, decided to refresh its cache.
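One way to picture this suspected behavior is a cache that stores "no card" results just like successful ones and refetches only after an entry expires (negative caching). The Python class below is a toy model of that idea and rests on our own assumptions. Twitter does not document its cache, and CardCache, fetch_card, and ttl_seconds are hypothetical names. The injectable clock makes it easy to step through a timeline.

```python
import time


class CardCache:
    """Toy model of a card cache with negative caching (hypothetical, not Twitter's)."""

    def __init__(self, fetch_card, ttl_seconds, clock=time.monotonic):
        self.fetch_card = fetch_card  # callable: url -> card dict, or None for "no card"
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.entries = {}  # url -> (time stored, card or None)

    def get(self, url):
        now = self.clock()
        entry = self.entries.get(url)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            # Fresh entry: return it even if it records a failure (None).
            return entry[1]
        # Missing or expired: fetch again and remember the result, success or not.
        card = self.fetch_card(url)
        self.entries[url] = (now, card)
        return card


# Simulated timeline: the page first has no card metadata, then gains it.
fake_now = [0]
page_card = [None]
cache = CardCache(lambda url: page_card[0], ttl_seconds=300, clock=lambda: fake_now[0])

print(cache.get("https://example.com/page"))  # None: fetched, no metadata yet
page_card[0] = {"twitter:card": "summary"}
fake_now[0] = 60
print(cache.get("https://example.com/page"))  # None: stale failure still cached
fake_now[0] = 400
print(cache.get("https://example.com/page"))  # refetched, now returns the card
```

Under this model, the delay we saw between adding metadata and seeing a card would simply be the remaining lifetime of a cached failure.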
Now the card validator is telling the truth, so Twitter's cache has been updated?
We revisited the other tweets containing that URL after a few minutes and they contained the correct cards.
Now all the tweets for this URL reflect the current card metadata.
In addition, Shawn examined the access log for the web server hosting this page to see what requests Twitter makes of a web server when someone uses the Twitter Card Validator.
The log showed two requests from Twitterbot/1.0 for the web page itself and one request to download the striking image specified in the twitter:image metadata field. We do not know why Twitter downloads the page twice, but it did so each time Shawn tried the Twitter Card Validator.
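Counting these requests by hand works for a few log lines, but a short Python script helps with a longer log. The sketch below assumes the web server writes Apache's "combined" log format. Other formats would need a different pattern. The script tallies (method, path) pairs for requests whose User-Agent contains a given substring, such as Twitterbot. Keying on the method also separates HEAD requests from GET requests. The sample lines are made up and use a documentation IP range; they are not from our actual log.

```python
import re
from collections import Counter

# Apache "combined" format:
# host ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def requests_by_agent(lines, agent_substring):
    """Count (method, path) pairs for lines whose User-Agent contains agent_substring."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match and agent_substring in match.group("agent"):
            counts[(match.group("method"), match.group("path"))] += 1
    return counts


# Made-up example lines, not our actual log.
sample = [
    '192.0.2.1 - - [01/Nov/2021:12:00:00 +0000] "GET /experiments/test-redirect.html HTTP/1.1" 200 900 "-" "Twitterbot/1.0"',
    '192.0.2.1 - - [01/Nov/2021:12:00:01 +0000] "GET /experiments/test-redirect.html HTTP/1.1" 200 900 "-" "Twitterbot/1.0"',
    '192.0.2.1 - - [01/Nov/2021:12:00:02 +0000] "GET /experiments/card-image.png HTTP/1.1" 200 20480 "-" "Twitterbot/1.0"',
]
for (method, path), n in requests_by_agent(sample, "Twitterbot").items():
    print(method, path, n)
```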
Shawn also enabled mod_dumpio to retrieve the actual request and response headers from these sessions. No interesting headers showed up in the requests from Twitter.
More interesting, however, was what happened when Shawn created a new tweet with this URL. Below we have a flurry of requests from various IP addresses, including those owned by Twitter.
In this case, posting a tweet with a link prompted not only Twitterbot to revisit the URL, but also bots from IP addresses owned by Google (34.82.221.161), Paper.li (135.125.219.40 and 51.91.209.130), PT. Duta Empat Saudara (103.121.213.203), Hetzner hosting (116.202.35.86 and 116.202.33.237), and Livelap (198.27.82.205). Also interesting is that Google's bot followed the link provided by Twitter's URL shortener (https://t.co/Aw7ICrh2lV) and issued a HEAD request, while all the others issued a GET request directly to the URL. These requests have nothing to do with Twitter reading metadata, but they do indicate that other bots are watching Twitter for shared URLs.
Twitter's revisit of the URL when it was shared indicates that it was at least downloading the page again. This casts doubt on the idea that its normal procedure is to use cached metadata. If Twitter does download the page again, why the delay when loading the cards? Why the old card for IEEE VIS 2021?
We have verified part 1 of our hypothesis: Twitter's card generator does not follow META redirects. We also know that Twitter will cache card metadata failures as well as successes. Because of the inconsistency in Twitter's behavior, however, we cannot verify part 2. Twitter is clearly pulling stale metadata for IEEE VIS 2021 from somewhere, but we cannot determine the source of this stale metadata. According to TwitCount, https://virtual.ieeevis.org was shared only 186 times, so it seems unlikely that a cache algorithm favoring pages with a high number of shares would prefer it.
Our experiments also suggest that once Twitter updates its cache for https://virtual.ieeevis.org again, these cards will disappear from tweets and Twitter will not display the updated metadata, because the page at that URL contains only a META redirect. This will disappoint those promoting IEEE VIS 2021. We could not determine why the IEEE VIS 2020 page is still "stuck" inside Twitter's cache when cards for our own experimental page were updated within hours. If anyone has suggestions as to what is causing this discrepancy, please contact us.
A screenshot of the same tweet, revisited on November 18, still displaying a card for IEEE VIS 2020.
So, when sharing URLs on Twitter, keep in mind that your META redirects will not be followed. You can, however, add Twitter card metadata to the page containing the META redirect, and Twitter will use that metadata. Keep in mind, though, that META redirects may be harmful for card generation.
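A quick pre-sharing check based on these observations can be sketched in Python. The hypothetical helper share_warnings below inspects only the HTML you give it. It assumes the behavior we observed: META refresh is not followed, and twitter:card must be present. It cannot predict what Twitter's cache will do.

```python
from html.parser import HTMLParser


class ShareCheck(HTMLParser):
    """Notes whether a page has a META refresh and whether it declares twitter:card."""

    def __init__(self):
        super().__init__()
        self.has_refresh = False
        self.has_twitter_card = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        if (a.get("http-equiv") or "").lower() == "refresh":
            self.has_refresh = True
        if (a.get("name") or a.get("property") or "").lower() == "twitter:card":
            self.has_twitter_card = True


def share_warnings(html):
    """Return warnings about how this page may render as a Twitter card."""
    check = ShareCheck()
    check.feed(html)
    warnings = []
    if check.has_refresh and not check.has_twitter_card:
        # The redirect target's metadata will not be used, so no card.
        warnings.append("META refresh without card metadata: expect no card")
    elif not check.has_twitter_card:
        warnings.append("no twitter:card field: expect no card")
    return warnings


print(share_warnings('<meta http-equiv="refresh" content="0; url=next.html">'))
```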