Posts

Showing posts with the label HTTP redirection

2024-10-22: Analyzing Redirects and Getting Rickrolled Along the Way

Redirects are often seen as the invisible roads of the web, silently sending users from one URL to another. While they typically serve the practical purpose of keeping web traffic flowing smoothly, replacing outdated links and guiding users to relevant resources, sometimes they lead to unexpected destinations. We have been researching the lifespan of web pages as part of our "Not Your Parents' Web" project in collaboration with the Internet Archive and Filecoin Foundation. As part of this work, we focused on redirecting URLs. During our analysis of the primary destinations that URLs lead to (referred to as sinks), one particularly notable pattern emerged, revealing how meme culture and internet pranks influence the web. As I examined a dataset of redirecting URLs, I uncovered a notable pattern involving one of the internet’s most famous pranks: Rickrolling.

Rickrolling on web

Rickrolling, a cultural internet phenomenon, involves sharing misleading links that direct...

2024-01-16: Paper summary: Archival HTTP Redirection Retrieval Policies

Figure 1: URI-R and URI-M HTTP redirection relationship cases. Figure 2 from AlSum et al.

Redirection of web pages refers to the process of forwarding a user from one URI (Uniform Resource Identifier) to another. This can happen for various reasons, and it is a common practice on the web. Redirection is implemented using HTTP status codes, particularly those in the 3xx range. For example, status code 301 is a permanent redirect, and status code 302 is a temporary redirect. In these scenarios, clients (web browsers) automatically send users to the URI given in the Location response header. In this blog post I summarize the paper titled "Archival HTTP redirection retrieval policies" by AlSum et al. In web archives, mementos (archived web pages) can have archived redirects (i.e., a URI-M (URI of the memento) with a 3xx HTTP status code) where the URI-R (URI of the original resource) returned a redirection status code at crawl time. Figure 2 shows the calend...
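The redirect mechanism this excerpt describes (a 3xx status code plus a Location header) can be illustrated with a small offline sketch. It is not taken from the post or from the AlSum et al. paper; the function name `redirect_target` and the example URLs are hypothetical, and the sketch only covers the common redirect codes rather than every 3xx status.

```python
from typing import Mapping, Optional
from urllib.parse import urljoin

# Common redirect status codes and how clients usually treat them.
# (Illustrative subset; 300 and 304 are 3xx codes but are not handled here.)
REDIRECT_CODES = {
    301: "permanent",
    302: "temporary",
    303: "see other",
    307: "temporary",
    308: "permanent",
}


def redirect_target(
    request_url: str, status: int, headers: Mapping[str, str]
) -> Optional[str]:
    """Return the absolute URL a client would follow, or None if no redirect."""
    if status not in REDIRECT_CODES:
        return None
    # HTTP header names are case-insensitive, so compare in lowercase.
    location = next(
        (value for name, value in headers.items() if name.lower() == "location"),
        None,
    )
    if location is None:
        return None
    # A Location value may be relative; resolve it against the request URL.
    return urljoin(request_url, location)
```

For example, `redirect_target("http://example.com/a/b", 301, {"Location": "/new"})` resolves the relative value against the request URL, while a 200 response yields `None` because there is nothing to follow.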