2026-03-18: Reverse TweetedAt: Determining Tweet ID prefixes from Timestamps

Figure 1: Each tweet ID is a unique identifier that encodes the tweet creation timestamp, example adapted from Snowflake ID, Wikipedia.

Web archives, such as the Wayback Machine, are indexed by URL. For example, if we want to search for a tweet we must first know its URL. Figure 2 demonstrates that searching for a tweet URL results in a timemap of that tweet archived at different points in time. Clicking on a particular datetime will show the archived tweet at that particular point in time.

 

Figure 2: An archived tweet URL results in a timemap consisting of archived copies of the tweet.


Figure 3 shows a screenshot of a tweet shared by @_llebrun. The tweet in the screenshot was originally posted by @randyhillier, who later deleted it. The screenshot does not include the tweet's URL. Moreover, once a tweet is deleted, we cannot find its URL on the live web, nor do we know how to look it up in the archive.


Figure 3: @_llebrun tweeted a screenshot of a tweet originally posted by @randyhillier, who later deleted his tweet.


Therefore, we need to construct the URL of a tweet using only the information present in the screenshot. The structure of a tweet URL is: 


https://twitter.com/Twitter_Handle/status/Tweet_ID


We need the Twitter_Handle and Tweet_ID to construct a tweet URL. Each tweet ID is a unique identifier, known as a Snowflake ID, that encodes the tweet creation timestamp (Figure 1). We can extract the Twitter handle and timestamp from a tweet in the screenshot. In our previous tech report, we introduced methods for extracting Twitter handles and timestamps from Twitter screenshots. Next, we need to determine the tweet ID from the extracted timestamp. We could query the Wayback Machine using only the Twitter handle, but that would require individually dereferencing every archived tweet for a user. For example, the following curl command shows that the number of archived status URLs we would need to dereference for @randyhillier is huge (42,053). Hence, our goal is to limit the search space by utilizing the timestamp present on the screenshot.

curl -s "http://web.archive.org/cdx/search/cdx?url=https://twitter.com/randyhillier/status&matchType=prefix" | wc -l


   42053


Previously, one could query Twitter to find the timestamp of a tweet given a tweet ID, but this service is no longer freely available. The Twitter API has rate limits, and metadata from deleted, suspended, or private tweets cannot be accessed through it. Moreover, the Twitter API is currently monetized and no longer research-friendly. To address these issues, WS-DL members Mohammed Nauman Siddique and Sawood Alam developed the TweetedAt web service in 2019. The goal of this service is to extract the timestamps for Snowflake IDs and estimate timestamps for pre-Snowflake IDs. TweetedAt has therefore become a useful tool for finding timestamps from tweet IDs. However, we need the reverse: a tweet ID prefix determined from a given timestamp.

Reverse TweetedAt


The Snowflake service generates a tweet ID which is a 64-bit unsigned integer composed of: 41 bits timestamp, 10 bits machine ID, 12 bits machine sequence number, and 1 unused sign bit. The timestamp occupies the upper 41 bits only.
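To make the bit layout concrete, here is a minimal Python sketch that splits a tweet ID into the three fields described above and reassembles it. The function and key names (split_snowflake, join_snowflake, timestamp_bits, machine_id, sequence) are our own illustrative choices, not part of TweetedAt.

```python
def split_snowflake(tweet_id: int) -> dict:
    """Split a Snowflake tweet ID into its three bit fields."""
    return {
        "timestamp_bits": tweet_id >> 22,         # upper 41 bits (ms since the Twitter epoch)
        "machine_id": (tweet_id >> 12) & 0x3FF,   # next 10 bits
        "sequence": tweet_id & 0xFFF,             # lowest 12 bits
    }


def join_snowflake(parts: dict) -> int:
    """Reassemble the ID from the fields returned by split_snowflake."""
    return (
        (parts["timestamp_bits"] << 22)
        | (parts["machine_id"] << 12)
        | parts["sequence"]
    )


parts = split_snowflake(1495226962058649603)
print(parts)
print(join_snowflake(parts) == 1495226962058649603)  # the round trip is lossless
```

Only the timestamp field can be recovered from a screenshot; the machine ID and sequence number in the lower 22 bits remain unknown.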


TweetedAt determines the timestamp for a tweet ID by right-shifting the tweet ID by 22 bits and adding the Twitter epoch time of 1288834974657 (offset).


Python code to get UTC timestamp of a tweet ID

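from datetime import datetime, timezone

def get_tweet_timestamp(tid):
    # Twitter epoch in milliseconds
    offset = 1288834974657
    # Dropping the lower 22 bits leaves milliseconds since the Twitter epoch
    tstamp = (tid >> 22) + offset
    # Timezone-aware UTC datetime (datetime.utcfromtimestamp is deprecated)
    utcdttime = datetime.fromtimestamp(tstamp / 1000, tz=timezone.utc)
    print(str(tid) + " : " + str(tstamp) + " => " + str(utcdttime))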


For Reverse TweetedAt, given a datetime, we want to generate a tweet ID prefix by subtracting the offset and left-shifting by 22 bits. The process will not reconstruct the exact tweet ID because the lower 22 bits (the machine ID and sequence number) are unknown and are filled with zeros. However, it does give us a tweet ID prefix for a timestamp. For example, the tweet ID for @randyhillier’s tweet is ‘1495226962058649603’ and the timestamp is ‘9:41 PM Feb 19, 2022’ as shown in Figure 3. The tweet ID has 19 digits and the timestamp is at minute-level granularity. From that minute-level timestamp, Reverse TweetedAt computes the 6-digit tweet ID prefix ‘149522’ for the 19-digit tweet ID ‘1495226962058649603’.
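As a rough illustration of this arithmetic, the sketch below computes the smallest and largest IDs for a one-minute window and takes their shared leading digits. The helper name id_bounds_for_window is hypothetical. The example tweet ID decodes to 02:41 UTC on Feb 20, 2022, so we use that UTC minute here; reading the screenshot's 9:41 PM as a UTC-5 local time is our assumption.

```python
import os
from datetime import datetime, timedelta, timezone

TWITTER_EPOCH_MS = 1288834974657
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def id_bounds_for_window(start: datetime, window_ms: int) -> tuple:
    """Smallest and largest Snowflake IDs whose timestamp lies in the window."""
    # Integer milliseconds since the Unix epoch (avoids float rounding)
    start_ms = (start - UNIX_EPOCH) // timedelta(milliseconds=1)
    end_ms = start_ms + window_ms - 1
    min_id = (start_ms - TWITTER_EPOCH_MS) << 22                   # lower 22 bits all zeros
    max_id = ((end_ms - TWITTER_EPOCH_MS) << 22) | ((1 << 22) - 1)  # lower 22 bits all ones
    return min_id, max_id


minute_start = datetime(2022, 2, 20, 2, 41, tzinfo=timezone.utc)
low, high = id_bounds_for_window(minute_start, 60_000)
print(low <= 1495226962058649603 <= high)           # the real ID falls inside the window
print(os.path.commonprefix([str(low), str(high)]))  # shared leading digits: 149522
```

The shared leading digits of the two bounds form the prefix; everything after them varies within the minute.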


Python code to get tweet ID prefix from a Wayback timestamp

from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657

def wayback_to_tweetid_prefix(timestamp: str):
    s = str(timestamp).strip()

    # Expand the input into the full UTC time window it covers,
    # from start_ms to end_ms (both inclusive, in milliseconds).
    if len(s) == 14 and s.isdigit():
        granularity = "second"
        dt = datetime.strptime(s, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
        start_ms = int(dt.timestamp() * 1000)
        end_ms = start_ms + 999
    elif len(s) == 12 and s.isdigit():
        granularity = "minute"
        dt = datetime.strptime(s, "%Y%m%d%H%M").replace(tzinfo=timezone.utc)
        start_ms = int(dt.timestamp() * 1000)
        end_ms = start_ms + 60_000 - 1
    elif len(s) == 10 and s.isdigit():
        granularity = "hour"
        dt = datetime.strptime(s, "%Y%m%d%H").replace(tzinfo=timezone.utc)
        start_ms = int(dt.timestamp() * 1000)
        end_ms = start_ms + 3_600_000 - 1
    elif len(s) == 8 and s.isdigit():
        granularity = "date"
        dt = datetime.strptime(s, "%Y%m%d").replace(tzinfo=timezone.utc)
        start_ms = int(dt.timestamp() * 1000)
        end_ms = start_ms + 86_400_000 - 1
    else:
        raise ValueError(
            "Unsupported Wayback format. Use YYYYMMDD, YYYYMMDDHH, YYYYMMDDHHMM, or YYYYMMDDHHMMSS (UTC)."
        )

    # Smallest and largest possible IDs in the window: the lower 22 bits
    # are all zeros for the minimum and all ones for the maximum.
    start_delta = start_ms - TWITTER_EPOCH_MS
    end_delta = end_ms - TWITTER_EPOCH_MS
    min_id = start_delta << 22
    max_id = (end_delta << 22) | ((1 << 22) - 1)

    # Pad both IDs to the same number of digits before comparing them.
    min_str = str(min_id)
    max_str = str(max_id)
    length = max(len(min_str), len(max_str))
    min_str = min_str.zfill(length)
    max_str = max_str.zfill(length)

    # The prefix is the run of leading digits shared by both IDs.
    i = 0
    while i < length and min_str[i] == max_str[i]:
        i += 1

    prefix_str = min_str[:i] or "0"
    suffix_len = length - i
    prefix_val = int(prefix_str)
    ten_pow = 10 ** suffix_len
    approx_lower = prefix_val * ten_pow
    approx_upper = (prefix_val + 1) * ten_pow - 1

    return {
        "input_timestamp": timestamp,
        "tweet_id_prefix": prefix_str,
        "tweet_id_regex": f"{prefix_str}[0-9]{{{suffix_len}}}",
        "tweet_id_range": f"[{approx_lower} – {approx_upper}]",
    }


We integrated Reverse TweetedAt as a web service alongside TweetedAt. The service accepts a timestamp as user input and returns the corresponding tweet ID prefix, tweet ID regex, and full tweet ID range (Figure 4). It supports multiple valid timestamp formats (e.g., ISO 8601, RFC 1123, Wayback) and provides output at different levels of granularity. For example, Figure 4 shows output for millisecond-level granularity. Because millisecond-level precision is typically unavailable in tweet timestamps, the tool can interpret such inputs at second- or minute-level granularity. Rather than assuming zeros for unknown fields, the tool expands the input into the full corresponding time window (e.g., an entire second or minute), and computes the tweet ID prefix over that interval.

Figure 4: Reverse TweetedAt outputs tweet ID prefix at millisecond-level granularity.


Figure 5: Reverse TweetedAt outputs tweet ID prefix at second-level granularity.


Figure 6: Reverse TweetedAt outputs tweet ID prefix at minute-level granularity.


Tweet ID Regex-based Retrieval Across Temporal Granularity


We can use the tweet ID regex derived from a timestamp to search for archived tweets within a specific temporal window. By querying the Wayback Machine’s CDX API and filtering results using this prefix-based regex, we can identify tweet URLs whose IDs fall within the calculated range. As the timestamp becomes less precise, the tweet ID prefix becomes shorter and the regex search space widens.


For example, the tweet ID of @randyhillier’s tweet shown in Figure 3 is ‘1495226962058649603.’ Using TweetedAt, we can get its timestamp at millisecond-level granularity. With Reverse TweetedAt, millisecond-level granularity yields the most precise prefix and matches 10 archived captures, while the slightly less precise second-level prefix matches 15. When the precision is reduced further (minute-level granularity), the number of results remains 15. This indicates that every archived capture in the one-minute window also falls within the narrower one-second window. Lower temporal granularity expands the potential search space, but a wider ID range does not necessarily produce more results; it only increases the number of possible candidate IDs.

Search space at millisecond-level granularity

curl -s "https://web.archive.org/cdx/search/cdx?url=https://twitter.com/randyhillier/status/&matchType=prefix" \
| grep -E 'status/14952269620[0-9]{8}' | wc -l


   10


Search space at second-level granularity

curl -s "https://web.archive.org/cdx/search/cdx?url=https://twitter.com/randyhillier/status/&matchType=prefix" \
| grep -E 'status/149522696[0-9]{10}' | wc -l


   15


Search space at minute-level granularity

curl -s "https://web.archive.org/cdx/search/cdx?url=https://twitter.com/randyhillier/status/&matchType=prefix" \
| grep -E 'status/149522[0-9]{13}' | wc -l


   15
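The same filtering can be done in Python on a CDX response that has already been downloaded. The sketch below is a minimal offline version: it assumes the default space-separated CDX output, where the third field is the original URL, and the sample lines are made up for illustration (only the first ID is the real one from Figure 3). The function name filter_cdx_by_tweet_id is hypothetical.

```python
import re


def filter_cdx_by_tweet_id(cdx_lines, tweet_id_regex):
    """Return the original URLs whose status ID matches the tweet ID regex."""
    # Require a non-digit (or end of string) after the ID so longer numbers do not match
    pattern = re.compile(r"/status/(?:" + tweet_id_regex + r")(?:\D|$)")
    matches = []
    for line in cdx_lines:
        fields = line.split(" ")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        original_url = fields[2]
        if pattern.search(original_url):
            matches.append(original_url)
    return matches


# Hypothetical CDX lines: urlkey, capture timestamp, original URL, MIME type, status, digest, length
sample = [
    "com,twitter)/randyhillier/status/1495226962058649603 20220220030000 https://twitter.com/randyhillier/status/1495226962058649603 text/html 200 AAAA 1000",
    "com,twitter)/randyhillier/status/1495226999999999999 20220220040000 https://twitter.com/randyhillier/status/1495226999999999999 text/html 200 BBBB 1000",
    "com,twitter)/randyhillier/status/1490000000000000000 20220206000000 https://twitter.com/randyhillier/status/1490000000000000000 text/html 200 CCCC 1000",
]

print(len(filter_cdx_by_tweet_id(sample, "149522696[0-9]{10}")))  # second-level regex
print(len(filter_cdx_by_tweet_id(sample, "149522[0-9]{13}")))     # minute-level regex
```

Like the grep pipeline, this counts one match per capture line, so a tweet archived several times is counted several times.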



CDX API Wildcard Search and Snowflake IDs to Limit the Search Space Using Tweet ID Prefix


We can now determine a tweet ID prefix from a screenshot timestamp using the Reverse TweetedAt service. Since a tweet can be archived any time within ±26 hours of the screenshot timestamp, we can determine tweet ID prefixes from the timestamps at the boundaries of this time window. We can use this time window to limit the search space by excluding the URLs tweeted before and after the alleged timestamp. Let us consider the tweet in the screenshot in Figure 3, where the screenshot timestamp is:


9:41 PM Feb 19, 2022 (20220219214100)


Using Reverse TweetedAt, we compute the tweet ID prefixes for the left-hand boundary (-26 hours) and right-hand boundary (+26 hours) timestamps, which are listed below:


-26 hours timestamp: 20220218194100 → tweet ID prefix: 14947588
+26 hours timestamp: 20220220234100 → tweet ID prefix: 149554404
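The two boundary timestamps can be derived from the screenshot timestamp with simple datetime arithmetic. The following is a small sketch; the function name window_boundaries and its hours parameter are our own illustrative choices.

```python
from datetime import datetime, timedelta

WAYBACK_FORMAT = "%Y%m%d%H%M%S"


def window_boundaries(wayback_ts: str, hours: int = 26):
    """Return the (left, right) boundary timestamps in Wayback format."""
    center = datetime.strptime(wayback_ts, WAYBACK_FORMAT)
    delta = timedelta(hours=hours)
    return (
        (center - delta).strftime(WAYBACK_FORMAT),
        (center + delta).strftime(WAYBACK_FORMAT),
    )


print(window_boundaries("20220219214100"))
# ('20220218194100', '20220220234100')
```

Each boundary can then be passed to Reverse TweetedAt to obtain the corresponding tweet ID prefix.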

As previously mentioned, the timestamp occupies the upper 41 bits only. We can use the common portion of the two tweet ID prefixes (149[4-5]) and do a CDX API wildcard search in the Wayback Machine to limit the search space. The search space shrinks to 629 archived tweets, whereas using only the Twitter handle returns 42,053. Dereferencing 629 archived tweets to search for the tweet text in a screenshot is a lot of work but feasible, whereas dereferencing 42,053 archived tweets is far too expensive. The following curl command shows that the number of archived status URLs to dereference for @randyhillier with the common tweet ID prefix is much smaller (629).

curl -s "https://web.archive.org/cdx/search/cdx?url=https://twitter.com/randyhillier/status/&matchType=prefix&from=20220218194100" | grep -E 'status/149[4-5]' | wc -l


   629
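The common portion of the two boundary prefixes can also be derived programmatically. The sketch below assumes both prefixes are leading digits of tweet IDs with the same total number of digits (19 here) and that the first prefix belongs to the earlier timestamp; the function name common_prefix_pattern is hypothetical.

```python
def common_prefix_pattern(lower_prefix: str, upper_prefix: str) -> str:
    """Shared leading digits plus a digit class for the first differing digit."""
    i = 0
    limit = min(len(lower_prefix), len(upper_prefix))
    while i < limit and lower_prefix[i] == upper_prefix[i]:
        i += 1
    shared = lower_prefix[:i]
    if i == limit:
        return shared  # one prefix contains the other; no digit class needed
    # The class spans from the earlier boundary's digit to the later boundary's digit
    return f"{shared}[{lower_prefix[i]}-{upper_prefix[i]}]"


print(common_prefix_pattern("14947588", "149554404"))
# 149[4-5]
```

The resulting pattern is deliberately coarse: it covers every ID in the ±26 hour window, plus some IDs just outside it.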


Summary


It is easy to search for a tweet in the Wayback Machine when you know the URL. But a screenshot of a tweet typically does not have its URL present on the image. However, the Twitter handle and timestamp present in the tweet in the screenshot can be utilized to search for a tweet in the Wayback Machine web archive. Given a datetime, Reverse TweetedAt produces a tweet ID prefix, which we can then use to grep through a CDX API response of all tweets associated with a Twitter account. Using the Reverse TweetedAt tool, we can determine approximate tweet IDs for the left-hand and right-hand boundary timestamps around a screenshot timestamp. We found that we can limit the search space using a CDX API wildcard search based on a common tweet ID prefix. Thus, the process for finding candidate archived tweets for the tweet in the screenshot is optimized. We published a paper at the 36th ACM Conference on Hypertext and Social Media, “Web Archives for Verifying Attribution in Twitter Screenshots,” which discusses how we can further use the candidate archived tweets to verify whether the tweet in the screenshot was posted by the alleged author.





—- Tarannum Zaki (@tarannum_zaki)

