2021-02-20: Creation Time and Published Time Are Not the Same: Estimating the Instagram Epoch


Figure 01: A capture of an Instagram post highlighting the “shortcode” for a post and its published time. 

While examining how the published datetime of an Instagram (IG) post can be extracted from its URL alone, we uncovered a discrepancy between the published time present in the HTML and the time that can be extracted from the post's shortcode (hereinafter, creation time). In the early stages of this study, we assumed the two values would be the same, as they are on Twitter, where the time extracted from a Twitter ID matches what’s displayed in the JSON object (and HTML) for the tweet.


Relative to its popularity, Instagram is understudied in academic research compared to Twitter. As of 2021, Twitter has 330 million monthly active users and Instagram has 1 billion active users. As of Feb 11, 2021, Google Scholar returns 7.52 million hits for “twitter” and 1.54 million hits for “instagram”. This results in an “active-users/Google Scholar hits ratio” of 43.88 for Twitter and 649.35 for Instagram, almost 15 times larger.


We began this study while exploring ways to bridge this gap between the two platforms in academic research. We looked at different methods of extracting the published time associated with a post, including the creation time embedded in its media ID, and then obtained an estimate for the epoch value Instagram uses when creating media IDs. We learned that there is a delta between the published time and the creation time, which could have implications for applications that depend on sorting events by IG publishing time.

Published time in the HTML

We can obtain the published time of an Instagram post from two different places in the HTML. The date appears at the bottom of the post in "human-readable" form, and the same date is shown if you hover over it. If you inspect this element, you will see the datetime of the post in ISO 8601 format (Figure 02). The published time of a post can also be found in the JSON located in one of the <script> tags in the HTML body (Figure 03), where it is stored as a Unix timestamp. Converting the Unix timestamp obtained from the JSON (Figure 03) back to ISO 8601 gives the same time as displayed at the bottom of the post (Figure 02).


Figure 02: The published datetime (in ISO 8601) of the post found in the HTML (2020-10-08T21:12:34Z) explored via the inspect element option in the Google Chrome browser. 


"graphql": {
    "shortcode_media": {
      "__typename": "GraphImage",
      "id": "2415680307434230462",
      "shortcode": "CGGOHDXhhK-",
      "dimensions": {
        "height": 1350,
        "width": 1080
      },
…
      "comments_disabled": false,
      "commenting_disabled_for_viewer": false,
      "taken_at_timestamp": 1602191554,
      "edge_media_preview_like": {
        "count": 8,
        "edges": [
          {
            "node": {
              "id": "1919166011",
              "is_verified": false,
"profile_pic_url": "https://instagram.forf1-4.fna.fbcdn.net/v/t51.2885-19/s150x150/102645018_541091879918535_2546629646016782682_n.jpg?_nc_ht=instagram.forf1-4.fna.fbcdn.net&_nc_ohc=PhA3b4UVuJsAX8qT-8x&tp=1&oh=05bd5c8c6357f3d1269a2d4ce7787550&oe=604F7B66",
              "username": "kritika.garg_"
            }
          }
        ]
      }
      

Figure 03: A snippet from the JSON file which contains the published datetime (in Unix) of the same post that is shown in Figure 02.
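As a quick check of the conversion described above, the following minimal Python sketch turns the taken_at_timestamp value from Figure 03 into ISO 8601. The function name unix_to_iso8601 is ours, chosen only for illustration.

```python
from datetime import datetime, timezone


def unix_to_iso8601(timestamp_s: int) -> str:
    """Format a Unix timestamp (in seconds) as an ISO 8601 string in UTC."""
    moment = datetime.fromtimestamp(timestamp_s, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


# taken_at_timestamp from the JSON snippet in Figure 03
print(unix_to_iso8601(1602191554))  # 2020-10-08T21:12:34Z
```

The result matches the datetime shown in Figure 02.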


In addition to these two places, a datetime is also encoded in the shortcode of every IG post. We’ll look into that in detail in the following section.

Creation time from media object's shortcode

Let us now look at how we can obtain the creation time from the shortcode using just the URL of the post. The shortcode for a media object can be found in its URL, which follows the format “https://www.instagram.com/p/{shortcode}”. Here, the shortcode is the base64 encoding of the base 10 media ID assigned to a particular media object. Detailed information on how the media ID is built is available in the blog post “Sharding & IDs at Instagram” published by Instagram Engineering. According to that blog post, each ID consists of 64 bits in total, where the first 41 bits give us the time in milliseconds (Figure 04) since their internal epoch (hereinafter IG epoch).
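The URL-to-media-ID step can be sketched in a few lines of Python. This is a minimal sketch, not Instagram's own code: the function names are hypothetical, and the 64-character alphabet (A-Z, a-z, 0-9, "-", "_") is an assumption that is consistent with the example shortcode and media ID shown in Figure 03.

```python
from urllib.parse import urlparse

# Assumed alphabet, in URL-safe base64 order: A-Z, a-z, 0-9, '-', '_'
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_"
)


def extract_shortcode(post_url: str) -> str:
    """Return the shortcode from https://www.instagram.com/p/{shortcode}."""
    parts = [part for part in urlparse(post_url).path.split("/") if part]
    if len(parts) < 2 or parts[0] != "p":
        raise ValueError(f"not a post URL: {post_url}")
    return parts[1]


def shortcode_to_media_id(shortcode: str) -> int:
    """Read the shortcode as a base-64 number, most significant digit first."""
    media_id = 0
    for char in shortcode:
        media_id = media_id * 64 + ALPHABET.index(char)
    return media_id


shortcode = extract_shortcode("https://www.instagram.com/p/CGGOHDXhhK-/")
print(shortcode_to_media_id(shortcode))  # 2415680307434230462
```

The printed value is the "id" field that appears next to this shortcode in the post's JSON.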


For example, Figure 04 illustrates this conversion, in the order of the steps listed below, for the post with the shortcode “CGGOHDXhhK-”.


1. Media ID and shortcode of the post.

2. Obtaining the 64-bit binary equivalent of the media ID.

3. Selecting the first 41 bits.

4. Converting the 41 bits to their decimal equivalent (gives the number of milliseconds since the IG epoch).


1. 2415680307434230462 (CGGOHDXhhK-)
2. 0010000110000110001110000111000011010111100001100001001010111110
3. 00100001100001100011100001110000110101111
4. 287971533231 

Figure 04: Creation time of a post in milliseconds since the IG epoch. Note that because the media ID is a 64-bit integer, its binary form must first be left-padded with zeros to 64 bits before taking the first 41 bits, which correspond to the ID creation time.
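Steps 2 to 4 of Figure 04 can be reproduced with the short Python sketch below. The function name is hypothetical; the media ID is the one from Figure 03. Taking the first 41 of 64 bits is equivalent to shifting the integer right by 23 bits, which the final line checks.

```python
def creation_ms_since_ig_epoch(media_id: int) -> int:
    """Return the first 41 bits of a 64-bit media ID as an integer."""
    bits = format(media_id, "064b")  # step 2: zero-padded 64-bit binary string
    first_41_bits = bits[:41]        # step 3: keep the leading 41 bits
    return int(first_41_bits, 2)     # step 4: back to decimal (milliseconds)


media_id = 2415680307434230462
creation_ms = creation_ms_since_ig_epoch(media_id)
print(creation_ms)  # 287971533231

# Equivalent shortcut: drop the trailing 64 - 41 = 23 bits.
assert creation_ms == media_id >> 23
```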


Given the IG epoch, we would be able to calculate the creation time of the post. Although their internal epoch is not disclosed by IG, we can estimate its value.

Estimating the epoch value used by Instagram

Let’s take a look at how we can estimate the IG epoch. From the previous sections, we know the following two values:


  • Published time: The value extracted from the HTML (taken_at_timestamp), which is the number of seconds since the Unix epoch (say, Tp).

  • Creation time: The value extracted from the media ID, which is the number of milliseconds since the IG epoch (say, Tc).


From these two values, we can obtain an estimate for the IG epoch. Mind the units: Tp is in seconds while Tc is in milliseconds, so Tc must be divided by 1000 before subtracting.


IG epoch = Tp - (Tc/1000) seconds
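Plugging in the example post from Figures 03 and 04 gives a concrete estimate. The sketch below is illustrative (the function name is ours, and truncating to whole seconds before formatting is our choice), not the code used to build the dataset.

```python
from datetime import datetime, timezone


def estimate_ig_epoch(published_s: int, creation_ms: int) -> float:
    """IG epoch estimate in Unix seconds: Tp - (Tc / 1000)."""
    return published_s - creation_ms / 1000


t_p = 1602191554    # taken_at_timestamp, seconds since the Unix epoch
t_c = 287971533231  # first 41 bits of the media ID, ms since the IG epoch

estimate_s = estimate_ig_epoch(t_p, t_c)  # about 1314220020.769
# Truncate to whole seconds before formatting (a choice made for this sketch).
estimate = datetime.fromtimestamp(int(estimate_s), tz=timezone.utc)
print(estimate.strftime("%Y-%m-%dT%H:%M:%SZ"))  # 2011-08-24T21:07:00Z
```

Because Tp has only one-second resolution while Tc has millisecond resolution, an estimate derived from a single post carries a sub-second uncertainty.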

Dataset

We have created a dataset of IG epoch estimates calculated from 1000 shortcodes. Without loss of generality, we sampled the 1000 shortcodes from a previous study about Katy Perry's IG account. The code used for estimating the epoch is available on GitHub. The dataset contains posts with a single media item (either one image or one video) as well as posts containing multiple media items. Multiple media posts, known within the community as “carousel posts”, can have multiple images and/or videos. The construction of carousel posts is discussed further in a later section.


Looking at the estimates obtained for the IG epoch in our dataset, we can see that 76.4% (764 out of 1000) of the values point to either 2011-08-24T21:07:00Z or 2011-08-24T21:07:01Z, which differ by only one second. However, the epoch calculated from some single video posts was off by a greater margin, as much as 60 min 32 sec.


Below is a summary of the estimated epoch values we obtained with the use of our dataset:


2011-08-24T21:07:00Z = 21.5% (215 out of 1000)

2011-08-24T21:07:01Z = 54.9% (549 out of 1000)

Every other estimate = 23.6% (236 out of 1000)


We picked the earliest datetime estimate (2011-08-24T21:07:00Z) as the epoch value, even though over half of the estimates we obtained were one second later (2011-08-24T21:07:01Z). The other 23.6% of posts, whose epoch estimates deviate by a greater margin (as much as 60 min 32 sec), are all single video posts. (The converse does not hold: some single video posts have a delta of zero or one second.) This prompted us to look deeper.

The difference between the creation time and published time

At first, we thought that the difference between the creation time and published time might depend on the length of the video. To test this hypothesis, we calculated the difference between our selected IG epoch value of 2011-08-24T21:07:00Z and the datetime values in the “epoch_estimate (utc)” column to create the “delta” column in our dataset. We then filtered the single video posts and plotted the duration of the video against the delta values, as shown in Figure 05(a). We also computed the Kendall rank correlation coefficient (Kendall’s Tau-b) to check for any correlation.


  • p-value = 0.4205415 

  • Kendall Tau-b = 0.234224


Figure 05(a): Relationship between off_by value (delta) vs vid_length (duration of the video) for all the single video posts. p-value = 0.4205415, Kendall Tau-b = 0.234224


Additionally, we excluded the videos with a duration greater than 60 seconds (IGTV videos) and plotted the duration of the remaining videos against their delta values, as shown in Figure 05(b). We calculated the Kendall Tau-b value for these posts as well.


  • p-value = 0.1785421

  • Kendall Tau-b = 0.1937192


Figure 05(b): Relationship between off_by value (delta) vs vid_length (duration of the video) for single video posts with video duration <= 60 sec. Please note that the data used in Figure 05(b) is a subset of the data used in Figure 05(a). p-value = 0.1785421, Kendall Tau-b = 0.1937192


Although both Kendall coefficient values suggest a weak positive correlation between the video duration and the delta value, both have high p-values (0.42 and 0.18), so neither correlation is statistically significant. We therefore found no significant relationship between the video duration and the delta value.
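For readers unfamiliar with the statistic, here is a minimal pure-Python sketch of Kendall’s Tau-b. It is not the code used for this post, it does not compute a p-value (a statistics package would normally be used for that), and the durations and deltas in the usage lines are made-up toy values, not rows from our dataset.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's Tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty))."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue              # tied on both variables: ignored
        if dx == 0:
            tied_x_only += 1      # tied on x only
        elif dy == 0:
            tied_y_only += 1      # tied on y only
        elif dx * dy > 0:
            concordant += 1       # pair ordered the same way on x and y
        else:
            discordant += 1       # pair ordered in opposite ways
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denominator == 0:
        return float("nan")
    return (concordant - discordant) / denominator


# Toy values for illustration only (hypothetical, not from the dataset).
toy_durations = [10, 20, 30]
toy_deltas = [5, 40, 12]
print(kendall_tau_b(toy_durations, toy_deltas))
```

A value near +1 or -1 indicates a strong monotonic relationship, while values near 0, like those reported above, indicate a weak one.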


Our next assumption was that this delta might be affected by the complexity of the video. To verify this, we posted two videos of different complexity (same video length, but different file sizes) and used their shortcodes and published times to estimate the IG epoch. However, the two epoch estimates were similar to each other regardless of the file size.


Our next hypothesis was that this delta reflects the time between the user hitting publish on a post and the post actually being published to the web (processing time). We define the processing time as the sum of the upload time, the time taken for video conversion, and any additional time Instagram takes until the post appears on the web. The process of posting is shown in Figure 06.


Figure 06: The process of posting a media item to Instagram from the time the user hits post until the time it gets published.


We then checked whether the processing time for the video is what causes this difference. To test this hypothesis, we posted the same video (video length = 11 sec) ten times using two different upload speeds (Figure 07). The first five videos (1-5) were uploaded at a network upload speed of 2.18 Mbps and the next five (6-10) at 0.02 Mbps, a difference of two orders of magnitude. We uploaded the same video each time to hold constant other factors such as video duration and video complexity. We also manually timed how long the video took to complete processing in both scenarios: approximately 35 sec at 2.18 Mbps and approximately 11 min at 0.02 Mbps.


Figure 07: The two different network upload speeds used.


Table 01 shows the outcome of the above test. The delta associated with the first five shortcodes is small (ranging from 23 sec to 49 sec), whereas the delta associated with the final five shortcodes is considerably larger (ranging from 10 min 30 sec to 11 min 48 sec). These two groups of delta values correspond to the approximate upload times we measured manually (35 sec and 11 min). This supports our hypothesis that the delta is affected by the time between the user hitting "publish" and the video being uploaded and published into the user feed.



| # | shortcode | media_id | mediaID_sec | creation_time unix | published_time unix | off_by (delta) | upload_time (approx.) |
|---|---|---|---|---|---|---|---|
| 1 | CKnY0ztBNtT | 2497073700445410000 | 297674381.8 | 1611894402 | 1611894451 | 49 s | 35 s |
| 2 | CKnY7sEB1LS | 2497074173277850000 | 297674438.2 | 1611894458 | 1611894484 | 25 s | 35 s |
| 3 | CKnY_sPhXLA | 2497074448348570000 | 297674470.9 | 1611894491 | 1611894515 | 24 s | 35 s |
| 4 | CKnZFBkhhPd | 2497074814846900000 | 297674514.6 | 1611894535 | 1611894558 | 23 s | 35 s |
| 5 | CKnZIw8hUU5 | 2497075071873800000 | 297674545.3 | 1611894565 | 1611894590 | 24 s | 35 s |
| 6 | CKncCvYBtWk | 2497087852010460000 | 297676068.8 | 1611896089 | 1611896756 | 11 min 7 s | 11 min |
| 7 | CKndYT6h__k | 2497093732399580000 | 297676769.8 | 1611896790 | 1611897485 | 11 min 35 s | 11 min |
| 8 | CKne00dBGPo | 2497100089529750000 | 297677527.6 | 1611897548 | 1611898178 | 10 min 30 s | 11 min |
| 9 | CKngH3pBImS | 2497105796668880000 | 297678208 | 1611898228 | 1611898874 | 10 min 46 s | 11 min |
| 10 | CKnha74hyoG | 2497111504940640000 | 297678888.4 | 1611898908 | 1611899616 | 11 min 48 s | 11 min |

Table 01: Results of the test conducted to see the effect of upload time/upload speed on the difference between the creation time and published time on IG.
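The delta column can be recomputed from the mediaID_sec and published_time values in Table 01. The sketch below assumes the epoch estimate selected in this post (2011-08-24T21:07:00Z, i.e. Unix time 1314220020) and truncates the delta to whole seconds; both the helper names and the truncation are our assumptions for illustration.

```python
IG_EPOCH_S = 1314220020  # 2011-08-24T21:07:00Z, the estimate selected above


def delta_seconds(media_id_sec: float, published_unix: int) -> int:
    """Published time minus creation time, truncated to whole seconds."""
    creation_unix = IG_EPOCH_S + media_id_sec
    return int(published_unix - creation_unix)


def format_delta(seconds: int) -> str:
    """Render a delta as 'M min S s', or 'S s' when under one minute."""
    minutes, remainder = divmod(seconds, 60)
    return f"{minutes} min {remainder} s" if minutes else f"{remainder} s"


# Row 1 (fast upload) and row 6 (slow upload) from Table 01
print(format_delta(delta_seconds(297674381.8, 1611894451)))  # 49 s
print(format_delta(delta_seconds(297676068.8, 1611896756)))  # 11 min 7 s
```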


We also expected the delta to be much higher for a post with multiple videos, as it might take even longer to process/upload than a single video post. However, that’s not the case, because of how a multiple media post is constructed.

Carousel post: Multiple images or videos

IG allows users to post up to 10 images and/or videos in a single post. If you look back at the dataset, you can see that the delta value is either 0 or 1 sec for all carousel posts. This means that there is either no difference or only a single second difference between the creation time and published time, even when multiple videos and/or images are involved.


Let us consider a multiple media post with an image and a video. Figure 08 shows a JSON snippet from the HTML body that explains how the structure of a carousel post is built. 


"graphql": {
  "shortcode_media": {
    "__typename": "GraphSidecar",
    "id": "2487219965527166501",
    "shortcode": "CKEYWF7gY4l",
    "dimensions": {
      "height": 1080,
      "width": 1080
    },
    
    ...


    "edge_sidecar_to_children": {
      "edges": [
        {
          "node": {
            "__typename": "GraphImage",
            "id": "2487219962515789569",
            "shortcode": "CKEYWDIA5cB",
            "dimensions": {
              "height": 1080,
              "width": 1080
            },
            
        ...
        
    
        {
          "node": {
            "__typename": "GraphVideo",
            "id": "2487219810312905327",
            "shortcode": "CKEYT1YA-Jv",
            "dimensions": {
              "height": 750,
              "width": 750
            },
            

Figure 08: A snippet from the JSON file of a post with an image and a video displaying how the structure of a multiple media post is built.


There is a shortcode for the post ("CKEYWF7gY4l") and there are also shortcodes for child 1 ("CKEYWDIA5cB") and child 2 ("CKEYT1YA-Jv") under the “edge_sidecar_to_children” key. Any request made directly to the URLs constructed with a child shortcode will redirect to the main post URL (Figures 09 and 10).


$ curl -A "googlebot" -ILs https://www.instagram.com/p/CKEYWDIA5cB/
HTTP/2 302 
content-type: text/html; charset=utf-8
location: https://www.instagram.com/p/CKEYWF7gY4l
vary: Accept-Language, Cookie
content-language: en
date: Sun, 07 Feb 2021 03:07:58 GMT

HTTP/2 301
content-type: text/html; charset=utf-8
location: https://www.instagram.com/p/CKEYWF7gY4l/
vary: Accept-Language, Cookie
date: Sun, 07 Feb 2021 03:07:58 GMT

HTTP/2 200 
content-type: text/html; charset=utf-8
vary: Cookie, Accept-Language, Accept-Encoding
content-language: en
date: Sun, 07 Feb 2021 03:07:58 GMT

Figure 09: A direct request to the URL constructed using the shortcode for child post 1



$ curl -A "googlebot" -ILs https://www.instagram.com/p/CKEYT1YA-Jv/
HTTP/2 302 
content-type: text/html; charset=utf-8
location: https://www.instagram.com/p/CKEYWF7gY4l
vary: Accept-Language, Cookie
content-language: en
date: Sun, 07 Feb 2021 03:10:03 GMT

HTTP/2 301 
content-type: text/html; charset=utf-8
location: https://www.instagram.com/p/CKEYWF7gY4l/
vary: Accept-Language, Cookie
date: Sun, 07 Feb 2021 03:10:03 GMT

HTTP/2 200 
content-type: text/html; charset=utf-8
vary: Cookie, Accept-Language, Accept-Encoding
content-language: en
date: Sun, 07 Feb 2021 03:10:03 GMT

Figure 10: A direct request to the URL constructed using the shortcode for child post 2


Table 02 shows the creation time of each shortcode, the published time in the HTML, and the delta between the two. The creation times of the main post shortcode and of child 1’s shortcode align with the published time in the HTML, with a delta of only one second, whereas the creation time of child 2’s shortcode is 19 sec earlier than the published time in the HTML. This means that the media ID of child 2 was created before those of child 1 and the main post. We assume that IG creates the main post media ID at the very end of the posting process.

| item | shortcode | media_id | mediaID_sec | creation_time unix | publish_time unix | delta (sec) |
|---|---|---|---|---|---|---|
| Main post | CKEYWF7gY4l | 2487219965527166501 | 296499725 | 1610719745 | 1610719746 | 1 |
| child 1 (image) | CKEYWDIA5cB | 2487219962515789569 | 296499724.7 | 1610719745 | 1610719746 | 1 |
| child 2 (video) | CKEYT1YA-Jv | 2487219810312905327 | 296499706.5 | 1610719727 | 1610719746 | 19 |

Table 02: The creation time and published time of each item (main post, child 1, and child 2) in a multiple media post.
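The ordering in Table 02 can be checked directly from the three media IDs in Figure 08. This is a minimal sketch that again assumes the epoch estimate selected in this post (2011-08-24T21:07:00Z); the dictionary keys and function name are ours.

```python
IG_EPOCH_MS = 1314220020 * 1000  # assumed IG epoch, 2011-08-24T21:07:00Z
PUBLISHED_UNIX = 1610719746      # publish_time of the carousel post (Table 02)

MEDIA_IDS = {
    "main post": 2487219965527166501,
    "child 1 (image)": 2487219962515789569,
    "child 2 (video)": 2487219810312905327,
}


def creation_unix_ms(media_id: int) -> int:
    """First 41 bits of the media ID (ms since IG epoch) as Unix milliseconds."""
    return IG_EPOCH_MS + (media_id >> 23)


for name, media_id in MEDIA_IDS.items():
    created_s = creation_unix_ms(media_id) / 1000
    print(f"{name}: created {created_s:.3f}, delta {PUBLISHED_UNIX - created_s:.1f} s")
```

Sorting the three IDs by creation time puts child 2 (the video) first and the main post last, which is consistent with the assumption that the main post ID is generated at the end of the posting process.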

Comparison with Twitter

Twitter uses Snowflake as its internal service to generate Twitter IDs. They shifted to this method of ID generation because they needed to generate many 64-bit IDs per second that would still be roughly sortable. This method also allows for distributed ID creation, whereas the prior MySQL-based technique was centralized and not scalable. By performing a similar test, limiting the upload speed and posting the same video twice to the Twitter feed, we were able to verify that the time extracted from a Twitter ID is the same as what’s displayed in the JSON object for the tweet. Note that the creation time of the ID was obtained from TweetedAt, a service and library built by Mohammed Nauman Siddique and Sawood Alam that makes it easy to extract the datetime from Twitter IDs. Table 03 shows the outcome of the test.



| # | twitter_id | creation_time_utc (from TweetedAt) | publish_time_utc (from the tweet JSON) | off_by (delta) |
|---|---|---|---|---|
| 1 | 1357536414276210691 | 2021-02-05T03:47:57Z | 2021-02-05T03:47:57Z | 0 |
| 2 | 1357539154876334080 | 2021-02-05T03:58:51Z | 2021-02-05T03:58:51Z | 0 |

Table 03: Results of the test conducted to see the effect of upload time/upload speed on the difference between the creation time and published time on Twitter.


As shown in Table 03, the delta is zero, meaning that there is no difference between the creation time and published time regardless of the limitations to the upload speed to prolong the processing time of the post.
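For comparison with the Instagram decoding, the creation times in Table 03 can also be derived locally. The sketch below is not TweetedAt's code; it relies on the publicly documented Snowflake layout, in which the bits above the lowest 22 hold milliseconds since the Twitter epoch of 1288834974657 ms (Unix time). That constant and the 22-bit shift come from public Snowflake descriptions, not from this post.

```python
from datetime import datetime, timezone

# Snowflake epoch in Unix milliseconds (from public Snowflake documentation)
TWITTER_EPOCH_MS = 1288834974657


def tweet_id_to_utc(tweet_id: int) -> str:
    """Decode the creation time embedded in a Snowflake tweet ID."""
    unix_ms = (tweet_id >> 22) + TWITTER_EPOCH_MS  # drop the 22 low-order bits
    moment = datetime.fromtimestamp(unix_ms // 1000, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


print(tweet_id_to_utc(1357536414276210691))  # 2021-02-05T03:47:57Z
print(tweet_id_to_utc(1357539154876334080))  # 2021-02-05T03:58:51Z
```

Unlike Instagram, Twitter publishes its epoch, so no estimation step is needed.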

Reasons why this delta could be important

Understandably, ID creation and post publishing are two different events, and Twitter and Instagram handle them differently. Twitter defines them to be the same, whereas on Instagram they coincide only for posts with the shortest processing times, which occurred 76.4% of the time (zero or one second delta) in our sample. We can think of a few instances where the difference between the two times (delta) is of importance. One could use the delta to study factors related to network connections. Because users post through the IG mobile app, the delta could help us understand how someone’s mobile-network connection speed has changed over time. It could also hint at location, based on typical mobile-network speed ranges vs home WiFi speed ranges. The results may not be precise, but we could at least identify user clusters based on connection speeds.


Another use case for this delta is priority claiming. For example, consider a game organized via IG in which participants complete a certain challenge and post a video of it, with the winners chosen according to the time of the post. In such a scenario, it is only fair to use the creation time instead of the published time to pick the winners, so that contestants are not penalized for slow upload speeds.

Conclusion

In this study, we have shown how we can extract the published time of a post from the HTML of any Instagram post. We have also shown that it is possible to obtain the creation time of the ID from the media ID/shortcode of a post if the epoch value used by IG is known. Although Instagram has never disclosed this value, we have reasons to believe that the epoch value used by IG is 2011-08-24T21:07:00Z. We reached this conclusion after careful analysis of estimates obtained for the IG epoch by using the published time and creation time extracted from a collection of 1000 IG post shortcodes. 


Even though one would expect the two values, published time and creation time, to be the same, we have discovered that they differ from each other. The difference between them is the time it takes for a post to be completely processed and published on the web once the user hits publish. Several factors can affect the processing time, but it seems to be dominated by the upload speed. Even though this delta is usually small, it can still matter in scenarios that depend on the chronological ordering of posts to determine which post came first, especially where fine-grained timing is important.

Acknowledgments 

The advice given by Dr. Michael Nelson and Dr. Michele Weigle has been a great help in compiling this blog post. I am grateful for their guidance and support, but the responsibility for any errors remains my own.

Himarsha Jayanetti (@HimarshaJ)
