2013-07-15: Temporal Intention Relevancy Model (TIRM) Data Set
provided a link to the streamed news. A couple of days later, when I read this tweet and clicked on the link, instead of seeing anything related to the press conference, Haiti, or President Obama, I got a stream feed of the Mercedes-Benz Superdome in New Orleans in preparation for the 2013 Super Bowl. It is worth mentioning that at the time of writing this blog post the tweet above had been deleted, showing that social posts don't persist over time, as we discussed in our earlier post.
This scenario illustrates the problem we are trying to detect, model, and solve: the inconsistency between what is intended at the time of sharing and what the reader sees at the time of clicking the link in the tweet.
It is evident that resources change, relocate, or even disappear. In some cases this is tolerable, but in others it is not, particularly when the shared content is significantly important (e.g., related to a revolution, protest, or corruption claims).
From these observations, we decided to run experiments to detect and model the author's intention at the time of tweeting and to measure how accurately the reader perceives it at any later point in time. In our JCDL 2013 paper, we concluded that the intention problem is not straightforward and that modeling it correctly requires mapping it onto a problem of relevancy and change.
We initially used Amazon's Mechanical Turk directly to collect data from workers about intention. Unfortunately, this approach produced very low inter-rater agreement.
After a closer look at the most popular tasks on Mechanical Turk, we found that categorization and classification problems are the most prominent. The questions asked of the workers in such tasks are simpler and require far less explanation.
We introduce the Temporal Intention Relevancy Model, or TIRM, to illustrate the mapping between intention and relevancy. Let's consider the following tweet from Pfizer. The tweet has a link that leads to a newsletter page, which is updated with the company's latest announcements.
Check out Pfizer's latest news here http://t.co/4sWLMtHb
— Pfizer Inc. (@pfizer_news) May 9, 2012
At any point in time this page is still relevant to the tweet, so we can deduce that the author's intention is for the reader to see whatever the current state of the page is. In other words, if the page has changed from its initial state at the time of tweeting and is still relevant, we can assume the intention is: current state.
Similarly, we notice a different pattern upon inspecting a tweet posted on the day Michael Jackson died and linking to CNN.com. The front page of CNN.com has definitely changed since the time of the tweet and the content is no longer relevant to the tweet.
Michael Jackson had died due to cardiac arrest, just saw it on CNN.com.... Farrah Fawcett died earlier today... http://cnn.com
— Jeff Homan (@mdnitehk) June 25, 2009
Thus, the author's intention was for the reader to see the state of the page at the time he tweeted about it. In conclusion, if the page changed and is no longer relevant to the tweet we can assume that the author's intention is: past state of the resource. So, we dig it up from the web archives.
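To make the "past state" lookup concrete, here is a minimal sketch that builds a link to an archived copy near the time of sharing. It assumes the Internet Archive's Wayback Machine as the archive (the post does not name a specific one) and relies on its replay URL pattern, which redirects to the capture closest to the given 14-digit timestamp. The function name `wayback_url` is hypothetical, and because only the tweet's date is known, midnight UTC is used as a placeholder time.

```python
from datetime import datetime, timezone

# Replay endpoint of the Internet Archive's Wayback Machine (assumed archive).
WAYBACK_PREFIX = "http://web.archive.org/web"


def wayback_url(resource_url: str, shared_at: datetime) -> str:
    """Build a Wayback Machine URL for the capture closest to `shared_at`.

    `shared_at` should be a timezone-aware datetime; it is converted to UTC
    and formatted as the 14-digit YYYYMMDDhhmmss timestamp used in replay URLs.
    """
    stamp = shared_at.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{stamp}/{resource_url}"


# The CNN.com tweet is dated June 25, 2009; the time of day is a placeholder.
tweet_time = datetime(2009, 6, 25, tzinfo=timezone.utc)
print(wayback_url("http://cnn.com", tweet_time))
```

Whether a capture actually exists near that time depends on the archive's coverage of the page, so the link is a starting point and not a guarantee.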
In a large number of social posts the resource remains unchanged and is still relevant to the post. In this case we assume that the intended state is the state of the resource at the time the author published the post, but since the resource is unchanged, the current version will do as well.
Finally! NewScientist magazine confirms: Humans prefer cockiness to expertise http://bit.ly/bwPCX (RT @iA_Cyrill) Damn, I KNEW it.
— Chris Lüscher (@iA_Chris) July 30, 2009
Finally, when the resource has changed and has never been related to the post, we do not have enough information to decide which intention the author wanted to convey. This scenario often happens in spam posts.
Find out who stalks your twitter! http://t.co/0GINxHCg
— Aaron Irizarry (@aaroni268) April 11, 2012
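The four cases above can be summarized as a small decision function. This is only an illustrative sketch of the mapping as described in this post, not the trained classifier from the paper. The names `Intention`, `infer_intention`, and the three boolean inputs are hypothetical, and in practice "changed" and "relevant" would themselves have to be estimated.

```python
from enum import Enum


class Intention(Enum):
    CURRENT = "current state"
    PAST = "past state"
    EITHER = "past state (current version also acceptable)"
    UNDETERMINED = "not enough information"


def infer_intention(changed: bool, relevant_now: bool,
                    relevant_when_shared: bool) -> Intention:
    """Map change and relevancy observations to an assumed author intention."""
    if relevant_now:
        # Changed but still relevant: the reader should see the live page.
        # Unchanged and relevant: the past and current versions coincide.
        return Intention.CURRENT if changed else Intention.EITHER
    if changed and relevant_when_shared:
        # The page was relevant when shared but has since drifted away.
        return Intention.PAST
    # Never related to the post (typical of spam), or an inconsistent
    # combination of inputs: treated here as undetermined (an assumption).
    return Intention.UNDETERMINED


examples = {
    "Pfizer newsletter": (True, True, True),
    "CNN.com front page": (True, False, True),
    "NewScientist article": (False, True, True),
    "Spam link": (True, False, False),
}
for name, observed in examples.items():
    print(f"{name}: {infer_intention(*observed).value}")
```

The boolean values assigned to the four example tweets simply restate how each one is described in the text above.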
We use Mechanical Turk to collect the training data for our model along with multiple features related to the social post, such as its nature, archivability, social presence, and resource’s content.
The resulting dataset was used to extract 39 different textual and semantic features, which were then used to train a classifier that implements the TIRM. We argue that this gold standard dataset will pave the way for future studies based on temporal intention. We are currently extending the experiments and refining the features used.
For further details, please refer to the paper:
Hany M. SalahEldeen, Michael L. Nelson. Reading the Correct History? Modeling Temporal Intention in Resource Sharing. Proceedings of the Joint Conference on Digital Libraries JCDL 2013, Indianapolis, Indiana. 2013, also available as a technical report http://arxiv.org/abs/1307.4063
- Hany SalahEldeen