Sunday, April 17, 2016

2016-04-17: A Summary of "What Triggers Human Remembering of Events? A Large-Scale Analysis of Catalysts for Collective Memory in Wikipedia"

Authors Nattiya Kanhabua, Ngoc Tu Nguyen, and Claudia Niederée from L3S published the following study at JCDL 2014. In the process of reviewing possible topics for my PhD research, I share my analysis of their findings. The full citation and presentation for the paper are below.

Kanhabua, N., Nguyen, T. N., & Niederee, C. (2014, September). What triggers human remembering of events?: a large-scale analysis of catalysts for collective memory in Wikipedia. In Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries (pp. 341-350). IEEE Press.

The article focuses on identifying patterns that trigger recollection of events in collective memory. Since the number of event categories is effectively limitless, the authors restrict the study to natural and man-made disasters, accidents, and terrorism. Their analysis confirms that two of the most notable characteristics across all events are time and location. Taken together, they are not consistent indicators of what triggers recollection, but each one is when considered on its own. In addition, the study confirms that the semantics of different event types, such as level of impact and damage cost, further help trigger remembrance of specific events.

For their analysis, the authors use the English Wikipedia, which is built by an online community, as the store of collective memory. It is important to note that this memory is dynamic in nature, changes over time, and is shaped by social influence and community agreement. Essentially, the goal is to extract the patterns and characteristics of a particular memory and use them to identify how that memory can be triggered in recall. Aside from characteristic analysis, the same data lets us identify the most popular memories by category, see where the community is divided over topics, or observe the edit wars centered around controversial topics.

To get a better understanding of the underlying collection, the authors parse the page view logs of Wikipedia articles that document different events. This allows them to visualize and categorize the events. Figure 1 below shows how such a log can be used alongside a temporal attribute.

(Peaks signify an increase in resource views.)

By observing the chart above, we can conclude that within some timespan, peaks are created as resource views increase dramatically. These peaks become the driving factor behind correlating documents to temporal and categorical events. For example, an article about a hurricane in 2015 might see a dramatic rise in views in 2016.
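To make the idea of a peak concrete, here is a minimal sketch of how one might flag peaks in a daily view log. It is not the authors' method: the trailing-window threshold rule, the function name `find_peaks`, the parameters `window` and `k`, and the sample numbers are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def find_peaks(views, window=7, k=3.0):
    """Return indices of days whose views exceed trailing mean + k * std.

    Assumption: a "peak" is a day that stands far above the recent baseline.
    The post does not give the paper's exact definition.
    """
    peaks = []
    for i in range(window, len(views)):
        history = views[i - window:i]
        mu = mean(history)
        # Floor the spread at 1.0 so a perfectly flat window cannot
        # turn every tiny bump into a peak.
        sigma = max(pstdev(history), 1.0)
        if views[i] > mu + k * sigma:
            peaks.append(i)
    return peaks


# Hypothetical daily view counts for one article (illustrative only).
daily_views = [100, 104, 98, 101, 99, 103, 100, 102, 950, 400, 120, 101]
print(find_peaks(daily_views))
```

With this rule only the first day of the spike is flagged, because the spike itself inflates the baseline for the days that follow. That is one reason a real analysis would likely need a more careful definition.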

By itself, a peak is not a complete solution for identifying memory recollection, as there is nothing to compare it to. The proposed solution is a remembrance score, which estimates how likely it is that peaks are memory catalysts of past events. In other words, it compares multiple peaks to see whether a relationship exists between them. The score has three parts: the cross-correlation coefficient (CCF), the sum of squared errors (SSE), and kurtosis. All three are centered on time and peak analysis, and together they estimate how likely it is that we remember one event by experiencing another. CCF measures the similarity between two time series of view volume. It is a simple representation of how different events relate during particular time frames. SSE builds on CCF by measuring how unexpected the views at a particular time are within a time frame, which supports surprise detection. This helps us understand whether one peak potentially triggered the other. Lastly, kurtosis is applied to the remembrance score to account for how sharply peaked the distribution of views is. It considers the underlying distribution over time and answers the question: is the peak part of a steady pattern, or a sharp spike driven by an outside influence?
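The three components can be sketched in a few lines of plain Python. This is only an illustration of the general statistics, not the paper's formulas: the post does not state how the authors compute or combine them, so the trailing-mean forecast used for SSE, the function names, and the `lag` and `window` parameters are assumptions.

```python
from math import sqrt


def _mean(xs):
    return sum(xs) / len(xs)


def cross_correlation(x, y, lag=0):
    """Pearson-style correlation between x[t] and y[t + lag]."""
    if lag < 0:
        return cross_correlation(y, x, -lag)
    a, b = x[:len(x) - lag], y[lag:]
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    ma, mb = _mean(a), _mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    if va == 0 or vb == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(va * vb)


def sse_against_forecast(series, window=3):
    """Sum of squared errors between each value and a trailing-mean forecast.

    Assumption: "surprise" is modeled as deviation from the mean of the
    preceding `window` values; a large SSE means an unexpected jump.
    """
    total = 0.0
    for i in range(window, len(series)):
        forecast = _mean(series[i - window:i])
        total += (series[i] - forecast) ** 2
    return total


def kurtosis(series):
    """Fourth standardized moment (non-excess; a normal distribution gives 3)."""
    m = _mean(series)
    m2 = _mean([(v - m) ** 2 for v in series])
    m4 = _mean([(v - m) ** 4 for v in series])
    return m4 / (m2 ** 2) if m2 else 0.0


# Hypothetical view series for a past event and a new event (illustrative).
past_event = [10, 12, 11, 10, 80, 40, 15, 11]
new_event = [5, 6, 5, 6, 200, 90, 20, 7]
print(cross_correlation(past_event, new_event))
print(sse_against_forecast(past_event))
print(kurtosis(past_event))
```

A high correlation between the two series, a large SSE on the past event's article, and a high kurtosis would together suggest that the new event coincided with a sharp, unexpected burst of attention to the old one.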

(Table 1 shows the test data of events used from Wikipedia. Do note, italicized events are excluded from the experiment, as there were too few results for significant evaluation.)

While this score is a good approach in understanding triggers for all events, the authors propose an analysis of common features to identify relationship development between similar events. This includes temporal similarity, or the time when the events occurred, and location similarity, where they occurred. Lastly, they also observe the impact of an event and how likely this event is to remain a continuous memory. Examples of impacts include: cost incurred due to event occurrence, affected regions, fatalities, etc.
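As a rough illustration of what temporal and location features could look like, the sketch below compares two events by their position in the calendar year and by great-circle distance. Both measures are assumptions chosen for the example, since the post does not describe how the paper defines these similarities. The helper names and the sample events are hypothetical.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt


def day_of_year_gap(d1, d2):
    """Circular gap in days between the calendar positions of two dates.

    Assumption: events are "temporally similar" when they fall at a similar
    time of year, regardless of which year they occurred in.
    """
    gap = abs(d1.timetuple().tm_yday - d2.timetuple().tm_yday)
    return min(gap, 365 - gap)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two coordinates."""
    earth_radius_km = 6371.0
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(h))


# Two hypothetical events: (date, latitude, longitude). Illustrative only.
event_a = (date(2005, 8, 29), 29.95, -90.07)
event_b = (date(2012, 10, 29), 39.36, -74.42)

print(day_of_year_gap(event_a[0], event_b[0]))
print(haversine_km(event_a[1], event_a[2], event_b[1], event_b[2]))
```

Smaller values on either measure would mean the events are more alike on that feature. How such features are weighted against each other is a separate question that this sketch does not address.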

In Figure 5, location is a key observable similarity between hurricane events, whereas time is much more inconsistent.

In Figure 10, time and location both play a significant role in identifying terrorist events. The conjunction of these attributes is much more evident here as opposed to hurricane events shown in Figure 5.

In Figure 11, high impact events comprise between 25% and 50% of the top 10 triggered events. The percentage expands to 75% when considering the top 20.

By observing the charts above, we can conclude several things from the study. First, location and time are key contributors when identifying which events cause remembrance of others. However, their influence varies across the different types of events. Next, according to the results, contextual information also plays a very large role in determining relationships. The impact of events and their semantic similarity can significantly strengthen or weaken the triggered recollection of the collective memories we have stored. Lastly, the computed remembrance scores are a good step towards identifying which peaks are related. While they can be tuned to score better for particular events, they must also remain generic enough to apply broadly.

It is clear that the study has a strong motivation and even more interesting findings. However, it has two key limitations. First, the authors analyze human remembering of events against the English Wikipedia. While this could be helpful for a language-specific study, it could carry a very large cultural bias compared to versions in other languages. In addition, it might skew the focus towards events in regions tied to an English-speaking context. The other limitation is that the authors simply assume that the occurrence of one event triggers a recall from collective memory. While this applies in many cases, the assumption does not consider that a new event could prompt people to research the earlier one for the first time, as opposed to remembering it.

Applying in your research:

  • Being able to forecast and understand user recollection enables targeted event triggering. When users are searching for particular events, we could recommend other events that they might be interested in within particular bounds of similarity.
  • In contrast to exploring new data, we could also help the user recall what they have forgotten from the past.
Slobodan Milanko
