2018-10-10: Americans More Open Than Asians to Sharing Personal Information on Twitter: A Paper Review

Mat Kelly reviews "A Personal Privacy Preserving Framework..." by Song et al. at SIGIR 2018.


Americans are more open to share personal aspects on the Web than Asians. — Song et al. 2018

I recently read a paper published at SIGIR 2018 by Song et al. titled "A Personal Privacy Preserving Framework: I Let You Know Who Can See What" (PDF). The title alone captivated my interest, as did the above claim deep within the text.

The authors' goal was to reduce users' privacy risks on social networks by determining who could see what sort of information they posted. They did so by summarizing the literature to establish boundary regulations and associating them with 32 categories, each corresponding to a personal aspect of a user, broken down into 8 groups ranging from personal attributes to life milestones. The authors then fed a list of keywords to the Twitter Search Service for each category they established. From this taxonomy they created a model to uncover personal aspects from users' posts. Their model, TOKEN (a forced acronym of laTent grOup multi-tasK lEarniNg), allowed the authors to create guidelines for information disclosure by users to four kinds of social circles and to generate a data set consisting of a rich set of privacy-oriented features (available here).

The authors noted that users' private tweets are very sparse, so they used the Twitter service to gather posts matching the categories in their taxonomy, collecting just over 269k tweets. To reduce the noise in the collection, the authors filtered out tweets containing URLs that did not refer to the users' own posts on other social media. Retweets and tweets shorter than 50 characters were also excluded. The authors did not justify this exclusion.
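The filtering rules above can be sketched roughly as follows. This is only an illustration and not the authors' code: the `allowed_domains` list is a hypothetical stand-in for "the user's other social media", since the paper review does not say how such URLs were recognized, and the sketch assumes URLs have already been expanded from any shortener.

```python
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://\S+")
MIN_LENGTH = 50  # character threshold reported in the review


def keep_tweet(text, is_retweet, allowed_domains=("instagram.com", "facebook.com")):
    """Return True if a tweet survives the noise filters described above.

    allowed_domains is a hypothetical whitelist, not taken from the paper.
    """
    if is_retweet:
        return False
    if len(text) < MIN_LENGTH:
        return False
    for url in URL_PATTERN.findall(text):
        host = urlparse(url).netloc.lower()
        # Assumed rule: tolerate a URL only if its host is a whitelisted
        # social media domain (or a subdomain of one).
        if not any(host == d or host.endswith("." + d) for d in allowed_domains):
            return False
    return True
```

A tweet with no URL passes as long as it is long enough and is not a retweet.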

To establish a ground truth, the authors used Amazon Mechanical Turk to have each post annotated with their selected categories. Turkers whose annotations did not agree at least 80% with the authors' own sampling were excluded from the results. This procedure resulted in just over 11k labeled posts. To determine inter-worker reliability, the authors employed Fleiss' kappa (PDF of 1969 paper), adapting for the potential variance in label count per post by reducing to a binary classification, and found moderate agreement (a Fleiss' coefficient of 0.43).
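For readers unfamiliar with the statistic, here is a minimal sketch of Fleiss' kappa in the binary setting mentioned above. The input layout (one row per post, one column per outcome, each cell counting the raters who chose that outcome) and the number of raters per post are assumptions for illustration; the review does not report how many Turkers labeled each post.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a table where ratings[i][j] is the number of
    raters who assigned item i to category j.

    Assumes every item was rated by the same number of raters (at least 2)
    and that the raters did not all use a single category (which would
    make the denominator zero).
    """
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_categories = len(ratings[0])

    # Share of all assignments that went to each category.
    p_cat = [
        sum(row[j] for row in ratings) / (n_items * n_raters)
        for j in range(n_categories)
    ]

    # Per-item agreement: fraction of rater pairs that agree on the item.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]

    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)
```

With two columns ("category applies" and "category does not apply"), a table such as `[[3, 0], [0, 3]]` represents two posts on which three raters agreed unanimously, which yields a kappa of 1.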

The authors then extracted a set of privacy-oriented linguistic features using Linguistic Inquiry and Word Count (LIWC), a Privacy Dictionary (per Vasalou et al.'s 2011 JASIST work), Sentiment Analysis (via Stanford's NLP classifier), Sentence2Vector (with each tweet treated as a sentence), and an ad hoc meta-feature approach. This last approach considered the presence of hashtags, slang words, images, emojis, and user mentions. Slang, here, was identified using the Internet Slang Dictionary.
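To make the meta-feature idea concrete, a rough sketch of presence flags for a single tweet might look like the following. Everything here is an assumption for illustration: the tiny `SLANG` set is a hypothetical stand-in for the Internet Slang Dictionary, the emoji test only checks a few common Unicode ranges, and image presence is passed in as a flag because it would come from tweet metadata rather than the text.

```python
import re

# Hypothetical stand-in for the Internet Slang Dictionary.
SLANG = {"lol", "omg", "brb", "tbh"}


def is_emoji(ch):
    """Crude emoji check covering a few common Unicode ranges only."""
    code = ord(ch)
    return 0x1F300 <= code <= 0x1FAFF or 0x2600 <= code <= 0x27BF


def meta_features(text, has_image=False):
    """Return presence flags for the meta-features listed above."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "has_hashtag": bool(re.search(r"#\w+", text)),
        "has_mention": bool(re.search(r"@\w+", text)),
        "has_slang": any(token in SLANG for token in tokens),
        "has_emoji": any(is_emoji(ch) for ch in text),
        "has_image": has_image,
    }
```

Whether the authors used binary flags or counts is not stated in the review; flags are simply the smallest thing that matches "the presence of".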

Following this analysis, the authors established a prediction component by first formulating a predictive model inter-relating each of the 32 "tasks" within the 8 "groups". The authors anticipated that tasks within the same group would share relevant features, e.g., "places planning to go" and "current location" would share common features within the location group in their taxonomy. From this initial formulation they established the matrix L, whose columns represent the latent features, and S, whose rows represent the weights of the features in L.

To solve for L and S, the authors optimized one variable while fixing the other in each iteration. To determine L, they took the derivative of their objective function (their Equation 5, see paper) with respect to L to produce a linear system with a vector B representing the stacking of columns into a single matrix and A, a definite and invertible matrix. Computing S with L fixed was a bit more mathematically complex, and I will leave it as an exercise in understanding to the interested reader.
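The "optimize one, fix the other" pattern can be illustrated with a toy problem. The sketch below is emphatically not the authors' Equation 5: it swaps in a much simpler ridge-regularized rank-1 factorization, where a matrix X is approximated by the outer product of two vectors `l` and `s`, so that each half-step has a closed-form solution obtained by setting the derivative to zero. The names `l`, `s`, `lam`, and the objective itself are assumptions chosen for illustration.

```python
def objective(X, l, s, lam):
    """Squared reconstruction error plus ridge penalties on l and s."""
    fit = sum(
        (X[i][j] - l[i] * s[j]) ** 2
        for i in range(len(l))
        for j in range(len(s))
    )
    penalty = lam * (sum(v * v for v in l) + sum(v * v for v in s))
    return fit + penalty


def alternating_minimization(X, lam=0.1, iterations=20):
    """Alternate exact updates of l (s fixed) and s (l fixed)."""
    n_rows, n_cols = len(X), len(X[0])
    l = [1.0] * n_rows
    s = [1.0] * n_cols
    history = [objective(X, l, s, lam)]
    for _ in range(iterations):
        # Fix s: zeroing the derivative w.r.t. each l[i] gives this update.
        s_norm = sum(v * v for v in s)
        l = [
            sum(X[i][j] * s[j] for j in range(n_cols)) / (s_norm + lam)
            for i in range(n_rows)
        ]
        # Fix l: the same argument gives the update for each s[j].
        l_norm = sum(v * v for v in l)
        s = [
            sum(X[i][j] * l[i] for i in range(n_rows)) / (l_norm + lam)
            for j in range(n_cols)
        ]
        history.append(objective(X, l, s, lam))
    return l, s, history
```

Because each half-step exactly minimizes the objective over one variable with the other held fixed, the recorded objective values never increase from one iteration to the next, which is the property that makes this style of solver attractive.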

Prescription

...there is still a societal consensus that certain information is more private than the others (sic) from a general societal view. — A. Islam et al. 2014

The authors used Mechanical Turk to build guidelines regarding disclosure norms in different circles. This was performed on two selections of Turkers, limited geographically to the U.S. and Asia respectively. The authors note that 99% of the Asian participants were Indians. A real-world goal the authors anticipated was that, when a user posts a tweet containing information on a health condition (for example), the privacy setting would be set to share it only with her family members. This, I felt, would be an odd recommendation given:

  1. The corpus was of publicly available tweets.
  2. Twitter does not currently have a means of limiting who may see a tweet akin to services like Facebook.

This drastically reduces the usefulness of the recommendation, I feel, in the context of the medium observed.

Verification

The authors sought to detect privacy leakage by measuring the precision of TOKEN with the S@K and P@K metrics, as they had previously done in Song et al. 2015 (IJCAI). Here, S@K represents the mean probability that a correct interest is captured within the top K recommended categories, and P@K stands for the proportion of the top K recommendations that are correct. They used a grid search strategy to obtain the optimal parameters with 10-fold cross-validation.
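Based on the definitions as I have paraphrased them above, the two metrics might be computed as in the sketch below. The function names and the input layout (a ranked list of predicted categories per post, paired with the set of correct categories for that post) are assumptions for illustration, not the authors' implementation.

```python
def s_at_k(ranked, relevant, k):
    """Mean, over posts, of whether any correct category is in the top k."""
    hits = [
        1.0 if set(predictions[:k]) & truth else 0.0
        for predictions, truth in zip(ranked, relevant)
    ]
    return sum(hits) / len(hits)


def p_at_k(ranked, relevant, k):
    """Mean, over posts, of the fraction of the top k that is correct."""
    fractions = [
        len(set(predictions[:k]) & truth) / k
        for predictions, truth in zip(ranked, relevant)
    ]
    return sum(fractions) / len(fractions)


# Two example posts: ranked predictions and the correct categories for each.
ranked = [
    ["health conditions", "current location", "home address"],
    ["current location", "health conditions", "home address"],
]
relevant = [{"health conditions"}, {"home address"}]
```

For this toy input, S@1 is 0.5 because only the first post has a correct category ranked first, while S@3 is 1.0 because both posts have a correct category somewhere in the top three.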

Using S@K and P@K where K was set to 1, 3, and 5, the authors found LIWC to be most representative of the characterization of users' privacy features as compared to the aforementioned Privacy Dictionary, Sentence2Vector, etc. approaches. They attributed this to LIWC's inclusion of pronouns and verb tense that provide references and temporal hints.

In applying these feature configurations to their corpus, the authors noticed that timestamps played an important role in identifying private information leakage, so they took a detour to cursorily explore this. Based on the patterns found (pictured below), various activities peak at certain times of day, e.g., drug and alcohol tweets around "20pm" (sic). It is unclear from the paper whether this was applied to both the U.S. and the Asian results. Further, displaying multiple plots whose y-axis scales vary from plot to plot produces a deceptive result that the authors do not address.

Plots per Song et al. show temporal patterns.

To finally validate their model compared to S@K and P@K, they used SVM, MTL_Lasso,

A different set of categories was compared to show similarities in sharing comfort between Americans and Asians. Of these (healthcare treatments, health conditions, passing away, specific complaints, home address, current location, contact information, and places planning to go), American Turkers were much more restrictive about sharing with the outside world, where Asian Turkers exhibited a similarly conservative sentiment about sharing. From this, the authors concluded:

Americans are more open to share personal aspects on the Web than Asians. — Song et al. 2018

Take Home

I found this study to be interesting despite some of the methodological problems and derived conclusions. As I mentioned, the inability to regulate who sees tweets when posting (a la Facebook) affects the nature of the tweet, with a likely bias toward the tweeter being less concerned about privacy. The authors did not mention whether each Turker was asked if they personally used Twitter, or whether the Turkers were even told that the texts they judged were tweets and not just "messages posted online". This context, if excluded, could make those judging the tweets unsuitable to do so. I would hope to see an expanded version of this study (say, posted to arXiv) with more comprehensive results, as the authors stated that space was a limitation, but there was no indication that one exists.

—Mat (@machawk1)
