2016-11-16: Introducing the Local Memory Project

Collage made from screenshots of local news websites across the US

The national news media has different priorities than the local news media. If one seeks to build a collection about local events, the national news media may be insufficient, except for local news that “bubbles” up to the national news media. Even when some local news does bubble up to the national surface, national outlets often report the same events from a different perspective than local outlets. Also, it is well known that large multinational news organizations routinely cite the reports of smaller local news organizations for many stories. Consequently, local news media is fundamental to journalism.

It is important to consult local sources affected by local events. This creates the need for a system that helps small communities build collections of web resources from local sources for important local events. To the best of my knowledge, the need for such a system was first outlined by Harvard LIL. Given Harvard LIL's interest in facilitating participatory archiving by local communities and libraries, and our IMLS-funded interest in building collections for stories and events, my summer fellowship at Harvard LIL provided a good opportunity to collaborate on the Local Memory Project.

Our goal is to provide a suite of tools under the umbrella of the Local Memory Project to help users and small communities discover, collect, build, archive, and share collections of stories for important local events from local sources.

Local Memory Project dataset

We currently have a public JSON dataset of US local media, scraped from USNPL, consisting of:

5,992 Newspapers
1,061 TV stations, and
2,539 Radio stations

The dataset structure is documented and comprises the media website, Twitter/Facebook/YouTube links, RSS/OpenSearch links, as well as the geo-coordinates of the cities or counties in which the local media organizations reside. I strongly believe this dataset could be essential to the media research community.
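As an illustration of how the dataset might be explored, here is a minimal sketch that tallies records by media class. It assumes, without confirmation from the dataset documentation, that the file is a JSON list of records whose "type" field looks like the API output shown later in this post (e.g. "Newspaper - cityCounty"); the file path and helper names are hypothetical.

```python
import json
from collections import Counter

def load_records(path):
    """Load a (hypothetical) local copy of the dataset, assumed to be a JSON list."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def count_by_media_class(records):
    """Tally records by the part of "type" before " - ", e.g. "Newspaper"."""
    counts = Counter()
    for record in records:
        media_class = record.get("type", "").split(" - ")[0].strip()
        counts[media_class] += 1
    return counts

# Two records copied from the API example for Cambridge, MA.
RECORDS = [
    {"name": "Cambridge Chronicle", "type": "Newspaper - cityCounty"},
    {"name": "WHRB 95.3 FM", "type": "Radio - Harvard Radio"},
]
print(count_by_media_class(RECORDS))
```

On the full dataset, the totals should correspond to the newspaper, TV, and radio counts listed above if the schema matches this assumption.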

There are currently 3 services offered by the Local Memory Project:

1. Local Memory Project - Google Chrome extension:

This service is an implementation of Adam Ziegler and Anastasia Aizman's idea for a utility that helps one build a collection for a local event which did not receive national coverage. Given a story expressed as a query, and a place represented by a zip code, the Google Chrome extension performs the following operations:

Retrieve a list of local news websites (newspapers and TV stations) that serve the zip code.
For each local news website retrieved in the previous step, search Google for stories about the query from that website.

The result is a collection of stories for the query from local news sources.
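The two operations above can be sketched roughly as follows. This is not the extension's code: it assumes the source records look like the API output shown later in this post, and that TV station types begin with "TV" (only newspaper and radio examples appear in this post). The helper names are hypothetical; the query restriction uses Google's `site:` directive.

```python
from urllib.parse import urlparse

def select_local_news(records):
    """Keep newspapers and TV stations; radio stations are not searched."""
    # Assumption: "type" values start with "Newspaper" or "TV".
    return [r for r in records if r.get("type", "").startswith(("Newspaper", "TV"))]

def build_site_queries(query, records):
    """One Google query per local source, restricted with the site: directive."""
    queries = []
    for record in records:
        host = urlparse(record["website"]).netloc  # e.g. "cambridge.wickedlocal.com"
        queries.append(f"{query} site:{host}")
    return queries

records = [
    {"name": "Cambridge Chronicle", "type": "Newspaper - cityCounty",
     "website": "http://cambridge.wickedlocal.com/"},
    {"name": "WHRB 95.3 FM", "type": "Radio - Harvard Radio",
     "website": "http://www.whrb.org/"},
]
print(build_site_queries("zika virus", select_local_news(records)))
```

Issuing each query string to Google and gathering the result links yields the collection.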

For example, to build a collection about the Zika virus for Miami, Florida, we enter the following inputs (Figure 1) into the Google Chrome extension and click "Submit":

Figure 1: Google Chrome Extension, input for building a collection about Zika virus for Miami FL

After the Submit button is pressed, the application issues the "zika virus" query to Google with the site directive for each newspaper and TV station serving the 33101 area.

Figure 2: Google Chrome Extension, search in progress. Current search in image targets stories about Zika virus from Miami Times

After the search, the result (Figure 3) was saved remotely.

Figure 3: A subset (see complete) of the collection about Zika virus built for the Miami FL area.

Here are examples of other collections built with the Google Chrome Extension (Figures 4 and 5):

Figure 4: A subset (see complete) of the collection about Simone Biles' return for Houston, Texas

Figure 5: A subset (see complete) of the collection about Protesters and Police for Norfolk, Virginia

The Google Chrome extension also offers customized settings that suit different collection building needs:

Figure 6: Google Chrome Extension Settings (Part 1)

Figure 7: Google Chrome Extension Settings (Part 2)

Google max pages: The number of Google search result pages to visit for each news source. The default is 1 page; increase it to explore more result pages.
Google Page load delay (seconds): The delay between loading successive Google search pages, which throttles requests.
Google Search FROM date: Filter your search for news articles crawled from this date onward. This comes in handy if a query spans multiple time periods but the curator is interested in a specific one.
Google Search TO date: Filter your search for news articles crawled before this date. Combined with Google Search FROM date, it can be used to collect documents within a start and end time window.
Archive Page load delay (seconds): Time delay between loading pages to be archived. You can increase this time if you want to have the chance to do something (such as hit archive again) before the next archived page loads automatically. This is tailored to archive.is.
Download type: Download the collection to your machine as a personal collection in JSON or TXT format. If you choose to share it, save it remotely (you should!).
Collection filename: Custom filename for collection about to be saved.
Collection name: Custom name for your collection. It's good practice to label collections.
Upload a saved collection (.json): For json collections saved locally, you may upload them to revisualize the collection.
Show Thumbnail: A flag that decides whether to send a remote request to get a card (thumbnail summary) for the link. Since cards require multiple GET requests, you may choose to switch this off if you have a large collection.
Google news: By default, the extension searches the generic Google search page. Check this box to search the Google News vertical instead.
Add website to existing collection: Add a website to an existing collection.
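To make the Google-related settings concrete, here is a sketch of how they could map onto Google search URLs and a throttled page loop. It is an assumption, not the extension's implementation (the extension drives the browser): `start` (result offset, assumed 10 per page), `tbm=nws` (News vertical), and `tbs=cdr:1,cd_min:...,cd_max:...` (custom date range) are commonly used but undocumented Google URL parameters. `load_page` is a hypothetical callable supplied by the caller.

```python
import time
from urllib.parse import urlencode

RESULTS_PER_PAGE = 10  # assumed Google page size

def google_search_urls(query, max_pages=1, from_date=None, to_date=None, news=False):
    """Build one search URL per result page ("Google max pages")."""
    urls = []
    for page in range(max_pages):
        params = {"q": query, "start": page * RESULTS_PER_PAGE}
        if news:
            params["tbm"] = "nws"  # "Google news" checkbox
        if from_date or to_date:
            # FROM/TO dates as M/D/YYYY strings; an empty bound is left open.
            params["tbs"] = f"cdr:1,cd_min:{from_date or ''},cd_max:{to_date or ''}"
        urls.append("https://www.google.com/search?" + urlencode(params))
    return urls

def visit_throttled(urls, load_page, delay_seconds=2.0, sleep=time.sleep):
    """Load pages in order, waiting "Google Page load delay" between them."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)
        results.append(load_page(url))
    return results

for url in google_search_urls("zika virus", max_pages=2, from_date="1/1/2016"):
    print(url)
```

Injecting `sleep` keeps the throttling logic testable without real delays.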

2. Local Memory Project - Geo service:

The Google Chrome extension utilizes the Geo service to find media sources that serve a zip code. This service is an implementation of Dr. Michael Nelson's idea for a service that supplies an ordered list of media outlets based on their proximity to a user-specified zip code.
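The post does not say how the Geo service measures proximity. One plausible approach, sketched here as an assumption, is great-circle (haversine) distance between the zip code's coordinates and each outlet's city or county coordinates. The zip-to-coordinates lookup is not shown and is assumed to be given. Field names follow the API output below, and the second outlet is hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def closest_outlets(zip_lat, zip_lon, outlets, k=10):
    """Return the k outlets nearest to the zip code's coordinates."""
    def distance(outlet):
        return haversine_miles(zip_lat, zip_lon,
                               outlet["cityCountyNameLat"], outlet["cityCountyNameLong"])
    return sorted(outlets, key=distance)[:k]

outlets = [
    {"name": "Hypothetical Far Gazette", "cityCountyNameLat": 42.0, "cityCountyNameLong": -71.0},
    {"name": "Cambridge Chronicle", "cityCountyNameLat": 42.379146, "cityCountyNameLong": -71.12803},
]
print([o["name"] for o in closest_outlets(42.379146, -71.12803, outlets)])
```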

Figure 8: List of the top 10 newspapers, radio stations, and TV stations closest to zip code 23529 (Norfolk, VA)

3. Local Memory Project - API:

The Local Memory Project Geo website is meant for human users, while the API website targets machine users. Therefore, the API provides the same services as the Geo website but returns JSON output (as opposed to HTML). For example, below is a subset of the output (see complete) corresponding to a request for 10 news media sites in order of proximity to Cambridge, MA.

{
  "Lat": 42.379146, 
  "Long": -71.12803, 
  "city": "Cambridge", 
  "collection": [
    {
      "Facebook": "https://www.facebook.com/CambridgeChronicle", 
      "Twitter": "http://www.twitter.com/cambridgechron", 
      "Video": "http://www.youtube.com/user/cambchron", 
      "cityCountyName": "Cambridge", 
      "cityCountyNameLat": 42.379146, 
      "cityCountyNameLong": -71.12803, 
      "country": "USA", 
      "miles": 0.0, 
      "name": "Cambridge Chronicle", 
      "openSearch": [], 
      "rss": [], 
      "state": "MA", 
      "type": "Newspaper - cityCounty", 
      "website": "http://cambridge.wickedlocal.com/"
    }, 
    {
      "Facebook": "https://www.facebook.com/pages/WHRB-953FM/369941405267", 
      "Twitter": "http://www.twitter.com/WHRB", 
      "Video": "http://www.youtube.com/user/WHRBsportsFM", 
      "cityCountyName": "Cambridge", 
      "cityCountyNameLat": 42.379146, 
      "cityCountyNameLong": -71.12803, 
      "country": "USA", 
      "miles": 0.0, 
      "name": "WHRB 95.3 FM", 
      "openSearch": [], 
      "rss": [], 
      "state": "MA", 
      "type": "Radio - Harvard Radio", 
      "website": "http://www.whrb.org/"
    }, ...
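A machine client might consume this output as sketched below. The API endpoint URL is not given in this post, so the fetch step is omitted (one could retrieve the response text with `urllib.request.urlopen`). The sample is a trimmed copy of the response above, and `websites_by_type` is a hypothetical helper.

```python
import json

# Trimmed copy of the Cambridge, MA response shown above.
SAMPLE_RESPONSE = """{
  "Lat": 42.379146, "Long": -71.12803, "city": "Cambridge",
  "collection": [
    {"name": "Cambridge Chronicle", "type": "Newspaper - cityCounty",
     "website": "http://cambridge.wickedlocal.com/", "miles": 0.0},
    {"name": "WHRB 95.3 FM", "type": "Radio - Harvard Radio",
     "website": "http://www.whrb.org/", "miles": 0.0}
  ]
}"""

def websites_by_type(response_text, type_prefix):
    """List (name, website) pairs for outlets whose type starts with type_prefix."""
    payload = json.loads(response_text)
    return [(outlet["name"], outlet["website"])
            for outlet in payload["collection"]
            if outlet["type"].startswith(type_prefix)]

print(websites_by_type(SAMPLE_RESPONSE, "Newspaper"))
```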

Saving a collection built with the Google Chrome Extension

A collection built on a user's machine can be saved in one of two ways:

Save locally: this serves as a way to keep a collection private. Saving can be done by clicking "Download collection" in the Generic settings section of the extension settings. A collection can be saved in json or plaintext format. The json format permits the collection to be reloaded through "upload a saved collection" in the Generic settings section of the extension settings. The plaintext format does not permit reloading into the extension, but contains all the links which make up the collection.
Save remotely: in order to be able to share the collection you built locally with the world, you need to save remotely by clicking the "Save remotely" button on the frontpage of the application. This leads to a dialog requesting a mandatory unique collection author name (if one doesn't exist) and an optional collection name (Figure 10). After supplying the inputs the application saves the collection remotely and the user is presented with a link to the collection (Figure 11).
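The difference between the two local formats can be sketched as follows. The extension's actual JSON schema is not described here, so the collection structure and helper names below are hypothetical. The point is that JSON keeps enough structure to be reloaded, while plaintext keeps only the links.

```python
import json

def to_json(collection):
    """Serialize the whole (hypothetical) collection structure; it can be reloaded."""
    return json.dumps(collection, indent=2)

def from_json(text):
    """Reload a collection previously saved as JSON."""
    return json.loads(text)

def to_plaintext(collection):
    """One link per line; names and sources are lost, so it cannot be reloaded."""
    return "".join(link["url"] + "\n" for link in collection["links"])

def save_locally(collection, path, fmt="json"):
    """Write the collection to disk in JSON or plaintext format."""
    text = to_json(collection) if fmt == "json" else to_plaintext(collection)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

collection = {
    "name": "zika virus (33101)",
    "links": [
        {"source": "news-one.example", "url": "http://news-one.example/story-1"},
        {"source": "news-two.example", "url": "http://news-two.example/story-2"},
    ],
}
print(to_plaintext(collection))
```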

Before a collection is saved locally or remotely, you may choose to exclude an entire news source (all links from a given source) or a single link, as described by Figure 9:

Figure 9: Exclusion options before saving locally/remotely
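The two exclusion options amount to a simple filter applied before saving. This sketch assumes a hypothetical list of link records with "source" and "url" fields, not the extension's internal format.

```python
def exclude(links, excluded_sources=(), excluded_urls=()):
    """Drop every link from an excluded source, plus any individually excluded link."""
    return [link for link in links
            if link["source"] not in excluded_sources
            and link["url"] not in excluded_urls]

links = [
    {"source": "news-one.example", "url": "http://news-one.example/a"},
    {"source": "news-one.example", "url": "http://news-one.example/b"},
    {"source": "news-two.example", "url": "http://news-two.example/c"},
    {"source": "news-two.example", "url": "http://news-two.example/d"},
]
# Exclude all of news-one.example and a single link from news-two.example.
kept = exclude(links, excluded_sources={"news-one.example"},
               excluded_urls={"http://news-two.example/c"})
print([link["url"] for link in kept])
```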

Figure 10: Saving a collection prompts a dialog requesting a mandatory unique collection author name and an optional collection name

Figure 11: A link is presented after a collection is saved remotely

Archiving a collection built with the Google Chrome Extension

Saving is the first step to make a collection persist after it is built. However, archiving ensures that the links referenced in a collection persist even if the content is moved or deleted. Our application currently integrates archiving via Archive.is, but we plan to expand the archiving capability to include other public web archives.

In order to archive your collection, click the "Archive collection" button on the frontpage of the application. This leads to a dialog similar to the saving dialog, which requests a mandatory unique collection author name (if one doesn't exist) and an optional collection name. Subsequently, the application archives the collection by first archiving the front page, which contains all the local news sources, and then archiving the individual links that make up the collection (Figure 12). You may stop the archiving operation at any time by clicking "Stop" on the orange archiving-status message bar. At the end of the archiving process, you get a short URI corresponding to the archived collection (Figure 13).

Figure 12: Archiving in progress

Figure 13: When the archiving is complete, a short link pointing to the archived collection is presented
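The archiving order described above (front page first, then each link, with a delay between pages and a Stop button) can be sketched as a loop. How the extension actually submits pages to archive.is is not shown here, so `submit` is a hypothetical callable, and a `threading.Event` stands in for the Stop button.

```python
import threading
import time

def archive_collection(front_page_url, link_urls, submit, stop_event,
                       delay_seconds=5.0, sleep=time.sleep):
    """Archive the front page, then each link, until finished or stopped."""
    archived = []
    for i, url in enumerate([front_page_url, *link_urls]):
        if i > 0:
            sleep(delay_seconds)  # "Archive Page load delay"
        if stop_event.is_set():   # user clicked "Stop"
            break
        archived.append(submit(url))
    return archived

stop = threading.Event()
result = archive_collection("front", ["link-1", "link-2"],
                            submit=lambda url: "archived:" + url,
                            stop_event=stop, sleep=lambda seconds: None)
print(result)
```

Checking the stop flag after the delay means a Stop pressed during the wait takes effect before the next page is submitted.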

Community collection building with the Google Chrome Extension

We envision a community of users contributing to a single collection for a story. Even though collections are built in isolation, we would like to be able to group them around a single theme. To begin this process, the Google Chrome extension lets you share a locally built collection on Twitter by clicking the "Tweet" button (Figure 14).

Figure 14: Tweet button enables sharing the collection

For example, if user 1 and user 2 each build a collection for Hurricane Hermine locally, they may use the hashtags #localmemory and #hurricanehermine when sharing their collections. Consequently, all Hurricane Hermine-related collections can be found on Twitter through these hashtags. We encourage users to include #localmemory and the collection hashtags in tweets when sharing collections. We also encourage you to follow the Local Memory Project on Twitter.

.@localmem local memory collection, hermine (Hilton Head SC): https://t.co/EaYsRYYGy5 https://t.co/wZQ8su9sIw #localmemory #hurricanehermine
— Alexander C. Nwala (@acnwala) September 3, 2016
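A tweet in the format above could be prepared with Twitter's Web Intent URL, which accepts `text` and `hashtags` (comma-separated, without "#") query parameters. This is a sketch, not the extension's Tweet button code; the collection URL is hypothetical, and the text layout simply mirrors the example tweet.

```python
from urllib.parse import urlencode

def share_tweet_url(collection_name, place, collection_url, hashtags):
    """Build a Twitter Web Intent URL that pre-fills a collection-sharing tweet."""
    text = f"local memory collection, {collection_name} ({place}): {collection_url}"
    params = {"text": text,
              "hashtags": ",".join(tag.lstrip("#") for tag in hashtags)}
    return "https://twitter.com/intent/tweet?" + urlencode(params)

print(share_tweet_url("hermine", "Hilton Head SC", "https://example.org/collection",
                      ["#localmemory", "#hurricanehermine"]))
```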

The local news media is a vital organ of journalism, but one in decline. We hope by providing free and open source tools for collection building, we can contribute in some capacity to help its revival.

I am thankful to everyone who has contributed to the ongoing success of this project: Adam, Anastasia, Matt, Jack, and the rest of the Harvard LIL team; my supervisors, Dr. Nelson and Dr. Weigle; Christie Moffat at the National Library of Medicine; and Sawood, Mat, and the rest of my colleagues at WSDL. Thank you.

2018-02-24 Edit: We needed to extract links from Google as part of a larger research project to quantify "refinding" stories on Google. This required issuing queries to Google every day and collecting links.

I used the LMP extension to semi-automatically extract 33,432 URLs from the first 5 pages of Google for 7 queries every day for over 7 months:https://t.co/oV5wwxjfYT #webarchiving #localmemory
— Alexander C. Nwala (@acnwala) February 15, 2018

Fortunately, the Local Memory Project Google Chrome extension is well suited for this task because it enables downloading JSON files consisting of links extracted from Google for the issued queries. However, the standard version of the extension only permits issuing queries to Google and extracting links from local news media organizations. This "local media" restriction was not required in the research project. In other words, our research project required a standard Google search in which links from all kinds of media sources (local and non-local) are included in the search results. As a result, I added this functionality to the extension.
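The change amounts to a branch on the zip code input, sketched below under stated assumptions: a zip code of "0" (the convention described below) yields one unrestricted query, and any other zip code yields one `site:`-restricted query per local source. The hostnames are hypothetical, and the zip-to-sources lookup is assumed to have already happened.

```python
def plan_queries(query, zip_code, local_hosts):
    """Return the Google query strings the search should issue."""
    if zip_code.strip() == "0":
        return [query]  # standard search: links from all kinds of sources
    return [f"{query} site:{host}" for host in local_hosts]

print(plan_queries("winter olympics", "0", ["news-one.example"]))
print(plan_queries("zika virus", "33101", ["news-one.example", "news-two.example"]))
```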

Additionally, a different research interest required us to extract tweets from conversation threads. Since the Twitter API does not permit this, we similarly adapted the Local Memory Project Google Chrome extension for this task.

To summarize, here are the additional features of the extension:

Extract links from standard Google searches: If you would like to extract links from Google for a given query (e.g., "winter olympics"), set the zip code textbox to 0 (Figure 15). This instructs the extension to initiate a standard Google search and extract links for the specified number of pages (Figure 6, annotation 1). The extracted links may be downloaded locally as JSON files, or saved (Figures 10-11) and/or archived (Figures 12-13) for remote and persistent access.

Figure 15: Extract links from Google for query "winter olympics." This is achieved by setting the zip code textbox to "0"
Extract tweets from Twitter: To extract tweets from a Twitter timeline, search, or threaded conversation, copy the URL from your browser's address bar, paste it into the Tweet URL textbox (Figure 16), and press the Extract tweets button.

Figure 16: Extract tweets from Twitter search for query: "winter olympics"

For example, to extract tweets from your timeline, set the Tweet URL textbox to "https://twitter.com/". To extract tweets from the search query "winter olympics," set the Tweet URL textbox to "https://twitter.com/search?q=winter%20olympics&src=typd". To extract tweets from the hashtag "#WinterOlympics," set the Tweet URL to "https://twitter.com/hashtag/WinterOlympics?src=hash". To extract tweets from a threaded conversation (e.g., Figure 17), set the Tweet URL to the conversation URL, e.g., "https://twitter.com/NewYorker/status/964220343278858240". The easiest way to get these tweet URLs is to copy them from your browser's address bar. The extension can scroll and click in order to load more tweets.

Figure 17: The extension can extract tweets from threaded conversations
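If you prefer to generate Tweet URL inputs rather than copy them, the hypothetical helpers below reproduce the three URL patterns given above (search, hashtag, conversation). They reflect Twitter's URL scheme as of this edit; that scheme is outside the extension's control and may change.

```python
from urllib.parse import quote, urlencode

TWITTER = "https://twitter.com/"

def search_url(query):
    """Search URL; spaces become %20 as in the example above."""
    return TWITTER + "search?" + urlencode({"q": query, "src": "typd"}, quote_via=quote)

def hashtag_url(tag):
    """Hashtag URL; a leading "#" is optional."""
    return TWITTER + "hashtag/" + quote(tag.lstrip("#")) + "?src=hash"

def conversation_url(screen_name, status_id):
    """Threaded conversation URL for a given tweet."""
    return f"{TWITTER}{screen_name}/status/{status_id}"

print(search_url("winter olympics"))
print(hashtag_url("#WinterOlympics"))
print(conversation_url("NewYorker", "964220343278858240"))
```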

-- Nwala (@acnwala)

Web Science and Digital Libraries Research Group
