Tuesday, April 18, 2017

2017-04-18: Local Memory Project - going global

Screenshots of world local newspapers from the Local Memory Project's local news repository. Top: newspapers from Iraq, Nigeria, and France. Bottom: Chile, US (Alaska), and Australia.
Soon after the introduction of the Local Memory Project (LMP) and the local news repository of:
  • 5,992 US Newspapers
  • 1,061 US TV stations, and
  • 2,539 US Radio stations
I considered extending the local news collection beyond US local media to include newspapers from around the world.
Finding and generating the world local newspaper dataset
After a sustained search, I narrowed my list of potential sources of world local news media to the following in order of my perceived usefulness:
From this list, I chose Paperboy as my world local news source because it was fairly structured (which makes web scraping easier) and listed the cities in which the various newspaper organizations are located. After scraping and cleaning the data, I extracted local newspaper information for:
  • 6,638 Newspapers from 
  • 3,151 Cities in 
  • 183 Countries
The dataset is publicly available.
Integrating the world local newspaper dataset into LMP
For a seamless transition from a US-centric to a world-centric Local Memory Project, it was important to represent the world local media with exactly the same data schema as the US local media. This ensures that the architecture of LMP remains the same. For example, the following response excerpt represents a single US college newspaper (Harvard Crimson):
{
  "city": "Cambridge",
  "city-latitude": 42.379146,
  "city-longitude": -71.12803,
  "collection": [
    {
      "city-county-lat": 42.377,
      "city-county-long": -71.1167,
      "city-county-name": "Harvard",
      "country": "USA",
      "facebook": "http://www.facebook.com/TheHarvardCrimson",
      "media-class": "newspaper",
      "media-subclass": "college",
      "miles": 0.6,
      "name": "Harvard Crimson",
      "open-search": [],
      "rss": [],
      "state": "MA",
      "twitter": "http://www.twitter.com/thecrimson",
      "video": "https://www.youtube.com/user/TheHarvardCrimson/videos",
      "website": "http://www.thecrimson.com/"
    }
  ],
  "country": "USA",
  "self": "http://www.localmemory.org/api/countries/USA/02138/10/?off=tv%20radio%20",
  "state": "MA",
  "timestamp": "2017-04-17T18:56:10Z"
}
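To illustrate how a client might consume a response in this schema, here is a minimal Python sketch. It parses a trimmed copy of the excerpt above (only a few of the fields are kept) and lists each outlet in "collection". The helper name summarize_outlets is hypothetical and is not part of LMP.

```python
import json

# Trimmed copy of the response excerpt above; only a few fields are kept.
response_text = """
{
  "city": "Cambridge",
  "collection": [
    {
      "country": "USA",
      "media-class": "newspaper",
      "media-subclass": "college",
      "miles": 0.6,
      "name": "Harvard Crimson",
      "state": "MA",
      "website": "http://www.thecrimson.com/"
    }
  ],
  "country": "USA",
  "state": "MA"
}
"""


def summarize_outlets(payload):
    """Return (name, media-class, miles, website) for each outlet in 'collection'.

    Hypothetical helper for illustration; it relies only on keys that
    appear in the excerpt above.
    """
    return [
        (outlet["name"], outlet["media-class"], outlet["miles"], outlet["website"])
        for outlet in payload.get("collection", [])
    ]


payload = json.loads(response_text)
outlets = summarize_outlets(payload)

for name, media_class, miles, website in outlets:
    print(f"{name} ({media_class}, {miles} miles from {payload['city']}): {website}")
```

Because non-US media share this schema, the same parsing code should apply to them unchanged.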
Similarly, world local media use this same schema for seamless integration into the existing LMP framework. However, different countries have different administrative subdivisions. From an implementation standpoint, it would have been ideal if all countries had the US-style administrative subdivision of Country - State - City, but this is not the case. Also, LMP's Geo and LMP's Local Stories Collection Generator are currently accessed using a zip code. Consequently, adding world local news media would have meant finding, for each country, a database that maps zip codes to their respective geographical locations.

To overcome the obstacles of multiple administrative subdivisions and the difficulty of finding comprehensive databases that map zip codes to geographical locations, while maintaining the pre-existing LMP data schema, I created a new access method for non-US local media. Specifically, US local news media are accessed with a zip code (which maps to a City in a State), while non-US local news media are accessed with the name of the City. For example, here is a list of 100 local newspapers that serve Toronto, Canada: http://www.localmemory.org/geo/#Canada/Toronto/100/
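The two access methods can be sketched as a small URL builder in Python. The Country/City/Count pattern is taken from the Toronto link above. The US form (country "USA" followed by a zip code) is an assumption made by analogy with the "self" URL in the response excerpt, and the geo_url function, the five-digit zip check, and the percent-encoding of city names are all hypothetical rather than LMP's actual implementation.

```python
from urllib.parse import quote

GEO_BASE = "http://www.localmemory.org/geo/#"


def geo_url(country, location, count):
    """Build a Geo URL of the form <base>Country/Location/Count/.

    location is a zip code for the US and a city name elsewhere.
    The US variant and the input checks are assumptions for illustration.
    """
    if count < 1:
        raise ValueError("count must be a positive integer")

    location = str(location).strip()
    if country == "USA":
        # Assumption: US zip codes are given as five digits.
        if not (len(location) == 5 and location.isdigit()):
            raise ValueError("US media are accessed with a 5-digit zip code")
    elif not location:
        raise ValueError("non-US media are accessed with a city name")

    # Assumption: names containing spaces are percent-encoded.
    return f"{GEO_BASE}{quote(country, safe='')}/{quote(location, safe='')}/{count}/"


print(geo_url("Canada", "Toronto", 100))
print(geo_url("USA", "02138", 10))
```

The first call reproduces the Toronto link above. The second shows what a zip-code lookup would look like under the stated assumption.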

The addition of 6,638 non-US newspapers from 183 countries makes it possible not only to see local news media from different countries, but also to build collections of stories about events from the perspectives of local media around the world.

--Nwala
