
Showing posts with the label Git

2025-02-11: Getting to the Source of the (Memento) Damage

I've previously written about the Memento Damage project, originally started by Dr. Justin Brunelle: a Web service designed to estimate the amount of damage to a web archive by assessing its missing resources. Previously, I had been specializing parts of the project while working on the Memento Tracer project, funded by the Alfred P. Sloan Foundation, so that its damage weighting accounts for the particular features of Web-hosted repository pages. Over the course of this year I have made further updates to the Memento Damage project that improve this analysis and damage estimation. The most prominent is the implementation of a secondary crawler component for analyzing an archived repository and its source tree. Git repositories are commonly hosted on centralized Web platforms, the largest being GitHub, along with other major platforms such as GitLab, Bitbucket, and SourceForge. The source files for a Git project are hosted "behind the scen...
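To illustrate the general idea of estimating damage from missing resources, here is a minimal Python sketch that scores damage as the weighted fraction of a memento's resources that are missing. The `Resource` class, the `estimate_damage` function, and the weights are hypothetical illustrations chosen for this example; they are not the Memento Damage project's actual weighting scheme or API.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass
class Resource:
    uri: str
    weight: float   # hypothetical importance weight for this resource
    missing: bool   # True if the archive failed to capture this resource


def estimate_damage(resources: Sequence[Resource]) -> float:
    """Return the weighted share of missing resources, from 0.0 (intact) to 1.0."""
    total = sum(r.weight for r in resources)
    if total == 0:
        # No weighted resources at all: report no measurable damage.
        return 0.0
    lost = sum(r.weight for r in resources if r.missing)
    return lost / total


# Hypothetical memento of a repository page: one stylesheet captured,
# one image and one source file missing.
page = [
    Resource("style.css", 3.0, False),
    Resource("logo.png", 1.0, True),
    Resource("src/main.py", 2.0, True),
]
print(estimate_damage(page))  # 3.0 missing out of 6.0 total -> 0.5
```

A weighted ratio like this lets a missing source file count for more than a missing decorative image, which is the kind of distinction that matters for repository pages.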

2017-01-15: Summary of "Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data"

Example: original URI vs. trusty URI

Based on the paper: Kuhn, T., Dumontier, M.: Trusty URIs: Verifiable, immutable, and permanent digital artifacts for linked data. Proceedings of the European Semantic Web Conference (ESWC), pp. 395–410 (2014). A trusty URI is a URI that contains a cryptographic hash value of the content it identifies. The authors introduced trusty URIs as a technique for making digital artifacts, especially those related to scholarly publications, immutable, verifiable, and permanent. Assuming that a trusty URI, once created, is linked from other resources or stored by a third party, it becomes possible to detect whether the content it identifies has been tampered with or manipulated in transit (e.g., trusty URIs can help prevent man-in-the-middle attacks). In addition, trusty URIs allow the content to be verified even if it is no longer found at the original URI but can still be retrieved from other locations, such as Google's cache, ...
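The core mechanism, embedding a hash of the content in the URI and later recomputing it to check the content, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: it appends a URL-safe base64 SHA-256 digest to a base URI with a "." separator, whereas the paper defines its own artifact-code format. The function names `artifact_code`, `make_trusty_uri`, and `verify` are hypothetical.

```python
import base64
import hashlib


def artifact_code(content: bytes) -> str:
    """Hash the content and encode it so it can sit inside a URI (simplified format)."""
    digest = hashlib.sha256(content).digest()
    # URL-safe base64 without "=" padding contains no characters that need escaping in a URI.
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


def make_trusty_uri(base_uri: str, content: bytes) -> str:
    """Append the artifact code to the base URI (hypothetical '.' separator)."""
    return f"{base_uri}.{artifact_code(content)}"


def verify(trusty_uri: str, content: bytes) -> bool:
    """Recompute the hash of content retrieved from anywhere and compare it to the URI."""
    # base64url output never contains '.', so the last segment is the artifact code.
    expected = trusty_uri.rsplit(".", 1)[-1]
    return expected == artifact_code(content)


uri = make_trusty_uri("http://example.org/r1", b"hello")
print(verify(uri, b"hello"))   # True: content matches the hash in the URI
print(verify(uri, b"hello!"))  # False: any change to the content is detected
```

Because verification depends only on the URI and the bytes, it works the same whether the content came from the original server, a mirror, or a cache.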

2015-03-02: Reproducible Research: Lessons Learned from Massive Open Online Courses

Source: Dr. Roger Peng (2011). Reproducible Research in Computational Science. Science 334: 122

Have you ever needed to look back at a program and research data from lab work performed last year, last month, or even last week and had a difficult time recalling how the pieces fit together? Or perhaps the reasoning behind the decisions you made while conducting your experiments is now obscure due to incomplete or poorly written documentation. I never gave this idea much thought until I enrolled in a series of Massive Open Online Courses (MOOCs) offered on the Coursera platform. The courses, which I took from August to December 2014, were part of a nine-course specialization in data science. Topics included R Programming, Statistical Inference, and Machine Learning. Because these courses are entirely free, you might think they would lack academic rigor. That's not the case. In fact, these particular courses and others on Courser...