Wednesday, December 14, 2011

2011-12-14 Python & Memento Presentation for the ODU ACM

Earlier this semester, I was invited to present Python at an ODU ACM meeting. I gave a brief overview of the Python language and followed up with a walkthrough of the code I use to parse Memento timemaps in my current research.

Python, of course, has advantages and disadvantages compared to other languages. Since most ODU undergrads have experience with C++, the presentation describes Python relative to C++. Python's advantages include a fast development cycle and an extensive collection of community libraries. Its primary disadvantage compared to C++ is execution speed. In my experience, Python is sometimes over 100 times slower.

Python's basic syntax and semantics are straightforward, so the presentation focused on the Python equivalents of commonly used C++ constructs and the differences between static (C++) and dynamic (Python) typing. Python's built-in high-level data types (lists, dictionaries, tuples, and sets) and its functional constructs were compared with their more complex C++ equivalents.
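As a small illustration of those data types and of dynamic typing, here is a minimal sketch. It is not taken from the presentation; the word list and the total function are made up for this example.

```python
# Illustrative data only; none of this comes from the presentation itself.
words = ["memento", "timemap", "memento", "uri", "timemap", "memento"]

# Dictionary: count occurrences without declaring any types.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Set: the unique values. Tuple: a small fixed-size (word, count) record.
unique_words = set(words)
most_common = max(counts.items(), key=lambda pair: pair[1])

# Functional style: filter and transform in a single list comprehension.
long_upper = [w.upper() for w in words if len(w) > 3]


# Dynamic typing: one function works for any type that supports "+".
# (hypothetical helper, written only for this example)
def total(items, start):
    result = start
    for item in items:
        result = result + item
    return result


int_sum = total([1, 2, 3], 0)      # adds integers
joined = total(["a", "b"], "")     # concatenates strings
merged = total([[1], [2]], [])     # concatenates lists
```

In C++ the counting loop would typically need a declared std::map<std::string, int>, and total would need to be a template to accept integers, strings, and lists alike.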



To bring all the pieces together, I did a walkthrough of the python.py module I use to parse Memento timemaps (see the Memento Introduction and Internet Draft for more information). The module has two classes. The TimeMap class is a parser and dictionary for timemap data. The TimeMapTokenizer class is a tokenizer for link-style timemaps.

To load a timemap, create a new TimeMap instance with the timemap's URI, which is the constructor's only argument. A TimeMapTokenizer instance returns individual tokens, simplifying the parsing code in the get_next_link function. TimeMap implements the __getitem__ method, allowing it to be indexed like a Python dictionary. TimeMapTokenizer implements the __iter__ and next methods, which allow Python iteration constructs (such as for loops) to be used over the list of tokens.
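The two protocols can be sketched roughly as follows. This is not the module from the presentation: SketchTokenizer and SketchTimeMap are hypothetical names, the token pattern is a guess at what a link-style tokenizer needs, the constructor takes the timemap text rather than a URI so the example runs without a network, and keying mementos by their datetime string is an assumption. The code targets Python 3, where the iterator method is spelled __next__; the next alias covers the Python 2 spelling used above.

```python
import re

# Assumed token pattern: <uri>, "quoted string", one of ; , = or a bare word.
_TOKEN = re.compile(r'<[^>]*>|"[^"]*"|[;,=]|[^\s;,="<>]+')


class SketchTokenizer:
    """Hypothetical tokenizer: iterates over link-style timemap tokens."""

    def __init__(self, text):
        self._tokens = _TOKEN.findall(text)
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._tokens):
            raise StopIteration
        token = self._tokens[self._pos]
        self._pos += 1
        return token

    next = __next__  # Python 2 spelling of the same method


class SketchTimeMap:
    """Hypothetical parser: maps memento datetime strings to URIs."""

    def __init__(self, text):
        self.original = None
        self._mementos = {}
        uri, attrs, name = None, {}, None
        for token in SketchTokenizer(text):
            if token.startswith("<"):          # start of a new link
                uri, attrs, name = token[1:-1], {}, None
            elif token == ",":                 # end of the current link
                self._store(uri, attrs)
                uri = None
            elif token in (";", "="):          # separators carry no data
                continue
            elif name is None:                 # attribute name, e.g. rel
                name = token
            else:                              # attribute value
                attrs[name] = token.strip('"')
                name = None
        self._store(uri, attrs)                # last link has no trailing comma

    def _store(self, uri, attrs):
        if uri is None:
            return
        rels = attrs.get("rel", "").split()
        if "original" in rels:
            self.original = uri
        elif "memento" in rels and "datetime" in attrs:
            self._mementos[attrs["datetime"]] = uri

    def __getitem__(self, when):
        return self._mementos[when]


# Made-up sample data in link format.
sample = (
    '<http://example.org/>;rel="original", '
    '<http://archive.example.org/20111214000000/http://example.org/>'
    ';rel="memento";datetime="Wed, 14 Dec 2011 00:00:00 GMT"'
)
timemap = SketchTimeMap(sample)
tokens = list(SketchTokenizer('<http://example.org/>;rel="original"'))
```

Because the datetime value contains a comma inside its quotes, simply splitting the text on commas would break the link apart; letting the tokenizer return quoted strings whole is what keeps the parsing loop short.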

— Scott G. Ainsworth
