Tuesday, June 18, 2019

2019-06-18: It is time to go back home!

On May 11, 2019, I officially obtained my PhD in Computer Science from Old Dominion University (ODU). My graduate studies journey started when I received a full scholarship from the University of Hail in Saudi Arabia, where I had worked for two years as a teaching assistant. I came to the USA, specifically to San Francisco, in 2010 with my husband and my three-month-old daughter. I attended Kaplan Institute, where I took English classes and a GRE course for almost a year. After that, I was accepted at ODU as a CS Master's student in 2011. In July 2013 I welcomed my second baby girl, Jenna, and in August I graduated from the Master's program and joined the PhD program to work with the WS-DL (Web Science and Digital Libraries) research group there.
On April 4, 2019, I defended my dissertation research, “Expanding the usage of web archives by recommending archived webpages using only the URI” (slides, video).

The goal of my work was to build a model for selecting and ranking possible recommended webpages at a Web archive. The aim is to enhance both the archive's HTTP 404 responses and HTTP 200 responses by surfacing webpages in the archive that the user may not know existed. An example is when a user requests a Virginia Tech football webpage from the archive. The user knows about the popular Virginia Tech football webpage http://hokiesports.com/football/ and will request that webpage, which is both on the live Web and archived. However, the user does not know that the webpage http://techsideline.com exists in the archive. In 2013, a request for that webpage on the live Web redirected to https://virginiatech.sportswar.com. If the user did not have a link to that webpage on the live Web, the user would never know it existed.

To accomplish this, we first detect the semantics in the requested Uniform Resource Identifier (URI). Next, we classify the URI using an ontology, such as DMOZ or any website directory. Finally, we filter and rank candidates based on several features, such as archival quality, webpage popularity, temporal similarity, and content similarity. Archival quality refers to measuring memento damage by evaluating the number and impact of the missing resources in a webpage. Webpage popularity considers how often the webpage has been archived and its popularity on the live Web. A special case of popularity is webpages in “cold spots”, which are pages that are not on the live Web and are not currently popular, but are archived. Temporal similarity refers to how close the candidate webpage’s Memento-Datetime is to that of the requested URI. URI similarity assesses the similarity of the candidate URI's tokens to the requested URI's tokens. We tested the model using human evaluation to determine if we could classify and find recommendations for a sample of requests from the Internet Archive’s Wayback Machine access log. Overall, when the full categorization was used, reviewers agreed with 80.3% of the recommendations, a much higher share than the “do not agree” and “I do not know” responses. When only the first level of the categorization was used, reviewers agreed with just 25.5% of the recommendations. This indicates that deeper categorization improves the performance of finding relevant recommendations.
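To make the ranking step a little more concrete, here is a minimal sketch in Python of how two of the features above, URI similarity and temporal similarity, could be combined into one score. It is only an illustration and not the model from the dissertation: the stop-token list, the Jaccard overlap measure, the one-year decay scale, the equal weights, and every function name are assumptions chosen for this example.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit

# Tokens that carry little meaning in a URI (an assumed, illustrative list).
STOP_TOKENS = {"www", "com", "org", "net", "edu", "html", "htm", "php", "index"}


def uri_tokens(uri):
    """Split a URI's host and path into a set of lowercase alphanumeric tokens."""
    parts = urlsplit(uri)
    raw = re.split(r"[^a-z0-9]+", (parts.netloc + parts.path).lower())
    return {t for t in raw if t and t not in STOP_TOKENS}


def uri_similarity(requested, candidate):
    """Jaccard overlap between the two token sets (an assumed measure)."""
    a, b = uri_tokens(requested), uri_tokens(candidate)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def temporal_similarity(requested_dt, memento_dt, scale_days=365.0):
    """Map the gap between two datetimes to (0, 1]; 1.0 means no gap.

    The one-year scale is an assumption made for this sketch.
    """
    gap_days = abs((requested_dt - memento_dt).total_seconds()) / 86400.0
    return 1.0 / (1.0 + gap_days / scale_days)


def rank_candidates(requested_uri, requested_dt, candidates,
                    w_uri=0.5, w_time=0.5):
    """Sort (uri, memento_datetime) pairs by a weighted score, best first."""
    scored = []
    for uri, memento_dt in candidates:
        score = (w_uri * uri_similarity(requested_uri, uri)
                 + w_time * temporal_similarity(requested_dt, memento_dt))
        scored.append((score, uri))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(uri, round(score, 3)) for score, uri in scored]
```

Calling rank_candidates with the requested URI, the requested datetime, and a list of (URI, Memento-Datetime) pairs returns the candidates ordered from best to worst. A fuller model would also fold in archival quality and webpage popularity, which need data (memento damage, archive counts) that this sketch does not have.
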
My life as a graduate student, and especially as a PhD student, was not an easy one. Juggling family responsibilities with academic work is a hard task, and it took me some time to figure out how to balance and handle both. There are some lessons learned that I think could be helpful to other graduate students. First, working on a research concentration that interests you, with an advisor who is committed and productive, is a key to success. It may take time to find the exact topic you are going to work on, but the right guidance from your advisor, a lot of reading on other people’s research, and some experiments along the way will help you get there. Second, working with a group that is energized will keep you motivated. It is important to have meetings with the other group members and talk about what was accomplished and what work lies ahead. Not only does this keep you energized, but it also may lead to research contributions. Third, try to find a balance between your personal life and academic life. It is not easy to have kids and do graduate studies; however, having great support from family and friends is important. Finally, being a graduate student requires patience and hard work. You need to be self-motivated during this journey and believe in yourself.
After 9 long, hard, and beautiful years as a graduate student in the US, I will be heading home in June to where it all started, the University of Hail, to work as an assistant professor in the College of Computer Science and Engineering.

-Lulwah M. Alkwai
