Posts

Showing posts from December, 2018

2018-12-17: CoQA Challenge: Machine Reading Competition Recent Result

CoQA is a dataset containing more than 127,000 questions with answers, collected from more than 8,000 conversations. Each conversation is a series of questions and answers about a passage. An example passage is below:
Once upon a time, in a barn near a farm house, there lived a little white kitten named Cotton. Cotton lived high up in a nice warm place above the barn where all of the farmer's horses slept. But Cotton wasn't alone in her little home above the barn, oh no. She shared her hay bed with her mommy and 5 other sisters. All of her sisters were cute and fluffy, like Cotton. But she was the only white one in the bunch. The rest of her sisters were all orange with beautiful white tiger stripes like Cotton's mommy. Being different made Cotton quite sad. She often wished she looked like the rest of her family. So one day, when Cotton found a can of the old farmer's orange paint, she used it to paint herself like them. When her mommy and sisters found
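To make the "conversation about a passage" structure concrete, here is a minimal sketch that pairs each question with its answer by turn. It assumes the field names of the public CoQA JSON release (`story`, `questions`, `answers`, `turn_id`, `input_text`); the two turns are made up for illustration from the Cotton passage and are not actual CoQA annotations.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    turn_id: int
    question: str
    answer: str


def load_conversation(record: dict) -> List[Turn]:
    """Pair each question with its answer by turn_id.

    Assumes CoQA-style field names: record["questions"] and
    record["answers"] are lists of dicts carrying "turn_id" and "input_text".
    """
    answers = {a["turn_id"]: a["input_text"] for a in record["answers"]}
    return [
        Turn(q["turn_id"], q["input_text"], answers[q["turn_id"]])
        for q in sorted(record["questions"], key=lambda q: q["turn_id"])
    ]


# Illustrative record: hypothetical turns, not real CoQA annotations.
record = json.loads("""
{
  "story": "Once upon a time, in a barn near a farm house, there lived a little white kitten named Cotton.",
  "questions": [
    {"turn_id": 1, "input_text": "What color was Cotton?"},
    {"turn_id": 2, "input_text": "Where did she live?"}
  ],
  "answers": [
    {"turn_id": 1, "input_text": "white"},
    {"turn_id": 2, "input_text": "in a barn"}
  ]
}
""")

for turn in load_conversation(record):
    print(turn.turn_id, turn.question, "->", turn.answer)
```

The second question uses "she", which can only be resolved from the earlier turn; that dependence on conversation history is what separates CoQA from single-turn reading comprehension.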

2018-12-14: New Insight to Big Data: Trip to IEEE Big Data 2018

The IEEE Big Data 2018 conference was held at the Westin Seattle Hotel from December 10 to December 13, 2018. More than 1,100 people registered. Acceptance rates varied from 13% to 24%, with an average of 19%. I had a poster accepted, titled “CiteSeerX-2018: A Cleansed Multidisciplinary Scholarly Big Dataset”, co-authored with C. Lee Giles, two of his graduate students (Bharath and Shaurya), and an undergraduate student who produced preliminary results (Jianyu Mao). I attended the conference on Day 2 and Day 3 and left the conference hotel after the keynote on Day 3.
Insights from Personal Meetings
The most important reason to attend conferences is to meet old friends and make new ones. Old friends I met included Kyle Williams (Microsoft Bing), Mu Qiao (IBM, chair of the I&G track), Yang Song (Google AI, co-chair of the I&G track), Manlin Li (Google Cloud), and Madian Khabsa (Apple Siri). Kyle introduced the recent project on recomme

2018-12-14: CNI Fall 2018 Trip Report

Mat Kelly reports on his recent trip to Washington, DC for the CNI Fall 2018 meeting.
I (Mat Kelly, @machawk1) attended my first CNI (#cni18f) meeting on December 10-11, 2018, an atypical venue for a PhD student, and am reporting on my trip experience (see also previous trip reports from Fall 2017, Spring 2017, Spring 2016, Fall 2015, and Fall 2009). Dr. Nelson (@phonedude_mln) and I left Norfolk, VA for DC, having wondered whether the roads would be clear after the unseasonably heavy snowstorm the night before (they were): The roads are clear but snowy as @phonedude_mln and I make our way to DC

2018-12-03: Using Wikipedia to build a corpus, classify text, and more

Wikipedia is an online encyclopedia, available in 301 different languages and constantly updated by volunteers. Wikipedia is not only an encyclopedia; it has also been used as an ontology to build corpora, classify entities, cluster documents, create annotations, recommend documents to users, and more. Below, I review some of the significant publications in these areas.
Using Wikipedia as a corpus: Wikipedia has been used to create corpora for text classification or annotation. In “Named entity corpus construction using Wikipedia and DBpedia ontology” (LREC 2014), YoungGyum Hahm et al. developed a method that uses Wikipedia, DBpedia, and SPARQL queries to generate a named entity corpus. The method can be applied to any language. Fabian Suchanek used Wikipedia, WordNet, and Geonames to create an ontology called YAGO, which contains over 1.7 million entities and 15 million facts. The paper “YAGO: A large ontology from Wikipedia
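The excerpt does not spell out Hahm et al.'s pipeline. As a rough illustration of the general idea only (not their method), one can ask DBpedia's SPARQL endpoint for the labels of every resource in an ontology class and use those labels as a tagged gazetteer for annotating Wikipedia text. In the sketch below, the class name, the `LOC` tag, and the canned result are hypothetical; the query is only built, not sent, and the sample result follows the standard SPARQL 1.1 JSON results layout.

```python
from typing import Dict


def build_query(dbpedia_class: str, lang: str = "en", limit: int = 100) -> str:
    """SPARQL query for the labels of all DBpedia resources of one ontology class."""
    return f"""PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?entity ?label WHERE {{
  ?entity a dbo:{dbpedia_class} ;
          rdfs:label ?label .
  FILTER (lang(?label) = "{lang}")
}}
LIMIT {limit}"""


def to_gazetteer(results: dict, tag: str) -> Dict[str, str]:
    """Map each label in a SPARQL 1.1 JSON result to a named-entity tag."""
    return {b["label"]["value"]: tag for b in results["results"]["bindings"]}


# Hypothetical canned response, shaped like a SPARQL 1.1 JSON result.
sample = {
    "head": {"vars": ["entity", "label"]},
    "results": {"bindings": [
        {"entity": {"type": "uri", "value": "http://dbpedia.org/resource/Seoul"},
         "label": {"type": "literal", "xml:lang": "en", "value": "Seoul"}},
    ]},
}

print(build_query("City"))
print(to_gazetteer(sample, "LOC"))
```

Changing `lang` (for example to `"ko"`) is all it takes to pull labels in another language, which is one reason this style of approach is not tied to English.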

2018-12-03: Acidic Regression of WebSatchel

Mat Kelly reviews WebSatchel, a browser-based personal preservation tool.
Shawn Jones (@shawnmjones) recently made me aware of "WebSatchel", a personal tool that saves copies of a Web page using a browser extension. The service is somewhat akin to browser-based tools like Pocket (now bundled with Firefox after a 2017 acquisition), among many others. Many of these tools use a browser extension that lets the user send a URI to a service that creates a server-side snapshot of the page. This URI delegation procedure aligns with Internet Archive's "Save Page N
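URI delegation means the client hands over only the URI and the archive fetches the page itself. A minimal sketch, assuming Internet Archive's public `https://web.archive.org/save/` URL prefix as the target service; `delegate_uri` is a hypothetical helper, and the actual HTTP request is left out so the sketch runs offline.

```python
from urllib.parse import urlsplit

# Assumed endpoint: Internet Archive's public "Save Page Now" URL prefix.
SAVE_PAGE_NOW = "https://web.archive.org/save/"


def delegate_uri(uri: str, endpoint: str = SAVE_PAGE_NOW) -> str:
    """Return the request URL asking an archive to capture `uri` server-side.

    Only the URI is handed over; the archive's own crawler fetches and
    stores the page, so nothing from the user's browser session is sent.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute HTTP(S) URI: {uri!r}")
    return endpoint + uri


if __name__ == "__main__":
    print(delegate_uri("https://example.com/page"))
```

Because the capture happens on the archive's side, content visible only in the user's logged-in session will generally not appear in the snapshot; that is the main tradeoff of delegating the URI rather than capturing in the browser.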