Posts

Showing posts from 2018

2018-12-17: CoQA Challenge: Machine Reading Competition Recent Result

CoQA is a dataset containing more than 127,000 questions with answers, collected from more than 8,000 conversations. Each conversation is a series of questions and answers about a passage. One example passage is below: Once upon a time, in a barn near a farm house, there lived a little white kitten named Cotton. Cotton lived high up in a nice warm place above the barn where all of the farmer's horses slept. But Cotton wasn't alone in her little home above the barn, oh no. She shared her hay bed with her mommy and 5 other sisters. All of her sisters were cute and fluffy, like Cotton. But she was the only white one in the bunch. The rest of her sisters were all orange with beautiful white tiger stripes like Cotton's mommy. Being different made Cotton quite sad. She often wished she looked like the rest of her family. So one day, when Cotton found a can of the old farmer's orange paint, she used it to paint herself like them. When her mommy and sisters found...
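As a rough illustration of the structure described in this excerpt (a passage paired with an ordered series of questions and answers), here is a minimal Python sketch. The class names, field names, and the two sample questions are assumptions made for this sketch; they are not the dataset's actual schema or annotations.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    """One question/answer pair in a conversation (hypothetical field names)."""
    question: str
    answer: str


@dataclass
class Conversation:
    """A passage plus the ordered question/answer turns asked about it."""
    passage: str
    turns: List[Turn] = field(default_factory=list)


def average_turns(conversations: List[Conversation]) -> float:
    """Mean number of question/answer turns per conversation."""
    if not conversations:
        return 0.0
    return sum(len(c.turns) for c in conversations) / len(conversations)


# Illustrative only: these questions were written for this sketch and are
# not claimed to match the dataset's real annotations.
example = Conversation(
    passage="... there lived a little white kitten named Cotton. ...",
    turns=[
        Turn("What color was Cotton?", "white"),
        Turn("Where did she live?", "in a barn"),
    ],
)

print(average_turns([example]))
```

Taking the two lower bounds quoted in the excerpt at face value, 127,000 questions over 8,000 conversations works out to roughly 16 questions per conversation; since both figures are stated as "more than", this is only a ballpark.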

2018-12-14: New Insight to Big Data: Trip to IEEE Big Data 2018

IEEE Big Data 2018 was held at the Westin Seattle Hotel from December 10 to December 13, 2018. More than 1,100 people registered. Acceptance rates varied between 13% and 24%, with an average of 19%. I had a poster accepted, titled “CiteSeerX-2018: A Cleansed Multidisciplinary Scholarly Big Dataset”, co-authored with C. Lee Giles, two of his graduate students (Bharath and Shaurya), and an undergraduate student who produced preliminary results (Jianyu Mao). I attended the conference on Day 2 and Day 3 and left the conference hotel after the keynote on Day 3. Insights from Personal Meetings: The most important reason to attend conferences is to meet old friends and make new ones. Old friends I met include Kyle Williams (Microsoft Bing), Mu Qiao (IBM, chair of I&G track), Yang Song (Google AI, co-chair of I&G track), Manlin Li (Google Cloud), and Madian Khabsa (Apple Siri). Kyle introduced the recent project on rec...

2018-12-14: CNI Fall 2018 Trip Report

Mat Kelly reports on his recent trip to Washington, DC for the CNI Fall 2018 meeting ...

2018-12-03: Using Wikipedia to build a corpus, classify text, and more

Wikipedia is an online encyclopedia, available in 301 different languages and constantly updated by volunteers. Wikipedia is not only an encyclopedia; it has also been used as an ontology to build a corpus, classify entities, cluster documents, create annotations, recommend documents to a user, etc. Below, I review some of the significant publications in these areas. Using Wikipedia as a corpus: Wikipedia has been used to create corpora that can be used for text classification or annotation. In “Named entity corpus construction using Wikipedia and DBpedia ontology” (LREC 2014), YoungGyum Hahm et al. created a method that uses Wikipedia, DBpedia, and SPARQL queries to generate a named entity corpus. The method used in this paper can be applied to any language. Fabian Suchanek used Wikipedia, WordNet, and Geonames to create an ontology called YAGO, which contains over 1.7 million entities and 15 million facts. The paper “YAGO: A large ontology from Wikipedia ...