Monday, November 12, 2018

2018-11-12: Google Scholar May Need To Look Into Its Citation Rate



Google Scholar has long been regarded as a digital library containing the most complete collection of scholarly papers and patterns. For a digital library, completeness is very important because otherwise, you cannot guarantee the citation rate of a paper, or equivalently the in-link of a node in the citation graph. That is probably why Google Scholar is still more widely used and trusted than any other digital libraries with fancy functions.

Today, I found two very interesting aspects of Google Scholar, one is clever and one is silly. The clever side is that Google Scholar distinguishes papers, preprints, and slides and count citations of them separately.

If you search "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs", you may see the same view as I attached. Note that there are three results. The first is a paper on IEEE. The second actually contains a list of completely different authors. These people are probably doing a presentation of that paper. The third is actually a pre-print on arXiv. These three have different numbers of citations, which they should do.


The silly side is also reflected in the search result. How does a paper published in less than a year receive more than 1900 citations? You may say that may be a super popular paper. But if you look into the citations. Some do not make sense. For example, the first paper that "cites" the DeepLab paper was published in 2015! How could it cite a paper published in 2018?

Actually, the first paper's citation rate is also problematic. A paper published in 2015 was cited more than 6500 times! And another paper published in 2014 was cited more than 16660 times!

Something must be wrong about Google Scholar! The good news that the number looks higher, which makes everyone happy! :)


Jian Wu

Saturday, November 10, 2018

2018-11-11: More than 7000 retracted abstracts from IEEE. Can we find them from IA?


Science magazine:

More than 7000 abstracts are quietly retracted from the IEEE database. Most of these abstracts are from IEEE conferences that took place between 2009 and 2011.  The plot below clearly shows when the retraction happened. The reason was weird: 
"After careful and considered review of the content of this paper by a duly constituted expert committee, this paper has been found to be in violation of IEEE’s Publication Principles. "
Similar things happened in Nature subsidiary journal (link) and other journals (link).


The question is can we find them from internet archive? Can they still be legally posted on a digital library like CiteSeerX? If they do, they can provide a very unique training dataset to be used for fraud and/or plagiarism detection, assuming that the reason under the hood is one of them. 

Jian Wu

2018-11-10: Scientific news and reports should cite original papers

Image result for sciencealert

I highly encourage all scientific news or reports cite corresponding articles. ScienceAlert usually does a good job on this. This piece of scientific news from ScienceAlert discovers two Rogue planets.  Most planets we discovered rotate around a star. A Rogue planet does not rotate around a star, but the center of the Galaxy. Because planets do not emit light, Rogue planets are extremely hard to detect. This piece of news cites a recently published paper on arXiv. Although anybody can publish papers on arXiv. Papers published by reputable organizations should be reliable.

A reliable citation is beneficial for all parties. It makes the scientific news more trustable. It gives credits to the original authors. It could also connect readers to a place to explore other interesting science.

Jian Wu



Friday, November 9, 2018

2018-11-09: Grok Pattern

Image result for logstash logo
Grok is a way to match a text line against a regular expression, map specific parts of the line into dedicated fields, and perform actions based on this mapping. Grok patterns are (usually long) regular expressions that are widely used in log parsing. With tons of search engine logs, how to effectively parse them, extract useful metadata for analytics, training, and prediction has become a key problem in mining text big data. 

In this article, Ran Ramati gives a beginner's guide to Grok Pattern used in Logstash, one of the powerful tools in the Elastic Stack (the other two are Kibana and Elastic Search).

https://logz.io/blog/logstash-grok/

The StreamSets webpage gives a list of Grok pattern examples: 

https://streamsets.com/documentation/datacollector/3.4.3/help/datacollector/UserGuide/Apx-GrokPatterns/GrokPatterns_title.html

The recent paper by Huawei research lab in China summarizes and compare a number of log parsing tools:

https://arxiv.org/abs/1811.03509

I am kind of surprised that although they cited the Logstash website, they did not compare Logstash with its peers.

 Jian Wu

Thursday, November 8, 2018

2018-11-08: Decentralized Web Summit: Shaping the Next Web


In my wallet I have a few ₹500 Indian currency notes that say, "I PROMISE TO PAY THE BEARER THE SUM OF FIVE HUNDRED RUPEES" followed by the signature of the Governor of the Reserve Bank of India. However, this promise was broken two years ago from today, since then these bills in my pocket are nothing more than rectangular pieces of printed paper. So, I decided to utilize my origami skills and turn them into butterflies.

On November 8, 2016, at 8:00 PM (Indian Standard Time), Indian Prime Minister Narendra Modi announced the demonetization (effective in four hours after midnight) of the two biggest currency notes (₹1,000 and ₹500) in circulation at that time. Together these two notes represented about 86% of the total cash economy of India at that time. More than 65% of the Indian population still lives in rural and remote areas where availability of electricity, the Internet, and other utilities is not reliable yet. Hence, cash is a very common means of business in daily life there. It was morning here in Norfolk (USA) and I was going through the news headlines when I saw this announcement. I could not believe for a while that the news was real and not a hoax. I did not even know that there is a concept called demonetization that governments can practice. Irrespective of my political views and irrespective of the intents and goals behind the decision (whatever good or bad they might have been) I was shocked to realize that the system has so much centralization of power in place that a single person can decide sufferings for about 18% of the global population overnight and cause a chaos in the system. I wished for a better and more resilient system, I wanted a system with decentralization of power by design where no one entity has a significant share of power and influence. I wanted a DECENTRALIZED SYSTEM!

When the Internet Archive (IA) announced plans for the Decentralized Web (DWeb) Summit, I was on board to explore what can we do to eliminate centralization of control and power in systems on the Web. With a generous support from the Protocol Labs, AMF, and NSF IIS-1526700 grants I was able to travel to the West Coast to experience four days full of fun and many exciting events. I got the opportunity to meet many big names who brought us the Web we experience today and many of those who are working towards shaping the future of the Web with their vision, ideas, experience, code, art, legal understanding, education, or social values. They all had a different perspective to share with the rest, but all seemed to agree on one goal of fixing the current Web where freedom of expression is under an ever-growing threat, governments control the voice of dissent, big corporations use personal data of the Internet users for monetary benefits and political influence, and those in power try to suppress the history they might be uncomfortable with.

There was so much going on in parallel that perhaps no two people have experienced the same sequence of events. Also, I am not even pretending to tell everything I have observed there. In this post I will be describing my experience of the following four related events briefly that happened between July 31 and August 3, 2018.

  • IndieWebCamp SF
  • Science Fair
  • Decentralized Web Summit
  • IPFS Lab Day

IndieWebCamp SF


The IndieWeb is a people-focused alternative to the "corporate web". Their objectives include: 1) Your content is yours, 2) You are better connected, and 3) You are in control. Some IndieWeb people at Mozilla decided to host IndieWebCamp SF, a bootcamp the day before #DWebSummit starts and shared open invitation to all participants. I was quick to RSVP there which was going to be my first interaction with the IndieWeb.

On my way from the hotel to the Mozilla's SF office the Uber driver asked me why I came to SF. I replied to her, "to participate in an effort to decentralize the Web". She seemed puzzled and said, "my son was mentioning something about it, but I don't know much". "Have you heard about Bitcoin?", I asked her to get an idea how to explain. "I have heard this term in the news, but don't really know much about it", she said. So, I started the elevator pitch and in the next eight or so minutes (about four round trips of Burj Khalifa's elevator from the ground to the observation deck) I was able to explain some of the potential dangers of centralization in different aspects of our social life and what are some of the alternatives.




The bootcamp had both on-site and remote participants and was well organized. We started with keynotes from Miriam Avery, Dietrich Ayala, and Ryan Barrett then some people introduced themselves, why were they attending the DWeb Summit, and what ideas they had for the IndieWeb bootcamp. Some people had lightning demos. I demonstrated InterPlanetary Wayback (IPWB) briefly. I got to meet some people behind some projects I was well aware of (such as Universal Viewer and Dat Project) and also got to know about some projects I didn't know before (such as Webmention and Scuttlebutt). We then scheduled BarCamp breakout sessions and had lunch.

During and after the lunch I had an interesting discussion and exchanged ideas with Edward Silverton from the British Library and a couple of people from Mozilla's Mixed Reality team about the Universal Viewer, IIIF, Memento, and multi-dimensional XR on the Web.




Later I participated in two sessions "Decentralized Web Archiving" and "Free Software + Indieweb" (see the schedule for notes on various sessions). The first one was proposed by me in which I explained the state of Web archiving, current limitations and threats, and the need to move it to a more persistent and decentralized infrastructure. I have also talked about IPWB and how it can help in distributed web archiving (see notes for details and references). In the latter session we talked about different means to support Free Software and open-source developers (for example bug bounty, crowdfunding, and recurring funding), compared and contrasted different models and their sustainability as compared with closed-source software backed by for-profit organizations. We also touched on some licensing complications briefly.

I had to participate in the Science Fair at IA, so I had to get there a little earlier than the start time of the session. With that in mind, Dietrich (from the Firefox team) and I left the session a little before it was formally wrapped up as the SF traffic in the afternoon was going to make it a rather long commute.

Science Fair


The taxi driver was an interesting person with whom Dietrich and I shared the ride from the Mozilla SF office to the Internet Archive, talking about national and international politics, history, languages, music, and whatnot until we reached our destination where food trucks and stalls were serving the dinner. It was more windy and chilly out there than I anticipated in my rather thin jacket. Brewster Kahle, the founder of the IA, who had just came out of the IA building, welcomed us and navigated us to the registration desk where a very helpful team of volunteers gave us our name badges and project sign holders. I acquired a table right outside the entrance of the IA's building, placed the InterPlanetary Wayback sign on it, and went to the food truck to grab my dinner. When I came back I found that the wind has blown my project sign off the table, so I moved it inside of the building where it was a lot cozier and crowded.

The Science Fair event was full of interesting projects. You may explore the list of all the Science Fair projects along with their description and other details. Alternatively, flip through the pages of the following photo albums of the day.






Many familiar and new faces visited my table, discussed the project, and asked about its functionality, architecture, and technologies. On the one hand I met people who were already familiar with our work and on the other hand some needed a more detailed explanation from scratch. I even met people who asked with a surprise, "why would you make your software available to everyone for free?" This needed a brief overview of how the Open Source Software ecosystem works and why one would participate in it.




This is not a random video. This clip was played to invite Mike Judge, Co-creator of HBO's Silicon Valley on the stage for a conversation with Cory Doctorow in the Opening Night Party after Brewster's welcome note (due to the streaming rights issue the clip is missing in IA's full session recording). I can't think of a better way to begin the DWeb Summit. This was my first introduction with Mike (yes, I had not watched the Silicon Valley show before). After an interesting Q&A session on the stage, I got the opportunity to talk to him in person, took a low-light blurred selfie with him, mentioned Indian demonetization story (which, apparently, he was unaware of), and asked him to make a show in the future about potential threats on DWeb. Web 1.0 emerged as a few entities having control on publishing with the rest of the people being consumers of that content. Web 2.0 enabled everyone to participate in the web both as creators and consumers, but privacy and censorship controls gone in the hands of governments and a few Internet giants. If Web 3.0 (or DWeb) could fix this issue too, what would potentially be the next threat? There should be something which we may or may not be able to think of just yet, right?


Mike Judge and Sawood Alam


Decentralized Web Summit


For the next two days (August 1–2) the main DWeb Summit was organized in the historical San Francisco Mint building. There were numerous parallel sessions going on all day long. At any given moment perhaps there was a session suitable for everyone's taste and no one could attend everything they would wish to attend. A quick look at the full event schedule would confirm this. Luckily, the event was recorded and those recordings are made available, so one can watch various talks asynchronously. However, being there in person to participate in various fun activities, observe artistic creations, experience AR/VR setups, and interacting with many enthusiastic people with many hardware, software, and social ideas are not something that can be experienced in recorded videos.





If the father of the Internet with his eyes closed trying to create a network with many other participants with the help of a yellow yarn, some people trying to figure out what to do with colored cardboard shapes, and some trying to focus their energy with the help of specific posture are not enough then flip through these photo albums of the event to have a glimpse into many other fun activities we had there.





Initially, I tried to plan my agenda but soon I realized it was not going to work. So, I randomly picked one from the many parallel sessions of my interest, spent an hour or two there, and moved to another room. In the process I interacted with many people from different backgrounds participating both in their individual or organizational capacity. Apart from usual talk sessions we discussed various decentralization challenges and their potential technical and social solutions in our one-to-one or small group conversations. An interesting mention of additive economy (a non-zero-sum economy where transactions are never negative) reminded me of our gamification idea we explored when working on the Preserve Me! project and I ended up having a long conversation with a couple of people about it during a breakout session.




If Google Glass was not cool enough then meet Abhik Chowdhury, a graduate student, working on a smart hat prototype with a handful of sensors, batteries, and low-power computer boards placed in a 3D printed frame. He is trying to find a balance in on-board data processing, battery usage, and periodic data transfer to an off-the-hat server in an efficient manner, while also struggling with the privacy implications of the product.

It was a conference where "Crypto" meant "Cryptocurrency", not "Cryptography" and every other participant was talking about Blockchain, Distributed/Decentralized Systems, Content-addressable Filesystem, IPFS, Protocols, Browsers, and a handful other buzz-words. Many demos there were about "XXX but decentralized". Participants included the pioneers and veterans of the Web and the Internet, browser vendors, blockchain and cryptocurrency leaders, developers, researchers, librarians, students, artists, educators, activists, and whatnot.

I had a lightning talk entitled, "InterPlanetary Wayback: A Distributed and Persistent Archival Replay System Using IPFS", in the "New Discoveries" session. Apart from that I spend a fair amount of my time there talking about Memento and its potential role in making decentralized and content-addressable filesystems history-aware. During a protocol related panel discussion, I worked with a team of four people (including members from the Internet Archive and MuleSoft) to pitch the need of a decentralized naming system that is time-aware (along the lines of IPNS-Blockchain) and can resolve a version of a resource at a given time in the past. I also talked to many people from Google Chrome, Mozilla Firefox, and other browser vendors and tried to emphasize the need of native support of Memento in web browsers.

Cory Doctorow's closing keynote on "Big Tech's problem is Big, not Tech" was perhaps one of the most talked about talk of the event, which received many reactions and commentary. The recorded video of his talk is worth watching. Among many other things in his talk, he encouraged people to learn programming and to understand functions of each software we use. After his talk, an artist asked me how can she or anyone else learn programming? I told her, if one can learn a natural language, then programming languages are way more systematic, less ambiguous, and easier to learn. There are really only three basic constructs in a programming language that include variable assignments, conditionals, and loops. Then I verbally gave her a very brief example of mail merge using all of these three constructs that yields gender-aware invitations using a message template for a list of friends to be invited in a party. She seemed enlightened and delighted (while enthusiastically sharing her freshly learned knowledge with other members of her team) and exchanged contacts with me to learn more about some learning resources.

IPFS Lab Day


It looks like people were too energetic to get tired of such jam-packed and eventful days as some of them have planned post-DWeb events of special interest groups. I was invited by Protocol Labs to give an extended talk in one such IPFS-centric post-DWeb event called Lab Day 2018 on August 3. Their invitation arrived the day after I had booked my tickets and reserved the hotel room, so I ended up updating my reservations. This event was in a different location and the venue was decorated with a more casual touch with bean bags, couches, chairs, and benches near the stage and some containers for group discussions. You may take a glimpse of the venue in these pictures.








They welcomed us with new badges, some T-shirts, and some best-seller books to take home. The event had a good lineup of lightning talks and some relatively longer presentations, mostly extended forms of similar presentations in the main DWeb event. Many projects and ideas presented there were in their early stages. These sessions were recorded and published later after necessary editing.

I presented my extended talk entitled, "InterPlanetary Wayback: The Next Step Towards Decentralized Archiving". Along with the work already done and published about IPWB, I also talked about what is yet to be done. I explored the possibility of an index-free, fully decentralized collaborative web archiving system as the next step. I proposed some solutions that would require some changes in IPFS, IPNS, IPLD, and other technologies around to accommodate the use case. I encouraged people to discuss with me if they have any better ideas to help solve these challenges. The purpose was to spread the word out so that people keep web archiving related use cases in mind while shaping the next web. Some people from the core IPFS/IPNS/IPLD developer community approached me and we had an extended discussion after my talk. The recording of my talk and slides are made available online.




It was a fantastic event to be part of and I am looking forward to more such events in the future. IPFS community and people at Protocol Labs are full of fresh ideas and enthusiasm and they are a pleasure to work with.

Conclusions


Decentralized Web has a long way to go and DWeb Summit is a good place to bring people from various disciplines with different perspectives together every once in a while to synchronize all the distributed efforts and to identify the next set of challenges. While I could not attend the first summit (in 2016) I really enjoyed the second one and would love to participate in future events. Those two short days of the main event had more material than I can perhaps digest in two weeks, so my only advice would be to extend the duration of the event instead of having multiple parallel session with overlapping interests.

I extend my heartiest thanks to organizers, volunteers, fund providers, and everyone involved in making this event happen and making it a successful one. I wish going forward not just the Web, but many other organizations, including governments, become more decentralized so that I do not open my wallet once again to realize it has some worthless pieces of currency bills that were demonetized over night.

Resources




--
Sawood Alam