2024-07-05: Trip report: the 3rd International Conference on Science of Science and Innovation (ICSSI)
The 3rd International Conference on Science of Science and Innovation (ICSSI) was held at the National Academy of Sciences in Washington, D.C., from July 1 to July 3, 2024. I attended the first two days and gave a contributed talk titled Toward Long-term Computational Reproducibility: Assessing the Archival Rate of URIs to Git Hosting Platforms in Scholarly Publications.
ICSSI is a premier annual conference on the Science of Science and Innovation, which started in 2022. I attended the first ICSSI, which was also held at the National Academy of Sciences; the second was at Northwestern University, Chicago, IL. The conference was initiated by several leaders in this field, including Dr. Dashun Wang (Northwestern University) and Dr. James Evans (University of Chicago). Unlike many conferences in Computer and Information Sciences (e.g., ACM SIGIR, IEEE Big Data), ICSSI does not publish proceedings and accepts submissions of already-published work. The conference emphasizes communication and socialization among everyone engaged in the science of science and innovation research cycle, promoting the dissemination of new research outcomes and the translation of research products into policy.
Despite requiring just a two-page extended abstract for submission, ICSSI has been very selective, at least in its first two years. My submission last year on reproducibility and replicability was not accepted, even though the same work was accepted by a prestigious Computer Science conference. This year, I led a submission that resulted in a contributed talk. I also coauthored another submission with Dr. Sarah Rajtmajer and her student at Penn State, titled Can Large Language Models Discern Evidence for Scientific Hypotheses? Case Studies in the Social Sciences, which was accepted by LREC-COLING 2024.
The conference kicked off with opening remarks by Dr. James Evans. It featured many interesting invited talks, panels, contributed talks, and posters presented by a variety of scholars, including computer and information scientists, computational social scientists, sociologists, and economists. In addition to plenary meetings in the main auditorium, the conference also had concurrent sessions. Below I highlight some events that particularly interested me.
Invited Talks
Dr. Adam Jaffe from Brandeis University gave a very interesting talk on Patent Data in Science of Science and Innovation: Past and Future. The talk reviewed the history of scientific studies using patent data. The main question was how to evaluate the quality of patents; the conclusion was that patent quality is hard to evaluate and citations are not good indicators. Adam recommended two readings: the book Invention and Economic Growth by Jacob Schmookler (1966) and Zvi Griliches's 1979 paper titled Issues in Assessing the Contribution of Research and Development to Productivity Growth. The latter has been cited almost 7000 times since it was published. Adam is a founding member of the Innovation Information Initiative, a collection of data, tools, and metrics to evaluate innovations. Some of his slides are below.
The invited talk by Dr. Yong-Yeol Ahn (aka YY Ahn, Indiana University) proposed a very interesting concept called the Knowledge Space. The idea was to draw an analogy between Newton's gravitational law and the knowledge space, with transition probabilities playing the role of masses. Dr. Ahn claimed the knowledge space model could explain many phenomena in Science of Science research. However, it was not clear to me whether the knowledge was simply represented by embeddings created by language models. The details were published in the PNAS paper titled Unsupervised Embedding of Trajectories Captures the Latent Structure of Scientific Migration (Murray et al. 2023 PNAS). Ahn's work seemed to build on Katy Börner's famous paper titled Design and Update of a Classification System: The UCSD Map of Science (Börner et al. 2012). Some of Dr. Ahn's slides are below.
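To make the analogy concrete, here is my own reconstruction of the gravity-law form (a sketch of my reading, not Dr. Ahn's exact formulation; the mapping of symbols is my assumption):

```latex
% Newton's law of gravitation:
F_{ij} = G \, \frac{m_i \, m_j}{r_{ij}^2}

% Knowledge-space analogue (my reading): the flux T_{ij} between
% fields i and j is driven by their "masses" (e.g., transition
% probabilities or field sizes) and the distance between their
% embedding vectors v_i and v_j.
T_{ij} \propto \frac{m_i \, m_j}{d(\mathbf{v}_i, \mathbf{v}_j)^2}
```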
Dr. Anne Hultgren is a staff member at the Arnold and Mabel Beckman Foundation, which supports leading-edge research in chemistry and the life sciences. Anne shared the foundation's recent experience with blind review, which seemed to be working very well. Previously, reviewers might put substantial weight on the PI's resume or even the PI's advisor's resume, and many grants were awarded based on the "trustworthiness" of the applicant, even if minor errors were spotted. This disadvantages many PIs from less prestigious institutions who propose great ideas but are not selected.
Awesome dinner talk @ ICSSI in Washington D.C. by @UzziLeadership! pic.twitter.com/sCbJvk0TNp
— James Evans (@profjamesevans) July 3, 2024
Panels
The panel discussion titled Beyond Patents and Papers: Operationalizing and Evaluating the Social Impact of Science was very interesting. The four panelists were Diana Burley (American University), David Guston (Arizona State University), Erwin Gianchandani (U.S. National Science Foundation), and Dahlia Sokolov (U.S. House of Representatives); the moderator was Cristin Dorgelo (OSTP). The panel discussed several important topics regarding funding distribution, faculty evaluation, and proposal review. One question I raised in the Q&A session was about the Golden Ticket proposed in early 2023, which would allow a single panelist to overturn the decisions of the other panelists in a review panel. Erwin said NSF was "experimenting with this measure" in its recently launched TIP program.
Another interesting panel discussion was titled The Other Side: Junk, Fraud, Retractions, and Paper Mills, moderated by Dr. Daniel Acuna (UC Boulder). The panelists were Ivan Oransky (Retractionwatch.com), Stephanie Lee (The Chronicle of Higher Education), Matt Hodgkinson (UK Research Integrity Office), and Anna Abalkina (Freie Universität Berlin). Ivan showed that from 2002 to 2023, the fraction of retracted papers increased by a factor of ten (from ~0.02% to ~0.2%). In fact, 3.8% of published papers contained problematic figures, with at least half exhibiting features suggestive of deliberate manipulation. Anna revealed the sobering fact that paper mills have been documented since 2019; they commercialize scientific misconduct by selling manuscripts to people who need publications. It was estimated that 2% of papers published in 2022 were produced by paper mills (Van Noorden 2023 Nature). Matt cited the paper titled China's Publication Bazaar (Hvistendahl 2013 Science). Stephanie shared the story of the downfall of Cornell food researcher Brian Wansink to highlight the seriousness of this misconduct. The panelists highlighted several ongoing projects to cope with this issue, such as COPE, the Clear Skies project, the United2Act Summit, and the recently launched Research Signals project. Acuna introduced a new startup called reviewerzero.ai. Some slides are below.
Thunder Talks and Contributed Talks
The conference featured several "lightning talk" sessions, in which each talk lasted 5-8 minutes with only one question, so the total did not exceed 10 minutes. James called them "thunder talks" because lightning talks are usually shorter (JCDL's lightning talks are given only 1 minute).
One Thunder Talk I was interested in was given by Carina Kane, a B.A. candidate in Philosophy and a B.S. candidate in Data Science (an impressive undergraduate student), supervised by Shiyang Lai, Donghyun Kang, and Dr. James Evans. She studied conceptual evolution, building on Thomas Kuhn, who described important relationships between conceptual change and scientific innovation in his famous book The Structure of Scientific Revolutions. She trained word2vec models on 70 million paper abstracts, producing one embedding space per year from 1991 to 2023, and looked at the top 5, top 10, and top 20 most similar words each year. The study traced the migration of words' meanings in the embedding space over time. For example, in the Internet Age, some words like "media", "meme", "tag", and "cloud" took on new meanings; see the sketch of the per-year approach below. Some slides are also below.
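To make the method concrete, here is a minimal sketch of the per-year embedding approach using gensim, assuming the abstracts are already tokenized and grouped by year (the input format, model parameters, and the probe word are my own illustrative choices, not details from the talk):

```python
from gensim.models import Word2Vec

def train_yearly_embeddings(abstracts_by_year):
    """Train one word2vec space per year.

    abstracts_by_year: dict mapping year -> list of tokenized abstracts,
    e.g., {1991: [["the", "structure", ...], ...], ...} (hypothetical input).
    """
    models = {}
    for year, sentences in sorted(abstracts_by_year.items()):
        models[year] = Word2Vec(
            sentences, vector_size=200, window=5, min_count=5, workers=4
        )
    return models

def neighbor_drift(models, word, k=10):
    """Return the top-k most similar words to `word` in each yearly space."""
    drift = {}
    for year, model in sorted(models.items()):
        if word in model.wv:  # the word may be too rare in some years
            drift[year] = [w for w, _ in model.wv.most_similar(word, topn=k)]
    return drift

# Example: watch "cloud" drift from meteorology toward computing vocabulary.
# drift = neighbor_drift(train_yearly_embeddings(abstracts_by_year), "cloud")
```

Comparing nearest-neighbor lists, rather than raw vectors, sidesteps the fact that independently trained embedding spaces are not aligned with one another.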
Another interesting Thunder Talk was given by a student in Dashun Wang's group (I could not recall his name), titled Short-term Explore and Long-term Knowledge Absorption. The work studied the knowledge absorption effect in the review process. They discovered a striking divergence emerging after the review invitation: notably, people who declined the invitation showed a systematically increasing trend in citing the authors' works. The findings suggest a potential "glimpse effect", indicating that even brief exposure to emerging research is associated with a cumulative advantage. The results highlight the potential value of peer review in knowledge dissemination: even limited engagement, such as reading an invitation email without paper details, can enhance knowledge absorption. I like the angle, but the title of the presentation sounds more general than the specific topic. I also like the paper mentioned by the presenter, titled Peer Review: Troubled from the Start (Csiszar 2016 Nature). The datasets used in this study include SciSciNet and a proprietary dataset containing the complete review records of one journal from a premier publisher of medical research from 2013 to 2023, including 65k submissions and 733k review invitations. Some slides are below.
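As a thought experiment, here is a minimal sketch of how one might look for such a divergence, assuming a table of review invitations with per-reviewer citation counts (the schema and column names are hypothetical; the talk did not fully describe the actual methodology):

```python
import pandas as pd

def divergence_by_response(invitations: pd.DataFrame) -> pd.DataFrame:
    """Compare post-invitation citation trends of decliners vs. accepters.

    invitations: one row per (reviewer, invitation, year), with hypothetical
    columns: declined (bool), years_since_invite (int), cites_to_authors (int).
    """
    return (
        invitations
        .groupby(["declined", "years_since_invite"])["cites_to_authors"]
        .mean()                # average citations to the invited authors
        .unstack("declined")   # one column per group
        .rename(columns={True: "declined", False: "accepted"})
    )
```

If decliners' citations to the inviting authors rise faster than accepters' after the invitation year, that pattern would be consistent with the "glimpse effect" described in the talk.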
One presenter from Indiana University presented an interesting visualization tool called Helios Web (Heliosweb.io), a web-based library for visualizing dynamic networks in real time. Compared with many other graph visualization tools, Helios Web seems more scalable: the presenter said it can easily plot a graph containing millions of nodes using the web version and hundreds of millions of nodes using the offline version, assuming the computer has enough memory to hold the data.
A Thunder Talk by Dror Shvadron, Hansen Zhang, and Daniel P. Gross on Day II studied Who Pays for Scientific Training in the US. The authors used an LLM to extract funding agency mentions from 1.2 million U.S. doctoral dissertations from 1950 to 2020, obtained from ProQuest. I could not recall the findings, but I was skeptical about how accurate the LLM's extraction was and whether it extracted mere mentions of funding agencies or the agencies that actually funded the research (a hedged sketch of such an extraction step is below). Moreover, a graduate student's research may not be funded only by the agencies mentioned in the thesis or dissertation; many students are partially funded, and the extent of "partially" is hard to estimate.
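For illustration only, here is a minimal sketch of how such an extraction step might be set up with an LLM API; the prompt, the model name, and the acknowledgments_text input are my own assumptions, not the authors' actual pipeline:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "Extract the names of funding agencies explicitly acknowledged as "
    "funding this dissertation. Return only a JSON list of agency names, "
    "or [] if none are acknowledged. Do not include agencies that are "
    "merely mentioned without a funding acknowledgment.\n\nText:\n"
)

def extract_funders(acknowledgments_text: str) -> list[str]:
    """Ask the model for funders acknowledged in a dissertation's front matter."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[{"role": "user", "content": PROMPT + acknowledgments_text}],
    )
    # In practice the output would need validation; the model may return
    # malformed JSON or hallucinate agencies, which is exactly my concern.
    return json.loads(response.choices[0].message.content)
```

Even with a prompt that distinguishes mentions from actual funders, I would still want a manually labeled validation sample to measure the extraction accuracy.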
The contributed talk by Dr. Ian Hutchins, coauthored with Dr. Chaoqun Ni and Salsabil Arabi from the iSchool at the University of Wisconsin-Madison, showed a very interesting plot demonstrating that most highly cited papers do not appear in highly cited journals.
Michael (Fengyuan) Liu from New York University Abu Dhabi presented his research on how conflict of interest (COI) policies, specifically the length of the COI period, impact editor-author collaboration. They found that the change from no policy to 24 months in 2011 seemed to have a more salient impact than the change from 24 months to 48 months in 2014. The authors suggested a dynamic COI period depending on the team size. They found that, overall, the effect of COI policies was limited, probably due to the suitability-integrity trade-off.
Posters
The conference featured a total of 94 posters. Several poster pictures that I selected are shown below.
Friends
I met many friends at the conference, including Dr. Dashun Wang (Northwestern), Dr. Brian Uzzi (Northwestern), Dr. Hamed Alhoori (Northern Illinois) and his student Akhil Pandey Akella, Dr. Lingfei Wu (University of Pittsburgh), Dr. Yang Wang (Xi'an Jiaotong University), Dr. Shaurya Rohatgi (AllSci), Matt Chervenak (AllSci), Sai Koneru (Penn State), Tatiana Chakravorti (Penn State), and Dr. Daniel Acuna (UC Boulder). I also talked to many people for the first time, including Dr. James Evans (University of Chicago), Dr. Santo Fortunato (Indiana University), Yiling Lin (University of Pittsburgh), and Claire Daviss (Stanford).
My Comments
Overall, the conference was very successful! Compared with the previous conferences, this one was more inclusive, with more posters and contributed talks. The hackathon, which was part of the 2nd ICSSI, did not happen this time. The 2025 ICSSI will be in Copenhagen, Denmark.
Science of Science is a thriving field. Recently, NSF awarded $20M to Dr. Dashun Wang (Co-PIs Alicia Loffler, Jian Cao, Ben Jones, and Yian Yin) and $20M to Dr. James Evans (Co-PIs Ian Foster and Ufuk Akcigit) through the APTO program to predict the evolution of future innovations (I am a bit skeptical of how accurately the future can be predicted, but you can't criticize NSF's vision).
However, in my opinion, based on the past ICSSI conferences, I can see three big problems in the evolution of this field.
One problem is that funding for Science of Science mainly flows to elite universities. The participants of ICSSI also primarily come from well-known top universities, although this year was a little more inclusive. Meanwhile, participation in the research has quietly diffused to mid-ranked or even lower-ranked universities: Science of Science papers have appeared in many Computer Science venues, such as JCDL (natural language processing, information retrieval, digital libraries) and ICDAR (pattern recognition, document analysis). Big groups at top-ranked universities have plenty of resources, making them hard to beat in grant proposal competitions. How funding agencies can better support mid-ranked and lower-ranked universities and engage more scholars to increase diversity remains an open problem.
The second problem is that this field lacks a dedicated journal. If you look at the publications of the pioneers and current leaders of Science of Science, they all appear in highly reputable journals such as Nature, Science, and PNAS. While the published papers definitely represent high-quality work, it is extremely hard to publish quality in-depth research on specialized topics that are not suitable for Nature-like journals. A generic alternative is PLOS One, but compared with Computer and Information Sciences, Science of Science researchers have far fewer choices.
The third problem is research data and methods. Much frontier research in the Science of Science uses proprietary data, which raises serious concerns about reproducibility and replicability. Most methods used in Science of Science are still basic statistical tools, e.g., correlation analysis. AI tools have started to be adopted, but many are out of date; for example, we still see many papers using word2vec, which was introduced over 10 years ago. For social scientists, maybe 10 years is not that old, but for computer scientists, 10 years is already out of date. The AI4SciSci workshop I will lead aims to introduce state-of-the-art AI methods in NLP and Computer Vision to solve Science of Science problems. The 1st AI4SciSci workshop was held virtually at ICDM 2023. I will propose the second workshop at AAAI 2025.
-- Jian Wu