2022-12-23: ECCV 2022 and DIRA 2022 Trip Report
I had a paper accepted to the Drawings and abstract Imagery: Representation and Analysis (DIRA) workshop, allowing me to attend the 17th European Conference on Computer Vision (ECCV 2022) in Tel Aviv, Israel, from October 23 - 27. ECCV 2022 was a large conference, with attendees from more than 76 countries. More than 3,200 people attended ECCV 2022 in person, and 1,800 more attended virtually. ECCV 2022 was my first computer vision conference and perhaps the largest academic conference I have attended to date.
ECCV 2022 is a premier conference for computer vision. The conference contains work from many corners of computer vision, from detecting and processing text in images to generating full images based on text prompts. ECCV 2022’s organizers came from a wide variety of universities and industry, including places like Harvard, Meta, Kyoto University, IBM Research, and USC. Amazon Science, Google Research, GM Research Laboratories, Meta, Bosch, Huawei, Apple, Samsung, Snapchat, Baidu, and so many more sponsors dotted the main convention floor looking for opportunities to collaborate and hire postdocs, researchers, and engineers.
Attendees of conferences like ACM/IEEE’s JCDL or ACM WebScience may be surprised by the structure of conferences like ECCV 2022. Having a paper accepted does not mean that the author gets to present it in front of an audience. The conference’s proceedings include all accepted papers, and the conference expects the authors of every accepted paper (roughly 25% of submissions) to create a poster. A select few papers (roughly 3% of submissions) are promoted to orals, meaning that those authors also present their work to a live audience. ECCV 2022’s overall acceptance rate was 28%.
This post will cover a small portion of my ECCV 2022 experience. As usual, I do not fully reproduce the conference and invite readers to peruse the 39-volume proceedings and review the Twitter hashtag #ECCV2022 for more in-depth content.
ECCV 2022 Keynote speakers
Michael Kearns
Michael Kearns is a Professor at the University of Pennsylvania, the National Center Chair at the University of Pennsylvania, the founding director of the University of Pennsylvania’s Singh Program in Networked & Social Systems Engineering (NETS), and the founding director of the Warren Center for Network and Data Sciences. He was named an ACM Fellow in 2014 and is considered a leading researcher in computational learning theory and algorithmic game theory. He is currently an Amazon Scholar focusing on algorithmic fairness, privacy, machine learning, and related topics within Amazon Web Services.
In “The Science of Ethical Algorithm Design,” Kearns discussed the difficulty of ensuring that algorithms are fair when applied to people. He framed fairness as a constrained optimization problem: minimize a model’s training error subject to fairness constraints. There is a tradeoff between accuracy and fairness – relaxing the fairness constraints reduces error, while tightening them increases it. Fortunately, for a small cost in accuracy, we can often improve fairness substantially. He also covered a project that employed “bias bounties” to crowdsource better machine learning models. As the crowdsourcing participants analyzed the data and built upon each other’s work, the groups in the dataset became more narrowly defined, and the error rate also improved.
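For concreteness, here is a rough sketch of what the constrained-optimization framing can look like in practice. This is my own simplified illustration, not Kearns’s formulation: the labels, prediction scores, binary group attribute, and penalty weight `lam` are all hypothetical, and the fairness term shown (a demographic-parity gap) is only one of many possible constraints.

```python
import numpy as np

def fairness_penalized_loss(y_true, y_pred, group, lam=1.0):
    """Cross-entropy training error plus a demographic-parity penalty.

    lam trades accuracy against fairness: lam=0 ignores the constraint,
    while larger lam tolerates more error to shrink the gap between groups.
    """
    eps = 1e-9
    error = -np.mean(y_true * np.log(y_pred + eps)
                     + (1 - y_true) * np.log(1 - y_pred + eps))
    # Demographic-parity gap: difference in mean positive prediction rate
    # between the two (hypothetical) groups encoded as 0/1 in `group`.
    gap = abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())
    return error + lam * gap

# Toy usage with hypothetical labels, scores, and a binary group attribute.
y_true = np.array([1, 0, 1, 1, 0, 0])
y_pred = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.1])
group = np.array([0, 0, 0, 1, 1, 1])
for lam in (0.0, 0.5, 2.0):
    print(lam, fairness_penalized_loss(y_true, y_pred, group, lam))
```

Sweeping `lam` traces out the accuracy/fairness tradeoff Kearns described: larger values push the group gap down at the cost of a higher training error.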
Barbara Tversky
Barbara Tversky is a professor emerita of psychology at Stanford University, specializing in cognitive psychology. She has been a fellow of the American Psychological Society, the Cognitive Science Society, and the Society of Experimental Psychologists. She was also elected to the American Academy of Arts and Sciences. She is a leading authority on visual-spatial reasoning and collaborative cognition, focusing on language and communication and on how humans process events, narratives, directions, and other cognitive processes. She has collaborated with many linguists, computer scientists, neuroscientists, philosophers, designers, and artists.
With “Thinking with the Body and the World,” Tversky described how humans create models of thought. She highlighted how we express ideas with our hands while reading and processing complex texts, even without others around, emphasizing a close correspondence between gesture and action. Our gestures can become more complex when we communicate with each other, especially when describing how machines work. She covered the importance of the concepts of graph theory and how networks permeate our thinking – “thought travels in networks.” She highlighted many visualizations from different cultures and time periods showing how we have visualized space and time when communicating events, locations, and dangers. Through her research, we now understand how some marks, like arrows, may be meaningful across languages and cultures, helping us communicate through diagrams. She demonstrated how more complex photographs might confuse readers while simpler diagrams may provide clarity. She emphasized how these diagrams touch every aspect of our existence, from health warnings to mass transit – “We haven’t just designed our world, we’ve diagrammed it.”
DIRA Workshop
I had a paper to present at the Drawings and abstract Imagery: Representation and Analysis (DIRA) workshop. While ECCV generally focuses on computer vision, DIRA focuses on illustrations, drawings, technical diagrams, and charts. My work on the GoFigure project at Los Alamos fits into this space. Here I cover the keynotes and my own paper.
DIRA Keynote speaker - Anjali Adukia
Anjali Adukia is the director of the Messages, Identity, and Inclusion in Education Lab at the University of Chicago Harris School of Public Policy. She received the William T. Grant Foundation’s Scholar Award, a National Academy of Education/Spencer Foundation Postdoctoral Fellowship, and an Institute of Education Sciences grant. She is a faculty research fellow at the National Bureau of Economic Research, a fellow at the Center for Global Development, and a faculty affiliate of the University of Chicago Education Lab. She focuses on understanding factors that motivate and shape behavior, preferences, attitudes, and educational decision-making.
Through "How Computer Vision Can Help Us Learn about the World that Children See,” Adukia demonstrated work that applied computer vision to evaluate the media that children are exposed to, specifically children’s books . She highlighted the challenge of the representation of people in children’s media. Who is not shown in children’s books can affect their views of the potential of individuals who belong to certain groups while also affecting their subconscious defaults. She developed tools to measure representation systematically using images from award-winning books published over the last 100 years. These tools allowed her team to quantify the representation based on race, gender, and age over time. They found that mainstream books still tend to show lighter-skinned people, with children often shown as having lighter skin than adults. Women were often not shown with an active role in the story, and males were the most represented, especially white males. Thanks to their computer vision analysis, they could show how different races and genders were portrayed – e.g., white women often positively with family, black women likely associated with struggle, white men with power and business, and black men with sports. She highlighted how images are an often underused data source when conducting such analyses. She uses the results as a call to discuss better levels of representation and how, although her team applied computing in the analysis, the eventual solution will be a social one.
DIRA Keynote speaker - Yulia Gryaditskaya
Yulia Gryaditskaya is an assistant professor in artificial intelligence at the University of Surrey’s Centre for Vision, Speech and Signal Processing and the Surrey Institute for People-Centered AI. She is also the leader of the Computational Creativity and Design Lab (CCCDLab) and co-director of the SketchX group. She focuses on sketching and on how human sketch data helps us understand how human visual systems operate, which in turn helps us improve computer vision.

Gryaditskaya’s keynote was titled “Do you speak Sketch?” She talked about the methods humans use when sketching and compared them with how successful machines are when they attempt to sketch the same items. She highlighted work showing that text descriptions can help with image retrieval, but sketches give us a better idea of a user’s information needs. Her team has developed Pixelor, a “competitive drawing agent” that can effectively play Pictionary. Given a visual concept, Pixelor can often achieve a recognizable picture as fast as a human. It is trained on existing sketches but attends to stroke order, generating the most recognizable and distinguishing strokes first. She also covered how well ML algorithms produce text from sketches, produce 3D images from 2D sketches, and help us improve products in industries like clothing design.
DIRA Keynote speaker - Niloy J. Mitra
Niloy J. Mitra is a professor of Geometry Processing in the Department of Computer Science at University College London (UCL). He is a recipient of the ACM SIGGRAPH Significant New Researcher Award, the BCS Roger Needham Award, the Eurographics Outstanding Technical Contributions Award, and an ERC Starting Grant on SmartGeometry. He leads the Smart Geometry Processing Group at UCL.

Mitra’s talk, titled “Generative Programs for Vector Graphics,” covered the work of the Image2Vec project (Im2Vec). Many generative algorithms exist for raster images; Im2Vec is a neural network that generates complex vector graphics instead. Vector graphics have many benefits, such as establishing correspondence and capturing consistencies between different fonts and handwriting. Mitra also covered the CAD2Sketch project, which renders CAD sequences as sketches with varying levels of clutter. CAD2Sketch can synthesize construction lines from CAD sequences. Based on studying the work of professional designers, CAD2Sketch balances the level of clutter, or detail, to ensure that the resulting sketch remains legible for users.
DIRA Keynote speaker - Changjian Li
Changjian Li is an assistant professor in the School of Informatics at the University of Edinburgh. He was a student recipient of the Google Excellent Student Scholarship, the President’s Scholarship of Shandong University, and the Y S & Christabel Lung Postgraduate Scholarship. Li has worked for Microsoft Research Asia and collaborated with Niloy Mitra as a member of the Smart Geometry Processing Group at UCL.

Where Mitra had presented work on translating CAD drawings to sketches, Li’s talk, “High-quality CAD modeling with Rough Sketching,” went the other way. Ideation sketching is quick and approximate, whereas CAD modeling is precise and editable. Design iteration in industrial design can be cumbersome, expensive, and time-consuming. Through Li’s Sketch2CAD, users can generate CAD models from these sketches and save time. Sketch2CAD is based on observing the work of industrial designers, providing “a common parametrization of operations” and “a deep learning architecture to recognize these operations followed by parameter fitting.”
My work
Thanks to Anjali Adukia for taking this photo!
Baidu and Bing did not return results for as many images as Google and Yandex, so we did not believe we had a sufficient sample to establish a concrete pattern for those search engines. Google and Yandex returned results for almost all images, and we observed a distinct pattern in retrievability, Precision@k, and MRR favoring natural images over abstract ones.
We will repeat this experiment in upcoming work with more images and a more balanced dataset for Bing and Baidu. We will also see how each search engine responds to images that are transformed through cropping, shading, or other means.
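For readers unfamiliar with the retrieval metrics mentioned above, here is a minimal sketch of how Precision@k and mean reciprocal rank (MRR) are typically computed over ranked result lists. The query results and relevance sets below are hypothetical toy data, not values from our experiment.

```python
def precision_at_k(results, relevant, k):
    """Fraction of the top-k results that are relevant."""
    top_k = results[:k]
    return sum(1 for r in top_k if r in relevant) / k

def mean_reciprocal_rank(all_results, all_relevant):
    """Average of 1/rank of the first relevant result per query (0 if none)."""
    reciprocal_ranks = []
    for results, relevant in zip(all_results, all_relevant):
        rank = next((i + 1 for i, r in enumerate(results) if r in relevant), None)
        reciprocal_ranks.append(1.0 / rank if rank else 0.0)
    return sum(reciprocal_ranks) / len(reciprocal_ranks)

# Toy example: two queries, each with a ranked list of returned image IDs.
queries = [["img3", "img7", "img1"], ["img9", "img2", "img5"]]
relevant = [{"img1"}, {"img2"}]
print(precision_at_k(queries[0], relevant[0], 3))  # 1/3
print(mean_reciprocal_rank(queries, relevant))     # (1/3 + 1/2) / 2
```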
Notable Papers at ECCV/DIRA
I attended as many oral sessions as I could at ECCV 2022. I will not reproduce all of them here, but I want to highlight some exciting contributions.
Zhen et al.’s “On the Versatile Uses of Partial Distance Correlation in Deep Learning” discussed a solution to the important problem of comparing the functional behavior of neural network models. Their work helps us understand what neural network models are learning or failing to learn. They identify a statistical solution to this problem that has many applications for model development and evaluation. The authors won the best paper award for this contribution. (A simplified sketch of the statistic behind their work appears at the end of this section.)

Self-Supervised Learning’s (SSL) goal is to have an ML algorithm analyze data from the Internet, or gathered by agents exploring an environment, and then learn from what it encountered. SSL is beneficial in environments with no ready-made datasets for training a model. Purushwalkam et al.’s “The Challenges of Continuous Self-Supervised Learning” reports results from experiments with SSL approaches. Purushwalkam et al. identify problems with existing approaches and offer some solutions.

In “The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning,” Hessel et al. discuss the challenge of training neural networks to engage in abductive reasoning rather than deductive or inductive reasoning. Deductive reasoning draws inferences from premises. Inductive reasoning derives a principle from a set of observations. Abductive reasoning derives the most likely conclusion from a set of observations. Hessel et al.’s dataset provides an annotated corpus of 103K images to help train ML models not just to identify objects in a scene but to infer additional information not present in the immediate image. For example, if a human sees a speed limit sign of “20 mph,” they might infer that they are in a residential area. How do we train a machine to derive a similar conclusion?

One of the more exciting capabilities of current machine learning models is the ability to generate images from text prompts. In “Make-a-Scene: Scene-Based Text-to-Image Generation with Human Priors,” Gafni et al. provide an improved text-to-image generation technique. Their tool, “Make-A-Scene,” can produce higher-fidelity images at a resolution of 512x512, with better visual quality than competitors like Stable Diffusion and DALL-E. They even went so far as to create a children’s book using images from Make-A-Scene.
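To give a flavor of the statistic behind Zhen et al.’s best paper, here is a simplified sketch of plain (non-partial) distance correlation between the feature representations two networks produce for the same inputs. The partial variant the paper actually develops removes the effect of a third representation and is more involved; the feature matrices below are hypothetical.

```python
import numpy as np

def distance_correlation(X, Y):
    """Plain distance correlation between two feature matrices.

    X and Y each hold one row of features per input sample; a value near 1
    suggests the two representations vary together, near 0 suggests independence.
    """
    def centered_distances(Z):
        # Pairwise Euclidean distance matrix, then double-centering.
        d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
        return d - d.mean(axis=0) - d.mean(axis=1, keepdims=True) + d.mean()

    A, B = centered_distances(X), centered_distances(Y)
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

# Hypothetical features from two models evaluated on the same 100 inputs.
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(100, 64))
feats_b = feats_a @ rng.normal(size=(64, 32))  # a linear transform of feats_a
print(distance_correlation(feats_a, feats_b))   # high: representations related
```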
Closing Thoughts
Tel Aviv-Yafo is a beautiful city. Growing up in Virginia Beach, I was familiar with the beach environment, but the Mediterranean is quite different from the Atlantic. Tel Aviv is also a cat-friendly city, because cats are recognized as essential to keeping the pest population in check. We saw cats running about as we traveled to our destinations.
ECCV 2022 is a leading computer vision conference. I was thankful to attend. As I try to extend my knowledge beyond my existing work, I was awed by the work presented. I simultaneously felt like I had so much to learn, yet could see areas to contribute. I thank all the attendees I met for their kindness and discussions on potential opportunities to collaborate.
This blog post has been reviewed and assigned LA-UR-22-33130 by Los Alamos National Laboratory.