publications
ACM CHI and ACM UIST are top conferences for technical HCI work.
2023
- Social Wormholes: Exploring Preferences and Opportunities for Distributed and Physically-Grounded Social Connections. Xingyu "Bruce" Liu*, Joanne Leong*, Yuanyang Teng*, Hanseul Jun, Sven Kratz, Yu Jiang Tham, Andrés Monroy-Hernández, Brian A. Smith, and Rajan Vaish. In Proceedings of the 26th ACM Conference on Computer-Supported Cooperative Work and Social Computing, 2023.
Ubiquitous computing encapsulates the idea of technology interwoven into the fabric of everyday life. As computing blends into everyday physical artifacts, powerful opportunities open up for social connection. Prior connected media objects span a broad spectrum of design combinations. Such diversity suggests that people have varying needs and preferences for staying connected to one another. However, since these designs have largely been studied in isolation, we lack a holistic understanding of how people would configure and behave within a ubiquitous social ecosystem of physically-grounded artifacts. In this paper, we create a technology probe called Social Wormholes that lets people configure their own home ecosystem of connected artifacts. Through a field study with 24 participants, we report on patterns of behavior that emerged naturally in the context of their daily lives and shine a light on how ubiquitous computing could be leveraged for social computing.
- Visual Captions: Augmenting Verbal Communication with On-the-fly Visuals. Xingyu "Bruce" Liu, Vladimir Kirilyuk, Xiuxiu Yuan, Alex Olwal, Peggy Chi, Xiang "Anthony" Chen, and Ruofei Du. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023.
Computer-mediated platforms are increasingly facilitating verbal communication, and capabilities such as live captioning and noise cancellation enable people to understand each other better. We envision that visual augmentations that leverage semantics in the spoken language could also be helpful to illustrate complex or unfamiliar concepts. To advance our understanding of the interest in such capabilities, we conducted formative research through remote interviews (N=10) and crowdsourced a dataset of 1500 sentence-visual pairs across a wide range of contexts. These insights informed Visual Captions, a real-time system that we integrated into a videoconferencing platform to enrich verbal communication. Visual Captions leverages a fine-tuned large language model to proactively suggest relevant visuals in open-vocabulary conversations. We report on our findings from a lab study (N=26) and a two-week deployment study (N=10), which demonstrate how Visual Captions has the potential to help people improve their communication through visual augmentation in various scenarios.
@inproceedings{liu2023visualcaptions, author = {Liu, Xingyu "Bruce" and Kirilyuk, Vladimir and Yuan, Xiuxiu and Olwal, Alex and Chi, Peggy and Chen, Xiang "Anthony" and Du, Ruofei}, title = {Visual Captions: Augmenting Verbal Communication with On-the-fly Visuals}, year = {2023}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, booktitle = {Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems}, keywords = {Computer Mediated Communication; Artifact or System; Dataset; Empirical study that tells us about how people use a system}, location = {Hamburg, Germany}, series = {CHI '23}, video = {https://youtu.be/sL_YeHtQt44}, }
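A minimal sketch of the suggestion loop described in the Visual Captions abstract above, assuming generic `llm_complete` and `search_images` stubs rather than the fine-tuned model and integration used in the paper:

```python
# Hypothetical sketch of a "suggest a visual for the last utterance" loop.
# llm_complete() and search_images() are placeholder stubs, not real APIs;
# the paper fine-tunes its own model rather than prompting a generic LLM.

def llm_complete(prompt: str) -> str:
    """Stand-in for any text-completion model; returns the raw model reply."""
    raise NotImplementedError("plug in a real model here")

def search_images(query: str, k: int = 3) -> list[str]:
    """Stand-in for an image-search backend; returns candidate image URLs."""
    raise NotImplementedError("plug in a real image source here")

def suggest_visuals(utterance: str) -> list[str]:
    """Ask the model what (if anything) is worth visualizing, then fetch candidates."""
    prompt = (
        'A speaker just said: "' + utterance + '"\n'
        "If a picture would help the listener, reply with a short image search "
        "query; otherwise reply with NONE."
    )
    query = llm_complete(prompt).strip()
    if query.upper() == "NONE":
        return []          # nothing visual-worthy in this utterance
    return search_images(query)
```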
- Rapsai: Accelerating Machine Learning Prototyping of Multimedia Applications Through Visual Programming. Ruofei Du, Na Li, Jing Jin, Michelle Carney, Scott Miles, Maria Kleiner, Xiuxiu Yuan, Yinda Zhang, Anuva Kulkarni, Xingyu "Bruce" Liu, Sergio Escolano, Abhishek Kar, Alex Olwal, Ping Yu, Ram Iyengar, and Adarsh Kowdle. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023.
In recent years, there has been a proliferation of multimedia applications that leverage machine learning (ML) for interactive experiences. Prototyping ML-based applications is, however, still challenging, given complex workflows that are not ideal for design and experimentation. To better understand these challenges, we conducted a formative study with seven ML practitioners to gather insights about common ML evaluation workflows. This study helped us derive six design goals, which informed Rapsai, a visual programming platform for rapid and iterative development of end-to-end ML-based multimedia applications. Rapsai features a node-graph editor to facilitate interactive characterization and visualization of ML model performance. Rapsai streamlines end-to-end prototyping with interactive data augmentation and model comparison capabilities in its no-coding environment. Our evaluation of Rapsai in four real-world case studies (N=15) suggests that practitioners can accelerate their workflow, make more informed decisions, analyze strengths and weaknesses, and holistically evaluate model behavior with real-world input.
@inproceedings{Du2023Rapsai, title = {{Rapsai: Accelerating Machine Learning Prototyping of Multimedia Applications Through Visual Programming}}, author = {Du, Ruofei and Li, Na and Jin, Jing and Carney, Michelle and Miles, Scott and Kleiner, Maria and Yuan, Xiuxiu and Zhang, Yinda and Kulkarni, Anuva and Liu, Xingyu "Bruce" and Escolano, Sergio and Kar, Abhishek and Olwal, Alex and Yu, Ping and Iyengar, Ram and Kowdle, Adarsh}, booktitle = {Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems}, year = {2023}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, series = {CHI '23}, award = {Honorable Mention}, video = {https://youtu.be/mQ5mvAbZYvc}, }
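An illustrative sketch of the node-graph idea behind a visual-programming pipeline like the one described in the Rapsai abstract above; the `Node` structure and `run_graph` evaluator are invented for illustration and are not Rapsai's actual implementation:

```python
# Minimal node-graph pipeline: each node wraps a function and names its
# upstream inputs; the runner evaluates nodes as their inputs become available.
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Node:
    name: str
    fn: Callable[..., Any]                            # operation this node performs
    inputs: list[str] = field(default_factory=list)   # names of upstream nodes

def run_graph(nodes: list[Node], seed: dict[str, Any]) -> dict[str, Any]:
    """Evaluate ready nodes repeatedly until every node has produced a value."""
    results = dict(seed)
    pending = [n for n in nodes if n.name not in results]
    while pending:
        ready = [n for n in pending if all(i in results for i in n.inputs)]
        if not ready:
            raise ValueError("cycle or missing input in graph")
        for node in ready:
            results[node.name] = node.fn(*(results[i] for i in node.inputs))
        pending = [n for n in pending if n.name not in results]
    return results

# Example wiring: input frame -> preprocessing -> model -> overlay visualization.
graph = [
    Node("blur", lambda img: f"blurred({img})", ["frame"]),
    Node("detect", lambda img: f"boxes({img})", ["blur"]),
    Node("overlay", lambda img, boxes: f"{img}+{boxes}", ["frame", "detect"]),
]
print(run_graph(graph, {"frame": "frame0"}))
```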
- Modeling and Improving Text Stability in Live Captions. Xingyu "Bruce" Liu, Jun Zhang, Leonardo Ferrer, Susan Xu, Vikas Bahirwani, Boris Smus, Alex Olwal, and Ruofei Du. In Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 2023.
In recent years, live captions have gained significant popularity through their availability in remote video conferences, mobile applications, and the web. Unlike preprocessed subtitles, live captions require real-time responsiveness by showing interim speech-to-text results. As the prediction confidence changes, the captions may update, leading to visual instability that interferes with the user’s viewing experience. In this paper, we characterize the stability of live captions by proposing a vision-based flickering metric using luminance contrast and the Discrete Fourier Transform. Additionally, we assess the effect of unstable captions on viewers through task load index surveys. Our analysis reveals significant correlations between the viewer’s experience and our proposed quantitative metric. To enhance the stability of live captions without compromising responsiveness, we propose the use of tokenized alignment, word updates with semantic similarity, and smooth animation. Results from a crowdsourced study (N=123) comparing four strategies indicate that our stabilization algorithms lead to a significant reduction in viewer distraction and fatigue, while increasing viewers’ reading comfort.
@inproceedings{Liu2023Modeling, title = {{Modeling and Improving Text Stability in Live Captions}}, author = {Liu, Xingyu "Bruce" and Zhang, Jun and Ferrer, Leonardo and Xu, Susan and Bahirwani, Vikas and Smus, Boris and Olwal, Alex and Du, Ruofei}, booktitle = {Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems}, year = {2023}, publisher = {Association for Computing Machinery}, series = {CHI EA '23}, address = {New York, NY, USA}, doi = {10.1145/3544549.3585609}, video = {https://youtu.be/Indi_RwODS8}, }
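A rough sketch of how a luminance-contrast and DFT-based flicker score could be computed over a cropped caption region, assuming frames arrive as NumPy arrays (the `caption_frames` and `fps` parameters are illustrative); the paper's exact metric may differ:

```python
# Flicker score for a caption region: per-frame luminance contrast, then the
# spectral energy of its frame-to-frame changes. Higher scores indicate more
# text re-layout between frames.
import numpy as np

def flicker_score(caption_frames: np.ndarray, fps: float = 30.0) -> float:
    """caption_frames: (T, H, W) luminance values of the cropped caption area."""
    # Per-frame RMS luminance contrast of the caption region.
    contrast = caption_frames.std(axis=(1, 2)) / (caption_frames.mean(axis=(1, 2)) + 1e-8)
    # Frame-to-frame change in contrast captures text appearing/disappearing.
    change = np.diff(contrast)
    # DFT of the change signal; energy at nonzero frequencies reflects flicker.
    spectrum = np.abs(np.fft.rfft(change))
    freqs = np.fft.rfftfreq(change.size, d=1.0 / fps)
    return float(spectrum[freqs > 0].sum())
```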
2022
- CrossA11y: Identifying Video Accessibility Issues via Cross-Modal Grounding. Xingyu "Bruce" Liu, Ruolin Wang, Dingzeyu Li, Xiang "Anthony" Chen, and Amy Pavel. In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, 2022.
Authors make their videos visually accessible by adding audio descriptions (AD), and auditorily accessible by adding closed captions (CC). However, creating AD and CC is challenging and tedious, especially for non-professional describers and captioners, due to the difficulty of identifying accessibility problems in videos. A video author has to watch the video through and manually check for inaccessible information frame by frame, in both the visual and auditory modalities. In this paper, we present CrossA11y, a system that helps authors efficiently detect and address visual and auditory accessibility issues in videos. Using cross-modal grounding analysis, CrossA11y automatically measures the accessibility of visual and audio segments in a video by checking for modality asymmetries. CrossA11y then displays these segments and surfaces visual and audio accessibility issues in a unified interface, making it intuitive to locate and review issues, script AD/CC in place, and immediately preview the described and captioned video. We demonstrate the effectiveness of CrossA11y through a lab study with 11 participants, comparing it to an existing baseline.
@inproceedings{liu2022crossa11y, author = {Liu, Xingyu "Bruce" and Wang, Ruolin and Li, Dingzeyu and Chen, Xiang "Anthony" and Pavel, Amy}, title = {CrossA11y: Identifying Video Accessibility Issues via Cross-Modal Grounding}, year = {2022}, isbn = {9781450393201}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, doi = {10.1145/3526113.3545703}, url = {https://doi.org/10.1145/3526113.3545703}, booktitle = {Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology}, articleno = {43}, numpages = {14}, keywords = {audio description, video, accessibility, closed caption}, location = {Bend, OR, USA}, series = {UIST '22}, award = {Best Paper Award}, video = {https://youtu.be/HDqjnHOZ7J8}, }
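An illustrative sketch of flagging modality asymmetry per segment, assuming precomputed visual and audio embeddings in a shared space (for example, from a pretrained cross-modal encoder); this approximates the idea in the CrossA11y abstract above rather than reproducing its pipeline:

```python
# Flag segments where the visual and audio channels disagree: low cross-modal
# similarity suggests one modality carries information the other lacks, so the
# segment likely needs an audio description or a caption.
import numpy as np

def asymmetric_segments(visual: np.ndarray, audio: np.ndarray,
                        threshold: float = 0.3) -> list[int]:
    """visual, audio: (N, D) per-segment embeddings; returns indices to review."""
    v = visual / np.linalg.norm(visual, axis=1, keepdims=True)
    a = audio / np.linalg.norm(audio, axis=1, keepdims=True)
    similarity = (v * a).sum(axis=1)          # cosine similarity per segment
    return [i for i, s in enumerate(similarity) if s < threshold]
```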
2021
- What Makes Videos Accessible to Blind and Visually Impaired People? Xingyu Liu, Patrick Carrington, Xiang "Anthony" Chen, and Amy Pavel. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, 2021.
User-generated videos are an increasingly important source of information online, yet most online videos are inaccessible to blind and visually impaired (BVI) people. To find videos that are accessible, or understandable without additional description of the visual content, BVI people in our formative studies reported that they used a time-consuming trial-and-error approach: clicking on a video, watching a portion, leaving the video, and repeating the process. BVI people also reported video accessibility heuristics that characterize accessible and inaccessible videos. We instantiate 7 of the identified heuristics (2 audio-related, 2 video-related, and 3 audio-visual) as automated metrics to assess video accessibility. We collected a dataset of accessibility ratings of videos by BVI people and found that our automatic video accessibility metrics correlated with the accessibility ratings (Adjusted R2 = 0.642). We augmented a video search interface with our video accessibility metrics and predictions. BVI people using our augmented video search interface selected an accessible video more efficiently than when using the original search interface. By integrating video accessibility metrics, video hosting platforms could help people surface accessible videos and encourage content creators to author more accessible products, improving video accessibility for all.
@inproceedings{liu2021what, author = {Liu, Xingyu and Carrington, Patrick and Chen, Xiang "Anthony" and Pavel, Amy}, title = {What Makes Videos Accessible to Blind and Visually Impaired People?}, year = {2021}, isbn = {9781450380966}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3411764.3445233}, doi = {10.1145/3411764.3445233}, booktitle = {Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems}, articleno = {272}, numpages = {14}, keywords = {visual impairments, blind, online videos, accessibility}, location = {Yokohama, Japan}, series = {CHI '21}, video = {https://youtu.be/n2enrJJZdTs}, }
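A minimal sketch of combining the heuristic metrics into a single predictor of accessibility ratings with a linear least-squares fit, assuming a NumPy matrix of per-video heuristic scores; the paper's exact modeling may differ:

```python
# Fit a linear model from heuristic metrics to collected BVI ratings and report
# adjusted R^2, the fit statistic mentioned in the abstract above.
import numpy as np

def fit_accessibility_model(metrics: np.ndarray, ratings: np.ndarray) -> np.ndarray:
    """metrics: (N_videos, 7) heuristic scores; ratings: (N_videos,) BVI ratings.
    Returns least-squares weights (last entry is the intercept)."""
    X = np.hstack([metrics, np.ones((metrics.shape[0], 1))])  # add intercept column
    weights, *_ = np.linalg.lstsq(X, ratings, rcond=None)
    return weights

def predict_accessibility(metrics: np.ndarray, weights: np.ndarray) -> np.ndarray:
    X = np.hstack([metrics, np.ones((metrics.shape[0], 1))])
    return X @ weights

def adjusted_r2(y_true: np.ndarray, y_pred: np.ndarray, n_features: int) -> float:
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    n = y_true.size
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
```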
2019
- Making Memes Accessible. Cole Gleason, Amy Pavel, Xingyu Liu, Patrick Carrington, Lydia B. Chilton, and Jeffrey P. Bigham. In Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility, 2019.
Images on social media platforms are inaccessible to people with vision impairments due to a lack of descriptions that can be read by screen readers. Providing accurate alternative text for all visual content on social media is not yet feasible, but certain subsets of images, such as internet memes, offer affordances for automatic or semi-automatic generation of alternative text. We present two methods for making memes accessible semi-automatically through (1) the generation of rich alternative text descriptions and (2) the creation of audio macro memes. Meme authors create alternative text templates or audio meme templates, and insert placeholders instead of the meme text. When a meme with the same image is encountered again, it is automatically recognized from a database of meme templates. Text is then extracted and either inserted into the alternative text template or rendered in the audio template using text-to-speech. In our evaluation of meme formats with 10 Twitter users with vision impairments, we found that most users preferred alternative text memes because the description of the visual content conveys the emotional tone of the character. As the preexisting templates can be automatically matched to memes using the same visual image, this combined approach can make a large subset of images on the web accessible, while preserving the emotion and tone inherent in the image memes.
@inproceedings{gleason2019making, author = {Gleason, Cole and Pavel, Amy and Liu, Xingyu and Carrington, Patrick and Chilton, Lydia B. and Bigham, Jeffrey P.}, title = {Making Memes Accessible}, year = {2019}, isbn = {9781450366762}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3308561.3353792}, doi = {10.1145/3308561.3353792}, booktitle = {Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility}, pages = {367–376}, numpages = {10}, keywords = {blind, social media, audio, image description, alternative text, meme, low vision}, location = {Pittsburgh, PA, USA}, series = {ASSETS '19}, }
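A minimal sketch of the template-matching step described in the abstract above, assuming the Pillow, imagehash, and pytesseract packages; the template database, hash value, and placeholder syntax are invented for illustration:

```python
# Match an incoming meme image to a known template by perceptual hash, extract
# its text with OCR, and fill the author-provided alt-text template.
from PIL import Image
import imagehash
import pytesseract

# Hypothetical template store: (perceptual hash of a known meme image, alt-text template).
TEMPLATES = [
    (imagehash.hex_to_hash("ffd8c0c0c0c0d8ff"),
     'A surprised cat sits at a dinner table. The meme text reads: "{text}"'),
]

def describe_meme(path: str, max_distance: int = 6) -> str | None:
    img = Image.open(path)
    h = imagehash.phash(img)
    for known, template in TEMPLATES:
        if h - known <= max_distance:            # Hamming distance between hashes
            text = pytesseract.image_to_string(img).strip()
            return template.format(text=text)    # fill the placeholder
    return None                                  # unknown template; needs a human describer
```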