publications
ACM CHI and ACM UIST are top conferences for technical HCI work.
2023
- Experiencing Visual Captions: Augmented Communication with Real-Time Visuals Using Large Language Models
  Xingyu Bruce Liu, Vladimir Kirilyuk, Xiuxiu Yuan, Peggy Chi, Alex Olwal, Xiang Anthony Chen, and Ruofei Du
  In Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST '23 Adjunct), 2023
@inproceedings{liu2023vcdemo,
  author    = {Liu, Xingyu Bruce and Kirilyuk, Vladimir and Yuan, Xiuxiu and Chi, Peggy and Olwal, Alex and Chen, Xiang Anthony and Du, Ruofei},
  title     = {Experiencing Visual Captions: Augmented Communication with Real-Time Visuals Using Large Language Models},
  year      = {2023},
  isbn      = {9798400700965},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3586182.3615978},
  doi       = {10.1145/3586182.3615978},
  booktitle = {Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology},
  articleno = {85},
  numpages  = {4},
  keywords  = {dataset, text-to-visual, augmented reality, augmented communication, online meeting, collaborative work, video-mediated communication, large language models, AI agent},
  location  = {San Francisco, CA, USA},
  series    = {UIST '23 Adjunct},
}
- Social Wormholes: Exploring Preferences and Opportunities for Distributed and Physically-Grounded Social Connections
  Xingyu Bruce Liu*, Joanne Leong*, Yuanyang Teng*, Hanseul Jun, Sven Kratz, Yu Jiang Tham, Andrés Monroy-Hernández, Brian A. Smith, and Rajan Vaish
  Proc. ACM Hum.-Comput. Interact. 7, CSCW2 (October 2023)
@article{liu2023social,
  author     = {Liu*, Xingyu Bruce and Leong*, Joanne and Teng*, Yuanyang and Jun, Hanseul and Kratz, Sven and Tham, Yu Jiang and Monroy-Hern\'{a}ndez, Andr\'{e}s and Smith, Brian A. and Vaish, Rajan},
  title      = {Social Wormholes: Exploring Preferences and Opportunities for Distributed and Physically-Grounded Social Connections},
  year       = {2023},
  issue_date = {October 2023},
  publisher  = {Association for Computing Machinery},
  address    = {New York, NY, USA},
  volume     = {7},
  number     = {CSCW2},
  url        = {https://doi.org/10.1145/3610208},
  doi        = {10.1145/3610208},
  journal    = {Proc. ACM Hum.-Comput. Interact.},
  month      = oct,
  articleno  = {359},
  numpages   = {29},
  keywords   = {smart glasses, augmented reality, social connection, ubiquitous computing},
}
- Visual Captions: Augmenting Verbal Communication with On-the-fly Visuals
  Xingyu "Bruce" Liu, Vladimir Kirilyuk, Xiuxiu Yuan, Alex Olwal, Peggy Chi, Xiang Anthony Chen, and Ruofei Du
  In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), 2023
Computer-mediated platforms are increasingly facilitating verbal communication, and capabilities such as live captioning and noise cancellation enable people to understand each other better. We envision that visual augmentations that leverage semantics in the spoken language could also be helpful to illustrate complex or unfamiliar concepts. To advance our understanding of the interest in such capabilities, we conducted formative research through remote interviews (N=10) and crowdsourced a dataset of 1500 sentence-visual pairs across a wide range of contexts. These insights informed Visual Captions, a real-time system that we integrated into a videoconferencing platform to enrich verbal communication. Visual Captions leverages a fine-tuned large language model to proactively suggest relevant visuals in open-vocabulary conversations. We report on our findings from a lab study (N=26) and a two-week deployment study (N=10), which demonstrate how Visual Captions has the potential to help people improve their communication through visual augmentation in various scenarios.
@inproceedings{liu2023visualcaptions,
  author    = {Liu, Xingyu "Bruce" and Kirilyuk, Vladimir and Yuan, Xiuxiu and Olwal, Alex and Chi, Peggy and Chen, Xiang Anthony and Du, Ruofei},
  title     = {Visual Captions: Augmenting Verbal Communication with On-the-fly Visuals},
  year      = {2023},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  booktitle = {Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems},
  keywords  = {Computer Mediated Communication; Artifact or System; Dataset; Empirical study that tells us about how people use a system},
  location  = {Hamburg, Germany},
  series    = {CHI '23},
  video     = {https://youtu.be/sL_YeHtQt44},
}
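At its core, Visual Captions asks a fine-tuned large language model to predict, for each utterance, what to visualize (content), how to visualize it (type), and where to fetch it from (source). The Python sketch below illustrates that predict-and-parse loop under stated assumptions: the prompt wording, the VisualSuggestion fields, and the injected llm callable are hypothetical stand-ins, not the authors' implementation.

```python
# Minimal sketch of an LLM-backed visual-suggestion step, loosely following the
# Visual Captions idea of predicting (content, type, source) for an utterance.
# The prompt wording, field names, and the `llm` callable are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class VisualSuggestion:
    content: str   # what to visualize, e.g. "golden gate bridge"
    type: str      # how to visualize it, e.g. "photo", "map", "chart"
    source: str    # where to fetch it from, e.g. "image search"

PROMPT_TEMPLATE = (
    "Suggest a visual for the sentence below.\n"
    "Reply as: content=<...>; type=<...>; source=<...>\n"
    "Sentence: {utterance}\n"
)

def suggest_visual(utterance: str, llm: Callable[[str], str]) -> Optional[VisualSuggestion]:
    """Ask the language model for a structured suggestion and parse its reply."""
    reply = llm(PROMPT_TEMPLATE.format(utterance=utterance))
    fields = {}
    for part in reply.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    if {"content", "type", "source"} <= fields.keys():
        return VisualSuggestion(fields["content"], fields["type"], fields["source"])
    return None  # malformed reply: show nothing rather than a wrong visual

# Usage with a stand-in model (a real deployment would call the fine-tuned LLM here):
fake_llm = lambda prompt: "content=golden gate bridge; type=photo; source=image search"
print(suggest_visual("We hiked across the Golden Gate Bridge last weekend.", fake_llm))
```

Keeping the model behind a plain callable makes it easy to swap in an actual fine-tuned model, or a rule-based fallback, without touching the parsing logic.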
- Rapsai: Accelerating Machine Learning Prototyping of Multimedia Applications Through Visual Programming
  Ruofei Du, Na Li, Jing Jin, Michelle Carney, Scott Miles, Maria Kleiner, Xiuxiu Yuan, Yinda Zhang, Anuva Kulkarni, Xingyu "Bruce" Liu, Sergio Escolano, Abhishek Kar, Alex Olwal, Ping Yu, Ram Iyengar, and Adarsh Kowdle
  In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), 2023 · Honorable Mention
In recent years, there has been a proliferation of multimedia applications that leverage machine learning (ML) for interactive experiences. Prototyping ML-based applications is, however, still challenging, given complex workflows that are not ideal for design and experimentation. To better understand these challenges, we conducted a formative study with seven ML practitioners to gather insights about common ML evaluation workflows. This study helped us derive six design goals, which informed Rapsai, a visual programming platform for rapid and iterative development of end-to-end ML-based multimedia applications. Rapsai features a node-graph editor to facilitate interactive characterization and visualization of ML model performance. Rapsai streamlines end-to-end prototyping with interactive data augmentation and model comparison capabilities in its no-coding environment. Our evaluation of Rapsai in four real-world case studies (N=15) suggests that practitioners can accelerate their workflow, make more informed decisions, analyze strengths and weaknesses, and holistically evaluate model behavior with real-world input.
@inproceedings{Du2023Rapsai,
  title     = {{Rapsai: Accelerating Machine Learning Prototyping of Multimedia Applications Through Visual Programming}},
  author    = {Du, Ruofei and Li, Na and Jin, Jing and Carney, Michelle and Miles, Scott and Kleiner, Maria and Yuan, Xiuxiu and Zhang, Yinda and Kulkarni, Anuva and Liu, Xingyu "Bruce" and Escolano, Sergio and Kar, Abhishek and Olwal, Alex and Yu, Ping and Iyengar, Ram and Kowdle, Adarsh},
  booktitle = {Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems},
  year      = {2023},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  series    = {CHI '23},
  award     = {Honorable Mention},
  video     = {https://youtu.be/mQ5mvAbZYvc},
}
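Rapsai's central abstraction is a node-graph: models, data augmentations, and visualizations are nodes wired into an end-to-end pipeline. As a rough illustration of that execution model only (not Rapsai's actual engine or API), the sketch below evaluates a tiny dependency graph of named nodes, memoizing intermediate results so each node runs once.

```python
# Toy node-graph evaluator: each node has a function and named upstream inputs,
# and the graph is executed in dependency order. Purely illustrative of the
# general visual-programming execution model, not Rapsai's actual engine.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Node:
    func: Callable[..., object]                       # the operation this node performs
    inputs: List[str] = field(default_factory=list)   # names of upstream nodes

def run_graph(nodes: Dict[str, Node], output: str, cache: Dict[str, object] = None):
    """Recursively evaluate `output`, memoizing intermediate node results."""
    cache = {} if cache is None else cache
    if output not in cache:
        node = nodes[output]
        args = [run_graph(nodes, name, cache) for name in node.inputs]
        cache[output] = node.func(*args)
    return cache[output]

# Example pipeline: load -> augment (brightness) -> "model" -> compare with original.
graph = {
    "load":    Node(lambda: [0.2, 0.5, 0.8]),                        # stand-in image
    "augment": Node(lambda img: [min(1.0, v + 0.1) for v in img], ["load"]),
    "model":   Node(lambda img: sum(img) / len(img), ["augment"]),   # stand-in ML model
    "report":  Node(lambda raw, score: {"input": raw, "score": score}, ["load", "model"]),
}
print(run_graph(graph, "report"))
```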
- Modeling and Improving Text Stability in Live Captions
  Xingyu "Bruce" Liu, Jun Zhang, Leonardo Ferrer, Susan Xu, Vikas Bahirwani, Boris Smus, Alex Olwal, and Ruofei Du
  In Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems (CHI EA '23), 2023
In recent years, live captions have gained significant popularity through their availability in remote video conferences, mobile applications, and the web. Unlike preprocessed subtitles, live captions require real-time responsiveness by showing interim speech-to-text results. As the prediction confidence changes, the captions may update, leading to visual instability that interferes with the user’s viewing experience. In this paper, we characterize the stability of live captions by proposing a vision-based flickering metric using luminance contrast and Discrete Fourier Transform. Additionally, we assess the effect of unstable captions on the viewer through task load index surveys. Our analysis reveals significant correlations between the viewer’s experience and our proposed quantitative metric. To enhance the stability of live captions without compromising responsiveness, we propose the use of tokenized alignment, word updates with semantic similarity, and smooth animation. Results from a crowdsourced study (N=123), comparing four strategies, indicate that our stabilization algorithms lead to a significant reduction in viewer distraction and fatigue, while increasing viewers’ reading comfort.
@inproceedings{Liu2023Modeling,
  title     = {{Modeling and Improving Text Stability in Live Captions}},
  author    = {Liu, Xingyu "Bruce" and Zhang, Jun and Ferrer, Leonardo and Xu, Susan and Bahirwani, Vikas and Smus, Boris and Olwal, Alex and Du, Ruofei},
  booktitle = {Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems},
  year      = {2023},
  publisher = {Association for Computing Machinery},
  series    = {CHI EA '23},
  address   = {New York, NY, USA},
  doi       = {10.1145/3544549.3585609},
  video     = {https://youtu.be/Indi_RwODS8},
}
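The flickering metric above is described as combining luminance contrast with a Discrete Fourier Transform over the caption region. The NumPy sketch below is one plausible reading of that recipe, assuming the caption area is available as a (frames × height × width) luminance array; the exact contrast definition and frequency weighting in the paper may differ.

```python
# Rough flicker score for a rendered caption region: measure how much temporal
# luminance change lands in the non-zero frequencies of a per-pixel DFT.
# Assumes `frames` is a (T, H, W) array of luminance in [0, 1]; the precise
# contrast normalization and weighting used in the paper may differ.
import numpy as np

def flicker_score(frames: np.ndarray) -> float:
    mean = frames.mean(axis=0, keepdims=True)          # per-pixel average luminance
    contrast = (frames - mean) / (mean + 1e-6)         # temporal luminance contrast
    spectrum = np.abs(np.fft.rfft(contrast, axis=0))   # DFT along the time axis
    flicker_energy = spectrum[1:].sum()                # drop the DC component
    return float(flicker_energy / frames[0].size)      # normalize by region size

# A caption region that toggles every frame should score higher than a stable one.
rng = np.random.default_rng(0)
stable = np.tile(rng.random((1, 8, 32)), (30, 1, 1))   # identical frames
blinking = stable.copy()
blinking[::2, :, :8] = 0.0                             # a word flashes on and off
print(flicker_score(stable), "<", flicker_score(blinking))
```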
2022
- CrossA11y: Identifying Video Accessibility Issues via Cross-Modal Grounding
  Xingyu "Bruce" Liu, Ruolin Wang, Dingzeyu Li, Xiang Anthony Chen, and Amy Pavel
  In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology (UIST '22), October 2022 · Best Paper Award
Authors make their videos visually accessible by adding audio descriptions (AD), and auditorily accessible by adding closed captions (CC). However, creating AD and CC is challenging and tedious, especially for non-professional describers and captioners, due to the difficulty of identifying accessibility problems in videos. A video author will have to watch the video through and manually check for inaccessible information frame-by-frame, for both visual and auditory modalities. In this paper, we present CrossA11y, a system that helps authors efficiently detect and address visual and auditory accessibility issues in videos. Using cross-modal grounding analysis, CrossA11y automatically measures accessibility of visual and audio segments in a video by checking for modality asymmetries. CrossA11y then displays these segments and surfaces visual and audio accessibility issues in a unified interface, making it intuitive to locate, review, script AD/CC in-place, and preview the described and captioned video immediately. We demonstrate the effectiveness of CrossA11y through a lab study with 11 participants, comparing it to an existing baseline.
@inproceedings{liu2022crossa11y,
  author    = {Liu, Xingyu "Bruce" and Wang, Ruolin and Li, Dingzeyu and Chen, Xiang Anthony and Pavel, Amy},
  title     = {CrossA11y: Identifying Video Accessibility Issues via Cross-Modal Grounding},
  year      = {2022},
  isbn      = {9781450393201},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  doi       = {10.1145/3526113.3545703},
  url       = {https://doi.org/10.1145/3526113.3545703},
  booktitle = {Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology},
  articleno = {43},
  numpages  = {14},
  keywords  = {audio description, video, accessibility, closed caption},
  location  = {Bend, OR, USA},
  series    = {UIST '22},
  award     = {Best Paper Award},
  video     = {https://youtu.be/HDqjnHOZ7J8},
}
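CrossA11y's cross-modal grounding boils down to checking, per segment, whether the visual track is explained by the audio track and vice versa; segments with large asymmetries are surfaced as likely accessibility issues. The sketch below assumes per-segment visual and audio embeddings are already computed by off-the-shelf encoders and simply flags low cosine-similarity segments; the encoders, scoring, and threshold are placeholders rather than the paper's method.

```python
# Flag segments whose visual and audio embeddings disagree (low cross-modal
# similarity), as a stand-in for CrossA11y's cross-modal grounding analysis.
# The embeddings and the 0.5 threshold are illustrative placeholders.
import numpy as np

def cross_modal_asymmetry(visual_emb: np.ndarray, audio_emb: np.ndarray) -> np.ndarray:
    """Return per-segment (1 - cosine similarity); higher means more asymmetric."""
    v = visual_emb / np.linalg.norm(visual_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    return 1.0 - (v * a).sum(axis=1)

def flag_segments(visual_emb, audio_emb, threshold=0.5):
    scores = cross_modal_asymmetry(visual_emb, audio_emb)
    return [i for i, s in enumerate(scores) if s > threshold]

# Toy example: segment 1's visuals point in a different direction than its audio,
# so it is surfaced as a likely visual-accessibility issue.
visual = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]])
audio  = np.array([[1.0, 0.1, 0.0], [1.0, 0.0, 0.0], [0.5, 0.4, 0.1]])
print(flag_segments(visual, audio))   # -> [1]
```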
2021
- What Makes Videos Accessible to Blind and Visually Impaired People?
  Xingyu Liu, Patrick Carrington, Xiang Anthony Chen, and Amy Pavel
  In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems (CHI '21), 2021
User-generated videos are an increasingly important source of information online, yet most online videos are inaccessible to blind and visually impaired (BVI) people. To find videos that are accessible, or understandable without additional description of the visual content, BVI people in our formative studies reported that they used a time-consuming trial-and-error approach: clicking on a video, watching a portion, leaving the video, and repeating the process. BVI people also reported video accessibility heuristics that characterize accessible and inaccessible videos. We instantiate 7 of the identified heuristics (2 audio-related, 2 video-related, and 3 audio-visual) as automated metrics to assess video accessibility. We collected a dataset of accessibility ratings of videos by BVI people and found that our automatic video accessibility metrics correlated with the accessibility ratings (Adjusted R2 = 0.642). We augmented a video search interface with our video accessibility metrics and predictions. BVI people using our augmented video search interface selected an accessible video more efficiently than when using the original search interface. By integrating video accessibility metrics, video hosting platforms could help people surface accessible videos and encourage content creators to author more accessible products, improving video accessibility for all.
@inproceedings{liu2021what,
  author    = {Liu, Xingyu and Carrington, Patrick and Chen, Xiang Anthony and Pavel, Amy},
  title     = {What Makes Videos Accessible to Blind and Visually Impaired People?},
  year      = {2021},
  isbn      = {9781450380966},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3411764.3445233},
  doi       = {10.1145/3411764.3445233},
  booktitle = {Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems},
  articleno = {272},
  numpages  = {14},
  keywords  = {visual impairments, blind, online videos, accessibility},
  location  = {Yokohama, Japan},
  series    = {CHI '21},
  video     = {https://youtu.be/n2enrJJZdTs},
}
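The analysis pipeline here is straightforward to outline: compute the seven heuristic metrics per video, fit a model against BVI accessibility ratings, and report adjusted R². The sketch below shows that recipe with an ordinary least-squares fit on synthetic data; the paper's actual feature definitions and model are not reproduced.

```python
# Fit accessibility ratings from per-video heuristic metrics with least squares
# and report adjusted R-squared. The data here is synthetic; only the recipe
# (metrics -> linear model -> adjusted R^2) mirrors the paper's analysis.
import numpy as np

def adjusted_r2(y: np.ndarray, y_hat: np.ndarray, n_features: int) -> float:
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)

rng = np.random.default_rng(42)
n_videos, n_metrics = 80, 7                        # e.g. 2 audio, 2 video, 3 audio-visual
X = rng.random((n_videos, n_metrics))              # automatic heuristic scores per video
true_w = rng.normal(size=n_metrics)
ratings = X @ true_w + rng.normal(scale=0.3, size=n_videos)   # synthetic BVI ratings

X1 = np.hstack([X, np.ones((n_videos, 1))])        # add an intercept column
w, *_ = np.linalg.lstsq(X1, ratings, rcond=None)   # ordinary least squares
print("adjusted R^2:", round(adjusted_r2(ratings, X1 @ w, n_metrics), 3))
```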
2019
- Making Memes Accessible
  Cole Gleason, Amy Pavel, Xingyu Liu, Patrick Carrington, Lydia B. Chilton, and Jeffrey P. Bigham
  In Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '19), October 2019
Images on social media platforms are inaccessible to people with vision impairments due to a lack of descriptions that can be read by screen readers. Providing accurate alternative text for all visual content on social media is not yet feasible, but certain subsets of images, such as internet memes, offer affordances for automatic or semi-automatic generation of alternative text. We present two methods for making memes accessible semi-automatically through (1) the generation of rich alternative text descriptions and (2) the creation of audio macro memes. Meme authors create alternative text templates or audio meme templates, and insert placeholders instead of the meme text. When a meme with the same image is encountered again, it is automatically recognized from a database of meme templates. Text is then extracted and either inserted into the alternative text template or rendered in the audio template using text-to-speech. In our evaluation of meme formats with 10 Twitter users with vision impairments, we found that most users preferred alternative text memes because the description of the visual content conveys the emotional tone of the character. As the preexisting templates can be automatically matched to memes using the same visual image, this combined approach can make a large subset of images on the web accessible, while preserving the emotion and tone inherent in the image memes.
@inproceedings{gleason2019making,
  author    = {Gleason, Cole and Pavel, Amy and Liu, Xingyu and Carrington, Patrick and Chilton, Lydia B. and Bigham, Jeffrey P.},
  title     = {Making Memes Accessible},
  year      = {2019},
  isbn      = {9781450366762},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  url       = {https://doi.org/10.1145/3308561.3353792},
  doi       = {10.1145/3308561.3353792},
  booktitle = {Proceedings of the 21st International ACM SIGACCESS Conference on Computers and Accessibility},
  pages     = {367--376},
  numpages  = {10},
  keywords  = {blind, social media, audio, image description, alternative text, meme, low vision},
  location  = {Pittsburgh, PA, USA},
  series    = {ASSETS '19},
}
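The semi-automatic pipeline matches a new meme against a database of known template images, extracts the overlaid text, and drops it into an author-written alternative-text (or audio) template. The sketch below covers only the matching-and-filling step, assuming a perceptual hash and OCR text are computed upstream; the hashes, distance threshold, and template wording are invented for illustration.

```python
# Match a meme's perceptual hash against known templates (by Hamming distance)
# and fill the template's alternative text with the OCR'd caption. The hashes,
# threshold, and template wording are illustrative, not from the paper.
from typing import Optional

TEMPLATES = {
    # template hash (hex)     alt-text template authored once per meme image
    "a3f0c19e55d2b874": "A cartoon dog sits in a burning room saying '{caption}'.",
    "0f1e2d3c4b5a6978": "A man labeled '{caption}' looks back at another option.",
}

def hamming(h1: str, h2: str) -> int:
    """Bit-level Hamming distance between two equal-length hex hashes."""
    return bin(int(h1, 16) ^ int(h2, 16)).count("1")

def alt_text_for(meme_hash: str, ocr_caption: str, max_distance: int = 8) -> Optional[str]:
    """Return filled alt text if the meme matches a known template, else None."""
    best_hash = min(TEMPLATES, key=lambda known: hamming(meme_hash, known))
    if hamming(meme_hash, best_hash) <= max_distance:
        return TEMPLATES[best_hash].format(caption=ocr_caption)
    return None  # unknown template: fall back to a human-written description

# A near-duplicate hash (a few bits flipped) still matches the stored template.
print(alt_text_for("a3f0c19e55d2b875", "This is fine"))
```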