We successfully conducted a live demonstration of our Audio-Visual Scene Understanding for Augmented Reality Applications (AV-SUARA) system at the IPSJ Otogaku Symposium 2025, held at Waseda University.

Dealing with noisy, reverberant real-world environments remains a challenging task. Although most testers were impressed with our system, including its head-gaze-controlled speech enhancement and recognition capabilities, we still need to improve enhancement quality, recognition accuracy, and computational latency.