Acoustic sensor-based human–machine interaction (HMI) plays a crucial role in natural and efficient communication with intelligent robots. However, accurately identifying and tracking omnidirectional sound sources, especially in noisy environments, remains a notable challenge. Here, a self-powered triboelectric stereo acoustic sensor (SAS) is presented that achieves omnidirectional sound recognition and tracking through a 3D structural configuration. The SAS incorporates a porous vibrating film with high electron affinity and low Young's modulus, resulting in high sensitivity (3172.9 mVpp Pa−1) and a wide frequency response range (100–20 000 Hz). By exploiting its omnidirectional sound recognition capability and adjustable resonant frequency, the SAS can precisely identify the desired audio signal with an average deep-learning recognition accuracy of 98%, even in noisy environments. Moreover, the SAS can simultaneously recognize multiple individuals in an auxiliary conference system and recognize driving commands against background music in self-driving vehicles, which marks a notable advance in voice-based HMI systems.