Program
Note that the times are in EDT.
Monday, October 18
Monday, October 18 8:50 – 9:00
Opening Remarks
Monday, October 18 9:00 – 10:15
P1: Hearing, Measures, Audio Events, and Source Separation; Demonstrations
Session chair: Fabio Antonacci
- P1.1 Adaptive Binaural Filtering for a Multiple-Talker Listening System Using Remote and On-Ear Microphones
- P1.2 DPLM: A Deep Perceptual Spatial-Audio Localization Metric
- P1.3 Assessing Segmental Impact for Objective Speech Quality Evaluation
- P1.4 Improved Intelligibility Prediction in the Modulation Domain
- P1.5 MIMII DUE: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection With Domain Shifts Due to Changes in Operational and Environmental Conditions
- P1.6 Identifying Actions for Sound Event Classification
- P1.7 Point Cloud Audio Processing
- P1.8 Who Calls the Shots? Rethinking Few-Shot Learning for Audio
- P1.9 Sound Event Detection With Adaptive Frequency Selection
- P1.10 Separate but Together: Unsupervised Federated Learning for Speech Enhancement From Non-IID Data
- P1.11 Sparse, Efficient, and Semantic Mixture Invariant Training: Taming In-The-Wild Unsupervised Sound Separation
- P1.12 Convolutive Prediction for Reverberant Speech Separation
- P1.13 Weakly Supervised Source-Specific Sound Level Estimation in Noisy Soundscapes
- P1-14[demo] Parametric resynthesis for dialog enhancement in post-production sound mixing for film and TV
Monday, October 18 10:15 – 11:00
Coffee Break
Monday, October 18 11:00 – 12:00
K1: Keynote talk
Monday, October 18 15:00 – 16:15
P2: Coding, Hearing, Measures, Music, and Source Separation
Session chair: Sharon Gannot
- P2.1 A Streamwise GAN Vocoder for Wideband Speech Coding at Very Low Bit Rate
- P2.2 Adversarial Auto-Encoding for Packet Loss Concealment
- P2.3 Spatial Coding for Microphone Arrays Using IPNLMS-Based RTF Estimation
- P2.4 Excitation-Inhibition Cell Activity Patterns for Binaural Source Localisation
- P2.5 Speech Intelligibility of Mandarin- and German-Speaking Listeners in Challenging Conditions
- P2.6 Controlling the Remixing of Separated Dialogue With a Non-Intrusive Quality Estimate
- P2.7 SiDiQ: Computational Quality Assessment of Enhanced Speech Based on Auditory Figure-Ground Segregation, Similarity, and Disturbance
- P2.8 Objective Metrics to Evaluate Residual-Echo Suppression During Double-Talk
- P2.9 Prediction of Missing Frequency Response Functions Through Deep Image Prior
- P2.10 User-Guided One-Shot Deep Model Adaptation for Music Source Separation
- P2.11 VQCPC-GAN: Variable-Length Adversarial Audio Synthesis Using Vector-Quantized Contrastive Predictive Coding
- P2.12 Learning Multi-Pitch Estimation From Weakly Aligned Score-Audio Pairs Using a Multi-Label CTC Loss
- P2.13 Disentanglement Learning for Variational Autoencoders Applied to Audio-Visual Speech Enhancement
- P2.14 FasterIVA: Update Rules for Independent Vector Analysis Based on Negentropy and the Majorize-Minimize Principle
Monday, October 18 16:15 – 17:00
Coffee Break
Monday, October 18 20:00 – 21:00
K1 (repeat): Keynote talk
Monday, October 18 21:00 – 22:15
P1 (repeat): Hearing, Measures, Audio Events, and Source Separation; Demonstrations
Session chair: Michael Mandel
See above for list of papers.
Monday, October 18 22:15 – 23:00
Coffee Break
Tuesday, October 19
Tuesday, October 19 3:00 – 4:15
P2 (repeat): Coding, Hearing, Measures, Music, and Source Separation
Session chair: Antoine Liutkus
See above for list of papers.
Tuesday, October 19 4:15 – 5:00
Coffee Break
Tuesday, October 19 9:00 – 10:15
P3: Array Processing, Room Acoustics, Enhancement, and Audio Events; Demonstrations
Session chair: Scott Wisdom
- P3.1 Low Complexity Online Convolutional Beamforming
- P3.2 Superresolution Photoacoustic Tomography Using Random Speckle Illumination and Second Order Moments
- P3.3 Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions
- P3.4 MeshRIR: A Dataset of Room Impulse Responses on Meshed Grid Points for Evaluating Sound Field Analysis and Synthesis Methods
- P3.5 Analysis of Frequency-Dependent Behavior of Room Reflections Using Spherical Microphone Measurements & Von Mises-Fisher Clustering
- P3.6 DF-Conformer: Integrated Architecture of Conv-TasNet and Conformer Using Linear Complexity Self-Attention for Speech Enhancement
- P3.7 HiFi-GAN-2: Studio-Quality Speech Enhancement via Generative Adversarial Networks Conditioned on Acoustic Features
- P3.8 Zero-Shot Personalized Speech Enhancement Through Speaker-Informed Model Selection
- P3.9 Test-Time Adaptation Toward Personalized Speech Enhancement: Zero-Shot Learning With Knowledge Distillation
- P3.10 Towards Large Scale Ecoacoustic Monitoring With Small Amounts of Labeled Data
- P3.11 Anomalous Sound Detection Using Attentive Neural Processes
- P3.12 A Multi-Head Relevance Weighting Framework for Learning Raw Waveform Audio Representations
- P3.13 Cross-Domain Semi-Supervised Audio Event Classification Using Contrastive Regularization
- P3-14[demo] ESPnet-SE: Speech Enhancement and Separation toolkit
Tuesday, October 19 10:15 – 11:00
Coffee Break
Tuesday, October 19 11:00 – 12:00
K2: Keynote talk
Tuesday, October 19 15:00 – 16:15
P4: Array Processing, Room Acoustics, Spatial Audio, and Audio Events; Demonstrations
Session chair: Hakan Erdogan
- P4.1 Polynomial Matrix Eigenvalue Decomposition-Based Source Separation Using Informed Spherical Microphone Arrays
- P4.2 Low-Complexity Steered Response Power Mapping Based on Nyquist-Shannon Sampling
- P4.3 Differentiable Tracking-Based Training of Deep Learning Sound Source Localizers
- P4.4 Spherical Harmonic Diagonal Unloading Beamforming With Ego-Noise Reduction for DOA Estimation From Autonomous Systems
- P4.5 Filtered Noise Shaping for Time Domain Room Impulse Response Estimation From Reverberant Speech
- P4.6 Blind Room Parameter Estimation Using Multiple Multichannel Speech Recordings
- P4.7 Spherical Harmonic Decomposition of a Sound Field Based on Microphones Around the Circumference of a Human Head
- P4.8 Ambient-Aware Sound Field Translation Using Optimal Spatial Filtering
- P4.9 Internal Time Delay Calibration of Rigid Spherical Microphone Arrays for Multi-Perspective 6DoF Audio Recordings
- P4.10 Crowdsourcing Strong Labels for Sound Event Detection
- P4.11 Self-Supervised Learning From Automatically Separated Sound Scenes
- P4.12 Adaptive Generalized Cross-Entropy Loss for Sound Event Classification With Noisy Labels
- P4-13[demo] Searching Audio Database using Natural Language Queries
Tuesday, October 19 16:15 – 17:00
Coffee Break
Tuesday, October 19 20:00 – 21:00
K2 (repeat): Keynote talk
Tuesday, October 19 21:00 – 22:15
P3 (repeat): Array Processing, Room Acoustics, Enhancement, and Audio Events; Demonstrations
Session chair: Minje Kim
See above for list of papers.
Tuesday, October 19 22:15 – 23:00
Coffee Break
Wednesday, October 20
Wednesday, October 20 3:00 – 4:15
P4 (repeat): Array Processing, Room Acoustics, Spatial Audio, and Audio Events; Demonstrations
Session chair: Ina Kodrasi
See above for list of papers.
Wednesday, October 20 4:15 – 5:00
Coffee Break
Wednesday, October 20 9:00 – 10:15
P5: Spatial Audio, ANC/Echo, Coding, and Music; Demonstrations
Session chair: Prasanga Samarasinghe
- P5.1 Kernel Learning for Sound Field Estimation With L1 and L2 Regularizations
- P5.2 Magnitude Modelling of Individualized HRTFs Using DNN Based Spherical Harmonic Analysis
- P5.3 2D Local Exterior Sound Field Reproduction Using an Addition Theorem Based on Circular Harmonic Expansion
- P5.4 2D Multizone Sound Field Synthesis With Interior-Exterior Ambisonics
- P5.5 Mean-Square-Error-Based Secondary Source Placement in Sound Field Synthesis With Prior Information on Desired Field
- P5.6 Spherical Array Based Drone Noise Measurements and Modelling for Drone Noise Reduction via Propeller Phase Control
- P5.7 Auto-DSP: Learning to Optimize Acoustic Echo Cancellers
- P5.8 Fast Convergent Method for Active Noise Control Over Spatial Region With Causal Constraint
- P5.9 Active Noise Control Over 3D Space With Remote Microphone Technique in the Wave Domain
- P5.10 End-To-End Zero-Shot Voice Conversion Using A DDSP Vocoder
- P5.11 On the Role of Lip Reflection/Transmission in the Relationship Between LPC and Waveguide Vocal Tract Models
- P5.12 HARP-Net: Hyper-Autoencoded Reconstruction Propagation for Scalable Neural Audio Coding
- P5.13 Periodic Analysis of Nonlinear Virtual Analog Models
- P5-14[demo] Binaural Reproduction From Multiple Microphone Arrays
Wednesday, October 20 10:15 – 11:00
Coffee Break
Wednesday, October 20 11:00 – 12:00
K3: Keynote talk
Wednesday, October 20 15:00 – 16:15
P6: Array Processing, Room Acoustics, and Spatial Audio; Demonstrations
Session chair: Justin Salamon
- P6.1 A Polynomial Eigenvalue Decomposition MUSIC Approach for Broadband Sound Source Localization
- P6.2 Joint Direction and Proximity Classification of Overlapping Sound Events From Binaural Audio
- P6.3 SALADnet: Self-Attentive Multisource Localization in the Ambisonics Domain
- P6.4 Low-Order Filter Approximation of Diffraction for Virtual Acoustics
- P6.5 Spatial Subtraction of Reflections From Room Impulse Responses Measured With a Spherical Microphone Array
- P6.6 Stochastic Reverberation Model With a Frequency Dependent Attenuation
- P6.7 A Universal Deep Room Acoustics Estimator
- P6.8 Spatial Filter Bank in the Spherical Harmonic Domain: Reconstruction and Application
- P6.9 Soundfield Reconstruction in Reverberant Rooms Based on Compressive Sensing and Image-Source Models of Early Reflections
- P6.10 Rendering of Source Spread for Arbitrary Playback Setups Based on Spatial Covariance Matching
- P6-11[demo] Wireless Distributed Processing Platform
- P6-12[demo] Estimation of Sampling-Rate Offset and Accumulating Time Drift
- P6-13[demo] Informed Acoustic Source Extraction based on Independent Vector Analysis
- P6-14[demo] Privacy-preserving Features for Audio Signal Classification in Acoustic Sensor Networks
- P6-15[demo] Signal Synchronization and Sound Event Detection
Wednesday, October 20 16:15 – 17:00
Coffee Break
Wednesday, October 20 17:00 – 17:30
Award & Closing Ceremony
Wednesday, October 20 20:00 – 21:00
K3 (repeat): Keynote talk
Wednesday, October 20 21:00 – 22:15
P5 (repeat): Spatial Audio, ANC/Echo, Coding, and Music; Demonstrations
Session chair: Zafar Rafii
See above for list of papers.
Wednesday, October 20 22:15 – 23:00
Coffee Break
Thursday, October 21
Thursday, October 21 3:00 – 4:15
P6 (repeat): Array Processing, Room Acoustics, and Spatial Audio; Demonstrations
Session chair: Jesper Rindom Jensen
See above for list of papers.
Thursday, October 21 4:15 – 5:00
Coffee Break
List of demonstrations
Session P1: Monday, October 18, 9:00-10:15am (EDT) & 9:00-10:15pm (EDT)
P1-14[demo]: Parametric resynthesis for dialog enhancement in post-production sound mixing for film and TV
Session P2: Monday, October 18, 3:00-4:15pm (EDT) & Tuesday, October 19, 3:00-4:15am (EDT)
No demonstration in this session.
Session P3: Tuesday, October 19, 9:00-10:15am (EDT) & 9:00-10:15pm (EDT)
P3-14[demo]: ESPnet-SE: Speech Enhancement and Separation toolkit
Session P4: Tuesday, October 19, 3:00-4:15pm (EDT) & Wednesday, October 20, 3:00-4:15am (EDT)
P4-13[demo]: Searching Audio Database using Natural Language Queries
Session P5: Wednesday, October 20, 9:00-10:15am (EDT) & 9:00-10:15pm (EDT)
P5-14[demo]: Binaural Reproduction From Multiple Microphone Arrays
Session P6: Wednesday, October 20, 3:00-4:15pm (EDT) & Thursday, October 21, 3:00-4:15am (EDT)
P6-11[demo]: Wireless Distributed Processing Platform
P6-12[demo]: Estimation of Sampling-Rate Offset and Accumulating Time Drift
P6-13[demo]: Informed Acoustic Source Extraction based on Independent Vector Analysis
P6-14[demo]: Privacy-preserving Features for Audio Signal Classification in Acoustic Sensor Networks
P6-15[demo]: Signal Synchronization and Sound Event Detection