Elsevier

Journal of Voice

Volume 19, Issue 1, March 2005, Pages 15-28
Original Article
Adverse Effects of Environmental Noise on Acoustic Voice Quality Measurements

https://doi.org/10.1016/j.jvoice.2004.07.003

Summary

An accurate analysis of voice quality is imperative when using acoustic measurements to diagnose vocal pathologies. It is known that noise has a significant effect on the reliability and validity of acoustic voice measurements, but the precise relationship has not been established. The purpose of this study was to investigate the influence of noise on the accuracy, reliability, and validity of acoustic voice quality measurements while balancing for gender, age, intersubject and intrasubject variability, microphones, computer hardware, analysis software, and type of noise. Level of noise was precisely controlled. The specific focus of interest was to determine the critical levels of noise that can invalidate voice quality measurements and to generate practical recommendations. Results suggest that, expressed as signal-to-noise ratio in the acoustic environment, levels above 42 dB are recommended, levels above 30 dB are acceptable, and levels below 30 dB are unacceptable.
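The three categories in the Summary can be read as a simple decision rule. The following is a minimal sketch only: the function name classify_snr is hypothetical, and treating exactly 42 dB and exactly 30 dB as belonging to the higher category is an assumption based on the wording "at least 30 dB" in the Conclusions.

```python
def classify_snr(snr_db: float) -> str:
    """Map a measured signal-to-noise ratio (dB) to the Summary's categories.

    Assumption: boundary values (exactly 42 dB or 30 dB) fall in the
    higher category; the Summary itself only says "above" and "below".
    """
    if snr_db >= 42.0:
        return "recommended"
    if snr_db >= 30.0:
        return "acceptable"
    return "unacceptable"


if __name__ == "__main__":
    for level in (45.0, 35.0, 25.0):
        print(level, classify_snr(level))
```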

Introduction

Computer-based acoustic voice assessment techniques have been shown to be a useful and convenient tool to use when assessing voice quality.1, 2, 3, 4 In these techniques, an accurate analysis of voice quality is imperative to aid in the diagnosis and treatment of vocal pathologies. The reliability and validity of acoustic voice measurements are influenced by factors that affect the quality of recording and individual factors such as gender, age, intrasubject and intersubject variability, signal type, microphone type, environmental noise, computer hardware, and analysis software.

It is widely accepted that male and female voices are fundamentally different as a result of basic physiological properties.1, 2 Age has also been shown to affect voice quality.5 Moreover, variation can be found within one person across separate voice productions. A technique used to control for intrasubject variability is to obtain the mean voice quality measurement from multiple tokens. A study by Scherer et al6 investigated the number of tokens needed to acquire a representative perturbation value. For stable voices, at least six tokens were recommended and, for voices with normal to high levels of instability, at least 15 tokens were recommended. The National Center for Voice and Speech (NCVS) recommends that 10 samples are needed to obtain reliable perturbation measures.4 Common procedures required to account for intersubject variability in establishing normative values are an adequate number of participants and the use of the means and standard deviations, or normative thresholds, of their acoustic voice parameters.

Most normal and pathological voice signals are quasi-periodic. However, particularly in cases of vocal pathologies, the characteristics of these signals become nonstationary. NCVS recommends that acoustic voice signals be classified into three types4: Types 1, 2, and 3. Type 1 signals are quasi-periodic, and Titze and Liang7 found that perturbation values less than 5% are reliable. Type 2 signals contain intermittency, strong subharmonics, or modulations, which can all affect accuracy of acoustic analysis. Voice assessment for Type 2 signals is best accomplished based on the entire visual display rather than on a single measure. Type 3 signals are aperiodic; therefore, acoustic voice assessment is deemed ineffective. Type 3 signals are best assessed by perceptual ratings.

A study by Titze and Winholtz8 demonstrated that the type of microphone used in acoustic voice analysis has significant impact. The results showed that condenser microphones give better results than dynamic microphones, microphones with a balanced output perform better than those with unbalanced outputs, and microphone sensitivity and distance have the largest effect on perturbation measures. NCVS recommends4 that a professional grade condenser microphone with minimum sensitivity of −60 dB should be used, specifically, a miniature head-mounted microphone with balanced output and a mouth-to-microphone distance less than 10 cm.

Acoustic perturbation measurements represent measurements of noise. They assess the nonstationary characteristics of the acoustic voice signals. Deviations from stationary cyclic behavior can result from the larynx or from noise, either in the acoustic environment or in the data acquisition hardware. Perturbations resulting from noise other than that produced at the level of the larynx can affect the reliability and validity of the acoustic voice assessment. NCVS recommends4 that recordings should be made in a sound-treated room with “ambient noise less than 50 dB.”
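To make the idea of perturbation concrete, the sketch below computes three widely used cycle-to-cycle measures from a list of glottal period lengths or peak amplitudes. It follows common textbook definitions of local jitter, relative average perturbation (RAP), and local shimmer; it is not claimed to reproduce the exact algorithms of MDVP or of any other package named in this article, and all function names are hypothetical.

```python
def _mean(values):
    return sum(values) / len(values)


def local_jitter_percent(periods):
    """Mean absolute difference between consecutive periods,
    relative to the mean period, in percent."""
    diffs = [abs(b - a) for a, b in zip(periods[:-1], periods[1:])]
    return 100.0 * _mean(diffs) / _mean(periods)


def rap_percent(periods):
    """Relative average perturbation: mean absolute deviation of each
    period from the average of itself and its two neighbours,
    relative to the mean period, in percent."""
    devs = [
        abs(periods[i] - (periods[i - 1] + periods[i] + periods[i + 1]) / 3.0)
        for i in range(1, len(periods) - 1)
    ]
    return 100.0 * _mean(devs) / _mean(periods)


def local_shimmer_percent(amplitudes):
    """Same form as local jitter, applied to cycle peak amplitudes."""
    diffs = [abs(b - a) for a, b in zip(amplitudes[:-1], amplitudes[1:])]
    return 100.0 * _mean(diffs) / _mean(amplitudes)


if __name__ == "__main__":
    # Illustrative periods in milliseconds (made-up values, not study data).
    steady = [10.0] * 8
    wobbly = [10.0, 10.2, 10.0, 10.2, 10.0, 10.2, 10.0, 10.2]
    print(local_jitter_percent(steady), rap_percent(steady))
    print(local_jitter_percent(wobbly), rap_percent(wobbly))
```

A perfectly periodic sequence yields zero for all three measures; any added noise that shifts the detected cycle boundaries or peaks raises them, which is why environmental noise can be mistaken for laryngeal perturbation.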

Ingrisano et al9 investigated the effects of environmental noise on automatic computer-based analyses of voice samples. Two types of acoustic signals were analyzed: live speech from a male and a synthesized triangular waveform. These samples were acoustically mixed with computer fan noise at six signal-to-noise ratio (SNR) levels ranging from 25 dB to 0 dB. One computer hardware device and one software analysis program (MDVP Model 4305; Kay Elemetrics Corporation, Lincoln Park, NJ) were used to analyze the samples. The results showed that elevated noise floors affected perturbation measurements, and it was suggested that voice samples should not be recorded at SNR levels less than 15 dB. In addition, the authors recommended that further research investigate possible software bias arising from varying age, gender, and pathophysiological conditions. It was also suggested that a comprehensive study should be conducted to compare across populations, environmental noise levels, and software routines. The authors also stated that safeguards and standards need to be in place for routine use of automatic computer-based voice analysis programs.
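The notion of mixing a voice sample with noise at a given SNR can be illustrated digitally. The study described above mixed signals acoustically, so the following is only a hedged sketch of the equivalent arithmetic: SNR is taken as 20·log10 of the ratio of RMS amplitudes, and the noise is rescaled to hit a target value. The function names, the 100 Hz sine standing in for a voice, and the Gaussian noise standing in for fan noise are all assumptions made for illustration.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from RMS amplitudes."""
    return 20.0 * math.log10(rms(signal) / rms(noise))


def mix_at_snr(signal, noise, target_snr_db):
    """Scale the noise so that signal vs. scaled noise has the target SNR,
    then add it to the signal. Returns (mixture, scaled_noise)."""
    gain = rms(signal) / (rms(noise) * 10.0 ** (target_snr_db / 20.0))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(signal, scaled)]
    return mixture, scaled


if __name__ == "__main__":
    fs = 8000                      # sample rate in Hz (arbitrary choice)
    rng = random.Random(0)         # fixed seed for repeatability
    signal = [math.sin(2 * math.pi * 100 * t / fs) for t in range(fs // 2)]
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    for target in (25.0, 15.0, 0.0):
        _, scaled = mix_at_snr(signal, noise, target)
        print(target, round(snr_db(signal, scaled), 6))
```

Measuring the SNR against the scaled noise alone, rather than against the mixture, keeps the check exact; in a real recording the noise floor would instead be estimated from a silent segment.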

Perry et al10 examined the effects of computer fan noise on computer-based analysis of voice samples for females when the center frequency of the noise was matched to the fundamental frequency of the participants. Also, the study looked at whether a pathological voice creates an additional challenge to automatic voice analysis when noise is present. Five females with normal voices and one with a pathological voice (vocal nodules) participated in the study. One previously captured male voice sample was also used. Voice samples and noise were mixed acoustically and analyzed using one hardware system and one software system. Results showed that fundamental frequency was relatively resistant to the effects of noise, but jitter and shimmer measurements generally increased as noise floors increased. The greatest amount of measurement error was found for the pathological female voice when captured in the presence of environmental noise.

The purpose of a recent study by Carson et al11 was to examine the effect of noise on computer-based analysis of voice samples compared across three hardware/software combinations. The participants providing the voice production samples were 10 young women, who were recorded on a digital audiotape recorder. The samples were played back and mixed acoustically with computer fan noise at the following three SNRs: 25 dB, 20 dB, and 15 dB. The investigators used two hardware environments to resample the original recording and one software system (CspeechSP: Paul Milenkovic, Madison, WI) to analyze the data. The conclusions were limited to the suggestion that appropriate recording standards are needed to obtain valid and reliable results relative to voice production samples, recording processes, and analysis systems.

Purpose and Research Questions

The literature does not provide further detail on the relationship between noise and acoustic assessment of voice quality. The purpose of this study was to investigate the influence of noise on acoustic voice quality measurements while balancing for gender, age, intersubject and intrasubject variability, microphone type, computer hardware, analysis software, and type of noise. Level of noise was precisely controlled. The research questions of particular interest were as follows: What are the

Instrumentation

Prior to collecting the data, many components of instrumentation were compiled and tested. These components included the computers, sound cards, microphones, and voice quality measurement (VQM) software. The first two components, computers and sound cards, comprise the data acquisition (DA) systems. The five DA systems were as follows:

  • CSL: A desktop computer (Dell GX240; Dell Corporation, Round Rock, TX) with a Computerized Speech Lab Model 4400 by Kay Elemetrics Corporation.12

  • DT1: A desktop

Analysis

Unreliable pitch extraction produces invalid acoustic voice measurements. This experiment evaluated the effect of noise on accuracy of pitch extraction through assessing the noise-induced errors in Fo. It is a widely accepted practice to report pitch extraction accuracy with two types of errors, fine and gross. It is known that gross errors are caused by harmonic and subharmonic pitch classifications, whereas fine errors account for smaller changes. In order to accurately evaluate how noise

Analysis

Most commonly, reliability of acoustic voice analysis is evaluated through the extent to which intrasubject variability affects the repeatability of measurements taken over time.3, 6 In this experiment, the impact of noise on reliability was quantitatively compared with the impact of intrasubject variability.

The error caused by noise was measured as the absolute difference $\varepsilon_R = |p_i - \tilde{p}_i|$ between the clean values $p_i$ and the values affected by noise $\tilde{p}_i$, where $p$ can be any of the nine VQM parameters

Analysis

A direct assessment of the impact of noise on the clinical validity of VQM can be obtained for the parameters that have established normative data. From the nine parameters included in the study, two parameters, RAP and Shim, have normative thresholds available as part of the commercial MDVP system.14 These threshold values are $T_{RAP} = 0.68\%$ and $T_{Shim} = 3.81\%$. Each noise-mixed sample that met the conditions $\widetilde{RAP}_i > T_{RAP} \geq RAP_i$ or $\widetilde{Shim}_i > T_{Shim} \geq Shim_i$, where $RAP_i$ and $Shim_i$ corresponded to the clean

Analysis

The evaluation of technical reliability, comparative reliability, and classification validity was intended to provide data about the critical levels of noise that, if allowed in the environment, are expected to invalidate VQM. In order to establish environmental noise recommendations for voice laboratories, it is important to narrow the criteria to levels that assure accurate measurement. It is necessary to set a criterion of desirable performance and then to check how that performance compares

Analysis

The selection criterion for the current study was the negative history of vocal pathology, largely characterized with normal voice quality. Can it be assumed that the noise recommendations based on the normal population are also valid for the patients suffering from voice disorders?

Acoustic analysis measures voice quality through the extent to which parameters, such as jitter and shimmer, change. In this change, low and high perturbation values correspond to normal and abnormal voice

Conclusions

The results from this study strongly suggest that instrumental measurement of the noise present in the acoustic environment is imperative when acoustic analysis is used for clinical voice assessment. All noise-contributing factors should result in an acoustic environment that has an SNR of at least 30 dB to produce valid results. SNR of 42 dB is the recommended level that allows for voice measurements with an accuracy of at least 99%. Special precautions should be taken to eliminate sources of

Acknowledgments

The authors thank the University of South Carolina faculty, staff, students, and family members who volunteered for this study; the Department of Communication Sciences and Disorders for funding and supporting the project; Drs. Paul Milenkovic of the University of Wisconsin-Madison and Paul Boersma of the University of Amsterdam for contributing the TF32 and PRAAT software systems, respectively; and Drs. Eric Healy and Allen Montgomery for useful comments on earlier versions of this

