Year: 2023 | Volume: 13 | Issue: 1 | Page: 17-22
A mini-review of pathological voice recognition
Mohammad Ali Saghiri1, Chun Kai Tang2, Ali Mohammad Saghiri3, Elham Samadi4
1 Department of Restorative Dentistry, Rutgers School of Dental Medicine, Newark; Rutgers Biomedical Engineering Department, Rutgers University, Piscataway, NJ, USA
2 Department of Restorative Dentistry, Rutgers School of Dental Medicine, Newark, NJ, USA
3 Stevens Institute of Technology, Hoboken, NJ, USA
4 Dr. Hajar Afsar Lajevardi Dental Material and Devices Group, Hackensack, NJ, USA
Date of Submission: 05-Aug-2022
Date of Acceptance: 27-Sep-2022
Date of Web Publication: 25-Nov-2022
Dr. Mohammad Ali Saghiri
Director of Biomaterials Lab, Rutgers School of Dental Medicine, Newark, NJ 07103
Source of Support: None, Conflict of Interest: None
The aim of this review is to examine various pathological conditions that affect the voice and how voice features can be used in their diagnosis. An electronic search of PubMed and Google Scholar was performed for articles published between January 2000 and July 2022, using keywords from the Medical Subject Headings database and PubMed relating to diseases that affect the voice. The preliminary search identified 608 articles, of which 12 met the inclusion criteria set for this review. Voice analysis may prove to be a missing link in the study and early detection of diseases. Using multiple voice attributes to cross-reference and diagnose conditions has excellent potential to speed up the process and significantly improve the diagnosis and treatment of various diseases.
Keywords: Disease, dysphonia, hoarseness, voice, voice recognition
How to cite this article:
Saghiri MA, Tang CK, Saghiri AM, Samadi E. A mini-review of pathological voice recognition. Adv Hum Biol 2023;13:17-22
Introduction
Voice characteristics can serve as tell-tale signs in the diagnosis of a specific disorder. Disorders may share similar vocal signs or show entirely different ones, and this information can help physicians prescribe effective treatment. Systemic disorders such as Parkinson's disease, lupus and immune dysfunctions can cause voice impairments such as hoarseness. By attending to voice changes, physicians may not only diagnose and treat individuals better but also help identify diseases that have not previously been linked to voice impairment.
In many pathological conditions, infection and inflammation of the pharynx, larynx, salivary glands, airway and respiratory tract, as well as changes in lung volume, can further increase the risk of voice disorders. Voice disorders can be defined as body lesions that cause voice loss. In dysphonia, the voice becomes hoarse, rough and weak. [Figure 1] indicates the role of maxillofacial structures in speech.
Figure 1: Pathological conditions significantly affect the quality of speech. (a) Role of teeth in speech, (b) location of salivary glands, (c) pronunciation of 'S' in 'SEA', indicating the role of saliva as a lubricant, (d) split model, indicating that dry mouth would irritate the tonsils.
Voice profiles differ in tone, frequency, amplitude, pitch and loudness, which together characterise the natural sound produced by a human. According to Laver, the phonetic description of voice quality reflects the speaker's anatomy and physiology, which operate within a specific range to produce voice. Voice is produced by a series of muscles and tissues along the vocal tract that work together to create movement and vibration. Furthermore, the vibrating tissues that produce voice can be affected by an infection that originates at another site, migrates to the laryngopharyngeal space and alters the surrounding tissues.
Current voice recognition technology measures the frequency (Hz) and loudness (dB) of speech over time using the long-term average spectrum (LTAS). Averaging the spectrum of the voice helps characterise tone and pitch, which reflect the health of the vocal cords and vocal tract muscles. Besides LTAS, newer technology uses algorithms that combine the frequency and loudness of the voice to assess emotional state in conditions such as post-traumatic stress disorder, bipolar disorder and Parkinson's disease.
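As a rough illustration of the LTAS idea, the following sketch averages short-time power spectra over a whole signal. It is a minimal pure-Python example under stated assumptions: the function names, the 64-sample frame, the Hann window and the synthetic 1 kHz tone are illustrative choices that do not come from the reviewed studies, and the levels are relative dB rather than calibrated sound pressure levels.

```python
import cmath
import math


def frame_power_spectrum(frame):
    """Power in each non-negative frequency bin of one Hann-windowed frame (naive DFT)."""
    n = len(frame)
    windowed = [
        s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    powers = []
    for k in range(n // 2 + 1):
        acc = sum(
            windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        powers.append(abs(acc) ** 2)
    return powers


def ltas_db(signal, sample_rate, frame_len=64, hop=32):
    """Average the per-frame power spectra over the recording and convert to dB."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = [signal[i:i + frame_len] for i in starts]
    n_bins = frame_len // 2 + 1
    mean_power = [0.0] * n_bins
    for frame in frames:
        for k, p in enumerate(frame_power_spectrum(frame)):
            mean_power[k] += p / len(frames)
    freqs = [k * sample_rate / frame_len for k in range(n_bins)]
    # Relative dB only; the small constant avoids log10(0) for empty bins.
    levels = [10 * math.log10(p + 1e-12) for p in mean_power]
    return freqs, levels


# Hypothetical test signal: a pure 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(2048)]
freqs, levels = ltas_db(tone, sample_rate)
peak_hz = freqs[levels.index(max(levels))]
print(f"strongest LTAS bin: {peak_hz} Hz")
```

A real recording would normally be analysed with longer frames and a fast Fourier transform; the naive transform is used here only to keep the sketch dependency-free.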
Voice recognition for the detection of various pathological manifestations has recently been introduced into modern medicine, aiding the diagnosis and treatment of common diseases. However, very little is known about using the voice as a diagnostic tool in the dental field. We hope that voice recognition software can help revolutionise dentistry by detecting dental diseases, for example by relieving patients of the many ailments caused by periodontal disease. This review aims to examine voice recognition of various sound parameters to help clinicians find a new way to categorise diseases.
Materials and Methods
Purpose of review
The purpose of this study is to review different voice patterns and characteristics affected by different diseases and disorders. We will examine different sounds in the detection of diseases, helping clinicians determine the relationship between voice alteration and disease.
Inclusion and exclusion criteria
Inclusion criteria were as follows: (1) Articles with keywords voice and disease. (2) Articles published in the English language. (3) Articles published between January 2000 and July 2022. (4) Articles that mention how disease affects voice profile.
Exclusion criteria were as follows: (1) Articles that do not study voice changes. (2) Articles that are not published in the English language. (3) Articles that study only the voice profile and not how disease affects it.
Electronic searches were performed in PubMed and Google Scholar. Keywords from the Medical Subject Headings database regarding 'voice recognition' and 'diseases' were used to search on PubMed.
In the electronic search for scientific publications through PubMed and Google Scholar, keywords were used in various combinations with 'voice'. The keywords included 'disease affecting voice', 'disorder affecting voice' and 'voice pathology'. The complete lists of articles and references were evaluated and sorted based on our inclusion/exclusion criteria.
Results
By applying the chosen keywords, the initial search returned 608 articles, of which only 12 were selected for this review [Table 1]. The chosen articles described alterations in the voice profile caused by diseases known to affect the vocal cords and vocal tract. Other diseases showed signs of dysphonia, but the relationship remains controversial because dysphonia is generally caused by vocal cord inflammation rather than being directly attributable to a specific illness. Many voice disorders involve inflammation of the vocal tract or changes in the vocal muscles driven by the individual's mood. Furthermore, some studies explained how the voice profile of the diseased population differs from that of a healthy population. This article reviews the voice profile in detail to compare the effects of diseases.
Table 1: List of diseases in studies that use voice recognition to diagnose patients
Discussion
In general, infection and inflammation of vocal tract tissues and organs such as the pharynx, larynx, trachea and oesophagus can alter the voice profile significantly compared with a healthy population. Voice is a unique sound whose tone, pitch, amplitude and frequency convey an individual's thoughts. Recognising small alterations in voice can therefore help determine whether an individual is sick. Specific voice alterations can be related to vocal cord inflammation triggered by the immune system when disease occurs. For example, laryngitis, a swelling of the voice box, can be related to many different bacterial infections, fungal infections, allergies and systemic diseases.
The elderly population exhibits a higher prevalence of vocal disorders due to faster cell death and a slower immune response compared with young adults and children. As voice impairments occur, emotional and psychological conditions change, with increased anxiety. Among the elderly, individuals who have experienced oesophageal reflux, severe neck or back injury and chronic pain show a higher rate of voice disorders than those with other conditions. Diseases such as arthritis and bronchitis also carry an elevated risk of voice disorders in this population. This is likely because lower cellular response rates in the elderly mean that physical injuries and medical conditions heal slowly and are more likely to lead to secondary symptoms such as voice disorders. Voice disorders can therefore give clinicians a starting point for discovering the diseases that might have impaired the voice.
Bipolar disorder
Bipolar disorder (BD) is a mental health disorder that causes extreme mood changes ranging from depression to mania. Symptoms of BD include sudden emotional change, causing an individual to suffer from depressive feelings or become euphoric, usually resulting in mania. Because of this emotional instability, the voices of BD patients differ from those of healthy people when reading a neutral piece of text.
In a voice quality study, Guidi et al. detected frequency differences in the voices of bipolar patients. However, because of the relatively small sample of BD patients and their unstable mood state, the results cannot be generalised to conclude that BD patients have a reliable vocal tell-tale sign for recognising the disorder. Nevertheless, the study provides a fundamental method for analysing voice in different mood states, which can be useful for determining voice quality in other psychological disorders and in larger-scale studies of BD patients.
End-stage renal disease
End-stage renal disease (ESRD) is the final stage of chronic kidney disease and is marked by chronic loss of kidney function. At this stage, the kidneys function abnormally and cannot effectively remove waste and toxins from the body. Patients with ESRD suffer from nausea, vomiting, loss of appetite, cramps, hypertension and swollen ankles. In addition, ESRD affects breathing because fluid that is not excreted by the kidneys builds up in the lungs. Fluid in the lungs significantly reduces breathing volume, further altering the sound of the patient's voice.
A clinical study of haemodialysis, a treatment for ESRD, indicates that it is effective at removing toxins. However, 3–5 weeks after haemodialysis treatment, patients reported concerns about hoarse voices and shortness of breath. ESRD patients receiving haemodialysis showed an increase in voice pitch and frequency and a decrease in the noise-to-harmonics ratio (NHR) and maximal phonation time. NHR is an indicator of the degree of hoarseness and reflects laryngeal efficiency. As NHR increases, hoarseness gradually worsens and can slowly lead to loss of voice. In the study, NHR decreased in patients who reported no voice change, whereas patients who reported a voice change showed no improvement in hoarseness.
Although the study above does not relate ESRD directly to voice disorders, the haemodialysis used to treat ESRD can still affect the voice. For clinicians, hoarseness in a patient with ESRD might therefore point to haemodialysis as the cause, helping to distinguish it from other causes of this voice condition.
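To make the role of NHR more concrete, the sketch below estimates a noise-to-harmonics style ratio from the autocorrelation of a signal at one pitch period. This is a simplified stand-in and not the NHR definition used by clinical voice analysis software or by the study above: the function names, the assumption that the pitch period is already known, and the synthetic "clear" and "hoarse" signals are all illustrative.

```python
import math
import random


def noise_to_harmonics(samples, period):
    """Estimate a noise-to-harmonics ratio from the autocorrelation at one pitch period.

    `period` is the pitch period in samples and is assumed to be known in advance.
    """
    head = samples[:-period]
    tail = samples[period:]
    num = sum(a * b for a, b in zip(head, tail))
    den = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
    r = num / den  # close to 1.0 for a strongly periodic signal
    r = min(max(r, 1e-9), 1.0 - 1e-9)  # clamp so the ratio stays finite
    return (1.0 - r) / r


def synthetic_voice(noise_sd, n=4000, period=80, seed=0):
    """Hypothetical test signal: a sine 'voice' plus Gaussian noise of the given spread."""
    rng = random.Random(seed)
    return [
        math.sin(2 * math.pi * t / period) + rng.gauss(0.0, noise_sd)
        for t in range(n)
    ]


clear_nhr = noise_to_harmonics(synthetic_voice(0.05), 80)
hoarse_nhr = noise_to_harmonics(synthetic_voice(0.5), 80)
print(f"clear: {clear_nhr:.3f}, hoarse: {hoarse_nhr:.3f}")
```

In this toy setting the noisier signal yields the larger ratio, which matches the reading that a higher NHR goes with a hoarser voice.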
Laryngopharyngeal reflux disease
Laryngopharyngeal reflux (LPR) disease is inflammation caused by stomach acid travelling up the oesophagus into the throat, creating an acid burn. Stomach acid has a relatively low pH and damages the muscles and tissues of the laryngopharyngeal tract that produce the vibrations of the voice.
Lechien et al. studied the voice quality of patients with LPR disease, measuring the reflux finding score and the reflux symptom index. Features of dysphonia, including roughness, breathiness, asthenia, strain and instability, were scored for 80 patients. The results showed that 85% of the patients exhibited hoarseness, which illustrates that dysphonia is clearly related to LPR, with voice impairment as a primary concern. Although a relationship between LPR and vocal fold oedema or other vocal cord abnormalities was not observed in their study, the authors suggest that a larger study testing different reflux acid profiles could support a direct relationship between hoarseness and LPR. The effects of reflux in the laryngopharyngeal tract differed according to the acids formed in the gastroesophageal tract. Therefore, the way changes in gastric acid affect voice quality offers a new avenue for examining the true cause of hoarseness in LPR.
Mitochondrial disease
Mitochondrial disease is caused by metabolic dysfunction in the mitochondria, preventing them from producing sufficient energy for the cell. Insufficient energy production affects the normal function of human organs and tissues. Abnormal cellular function in the vocal cord muscles can lead to voice impairment, voice disorders and hoarseness.
Mitochondrial diseases are inherited through the maternal genome, so all children inherit the phenotype if the mother carries a mutation in mitochondrial DNA. A study by Read et al. showed that a particular point mutation in a mitochondrial gene could cause weakness in the vocal cord muscles, resulting in slurred speech. Among the 177 patients with mitochondrial disease assessed for voice and swallowing function, those with the mitochondrial DNA m.8344A>G point mutation reported a significantly higher degree of physical voice handicap. This implies that certain point mutations in mitochondrial DNA can lead to greater muscle weakness that contributes to hoarseness.
Parkinson's disease
Parkinson's disease is an irreversible degeneration of neurons that gradually impairs nerve function over time. Damage to the nervous system can produce symptoms such as tremor, rigid muscles, loss of movement, speech changes and writing changes. Parkinson's disease is more prevalent in the elderly population because of the natural death of neurons, which then produce less dopamine than the brain needs to function correctly. The cause of Parkinson's disease is currently unknown. It could be inherited, usually through a mutation, or triggered by environmental factors such as toxins. Alterations in the brain such as the formation of Lewy bodies, abnormal protein aggregates inside neurons that cannot be broken down, can interfere with neurotransmission. Regular aerobic exercise and mental activity have been suggested as potential preventive measures for Parkinson's disease.
Gibbins et al. studied voice problems and related voice abnormalities in patients with Parkinson's disease. Voice problems are an early indication of Parkinson's disease and usually present as a soft, breathy, hoarse and monotone voice. Because of a reduction in neurotransmitters (acetylcholine and dopamine) at the respiratory and laryngeal muscles, the voice-producing mechanism was poorly controlled and displayed numerous symptoms. Gibbins et al. also pointed out the effects of Parkinson's disease medications, such as anticholinergics and monoamine oxidase-B (MAO-B) inhibitors, which have the well-known side effect of drying the vocal tract and may thus correspond to physical laryngeal changes. As the larynx alters, the voice turns hoarse and quieter. The progression of Parkinson's disease may worsen pre-existing laryngeal pathology, causing further significant voice loss. Gibbins et al. therefore proposed Lee Silverman Voice Treatment Loud therapy to exercise the voice and achieve louder speech in patients with Parkinson's disease. After 4 weeks of loudness training, the loudness of the voice increased, and the muscular function of swallowing and facial expression also improved.
In their study of voice abnormalities in Parkinson's disease, Midi et al. used the Unified Parkinson's Disease Rating Scale to determine motor function and the Grade of Dysphonia, Roughness, Breathiness, Asthenia and Strain (GRBAS) scale to analyse the voice in detail. Owing to the small sample size, no significant relationship was found between vocal cord motor function and the voice parameters from the GRBAS scale.
Voice recognition studies that use vocal fold sounds to discriminate between the voice profiles of healthy and pathological populations have shown significant success. Using Mel frequency cepstral coefficients (MFCC) to determine the voice profile in Parkinson's disease made processing the data easier and faster in computation time. Pravena et al. extracted voice parameters from vocal fold diseases such as Parkinson's disease and cross-referenced them in a mixture model. They then used MFCC features of the voice data to pinpoint specific spectral differences that indicate abnormalities caused by vocal fold disease. This prototype project by Pravena et al. can be applied to a larger-scale voice disorder study and help determine unknown voice pathologies.
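The "Mel" in MFCC refers to a perceptual frequency scale. The sketch below shows only that frequency-warping step, using the commonly quoted conversion mel = 2595 log10(1 + f/700). It is a hedged illustration and not the pipeline of Pravena et al.: the function names, the number of filters and the 0 to 4000 Hz range are assumptions, and a full MFCC computation would go on to apply triangular filters, a logarithm and a discrete cosine transform.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (common 2595 * log10(1 + f/700) form)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min, f_max):
    """Centre frequencies (Hz) of filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


# Illustrative settings only: 10 filters between 0 and 4000 Hz.
centers = mel_center_frequencies(10, 0.0, 4000.0)
print([round(c) for c in centers])
```

The centres crowd together at low frequencies and spread out at high frequencies, which is why MFCC features give finer resolution in the low-frequency range.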
Another voice recognition tool used data mining techniques to analyse voice data from suspected patients and predict the occurrence of Parkinson's disease. Sriram et al. used two data mining algorithms, naive Bayes and Bayesian networks, following these steps:
Step 1: Input voice data.
Step 2: Convert the data into a comma-separated values file.
Step 3: Generate the average of attributes such as frequency, modulation and phase.
Step 4: Use the query file to repeat steps 1, 2 and 3.
Step 5: Compare the tested data in the file.
Step 6: Make a prediction on Parkinson's disease based on equation P (x) = P (x/) P (x) + P (y) (X = Match With Tested, Y = Miss-Match With Non-Tested).
Step 7: Calculate the probable occurrence of Parkinson's disease from the result.
Step 8: Formulate the final result in a chart.
Using this method, Sriram et al. examined 100 suspected patients with Parkinson's disease and obtained a successful detection rate of 83%. In conclusion, the voice detection algorithm was able to detect Parkinson's disease, and no other special instrument is required for further analysis.
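To show what a naive Bayes classifier on averaged voice attributes might look like, here is a minimal Gaussian naive Bayes sketch in pure Python. It does not reproduce the tool of Sriram et al. or the equation in Step 6: the two features (mean pitch in Hz and a jitter percentage), the class labels and every number in the toy training set are invented purely for illustration.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(rows, labels):
    """Store the class prior and per-feature mean and variance for each label."""
    model = {}
    for label in set(labels):
        group = [r for r, l in zip(rows, labels) if l == label]
        columns = list(zip(*group))
        model[label] = {
            "prior": len(group) / len(rows),
            "means": [mean(c) for c in columns],
            # Small constant keeps the variance strictly positive.
            "vars": [pvariance(c) + 1e-9 for c in columns],
        }
    return model


def log_gaussian(x, mu, var):
    """Log density of a normal distribution with mean mu and variance var."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def predict_proba(model, row):
    """Posterior probability of each label, treating features as independent."""
    log_post = {
        label: math.log(p["prior"])
        + sum(log_gaussian(x, m, v) for x, m, v in zip(row, p["means"], p["vars"]))
        for label, p in model.items()
    }
    top = max(log_post.values())  # subtract the maximum for numerical stability
    unnorm = {label: math.exp(v - top) for label, v in log_post.items()}
    total = sum(unnorm.values())
    return {label: u / total for label, u in unnorm.items()}


# Made-up toy data: (mean pitch in Hz, jitter in %). Not from any study.
train_rows = [(120.0, 0.40), (125.0, 0.50), (118.0, 0.45),
              (150.0, 1.20), (155.0, 1.40), (148.0, 1.10)]
train_labels = ["control"] * 3 + ["suspected"] * 3

model = fit_gaussian_nb(train_rows, train_labels)
probs = predict_proba(model, (152.0, 1.30))
print(max(probs, key=probs.get))
```

With so few invented samples the output has no clinical meaning; the sketch only shows how per-class averages of voice attributes can be turned into a probability for a new recording.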
Pertussis
Pertussis, also known as whooping cough, is caused by infection with the bacterium Bordetella pertussis. Early symptoms resemble a cold, with mild cough and fever. As pertussis progresses in the lungs of infants, children and adults, uncontrollable coughing and difficulty breathing can develop, which can become lethal if not relieved in time. Pertussis symptoms can last from days to weeks, and recovery is slow. However, according to the Centers for Disease Control and Prevention, pertussis coughing can return along with other respiratory infections. Because of the bouts of coughing, the voice of individuals infected with pertussis differs significantly from that of the healthy population. A notable sign of pertussis is the 'whoop' sound created when air is inhaled into the lungs. This specific 'whoop' sound is therefore used to diagnose coughing patients who are infected with Bordetella pertussis.
Parker et al. used voice recognition to distinguish pertussis coughing from other types of coughing and showed that pertussis produces a cough sound with sharper spikes and relatively longer quiet intervals between coughs. With their algorithm, Parker et al. were able to categorise pertussis coughing with a 90% success rate. However, because of the small sample size, they described their results as a potential pilot project that requires more data in the future to reach a higher confidence level. In conclusion, the voice recognition algorithm for pertussis cough was a successful approach that gives clinicians a tool for more precise diagnosis.
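The description of sharp cough spikes separated by quiet intervals suggests a simple feature: the length of the silence between loud bursts. The sketch below measures that on a synthetic amplitude sequence. It is not the algorithm of Parker et al.: the function name, the amplitude threshold, the minimum gap and the test signal are all hypothetical.

```python
def burst_intervals(samples, sample_rate, threshold=0.2, min_gap_s=0.05):
    """Find loud bursts and the quiet gaps between them.

    Returns (spans, gaps): spans are (start_s, end_s) of each burst and
    gaps are the quiet durations in seconds between consecutive bursts.
    """
    min_gap = int(min_gap_s * sample_rate)
    bursts = []
    start = None
    last_loud = None
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i  # a new burst begins
            last_loud = i
        elif start is not None and i - last_loud >= min_gap:
            bursts.append((start, last_loud))  # quiet long enough: close the burst
            start = None
    if start is not None:
        bursts.append((start, last_loud))
    spans = [(a / sample_rate, b / sample_rate) for a, b in bursts]
    gaps = [spans[i + 1][0] - spans[i][1] for i in range(len(spans) - 1)]
    return spans, gaps


# Hypothetical signal at 1 kHz: two 50 ms bursts separated by 300 ms of silence.
signal = [0.0] * 100 + [1.0] * 50 + [0.0] * 300 + [1.0] * 50 + [0.0] * 100
spans, gaps = burst_intervals(signal, sample_rate=1000)
print(spans, gaps)
```

Longer gaps between short, intense bursts would then be one candidate feature for separating cough types, alongside the spectral features a real system would need.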
Sjogren's syndrome
Sjogren's syndrome (SS) is an autoimmune disease in which the immune system attacks the body's own cells and tissues. The main symptoms of SS are dry mouth and dry eyes. In addition, antibodies produced by the immune system attack the glandular tissue of the laryngopharynx, salivary glands and seromucous glands. Because of this attack on the glands, individuals with SS may produce less mucus, which causes vocal cord lesions and further affects the sound of the voice.
In a voice quality study conducted by Ogut et al., SS patients exhibited a higher reflux rate in the laryngopharyngeal area. This may be caused by SS treatment medications such as salicylic acid (27%), oral steroid (23%) and kinin (62%). The results further indicated that these medications decrease prostaglandin synthesis, increase stomach acid formation and increase muscle relaxation, which can lead to laryngopharyngeal reflux that affects the vocal tract. When observing the voice profile of SS patients, Ogut et al. found higher values of pitch over a period and amplitude (peak to peak) in SS patients taking their medication. SS medications cause vocal fold oedema, which affects the voice-producing mechanism.
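Cycle-to-cycle variation in pitch period and in peak-to-peak amplitude is commonly summarised by perturbation measures often called local jitter and local shimmer. The sketch below computes both with one helper, assuming the periods and amplitudes of successive glottal cycles have already been extracted. It is an illustration of the general idea only; the exact parameters reported by Ogut et al. may be defined differently, and the sample values are invented.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive cycles, as a percentage of the mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(values) / len(values))


# Invented cycle measurements: pitch periods in seconds, peak-to-peak amplitudes.
periods = [0.010, 0.011, 0.010, 0.011]
amplitudes = [1.00, 0.90, 1.00, 0.90]

jitter_pct = local_perturbation(periods)      # period perturbation
shimmer_pct = local_perturbation(amplitudes)  # amplitude perturbation
steady_pct = local_perturbation([0.010] * 5)  # perfectly regular cycles
print(f"jitter {jitter_pct:.2f}%, shimmer {shimmer_pct:.2f}%")
```

Higher values indicate a less regular vibration of the vocal folds, which is the kind of change the text attributes to vocal fold oedema.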
Conclusion
Voice recognition applied to the diagnosis of disease was able to provide substantial information for distinguishing healthy individuals from patients. In cases with inflammation of the vocal tract, the voice was more likely to become hoarse and low in pitch and tone. As the studies pointed out, some diseases have not been thoroughly examined in relation to the cause of dysphonia and hoarseness. However, diseases that trigger inflammation or are induced by gene mutation can further affect the motor function of the vocal cords. Some successful voice recognition methods for detecting diseases such as bipolar disorder, ESRD, mitochondrial disease, Parkinson's disease and Sjogren's syndrome can be applied to larger-scale studies. Notably, in Parkinson's disease, voice abnormalities are early symptoms. Voice recognition methods that predict disease status using different algorithmic models were able to obtain successful detection rates. Further use of voice recognition to detect disease requires larger-scale studies and more data to improve the detection rate of the algorithms.
MAS is a recipient of the Denbur Tech, New Jersey Health Foundation, and Tech Advance Awards. This publication is dedicated to the memory of Dr. H. Afsar Lajevardi, (Saghiri, M.A. and A.M. Saghiri, In Memoriam: Dr. Hajar Afsar Lajevardi MD, MSc, MS (1955-2015). Iranian Journal of Pediatrics, 2017. 27(1): p. 1.) a legendary pediatrician (1953-2015) who passed away. We will never forget Dr. H Afsar Lajevardi's kindness and support. The views expressed in this paper are those of the authors and do not necessarily reflect the views or policies of the affiliated organizations. The authors hereby announce that they have active cooperation in this scientific study and preparation of the present manuscript. The authors confirm that they have no financial involvement with any commercial company or organization with direct financial interest regarding the materials used in this study.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
References
Laver J. The Phonetic Description of Voice Quality. Vol. 31. Cambridge Studies in Linguistics London; 1980. p. 1-186.
Löfqvist A, Mandersson B. Long-time average spectrum of speech and voice analysis. Folia Phoniatr (Basel) 1987;39:221-9.
Roy N, Stemple J, Merrill RM, Thomas L. Epidemiology of voice disorders in the elderly: Preliminary findings. Laryngoscope 2007;117:628-33.
Guidi A, Schoentgen J, Bertschy G, Gentili C, Landini L, Scilingo EP, et al. Voice quality in patients suffering from bipolar disease. Annu Int Conf IEEE Eng Med Biol Soc 2015;2015:6106-9.
Jung SY, Ryu JH, Park HS, Chung SM, Ryu DR, Kim HS. Voice change in end-stage renal disease patients after hemodialysis: Correlation of subjective hoarseness and objective acoustic parameters. J Voice 2014;28:226-30.
Lechien JR, Khalife M, Huet K, Finck C, Bousard L, Delvaux V, et al. Perceptual, aerodynamic, and acoustic characteristics of voice changes in patients with laryngopharyngeal reflux disease. Ear Nose Throat J 2019;98:E44-50.
Read JL, Whittaker RG, Miller N, Clark S, Taylor R, McFarland R, et al. Prevalence and severity of voice and swallowing difficulties in mitochondrial disease. Int J Lang Commun Disord 2012;47:106-11.
Gibbins N, Awad R, Harris S, Aymat A. The diagnosis, clinical findings and treatment options for Parkinson's disease patients attending a tertiary referral voice clinic. J Laryngol Otol 2017;131:357-62.
Midi I, Dogan M, Koseoglu M, Can G, Sehitoglu MA, Gunal DI. Voice abnormalities and their relation with motor dysfunction in Parkinson's disease. Acta Neurol Scand 2008;117:26-34.
Pravena D, Dhivya S, Devi AD. Pathological voice recognition for vocal fold disease. Int J Comp Appl 2012;47:31-7.
Sriram T, Rao M, Narayana GV, Kaladhar DS. ParkDiag: A tool to predict Parkinson disease using data mining techniques from voice data. Int J Eng Trend Technol 2016;31:136-40.
Parker D, Picone J, Harati A, Lu S, Jenkyns MH, Polgreen PM. Detecting paroxysmal coughing from pertussis cases using voice recognition technology. PLoS One 2013;8:e82971.
Mahoney EJ, Spiegel JH. Sjögren's disease. Otolaryngol Clin North Am 2003;36:733-45.
Ogut F, Midilli R, Oder G, Engin EZ, Karci B, Kabasakal Y. Laryngeal findings and voice quality in Sjögren's syndrome. Auris Nasus Larynx 2005;32:375-80.
Saltürk Z, Özdemir E, Kumral TL, Karabacakoğlu Z, Kumral E, Yildiz HE, et al. Subjective and objective voice evaluation in Sjögren's syndrome. Logoped Phoniatr Vocol 2017;42:9-11.