Hideki Kawahara, Emeritus Professor, Wakayama University (tentatively under construction due to host migration)
Updated: Sat May 25 21:16:54 JST 2024
- All functionality of STRAIGHT (legacy and TANDEM), including generalized voice morphing, is available as a set of GUI tools based on the WORLD vocoder.
Visit worldGUItools and the YouTube channel playlist for WORLD GUI Tools.
- Kawahara, H., & Morise, M. (2024). Interactive tools for making vocoder-based signal processing accessible: Flexible manipulation of speech attributes for explorational research and education.
Acoustical Science and Technology, 45(1), 48–51. (DOI: 10.1250/ast.e23.52)
(Link to PDF)
- Kawahara, H., & Morise, M. (2024). Interactive tools for making temporally variable, multiple-attributes, and multiple-instances morphing accessible: Flexible manipulation of divergent speech instances for explorational research and education. arXiv preprint arXiv:2404.13418. (Link to arXiv) (Accepted for Acoustical Science and Technology)
I am a tool builder hoping to make useful tools to promote understanding of human speech communication and encourage collaborations between researchers and developers. I would appreciate your suggestions to produce other attractive, beneficial tools.
- Presented an objective evaluation tool for pitch extractors' responses to frequency-modulated fundamental frequency at Interspeech 2022
Kawahara, Hideki, Kohei Yatabe, Ken-Ichi Sakakibara, Tatsuya Kitamura, Hideki Banno, and Masanori Morise. "An objective test tool for pitch extractors' response attributes." arXiv preprint arXiv:2204.00902 (2022).
(link: arXiv PDF, visualization movie of 16 pitch extractors' modulation transfer functions)
- Published an acoustic measurement method using actual contents such as music
Hideki Kawahara, Kohei Yatabe, Safeguarding test signals for acoustic measurement using arbitrary sounds: Measuring impulse response by playing music, Acoustical Science and Technology, 43(3), pp. 209-212, 2022. (Journal link, GitHub link, and Tutorial video)
- Presented at APSIPA ASC 2021 (Tokyo and hybrid): Hideki Kawahara, Toshie Matsui, Kohei Yatabe, Ken-Ichi Sakakibara, Minoru Tsuzaki, Masanori Morise, Toshio Irino (2021) Implementation of interactive tools for investigating fundamental frequency response of voiced sounds to auditory stimulation, Proc. APSIPA ASC 2021, pp.897-903
- Presented two articles at Interspeech2021
- Kawahara, H., Matsui, T., Yatabe, K., Sakakibara, K.-I., Tsuzaki, M., Morise, M., Irino, T. (2021) Mixture of Orthogonal Sequences Made from Extended Time-Stretched Pulses Enables Measurement of Involuntary Voice Fundamental Frequency Response to Pitch Perturbation. Proc. Interspeech 2021, 3206-3210, (doi: 10.21437/Interspeech.2021-2073)
- Kawahara, H., Yatabe, K., Sakakibara, K.-I., Mizumachi, M., Morise, M., Banno, H., Irino, T. (2021) Interactive and Real-Time Acoustic Measurement Tools for Speech Data Acquisition and Presentation: Application of an Extended Member of Time Stretched Pulses. Show and Tell in Proc. Interspeech 2021, 4853-4854, (Archive with a link to PDF)
- Presented at ICASSP2021: Kawahara, Hideki, and Kohei Yatabe. "Cascaded all-pass filters with randomized center frequencies and phase polarity for acoustic and speech measurement and data augmentation." ICASSP2021, pp. 306-310, doi: 10.1109/ICASSP39728.2021.9415057.
(tentative link to arXiv)
- Presented at APSIPA2020: Kawahara, Hideki, Ken-Ichi Sakakibara, Mitsunori Mizumachi, Masanori Morise, and Hideki Banno. "Simultaneous measurement of time-invariant linear and nonlinear, and random and extra responses using frequency domain variant of velvet noise." In 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 174-183. IEEE, 2020.
(tentative link to arXiv)
- Presented: Hideki Kawahara, "Strange aftereffect caused by periodic allocation of a frequency domain variant of velvet noise," Auditory Research Meeting of the Acoustical Society of Japan, H-2019-108, 49(8), pp.591-595.
(PDF, Presentation (PDF)) (14/Dec./2019)
- Two talks presented at APSIPA2019: (18-21, Nov., 2019)
- Hideki Kawahara, Ken-Ichi Sakakibara, Eri Haneishi, Kaori Hagiwara, "Real-time and Interactive Tools for Vocal Training Based on an Analytic Signal with a Cosine Series Envelope," Proceedings of APSIPA Annual Summit and Conference 2019, Lanzhou, China, pp.907-910, 18-21 November 2019.
(tentative link to arXiv)
- Hideki Kawahara, Ken-Ichi Sakakibara, Mitsunori Mizumachi, Hideki Banno, Masanori Morise, Toshio Irino, "Frequency Domain Variant of Velvet Noise and Its Application to Acoustic Measurements," Proceedings of APSIPA Annual Summit and Conference 2019, Lanzhou, China, pp.1523-1532, 18-21 November 2019.
(tentative link to arXiv)
- Presented at Interspeech 2019: Terasawa, H., Wakasa, K., Kawahara, H., Sakakibara, K. (2019) Investigating the Physiological and Acoustic Contrasts Between Choral and Operatic Singing. Proc. Interspeech 2019, 2025-2029, DOI: 10.21437/Interspeech.2019-1864.
(Link to ISCA archive) (15-19, Sept., 2019)
- Awarded the Distinguished Contribution to Acoustics Medal by
the Acoustical Society of Japan. (24, May, 2019)
- Presented an ARC/IEEE SPS NZ seminar talk at Auckland University, New Zealand. (22, February, 2019)
(Presentation and supporting materials)
- A chapter on voice morphing was published in The Oxford Handbook of Voice Perception. (06, December, 2018)
(Link to the publisher's page)
- Hideki Kawahara, Yannis Agiomyrgiannakis, Heiga Zen:
Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis,
Proc. SSW9, (2016). (link to arXiv)
(This is my work at Google UK. This also is the starting point of my new VOCODER research.)
- Hideki Kawahara, Ken-Ichi Sakakibara, Hideki Banno, Masanori Morise, Tomoki Toda, Toshio Irino:
Aliasing-free implementation of discrete-time glottal source models and their applications to speech synthesis and F0 extractor evaluation,
Proc. 2015 APSIPA ASC, 520-529, (2015). (link to PDF)
(This is useful. However, our Interspeech 2017 supersedes this.)
- Hideki Kawahara:
Temporally Variable Multi attribute Morphing of Arbitrarily Many Voices for Exploratory Research of Speech Prosody,
in Speech Prosody in Speech Synthesis: Modeling and generation of prosody for high quality and flexible speech synthesis,
(eds. Keikichi Hirose and Jianhua Tao)
Springer Berlin Heidelberg, pp.109-120, 2015.
(DOI:10.1007/978-3-662-45258-5_8 ).
This is a good introduction to a powerful extended morphing.
- Hideki Kawahara, Masanori Morise, Ryuichi Nisimura and Toshio Irino:
HIGHER ORDER WAVEFORM SYMMETRY MEASURE AND ITS APPLICATION TO PERIODICITY DETECTORS FOR SPEECH AND SINGING WITH FINE TEMPORAL RESOLUTION,
ICASSP2013, Vancouver Canada, 26-31 May, 2013. (Accepted) (30/May/2013).
- Hideki Kawahara, Masanori Morise, Ryuichi Nisimura and Toshio Irino:
An interference-free representation of group delay
for periodic signals,
Proc. APSIPA, 3-6 December, OS.17-SLA 8, 2012, California, USA. (4/Dec./2012)
- Hideki Kawahara, Masanori Morise, Ryuichi Nisimura, Toshio Irino:
Deviation measure of waveform symmetry and its
application to high-speed and temporally-fine F0
extraction for vocal sound texture manipulation,
Interspeech2012, 2012. (10/Sept./2012)
- Hideki Kawahara and Masanori Morise,
Technical foundations of TANDEM-STRAIGHT, a speech analysis, modification and synthesis framework,
SADHANA - Academy Proceedings in Engineering Sciences, Vol.36, Part 5, pp.713-722, 2011.
- Hideki Kawahara, Toshio Irino and Masanori Morise,
An interference-free representation of instantaneous frequency of periodic signals and its application to F0 extraction,
Proc. ICASSP 2011, May 2011. (doi:10.1109/ICASSP.2011.5947584 )
- Laetitia Bruckert, Patricia Bestelmeyer, Marianne Latinus, Julien Rouger, Ian Charest, Guillaume A. Rousselet, Hideki Kawahara, Pascal Belin, Vocal Attractiveness Increases by Averaging, Current Biology, Volume 20, Issue 2, 116-120 (26 January 2010)
DOI: 10.1016/j.cub.2009.11.034
- Romi Zäske, Stefan R. Schweinberger, Jürgen M. Kaufmann, Hideki Kawahara:
In the ear of the beholder: neural correlates of adaptation to voice gender,
European Journal of Neuroscience, Vol.30, No.3, pp.527-534 (August 2009)
DOI: 10.1111/j.1460-9568.2009.06839.x
- Osamu Fujimura, Kiyoshi Honda, Hideki Kawahara, Yasuyuki Konparu, Masanori Morise and J.C. Williams, Noh Voice Quality, J. Logopedics Phoniatrics Vocology,34(4), 157-170 (04 June 2009)
DOI: 10.1080/14015430903002288
- H. Kawahara, R. Nisimura, T. Irino, M. Morise, T. Takahashi, H. Banno, Temporally variable multi-aspect auditory morphing enabling extrapolation without objective and perceptual breakdown, Proc. ICASSP, Taipei, Taiwan, 19-24 (2009).
DOI: 10.1109/ICASSP.2009.4960481
- Stefan R. Schweinberger, Christoph Casper, Nadine Hauthal, Juergen M. Kaufmann, Hideki Kawahara, Nadine Kloth, David M.C. Robertson, Adrian P. Simpson and Romi Zaeske,
Auditory Adaptation in Voice Perception, Current Biology 18, 684-688, May 6, (2008).
- Hideki Kawahara, Masanori Morise, Toru Takahashi, Ryuichi Nisimura, Toshio Irino, Hideki Banno, A TEMPORALLY STABLE POWER SPECTRAL REPRESENTATION FOR PERIODIC SIGNALS AND APPLICATIONS TO INTERFERENCE-FREE SPECTRUM, F0, AND APERIODICITY ESTIMATION, Proc. ICASSP 2008, Las Vegas,pp.3933-3936(2008)
- Hideki Banno, Hiroaki Hata, Masanori Morise, Toru Takahashi, Toshio Irino and Hideki Kawahara,
"Implementation of realtime STRAIGHT speech manipulation system: Report on its first implementation,"
Acoustical Science and Technology, Vol.28, pp.140-146 (2007)
- Hideki Kawahara: STRAIGHT, Exploration of the other aspect of VOCODER:
Perceptually isomorphic decomposition of speech sounds,
Acoustical Science and Technology, Vol.27, No.6, pp.349-353 (2006). [invited]
- Toshio Irino, Roy D. Patterson, and Hideki Kawahara, "Speech
segregation using an auditory vocoder with event-synchronous
enhancements," IEEE Trans. Speech and Audio Process.,
Vol.27, Issue 6, pp.2212-2221 (2006).
- Hideki Kawahara, Alain de Cheveigne, Hideki Banno, Toru Takahashi and Toshio Irino,
Nearly Defect-free F0 Trajectory Extraction for Expressive Speech Modifications based on STRAIGHT,
Proc. Interspeech2005, Lisboa, pp.537-540, Sept. 2005.
- David R. R. Smith, Roy D. Patterson, Richard Turner, Hideki Kawahara and Toshio Irino,
The processing and perception of size information in speech sounds,
Journal of the Acoustical Society of America, 117(1), pp.305-318, Jan.2005.
- Hideki Kawahara, Hideki Banno, Toshio Irino and Parham Zolfaghari,
ALGORITHM AMALGAM: Morphing waveform based methods, sinusoidal models and STRAIGHT,
Proc. ICASSP'2004, Montreal Canada, vol.1, pp.13-16, 2004
- Hideki Kawahara and Hisami Matsui,
Auditory morphing based on an elastic perceptual distance metric in an interference-free
time-frequency representation,
ICASSP'2003, pp.256-259 (2003).
- Alain de Cheveigné, Hideki Kawahara,
YIN, a fundamental frequency estimator for speech and music,
Journal of the Acoustical Society of America, Vol.111, No.4, pp.1917-1930 (2002)
- H. Kawahara, Jo Estill and O. Fujimura: Aperiodicity extraction
and control using mixed mode excitation and group delay manipulation
for a high quality speech analysis, modification and synthesis
system STRAIGHT, MAVEBA 2001, Sept.13-15, Firenze, Italy, 2001.
- Hideki Kawahara, Yoshinori Atake and Parham Zolfaghari: Accurate
vocal event detection method based on a fixed-point to weighted
average group delay, ICSLP-2000, Beijing, pp.664-667 2000.
- Hideki Kawahara, Haruhiro Katayose, Alain de Cheveigne, Roy
D. Patterson: Fixed Point Analysis of Frequency to Instantaneous
Frequency Mapping for Accurate Estimation of F0 and Periodicity,
Proc. EUROSPEECH'99, Volume 6, pp.2781-2784 (1999).
- Hideki Kawahara, Ikuyo Masuda-Katsuse and Alain de Cheveigne:
Restructuring speech representations using a pitch-adaptive time-frequency
smoothing and an instantaneous-frequency-based F0 extraction:
Possible role of a repetitive structure in sounds, Speech Communication,
27, pp.187-207 (1999). [1998-1999 EURASIP best paper award]
- Alain de Cheveigne, Hideki Kawahara, "Missing-data Model of Vowel Identification," J. Acoust. Soc. Am., Vol.105, pp.3497-3508, 1999.
- Hideki Kawahara, Alain de Cheveigne and Roy D. Patterson:
An instantaneous-frequency-based pitch extraction method for
high-quality speech transformation: revised TEMPO in the STRAIGHT-suite,
Proc. 5th Int. Conf. on Spoken Language Processing (ICSLP '98),
Sydney, (1998.12).
- Hiroko Kato and Hideki Kawahara: "An Application of the
Bayesian Time Series Model and Statistical System Analysis for
F0 Control," Speech Communication, (1998)
- Hideki Banno, J. Ju, Satoshi Nakamura, Kiyohiro Shikano and
Hideki Kawahara: "Efficient Representation of Short-time Phase
Based on Group Delay," ICASSP'98, SP26.6, Seattle, (1998.5).
- Alain de Cheveigne (CNRS), Hideki Kawahara, Minoru Tsuzaki,
and Kiyoaki Aikawa: "Concurrent Vowel Identification I: Effects
of relative Level and F0 Differences," J. Acoust. Soc. Am.,
Vol.101, pp.2839-2847 (1997.5)
- Hideki Kawahara: "Speech Representation and Transformation
using Adaptive Interpolation of Weighted Spectrum: VOCODER Revisited,"
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing
(ICASSP '97), vol.2, pp.1303-1306 (1997.4)
- Our paper on interference-free power spectral representations was presented at APSIPA ASC 2018, Hawaii 12-15, November.
(PDF)
- Legacy-STRAIGHT, a speech analysis, modification, and resynthesis system is open to everyone.
Please visit a GitHub repository of legacy_STRAIGHT.
It is compatible with MATLAB R2018a and GNU Octave 4.0.0. (24/July/2018)
- Our paper on frequency domain variants of velvet noise was presented at Interspeech 2018, India, 2-6 Sept.
(PDF ISCA archive)
- Inharmonic excitation extension of STRAIGHT is applied to cocktail party research and
published in Nature Communications, 9(1) DOI:10.1038/s41467-018-04551-8 (29, May, 2018.)
- Presented variants of velvet noise at SIGMUS-118 (Tsukuba, Japan, 20, February, 2018) (Demo page with slides and MATLAB code) (from 1:53: streaming video of my presentation)
- Presented a new mixed measure periodicity detector at APSIPA ASC 2017 ,
(Kuala Lumpur, 12-15, December 2017)
- Presented use of SparkNG at ICOT 2017 ,
(NUS Singapore, 8-10, December 2017)
- Presented IALP keynote, at NUS Singapore (5-7, December 2017)
- Demo page for PEVOC2017 (31/August/2017)
- I presented two articles in Interspeech 2017. You can preview them using the following links.
(link to anti-aliased Fujisaki-Ljungqvist model and
link to an instantaneous frequency-based periodicity measure.) (28/July/2017)
- YANGvocoder, a new VOCODER framework developed
while I was a visiting research scientist at Google UK, is now open-source on GitHub. (11/Jan./2017)
- Presented my last APSIPA DL talk at
Tianjin University, China(8/Dec./2016) (PDF slides)
photo
- Presentation on SparkNG and new F0 extractor at Academia Sinica Taipei, Taiwan.
(Link to PDF: edited version for public access) (23/Aug./2016)
- SparkNG was presented in the Show and Tell corner of Interspeech 2016. (8-12/Sept./2016)
- A new F0 extractor was presented at the ISCA workshop SSW9 as part of my work at Google. A draft is available from
arXiv (13-15/Sept./2016)
- August 2015 - June 2016: Visiting research scientist at Google in London, UK
- Presented my APSIPA DL talk at University College London, UK
(16/May/2016)
(PDF slides)
- Presented my APSIPA DL talk at
Edinburgh University, UK(29/Jan./2016) (PDF slides)
photo
- Presented my APSIPA DL talk at
Academia Sinica, Taiwan(22/Dec./2015) (PDF slides)
photo
- Anti-aliased L-F model and its application were presented
at APSIPA 2015, in Hong Kong.
(19/Dec./2015) (PDF slides)
- Presented my APSIPA DL talk
at University of Sheffield(2/Dec./2015)
(PDF slides) photo
- Presented a talk at CCRMA on STRAIGHT as a part of my
APSIPA DL talks. (PDF slide)(20/Nov./2015) photo
- An interactive tool to investigate relations between vocal tract transfer function,
pole frequencies and bandwidths, shape, LSF (LSP), and synthesized sounds is added to
Matlab realtime speech tools. Standalone applications for Mac and Windows,
which do not require a Matlab installation, are also available (15/May/2015)
- Matlab realtime speech tools are introduced in Issue 9 of the
APSIPA Newsletter (20/April/2015).
Please read the article (pp.5-10) for details.
- Final Lecture at Wakayama University is scheduled on 15:00-17:00, 20th March, 2015.
- Appointed as one of 2015-2016 Distinguished Lecturers of
APSIPA (Asia-Pacific Signal and Information Processing Association)
- Tutorial on STRAIGHT was presented at APSIPA 2014. (December 12-19, 2014)
PDF version of the presentation is here. (221 slides, 36MB)
- My presentation for the Seminar Series of the DFG Research Unit Person Perception, titled
"Making speech tangible" (PDF), is linked. (November 13, 2014)
- Matlab source codes, presentation and a demonstration movie of "Temporally static group delay representation" are
available. (
link to the supplemental page to Interspeech2014) (17/Sept./2014)
- Presented "Temporally variable multi-aspect N-way morphing" at APSIPA 2013 ASC. (31/October/2013)
(My presentation at APSIPA 2013 (pdf))
- Presented a new F0 extractor (again!) and important "Take home message" at ICASSP2013. Please
check my presentation at ICASSP2013. (30/May/2013)
- Presented an invited talk
at PPRU-Workshop VII,
Friedrich-Schiller-University of Jena, Germany (2013.4.26)
- Presentation at Interspeech 2012. (10/Sept./2012)
- Presented an invited talk at
LISTA workshop 2012 Edinburgh, 2-3 May 2012 .
- Organized a special session
(SS-L9: Advances in singing-voice synthesis, transformation, and application) in
ICASSP2012 (25-30 March 2012, Kyoto)
(My ICASSP presentation is now available: 6/April/2012)
- Invited talk on signal processing challenges for singing voice texture was presented at the SIGMUS94
special event on Singing information processing
(USTREAM video: in Japanese) using
( interactive iBook: in Japanese ) (3/Feb./2012)
- A new instantaneous frequency calculation method was presented at ICASSP2011 (
Demo movies and
presentation )
(27/May/2011)
- Movie visualizations on how TANDEM-STRAIGHT works
are accessible now. Detailed descriptions on them were presented as a tutorial session in
SSW7.
(Presentation slides(PDF))(25/Sept./2010)
- An introduction and operation manual for TANDEM-STRAIGHT with GUI is
[accessible online through APSIPA proceedings page](Oct, 2009)
- An extended framework of morphing is formulated and presented at ICASSP2009.
[presentation slides (pdf)](April, 2009)
- It is shown for the first time that adaptation to nonlinguistic information in voices
elicits systematic auditory aftereffects.
(Current Biology 18, 684-688, May 6, 2008)
DOI link to the article
- A completely new reformulation of STRAIGHT (TANDEM-STRAIGHT) was presented at ICASSP2008.
[presentation slides (pdf)](April, 2008)
- Invited talk on "STRAIGHT as a research tool for L2 study:
How to manipulate segmental and supra-segmental features" (organized by
Jared C. Bernstein and Reiko Akahane-Yamada)
at the Fourth Joint Meeting of ASA and ASJ (December 2006).
[presentation slides (pdf)]
[morphed synthetic R/L continuum (wmv)]
[morphed natural R/L continuum (wmv)]
- Lay language paper
on "Voice quality of artistic expression in Noh:
An analysis-synthesis study on source-related parameters"
was presented at the Fourth Joint Meeting of ASA and ASJ (December 2006).
- Auditory morphing demonstrations on emotional speech. (Flash movie)
This demonstration was on display at National Museum of Emerging Science and Innovation Tokyo
from 23 April to 15 August 2006. The title of the special event was
"Love stories - Why you are not alone.".
The interface was designed by Takashi Yamaguchi and
auditory morphing sounds were synthesized by Hideki Kawahara using
STRAIGHT-based morphing algorithm.
- STRAIGHT: A very high-quality speech manipulation system
(tutorial page),
(Publicly accessible STRAIGHT trial version)
(The latest version of STRAIGHT (STRAIGHTV40) is available to the registered users.)(20 Sept. 2005)
- A STRAIGHT-based TTS (Text To Speech) system won first place in the Blizzard Challenge, as reported at
INTERSPEECH2005. (Sept. 2005).
- "Temporal media design project" supported by CREST (Aka CrestMuse project)
(2005 to 2010: PI is Prof. Katayose and I will take part) uses STRAIGHT as one of key components. (Sept. 2005).
- Invited talk on "Manipulating the pulse rate and resonance scale in speech and animal calls",
at the special session "Size information in speech and animal calls" (organized by
Patterson and
Irino)
at the 149th ASA meeting in Vancouver, Canada. (May 2005).
- A STRAIGHT-generated chorus won first place in the blindfold listening test of synthetic vocal systems conducted
by RENCON'04, (June 2004).
- WAVE: Wakayama workshop on Auditory and Vocal rEsearch (April 2004)
- e-Society project (primary investigator of a subgroup on flexible speech synthesis software)
- "Auditory brain project" by CREST (primary investigator,1997-2002)
(CMAP-CREST workshop 2002)
Updated by Hideki Kawahara