Grupo de Tecnología del Habla: Documentation

List of recent publications

Note: For all the IEEE publications in this list, the following clause applies: "This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder."


·         C. Luna-Jiménez, D. Griol, Z. Callejas, R. Kleinlein, J.M. Montero, F. Fernández-Martínez, "Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning". Sensors 2021, 21, 7665.

·         M. Gil-Martín, W. Johnston, R. San-Segundo, B. Caulfield, "Scoring Performance on the Y-Balance Test Using a Deep Learning Approach". Sensors 2021, 21, 7110.

·         C. Zhang, G. Lee, L.F. D’Haro, H. Li, "D-Score: Holistic Dialogue Evaluation Without Reference". IEEE/ACM Transactions on Audio, Speech, and Language Processing, Volume 29, 2021, pages 2502-2516.

·         D. Romero, L. F. D’Haro, C. Salamea, "Exploring Transformer-based Language Recognition using Phonotactic Information". IberSPEECH 2021, 24-25 March 2021, Valladolid, Spain, pp. 250-254. DOI: 10.21437/IberSPEECH.2021-53.

·         A. Gallardo-Antolín, J.M. Montero, "On combining acoustic and modulation spectrograms in an attention LSTM-based system for speech intelligibility level classification". Neurocomputing, 2021, Volume 456.

·         A. Gallardo-Antolín, J.M. Montero, "Detecting Deception from Gaze and Speech Using a Multimodal Attention LSTM-Based Framework". Applied Sciences, 2021, 11(14), 6393.

·         A. Coucheiro-Limeres, J. Ferreiros-López, F. Fernández-Martínez, R. Córdoba, "A dynamic term discovery strategy for automatic speech recognizers with evolving dictionaries". Expert Systems with Applications, Volume 176, 2021, 114860.

·         R. Kleinlein, C. Luna-Jiménez, D. Arias-Cuadrado, J. Ferreiros, F. Fernández-Martínez, "Topic-Oriented Text Features Can Match Visual Deep Models of Video Memorability". Applied Sciences, 2021, 11(16), 7406.

·         C. Luna-Jiménez, J. Cristóbal-Martín, R. Kleinlein, M. Gil-Martín, J. M. Moya, F. Fernández-Martínez, "Guided Spatial Transformers for Facial Expression Recognition". Applied Sciences 2021, 11, 7217.

·         M. Gil-Martín, R. San-Segundo, F. Fernández-Martínez, J. Ferreiros-López, "Time Analysis in Human Activity Recognition". Neural Processing Letters, 2021.

·         B. Martínez-González, J. M. Pardo, J. A. Vallejo-Pinto, R. San-Segundo, J. Ferreiros, "Analysis of transition cost and model parameters in speaker diarization for meetings". EURASIP Journal on Audio, Speech, and Music Processing (2021) 2021:12.


·         M. Gil-Martín, J. Antúnez-Durango, R. San-Segundo, "Adaptation and Selection Techniques Based on Deep Learning for Human Activity Recognition Using Inertial Sensors". Proceedings of The 7th International Electronic Conference on Sensors and Applications, 2020, 2(1), 22.

·         R. San-Segundo, A. Zhang, A. Cebulla, S. Panev, G. Tabor, K. Stebbins, R.E. Massa, A. Whitford, F. de la Torre, J. Hodgins, "Parkinson's Disease Tremor Detection in the Wild Using Wearable Accelerometers". Sensors 2020, 20, 5817.

·         M. Gil-Martín, M. Sánchez-Hernández, R. San-Segundo, "Human Activity Recognition Based on Deep Learning Techniques". Proceedings of The 6th International Electronic Conference on Sensors and Applications, 2020, 42, 15.

·         M. Rodríguez-Cantelar, L. F. D’Haro, F. Matía, "Automatic Evaluation of Non-Task Oriented Dialog Systems by using Sentence Embeddings Projections and their Dynamics". IWSDS 2020, Sept 21-23, Madrid, Spain.

·         A. M. de Los Riscos-Mayorga, L. F. D’Haro, "ToxicBot: A Conversational Agent to Fight Online Hate Speech". IWSDS 2020, Sept 21-23, Madrid, Spain.

·         C. Zhang, L. F. D’Haro, R. E. Banchs, H. Li, T. Friedrichs, "Deep AM-FM: Toolkit For Automatic Dialogue Evaluation". IWSDS 2020, Sept 21-23, Madrid, Spain.

·         L. F. D’Haro, K. Yoshino, C. Hori, T. K. Marks, L. Polymenakos, J. K. Kummerfeld, M. Galley, X. Gao, "Overview of the seventh dialog system technology challenge: Dstc7". Computer Speech & Language, 62, p.101068, 2020.

·         M. Gil-Martín, R. San-Segundo, R. de Córdoba, J. M. Pardo, "Robust Biometrics from Motion Wearable Sensors Using a D-vector Approach". Neural Processing Letters, 52(3), 2109-2125, 2020.

·         M. Gil-Martín, R. San-Segundo, F. Fernández-Martínez, R. de Córdoba, "Human activity recognition adapted to the type of movement". Computers & Electrical Engineering, Vol. 88, 2020.

·         M. Gil-Martín, R. San-Segundo, F. Fernández-Martínez, J. Ferreiros-López, "Improving physical activity recognition using a new deep learning architecture and post-processing techniques". Engineering Applications of Artificial Intelligence, Vol. 92, 2020.


·         L. F. D'Haro, R. E. Banchs, C. Hori, H. Li, "Automatic evaluation of end-to-end dialog systems with adequacy-fluency metrics". Computer Speech & Language, Volume 55, 2019.

·         J. Wu, L. F. D'Haro, N. F. Chen, P. Krishnaswamy, R. E. Banchs, "Joint learning of word and label embeddings for sequence labelling in spoken language understanding". In 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (pp. 800-806). IEEE, December 2019.

·         C. Salamea, R. Cordoba, L. F. D’Haro, D. Romero, "Incorporation of Language Discriminative Information into Recurrent Neural Networks Models to LID Tasks". In International Conference on Smart Technologies, Systems and Applications (pp. 165-175). Springer, Cham, December 2019.

·         I. G. Godino, L. F. D'Haro, "GTH-UPM at TASS 2019: Sentiment Analysis of Tweets for Spanish Variants". In IberLEF@SEPLN (pp. 579-588), September 2019.

·         A. Gallardo, J. M. Montero, "Tecnologías del habla y sus aplicaciones en la sociedad digital". Boletic (Revista de la Asociación Profesional de los Cuerpos Superiores de Tecnologías de la Información en la Administración), No. 84, pp. 38-40 (ISSN 2659-949X / 2659-9503), July 2019.

·         R. San-Segundo, H. Navarro-Hellín, R. Torres-Sánchez, J. Hodgins, F. De la Torre, "Increasing Robustness in the Detection of Freezing of Gait in Parkinson’s Disease". Electronics 2019, 8, 119.

·         J. M. Montero, A. Gallardo-Antolín, "External Attention LSTM Models for Cognitive Load Classification from Speech". 7th International Conference on Statistical Language and Speech Processing (SLSP 2019), Ljubljana, Slovenia, October 14-16, 2019.

·         M. Gil-Martín, J. M. Montero, R. San-Segundo, "Parkinson’s Disease Detection from Drawing Movements Using Convolutional Neural Networks". Electronics 2019, 8, 907; doi:10.3390/electronics8080907.

·         A. García-Faura, F. Fernández-Martínez, R. Kleinlein, R. San-Segundo, F. Díaz-De-María, "A multi-threshold approach and a realistic error measure for vanishing point detection in natural landscapes". Engineering Applications of Artificial Intelligence, Vol. 85 (pp. 713-726), 2019.

·         F. Fernández-Martínez, Z. Callejas, R. Kleinlein, C. Luna Jiménez, J. M. Montero, J. M. Pardo, "Project CAVIAR. CApturing VIewers’ Affective Response (Inferencia de la respuesta afectiva de los espectadores de un vídeo)". XXXV Congreso de la Sociedad Española para el Procesamiento del Lenguaje Natural, SEPLN 2019, 24-27 September 2019.

·         R. Kleinlein, D. Riaño, "Persistence Of Data-Driven Knowledge To Predict Breast Cancer Survival". International Journal of Medical Informatics, Volume 129 (2019), pp. 303-311.

·         R. Kleinlein, C. Luna-Jiménez, J. M. Montero, Z. Callejas, F. Fernández-Martínez, "Predicting Group-Level Skin Attention to Short Movies from Audio-Based LSTM-Mixture of Experts Models". The 20th Annual Conference of the International Speech Communication Association - Interspeech 2019, Paper ID: 2799, Graz, Austria, Sep. 15-19, 2019.

·         R. Kleinlein, A. García-Faura, C. Luna-Jiménez, J. M. Montero, F. Díaz-de-María, F. Fernández-Martínez, "Predicting Image Aesthetics for Intelligent Tourism Information Systems". Electronics 2019, 8(6), 671; EISSN 2079-9292. (JCR Impact Factor: 2.110; Journal Rank in Category 113/260; Quartile in category Q2).

·         A. Coucheiro-Limeres, F. Fernández-Martínez, R. San-Segundo, J. Ferreiros, "Attention-based word vector prediction with LSTMs and its application to the OOV problem in ASR". The 20th Annual Conference of the International Speech Communication Association - Interspeech 2019, Paper ID: 2347, Graz, Austria, Sep. 15-19, 2019.

·         M. Gil-Martín, R. San-Segundo, S. Lutfi, A. Coucheiro-Limeres, "Estimating gravity component from accelerometers". IEEE Instrumentation and Measurement Magazine 22(1):48-53. DOI: 10.1109/MIM.2019.8633352. February 2019. ISSN: 1094-6969 (JCR Impact Factor: 1.895; Journal Rank in Category 27/61; Quartile in category Q2).

·         R. San-Segundo, M. Gil-Martín, L. F. D’Haro, J. M. Pardo, "Classification of epileptic EEG recordings using signal transforms and convolutional neural networks". Computers in Biology and Medicine 109, April 2019. DOI: 10.1016/j.compbiomed.2019.04.031.

·         I. Taha Aksu, N. F. Chen, L. F. D’Haro, R. Banchs, "Reranking of responses using transfer learning for a retrieval-based chatbot". IWSDS 2019. Fourth Workshop on Chatbots and Conversational Agent Technologies and Dialogue Breakdown Detection Challenge (DBDC4), pp. 1-12, Sicily, Italy, April 24, 2019.


·         C. Salamea, R. de Córdoba, L. F. D’Haro, R. San-Segundo, J. Ferreiros, "On the use of Phone-based Embeddings for Language Recognition". IberSPEECH 2018, 21-23 November 2018, pp. 55-59. DOI: 10.21437/IberSPEECH.2018-12.

·         A. Ortega, E. Lleida, R. San-Segundo, J. Ferreiros, L. Hurtado, E. Sanchís, M. I. Torres, R. Justo, "AMIC: Affective multimedia analytics with inclusive and natural communication". Procesamiento del Lenguaje Natural No. 61, pp. 147-150, September 2018, ISSN 1135-5948.

·         A. García Faura, A. Hernández-García, F. Fernández-Martínez, F. Díaz-de-María, R. San-Segundo, "Emotion and attention: Audiovisual models for group-level skin response recognition in short movies". Web Intelligence, Vol. 17, No. 2, pp. 29-40, ISSN 2405-6464, 2018. DOI: 10.3233/WEB-190398.

·         A. Coucheiro-Limeres, J. Ferreiros, R. San-Segundo, R. Córdoba, "Resource2Vec: Linked Data distributed representations for term discovery in automatic speech recognition". Elsevier Expert Systems With Applications, Vol. 112, pp. 301-320, June 2018 (ISSN: 0957-4174). (JCR Impact Factor: 3.768; Journal Rank in Category 20/132; Quartile in category Q1).

·         R. San-Segundo, H. Blunck, J. Moreno-Pimentel, A. Stisen, M. Gil-Martín, "Robust Human Activity Recognition using smartwatches and smartphones". Engineering Applications of Artificial Intelligence, Volume 72, June 2018, pages 190-202. ISSN: 0952-1976. (JCR Impact Factor: 2.819; Journal Rank in Category 32/132; Quartile in category Q1).

·         B. Lahasan, S. Lutfi, I. Venkat, M. A. Al-Betar, R. San-Segundo, "Optimized symmetric partial facegraphs for face recognition in adverse conditions". Information Sciences, pp. 194-214, ISSN 0020-0255, March 2018. DOI: 10.1016/j.ins.2017.11.013. (JCR Impact Factor: 4.305; Journal Rank in Category 12/148; Quartile in category Q1).

·         G. Renunathan Naidu, S. Lutfi, A. Aziz, J. L. Lorenzo-Trueba, J. M. Montero, "Cross-cultural Perception of Spanish Synthetic Expressive Voices Among Asians". Applied Sciences-Basel 8(3):426, March 2018. DOI: 10.3390/app8030426. (JCR Impact Factor: 1.689; Journal Rank in Category 77/146; Quartile in category Q3).

·         C. Salamea, L. F. D’Haro, R. de Córdoba, "Language Recognition Using Neural Phone Embeddings and RNNLMs". IEEE Latin America Transactions, Vol. 16, No. 7, July 2018, pp. 2033-2039. ISSN 1548-0992 (JCR Impact Factor: 0.502; Journal Rank in Category 239/260; Quartile in category Q4).


·         A. Clemotte, M. A. Velasco, R. Raya, R. Ceres, R. de Córdoba, E. Rocon, "Metodología de evaluación de eye-trackers como dispositivos de acceso alternativo para personas con parálisis cerebral". Revista Iberoamericana de Automática e Informática Industrial, ISSN: 1697-7912, DOI: 10.1016/j.riai.2017.07.004 (JCR Impact Factor: 0.5; Journal Rank in Category 57/60; Quartile in category Q4).

·         B. Lahasan, S. Lutfi, R. San-Segundo, "A survey on techniques to handle face recognition challenges: occlusion, single sample per subject and expression". Artificial Intelligence Review, ISSN: 0269-2821 (JCR Impact Factor: 2.627; Journal Rank in Category 38/133; Quartile in category Q2).

·         J. Tejedor, D.T. Toledano, P. López-Otero, L. Docio-Fernández, L. Serrano, I. Hernáez, A. Coucheiro-Limeres, J. Ferreiros, J. Olcoz, J. Llombart, "ALBAYZIN 2016 spoken term detection evaluation: an international open competitive evaluation in Spanish". EURASIP Journal on Audio, Speech, and Music Processing, ISSN: 1687-4722, DOI: 10.1186/s13636-017-0119-z (JCR Impact Factor: 3.057; Journal Rank in Category 4/31; Quartile in category Q1).

·         C. Jimenez-Recio, A. Zlotnik, A. Gallardo-Antolín, J. M. Montero and J. C. Martínez-Castrillo, "Prediction of the Degree of Parkinson's Condition Using Recordings of Patients' Voices". 9th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2017), pp. 120-129. Advances in Intelligent Systems and Computing, vol. 737. Springer, Cham. Online ISBN: 978-3-319-76357-6, DOI: 10.1007/978-3-319-76357-6_12, December 11-13, 2017, Marrakech.

·         F. Fernández-Martínez, A. Hernández-García, M. A. Fernández-Torres, I. González-Díaz, A. García-Faura, F. Díaz de María, "Exploiting visual saliency for assessing the impact of car commercials upon viewers". Multimedia Tools and Applications. (JCR Impact Factor: 1.53; Journal Rank in Category 48/106; Quartile in category Q2).

·         A. Hernández-García, F. Fernández-Martínez, F. Díaz-de-María, "Emotion and attention: predicting electrodermal activity through video visual descriptors". Proceedings of WI ’17, International Conference on Web Intelligence, August 23-26, 2017, Leipzig, Germany. DOI: 10.1145/3106426.3109418.

·         J. Zhu, R. San-Segundo, J. M. Pardo, "Feature extraction for robust physical activity recognition". Human-centric Computing and Information Sciences, 7:16. 2 June 2017. ISSN: 2192-1962.

·         B. Martínez-González, J. M. Pardo, J. D. Echeverry-Correa, R. San-Segundo, "Spatial Features Selection for Unsupervised Speaker Segmentation and Clustering". Expert Systems With Applications No. 73, 1 May 2017, pp. 27-42, ISSN: 0957-4174 (JCR Impact Factor: 3.768; Journal Rank in Category 20/132; Quartile in category Q1).

·         R. San-Segundo, J.D. Echeverry-Correa, C. Salamea-Palacios, S. Lutfi, J. M. Pardo, "I-vector analysis for gait-based person identification using smartphone inertial signals". Pervasive and Mobile Computing, vol. 38, part 1, July 2017. (JCR Impact Factor: 2.974; Journal Rank in Category 33/148; Quartile in Category Q1) ISSN: 1574-1192.


·         J. M. Montero, "Generación de lenguaje hablado". Chapter 2 of the book “Tecnologías del lenguaje en España”, author: Ángel Luis Gonzalo, ISBN 978-84-08-16893-5, 2016.

·         R. San-Segundo, J. D. Echeverry-Correa, C. Salamea, J. M. Pardo, "Human Activity Monitoring Based on Hidden Markov Models Using a Smartphone". IEEE Instrumentation & Measurement Magazine, vol. 19, issue 6, December 2016, pp. 27-31. (JCR Impact Factor: 1.438; Journal Rank in Category 32/58; Quartile in Category Q3) DOI: 10.1109/MIM.2016.7777649.

·         A. Coucheiro-Limeres and J. Ferreiros, "GTH-UPM System for Albayzin 2016 Search on Speech Evaluation". ALBAYZIN 2016 Search on Speech Evaluation, IberSPEECH 2016, November 2016, Lisbon, Portugal.

·         J. Lorenzo-Trueba, R. Barra-Chicote, J. Yamagishi, J. M. Montero, "Improving Spanish speech synthesis intelligibility under noisy environments". The Journal of the Acoustical Society of America 140 (4), 2961-2962, ISSN: 0001-4966, 2016. (JCR Impact Factor: 1.547; Journal Rank in Category 16/31; Quartile in Category Q3).

·         J. Lorenzo-Trueba, R. Barra-Chicote, A. Gallardo-Antolin, J. Yamagishi and J.M. Montero, "Continuous Expressive Speaking Styles Synthesis based on CVSM and MR-HMM". COLING 2016, The 26th International Conference on Computational Linguistics.

·         J. Ferreiros, J. M. Pardo, L-F Hurtado, E. Segarra, A. Ortega, E. Lleida, M. I. Torres, R. Justo, "ASLP-MULAN: Audio speech and language processing for multimedia analytics". Procesamiento del Lenguaje Natural, No. 57, September 2016, pp. 147-150.

·         R. San-Segundo, J.M. Montero, J. Moreno-Pimentel, J. M. Pardo, "HMM Adaptation for Improving a Human Activity Recognition System". Algorithms, Volume 9, Issue 3, pp. 60-73, 2016. doi:10.3390/a9030060.

·         A. Hernández-García, F. Fernández-Martínez, F. Díaz-de-María, "Comparing visual descriptors and automatic rating strategies for video aesthetics prediction". Signal Processing-Image Communication 47(September 2016):280–288, July 2016. ISSN: 0923-5965. (JCR Impact Factor: 2.244; Journal Rank in Category 103/262; Quartile in Category Q2) DOI: 10.1016/j.image.2016.07.004.

·         B. Martínez-González, J.M. Pardo, R. San-Segundo, J.M. Montero, "Influence of Transition Cost in the Segmentation Stage of Speaker Diarization". Odyssey 2016, June 21-24, pp. 385-392, 2016, Bilbao, Spain. ISSN: 2312-2846.

·         C. Salamea, L.F. D’Haro, R. de Córdoba, R. San-Segundo, "On the use of phone-gram units in recurrent neural networks for language identification". Odyssey 2016, June 21-24, pp. 117-123, 2016, Bilbao, Spain. ISSN: 2312-2846.

·         A. Zlotnik, M. Cuchí Alfaro, M. C. Pérez Pérez, A. Gallardo, J. M. Montero, "Building a Decision Support System for Inpatient Admission Prediction With the Manchester Triage System and Administrative Check-in Variables". Computers, Informatics, Nursing: CIN, March 2016. ISSN: 1538-2931 (JCR Impact Factor: 1.301; Journal Rank in Category 71/105; Quartile in Category Q3).