Probabilistic models and natural language processing in health

Cobo Aguilera, Aurora

Supervised by:

Antonio Artés Rodríguez (Supervisor)
Pablo Martínez Olmos (Co-supervisor)

University of defense: Universidad Carlos III de Madrid

Date of defense: 15 December 2022

Examination committee:

Joaquín Miguez Arenas (Chair)
Francisco Jesús Rodríguez Ruiz (Secretary)
Santiago Ovejero García (Member)

Type: Thesis

Teseo: 761613

Abstract

The treatment of mental disorders still involves many unsolved problems, such as misdiagnosis and delayed diagnosis. In this doctoral thesis we study and develop several models that can serve as potential tools to support clinicians' work. Our proposals follow two main lines of research: Natural Language Processing (NLP) and probabilistic methods. In Chapter 2, we open the thesis with a regularization mechanism for language models (LMs) that is especially effective in Transformer-based architectures; we call it NoRBERT, from Noisy Regularized Bidirectional Representations from Transformers [9], [15]. Reviewing the literature, we found that regularization in NLP is a little-explored field, limited to general mechanisms such as dropout [57] or early stopping [58]. In this landscape, we propose a novel approach that combines any LM with Variational Auto-Encoders (VAEs) [23]. VAEs are deep generative models that build a regular latent space from which the input samples can be reconstructed through an encoder network and a decoder network. Our VAE uses a mixture-of-Gaussians prior distribution (GMVAE), which allows the model to capture multimodal information. By combining Transformers and GMVAEs, we build an architecture capable of imputing missing words from a text corpus in a diverse topic space, as well as improving the BLEU score in the reconstruction of the dataset. Both results depend on the depth of the regularized layer in the Transformer encoder. In essence, the regularization consists of the GMVAE reconstruction of the Transformer embeddings at some point in the architecture, which adds structured noise that helps the model generalize better. We show improvements in BERT [15], RoBERTa [16] and XLM-R [17] models, verified on different datasets, and we also provide explicit examples of sentences reconstructed by Top NoRBERT.
In addition, we validate the abilities of our model in data augmentation, improving classification accuracy and F1 score in various datasets and scenarios thanks to augmented samples generated by NoRBERT. We study several variations of the model: Top, Deep and contextual NoRBERT, the last of which uses contextual words to reconstruct the embeddings in the corresponding Transformer layer.
We continue the Transformer line of research in Chapter 3, where we propose PsyBERT. As its name suggests, PsyBERT is a BERT-based [15] architecture suitably modified to work with Electronic Health Records (EHRs) from psychiatric patients. It is inspired by BEHRT [19], which is also devoted to EHRs, in general health. Our model differs in its training methodology and its embedding layer. As with NoRBERT, we find it useful to apply a Masked Language Modeling (MLM) policy without any fine-tuning or task-specific layer. On the one hand, we used MLM in NoRBERT to solve the task of imputing missing words, with the final aim of generating new sentences from inputs with missing information. On the other hand, we first propose PsyBERT as a tool to fill in missing diagnoses in the EHR and to correct misdiagnosed cases. After this task, we also apply PsyBERT to delusional disorder detection. In this scenario, by contrast, we add a multi-label classification layer that computes the probability of each diagnosis at the patient's last visit to the hospital. From these probabilities, we analyse delusional cases and propose a tool to detect potential candidates for this mental disorder. In both tasks, we use several fields from the patient's EHR, such as age, sex, diagnoses and treatments from the psychiatric history, and propose a method capable of combining heterogeneous data to support diagnosis in mental health.
Throughout this work, we point out the problems with the quality of EHR data [104], [105] and the great advantage that medical assistance tools such as our model can provide. We not only solve a classification problem with more than 700 different illnesses, but also provide a model that helps doctors diagnose very complex cases: those involving comorbidity, long periods of patient examination with traditional methodology, or low prevalence. We present a powerful method that addresses a pressing need.
Following the health line of research and its application to psychiatry, in Chapter 4 we analyse a probabilistic method to search for behavioral patterns in patients with mental disorders. In this case, the contribution of the work is not the method itself but its application and the results, interpreted in collaboration with clinicians. The model is called SPFM (Sparse Poisson Factorization Model) [22] and is a non-parametric probabilistic model based on the Indian Buffet Process (IBP) [20], [21]. It is an exploratory method capable of decomposing the input data into sparse matrices. To do so, it imposes a Poisson distribution on the product of two matrices, Z and B, drawn from the IBP and from a Gamma distribution, respectively. Hence, Z is a binary matrix that indicates which latent features are active for each patient, and B weights the contribution of the data characteristics to the latent features. The data used in the three studies described in the chapter come from different questions in e-health questionnaires. The data characteristics are thus the answer or score for each question, and the latent features define different behavioral patterns according to which features are active in a patient's questionnaires. For example, patient X may present features 1 and 2 while patient Y presents features 1 and 3, resulting in two different behavioral profiles. With this procedure we study three scenarios.
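As an illustration of the factorization described above, the following is a minimal generative sketch in Python, not the inference procedure used in the thesis. It assumes the simplest reading of the text: Z is drawn from an IBP, B from a Gamma distribution, and the observed scores are Poisson counts whose rate is the product ZB. The function names and the hyperparameters `alpha`, `shape` and `scale` are hypothetical choices made only for this example.

```python
import numpy as np


def sample_ibp(n_rows, alpha, rng):
    """Draw a binary matrix from the Indian Buffet Process (sequential construction)."""
    counts = []  # counts[k] = number of rows so far in which feature k is active
    rows = []
    for i in range(1, n_rows + 1):
        row = []
        # Reuse each existing feature with probability (rows using it) / i.
        for k in range(len(counts)):
            active = int(rng.random() < counts[k] / i)
            row.append(active)
            counts[k] += active
        # Open Poisson(alpha / i) brand-new features for this row.
        n_new = int(rng.poisson(alpha / i))
        row.extend([1] * n_new)
        counts.extend([1] * n_new)
        rows.append(row)
    Z = np.zeros((n_rows, len(counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z


def sample_spfm(n_patients, n_questions, alpha=2.0, shape=1.0, scale=1.0, seed=0):
    """Sample Z (IBP), B (Gamma) and counts X ~ Poisson(Z @ B) under the assumptions above."""
    rng = np.random.default_rng(seed)
    Z = sample_ibp(n_patients, alpha, rng)                       # patients x latent features
    B = rng.gamma(shape, scale, size=(Z.shape[1], n_questions))  # latent features x questions
    X = rng.poisson(Z @ B)                                       # patients x questions
    return Z, B, X


Z, B, X = sample_spfm(n_patients=6, n_questions=4)
# A patient's behavioral profile is the set of latent features active in their row of Z.
profiles = [tuple(np.flatnonzero(z)) for z in Z]
```

In this sketch, two patients share a profile exactly when their rows of Z are identical, which mirrors the patient X and patient Y example in the text.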
In the first scenario, we relate the profiles to the diagnoses, finding common patterns among patients and connections between diseases. We also analyse the degree of criticality and check it against the clinician's judgment via the Clinical Global Impression (CGI). In the second scenario, we carry out a similar study and find connections between disturbed sleeping patterns and clinical markers of the wish to die. We focus this analysis on patients with suicidal thoughts because of the major public health issue they represent [175]. In this case we change the questionnaire and the data sample, obtaining different profiles that also carry important information for the psychiatrist to interpret. The main contribution of this work is to provide a mechanism capable of helping with the detection and prevention of suicide. The third study examines behavioral patterns in mental health patients before and during the COVID-19 lockdown. We did not want to miss the chance to contribute during the coronavirus outbreak, and present a study of the changes in psychiatric patients during the state of alarm. We again analyse the profiles using the previous e-health questionnaire and find that self-reported suicide risk decreased during the lockdown. These results contrast with other studies [237] and suggest a possible increase in suicidal ideation once the crisis ceases.
Finally, Chapter 5 proposes a regularization mechanism based on a theoretical idea from [245] to obtain a variance reduction in the true risk. We interpret the robust regularized risk proposed by those authors as a two-step mechanism, formed by the minimization of the weighted risk and the maximization of a robust objective, and suggest a way to apply this methodology to select the samples of the mini-batch in a deep learning setup.
We study different variations of repeating the worst-performing samples from the previous mini-batch during training, and show evidence of improved accuracy and faster convergence in an image classification problem with different architectures and datasets.
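The idea of repeating the worst-performing samples from the previous mini-batch can be sketched as follows. This is a toy illustration under stated assumptions, not the thesis implementation: it uses logistic regression in NumPy instead of a deep image classifier, ranks only the freshly drawn samples by their loss before the update, and carries a fixed number of them (`n_replay`, a hypothetical parameter) into the next mini-batch. The thesis studies several variations that this sketch does not attempt to reproduce.

```python
import numpy as np


def per_sample_loss(w, X, y):
    """Binary cross-entropy of each sample for labels y in {0, 1}, computed stably."""
    logits = X @ w
    return np.logaddexp(0.0, logits) - y * logits


def gradient(w, X, y):
    """Gradient of the mean binary cross-entropy with respect to w."""
    probs = 0.5 * (1.0 + np.tanh(0.5 * (X @ w)))  # numerically stable sigmoid
    return X.T @ (probs - y) / len(y)


def train_with_replay(X, y, batch_size=32, n_replay=8, lr=0.1, epochs=5, seed=0):
    """Mini-batch SGD in which each batch also contains the hardest samples of the previous one."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    carried = np.array([], dtype=int)  # indices repeated from the previous mini-batch
    for _ in range(epochs):
        order = rng.permutation(len(y))
        for start in range(0, len(y), batch_size):
            fresh = order[start:start + batch_size]
            batch = np.concatenate([fresh, carried])
            # Rank the fresh samples by loss before the update (one assumed variant).
            losses = per_sample_loss(w, X[fresh], y[fresh])
            hardest = np.argsort(losses)[::-1][:n_replay]
            w -= lr * gradient(w, X[batch], y[batch])
            carried = fresh[hardest]
    return w


# Toy usage on synthetic, linearly separable data.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 2))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(float)
w_toy = train_with_replay(X_toy, y_toy, epochs=20, lr=0.5)
```

Setting `n_replay=0` recovers plain mini-batch SGD, which gives a simple baseline for comparing convergence with and without the repeated samples.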