Using Large Language Models to Advance the Development and Practice of Personality Assessment

Shea Fyffe

Advisor: Philseok Lee, PhD, Department of Psychology

Committee Members: Seth Kaplan, Reeshad Dalal

Peterson Hall, #1105
November 07, 2025, 01:00 PM to 03:00 PM

Abstract:

Recent developments in natural language processing (NLP), an area of artificial intelligence (AI) focused on teaching computers to analyze and understand human language, have led to the emergence of a family of NLP models known as large language models (LLMs). Given drastic improvements in performance and accessibility, LLMs are seeing widespread adoption by the researchers and practitioners who design and administer talent management systems. This trend is particularly noticeable in personality testing, where researchers and test developers are exploring ways to integrate LLMs into personality scale development and assessment. This dissertation builds on that enthusiasm by presenting two novel applications of LLMs to personality assessment. In Study 1, we build on existing research on automated item generation (AIG) to present a novel approach, demonstrating how "alignment" techniques, specifically direct preference optimization (DPO), can improve AIG. Our findings indicate that DPO effectively enhances an LLM's ability to generate diverse item types. Notably, applying DPO to a smaller LLM (3.2 billion parameters) yields performance comparable to, or better than, that of a much larger LLM (405 billion parameters). In Study 2, we examine inconsistencies in NLP-based methods for personality assessment, including those built on LLMs, and then present a new approach for deriving indicators of personality from text. The prevailing practice today is to train LLMs to predict self-report personality scores; however, this approach can lead models to learn irrelevant linguistic patterns that merely reflect biases and flaws in the self-report process. Our research reveals that factors such as the stimulus that elicited the text (e.g., writing prompts, essay topics, interview questions), the training sample, and the distribution of self-report scores significantly influence an LLM's accuracy in predicting those scores. As a compelling alternative, we propose a natural language inference (NLI) approach, which holds promise for strengthening personality assessment practice.
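To make the two techniques named in the abstract concrete, the sketches below show, in simplified form, how they are commonly implemented with standard open-source tooling (PyTorch and Hugging Face Transformers). They are illustrative only and are not the dissertation's actual code; all function names, hyperparameters, model checkpoints, and example texts are assumptions.

First, a minimal sketch of the DPO objective referenced in Study 1. It assumes that per-sequence log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") item have already been computed under both the policy LLM and a frozen reference LLM.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct preference optimization loss over a batch of preference pairs."""
    # Implicit "rewards": how much more (or less) the policy prefers each item
    # than the frozen reference model does.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Minimizing the negative log-sigmoid of the margin pushes the policy to
    # prefer the chosen item over the rejected one by a wider margin than the
    # reference model, without drifting far from the reference.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Second, a minimal sketch in the spirit of Study 2's NLI-based alternative: rather than training a model to predict self-report scores, an off-the-shelf NLI model scores whether a piece of text entails a trait-descriptive hypothesis. The checkpoint, hypothesis wording, and example response are hypothetical choices for illustration.

```python
from transformers import pipeline

# Zero-shot classification backed by an NLI model (illustrative checkpoint choice).
nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

response = "I love meeting new people and usually organize our team's social events."
trait_hypotheses = [
    "outgoing and sociable",        # extraversion indicator
    "organized and dependable",     # conscientiousness indicator
    "anxious and easily stressed",  # neuroticism indicator
]

# Each hypothesis is scored independently (multi_label=True); the entailment
# probability serves as a text-based indicator of the trait rather than a
# prediction of a self-report score.
result = nli(
    response,
    candidate_labels=trait_hypotheses,
    hypothesis_template="The author of this text is {}.",
    multi_label=True,
)
print(dict(zip(result["labels"], result["scores"])))
```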