Language learning aptitude is assumed to explain a relatively large share of the variance in the rate of acquisition and the ultimate L2 level reached by adult second language learners. In SLA, the quality of studies on language aptitude depends crucially on the availability of valid research instruments. The most popular language aptitude test during the past decade has been the LLAMA test suite (Meara, 2005), which has figured in over 50 published studies from which conclusions have been drawn about phenomena related to language aptitude. A recent paper (Bokander & Bylund, 2020) identified several problems with the internal validity of the LLAMA, such as low reliability of test scores, but the authors did not address external validity, that is, the ability of an aptitude test to predict learning outcomes. No large-scale external validation study of the LLAMA has hitherto been undertaken, but an alternative way of evaluating the external validity of an aptitude test is to systematically examine its correlations with learning outcomes. This presentation reports on a systematic review of previously published correlations between LLAMA scores and performance on various L2 tasks (e.g., grammaticality judgements, pronunciation, or general L2 proficiency). The aim is to gauge the overall effectiveness of the LLAMA in producing significant correlations with L2 outcomes.
Original empirical studies were retrieved from widely used scientific databases and were included in the review if they used the full LLAMA suite or a subset thereof, and if they reported correlation coefficients with L2 outcomes. L2 tasks were coded into four categories according to the linguistic features in focus: general L2 ability; grammar; vocabulary; and phonology/pronunciation. The correlation coefficients were dichotomously coded as statistically significant or non-significant, depending on how they had been reported in the original studies. In total, 36 studies fulfilled the inclusion criteria. From these, 460 correlations were obtained, based on scores from 2286 participants.
The systematic review reveals that only about 20% of the correlations between LLAMA tasks and L2 learning outcomes were reported as statistically significant, potentially allowing them to be interpreted as non-random and assigned psychological meaning. However, the highest correlations were consistently found in small samples, in which sampling error may be large. Several studies, often those with larger sample sizes, reported near-zero correlations with outcomes, even in cases where positive associations between aptitude and L2 outcomes would be theoretically expected. The analysis thus suggests that some findings based on correlations with LLAMA scores may be unduly influenced by measurement error. A recommendation for future research is to use the full LLAMA suite in large-sample correlational designs with a variety of L2 outcomes, in order to evaluate the external validity of the LLAMA. In conclusion, researchers should be cautious when basing their findings on correlations with LLAMA scores, because too little is yet known about the external validity of the test.
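The point about sampling error in small samples can be illustrated with a minimal sketch. It is not part of the review's analysis: it assumes Pearson correlations and uses the standard Fisher z approximation, and the function name `fisher_ci` and the sample sizes are hypothetical choices made for illustration only.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a Pearson correlation r
    observed in a sample of size n, using the Fisher z-transformation.

    Illustrative helper (hypothetical name), not taken from the review.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)               # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)     # approximate standard error of z
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


# Hypothetical sample sizes: how wide is the interval around a true
# zero correlation when samples are small versus large?
for n in (20, 50, 200):
    lower, upper = fisher_ci(0.0, n)
    print(f"n={n:>3}: 95% CI around r=0 is [{lower:.2f}, {upper:.2f}]")
```

Under these assumptions, the interval around a zero correlation narrows markedly as the sample grows, which is consistent with the observation that the highest reported correlations cluster in small samples.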
References
Bokander, L., & Bylund, E. (2020). Probing the internal validity of the LLAMA language aptitude tests. Language Learning, 70(1), 11–47.
Meara, P. (2005). The LLAMA language aptitude tests. Lognostics.
Presentation at EuroSLA 31, 24-27 August 2022, University of Fribourg, Switzerland