
Identifying substance use and high-risk sexual behavior among sexual and gender minority youth by using mobile phone data: Development and validation study
Abstract

BACKGROUND: Sexual and gender minority (SGM) individuals are at higher risk for substance use and sexually transmitted infections than their non-SGM peers. Passively collecting mobile phone usage data may open new opportunities for personalizing interventions, because behavioral risks could be identified without user input.

OBJECTIVE: This study aimed to determine (1) whether passively sensed mobile phone data can be used to identify substance use and sexual risk behaviors for sexually transmitted infection (STI) and HIV transmission among young SGM individuals who have sex with men, (2) which outcomes can be predicted with a high level of accuracy, and (3) which passive data sources are most predictive of these outcomes.

METHODS: We developed a mobile phone app to collect participants’ messaging, location, and app use data and trained machine learning models to predict risk behaviors for STI and HIV transmission. Using scikit-learn, we trained logistic regression and gradient boosting classifiers with a simple linear model specification to predict participants’ substance use and sexual behaviors (ie, condomless anal sex, number of sexual partners, and methamphetamine use), and validated the predictions against self-report questionnaires. F1-scores were used to quantify the predictive performance of models built on different data sources (and combinations of these sources). Differences in text, location, app use, and Linguistic Inquiry and Word Count (LIWC) features between outcome groups were tested with independent t tests, and associations were considered significant at P<.05.
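The METHODS paragraph describes comparing logistic regression and gradient boosting classifiers by F1-score across data sources and their combinations. The sketch below illustrates that kind of comparison with scikit-learn under stated assumptions: the column grouping in the hypothetical `FEATURE_GROUPS` mapping, the helper `f1_by_source`, the 70/30 stratified split, and the synthetic data are all illustrative and are not the authors' pipeline, features, or data.

```python
import numpy as np
from itertools import combinations
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Hypothetical grouping of feature columns by passive data source (assumption).
FEATURE_GROUPS = {
    "text": [0, 1, 2],
    "location": [3, 4],
    "app_use": [5, 6],
}


def f1_by_source(X, y, seed=0):
    """Return {(source combination, model name): held-out F1 for the positive class}."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    models = {
        "logreg": LogisticRegression(max_iter=1000),
        "gboost": GradientBoostingClassifier(random_state=seed),
    }
    results = {}
    names = list(FEATURE_GROUPS)
    # Every non-empty combination of sources: single sources, pairs, all three.
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            cols = [c for name in combo for c in FEATURE_GROUPS[name]]
            for model_name, model in models.items():
                model.fit(X_tr[:, cols], y_tr)  # fit() retrains from scratch
                pred = model.predict(X_te[:, cols])
                results[(combo, model_name)] = f1_score(y_te, pred)
    return results


# Synthetic stand-in data: 82 participants, 7 features, binary outcome that
# depends on one "text" column and one "app_use" column plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(82, 7))
y = (X[:, 0] + 0.5 * X[:, 5] + rng.normal(scale=0.5, size=82) > 0).astype(int)

scores = f1_by_source(X, y)
best = max(scores, key=scores.get)
print("best source combination and model:", best, round(scores[best], 2))
```

With only 82 participants, a single held-out split gives noisy F1 estimates; repeated stratified cross-validation (for example, `cross_val_score` with `scoring="f1"`) would be a more stable choice for a sample of this size.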

RESULTS: Among participants (n=82) who identified as SGM, were sexually active, and reported recent substance use, our model was highly predictive of methamphetamine use and having ≥6 sexual partners (F1-scores as high as 0.83 and 0.69, respectively). The model was less predictive of condomless anal sex (highest F1-score 0.38). Overall, text-based features were the most predictive, but app use and location data improved predictive accuracy, particularly for detecting ≥6 sexual partners. Methamphetamine use was significantly associated with dating app use (P=.01) and use of sex-related words (P=.002). Compared with participants with fewer sex partners, those with ≥6 sex partners showed more dating app use (P=.02), more use of sex-related words (P=.001), and, on average, travel farther from home (P=.03). Methamphetamine users were more likely to use social (P=.002) and affect words (P=.003) and less likely to use drive-related words (P=.02). Participants with 6 or more partners were more likely to use social, affect, and cognitive process-related words (P=.003 and P=.004, respectively).
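The group comparisons reported above rest on independent t tests. A minimal sketch with `scipy.stats.ttest_ind`, assuming one numeric feature per participant (for example, a hypothetical proportion of sex-related words) and a binary outcome label; the function name `compare_feature` and the example values are illustrative, not study data.

```python
import numpy as np
from scipy import stats


def compare_feature(feature, outcome, alpha=0.05):
    """Independent two-sample t test of one feature between outcome groups.

    Returns (t statistic, two-sided p-value, significant at alpha).
    A positive t means the outcome group has the higher mean.
    """
    feature = np.asarray(feature, dtype=float)
    outcome = np.asarray(outcome, dtype=bool)
    # Default equal_var=True gives Student's t test with pooled variance.
    t, p = stats.ttest_ind(feature[outcome], feature[~outcome])
    return float(t), float(p), bool(p < alpha)


# Hypothetical example: 5 participants without and 5 with the outcome.
feature = [1, 2, 3, 4, 5, 10, 11, 12, 13, 14]
outcome = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
t, p, significant = compare_feature(feature, outcome)
print(f"t={t:.2f}, p={p:.2g}, significant={significant}")
```

`ttest_ind` assumes equal group variances by default; passing `equal_var=False` switches to Welch's test when group variances differ.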

CONCLUSIONS: Our results suggest that passively collected mobile phone data may be useful in detecting sexual risk behaviors. Expanding data collection may further improve the results, as certain behaviors, such as injection drug use, were quite rare in the study sample. These models may be used to personalize STI and HIV prevention and substance use harm reduction interventions.


Full citation:
Beikzadeh M, Holloway IW, Kärkkäinen K, Hong C, Cascalheira C, Wu ESC, Boka C, Avendaño AC, Yonko EA, Sarrafzadeh M (2025).
Identifying substance use and high-risk sexual behavior among sexual and gender minority youth by using mobile phone data: Development and validation study
Online Journal of Public Health Informatics, 17, e68013. doi: 10.2196/68013. PMCID: PMC12360732.