Short stay

Taylor Arnold

Linguistics - USA

Research topics

PROJECT

MODELLING LINGUISTIC SHIFTS AND COMMUNITY FORMATION IN LARGE TEXTUAL DATASETS

This project focuses on modelling, understanding, and visualizing how language changes over time. Its goal is to understand how intra-authorial shifts reflect the emergence of sub-communities and forms of power within specific discursive settings. The research offers insight into how people interact in digital spaces, an important task given the growing prevalence of social interactions on seemingly anonymous digital platforms. The project builds on prior work on temporal linguistic change in both natural language processing and the digital humanities. Communities of expertise are known to build themselves around particular linguistic terms and constructions, and people outside a community actively acquire these features in order to gain acceptance within a sphere of influence. Examples of communities with strong linguistic signals include law, academia, medicine, and politics. Understanding intra-author changes in language allows this phenomenon to be detected and modelled. It also has broader implications for the study of community formation, structures of expertise, and topic formation.

Two datasets serve as direct applications for the project. The first is the large, multilingual set of Wikipedia edit comments. The second is a collection of oral biographies from the 1930s. The research has three intended outcomes: (1) statistical models that describe patterns observed across the data; (2) open-source software that implements the models on new corpora and visualizes the output; and (3) a critical analysis showing how the results relate to the formation of discursive communities and the social construction of domain expertise.
 

Activities / Resume

BIOGRAPHY

Taylor Arnold is an assistant professor at the University of Richmond (Virginia, U.S.A.), with an appointment in both the linguistics program and the department of mathematics and computer science. He received his Ph.D. in statistics from Yale University in 2013. Before joining the University of Richmond, he was a senior scientist at AT&T Labs Research in New York City. Arnold studies massive cultural datasets in order to address new and existing research questions in the humanities and social sciences. He specializes in the application of statistical computing to large text and image corpora. He is particularly interested in data that combine text and images, such as newspapers with embedded figures or television shows with associated closed captions.

MAIN PUBLICATIONS

Monographs

  • Arnold, T., Kane, M., and Lewis, B. (2019). A Computational Approach to Statistical Learning. New York, NY: Chapman & Hall/CRC Texts in Statistical Science.
  • Arnold, T., and Tilton, L. (2015). Humanities Data in R: Exploring Networks, Geospatial Data, Images, and Text. New York, NY: Springer.

Articles

  • Arnold, T., and Tilton, L. (2019). “Distant Viewing: Analyzing Large Visual Corpora.” Digital Scholarship in the Humanities.
  • Arnold, T., Berke, A., and Tilton, L. (2019). “Visual Style in Two Network Era Sitcoms.” Cultural Analytics.
  • Arnold, T. (2019). “Industrial Research in Applied Statistics.” Notices of the American Mathematical Society.
  • Arnold, T., and Tilton, L. (2018). “Cross-Discourse and Multilingual Exploration of Text with the DualNeighbors Algorithm.” Proceedings of the 2nd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, 50–59.
  • Arnold, T., Ballier, N., Gaillat, T., and Lissón, P. (2018). “Predicting CEFRL Levels in Learner English on the Basis of Metrics and Full Texts.” Proceedings of the 20th Conférence sur L’Apprentissage Automatique, 75–82.
  • Arnold, T. (2017). “Tidy Data Model for Natural Language Processing Using cleanNLP.” The R Journal, 9(2), 248–267.
  • Arnold, T., Leonard, P., and Tilton, L. (2017). “Knowledge Creation Through Recommender Systems.” Digital Scholarship in the Humanities, 32(3).
  • Arnold, T., Kane, M., and Urbanek, S. (2017). “iotools: High-performance Tools for R.” The R Journal, 9(1), 6–13.
  • Arnold, T., Maples, S., Tilton, L., and Wexler, L. (2017). “Uncovering Latent Metadata in the FSA-OWI Photographic Archive.” Digital Humanities Quarterly, 11(2).
  • Arnold, T., and Tibshirani, R. (2016). “Efficient Implementations of the Generalized Lasso Dual Path Algorithm.” Journal of Computational and Graphical Statistics, 25(1), 1–27.