Assortative mixture of English parts of speech
Network data analysis is an emerging area of study that applies quantitative analysis to complex data from a variety of application fields. Methods used in network data analysis enable visualization of relational data in the form of graphs and also yield descriptive characteristics and predictive graph models. This paper presents an application of network data analysis to the authorship attribution problem. Specifically, we show how representing a text as a word graph produces the well-documented feature sets used in authorship attribution tasks, such as the word frequency model and the part-of-speech (POS) bigram model. Analysis of these models, together with word graph characteristics, provides insights into the English language. In particular, analysis of the nominal assortative mixture of parts of speech, a statistic that measures the tendency of words with the same POS to be connected by an edge in the word network, reveals regular structural properties of English grammar.
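The abstract does not spell out how the word graph is built, so the following is only a minimal sketch under stated assumptions, not the authors' code: each distinct lower-cased word is a node, consecutive tokens are joined by a directed edge, each word keeps the first POS tag it is seen with, and tags are supplied by hand (no tagger is invoked). Node weights then give the word frequency model, and directed edge weights mapped onto tags give POS bigram counts. The assortativity function uses Newman's nominal coefficient r = (sum_i e_ii - sum_i a_i^2) / (1 - sum_i a_i^2) on the undirected, unweighted graph, where e_ij is the fraction of edge ends joining tag i to tag j and a_i = sum_j e_ij; whether the paper uses weighted edges is not stated here. The function names are hypothetical.

```python
from collections import Counter


def build_word_graph(tagged_tokens):
    """Build a word graph from (word, tag) pairs.

    Assumptions (not from the paper): nodes are lower-cased distinct words,
    consecutive tokens are joined by a directed edge, and each word keeps
    the first POS tag it is seen with.
    """
    node_freq = Counter()  # node weights = word frequency model
    edge_freq = Counter()  # directed edge weights = word bigram counts
    pos_of = {}            # word -> POS tag (first tag seen)
    prev = None
    for word, tag in tagged_tokens:
        w = word.lower()
        node_freq[w] += 1
        pos_of.setdefault(w, tag)
        if prev is not None:
            edge_freq[(prev, w)] += 1
        prev = w
    return node_freq, edge_freq, pos_of


def pos_bigrams(edge_freq, pos_of):
    """Map weighted word-to-word edges onto POS tag pairs."""
    bigrams = Counter()
    for (u, v), count in edge_freq.items():
        bigrams[(pos_of[u], pos_of[v])] += count
    return bigrams


def nominal_assortativity(edge_freq, pos_of):
    """Newman's nominal assortativity of POS tags on the undirected,
    unweighted word graph (self-loops are dropped)."""
    undirected = {frozenset(e) for e in edge_freq if e[0] != e[1]}
    mix = Counter()  # symmetric mixing counts: each edge counted in both directions
    for edge in undirected:
        u, v = sorted(edge)
        mix[(pos_of[u], pos_of[v])] += 1
        mix[(pos_of[v], pos_of[u])] += 1
    total = sum(mix.values())
    if total == 0:
        raise ValueError("graph has no edges")
    tags = {t for pair in mix for t in pair}
    same = sum(mix[(t, t)] for t in tags) / total  # trace of e
    share = Counter()  # a_i: fraction of edge ends attached to tag i
    for (a, _), count in mix.items():
        share[a] += count / total
    expected = sum(s * s for s in share.values())  # sum_i a_i^2
    if expected == 1.0:
        raise ValueError("undefined when every node has the same tag")
    return (same - expected) / (1 - expected)


if __name__ == "__main__":
    tokens = [("The", "DT"), ("dog", "NN"), ("saw", "VBD"),
              ("the", "DT"), ("cat", "NN")]
    freq, edges, pos = build_word_graph(tokens)
    print(freq.most_common(1))
    print(pos_bigrams(edges, pos))
    print(nominal_assortativity(edges, pos))
```

In the five-token example every edge joins two different tags, so the coefficient is negative (-11/21, about -0.524), i.e. disassortative. Because each word keeps a single tag, the POS bigram counts derived from the graph can differ from token-level POS bigrams when a word is ambiguous; a real pipeline would need to decide how to handle that.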
Studies in Computational Intelligence
Leonard, Timothy, Lutz Hamel, Noah M. Daniels, and Natallia V. Katenka. "Assortative mixture of English parts of speech." Studies in Computational Intelligence 689 (2018): 463-475. doi:10.1007/978-3-319-72150-7_38.