The presence of social structure in online texts based on word embedding NLP models
Research on social stratification is closely linked to the analysis of the prestige associated with different occupations. In this study, we applied word embedding models to large text corpora to test whether the resulting semantic space orders a chosen set of occupations in the same way as the ranking obtained from sociological analysis of survey data. The ranking derived from the digital data shows a rank correlation of 0.75 with the ranking based on the International Socio-Economic Index of occupational status (ISEI). As a robustness test, we ran our model on two different digital corpora representing different spheres of language use, and the two sets of results were quite similar. A closer analysis confirmed the standard components (income and education play an important structuring role in the semantic space), but aspects not revealed so far also emerged.
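
To illustrate the kind of comparison described above, the following minimal sketch computes a Spearman rank correlation between two sets of occupation scores. It is an illustration only, under stated assumptions: the abstract does not say how occupation scores are derived from the embedding space or which rank correlation coefficient was used, and the function names, occupation labels, and all numbers here are hypothetical placeholders, not data from the study.

```python
def ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical placeholder data: neither column comes from the study.
occupations = ["occupation_A", "occupation_B", "occupation_C",
               "occupation_D", "occupation_E"]
embedding_scores = [0.10, 0.05, 0.30, 0.70, 0.55]  # made-up embedding-based scores
survey_scores = [20, 30, 45, 60, 80]               # made-up survey-based scores

rho = spearman(embedding_scores, survey_scores)
print(f"Spearman rank correlation: {rho:.2f}")
```

A rank-based coefficient suits this comparison because only the ordering of occupations matters, not the scale of the scores produced by either source.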