Abstract
Disinformation, misinformation, and other ‘fake news’ (collectively, false information) are quick and inexpensive to create and distribute in our increasingly digital and connected world. Identifying false information early and cost-effectively can offset some of those operational advantages. In this paper, we develop light-weight machine learning models that utilize (1) a novel data set tracking browsing behavior and (2) domain registration data that is available for all websites when they are established. Using only the domain registration data, we develop and demonstrate a machine learning classifier that identifies, at the time a domain is registered, domains that will go on to produce false information. We then combine this data with our browsing data and develop a machine learning classifier that identifies false information domains whose content is most associated with higher levels of consumption. Finally, we use our data to identify false information domains that will cease operations after an event of interest, in our case the 2016 U.S. presidential election. We theorize that the last category involves actors seeking primarily to manipulate perceptions and outcomes of that event.
Introduction
The online proliferation of disinformation, misinformation, and other ‘fake news’ (collectively, false information) has become an increasingly common characteristic of the digital information environment (Bradshaw and Howard, 2019). Recent false information campaigns have targeted areas that are salient to management and operations in both the private and public sectors. Some false information campaigns target companies. For example, the United States Department of Homeland Security identified a false information campaign in 2018 in which “right wing actors . . . sought to discredit and undermine Nike’s brand reputation” and “do economic harm to a corporation with whom they disagreed” (U.S. Department of Homeland Security, 2019). In 2020, Facebook accused one of south-east Asia’s biggest telecommunication firms of using Facebook accounts to conduct a commercial disinformation campaign seeking to discredit its competitors (Murphy and Reed, 2020). False information campaigns can also target local and national communities and governments.
For instance, in 2014, elaborately orchestrated false information campaigns separately fabricated an explosion at a chemical plant in Louisiana and an outbreak of the Ebola virus in Atlanta, Georgia (Chen, 2015). And in 2020, the United States accused Russia, China, and Iran of engaging in far-reaching false information campaigns on the causes, treatments, and consequences of the novel COVID-19 pandemic (Barnes et al., 2020).
In this paper, we use data from a canonical example of organized false information: the 2016 U.S. presidential election. We show how light-weight machine learning models that utilize data available at the time a website is registered can serve as an early warning signal, identifying domains that are likely to (1) produce false information, (2) produce false information that is most associated with high levels of consumption, and (3) abandon their operations after an event of interest, in our case an election. We theorize and provide suggestive evidence that domains abandoned after an event of interest are established by entities seeking primarily to manipulate perceptions and outcomes of that event.