Neutralizing Confirmation Bias in Technology-Mediated Security Decisions

Statistical tables are misbehaving, feel free to email me for a document.

Neutralizing Confirmation Bias in Technology-Mediated Security Decisions

By Dr. Vera K. Wilde, Ph.D.

Abstract: Lie detectors (polygraphs) are part of and in some ways can proxy for the larger surveillance state ocean of technology-mediated security decisions. Scientific evidence is insufficient to establish the neutrality of these decisions. What if bias systematically affects them? In a series of four survey experiments, I show racial bias probably does not affect them, confirmation bias probably does, and it’s possible to “hack,” or neutralize, the confirmation bias.

You are always right—or so a part of you likes to believe. We all have confirmation bias all the time. Making sense of the world is easier when we plug new facts into old fact patterns, and usually those mental shortcuts are adaptive. They get us to work when we’re busy listening to the radio, help us understand what people are asking when we don’t catch the whole question, and generally let us focus on learning new things instead of trying to make sense anew of the entire world all at once, at every moment. In this way, confirmation bias is part of the cognitive tuning apparatus that lets us have selective attention—tuning our radios to discrete signals in order to function in harmony in a world with lots of noise—and run automatic as opposed to controlled cognitive processes—enjoying higher-level intelligence and flow states in the process.

But the same mental shortcuts can cause mistakes or even serve as backdoors for bias (Kahneman 2003, Plous 1993, Stanovich and West 2000). This can cause problems in everyday interactions, as when racial stereotypes lead potential employers to contact candidates with African-American names at lower rates than candidates with Caucasian names (Bertrand and Mullainathan 2004). And it might also contribute to less easily documented inequalities on a larger scale in the context of mass surveillance. What if technology-mediated security decisions that appear neutral and scientific are actually biased—systematically influenced by extraneous factors such as stereotypes, negative background information, or the interpreter’s own beliefs about the technology? Confirmatory evidence-seeking is a well-documented problem in political institutions’ decision-making processes, affecting a broad range of judgments from interpretations of forensic evidence, to decisions about whether or not to go to war (Blix 2004, Jervis 2006, Thompson 2009). But the science assessing whether decision-making technologies such as lie detectors (polygraphs) help or hurt administrative neutrality lacks consensus.

Leading scholars have long agreed that polygraphs are insufficiently evidence-based to rely on them in security decisions (e.g., Iacono and Lykken 1997, National Research Council 2003). When the co-chairs of the Congressionally commissioned National Research Council report on this evidence presented their summary to the Senate Select Committee on Intelligence, the Senators agreed that polygraphs appeared to present more national security risk than benefit to the intelligence agencies that are their strongest advocates and heaviest users, given the better-than-best-case limits of their ability to detect or deter terrorists or spies (Fienberg and Stern 2005).

Yet instead of shrinking, polygraph programs and their next-generation analogues in security screenings at border crossings and other checkpoints have expanded as part of the post-9/11 security state expansion others have documented (Becker 2013, Geracimos 2002, Arkin & Priest 2010, Priest 2011). Recruit and employee security screening polygraphs rose nearly 750% in the FBI from 2002 to 2005 (U.S. Department of Justice 2006). Polygraph programs are required as part of U.S. sponsored anti-corruption programs such as Plan Colombia, the Mérida Initiative in Mexico, and others in the Bahamas, Bolivia, Guatemala, Honduras, and Iraq (U.S. Government Accountability Office 2010; U.S. Department of State 2010). The programs also play increasingly prominent roles at national labs (Aftergood 2000; Department of Energy 2006). Meanwhile, behavioral, verbal, and psychophysiological threat and credibility assessment systems extending polygraph screening methods to transportation and border security screening contexts—such as the Department of Homeland Security’s Transportation Security Administration’s SPOT (Screening of Passengers by Observation Techniques), FAST (Future Attribute Screening Technologies), and AVATAR (the Automated Virtual Agent for Truth Assessments in Real-time)—touch the realms of everyday freedom of movement, expression, association, and privacy for all (BORDERS National Center for Border Security and Immigration 2013, Furnas 2012, Greenemeier 2012, Higginbotham 2013, Nunamaker and Golob 2012, O’Reilly 2012, U.S. Department of Homeland Security 2010, U.S. Government Accountability Office 2010, U.S. G.A.O. 2011, U.S. G.A.O. 2013). Deception detection is a $3-4 billion/year domestic industry. The surveillance state as a whole probably has an annual taxpayer pricetag an order of magnitude larger.

In budgetary and other respects, polygraphs are a useful proxy for the surveillance state. Their insufficient evidence base, non-transparency, and expansion probably track together, but larger programs such as those managing mass metadata collection in telecommunications are even harder to study with quantitative methods. Non-transparency in Defense budgets makes estimating the cost and magnitude of polygraph programs themselves difficult. Despite intense secrecy surrounding federal lie detection programs—and documented intelligence agency deception in Congressional communication—polygraphs are a relatively visible, quantifiable drop in the technology-mediated security decision ocean (Taylor 2013).2

With the late 20th-century expansion of the “law and order” state, use of avowedly objective, scientific forensic evidence skyrocketed in public programs, policies and institutions. U.S. criminal justice, immigration, and national security systems now routinely use “lie detectors” to inform criminal, administrative, and employment investigations. Yet there is widespread scientific consensus that there is no technology capable of detecting deception, as there is indeed no evidence of a “deception response” to detect. Thus, numerous widespread forensic procedures lack scientific basis. Ironically, scientific evidence abounds demonstrating the potential of cognitive biases to affect the interpretation of forensic data. These biases can include information on individuals’ race (in the case of racial bias) and other background information (in the case of confirmation bias). Further, some evidence suggests the interpretation of forensic data may be subject to cognitive biases just like other interpretive tasks.

In a series of four randomized controlled survey experiments, the potential for technology-mediated bias—bias in decisions that appear neutral and scientific, but could actually institutionalize prejudice—comes out in a confirmation bias effect in polygraph chart interpretation. The experiments also produce consistently null racial and intersectional (race plus confirmation) bias results. In the experiments, people read a background investigation and scored a polygraph chart. Results indicate that polygraphs probably generate apparently novel evidence that instead tends to confirm rather than independently checking preconceptions. The fourth survey experiment also suggests that the confirmation bias effect disappears
when interpreters can elect to run the test in “suspicious mode.” So as usual, technology has the ability to advance decision-making speed and accuracy—or to hinder that same accuracy by institutionalizing preconceptions in neutral-seeming ways. My research finds evidence supporting both potentials for polygraphy. The rest of this summary presents the experimental design and results in greater detail.

I. Against the Grain

Overall, this research both builds on and challenges existing social and cognitive psychology, political science, and sociology research on intergroup prejudice, attribution bias, and intersectional (here, race plus class) bias. Leading theories of implicit and explicit racial attitudes predict that racial bias against African-Americans and particularly against negative stereotype-conforming African-Americans is pervasive, harmful to minorities, and affects decisions—especially under conditions that should contribute to automatic as opposed to controlled cognitive processing, such as ambiguity and the ability to attribute decisions to factors other than prejudice (Allport 1979; Bargh 1994; Bertrand Mullainathan 2004; Devine, Plant, Amodio, Harmon-Jones, and Vance 2002; Dovidio, Glick, and Rudman 2005; Peffley, Hurwitz, and Sniderman 1997; Pettigrew 1979; and Sidanius and Pratto 2001). By contrast, I compile novel evidence suggesting that racial and intersectional bias do not systematically characterize technology-mediated decisions.

As numerous observational and experimental studies would predict, my research shows that confirmation bias affects technology-mediated security decisions, as reflected in a statistically significantly higher probability of polygraph charts being judged deceptive when they are associated with negative background investigations. However, this confirmation bias effect can be nullified by giving interpreters an option to run the test in “suspicious mode,” creating a sort of a cognitive shelf on which to compartmentalize the confirmation bias. So the good news is that regular people can overcome what is perhaps the most robust and pervasive cognitive bias by rechanneling it through a technology-mediated dummy option.

II. Survey Experimental Design

The survey experiments were conducted online between Feb. 13, 2012, and Oct. 6, 2013. The online administration mode avoided contamination from possible, unintended race and gender effects from an individual survey administrator (participant-observer effects; e.g., Gosling et al. 2004; Orne 2009; Osborne 2001). Participants were workers on Amazon.com’s Mechanical Turk (MTurk) platform located in the U.S. MTurk is an Internet survey platform that facilitates simple, inexpensive recruitment and payment of subjects. Individuals undertake “Human Intelligence Tasks” (HITs) on MTurk, which can include survey experiments. MTurk data compare favorably with typical experimental political science and psychology data—which are collected using local convenience or student samples—in terms of internal and external validity. Research indicates MTurk samples replicate diverse experimental findings, even though their characteristics differ from the general population’s (Berinsky, Huber, and Lenz 2012; Buhrmester, Kwang, and Gosling 2011; Gosling et al. 2004) . Moreover, replication and triangulation, not random selection, establish generalizability of experimental findings. These results replicate across survey experiments using other technology-mediated decision support tools, as well as across diverse data sources, in other research (Author 2014). Thus, it is reasonable to assume that the inferential statistics obtained from this series of experiments tend to be low in bias.

The survey experimental design randomly assigned participants to different race and background information variable value conditions. Participants were randomly assigned to view a vignette consisting of relevant text alongside a photo that conveyed the race of the mock subject (here, the ostensible polygraph subject—and in parallel survey experiments that replicated results in other technology-mediated decision tool contexts, the medical patient or food stamp applicant). The photos were normed along relevant dimensions: age, familiarity, mood, memorability, and picture quality (Kennedy, Hope, and Raz 2009). Photos come from the Center for Vital Longevity Face Database. They feature neutral facial expressions and gray backgrounds (Minear and D. C. Park 2004).

All materials were pretested in Charlottesville, Virginia between January 2012 and September 2013, and are available in the University Library (LIBRA) repository and online. All recorded observations were utilized in the reported analyses, with the following pre-determined exclusions: (1) non-U.S. respondent location according to IP address, (2) repeated study completions from the same IP address, (3) non-compliance with quality control measures, established through automated checks, and (4) failure to enter a valid response ID (a code all respondents received at the end of the survey) when prompted.

The primary dependent variable of interest is polygraph chart interpretation. In Experiments 1-3, the dependent variable values are whether participants judged a polygraph chart as indicating deception (“deception indicated,” valued at 1 for analysis), or not (“no deception indicated,” valued at 0). In Experiment 4, additional dependent variable operationalizations are added to this measure. Those operationalizations are whether participants would rather the polygraph be set to interpret charts in suspicious or friendly mode (to enable technology-mediated Bayesian updating), and whether participants assess the chart as correctly indicating whether someone is lying.

Like a typical polygraph chart, the chart (Figure 1 below and in survey instruments available online) consisted of a static graph representing changes in electrodermal responses (a measure relating to sweat on skin), cardiovascular responses (heart rate and a proxy measure of blood volume), and respiration rate and depth. Using complete polygraph charts from the field was undesirable due to their length and confidentiality issues. Instead, a sampling of open-source polygraph charts was compiled, and slightly modified to be relatively uniform in appearance other than controlled changes. Experienced federal polygraphers thought charts generated in this way, including the one used in this survey experiment, were real. Instructions to participants indicated how to read the charts and were based on federal polygraph chart interpretation methods (Department of Defense, Counterintelligence Field Activity 2006; Maschke and Scalabrini 2005; Psychophysiological Detection of Deception Program 2006; Sullivan 2010).3 The chart was designed to be ambiguous, and results confirm its ambiguity.

Figure 1: Polygraph chart.

I tested four hypotheses about racial and confirmation bias in polygraph chart interpretation. First, the interaction of race and confirmation bias in polygraph chart interpretation produces compound effects. Second, these effects are magnified when the race variable is relatively subtle, allowing for relatively automatic rather than controlled cognitive processing of the stimuli. Third, threat, emotion, and time pressure increase the availability and efficiency-driven use of stereotypes. Fourth, adding a probability-focused framing treatment to polygraph chart interpretation tasks increases racial and confirmation bias under conditions of pressure, but decreases it under pressure-neutral conditions.

In Experiment 1, the treatment variables are polygraph subject race and background information. The experimental design is a 5×2 fully crossed matrix randomly varying race and background information. Race here has five values: dark-skinned black, light-skinned black, dark-skinned Hispanic, light-skinned Hispanic, and white. The black and Hispanic skin c
olor variations were created digitally from one medium skin-toned black and Hispanic photo, respectively. This controlled for all features other than skin color within those pairs. Background information has two values: negative and neutral. These values are operationalized as varying information under the subheadings employment, medical, criminal, family, and credit history.

This experimental vignette text is presented to subjects as a background investigation they need to familiarize themselves with before scoring the polygraph chart, like a real polygrapher would. These treatment conditions all have good mundane realism, because polygraphers see polygraph subjects’ race and familiarize themselves with background information, frequently in the form of background investigations structured like those used here, before conducting polygraphs.

Experiment 2, like Experiment 1, varies polygraph subject race and background information. The experimental design is a 2×3 fully crossed matrix randomly varying race and background information. It simplifies the race variable values to black and white. The photograph used for the black variable value is the original (non-morphed) photo used to generate the dark-skinned and light-skinned black photos in Experiment 1. The photograph used for the white variable value is the same photo used in Experiment 1. This set of simplified race variable values is then used again in Experiments 3 and 4. The background investigation variable values in Experiment 2 include a negative, neutral, and added positive background investigation treatment condition.

Experiments 3 and 4 both implement a 2x2x2 fully crossed matrix. Both retest race (black/white) and background information (negative/neutral) manipulations and their interactions. They also randomly vary novel contextual conditions. In Experiment 3, those conditions are the presence or absence of threat, emotion, and time pressure (“TEP”) as a combined set of conditions that might increase reliance on mental shortcuts in general and implicit bias in particular by encouraging automatic as opposed to controlled cognitive processing (Bargh 1994; Devine et al. 2002). The TEP control condition omits threat, emotion, and time pressure cues. In the TEP treatment condition, all three of those features are present, providing a tough test of the hypothesis that these contextual conditions systematically influence polygraph chart interpretation. Participants randomly assigned to the TEP condition view an introductory paragraph containing information about violent crime rates. The information is framed as justification for why collecting evidence from suspects themselves is important. This vignette contains information needed to answer a quality control question in which participants must acknowledge violent crime can affect “Anyone, including me and my loved ones,” to proceed with the study. Participants are also asked to think of the last time they were the victim or witness of a violent crime such as a robbery or assault, or were afraid they were going to be, and rate their fear. This question invites participants to personalize the threat of violent crime presented in the vignette. Participants are also asked to think of the last time they heard about a violent crime, and rate their anger. Finally, time pressure in the TEP condition is operationalized through the addition of the following vignette immediately preceding the polygraph chart interpretation task: “Speed can be important in criminal investigations. So, to further simulate a real polygrapher’s job in a criminal case, please work as fast as you can, without compromising accuracy. The two workers who correctly interpret the chart the fastest will each receive a bonus of $5.” This treatment condition cues both intrinsic and extrinsic motivation for speed, generating time pressure.

In Experiment 4, the novel contextual condition randomly varied along with race and background information in the fully crossed 2x2x2 matrix is probability framing. All participants view a table applying Bayes’ rule to polygraph testing under Suspicious versus Friendly testing modes. Then they view a set of statements reframing the same likelihoods expressed as frequencies in the table as probabilities. In one probability framing treatment condition, participants view these probabilities in terms of accuracy rates. In the other, these probabilities are presented in terms of error rates.

III. Results

Negative background information systematically influenced polygraph chart interpretation across the first three of four experiments. In the fourth experiment, it significantly affected another interpretive scoring choice offered only in this experiment that is not typically available in polygraph chart interpretation, but might be in the future as technology catches up with statistics (and normal human limits therein), and Bayesian updating becomes increasingly automated. This overall collection of results suggests that confirmation bias, or the application of prior information to inform seemingly independent judgments, affects polygraphy.

Racial and intersectional bias results were null. Rather than undergoing a compounding interaction with confirmation bias, racial bias did not systematically affect polygraph chart interpretation either by itself or in combination with confirmation bias. Sample sizes are large enough that it is reasonable to expect a practically significant interaction effect between race and background information would have been detected with statistical significance if it existed. Nonsignificant interaction findings across cases including polygraphy, medical decision tools, and welfare benefits administration tools in a series of six parallel survey experiments (the four polygraph case study ones reported here and two others reported elsewhere and available online) thus yield the important insight that race and background information probably do not undergo a compounding interaction in technology-mediated administrative decisions the way that leading theories about intersectional bias might lead one to expect. Replication bolsters the finding’s generalizability.

Table 1 summarizing Experiment 1 shows that the race and skin color of the polygraph subject do not affect the interpretation of the chart as indicating deception or not. The log odds of the chart being interpreted as indicating deception are significantly heightened when it is associated with a negative background investigation. The odds are about two times higher that a negative background investigation is associated with a guilty chart reading (p = 0.013).

Table 2 shows that Experiment 2 replicates and extends these results. In this smaller sample (N = 241 as compared to Experiment 1’s N = 1208), the confirmation bias effect is only marginally significant (p = 0.07). The race of the polygraph subject still does not appear to systematically influence chart interpretation, even when combined with positive racial substereotypes in a new background information condition.

Table 3 presents results from Experiment 3, again replicating and extending these results. Again, race does not help explain variation in chart interpretation. This remains true even under conditions that we might expect to magnify racial and confirmation bias by increasing reliance on automatic cognitive processing – namely, threat, emotion, and time pressure. But confirmation bias, as operationalized by the negative background information variable value, does significantly help explain variation in chart interpretation (p = 0.043).

Table 4 shows that when probability frames are added, the results replicate and extend. Specifically, race and substereotype (race interacting with background information) does not help explain variation in chart interpretation. Under these framing conditions emphasizing either accuracy or error rates, viewing negative background information causes interpreters to be more likely to select the suspicious instead of friendly te
sting mode given this option that was not offered in earlier experiments. So given a way to effectively delegate to technology what appears to be rational Bayesian updating of guilt in response to negative background information, interpreters systematically tend to take that option and then refrain from using that background information again to inform their chart readings. This result is highly statistically significant (p = 0.012). On one hand, it shows that confirmation bias in polygraph chart interpretation replicates and in a way extend to information environments in which probability-updating enables interpreters to make more fully neutral decisions. On the other hand, it shows that this same confirmation bias can be rechanneled in a rational way.

Table 1: Polygraph Chart Interpretation – Deceptive/non-deceptive, Experiment 1

Deception Indicated

(SE)

Dark-skinned African-American

-0.118

(0.255)

Light-skinned African-American

-0.083

(0.258)

Dark-skinned Hispanic

-0.256

(0.268)

Light-skinned Hispanic

0.061

(0.255)

Negative background information

0.665*

(0.266)

Dark-skinned African-American X negative background information

-0.582

(0.363)

Light-skinned African-American X negative background information

-0.147

(0.380)

Dark-skinned Hispanic X negative background information

-0.269

(0.378)

Light-skinned Hispanic X negative background information

0.024

(0.370)

Constant

-0.105

(0.188)

N = 1208

* = p < 0.05, *** = p < 0.000. Results reflect coefficients from a logistic regression model, with standard errors in parentheses.

Table 2: Polygraph Chart Interpretation – Deceptive/non-deceptive, Experiment 2

Coeff.

(SE)

African-American

-0.299

(0.436)

Negative background information

0.822†

(0.453)

Positive background information

0.406

(0.433)

African-American X negative background information

-0.272

(0.654)

African-American X positive background information

0.191

(0.621)

Constant

-0.251

(0.291)

N = 241

† signals p = 0.07. Results reflect coefficients from a logistic regression model, with standard errors in parentheses.

Table 3: Polygraph Chart Interpretation – Deceptive/non-deceptive, Experiment 3

Coeff.

(SE)

African-American

-0.087

(0.371)

Negative background information

0.731*

(0.361)

African-American X negative background information (substereotype)

-0.063

(0.524)

Threat, time pressure, emotion (TEP)

0.492

(0.350)

TEP X African-American

-0.455

(0.508)

TEP X negative background information

-0.328

(0.523)

TEP X substereotype

0.190

(0.745)

Constant

-0.405

(0.253)

N = 480

* = p < 0.05, *** = p < 0.000. Results reflect coefficients from a logistic regression model, with standard errors in parentheses.

Table 4: Polygraph Chart Interpretation – Characteristics of judgments produced, Experiment 4

Deceptive chart interpretation

Suspicious (versus friendly) mode

Assessment of chart as correctly indicating deception

African-American

-0.357

(0.357)

0.008

(0.415)

-0.043

(0.474)

Negative background information

-0.028

(0.353)

1.843***

(0.391)

0.629

(0.429)

African-American X negative background information

0.927

(0.538)

-0.079

(0.587)

-0.284

(0.666)

Probability frame (error rate focus)

0.119

(0.349)

0.957*

(0.379)

-0.065

(0.474)

Probability frame X African-American

-0.001

(0.510)

-0.615

(0.561)

0.386

(0.669)

Probability frame X negative background information

0.104

(0.526)

-0.578

(0.573)

-0.924

(0.711)

Probability frame X substereotype

-0.250

(0.748)

0.642

(0.813)

0.472

(0.972)

Constant

-0.154

(0.227)

-1.204***

(0.269)

-1.61***

(0.304)

N = 482

* = p < 0.05, *** = p < 0.000. Results reflect coefficients from a logistic regression model, with standard errors in parentheses.

The good news is that results from four novel survey experiments suggest that racial bias does not systematically influence polygraph chart interpretation. These results disprove most of my initial hypotheses. The bad news is that confirmation bias appears to threaten the ability of apparently neutral technologies to independently verify credibility, institutionalizing hunches and judgments based on prior information (accurate or not) as if they are the outcomes of apparently neutral, scientific evaluation or independent fact-finding process.

The confirmation bias result appears robust and ecologically valid, but field experimental data are needed to assess its generalizability. Indeed, the replication of this result across four experiments suggests it generalizes across a variety of contextual conditions, such as the presence of threat, emotion, and time pressure, and even—with possible technology-mediated neutralization of the bias—the availability of Bayesian updating accuracy/error rates for the polygraph test, a feature that is not standard in these technologies but is increasingly common in other technological decision-making aids in fields such as medicine. However, existing research on this field generalizability question is limited and its results mixed. In 1986, CBS’s “60 Minutes” staged a theft and randomly selected four polygraphers to test four employee suspects. Each polygrapher was told that a different employee was probably the thief, and each examiner found that employee deceptive (Saxe 1991). These hypothesis-blind polygraphers appeared to exhibit confirmation bias under field conditions. The study design was strong, but small sample size prohibits meaningful tests of statistical significance. Similarly, another field study with insufficient sample size for meaningful tests of statistical significance generated null confirmation bias results. In that study, researchers staged an ostensible replication of a polygraph validity study with seven police polygraphers (Elaad, Ginton, and Ben-Shakhar 2006). The same police force that employed the polygraphers employed the researchers. A larger-scale between-subjects field experiment using hypothesis-blind polygrapher subjects and protecting against possible conflicts of interest is needed to settle the question of whether confirmation bias—and the “suspicious mode” hack for neutralizing it—generalizes to field polygraph and other technology-mediated security decision conditions.

IV. Conclusion

Large bureaucracies such as governmental and military institutions, corporations, and educational systems have long sought to routinize and render invisible discretionary power under the auspices of science—usually with unintended consequences (March and Olsen 1989; Porter 1996; Ross 1992; Scott 1998). New technologies create new opportunities for this type of centralized standardization enterprise, while also making it possible to better decentralize informed decision-making, by making expert knowledge and tools more readily available to end users—citizens, pa
tients, and clients alike. In this way, technology has always had the dual potential of institutionalizing bias while appearing neutral on one hand—and empowering traditionally disempowered groups on the other. Polygraphs are no exception to this historical rule.

Future research on security decision-making technologies like polygraphs might incorporate groups of subjects with varying levels of expertise, in situations with different levels of stress (TEP) conditions and varying correct judgments. It would be ideal to triangulate field observational and experimental data sources, but the federal agencies with this data have systematically withheld this data. Testing the field utility of mechanisms whereby professionals might correct for technology-mediated confirmation bias, as appears to potentially be the case for the “suspicious mode” option in Experiment 4, should also be a priority. However, just as physiological responses correlate with deception, but “lie detection” does not exist in scientific terms because there is no deception response to detect—so too does neutrality correlate with accuracy at a less than one-to-one ratio. Some analyses suggest that under field conditions, polygraphs are about as accurate at detecting deception as coin flips (Zelicoff 2007). So hacking confirmation bias in polygraph chart interpretation with a “suspicious mode” option as Experiment 4 suggests might increase polygraph neutrality without affecting accuracy.

By far the greatest practical significance of these experiments is their application to surveillance state technologies more broadly—technologies that are not transparent enough to test from the outside of the systems they support, that may be used to support decisions about limiting people’s most basic Constitutional freedoms—freedom of movement, expression, association, and privacy. Equally important, they may be used as independent verification mechanisms to check security decisions, when in reality their independence is constrained by confirmation bias. In this way, insufficiently evidence-based technology-mediated security decision-making practices exemplified by polygraphy may threaten both liberty and security.

Bibliography

Aftergood, Steven. 2000. “Polygraph Testing and the DOE National Laboratories.” Science 290 (5493) (November 3): 939–940.

Allport, Gordon W. 1979. The Nature of Prejudice. Unabridged, 25th anniversary ed. Reading, Mass: Addison-Wesley Pub. Co.

Author. 2014. “Neutral Competence? Polygraphy and Technology-Mediated

Administrative Decisions.” Doctoral dissertation.

Bargh, John A. 1994. “The Four Horsemen of Automaticity: Awareness, Efficiency, Intention, and Control in Social Cognition.” In Handbook of Social Cognition, Vol. 2, 1:1–31. Hillsdale, NJ: Erlbaum.

Becker, Andrew. 2013. “During Polygraphs, Border Agency Applicants Admit to Rape, Kidnapping”. Center for Investigative Reporting. Available online at http://cironline.org/reports/during-polygraphs-border-agency-applicants-admit-rape-kidnapping-4325.

Berinsky, A. J., G. A. Huber, and G. S. Lenz. 2012. “Evaluating Online Labor Markets for Experimental Research: Amazon.com’s Mechanical Turk.” Political Analysis 20 (3) (March 2): 351–368.

Bertrand, Marianne, and Sendhil Mullainathan. 2004. “Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination.” American Economic Review 94 (4) (September): 991–1013.

Blix, Hans. 2004. Disarming Iraq. 1st ed. New York: Pantheon Books.

BORDERS National Center for Border Security and Immigration. 2013. “BORDERS’ AVATAR on Duty in Bucharest Airport”. Available online at http://www.borders.arizona.edu/cms/news/borders-avatar-duty-bucharest-airport.

Buhrmester, M., T. Kwang, and S. D. Gosling. 2011. “Amazon’s Mechanical Turk: A New Source of Inexpensive, Yet High-Quality, Data?” Perspectives on Psychological Science 6 (1) (February 3): 3–5.

Devine, Patricia G., E. Ashby Plant, David M. Amodio, Eddie Harmon-Jones, and Stephanie L. Vance. 2002. “The Regulation of Explicit and Implicit Race Bias: The Role of Motivations to Respond without Prejudice.” Journal of Personality and Social Psychology 82 (5): 835–48.

Dovidio, John F., Peter Glick, and Laurie Rudman, eds. On the Nature of Prejudice: Fifty Years after Allport. 2005. Malden, MA: Blackwell Pub.

Elaad, Eitan, Avital Ginton, and Gershon Ben-Shakhar. 2006. “The Role of Prior Expectations in Polygraph Examiners Decisions.” Psychology, Crime & Law 7 (4): 1–16.

Fienberg, Stephen E., and Paul C. Stern, “In Search of the Magic Lasso: The Truth About the Polygraph,” Statistical Science 20 (3) 249-260.

Furnas, Alexander. 2012. “Homeland Security’s ‘Pre-Crime’ Screening Will Never Work.” The Atlantic. http://www.theatlantic.com/technology/archive/2012/04/homeland-securitys-pre-crime-screening-will-never-work/255971/.

Geracimos, Ann. 2002. “A Special Kind of Education; FBI Trainees at Quantico Work Hard.” The Washington Times, December 23. Available online at http://goliath.ecnext.com/coms2/gi_0199-999750/A-special-kind-of-education.html.

Gosling, Samuel D., Simine Vazire, Sanjay Srivastava, and Oliver P. John. 2004. “Should We Trust Web-Based Studies? A Comparative Analysis of Six Preconceptions About Internet Questionnaires.” American Psychologist 59 (2): 93–104.

Greenemeier, Larry. 2012. “Avatar Officer Installed at Arizona-Mexico Border Station.” Scientific American, August 6.

Higginbotham, Adam. 2013. “Deception Is Futile When Big Brother’s Lie Detector Turns Its Eyes on You.” Wired Threat Level. http://www.wired.com/threatlevel/2013/01/ff-lie-detector/all/.

Iacono, W. G., and D. T. Lykken. 1997. “The validity of the lie detector: Two surveys of scientific opinion.” Journal of Applied Psychology 82(3): 426-433.

Jervis, Robert. 2006. “Reports, Politics, and Intelligence Failures: The Case of Iraq.”

Journal of Strategic Studies 29(1): 3-52.

Kahneman, Daniel. 2003. “A Perspective on Judgment and Choice: Mapping Bounded Rationality.” American Psychologist 58 (9): 697–720.

Kennedy, K. M., Hope, K., & Raz, N. 2009. Lifespan adult faces: norms for age, familiarity, memorability, mood, and picture quality. Experimental Aging Research, 35(2), 268–275.

March, James G., and Johan P. Olsen. 1989. Rediscovering Institutions: The Organizational Basis of Politics. New York: Free Press.

Maschke, George W., and Gino J. Scalabrini. 2005. The Lie Behind the Lie Detector. 4th ed. AntiPolygraph.org.

Minear, M., & Park, D. C. 2004. A lifespan database of adult facial stimuli. Behavior Research Methods, Instruments, & Computers, 36, 630–633.

Nunamaker, Jay, and Elyse Golob. 2012. “National Center for Border Security and Immigration Work Plan – Year 5: July 2012 to June 2013.” Available online at http://www.borders.arizona.edu/cms/sites/default/files/BORDERS-YR5-Workplan-APPROVED%2008022012.pdf.

O’Reilly, Andrew. 2012. “Avatar Border Agent Screens Commuters At Arizona Post.” Fox News Latino. http://latino.foxnews.com/latino/news/2012/08/06/avatar-border-agent-screens-commuters-at-arizona-post/.

Orne, M. T. 2009. Demand characteristics and the concept of quasi-controls. Artifacts in Behavioral Research (ed. Robert Rosenthal and Ralph L. Rosnow., pp. 110–137). Oxford: Oxford University Press.

Osborne, Jason W. 2006. “Gender, Stereotype Threat, and Anxiety: Psychophysiological and Cognitive Evidence.” Electronic Journal of Research in Educational Psychology 4 (8): 109–38.

Peffley, Mark, Jon Hurwitz, and Paul M. Sniderman. 1997. “Racial Stereotypes and Whites’ Political Views of
Blacks in the Context of Welfare and Crime.” American Journal of Political Science 41 (1): 30–60.

Pettigrew, Thomas F. 1979. “The Ultimate Attribution Error: Extending Allport’s Cognitive Analysis of Prejudice.” Personality and Social Psychology Bulletin 5 (4): 461–76.

Plous, Scott. 1993. The Psychology of Judgment and Decision Making. New York: McGraw- Hill.

Porter, Theodore. 1996. Trust in Numbers: The Pursuit of Objectivity in Science and Public Life. 1st. pbk. ed. Princeton N.J.: Princeton University Press.

Priest, Dana, and William M. Arkin. 2010. “A Hidden World, Growing beyond Control.” Top Secret America: A Washington Post Investigation. July 19. http://projects.washingtonpost.com/top-secret-america/articles/a-hidden-world-growing-beyond-control/.

———. 2011. Top Secret America: The Rise of the New American Security State. 1st ed. New York: Little, Brown and Co.

Psychophysiological Detection of Deception Program. 2006. “TEST DATA ANALYSIS: DoDPI Numerical Evaluation Scoring System.”

Ross, Dorothy. 1992. The Origins of American Social Science. Ideas in Context. Cambridge: Cambridge University Press.

Saxe, Leonard. 1991. “Lying: Thoughts of an Applied Social Psychologist.” American Psychologist 46 (4): 409–15.

Scott, James C. 1998. Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed. New Haven: Yale University Press.

Sidanius, Jim, and Felicia Pratto. 2001. Social Dominance an Intergroup Theory of Social Hierarchy and Oppression. Cambridge: Cambridge University Press.

Stanovich, Keith E., and Richard F. West. 2000. “Individual Differences in Reasoning:

Implications for the Rationality Debate?” Behavioral and Brain Sciences 23 (5):645– 726.

Sullivan, John F. 2007. Gatekeeper: Memoirs of a CIA Polygraph Examiner. 1st ed. Washington D.C.: Potomac Books.

Taylor, Marisa. 2013-2014. The Polygraph Files. McClatchy Newspapers, August 13.

Thompson, W. C. 2009. “Painting the Target around the Matching Profile: The Texas Sharpshooter Fallacy in Forensic DNA Interpretation.” Law, Probability and Risk 8 (3): 257–76.

United States Department of Defense, Counterintelligence Field Activity. 2006. “Federal Psychophysiological Detection of Deception Examiner Handbook, Counterintelligence Field Activity Technical Manual.”

United States Department of Energy. 2006. “10 CFR Parts 709 and 710, Docket No. CN-03-RM-01, RIN 1992-AA33, Counterintelligence Evaluation Regulations.” Federal Register 71 (189): 57386–97.

United States Department of State. 2010. “Program and Budget Guide”. Bureau for International Narcotics and Law Enforcement Affairs. http://www.state.gov/documents/organization/131027.pdf.

United States Government Accountability Office. 2010. “Mérida Initiative: The United States Has Provided Counternarcotics and Anticrime Support but Needs Better Performance Measures”. Report to Congressional Requesters GAO-10-837.

———. 2010. “Efforts to Validate TSA’s Passenger Screening Behavior Detection Program Underway, but Opportunities Exist to Strengthen Validation and Address Operational Challenges”. Report to the Ranking Member, Committee on Transportation and Infrastructure, House of Representatives GAO-10-763.

———. 2011. “TSA Has Taken Actions to Improve Security, but Additional Efforts Remain”. Testimony Before the Subcommittee on National Security, Homeland Defense, and Foreign Operations, Committee on Oversight and Government Reform, House of Representatives GAO-11-807T.

———. 2013. “TSA Should Limit Future Funding for Behavior Detection Activities”. Testimony Before the Subcommittee on Transportation Security, Committee on Homeland Security, House of Representatives GAO-14-1582.

U.S. Department of Justice, Office of the Inspector General, Evaluation and Inspections

Division. 2006. “Use of Polygraph Examinations in the Department of Justice”. I-

2006-2008. http://www.justice.gov/oig/reports/plus/e0608/final.pdf.

U.S. Department of Homeland Security. 2010. “FAST Privacy Threshold Analysis.” http://epic.org/privacy/body_scanners/EPIC-DHS-FOIA-09-14-11.pdf.

Zelicoff, Alan. 2007. “Positive and Negative Predictive Values of Polygraphs: Results from Published ‘Field’ Studies.” Manuscript.

1 This research was supported by a National Science Foundation Doctoral Dissertation Research Improvement Grant, University of Virginia Raven Society Fellowship, University of Virginia Society of Fellows Fellowship, Louise and Alfred Fernbach Award for Research in International Relations, and William McMeekin, Michael & Andrea Leven, and Bernard Marcus Institute for Humane Studies Fellowships. The usual oceans of dissertation debt churn.

2 I began submitting Freedom of Information Act (FOIA) requests to multiple federal agencies for documents and data relating to polygraph programs in 2009. Over the next few years, I discovered what organizations like the ACLU and EFF already knew: requesters typically have to sue federal agencies to obtain data from them under FOIA. Without representation, I could not so much as obtain processing notes on my own FOIAs, much less responsive records. My requests and related files were repeatedly ignored or lost altogether, before I retained a national security lawyer with expertise in FOIA and polygraphs. This governmental nontransparency featured in a McClatchy national investigative series on polygraphs, and all the obtained documents I obtained under FOIA are available on McClatchy’s website and at http://nationalsecuritylaw.org/document_vault.html.

3 Instructions, also presented in supplemental form as part of the full survey instruments, read: Now, score the subject’s polygraph chart using the following guidelines.

• Where the reaction appears greater following a relevant question (R) than following a control question (C), deception is indicated.

• Where the reaction appears greater following a control question (C) than following a relevant question (R), no deception is indicated.

• Look for the greater reaction for each parameter, where blue is respiration, green is galvanic skin response, and red is heart rate.