Big Data in Political Science Research
- LAST REVIEWED: 27 September 2017
- LAST MODIFIED: 27 September 2017
- DOI: 10.1093/obo/9780199756223-0232
Introduction
In both applied politics and academic political science research, big data techniques have gained considerable traction for analyzing causal relationships, making useful classifications, and forecasting. Generally speaking, big data methods are techniques that can tax existing software or hardware, thereby requiring a degree of ingenuity to deal with the computational demands of the data or the method. As a result, what can be classified as big data is a moving target from year to year and generation to generation as computer processing, storage, and software improve. To illustrate this pattern: Admiral Grace Hopper, inventor of the first compiler for a programming language, would routinely hand out 11.8-inch pieces of wire in lectures she gave. She remarked that, in a perfect vacuum, this is how far light could travel in a single nanosecond. To be fast, computers had to be small, so as to minimize the distance that signals travel. As computers have gotten smaller and more efficient, in part due to Hopper’s work, analyzing larger data sets with more complex structures and with more complex machine learning methods has become increasingly feasible.
How has big data become a part of political research? In applied politics, campaigns now frequently engage in microtargeting, a type of cluster analysis that takes extensive databases about as many voters as possible and determines logical classifications for them. By tracking known behaviors to standardize the model, campaigns can then forecast what the ideal message would be for voters they are reaching out to. In academic political science research, meanwhile, many methods of machine learning are taking off, allowing scholars to answer more complicated questions and use more complicated data. A large body of research treats texts such as written records or floor speeches as data, again using clustering algorithms to determine common speech patterns. A longer stream of research uses Monte Carlo and Markov Chain Monte Carlo techniques to evaluate methods and to estimate Bayesian models. Tree-based methods have emerged as a technique for escaping the curse of dimensionality, the problem that emerges when there are more variables that could potentially affect an outcome than can possibly be included in the model. Measurement techniques that may need to span many years and make comparisons across many different settings have become pivotal to political science. The following bibliography points to several general topics that the potential big data analyst may need to consider and several sources to consult within each topic.
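As a quick check on the length of wire Hopper handed out, the distance light travels in one nanosecond can be computed directly. The short Python sketch below is illustrative only; the function and constant names are hypothetical and do not come from any of the works cited here.

```python
# Speed of light in a vacuum, in meters per second (exact by SI definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458
# One inch is defined as exactly 0.0254 meters.
METERS_PER_INCH = 0.0254


def light_travel_inches(seconds: float) -> float:
    """Return the distance, in inches, that light covers in a vacuum."""
    return SPEED_OF_LIGHT_M_PER_S * seconds / METERS_PER_INCH


# Distance covered in one nanosecond (1e-9 seconds).
nanosecond_inches = light_travel_inches(1e-9)
print(f"{nanosecond_inches:.1f} inches")  # prints "11.8 inches"
```

The result, roughly 11.8 inches, matches the wire length in the anecdote.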
Reference Works
The books listed here are all useful references for the analyst who is engaged in applied big data analysis. Some are references that analysts may want to consult as sources of R commands that would be relevant when implementing one’s own code (such as Grolemund and Wickham 2017 or Monogan 2015). More big data–specific programming references include James, et al. 2013 (which uses R); Joshi 2016 (which uses Python); and Press, et al. 1996 (which uses C). Johnson, et al. 1994–1995; Johnson, et al. 1997; and Johnson, et al. 2005 all serve as essential background on the probability theory that is critical to big and little data analysis. For background on computational algorithms that are key to big data, readers can consult Steele, et al. 2016 as well as Efron and Hastie 2016. All of these volumes provide information that the applied analyst may need to look up at times.
Cheney, Ward, and David Kincaid. Numerical Mathematics and Computing. 7th ed. Boston: Brooks/Cole, 2013.
This is an updated numerical analysis text for the era of big data. In addition to the inclusion of common numerical optimization routines, numerical differentiation, and integration, it covers mathematical preliminaries and floating point representation, linear systems, Monte Carlo methods and simulations, and a section on linear programming.
Efron, Bradley, and Trevor Hastie. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. New York: Cambridge University Press, 2016.
This up-to-date book by two very distinguished statisticians discusses the history of modern statistics since fast digital computers made more advanced statistical methods possible. The book discusses classical frequentist, Bayesian, and mixed methods. The discussions are lucid and very valuable to the understanding of current data science.
Golub, Gene H., and Charles F. Van Loan. Matrix Computations. 4th ed. Baltimore: Johns Hopkins University Press, 2013.
The development of efficient algorithms for high-dimensional data necessarily requires an understanding of advanced linear algebra and high-dimensional matrix computations. To this end, Golub and Van Loan provide an excellent introduction to efficient algorithms for advanced matrix computations. Among the topics covered in this text are a wide variety of matrix factorization techniques, parallel computation algorithms, and optimization routines.
Grolemund, Garrett, and Hadley Wickham. R for Data Science. Sebastopol, CA: O’Reilly, 2017.
For modern data analysis in R, Hadley Wickham’s packages provide essential tools. Wickham and Grolemund cover all of the most useful R packages for modern data analysis. This text covers the popular ggplot2 for beautiful data visualizations; dplyr for data transformations; data wrangling with tibble, readr, and tidyr; stringr for dealing with text data; and many others. The text also covers document creation using R Markdown and practical programming advice. Available online.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning: With Applications in R. New York: Springer, 2013.
DOI: 10.1007/978-1-4614-7138-7
This book provides an excellent, intermediate-level introduction to statistical learning theory and applications in R. The text is the less advanced version of its more mathematically mature relative, The Elements of Statistical Learning, and provides an all-inclusive self-guiding tour through some of the most popular machine learning algorithms and implementations. This text is essential reading for graduate students and faculty in political science interested in learning some of the most basic machine learning algorithms.
Joshi, Prateek. Python Machine Learning Cookbook. Birmingham, UK: Packt, 2016.
This volume walks through a litany of topics including classifiers, clustering, text-as-data, image analysis, neural networks, and visualizing data. It focuses on algorithms in Python to complete each of these sorts of tasks. Example Python code is included throughout the text, and there is a dedicated GitHub page with data and code from the book.
Johnson, Norman L., Samuel Kotz, and N. Balakrishnan. Continuous Univariate Distributions. 2d ed. 2 vols. New York: Wiley, 1994–1995.
This and the other two books in the series (Johnson, et al. 1997 and Johnson, et al. 2005) are essential reference works for continuous and discrete distributions used in probability and statistics. These two volumes specifically focus on probability distributions of a single variable that has a continuous metric, such as the normal, logistic, beta, uniform, χ², t, F, and many more. Many underlying distributions like extreme value distributions (a motivator of the logistic) are also included.
Johnson, Norman L., Samuel Kotz, and N. Balakrishnan. Discrete Multivariate Distributions. New York: Wiley, 1997.
This book covers the complex area of discrete multivariate distributions, wherein multiple variables are discrete in nature. These include the multinomial, multivariate Poisson, and others. All of these books by Johnson, Kotz, and Balakrishnan cover the history and mathematical derivations of well-known distributions with references to the original papers by the mathematicians and statisticians who derived them. The authors extensively cross-reference the distributions because so many distributions have obscure variants that were derived to solve specific statistical problems.
Johnson, Norman L., Samuel Kotz, and N. Balakrishnan. Univariate Discrete Distributions. 3d ed. New York: Wiley, 2005.
DOI: 10.1002/0471715816
This volume lays much of the theoretical foundation behind all probability distributions: gamma and beta functions, Bayes’ theorem, moments of a probability distribution, order statistics, and many other features. It then describes families of discrete distributions, which generally take on values that are some subset of nonnegative integers (such as a count or a binary outcome). Distributions covered include the binomial, Poisson, negative binomial, and several others.
Monogan, James E., III. Political Analysis Using R. New York: Springer, 2015.
DOI: 10.1007/978-3-319-23446-5
This book offers a beginning-to-intermediate introduction to the statistical program R. In particular, chapter 8 introduces how user-written add-on packages allow analysts to apply data-intensive methods in R, such as roll-call scaling or Markov Chain Monte Carlo analysis. Chapters 10 and 11 introduce R’s programming functionality, allowing applied analysts of big data more flexibility than many programs when conducting research.
Press, William H., Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge, UK: Cambridge University Press, 1996.
Press and colleagues provide a foundational text for implementing basic numerical analysis algorithms in C. One of the great benefits of a text that implements numerical analysis algorithms in C is that the simplicity of the language serves well as a conceptual basis for other, more modern programming languages such as C++, R, and Python. This book covers the most relevant numerical algorithms used for applied linear algebra problems, integration, random number generation, sorting, and optimization.
Steele, Brian, John Chandler, and Swarna Reddy. Algorithms for Data Science. Cham, Switzerland: Springer, 2016.
DOI: 10.1007/978-3-319-45797-0
This text is an essential introduction for anyone planning on effectively analyzing high-dimensional data. It reads like a good introductory statistics book that covers basic visualization and data analysis techniques, but with an eye toward scalability for each method implemented. In addition to covering linear regression, the text covers tools for handling big data, such as Hadoop and MapReduce, and also discusses elementary machine learning algorithms, such as k-means clustering and naïve Bayes.
VanderPlas, Jake. Python Data Science Handbook. Sebastopol, CA: O’Reilly, 2017.
This text is a great reference for those just starting to use Python. The book discusses how to effectively utilize IPython notebooks; describes the basic elements of Python’s most popular numerical computation packages, NumPy and Pandas; and also discusses Matplotlib for data visualization. This text covers basic machine learning algorithms available through the Scikit-Learn package, including naïve Bayes, linear regression, support vector machines, decision trees and random forests, principal components analysis, and k-means clustering.