Big Data in Political Science Research
- LAST MODIFIED: 27 September 2017
- DOI: 10.1093/obo/9780199756223-0232
In both applied politics and academic political science research, big data techniques have gained considerable traction for the purposes of analyzing causal relationships, making useful classifications, and forecasting. Generally speaking, big data methods refer to techniques that can tax existing software or hardware, thereby requiring a degree of ingenuity to deal with the computational demands of the data or the method. To that end, what can be classified as big data is a moving target from year to year and generation to generation as computer processing, storage, and software improve. To illustrate this pattern: Admiral Grace Hopper, inventor of the first compiler for a programming language, would routinely hand out 11.8-inch pieces of wire in lectures she gave. She remarked that, in a perfect vacuum, this is how far light could travel in a single nanosecond. To be fast, computers had to be small in order to minimize the distance signals traveled. As computers have gotten smaller and more efficient, in part due to Hopper’s work, analyzing larger data sets with more complex structures and more complex machine learning methods has become increasingly feasible. How has big data become a part of political research? In applied politics, campaigns now frequently engage in microtargeting, a type of cluster analysis that takes extensive databases about as many voters as possible and determines logical classifications for them. By tracking known behaviors to standardize the model, campaigns can then forecast what the ideal message would be for voters they are reaching out to. In academic political science research, meanwhile, many methods of machine learning are taking off in order to allow scholars to answer more complicated questions and use more complicated data. A large body of research treats texts such as written records or floor speeches as data, again using clustering algorithms to determine common speech patterns.
A longer stream of research uses Monte Carlo and Markov Chain Monte Carlo techniques to evaluate methods and to estimate Bayesian models. Tree-based methods have emerged as a technique for escaping the curse of dimensionality—the problem that emerges when there are more variables that could potentially affect an outcome than can possibly be included in the model. Measurement techniques that may need to span many years and make comparisons across many different settings have become pivotal to political science. The following bibliography points to several general topics that the potential big data analyst may need to consider and several sources to consult within each topic.
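As a rough illustration of what using Monte Carlo techniques "to evaluate methods" can look like, the sketch below simulates many samples from a known distribution and checks how often a nominal 95 percent confidence interval for the mean contains the true value. It is not drawn from any of the works cited here; the function name `ci_coverage`, the sample size, the number of replications, and the choice of a normal-approximation interval are all assumptions made for the example.

```python
import math
import random
import statistics


def ci_coverage(n=50, reps=2000, mu=0.0, sigma=1.0, z=1.96, seed=42):
    """Estimate how often a normal-approximation interval covers the true mean.

    All defaults are illustrative assumptions, not values from the text.
    """
    rng = random.Random(seed)  # seeded generator so the run is reproducible
    hits = 0
    for _ in range(reps):
        # Draw one simulated data set from a distribution with known mean mu.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        # Standard error of the mean, using the sample standard deviation.
        se = statistics.stdev(sample) / math.sqrt(n)
        # Count the replication as a hit if the interval contains mu.
        if mean - z * se <= mu <= mean + z * se:
            hits += 1
    return hits / reps


coverage = ci_coverage()
print(f"Estimated coverage of the nominal 95% interval: {coverage:.3f}")
```

If the estimated coverage falls well below the nominal level, that suggests the interval is too narrow for the sample size in question. The same loop structure can be adapted to compare estimators under other data-generating processes.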
The books listed here are all useful references for the analyst who is engaged in applied big data analysis. Some are references that analysts may want to consult as sources of R commands that would be relevant when implementing one’s own code (such as Grolemund and Wickham 2017 or Monogan 2015). A more big data–specific reference on programming can be found in James, et al. 2013 (which uses R); Joshi 2016 (which uses Python); or Press, et al. 1996 (which uses C). Johnson, et al. 1994–1995; Johnson, et al. 1997; and Johnson, et al. 2005 all serve as essential background on the probability theory that is critical to big and little data analysis. For background on computational algorithms that are key to big data, readers can consult Steele, et al. 2016 as well as Efron and Hastie 2016. All of these volumes provide information that the applied analyst may need to look up at times.
Cheney, Ward, and David Kincaid. Numerical Mathematics and Computing. 7th ed. Boston: Brooks/Cole, 2013.
This is an updated numerical analysis text for the era of big data. In addition to the inclusion of common numerical optimization routines, numerical differentiation, and integration, it covers mathematical preliminaries and floating point representation, linear systems, Monte Carlo methods and simulations, and a section on linear programming.
Efron, Bradley, and Trevor Hastie. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science. New York: Cambridge University Press, 2016.
This up-to-date book by two very distinguished statisticians discusses the history of modern statistics since fast digital computers made more advanced statistical methods possible. The book discusses classical frequentist, Bayesian, and mixed methods. The discussions are lucid and very valuable to the understanding of current data science.
Golub, Gene H., and Charles F. Van Loan. Matrix Computations. 4th ed. Baltimore: Johns Hopkins University Press, 2013.
The development of efficient algorithms for high-dimensional data necessarily requires an understanding of advanced linear algebra and high-dimensional matrix computations. To this end, Golub and Van Loan provide an excellent introduction to efficient algorithms for advanced matrix computations. Among the topics covered in this text are a wide variety of matrix factorization techniques, parallel computation algorithms, and optimization routines.
Grolemund, Garrett, and Hadley Wickham. R for Data Science. Sebastopol, CA: O’Reilly, 2017.
For modern data analysis in R, Hadley Wickham’s packages provide essential tools. Grolemund and Wickham cover many of the most useful R packages for modern data analysis. This text covers the popular ggplot2 for data visualization; dplyr for data transformations; tibble, readr, and tidyr for data wrangling; stringr for dealing with text data; and many others. The text also covers document creation using R Markdown and practical programming advice. Available online.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning: With Applications in R. New York: Springer, 2013.
This book provides an excellent, intermediate-level introduction to statistical learning theory and applications in R. The text is the less advanced version of its more mathematically mature relative, The Elements of Statistical Learning, and provides an all-inclusive self-guiding tour through some of the most popular machine learning algorithms and implementations. This text is essential reading for graduate students and faculty in political science interested in learning some of the most basic machine learning algorithms.
Joshi, Prateek. Python Machine Learning Cookbook. Birmingham, UK: Packt, 2016.
This volume walks through a range of topics including classifiers, clustering, text-as-data, image analysis, neural networks, and visualizing data. It focuses on algorithms in Python to complete each of these sorts of tasks. Example Python code is included throughout the text, and there is a dedicated GitHub page with data and code from the book.
Johnson, Norman L., Samuel Kotz, and N. Balakrishnan. Continuous Univariate Distributions. 2d ed. 2 vols. New York: Wiley, 1994–1995.
This and the other two books in the series (Johnson, et al. 1997 and Johnson, et al. 2005) are essential reference works for continuous and discrete distributions used in probability and statistics. These two volumes specifically focus on probability distributions of a single variable that has a continuous metric, such as the normal, logistic, beta, uniform, χ², t, F, and many more. Many underlying distributions like extreme value distributions (a motivator of the logistic) are also included.
Johnson, Norman L., Samuel Kotz, and N. Balakrishnan. Discrete Multivariate Distributions. New York: Wiley, 1997.
This book covers the complex area of discrete multivariate distributions, wherein multiple variables are discrete in nature. These include the multinomial, multivariate Poisson, and others. All of these books by Johnson, Kotz, and Balakrishnan cover the history and mathematical derivations of well-known distributions with references to the original papers by the mathematicians and statisticians who derived them. The authors extensively cross-reference the distributions because so many distributions have obscure variants that were derived to solve specific statistical problems.
Johnson, Norman L., Samuel Kotz, and N. Balakrishnan. Univariate Discrete Distributions. 3d ed. New York: Wiley, 2005.
This volume lays much of the theoretical foundation behind all probability distributions: gamma and beta functions, Bayes’ theorem, moments of a probability distribution, order statistics, and many other features. It then describes families of discrete distributions, which generally take on values that are some subset of nonnegative integers (such as a count or a binary outcome). Distributions covered include the binomial, Poisson, negative binomial, and several others.
Monogan, James E., III. Political Analysis Using R. New York: Springer, 2015.
This book offers an introductory-to-intermediate treatment of the statistical program R. In particular, chapter 8 introduces how user-written add-on packages allow analysts to apply data-intensive methods in R, such as roll-call scaling or Markov Chain Monte Carlo analysis. Chapters 10 and 11 introduce R’s programming functionality, which gives applied analysts of big data more flexibility than many programs when conducting research.
Press, William H., Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes in C: The Art of Scientific Computing. Cambridge, UK: Cambridge University Press, 1996.
Press and colleagues provide a foundational text for implementing basic numerical analysis algorithms in C. One of the great benefits of a text that implements numerical analysis algorithms in C is that the simplicity of the language serves well as a conceptual basis for other, more modern programming languages such as C++, R, and Python. This book covers the most relevant numerical algorithms used for applied linear algebra problems, integration, random number generation, sorting, and optimization.
Steele, Brian, John Chandler, and Swarna Reddy. Algorithms for Data Science. Cham, Switzerland: Springer, 2016.
This text is an essential introduction for anyone planning on effectively analyzing high-dimensional data. It reads like a good introductory statistics book that covers basic visualization and data analysis techniques, but with an eye toward scalability for each method implemented. In addition to covering linear regression, the text covers tools for handling big data, such as Hadoop and MapReduce, and also discusses elementary machine learning algorithms, such as k-means clustering and naïve Bayes.
VanderPlas, Jake. Python Data Science Handbook. Sebastopol, CA: O’Reilly, 2017.
This text is a great reference for those just starting to use Python. The book discusses how to effectively utilize IPython notebooks; describes the basic elements of Python’s most popular numerical computation packages, NumPy and pandas; and also discusses Matplotlib for data visualization. This text covers basic machine learning algorithms available through the scikit-learn package, including naïve Bayes, linear regression, support vector machines, decision trees and random forests, principal components analysis, and k-means clustering.