This list is intended to introduce some of the tools of Bayesian statistics and machine learning that can be useful to computational research in cognitive science. The first section mentions several useful general references, and the others provide supplementary readings on specific topics. If you would like to suggest some additions to the list, contact Tom Griffiths.

The sections covered in this list are:

- General introduction
- Classics on the interpretation of probability
- Model selection and model averaging
- The EM algorithm
- Monte Carlo methods
- Graphical models
- Hidden Markov models and DBNs
- Bayesian methods and neural networks
## General introduction

There are no comprehensive treatments of the relevance of Bayesian methods to cognitive science. However, the slides from the following tutorials on Bayesian methods, presented at the Annual Meeting of the Cognitive Science Society, may be of interest:

- The 2004 tutorial by Josh Tenenbaum and Tom Griffiths (384 slides, 10.5MB, PowerPoint format).
- The 2006 tutorial by Tom Griffiths, Josh Tenenbaum, and Charles Kemp (Part I) (Part II) (Part III) (Part IV)
- The 2008 tutorial by Tom Griffiths, Josh Tenenbaum, and Charles Kemp (Part I, PPT) (Part I, PDF) (Part II, PPT) (Part II, PDF) (Part III, PPT) (Part III, PDF) (Part IV, PPT) (Part IV, PDF)
- The 2010 tutorial by Tom Griffiths, Josh Tenenbaum, and Charles Kemp (Part I, PPT) (Part I, PDF) (Part II, PPT) (Part II, PDF) (Part III, PPT) (Part III, PDF) (Part IV, PPT) (Part IV, PDF)

The 2006, 2008, and 2010 tutorials were based on material appearing in three papers:

- Griffiths, T. L., & Yuille, A. (2006). A primer on probabilistic inference. *Trends in Cognitive Sciences, 10* (online supplement to issue 7). (pdf)
- Tenenbaum, J. B., Griffiths, T. L., & Kemp, C. (2006). Theory-based Bayesian models of inductive learning and reasoning. *Trends in Cognitive Sciences, 10*, 309-318. (pdf)
- Griffiths, T. L., Kemp, C., and Tenenbaum, J. B. (2008). Bayesian models of cognition. In Ron Sun (ed.), *The Cambridge handbook of computational cognitive modeling*. Cambridge University Press. (manuscript pdf)

Modern artificial intelligence makes extensive use of statistical ideas, and one of the best places to learn about them and their relevance to cognition is

- Russell, S., & Norvig, P. (2002). *Artificial Intelligence: A Modern Approach* (2nd ed.). Englewood Cliffs, NJ: Prentice Hall.

Radford Neal gave a tutorial presentation at NIPS 2004 on Bayesian machine learning, which outlines some of the philosophy of Bayesian inference, its relevance to the study of learning, and some fundamental methods. David MacKay has written an excellent introduction to information theory and statistical inference which covers many topics relevant to cognitive science:

- MacKay, D. J. C. (2003). *Information theory, inference, and learning algorithms.* Cambridge, UK: Cambridge University Press.

Two introductory books on Bayesian statistics (as statistics, rather than as the basis for AI, machine learning, or cognitive science) that assume only a basic background are

- Sivia, D. S. (1996). *Data analysis: A Bayesian tutorial.* Oxford: Oxford University Press.
- Lee, P. M. (1997). *Bayesian statistics.* New York: Wiley.

There are several advanced texts on Bayesian statistics motivated by statistical decision theory:

- Berger, J. O. (1993). *Statistical decision theory and Bayesian analysis.* New York: Springer.
- Robert, C. P. (2001). *The Bayesian choice: From decision-theoretic foundations to computational implementation.* New York: Springer.

The latter is more recent and covers computational methods relevant to Bayesian statistics. The relevance of statistical decision theory to human and machine learning is illustrated in the early chapters of

- Duda, R. O., and Hart, P. E. (1973). *Pattern classification and scene analysis.* New York: Wiley.

which are largely reproduced in the second edition

- Duda, R. O., Hart, P. E., and Stork, D. G. (2000). *Pattern classification.* New York: Wiley.

The subjective interpretation of probability motivates other advanced texts:

- Bernardo, J. M., & Smith, A. F. M. (1994). *Bayesian theory.* New York: Wiley.
- Jaynes, E. T. (1994). *Probability theory: The logic of science.* (now available as a bound book)

The former builds on the work of De Finetti, exploring its consequences in a range of situations. The latter comes out of the approach taken by E. T. Jaynes in statistical physics. Finally, there are also several advanced texts motivated by statistical applications and data analysis:

- Box, G. E. P., and Tiao, G. C. (1992). *Bayesian inference in statistical analysis.* New York: Wiley.
- Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (1995). *Bayesian data analysis.* London: Chapman and Hall.

The former is a classic, illustrating how frequentist methods can be understood from a Bayesian perspective and then going far beyond them. The latter considers the practical problems that can be addressed using Bayesian models, and has chapters on modern computational techniques. Tom Minka has a number of tutorial papers that apply these ideas in several important cases, including inferring a Gaussian distribution, inference about the uniform distribution, and Bayesian linear regression.

## Classics on the interpretation of probability

De Finetti gives a detailed account of the structure and consequences of subjective probability. Jeffreys discusses the idea of uninformative priors, and defines the approach to choosing priors that bears his name. Savage is the classic text on the decision-theoretic approach to probability.

- De Finetti, B. (1992). *Theory of probability.* New York: Wiley.
- Jeffreys, H. (1939/1998). *Theory of probability.* Oxford: Oxford University Press.
- Savage, L. J. (1954). *The foundations of statistics.* New York: Wiley.
## Model selection and model averaging

A number of papers on model selection and model averaging by Raftery and colleagues are available here. There is also a webpage listing research on Bayesian model averaging. Some good reviews of both topics are:

- Kass, R. E., and Raftery, A. E. (1994). *Bayes factors.* Technical Report No. 254, Department of Statistics, University of Washington.
- Hoeting, J. A., Madigan, D., and Raftery, A. E. (1999). Bayesian model averaging: A tutorial. *Statistical Science, 14*, 382-401.
- Wasserman, L. (1997). *Bayesian model selection and model averaging.* Technical Report No. 666, Statistics Department, Carnegie Mellon University.

MacKay gives a detailed account of how these methods can be applied in artificial neural networks:

- MacKay, D. J. C. (1995). Probable networks and plausible predictions - A review of practical Bayesian methods for supervised neural networks. *Network: Computation in Neural Systems, 6*, 469-505.
## The EM algorithm

A general introduction to the EM algorithm and its applications is given by Ghahramani and Jordan. Some of the motivation behind EM is explored by Neal and Hinton and in a tutorial by Minka.

- Ghahramani, Z., and Jordan, M. I. (1994). *Learning from incomplete data.* Technical Report No. 1509, AI Lab, MIT.
- Neal, R. M., and Hinton, G. E. (1998). A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan (ed.), *Learning in graphical models*, pp. 355-368. Cambridge, MA: MIT Press.
## Monte Carlo methods

MacKay motivates and explains several Monte Carlo methods. Neal gives a detailed introduction to Markov chain Monte Carlo. The other two books give examples of how these methods can be used in Bayesian models.

- MacKay, D. J. C. (1998). Introduction to Monte Carlo methods. In M. I. Jordan (ed.), *Learning in graphical models*, pp. 175-204. Cambridge, MA: MIT Press.
- Neal, R. M. (1993). *Probabilistic inference using Markov chain Monte Carlo methods.* Technical Report CRG-TR-93-1, Department of Computer Science, University of Toronto.
- Gilks, W. R., Richardson, S., and Spiegelhalter, D. J. (1995). *Markov chain Monte Carlo in practice.* London: Chapman and Hall.
- Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (1995). *Bayesian data analysis.* London: Chapman and Hall.
## Graphical models

The classic reference on graphical models in artificial intelligence is

- Pearl, J. (1988). *Probabilistic reasoning in intelligent systems: Networks of plausible inference*. San Francisco, CA: Morgan Kaufmann.

This is supplemented by Pearl's more recent book, which considers how graphical models can be used to understand causality. In both books, the first two chapters introduce and motivate the ideas involved, while the later chapters explore the consequences of these ideas.

- Pearl, J. (2000). *Causality: Models, reasoning, and inference*. Cambridge, UK: Cambridge University Press.

Kevin Murphy has both a toolbox for simulating Bayesian networks in Matlab and a detailed tutorial on the subject, including an extensive reading list. Introductions to inference and learning in Bayesian networks are provided by Jordan and Weiss and by Heckerman.

- Jordan, M. I., and Weiss, Y. (2002). Graphical models: Probabilistic inference. In M. Arbib (ed.), *The handbook of brain theory and neural networks* (2nd ed.). Cambridge, MA: MIT Press.
- Heckerman, D. (1995). A tutorial on learning with Bayesian networks. Technical Report MSR-TR-95-06, Microsoft Research.
## Hidden Markov models and DBNs

Kevin Murphy has an excellent toolbox for HMMs, as well as a recently written chapter on dynamic Bayesian networks. The classic reference on HMMs is:

- Rabiner, L. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. *Proc. IEEE, 77*, 257-286.
## Bayesian methods and neural networks

MacKay has written a number of papers integrating Bayesian methods with artificial neural networks. Some of the connections between neural networks and probability are explored by Jordan and Neal.
