Collective Animal Behavior from Bayesian Estimation and Probability Matching by Alfonso Perez-Escudero and Gonzalo G. de Polavieja

Overview of this paper.

Animals make decisions based on both local sensory information and social information from their neighbors (Couzin 2009). One common goal of animals’ decisions is to choose the environmental locations that are best for foraging. Decision making by individuals that collect exclusively non-social information (e.g., availability of food, threats from predators, or shelter) has been modeled extensively using both heuristic and Bayesian inference frameworks (Bogacz et al. 2006). In some instances, optimal decision strategies can be identified by applying Bayesian inference to relate accumulated evidence to the underlying state of the environment. However, principled models of decision making that use both social and non-social information have yet to be fully developed. Most models of collective decision making are heuristic equations that are fit to data, and they ignore essential components of probabilistic inference.


This paper aims to develop a probabilistic model of decision making in individuals that use both local information and knowledge of their neighbors’ behaviors. For most of the paper, the authors focus on decisions between two options, in order to model recent experiments on sticklebacks choosing between two feeding sites (Ward et al. 2008). The framework can be extended to a variety of contexts, including more than two options and the history dependence of group decisions, which the authors also consider. They start with the assumption that each animal computes the probability that option Y is the “best” one (e.g., safest or highest yielding) given non-social information C and social information B.


Bayes’ theorem can then be used to compute

P(Y|C,B) = \frac{\displaystyle P(B|Y,C)P(Y|C)}{\displaystyle P(B|X,C)P(X|C)+P(B|Y,C)P(Y|C)}

A major insight of the paper is that, by dividing both the numerator and the denominator by the numerator, the effects of non-social information can be separated from those of social information:

P(Y|C,B) = \frac{\displaystyle 1}{\displaystyle 1+aS}

where a=P(X|C)/P(Y|C) is the ratio of the probabilities of the two options given non-social information alone, and S=P(B|X,C)/P(B|Y,C) is a likelihood ratio that contains all the social information.
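As a quick numerical check of this rearrangement, the short Python sketch below evaluates the posterior both directly from Bayes’ theorem and in the factored form 1/(1+aS). The probability values are hypothetical, chosen only for illustration, and every quantity is understood to be conditioned on C.

```python
def posterior_bayes(p_B_X, p_B_Y, p_X, p_Y):
    """P(Y|C,B) computed directly from Bayes' theorem."""
    numerator = p_B_Y * p_Y
    return numerator / (p_B_X * p_X + numerator)


def posterior_factored(a, S):
    """The same posterior written in the factored form 1 / (1 + a S)."""
    return 1.0 / (1.0 + a * S)


# Hypothetical values for illustration only
p_X, p_Y = 0.4, 0.6        # P(X|C), P(Y|C): non-social evidence alone
p_B_X, p_B_Y = 0.02, 0.05  # P(B|X,C), P(B|Y,C): social evidence

a = p_X / p_Y              # non-social term
S = p_B_X / p_B_Y          # social term

print(posterior_bayes(p_B_X, p_B_Y, p_X, p_Y))
print(posterior_factored(a, S))
```

Both lines print the same value (about 0.79), since the factored form is just the Bayes expression with numerator and denominator divided by P(B|Y,C)P(Y|C).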

One issue with the social information term S is that it comprises behavioral information from all the other animals, and these behaviors are likely to be correlated. However, the authors assume for simplicity that the focal individual ignores these correlations. It would be interesting to examine what is missed by making this independence assumption. In general, independence allows joint densities to be split into products, P(x_1,x_2)=P(x_1)P(x_2). Assuming that the behaviors B=\{b_i\}_{i=1}^N of the N neighbors are independent given which site is best (and given C), we then have

S=\prod_{i=1}^N\frac{\displaystyle P(b_i|X,C)}{\displaystyle P(b_i|Y,C)}

For the majority of the paper, the authors focus on three specific behaviors: \beta_x, choosing site x; \beta_y, choosing site y; and \beta_u, remaining undecided. This means that the main parameters of the model are the likelihood ratios

s_k = \frac{\displaystyle P(\beta_k|X,C)}{\displaystyle P(\beta_k|Y,C)}

which indicate how informative each behavior is about the quality of a particular site. Because the model has so few parameters, it is straightforward to fit it to existing data.
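Under the independence assumption, S depends only on how many neighbors display each behavior: S = s_x^{n_x} s_y^{n_y} s_u^{n_u}, where n_k is the number of neighbors showing behavior \beta_k. The Python sketch below computes S both as a product over individual neighbors and from the counts alone. The labels 'x', 'y', 'u' and the numerical values of s_k are hypothetical.

```python
import math
from collections import Counter


def social_term(behaviors, s):
    """S as a product of per-neighbor likelihood ratios (independence assumed).

    behaviors: one label per neighbor, 'x', 'y' or 'u' (undecided)
    s: dict mapping each label k to s_k = P(beta_k|X,C) / P(beta_k|Y,C)
    """
    return math.prod(s[b] for b in behaviors)


def social_term_from_counts(behaviors, s):
    """Equivalent form s_x**n_x * s_y**n_y * s_u**n_u: only the counts matter."""
    counts = Counter(behaviors)
    return math.prod(s[k] ** n for k, n in counts.items())


# Hypothetical likelihood ratios and neighbor behaviors
s = {"x": 2.0, "y": 0.5, "u": 1.0}
neighbors = ["x", "x", "y", "u", "x"]

print(social_term(neighbors, s))              # 2 * 2 * 0.5 * 1 * 2 = 4.0
print(social_term_from_counts(neighbors, s))  # 2**3 * 0.5**1 * 1**1 = 4.0
```

With s_y = 1/s_x and s_u = 1, as in these illustrative values, S depends only on the difference n_x − n_y, which is what makes the one-parameter form used for the stickleback data possible.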

The authors fit the model to laboratory data on sticklebacks performing a binary choice task (Ward et al. 2008), in which both options are equally good. In this case a=1, and if undecided fish carry no information (s_u=1) and the two choices are symmetric (s_x=1/s_y=s), the probability of a fish choosing site y simplifies considerably:

P_y = \left( 1 + s^{- \Delta n} \right)^{-1},

where \Delta n = n_y - n_x is the number of fish at site y minus the number at site x. There is therefore only one free parameter, s, which controls the strength of social interactions. For large values of s, the population quickly aligns itself with one of the two options, since animals make choices probabilistically based on P_y. In the experiments, asymmetries are introduced by placing replica fish at one or both sites, and this initial condition influences the choices of the remaining fish. Remarkably, the single-parameter model fits the data quite well, as shown in the above figure.
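To see the bistability this produces, here is a minimal Python simulation sketch. It assumes (a simplification of ours, not necessarily the exact protocol of the paper or of Ward et al.) that fish decide one at a time, each seeing the counts of all earlier choices plus any replica fish, and choosing site y with probability P_y. Writing P_y = 1/(1 + a s^{-\Delta n}) as a logistic function of \Delta n \ln s - \ln a avoids overflow for large |\Delta n|. The function names and parameter values are hypothetical.

```python
import math
import random


def p_choose_y(delta_n, s, a=1.0):
    """P_y = 1 / (1 + a * s**(-delta_n)), evaluated as a logistic function
    of z = delta_n*ln(s) - ln(a) to avoid overflow."""
    z = delta_n * math.log(s) - math.log(a)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def simulate_group(n_fish, s, replicas_x=0, replicas_y=0, rng=None):
    """Sequential choices; returns the fraction of real fish that chose y.

    Replica fish are counted at their sites from the start but never move.
    """
    rng = rng or random.Random()
    n_x, n_y = replicas_x, replicas_y
    chose_y = 0
    for _ in range(n_fish):
        if rng.random() < p_choose_y(n_y - n_x, s):
            n_y += 1
            chose_y += 1
        else:
            n_x += 1
    return chose_y / n_fish


rng = random.Random(0)
# Hypothetical parameters: 8 fish, strong social interaction (s = 3)
no_replicas = [simulate_group(8, 3.0, rng=rng) for _ in range(2000)]
two_at_y = [simulate_group(8, 3.0, replicas_y=2, rng=rng) for _ in range(2000)]

# Fraction of groups in which all 8 fish ended up at the same site
print(sum(f in (0.0, 1.0) for f in no_replicas) / 2000)
# Mean fraction of fish choosing y when two replicas sit at y
print(sum(two_at_y) / 2000)
```

For s well above 1, a large share of the simulated groups end up unanimous, at a site set largely by the first few choices, and placing replicas at y tilts the outcome toward y.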

From here, the paper explores further nuances of the model, such as the case where one site is noticeably better than the other, or where some replica fish are more attractive than others. The model captures all these effects and fits the data set from Ward et al. (2008) fairly well. In general, social interactions in the model set up a bistable system that tends to one of two steady states, in which almost all animals choose one of the two sites. This should not be surprising, since P_y has the familiar sigmoidal form often used as an interaction function in neural network models (Wilson and Cowan, 1972) and in models of ferromagnetic systems, which also tend to admit multistable solutions.

Near the end of the paper, the authors explore how dependencies between decisions affect the final distribution of choices. In this extension, animals making later decisions take into account the history of earlier choices. As a result, animals may pay more attention to dissenting individuals in the minority than to the majority aligned with the prevailing opinion. The idea is that dissent could indicate some insight that the dissenting animal has over the others. The authors’ exploration of this phenomenon is cursory, and there seems to be room to explore such extensions in more detail. For instance, animals could weight the opinions of their neighbors by how recently those decisions were made. An analysis of how the order of decisions influences the final group decision would also provide a more specific link between model and data.

Note: The model the authors develop is closely linked to Pólya’s urn, a toy model of how inequalities in distributions are magnified over time. The urn contains a collection of balls of different colors (say black and white). A ball is drawn at random from the urn and replaced with two balls of that color, and this step is repeated. An asymmetry in the number of balls of each color therefore gives the more prevalent color a higher probability of being drawn, so its lead in absolute numbers tends to grow. In the Pérez-Escudero and de Polavieja (2011) model, probability matching plays the role of the drawing-and-replacing process, and the distribution of balls plays the role of the probability of selecting each of the two options.
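For comparison, a minimal Python sketch of the classic Pólya urn follows; the function name and parameters are hypothetical. A well-known property of this urn is that the fraction of black balls converges to a random limit, which, starting from one black and one white ball, is uniformly distributed on [0, 1].

```python
import random


def polya_urn(n_black, n_white, n_draws, rng=None):
    """Draw a ball at random and replace it with two balls of the same color.

    Returns the fraction of black balls after n_draws draws.
    """
    rng = rng or random.Random()
    for _ in range(n_draws):
        # Probability of drawing black equals its current share of the urn
        if rng.random() < n_black / (n_black + n_white):
            n_black += 1  # drawn black ball returned plus one extra black ball
        else:
            n_white += 1  # drawn white ball returned plus one extra white ball
    return n_black / (n_black + n_white)


rng = random.Random(1)
finals = [polya_urn(1, 1, 500, rng) for _ in range(5)]
print(finals)  # different runs settle at different fractions
```

In the urn the probability of drawing a color is proportional to its current share, whereas in the fish model P_y depends on the difference \Delta n through a sigmoid.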


Pérez-Escudero, A., & De Polavieja, G. G. (2011). Collective animal behavior from Bayesian estimation and probability matching. PLoS Comput Biol, 7(11), e1002282.

Ward, A. J., Sumpter, D. J., Couzin, I. D., Hart, P. J., & Krause, J. (2008). Quorum decision-making facilitates information transfer in fish shoals. Proceedings of the National Academy of Sciences, 105(19), 6948-6953.

Bogacz, R., Brown, E., Moehlis, J., Holmes, P., & Cohen, J. D. (2006). The physics of optimal decision making: a formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review, 113(4), 700.


Origin of information-limiting noise correlations by Kanitscheider, et al. (2015)

Overview of this paper.

Information arriving from the sensory periphery is represented in the activity of neural populations in cortex. The responses of cells within these populations are correlated, and such correlations can affect the amount of information about a stimulus that can be recovered from the population responses (Zohary, et al. 1994). However, the questions of what type of correlations limit the information in a neural population, and where they are likely to originate, have not been fully answered. In particular, correlations can (and do) arise from shared feedforward input, recurrent connectivity, and common, population-wide modulations. Is any one of these sources primarily responsible for limiting information?

The present paper builds on earlier work arguing that information-limiting correlations primarily reflect peripheral sensory noise (Moreno-Bote, et al. 2014) and suboptimal computation (Beck et al. 2012). The following figure captures the idea of the first paper: the mean population activity changes with the stimulus as f(s), which traces out a curve in the space of neural responses (each axis corresponds to the average activity of one neuron). The f’f’^T noise in the figure is due to correlations that prevent noise from being averaged out along the direction f’(s), tangent to the curve f(s). These correlations limit discrimination between the responses to two nearby stimuli, f(s1) and f(s2), since the noise they induce cannot be averaged out.
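The effect of these f’f’^T correlations can be checked numerically with linear Fisher information, I = f’(s)^T \Sigma^{-1} f’(s). The Python/NumPy sketch below assumes a hypothetical population with von Mises tuning curves and independent Poisson-like noise (variance equal to the mean), and then adds differential correlations \epsilon f’f’^T. Without them, information grows with the number of neurons N; with them, it is bounded by 1/\epsilon, since the Sherman-Morrison formula gives I_\epsilon = I_0/(1 + \epsilon I_0). The tuning parameters and \epsilon are illustrative only.

```python
import numpy as np


def linear_fisher_info(f_prime, cov):
    """Linear Fisher information f'(s)^T Sigma^{-1} f'(s)."""
    return float(f_prime @ np.linalg.solve(cov, f_prime))


def population(n, s=0.3, kappa=2.0, amp=10.0):
    """Mean responses f(s) and derivatives f'(s) of n von Mises-tuned neurons."""
    prefs = np.linspace(-np.pi, np.pi, n, endpoint=False)
    g = np.exp(kappa * (np.cos(s - prefs) - 1.0))
    return amp * g, -amp * kappa * np.sin(s - prefs) * g


eps = 0.01  # strength of differential correlations (hypothetical)
for n in (50, 200, 800):
    f, fp = population(n)
    cov0 = np.diag(f)                          # independent Poisson-like noise
    cov_eps = cov0 + eps * np.outer(fp, fp)    # add the information-limiting part
    i0 = linear_fisher_info(fp, cov0)
    i_eps = linear_fisher_info(fp, cov_eps)
    # Columns: N, info without / with differential correlations, Sherman-Morrison
    print(n, round(i0, 1), round(i_eps, 1), round(i0 / (1 + eps * i0), 1))
```

The second column grows roughly in proportion to N, while the third approaches 1/\epsilon = 100 and matches the fourth.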

The question is where these information-limiting correlations originate. To answer it, the authors construct a simple feedforward network of orientation-tuned neurons responding to Gabor patches (see figure on right). The simplicity of the setup makes it analytically tractable: the covariances can be approximated, allowing further analytical insight. In particular, the law of total covariance shows immediately that correlations decay with the difference in orientation preference, as observed in experiments.

The data processing inequality states that the response of neurons in V1 cannot carry more information about the visual input than the response of neurons in LGN. Here, as in many other references, this is made precise using Fisher information, although see the note below. It therefore stands to reason that information-limiting correlations are due to the limited information available in the sensory periphery.

The origin of information-limiting correlations is easy to track in this setup. An important point is that when the response properties, characterized by the spatial filters of the different neurons, are changed, the tuning curves and correlations change in tandem. In a number of previous studies (including some of our work, Josić, et al. 2009), these characteristics of the neural response were changed independently. Here and in previous work, the authors correctly argue that this is not realistic, as it can lead to violations of the data processing inequality.

Interestingly, the Fisher information of the population response of the V1 layer in the model is FI_V1 = FI_LGN cos^2(α), where α is the angle between I’(s), the derivative of the input with respect to the stimulus, and the subspace spanned by the filters of the individual V1 cells. Thus, if the subspace spanned by the filters contains I’(s), no information is lost in the V1 response. This approach can be used to show that sharpening tuning curves does not always increase Fisher information.
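A minimal NumPy sketch of this geometric result, under simplifying assumptions of our own: the input x (standing in for the LGN layer) has mean I(s) plus white Gaussian noise of variance \sigma^2 per dimension, and each V1 unit applies a linear filter (a row of W) with no additional noise. The input Fisher information is then |I’(s)|^2/\sigma^2, and filtering keeps the fraction cos^2(α). The dimensions, the random filters and I’(s) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, sigma = 50, 10, 0.5           # input dimension, number of filters, noise s.d.
I_prime = rng.normal(size=D)        # hypothetical dI/ds
W = rng.normal(size=(K, D))         # hypothetical V1 filters, one per row


def fisher_after_filters(W, I_prime, sigma):
    """Fisher information of r = W x when x ~ N(I(s), sigma^2 Id)."""
    dr = W @ I_prime                 # derivative of the mean response
    cov_r = sigma**2 * W @ W.T       # response covariance inherited from the input
    return float(dr @ np.linalg.solve(cov_r, dr))


fi_input = float(I_prime @ I_prime) / sigma**2

# cos^2(alpha): squared length of the projection of I'(s) onto the filter span
Q, _ = np.linalg.qr(W.T)            # orthonormal basis of the filter subspace
proj = Q @ (Q.T @ I_prime)
cos2_alpha = float(proj @ proj) / float(I_prime @ I_prime)

print(fisher_after_filters(W, I_prime, sigma), fi_input * cos2_alpha)

# If I'(s) lies in the span of the filters, no information is lost
I_in_span = W.T @ rng.normal(size=K)
print(fisher_after_filters(W, I_in_span, sigma),
      float(I_in_span @ I_in_span) / sigma**2)
```

Each line prints two equal numbers: the information after filtering and the input information scaled by cos^2(α), which equals 1 in the second case.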

Global fluctuations shared by all neurons can also affect the information carried by the population response. The authors show that these correlations do not by themselves limit the information in the response, although they typically decrease it. They therefore rule out common top-down modulation of activity as the main source of information-limiting correlations.

One of the most interesting results is the splitting of the covariance matrix of the neural response into an information-limiting part and a part that does not affect information. This split allows the authors to examine the size of information-limiting correlations. Perhaps surprisingly, in realistic settings information-limiting correlations are small, perhaps only 1% of the total correlations. This likely makes them difficult to measure, even though these are the correlations with the greatest impact on an optimal linear readout.

The feedforward setup and the focus on linear Fisher information are what make the analysis in the paper possible. However, they also mean that the results apply mainly to fine discrimination of stimuli that can be parametrized by a scalar. The larger issue is that, in most situations outside the laboratory, fine sensory discrimination may not be all that important. It is possible that the brain keeps as much information about the world as it can. However, I would argue that in most situations the processing of sensory information is a process of discarding irrelevant information and preserving only what matters. In many of those cases, maximizing Fisher information may not be all that important.

However, the authors do make a good point that many sensory areas operate in the saturated regime: neurometric and psychometric thresholds can be comparable. This would not be expected in the unsaturated regime, where a single neuron would contribute little to the performance of the population.

Note: The question of how Fisher information relates to other ways of quantifying encoding quality is not completely resolved; see, for instance, this recent article. This touches on ethological and evolutionary questions, as sensory systems have evolved to extract information that is important for an animal’s survival.


  1. Zohary, Ehud, Michael N. Shadlen, and William T. Newsome. “Correlated neuronal discharge rate and its implications for psychophysical performance.” Nature (1994): 140-143.
  2. Moreno-Bote, R., et al. “Information-limiting correlations.” Nature Neuroscience 17.10 (2014): 1410-1417.
  3. Josić, K., et al. “Stimulus-dependent correlations and population codes.” Neural Computation 21.10 (2009): 2774-2804.
  4. Beck, J. M., et al. “Not noisy, just wrong: The role of suboptimal inference in behavioral variability.” Neuron 74.1 (2012): 30-39.