Origin of information-limiting noise correlations by Kanitscheider, et al. (2015)

Overview of this paper.

Information arriving from the sensory periphery is represented in the activity of neural populations in cortex. The responses of the cells within these populations are correlated. Such correlations can impact the amount of information that can be recovered about a stimulus from the population response (Zohary, et al. 1994). However, the question of what type of correlations limits the information in a neural population, and where such correlations are likely to originate, has not been fully answered. In particular, correlations can (and do) arise from shared feedforward input, from recurrent connectivity, and from common, population-wide modulations. Is any one of these sources primarily responsible for limiting information?

The present paper builds on earlier work which argues that information-limiting correlations are primarily a reflection of peripheral sensory noise (Moreno-Bote, et al. 2014) and suboptimal computation (Beck et al. 2012). The following figure captures the idea of the first paper: the mean population activity changes as a function of the stimulus as f(s), tracing out a curve in the space of neural responses (axes correspond to the average activity of each neuron). The f'f'^T noise in the figure is due to correlations that prevent the averaging out of noise along the direction of f'(s). These are the correlations that prevent discrimination between the responses to two nearby stimuli, f(s1) and f(s2), since they induce noise that cannot be averaged out.
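The effect of f'f'^T correlations on the linear Fisher information is easy to demonstrate numerically. Below is a minimal sketch, not the paper's model: a toy population with independent noise plus a small f'f'^T component, where the value eps = 0.01 and the tuning-curve derivatives are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.01  # strength of the f'f'^T (information-limiting) component; illustrative value

def linear_fisher_info(n, eps):
    """Linear Fisher information J = f'^T Sigma^{-1} f' for a toy population of n neurons."""
    fprime = rng.normal(1.0, 0.3, size=n)                # tuning-curve derivatives f'(s)
    sigma = np.eye(n) + eps * np.outer(fprime, fprime)   # independent noise + f'f'^T noise
    return fprime @ np.linalg.solve(sigma, fprime)

for n in [10, 100, 1000]:
    print(n, linear_fisher_info(n, eps))
# With eps = 0 the information would grow linearly with n; with eps > 0 it saturates
# below 1/eps, however many neurons are added.
```

By the Sherman–Morrison formula the information here is exactly J0/(1 + eps·J0), where J0 is the information under independent noise, so it can never exceed 1/eps.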

The question is: where do these information-limiting correlations originate? To answer this question, the authors construct a simple feedforward network of orientation-tuned neurons responding to Gabor patches (see figure on right). The simplicity of the setup makes it analytically tractable: the covariances can be approximated, allowing for further analytical insight. In particular, the law of total covariance immediately shows that correlations decay with the difference in orientation preference, as observed in experiments.
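The mechanism behind this decay can be illustrated with a toy version of the feedforward setup (all filter parameters below are arbitrary choices, not the paper's): if two downstream neurons linearly filter the same noisy input, their output noise correlation equals the normalized overlap of their filters, which falls off as the preferred orientations move apart.

```python
import numpy as np

def gabor(theta, size=25, sigma=4.0, freq=0.15):
    """A Gabor filter at orientation theta (parameters are illustrative)."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    xr = x * np.cos(theta) + y * np.sin(theta)  # coordinate along the grating
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr)

def output_correlation(theta1, theta2):
    """With independent pixel noise, the output correlation of two linear filters
    equals the cosine similarity (normalized overlap) of the filters."""
    g1, g2 = gabor(theta1).ravel(), gabor(theta2).ravel()
    return g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2))

for dtheta in [0, np.pi / 8, np.pi / 4, np.pi / 2]:
    print(round(np.degrees(dtheta)), output_correlation(0.0, dtheta))
# The correlation is 1 at zero orientation difference and decays toward 0 at 90 degrees.
```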

The data processing inequality states that you cannot get more information about the visual input from the response of neurons in V1 than from neurons in the LGN – here, as in many other references, this is made precise using Fisher information, although see the note below. It therefore stands to reason that information-limiting correlations are due to the limited information available at the sensory periphery.

Importantly, the origin of information-limiting correlations is easy to track in this setup. A key point is that as the response properties, as characterized by the spatial filters of the different neurons, are changed, the tuning curves and correlations change in tandem. In a number of previous studies (including some of our own work, Josić, et al. 2009), these characteristics of the neural response were varied independently. Here, and in previous work, the authors correctly argue that this is not realistic, as it can lead to violations of the data processing inequality.

Interestingly, the Fisher information of the population response of the V1 layer in the model is FI_V1 = FI_LGN cos^2(α), where α is the angle between I'(s) and the subspace spanned by the filters of the individual cells in V1. Thus, if the subspace spanned by the filters contains I'(s), no information is lost in the response of V1. This approach can be used to show that sharpening tuning curves does not always increase Fisher information.
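This projection formula can be checked numerically in a linear-Gaussian sketch (the sizes and random filters below are arbitrary; I'm assuming isotropic Gaussian noise in the LGN layer, which is what makes the identity exact):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 1.0
m, n = 50, 5                       # 50 LGN inputs, 5 V1 filters (toy sizes)
W = rng.normal(size=(n, m))        # V1 spatial filters (rows)
dI = rng.normal(size=m)            # I'(s): derivative of the input w.r.t. the stimulus

fi_lgn = dI @ dI / sigma2          # Fisher information available in the LGN layer

# Linear Fisher information of the V1 layer r_V1 = W r_LGN:
mu_prime = W @ dI
cov = sigma2 * (W @ W.T)
fi_v1 = mu_prime @ np.linalg.solve(cov, mu_prime)

# Angle between I'(s) and the subspace spanned by the filters:
P = W.T @ np.linalg.solve(W @ W.T, W)     # orthogonal projector onto the row space of W
cos2_alpha = (dI @ P @ dI) / (dI @ dI)

print(fi_v1, fi_lgn * cos2_alpha)         # the two agree
```

In particular, no information is lost exactly when P I'(s) = I'(s), i.e. when I'(s) lies in the filter subspace.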

Global fluctuations shared between all neurons can also affect the information carried by the population response. Interestingly, the authors show that these correlations do not by themselves limit the information in the response, although they do typically decrease it. They thus rule out common top-down activity modulation as the main source of information-limiting correlations.
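This distinction can also be checked in the toy setting above (again a hedged sketch, not the paper's calculation): adding a common, additive noise source to every neuron reduces the linear Fisher information but does not make it saturate with population size.

```python
import numpy as np

rng = np.random.default_rng(3)
c = 0.1  # strength of the shared, additive fluctuations; illustrative value

def fi_with_common_noise(n):
    """Linear Fisher information when all n neurons share one additive noise source."""
    fprime = rng.normal(1.0, 0.3, size=n)     # tuning-curve derivatives f'(s)
    sigma = np.eye(n) + c * np.ones((n, n))   # independent noise + common noise c * 11^T
    return fprime @ np.linalg.solve(sigma, fprime)

for n in [100, 500, 2000]:
    print(n, fi_with_common_noise(n))
# The information is reduced relative to the independent-noise case, but it keeps
# growing with n: common additive fluctuations decrease, but do not limit, information.
```

The contrast with the f'f'^T case is that the shared-noise direction (the all-ones vector) is generically not aligned with f'(s), so the noise it adds can still be averaged away along the informative direction.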

One of the most interesting results is the splitting of the covariance matrix of the neural response into an information-limiting part and a part that does not affect information. This allows them to examine the size of the information-limiting correlations. Perhaps surprisingly, in realistic settings information-limiting correlations are quite small – perhaps only 1% of the total correlations. This likely makes them difficult to measure, despite the fact that these are the correlations with the largest impact on an optimal linear readout.
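How a small information-limiting component can nonetheless dominate the readout can be sketched as follows (this is a toy decomposition, not the paper's: the limited-range correlation structure and the value of eps are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
fprime = rng.normal(1.0, 0.3, size=n)   # tuning-curve derivatives f'(s)

# Correlations that do not limit information: a limited-range structure, under which
# the linear Fisher information still grows with population size.
idx = np.arange(n)
sigma_nonlim = np.eye(n) + 0.2 * np.exp(-np.abs(idx[:, None] - idx[None, :]) / 10.0)

eps = 0.005  # small information-limiting component (illustrative)
sigma_total = sigma_nonlim + eps * np.outer(fprime, fprime)

fi = lambda S: fprime @ np.linalg.solve(S, fprime)

# The eps-part is a small fraction of a typical correlation, yet removing it
# substantially increases the linear Fisher information:
print(fi(sigma_total), fi(sigma_nonlim))
```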

The feedforward setup and the focus on linear Fisher information are what make the analysis in the paper possible. However, this also means that the results mainly apply to fine discrimination of stimuli that can be parametrized by a scalar. The larger issue is that, in most situations outside the laboratory, fine sensory discrimination may not be all that important. It is possible that the brain keeps as much information as possible about the world. I would argue, however, that the processing of sensory information is, in most situations, a process of discarding irrelevant information and preserving only what matters. In many of those cases, maximizing Fisher information may not matter much.

However, the authors do make a good point that many sensory areas do operate in the saturated regime: The neurometric and psychometric thresholds can be comparable. This would not be expected in the unsaturated regime, where a single neuron would not contribute much to the population.

Note: The question of how Fisher information is related to other ways of quantifying encoding quality is not completely resolved – see, for instance, this recent article. This touches on ethological and evolutionary questions, as sensory systems have evolved to extract information that is important for an animal's survival.


  1. Zohary, E., Shadlen, M.N., and Newsome, W.T. "Correlated neuronal discharge rate and its implications for psychophysical performance." Nature (1994): 140–143.
  2. Moreno-Bote, R., et al. "Information-limiting correlations." Nature Neuroscience 17.10 (2014): 1410–1417.
  3. Josić, K., et al. "Stimulus-dependent correlations and population codes." Neural Computation 21.10 (2009): 2774–2804.
  4. Beck, J.M., Ma, W.J., Pitkow, X., Latham, P.E., and Pouget, A. "Not noisy, just wrong: the role of suboptimal inference in behavioral variability." Neuron 74.1 (2012): 30–39.


The impact of synchrony and adaptation on signal detection

The role of synchrony in coding has long been debated. In particular, it is not clear whether information can be conveyed through the tightly coordinated spiking of groups of cells. I just caught up with this paper by Wang, et al. on how adaptation can modulate thalamic synchrony to increase the discriminability of signals. They stimulated the whiskers of anesthetized rats and recorded responses both in the thalamus and in the part of cortex to which these neurons project. They noticed that these cells strongly adapt to stimulation. After adaptation it became more difficult to detect a stimulus, but easier to discriminate between different stimuli. In other words, the range of responses (as measured by the total activity, i.e. the number of spikes in the cortical region) became more discernible after adaptation. Surprisingly, the activity in the thalamus did not change in the same way after adaptation. However, the level of synchrony in the response of the thalamic cells displayed a higher diversity after adaptation, and this translated into larger discriminability downstream.

Randy Bruno has a nice review of the role of synchrony, which gives an overview of the results of this paper.

Convergence Speed in Distributed Averaging (Manisha Bhardwaj)

Reaching consensus, and understanding the speed at which it is reached, is one of the best-studied problems in social network models and agent-based systems. Olshevsky and Tsitsiklis, in a 2011 SIAM paper, describe many of the computational algorithms dealing with agreement and averaging on a communication network. The paper focuses on analyzing existing algorithms in this area and on designing new, efficient algorithms in which consensus or agreement leads to averaging algorithms with polynomial bounds on their convergence rates.

Consider a group of agents in a communication network, each holding a real-valued initial opinion. Each agent influences the opinions in its neighborhood and hence, indirectly, every other agent in the network. The time-evolving opinions of all agents converge to a common value (the average of the initial opinions, in the case of the averaging problem) provided each agent assigns appropriate weights to the information from its neighbors (i.e. the weights are the entries of a stochastic matrix A) and the dynamically evolving network is strongly connected. For a time-invariant communication network, the convergence rate of this process is determined solely by the powers of the matrix A; having the system governed by an aperiodic and irreducible Markov chain is enough to guarantee convergence of such consensus algorithms. To interleave the agreement and averaging problems on an equal-neighbor, bidirectional graph, the authors propose running the agreement algorithm in parallel on two different initial opinions for every agent: one with the initial opinion scaled by the cardinality of the agent's local neighborhood, and one depending only on that cardinality. The worst-case convergence time of this algorithm was shown to be O(n^3).
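As a concrete, simplified illustration of the averaging dynamics described above, here is the standard linear consensus iteration x ← Ax on a small fixed path graph, using Metropolis weights so that A is doubly stochastic and the opinions converge to their average (a generic sketch, not the specific algorithm from the paper):

```python
import numpy as np

# Averaging on a fixed path of 5 agents. Metropolis weights 1/(1 + max(deg_i, deg_j))
# keep A symmetric and doubly stochastic, so x converges to the average opinion.
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1 / 3   # path degrees are 1 or 2, so every weight is 1/3
A += np.diag(1 - A.sum(axis=1))          # self-weights make each row sum to 1

x = np.array([1.0, 0.0, 0.0, 0.0, 0.0])  # initial opinions
for _ in range(500):
    x = A @ x
print(x)  # every entry approaches the average, 0.2
```

For a time-invariant network like this one, the convergence rate is governed by the second-largest eigenvalue modulus of A, consistent with the "powers of A" statement above.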

However, in the case of a dynamically evolving topology, the agreement or consensus algorithm is not polynomially bounded, as proved by Cao, Morse and Anderson. Olshevsky and Tsitsiklis (2006) provided a remedy by proposing a "load-balancing algorithm" in which agents share their initial load (or opinion) with their neighbors and try to equalize their respective loads. This algorithm possesses a polynomial bound on its convergence rate, leading to favorable performance.

Synergy and redundancy

There has been a bit of discussion about how to define the synergy and redundancy in the information carried by a set of random variables X1, X2, …, Xn about a random variable Y. Now V. Griffith and C. Koch have entered the debate with another measure, which has several attractive features. The surprising result is that a set of variables can be both synergistic and redundant. This seems counterintuitive, but it makes sense: part of the information about Y could be carried synergistically, and another part redundantly, by the set {Xi}. The definition is fairly intuitive: define a variable Y* which has minimal entropy among all variables Z dependent on Y such that I(Xi : Y) = I(Xi : Z) for all i. Synergy is then the difference between the mutual informations I(X1, X2, …, Xn : Y) and I(X1, X2, …, Xn : Y*). Redundancy can be defined in a related way. Unfortunately, the fact that we need to minimize an entropy means that analytical expressions for the synergy may be difficult to obtain.
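The flavor of synergy can be seen in the classic XOR example, where each variable alone is uninformative but the pair is fully informative. (This illustrates synergy in the intuitive sense only; it is not a computation of the Griffith–Koch measure itself, which requires the entropy minimization over Y*.)

```python
import numpy as np
from itertools import product

def mutual_info(joint):
    """Mutual information I(X;Y) in bits from a joint probability table p[x, y]."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

# XOR: X1, X2 are independent fair bits, and Y = X1 xor X2.
p_joint = np.zeros((4, 2))            # rows index (x1, x2), columns index y
for x1, x2 in product([0, 1], repeat=2):
    p_joint[2 * x1 + x2, x1 ^ x2] = 0.25

p_x1_y = p_joint.reshape(2, 2, 2).sum(axis=1)   # marginalize out x2

print(mutual_info(p_x1_y))   # 0.0 bits: X1 alone says nothing about Y
print(mutual_info(p_joint))  # 1.0 bit:  X1 and X2 together determine Y
```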