Biologically plausible single-layer networks for nonnegative independent component analysis

  • Original Article, Biological Cybernetics

Abstract

An important problem in neuroscience is to understand how brains extract relevant signals from mixtures of unknown sources, i.e., perform blind source separation. To model how the brain performs this task, we seek a biologically plausible single-layer neural network implementation of a blind source separation algorithm. For biological plausibility, we require the network to satisfy the following three basic properties of neuronal circuits: (i) the network operates in the online setting; (ii) synaptic learning rules are local; and (iii) neuronal outputs are nonnegative. The closest prior work is that of Pehlevan et al. (Neural Comput 29:2925–2954, 2017), which considers nonnegative independent component analysis (NICA), a special case of blind source separation that assumes the mixture is a linear combination of uncorrelated, nonnegative sources. They derive an algorithm with a biologically plausible two-layer network implementation. In this work, we improve upon their result by deriving two algorithms for NICA, each with a biologically plausible single-layer network implementation. The first algorithm maps onto a network with indirect lateral connections mediated by interneurons. The second algorithm maps onto a network with direct lateral connections and multi-compartmental output neurons.


References

  • Bahroun Y, Chklovskii D, Sengupta A (2021) A normative and biologically plausible algorithm for independent component analysis. Adv Neural Inf Process Syst 34:7368–7384

  • Bee MA, Micheyl C (2008) The cocktail party problem: what is it? How can it be solved? And why should animal behaviorists study it? J Comp Psychol 122(3):235

  • Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge

  • Bronkhorst AW (2015) The cocktail-party problem revisited: early processing and selection of multi-talker speech. Atten Percept Psychophys 77(5):1465–1487

  • Cherry EC (1953) Some experiments on the recognition of speech, with one and with two ears. J Acoust Soc Am 25(5):975–979

  • Desimone R, Duncan J (1995) Neural mechanisms of selective visual attention. Annu Rev Neurosci 18(1):193–222

  • Donoho D, Stodden V (2003) When does non-negative matrix factorization give a correct decomposition into parts? Adv Neural Inf Process Syst 16

  • Erdogan AT, Pehlevan C (2020) Blind bounded source separation using neural networks with local learning rules. In: ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp 3812–3816

  • Friedrich RW, Laurent G (2001) Dynamic optimization of odor representations by slow temporal patterning of mitral cell activity. Science 291(5505):889–894

  • Gschwend O, Abraham NM, Lagier S, Begnaud F, Rodriguez I, Carleton A (2015) Neuronal pattern separation in the olfactory bulb improves odor discrimination learning. Nat Neurosci 18(10):1474–1482

  • Huang K, Sidiropoulos ND, Swami A (2013) Non-negative matrix factorization revisited: uniqueness and algorithm for symmetric decomposition. IEEE Trans Signal Process 62(1):211–224

  • Hulse SH, MacDougall-Shackleton SA, Wisniewski AB (1997) Auditory scene analysis by songbirds: stream segregation of birdsong by European starlings (Sturnus vulgaris). J Comp Psychol 111(1):3

  • Hyvärinen A, Hoyer P (2000) Emergence of phase- and shift-invariant features by decomposition of natural images into independent feature subspaces. Neural Comput 12(7):1705–1720

  • Hyvärinen A, Oja E (2000) Independent component analysis: algorithms and applications. Neural Netw 13(4–5):411–430

  • Laughlin SB, Sejnowski TJ (2003) Communication in neuronal networks. Science 301(5641):1870–1874

  • Laurberg H, Christensen MG, Plumbley MD, Hansen LK, Jensen SH (2008) Theorems on positive data: on the uniqueness of NMF. Comput Intell Neurosci 2008

  • Lipshutz D, Windolf C, Golkar S, Chklovskii DB (2020) A biologically plausible neural network for slow feature analysis. Adv Neural Inf Process Syst 33:14986–14996

  • Lipshutz D, Bahroun Y, Golkar S, Sengupta AM, Chklovskii DB (2021) A biologically plausible neural network for multichannel canonical correlation analysis. Neural Comput 33(9):2309–2352

  • McDermott JH (2009) The cocktail party problem. Curr Biol 19(22):R1024–R1027

  • Narayan R, Best V, Ozmeral E, McClaine E, Dent M, Shinn-Cunningham B, Sen K (2007) Cortical interference effects in the cocktail party problem. Nat Neurosci 10(12):1601–1607

  • Oja E, Plumbley M (2004) Blind separation of positive sources by globally convergent gradient search. Neural Comput 16(9):1811–1825

  • Pehlevan C, Chklovskii DB (2019) Neuroscience-inspired online unsupervised learning algorithms: artificial neural networks. IEEE Signal Process Mag 36(6):88–96

  • Pehlevan C, Mohan S, Chklovskii DB (2017) Blind nonnegative source separation using biological neural networks. Neural Comput 29(11):2925–2954

  • Plumbley M (2002) Conditions for nonnegative independent component analysis. IEEE Signal Process Lett 9(6):177–180

  • Plumbley MD (2003) Algorithms for nonnegative independent component analysis. IEEE Trans Neural Netw 14(3):534–543

  • Plumbley MD, Oja E (2004) A "nonnegative PCA" algorithm for independent component analysis. IEEE Trans Neural Netw 15(1):66–76

  • Rivera-Alba M, Peng H, de Polavieja GG, Chklovskii DB (2014) Wiring economy can account for cell body placement across species and brain areas. Curr Biol 24(3):R109–R110

  • Shinn-Cunningham BG (2008) Object-based auditory and visual attention. Trends Cogn Sci 12(5):182

  • Wilson RI, Mainen ZF (2006) Early events in olfactory processing. Annu Rev Neurosci 29:163–201

  • Yuan Z, Oja E (2004) A FastICA algorithm for non-negative independent component analysis. In: International Conference on Independent Component Analysis and Signal Separation, Springer, pp 1–8


Acknowledgements

We are grateful to Lucy Reading-Ikkanda for creating Figs. 2 and 3. We thank Siavash Golkar, Johannes Friedrich, Tiberiu Tesileanu, Alex Genkin, Jason Moore and Yanis Bahroun for helpful comments and feedback on an earlier draft of this work. We especially thank Siavash Golkar for pointing out that, in Sect. 4.2, \(\mathbf{W}_{XZ}(\mathbf{x}_t-\overline{\mathbf{x}}_t)(\mathbf{x}_t-\overline{\mathbf{x}}_t)^\top \) is not equal to \((\mathbf{c}_t-\overline{\mathbf{c}}_t)(\mathbf{x}_t-\overline{\mathbf{x}}_t)^\top \) due to the (suppressed) time-dependency of the weights \(\mathbf{W}_{XZ}\). Finally, we thank the two referees for their careful reading of our paper and for their helpful comments.

Author information


Corresponding author

Correspondence to David Lipshutz.

Additional information

Communicated by Benjamin Lindner.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A Saddle point property

Here we recall the following minmax property for a function that satisfies the saddle point property (Boyd and Vandenberghe 2004, section 5.4).

Theorem 1

Let \(V\subseteq \mathbb {R}^n\), \(W\subseteq \mathbb {R}^m\) and \(f:V\times W\rightarrow \mathbb {R}\). Suppose f satisfies the saddle point property; that is, there exists \((\mathbf{a}^*,\mathbf{b}^*)\in V\times W\) such that

$$\begin{aligned}&f(\mathbf{a}^*,\mathbf{b})\le f(\mathbf{a}^*,\mathbf{b}^*)\le f(\mathbf{a},\mathbf{b}^*),\\&\quad \text {for all }(\mathbf{a},\mathbf{b})\in V\times W. \end{aligned}$$

Then

$$\begin{aligned} \min _{\mathbf{a}\in V}\max _{\mathbf{b}\in W} f(\mathbf{a},\mathbf{b})=\max _{\mathbf{b}\in W}\min _{\mathbf{a}\in V} f(\mathbf{a},\mathbf{b})=f(\mathbf{a}^*,\mathbf{b}^*). \end{aligned}$$
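As a concrete illustration (our example, not one taken from the text), let \(V=W=\mathbb {R}\) and \(f(a,b)=a^2-b^2\). Then \((a^*,b^*)=(0,0)\) is a saddle point, since

$$\begin{aligned} f(0,b)=-b^2\le 0=f(0,0)\le a^2=f(a,0), \end{aligned}$$

and indeed \(\min _{a}\max _{b}f(a,b)=\min _{a}a^2=0=\max _{b}(-b^2)=\max _{b}\min _{a}f(a,b)\).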

B Optimization over neural activity matrices \((\mathbf{Y},\mathbf{N})\) in the derivation of Algorithm 1

In this section, we show that when \(\mathbf{W}_{XY}\) and \(\mathbf{W}_{YN}\) are at their optimal values, the optimal neural activity matrices \((\mathbf{Y},\mathbf{N})\) can be approximated via projected gradient descent. We first compute the optimal values of \(\mathbf{W}_{XY}\) and \(\mathbf{W}_{YN}\).

Lemma 1

Suppose \((\mathbf{W}_{XY}^*,\mathbf{W}_{YN}^*,\mathbf{Y}^*,\mathbf{N}^*)\) is an optimal solution of objective (5). Then

$$\begin{aligned} \mathbf{W}_{XY}^*=\mathbf{P}\mathbf{A}^\top ,&\mathbf{W}_{YN}^{*,\top }\mathbf{W}_{YN}^*=\mathbf{P}\mathbf{A}^\top \mathbf{A}\mathbf{P}^\top , \end{aligned}$$

for some permutation matrix \(\mathbf{P}\).

Proof

From (Pehlevan et al. 2017, Theorem 3), we know that every solution of the objective

$$\begin{aligned}&\underset{\mathbf{Y}\in \mathbb {R}^{d\times T}}{{\text {arg}}\,{\text {min}}}\;-{{\,\mathrm{Tr}\,}}(\delta \mathbf{Y}^\top \delta \mathbf{Y}\delta \mathbf{X}^\top \delta \mathbf{X})\nonumber \\&\quad \quad \text {subject to}\quad \delta \mathbf{Y}^\top \delta \mathbf{Y}\preceq T\mathbf{I}_T\quad \text {and}\quad \mathbf{Y}=\mathbf{F}\mathbf{X}, \end{aligned}$$
(16)

is of the form \(\mathbf{Y}=\mathbf{F}\mathbf{X}\), where \(\mathbf{F}\) is a whitening matrix. In particular, since \(\mathbf{Y}=\mathbf{F}\mathbf{A}\mathbf{S}\) and \(\mathbf{S}\) also has an identity covariance matrix, \(\mathbf{Y}\) is an orthogonal transformation of \(\mathbf{S}\). Furthermore, since \(\mathbf{S}\) is well grounded, by (Plumbley 2002, Theorem 1), \(\mathbf{Y}\) is nonnegative if and only if \(\mathbf{F}\mathbf{A}\) is a permutation matrix. Therefore, every solution \(\mathbf{Y}^*\) of the objective

$$\begin{aligned}&\underset{\mathbf{Y}\in \mathbb {R}_+^{d\times T}}{{\text {arg}}\,{\text {min}}}\;-{{\,\mathrm{Tr}\,}}(\delta \mathbf{Y}^\top \delta \mathbf{Y}\delta \mathbf{X}^\top \delta \mathbf{X})\nonumber \\&\quad \text {subject to}\quad \delta \mathbf{Y}^\top \delta \mathbf{Y}\preceq T\mathbf{I}_T\quad \text {and}\quad \mathbf{Y}=\mathbf{F}\mathbf{X}, \end{aligned}$$
(17)

is of the form \(\mathbf{Y}^*=\mathbf{P}\mathbf{X}\) for some permutation matrix \(\mathbf{P}\). In addition, differentiating the expression

$$\begin{aligned} -{{\,\mathrm{Tr}\,}}(\delta \mathbf{Y}^\top \delta \mathbf{Y}\delta \mathbf{X}^\top \delta \mathbf{X})+{{\,\mathrm{Tr}\,}}(\delta \mathbf{N}^\top \delta \mathbf{N}(\delta \mathbf{Y}^\top \delta \mathbf{Y}-T\mathbf{I}_T)), \end{aligned}$$
(18)

with respect to \(\delta \mathbf{Y}\) and setting the derivative equal to zero, we see that at the optimal value, \(\delta \mathbf{N}^{*,\top }\delta \mathbf{N}^*=\delta \mathbf{X}^\top \delta \mathbf{X}=\delta \mathbf{S}^\top \mathbf{A}^\top \mathbf{A}\delta \mathbf{S}\).

Differentiating \(L_1\) with respect to \(\mathbf{W}_{XY}\) and \(\mathbf{W}_{YN}\), we see that the optimal values for the synaptic weight matrices are achieved at \(\mathbf{W}_{XY}=\frac{1}{T}\delta \mathbf{Y}\delta \mathbf{X}^\top \) and \(\mathbf{W}_{YN}=\frac{1}{T}\delta \mathbf{N}\delta \mathbf{Y}^\top \). Thus,

$$\begin{aligned} \mathbf{W}_{XY}^*=\frac{1}{T}\delta \mathbf{Y}^*\delta \mathbf{X}^\top =\frac{1}{T}\mathbf{P}\delta \mathbf{S}\delta \mathbf{S}^\top \mathbf{A}^\top =\mathbf{P}\mathbf{A}^\top , \end{aligned}$$

and

$$\begin{aligned} \mathbf{W}_{YN}^{*,\top }\mathbf{W}_{YN}^*&=\frac{1}{T^2}\delta \mathbf{Y}^*\delta \mathbf{N}^{*,\top }\delta \mathbf{N}^*\delta \mathbf{Y}^{*,\top }\\&=\frac{1}{T^2}\mathbf{P}\delta \mathbf{S}\delta \mathbf{S}^\top \mathbf{A}^\top \mathbf{A}\delta \mathbf{S}\delta \mathbf{S}^\top \mathbf{P}^\top =\mathbf{P}\mathbf{A}^\top \mathbf{A}\mathbf{P}^\top . \end{aligned}$$

\(\square \)
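As a numerical sanity check of Lemma 1 (our own sketch; we whiten the sources exactly so that \(\delta \mathbf{S}\delta \mathbf{S}^\top =T\mathbf{I}_d\), the covariance assumption the proof relies on), the identity \(\mathbf{W}_{XY}^*=\mathbf{P}\mathbf{A}^\top \) can be verified directly:

```python
import numpy as np

rng = np.random.default_rng(1)
d, T = 3, 1000

# Centered sources with exactly identity sample covariance.
S = rng.random((d, T))
dS = S - S.mean(axis=1, keepdims=True)
dS = np.linalg.inv(np.linalg.cholesky(dS @ dS.T / T)) @ dS

A = rng.normal(size=(d, d))         # mixing matrix
P = np.eye(d)[rng.permutation(d)]   # random permutation matrix

dX = A @ dS                         # centered mixtures
dY = P @ dS                         # optimal centered outputs, per Lemma 1
W_XY = (dY @ dX.T) / T              # optimal feedforward weights
assert np.allclose(W_XY, P @ A.T)   # W_XY* = P A^T
```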

Next, we show that when \(\mathbf{W}_{XY}\) and \(\mathbf{W}_{YN}\) are at their optimal values, the optimal \((\mathbf{Y}^*,\mathbf{N}^*)\) can be approximated by running the projected gradient dynamics in Eq. (6).

Lemma 2

Suppose \(\mathbf{W}_{XY}=\mathbf{P}\mathbf{A}^\top \) and \(\mathbf{W}_{YN}^\top \mathbf{W}_{YN}=\mathbf{P}\mathbf{A}^\top \mathbf{A}\mathbf{P}^\top \) for some permutation matrix \(\mathbf{P}\). Then

$$\begin{aligned} \mathbf{Y}^*=(\mathbf{W}_{YN}^\top \mathbf{W}_{YN})^{-1}\mathbf{W}_{XY}\mathbf{X}=\mathbf{P}\mathbf{S},&\mathbf{N}^*=\mathbf{W}_{YN}\mathbf{Y}^* \end{aligned}$$
(19)

is a solution of the minmax problem

$$\begin{aligned}&\min _{\mathbf{Y}\in \mathbb {R}_+^{d\times T}}\max _{\mathbf{N}\in \mathbb {R}^{m \times T}}\frac{2}{T}{{\,\mathrm{Tr}\,}}\Big (\delta \mathbf{N}^\top \mathbf{W}_{YN}\delta \mathbf{Y}\nonumber \\&\quad -\delta \mathbf{Y}^\top \mathbf{W}_{XY}\delta \mathbf{X}-\delta \mathbf{N}^\top \delta \mathbf{N}\Big )\nonumber \\&\quad \text {s.t.}\quad \mathbf{Y}=\mathbf{F}\mathbf{X}. \end{aligned}$$
(20)

In particular, \((\mathbf{Y}^*,\mathbf{N}^*)\) is the unique solution of the minmax problem

$$\begin{aligned} \min _{\mathbf{Y}\in \mathbb {R}_+^{d\times T}}\max _{\mathbf{N}\in \mathbb {R}^{m \times T}}\frac{2}{T}{{\,\mathrm{Tr}\,}}\left( \mathbf{N}^\top \mathbf{W}_{YN}\mathbf{Y}-\mathbf{Y}^\top \mathbf{W}_{XY}\mathbf{X}-\mathbf{N}^\top \mathbf{N}\right) , \end{aligned}$$
(21)

which can be approximated by running the projected gradient dynamics in Eq. (6).

Proof

We first relax the condition that \(\mathbf{Y}\) be a nonnegative linear transformation of \(\mathbf{X}\) and consider the minmax problem

$$\begin{aligned}&\min _{\mathbf{Y}\in \mathbb {R}^{d\times T}}\max _{\mathbf{N}\in \mathbb {R}^{m \times T}}\frac{2}{T}{{\,\mathrm{Tr}\,}}\Big (\delta \mathbf{N}^\top \mathbf{W}_{YN}\delta \mathbf{Y}\\&\quad -\delta \mathbf{Y}^\top \mathbf{W}_{XY}\delta \mathbf{X}-\delta \mathbf{N}^\top \delta \mathbf{N}\Big ). \end{aligned}$$

After differentiating with respect to \(\delta \mathbf{Y}\) and \(\delta \mathbf{N}\), we see that this objective is optimized when the centered matrices \(\delta \mathbf{Y}\) and \(\delta \mathbf{N}\) are given by

$$\begin{aligned} \delta \mathbf{Y}=(\mathbf{W}_{YN}^\top \mathbf{W}_{YN})^{-1}\mathbf{W}_{XY}\delta \mathbf{X},&\delta \mathbf{N}=\mathbf{W}_{YN}\delta \mathbf{Y}. \end{aligned}$$

Next, we see that the above relations for the centered matrices hold when \(\mathbf{Y}\) and \(\mathbf{N}\) are given by Eq. (19), where we have used the fact that \(\mathbf{W}_{XY}=\mathbf{P}\mathbf{A}^\top \) and \(\mathbf{W}_{YN}^\top \mathbf{W}_{YN}=\mathbf{P}\mathbf{A}^\top \mathbf{A}\mathbf{P}^\top \). Note that \(\mathbf{Y}\) is a linear transformation of \(\mathbf{X}\) and \(\mathbf{Y}\) is nonnegative since it is a permutation of the nonnegative sources. It follows that \((\mathbf{Y},\mathbf{N})\) is also a solution to the constrained minmax problem (20). Finally, differentiating the objective in Eq. (21) with respect to \(\mathbf{Y}\) and \(\mathbf{N}\), we see that the optimal \(\mathbf{Y}\) and \(\mathbf{N}\) are again given by Eq. (19). \(\square \)
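For intuition, here is a minimal sketch of such projected gradient dynamics for a single sample (our reading, not a verbatim transcription of Eq. (6) in the main text: the output \(\mathbf{y}\) takes rectified gradient steps and the interneuron activity \(\mathbf{n}\) relaxes toward \(\mathbf{W}_{YN}\mathbf{y}\), so that the fixed point matches Eq. (19); step size and iteration count are illustrative):

```python
import numpy as np

def projected_dynamics(x, W_XY, W_YN, n_steps=500, gamma=0.05):
    """Sketch of projected gradient dynamics for one input sample x.

    At a fixed point with y > 0 we have W_XY x = W_YN^T n and n = W_YN y,
    so y = (W_YN^T W_YN)^{-1} W_XY x, matching Eq. (19) column by column.
    """
    y = np.zeros(W_XY.shape[0])
    n = np.zeros(W_YN.shape[0])
    for _ in range(n_steps):
        # Gradient step in y, then rectification onto the nonnegative orthant.
        y = np.maximum(y + gamma * (W_XY @ x - W_YN.T @ n), 0.0)
        # Interneurons relax toward their optimal value W_YN y.
        n = n + gamma * (W_YN @ y - n)
    return y, n
```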

C Decoupling the interneuron synapses

The NICA algorithm derived in Sect. 4.1 requires the interneuron-to-output neuron synaptic weight matrix \(\mathbf{W}_{NY}\) to be the transpose of the output neuron-to-interneuron synaptic weight matrix \(\mathbf{W}_{YN}\). Enforcing this symmetry via a centralized mechanism is not biologically plausible and is commonly referred to as the weight transport problem.

Here, we show that the symmetry of the two weight matrices asymptotically follows from the learning rules in Algorithm 1, even when the symmetry does not hold at initialization. Let \(\mathbf{W}_{NY,0}\) and \(\mathbf{W}_{YN,0}\) denote the initial values of \(\mathbf{W}_{NY}\) and \(\mathbf{W}_{YN}\). Then, in view of the update rules in Algorithm 1, the difference \(\mathbf{W}_{NY}-\mathbf{W}_{YN}^\top \) after t updates is given by

$$\begin{aligned} \mathbf{W}_{NY}-\mathbf{W}_{YN}^\top = \left( 1-\eta \right) ^t (\mathbf{W}_{NY,0}-\mathbf{W}_{YN,0}^\top ). \end{aligned}$$

In particular, the difference decays exponentially.
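This decay is easy to check numerically. In the sketch below (ours; we assume both matrices receive the same Hebbian outer product with decay rate \(\eta \), which is the structure the identity above relies on), the Hebbian terms cancel in the difference, which contracts by exactly \(1-\eta \) per step:

```python
import numpy as np

rng = np.random.default_rng(0)
eta, d, m, T = 0.05, 3, 5, 200

# Asymmetric initialization of the two interneuron weight matrices.
W_NY = rng.normal(size=(d, m))   # interneuron-to-output weights
W_YN = rng.normal(size=(m, d))   # output-to-interneuron weights
init_diff = np.linalg.norm(W_NY - W_YN.T)

for t in range(T):
    y, n = rng.random(d), rng.random(m)   # stand-in neural activities
    # Both synapses see the same Hebbian product, each with decay rate eta.
    W_YN += eta * (np.outer(n, y) - W_YN)
    W_NY += eta * (np.outer(y, n) - W_NY)
    # The Hebbian terms cancel in the difference, which shrinks geometrically.
    diff = np.linalg.norm(W_NY - W_YN.T)
    assert np.isclose(diff, (1 - eta) ** (t + 1) * init_diff)
```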

Table 1 Optimal hyperparameters used for Algorithm 1, Algorithm 2, 2-layer NSM, and Nonnegative PCA

D Details of numerical experiments

The simulations were performed on an Apple iMac with a 2.8 GHz Quad-Core Intel Core i7 processor. For each of the algorithms that we implement, we use a time-dependent learning rate of the form:

$$\begin{aligned} \eta _t=\frac{\eta _0}{1+\gamma t}. \end{aligned}$$
(22)

To choose the parameters, we perform a grid search over \(\eta _0\in \{10^{-1},10^{-2},10^{-3},10^{-4}\}\) and over \(\gamma \in \{10^{-2},10^{-3},10^{-4},10^{-5},10^{-6},10^{-7}\}\). In Table 1 we report the best-performing hyperparameters we found for each algorithm; a minimal sketch of the schedule and search loop appears below. We then detail our implementation of each algorithm.
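The following sketch shows the learning rate schedule of Eq. (22) and the grid search loop in Python. The function run_algorithm is a hypothetical placeholder for training one of the four algorithms with a given schedule and returning its final error; the grids match the ranges stated above.

```python
import itertools
import numpy as np

def learning_rate(t, eta0, gamma):
    """Time-dependent learning rate of Eq. (22): eta_t = eta0 / (1 + gamma * t)."""
    return eta0 / (1.0 + gamma * t)

def run_algorithm(eta0, gamma):
    """Hypothetical stand-in: train one algorithm with the schedule
    (eta0, gamma) and return its final error. Replace with a real run."""
    rng = np.random.default_rng(abs(hash((eta0, gamma))) % 2**32)
    return rng.random()

eta0_grid = [1e-1, 1e-2, 1e-3, 1e-4]
gamma_grid = [1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7]

# Exhaustive search; the best pair per algorithm is what Table 1 reports.
best_eta0, best_gamma = min(
    itertools.product(eta0_grid, gamma_grid),
    key=lambda pair: run_algorithm(*pair),
)
```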

  1. Bio-NICA with interneurons (Algorithm 1): The neural outputs were computed with the quadratic program solver solve_qp from the Python package quadprog (see the sketch following this list). After each iteration, we checked whether any output neuron had been inactive up to that iteration; if so, we flipped the sign of its feedforward inputs. In addition, if the norm of a row vector of \(\mathbf{W}_{XY}\) fell below 0.1, we replaced it with a random vector to keep it from becoming degenerate, and if a singular value of \(\mathbf{W}_{XY}\), \(\mathbf{W}_{YN}\) or \(\mathbf{W}_{NY}\) fell below 0.01, we replaced that singular value with 1 (we checked every 100 iterations).

  2. Bio-NICA with 2-compartmental neurons (Algorithm 2): The neural outputs were computed with the quadratic program solver solve_qp from the Python package quadprog. We used the time-dependent learning rate of Eq. (22) and included \(\tau \in \{0.01,0.03,0.05,0.08,0.1,0.3,0.5,0.8,1,3\}\) in the grid search to find the best performance. After each iteration, we checked whether any output neuron had been inactive up to that iteration; if so, we flipped the sign of its feedforward inputs. In addition, if an eigenvalue of \(\mathbf{W}_{ZZ}\) fell below 0.01, we replaced that eigenvalue with 1 to prevent \(\mathbf{W}_{ZZ}\) from becoming degenerate (we checked every 100 iterations).

  3. 2-layer NSM: We implemented the algorithm of Pehlevan et al. (2017) with time-dependent learning rates. For the whitening layer, we used the optimal time-dependent learning rate reported in Pehlevan et al. (2017): \(\zeta _t=0.01/(1+0.01t)\). For the NSM layer, we used the time-dependent learning rate of Eq. (22). The neuronal outputs were computed with the quadratic program solver solve_qp from the Python package quadprog. After each iteration, we checked whether any output neuron had been inactive up to that iteration; if so, we flipped the sign of its feedforward inputs.

  4. Nonnegative PCA (NPCA): We used the online version given in Plumbley and Oja (2004). The algorithm assumes the inputs are noncentered and whitened, so we performed the noncentered whitening offline. After each iteration, we checked whether any output neuron had been inactive up to that iteration; if so, we flipped the sign of its feedforward inputs.
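To make the output computation concrete, here is a minimal sketch of the nonnegative quadratic program we passed to solve_qp (referenced in item 1 above). The quadratic form \(\mathbf{Q}=\mathbf{W}_{YN}^\top \mathbf{W}_{YN}\) and linear term \(\mathbf{W}_{XY}\mathbf{x}\) are our reading of the fixed point in Eq. (19), not a verbatim transcription of the algorithm; quadprog's solve_qp minimizes \(\frac{1}{2}\mathbf{y}^\top \mathbf{Q}\mathbf{y}-\mathbf{c}^\top \mathbf{y}\) subject to \(\mathbf{C}^\top \mathbf{y}\ge \mathbf{b}\).

```python
import numpy as np
from quadprog import solve_qp

def nonnegative_output(W_XY, W_YN, x, eps=1e-6):
    """Sketch: compute the output activity y >= 0 for one input sample x.

    Q and c are chosen so that the *unconstrained* minimizer of
    0.5 * y^T Q y - c^T y is (W_YN^T W_YN)^{-1} W_XY x, i.e., Eq. (19);
    the constraint y >= 0 enforces nonnegative firing rates.
    """
    d = W_XY.shape[0]
    Q = W_YN.T @ W_YN + eps * np.eye(d)  # small ridge keeps Q positive definite
    c = W_XY @ x
    C = np.eye(d)        # C^T y >= b encodes y_i >= 0 for every coordinate
    b = np.zeros(d)
    y, *_ = solve_qp(Q, c, C, b)
    return y
```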

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lipshutz, D., Pehlevan, C. & Chklovskii, D.B. Biologically plausible single-layer networks for nonnegative independent component analysis. Biol Cybern 116, 557–568 (2022). https://doi.org/10.1007/s00422-022-00943-8

