Wu, et al. (2022) - Brain-inspired Global-Local Learning Incorporated with Neuromorphic Computing

I presented this paper at the INI Journal Club on Wednesday 28.09.2022 and thought I would summarize it in a blog post.

Main Contribution

This paper was recently published in Nature Communications (January 2022). It describes a hybrid learning rule combining local Hebbian-based plasticity with global error-based rules in a single meta-learning framework for spiking neural networks (SNNs) on neuromorphic hardware.

Background

To put things into perspective, the same group released a fully digital neuromorphic chip named Tianjic back in 2019. The chip is intended as general-purpose hardware capable of implementing a broad range of algorithms, from purely computer-science-based algorithms to more biologically inspired computing algorithms.

In this paper, the group focuses on the algorithmic side and attempts to demonstrate the benefits, which I will discuss later, of the hybrid learning rule they came up with on the Tianjic chip (note that this is not a full hardware implementation of the learning rule; they used a hardware simulator for that).

Local vs Global Plasticity

Most neuromorphic models developed so far focus on one of two learning paradigms:

  1. (Back-)propagating a global error signal using gradient descent, which I will refer to as global plasticity [GP]. This involves two pathways: a forward pass to compute the cost function, followed by a backward pass to propagate the errors and update the weights layer by layer.
  2. Simple correlation-based rules (Hebbian-like), which I will refer to as local plasticity [LP]. These rules are much easier to implement in hardware and have shown promise on a smaller subset of problems.
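As a toy illustration (my own hypothetical code, not from the paper), the key difference is what information each update may use: GP needs a backward pass through a global loss, while LP only needs the local pre- and postsynaptic activity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy single-layer network: y = sigmoid(W @ x)
W = rng.normal(scale=0.1, size=(3, 4))
x = rng.random(4)
target = np.array([1.0, 0.0, 0.0])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Global plasticity (GP): gradient of a global loss, needs a backward pass.
y = sigmoid(W @ x)
err = y - target                          # dL/dy for an MSE loss (up to a constant)
grad_W = np.outer(err * y * (1 - y), x)   # chain rule through the sigmoid
W_gp = W - 0.1 * grad_W

# Local plasticity (LP): Hebbian update, uses only pre/post activity.
W_lp = W + 0.1 * np.outer(y, x)           # Δw_ij ∝ post_i * pre_j
```

The GP step moves the output toward the target; the LP step simply strengthens synapses between co-active units, with no notion of a task error.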

The idea of the paper is to combine both LP and GP in a hybrid plasticity framework [HP]. This framework can be summarized as learning the hyperparameters of the local plasticity rule (learning plasticity, i.e. meta-learning).

Biological relevance of meta-learning

We know from biological studies that:

Neuromodulators are media for signaling specific global variables and parameters that regulate distributed learning modules in the brain. (Doya, K., Metalearning and neuromodulation, Neural Networks, 2002)

In other words, the synaptic weights are modified based on the activity of the pre- and postsynaptic neurons, but are additionally modulated by a top-down signal. This signal can affect multiple parameters of the learning, such as the learning rate, the update polarity, etc. The idea of meta-learning (aka learning to learn) in this context is hence about learning those hyperparameters of the modulatory signals.
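A minimal sketch of this idea (my own illustration, not the paper's code): a neuromodulated Hebbian rule where a top-down scalar $m$ sets both the effective learning rate and the polarity of the update:

```python
import numpy as np

def modulated_hebbian_update(w, pre, post, m, eta=0.01):
    """Hebbian co-activity term, scaled and signed by a neuromodulatory signal m.

    m > 0 strengthens correlated synapses, m < 0 weakens them (polarity),
    and |m| acts as an effective learning rate.
    """
    return w + eta * m * np.outer(post, pre)

w = np.zeros((2, 3))
pre = np.array([1.0, 0.0, 1.0])   # presynaptic activity
post = np.array([1.0, 1.0])       # postsynaptic activity

w_reward = modulated_hebbian_update(w, pre, post, m=+1.0)  # potentiation
w_punish = modulated_hebbian_update(w, pre, post, m=-1.0)  # depression
```

Meta-learning in this picture means optimizing how $m$ (and parameters like `eta`) are produced, rather than hand-tuning them.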

Methods: How to formulate a hybrid learning combining LP and GP

Starting from a neuron

The authors started from the differential equation describing the membrane potential dynamics:

Unlike the common formulation of the LIF neuron (also the one I am most used to), you can see that there are no synaptic dynamics in this first equation. Here the second term describes the effect of incoming spikes on the membrane potential $u_i$: $w_{i,j}$ is the synaptic weight between a presynaptic neuron $j$ and a postsynaptic neuron $i$, and $s_j$ is the train of spikes from the presynaptic neuron.
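The equation itself did not survive into this post, but based on the description above it is presumably of the standard LIF form without synaptic filtering, something like:

$$\tau \frac{du_i(t)}{dt} = -u_i(t) + \sum_j w_{i,j}\, s_j(t)$$

where $s_j(t)$ is the presynaptic spike train and $u_i$ is reset once it crosses the firing threshold (my reconstruction; the exact constants and notation may differ from the paper's).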

Adding synapses…

Now, to incorporate the synaptic dynamics, they expanded the weight term into a sum of two terms:

To dissect this further, what the weight equation is saying is that the weights can be split into two terms. One term, $w(t_n)$, is updated globally with backpropagation through time (BPTT); they also refer to it as a phasic term, since it is only updated once some time $T$ of the stimulus presentation has passed. The second term is updated more frequently, at each pre-post spike pair (hence locally).
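Putting that dissection into symbols (again my reconstruction, since the equation is missing here), the decomposition presumably reads something like:

$$w_{i,j}(t) \;=\; \underbrace{w^{GP}_{i,j}(t_n)}_{\text{phasic, updated by BPTT every }T} \;+\; \underbrace{w^{LP}_{i,j}(t)}_{\text{updated locally at each pre-post spike}}$$

with the local term additionally shaped by the learned hyperparameters $\theta$.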

Now the question is: why split the weight update at all? 🤓 Simply to allow the two updates to run on different parts of the chip (at least, that is my understanding of the reason).

Putting it all together

With this formulation, they show how the rule can be implemented with a hybrid plasticity unit, which captures how LP and GP interact:

The next question is how to optimize both $w_{GP}$ and $\theta$ with BPTT. The answer is to follow a bi-level optimization scheme:
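As a hypothetical sketch of the bi-level idea (names and simplifications mine, not the paper's code): the inner loop runs the local rule on a support set, and the outer loop adjusts the plasticity hyperparameter $\theta$ to minimize a loss on a query set. Here I use finite differences as a stand-in for BPTT and hold $w_{GP}$ fixed for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)

def inner_loop(w_gp, theta, x_support):
    """Inner (local) loop: accumulate a Hebbian-style trace scaled by theta."""
    w_lp = np.zeros_like(w_gp)
    for x in x_support:
        post = np.tanh((w_gp + w_lp) @ x)
        w_lp = w_lp + theta * np.outer(post, x)  # fast local update per sample
    return w_gp + w_lp                           # effective weights after adaptation

def outer_loss(w_gp, theta, x_support, x_query, y_query):
    """Outer loss: run the local rule on support data, score on query data."""
    w_eff = inner_loop(w_gp, theta, x_support)
    pred = np.tanh(w_eff @ x_query)
    return float(np.mean((pred - y_query) ** 2))

# Toy data: the support set drives the inner loop, the query set the outer loop.
x_support = rng.random((5, 4))
x_query = rng.random(4)
y_query = np.array([0.5, -0.5])

w_gp = rng.normal(scale=0.1, size=(2, 4))
theta = 0.01

loss_before = outer_loss(w_gp, theta, x_support, x_query, y_query)

# Outer loop: gradient descent on theta via finite differences
# (a crude stand-in for differentiating through the inner loop with BPTT).
for _ in range(100):
    eps = 1e-4
    g = (outer_loss(w_gp, theta + eps, x_support, x_query, y_query)
         - outer_loss(w_gp, theta - eps, x_support, x_query, y_query)) / (2 * eps)
    theta -= 0.1 * g

loss_after = outer_loss(w_gp, theta, x_support, x_query, y_query)
```

In the paper, $w_{GP}$ is trained jointly by the same outer BPTT pass; the point of the sketch is only the nesting: local plasticity inside, gradient-based meta-optimization of $\theta$ outside.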

Minor note: after my presentation, I received a question as to why two different data sets are used for the two optimization levels. To be honest, I know it is common practice in meta-learning, but I don't really know why. (One standard justification is that evaluating $\theta$ on data held out from the inner loop pushes the learned rule to generalize, rather than memorize the inner-loop data.)

HP Evaluation

Benchmarks

The authors then validated the proposed hybrid learning on many tasks, including conventional image classification, few-shot learning, and continual learning.

Results


My Critique