Farah Baracat 2022-04-30T04:02:05+00:00 http://hyde.getpoole.com Farah Baracat Day 5-Notes on Preprocessing Dataset 2022-04-26T00:00:00+00:00 http://hyde.getpoole.com/blog/2022/04/26/5_FFT <h1 id="fourier-transform-back-to-basics">Fourier Transform: back to basics</h1> <h3 id="why-do-we-need-negative-frequencies-and-what-is-the-meaning-of-a-negative-frequency-in-the-first-place">Why do we need negative frequencies and what is the meaning of a negative frequency in the first place?</h3> <p><em>notes from <a href="https://www.quora.com/What-is-the-meaning-of-negative-frequencies-in-the-Fourier-transform">answer on Quora</a></em></p> <p>Negative frequencies do not exist in reality. They are constructs in our heads.</p> <ol> <li>The idea behind it is actually very simple. Mr. Fourier said that any signal is composed of a superposition of sinusoids with different frequencies.</li> <li>Mr. Euler said that $2i \sin(2\pi f t) = e^{2\pi i f t} - e^{-2\pi i f t}$, where $i = \sqrt{-1}$.</li> </ol> <p>In other words, a sine is composed of two <strong>phasors</strong> (arrows rotating at a constant speed of f revolutions/sec, one rotating counter clock-wise [1st term] and the other rotating clock-wise [2nd term]). To distinguish whether we are talking about a clock-wise or a counter clock-wise rotation, we put a negative sign next to the frequency (which designates the speed of rotation).</p> <p>Combining points 1 and 2:</p> <ul> <li>Since any signal can be constructed in the time domain from a bunch of sinusoids [<strong>Point 1</strong>], and since those sinusoids are made up of a pair of phasors with opposing frequency signs [<strong>Point 2</strong>], we get both positive and negative frequency content when we apply the FFT to a signal.</li> <li> <p>The inverse FT is defined as:</p> <p>$x(t) = \int_{-\infty}^{\infty} F(f)\, e^{2\pi i f t}\, df$</p> </li> </ul> <p>Note that the integration goes from $-\infty$ to $\infty$ for the same reason explained above.</p> Day 3-Literature Review on Proportional Control 2022-04-21T00:00:00+00:00 http://hyde.getpoole.com/blog/2022/04/21/proportional_control <h2 id="acronyms">Acronyms</h2> <ul> <li>DOF: Degrees-of-Freedom</li> <li>DOA: Degrees of actuation of a prosthesis.</li> </ul> <p>So what is the difference 😀?</p> <p>DOAs represent the number of independent actuators/motors, while DOFs represent the number of movements we can execute on the prosthetic hand with these motors. For example, if a prosthetic hand has 5 intrinsic motors to control flexion and extension of the five fingers, then we have 10 DOFs and 5 DOAs.</p> <h2 id="common-concepts-and-ideas">Common concepts and ideas</h2> <ul> <li>Continuous control is also often referred to as <strong>proportional control</strong>. <a href="https://ieeexplore.ieee.org/document/6205630">This paper by Fougner et al., 2012</a> provides a really nice overview of the terminology and literature related to proportional control. Quoting from the paper: <blockquote> <p>Proportional control is exhibited by a prosthesis system if and only if the user can control at least one mechanical output quantity of the prosthesis (e.g. force, velocity, position or any function thereof) within a finite, useful, and essentially continuous interval by varying his/her control input within a corresponding continuous interval.</p> </blockquote> </li> <li>Given this definition, proportional control <strong>does not necessarily mean that the relationship between the input and output of the system is proportional in a strict mathematical sense</strong>. It just means that changes in the EMG signal (i.e. the input to the system) lead to changes in the motor output of the system.
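To make this input-output idea concrete, here is a minimal sketch of a proportional EMG-to-motor-command mapping (all names, thresholds and gains are invented for illustration; this is not the controller from any cited paper):

```python
import numpy as np

def proportional_command(emg, fs=2000.0, threshold=0.05, gain=12.0):
    """Toy proportional controller: contraction intensity above a
    noise threshold maps continuously to a motor command.

    emg: raw EMG samples; fs: sampling rate in Hz.
    All constants here are hypothetical placeholders.
    """
    envelope = np.abs(emg)                    # rectify
    win = max(1, int(0.1 * fs))               # ~100 ms smoothing window
    envelope = np.convolve(envelope, np.ones(win) / win, mode="same")
    amplitude = envelope.mean()               # crude intensity estimate
    if amplitude < threshold:                 # low-level activity is ignored
        return 0.0
    return gain * (amplitude - threshold)     # continuous, graded output
```

The key property is that the returned command varies continuously with contraction intensity instead of switching between two fixed states.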
This is to be contrasted with <em>on-off control</em>, in which there are only two possible outputs of the system (i.e. turning <em>on</em> or <em>off</em> a prosthetic function).</li> <li>A simple example of proportional control as extracted, again, from <a href="https://ieeexplore.ieee.org/document/6205630">Fougner et al., 2012</a>: <blockquote> <p>“…a system in which the electromyogram (EMG) from flexors and extensors of the user’s forearm is measured, amplified, filtered and smoothed by two active electrodes. This provides estimates of EMG amplitudes that can be sent to a hand controller. After applying thresholds to remove uncertainty at low contraction levels, the controller sets a voltage applied to the motor that is proportional to the contraction intensity”.</p> </blockquote> </li> <li>People often talk about the intuitiveness of a decoder or control strategy. What they mean by that is <strong>how natural the control scheme is, or, put differently, how similar the motor functionality provided by the system is to that of a biological limb</strong>. It follows that the more proportional the control is, the more intuitive it is. Intuitive controllers are the ones that map patterns of muscle co-activations to prosthesis DOAs. Therefore, we get a wider range of motion classes.</li> </ul> <hr /> <h2 id="regression-based-control">Regression-based control</h2> <h3 id="models-used">Models used</h3> <p>I kept coming back to these models when reading about regression-based control:</p> <ul> <li>Kalman filter</li> <li>Wiener filter</li> <li>Linear and non-linear regression algorithms (e.g.
ANN, Random Forest Regression, …)</li> </ul> <h4 id="wiener-filter">Wiener Filter</h4> <p>Without going into a lot of mathematical detail (because I would have to revise some mathematical foundations to be able to grasp everything), what I know for now is that:</p> <ul> <li>A <strong>Wiener filter</strong> is a signal processing technique for estimating a target variable using linear time-invariant filtering applied to a measured noisy signal. Speaking of “filters” usually leads to a discussion about convolution :D. In fact, the Wiener filter operation is about <strong>convolving the input signal (in my case, the EMG features) with a finite impulse response function to produce an output y</strong> (e.g. the position of a single DOA of the prosthetic hand).</li> </ul> <h4 id="linear-regression">Linear Regression</h4> <ul> <li>To evaluate a linear regression model, one often uses $R^2$ (the coefficient of determination, which is the square of the correlation coefficient R) and the RMSE (Root Mean Squared Error).</li> <li> <p>RMSE is built from the differences between predicted and actual values (called residuals). To compute the RMSE, we average the squared residuals then take the square root to put the metric back on the scale of the response variable ($y$). <img src="/blog/figures/rmse_formula.png" alt="drawing" width="420" /><em>extracted from <a href="https://medium.com/wwblog/evaluating-regression-models-using-rmse-and-r²-42f77400efee">this article</a></em></p> </li> <li>$R^2$ measures the strength of the relationship between the response and the predictor variables in the model. A higher $R^2$ means that the predictor variables characterize more of the variance observed in the response variable.
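Both metrics follow directly from their definitions; a minimal numpy sketch with toy inputs (not tied to any dataset in this post):

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error: average the squared residuals,
    then take the square root to return to the scale of y."""
    residuals = y - y_hat
    return np.sqrt(np.mean(residuals ** 2))

def r_squared(y, y_hat):
    """Coefficient of determination: fraction of the variance
    in y explained by the predictions, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)        # unexplained variation
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total variation in y
    return 1.0 - ss_res / ss_tot
```

The SS-based form is the general definition of $R^2$; it coincides with the squared correlation coefficient in the special case of simple linear regression with an intercept.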
Put differently, $R^2$ shows how well the variation we see in $y$ can be explained by the input $x$.</li> </ul> <figure> <img src="/blog/figures/r2_formula.png" alt="drawing" width="400" /> <figcaption align="center"><b>$R^2$ formula</b></figcaption> </figure> <hr /> <h2 id="interesting-papers-i-read">Interesting papers I read</h2> <p>Krasoulis, Agamemnon, Sethu Vijayakumar, and Kianoush Nazarpour. “Effect of User Practice on Prosthetic Finger Control With an Intuitive Myoelectric Decoder.” <a href="https://www.frontiersin.org/article/10.3389/fnins.2019.00891">Frontiers in Neuroscience 13 (2019).</a> I might do a more extensive summary of the paper later.</p> <h3 id="aim-of-this-paper">Aim of this paper</h3> <p>To study the effect of user adaptation in continuous finger position control. It was previously shown on classification tasks that user adaptation plays a huge role in improving control accuracy. Here they extend this to continuous control, where they hypothesize that:</p> <ol> <li>user performance with an intuitive decoder (one mapping EMG features to prosthetic digit positions) can improve during closed-loop interaction with the prosthesis;</li> <li>trained models need to be evaluated in an online setting, since offline decoding performance does not reliably predict real-time control outcomes.</li> </ol> <h3 id="decoder-model">Decoder model</h3> <ul> <li>They use a Wiener filter to decode finger positions from sEMG (Ninapro DB8). The model performance is evaluated offline.</li> <li>A real-time control mode was also tested in which the trained model was used.</li> </ul> Day 2-Looking for relevant papers 2022-04-12T00:00:00+00:00 http://hyde.getpoole.com/blog/2022/04/12/zshell <h2 id="trying-out-z-shell">Trying out Z-Shell</h2> <ul> <li>For installation tips and making it the default shell, I found this article <a href="https://www.sitepoint.com/zsh-tips-tricks/">sufficient</a></li> <li>Facing minor issues with the Python path since the conda command is not recognized <ul> <li>I messed up the PATH variable, found the fix <a href="https://stackoverflow.com/questions/18428374/commands-not-found-on-zsh">here.</a></li> </ul> </li> </ul> <h2 id="some-literature-review-on-regression-based-control">Some literature review on regression-based control</h2> <ul> <li>There are some interesting papers: <ul> <li>Regression of Hand Movements from sEMG Data with Recurrent Neural Networks</li> <li>sEMG-based Regression of Hand Kinematics with Temporal Convolutional Networks on a Low-Power Edge Microcontroller</li> </ul> </li> </ul> Day 1-Notes on Ninapro database, muscle synergies, MEMS 2022-04-10T00:00:00+00:00 http://hyde.getpoole.com/blog/2022/04/10/ninapro_description <h2 id="ninapro-database">Ninapro Database</h2> <ul> <li>There are 10 datasets in NinaPro. Since the goal of my project is to estimate finger position as opposed to grip/grasp, <a href="http://ninaweb.hevs.ch/DB8_Instructions">Database 8</a> seems very suitable for the task.</li> <li>The paper describing this dataset can be found <a href="https://www.frontiersin.org/articles/10.3389/fnins.2019.00891/full">here</a>. DB 8, as the documentation clearly suggests, should be used for regression problems and not classification.
(“Therefore, the use of stimulus/restimulus vectors as target variables should be avoided”.)</li> </ul> <h3 id="db-8-description">DB 8 description</h3> <ul> <li>10 able-bodied subjects + 2 right-hand transradial amputees</li> <li>3 datasets collected from each subject <ul> <li>Datasets 1 &amp; 2: 10 repetitions/movement</li> <li>Dataset 3: 2 repetitions/movement</li> </ul> </li> <li>Each dataset consists of <ul> <li>9 movements + rest</li> <li>movements performed bilaterally</li> <li>movement duration: 6-9 sec + 3 sec rest</li> <li>Each trial includes reaching the position (flexion) then extension</li> </ul> </li> <li>Can use datasets 1 &amp; 2 for training and hyperparameter tuning and dataset 3 for testing, or merge 1 &amp; 2, do a cross-validation, then test on 3.</li> </ul> <h2 id="accelerometer-gyroscope-and-magnetometer-sensors-what-are-they-measuring">Accelerometer, Gyroscope and Magnetometer sensors: what are they measuring?</h2> <p><em>Extracted from <a href="https://www.maximintegrated.com/en/design/technical-documents/app-notes/5/5830.html">this blog</a>, and <a href="https://learn.sparkfun.com/tutorials/gyroscope/all">this cool blog</a></em></p> <ul> <li><strong>Microelectromechanical systems (MEMS)</strong> are components combining mechanical and electrical elements into small devices (with sizes in the micrometer range).</li> <li>An <strong>accelerometer sensor</strong> measures acceleration (obviously 😀) ($m/s^2$). Recapping high-school physics, according to Newton’s Second law of motion: $acc \propto force$. It measures either static or dynamic acceleration. In static acceleration, there is a constant force acting on the object (e.g. gravity, friction). In dynamic acceleration, the acting forces are non-uniform, for example a car crash. <img src="/blog/figures/acceleration_translation_rotation.png" alt="" /></li> <li><strong>Gyroscopes</strong> measure deflection from a given orientation and angular velocity.
Said differently, they measure rotational motion (in $^\circ/s$ or revolutions per second). <img src="/blog/figures/Gyroscope-components-and-gyroscopic-precession.jpg" alt="" /></li> <li>A <strong>magnetometer sensor</strong> measures relative changes in the magnetic field at a particular location.</li> </ul> <hr /> <h2 id="muscle-synergies-in-semg">Muscle Synergies in sEMG</h2> <ul> <li> <p>In voluntary movement, the central nervous system (CNS) has its own way of coordinating muscle activations. We still do not know for sure how this happens, but one thing we do know is that the CNS must be doing some sort of dimensionality reduction. This refers to the fact that muscle activities are coordinated and synchronized. So the hypothesis is that the CNS might be “representing all useful muscle patterns as combinations of a small number of generators” <a href="d'Avella, A., Saltiel, P. &amp; Bizzi">d’Avella, A., Saltiel, P. &amp; Bizzi, Nat Neurosci, 2003. </a> in order to allow for a wide range of degrees of freedom and great flexibility. In other words, <strong>there is only a small subset of controllers (i.e. muscle synergies) that are combined in space and time to generate many muscle activity patterns.</strong></p> </li> <li> <p>People usually analyze such coherent activations of muscles using muscle synergy analysis. This refers to analyzing muscle patterns to extract the underlying “sources” of the observed muscle patterns. The generated muscle patterns are modeled as “combinations of time-varying muscle synergies, that is, coordinated activations of a group of muscles with a specific time course for each muscle” <a href="d'Avella, A., Saltiel, P. &amp; Bizzi">d’Avella, A., Saltiel, P. &amp; Bizzi, Nat Neurosci, 2003.
</a>.</p> </li> </ul> <p><strong>Note</strong>: It seems like a very similar idea to <strong>blind source separation</strong> for decomposing the sources from a mixed signal (more on that later 🙃)</p> Daily Logs - Starting a new sEMG Project using NinaProDB 2022-04-10T00:00:00+00:00 http://hyde.getpoole.com/blog/2022/04/10/daily-logs <h2 id="aim">Aim</h2> <p>To better understand the characteristics of sEMG signals and their relation to the recorded kinematics (e.g. joint angles, finger positions, speed, etc.). For this, the plan is to use the publicly available datasets from the <a href="http://ninaweb.hevs.ch">NinaPro Database</a>.</p> <p>The ultimate goal is to research how to perform regression on the joint angles. Since the best way to learn is by doing, I will also code.</p> <p>The main questions I am planning to answer are:</p> <ul> <li>What is the current state-of-the-art (SoA) for this task?</li> <li>What are the current limitations of the SoA? (e.g. deployment, memory, power consumption, precision, latency, …)</li> </ul> <h2 id="day-1-april-10-2022">Day 1: April 10, 2022</h2> <h3 id="todays-progress">Today’s Progress</h3> <ul> <li>Set up this blog series</li> <li>Read about the dataset I will use and muscle synergies (<a href="2022-04-10-ninapro_description.md">My notes</a>)</li> </ul> <h3 id="thoughts">Thoughts</h3> <p>Extremely excited to start this project yet very tired already. Woke up at 3.30 am and it’s almost 8 am now…will probably go back to sleep now :D</p> <hr /> <h2 id="day-2-april-12-2022">Day 2: April 12, 2022</h2> <ul> <li>Not very productive, was skimming through papers on regression-based control <a href="2022-04-12-zshell.md">notes</a>.</li> </ul> <hr /> <h2 id="day-3-april-21-2022">Day 3: April 21, 2022</h2> <h3 id="todays-progress-1">Today’s Progress</h3> <ul> <li>Spent a good amount of time (maybe 2 hours) debugging how to build my website.
I had problems after upgrading to a new bundler version (e.g. website not building, live-reload not working, being unable to do bundle update or install, …)</li> <li>Started the EDA notebook (loading data, some initial visualization and statistics) 💪🏼 <a href="https://github.com/FarahBaracat/ninapro_db8">(Git Repo)</a></li> <li>Some literature review on proportional control (<a href="2022-04-21-proportional_control.md">My Notes</a>)</li> </ul> <h3 id="thoughts-1">Thoughts</h3> <p>Great progress today, still a long way to go.</p> <hr /> <h2 id="day-4-april-24-2022">Day 4: April 24, 2022</h2> <h3 id="todays-progress-2">Today’s Progress</h3> <ul> <li>Wrote the project description for <a href="https://capocaccia.cc/en/">Capocaccia’s workshop</a></li> <li>Some minor additions to the EDA notebook: computing the stimuli correlations.</li> </ul> <hr /> <h2 id="day-5-april-26-2022">Day 5: April 26, 2022</h2> <h3 id="todays-progress-3">Today’s Progress</h3> <ul> <li>In the past days, I was working on the demo for Capocaccia’s 2022 with the MIA hand.
We wanted to have a pipeline ready in which a spiking neural network is mapped onto a neuromorphic chip that interfaces with the hand.</li> <li>Continued working on the <a href="https://github.com/FarahBaracat/ninapro_db8">EDA notebook</a></li> </ul> Ding, Yu, Tian and Huang (2021) - Optimal ANN-SNN Conversion for Fast and Accurate Inference in Deep Spiking Neural Networks 2022-01-15T00:00:00+00:00 http://hyde.getpoole.com/lit_review/2022/01/15/Huang <p>The paper can be found <a href="https://arxiv.org/abs/2105.11654">here</a>.</p> <hr /> <h2 id="motivation">Motivation</h2> <hr /> <h2 id="main-contribution">Main Contribution</h2> <hr /> <h2 id="methods">Methods</h2> <hr /> <h2 id="results">Results</h2> <hr /> <h2 id="final-thoughts">Final Thoughts</h2> Hwang, Kim & Park (2021) - Quantized Weight Transfer Method Using Spike-Timing-Dependent Plasticity for Hardware Spiking Neural Network 2021-12-29T00:00:00+00:00 http://hyde.getpoole.com/lit_review/2021/12/29/Park <p>The paper can be found <a href="https://doi.org/10.3390/app11052059">here</a>.</p> <h2 id="why-am-i-reading-about-this-topic">Why am I reading about this topic?</h2> <p>I was curious to learn how to convert a trained ANN to an SNN. This approach is often mentioned in papers as an <strong>offline learning</strong> technique in which the synaptic weights are adjusted (well, surprise :D) offline, usually with gradient descent, then transferred to the SNN. I started looking up how exactly this is accomplished, with the goal of answering these questions:</p> <ul> <li>Is there a consensus on how to carry out the ANN to SNN conversion? Is it a solved problem (the conversion, I mean)?</li> <li>If so, what is the mathematical basis for this conversion?</li> <li>How to set the neuron’s threshold? More generally, what happens to all the SNN parameters that do not have an ANN counterpart? are they trained? inferred?
manually set?</li> </ul> <hr /> <h2 id="motivation">Motivation</h2> <p>The goal of this paper is to create a hardware-based SNN used as an “inference system”. Here, the authors do not care about training or coming up with an online method but are solely focused on how to precisely map the offline-trained weights to what they call “synaptic devices” (i.e. memristors).</p> <h4 id="interlude-on-memristors-from-the-paper">Interlude on memristors (from the paper)</h4> <p>These devices change the synaptic weight between 2 neurons by adjusting the device conductance, which occurs via its program (PGM) and erase (ERS) states. Therefore, to map a weight onto the hardware, PGM/ERS pulses are applied once or several times to reach the target synaptic conductance of the device.</p> <h4 id="back-to-the-main-work">Back to the main work</h4> <p>So what are the challenges in performing the above-mentioned conversion?</p> <ol> <li><strong>Low weight bit precision</strong>: while training offline, we usually use full-precision 32-bit floating-point numbers. The synaptic devices have access to only a limited number of discrete conductance levels.</li> <li><strong>Scalability</strong>: according to the authors, the mapping process is time-consuming and hence does not scale well as the size of the network (i.e. the number of weights) increases. As described earlier, mapping a single weight corresponds to applying one or more pulses to an individual synaptic device; therefore, <em>if I understood correctly</em>, it would take <strong>time</strong> to repeat this process manually for each synaptic device <em>one by one</em>.</li> </ol> <hr /> <h2 id="main-contribution">Main Contribution</h2> <p>Using STDP (Spike-Timing-Dependent Plasticity) to transfer <strong>quantized</strong>, trained ANN weights to SNN synaptic weights implemented with memristor devices.
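In spirit, the linear weight quantization that the method relies on can be sketched as follows (my own toy code, not the paper's; the exact normalization used there may differ):

```python
import numpy as np

def quantize_linear(w, n=2):
    """Map full-precision weights onto 2n+1 equally spaced levels
    spanning [-w_max, +w_max], so negative and positive weights
    are covered symmetrically."""
    w_max = np.max(np.abs(w))
    alpha = w_max / n                  # interval between adjacent levels
    return alpha * np.round(w / alpha)
```

With `n = 1` this gives a 3-level quantization, the case whose accuracy is reported in the Results section.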
The efficacy of the method is demonstrated on the MNIST dataset.</p> <p>The novelty of their approach, again according to the authors, lies in the use of STDP to adjust the synaptic conductance to match a target quantized weight. Usually, STDP is used to train the SNN. Here, it is exploited to map the weights to the hardware.</p> <hr /> <h2 id="methods">Methods</h2> <h3 id="how-can-we-operate-their-synaptic-transistor">How can we operate their synaptic transistor?</h3> <ul> <li> <p>The synaptic device used in this work is a 4-terminal transistor, fabricated in the same group. <a href="http://ieeexplore.ieee.org/document/7393453/">This paper</a> describes the fabrication process. The device schematic is shown in Figure 1.</p> </li> <li> <p>The device has two gate terminals (Gate 1 and Gate 2). Depending on the time difference of the voltage pulses applied to Gate 1 and Gate 2, and through the phenomenon of <a href="https://en.wikipedia.org/wiki/Hot-carrier_injection">hot carrier injection</a>, the device conductance is changed (demonstrated in the inset of Figure 1c by the changing source-to-drain current sign and amplitude).</p> </li> <li> <p>The authors note that the curve in the inset of Figure 1c resembles typical STDP curves and therefore fit it with two STDP equations describing potentiation and depression.
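For reference, a generic pair of exponential STDP equations has the shape below, and inverting the potentiation branch gives the pulse time difference for a target conductance change (the parameter values here are invented placeholders, not the fitted values from the paper):

```python
import numpy as np

# Hypothetical STDP parameters; the paper fits its own A_p, A_d, tau_p, tau_d.
A_P, A_D = 1.0, 1.0          # potentiation / depression amplitudes
TAU_P, TAU_D = 20e-6, 20e-6  # time constants, on the order of microseconds

def stdp(dt):
    """Conductance change as a function of the pulse time difference dt."""
    if dt >= 0:                              # pre before post: potentiation
        return A_P * np.exp(-dt / TAU_P)
    return -A_D * np.exp(dt / TAU_D)         # post before pre: depression

def dt_for_weight_change(dw):
    """Invert the potentiation branch: find dt such that stdp(dt) == dw."""
    assert 0 < dw <= A_P
    return -TAU_P * np.log(dw / A_P)
```

This inversion is essentially how a pulse time difference is later derived from the fitted equations when transferring a target weight.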
From these fits, they inferred the parameters of the STDP (i.e. $A_p$, $A_d$, $\tau_p$ and $\tau_d$).</p> </li> </ul> <p><img src="/lit_review/figures/four-terminal_syn_device.png" alt="synaptic_device" /></p> <h3 id="ann-trained-on-mnist">ANN trained on MNIST</h3> <ul> <li>A 3-layer ANN with <strong>784, 800 and 10</strong> units in the input, hidden and output layers is trained with stochastic gradient descent.</li> <li>Trained weights are normalized by the maximum output activation or maximum weight, then quantized using a <strong>linear quantization</strong> method (the weight distribution is equally divided on a linear scale): the weight distribution is divided into <strong>$2n+1$ levels</strong> to cover both the negative and positive weights equally. See the figure below ($\alpha$ is the interval/distance between two levels).</li> </ul> <p align="center"> <img src="/lit_review/figures/quantized_trained_weights.png" alt="weight_maps" width="350" /></p> <hr /> <h3 id="hardware-implementation-of-the-snn">Hardware implementation of the SNN</h3> <ul> <li> <p>Synapses are arranged in an array (resembling a crossbar array 🤷🏼‍♀️).</p> </li> <li> <p>Each synaptic weight in the SNN is implemented by a pair of “synaptic transistors” (one excitatory and one inhibitory).</p> </li> <li> <p>Pre-synaptic inputs are applied to the Gate 1 and drain terminals. Excitatory and inhibitory currents, $I_{exc}$ and $I_{inh}$, flow through the <a href="https://semiengineering.com/whats-really-happening-inside-memory/">bitlines (BL)</a> (strips connecting the source terminals vertically) to the post-synaptic neuron $k^l$, where $l$ is the layer index.
Each neuron hence receives two input currents, one from $BL^+$ and one from $BL^-$.</p> </li> <li> <p>Gate 2 terminals are used for weight mapping only and are not used during inference.</p> </li> </ul> <p><em><u>Note that</u></em></p> <ul> <li>During inference, the label is predicted based on the neuron index with the maximum firing rate (as in most cases).</li> <li>Since the pre-synaptic input is connected to one of the Gate terminals, the weight is changed only when there is a pre-synaptic input.</li> </ul> <p><img src="/lit_review/figures/memristor_snn.png" alt="arch" /></p> <h3 id="weight-transfer-by-stdp">Weight transfer by STDP</h3> <p>STDP is used to adjust the synaptic weights “connected with the same BL simultaneously” in the following manner:</p> <ol> <li> <p>Find the exact pulse <strong>time difference</strong>, $\Delta t$, from the STDP equations obtained earlier.</p> </li> <li> <p><strong>Which synapse</strong> to change? Since one synaptic weight is implemented by an excitatory and an inhibitory synapse, the difference of the conductances of the two synaptic devices represents this synaptic weight. They, therefore, only adjusted the excitatory synapse: $\Delta I_{source} = \Delta w$.</p> </li> <li><strong>Apply the necessary voltage to the gates</strong>. Given a neuron in the hidden layer which is connected to 784 inputs (28x28), we can represent the synaptic weights to this neuron with 4 different weight maps (assuming a 4-level quantization): <ul> <li> <p>Each weight map consists of white and black dots, and corresponds to a quantized weight level.</p> </li> <li> <p>Each white dot in the map represents the presence of a synaptic connection with this quantized weight between the input neuron (pixel) and the neuron in question.
<img src="/lit_review/figures/weight_maps.png" alt="weight_maps" /></p> </li> </ul> <p>Now if we activate the synapses connected to this neuron by applying a voltage-encoded input to Gate 1 of its pre-synaptic neurons (represented by white dots in the weight map) while providing an asymmetric voltage pulse (with the $\Delta t$ found in 1.) at Gate 2, we can transfer one quantized weight (e.g. $w^{2+}$) to multiple synapses simultaneously.</p> </li> <li>If we then <strong>repeat the procedure in 3. for all the weight maps</strong> for each neuron in each layer, we can modify all the synapses in the network.</li> </ol> <p align="center"> <img src="/lit_review/figures/weight_transfer.png" alt="weight_maps" width="400" /></p> <hr /> <h2 id="results">Results</h2> <p>Performance accuracy is definitely impacted by the number of quantization levels used. Still, with 3-level quantization, accuracy was <strong>97.58%</strong> (i.e. close to that of the original 32-bit precision weights), with a noticeable reduction in the number of pulses required to map the weights onto the synaptic devices.</p> <hr /> <h2 id="final-thoughts">Final Thoughts</h2> <p>I am not familiar with this field of research, but it seems messy to me 😂 for <strong>3 reasons</strong>:</p> <ol> <li> <p>The curve of the STDP is steep (which is desirable to achieve different conductance levels). Given that the pulse difference is on the order of a few $\mu s$, I am wondering how precisely we can actually map the quantized weights.</p> </li> <li> <p>I don’t quite understand why they used a pair of devices to encode a single weight and only programmed one 😏. Was it clear in the paper?</p> </li> <li> <p>It is still not very obvious (to me at least) whether they considered device mismatch when fitting the STDP parameters. It was briefly mentioned that they also considered device mismatch in the simulations. However, the results of these analyses are not shown.
I assume that different devices would have different STDP parameters and hence different $\Delta t$ values. It follows that we cannot transfer exactly the same weight to the different synapses simultaneously…right?</p> </li> </ol> <blockquote> <p>I can easily say though that this is not what I was looking for when searching for ANN to SNN conversion 😅: the work described in the paper is very specific to the hardware implementation, while I was looking for a more general mathematical approach. Yet, I read the paper 😀 (because why not).</p> </blockquote> Vecchio et al. (2020) - Tutorial, Analysis of Motor Unit Discharge Characteristics from High-Density Surface EMG Signals 2021-07-21T00:00:00+00:00 http://hyde.getpoole.com/lit_review/2021/07/21/Farina-tutorial <p>The paper can be found <a href="https://www.sciencedirect.com/science/article/pii/S1050641120300419?via=ihub">here</a>.</p> <p><strong>Disclaimer:</strong> For this tutorial paper, I will follow a different structure for the post; it will resemble a college student’s lecture notes 🤓.</p> <hr /> <h2 id="why-am-i-reading-this-tutorial-paper">Why am I reading this tutorial paper?</h2> <p>To better understand whether it is the precise timings of the motor unit discharges or only their spiking frequency that matters for muscle control. This question goes back to the famous debate of rate- vs. time-based computation.</p> <p>After going through the paper, I noticed that this question is not addressed, but I still found it an interesting paper. It goes into the details of EMG data acquisition, decomposition and visual inspection of the decomposition results.
I might update this post later when I read these sections in more detail.</p> <hr /> <h2 id="extracting-neural-information-from-hd-emg">Extracting neural information from HD-EMG</h2> <p><img src="/lit_review/figures/farina_tutorial_MUAP_EMG.png" alt="MUAPs_EMG" /></p> <ul> <li>The EMG signal is affected by the timing of the discharges and the waveform of the APs of the MUs, since it is the algebraic summation of these motor unit APs.</li> </ul> <ol> <li> <p><strong>What factors affect the characteristics of MUAPs (MUAP amplitude)?</strong></p> <p>First off, why is this question important 😀? Since MUAPs directly relate to the EMG, we might want to know how the shape of the MUAPs comes about (because this would relate to the observed EMG signals).</p> <ul> <li>Conduction velocity, which is proportional to (scales with) the muscle fiber diameter</li> <li>Number of innervated muscle fibers, which is related to the MU recruitment threshold. However, the association between MUAP amplitude and MU recruitment threshold is not very straightforward.</li> </ul> </li> <li> <p><strong>What are the problems with relying on the EMG amplitude to estimate the neural drive?</strong></p> </li> </ol> <blockquote> <p><strong>Note:</strong> For a more detailed description of the limitations of the spectral and amplitude features of EMG to infer neural strategies of muscle control, I refer you to the paper by Del Vecchio et al., <em>Associations between motor unit action potential parameters and surface EMG features</em>, (2017).</p> </blockquote> <ul> <li>From what I just described above, the association between recruitment threshold and MUAP amplitude is weak, as it depends on the distance between the muscle fibers and the recording electrodes. This means that the association between EMG amplitude and the strength of the neural drive is not very “clean”.
The same holds for the link between EMG amplitude and force.</li> <li>EMG amplitude varies across subjects, muscles and time, which makes comparisons across tasks and individuals challenging.</li> <li>Simulation results of EMG generation revealed that the amplitude feature of the signal is only a crude indicator of neural drive. <ul> <li>How did they reach this conclusion? They tested the link between MUAPs (which give rise to the EMG) and recruitment thresholds and found these two variables to be unrelated.</li> </ul> </li> <li>On the other hand, the estimated conduction velocity of the MUAPs was shown to be associated with the MU recruitment threshold across subjects and muscles. It therefore forms a more stable basis for estimating the neural drive. I guess what the authors are trying to convey here is that extracting features from the motor units themselves is a better way to predict the neural drive than global features of sEMG.</li> </ul> <p>These challenges (in interpreting features from sEMG) have pushed the research community to start looking into iEMG and decomposition techniques. These approaches give a direct estimate of the neural drive through the identification of the MU discharge times.</p> <hr /> <h2 id="what-can-we-know-from-the-motor-units-discharge-properties">What can we learn from the motor unit discharge properties?</h2> <ul> <li><strong>Recruitment threshold</strong>: this directly maps to the force at which the first motor unit AP occurs. Here the force is the one produced by the muscle fibers innervated by the motoneuron. This force occurs with a delay that depends on the conduction velocity of the axons and the properties of the muscle fibers (active/passive). <ul> <li>How to estimate the recruitment threshold of the MUs?
<ul> <li>Given this delay characteristic, we would have to measure the discharge properties of the MUs as the subject follows trapezoidal force trajectories with controlled rates of increase/decrease in force (typically 5-20% MVC/s until they reach a plateau at 35-70% of maximal force).</li> <li>The recruitment/de-recruitment thresholds can be obtained by looking at the raster plots of the identified motor units: we can clearly see at which force level a new motor unit becomes active.</li> </ul> </li> </ul> </li> </ul> <p><img src="/lit_review/figures/farina_tutorial_recruitment_threshold.png" alt="MU_recruitment_threshold" /></p> <ul> <li><strong>Common synaptic input</strong>: we can extract the characteristics (in the time and frequency domains) of the common input to the motorneuron pool. <ul> <li>Time domain: using the cross-correlogram (cross-correlation) between the motor unit discharges.</li> <li>Frequency domain: we can estimate the frequency bands of this common input using the coherence function, which can be thought of as a cross-correlation analysis in the frequency domain.</li> </ul> </li> <li> <p><strong>Strength of persistent inward currents (PICs)</strong>: these are the currents coming into the motorneurons and can reflect neuromodulatory inputs received by the motorneurons.</p> </li> <li><strong>Other physiological information</strong>: such as the motor unit size, which can be estimated from the MUAP amplitude and conduction velocity.</li> </ul> <p>In that sense, the MU discharge times and characteristics give us access to a more accurate view of the neural drive and physiological properties, which can be exploited to design intuitive controllers.</p> Lukyanenko et al.
(2021) - Stable, Simultaneous and Proportional 4-DoF Prosthetic Hand Control via Synergy-Inspired Linear Interpolation 2021-07-20T00:00:00+00:00 http://hyde.getpoole.com/lit_review/2021/07/20/Williams <p>The paper can be found <a href="https://jneuroengrehab.biomedcentral.com/articles/10.1186/s12984-021-00833-3">here</a>.</p> <h2 id="why-am-i-reading-this-paper">Why am I reading this paper?</h2> <p>I was looking into the current state of research on proportional and simultaneous control for prostheses (i.e. how is it achieved? how many DoFs? what about system stability?)</p> <hr /> <h2 id="motivation">Motivation</h2> <p>Two main problems with current commercial prosthetic hand controllers:</p> <ol> <li>Limited DoFs (typically 1-2 DoFs and a 10-35% abandonment rate for commercial devices) <ul> <li>Note: there are also <strong>advanced prosthetic</strong> systems: the 10-DoF DEKA/LUKE Arm and the 16-DoF Modular Prosthetic Limb, but these “place even higher demands on prosthetic hand controllers”, according to the authors.</li> </ul> </li> <li>They need recalibration, which requires both time and energy from the user.</li> <li>Feed-forward controllers require a large amount of data to set up (aka training). More data means longer recording sessions, which is a burden on the user; it is not very practical.</li> </ol> <hr /> <h2 id="main-contribution">Main Contribution</h2> <p>This paper proposes a feed-forward controller that combines linear interpolation, the muscle synergy framework and chronically implanted EMG electrodes to achieve stable, <strong>intuitive</strong>, <strong>continuous</strong>, <strong>proportional</strong> high-DoF control (for 4+ DoFs).
The main goal here is to <strong>reduce the control training time</strong>.</p> <p><strong>Important Terminological Distinctions</strong></p> <ul> <li><strong>Intuitive</strong>: tailored to the user, based on a dataset recorded from this user</li> <li><strong>Continuous</strong>: regression-based, not limited to fixed discrete states</li> <li><strong>Simultaneous</strong>: allowing combinations of movements</li> <li><strong>Proportional</strong>: variable hand speed</li> </ul> <hr /> <h2 id="methods">Methods</h2> <ul> <li>Experiments carried out on 2 trans-radial amputees</li> <li>Using chronically implanted electromyographic electrodes (ciEMG). Why? Because they were shown to improve feed-forward EMG controllers and hence provide more functional benefits to the user</li> </ul> <h3 id="feed-forward-emf-controllers">Feed-forward EMG controllers</h3> <p>A typical pipeline for feed-forward controllers involves:</p> <ol> <li>Recording &amp; processing: filtering, windowing into 100-200 ms segments, feature extraction (e.g. MAV)</li> <li>Mapping the features to hand velocities.</li> </ol> <h3 id="synergy-theory">Synergy theory</h3> <ul> <li>Muscles are activated in synergies (coordinated activations of groups of muscles dictated by a common neural signal).</li> <li>Study assumptions about synergy: 1) time-invariant synergies 2) EMG signals at steady state 3) the only meaningful feature is the mean absolute value (mABS)</li> <li> <p>With these assumptions, the EMG signal is a linear combination of the EMG of the underlying sub-movements.</p> </li> <li>How does synergy help reduce the needed data? <ul> <li>Users have control over the synergy magnitudes, which in turn are linearly proportional to force. This implies that each movement has a unique steady-state EMG signal. Hence, there is no need to record movements at different effort levels (if we have a controller that can do such linear mapping).
This is especially true if the mapping from the EMG feature space to the synergy space is an “orthogonal” change of basis (basically a 1-to-1 projection)</li> </ul> </li> </ul> <h3 id="linear-interpolation-vs-linear-regression-controller">Linear interpolation vs linear regression controller</h3> <ul> <li> <p>Here they use linear interpolation to map the EMG feature (mABS) to movement intent, with the goal of using the trained movements to predict the EMG of un-trained movements (through interpolation, the relative relation of this new movement to the known ones is determined)</p> </li> <li> <p>Linear interpolation is piece-wise linear (only linear within each partition/region). We can think of it as a middle ground between linear and non-linear regression approaches.</p> </li> <li> <p>The <strong>input space</strong> (steady-state EMG features in this case) is partitioned into regions bounded by the recorded inputs</p> </li> <li> <p>The <strong>output</strong> is the linear interpolation within these partitions (the goal is to fit the user data (input-output pairs))</p> </li> <li> <p>Given a new input EMG signal $p$: it is first scaled and assigned to a partition. Then, using linear interpolation, a movement is determined as a linear combination of the DoFs, $c_p$ (i.e. the user effort in each DoF). Once this is determined, the hand velocity can be found through an effort-to-hand-velocity mapping (a relation obtained in a “physiologically-inspired” way)</p> </li> </ul> <p><img src="/lit_review/figures/williams_EMG_linear_interpolation.png" alt="linear_interpolation" /></p> <hr /> <h2 id="results">Results</h2> <p>The controller was evaluated on target matching tasks (3-DoF and 4-DoF movements). Percentage target match, time-to-target and path efficiency were measured.
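As a sketch of the interpolation machinery just described (not the authors' implementation — the calibration points and effort values below are made up), scipy's LinearNDInterpolator performs exactly this kind of piecewise-linear interpolation: it triangulates the recorded input points into simplices and interpolates linearly within each one.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Made-up "calibration" data: 2-D steady-state EMG features (e.g. mABS of
# two channels) and the corresponding user effort along one DoF.
features = np.array([[0.0, 0.0],   # rest
                     [1.0, 0.0],   # full flexion
                     [0.0, 1.0]])  # full extension
effort = np.array([0.0, 1.0, -1.0])

interp = LinearNDInterpolator(features, effort)

# A new EMG sample inside a simplex gets a blended, proportional effort.
est = interp([[0.5, 0.0]])   # halfway toward flexion
```

Note that, as in the paper's "rest when all signals are near 0" assumption, inputs outside the convex hull of the recorded points are not covered (LinearNDInterpolator returns NaN there).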
Controller stability was evaluated over an 8-10 month period.</p> <ol> <li>The linear interpolation controller can provide 4-DoF simultaneous, continuous, proportional control.</li> <li>The current model relies more on accurate trials than on large datasets. It can be trained in as little as 3 min.</li> </ol> <hr /> <h2 id="limitations">Limitations</h2> <ol> <li>Computational resources: the time to sequentially search through the simplices (not scalable with the number of EMG channels and 4+ DoFs)</li> <li>Generalization: very specific to the recording modality (implanted EMG): a “rest” class is not incorporated per se; it occurs when all the EMG signals are close to 0. This assumption won’t hold for sEMG, as we can have non-zero signals even at rest.</li> </ol> <hr /> <h2 id="final-thoughts">Final Thoughts</h2> <ul> <li>For some reason this reminds me of KNN. This idea of linear interpolation relies on clustering all repetitions of the same movement and then, given a new input, associating it with the “nearest” cluster (more or less).</li> <li>Maybe the distinction here is that it associates the new signal with a linear combination of the known signals rather than a single best cluster?</li> <li>The idea of predicting the movement by assigning the EMG signal to one of the partitions relies, I think, on the synergy theory they invoke. The reason is that it assumes that the EMG signal of this new movement is a combination of simpler movements (the ones they trained on). In addition, according to the synergy framework, each movement has a unique EMG pattern.</li> </ul> Xu, Zheng and Hu (2021) - Estimation of Joint Kinematics and Fingertip Forces using Motorneuron Firing Activities.
A Preliminary Report 2021-07-20T00:00:00+00:00 http://hyde.getpoole.com/lit_review/2021/07/20/Hu-estimation <p>The paper can be found <a href="https://ieeexplore.ieee.org/document/9441433">here</a>.</p> <h2 id="why-am-i-reading-this-conference-paper">Why am I reading this conference paper?</h2> <p>To better understand how they use decomposition techniques (i.e. motorneuron firing patterns) to design a continuous controller.</p> <hr /> <h2 id="motivation">Motivation</h2> <p>There are two main approaches to drive robotic devices:</p> <ol> <li><strong>Pattern recognition approaches</strong>: classification-based; limited to discrete states of user intent.</li> <li><strong>Continuous approaches</strong>: regression-based (e.g. a linear or quadratic regressor); used to “continuously” estimate joint kinematics and forces (for instance). The problem with these approaches in the context of EMG is that they are unstable over time: they rely on the amplitude of the EMG signals as input features, which deteriorates over time (due to amplitude drift and electrode shift).</li> </ol> <p>This motivates the use of motor unit action potentials (MUAPs) to estimate the neural drive instead of relying on the EMG signals themselves (given that EMG signals “intrinsically comprise MUAPs”). MUAPs therefore provide a more stable basis for regression. Once the separation matrix is learnt, it can be directly applied to new HD-EMG signals.</p> <hr /> <h2 id="main-contribution">Main Contribution</h2> <p>A method for MU decomposition and validation, and for the estimation of a finger joint angle in a dynamic task and of fingertip force in an isometric task. The novelty here lies, in my opinion 😀, in the cross-trial validation of the MUs in a preliminary study (validating the obtained MUs on a second trial before actually testing them).</p> <hr /> <h2 id="methods">Methods</h2> <ul> <li>3 healthy participants</li> <li>HD-EMG data acquired with an 8x16 electrode array.
Signals sampled at 2048 Hz and band-pass filtered between 10 and 900 Hz <ul> <li>A load cell was used to record the finger forces, and angle sensors for the joint angle recordings.</li> </ul> </li> <li>2 tasks were performed: an isometric finger flexion force task (index and middle finger) and a dynamic finger movement task (flex and release repetitively)</li> </ul> <h3 id="mu-identification">MU identification</h3> <ul> <li>The decomposition used is based on <strong>fast independent component analysis (FastICA)</strong>.</li> <li>The algorithm is carried out <strong>offline</strong>. The goal of the decomposition is to separate or resolve the (spatiotemporally) superimposed MUAPs into individual MU activities. This is achieved through the following steps: <ol> <li>Extending the raw signal (by adding R delayed replicas of the original signal in each channel)</li> <li>Whitening the extended channels</li> <li>Getting the decomposed signal sources using a fixed-point iteration algorithm. This step yields the separation vectors.</li> <li>Multiplying the EMG by the separation matrix to obtain the decomposed source signals (MU spike trains).</li> </ol> </li> </ul> <h3 id="mu-validation">MU validation</h3> <p>To validate the decomposition, the separation matrix is computed from one trial and then applied to a second trial before being used on a third, testing trial:</p> <ul> <li>By applying the separation matrix to the second trial, they obtain a spike array of size $M_i \times K_j$; $M$ is the length of the EMG in the second trial and $K$ is the number of MUs decomposed from the first trial.</li> <li>Run a regression analysis between the firing frequency of the spike trains and the measured joint angle/isometric force.
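That per-MU regression screen could be sketched as follows (hypothetical array names; I assume the binary spike trains have already been smoothed into firing-rate estimates):

```python
import numpy as np

def rank_mus_by_r2(firing_rates, target, keep=10):
    """Rank decomposed MUs by how well their firing rate tracks the task.

    firing_rates: (n_samples, n_mus) smoothed firing-rate estimates from
    applying the trial-1 separation matrix to trial-2 EMG;
    target: (n_samples,) measured joint angle or isometric force.
    Returns (indices of the retained MUs, their R^2 values).
    """
    r = np.array([np.corrcoef(firing_rates[:, k], target)[0, 1]
                  for k in range(firing_rates.shape[1])])
    r2 = r ** 2                        # simple linear-regression R^2
    order = np.argsort(r2)[::-1][:keep]
    return order, r2[order]

# Toy check: one "MU" tracks the force profile, the other is pure noise.
rng = np.random.default_rng(0)
force = np.sin(np.linspace(0, 3, 500))
rates = np.column_stack([force + 0.1 * rng.standard_normal(500),
                         rng.standard_normal(500)])
best, scores = rank_mus_by_r2(rates, force, keep=1)
```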
The output of this analysis is $K_j$ $R^2$ values, quantifying how well each MU obtained from the first trial estimates the motor task on the validation trial.</li> <li>Retain the top 10 MUs with the strongest association to the motor task.</li> </ul> <hr /> <h2 id="results">Results</h2> <ul> <li>Estimating joint angle and isometric force from the neural drive (MUs) is more accurate than from the EMG amplitude; it showed a smaller RMSE.</li> <li>For joint angle estimation, the EMG amplitude error is around <strong>20 degrees</strong> compared to <strong>8 degrees</strong> for the MU approach during dynamic movement. <img src="/lit_review/figures/Hu_neuraldrive_vs_amp.png" alt="neural_drive_results" /> <img src="/lit_review/figures/Hu_MUs_neural_drive.png" alt="neural_drive_MU" /></li> </ul> <hr /> <h2 id="limitations">Limitations</h2> <ul> <li> <p>The decomposition is computationally intensive and is therefore not practical in real time. However, a solution would be to obtain the MUs offline in an initialization period and then use the separation matrix in real time. The underlying assumption is that a common set of MUs is recruited across different muscle activations (i.e. the MUs form a common input to the muscles).</p> </li> <li> <p>The decomposition was performed separately for each task. Is it feasible to estimate the neural drive (MU firing frequency) directly using neural network approaches? Interesting examples of ANN approaches for a generic decomposition to look at:</p> <ul> <li>Paper by Farina et al., <em>Deep Learning For Robust Decomposition of High-Density Surface EMG Signals</em>, (2020).</li> <li>Paper by Hu et al., <em>Real-time finger force prediction via parallel convolutional neural networks: a preliminary study</em>, (2020).</li> </ul> </li> </ul> <hr /> <h2 id="final-thoughts">Final Thoughts</h2> <ul> <li> <p>Overall, I think it is a nice paper.
I didn’t yet get why they chose ICA in the first place as opposed to other decomposition techniques, but I guess this is related to their earlier paper on <em>Independent Component Analysis Based Algorithms for High-Density EMG Decomposition, Dai &amp; Hu (2019)</em>.</p> </li> <li> <p>One minor comment on the decomposition of EMG for the dynamic task: as the authors stated, extracting the MUs is challenging for the dynamic task, but why is this the case?</p> </li> </ul> Illing, Gerstner & Brea (2019) - Biologically Plausible Deep Learning - but how far can we go with shallow networks? 2021-06-28T00:00:00+00:00 http://hyde.getpoole.com/lit_review/2021/06/28/Illing <p>You can find the paper <a href="https://www.sciencedirect.com/science/article/pii/S0893608019301741#appE">here</a>.</p> <h2 id="why-am-i-reading-this-paper">Why am I reading this paper?</h2> <p>Stumbled upon this paper while researching how the delta rule is actually implemented in SNNs: e.g. which error signal do we consider? How do we summarize the activity of the output layer over time into a single value to compute the loss?</p> <hr /> <h2 id="motivation">Motivation</h2> <ul> <li> <p>We (the neuroscience/neuromorphic community 😀) care about the biological plausibility of SNN learning rules mainly because of the current advances in neuromorphic hardware. People are looking for learning rules that can be implemented online, on chip…which leads us to local learning rules.</p> </li> <li> <p>The question then becomes <strong>how far can we go with a simple model composed of one hidden layer in which only the readout layer is locally trained?</strong> I find it a particularly interesting question because the performance of such a simple model can be used as a benchmark against the more elaborate models out there.
In a way, it can help draw a baseline on what SNNs can do without any complex training procedures or architectures.</p> </li> <li>So what is really <em>out there</em> for biologically-plausible training of SNNs? I.e. what are the alternatives to backprop for supervised training of multi-layer networks? Broadly, there are two possibilities. In both cases, only the output (aka readout) layer is trained in a supervised fashion, usually with a delta rule. We can either: <ol> <li>Fix the weights of the first layer to random values: e.g. an extreme learning machine</li> <li>Train the first layer(s) with unsupervised learning, which is particularly appealing because it can be implemented with local learning rules</li> </ol> </li> <li>For completeness, more elaborate SNN models available in the literature include Conv layers with weight sharing, backprop approximations, multiple hidden layers, dendritic neurons, recurrence, and conversion from rates to spikes.</li> </ul> <hr /> <h2 id="main-contribution">Main Contribution</h2> <p>An interesting comparison study contrasting a simple yet promising model trained with a biologically plausible local learning rule against deep learning models trained with backprop.</p> <p>The main question the authors try to address is the following:</p> <blockquote> <p>Given a single hidden layer network and a biologically-plausible, spike-based, local learning rule, how well can we perform on standard classification tasks compared to rate-based networks trained with backprop?</p> </blockquote> <ul> <li>The <strong>main outcome of this paper</strong> is that even a simple model trained with online stochastic GD (no minibatch) and a constant learning rate demonstrates results comparable to more “complicated” (aka elaborate) approaches.</li> </ul> <hr /> <h2 id="methods">Methods</h2> <ul> <li>The study compares different network topologies and training procedures.
All the studied networks consist of a single hidden layer: <ul> <li>Network trained with backprop [<strong>rate-based</strong>]</li> <li>Network with fixed random projections (RP) or random Gabor filters (RG) [<strong>rate-based</strong> and <strong>spike-based</strong>]</li> <li>Network trained with unsupervised learning in the hidden layer (e.g. PCA, ICA, Sparse Coding (SC)) [<strong>rate-based</strong>]</li> <li>Network with sparse connectivity between the input and hidden layers: either localized connectivity, where the input-to-hidden connections are not fully connected, or localized populations, in which case the hidden layer is composed of independent populations that share the same receptive field but compete with each other. This is to force the hidden units to learn different features. [<strong>rate-based</strong> and <strong>spike-based</strong>] <ul> <li>The localized receptive field is a similar concept to CNNs: a patch spanning p x p pixels of the input image is connected to particular neurons of the hidden layer.</li> </ul> </li> <li>A simple perceptron (SP) without a hidden layer, i.e. direct classification of the input, serving as a lower bound on performance [<strong>rate-based</strong>]</li> </ul> </li> <li>The performance of these networks is compared on the MNIST task.</li> </ul> <p><img src="/lit_review/figures/illing_networks.png" alt="simulated_networks" /></p> <h3 id="local-supervised-learning-rule">Local Supervised Learning Rule</h3> <p>The supervised learning rule used for the SNNs is a delta rule implemented via STDP:</p> <ul> <li>The error signal is computed as the difference between a post-synaptic spike trace $tr_i(t)$ and a post-synaptic target trace $tgt_i(t)$.</li> <li>Readout weights $w_2$ are updated at every pre-synaptic spike time $t_j^f$.</li> <li>The target trace $tgt_i(t)$ is constant and predefined.</li> <li> <p>The post-synaptic trace $tr_i(t)$ is updated at every post-synaptic spike time $t_i^f$.</p> <p>$\tau_{tr} \frac{dtr_i(t)}{dt} = -tr_i(t) + \sum_f \delta (t-t_i^f)$</p> <p>$\Delta w_{2,ij} = \alpha \, (tgt_i^{post}(t) - tr_i^{post}(t)) \, \delta (t-t_j^f)$</p> </li> </ul> <hr /> <h2 id="results">Results</h2> <ul> <li> <p>Unsupervised methods and random feature projections perform better when paired with localized connectivity than with a fully-connected topology.</p> </li> <li> <p>Random fixed input weights / Gabor filters perform better than unsupervised learning for the input layer. In other words, unsupervised learning does not add a performance advantage.</p> </li> <li> <p>Using localized random projections and localized Gabor filters reaches &gt;98% accuracy on the MNIST dataset, which is comparable to other biologically plausible DL models</p> </li> </ul> <p>A summary of network performance on MNIST for biologically plausible DL models can be found in the table below:</p> <p><img src="/lit_review/figures/MNIST_benchmark.png" alt="summary" /></p> Frémaux & Gerstner (2016) - Neuromodulated STDP, and Theory of Three-Factor Learning Rules 2021-06-27T00:00:00+00:00 http://hyde.getpoole.com/lit_review/2021/06/27/Gerstner <p>The paper can be found <a href="https://www.frontiersin.org/articles/10.3389/fncir.2015.00085/full">here</a>.</p> <h2 id="why-am-i-reading-this-review-paper">Why am I reading this review paper?</h2> <p>While researching supervised learning rules for SNNs, one cannot escape the 3-factor learning rule 😀. This quite recent review paper nicely summarizes the influence of neuromodulation on STDP and provides experimental evidence on neuromodulation in the context of synaptic plasticity. It presents a general framework for 3-factor learning rules.</p> <hr /> <h2 id="motivation">Motivation</h2> <ul> <li> <p>Behavioral learning and memory formation are closely linked to synaptic plasticity.
It is thanks to those changes in connection strength between neurons that we are able to learn new tasks or recognize novel items in our environment.</p> </li> <li> <p>Classical Hebbian learning (focusing on the joint activity of pre- and postsynaptic neurons) is limited to only a sub-category of problems, i.e. unsupervised learning tasks. Although STDP (as a typical example of Hebbian learning) is indeed one of the main driving forces of synaptic plasticity, it fails, by design, to consider scenarios where “reward” or “novelty” come into play.</p> </li> <li> <p>Experimental results suggest that information about “reward” (success) or stimulus “novelty” is conveyed by neuromodulators (e.g. dopamine, acetylcholine, noradrenaline, serotonin,…) which act as “gates” to Hebbian plasticity.</p> </li> </ul> <hr /> <h2 id="framework-for-3-factor-learning-rule">Framework for 3-factor learning rule</h2> <ul> <li> <p>Classical Hebbian learning can be described by $\dot w = H(pre,post)$, where $\dot w$ is the synaptic weight change and $H$ is an arbitrary function of the pre- and postsynaptic activity.</p> </li> <li> <p>Neo-Hebbian learning then takes the form $\dot w = F(M, pre,post)$</p> </li> <li> <p>In this equation, $M$ is the modulator signal, sometimes also called the <strong>global signal</strong>. The <em>pre</em> and <em>post</em> terms are <strong>local variables</strong>, as they convey information about specific neurons (pre and post). $F$ is a function which determines the exact type of learning.</p> </li> <li> <p>Note that the choice of the $M$ term can lead to different variants of the 3-factor learning rule depending on the task (the neuromodulator term $M$ can hence take a different form/role).
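To make the gating role of $M$ concrete, here is a toy discrete-time simulation of a three-factor update with an eligibility trace: a pre-post coincidence only "flags" the synapse, and the actual weight change waits until the modulator arrives. This is my own illustration of the general framework, not a model from the review:

```python
import numpy as np

def three_factor_step(w, e, pre, post, M, lr=0.1, tau_e=20.0, dt=1.0):
    """One update of a neo-Hebbian three-factor rule, dw/dt = lr * M * e.

    e is the eligibility trace: a decaying local memory (time constant
    tau_e) of recent pre-post coincidences stored at the synapse.
    pre, post: 0/1 spike indicators; M: global modulator signal.
    """
    e += dt * (-e / tau_e + pre * post)   # flag Hebbian coincidences
    w += dt * lr * M * e                  # consolidate only when M != 0
    return w, e

w, e = 0.0, 0.0
# Coincident pre-post activity at t=0, modulator delivered 10 steps later.
w, e = three_factor_step(w, e, pre=1, post=1, M=0.0)
w_before_reward = w                       # Hebbian event alone: no change
for _ in range(9):
    w, e = three_factor_step(w, e, pre=0, post=0, M=0.0)
w, e = three_factor_step(w, e, pre=0, post=0, M=1.0)
```

Because the trace has not fully decayed when $M$ arrives, the delayed modulator can still credit the earlier coincidence, which is exactly how the temporal gap between activity and reward is bridged.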
In the context of reward-driven learning models or reward-modulated learning:</p> <ol> <li> <p>For policy gradient models, there is the <strong>R-max</strong> rule, which is derived from reward maximization.</p> <ul> <li>Synapses form an eligibility trace, a form of transient memory of pre-post (Hebbian) coincidences (which is stored at the location of the synapse, i.e. locally). This memory decays exponentially over time with time constant $\tau_e$. The choice of the time constant can then be used to bridge the temporal gap between the neural activity and the reward signal (a reward signal might come much later, after the agent has already taken a decision). An effective change of synaptic weight only happens when a neuromodulatory signal $M$ is available. In other words, the synapses are first “marked/flagged” and then updated when $M$ comes in.</li> </ul> </li> <li> <p>R-STDP: modulates the standard STDP by a reward term</p> </li> <li> <p>Temporal-difference STDP (TD-STDP): arising from a reinforcement learning paradigm. On a very high level (mainly because that’s not my field of expertise 😅), TD learning is concerned with predicting the value of each environment state so as to choose an optimal policy that would lead to the state with the highest value.</p> </li> </ol> </li> </ul> <p><img src="/lit_review/figures/reward-modulated_lr.png" alt="reward_modulated_lr" /></p> <p><img src="/lit_review/figures/role_M.png" alt="role_M" /></p> Huh & Sejnowski (2018) - Gradient Descent for Spiking Neural Networks 2021-06-25T00:00:00+00:00 http://hyde.getpoole.com/lit_review/2021/06/25/Sejnowski <p>You can find the paper <a href="https://arxiv.org/abs/1706.04698">here</a>.</p> <h2 id="why-am-i-reading-this-paper">Why am I reading this paper?</h2> <p>I came across this one while researching stochastic vs batch training for SNNs.
My aim is to better understand how to accumulate the gradients and whether there is a prominent difference between accumulating gradients in rate-based networks (ANNs) vs. spiking nets.</p> <hr /> <h2 id="motivation">Motivation</h2> <p>One of the main limitations of SNNs is that we don’t know how to train them, and therefore we are unable to reap the benefits of spike-based computations.</p> <p>ANNs can be seen as a sub-category of SNNs. In fact, an SNN can be reduced to a rate-based network in the “high firing-rate limit” (quoting the authors).</p> <p>One interesting question brought up in the paper: what kind of computation could we unlock if we managed to train SNNs? What kind of computation becomes possible then?</p> <hr /> <h2 id="main-contribution">Main Contribution</h2> <p>A novel approach for training SNNs that is not tied to specific neuron models, network architectures, loss functions or a particular task; one revolving around the formulation of a differentiable SNN for which the gradient can be derived.</p> <p><strong>Note</strong>: The goal is not to have a biologically-plausible learning rule but to derive an efficient learning method. Once SNNs are trained, we can then try to analyze this Pandora’s box to reveal computational processes of the brain or come up with other algorithms that are hardware-friendly.</p> <hr /> <h2 id="previous-attempts-to-supervised-learning-in-snns">Previous Attempts at Supervised Learning in SNNs</h2> <ul> <li>Spike Response Models (SRMs) can simulate SNN dynamics without the need for numerical integration.
They rely on impulse response kernels to describe the network behavior.</li> <li>As SRMs rely on the spike times (the state variables of SNNs), accurately calculating the derivatives of these spike times to obtain an update rule becomes challenging (simply because they are not differentiable :D ).</li> <li><strong>SpikeProp</strong> - “Error-backpropagation in temporally encoded networks of spiking neurons”, (2002) - can be used to train feedforward networks of neurons firing a single spike. Other variants extended the algorithm to allow multiple spikes in the input and hidden layers, yet only the first output spike is considered for error propagation.</li> <li>Other algorithms that seem to work with variable spike counts have shortcomings as well: <ol> <li>They can train only an output/readout layer for which a desired target output pattern can be supplied.</li> <li>Usually the suggested learning rules are neuron-model-agnostic.</li> <li>The loss functions used penalize the error between the actual and a desired target spike train, which in practice is not frequently available.</li> </ol> </li> <li> <p>Biologically-plausible rules based on Hebb’s postulate, combining Hebbian learning and gradient descent, such as STDP (e.g. <strong>ReSuMe</strong> - “Supervised learning in spiking neural networks with resume: sequence learning, classification, and spike shifting”, (2010)) and reward-modulated STDP, do not in general guarantee convergence? (not sure what to think about that though 😅)</p> </li> <li>Converting trained ANN models into spiking models does not help explore new computational solutions that utilize spike times.
These approaches are therefore limited in that sense.</li> </ul> <hr /> <h2 id="methods">Methods</h2> <h3 id="a-differentiable-formulation-of-snn">A differentiable formulation of SNN</h3> <ul> <li>In this paper, the authors rely on the synaptic current dynamics.</li> <li>By using a <strong>differentiable current synapse model</strong>, the authors get around the non-differentiability problem of SNNs. The right question at this point is how do they do that? :)</li> </ul> <blockquote> <p>Most models describe the synaptic current dynamics as a linear filter process which instantly activates when the presynaptic membrane voltage $v$ crosses a threshold.</p> </blockquote> <ul> <li> <p>This means that unless the membrane voltage crosses the threshold, we don’t get a synaptic response; an all-or-none response.</p> </li> <li> <p>What is done differently in this paper is that they replace this threshold function with a non-negative <strong>gate function</strong>. This minor twist is quite helpful as it makes the synaptic current response <strong>change gradually</strong> throughout what they refer to as a <em>small active zone</em>, an area around the threshold (as opposed to a hard threshold). In other words, when the membrane voltage falls within this region, the synaptic current strength changes <strong>non-abruptly</strong>.</p> </li> <li> <p>We can hence summarize the cases as:</p> <ul> <li>Within the active zone, the membrane voltage induces a graded synaptic response.</li> <li>Beyond this small region, the post-synaptic potential has a constant charge (area under the curve).</li> <li>Below the active zone, no synaptic response is generated.</li> </ul> </li> </ul> <p>which is equivalent to a soft-clipping function.
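A piecewise-linear toy version of such a gate — zero below the active zone, graded within it, saturated above — might look like this (my sketch; the paper's exact gate function may differ):

```python
import numpy as np

def gate(v, v_th=1.0, half_width=0.1):
    """Soft-clipping gate replacing the hard spike threshold.

    Below the active zone [v_th - half_width, v_th + half_width] the
    synaptic drive is 0; within it, it ramps up linearly; above it,
    it saturates at 1. Differentiable almost everywhere, so gradients
    can flow through near-threshold membrane voltages.
    """
    return np.clip((v - (v_th - half_width)) / (2 * half_width), 0.0, 1.0)

# Sample the gate over a voltage range: graded only near the threshold.
v = np.linspace(0.0, 2.0, 5)
drive = gate(v)
```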
<img src="/lit_review/figures/diff_synapse_model.png" alt="Synaptic_model" /></p> <h3 id="define-an-entire-network-around-this-new-formulation">Define an entire network around this new formulation</h3> <p>Now that we have a differentiable synaptic current model, we can formulate the network input-output dynamics around it.</p> <ul> <li>Network input: defined by the input current vector, $\vec{i}$.</li> <li>Network state: fully described by the membrane voltages and synaptic currents, $\vec{v}(t)$, $\vec{s}(t)$.</li> <li>Network output: defined as a linear readout of the synaptic currents, $\vec{s}(t)$. <img src="/lit_review/figures/diff_model.png" alt="Network_model" /></li> </ul> <p>The next step is to optimize the weight matrices for the input layer, recurrent connections and readout ($U$, $W$ and $O$), which can now be accomplished by gradient descent.</p> <h3 id="gradient-descent">Gradient Descent</h3> <p>Without going through all the math (partly because I don’t like equations that much and I don’t fully grasp them for now 🙈), the weight matrix optimization problem is solved through the backward dynamics of <strong>adjoint state variables</strong>, i.e. Pontryagin’s minimum principle.</p> <p>At the end, we get a formula for the gradient that resembles reward-modulated STDP: a presynaptic input is multiplied by a postsynaptic spike activity and a temporal error signal.</p> <hr /> <h2 id="results">Results</h2> <p>This approach is tested on a predictive coding task (i.e. matching (reproducing) an input-output behavior, an auto-encoding task) and a delayed-memory XOR task, where the XOR operation is performed on stored historical inputs.</p> <p>The latter task is used to demonstrate the ability of the algorithm to train an SNN performing non-linear computations over timescales longer than the neuronal time constants.</p>
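To make the delayed-memory XOR task concrete, here is a toy trial generator in the spirit of that description (my own construction — the paper's exact task protocol, pulse encoding and timings may differ): two input pulses arrive at different times, and the target output, queried later, is their XOR.

```python
import numpy as np

def delayed_xor_trial(bit_a, bit_b, T=100, t_a=10, t_b=40, t_query=80, width=5):
    """Build input currents and target for one delayed-memory XOR trial.

    Two input channels receive a pulse encoding bit_a and bit_b (assumed
    encoding: +1 for a one-bit, -1 for a zero-bit) at times t_a and t_b;
    the target asks for XOR(bit_a, bit_b) at t_query, long after the
    inputs are gone, so the network must store them internally.
    """
    x = np.zeros((T, 2))
    x[t_a:t_a + width, 0] = 1.0 if bit_a else -1.0
    x[t_b:t_b + width, 1] = 1.0 if bit_b else -1.0
    y = np.zeros(T)
    y[t_query:t_query + width] = float(bit_a ^ bit_b)
    return x, y

x, y = delayed_xor_trial(1, 0)
```

Because the query time exceeds the membrane and synaptic time constants, solving this task is evidence of computation over longer timescales, which is the point the authors use it to make.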