Last post, I showed you a picture of a vertebra, arguing that it gave me an idea.
Highly motivated, I sat down in front of a copy of a nice lightweight char-rnn script I’ve customized, which you can download here. You don’t need to speak Chinese. If you want to try it as-is, run it with a document as argument, and make sure the data in your document starts with ^, or edit the script.
Then I quickly realized: “without an explicit model, that’ll be a nightmare”.
So here’s the initial picture. As you may know, input and output paths in the brain are different. Let’s treat the brain as a black box (well, it is). I pictured it as two fibers: data gets convolved along the ascending fiber, deconvolved back inside the black box, then the response is convolved and deconvolved again along the descending fiber (mux/demux behaviour, in a fuzzy way). The ascending fiber is obviously our input flow. Between the two sits a simpler node, in green: the reflex circuit for the left side of this part of the body. That’s where my mind lit up!
Have you ever built an automaton?
Today, most of our automata are Petri nets designed around the Moore model. But there are still automata that rely solely on combinations of the inputs, with no internal state. And that’s the part we’re interested in for a reflex circuit: an instant reaction. Here, the problem is finding the right combination, which a vanilla MLP could do. We would then extract the combinatorial patterns (static data) and pass the rest on for dynamic treatment, which is what RNNs do best.
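To make that distinction concrete, here is a toy sketch of my own (not from any library): a purely combinational automaton, whose output is an instant function of the current inputs, next to a tiny Moore machine, whose output depends on an internal state.

```python
def combinational(inputs):
    """Stateless 'reflex': the output depends only on the current inputs."""
    a, b = inputs
    return a and not b


class Moore:
    """Toy Moore machine: the output is a function of the internal state,
    which the inputs update over time."""

    def __init__(self):
        self.state = 0

    def step(self, x):
        self.state = (self.state + x) % 2  # transition on input
        return self.state                  # Moore output: reads the state
```

Feeding the same input twice gives the same answer from `combinational`, but can give two different answers from `Moore.step` — that is exactly the static/dynamic split.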
So here’s my actual idea of what a SpineRNN could look like. (Or neural-fiber? The way it runs along a given length of a sequence looks a bit like a cable… and could maybe constitute the external memory of a Mealy machine?)
The general idea is that you can map a vertebra onto a generic sequence-fed machine. Here, we use time as the sequence axis, since any sequence unfolds in time.
We make the hypothesis that feedforwarding through the MLP is so quick that it produces a response within T0, the initial duration. During each period, the architecture is locked by the period boundaries, represented as wavy gray lines. Any data that reaches such a boundary waits for the next clock pulse.
The MLP bridge extracts the combinatorial knowledge, adds it to the prediction, and that gives the output. As the MLP gets better at predicting the part of the output that can be determined solely from the inputs, the gradient should push the RNN to predict only the sequential behaviour, and therefore become more accurate.
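The setup I have in mind could be sketched roughly like this — a NumPy toy where every name, shape and wiring choice is my own assumption, not a working SpineRNN: the MLP path answers instantly from the current input, the recurrent path carries the sequential part, and the two are summed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 8, 16, 8

# Hypothetical parameters: an MLP "bridge" plus a simple recurrent cell.
W_mlp = rng.normal(0, 0.1, (n_in, n_out))   # combinatorial (static) path
W_xh  = rng.normal(0, 0.1, (n_in, n_hid))   # input -> hidden
W_hh  = rng.normal(0, 0.1, (n_hid, n_hid))  # hidden -> hidden (sequential path)
W_hy  = rng.normal(0, 0.1, (n_hid, n_out))  # hidden -> output

def step(x, h):
    """One time step: the bridge handles what the inputs alone determine,
    the RNN handles what depends on the sequence so far."""
    static = x @ W_mlp                    # instant, reflex-like answer
    h = np.tanh(x @ W_xh + h @ W_hh)      # recurrent state update
    dynamic = h @ W_hy
    return static + dynamic, h            # bridge output added to the prediction

h = np.zeros(n_hid)
for t in range(5):
    x = rng.normal(size=n_in)
    y, h = step(x, h)
```

If training ever makes `static` carry everything the inputs determine, the gradient on `dynamic` should indeed only reflect sequential structure — that is the hope described above, not something this sketch proves.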
But since we feed in sequential data whose rules differ according to length (parentheses don’t have the same encoding needs as spaces), we repeat the MLP bridge recursively at every period, learning rules according to length.
Deconvolve the static and the dynamic from our data?
That’s the main question I’m wondering about now. Unfortunately, implementing it is taking me time, as I’m not yet at ease with NumPy and Theano. What would the data look like, or be about, if we extracted all its sequential and combinatorial properties?
I also found this paper on the Clockwork RNN, which explains that we can chain clusters of fully connected neurons running at different periods, provided the period of the cluster a connection comes from is larger than the period of the cluster it points to. That wouldn’t be directly useful for SpineRNN, since its connections go both ways, but if it could be applied one way or another, we could build functions that embed a Fourier-transform-like structure, or it could be interesting for teaching Time Interval Petri Nets.
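As I understand the Clockwork RNN paper, the update rule can be sketched like this — module sizes and periods below are my own toy choices: each module only updates when the step count is a multiple of its period, and the recurrent matrix is masked so a module only receives from modules with an equal or larger period (slower feeds faster, never the reverse).

```python
import numpy as np

rng = np.random.default_rng(1)
periods = [1, 2, 4, 8]   # one module per period, exponentially spaced
size = 4                 # neurons per module (toy value)
n = size * len(periods)

W_h = rng.normal(0, 0.1, (n, n))
W_x = rng.normal(0, 0.1, (n, 1))

# Mask: module i may only receive from module j if period(j) >= period(i).
mask = np.zeros((n, n))
for i, p_i in enumerate(periods):
    for j, p_j in enumerate(periods):
        if p_j >= p_i:
            mask[i*size:(i+1)*size, j*size:(j+1)*size] = 1
W_h *= mask

def step(t, x, h):
    h_new = np.tanh(W_h @ h + W_x @ x)
    for i, p in enumerate(periods):
        if t % p != 0:                           # module i sleeps this step,
            h_new[i*size:(i+1)*size] = h[i*size:(i+1)*size]  # state unchanged
    return h_new

h = np.zeros(n)
for t in range(8):
    h = step(t, np.ones(1), h)
```

At t = 0 every module fires; at t = 1 only the period-1 module does, and so on. The slowest module sees only itself, which is why this scheme is one-directional and would need adapting before it fits a both-ways SpineRNN.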
And even if everything works out when implementing time/period/combination analysis in a nicer model, how would I generalize a limited sample over an infinite domain? Putting a tan() at the end of my sigmoid?
Well, time to work! I said I would give a bit more in my previous post; unfortunately, the implementation is taking longer due to my limited knowledge of those libraries. I hope to be able to test ideas more quickly soon. I’m also trying to design a special gate for the MLP bridge, somewhat inspired by GRU. I’ve never done that before, so I’m not sure what to expect from the reaction/treatment/validation model I’m trying to design.
Maybe some visitors will be able to try the model and fix it faster than me. If so, please share your results in the comments. It would be nice to compare and discuss.
And if you find my idea terrible, please let me know why in the comments; I’m looking to improve.
(Hope someone enjoyed this first true post ^^ )