SpineRNN – part I – This was clearer in my mind

In the last post, I showed you a picture of a vertebra and claimed it had given me an idea.
Highly motivated, I sat down in front of a copy of a nice, lightweight char-rnn script that I’ve customized and that you can download here. You don’t need to speak Chinese. If you want to try it as is, run it with a document as its argument, and make sure the data in your document starts with ^, or edit the script.


Then I quickly realized: “without an explicit model, this will be a nightmare”.

So here’s the initial picture. As you might know, input and output paths in the brain are different. Let’s pretend the brain is a black box (well, it is). I pictured two fibers: data is convolved along the ascending fiber and deconvolved once inside our “black box”, then the response is convolved and deconvolved again along the descending fiber (mux/demux behaviour, in a fuzzy way). Our input flow is the ascending fiber, obviously. Between the two fibers sits a simpler node, in green: the reflex circuit for the left side of this part of the body. That’s where my mind lit up!


Have you ever made an automaton?

Today, most of our automatons are Petri nets designed on the Moore model. But there are still automatons that rely solely on a combination of their inputs, without any internal state. And that’s the part we are interested in for a reflex circuit, obviously: having an instant reaction. Here, the problem is finding the right combination, which a vanilla MLP could do. We would then extract the combinational patterns (static data) and pass the rest on for dynamic processing, which is what RNNs do best.
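To make the difference between "no internal state" and "internal state" concrete, here is a minimal sketch. The names `reflex` and `MooreToggle`, and the sensors they use, are hypothetical illustrations, not part of the model or of any script mentioned in this post.

```python
# Hypothetical illustration: a stateless "reflex" versus a Moore machine.

def reflex(heat: bool, pressure: bool) -> bool:
    """Combinational logic: the output depends only on the current inputs."""
    return heat or pressure  # react if either sensor fires


class MooreToggle:
    """Moore machine: the output is read from the internal state only."""

    def __init__(self) -> None:
        self.state = 0  # 0 = off, 1 = on

    def step(self, pressed: bool) -> int:
        if pressed:
            self.state = 1 - self.state  # the input only changes the state
        return self.state  # the output comes from the state


# Same input twice, same answer twice.
reflex_outputs = [reflex(True, False), reflex(True, False)]

# Same input twice, different answers, because the state remembers the past.
toggle = MooreToggle()
toggle_outputs = [toggle.step(True), toggle.step(True)]
```

The first kind of behaviour is what a plain feedforward network can capture; the second needs something that carries state from one step to the next.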



So here’s my current idea of what a SpineRNN could look like (or neural fiber? The way it runs along a given length of a sequence looks a bit like a cable… and could it maybe serve as the external memory of a Mealy machine?).


The general idea is that you can map a vertebra to a general sequence-fed machine. Here, we use time as the sequence axis, since any sequence unfolds in time.
We assume that the feedforward pass through the MLP is so quick that it produces a response within T0, the initial duration. During each period, the architecture is locked by the period boundaries, drawn as wavy gray lines. Any data that reaches a boundary waits for the next clock pulse.

The MLP bridge extracts the combinational knowledge and adds it to the RNN’s prediction to form the output. As the MLP gets better at predicting the part of the output that can be determined from the inputs alone, the gradient should, supposedly, push the RNN to predict only the sequential behaviour and therefore to be more accurate.
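A minimal NumPy sketch of one time step of this idea, under loud assumptions: the recurrent path is a vanilla RNN (not a GRU), the bridge is a one-hidden-layer MLP, "adding" means summing the two sets of logits before a softmax, and all sizes and weight names are hypothetical. It only shows the forward pass, not training.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 8, 16, 8  # hypothetical sizes

# Recurrent path (vanilla RNN for brevity): carries state across steps.
W_xh = rng.normal(0.0, 0.1, (n_hid, n_in))
W_hh = rng.normal(0.0, 0.1, (n_hid, n_hid))
W_hy = rng.normal(0.0, 0.1, (n_out, n_hid))

# MLP bridge: sees only the current input and keeps no state.
W_b1 = rng.normal(0.0, 0.1, (n_hid, n_in))
W_b2 = rng.normal(0.0, 0.1, (n_out, n_hid))


def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()


def step(x, h):
    """One time step: the output mixes a stateful and a stateless prediction."""
    h_new = np.tanh(W_xh @ x + W_hh @ h)       # sequential part
    rnn_logits = W_hy @ h_new
    bridge_logits = W_b2 @ np.tanh(W_b1 @ x)   # combinational part
    y = softmax(rnn_logits + bridge_logits)    # assumption: sum of logits
    return y, h_new


h = np.zeros(n_hid)
x = np.eye(n_in)[3]  # one-hot input for symbol 3
y, h = step(x, h)
```

Because both paths feed the same loss, whatever the bridge already explains from the current input leaves less error for the recurrent path to absorb. Whether that really makes the RNN specialize in sequential behaviour is exactly the hypothesis to test.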

But we feed in sequential data whose rules differ according to their length (parentheses don’t have the same encoding needs as spaces), so we repeat the MLP bridge recursively for each period, building rules according to length.



Deconvolve static and dynamic from our data?

That’s the main question I’m wondering about now. Unfortunately, implementing it is taking me time, as I’m not yet at ease with NumPy and Theano. What would the data look like, or be about, once we had extracted all its sequential and combinational properties?

I also found this paper on Clockwork RNN, which explains that we can chain fully connected clusters of neurons running at different periods, provided the cluster a connection comes from has a larger period than the cluster it points to. That wouldn’t be really useful for SpineRNN, as its connections go both ways, but if it could be applied one way or another, we could build functions that embed a Fourier-transform structure, or it could be interesting for teaching Time Interval Petri Nets.
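The connection rule described above can be sketched as a boolean mask over the recurrent weight matrix. This is an illustration under assumptions, not the paper's code: the periods and cluster size are hypothetical, `>=` is used so that each cluster also stays connected to itself, and the "a cluster updates when t is a multiple of its period" rule is the Clockwork RNN behaviour as I understand it.

```python
import numpy as np

periods = [1, 2, 4, 8]   # hypothetical cluster periods
units_per_module = 2     # hypothetical cluster size


def clockwork_mask(periods, units_per_module):
    """mask[row, col] is True when unit `col` may feed unit `row`.

    A connection is allowed only from a cluster with a larger (or equal)
    period to a cluster with a smaller (or equal) one.
    """
    u = units_per_module
    n = len(periods) * u
    mask = np.zeros((n, n), dtype=bool)
    for dst, t_dst in enumerate(periods):
        for src, t_src in enumerate(periods):
            if t_src >= t_dst:
                mask[dst * u:(dst + 1) * u, src * u:(src + 1) * u] = True
    return mask


def active_modules(t, periods):
    """A cluster is updated at step t only if t is a multiple of its period."""
    return [t % p == 0 for p in periods]


mask = clockwork_mask(periods, units_per_module)
active_at_4 = active_modules(4, periods)
```

Multiplying a recurrent weight matrix element-wise by `mask` would remove every fast-to-slow connection, which is the one-way structure that clashes with SpineRNN's two-way connections.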

And even if everything works out when implementing time-period-combination analysis in a nicer model, how would I generalize from a limited sample to an infinite domain? By putting a tan() at the end of my sigmoid?



Well, time to work! In my previous post I said I would give a bit more; unfortunately, the implementation is taking longer because of my limited knowledge of those libraries. I hope to be able to test ideas more quickly soon. I’m also trying to develop a special gate for the MLP bridge, loosely inspired by GRU. I’ve never done that, so I’m not sure what to expect from the reaction/processing/validation model I’m trying to design.

Maybe some visitors will be able to try the model and fix it faster than I can. If so, please share your results in the comments. It would be nice to compare and discuss them.

And if you find my idea terrible, please let me know why in the comments; I’m looking to improve.


(Hope someone enjoyed this first real post ^^)


First Step

Yup, so there it is…

When I first created this blog, I wanted it to be about the path towards a renewed approach to AI, based on an understanding less outdated than that of traditional neural networks. Some people are afraid of what these networks might do if they gained consciousness. Actually, they’re just statistical self-organizing machines. They seem smart because we also use patterns in our own cognition, and statistical machines are good at predicting patterns. When Mozart-RNN runs, it generates Mozart-like patterns of notes, but you are the one doing the work of hearing it as music.


That’s not intelligence, is it?

I spent 4 years moving between neural networks, neurology, psychology, math and my engineering background, obsessing over making sense of our own nervous system: trying to get a grip on the start of a line of thought, with models and ideas popping up and falling apart as new knowledge came in, simply hoping to make sense of it. And, actually, it’s fuzzily feasible, mostly because you can cut away a lot of “what the brain cannot do” and because, though you don’t get how or why it works, it has clear structures.


This blog was hopeless.

I then created this blog to make sense of all the weird things I discovered along the way. I’m a complexity guy: I look for it even where it doesn’t need to be. It’s sometimes absurd to force it, but finding complexity is always beautiful in one way and frustrating in another (that’s the very definition of addictive, isn’t it?).

And this couldn’t work. I wanted my posts to be a rational presentation of approaches, summarizing other ways to consider human cognition, sharing reflections from a community and pleading for a rethink of current AI… That’s always nice on paper.


Reality hurts more.

When you want to run a scientific-quality blog that tries to change how a large tech industry thinks, you have to not be me, I guess.

Every post I planned required more and more googling and more and more rethinking, to the point where I couldn’t finish any of them. Trying to introduce our conception of intelligence from antiquity to right before the invention of the perceptron is madness, as diverse cultures made a lot of predictions, some seemingly correct and some way off, about what we now consider scientifically true (aaah, Greek wax tablets).
So I gave up: this task is out of reach for a noob like me. I’m not a published PhD, just an engineering student with lots of interests and enthusiasm.


So yeah… Average but passionate!

I spent 4 years studying different approaches. I concluded not only that my first goal was out of reach (from both the technological and the knowledge points of view), but also that AI is more about structure and dynamics, as long as Turing (Babbage) completeness is guaranteed.
That discouraged me a bit, but I came back to neural networks with renewed interest. They might just be statistical classifiers, but they have interesting properties, and how far can we mimic ourselves with them?


Because they’re just parody systems.
While they converge on the best average solution, they can’t handle logic or abstract thinking, yet they produce patterns similar to those of what they model. Truly, they have huge potential for mimicking us, and that’s more interesting than you might think.
I considered the idea while reading this awesome blog post about char-rnn. The crazy idea is: don’t engineer a complex expert system. Take a book, grind it into batches and throw it at an RNN. If you tune it right, it’ll make you laugh.
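Here is a rough sketch of what "grinding a book into batches" can look like at character level. The placeholder text, the sequence length and the helper names `make_batches` and `one_hot` are all hypothetical; this is not the char-rnn code from the blog post mentioned above.

```python
import numpy as np

text = "^hello world, hello rnn"  # placeholder corpus, not the real book
seq_len = 5                       # hypothetical sequence length

# Vocabulary: one integer index per distinct character.
chars = sorted(set(text))
char_to_ix = {c: i for i, c in enumerate(chars)}


def make_batches(text, seq_len):
    """Yield (inputs, targets) index arrays; targets are inputs shifted by one."""
    for start in range(0, len(text) - seq_len, seq_len):
        chunk = text[start:start + seq_len + 1]
        ixs = np.array([char_to_ix[c] for c in chunk])
        yield ixs[:-1], ixs[1:]


def one_hot(ixs, vocab_size):
    """Turn a vector of character indices into one-hot rows."""
    out = np.zeros((len(ixs), vocab_size))
    out[np.arange(len(ixs)), ixs] = 1.0
    return out


batches = list(make_batches(text, seq_len))
first_inputs, first_targets = batches[0]
x = one_hot(first_inputs, len(chars))
```

Each pair asks the network to predict the next character from the current one, which is all the "understanding" a char-level RNN is ever trained for.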


Then I got interested again

My computer has now been reading The Law by Bastiat for 3 days. I’m pretty sure that if you don’t know French, the result will seem bearable (if you know French… désolé).

Qu’il aurait pessont, consodrel n’y exe térante de le reconne de paris, de la Liberté, qui voy noy notsour. Il faudrait dans les chases qu’entoit dépesse et le proivse tout visemant écaitens l’agarite, la Liberté, le procistrun se fiater le vousse, à companer son par la Fartes, de constitnans de silagront le cempunsit pour peupet, le demandance, la destrime dégendaniere dénliment léglisation ? Que, ce se cradu-tous per les guerr des letisentenes, des maisses des laissent, leurs tomonilues. (Bastiat-RNN)

Done on a GRU-RNN:

  • Input: 86
  • Hidden: 1000
  • Output: 500
  • Iteration: 850
  • Average loss: 43.38
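For readers who have never seen one, here is a minimal NumPy sketch of a single GRU step. It is not the script used for the run above: biases are omitted, the weights are random, and only the input size (86) is taken from the list; the hidden size of 32 and all the names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 86, 32  # 86 inputs as in the run above; 32 is a smaller, hypothetical hidden size


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init(shape):
    return rng.normal(0.0, 0.1, shape)


# One input matrix (W) and one recurrent matrix (U) per gate.
W_z, U_z = init((n_hid, n_in)), init((n_hid, n_hid))  # update gate
W_r, U_r = init((n_hid, n_in)), init((n_hid, n_hid))  # reset gate
W_c, U_c = init((n_hid, n_in)), init((n_hid, n_hid))  # candidate state


def gru_step(x, h):
    """One GRU step without biases: returns the new hidden state."""
    z = sigmoid(W_z @ x + U_z @ h)          # how much of the state to rewrite
    r = sigmoid(W_r @ x + U_r @ h)          # how much of the past to look at
    c = np.tanh(W_c @ x + U_c @ (r * h))    # candidate new state
    return (1.0 - z) * h + z * c            # blend old state and candidate


h = np.zeros(n_hid)
x = np.eye(n_in)[0]  # one-hot vector for the first character of the vocabulary
h = gru_step(x, h)
```

The two gates are what let a GRU keep information over more steps than a vanilla RNN.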


So it mimics French, and so what?

That’s sort of frustrating. It understands some elements of syntax and punctuation, but what it learns is so short-term or approximate that it doesn’t even get noun suffixes right. It just has an approximate idea of what French words and syntax could look like. I could shrink the hidden layer and run it a bit longer, but even the best-tuned RNNs don’t get logic and common sense.

That’s why I consider them parody machines, and why I got a bit disappointed about what I could expect from them. Maybe with a Petri net on top, in a top-down architecture? I wonder what sort of fuzzy network that would make. But how is that even useful?
My thesis is approaching as I enter my final year, and this seems a bit goofy to put forward. And I was still brooding over all this useless knowledge, all those nights spent working on basically nothing, just trying to get a grip on a subject way above me.


But I finally started this blog!

Yesterday, while the char-rnn was still running and I was still moody, I was absent-mindedly looking at a note on a picture I had found a few days earlier.
Then I considered it with regard to RNNs and… maybe it gave me a nice idea!


A post that long, and nothing?

Not exactly nothing. I needed to introduce myself, this blog, why I’m writing in it now, and the weird self-taught path I can bring to the discussion, and to make a commitment because, this time, I might have something feasible and useful that I’d like to make real.


There’s also something that intrigues me hugely in this model stuck between my two ears, and it’s a nice opportunity to play the scientist outside my student role, in the real world.

I hope to share it with you in the next blog post, and I’d be glad to get your feedback so we can build a deeper and more fruitful reflection over the life of this blog.

So… Welcome aboard 🙂