From Cynical to Reasonable

I started this journey five and a half years ago, I believe? Or was it six?

Early on, I was captivated by the IBM Watson showcase and, as I was looking for a moonshot project, the conclusion was easy: “I’ll be the one making the next significant step in AI.”

Well, I quickly figured out that those deep neural nets were just statistical classifiers. I started to look for a deeper truth: starting from neurobiology, then going erratically between math, philosophy, psychology and back to neurology… Well, I was just wandering around in search of a deeper truth that I could add to the stack of AI, with or without deep learners.

But math is a selective creation built on deeper truths; neurobiology is just description; psychology is built on shifting sand, which makes it hard to tell speculation from truth (e.g. the Stanford experiment) and to trust its metrics; neurology doesn’t know what to measure or what to account for, and is still looking for the most suitable tools; and philosophy has been solved by Ludwig Wittgenstein, in the manner of Gödel’s incompleteness.

Well, here is my conclusion: I have no large truth to seek; there’s no big answer or great principle. Everything is but a bunch of aggregated adaptable functions, with a quasi-deterministic architecture that gives everyone enough similarity to develop language and express ideas on common assumptions.

Within this paradigm, we can make an intelligence that understands humans. The real challenge of AGI is to copy an unknown architecture from a large variety of samples and a large space of possibilities.

Without this paradigm, we can make plenty of new intelligences, but with no hope of communicating efficiently with them and, therefore, of bending their behavior in a highly useful manner.


As I’ve seen so many great modern discoveries made by trying to explain the human brain (logic gates, truth tables, automata, the Turing machine, video game UI, etc.), I still took the journey to produce something great myself, while trying to rationalize the mind into a few principles matching my intellectual capacities (a.k.a. chasing the dragon).

I’ve produced many interesting reflections, but no great invention. I still try to see the peak of this problem, but I obviously cannot grasp it,
because the solution to this problem is encoded in a higher language, as complex, incompressible and diverse as it can be: this language is the structure and the dynamics of the brain itself.

We’re still at least one evolution away from grasping that problem: so far, no one can register a full brain architecture as a projection in each functional part of their own brain, and guarantee that those projections are sufficient to describe a brain.

So, yeah, I don’t really feel like solving that problem. Even with the help of tools like cloud quantum computers, I’m not the one who will ever deliver the solution to AGI, and that has never been the point of this blog.
But being a cynical person looking for some nice fish while hunting the mythical one is really no longer how I feel I should spend my time.

It finally clicked together after so many posts talking about a development platform: I won’t make AGI, but I can make the tools to empower great thinkers with ready-to-use simulations; to compare and advertise the best results and the people behind those results; to integrate widely with today’s electronics; to provide a gamified experience of what AGI development should be, with its wide possibilities and a sense of wonder; to create connections and a community mindset around the AGI question; and, even if we fail to make it, to have something to pass on to the next generation and keep the passion burning.


So there it is… I’m gonna put together this blog, and mostly a lot of personal notes, to try to produce this AGI platform. It is at a draft stage, so it’s still gonna take a long time before a first release.
Then again, the Wright brothers didn’t make it because they were the most brilliant engineers, but because they had a huge fan and strings to quickly test ideas.


Problem statement – Is the AI market like the space market?

I wanted to discuss a bit the product risk iteration path of my lean canvas, both to get more familiar with the tool and to better state some of the motivations behind it.

A simple analogy for the current AI market would be to compare it to the market for putting items into orbit.

You have the big professionals that have been there for a long time, paved the way from day 0 and are capable of producing large and expensive rockets, though they spend millions fighting the technical difficulties.

You have the small entrepreneurs, taking a segment of the market and trying to make a specific solution way better and cheaper than the big providers; some might even try to compete with them directly.

Then you have the amateurs: tips and tricks, online discussions; really small budgets but a lot of passion.

But the analogy breaks down on a decisive point: the metric is clear when you want to put something into orbit. It is not when you try to make AI.
A successful AI is measured on standard datasets; the expectations to be met are therefore the data analyst’s responsibility.
How could an AI multitask if it doesn’t even have the responsibility of meeting its own expectations?

Data is even more important than that.

What widens the gap between amateurs and professionals in rocket technology is the fuel cost. The more massive your rocket is, the more fuel you need to carry. The more fuel you have to carry, the more massive your rocket becomes. It is therefore really expensive for amateurs to put rockets in orbit.

In AI, data is the fuel. It needs to be diverse, realistic, adapted to the given case, capable of encompassing user behavior, labelled (vitally important if you do supervised learning), etc. But, most important of all, you need the computing power to train in a realistic time over those huge datasets and extract rules that are general enough.

A promising answer to the latter is the trend towards transfer learning. It will help take networks as complex as AlphaGo Zero, which require dedicated and expensive hardware to train, and make “low resolution” copies of them.

It’s a bit like this: if NASA greatly improves its rockets, amateurs will be able to create cheap, almost-as-good copies of them. It’s great for hobbyists, but it doesn’t propel innovation.
Couldn’t we find a way to enable modular and diverse AI? Something like embedded spaces as standards that can be spread and connected in diverse ways, a bit like we orchestrate Docker containers in modern applications.

How could we move from a rocket market to a fish and bread market?

This is quite a haunting question. At first, it seems idiotic, given the amount of data, expertise, computing power, and so on required to train a useful AI by today’s standards.

But, unlike rocket science, we can easily build tools that get us a bit closer to orbit. Though, as the measure of orbit is fuzzy in AI, so are the tools we use to get there.

It means there is no standard way to ship your product in AI, which is even harder for people who simply have a high-level business process they’d like to implement and that, at some point, requires face detection.

So what about the business perspective? To move away from a rocket market, we need to render large and specialized companies developing AI services obsolete.

One way could be to empower medium-sized companies to become just as efficient at providing AI services. AI tools, both community- and GAFAM-provided, are getting to a point where creating and training deep neural networks is trivial. Architecture, data analysis, datasets and KPIs are much more of a concern today.

This is still a challenge today, and it’s a lost cause to expect such things from mainstream users. Another approach is standardized pre-trained tools, like Facebook fastText or Google SyntaxNet Parsey McParseface.

Those are unspecialized steps towards orbit: just like Bootstrap for HTML, Spring for Java or Boost for C++, they provide you with already-trained tools to build on top of.
But could we make them something to keep building on top of? Could we make those tools abstract modules to be used in BPM development?

In fact, could we make those modules so simple and abstract that they become standard pieces of any development, and widen the territory of medium-sized companies and amateurs?

For my part, I’m also deeply curious about how far we can go with a vector representation: could we build a new kind of algebra that handles things far more complex than numbers built from the empty set?


An attempt to challenge Wittgenstein and Russell #2

The first approach seems doomed to inconsistency and failure, but discourse seems to be modelled without much difficulty with the SRC/TGT notation, which considers a source object and a target object in the emission of a communication.

In the end, communicating has a goal; perhaps it is in this form that we should rethink the act of communication, in order to better approach and study it?


Example detailing the depth of logic:

TGT = knows the proposition “have a drink” & has not answered yet ~= is still considering their answer
SRC = wishes to have the proposition “have a drink” accepted
SRC->TGT = considers the proposition “have a drink” as true & answers the question “what will you have?”
“What will you have to drink when we meet?”

Example detailing the subtlety of experience:

TGT = knows prior elements of context & considers a coherent interpretation of the propositions, both in themselves and in relation to one another
SRC = wishes to have a set of propositions defining a context accepted
SRC->TGT = considers the new propositions in the context as true
“And each separate dying ember wrought its ghost upon the floor” (in the French: “Et chaque tison, mourant isolé, ouvrageait son spectre sur le sol”)

Example detailing the level of compression:

TGT = knows the unspoken pronouns & considers how they relate to one another
SRC = wishes to have a dissonant proposition accepted
SRC->TGT = considers the dissonance in the proposition as true & laughs
“She is!” (in the French: “Elle l’est!”)

Depends on the person’s level of culture (internal knowledge),
on their thinking time (number of computations performed/focus)
and on their degree of interpretation (internal representations)

Example detailing the logical use of propositions:

TGT = considers the proposition “All men are mortal” as true & considers the proposition “Socrates is a man”
SRC = wishes to have the proposition “Socrates is a man” accepted
SRC->TGT = considers the proposition “Socrates is a man” as true & considers the proposition “Socrates is mortal” as true

Conclusion:

This model seems to lie beyond the rules of logic; it acts independently of how logical the propositions are and simply contextualizes discourse as a state transition of the target, caused by the source.
Let us therefore suppose that every communication aims to move a target from a given state to a desired state, the actor of this transition being the source of the communication.
A whole bunch of questions arise: how do we probe whether the target has validated the propositions? How do we capture the richness of experience? At what point does modus ponens come into the model?
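
To make the “state transition” reading a bit more concrete, here is a minimal Python sketch of the Socrates example. It is only an illustration, not an implementation of the SRC/TGT notation: the names `Target`, `Utterance`, `communicate` and `apply_modus_ponens` are hypothetical, and it assumes that the state of a target can be reduced to a set of accepted propositions, which throws away all the richness of experience discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """TGT: reduced here (an assumption) to the propositions it holds as true."""
    accepted: frozenset = frozenset()


@dataclass(frozen=True)
class Utterance:
    """What SRC emits: a sentence plus the propositions SRC wants accepted."""
    text: str
    goal: frozenset


def communicate(target, utterance):
    """SRC->TGT: the target moves to a state where the goal is accepted."""
    return Target(accepted=target.accepted | utterance.goal)


def apply_modus_ponens(target, rules):
    """Close the accepted set under rules given as (premises, conclusion)."""
    accepted = set(target.accepted)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= accepted and conclusion not in accepted:
                accepted.add(conclusion)
                changed = True
    return Target(accepted=frozenset(accepted))


# The Socrates example: the communication step and the logical step are separate.
tgt = Target(frozenset({"All men are mortal"}))
utt = Utterance("Socrates is a man", frozenset({"Socrates is a man"}))
rules = [(frozenset({"All men are mortal", "Socrates is a man"}), "Socrates is mortal")]

after_talk = communicate(tgt, utt)
after_logic = apply_modus_ponens(after_talk, rules)
```

In this sketch the communication itself never touches logic: `communicate` only changes what the target accepts, and modus ponens is a separate step applied to the new state. Where that step really belongs in the model is exactly the open question above.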


An attempt to challenge Wittgenstein and Russell #1

Wittgenstein’s foundations on the interpretation of Russell’s propositions do an injustice to the richness of language and of interpretation.

I’m opening a personal line of questioning on how to widen the approach to non-propositional sentences, to try to bring language modelling and natural language closer together.

Take a subset containing the following 3 sentence forms, including the proposition in the logical-atomist sense:

  • Proposition
  • Order
  • Question

Example detailing the depth of logic:

SRC = wishes to have the proposition “have a drink” accepted
TGT = knows the proposition “have a drink” & has not answered yet ~= is still considering their answer
“What will you have to drink when we meet?”
=> Get the proposition “we will meet” accepted
=> Obtain a more elementary answer
Countered by an internal resistance in the interlocutor; variant: “When are we meeting?”
Suggestion: possible lack of full awareness when making the commitment

Example detailing the subtlety of experience:

SRC = wishes to have a set of propositions defining a context accepted
TGT = knows prior elements of context & considers a coherent interpretation of the propositions, both in themselves and in relation to one another
“And each separate dying ember wrought its ghost upon the floor” (in the French: “Et chaque tison, mourant isolé, ouvrageait son spectre sur le sol”)
=> High degree of complexity to reproduce everything left unsaid and every interpretation to be made in this sentence
=> Requires a high-level language, with a rich experience of the world and of its representations, to attempt a representation
Depends on the person’s level of culture (internal knowledge), on their thinking time (number of computations performed/focus) and on their degree of interpretation (internal representations)
ForAll Ember => die.isolated(Ember) || Die ~= Destination & Source(Ember) = “fire|fireplace|hearth|..” & Destination(Ember).Neighbourhood() != Ember => Hence Hearth(Source) ~= Divergent
Ember.Neighbourhood().Shape() ~= Work(Its->Ghost) & “Work ~= Representation”

Example detailing the level of compression:

SRC = wishes to have a dissonant proposition accepted
TGT = knows the unspoken pronouns & considers how they relate to one another
“She is!” (in the French: “Elle l’est!”)
=> Resolve the context of the pronouns
=> Resolve the proposition after substitution
Strangely linked to humour; letting the person get a joke
Suggestion: humour is a phenomenon that protects against depression when faced with a seemingly illogical proposition that one has to accept


Not everything can be interpreted with the 3 forms suggested above. But that is not because the set is too small, a matter of quantity; it is a matter of their subtlety, of their capacity to detect highly abstract information.


Download TensorFlow 0.10.0 for Cuda 8.0 and Ubuntu 16.04

I’d really like to have a look at Parsey McParseface, as there seems to be some fuss around this Natural Language Understanding (NLU) machine from Google, based on SyntaxNet.

But, as I keep my packages up to date (a good practice for Debian users, maybe not that great on Ubuntu), I had trouble installing TensorFlow 0.10.0rc0 for Cuda 8.0 with gcc 5.4.0 and Python 2.

Why bother with Cuda 8.0 when version 7.5 works great from pip, you may wonder?
Cuda 7.5 is not available for Ubuntu 16.04 64-bit; strangely, 8.0 is. Every discussion among Xenial users seems to converge on building our own wheel from source.
After experiencing dependency troubles, I wanted to spare those of you who have the same system configuration the time of building it.

  • Be sure you have checked my configuration and have the same one (Python 2, Cuda 8, Ubuntu 16.04 x86_64)
  • Download the wheel here for TensorFlow 0.10.0rc0
  • # sudo pip install this_wheel_file
  • Get a beer, you deserve it!

(Of course, this is just a wheel. You still need to follow the earlier steps and requirements from the TensorFlow installation guide.)
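
If you want to double-check that a wheel matches the configuration listed above before installing it, here is a small Python sketch. It relies only on the standard wheel file naming convention (name-version-python tag-abi tag-platform tag). The helper names `parse_wheel_name` and `matches_config` are hypothetical, and the file name used in the example is an assumption for illustration, not necessarily the name of the wheel I built.

```python
def parse_wheel_name(filename):
    """Split a wheel file name into its tags (standard naming convention)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %s" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel name: %s" % filename)
    # The last three fields are always python tag, abi tag, platform tag.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def matches_config(filename, python_major=2, machine="x86_64"):
    """Rough check: CPython major version and CPU architecture only."""
    info = parse_wheel_name(filename)
    right_python = info["python"].startswith("cp%d" % python_major)
    right_machine = info["platform"].endswith(machine)
    return right_python and right_machine


# Hypothetical file name, for illustration only.
example = "tensorflow-0.10.0rc0-cp27-none-linux_x86_64.whl"
ok = matches_config(example)
```

This says nothing about Cuda or gcc, which are not encoded in the file name; those still have to be checked by hand.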

And for those interested in Parsey McParseface, check out this awesome video from Sirajology:


SpineRNN – part I – This was clearer in my mind

In my last post, I showed you the picture of a vertebra, arguing that it had given me an idea.
Highly motivated, I sat down in front of a copy of a nice, light char-rnn script that I’ve customized and that you can download here. You don’t need to speak Chinese. If you want to try it as it is, run it with a document as its argument, and either make sure the data in your document starts with ^ or edit the script.


Then I quickly realized: “without an explicit model, that’ll be a nightmare”.

So here’s the initial picture. As you might know, input and output paths in the brain are different. Let’s pretend the brain is a black box (well, it is). I pictured it as 2 fibers: data is convolved along the ascending fiber and deconvolved once inside our “black box”; the response is then convolved and deconvolved again along the descending fiber (mux/demux behaviour, in a fuzzy way). Our input flow is the ascending fiber, obviously. Between the two, a simpler node in green: the reflex circuit for the left side of this part of the body. That’s where my mind lit up!


Have you ever made an automaton?

Today, most of our automata are Petri nets designed on the Moore model. But there are still automata that rely solely on a combination of their inputs, without internal state. And that’s the part we are interested in for a reflex circuit: obviously, having an instant reaction. Here, the problem is finding the right combination, which a vanilla MLP could do. We would then extract the combinatorial patterns (static data) and pass the rest on for dynamic treatment, which is what RNNs do best.
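
For readers who have never built one, here is a minimal Python sketch of the two kinds of automaton mentioned above. The `reflex` function and the toggle machine are toy examples of my own, chosen only to show the difference: the first has no memory at all, while the output of the second depends only on its current internal state (the Moore model).

```python
def reflex(pain, heat):
    """Combinational logic: the output depends only on the current inputs."""
    return pain or heat  # withdraw as soon as any alarm input is on


class MooreMachine:
    """Moore model: the output is a function of the internal state only."""

    def __init__(self, transitions, outputs, start):
        self.transitions = transitions  # {(state, symbol): next_state}
        self.outputs = outputs          # {state: output}
        self.state = start

    def step(self, symbol):
        self.state = self.transitions[(self.state, symbol)]
        return self.outputs[self.state]


# Toy example: a light that toggles on symbol 1 and holds on symbol 0.
toggle = MooreMachine(
    transitions={
        ("off", 0): "off", ("off", 1): "on",
        ("on", 0): "on", ("on", 1): "off",
    },
    outputs={"off": 0, "on": 1},
    start="off",
)
history = [toggle.step(s) for s in [1, 0, 1]]  # [1, 1, 0]
```

The same input symbol 1 gives two different outputs in `history`, because the machine remembers where it was. The reflex never does that, which is why a plain feedforward network is enough to learn it.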



So here’s my current idea of what a SpineRNN could look like. (Or neural fiber? The way it runs along a given length of a sequence looks a bit like a cable… and could it maybe constitute the external memory of a Mealy machine?)


The general idea is that you can map a vertebra to a general sequence-fed machine. Here, we use time for the sequence, as any sequence unfolds in time.
We make the hypothesis that feedforwarding through the MLP is so quick that it produces a response within T0, the initial duration. During periods, the architecture is locked by the period boundaries, represented as wavy gray lines. Any data that reaches a boundary waits for a new clock pulse.

The MLP bridge extracts the combinatorial knowledge, adds it to the prediction and ends up with the output. As the MLP gets better at predicting the part of the outputs that can be determined solely from the inputs, the gradient should correct the RNN so that it only predicts sequential behaviour, and is therefore more accurate.

But since we feed in sequential data whose rules differ according to their length (parentheses don’t have the same encoding needs as spaces), we then repeat the MLP bridge recursively for every period, making rules according to length.
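
Here is how I would sketch the forward pass of a single bridge in NumPy. This is a rough reading of the description above, not a working SpineRNN: it assumes tanh units, a plain (non-gated) recurrent layer, random untrained weights and hypothetical layer sizes, and it leaves out both the clock boundaries and the recursive repetition of the bridge over periods. The names `bridge`, `rnn_step` and `spine_forward` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 8, 3  # hypothetical sizes

# MLP bridge: stateless, sees only the current input.
W1 = rng.normal(scale=0.1, size=(n_hid, n_in))
W2 = rng.normal(scale=0.1, size=(n_out, n_hid))

# Recurrent path: carries a hidden state from one step to the next.
Wx = rng.normal(scale=0.1, size=(n_hid, n_in))
Wh = rng.normal(scale=0.1, size=(n_hid, n_hid))
Wo = rng.normal(scale=0.1, size=(n_out, n_hid))


def bridge(x):
    """Static (combinatorial) part of the output."""
    return W2 @ np.tanh(W1 @ x)


def rnn_step(x, h):
    """Dynamic (sequential) part of the output, plus the new hidden state."""
    h_new = np.tanh(Wx @ x + Wh @ h)
    return Wo @ h_new, h_new


def spine_forward(xs):
    """Add the bridge output to the recurrent prediction at every step."""
    h = np.zeros(n_hid)
    ys = []
    for x in xs:
        static = bridge(x)
        dynamic, h = rnn_step(x, h)
        ys.append(static + dynamic)
    return np.stack(ys)


ys = spine_forward(rng.normal(size=(5, n_in)))  # one output row per time step
```

Because the two paths are simply added, the gradient of any loss on the output flows into both of them, which is what the hoped-for division of labour between the MLP and the RNN depends on. Whether training really separates static from dynamic this way is the part I still have to test.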



Deconvolve static and dynamic from our data?

That’s the main question I’m wondering about now. Unfortunately, it takes me time to implement, as I’m not yet at ease with Numpy and Theano. What would the data look like, or be about, if we extracted all its sequential and combinatorial properties?

I also found this paper on Clockwork RNN, explaining that we can chain fully connected clusters of neurons running at different periods, as long as the cluster a connection comes from has a larger period than the cluster it points at. That wouldn’t be really useful with SpineRNN, as its connections go both ways, but if it could be applied in one way or another, we could make functions that embed a Fourier transform structure, or it could be interesting for teaching Time Interval Petri Nets.
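
The two rules of that scheme fit in a few lines of Python. This is a sketch of my understanding, with hypothetical periods: a cluster is only updated when the time step is a multiple of its period, and a cluster only receives connections from clusters with a period at least as large as its own (equal periods standing for the connections inside a cluster). The function names are made up for the example.

```python
periods = [1, 2, 4, 8]  # hypothetical periods, one per cluster


def active_clusters(t, periods):
    """Indices of the clusters that are updated at time step t."""
    return [i for i, p in enumerate(periods) if t % p == 0]


def connection_mask(periods):
    """mask[dst][src] is True when cluster src may feed cluster dst."""
    n = len(periods)
    return [[periods[src] >= periods[dst] for src in range(n)]
            for dst in range(n)]


mask = connection_mask(periods)
schedule = [active_clusters(t, periods) for t in range(5)]
# schedule == [[0, 1, 2, 3], [0], [0, 1], [0], [0, 1, 2]]
```

The mask is triangular: the fastest cluster listens to everyone, the slowest only to itself. That one-way flow is what clashes with a SpineRNN, where information has to travel in both directions.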

And, even if everything works great when implementing time-period-combination analysis in a nicer model, how would I generalize from a limited sample over an infinite domain? By putting a tan() at the end of my sigmoid?
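
For what it’s worth, here is what the tan() idea looks like in plain Python. One detail matters: applying tan() directly to a sigmoid output stays bounded, since the sigmoid lives in (0, 1) and tan(1) is only about 1.56. To cover the whole real line, the sigmoid first has to be rescaled to (-π/2, π/2). The function name `unbounded` is made up, and this only shows the mapping; it says nothing about whether such an output would train well.

```python
import math


def sigmoid(z):
    """Logistic function, with values strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def unbounded(z):
    """Stretch the sigmoid output over the whole real line with tan()."""
    # Rescale (0, 1) to (-pi/2, pi/2) before applying tan().
    return math.tan(math.pi * (sigmoid(z) - 0.5))


naive = math.tan(sigmoid(50.0))  # stays below tan(1), however large z gets
values = [unbounded(z) for z in (-5.0, 0.0, 5.0)]
```

The rescaled version is odd and monotonic: `unbounded(0.0)` is 0 and the output grows without bound as z increases, so a limited sample is at least not squashed into a fixed interval.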



Well, time to work! I said in my previous post that I was going to give a bit more; unfortunately, the implementation is taking a bit longer due to my lack of knowledge of those libraries. I hope to soon be able to test ideas more quickly. I’m also trying to develop a special gate, a bit inspired by GRU, for the MLP bridge. I’ve never done that, and I’m not sure what to expect from the reaction/treatment/validation model I’m trying to design.

Maybe some visitors will be able to try the model and fix it quicker than me. If that’s the case, please share your results in the comments. It would be nice to compare and discuss them.

And if you find my idea terrible, please let me know why in the comments; I’m looking to improve.


(Hope someone enjoyed this first true post ^^)


First Step

Yup, so there it is…

When I first created this blog, I wanted it to be about the path to a renewed approach to AI, based on an understanding less outdated than that of traditional neural networks. Some people are afraid of what might happen if these networks gain consciousness. Actually, they’re just statistical self-organizing machines. They seem smart because we also use patterns in our own cognition, and statistical machines are good at predicting patterns. When Mozart-RNN runs, it generates Mozart-like patterns of notes, but you are the one doing the work of considering it music.


That’s not intelligence, is it?

I spent 4 years between neural networks, neurology, psychology, math and my engineering background, obsessing over making sense of our own nervous system. Trying to get hold of the start of a reflection, with models and ideas popping up and falling down at the whim of new knowledge, simply hoping to make sense of it. And, actually, it’s fuzzily feasible. Mostly because you can cut off a lot of “what the brain cannot do” and, though you don’t get how it works or why, it has clear structures.


This blog was hopeless.

I then created this blog to make sense of all those weird things I discovered along my path. I’m a complexity guy; I look for it even when it doesn’t need to be there. It’s sometimes absurd to force it, but finding complexity is always beautiful on one side and frustrating on the other. (That’s the very definition of addictive, isn’t it?)

And this couldn’t work. I wanted my posts to be a rational presentation of approaches, summarizing other ways to consider human cognition, sharing reflections from a community and pleading for a rethink of current AI… That’s always nice on paper.


Reality hurts more.

When you want to make a scientific-quality blog that tries to change the thinking of a large tech industry, you have to not be me, I guess.

Every post I planned required more and more googling, more and more rethinking, to the point that I couldn’t finish any. If you want to present our conception of intelligence from antiquity up to right before the invention of the perceptron, it’s madness, as diversity and culture have produced a lot of predictions, sometimes seemingly correct and sometimes off the mark, about what we now consider scientifically true (aaah, Greek wax tablets).
So I gave up; this task is out of reach for a noob like me. I’m not a published PhD, just an engineering student with lots of interests and enthusiasm.


So yeah… Average but passionate!

I spent 4 years studying different approaches. I concluded not only that my first idea was out of reach (from both a technological and a knowledge point of view), but also that AI is more about structure and dynamics, provided Turing (Babbage) completeness is guaranteed.
It discouraged me a bit, but I looked at neural networks with renewed interest. They might just be statistical classifiers, but they have interesting properties: how far can we mimic ourselves with them?


Because they’re just parody systems.
While they maximize the best average solution, they can’t handle logic or abstract thinking, yet they produce patterns similar to the model. Truly, they have huge potential for mimicking us, and that’s more interesting than you think.
I came to the idea while reading this awesome blog post about char-rnn. The crazy idea is: don’t engineer a complex expert system. Take a book, grind it into batches and throw it at an RNN. If you tune it right, it’ll make you laugh.
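
“Grinding a book into batches” is less mysterious than it sounds. Here is a minimal Python sketch of that step alone, with no network attached: every character becomes an integer, and each training pair is a slice of text plus the same slice shifted by one character, so the network learns to predict the next character. The function name `make_batches` and the sequence length are made up for the example; the real char-rnn scripts differ in the details.

```python
def make_batches(text, seq_len):
    """Turn raw text into (inputs, targets) pairs of character ids."""
    vocab = sorted(set(text))
    char_to_id = {c: i for i, c in enumerate(vocab)}
    ids = [char_to_id[c] for c in text]

    batches = []
    # Stop early enough that every target slice is full length.
    for start in range(0, len(ids) - seq_len, seq_len):
        inputs = ids[start:start + seq_len]
        targets = ids[start + 1:start + seq_len + 1]  # shifted by one char
        batches.append((inputs, targets))
    return vocab, batches


vocab, batches = make_batches("hello world", seq_len=5)
first_inputs, first_targets = batches[0]
decoded = "".join(vocab[i] for i in first_targets)  # "ello "
```

The size of `vocab` is what sets the input size of the network: one slot per distinct character found in the book.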


Then my interest came back

My computer has now been reading The Law by Bastiat for 3 days. I’m pretty sure that, if you don’t know French, the result will seem bearable. (If you know French… désolé.)

Qu’il aurait pessont, consodrel n’y exe térante de le reconne de paris, de la Liberté, qui voy noy notsour. Il faudrait dans les chases qu’entoit dépesse et le proivse tout visemant écaitens l’agarite, la Liberté, le procistrun se fiater le vousse, à companer son par la Fartes, de constitnans de silagront le cempunsit pour peupet, le demandance, la destrime dégendaniere dénliment léglisation ? Que, ce se cradu-tous per les guerr des letisentenes, des maisses des laissent, leurs tomonilues. (Bastiat-RNN)

Done on a GRU-RNN:

  • Input: 86
  • Hidden: 1000
  • Output: 500
  • Iteration: 850
  • Average loss: 43.38


So it mimics French, and then what?

That’s sort of frustrating. It picks up some elements of syntax and punctuation, but what it learns is so short-term or approximate that it doesn’t even get noun suffixes. It just has an approximate idea of what a French word and French syntax could look like. I could shrink the hidden layers and run it a bit longer, but even the best-tuned RNNs don’t get logic and common sense.

That’s why I consider them parody machines, and why I got a bit disappointed about what I could expect from them. Maybe with a Petri net above, in a top-down architecture? I wonder what sort of fuzzy network that would make. But how is that even useful?
My thesis is approaching as I enter my final year, and this seems a bit goofy to put forward. I was still mulling over all this useless knowledge, those nights working on basically nothing, just trying to get a grip on a subject way above me.


But I finally started this blog !

Yesterday, while the char-rnn was still running and I was still moody, I absent-mindedly looked at a note on a picture I had found a few days earlier.
Then I considered it with regard to RNNs and… maybe it inspired a nice idea!


A post that long and nothing?

Not exactly nothing. I needed to introduce myself, this blog, why I’m writing in it now and the weird self-taught path I can bring to the discussions, and to make a commitment because, this time, I might have something feasible and useful that I’d like to make concrete.


There’s also something that hugely intrigues me in this model stuck between my two ears, and it’s a nice opportunity to play the scientist outside my student role, on real ground.

I hope to share it with you in the next blog post, and I’d be glad to get your feedback so we can build a deeper and more fruitful reflection over the life of this blog.

So… Welcome aboard 🙂