
Proposition for a Platform for Embedded Spaces

This is the initial version of a white paper I’m working on for a platform that distributes trained neural networks and grows new uses for them. The idea is that the right approach to making them functional is to grow traditional symbolic logic on top of the spaces produced by deep connectionist learning. That way, we can express business cases from symbolic logic down to connectionist representations of their states and values.

 

White Paper – a Platform for Embedded Spaces

A proposition to build standards on top of deep neural nets and empower AI architects towards a new wave

An inquiry into the 3rd wave

From the business perspective, there is a first wave, made of expert systems, and a second wave, made of deep learning networks; the rest are considered technologies too immature for the market, still bubbling in the heads of AI scientists.

Expert systems were mostly built from huge amounts of complex, heavily documented code aimed at reproducing what humans had built from experience in a given field.

Deep learning algorithms are black boxes that require large amounts of carefully prepared data. They made a huge leap by ditching hand-coded complexity in favor of a connectionist approach.

The first type is costly to produce, requires complex expertise and long waterfall development, is not easy to rewrite, and doesn’t easily fit modern development practices: microservices, devops, lean, and so on.

The second cannot be easily implemented by companies; the system itself is simple, but great results require complex feature optimization and training, which tends to turn deep neural networks into services provided by a few specialized companies.

And while the second wave is a nice improvement over the first and brought results to new AI topics, it doesn’t cover all the topics expert systems can treat, like interpretation tasks in natural language processing.

Getting the best of both waves from the business perspective would be to have an AI:

– That can abstract away code complexity through the connectionist approach

– That keeps a transparent and modifiable architecture based on the symbolic approach

This is already a difficult task, as the connectionist approach is fairly opaque and hard to make sense of (a black box), while we expect something readable, based on intelligible symbols (transparent), to be the access door to the implemented business logic.

From the technical perspective, we can draw the following lessons. From the first wave: we need a loosely coupled architecture based on interchangeable modules and the ability to spread workload. But there’s no magic solution to reduce business logic complexity when inputs and behaviors require a lot of nuances and tests.

From the second wave: the market for trained networks, from amateurs to small companies, is nonexistent, and the generalization of those networks is poor. Yet a huge variety of network topologies exists, along with many different frameworks to implement them, which makes this wave extremely prolific but also hard to encompass. Scientific publications keep booming toward this modern gold rush. Still, we lack an obvious ingredient: even if we solved the training problem, which is not realistic, the absence of standards would block any communication between machines without a custom semantic interpreter. We know nothing of the training data sets, the results, the specificities of the trained network, and much other important information needed to weigh which trained net best solves a developer’s issue.

That looks like a dead end for the spread of trained neural networks in the second wave.

Our proposition

Embedded spaces seem to be the key to better communication between the symbolic and connectionist logics.

They link words, styles, or items to vectors, as in the Word2Vec or Style2Vec approaches, and each of those spaces stands as a representation of its own (like a namespace). Their ability to encode and decode a symbolic value allows us to cast a representation into multiple spaces and find a projection between them (for instance between multiple implementations of Word2Vec), which lessens the need for explicit semantics or ontologies.
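As a minimal sketch of that projection idea, assuming two hypothetical embedding spaces sharing part of their vocabulary, a linear map between them can be learned by least squares (all names and dimensions below are illustrative, not part of any existing tool):

```python
# Sketch: project vectors from one hypothetical space into another
# via a linear map learned on their shared vocabulary.
import numpy as np

def learn_projection(source_vecs: np.ndarray, target_vecs: np.ndarray) -> np.ndarray:
    """Least-squares linear map W such that source @ W ~= target."""
    W, *_ = np.linalg.lstsq(source_vecs, target_vecs, rcond=None)
    return W

# Toy data: 100 shared words, 50-dim source space, 64-dim target space.
rng = np.random.default_rng(0)
src = rng.normal(size=(100, 50))
tgt = rng.normal(size=(100, 64))
W = learn_projection(src, tgt)

# Cast one representation from the source space into the target space.
word_in_source = src[0]
word_in_target = word_in_source @ W
```

With such a map, two independently trained spaces can exchange representations without agreeing on a shared ontology first.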

Therefore, we would like to propose a platform, much like a package manager, that allows data scientists and AI developers to publish freely, or possibly to buy and sell, their trained spaces as common languages.

Those spaces will run in containers, as we need a standard approach to encompass the plurality of deep neural nets, and their orchestration can be done on local servers with a modern solution like Kubernetes or OpenShift that handles the workload and the microservices approach.
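As a hedged sketch of what that orchestration could look like with the official `kubernetes` Python client, here is how a deployment serving one space might be scaled with demand (the deployment name "word2vec-en" and the namespace are hypothetical):

```python
# Sketch: scale a container that serves one embedded space.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
apps = client.AppsV1Api()

# Adjust replicas of the space-serving deployment according to demand.
apps.patch_namespaced_deployment_scale(
    name="word2vec-en",
    namespace="embedded-spaces",
    body={"spec": {"replicas": 3}},
)
```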

From all those spaces, which are really trained neural nets well defined in a common registry, we can establish relationships based on symbols; just as a word has different meanings in different contexts, those contexts being the multiple spaces that can consistently apply in response to an input.
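A toy illustration of "contexts as spaces", assuming a made-up `spaces` registry and lookup interface: the same symbol is resolved against every loaded space, and each space where it exists counts as one context, hence one meaning.

```python
# Sketch: one symbol, several spaces, several contexts.
import numpy as np

spaces = {
    "fashion/style2vec": {"dress": np.array([0.2, 0.9])},
    "retail/word2vec":   {"dress": np.array([0.7, 0.1]), "shoe": np.array([0.6, 0.2])},
}

def contexts_for(symbol: str) -> dict[str, np.ndarray]:
    """Return every space (context) in which the symbol has a representation."""
    return {name: vecs[symbol] for name, vecs in spaces.items() if symbol in vecs}

print(contexts_for("dress"))  # two contexts -> two different meanings
```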

Those spaces are loaded in memory as we need them, balanced by an orchestrator, and that’s how we will use them: as the background of a working memory for machines.
On those spaces, the system will draw a few rich representations of data, encapsulating all the details related to a given piece of information as the system interprets it. Then the working memory unloads its results to standard storage or user interfaces.
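A minimal sketch of that load-on-demand behavior, assuming a hypothetical `load_space` function (for instance one that pulls a container or memory-maps a model) and an arbitrary capacity; the least-recently-used space is evicted when memory fills up:

```python
# Sketch: a working memory that loads spaces on demand and evicts old ones.
from collections import OrderedDict

class WorkingMemory:
    def __init__(self, load_space, capacity: int = 3):
        self.load_space = load_space          # hypothetical loader callback
        self.capacity = capacity
        self._spaces = OrderedDict()

    def get(self, name: str):
        if name in self._spaces:
            self._spaces.move_to_end(name)    # mark as recently used
        else:
            if len(self._spaces) >= self.capacity:
                self._spaces.popitem(last=False)  # evict the oldest space
            self._spaces[name] = self.load_space(name)
        return self._spaces[name]

memory = WorkingMemory(load_space=lambda name: f"<space {name}>")
memory.get("word2vec-en")
```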

The point of this approach, for an app data flow, is to have a node where every known and relevant piece of information is available to make the best out of a new one (adjusting an interpretation or a knowledge base, for instance). To do so, trees, stored on a bus, are drawn on top of those spaces. They can represent the current world knowledge of an agent, the flow of a press article, the behavior of an individual on camera, and so on. Scaling those trees down to patterns allows the system to pass compressed knowledge or, the other way around, to derive prediction trees from initial patterns and conditions.
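A sketch of what one such tree node could hold, tying a symbol to its point in a named space so the tree can carry an article’s flow or an agent’s world knowledge (the field names are illustrative assumptions, not a spec):

```python
# Sketch: a tree drawn over embedded spaces.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class SpaceNode:
    symbol: str                      # e.g. "inflation"
    space: str                       # which embedded space interprets it
    point: np.ndarray                # its coordinates in that space
    children: list["SpaceNode"] = field(default_factory=list)

article = SpaceNode("economy", "news/word2vec", np.array([0.1, 0.8]), [
    SpaceNode("inflation", "news/word2vec", np.array([0.2, 0.7])),
])
```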

Those spaces allow standardization but also nuance in interpretation, as we move from programming discrete enum values to define states, to programming points in continuous spaces. This embeds more information than enums do, but it also opens more possibilities.

Like the well-known Word2Vec result: King – Male + Female = Queen
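Restated as plain vector arithmetic over a toy space (the vectors here are made up for illustration; a real Word2Vec model would supply them):

```python
# Sketch: the classic analogy as arithmetic plus nearest-neighbor search.
import numpy as np

vecs = {
    "king":   np.array([0.9, 0.8, 0.1]),
    "male":   np.array([0.8, 0.1, 0.0]),
    "female": np.array([0.1, 0.1, 0.9]),
    "queen":  np.array([0.2, 0.8, 0.9]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

target = vecs["king"] - vecs["male"] + vecs["female"]
best = max(vecs, key=lambda w: cosine(vecs[w], target))
print(best)  # -> "queen" on this toy data
```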

Designing a language to use embedded spaces

Going further than those {+, –} operators on vectors, embedded spaces could be developed to support more subtle operators, like union, intersection, or exclusion, or more complex ones like integrals and derivatives. Sets of points could also have different complexities, as their relationship with symbolic values can be, or not be, transitive, reflexive, symmetric, generative, and so on.
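One speculative reading of those set operators, assuming a toy space and an invented `neighborhood` helper: treat each query point as the set of symbols near it, then apply ordinary set operations. This is an illustration of the direction, not an established algebra.

```python
# Sketch: set-like operators over regions of one embedded space.
import numpy as np

space = {
    "boot":   np.array([0.9, 0.1]),
    "sandal": np.array([0.8, 0.3]),
    "dress":  np.array([0.1, 0.9]),
}

def neighborhood(point: np.ndarray, radius: float = 0.5) -> set[str]:
    """Symbols whose vectors fall within `radius` of the query point."""
    return {w for w, v in space.items() if np.linalg.norm(v - point) <= radius}

a = neighborhood(np.array([0.85, 0.2]))   # region around footwear
b = neighborhood(np.array([0.15, 0.85]))  # region around garments
print(a | b, a & b, a - b)                # union, intersection, exclusion
```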

From another angle, Style2Vec shows we can embed different features of the same symbols. Instead of making a space that embeds shoes and dresses together, we can have a space that embeds dresses with the shoes that match them well.

This brings contextual nuance: should I group them by style or by function? That is, I can use either a space where dresses and shoes stay apart, or a space where shoes and dresses draw closer as they match better.
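To make that concrete, here are the same two symbols in two hypothetical spaces: a "function" space keeps dresses and shoes far apart, while a "match" space pulls a dress close to the shoes that suit it (all vectors are invented for illustration):

```python
# Sketch: one pair of symbols, two spaces, two notions of distance.
import numpy as np

function_space = {"red_dress": np.array([0.1, 0.9]), "red_heels": np.array([0.9, 0.1])}
match_space    = {"red_dress": np.array([0.5, 0.6]), "red_heels": np.array([0.55, 0.55])}

def distance(space, a, b):
    return float(np.linalg.norm(space[a] - space[b]))

print(distance(function_space, "red_dress", "red_heels"))  # far: different functions
print(distance(match_space, "red_dress", "red_heels"))     # close: they match well
```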

At this point, it’s interesting to consider the different use cases. The current one considers a single agent that centralizes a lot of knowledge while running as threads of containers managed by Kubernetes. Its large but short-lived memory will let it grasp complex knowledge and simulate continuity through complex tasks, looking consistent to the user. On this continuity, we expect to plug higher and higher logic schemes to grow a hierarchy of interpretations.

I’m not sure yet what the business cases could be, besides extending current system capabilities, but I’d like to explore it as a bot-assembly project: getting higher behavior, like empathy, from casting sentiment analysis into an "emotion space", and developing higher cognition into trees of interconnected spaces that match the multiple contexts of a press article. Most importantly, I’d like to define how it should interact with a user interface, a service layer, a database, and a knowledge base.

But, in the end, this will require a common language for those modules to express what they can do, another language to make them interoperate, and probably others still for container management behavior or BPM scheduling. I still need to grasp a lot of this topic and reduce its apparent complexity, so it’s an ongoing subject that will evolve through versioning.

The platform to distribute those embedded spaces could be financed by a percentage on purchasable spaces. The Docker platform could host the containers for now, and extra information such as licenses and standard API descriptions, as well as the Docker container URL, would be provided through the platform registry.
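As a closing sketch, here is what one entry in that registry might hold; every field name and value is a hypothetical example, not a specification:

```python
# Sketch: one record in the embedded-spaces platform registry.
from dataclasses import dataclass

@dataclass
class SpaceRegistryEntry:
    name: str             # e.g. "word2vec-en"
    version: str          # version of the trained space
    container_url: str    # where the Docker image lives
    license: str          # usage terms (free, purchasable, ...)
    api_description: str  # the standard API the container exposes

entry = SpaceRegistryEntry(
    name="word2vec-en",
    version="1.0.0",
    container_url="docker.io/example/word2vec-en:1.0.0",
    license="CC-BY-4.0",
    api_description="POST /encode, POST /decode",
)
```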
