Highlight

Interesting things about using Viettel's artificial intelligence

Internet users in Vietnam often turn to "chị Google" ("Miss Google") for... entertainment. When "she" reads out a text or gives directions to...

Saturday, September 17, 2016

Artificial intelligence could transform healthcare, but we need to accept it first

A demonstrator sits in an electronic wheelchair as she talks with a doctor, seen on the monitor of Panasonic's HOSPI-Rimo, at its presentation in Tokyo on October 4, 2011. The communications assistant robot, which has automatic movement and visual communication functions, enables conversations between people who are in separate places, such as a doctor in hospital and a patient at home. According to Panasonic, the robot can help people with limited mobility. REUTERS/Kim Kyung-Hoon
Scientists in Japan reportedly saved a woman’s life by applying artificial intelligence to help them diagnose a rare form of cancer. Faced with a 60-year-old woman whose cancer was unresponsive to treatment, they supplied an AI system with huge amounts of clinical cancer case data, and it diagnosed the rare leukemia that had stumped the clinicians in just ten minutes.
The Watson AI system from IBM matched the patient’s symptoms against 20m clinical oncology studies uploaded by a team headed by Arinobu Tojo at the University of Tokyo’s Institute of Medical Science that included symptoms, treatment and response. The Memorial Sloan Kettering Cancer Center in New York has carried out similar work, where teams of clinicians and data analysts trained Watson’s machine learning capabilities with oncological data in order to focus its predictive and analytic capabilities on diagnosing cancers.
IBM Watson first became famous when it won the US television game show Jeopardy in 2011. And IBM’s previous generation AI, Deep Blue, became the first AI to best a world champion at chess when it beat Garry Kasparov in a game in 1996 and the entire match when they met again the following year. From a perspective of technological determinism, it may seem inevitable that AI has moved from chess to cancer in 20 years. Of course, it has taken a lot of hard work to get it there.
 Artificial intelligence landscape: global quarterly financing history
Image: CB Insights
But efforts to use artificial intelligence, machine learning and big data in healthcare contexts have not been without controversy. On the one hand there is wild enthusiasm – lives saved by data, new medical breakthroughs, and a world of personalised medicine tailored to meet our needs by deep learning algorithms fed by smartphones and Fitbit wearables. On the other there’s considerable scepticism – a lack of trust in machines, the importance of individuals over statistics, privacy concerns over patient records and medical confidentiality, and generalised fears of a Brave New World. Too often the debate dissolves into anecdote rather than science, or focuses on the breakthrough rather than the hard slog that led to it. Of course the reality will be somewhere in the middle.
In fact, it may surprise you to learn that the world’s first computerised clinical decision-support system, AAPhelp, was developed in the UK way back in 1972 by Tim De Dombal and one of my colleagues, Susan Clamp.
This early precursor to the genius AI of today used a naive Bayesian algorithm to compute the likely cause of acute abdominal pain based on patient symptoms. Feeding the system with more symptoms and diagnoses helped it to become more accurate over time and, by 1974, De Dombal’s team had trained the system to the point where it was more accurate at diagnosis than junior doctors, and almost as accurate as the most senior consultants. It took AAPhelp overnight to give a diagnosis, but this was on 1970s computer hardware.
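For illustration only, here is a toy version of the kind of naive Bayesian calculation AAPhelp performed; the diagnoses, symptoms and probabilities below are invented, and the sketch shows only the shape of the computation, not De Dombal's actual model.

```python
# A minimal naive Bayes sketch in the spirit of AAPhelp
# (illustrative only: diagnoses, symptoms and probabilities are made up).
import numpy as np

diagnoses = ["appendicitis", "non-specific pain"]
priors = np.array([0.3, 0.7])                    # P(diagnosis)
# P(symptom present | diagnosis); columns: right-sided pain, nausea, fever
likelihoods = np.array([[0.85, 0.70, 0.60],      # appendicitis
                        [0.30, 0.40, 0.20]])     # non-specific pain

observed = np.array([1, 1, 0])                   # patient: pain + nausea, no fever

# Naive Bayes: multiply the prior by each symptom's likelihood (assumed independent)
per_symptom = np.where(observed == 1, likelihoods, 1 - likelihoods)
posterior = priors * per_symptom.prod(axis=1)
posterior /= posterior.sum()                     # normalise to P(diagnosis | symptoms)

for d, p in zip(diagnoses, posterior):
    print(f"{d}: {p:.2f}")
```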
The bad news is that 40 years on, AAPhelp is still not in routine use.
This is the reality check for the most ardent advocates of applying technology to healthcare: to get technology such as predictive AIs into clinical settings where they can save lives means tackling all those negative connotations and fears. AI challenges people and their attitudes: the professionals that the machine can outperform, and the patients that are reduced to statistical probabilities to be fed into complex algorithms. Innovation in healthcare can take decades.
Nevertheless, while decades apart, both AAPhelp’s and IBM Watson’s achievements demonstrate that computers can save lives. But the use of big data in healthcare implies that patient records, healthcare statistics, and all manner of other personal details might be used by researchers to train the AIs to make diagnoses. People are increasingly sensitive to the way personal data is used and, quite rightly, expect the highest standards of ethics, governance, privacy and security to be applied. The revelations that one NHS trust had given access to 1.6m identifiable patient records to Google’s DeepMind AI laboratory didn’t go down well when reported a few months ago.
The hard slog is not creating the algorithms, but the patience and determination required to conduct careful work within the restrictions of applying the highest standards of data protection and scientific rigour. At the University of Leeds’ Institute for Data Analytics we recently used IBM Watson Content Analytics software to analyse 50m pathology and radiology reports from the UK. Recognising the sensitivities, we brought IBM Watson to the data rather than passing the data to IBM.
Using natural language processing of the text reports we double-checked diagnoses such as brain metastases, HER-2-positive breast cancers and renal hydronephrosis (swollen kidneys) with accuracy rates already over 90%. Over the next two years we’ll be developing these methods in order to embed these machine learning techniques into routine clinical care, at a scale that benefits the whole of the NHS.
While we’ve had £12m investment for our facilities and the work we’re doing, we’re not claiming to have saved lives yet. The hard battle is first to win hearts and minds – and on that front there’s still a lot more work to be done.

Written by
Owen A Johnson, Senior Fellow, University of Leeds

Now showing: the world’s first cognitive movie trailer


IBM and Watson went to Hollywood to help create the trailer for 20th Century Fox’s new suspense/horror film Morgan. But first Watson needed to understand what scares us. So Watson watched over 100 horror movie trailers and analyzed their imagery, audio tracks and scene composition to determine their underlying emotions and understand what elements make up a good trailer. Watson then watched Morgan in its entirety and selected the 10 best moments of the film for the trailer. IBM filmmakers edited these moments together to create a cohesive narrative and a finished trailer in just 24 hours. Go ahead and watch it now, if you dare.

Watch the cognitive movie trailer →

Nvidia: the battle for artificial intelligence

Chipmaker Nvidia unveiled its new Tesla P4 and P40 chips at the start of the week. With them, one of the leading manufacturers in the gaming sector aims to secure its position in the artificial intelligence market as well. But the competition is not standing still: Intel has also announced a new chip.
These small supercomputers are becoming ever more important in our lives, whether in mobile phones, search engines or, not to forget, cars. They belong to the field of artificial intelligence (AI) and help, for example, to make split-second decisions in autonomous driving.
Nvidia or Intel – who has the better starting position?
The two companies take different approaches to building AI systems. Nvidia’s chips carry out many small operations simultaneously, while Intel’s chips can execute fewer operations in parallel but are suited to more general-purpose data processing. If both manufacturers manage to offer a similar solution, the main benchmark will be energy consumption. Nvidia’s Tesla P4 chip, which can be used in servers in large data centres, is claimed by the company to be 40 times more efficient than Intel’s server chips. Now Intel has to respond.
Alongside this, Tesla partner Nvidia also presented the small, efficient Drive PX 2 system, which can be used for autonomous driving. The Drive PX 2 is only half the size of its predecessor and is set to be built into the self-driving vehicle system of the Chinese internet giant Baidu.
Competition in the AI field is fierce: Intel, for example, has secured a partnership with BMW. Even so, the chances are good that Nvidia’s market share in artificial intelligence will grow thanks to the new systems.
Nvidia’s share price pulled back from its all-time high last Friday but held above the important 60-dollar mark. It was largely unimpressed by the new product announcements and has gained barely one percent since the start of the week. With a 2017 P/E ratio of 27 the stock is relatively expensive, but it has also risen 270 percent since the beginning of the year. It is likely to remain volatile, but investors who set a tight stop can use weak days to buy.

The Neural Network Zoo

With new neural network architectures popping up every now and then, it’s hard to keep track of them all. Knowing all the abbreviations being thrown around (DCIGN, BiLSTM, DCGAN, anyone?) can be a bit overwhelming at first.
So I decided to compose a cheat sheet containing many of those architectures. Most of these are neural networks, some are completely different beasts. Though all of these architectures are presented as novel and unique, when I drew the node structures… their underlying relations started to make more sense.
[Chart: the Neural Network Zoo – node diagrams of the architectures described below]
One problem with drawing them as node maps: it doesn’t really show how they’re used. For example, variational autoencoders (VAE) may look just like autoencoders (AE), but the training process is actually quite different. The use-cases for trained networks differ even more, because VAEs are generators, where you insert noise to get a new sample. AEs simply map whatever they get as input to the closest training sample they “remember”. I should add that this overview in no way clarifies how each of the different node types works internally (but that’s a topic for another day).
It should be noted that while most of the abbreviations used are generally accepted, not all of them are. RNN sometimes refers to recursive neural networks, but most of the time it refers to recurrent neural networks. That’s not the end of it though: in many places you’ll find RNN used as a placeholder for any recurrent architecture, including LSTMs, GRUs and even the bidirectional variants. AEs suffer from a similar problem from time to time, where VAEs, DAEs and the like are simply called AEs. Many abbreviations also vary in the number of “N”s to add at the end, because you could call it a convolutional neural network but also simply a convolutional network (resulting in CNN or CN).
Composing a complete list is practically impossible, as new architectures are invented all the time. Even once published, they can be quite challenging to find if you don’t know what to look for, and sometimes you simply overlook one. So while this list may provide you with some insights into the world of AI, please, by no means take it as comprehensive; especially if you read this post long after it was written.
For each of the architectures depicted in the picture, I wrote a very, very brief description. You may find some of these to be useful if you’re quite familiar with some architectures, but you aren’t familiar with a particular one.
Feed forward neural networks (FF or FFNN) and perceptrons (P) are very straightforward: they feed information from the front to the back (input and output, respectively). Neural networks are often described as having layers, where each layer consists of either input, hidden or output cells in parallel. A layer alone never has connections and in general two adjacent layers are fully connected (every neuron from one layer to every neuron in the next layer). The simplest somewhat practical network has two input cells and one output cell, which can be used to model logic gates. One usually trains FFNNs through back-propagation, giving the network paired datasets of “what goes in” and “what we want to have coming out”. This is called supervised learning, as opposed to unsupervised learning where we only give it input and let the network fill in the blanks. The error being back-propagated is often some variation of the difference between the output and the target (like MSE or just the linear difference). Given that the network has enough hidden neurons, it can theoretically always model the relationship between the input and output. Practically their use is a lot more limited, but they are popularly combined with other networks to form new networks.
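As a rough sketch of that training loop (not any particular library's implementation), here is a tiny feed-forward network with one hidden layer, trained with back-propagation on XOR using NumPy:

```python
# A minimal FFNN trained by back-propagation (supervised learning on
# paired inputs/targets), here learning the XOR function.
import numpy as np
rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # "what goes in"
y = np.array([[0], [1], [1], [0]], dtype=float)               # "what we want out"

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(10000):
    h = sigmoid(X @ W1 + b1)                    # forward pass
    out = sigmoid(h @ W2 + b2)
    err = out - y                               # error to back-propagate
    d_out = err * out * (1 - out)
    d_h = d_out @ W2.T * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(0)   # gradient step
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(0)

print(out.round(2).ravel())   # typically approaches [0, 1, 1, 0]
```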
Radial basis function (RBF) networks are FFNNs with radial basis functions as activation functions. There’s nothing more to it. Doesn’t mean they don’t have their uses, but most FFNNs with other activation functions don’t get their own name. This mostly has to do with inventing them at the right time.
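A minimal sketch, assuming a toy 1-D regression task: Gaussian radial basis activations over fixed centres with a linear readout fitted by least squares.

```python
# A small RBF network sketch: Gaussian activations over distances to fixed
# centres, with the output weights solved by least squares (illustrative).
import numpy as np

X = np.linspace(-3, 3, 40).reshape(-1, 1)
y = np.sin(X).ravel()                            # toy regression target

centres = np.linspace(-3, 3, 10).reshape(1, -1)  # RBF centres
width = 0.8
phi = np.exp(-((X - centres) ** 2) / (2 * width ** 2))   # radial basis activations

w, *_ = np.linalg.lstsq(phi, y, rcond=None)      # fit the output weights
print(np.abs(phi @ w - y).max())                 # small residual error
```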
Hopfield network (HN) is a network where every neuron is connected to every other neuron; it is a completely entangled plate of spaghetti as even all the nodes function as everything. Each node is input before training, then hidden during training and output afterwards. The networks are trained by setting the value of the neurons to the desired pattern, after which the weights can be computed. The weights do not change after this. Once trained for one or more patterns, the network will always converge to one of the learned patterns because the network is only stable in those states. Note that it does not always conform to the desired state (it’s not a magic black box sadly). It stabilises in part due to the total “energy” or “temperature” of the network being reduced incrementally during training. Each neuron has an activation threshold which scales with this temperature, which if surpassed by the summed input causes the neuron to take the form of one of two states (usually -1 or 1, sometimes 0 or 1). Updating the network can be done synchronously or, more commonly, one by one. If updated one by one, a fair random sequence is created to organise which cells update in what order (fair random being all options (n) occurring exactly once every n items). This is so you can tell when the network is stable (done converging): once every cell has been updated and none of them changed, the network is stable (annealed). These networks are often called associative memory because they converge to the most similar state to the input; if humans see half a table we can imagine the other half, and this network will converge to a table if presented with half noise and half a table.
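A toy Hopfield sketch of that procedure: one pattern is stored with a Hebbian rule, then recovered from a half-corrupted state by asynchronous updates (pattern and sizes are arbitrary).

```python
# A tiny Hopfield network sketch: store one pattern, then recover it from a
# corrupted version by repeated asynchronous (one-by-one) updates.
import numpy as np
rng = np.random.default_rng(1)

pattern = np.array([1, -1, 1, -1, 1, -1, 1, -1])          # the "table" to remember
W = np.outer(pattern, pattern).astype(float)              # Hebbian weights
np.fill_diagonal(W, 0)                                    # no self-connections

state = pattern.copy()
state[:4] = rng.choice([-1, 1], size=4)                   # half noise, half pattern

for _ in range(5):                                        # update cells one by one
    for i in rng.permutation(len(state)):
        state[i] = 1 if W[i] @ state >= 0 else -1

# Converges to the stored pattern or its mirror image
print(np.array_equal(state, pattern) or np.array_equal(state, -pattern))
```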
Markov chains (MC or discrete time Markov Chain, DTMC) are kind of the predecessors to BMs and HNs. They can be understood as follows: from this node where I am now, what are the odds of me going to any of my neighbouring nodes? They are memoryless (i.e. Markov Property) which means that every state you end up in depends completely on the previous state. While not really a neural network, they do resemble neural networks and form the theoretical basis for BMs and HNs. MC aren’t always considered neural networks, as goes for BMs, RBMs and HNs. Markov chains aren’t always fully connected either.
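In code, a Markov chain is little more than a transition matrix and a sampling loop; the states and probabilities here are made up.

```python
# A minimal Markov chain sketch: from the current node, the matching row of
# the transition matrix gives the odds of moving to each neighbour.
import numpy as np
rng = np.random.default_rng(0)

states = ["sunny", "rainy"]
P = np.array([[0.8, 0.2],     # P(next | current = sunny)
              [0.4, 0.6]])    # P(next | current = rainy)

current = 0
walk = []
for _ in range(10):
    current = rng.choice(2, p=P[current])   # memoryless: depends only on current state
    walk.append(states[current])
print(walk)
```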
Boltzmann machines (BM) are a lot like HNs, but: some neurons are marked as input neurons and others remain “hidden”. The input neurons become output neurons at the end of a full network update. It starts with random weights and learns through back-propagation, or more recently through contrastive divergence (a Markov chain is used to determine the gradients between two informational gains). Compared to a HN, the neurons mostly have binary activation patterns. As hinted by being trained by MCs, BMs are stochastic networks. The training and running process of a BM is fairly similar to a HN: one sets the input neurons to certain clamped values after which the network is set free (it doesn’t get a sock). While free the cells can get any value and we repetitively go back and forth between the input and hidden neurons. The activation is controlled by a global temperature value, which if lowered lowers the energy of the cells. This lower energy causes their activation patterns to stabilise. The network reaches an equilibrium given the right temperature.
Restricted Boltzmann machines (RBM) are remarkably similar to BMs (surprise) and therefore also similar to HNs. The biggest difference between BMs and RBMs is that RBMs are more usable, because they are more restricted. They don’t trigger-happily connect every neuron to every other neuron but only connect every different group of neurons to every other group, so no input neurons are directly connected to other input neurons and no hidden-to-hidden connections are made either. RBMs can be trained like FFNNs with a twist: instead of passing data forward and then back-propagating, you forward pass the data and then backward pass the data (back to the first layer). After that you train with forward-and-back-propagation.
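A compact sketch of that forward/backward training idea, using one step of contrastive divergence (CD-1) on toy binary data; bias terms are omitted for brevity.

```python
# A compact RBM sketch trained with one step of contrastive divergence (CD-1):
# forward pass visible -> hidden, backward pass hidden -> visible, then update
# the weights from the difference of the two correlation statistics.
import numpy as np
rng = np.random.default_rng(0)

data = rng.integers(0, 2, size=(50, 6)).astype(float)   # toy binary data
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(200):
    v0 = data
    h0 = sigmoid(v0 @ W)                                 # forward pass
    h0_sample = (rng.random(h0.shape) < h0).astype(float)
    v1 = sigmoid(h0_sample @ W.T)                        # backward pass (reconstruction)
    h1 = sigmoid(v1 @ W)
    W += 0.05 * (v0.T @ h0 - v1.T @ h1) / len(data)      # CD-1 weight update

recon = sigmoid(sigmoid(data @ W) @ W.T)
print(np.abs(recon - data).mean())                       # reconstruction gap
```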
Autoencoders (AE) are somewhat similar to FFNNs, as AEs are more like a different use of FFNNs than a fundamentally different architecture. The basic idea behind autoencoders is to encode information (as in compress, not encrypt) automatically, hence the name. The entire network always resembles an hourglass-like shape, with smaller hidden layers than the input and output layers. AEs are also always symmetrical around the middle layer(s) (one or two depending on an even or odd amount of layers). The smallest layer(s) is|are almost always in the middle, the place where the information is most compressed (the chokepoint of the network). Everything up to the middle is called the encoding part, everything after the middle the decoding and the middle (surprise) the code. One can train them using backpropagation by feeding input and setting the error to be the difference between the input and what came out. AEs can be built symmetrically when it comes to weights as well, so the encoding weights are the same as the decoding weights.
Sparse autoencoders (SAE) are in a way the opposite of AEs. Instead of teaching a network to represent a bunch of information in less “space” or nodes, we try to encode information in more space. So instead of the network converging in the middle and then expanding back to the input size, we blow up the middle. These types of networks can be used to extract many small features from a dataset. If one were to train a SAE the same way as an AE, you would in almost all cases end up with a pretty useless identity network (as in what comes in is what comes out, without any transformation or decomposition). To prevent this, instead of feeding back the input, we feed back the input plus a sparsity driver. This sparsity driver can take the form of a threshold filter, where only a certain error is passed back and trained, the other error will be “irrelevant” for that pass and set to zero. In a way this resembles spiking neural networks, where not all neurons fire all the time (and points are scored for biological plausibility).
Variational autoencoders (VAE) have the same architecture as AEs but are “taught” something else: an approximated probability distribution of the input samples. It’s a bit back to the roots, as they are a bit more closely related to BMs and RBMs. They do however rely on Bayesian mathematics regarding probabilistic inference and independence, as well as a re-parametrisation trick to achieve this different representation. The inference and independence parts make sense intuitively, but they rely on somewhat complex mathematics. The basics come down to this: take influence into account. If one thing happens in one place and something else happens somewhere else, they are not necessarily related. If they are not related, then the error propagation should consider that. This is a useful approach because neural networks are large graphs (in a way), so it helps if you can rule out influence from some nodes to other nodes as you dive into deeper layers.
Denoising autoencoders (DAE) are AEs where we don’t feed just the input data, but we feed the input data with noise (like making an image more grainy). We compute the error the same way though, so the output of the network is compared to the original input without noise. This encourages the network not to learn details but broader features, as learning smaller features often turns out to be “wrong” due to it constantly changing with noise.
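A minimal denoising setup, assuming a purely linear encoder and decoder on toy data: noise is added to the input, but the error is measured against the clean original.

```python
# A small denoising autoencoder sketch: feed noisy inputs, but compute the
# error against the clean originals, so the network learns robust features.
import numpy as np
rng = np.random.default_rng(0)

clean = rng.random((100, 8))                    # toy "clean" data
W_enc = rng.normal(scale=0.1, size=(8, 3))      # 8 -> 3 bottleneck
W_dec = rng.normal(scale=0.1, size=(3, 8))      # 3 -> 8 reconstruction

for _ in range(2000):
    noisy = clean + rng.normal(scale=0.1, size=clean.shape)   # grainy input
    code = noisy @ W_enc                        # encode
    recon = code @ W_dec                        # decode
    err = recon - clean                         # compare to the *clean* input
    W_dec -= 0.01 * code.T @ err / len(clean)
    W_enc -= 0.01 * noisy.T @ (err @ W_dec.T) / len(clean)

print(np.abs(recon - clean).mean())             # reconstruction error shrinks
```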
Deep belief networks (DBN) is the name given to stacked architectures of mostly RBMs or VAEs. These networks have been shown to be effectively trainable stack by stack, where each AE or RBM only has to learn to encode the previous network. This technique is also known as greedy training, where greedy means making locally optimal solutions to get to a decent but possibly not optimal answer. DBNs can be trained through contrastive divergence or back-propagation and learn to represent the data as a probabilistic model, just like regular RBMs or VAEs. Once trained or converged to a (more) stable state through unsupervised learning, the model can be used to generate new data. If trained with contrastive divergence, it can even classify existing data because the neurons have been taught to look for different features.
Convolutional neural networks (CNN or deep convolutional neural networks, DCNN) are quite different from most other networks. They are primarily used for image processing but can also be used for other types of input such as audio. A typical use case for CNNs is where you feed the network images and the network classifies the data, e.g. it outputs “cat” if you give it a cat picture and “dog” when you give it a dog picture. CNNs tend to start with an input “scanner” which is not intended to parse all the training data at once. For example, to input an image of 200 x 200 pixels, you wouldn’t want a layer with 40 000 nodes. Rather, you create a scanning input layer of say 20 x 20 which you feed the first 20 x 20 pixels of the image (usually starting in the upper left corner). Once you have passed that input (and possibly used it for training) you feed it the next 20 x 20 pixels: you move the scanner one pixel to the right. Note that one wouldn’t move the input 20 pixels (or whatever scanner width) over; you’re not dissecting the image into blocks of 20 x 20, but rather you’re crawling over it. This input data is then fed through convolutional layers instead of normal layers, where not all nodes are connected to all nodes. Each node only concerns itself with close neighbouring cells (how close depends on the implementation, but usually not more than a few). These convolutional layers also tend to shrink as they become deeper, mostly by easily divisible factors of the input (so 20 would probably go to a layer of 10 followed by a layer of 5). Powers of two are very commonly used here, as they can be divided cleanly and completely by definition: 32, 16, 8, 4, 2, 1. Besides these convolutional layers, they also often feature pooling layers. Pooling is a way to filter out details: a commonly found pooling technique is max pooling, where we take say 2 x 2 pixels and pass on the pixel with the most amount of red. To apply CNNs to audio, you basically feed the input audio waves and inch over the length of the clip, segment by segment. Real world implementations of CNNs often glue an FFNN to the end to further process the data, which allows for highly non-linear abstractions. These networks are called DCNNs, but the names and abbreviations between these two are often used interchangeably.
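The “scanner” and pooling operations can be shown directly in NumPy; the image and kernel below are random placeholders rather than learned weights.

```python
# A sketch of the two core CNN operations described above: a small kernel
# "crawling" over an image one pixel at a time, followed by 2x2 max pooling.
import numpy as np
rng = np.random.default_rng(0)

image = rng.random((8, 8))          # toy grey-scale image
kernel = rng.random((3, 3))         # a 3x3 "scanner" of weights

# Convolution: slide the kernel over the image, one pixel at a time
conv = np.zeros((6, 6))
for i in range(6):
    for j in range(6):
        conv[i, j] = np.sum(image[i:i+3, j:j+3] * kernel)

# Max pooling: keep only the strongest response in each 2x2 block
pooled = conv.reshape(3, 2, 3, 2).max(axis=(1, 3))
print(conv.shape, pooled.shape)     # (6, 6) -> (3, 3)
```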
Deconvolutional networks (DN), also called inverse graphics networks (IGNs), are reversed convolutional neural networks. Imagine feeding a network the word “cat” and training it to produce cat-like pictures, by comparing what it generates to real pictures of cats. DNs can be combined with FFNNs just like regular CNNs, but this is about the point where the line is drawn with coming up with new abbreviations. They may be referenced as deep deconvolutional neural networks, but you could argue that when you stick FFNNs to the back and the front of DNs you have yet another architecture which deserves a new name. Note that in most applications one wouldn’t actually feed text-like input to the network, more likely a binary classification input vector. Think <0, 1> being cat, <1, 0> being dog and <1, 1> being cat and dog. The pooling layers commonly found in CNNs are often replaced with similar inverse operations, mainly interpolation and extrapolation with biased assumptions (if a pooling layer uses max pooling, you can invent exclusively lower new data when reversing it).
Deep convolutional inverse graphics networks (DCIGN) have a somewhat misleading name, as they are actually VAEs but with CNNs and DNs for the respective encoders and decoders. These networks attempt to model “features” in the encoding as probabilities, so that they can learn to produce a picture with a cat and a dog together, having only ever seen one of the two in separate pictures. Similarly, you could feed it a picture of a cat with your neighbours’ annoying dog on it, and ask it to remove the dog, without ever having done such an operation. Demos have shown that these networks can also learn to model complex transformations on images, such as changing the source of light or the rotation of a 3D object. These networks tend to be trained with back-propagation.
Generative adversarial networks (GAN) are from a different breed of networks: they are twins, two networks working together. GANs consist of any two networks (although often a combination of FFs and CNNs), with one tasked to generate content and the other to judge content. The discriminating network receives either training data or generated content from the generative network. How well the discriminating network was able to correctly predict the data source is then used as part of the error for the generating network. This creates a form of competition where the discriminator is getting better at distinguishing real data from generated data and the generator is learning to become less predictable to the discriminator. This works well in part because even quite complex noise-like patterns are eventually predictable, but generated content similar in features to the input data is harder to learn to distinguish. GANs can be quite difficult to train, as you don’t just have to train two networks (either of which can pose its own problems) but their dynamics need to be balanced as well. If prediction or generation becomes too good compared to the other, a GAN won’t converge, as there is intrinsic divergence.
Recurrent neural networks (RNN) are FFNNs with a time twist: they are not stateless; they have connections between passes, connections through time. Neurons are fed information not just from the previous layer but also from themselves from the previous pass. This means that the order in which you feed the input and train the network matters: feeding it “milk” and then “cookies” may yield different results compared to feeding it “cookies” and then “milk”. One big problem with RNNs is the vanishing (or exploding) gradient problem where, depending on the activation functions used, information rapidly gets lost over time, just like very deep FFNNs lose information in depth. Intuitively this wouldn’t be much of a problem because these are just weights and not neuron states, but the weights through time are actually where the information from the past is stored; if the weight reaches a value of 0 or 1 000 000, the previous state won’t be very informative. RNNs can in principle be used in many fields, as most forms of data that don’t actually have a timeline (i.e. unlike sound or video) can be represented as a sequence. A picture or a string of text can be fed one pixel or character at a time, so the time-dependent weights are used for what came before in the sequence, not actually for what happened x seconds before. In general, recurrent networks are a good choice for advancing or completing information, such as autocompletion.
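A bare recurrent update with random, untrained weights, just to show that the hidden state carries information between steps and that input order therefore matters:

```python
# A minimal recurrent step: each new hidden state mixes the current input
# with the hidden state from the previous pass, so input order matters.
import numpy as np
rng = np.random.default_rng(0)

W_in = rng.normal(scale=0.3, size=(4, 8))    # input -> hidden
W_h = rng.normal(scale=0.3, size=(8, 8))     # hidden -> hidden (through time)

def run(sequence):
    h = np.zeros(8)
    for x in sequence:                        # one item of the sequence per step
        h = np.tanh(x @ W_in + h @ W_h)       # the state carries information forward
    return h

milk, cookies = rng.random(4), rng.random(4)
# Feeding "milk" then "cookies" ends in a different state than the reverse order
print(np.allclose(run([milk, cookies]), run([cookies, milk])))   # False
```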
Long / short term memory (LSTM) networks try to combat the vanishing / exploding gradient problem by introducing gates and an explicitly defined memory cell. These are inspired mostly by circuitry, not so much biology. Each neuron has a memory cell and three gates: input, output and forget. The function of these gates is to safeguard the information by stopping or allowing the flow of it. The input gate determines how much of the information from the previous layer gets stored in the cell. The output gate takes the job on the other end and determines how much of the next layer gets to know about the state of this cell. The forget gate seems like an odd inclusion at first but sometimes it’s good to forget: if it’s learning a book and a new chapter begins, it may be necessary for the network to forget some characters from the previous chapter. LSTMs have been shown to be able to learn complex sequences, such as writing like Shakespeare or composing primitive music. Note that each of these gates has a weight to a cell in the previous neuron, so they typically require more resources to run.
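A sketch of a single LSTM cell with its three gates, again using random, untrained weights purely to show the data flow:

```python
# A bare LSTM cell sketch: the input, forget and output gates decide what
# enters the memory cell, what is erased, and what is exposed onward.
import numpy as np
rng = np.random.default_rng(0)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

n_in, n_hid = 4, 8
Wi, Wf, Wo, Wc = [rng.normal(scale=0.3, size=(n_in + n_hid, n_hid)) for _ in range(4)]

def lstm_step(x, h, c):
    z = np.concatenate([x, h])
    i = sigmoid(z @ Wi)            # input gate: how much new info to store
    f = sigmoid(z @ Wf)            # forget gate: how much old memory to erase
    o = sigmoid(z @ Wo)            # output gate: how much of the cell to reveal
    c = f * c + i * np.tanh(z @ Wc)
    h = o * np.tanh(c)
    return h, c

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.random((5, n_in)):    # run a short sequence through the cell
    h, c = lstm_step(x, h, c)
print(h.round(2))
```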
Gated recurrent units (GRU) are a slight variation on LSTMs. They have one less gate and are wired slightly differently: instead of an input, output and a forget gate, they have an update gate. This update gate determines both how much information to keep from the last state and how much information to let in from the previous layer. The reset gate functions much like the forget gate of an LSTM but it’s located slightly differently. They always send out their full state, they don’t have an output gate. In most cases, they function very similarly to LSTMs, with the biggest difference being that GRUs are slightly faster and easier to run (but also slightly less expressive). In practice these tend to cancel each other out, as you need a bigger network to regain some expressiveness which then in turn cancels out the performance benefits. In some cases where the extra expressiveness is not needed, GRUs can outperform LSTMs.
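And the corresponding GRU cell, with random weights, showing how the update and reset gates replace the LSTM’s three gates:

```python
# A GRU cell sketch: the update gate blends the old state with the candidate,
# and the reset gate decides how much past state feeds the candidate.
import numpy as np
rng = np.random.default_rng(0)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

n_in, n_hid = 4, 8
Wz, Wr, Wh = [rng.normal(scale=0.3, size=(n_in + n_hid, n_hid)) for _ in range(3)]

def gru_step(x, h):
    z = sigmoid(np.concatenate([x, h]) @ Wz)              # update gate
    r = sigmoid(np.concatenate([x, h]) @ Wr)              # reset gate
    h_cand = np.tanh(np.concatenate([x, r * h]) @ Wh)     # candidate state
    return (1 - z) * h + z * h_cand                       # full state is sent out

h = np.zeros(n_hid)
for x in rng.random((5, n_in)):
    h = gru_step(x, h)
print(h.round(2))
```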

Neural Turing machines (NTM) can be understood as an abstraction of LSTMs and an attempt to un-black-box neural networks (and give us some insight in what is going on in there). Instead of coding a memory cell directly into a neuron, the memory is separated. It’s an attempt to combine the efficiency and permanency of regular digital storage and the efficiency and expressive power of neural networks. The idea is to have a content-addressable memory bank and a neural network that can read and write from it. The “Turing” in Neural Turing Machines comes from them being Turing complete: the ability to read and write and change state based on what it reads means it can represent anything a Universal Turing Machine can represent.

Bidirectional recurrent neural networks, bidirectional long / short term memory networks and bidirectional gated recurrent units (BiRNN, BiLSTM and BiGRU respectively) are not shown on the chart because they look exactly the same as their unidirectional counterparts. The difference is that these networks are not just connected to the past, but also to the future. As an example, unidirectional LSTMs might be trained to predict the word “fish” by being fed the letters one by one, where the recurrent connections through time remember the last value. A BiLSTM would also be fed the next letter in the sequence on the backward pass, giving it access to future information. This trains the network to fill in gaps instead of advancing information, so instead of expanding an image on the edge, it could fill a hole in the middle of an image.
Deep residual networks (DRN) are very deep FFNNs with extra connections passing input from one layer to a later layer (often 2 to 5 layers) as well as to the next layer. Instead of trying to find a solution for mapping some input to some output across say 5 layers, the network is forced to learn to map some input to some output + some input. Basically, it adds an identity to the solution, carrying the older input over and serving it freshly to a later layer. It has been shown that these networks are very effective at learning patterns up to 150 layers deep, much more than the regular 2 to 5 layers one could expect to train. However, it has been proven that these networks are in essence just RNNs without the explicit time-based construction, and they’re often compared to LSTMs without gates.
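The residual idea boils down to “add the input back in”; a sketch with random weights:

```python
# A sketch of the residual idea: each block learns a correction on top of an
# identity path, so the old input is carried over and added back in.
import numpy as np
rng = np.random.default_rng(0)

def residual_block(x, W1, W2):
    out = np.maximum(0, x @ W1) @ W2   # the learned mapping f(x)
    return out + x                     # plus the identity: f(x) + x

x = rng.random(16)
weights = [(rng.normal(scale=0.05, size=(16, 16)),
            rng.normal(scale=0.05, size=(16, 16))) for _ in range(10)]
for W1, W2 in weights:                 # stack many blocks without losing the signal
    x = residual_block(x, W1, W2)
print(round(float(np.linalg.norm(x)), 2))
```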
Echo state networks (ESN) are yet another different type of (recurrent) network. This one sets itself apart from others by having random connections between the neurons (i.e. not organised into neat sets of layers), and they are trained differently. Instead of feeding input and back-propagating the error, we feed the input, forward it and update the neurons for a while, and observe the output over time. The input and the output layers have a slightly unconventional role as the input layer is used to prime the network and the output layer acts as an observer of the activation patterns that unfold over time. During training, only the connections between the observer and the (soup of) hidden units are changed.
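An echo state sketch on an invented next-value prediction task: the reservoir weights stay fixed and only the linear readout is fitted, here with ridge regression:

```python
# An echo state network sketch: a fixed random reservoir is driven by the
# input, and only the linear readout on the observed states is trained.
import numpy as np
rng = np.random.default_rng(0)

n_res = 50
W_res = rng.normal(scale=0.1, size=(n_res, n_res))   # random, untrained reservoir
W_in = rng.normal(scale=0.5, size=(1, n_res))

u = np.sin(np.linspace(0, 20, 300))                  # input signal
target = np.roll(u, -1)                              # task: predict the next value

states = np.zeros((len(u), n_res))
x = np.zeros(n_res)
for t, ut in enumerate(u):
    x = np.tanh(ut * W_in.ravel() + x @ W_res)       # update the "soup" of units
    states[t] = x

# Train only the readout (ridge regression from observed states to target)
ridge = 1e-6 * np.eye(n_res)
W_out = np.linalg.solve(states.T @ states + ridge, states.T @ target)
print(np.abs(states @ W_out - target).mean())
```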
Extreme learning machines (ELM) are basically FFNNs but with random connections. They look very similar to LSMs and ESNs, but they are used more like FFNNs. This is not just because they are not recurrent nor spiking, but also because of how they are trained: the randomly initialised hidden weights are never updated, and only the output weights are fitted, typically in a single least-squares step rather than by back-propagating errors through the entire network.
Liquid state machines (LSM) are similar soups, looking a lot like ESNs. The real difference is that LSMs are a type of spiking neural network: sigmoid activations are replaced with threshold functions and each neuron is also an accumulating memory cell. So when updating a neuron, the value is not set to the sum of the neighbours, but rather added to itself. Once the threshold is reached, it releases its energy to other neurons. This creates a spiking-like pattern, where nothing happens for a while until a threshold is suddenly reached.
Support vector machines (SVM) find optimal solutions for classification problems. Classically they were only capable of categorising linearly separable data; say finding which images are of Garfield and which of Snoopy, with any other outcome not being possible. During training, SVMs can be thought of as plotting all the data (Garfields and Snoopys) on a graph (2D) and figuring out how to draw a line between the data points. This line would separate the data, so that all Snoopys are on one side and the Garfields on the other. This line moves to an optimal line in such a way that the margins between the data points and the line are maximised on both sides. Classifying new data would be done by plotting a point on this graph and simply looking on which side of the line it is (Snoopy side or Garfield side). Using the kernel trick, they can be taught to classify n-dimensional data. This entails plotting points in a 3D plot, allowing it to distinguish between Snoopy, Garfield AND Simon’s cat, or even higher dimensions distinguishing even more cartoon characters. SVMs are not always considered neural networks.
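A short example using scikit-learn (assumed to be available), with two invented toy clusters standing in for the Garfields and Snoopys:

```python
# An SVM sketch: fit a maximum-margin classifier on two toy clusters, then
# classify new points by which side of the decision boundary they fall on.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
garfield = rng.normal(loc=[0, 0], scale=0.5, size=(20, 2))
snoopy = rng.normal(loc=[3, 3], scale=0.5, size=(20, 2))
X = np.vstack([garfield, snoopy])
y = [0] * 20 + [1] * 20

clf = SVC(kernel="linear").fit(X, y)           # the kernel trick allows "rbf" etc.
print(clf.predict([[0.2, 0.1], [2.8, 3.1]]))   # expected: [0 1]
```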
And finally, Kohonen networks (KN, also self organising (feature) map, SOM, SOFM) “complete” our zoo. KNs utilise competitive learning to classify data without supervision. Input is presented to the network, after which the network assesses which of its neurons most closely match that input. These neurons are then adjusted to match the input even better, dragging along their neighbours in the process. How much the neighbours are moved depends on the distance of the neighbours to the best matching units. KNs are sometimes not considered neural networks either.
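A tiny 1-D self-organising map sketch on toy data: find the best matching unit for each input and drag it, together with its neighbours, towards that input:

```python
# A Kohonen / self-organising map sketch: the best matching unit and its
# neighbours on the map are pulled towards each presented input.
import numpy as np
rng = np.random.default_rng(0)

n_units, dim = 10, 2
units = rng.random((n_units, dim))            # a 1-D map of 10 units
data = rng.random((200, dim))

for epoch in range(30):
    lr = 0.5 * (1 - epoch / 30)                              # learning rate decays
    radius = 3 * (1 - epoch / 30) + 1                        # neighbourhood shrinks
    for x in data:
        bmu = np.argmin(np.linalg.norm(units - x, axis=1))   # best matching unit
        dist = np.abs(np.arange(n_units) - bmu)              # distance along the map
        pull = lr * np.exp(-(dist ** 2) / (2 * radius ** 2)) # neighbours move less
        units += pull[:, None] * (x - units)

print(units.round(2))
```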
Any feedback and criticism is welcome. At the Asimov Institute we do deep learning research and development, so be sure to follow us on twitter for future updates and posts! Thank you for reading!

Tuesday, September 13, 2016

World’s First Operation Inside Eye Using a Robot

University of Oxford surgeons at Oxford’s John Radcliffe Hospital have performed the world’s first operation inside the eye using a robot.
Robert MacLaren, Professor of Ophthalmology, assisted by Dr Thomas Edwards, Nuffield Medical Fellow, used the remotely controlled robot to lift a membrane a 100th of a millimetre thick from the retina at the back of the right eye of the Revd Dr William Beaver, 70, an Associate Priest at St Mary the Virgin, Iffley, Oxford. He is the first patient ever to undergo this experimental procedure.
The Robotic Retinal Dissection Device (R2D2) trial is sponsored by the University of Oxford and funded by the NIHR Oxford Biomedical Research Centre with support from Oxford University Hospitals NHS Foundation Trust, which runs the hospital. Additional funding was provided by Zizoz, a Dutch charity for patients with choroideremia, a genetic form of blindness.
The robot needs to operate inside the eye through a single hole that is less than 1 mm in diameter and it needs to go in and out of the eye through this same hole during various steps of the procedure, even though the eye may rotate.
The device is designed to eliminate unwanted tremors in the surgeon’s hand – such as through their pulse – so tiny surgical manipulations can be safely carried out within the eye.
The robot acts like a mechanical hand with seven independent computer-controlled motors, resulting in movements as precise as a 1,000th of a millimetre in scale.
In the case of Father Beaver, the patient for this first operation, a membrane growing on the surface of his retina had contracted and pulled it into an uneven shape. This leads to a distorted image, like looking in a hall of mirrors at a fairground. The membrane is about a 100th of a millimetre thick and needed to be dissected off the retina without damaging it.
Surgeons can just about do this by slowing their pulse and timing movements between heart beats, but the robot could make it much easier. Moreover, the robot could enable new, high-precision procedures that are currently out of the reach of the human hand.
Robot-assisted eye surgery: Professor Robert MacLaren steers the robot in its first live operation. Image credit: Oxford University Hospitals NHS Foundation Trust.
The surgeon uses a joystick and touchscreen outside the eye to control the robot whilst monitoring its progress through the operating microscope. This gives the surgeon a notable advantage as significant movements of the joystick result in tiny movements of the robot.
Whilst robots have been developed for large scale surgery, such as in the abdomen, until now no device has been available that achieves the three-dimensional precision required to operate inside the human eye. The device has been developed by Preceyes BV, a Dutch medical robotics firm established by the University of Eindhoven. Over the last 18 months, the Preceyes engineers and the team at the University of Oxford’s Nuffield Laboratory of Ophthalmology have worked together to plan this landmark clinical trial. This has resulted in the world’s first robotic surgery inside the human eye.
On completing the operation, Professor Robert MacLaren said: ‘There is no doubt in my mind that we have just witnessed a vision of eye surgery in the future.
‘Current technology with laser scanners and microscopes allows us to monitor retinal diseases at the microscopic level, but the things we see are beyond the physiological limit of what the human hand can operate on. With a robotic system, we open up a whole new chapter of eye operations that currently cannot be performed.’
Speaking at his follow up visit at the Oxford Eye Hospital, Father Beaver said, ‘My sight is coming back. I am delighted that my surgery went so well and I feel honoured to be part of this pioneering research project.’

Professor MacLaren added, ‘This will help to develop novel surgical treatments for blindness, such as gene therapy and stem cells, which need to be inserted under the retina with a high degree of precision.’
The current robotic eye surgery trial will involve 12 patients in total and involves operations of increasing complexity. In the first part of the trial, the robot is used to peel membranes off the delicate retina without damaging it. If this part is successful, as has been the case so far, the second phase of the trial will assess how the robot can place a fine needle under the retina and inject fluid through it. This will lead to use of the robot in retinal gene therapy, a promising new treatment for blindness which is currently being trialled in a number of centres around the world. This follows on from the successful gene therapy trials led by researchers at the Oxford Eye Hospital and includes developing treatments for retinitis pigmentosa, a genetic condition that is one of the most common causes of blindness in young people, and age-related macular degeneration, which affects the older age group.

So who put the cyber into cybersex?

General Electric  engineer Ralph Mosher using a robotic exoskeleton he developed in the 1950s.
Where did the “cyber” in “cyberspace” come from? Most people, when asked, will probably credit William Gibson, who famously introduced the term in his celebrated 1984 novel, Neuromancer. It came to him while watching some kids play early video games. Searching for a name for the virtual space in which they seemed immersed, he wrote “cyberspace” in his notepad. “As I stared at it in red Sharpie on a yellow legal pad,” he later recalled, “my whole delight was that it meant absolutely nothing.”
How wrong can you be? Cyberspace turned out to be the space that somehow morphed into the networked world we now inhabit, and which might ultimately prove our undoing by making us totally dependent on a system that is both unfathomably complex and fundamentally insecure. But the cyber- prefix actually goes back a long way before Gibson – to the late 1940s and Norbert Wiener’s book, Cybernetics, Or Control and Communication in the Animal and the Machine, which was published in 1948.
Cybernetics was the term Wiener, an MIT mathematician and polymath, coined for the scientific study of feedback control and communication in animals and machines. As a “transdiscipline” that cuts across traditional fields such as physics, chemistry and biology, cybernetics had a brief and largely unsuccessful existence: few of the world’s universities now have departments of cybernetics. But as Thomas Rid’s absorbing new book, The Rise of the Machines: The Lost History of Cybernetics shows, it has had a long afterglow as a source of mythic inspiration that endures to the present day.



This is because at the heart of the cybernetic idea is the proposition that the gap between animals (especially humans) and machines is much narrower than humanists believe. Its argument is that if you ignore the physical processes that go on in the animal and the machine and focus only on the information loops that regulate these processes in both, you begin to see startling similarities. The feedback loops that enable our bodies to maintain an internal temperature of 37C, for example, are analogous to the way in which the cruise control in our cars operates.
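As a toy version of the cruise-control analogy (not from the book; the numbers are invented), a proportional feedback loop looks like this:

```python
# A toy feedback loop in the spirit of the cruise-control analogy above:
# measure the error between target and actual speed and feed a correction
# back in each step (values are purely illustrative).
target = 100.0                      # desired speed, km/h
speed = 80.0
gain = 0.3                          # how strongly the controller reacts

for step in range(15):
    error = target - speed          # feedback: compare output with the set point
    speed += gain * error           # corrective action closes the gap
print(round(speed, 1))              # settles close to the target
```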
Dr Rid is a reader in the war studies department of King’s College London, which means that he is primarily interested in conflict, and as the world has gone online he has naturally been drawn into the study of how conflict manifests itself in the virtual world. When states are involved in this, we tend to call it “cyberwarfare”, a term of which I suspect Rid disapproves – on the grounds that warfare is intrinsically “kinetic” (like Assad’s barrel bombs) – whereas what’s going on in cyberspace is much more sinister, elusive and intractable.



In order to explain how we’ve got so far out of our depth, Rid has effectively had to compose an alternative history of computing. And whereas most such histories begin with Alan Turing and Claude Shannon and John von Neumann, Rid starts with Wiener and wartime research into gunnery control. For him, the modern world of technology begins not with the early digital computers developed at Bletchley Park, Harvard, Princeton and the University of Pennsylvania but with the interactive artillery systems developed for the US armed forces by the Sperry gyroscope company in the early 1940s.
From this unexpected beginning, Rid weaves an interesting and original story. The seed crystal from which it grows is the idea that the Sperry gun-control system was essentially a way of augmenting the human gunner’s capabilities to cope with the task of hitting fast-moving targets. And it turns out that this dream of technology as a way of augmenting human capabilities is a persistent – but often overlooked – theme in the evolution of computing.


 A mechanical dog manufactured by robot maker Boston Dynamics. Cybernetics proposes that the gap between humans and their machines is much narrower than humanists believe. Photograph: Boston Dynamics

The standard narrative about the technology’s history focuses mostly on technical progress – processing power, bandwidth, storage, networking, etc. It’s about machines and applications, companies and fortunes. The underlying assumption is that the technology is empowering – which of course in principle it can be. What, after all, is the web but a memory aid for people? What the dominant narrative conveniently ignores, though, is that the motive force for most tech industry development is not human empowerment but profit. Which is why Facebook wants its 1.7 billion users to stay within its walled garden rather than simply being empowered by the open web.
The dream of computing as a way of augmenting human capabilities, however, takes empowerment seriously rather than using it as a cover story. It is, for example, what underpinned the life’s work of Douglas Engelbart, the man who came up with the computer mouse and the windowing interface that we use today. And it motivated JCR Licklider, the psychologist who was, in a way, the godfather of the internet and whose paper Man-Computer Symbiosis is one of the canonical texts in the augmentation tradition. Even today, a charitable interpretation of the Google Glass project would place it firmly in the same tradition. Ditto for virtual reality (VR).



Given that he starts from cybernetics, the trajectory of Rid’s narrative makes sense. It takes him into the origins of the concept of the “cyborg” – the notion of adapting humans to their surroundings rather than the other way round – an idea that was first explored by Nasa and the US military. Thence he moves into the early history of automation, and startling tales about ambitious early attempts to create robots that might be useful in combat. In 1964, for example, US army contractors built the Pedipulator, an 18ft tall mechanical figure that “looked like a prototype of a Star Wars biped”. The idea was to create some kind of intelligent full-body armour that would turn troops, in effect, into walking tanks.
From there, it’s just a short leap to virtual reality – also, incidentally, first invented by the US military in the early 1980s. Rid’s account of the California counter-culture’s obsession with VR is fascinating, and includes the revelation that Timothy Leary, the high priest of LSD, was an early evangelist. Leary and co thought that VR was better than LSD because it was inherently social whereas an LSD trip was just chemically induced isolation. Then Rid moves on to the arrival of public-key cryptography, which put military-grade encryption into the hands of citizens for the first time (and which had been secretly invented at GCHQ, so one can imagine its discombobulation when civilian geeks independently came up with it).
The final substantive chapter of Rise of the Machines is about conflict in cyberspace, and contains the first detailed account I’ve seen of the “Moonlight Maze” attack on US networks. Rid describes this as “the biggest and most sophisticated computer network attack made against the United States in history”. It happened in 1996, which means that it belongs in prehistory by internet timescales. And it originated in Russia. The attack was breathtaking in its ambition and comprehensiveness. But it was probably small beer compared with what goes on now, especially given that China has entered the cyberfray.
In some ways, Rid’s chapter on conflict in cyberspace seems orthogonal to his main story, which is about how Wiener’s vision of cybernetics functioned as an inspirational myth for innovators who were interested in what Licklider and Engelbart thought of as “man-machine symbiosis” and human augmentation. If this absorbing, illuminating book needs a motto, it is an aphorism of Marshall McLuhan’s friend, John Culkin. “We shape our tools”, he wrote, “and thereafter our tools shape us.”



Thomas Rid Q&A: ‘Politicians would say “cyber” and roll their eyes’



 Thomas Rid: ‘Our temptation to improve ourselves through our own machines is hardwired into who we are as humans.’ Photograph: Flickr

How did you become interested in cybernetics? 
The short word “cyber” seemed everywhere, slapped in front of cafes, crime, bullying, war, punk, even sex. Journalists and politicians and academics would say “cyber” and roll their eyes at it. Sometimes they would ask where the funny phrase actually came from. So every time my boss introduced me as, “Hey, this is Thomas, he’s our cyber expert,” I cringed. So I thought I should write a book. Nobody, after all, had properly connected today’s “cyber” to its historic ancestor, cybernetics.
Initially I wanted to do a polemic. But then I presented some of the history at Royal Holloway, and to my surprise, some of the computer science students warmed to “cyber” after my talk, appreciating the idea’s historical and philosophical depth. So I thought, yes, let’s do this properly.
You teach in a department of war studies, so I can see that cyberwar might be your thing. But you decided that you needed to go way back – not only to Norbert Wiener and the original ideas of cybernetics, but also to the counter-cultural background, to personal computing, virtual reality (VR) and computer conferencing. Why? 
War studies, my department, is an open tent. Crossing disciplinary boundaries and adding historical and conceptual depth is what we do. So “machines” fits right in. I think understanding our fascination with communication and control today requires going back to the origins, to Wiener’s cybernetic vision after the second world war. Our temptation to improve ourselves through our own machines – “big brains” in the 50s, or artificial intelligence today – is hardwired into who we are as humans. We don’t just want to play God, we want to beat God, building artificial intelligence that’s better than the non-artificial kind. This hubris will never go away. So one of our best insurances is to study the history of cybernetic myths, the promise of the perennially imminent rise of the machines.
How long did the book take to research and write? 
It took me about three years. It wasn’t hard to stay focused – the story throughout the decades was just too gripping: here was the US air force building touch-sensitive cybernetic manipulators to refuel nuclear-powered long-range bombers, and there’s LSD guru Timothy Leary discovering the “cybernetic space” inside the machines as a mind-expansion device even better than psychedelic drugs – better, by the way, because the “machine high” was more creative and more social than getting stoned on psilocybin.

Your account of the Moonlight Maze investigation (of a full-on state-sponsored cyberattack on the US) is fascinating and scary. It suggests that – contrary to popular belief – cyberwarfare is not just a distant possibility but a baffling and terrifying reality. It is also – by your account – intractable. Aren’t we (ie society) out of our depth here? Or, at the very least, aren’t we in a position analogous to where we were with nuclear weapons in, say, 1946? 
“Cyberwar”, if you want to call it that, has been going on since at least 1996 – as I show – without interruption. In fact state-sponsored espionage, sabotage, and subversion have escalated drastically in the past two decades. But meanwhile we’ve been fooling ourselves, expecting blackouts and explosions and planes falling out of the sky as a result of cyberattacks. Physical effects happen, but have been a rare exception. What we’re seeing instead is even scarier: an escalation of cold war-style spy-versus-spy subversion and sabotage, covert and hidden and very political, not open and of a military nature, like nuclear weapons. Over the last year we have observed several instances of intelligence agencies breaching victims, stealing files, and dumping sensitive information into the public domain: often through purpose-created leak forums, or indeed through WikiLeaks.
Russian agencies have been leading this trend, most visibly by trying to influence the US election through hacking and dumping. They’re doing very creative work there. Although the forensic evidence for this activity is solid and openly available, the tactic still works impressively well. Open societies aren’t well equipped to deal with covert spin-doctoring.

One group that’s missing from your account is the engineers who sought to implement old-style cybernetic ideas in real life. For example, the Cybersyn project that Stafford Beer led in Chile for Salvador Allende. Did you think of including stuff like that? If not, why not? 
The cybernetic story is expansive. I had to leave out so much, especially in the 50s and 60s, the heyday of cybernetics. For example, the rise of cybernetics in the Soviet Union is a story in itself, and almost entirely missing from my book, as is much of the sociological work that was inspired by Norbert Wiener’s vision (much of it either dated or impenetrable). Cybersyn has been admirably covered, in detail, by Eden Medina’s Cybernetic Revolutionaries. I would also mention Ronald Kline’s recent book, The Cybernetics Moment.
We’re currently experiencing a virtual reality frenzy, with companies like Facebook and venture capitalists salivating over it as the Next Big Thing. One of the interesting parts of your story is the revelation that we have been here before – except last time, enthusiasm for VR was inextricably bound up with psychedelic drugs. Then, it was tech plus LSD; now it’s tech plus money. The same cycle applies to artificial intelligence. So cybernetics isn’t the only field to have waxed and waned. 
Absolutely not. I was often writing notes on the margins of my manuscript in Fernandez & Wells in Somerset House, where London fashion week used to happen. Technology is a bit like fashion: every few years a new craze or trend comes around, drawing much attention, money, and fresh talent. Right now, it’s automation and VR, a bit retro-60s and -90s respectively. Of course our fears and hopes aren’t just repeating the past, and the technical progress in both fields has been impressive. But we’ll move on before long, and the next tech wave will probably have a retro feature again.

At a certain moment in the book you effectively detach the prefix “cyber” from its origins in wartime MIT and the work of Norbert Wiener and use it to build a narrative about our networked and computerised existence – cyborgs, cyberspace, cyberwar etc. Your justification, as I see it, is that there was a cybernetic moment and it passed. But had you thought that a cybernetic analysis of our current plight in trying to manage cyberspace might be insightful? For example, one of the big ideas to come out of early cybernetics was Ross Ashby’s Law of Requisite Variety – which basically says that for a system to be viable it has to be able to cope with the complexity of its environment. Given what information technology has done to increase the complexity of our current environment, doesn’t that mean that most of our contemporary systems (organisations, institutions) are actually no longer viable? Or is that pushing the idea too far? 
You’re raising a fascinating question here, one that I struggled with for a long time. First, I think “cyber” detached itself from its origins, and degenerated from a scientific concept to an ideology. That shift began in the early 1960s. My book is merely chronicling this larger history, not applying cybernetics to anything. It took me a while to resist the cybernetic temptation, if you like: the old theory still has charm and seductive force left in its bones – but of course I never wanted to be a cyberneticist.