Future of AI: Reinforcement Learning corrects for Deep Learning's blind spots

Human knowledge supplements data crunching

Deep Learning, the staple of Artificial Intelligence, was supposed to create a brave new world where robots replace humans with flawless performance. The Uber autonomous car that struck and killed a woman in Arizona put a brake on those dreams for now. Reinforcement Learning, which minimizes errors by learning in real time, is the new frontier of artificial intelligence, expected to correct for Deep Learning's shortcomings.

Uber autonomous car collides with pedestrian

The video of the Arizona accident is puzzling: the car struck a woman who was walking across the road, front and center in the driver’s field of view. Deep Learning algorithms are adept at object identification, and the system did detect an object six seconds before the crash, classifying it first as an unknown object and then as a vehicle; it recognized the bicycle only 1.3 seconds before impact.

The circumstances of the crash, according to a preliminary report published by the NTSB, were unusual: the woman, who was under the influence of drugs, was not using a crosswalk, her pathway was not well lit, she wore dark clothing, and her bike did not have side reflectors. The computer system determined the need for emergency braking 1.3 seconds before the crash, but it had been programmed not to brake autonomously, to avoid the bumpy ride that false positives would cause. Instead, a safety driver was hired to exercise judgment before applying the emergency brakes, but she did not do so in time.

A reasonable surmise, in this case, is that the deep learning algorithm in the Uber car did a near-perfect job of identifying the woman’s bicycle, even though she breached traffic rules, but it was not good enough to avoid a tragedy. Regulators are unlikely to approve autonomous cars, and consumers unlikely to accept them, unless the risk of accidents is minuscule, if not non-existent.

Autonomous cars will need algorithms more precise than deep learning algorithms

Chinks in Deep Learning

As it turns out, Deep Learning has systemic limitations in data crunching that surface when it encounters outliers, precluding a perfect job, according to a talk by Prof. Amnon Shashua, Co-founder, CTO, and Chairman of Mobileye. Deep Learning takes a mass of amorphous driving data and crunches it to recommend driving controls. It is not situationally aware of vehicle types, weather conditions, human behavior, traffic signs, gaps in traffic, or geo-location. Consequently, it does not have large enough data sets for rare events and their potential risks.

Major storms, for example, are rare and create unusual situations on roads, such as pools of water hazardous to driving. Training Deep Learning algorithms will need large volumes of data on such storms, including specific types of driving behavior in these conditions. The combined probability of a severe storm and risky driving behavior is rarer still, and the data volumes needed to train Deep Learning algorithms correspondingly higher.
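The arithmetic behind this can be made concrete. The sketch below (with hypothetical probabilities, not figures from the article) shows how the data needed to observe a rare event a fixed number of times scales inversely with the event's probability, so combined rare events quickly become prohibitive:

```python
# Illustrative sketch with hypothetical numbers: the driving data needed to
# observe a rare event a fixed number of times scales inversely with the
# event's probability, assuming independent samples.

def samples_needed(event_probability: float, target_observations: int = 1000) -> int:
    """Expected number of driving samples needed to observe the event
    `target_observations` times."""
    return round(target_observations / event_probability)

p_storm = 1e-4                   # hypothetical: fraction of trips in a severe storm
p_risky = 1e-3                   # hypothetical: fraction of trips with risky driving
p_combined = p_storm * p_risky   # both at once: 1e-7

print(samples_needed(p_storm))     # trips needed for storms alone
print(samples_needed(p_combined))  # trips needed for the combined event
```

Halving the event probability doubles the required data, which is why compound rare events dominate the training cost.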

Deep Learning algorithms have inherent limitations as their learning depends entirely on data crunching

Altogether, data on each individual type of rare event adds up to staggering volumes. Beyond a point, the growing number of neural-network layers needed to process the expanding volumes of data causes computational complexity that yields diminishing returns. Increasingly, experts in artificial intelligence are looking to break the overall data-processing problem into smaller components, each categorized by humans to reduce the effort that machines must make.

Reinforcement Learning

We spoke to Prashant Trivedi, Co-founder and CMO of AlphaICs, based in the San Francisco Bay Area, and Vishal Chatrath, Co-founder and CEO of Cambridge, England-based PROWLER.io, about the advances in artificial intelligence that will improve on deep learning for achieving desired outcomes in dynamic environments without making impossible demands on data. AlphaICs, which is designing chipsets for the purpose, is led by Nagendra Nagaraja, previously a chip lead at Nvidia, and has Vinod Dham, of Pentium chip fame, as another co-founder. PROWLER.io is writing the corresponding analytical algorithms and building its VUKU platform.

Nagendra Nagaraja, CEO and Co-founder, AlphaICs

“Keener learning becomes possible when the larger problem of autonomous driving is sub-divided into components, such as pedestrian behavior or the driving patterns of trucks, and assigned to multiple agents. Our chip incorporates agents configured as Deep Learning agents or Decision-Making agents. Deep Learning agents perceive the world around the vehicle, while Decision-Making agents control it. This will be a significant improvement over the current rule-based decision making in autonomous cars,” Mr. Prashant Trivedi explained.

In a state of flux, common while driving, deep learning is supplemented by reinforcement learning; agents look to achieve an objective, such as minimizing accidents, and align their decisions to that purpose. “Each agent makes a sequence of decisions, in response to environmental conditions, consonant with the ultimate objective, and earns rewards for achieving them,” Prashant Trivedi added. “Conversely, policies are corrected when outcomes fall short of the objectives, and the revisions are shared on the cloud.”
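The reward-and-correction loop Trivedi describes is the core of reinforcement learning. A minimal sketch, using generic tabular Q-learning (the states, actions, and rewards below are hypothetical stand-ins, not AlphaICs' actual algorithm):

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch of the reward-driven loop described
# above. States, actions, and reward values are hypothetical illustrations.

ACTIONS = ["brake", "coast", "accelerate"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2   # learning rate, discount, exploration

q = defaultdict(float)  # (state, action) -> estimated long-run reward

def choose_action(state: str) -> str:
    if random.random() < EPSILON:                       # occasionally explore
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])    # otherwise exploit policy

def update(state: str, action: str, reward: float, next_state: str) -> None:
    # Correct the policy when the observed outcome differs from the estimate.
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])

# One step: the agent sees a pedestrian, brakes, and avoids a collision (+1 reward).
update("pedestrian_ahead", "brake", reward=1.0, next_state="clear_road")
```

Each call to `update` nudges the policy toward decisions that earned rewards, which is the "corrections are made in the policies" step in the quote above.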

Beyond GPUs

Chip technologies are evolving to meet the needs of reinforcement learning. “Current technologies require a GPU for each of the agents, while we have a chipset that accommodates multiple agents, up to sixty-four, in a single chip programmed for their expected behavior at the outset,” Prashant Trivedi said. Each agent on the chip can be configured as a group of tensors. AlphaICs has invented the Single Instruction Multiple Agents (SIMA) chipset, by which these agents can also be grouped together for a common action, providing a huge degree of parallelism. These chips are designed not only to process large volumes of data, as with Deep Learning, but are also amenable to control, i.e., they can be instructed to use a variety of algorithms for individual agents or a collection of them. AlphaICs has an agreement with a Tier 1 automotive company to test its chip soon.
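The "single instruction, multiple agents" idea can be sketched in software: one instruction is broadcast to a group of agents, each applying it to its own local state, analogous to SIMD but at the granularity of agents. The agent structure and instruction name below are hypothetical illustrations, not AlphaICs' actual SIMA instruction set:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the SIMA idea: one instruction stream drives a
# group of agents in lockstep, each interpreting it against local state.

@dataclass
class Agent:
    name: str
    state: dict = field(default_factory=dict)

    def execute(self, instruction: str, operand: float) -> None:
        # Each agent applies the shared instruction to its own state.
        if instruction == "SCALE_CONFIDENCE":
            self.state["confidence"] = self.state.get("confidence", 1.0) * operand

def broadcast(group: list, instruction: str, operand: float) -> None:
    # A single instruction is issued once and executed by every agent
    # in the group, which is the source of the parallelism.
    for agent in group:
        agent.execute(instruction, operand)

# Sixty-four agents on one "chip", driven by a single instruction.
agents = [Agent(f"agent_{i}", {"confidence": 1.0}) for i in range(64)]
broadcast(agents, "SCALE_CONFIDENCE", 0.5)
```

In hardware the broadcast would happen in parallel rather than in a loop; the point is that one instruction, not sixty-four, drives the whole group.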

Decision Theory

Vishal Chatrath explained the foundational principles of decision-making theory that inform the mathematical models undergirding the AI algorithms at PROWLER.io. Agents make decisions with limited information: “Car driving in a crowded metropolitan area is an example of a scenario where drivers react in real-time, respond to the changing environment on roads, with limited information gained from a view of the vehicles in the line-of-sight of a driver,” Vishal Chatrath said. Drivers instinctively learn to navigate with “mental models, probabilistic in nature, that are a guide to the odds of the outcomes of decisions they make as they navigate their way, whatever the vicissitudes of the traffic flow.”

Drivers are aware that the behavior of other vehicles influences their decisions; each acts with bounded rationality, without knowing ahead of time how the others will react. Yet they learn to find a safe way to their destination. “The process of charting a pathway, iteratively with the feedback received from experience, is reinforcement learning,” Chatrath concluded.

PROWLER.io’s algorithm, in short, has a baseline probabilistic model, built on historical traffic data, which it augments with a theory of decision making, constrained by the limited information and bounded rationality of humans, to improve estimates of actual outcomes. Reinforcement learning then corrects in real time for errors in the data and for imperfect human decisions.
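The combination of a historical baseline with real-time correction can be illustrated with a one-line Bayesian update: start from a prior probability learned from past data, then revise it as observations arrive. The numbers below are hypothetical, not PROWLER.io's model:

```python
# Hedged sketch: a baseline (prior) probability from historical traffic
# data is corrected in real time with Bayes' rule as observations arrive.
# All probabilities are hypothetical illustrations.

def bayes_update(prior: float, likelihood_if_true: float,
                 likelihood_if_false: float) -> float:
    """Posterior probability of an event after one observation."""
    numerator = likelihood_if_true * prior
    evidence = numerator + likelihood_if_false * (1.0 - prior)
    return numerator / evidence

# Baseline from historical data: a 2% chance a pedestrian crosses here.
prior = 0.02

# A sensor reading that is ten times more likely if a pedestrian is present.
posterior = bayes_update(prior, likelihood_if_true=0.9, likelihood_if_false=0.09)
print(posterior)  # the estimate rises sharply after the observation
```

A probabilistic model of this kind never has to reprocess the historical data; each observation only adjusts the current estimate, which foreshadows the computing-cost point Chatrath makes in the interview below.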

We raised the issue of the Tesla crash into a truck, when the car mistook the truck’s white surface for a cloud. “Current deep learning algorithmic routines use hand-crafted if-then statements. We are replacing these simple rules with autonomous AI agents capable of considering multiple parameters in the environment which, in combination, help to arrive at accurate conclusions. If a stop sign has faded, for example, the autonomous agent looks at other factors, such as the rate of flow of the traffic, to stop the car regardless,” Vishal Chatrath said.
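The faded-stop-sign example amounts to replacing a single if-then rule with evidence fusion across several cues. A minimal sketch, in which the signal names and weights are hypothetical illustrations rather than PROWLER.io's actual parameters:

```python
# Hypothetical sketch of multi-signal decision fusion: instead of a single
# if-then rule on sign recognition, the agent weighs several environmental
# cues and stops when the combined evidence is sufficient.

def should_stop(sign_confidence: float,
                cross_traffic_slowing: bool,
                other_cars_stopping: bool) -> bool:
    score = sign_confidence      # direct (possibly faded) sign evidence
    if cross_traffic_slowing:
        score += 0.4             # indirect cue: the flow of traffic
    if other_cars_stopping:
        score += 0.4             # indirect cue: peer vehicle behavior
    return score >= 0.5          # stop when combined evidence suffices

# A badly faded sign alone would be missed, but context recovers the decision.
print(should_stop(0.2, cross_traffic_slowing=True, other_cars_stopping=False))  # True
print(should_stop(0.2, cross_traffic_slowing=False, other_cars_stopping=False)) # False
```

A hand-crafted rule (`if sign_detected: stop`) fails on the faded sign; the fused score degrades gracefully because no single cue is decisive.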

Going beyond the data crunching of deep learning, and aided by the programmability of chips, a wider variety of theoretical and mathematical modeling can be tested to obtain progressively better results. “Deep Learning is a method of statistical modeling with limited capabilities, such as image recognition, which is unlikely to achieve accuracies beyond eighty percent or so. We are creating a computer language that executes mathematical and theoretical knowledge on a machine accurately. Arithmetic, for example, can only be right or wrong,” said Stephen Wolfram, a mathematician, computer scientist, and businessman.

While the achievements of deep learning are significant, they are only a start. The full promise of artificial intelligence is a long way from its realization, and human knowledge will be a catalyst in achieving much of it.

Vishal Chatrath, Co-founder and CEO, PROWLER.io


Introduction: Kishore Jethanandani, editor of FuturistLens, sat down with Vishal Chatrath, Co-founder and CEO of PROWLER.io at the Collision Conference in New Orleans in early May 2018 to discuss the business applications of decision modelling and reinforcement learning.

Individual departments of corporations do not necessarily operate in consonance with the overall strategic objectives of the company. Decision modelling helps to align the interests of individual departments and the company management with rewards.

FuturistLens: How have you used decision models, described earlier in the context of autonomous cars, in a business context?

Vishal Chatrath: We have used our decision models to craft marketing strategies for consumer product companies. Typically, consumer products companies want to increase their market share which they often do by lowering prices. However, competitors tend to match the price drops which only reduces the margins on both sides in a mutually destructive race to the bottom. Conversely, companies also want to consider the negative impact of the price drop on their brand value. A higher price can increase market share when consumers see it as a measure of quality. Furthermore, any decision taken by the product team has an impact downstream on the supply chain.

FL: Are you able to share a case study where the introduction of this kind of modelling changed the way a company did business?

VC: We have started working with a Fortune 500 company whose marketing decisions are distributed over two dozen themes, each of which runs independently of the others and focuses on individual products or segments of the company’s business. The goal of our team is to model for the company by considering the interactions of decisions for each product as well as the impacts on the supply chain. It will then be possible to simulate a broader variety of scenarios and trade-offs of decision options for the company’s marketing strategy.

FL: How did the company come to see a benefit in looking at the information of the entire company instead of individual teams?

VC: One of my colleagues had worked on a similar project with an electronics company with six hundred product variants. He had taken a portfolio approach to sales in the electronics company and reduced the marketing spend for each unit increase in market share. Based on our understanding of the business of the consumer products company, we were able to conceptually show that similar benefits were replicable in its situation. One of the advantages of company-wide marketing is that it stops departments from competing internally at the expense of the overall market share of the company while lowering marketing spend. Savings of as much as fifty million can be realized with this approach.

FL: Are you able to model in real-time to consider the impact of events or is this something you must plan ahead of time?

VC: Most businesses don’t need real-time analysis, but you do have exceptions. The imposition of tariffs or sanctions has unexpected impacts, which companies currently evaluate manually and hedge against in financial markets. With this kind of company-wide model, it is possible to evaluate the effect of unanticipated events quickly, leaving enough time for humans to contribute to decision-making. In peak seasons like Christmas, with the prospect of inclement weather and its impact on the infrastructure, companies want to react as events happen.

FL: The complex model you described here will require enormous computing power probably at the edge which is currently not available. Is it operationally possible to implement such a model?

VC: Our probabilistic and reinforcement learning models are based on historical data and learning from trial and error. Unlike Deep Learning algorithms, we don’t have to crunch all the data each time. We simulate live situations, analyze anomalies, and modify the model as we learn more. For this reason, our need for computing resources is lower by an order of magnitude.