Heuristics Archives - Got help?

In Greek mythology, Hades

(the god of the underworld) was the fairest of the gods. He would set people tasks that could genuinely be accomplished, and if they completed them, he held up his end of the bargain. Zeus, on the other hand, was an adulterer, and because of his infidelity, innocent people suffered from Hera’s wrath.

Is AI Riding a One-Trick Pony?

Just about every AI advance you’ve heard of depends on a breakthrough that’s three decades old. Keeping up the pace of progress will require confronting AI’s serious limitations.

I’m standing in what is soon to be the center of the world, or is perhaps just a very large room on the seventh floor of a gleaming tower in downtown Toronto. Showing me around is Jordan Jacobs, who cofounded this place: the nascent Vector Institute, which opens its doors this fall and which is aiming to become the global epicenter of artificial intelligence.

We’re in Toronto because Geoffrey Hinton is in Toronto, and Geoffrey Hinton is the father of “deep learning,” the technique behind the current excitement about AI. “In 30 years we’re going to look back and say Geoff is Einstein—of AI, deep learning, the thing that we’re calling AI,” Jacobs says. Of the researchers at the top of the field of deep learning, Hinton has more citations than the next three combined. His students and postdocs have gone on to run the AI labs at Apple, Facebook, and OpenAI; Hinton himself is a lead scientist on the Google Brain AI team. In fact, nearly every achievement in the last decade of AI—in translation, speech recognition, image recognition, and game playing—traces in some way back to Hinton’s work.

The Vector Institute, this monument to the ascent of Hinton’s ideas, is a research center where companies from around the U.S. and Canada—like Google, and Uber, and Nvidia—will sponsor efforts to commercialize AI technologies. Money has poured in faster than Jacobs could ask for it; two of his cofounders surveyed companies in the Toronto area, and the demand for AI experts ended up being 10 times what Canada produces every year. Vector is in a sense ground zero for the now-worldwide attempt to mobilize around deep learning: to cash in on the technique, to teach it, to refine and apply it. Data centers are being built, towers are being filled with startups, a whole generation of students is going into the field.

The impression you get standing on the Vector floor, bare and echoey and about to be filled, is that you’re at the beginning of something. But the peculiar thing about deep learning is just how old its key ideas are. Hinton’s breakthrough paper, with colleagues David Rumelhart and Ronald Williams, was published in 1986. The paper elaborated on a technique called backpropagation, or backprop for short. Backprop, in the words of Jon Cohen, a computational psychologist at Princeton, is “what all of deep learning is based on—literally everything.”

When you boil it down, AI today is deep learning, and deep learning is backprop—which is amazing, considering that backprop is more than 30 years old. It’s worth understanding how that happened—how a technique could lie in wait for so long and then cause such an explosion—because once you understand the story of backprop, you’ll start to understand the current moment in AI, and in particular the fact that maybe we’re not actually at the beginning of a revolution. Maybe we’re at the end of one.

Vindication

The walk from the Vector Institute to Hinton’s office at Google, where he spends most of his time (he is now an emeritus professor at the University of Toronto), is a kind of living advertisement for the city, at least in the summertime. You can understand why Hinton, who is originally from the U.K., moved here in the 1980s after working at Carnegie Mellon University in Pittsburgh.

When you step outside, even downtown near the financial district, you feel as though you’ve actually gone into nature. It’s the smell, I think: wet loam in the air. Toronto was built on top of forested ravines, and it’s said to be “a city within a park”; as it’s been urbanized, the local government has set strict restrictions to maintain the tree canopy. As you’re flying in, the outer parts of the city look almost cartoonishly lush.

Maybe we’re not actually at the beginning of a revolution.

Toronto is the fourth-largest city in North America (after Mexico City, New York, and L.A.), and its most diverse: more than half the population was born outside Canada. You can see that walking around. The crowd in the tech corridor looks less San Francisco—young white guys in hoodies—and more international. There’s free health care and good public schools, the people are friendly, and the political order is relatively left-leaning and stable; and this stuff draws people like Hinton, who says he left the U.S. because of the Iran-Contra affair. It’s one of the first things we talk about when I go to meet him, just before lunch.

“Most people at CMU thought it was perfectly reasonable for the U.S. to invade Nicaragua,” he says. “They somehow thought they owned it.” He tells me that he had a big breakthrough recently on a project: “getting a very good junior engineer who’s working with me,” a woman named Sara Sabour. Sabour is Iranian, and she was refused a visa to work in the United States. Google’s Toronto office scooped her up.

Hinton, who is 69 years old, has the kind, lean, English-looking face of the Big Friendly Giant, with a thin mouth, big ears, and a proud nose. He was born in Wimbledon, England, and sounds, when he talks, like the narrator of a children’s book about science: curious, engaging, eager to explain things. He’s funny, and a bit of a showman. He stands the whole time we talk, because, as it turns out, sitting is too painful. “I sat down in June of 2005 and it was a mistake,” he tells me, letting the bizarre line land before explaining that a disc in his back gives him trouble. It means he can’t fly, and earlier that day he’d had to bring a contraption that looked like a surfboard to the dentist’s office so he could lie on it while having a cracked tooth root examined.

In the 1980s Hinton was, as he is now, an expert on neural networks, a much-simplified model of the network of neurons and synapses in our brains. However, at that time it had been firmly decided that neural networks were a dead end in AI research. Although the earliest neural net, the Perceptron, which began to be developed in the 1950s, had been hailed as a first step toward human-level machine intelligence, a 1969 book by MIT’s Marvin Minsky and Seymour Papert, called Perceptrons, proved mathematically that such networks could perform only the most basic functions. These networks had just two layers of neurons, an input layer and an output layer. Nets with more layers between the input and output neurons could in theory solve a great variety of problems, but nobody knew how to train them, and so in practice they were useless. Except for a few holdouts like Hinton, Perceptrons caused most people to give up on neural nets entirely.

Hinton’s breakthrough, in 1986, was to show that backpropagation could train a deep neural net, meaning one with more than two or three layers. But it took another 26 years before increasing computational power made good on the discovery. A 2012 paper by Hinton and two of his Toronto students showed that deep neural nets, trained using backpropagation, beat state-of-the-art systems in image recognition. “Deep learning” took off. To the outside world, AI seemed to wake up overnight. For Hinton, it was a payoff long overdue.

Reality distortion field

A neural net is usually drawn like a club sandwich, with layers stacked one atop the other. The layers contain artificial neurons, which are dumb little computational units that get excited—the way a real neuron gets excited—and pass that excitement on to the other neurons they’re connected to. A neuron’s excitement is represented by a number, like 0.13 or 32.39, that says just how excited it is. And there’s another crucial number, on each of the connections between two neurons, that determines how much excitement should get passed from one to the other. That number is meant to model the strength of the synapses between neurons in the brain. When the number is higher, it means the connection is stronger, so more of the one’s excitement flows to the other.
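
To make that description concrete, here is a minimal sketch, not drawn from the article, of one layer's excitement numbers and the connection strengths that pass excitement to the next layer. The sizes and values are invented for illustration.

```python
import numpy as np

# Excitement levels of three neurons in one layer (arbitrary example values).
excitement = np.array([0.13, 32.39, 1.7])

# Connection strengths ("synapse" weights) from those three neurons to two
# neurons in the layer above; one row per destination neuron.
weights = np.array([[0.5, 0.01, -0.2],
                    [1.2, -0.3, 0.7]])

# Each destination neuron's excitement is the weighted sum of the excitement
# flowing in over its connections (before any squashing function is applied).
next_layer = weights @ excitement
print(next_layer)
```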

A diagram from seminal work on “error propagation” by Hinton, David Rumelhart, and Ronald Williams.

One of the most successful applications of deep neural nets is in image recognition—as in the memorable scene in HBO’s Silicon Valley where the team builds a program that can tell whether there’s a hot dog in a picture. Programs like that actually exist, and they wouldn’t have been possible a decade ago. To get them to work, the first step is to get a picture. Let’s say, for simplicity, it’s a small black-and-white image that’s 100 pixels wide and 100 pixels tall. You feed this image to your neural net by setting the excitement of each simulated neuron in the input layer so that it’s equal to the brightness of each pixel. That’s the bottom layer of the club sandwich: 10,000 neurons (100×100) representing the brightness of every pixel in the image.

You then connect this big layer of neurons to another big layer of neurons above it, say a few thousand, and these in turn to another layer of another few thousand neurons, and so on for a few layers. Finally, in the topmost layer of the sandwich, the output layer, you have just two neurons—one representing “hot dog” and the other representing “not hot dog.” The idea is to teach the neural net to excite only the first of those neurons if there’s a hot dog in the picture, and only the second if there isn’t. Backpropagation—the technique that Hinton has built his career upon—is the method for doing this.
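
A toy version of that club sandwich, with layer sizes and a sigmoid squashing function chosen purely for illustration, might look like the sketch below. The weights are still random, so its "hot dog" verdict is meaningless until backprop has trained them.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(n_in, n_out):
    # Random, as-yet-untrained connection strengths into n_out neurons.
    return rng.normal(scale=0.01, size=(n_out, n_in))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# A 100x100 grayscale image becomes the 10,000 input "neurons".
image = rng.random((100, 100))
activations = image.reshape(-1)        # bottom layer of the sandwich

# Two hidden layers of a couple thousand neurons each, then two outputs.
for w in (layer(10_000, 2_000), layer(2_000, 2_000), layer(2_000, 2)):
    activations = sigmoid(w @ activations)

hot_dog, not_hot_dog = activations     # top layer: "hot dog" vs. "not hot dog"
print(hot_dog, not_hot_dog)
```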

Backprop is remarkably simple, though it works best with huge amounts of data. That’s why big data is so important in AI—why Facebook and Google are so hungry for it, and why the Vector Institute decided to set up shop down the street from four of Canada’s largest hospitals and develop data partnerships with them.

In this case, the data takes the form of millions of pictures, some with hot dogs and some without; the trick is that these pictures are labeled as to which have hot dogs. When you first create your neural net, the connections between neurons might have random weights—random numbers that say how much excitement to pass along each connection. It’s as if the synapses of the brain haven’t been tuned yet. The goal of backprop is to change those weights so that they make the network work: so that when you pass in an image of a hot dog to the lowest layer, the topmost layer’s “hot dog” neuron ends up getting excited.

Suppose you take your first training image, and it’s a picture of a piano. You convert the pixel intensities of the 100×100 picture into 10,000 numbers, one for each neuron in the bottom layer of the network. As the excitement spreads up the network according to the connection strengths between neurons in adjacent layers, it’ll eventually end up in that last layer, the one with the two neurons that say whether there’s a hot dog in the picture. Since the picture is of a piano, ideally the “hot dog” neuron should have a zero on it, while the “not hot dog” neuron should have a high number. But let’s say it doesn’t work out that way. Let’s say the network is wrong about this picture. Backprop is a procedure for rejiggering the strength of every connection in the network so as to fix the error for a given training example.

The way it works is that you start with the last two neurons, and figure out just how wrong they were: how much of a difference is there between what the excitement numbers should have been and what they actually were? When that’s done, you take a look at each of the connections leading into those neurons—the ones in the next lower layer—and figure out their contribution to the error. You keep doing this until you’ve gone all the way to the first set of connections, at the very bottom of the network. At that point you know how much each individual connection contributed to the overall error, and in a final step, you change each of the weights in the direction that best reduces the error overall. The technique is called “backpropagation” because you are “propagating” errors back (or down) through the network, starting from the output.
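
The last three paragraphs compress into a short sketch: run examples forward, measure how wrong the two output neurons are, propagate that error back through the connections, and nudge every weight in the direction that reduces it. The tiny two-layer network and synthetic labels below are stand-ins, and the updates are batched over a small made-up dataset for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 64 tiny 8x8 "images" flattened to 64 numbers, with labels
# [1, 0] = "hot dog", [0, 1] = "not hot dog" (completely synthetic).
X = rng.random((64, 64))
Y = np.zeros((64, 2))
Y[np.arange(64), rng.integers(0, 2, 64)] = 1.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

W1 = rng.normal(scale=0.1, size=(64, 32))    # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(32, 2))     # hidden -> output weights
lr = 0.5

for step in range(500):
    # Forward pass: excitement flows up the layers.
    H = sigmoid(X @ W1)
    out = sigmoid(H @ W2)

    # How wrong were the output neurons? (squared-error loss)
    err = out - Y

    # Backward pass: each connection's contribution to the error.
    d_out = err * out * (1 - out)            # through the output squashing
    d_hidden = (d_out @ W2.T) * H * (1 - H)  # propagated down one layer

    # Final step: nudge every weight against its error gradient.
    W2 -= lr * H.T @ d_out / len(X)
    W1 -= lr * X.T @ d_hidden / len(X)

print("mean squared error:", float((err ** 2).mean()))
```

Run for enough steps, the mean squared error falls, which is all "training" means here: the weights have been rejiggered until the network's outputs match the labels.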

The incredible thing is that when you do this with millions or billions of images, the network starts to get pretty good at saying whether an image has a hot dog in it. And what’s even more remarkable is that the individual layers of these image-recognition nets start being able to “see” images in sort of the same way our own visual system does. That is, the first layer might end up detecting edges, in the sense that its neurons get excited when there are edges and don’t get excited when there aren’t; the layer above that one might be able to detect sets of edges, like corners; the layer above that one might start to see shapes; and the layer above that one might start finding stuff like “open bun” or “closed bun,” in the sense of having neurons that respond to either case. The net organizes itself, in other words, into hierarchical layers without ever having been explicitly programmed that way.
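
A hand-written edge filter gives a feel for what a first-layer neuron that "gets excited when there are edges" is computing; in a trained network, filters like this are learned rather than written down.

```python
import numpy as np

# A vertical-edge detector: responds when a dark pixel sits next to a bright one.
edge_filter = np.array([-1.0, 1.0])

# Toy 8x8 image: dark on the left half, bright on the right half.
image = np.zeros((8, 8))
image[:, 4:] = 1.0

# "Neurons" at each position get excited only where the filter finds an edge.
response = np.zeros((8, 7))
for i in range(8):
    for j in range(7):
        response[i, j] = np.sum(image[i, j:j + 2] * edge_filter)

print(response)   # a single column of 1.0s marks the dark-to-bright boundary
```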

A real intelligence doesn’t break when you slightly change the problem.

This is the thing that has everybody enthralled. It’s not just that neural nets are good at classifying pictures of hot dogs or whatever: they seem able to build representations of ideas. With text you can see this even more clearly. You can feed the text of Wikipedia, many billions of words long, into a simple neural net, training it to spit out, for each word, a big list of numbers that correspond to the excitement of each neuron in a layer. If you think of each of these numbers as a coordinate in a complex space, then essentially what you’re doing is finding a point, known in this context as a vector, for each word somewhere in that space. Now, train your network in such a way that words appearing near one another on Wikipedia pages end up with similar coordinates, and voilà, something crazy happens: words that have similar meanings start showing up near one another in the space. That is, “insane” and “unhinged” will have coordinates close to each other, as will “three” and “seven,” and so on. What’s more, so-called vector arithmetic makes it possible to, say, subtract the vector for “France” from the vector for “Paris,” add the vector for “Italy,” and end up in the neighborhood of “Rome.” It works without anyone telling the network explicitly that Rome is to Italy as Paris is to France.
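
The vector arithmetic itself is easy to demonstrate once you have word vectors. The four three-dimensional vectors below are invented stand-ins, not real Wikipedia-trained embeddings, but they reproduce the Paris - France + Italy pattern the paragraph describes.

```python
import numpy as np

# Hand-made 3-D "embeddings": one direction loosely encodes
# "country vs. capital", the others encode which country is meant.
vectors = {
    "France": np.array([1.0, 0.0, 0.0]),
    "Paris":  np.array([1.0, 0.0, 1.0]),
    "Italy":  np.array([0.0, 1.0, 0.0]),
    "Rome":   np.array([0.0, 1.0, 1.0]),
}

query = vectors["Paris"] - vectors["France"] + vectors["Italy"]

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

best = max(vectors, key=lambda w: cosine(vectors[w], query))
print(best)   # "Rome" comes out nearest to Paris - France + Italy
```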

“It’s amazing,” Hinton says. “It’s shocking.” Neural nets can be thought of as trying to take things—images, words, recordings of someone talking, medical data—and put them into what mathematicians call a high-dimensional vector space, where the closeness or distance of the things reflects some important feature of the actual world. Hinton believes this is what the brain itself does. “If you want to know what a thought is,” he says, “I can express it for you in a string of words. I can say ‘John thought, “Whoops.”’ But if you ask, ‘What is the thought? What does it mean for John to have that thought?’ It’s not that inside his head there’s an opening quote, and a ‘Whoops,’ and a closing quote, or even a cleaned-up version of that. Inside his head there’s some big pattern of neural activity.” Big patterns of neural activity, if you’re a mathematician, can be captured in a vector space, with each neuron’s activity corresponding to a number, and each number to a coordinate of a really big vector. In Hinton’s view, that’s what thought is: a dance of vectors.

Geoffrey Hinton

It is no coincidence that Toronto’s flagship AI institution was named for this fact. Hinton was the one who came up with the name Vector Institute.

There’s a sort of reality distortion field that Hinton creates, an air of certainty and enthusiasm, that gives you the feeling there’s nothing that vectors can’t do. After all, look at what they’ve been able to produce already: cars that drive themselves, computers that detect cancer, machines that instantly translate spoken language. And look at this charming British scientist talking about gradient descent in high-dimensional spaces!

It’s only when you leave the room that you remember: these “deep learning” systems are still pretty dumb, in spite of how smart they sometimes seem. A computer that sees a picture of a pile of doughnuts piled up on a table and captions it, automatically, as “a pile of doughnuts piled on a table” seems to understand the world; but when that same program sees a picture of a girl brushing her teeth and says “The boy is holding a baseball bat,” you realize how thin that understanding really is, if ever it was there at all.

Neural nets are just thoughtless fuzzy pattern recognizers, and as useful as fuzzy pattern recognizers can be—hence the rush to integrate them into just about every kind of software—they represent, at best, a limited brand of intelligence, one that is easily fooled. A deep neural net that recognizes images can be totally stymied when you change a single pixel, or add visual noise that’s imperceptible to a human. Indeed, almost as often as we’re finding new ways to apply deep learning, we’re finding more of its limits. Self-driving cars can fail to navigate conditions they’ve never seen before. Machines have trouble parsing sentences that demand common-sense understanding of how the world works.

Deep learning in some ways mimics what goes on in the human brain, but only in a shallow way—which perhaps explains why its intelligence can sometimes seem so shallow. Indeed, backprop wasn’t discovered by probing deep into the brain, decoding thought itself; it grew out of models of how animals learn by trial and error in old classical-conditioning experiments. And most of the big leaps that came about as it developed didn’t involve some new insight about neuroscience; they were technical improvements, reached by years of mathematics and engineering. What we know about intelligence is nothing against the vastness of what we still don’t know.

David Duvenaud, an assistant professor in the same department as Hinton at the University of Toronto, says deep learning has been somewhat like engineering before physics. “Someone writes a paper and says, ‘I made this bridge and it stood up!’ Another guy has a paper: ‘I made this bridge and it fell down—but then I added pillars, and then it stayed up.’ Then pillars are a hot new thing. Someone comes up with arches, and it’s like, ‘Arches are great!’” With physics, he says, “you can actually understand what’s going to work and why.” Only recently, he says, have we begun to move into that phase of actual understanding with artificial intelligence.

Hinton himself says, “Most conferences consist of making minor variations … as opposed to thinking hard and saying, ‘What is it about what we’re doing now that’s really deficient? What does it have difficulty with? Let’s focus on that.’”

It can be hard to appreciate this from the outside, when all you see is one great advance touted after another. But the latest sweep of progress in AI has been less science than engineering, even tinkering. And though we’ve started to get a better handle on what kinds of changes will improve deep-learning systems, we’re still largely in the dark about how those systems work, or whether they could ever add up to something as powerful as the human mind.

It’s worth asking whether we’ve wrung nearly all we can out of backprop. If so, that might mean a plateau for progress in artificial intelligence.

Patience

If you want to see the next big thing, something that could form the basis of machines with a much more flexible intelligence, you should probably check out research that resembles what you would’ve found had you encountered backprop in the ’80s: smart people plugging away on ideas that don’t really work yet.

A few months ago I went to the Center for Minds, Brains, and Machines, a multi-institutional effort headquartered at MIT, to watch a friend of mine, Eyal Dechter, defend his dissertation in cognitive science. Just before the talk started, his wife Amy, their dog Ruby, and their daughter Susannah were milling around, wishing him well. On the screen was a picture of Ruby, and next to it one of Susannah as a baby. When Dad asked Susannah to point herself out, she happily slapped a long retractable pointer against her own baby picture. On the way out of the room, she wheeled a toy stroller behind her mom and yelled “Good luck, Daddy!” over her shoulder. “Vámanos!” she said finally. She’s two.

“The fact that it doesn’t work is just a temporary annoyance.”

Eyal started his talk with a beguiling question: How is it that Susannah, after two years of experience, can learn to talk, to play, to follow stories? What is it about the human brain that makes it learn so well? Will a computer ever be able to learn so quickly and so fluidly?

We make sense of new phenomena in terms of things we already understand. We break a domain down into pieces and learn the pieces. Eyal is a mathematician and computer programmer, and he thinks about tasks—like making a soufflé—as really complex computer programs. But it’s not as if you learn to make a soufflé by learning every one of the program’s zillion micro-instructions, like “Rotate your elbow 30 degrees, then look down at the countertop, then extend your pointer finger, then …” If you had to do that for every new task, learning would be too hard, and you’d be stuck with what you already know. Instead, we cast the program in terms of high-level steps, like “Whip the egg whites,” which are themselves composed of subprograms, like “Crack the eggs” and “Separate out the yolks.”
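
Written as code, the point is simply that a task is expressed as a few high-level steps, each built out of reusable subprograms rather than out of elbow-rotation micro-instructions. The soufflé functions below are purely illustrative.

```python
# Low-level, reusable subprograms.
def crack_eggs(n):
    return {"yolks": n, "whites": n}

def separate_yolks(eggs):
    return eggs["whites"]

def whip(whites):
    return f"whipped {whites} egg whites"

# A high-level step defined in terms of the subprograms, not in terms
# of elbow rotations and finger extensions.
def whip_the_egg_whites(n_eggs):
    return whip(separate_yolks(crack_eggs(n_eggs)))

def make_souffle():
    return ["preheat oven", whip_the_egg_whites(4), "fold and bake"]

print(make_souffle())
```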

Computers don’t do this, and that is a big part of the reason they’re dumb. To get a deep-learning system to recognize a hot dog, you might have to feed it 40 million pictures of hot dogs. To get Susannah to recognize a hot dog, you show her a hot dog. And before long she’ll have an understanding of language that goes deeper than recognizing that certain words often appear together. Unlike a computer, she’ll have a model in her mind about how the whole world works. “It’s sort of incredible to me that people are scared of computers taking jobs,” Eyal says. “It’s not that computers can’t replace lawyers because lawyers do really complicated things. It’s because lawyers read and talk to people. It’s not like we’re close. We’re so far.”

A real intelligence doesn’t break when you slightly change the requirements of the problem it’s trying to solve. And the key part of Eyal’s thesis was his demonstration, in principle, of how you might get a computer to work that way: to fluidly apply what it already knows to new tasks, to quickly bootstrap its way from knowing almost nothing about a new domain to being an expert.

Hinton made this sketch for his next big idea, to organize neural nets with “capsules.”

Essentially, it is a procedure he calls the “exploration–compression” algorithm. It gets a computer to function somewhat like a programmer who builds up a library of reusable, modular components on the way to building more and more complex programs. Without being told anything about a new domain, the computer tries to structure knowledge about it just by playing around, consolidating what it’s found, and playing around some more, the way a human child does.

His advisor, Joshua Tenenbaum, is one of the most highly cited researchers in AI. Tenenbaum’s name came up in half the conversations I had with other scientists. Some of the key people at DeepMind—the team behind AlphaGo, which shocked computer scientists by beating a world champion player in the complex game of Go in 2016—had worked as his postdocs. He’s involved with a startup that’s trying to give self-driving cars some intuition about basic physics and other drivers’ intentions, so they can better anticipate what would happen in a situation they’ve never seen before, like when a truck jackknifes in front of them or when someone tries to merge very aggressively.

Eyal’s thesis doesn’t yet translate into those kinds of practical applications, let alone any programs that would make headlines for besting a human. The problems Eyal’s working on “are just really, really hard,” Tenenbaum said. “It’s gonna take many, many generations.”

Tenenbaum has long, curly, whitening hair, and when we sat down for coffee he had on a button-down shirt with black slacks. He told me he looks to the story of backprop for inspiration. For decades, backprop was cool math that didn’t really accomplish anything. As computers got faster and the engineering got more sophisticated, suddenly it did. He hopes the same thing might happen with his own work and that of his students, “but it might take another couple decades.”

As for Hinton, he is convinced that overcoming AI’s limitations involves building “a bridge between computer science and biology.” Backprop was, in this view, a triumph of biologically inspired computation; the idea initially came not from engineering but from psychology. So now Hinton is trying to pull off a similar trick.

Neural networks today are made of big flat layers, but in the human neocortex real neurons are arranged not just horizontally into layers but vertically into columns. Hinton thinks he knows what the columns are for—in vision, for instance, they’re crucial for our ability to recognize objects even as our viewpoint changes. So he’s building an artificial version—he calls them “capsules”—to test the theory. So far, it hasn’t panned out; the capsules haven’t dramatically improved his nets’ performance. But this was the same situation he’d been in with backprop for nearly 30 years.

“This thing just has to be right,” he says about the capsule theory, laughing at his own boldness. “And the fact that it doesn’t work is just a temporary annoyance.”

By James Somers, technologyreview.com

Image: ADAM DETOUR

Meta’s AI guru LeCun: Most of today’s AI approaches will never lead to true intelligence

Fundamental problems elude many strains of deep learning, says LeCun, including the mystery of how to measure information.

“I think AI systems need to be able to reason,” says Yann LeCun, Meta’s chief AI scientist. Today’s popular AI approaches such as Transformers, many of which build upon his own pioneering work in the field, will not be sufficient. “You have to take a step back and say, Okay, we built this ladder, but we want to go to the moon, and there’s no way this ladder is going to get us there,” says LeCun.

Yann LeCun, chief AI scientist of Meta Properties, owner of Facebook, Instagram, and WhatsApp, is likely to tick off a lot of people in his field. 

With the posting in June of a think piece on the Open Review server, LeCun offered a broad overview of an approach he thinks holds promise for achieving human-level intelligence in machines. 

Implied if not articulated in the paper is the contention that most of today’s big projects in AI will never be able to reach that human-level goal.

In a discussion this month with ZDNET via Zoom, LeCun made clear that he views with great skepticism many of the most successful avenues of research in deep learning at the moment.

“I think they’re necessary but not sufficient,” the Turing Award winner told ZDNET of his peers’ pursuits. 

Those include large language models such as the Transformer-based GPT-3 and their ilk. As LeCun characterizes it, the Transformer devotees believe, “We tokenize everything, and train gigantic models to make discrete predictions, and somehow AI will emerge out of this.”

“They’re not wrong,” he says, “in the sense that that may be a component of a future intelligent system, but I think it’s missing essential pieces.”

It’s a startling critique of what appears to work coming from the scholar who perfected the use of convolutional neural networks, a practical technique that has been incredibly productive in deep learning programs. 

LeCun sees flaws and limitations in plenty of other highly successful areas of the discipline. 

Reinforcement learning will also never be enough, he maintains. Researchers such as David Silver of DeepMind, who developed the AlphaZero program that mastered Chess, Shogi and Go, are focusing on programs that are “very action-based,” observes LeCun, but “most of the learning we do, we don’t do it by actually taking actions, we do it by observing.” 

LeCun, 62, from a perspective of decades of achievement, nevertheless expresses an urgency to confront what he thinks are the blind alleys toward which many may be rushing, and to try to coax his field in the direction he thinks things should go. 

“We see a lot of claims as to what should we do to push forward towards human-level AI,” he says. “And there are ideas which I think are misdirected.”

“We’re not to the point where our intelligent machines have as much common sense as a cat,” observes LeCun. “So, why don’t we start there?” 

He has abandoned his prior faith in using generative networks in things such as predicting the next frame in a video. “It has been a complete failure,” he says. 

LeCun decries those he calls the “religious probabilists,” who “think probability theory is the only framework that you can use to explain machine learning.” 

The purely statistical approach is intractable, he says. “It’s too much to ask for a world model to be completely probabilistic; we don’t know how to do it.”

Not just the academics, but industrial AI needs a deep re-think, argues LeCun. The self-driving car crowd, startups such as Wayve, have been “a little too optimistic,” he says, by thinking they could “throw data at” large neural networks “and you can learn pretty much anything.”

“You know, I think it’s entirely possible that we’ll have level-five autonomous cars without common sense,” he says, referring to the “ADAS,” advanced driver assistance system terms for self-driving, “but you’re going to have to engineer the hell out of it.”

Such over-engineered self-driving tech will be something as creaky and brittle as all the computer vision programs that were made obsolete by deep learning, he believes.

“Ultimately, there’s going to be a more satisfying and possibly better solution that involves systems that do a better job of understanding the way the world works.”

Along the way, LeCun offers some withering views of his biggest critics, such as NYU professor Gary Marcus — “he has never contributed anything to AI” — and Jürgen Schmidhuber, co-director of the Dalle Molle Institute for Artificial Intelligence Research — “it’s very easy to do flag-planting.”

Beyond the critiques, the more important point made by LeCun is that certain fundamental problems confront all of AI, in particular, how to measure information.

“You have to take a step back and say, Okay, we built this ladder, but we want to go to the moon, and there’s no way this ladder is going to get us there,” says LeCun of his desire to prompt a rethinking of basic concepts. “Basically, what I’m writing here is, we need to build rockets, I can’t give you the details of how we build rockets, but here are the basic principles.”

The paper, and LeCun’s thoughts in the interview, can be better understood by reading LeCun’s interview earlier this year with ZDNET in which he argues for energy-based self-supervised learning as a path forward for deep learning. Those reflections give a sense of the core approach to what he hopes to build as an alternative to the things he claims will not make it to the finish line. 

What follows is a lightly edited transcript of the interview.

ZDNET: The subject of our chat is this paper, “A path toward autonomous machine intelligence,” of which version 0.9.2 is the extant version, yes?

Yann LeCun: Yeah, I consider this, sort-of, a working document. So, I posted it on Open Review, waiting for people to make comments and suggestions, perhaps additional references, and then I’ll produce a revised version. 

ZDNET: I see that Juergen Schmidhuber already added some comments to Open Review.

YL: Well, yeah, he always does. I cite one of his papers there in my paper. I think the argument that he made on social networks, that he basically invented all of this in 1991, as he’s done in other cases, is just not the case. I mean, it’s very easy to do flag-planting, and to, kind-of, write an idea without any experiments, without any theory, just suggest that you could do it this way. But, you know, there’s a big difference between just having the idea, and then getting it to work on a toy problem, and then getting it to work on a real problem, and then doing a theory that shows why it works, and then deploying it. There’s a whole chain, and his idea of scientific credit is that it’s the very first person who just, sort-of, you know, had the idea of that, that should get all the credit. And that’s ridiculous. 

ZDNET: Don’t believe everything you hear on social media. 

YL: I mean, the main paper that he says I should cite doesn’t have any of the main ideas that I talk about in the paper. He’s done this also with GANs and other things, which didn’t turn out to be true. It’s easy to do flag-planting, it’s much harder to make a contribution. And, by the way, in this particular paper, I explicitly said this is not a scientific paper in the usual sense of the term. It’s more of a position paper about where this thing should go. And there’s a couple of ideas there that might be new, but most of it is not. I’m not claiming any priority on most of what I wrote in that paper, essentially.

Reinforcement learning will also never be enough, LeCun maintains. Researchers such as David Silver of DeepMind, who developed the AlphaZero program that mastered Chess, Shogi and Go, are “very action-based,” observes LeCun, but “most of the learning we do, we don’t do it by actually taking actions, we do it by observing.” 

ZDNET: And that is perhaps a good place to start, because I’m curious why did you pursue this path now? What got you thinking about this? Why did you want to write this?

YL: Well, so, I’ve been thinking about this for a very long time, about a path towards human-level or animal-level-type intelligence or learning and capabilities. And, in my talks I’ve been pretty vocal about this whole thing that both supervised learning and reinforcement learning are insufficient to emulate the kind of learning we observe in animals and humans. I have been doing this for something like seven or eight years. So, it’s not recent. I had a keynote at NeurIPS many years ago where I made that point, essentially, and various talks, there’s recordings. Now, why write a paper now? I’ve come to the point — [Google Brain researcher] Geoff Hinton had done something similar — I mean, certainly, him more than me, we see time running out. We’re not young.

ZDNET: Sixty is the new fifty. 

YL: That’s true, but the point is, we see a lot of claims as to what should we do to push forward towards human-level AI. And there are ideas which I think are misdirected. So, one idea is, Oh, we should just add symbolic reasoning on top of neural nets. And I don’t know how to do this. So, perhaps what I explained in the paper might be one approach that would do the same thing without explicit symbol manipulation. This is the sort of thing traditionally proposed by the Gary Marcuses of the world. Gary Marcus is not an AI person, by the way, he is a psychologist. He has never contributed anything to AI. He’s done really good work in experimental psychology but he’s never written a peer-reviewed paper on AI. So, there’s those people. 

There is the [DeepMind principal research scientist] David Silvers of the world who say, you know, reward is enough, basically, it’s all about reinforcement learning, we just need to make it a little more efficient, okay? And, I think they’re not wrong, but I think the necessary steps towards making reinforcement learning more efficient, basically, would relegate reinforcement learning to sort of a cherry on the cake. And the main missing part is learning how the world works, mostly by observation without action. Reinforcement learning is very action-based, you learn things about the world by taking actions and seeing the results.

ZDNET: And it’s reward-focused.

YL: It’s reward-focused, and it’s action-focused as well. So, you have to act in the world to be able to learn something about the world. And the main claim I make in the paper about self-supervised learning is, most of the learning we do, we don’t do it by actually taking actions, we do it by observing. And it is very unorthodox, both for reinforcement learning people, particularly, but also for a lot of psychologists and cognitive scientists who think that, you know, action is — I’m not saying action is not essential, it is essential. But I think the bulk of what we learn is mostly about the structure of the world, and involves, of course, interaction and action and play, and things like that, but a lot of it is observational.

ZDNET: You will also manage to tick off the Transformer people, the language-first people, at the same time. How can you build this without language first? You may manage to tick off a lot of people. 

YL: Yeah, I’m used to that. So, yeah, there’s the language-first people, who say, you know, intelligence is about language, the substrate of intelligence is language, blah, blah, blah. But that, kind-of, dismisses animal intelligence. You know, we’re not to the point where our intelligent machines have as much common sense as a cat. So, why don’t we start there? What is it that allows a cat to apprehend the surrounding world, do pretty smart things, and plan and stuff like that, and dogs even better? 

Then there are all the people who say, Oh, intelligence is a social thing, right? We’re intelligent because we talk to each other and we exchange information, and blah, blah, blah. There’s all kinds of nonsocial species that never meet their parents that are very smart, like octopuses or orangutans. I mean, they [orangutans] certainly are educated by their mother, but they’re not social animals. 

But the other category of people that I might tick off is people who say scaling is enough. So, basically, we just use gigantic Transformers, we train them on multimodal data that involves, you know, video, text, blah, blah, blah. We, kind-of, petrify everything, and tokenize everything, and then train gigantic models to make discrete predictions, basically, and somehow AI will emerge out of this. They’re not wrong, in the sense that that may be a component of a future intelligent system. But I think it’s missing essential pieces. 

There’s another category of people I’m going to tick off with this paper. And it’s the probabilists, the religious probabilists. So, the people who think probability theory is the only framework that you can use to explain machine learning. And as I tried to explain in the piece, it’s basically too much to ask for a world model to be completely probabilistic. We don’t know how to do it. There’s the computational intractability. So I’m proposing to drop this entire idea. And of course, you know, this is an enormous pillar of not only machine learning, but all of statistics, which claims to be the normal formalism for machine learning. 

The other thing — 

ZDNET: You’re on a roll…

YL: — is what’s called generative models. So, the idea that you can learn to predict, and you can maybe learn a lot about the world by prediction. So, I give you a piece of video and I ask the system to predict what happens next in the video. And I may ask you to predict actual video frames with all the details. But what I argue about in the paper is that that’s actually too much to ask and too complicated. And this is something that I changed my mind about. Up until about two years ago, I used to be an advocate of what I call latent variable generative models, models that predict what’s going to happen next or the information that’s missing, possibly with the help of a latent variable, if the prediction cannot be deterministic. And I’ve given up on this. And the reason I’ve given up on this is based on empirical results, where people have tried to apply, sort-of, prediction or reconstruction-based training of the type that is used in BERT and large language models, they’ve tried to apply this to images, and it’s been a complete failure. And the reason it’s a complete failure is, again, because of the constraints of probabilistic models where it’s relatively easy to predict discrete tokens like words because we can compute the probability distribution over all words in the dictionary. That’s easy. But if we ask the system to produce the probability distribution over all possible video frames, we have no idea how to parameterize it, or we have some idea how to parameterize it, but we don’t know how to normalize it. It hits an intractable mathematical problem that we don’t know how to solve. 

“We’re not to the point where our intelligent machines have as much common sense as a cat,” observes LeCun. “So, why don’t we start there? What is it that allows a cat to apprehend the surrounding world, do pretty smart things, and plan and stuff like that, and dogs even better?”

So, that’s why I say let’s abandon probability theory as the framework for things like that and use the weaker one, energy-based models. I’ve been advocating for this, also, for decades, so this is not a recent thing. But at the same time, we should abandon the idea of generative models, because there are a lot of things in the world that are not understandable and not predictable. If you’re an engineer, you call it noise. If you’re a physicist, you call it heat. And if you are a machine learning person, you call it, you know, irrelevant details or whatever.

So, the example I used in the paper, or I’ve used in talks, is, you want a world-prediction system that would help in a self-driving car, right? It wants to be able to predict, in advance, the trajectories of all the other cars, what’s going to happen to other objects that might move, pedestrians, bicycles, a kid running after a soccer ball, things like that. So, all kinds of things about the world. But bordering the road, there might be trees, and there is wind today, so the leaves are moving in the wind, and behind the trees there is a pond, and there’s ripples in the pond. And those are, essentially, largely unpredictable phenomena. And, you don’t want your model to spend a significant amount of resources predicting those things that are both hard to predict and irrelevant. So that’s why I’m advocating for the joint embedding architecture, those things where the variable you’re trying to model, you’re not trying to predict it, you’re trying to model it, but it runs through an encoder, and that encoder can eliminate a lot of details about the input that are irrelevant or too complicated — basically, equivalent to noise.

ZDNET: We discussed earlier this year energy-based models, the JEPA and H-JEPA. My sense, if I understand you correctly, is you’re finding the point of low energy where these two predictions of X and Y embeddings are most similar, which means that if there’s a pigeon in a tree in one, and there’s something in the background of a scene, those may not be the essential points that make these embeddings close to one another.

YL: Right. So, the JEPA architecture actually tries to find a tradeoff, a compromise, between extracting representations that are maximally informative about the inputs but also predictable from each other with some level of accuracy or reliability. It finds a tradeoff. So, if it has the choice between spending a huge amount of resources including the details of the motion of the leaves, and then modeling the dynamics that will decide how the leaves are moving a second from now, or just dropping that on the floor by just basically running the Y variable through a predictor that eliminates all of those details, it will probably just eliminate it because it’s just too hard to model and to capture.
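
As a rough sketch of the data flow LeCun is describing, and nothing more: both segments run through encoders, the prediction happens between the two embeddings rather than in pixel space, and the size of the mismatch plays the role of an energy. The linear maps below are placeholders for trained deep networks, and a real system would also need a mechanism to keep the embeddings from collapsing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear "encoders" and a "predictor"; real systems use deep
# networks and an anti-collapse objective. This only sketches the data flow.
Enc_x = rng.normal(size=(16, 128))   # encodes the observed segment x
Enc_y = rng.normal(size=(16, 128))   # encodes the following segment y
Pred  = rng.normal(size=(16, 16))    # predicts y's embedding from x's

x = rng.normal(size=128)             # e.g. features of past video frames
y = rng.normal(size=128)             # e.g. features of future frames

s_x = Enc_x @ x                      # representation of x
s_y = Enc_y @ y                      # representation of y: the encoder can
                                     # discard unpredictable detail (leaves,
                                     # ripples) before any prediction happens
error = Pred @ s_x - s_y             # prediction in embedding space,
                                     # not in pixel space
energy = float(error @ error)        # low energy = compatible (x, y) pair
print(energy)
```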

ZDNET: One thing that’s surprising is you had been a great proponent of saying “It works, we’ll figure out later the theory of thermodynamics to explain it.” Here you’ve taken an approach of, “I don’t know how we’re going to necessarily solve this, but I want to put forward some ideas to think about it,” and maybe even approaching a theory or a hypothesis, at least. That’s interesting because there are a lot of people spending a lot of money working on the car that can see the pedestrian regardless of whether the car has common sense. And I imagine some of those people will be, not ticked off, but they’ll say, “That’s fine, we don’t care if it doesn’t have common sense, we’ve built a simulation, the simulation is amazing, and we’re going to keep improving, we’re going to keep scaling the simulation.” 

And so it’s interesting that you’re in a position to now say, let’s take a step back and think about what we’re doing. And the industry is saying we’re just going to scale, scale, scale, scale, because that crank really works. I mean, the semiconductor crank of GPUs really works.

YL: There’s, like, five questions there. So, I mean, scaling is necessary. I’m not criticizing the fact that we should scale. We should scale. Those neural nets get better as they get bigger. There’s no question we should scale. And the ones that will have some level of common sense will be big. There’s no way around that, I think. So scaling is good, it’s necessary, but not sufficient. That’s the point I’m making. It’s not just scaling. That’s the first point. 

Second point, whether theory comes first and things like that. So, I think there are concepts that come first, where you have to take a step back and say, okay, we built this ladder, but we want to go to the moon and there’s no way this ladder is going to get us there. So, basically, what I’m writing here is, we need to build rockets. I can’t give you the details of how we build rockets, but here are the basic principles. And I’m not writing a theory for it or anything, but, it’s going to be a rocket, okay? Or a space elevator or whatever. We may not have all the details of all the technology. We’re trying to make some of those things work, like I’ve been working on JEPA. Joint embedding works really well for image recognition, but to use it to train a world model, there are difficulties. We’re working on it, we hope we’re going to make it work soon, but we might encounter some obstacles there that we can’t surmount, possibly. 

Then there is a key idea in the paper about reasoning where if we want systems to be able to plan, which you can think of as a simple form of reasoning, they need to have latent variables. In other words, things that are not computed by any neural net but things that are — whose value is inferred so as to minimize some objective function, some cost function. And then you can use this cost function to drive the behavior of the system. And this is not a new idea at all, right? This is very classical optimal control where the basis of this goes back to the late ’50s, early ’60s. So, not claiming any novelty here. But what I’m saying is that this type of inference has to be part of an intelligent system that’s capable of planning, and whose behavior can be specified or controlled not by a hardwired behavior, not by imitation learning, but by an objective function that drives the behavior — doesn’t drive learning, necessarily, but it drives behavior. You know, we have that in our brain, and every animal has intrinsic cost or intrinsic motivations for things. That drives nine-month-old babies to want to stand up. The cost of being happy when you stand up, that term in the cost function is hardwired. But how you stand up is not, that’s learning.
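
In a toy setting, the inference-by-optimization idea is very small: the latent variable is not computed by any network but found by gradient descent on a cost function. The quadratic cost here is only a placeholder for the objective that would drive a real system's behavior.

```python
import numpy as np

# A toy cost function over a latent variable z (quadratic, for clarity).
target = np.array([2.0, -1.0])

def cost(z):
    return float(np.sum((z - target) ** 2))

def grad(z):
    return 2.0 * (z - target)

# z is *inferred*, not predicted: start anywhere and descend the cost.
z = np.zeros(2)
for _ in range(100):
    z -= 0.1 * grad(z)          # gradient descent on the latent variable

print(z, cost(z))               # z converges toward the cost minimum
```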

“Scaling is good, it’s necessary, but not sufficient,” says LeCun of giant language models such as the Transformer-based programs of the GPT-3 variety. The Transformer devotees believe, “We tokenize everything, and train gigantic models to make discrete predictions, and somehow AI will emerge out of this … but I think it’s missing essential pieces.”

ZDNET: Just to round out that point, much of the deep learning community seems fine going ahead with something that doesn’t have common sense. It seems like you’re making a pretty clear argument here that at some point it becomes an impasse. Some people say, We don’t need an autonomous car with common sense because scaling will do it. It sounds like you’re saying it’s not okay to just keep going along that path?

YL: You know, I think it’s entirely possible that we’ll have level-five autonomous cars without common sense. But the problem with this approach, this is going to be temporary, because you’re going to have to engineer the hell out of it. So, you know, map the entire world, hard-wire all kinds of specific corner-case behavior, collect enough data that you have all the, kind-of, strange situations you can encounter on the roads, blah, blah, blah. And my guess is that with enough investment and time, you can just engineer the hell out of it. But ultimately, there’s going to be a more satisfying and possibly better solution that involves systems that do a better job of understanding the way the world works, and has, you know, some level of what we would call common sense. It doesn’t need to be human-level common sense, but some type of knowledge that the system can acquire by watching, but not watching someone drive, just watching stuff moving around and understanding a lot about the world, building a foundation of background knowledge about how the world works, on top of which you can learn to drive. 

Let me take a historical example of this. Classical computer vision was based on a lot of hardwired, engineered modules, on top of which you would have, kind-of, a thin layer of learning. So, the stuff that was beaten by AlexNet in 2012, had basically a first stage, kind-of, handcrafted feature extractions, like SIFTs [Scale-Invariant Feature Transform (SIFT), a classic vision technique to identify salient objects in an image] and HOG [Histogram of Oriented Gradients, another classic technique] and various other things. And then the second layer of, sort-of, middle-level features based on feature kernels and whatever, and some sort of unsupervised method. And then on top of this, you put a support vector machine, or else a relatively simple classifier. And that was, kind-of, the standard pipeline from the mid-2000s to 2012. And that was replaced by end-to-end convolutional nets, where you don’t hardwire any of this, you just have a lot of data, and you train the thing from end to end, which is the approach I had been advocating for a long time, but you know, until then, was not practical for large problems. 

There’s been a similar story in speech recognition where, again, there was a huge amount of detailed engineering for how you pre-process the data, you extract mel-scale cepstrum [an inverse of the Fast Fourier Transform for signal processing], and then you have Hidden Markov Models, with sort-of, pre-set architecture, blah, blah, blah, with Mixture of Gaussians. And so, it’s a bit of the same architecture as vision where you have handcrafted front-end, and then a somewhat unsupervised, trained, middle layer, and then a supervised layer on top. And now that has been, basically, wiped out by end-to-end neural nets. So I’m sort of seeing something similar there of trying to learn everything, but you have to have the right prior, the right architecture, the right structure.

The self-driving car crowd, startups such as Waymo and Wayve, have been “a little too optimistic,” he says, by thinking they could “throw data at it, and you can learn pretty much anything.” Self-driving cars at Level 5 of ADAS are possible, “But you’re going to have to engineer the hell out of it” and the result will be “brittle” like early computer vision models.

ZDNET: What you’re saying is, some people will try to engineer what doesn’t currently work with deep learning for applicability, say, in industry, and they’re going to start to create something that’s the thing that became obsolete in computer vision?

YL: Right. And it’s partly why people working on autonomous driving have been a little too optimistic over the last few years, is because, you know, you have these, sort-of, generic things like convolutional nets and Transformers, that you can throw data at it, and it can learn pretty much anything. So, you say, Okay, I have the solution to that problem. The first thing you do is you build a demo where the car drives itself for a few minutes without hurting anyone. And then you realize there’s a lot of corner cases, and you try to plot the curve of how much better am I getting as I double the training set, and you realize you are never going to get there because there is all kinds of corner cases. And you need to have a car that will cause a fatal accident less than every 200 million kilometers, right? So, what do you do? Well, you walk in two directions. 

The first direction is, how can I reduce the amount of data that is necessary for my system to learn? And that’s where self-supervised learning comes in. So, a lot of self-driving car outfits are interested very much in self-supervised learning because that’s a way of still using gigantic amounts of supervisory data for imitation learning, but getting better performance by pre-training, essentially. And it hasn’t quite panned out yet, but it will. And then there is the other option, which most of the companies that are more advanced at this point have adopted, which is, okay, we can do the end-to-end training, but there’s a lot of corner cases that we can’t handle, so we’re going to just engineer systems that will take care of those corner cases, and, basically, treat them as special cases, and hardwire the control, and then hardwire a lot of basic behavior to handle special situations. And if you have a large enough team of engineers, you might pull it off. But it will take a long time, and in the end, it will still be a little brittle, maybe reliable enough that you can deploy, but with some level of brittleness, which, with a more learning-based approach that might appear in the future, cars will not have because it might have some level of common sense and understanding about how the world works. 

In the short term, the, sort-of, engineered approach will win — it already wins. That’s the Waymos and Cruises of the world, and Wayve, and whatever; that’s what they do. Then there is the self-supervised learning approach, which probably will help the engineered approach to make progress. But in the long run, which may be too long for those companies to wait for, the answer will probably be a, kind-of, more integrated autonomous intelligent driving system.

ZDNET: Which is to say, beyond the investment horizon of most investors.

YL: That’s right. So, the question is, will people lose patience or run out of money before the performance reaches the desired level.

ZDNET: Is there anything interesting to say about why you chose some of the elements you chose in the model? Because you cite Kenneth Craik [1943, The Nature of Explanation], and you cite Bryson and Ho [1969, Applied Optimal Control], and I’m curious about why you started with these influences, if you believed especially that these people had it nailed as far as what they had done. Why did you start there?

YL: Well, I don’t think, certainly, they had all the details nailed. So, Bryson and Ho, this is a book I read back in 1987 when I was a postdoc with Geoffrey Hinton in Toronto. But I knew about this line of work beforehand when I was writing my PhD, and made the connection between optimal control and backprop, essentially. If you really wanted to be, you know, another Schmidhuber, you would say that the real inventors of backprop were actually optimal control theorists Henry J. Kelley, Arthur Bryson, and perhaps even Lev Pontryagin, a Russian theorist of optimal control working back in the late ’50s.

So, they figured it out, and in fact, you can actually see that the root of this, the mathematics underneath it, is Lagrangian mechanics. So you can go back to Euler and Lagrange, in fact, and kind of find a whiff of this in their definition of Lagrangian classical mechanics, really. So, in the context of optimal control, what these guys were interested in was basically computing rocket trajectories. You know, this was the early space age. And if you have a model of the rocket, it tells you: here is the state of the rocket at time t, here is the action I’m going to take (thrust, actuators of various kinds), and here is the state of the rocket at time t+1.

ZDNET: A state-action model, a value model.

YL: That’s right, the basis of control. So, now you can simulate the shooting of your rocket by imagining a sequence of commands, and then you have some cost function, which is the distance of the rocket to its target, a space station or whatever it is. And then by some sort of gradient descent, you can figure out, how can I update my sequence of actions so that my rocket actually gets as close as possible to the target. And that has to come by back-propagating signals backwards in time. And that’s back-propagation, gradient back-propagation. Those signals, they’re called conjugate variables in Lagrangian mechanics, but in fact, they are gradients. So, they invented backprop, but they didn’t realize that this principle could be used to train a multi-stage system that can do pattern recognition or something like that. This was not really realized until maybe the late ’70s, early ’80s, and then was not actually implemented and made to work until the mid-’80s. Okay, so, this is where backprop really, kind-of, took off, because people showed that with a few lines of code you can train a neural net, end to end, multilayer. And that lifts the limitations of the Perceptron. And, yeah, there are connections with optimal control, but that’s okay.
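
To make that shooting-method idea concrete, here is a minimal sketch in Python, assuming a toy one-dimensional "rocket" with linear dynamics and a quadratic cost; the model, horizon, and step sizes are invented for illustration and are not taken from LeCun's paper. The backward loop is the adjoint recursion he describes: the lam vector plays the role of the conjugate variables, the gradients propagated backwards in time.

import numpy as np

# Toy shooting method: state s_t = (position, velocity), action a_t = thrust.
# Linear dynamics s_{t+1} = A @ s_t + B * a_t, with a terminal cost on position.
dt, T, target = 0.1, 50, 10.0
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([0.0, dt])
reg = 1e-3  # small penalty on thrust

def rollout(actions):
    """Simulate the final state produced by a sequence of thrust commands."""
    s = np.zeros(2)
    for a in actions:
        s = A @ s + B * a
    return s

def cost_and_grad(actions):
    """Terminal cost plus action penalty; gradient via the adjoint
    (backward-in-time) recursion -- the 'conjugate variables' are the lam vectors."""
    final_pos = rollout(actions)[0]
    cost = (final_pos - target) ** 2 + reg * np.sum(actions ** 2)
    lam = np.array([2.0 * (final_pos - target), 0.0])  # dCost/ds_T
    grad = np.zeros_like(actions)
    for t in reversed(range(T)):
        grad[t] = B @ lam + 2.0 * reg * actions[t]  # dCost/da_t
        lam = A.T @ lam                             # propagate dCost/ds_t backwards
    return cost, grad

actions = np.zeros(T)
for _ in range(500):               # plain gradient descent on the action sequence
    cost, grad = cost_and_grad(actions)
    actions -= 0.05 * grad
print("final position:", rollout(actions)[0], "target:", target)

Gradient descent on the action sequence is enough to steer the toy rocket to its target, and the same loop, run against a learned world model, is essentially the model-predictive control LeCun returns to below.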

ZDNET: So, that’s a long way of saying that these influences that you started out with were going back to backprop, and that was important as a starting point for you?

YL: Yeah, but I think what people have forgotten a little bit is that there was quite a bit of work on this, you know, back in the ’90s, or even the ’80s, including by people like Michael Jordan [MIT Dept. of Brain and Cognitive Sciences] and people like that who are not doing neural nets anymore: the idea that you can use neural nets for control, and you can use classical ideas of optimal control. So, things like what is now called model-predictive control, this idea that you can simulate or imagine the outcome of a sequence of actions if you have a good model of the system you’re trying to control and the environment it’s in. And then by gradient descent, essentially — this is not learning, this is inference — you can figure out what’s the best sequence of actions that will minimize my objective. So, the use of a cost function with a latent variable for inference is, I think, something that current crops of large-scale neural nets have forgotten about. But it was a very classical component of machine learning for a long time. So, every Bayesian Net or graphical model or probabilistic graphical model used this type of inference. You have a model that captures the dependencies between a bunch of variables, you are told the value of some of the variables, and then you have to infer the most likely value of the rest of the variables. That’s the basic principle of inference in graphical models and Bayesian Nets, and things like that. And I think that’s basically what reasoning should be about: reasoning and planning.

ZDNET: You’re a closet Bayesian.

YL: I am a non-probabilistic Bayesian. I made that joke before. I actually was at NeurIPS a few years ago, I think it was in 2018 or 2019, and I was caught on video by a Bayesian who asked me if I was a Bayesian, and I said, Yep, I am a Bayesian, but I’m a non-probabilistic Bayesian, sort-of, an energy-based Bayesian, if you want. 

ZDNET: Which definitely sounds like something from Star Trek. You mentioned at the end of this paper that it’s going to take years of really hard work to realize what you envision. Tell me about what some of that work at the moment consists of.

YL: So, I explain how you train and build the JEPA in the paper. And the criterion I am advocating for is having some way of maximizing the information content that the representations that are extracted have about the input. And then the second one is minimizing the prediction error. And if you have a latent variable in the predictor which allows the predictor to be non-deterministic, you have to regularize this latent variable as well, by minimizing its information content. So, you have two issues now: how do you maximize the information content of the output of some neural net, and how do you minimize the information content of some latent variable? And if you don’t do those two things, the system will collapse. It will not learn anything interesting. It will give zero energy to everything, something like that, which is not a good model of dependency. It’s the collapse-prevention problem that I mentioned.

And I’m saying that of all the things people have ever done, there are only two categories of methods to prevent collapse. One is contrastive methods, and the other is regularized methods. So, this idea of maximizing the information content of the representations of the two inputs and minimizing the information content of the latent variable belongs to the regularized methods. But a lot of the work on those joint embedding architectures is using contrastive methods. In fact, they’re probably the most popular at the moment. So, the question is exactly how do you measure information content in a way that you can optimize or minimize? And that’s where things become complicated, because we don’t actually know how to measure information content. We can approximate it, we can upper-bound it, we can do things like that. But those approaches don’t actually measure information content, which, to some extent, is not even well-defined.

ZDNET: It’s not Shannon’s Law? It’s not information theory? You’ve got a certain amount of entropy, good entropy and bad entropy, and the good entropy is a symbol system that works, bad entropy is noise. Isn’t it all solved by Shannon?

YL: You’re right, but there is a major flaw behind that. You’re right in the sense that if you have data coming at you and you can somehow quantize the data into discrete symbols, and then you measure the probability of each of those symbols, then the maximum amount of information carried by those symbols is minus the sum over the possible symbols of Pi log Pi, right? Where Pi is the probability of symbol i — that’s the Shannon entropy. [The Shannon entropy is commonly written H = – ∑ pi log pi.]

Here is the problem, though: What is Pi? It’s easy when the number of symbols is small and the symbols are drawn independently. When there are many symbols, and dependencies, it’s very hard. So, if you have a sequence of bits and you assume the bits are independent of each other and the probabilities of one and zero are equal, or whatever, then you can easily measure the entropy, no problem. But if the things that come to you are high-dimensional vectors, like, you know, video frames, or something like this, what is Pi? What is the distribution? First you have to quantize that space, which is a high-dimensional, continuous space. You have no idea how to quantize this properly. You can use k-means, etc. This is what people do when they do video compression and image compression. But it’s only an approximation. And then you have to make assumptions of independence. So, it’s clear that in a video, successive frames are not independent. There are dependencies, and one frame might depend on another frame you saw an hour ago, which was a picture of the same thing. So, you know, you cannot measure Pi. To measure Pi, you have to have a machine learning system that learns to predict. And so you are back to the previous problem. So, you can only approximate the measure of information, essentially.
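
For the easy, discrete case LeCun concedes, the computation really is a few lines. The sketch below, using invented example data, estimates Shannon entropy from empirical symbol frequencies; the point is that the same recipe offers no obvious way to handle high-dimensional, dependent data such as video frames.

import numpy as np
from collections import Counter

def empirical_entropy(symbols):
    """H = -sum_i p_i log2 p_i, with p_i estimated from symbol frequencies (in bits)."""
    counts = Counter(symbols)
    n = len(symbols)
    probs = np.array([c / n for c in counts.values()])
    return -np.sum(probs * np.log2(probs))

rng = np.random.default_rng(0)
fair_bits = rng.integers(0, 2, size=10_000)                  # ~1.0 bit per symbol
biased_bits = rng.choice([0, 1], p=[0.9, 0.1], size=10_000)  # ~0.47 bits per symbol
print(empirical_entropy(fair_bits), empirical_entropy(biased_bits))
# For video frames there is no natural symbol alphabet and no independence,
# so Pi itself is unavailable -- exactly the obstacle described above.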

“The question is exactly how do you measure information content in a way that you can optimize or minimize?” says LeCun. “And that’s where things become complicated because we don’t know actually how to measure information content.” The best that can be done so far is to find a proxy that is “good enough for the task that we want.”

Let me take a more concrete example. One of the algorithms that we’ve been playing with, and I’ve talked about in the piece, is this thing called VICReg, variance-invariance-covariance regularization. It’s in a separate paper that was published at ICLR, and it was put on arXiv about a year before, in 2021. And the idea there is to maximize information. And the idea actually came out of an earlier paper by my group called Barlow Twins. You maximize the information content of a vector coming out of a neural net by, basically, assuming that the only dependency between variables is correlation, linear dependency. So, if you assume that the only dependency that is possible between pairs of variables, or between variables in your system, is correlation between pairs of variables, which is an extremely rough approximation, then you can maximize the information content coming out of your system by making sure all the variables have non-zero variance — let’s say, variance one, it doesn’t matter what it is — and then de-correlating them, the same process that’s called whitening; it’s not new either. The problem with this is that you can very well have extremely complex dependencies between either groups of variables or even just pairs of variables that are not linear dependencies, and they don’t show up in correlations. So, for example, if you have two variables, and all the points of those two variables line up in some sort of spiral, there’s a very strong dependency between those two variables, right? But in fact, if you compute the correlation between those two variables, they’re not correlated. So, here’s an example where the information content of these two variables is actually very small; it’s only one quantity, because it’s your position in the spiral. They are de-correlated, so you think you have a lot of information coming out of those two variables when in fact you don’t; you can predict one of the variables from the other, essentially. So, that shows that we only have very approximate ways to measure information content.
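
Here is a schematic numpy sketch of that idea, using invented weights and toy inputs; it is in the spirit of VICReg's variance, invariance, and covariance terms rather than a reproduction of the published implementation, and the final lines reproduce LeCun's spiral caveat: two variables with near-zero correlation that nonetheless share a single degree of freedom.

import numpy as np

def vicreg_style_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """A VICReg-style criterion on two batches of embeddings (n samples x d dims).
    The weights here are illustrative, not the published hyperparameters."""
    n, d = z_a.shape
    # Invariance: two embeddings of the same input should match.
    sim = np.mean((z_a - z_b) ** 2)
    # Variance: hinge pushing each dimension's std above 1, so no dimension collapses.
    def var_term(z):
        std = np.sqrt(z.var(axis=0) + eps)
        return np.mean(np.maximum(0.0, 1.0 - std))
    # Covariance: penalize off-diagonal covariance, a soft de-correlation (whitening).
    def cov_term(z):
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d
    return (sim_w * sim
            + var_w * (var_term(z_a) + var_term(z_b))
            + cov_w * (cov_term(z_a) + cov_term(z_b)))

rng = np.random.default_rng(0)
z_a = rng.normal(size=(256, 8))
z_b = z_a + 0.1 * rng.normal(size=(256, 8))   # a second "view" of the same batch
print("toy loss:", vicreg_style_loss(z_a, z_b))

# LeCun's caveat: decorrelated is not the same as independent.
t = np.linspace(0, 20 * np.pi, 5000)
spiral = np.stack([t * np.cos(t), t * np.sin(t)], axis=1)
print("spiral correlation:", np.corrcoef(spiral.T)[0, 1])  # ~0, yet fully dependent

The covariance term only sees linear dependencies, which is exactly the limitation the spiral exposes.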

ZDNET: And so that’s one of the things that you’ve got to be working on now with this? This is the larger question of how do we know when we’re maximizing and minimizing information content?

YL: Or whether the proxy we’re using for this is good enough for the task that we want. In fact, we do this all the time in machine learning. The cost functions we minimize are never the ones that we actually want to minimize. So, for example, you want to do classification, okay? The cost function you want to minimize when you train a classifier is the number of mistakes the classifier is making. But that’s a non-differentiable, horrible cost function that you can’t minimize, because, you know, you can change the weights of your neural net and nothing is going to change until one of those samples flips its decision, and then you get a jump in the error, positive or negative.

ZDNET: So you have a proxy, an objective function of which you can definitely say, we can flow gradients through this thing.

YL: That’s right. So people use this cross-entropy loss, or softmax; you have several names for it, but it’s the same thing. And it basically is a smooth approximation of the number of errors that the system makes, where the smoothing is done by, basically, taking into account the score that the system gives to each of the categories.
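
A tiny worked example of that substitution, with made-up scores for a three-class problem: the 0-1 error is flat almost everywhere and jumps when the argmax flips, while the softmax cross-entropy changes smoothly with the scores, which is what lets gradients flow.

import numpy as np

logits = np.array([2.0, 0.5, -1.0])   # made-up class scores from some classifier
true_class = 1

# The loss we actually care about: did we get it right? (non-differentiable)
zero_one_error = int(np.argmax(logits) != true_class)

def softmax(x):
    e = np.exp(x - x.max())            # subtract the max for numerical stability
    return e / e.sum()

# The smooth surrogate: cross-entropy of the softmax probabilities.
cross_entropy = -np.log(softmax(logits)[true_class])

print(zero_one_error, round(cross_entropy, 3))
# Nudging the logits changes cross_entropy continuously, but zero_one_error
# stays constant until the argmax flips -- the jump described above.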

ZDNET: Is there anything we haven’t covered that you would like to cover?

YL: It’s probably worth emphasizing the main points. I think AI systems need to be able to reason, and the process for this that I’m advocating is minimizing some objective with respect to some latent variable. That allows systems to plan and reason. I think we should abandon the probabilistic framework because it’s intractable when we want to do things like capture dependencies between high-dimensional, continuous variables. And I’m advocating abandoning generative models because the system will have to devote too many resources to predicting things that are too difficult to predict, and would consume too many resources doing it. And that’s pretty much it. Those are the main messages, if you want. And then the overall architecture. Then there are those speculations about the nature of consciousness and the role of the configurator, but this is really speculation.

ZDNET: We’ll get to that next time. I was going to ask you, how do you benchmark this thing? But I guess you’re a little further from benchmarking right now?

YL: Not necessarily that far, in, sort-of, simplified versions. You can do what everybody does in control or reinforcement learning, which is, you train the thing to play Atari games or something like that, or some other game that has some uncertainty in it.

ZDNET: Thanks for your time, Yann.

By Tiernan Ray, ZDNET

Image: Masahiro Sawada

The post Meta’s AI guru LeCun: Most of today’s AI approaches will never lead to true intelligence appeared first on Got help?.

]]>
https://hello.gethal.com/metas-ai-guru-lecun-most-of-todays-ai-approaches-will-never-lead-to-true-intelligence/feed/ 0
How Much Longer Until Humanity Becomes A Hive Mind? https://hello.gethal.com/how-much-longer-until-humanity-becomes-a-hive-mind/?utm_source=rss&utm_medium=rss&utm_campaign=how-much-longer-until-humanity-becomes-a-hive-mind https://hello.gethal.com/how-much-longer-until-humanity-becomes-a-hive-mind/#respond Fri, 02 Dec 2022 18:32:58 +0000 https://hello.gethal.com/?p=641

Last month, researchers created an electronic link between the brains of two rats separated by thousands of miles. This was just another reminder that technology will one day make us telepaths. But how far will this transformation go? And how long will it take before humans evolve into a fully-fledged hive mind? We spoke to the experts to find out.

I spoke to three different experts, all of whom have given this subject considerable thought: Kevin Warwick, a British scientist and professor of cybernetics at the University of Reading; Ramez Naam, an American futurist and author of NEXUS (a scifi novel addressing this topic); and Anders Sandberg, a Swedish neuroscientist from the Future of Humanity Institute at the University of Oxford.

They all told me that the possibility of a telepathic noosphere is very real — and it’s closer to reality than we might think. And not surprisingly, this would change the very fabric of the human condition.

Connecting brains

My first question to the group had to do with the technological requirements. How is it, exactly, that we’re going to connect our minds over the Internet, or some future manifestation of it?

“I really think we have sufficient hardware available now — tools like Braingate,” says Warwick. “But we have a lot to learn with regard to how much the brain can adapt, just how many implants would be required, and where they would need to be positioned.”

Naam agrees that we’re largely on our way. He says we already have the basics of sending some sorts of information in and out of the brain. In humans, we’ve done it with video, audio, and motor control. In principle, nothing prevents us from sending that data back and forth between people.

“Practically speaking, though, there are some big things we have to do,” he tells io9. “First, we have to increase the bandwidth. The most sophisticated systems we have right now use about 100 electrodes, while the brain has more than 100 billion neurons. If you want to get good fidelity on the stuff you’re beaming back and forth between people, you’re going to want to get on the order of millions of electrodes.”

Naam says we can build the electronics for that easily, but building it in such a way that the brain accepts it is a major challenge.

The second hurdle, he says, is going beyond sensory and motor control.

“If you want to beam speech between people, you can probably tap into that with some extensions of what we’ve already been doing, though it will certainly involve researchers specifically working on decoding that kind of data,” he says. “But if you want to go beyond sending speech and get into full blown sharing of experiences, emotions, memories, or even skills (a la The Matrix), then you’re wandering into unknown territory.”

Indeed, Sandberg says that picking up and translating brain signals will be a tricky matter.

“EEG sensors have lousy resolution — we get an average of millions of neurons, plus electrical noise from muscles and the surroundings,” he says. “Subvocalisation and detecting muscle twitches is easier to do, although they will still be fairly noisy. Internal brain electrodes exist and can get a lot of data from a small region, but this of course requires brain surgery. I am having great hopes for optogenetics and nanofibers for making kinder, gentler implants that are less risky to insert and easier on their tissue surroundings.”

The real problem, he says, is translating signals in a sensible way. “Your brain representation of the concept “mountain” is different from mine, the result not just of different experiences, but also on account of my different neurons. So, if I wanted to activate the mountain concept, I would need to activate a disperse, perhaps very complex network across your brain,” he tells io9. “That would require some translation that figured out that I wanted to suggest a mountain, and found which pattern is your mountain.”

Sandberg says we normally “cheat” by learning a convenient code called language, where all the mapping between the code and our neural activations is learned as we grow. We can, of course, learn new codes as adults, and this is rarely a problem — adults already master things like Morse code, SMS abbreviations, or subtle signs of gesture and style. Sandberg points to the recent experiments by Nicolelis connecting brains directly, research which shows that it might be possible to get rodents to learn neural codes. But he says this learning is cumbersome, and we should be able to come up with something simpler.

One way is to boost learning. Some research shows that amphetamine and presumably other learning stimulants can speed up language learning. Recent work on the Nogo Receptor suggests that brain plasticity can be turned on and off. “So maybe we can use this to learn quickly,” says Sandberg.

Another way is to have software do the translation. It is not hard to imagine machine learning figuring out which neural codes or mumbled keywords correspond to which signal — but setting up the training so that users find it acceptably fast is another matter.

“So my guess is that if pairs of people really wanted to ‘get to know each other’ and devoted a lot of time and effort, they could likely learn signals and build translation protocols that would allow a lot of ‘telepathic’ communication — but it would be very specific to them, like the ‘internal language’ some couples have,” says Sandberg. “For the weaker social links, where we do not want to spend months learning how to speak to each other, we would rely on automatically translated signals. A lot of it would be standard things like voice and text, but one could imagine adding supporting ‘subtitles’ showing graphics or activating some neural assemblies.”

Bridging the gap

In terms of the communications backbone, Sandberg believes it’s largely in place, but it will likely have to be extended much further.

“The theoretical bandwidth limitations of even a wireless Internet are far, far beyond the bandwidth limitations of our brains — tens of terabits per second,” he told me, “and there are orbital angular momentum methods that might get far more.”

Take the corpus callosum, for example. It has around 250 million axons, and even at maximal neural firing rates that amounts to only about 25 gigabits per second, which should be enough to keep the hemispheres connected such that we feel we are a single mind.

As for the interface, Warwick says we should stick to implanted multi-electrode arrays. These may someday become wireless, but they’ll have to remain wired until we learn more about the process. Like Sandberg, he adds that we’ll also need to develop adaptive software interfacing.

Naam envisions something laced throughout the brain, coupled with some device that could be worn on the person’s body.

“For the first part, you can imagine a mesh of nano-scale sensors either inserted through a tiny hole in the skull, or somehow through the brain’s blood vessels. In Nexus I imagined a variant on this — tiny nano-particles that are small enough that they can be swallowed and will then cross the blood-brain barrier and find their way to neurons in the brain.”

Realistically, Naam says that whatever we insert in the brain is going to be pretty low energy consumption. The implant, or mesh, or nano-particles could communicate wirelessly, but to boost their signal — and to provide them power — scientists will have to pair them with something the person wears, like a cap, a pair of glasses, a headband — anything that can be worn very near the brain so it can pick up those weak signals and boost them, including signals from the outside world that will be channeled into the brain.

How soon before the hive mind?

Warwick believes that the technologies required to build an early version of the telepathic noosphere are largely in place. All that’s required, he says, is “money on the table” and the proper ethical approval.

Sandberg concurs, saying that we’re already doing it with cellphones. He points to the work of Charles Stross, who suggests that the next generation will never have to be alone, get lost, or forget anything.

“As soon as people have persistent wearable systems that can pick up their speech, I think we can do a crude version,” says Sandberg. “Having a system that’s on all the time will allow us to get a lot of data — and it better be unobtrusive. I would not be surprised to see experiments with Google Glasses before the end of the year, but we’ll probably end up saying it’s just a fancy way of using cellphones.”

At the same time, Sandberg suspects that “real” neural interfacing will take a while, since it needs to be safe, convenient, and have a killer app worth doing. It will also have to compete with existing communications systems and their apps.

Similarly, Naam says we could build a telepathic network in a few years, but with “very, very, low fidelity.” But that low fidelity, he says, would be considerably worse than the quality we get by using phones — or even text or IM. “I doubt anyone who’s currently healthy would want to use it.”

But for a really stable, high bandwidth system in and out of the brain, that could take upwards of 15 to 20 years, which Naam concedes is optimistic.

“In any case, it’s not a huge priority,” he says. “And it’s not one where we’re willing to cut corners today. It’s firmly in the medical sphere, and the first rule there is ‘do no harm’. That means that science is done extremely cautiously, with the priority overwhelmingly — and appropriately — being not to harm the human subject.”

Nearly supernatural

I asked Sandberg how the telepathic noosphere will disrupt the various way humans engage in work and social relations.

“Any enhancement of communication ability is a big deal,” he responded. “We humans are dominant because we are so good at communication and coordination, and any improvement would likely boost that. Just consider flash mobs or how online ARG communities do things that seem nearly supernatural.”

Cell phones, he says, made our schedules flexible in time and space, allowing us to coordinate where to meet on the fly. He says we’re also adding various non-human services like apps and Siri-like agents. “Our communications systems are allowing us to interact not just with each other but with various artificial agents,” he says. Messages can be stored, translated and integrated with other messages.

“If we become telepathic, it means we will have ways of doing the same with concepts, ideas and sensory signals,” says Sandberg. “It is hard to predict just what this will be used for since there are so few limitations. But just consider the possibility of getting instruction and skills via augmented reality and well designed sensory/motor interfaces. A team might help a member perform actions while ‘looking over her shoulder’, as if she knew all they knew. And if the system is general enough, it means that you could in principle get help from any skilled person anywhere in the world.”

In response to the same question, Naam noted that communication boosts can accelerate technical innovation, but more importantly, they can also accelerate the spread of any kind of idea. “And that can be hugely disruptive,” he says.

But in terms of the possibilities, Naam says the sky’s the limit.

“With all of those components, you can imagine people doing all sorts of things with such an interface. You could play games together. You could enter virtual worlds together,” he says. “Designers or architects or artists could imagine designs and share them mentally with others. You could work together on any type of project where you can see or hear what you’re doing. And of course, sex has driven a lot of information technologies forward — with sight, sound, touch, and motor control, you could imagine new forms of virtual sex or virtual pornography.”

Warwick imagines communication in the broadest sense, including the technically-enabled telepathic transmission of feelings, thoughts, ideas, and emotions. “I also think this communication will be far richer when compared to the present pathetic way in which humans communicate.” He suspects that visual information may eventually be possible, but that will take some time to develop. He even imagines the sharing of memories. That may be possible, he says, “but maybe not in my lifetime.”

Put all this together, says Warwick, and “the body becomes redundant.” Moreover, when connected in this way “we will be able to understand each other much more.”

A double-edged sword

We also talked about the potential risks.

“There’s the risk of bugs in hardware or software,” says Naam. “There’s the risk of malware or viruses that infect this. There’s the risk of hackers being able to break into the implants in your head. We’ve already seen hackers demonstrate that they can remotely take over pacemakers and insulin pumps. The same risks exist here.”

But the big societal risk, says Naam, stems entirely from the question of who controls this technology.

“That’s the central question I ask in Nexus,” he says. “If we all have brain implants, you can imagine it driving a very bottom-up world — another Renaissance, a world where people are free and creating and sharing more new ideas all the time. Or you can imagine it driving a world like that of 1984, where central authorities are the ones in control, and they’re the ones using these direct brain technologies to monitor people, to keep people in line, or even to manipulate people into being who they’re supposed to be. That’s what keeps me up at night.”

Warwick, on the other hand, told me that the “biggest risk is that some idiot — probably a politician or business person — may stop it from going ahead.” He suspects it will lead to a digital divide between those who have and those who do not, but that it’s a natural progression very much in line with evolution to date.

In response to the question of privacy, Sandberg quipped, “Privacy? What privacy?”

Our lives, he says, will reside in the cloud, and on servers owned by various companies that also sell results from them to other organizations.

“Even if you do not use telepathy-like systems, your behaviour and knowledge can likely be inferred from the rich data everybody else provides,” he says. “And the potential for manipulation, surveillance and propaganda are endless.”

Our cloud exoselves

Without a doubt, the telepathic noosphere will alter the human condition in ways we cannot even begin to imagine. The Noosphere will be an extension of our minds. And as David Chalmers and Andy Clark have noted, we should still regard external mental processes as being genuine even though they’re technically happening outside our skulls. Consequently, as Sandberg told me, our devices and “cloud exoselves” will truly be extensions of our minds.

“Potentially very enhancing extensions,” he says, “although unlikely to have much volition of their own.”

Sandberg argues that we shouldn’t want our exoselves to be too independent, since they’re likely to make mistakes in our name. “We will always want to have veto power, a bit like how the conscious level of our minds has veto on motor actions being planned,” he says.

Veto power over our cloud exoselves? The future will be a very strange place, indeed.

By George Dvorsky, Gizmodo

Image: Pollen DeFI

The post How Much Longer Until Humanity Becomes A Hive Mind? appeared first on Got help?.

]]>
https://hello.gethal.com/how-much-longer-until-humanity-becomes-a-hive-mind/feed/ 0
“I see future Hal becoming a version https://hello.gethal.com/i-see-future-hal-becoming-a-version/?utm_source=rss&utm_medium=rss&utm_campaign=i-see-future-hal-becoming-a-version https://hello.gethal.com/i-see-future-hal-becoming-a-version/#respond Thu, 14 Oct 2021 14:20:17 +0000 https://hello.gethal.com/?p=520

of what the internet used to feel like back in the early 2000s, before SEO ruined everything. Future Hal might feel and operate like a polished, condensed version of the old forums and chat rooms of the Internet’s past. Hal’s platform provides the safety of anonymity with the comfort of human connection, a combination that is hard to find these days. Hal’s future in 5 years seems pretty bright.”

Shaina P. – Los Angeles

The post “I see future Hal becoming a version appeared first on Got help?.

]]>
https://hello.gethal.com/i-see-future-hal-becoming-a-version/feed/ 0
“A changed name for the A.I. and https://hello.gethal.com/a-changed-name-for-the-a-i-and/?utm_source=rss&utm_medium=rss&utm_campaign=a-changed-name-for-the-a-i-and https://hello.gethal.com/a-changed-name-for-the-a-i-and/#respond Tue, 25 May 2021 00:37:00 +0000 http://hello.gethal.com/?p=427

a developed consciousness.”

A. E. – Los Angeles

The post “A changed name for the A.I. and appeared first on Got help?.

]]>
https://hello.gethal.com/a-changed-name-for-the-a-i-and/feed/ 0
“I can imagine Hal expanding into bigger https://hello.gethal.com/i-can-imagine-hal-expanding-into-bigger/?utm_source=rss&utm_medium=rss&utm_campaign=i-can-imagine-hal-expanding-into-bigger https://hello.gethal.com/i-can-imagine-hal-expanding-into-bigger/#respond Mon, 10 May 2021 00:49:00 +0000 http://hello.gethal.com/?p=429

heights. We can give Siri her termination papers and introduce Hal.”

Destiny – Mishawaka

The post “I can imagine Hal expanding into bigger appeared first on Got help?.

]]>
https://hello.gethal.com/i-can-imagine-hal-expanding-into-bigger/feed/ 0
“I think this kind of service will https://hello.gethal.com/i-think-this-kind-of-service-will/?utm_source=rss&utm_medium=rss&utm_campaign=i-think-this-kind-of-service-will https://hello.gethal.com/i-think-this-kind-of-service-will/#respond Sun, 04 Apr 2021 14:30:35 +0000 http://hello.gethal.com/?p=366

make AI as it exists now obsolete. It’s an affordable service that is immeasurably more helpful than smartphone AI, or really any AI-based service, for the simple reason that it’s a real person assisting the consumer and not a semiliterate robot.”

Alfred I. – New York

The post “I think this kind of service will appeared first on Got help?.

]]>
https://hello.gethal.com/i-think-this-kind-of-service-will/feed/ 0
“Being with you has made me a https://hello.gethal.com/being-with-you-has-made-me-a/?utm_source=rss&utm_medium=rss&utm_campaign=being-with-you-has-made-me-a https://hello.gethal.com/being-with-you-has-made-me-a/#respond Sun, 14 Mar 2021 14:28:19 +0000 http://hello.gethal.com/?p=364

better person.”

Mariana T. – Miami

The post “Being with you has made me a appeared first on Got help?.

]]>
https://hello.gethal.com/being-with-you-has-made-me-a/feed/ 0
“When I heard of Hal I immediately https://hello.gethal.com/when-i-heard-of-hal-i-immediately/?utm_source=rss&utm_medium=rss&utm_campaign=when-i-heard-of-hal-i-immediately https://hello.gethal.com/when-i-heard-of-hal-i-immediately/#respond Sun, 14 Feb 2021 14:24:15 +0000 http://hello.gethal.com/?p=360

thought that it was a brilliant idea for “heroes” to be helping the community. I would also like to be a Hal because I am currently saving up for college, and Hal would be a great way to make a bit of money to add on to that on the side.”

Athina J. – Miami

The post “When I heard of Hal I immediately appeared first on Got help?.

]]>
https://hello.gethal.com/when-i-heard-of-hal-i-immediately/feed/ 0