
2024 Turing Award

ACM has named Andrew G. Barto and Richard S. Sutton as the recipients of the 2024 ACM A.M. Turing Award for developing the conceptual and algorithmic foundations of reinforcement learning. In a series of papers beginning in the 1980s, Barto and Sutton introduced the main ideas, constructed the mathematical foundations, and developed important algorithms for reinforcement learning, one of the most important approaches for creating intelligent systems.

Barto is Professor Emeritus of Information and Computer Sciences at the University of Massachusetts, Amherst. Sutton is a Professor of Computer Science at the University of Alberta, a Research Scientist at Keen Technologies, and a Fellow of the Alberta Machine Intelligence Institute (Amii).

The ACM A.M. Turing Award, often referred to as the "Nobel Prize of Computing," carries a $1 million prize with financial support provided by Google, Inc. It is named for Alan M. Turing, the British mathematician who articulated the mathematical foundations of computing.

What Is Reinforcement Learning?

The field of artificial intelligence (AI) is generally concerned with constructing agents, that is, entities that perceive and act. More intelligent agents are those that choose better courses of action, so the notion that some courses of action are better than others is central to AI. Reward, a term borrowed from psychology and neuroscience, denotes a signal provided to an agent related to the quality of its behavior. Reinforcement learning (RL) is the process of learning to behave more successfully from this signal.
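
To make the loop between agent, action, and reward concrete, here is a minimal sketch in Python. The two-armed environment and the simple averaging agent are illustrative assumptions, not anything taken from the laureates' work; the point is only that the agent's behavior improves using nothing but the reward signal.

    import random

    class TwoArmedBandit:
        """Illustrative environment: two actions with different average payoffs."""
        def step(self, action):
            # Action 1 pays off more often than action 0 (assumed probabilities).
            return 1.0 if random.random() < (0.3, 0.7)[action] else 0.0

    class AveragingAgent:
        """Illustrative agent: estimates each action's quality from rewards."""
        def __init__(self):
            self.value = [0.0, 0.0]   # estimated quality of each action
            self.count = [0, 0]

        def act(self):
            # Usually pick the action currently believed better; explore sometimes.
            if random.random() < 0.1:
                return random.randrange(2)
            return 0 if self.value[0] >= self.value[1] else 1

        def learn(self, action, reward):
            # Update a running average of reward for the chosen action.
            self.count[action] += 1
            self.value[action] += (reward - self.value[action]) / self.count[action]

    env, agent = TwoArmedBandit(), AveragingAgent()
    for _ in range(1000):
        a = agent.act()           # the agent acts...
        r = env.step(a)           # ...the environment returns a reward signal...
        agent.learn(a, r)         # ...and behavior improves from that signal alone
    print(agent.value)            # the better action ends up with the higher estimate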

The idea of learning from reward has been familiar to animal trainers for thousands of years. Later, Alan Turing's 1950 paper "Computing Machinery and Intelligence" considered the question "Can machines think?" and proposed a machine learning approach based on rewards and punishments.

While Turing reported having conducted some initial experiments with this approach, and Arthur Samuel developed a checkers-playing program that learned from experience in the late 1950s, this avenue of AI saw little progress in the decades that followed. In the early 1980s, drawing on observations from psychology, Barto and his PhD student Sutton began framing reinforcement learning as a general problem.

They drew on the mathematical foundation provided by Markov decision processes (MDPs), in which an agent makes decisions in a stochastic environment, receiving a reward signal after each action, with the goal of maximizing its long-term cumulative reward. Whereas standard MDP theory assumes that everything about the MDP is known to the agent, the RL framework allows the environment and the rewards to be unknown. The minimal information requirements of RL, combined with the generality of the MDP framework, allow RL algorithms to be applied to an enormous range of problems, as explained below.
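
As a small worked example of the quantity an agent in an MDP seeks to maximize, the Python sketch below computes a discounted return from a sequence of rewards. The reward sequence and the discount factor gamma = 0.9 are assumptions chosen for illustration.

    def discounted_return(rewards, gamma=0.9):
        """Long-term cumulative reward: G = r0 + gamma*r1 + gamma^2*r2 + ..."""
        g = 0.0
        for r in reversed(rewards):   # fold from the end: G_t = r_t + gamma * G_{t+1}
            g = r + gamma * g
        return g

    # A delayed reward of 10 is worth less than it would be if received now.
    print(discounted_return([1.0, 0.0, 0.0, 10.0]))  # 1 + 0.9**3 * 10 = 8.29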

Barto and Sutton, jointly and with others, developed many of the basic algorithmic approaches for RL. These include their primary contribution, temporal difference learning, which made an important advance in solving the problem of reward prediction, as well as policy-gradient methods and the use of neural networks as a tool to represent learned functions. They also proposed agent designs that combine learning and planning, demonstrating the value of knowledge of the environment as a basis for planning.
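
The core of temporal difference learning fits in a single line of code: after each step, the value estimate of the previous state is nudged toward the reward just received plus the discounted value estimate of the new state, so predictions improve before the final outcome is known. In the minimal sketch below, the random-walk environment, the step size, and the discount factor are illustrative assumptions.

    import random

    GAMMA, ALPHA = 0.9, 0.1    # discount factor and learning step size (assumed)
    V = [0.0] * 5              # value estimates for states 0..4

    def step(s):
        """Random walk: move left or right; reward 1 only on reaching state 4."""
        s2 = max(0, min(4, s + random.choice((-1, 1))))
        return s2, (1.0 if s2 == 4 else 0.0)

    for _ in range(5000):
        s = 2                          # each episode starts in the middle
        while s not in (0, 4):         # states 0 and 4 end the episode
            s2, r = step(s)
            # The temporal-difference update: move V(s) toward r + GAMMA * V(s2).
            V[s] += ALPHA * (r + GAMMA * V[s2] - V[s])
            s = s2

    print([round(v, 2) for v in V])    # estimates rise toward the rewarding end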

Perhaps equally influential was their textbook, Reinforcement Learning: An Introduction (1998), which remains the standard reference in the field and has been cited more than 75,000 times. It has enabled thousands of researchers to understand and contribute to this emerging field and continues to inspire much significant research activity in computer science today.

Although Barto and Sutton developed their algorithms decades ago, major advances in the practical application of RL have come in the past fifteen years by merging RL with deep learning algorithms. The resulting technique is known as deep reinforcement learning.
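
In rough terms, deep reinforcement learning keeps the same temporal-difference idea but stores the learned value function in a neural network instead of a table. The PyTorch sketch below shows a single such update; the network shape, the made-up transition, and the hyperparameters are assumptions for illustration, not a description of any particular system.

    import torch
    import torch.nn as nn

    STATE_DIM, N_ACTIONS, GAMMA = 4, 2, 0.99   # assumed sizes and discount

    # The Q-network maps a state to one estimated value per action.
    q_net = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(),
                          nn.Linear(32, N_ACTIONS))
    optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

    # One made-up transition (state, action, reward, next state) for illustration.
    state, next_state = torch.randn(1, STATE_DIM), torch.randn(1, STATE_DIM)
    action, reward = 1, 0.5

    # The familiar temporal-difference target, r + gamma * max_a' Q(s', a'),
    # now drives a gradient step on network weights instead of a table update.
    with torch.no_grad():
        target = reward + GAMMA * q_net(next_state).max()
    prediction = q_net(state)[0, action]
    loss = (prediction - target) ** 2

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()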

The most prominent example of deep RL was the victory of the AlphaGo computer program over the best human Go players in 2016 and 2017. Another major achievement of recent years is the chatbot ChatGPT. ChatGPT is a large language model (LLM) trained in two phases, the second of which employs a technique called reinforcement learning from human feedback (RLHF) to capture human expectations.

RL has achieved success in many other areas as well. A high-profile research example is teaching a robot the motor skill of in-hand manipulation and having it solve a physical puzzle (a Rubik's Cube), with all of the learning done in simulation yet succeeding in the markedly different real world.

Other areas include network congestion control, chip design, internet advertising, and even improving algorithms for one of the oldest problems in computer science, matrix multiplication.

Finally, a technology that was partly inspired by neuroscience has returned the favor. Recent research, including Barto's own work, has shown that specific RL algorithms developed in AI provide the best explanations for a wide range of findings about the dopamine system in the human brain.

"Barto and Sutton's work demonstrates the immense potential of applying a multidisciplinary approach to longstanding challenges in our field," said ACM President Yannis Ioannidis. "Research areas ranging from cognitive science and psychology to neuroscience inspired the development of reinforcement learning, which has laid the foundations for some of the most important advances in AI and has given us greater insight into how the brain works. Barto and Sutton's work is not a stepping stone that we have now moved on from. Reinforcement learning continues to grow and offers great potential for further advances in computing and many other disciplines. It is fitting that we are honoring them with the most prestigious award in our field."

"In a 1947 lecture, Alan Turing stated, 'What we want is a machine that can learn from experience,'" noted Jeff Dean, Senior Vice President, Google. "Reinforcement learning, as pioneered by Barto and Sutton, directly answers Turing's challenge. Their work has been a lynchpin of progress in AI over the last several decades. The tools they developed remain a central pillar of the AI boom and have yielded major advances, attracted legions of young researchers, and driven billions of dollars in investments. RL's impact will continue well into the future. Google is proud to sponsor the ACM A.M. Turing Award and honor individuals who have shaped the technologies that improve our lives."




