Guest ozgurguler Posted August 26 Posted August 26 Large Language Models (LLMs) have demonstrated exceptional abilities in performing natural language tasks that involve complex reasoning. As a result, these models have evolved to function as agents capable of planning, strategising, and solving complex problems. However, challenges persist when it comes to making decisions under uncertainty, where outcomes are not deterministic, or when adaptive decision-making is required in changing environments, especially in multi-step scenarios where each step influences the next. We need more advanced capabilities… This is where GPT-4’s advanced reasoning capabilities and Language Agent Tree Search (LATS) come together to address these challenges. LATS incorporates a dynamic, tree-based search methodology that enhances the reasoning capabilities of GPT-4O. By integrating Monte Carlo Tree Search (MCTS) with LLMs, LATS unifies reasoning, acting, and planning, creating a more deliberate and adaptive problem-solving framework. This powerful combination allows for improved decision-making and more robust handling of complex tasks, setting a new standard in the deployment of language models as autonomous agents. [HEADING=1]Is search the missing piece in GenAI problem solving?[/HEADING] [ATTACH type=full" alt="ozgurguler_0-1724696475802.png]63895[/ATTACH] Computational problem solving can be broadly defined as “search through a combinatorial problem space”, represented as a tree. Depth-First Search (DFS)and Breadth-First Search (BFS) are fundamental methods for exploring such solution spaces. A notable example of the power of deep search is AlphaGo’s “Move 37,” which showcased how innovative, human-surpassing solutions can emerge from extensive exploration. Unlike traditional methods that follow predefined paths, LLMs can dynamically generate new branches within the solution space by predicting potential outcomes, strategies, or actions based on context. This capability allows LLMs to not only navigate but also expand the problem space, making them exceptionally powerful in situations where the problem structure is not fully known, is continuously evolving, or is highly complex. [HEADING=1]Inference-time Reasoning with Meta Generation Algorithms (MGA)[/HEADING] [ATTACH type=full" alt="ozgurguler_1-1724696536367.png]63896[/ATTACH] midjourney — mayan puzzle needs to be resolved Scaling compute during training is widely recognised for its ability to improve model performance. The benefits of scaling compute during inference remain under-explored. MGA’s offer a novel approach by amplifying computational resources during inference… Unlike traditional token-level generation methods, meta-generation algorithms employ higher-order control structures such as planning, loops with multiple model calls, self-reflection, task decomposition, and dynamic conditioning. These mechanisms allow the model to execute tasks end-to-end, mimicking higher-level cognitive processes often referred to as Systems-2 thinking. [ATTACH type=full" alt="ozgurguler_2-1724696554768.png]63897[/ATTACH] Therefore one-way meta generation algorithms may enhance LLM reasoning by integrating search into the generation process. During inference, MGA’s dynamically explore a broader solution space, allowing the model to reason through potential outcomes and adapt strategies in real-time. By generating multiple paths and evaluating their viability, meta generation algorithms enable LLMs to simulate deeper, more complex reasoning akin to traditional search methods. This approach not only expands the model’s ability to generate novel insights but also improves decision-making in scenarios with incomplete or evolving information. Techniques like Tree of Thoughts (ToT), and Graph of Thought (GoT) are employed to navigate combinatorial solution spaces efficiently. ToT (2*) enables hierarchical decision-making by structuring potential outcomes as tree branches, facilitating exploration of multiple paths. GoT (6*)maps complex relationships between ideas, allowing the model to dynamically adjust and optimize its reasoning path. CoT (5*) provides step-by-step reasoning that links sequential thoughts, improving the coherence and depth of the generation. [HEADING=1]Why is LATS / MCTS better ?[/HEADING] [ATTACH type=full" alt="ozgurguler_3-1724696641091.png]63898[/ATTACH] LATS relies on MCTS In the Tree of Thoughts (ToT) approach, a tree structure is used to represent different decision paths. Traditional methods like Depth-First Search (DFS) or Breadth-First Search (BFS) can navigate this tree, but they are computationally expensive because they explore each possible path systematically. Monte Carlo Tree Search (MCTS) is an improvement on this by simulating different outcomes for actions and updating the tree based on these simulations. It uses a “selection” process where it picks decision nodes using a strategy that balances exploration (trying new paths) and exploitation (choosing known good paths). This is guided by a formula called Upper Confidence Bound (UCB). The UCB formula has two key parts: Exploration Term: This represents the potential reward of choosing a node and is calculated through simulations. Exploitation Term: This decreases the deeper you go into a certain path, meaning that if a path is over-explored, the algorithm may shift to a less-explored path even if it seems less promising initially. By selecting nodes using UCB, simulating outcomes with language models (LLMs), and backpropagating the rewards up the tree, MCTS effectively balances between exploring new strategies and exploiting known successful ones. The second part of the UCB formula is the ‘exploitation term,’ which decreases as you explore deeper into a specific path. This decrease may lead the selection algorithm to switch to another path in the decision tree, even if that path has a lower immediate reward, because the exploitation term remains higher when that path is less explored. Node selection with UCB, reward calculations with LLM simulations and backpropagation are the essence of MCTS. [HEADING=1]An Implementation — Financial Decision Making…[/HEADING] Let’s say we want to use LLM’s for investment management. We feed the LLM the macro landscape and gives us three investment strategy options… [ATTACH type=full" alt="ozgurguler_4-1724696668948.png]63899[/ATTACH] [HEADING=1]Iteration 1:[/HEADING] Selection: We start at the root, and since this is the first iteration, we select all initial strategies (A, B, and C) and simulate their outcomes. Simulation & Backpropagation: Next LLM “simulates” each strategy based on the context it has and assigns the following “rewards” — investment returns — to each “node”. Strategy A: $5,000 Strategy B: $7,000 Strategy C: $4,000 3. Expansion: Based on the selection, Strategy B has the highest UCB1 value (since all nodes are at the same depth), so we expand only Strategy B by simulating its child nodes. [ATTACH type=full" alt="ozgurguler_5-1724696668948.png]63900[/ATTACH] First Expansion of the tree [HEADING=1]Iteration 2:[/HEADING] Selection: Since B1 & B2 strategies are not simulated, there is a tie in terms of their UCB scores and both nodes will be simulated. Simulate Both Nodes: Simulate B1: LLM predicts a return of $8,500 for B1. Simulate B2: LLM predicts a return of $7,500 for B2. [ATTACH type=full" alt="ozgurguler_6-1724696668952.png]63901[/ATTACH] Expand node B 3. Backpropagation: After each simulation results of the simulation are backpropagated up the tree, updating the values of the parent nodes. This step ensures that the impact of the new information is reflected throughout the tree. Updating Strategy B’s Value: Strategy B now needs to reflect the outcomes of B1 and B2. One common approach is to average the rewards of B1 and B2 to update Strategy B’s value. Now, Strategy B has an updated value of $8,000based on the outcomes of its child nodes. 4. Recalculate UCB Scores: After backpropagation, the UCB scores for all nodes in the tree are recalculated. This recalculation uses the updated values (average rewards) and visit counts, ensuring that each node’s UCB1 score accurately reflects both its potential reward and how much it has been explored. UCB(s) = Reward of the node + (exploitation term) 5. Next Selection & simulation: B1 is selected for further expansion (as it has a higher reward) into child nodes: B1a: “Invest in AI companies” B1b: “Invest in green tech” [ATTACH type=full" alt="ozgurguler_7-1724696669009.png]63902[/ATTACH] 6. Backpropagation [ATTACH type=full" alt="ozgurguler_8-1724696669044.png]63903[/ATTACH] 7.UCB1 Calculation Next UCB values of all nodes are recalculated. Assume that due to the decaying exploration factor, B2 now has a higher UCB1 score than both B1aand B1b. This could occur if B1 has been extensively explored, reducing the exploration term for its children. Instead of continuing to expand B1’s children, the algorithm shifts back to explore B2, which has become more attractive due to its unexplored potential. [ATTACH type=full" alt="ozgurguler_9-1724696668970.png]63904[/ATTACH] This example illustrates how MCTS can dynamically adjust its search path based on new information, ensuring that the algorithm remains efficient and focused on the most promising strategies as it progresses. [HEADING=1] [/HEADING] [HEADING=1]An Implementation with Azure OpenAI GPT4-O[/HEADING] Ok, next we will build a “financial advisor” using AzureOpenAI GPT4-o model, implementing LATS. (Refer to the Github repo here.) (For an accurate analysis I am using the IMF World Economic Outlook report from July, 24 as my LLM context for simulations i.e. for generating child nodes and for assigning rewards to decision nodes …) The code leverages the [iCODE]graphviz[/iCODE] library to visually represent the decision tree generated during the execution of the investment strategy simulations. Below are snippets from parts of the resultant graph generated by a sample iteration. Here is how the code runs - ... Decision tree is too wide and cannot be fit into a single picture hence I have added snippets as to how the tree looks below. You can find a sample decision tree in the github repo here… [ATTACH type=full" alt="ozgurguler_10-1724696739409.png]63905[/ATTACH] Sample run of the MCTS code to find the best investment strategy in current macroeconomic climate [ATTACH type=full" alt="ozgurguler_11-1724696739386.png]63906[/ATTACH] Below is the optimsal strategy inferred by LATS… Optimal Strategy Summary: The optimal investment strategy is structured around several key steps influenced by the IMF report. Here’s a concise summary of each step and its significance: 1. **Diversification Across Geographies and Sectors:** — **Geographic Diversification:** This involves spreading investments across regions to mitigate risk and tap into different growth potentials. Advanced economies like the U.S. remain essential due to their robust consumer spending and resilient labor market, but the portfolio should include cautious weighting to manage risks. Simultaneously, emerging markets in Asia, such as India and Vietnam, are highlighted for their higher growth potential, providing opportunities for higher returns. — **Sector Diversification:** Incorporating investments in sectors like green energy and sustainability reflects the growing global emphasis on renewable energy and environmentally friendly technologies. This also aligns with regulatory changes and consumer preferences, creating future growth opportunities. 2. **Green Energy and Sustainability:** — Investing in green energy demonstrates foresight into the global shift toward reducing carbon footprints and reliance on fossil fuels. This is significant due to increased governmental supports, such as subsidies and policy incentives, which are likely to propel growth within this sector. 3. **Fintech and E-Commerce:** — Allocating capital towards fintech and e-commerce companies capitalizes on the digital transformation accelerated by the global shift towards digital platforms. This sector is expected to grow due to increased adoption of online services and digital payment systems, thus presenting promising investment opportunities. [HEADING=1]Conclusion:[/HEADING] By integrating LATS, we harness the reasoning capabilities of LLMs to simulate and evaluate potential strategies dynamically. This combination allows for the construction of decision trees that not only represent the logical progression of decisions but also adapt to changing contexts and insights, provided by the LLM through simulations and reflections. [HEADING=1]References:[/HEADING] Language Agent Tree Search: Unifying Reasoning, Acting, and Planning in Language Models by Zhou et al Tree of Thoughts: Deliberate Problem Solving with Large Language Models by Yao et al The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey by Tula Masterman, Mason Sawtell, Sandi Besen, and Alex Chao From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models” by Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf*, Alex Xie, Graham Neubig, Ilia Kulikov, and Zaid Harchaoui. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models by Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou Graph of Thoughts: Solving Elaborate Problems with Large Language Modelsby Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michał Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, and Torsten Hoefler. From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models” by Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, and Zaid Harchaoui. Hope you enjoyed the content. Let me know any comments and please promote the content if you found it useful... Subscribe to my newsletter on linkedin to keep upto date in GenAI space...Use this link... Continue reading... Quote
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.