AI: enhancing customer experience in a complex 5G world
Reinforcement learning enables a network to continuously learn from observations and experiences, maintaining an optimized customer experience in a dynamic environment, as validated in two live networks.
Children soon learn that certain behaviors earn rewards, and these rewards inform their future behavior. This is the basis of reinforcement learning (RL). Rather than following manually programmed behaviors, AI agents instead focus on goal states, enabling them to learn and even optimize complex processes entirely autonomously. Testing and learning behaviors with digital twins takes the risk out of this approach.
AI applied in telecoms
The expanding scope of 5G applications puts numerous demands on networks, such as high availability, ultra-reliability, low latency and high security. This growing complexity is driving a need for more automation. Intelligent agents capable of handling complex processes are needed to balance the long-term benefits of a course of action against the short-term gains of each immediate step; for example, when optimizing a network over multiple steps. These processes need to be learned autonomously, without the intervention of a human domain expert. RL is the specialized area of machine learning best suited to this challenge.
RL delivers long-term rewards in dynamic environments
RL techniques mirror behavioral psychology. The agent accumulates knowledge about the dynamics of the environment – the mobile network – through interactions that result in positive or negative outcomes depending on how technically sound the chosen actions are.
To train the system, a software agent interacts with the environment by repeatedly observing its state and then – based on the knowledge available to the agent at each stage – taking actions that are meant to maximize a long-term reward, that is, an improved situation measured against defined criteria. In each iteration, the agent learns from the outcome of the suggested actions and becomes progressively “wiser”. At the beginning of the process, exploration of the environment is naturally highly erratic; it gradually becomes more focused and precise as the iterations proceed and knowledge about the environment’s dynamics improves.
At the end of the training phase, the agent should have accumulated enough knowledge to support a decision for each possible state of the environment. When the agent is later applied to a specific network, the RL system continues to learn, and a configurable degree of exploration can be carried out at the same time. This technique has been applied in many different fields, from video games and chess to self-driving cars.
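To make the mechanics concrete, the following is a minimal sketch of this training loop using tabular Q-learning with an epsilon-greedy policy whose exploration rate decays over time. The toy environment, state space and action space are illustrative assumptions only; a real agent would interact with a network simulator or emulator and learn from network KPIs.

```python
import random

N_STATES, N_ACTIONS = 5, 3           # hypothetical discretized network states and actions
ALPHA, GAMMA = 0.1, 0.9              # learning rate and discount factor

def toy_env_step(state, action):
    """Hypothetical environment dynamics: returns (next_state, reward)."""
    next_state = (state + action) % N_STATES
    reward = 1.0 if next_state == N_STATES - 1 else -0.1
    return next_state, reward

q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
epsilon = 1.0                        # exploration starts out highly erratic

for episode in range(500):
    state = random.randrange(N_STATES)
    for _ in range(20):
        # Explore with probability epsilon, otherwise exploit current knowledge.
        if random.random() < epsilon:
            action = random.randrange(N_ACTIONS)
        else:
            action = max(range(N_ACTIONS), key=lambda a: q[state][a])
        next_state, reward = toy_env_step(state, action)
        # Move the estimate of the long-term reward for (state, action) toward
        # the observed reward plus the discounted best future value.
        q[state][action] += ALPHA * (reward + GAMMA * max(q[next_state]) - q[state][action])
        state = next_state
    epsilon = max(0.05, epsilon * 0.99)  # exploration gradually becomes more focused
```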
In mobile network optimization, most existing solutions are based on rules defined by highly skilled domain experts who need to translate that knowledge into the proper automation frameworks. These rules are typically static and universal for all networks. The complexity of 5G makes it very challenging to manually devise rule modifications that benefit a specific network case. An RL agent, on the other hand, can be pre-trained with general knowledge and then continue to learn in production, arriving at an optimal policy for each specific scenario.
Figure 25: Live networks using simulators and emulators as digital twins
Digital twins enabling rewards from first implementation
Digital twins are a suitable way to avoid the effects of erratic initial exploration on live mobile networks. Exploration is performed on an external entity that mimics the behavior of the live network. Once the agent has acquired all the necessary knowledge from the digital twin, the resulting policy can be safely applied to the live network. From that moment onwards, the agent decides optimal actions on the live network while continuing to learn from its feedback, with a configurable degree of controlled exploration still allowed.
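In code, this two-stage pattern amounts to running the same learning loop first against the twin with unrestricted exploration and then against the live network with a small, configurable exploration rate. The sketch below assumes hypothetical TwinEnv, LiveEnv and Agent stand-ins; in practice the environments would wrap a simulator or emulator and the live network’s configuration interface.

```python
import random

class TwinEnv:
    """Hypothetical stand-in for a simulator/emulator of the live network."""
    def reset(self):
        return 0
    def step(self, action):
        # Returns (next_state, reward, done); toy dynamics for illustration.
        return action % 3, random.random(), random.random() < 0.1

class LiveEnv(TwinEnv):
    """Hypothetical stand-in for the live network, exposing the same interface."""

class Agent:
    def __init__(self, n_states=3, n_actions=3):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
    def act(self, state, epsilon):
        if random.random() < epsilon:                 # configurable exploration
            return random.randrange(len(self.q[state]))
        return max(range(len(self.q[state])), key=lambda a: self.q[state][a])
    def learn(self, s, a, r, s2, alpha=0.1, gamma=0.9):
        self.q[s][a] += alpha * (r + gamma * max(self.q[s2]) - self.q[s][a])

def run(agent, env, episodes, epsilon):
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = agent.act(state, epsilon)
            next_state, reward, done = env.step(action)
            agent.learn(state, action, reward, next_state)
            state = next_state

agent = Agent()
run(agent, TwinEnv(), episodes=1_000, epsilon=1.0)   # erratic exploration is safe on the twin
run(agent, LiveEnv(), episodes=50, epsilon=0.05)     # controlled exploration in production
```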
Typically, two types of digital twins can be considered for initial offline learning: emulators and simulators, as shown in Figure 25. An emulator contains a partial replica of the live network, providing accurate results but requiring big data techniques to operate efficiently. A simulator is a software program that models the behavior of a network based on a set of hypothetical scenarios. In many cases, simulators are suitable for capturing general trade-offs and trends.
Network-wide, coordinated approach to individual, cell-based optimization
Certain network parameters are configured on a per-cell level but might have a strong impact on the performance of surrounding cells; antenna electrical tilt and downlink transmission power are two examples. A change in any of these parameters also affects users served by the surrounding cells, which makes finding the optimal configuration for these types of parameters a complex exercise.
This problem can be addressed by means of a local, per-cell reward definition that, when assessing the consequences of carrying out a change in one cell, also considers the impact of that change on its closest neighboring cells. This ensures implicit coordination and an operations strategy in which the agents aim to improve not only each cell individually, but also the network as a whole.
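A minimal sketch of such a locally coordinated reward is shown below. The KPI values, weight and neighbor relation are illustrative assumptions, not the trial configuration; in the trials described here, the reward combined congestion- and throughput-related KPIs.

```python
def cell_reward(cell, kpi_delta, neighbors, neighbor_weight=0.5):
    """Reward for a change in `cell`: its own KPI improvement plus a
    weighted sum of the change's impact on its closest neighbors."""
    own = kpi_delta[cell]
    spillover = sum(kpi_delta[n] for n in neighbors[cell])
    return own + neighbor_weight * spillover

# Hypothetical per-cell KPI deltas after a tilt change (positive = improvement).
kpi_delta = {"A": 0.8, "B": -0.2, "C": 0.1}
neighbors = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}

# Cell A improved, but its reward is discounted by the slight degradation
# the change caused in neighbor B: 0.8 + 0.5 * (-0.2 + 0.1) = 0.75
print(cell_reward("A", kpi_delta, neighbors))
```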
These concepts have been successfully validated in two different live networks: remote electrical tilt (RET) optimization in MásMóvil and downlink transmission power optimization in Swisscom.
MásMóvil: improving customer experience during peak hours
MásMóvil wanted to reduce congestion and improve downlink throughput during busy hours in Malaga, Spain. This area has one of the operator’s highest rates of per-cell RET support. Antennas fitted with RET permit tilt adjustments via remote software commands instead of site visits, making them ideally suited to innovation toward the vision of zero-touch network optimization.
The RET optimization approach consisted of two phases:
- An initial pre-training phase, during which the agent acquired all relevant knowledge from a digital twin, in this case a network simulator.
- An online optimization phase: an iterative process in which the pre-trained agent was fed network performance measurements and applied incremental changes to cells in the live network, while continuing to learn from the resulting rewards.
The trial area comprised several carriers at different frequency bands with independent RET devices, making it possible to adjust the antenna tilt in one cell while keeping the tilts of all other co-located cells unaltered. In total, 127 out of 267 4G cells were selected for RET optimization in one carrier of the 1,800MHz band. The remaining cells were also monitored, both for benchmarking and to compute the rewards of their optimizable neighbors.
During five weeks of automated decisions, the algorithm carried out eight parameter-changing iterations in total. Figure 26 illustrates how the improvements in two of the KPIs contributing to the reward were realized over the five weeks in the real network, with the lines representing network parameter changes. Overall, the congestion rate dropped close to zero and downlink user throughput increased by 12 percent during busy hours, while traffic volume remained similar. All of this was achieved without any human expert intervening in the decisions or manually filtering the changes before they were applied.
Figure 26: Evolution of main reward components (percentage)
Figure 27: Power and RET changes in the Ticino area of Switzerland
Swisscom: meeting strict regulations without compromising customer experience
Switzerland has stringent regulations for effective radiated power (ERP) from mobile networks. For Swisscom, one challenge was to lower power emissions within the existing low-band layer to create headroom for the deployment of a new low-band layer to be used by both 4G and 5G New Radio (NR). Initially, the new low-band layer was unable to match the coverage of the existing one due to a lack of available power. An RL-based method using a network emulator as a digital twin was applied to reduce ERP in the 4G network as much as possible while maintaining coverage and quality levels, followed by RET optimization using the simulator-based approach.
A trial to optimize both downlink transmission power and RET was executed in the Ticino area of Switzerland. The studied cluster consisted of 163 4G cells in the 800MHz band, of which 100 were selected for downlink transmission power optimization, followed by RET optimization.
The emulation of the network’s behavior after a power change is accurate enough that no iterative interaction with the live network is required. Instead, the final optimized values were obtained solely by interacting with the digital twin and were then implemented directly in the network. After this phase, RET optimization was applied to the network. The final changes are illustrated in Figure 27.
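The sketch below illustrates this offline pattern under stated assumptions: candidate power settings are scored entirely on the emulator, and only the chosen value would be pushed to the live network. For brevity it replaces the trial’s RL agent with a direct search over candidates, and the emulator model, coverage threshold and power range are all hypothetical.

```python
def emulated_kpis(power_dbm):
    """Hypothetical stand-in for the emulator: predicts (coverage, throughput)
    for a candidate downlink transmission power setting. Toy model only."""
    coverage = 1.0 - 0.02 * (43 - power_dbm)     # lower power -> slightly less coverage
    throughput = 1.0 + 0.01 * (43 - power_dbm)   # lower power -> less interference
    return coverage, throughput

def lowest_acceptable_power(candidates, min_coverage=0.9):
    """Pick the lowest power (lowest ERP) whose predicted coverage stays acceptable."""
    feasible = [p for p in candidates if emulated_kpis(p)[0] >= min_coverage]
    return min(feasible) if feasible else None

# Score candidate settings entirely offline, then apply the result once,
# with no iterative interaction with the live network.
chosen = lowest_acceptable_power(range(38, 44))
print(chosen)   # the value that would be implemented directly in the network
```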
The transmission power was reduced by 10 percent while simultaneously achieving a 12 percent increase in downlink throughput. An additional round of power and RET optimization steps was executed to explore the potential limits of the solution, resulting in a final cumulative transmission power reduction of 20 percent while still achieving a 5.5 percent throughput gain. This reduction in ERP implies a 3.4 percent decrease in base station power consumption.
RL in the cognitive network
Zero-touch network management and operations is a vision in which networks are deployed and operated with minimum human intervention, using trustworthy AI technologies. The cognitive network will be based on control design, using both machine reasoning and machine learning techniques that outperform previous methodologies.
RL enables the network to continuously learn from its environmental observations, interactions and previous experiences. The cognitive processes understand the current network situation, plan for a desired outcome, decide what to do and act accordingly. The desired outcomes then serve as the reference from which the network learns from its actions. The cognitive network will be able to optimize its existing knowledge, build on experience and use reasoning to solve new problems.