LEAP: Optimization Hierarchical Federated Learning on Non-IID Data with Coalition Formation Game

AI Audio Lecture + Q&A
Transcript
John: Welcome to Advanced Topics in Distributed Systems. Today's lecture is on 'LEAP: Optimization Hierarchical Federated Learning on Non-IID Data with Coalition Formation Game'. We've seen a lot of recent work trying to make federated learning more efficient, like 'Online Client Scheduling and Resource Allocation for Efficient Federated Edge Learning', which focuses heavily on the resource side. This paper, from researchers at Wuhan University of Science and Technology, takes a more integrated approach, trying to solve both data heterogeneity and resource bottlenecks at the same time. Yes, Noah?

Noah: Hi Professor. So, is the main contribution here tackling both problems simultaneously? Because it seems like most papers focus on one or the other.

John: Precisely. That's the core motivation. Federated learning, especially the hierarchical version used in large IoT networks, has two major performance killers. First, you have non-IID data: clients have skewed, dissimilar datasets, which throws off the global model's convergence and hurts accuracy. Second, you have communication bottlenecks, where slow clients, or 'stragglers', delay entire training rounds, wasting time and energy.

Noah: And HFL, Hierarchical Federated Learning, is supposed to help with the communication part, right? By adding edge servers?

John: Correct. In HFL, clients send updates to a local edge server, which aggregates them before sending a single update to the central server. That reduces the central server's load, but it creates a new problem: how do you group clients under these edge servers? A bad grouping can make the non-IID problem even worse if one edge server gets all of one type of data. LEAP addresses this by reframing the client grouping problem as a coalition formation game: the clients are the players, and they form coalitions with edge servers.

Noah: A game? So clients are acting in their own self-interest?

John: Not exactly, and that's a key distinction. Instead of a selfish rule, LEAP uses what it calls a 'coalition-friendly preference rule'. A client will only switch to a new edge server if that move benefits the entire system, meaning it makes the data distributions across all edge servers more similar. The goal is to minimize the Jensen-Shannon divergence, or JSD, across the aggregated data distributions at the edge servers.

Noah: How is that JSD calculated without violating privacy? Do the clients have to share statistics about their data?

John: They only need to share basic label counts, not the raw data itself, which is a common and accepted practice in many privacy-preserving FL schemes. The authors prove this game is an 'exact potential game', which is a theoretical guarantee that the process of clients switching coalitions will eventually converge to a stable state. Once that stable, data-balanced grouping is found, LEAP moves to its second phase: resource optimization.

John: In this second phase, the framework tackles the communication bottleneck. It first uses a technique called the gradient projection method to optimally allocate the total available network bandwidth among the different edge-server coalitions. The goal is to minimize the total transmission time, which is dictated by the slowest client in each group. This directly addresses the straggler problem. The math works out nicely because the transmission delay function is convex, making it suitable for this kind of optimization.
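Here is a minimal sketch of the first-phase coalition step John just described: clients share only label-count vectors, the system scores a grouping with a generalized (equal-weight) Jensen-Shannon divergence across the edge coalitions, and a switch is kept only if it lowers that system-wide score. All function and variable names are illustrative, and the equal-weight JSD is an assumption rather than the paper's exact objective.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits, with 0 * log 0 treated as 0."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def edge_distribution(members, label_counts):
    """Normalized label distribution of one coalition, built from the
    label-count vectors its member clients share (no raw data)."""
    totals = np.sum([label_counts[c] for c in members], axis=0)
    return totals / totals.sum()

def system_jsd(coalitions, label_counts):
    """Generalized JSD across all edge coalitions: entropy of the mean
    distribution minus the mean of the entropies. It is 0 exactly when
    every edge server sees the same label mix."""
    dists = np.array([edge_distribution(m, label_counts)
                      for m in coalitions.values()])
    return entropy(dists.mean(axis=0)) - np.mean([entropy(d) for d in dists])

def try_switch(client, src, dst, coalitions, label_counts):
    """Coalition-friendly preference rule: accept the move only if it
    lowers the system-wide JSD, i.e. benefits the whole system."""
    if len(coalitions[src]) == 1:
        return False                         # keep every edge server non-empty
    before = system_jsd(coalitions, label_counts)
    coalitions[src].remove(client)
    coalitions[dst].append(client)
    if system_jsd(coalitions, label_counts) < before:
        return True                          # keep the switch
    coalitions[dst].remove(client)           # revert: no global gain
    coalitions[src].append(client)
    return False
```

Because the system-wide JSD strictly decreases with every accepted switch, the process cannot cycle and must stop at a stable partition; this is the intuition behind the paper's exact-potential-game convergence guarantee.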
Noah: So it's a two-step process: first form the groups to fix the data distribution, then allocate resources to those fixed groups. Does that sequence risk finding a suboptimal solution? Maybe a slightly worse data distribution could allow for a much better resource allocation.

John: That's an excellent point. It is a sequential, decoupled optimization, which is a heuristic. The authors don't claim it's globally optimal, but their results suggest it's highly effective. After bandwidth is allocated to the groups, the final step is for each individual client to calculate its optimal transmission power. The framework provides a closed-form solution for this: each client calculates the absolute minimum power it needs to send its update within the required time limit, while not exceeding its maximum power budget. This balances the need for speed with the need to conserve energy on resource-constrained devices.

Noah: And the results back this up? The report mentioned a 2.24 times reduction in energy consumption.

John: Yes, and importantly, that energy saving was achieved while still meeting the latency constraints; schemes with random power allocation often failed to meet the deadline. On the accuracy front, the coalition game was very effective. By balancing the data, LEAP improved model accuracy by up to 47% compared to an unoptimized initial state, and by over 20% compared to baseline clustering algorithms like Mean-shift. This shows that explicitly optimizing for data similarity via JSD is more effective than general-purpose clustering.

John: The primary implication here is a shift toward holistic problem-solving in HFL. It demonstrates that you can't treat data distribution and network performance as independent variables; they are coupled. LEAP provides a practical, privacy-preserving, and theoretically grounded way to manage this coupling. This approach is quite different from something like 'FlocOff', which also addresses heterogeneity and communication but uses an offloading strategy rather than a game-theoretic one for client grouping.

Noah: This coalition-friendly rule also seems different from reputation-based systems I've seen, where clients might selfishly try to join groups that benefit them the most.

John: Exactly. The 'coalition-friendly' rule prioritizes global system health over individual client gain, which is crucial for the performance of the final global model. It's a cooperative framework, and the 'exact potential game' formulation guarantees that this cooperative process will find a stable equilibrium. It provides a strong template for managing other coupled problems in distributed machine learning.

John: So, to wrap up, LEAP presents a cascaded optimization framework. It first uses a coalition game to group clients, turning a non-IID problem into a more balanced one at the edge layer. Then it uses convex optimization to allocate bandwidth and power, minimizing latency and energy. The key takeaway is that solving the complex challenges of real-world distributed learning requires integrated frameworks that model the interplay between data, networks, and system objectives.

John: Thanks for listening. If you have any further questions, ask our AI assistant or drop a comment.
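To make the resource phase discussed above concrete, here is a hedged sketch of the bandwidth step: projected gradient descent over the feasible set {b >= 0, sum(b) = B_total}, assuming each coalition's delay is its slowest client's payload divided by a rate proportional to the coalition's bandwidth. The delay model, step-size schedule, and all names are simplifying assumptions, not the paper's exact formulation.

```python
import numpy as np

def project_to_simplex(v, total):
    """Euclidean projection of v onto {b : b >= 0, sum(b) = total}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - total
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def allocate_bandwidth(slowest_bits, spectral_eff, total_bw, steps=2000):
    """Projected gradient descent for
        minimize   sum_m  slowest_bits[m] / (b[m] * spectral_eff[m])
        subject to b >= 0,  sum(b) = total_bw,
    i.e. total transmission delay, dictated by each group's slowest client."""
    s = np.asarray(slowest_bits, dtype=float)   # payload of slowest client (bits)
    c = np.asarray(spectral_eff, dtype=float)   # achievable bits/s per Hz
    b = np.full(len(s), total_bw / len(s))      # start from an even split
    for k in range(steps):
        grad = -s / (c * b ** 2)                      # d(delay)/db, always < 0
        grad /= np.linalg.norm(grad) + 1e-12          # scale-free direction
        step = 0.1 * total_bw / np.sqrt(k + 1)        # diminishing step size
        b = project_to_simplex(b - step * grad, total_bw)
        b = np.maximum(b, 1e-6 * total_bw)            # keep delays finite
    return b
```

Each per-coalition delay S_m / (b_m * c_m) is convex in b_m, which is why a projected-gradient scheme with a diminishing step size is guaranteed to approach the optimum on this problem.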
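Finally, a sketch of the per-client power step, using the standard Shannon-rate channel model as an assumed stand-in for the paper's exact expressions; the function name and parameters are illustrative.

```python
def min_transmit_power(bits, deadline, bandwidth_hz, channel_gain,
                       noise_psd, p_max):
    """Smallest transmit power p such that the Shannon rate
        r(p) = B * log2(1 + p * g / (N0 * B))
    delivers `bits` within `deadline`. Inverting r(p) >= bits/deadline
    gives the closed form below; powers above the budget are infeasible."""
    required_rate = bits / deadline                          # bits per second
    snr_needed = 2.0 ** (required_rate / bandwidth_hz) - 1.0
    p = snr_needed * noise_psd * bandwidth_hz / channel_gain
    return p if p <= p_max else None     # None: deadline not reachable
```

A client that gets None here is a straggler its coalition's current bandwidth cannot accommodate; since the required power falls as the allocated bandwidth grows, the bandwidth and power steps interlock, which is the coupling the lecture emphasizes.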