Decentralized DoRA Tuning
Abstract
D^2ORA (Decentralized Weight-Decomposed Low-Rank Adaptation) extends the DoRA algorithm to a decentralized environment, enabling distributed training across multiple servers.
Algorithm
Step 1: Initialization
We first initialize the parameter matrices of Theia together with the low-rank adaptation matrices.
- Magnitude Vector Initialization: Initialize the magnitude vector $m$ using the column-wise norm of the pre-trained weight matrix $W_0$: $m = \|W_0\|_c$, where $\|\cdot\|_c$ denotes the vector of column-wise norms.
- Directional Matrix Initialization: Set the directional matrix $V$ to the pre-trained weight matrix $W_0$: $V = W_0$.
- Low-Rank Matrices Initialization: Initialize the low-rank matrices $A$ and $B$ for the LoRA method, with $A$ initialized randomly and $B$ set to zero, so that the initial update $\Delta V = BA$ is zero.
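A minimal initialization sketch in PyTorch, assuming a single linear layer with pre-trained weight `W0` of shape `(d, k)` and a chosen rank `r`; the small random scaling of `A` is a standard LoRA-style assumption, not part of the original specification:

```python
import torch

def init_dora_params(W0: torch.Tensor, r: int):
    """Initialize DoRA components from a pre-trained weight W0 of shape (d, k)."""
    d, k = W0.shape
    # Magnitude vector m = ||W0||_c: column-wise L2 norms, shape (k,)
    m = W0.norm(p=2, dim=0)
    # Directional matrix V: a copy of the pre-trained weight
    V = W0.clone()
    # LoRA factors: A random, B zero, so the initial update B @ A is zero
    A = torch.randn(r, k) * 0.01
    B = torch.zeros(d, r)
    return m, V, A, B
```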
Step 2: Decomposition
Decompose the pre-trained weight matrix $W_0$ into its magnitude and direction components; $W_0$ is taken directly from the pre-trained NLP model.
- Magnitude Decomposition: $m = \|W_0\|_c$.
- Directional Decomposition: $V = W_0$, so that the pre-trained weight can be written as $W_0 = m \cdot \frac{V}{\|V\|_c}$.
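A short sketch, under the same assumptions as above, verifying that the magnitude and direction components reconstruct the pre-trained weight:

```python
import torch

def decompose(W0: torch.Tensor):
    """DoRA-style decomposition of W0 into magnitude m and direction V."""
    m = W0.norm(p=2, dim=0)   # magnitude: column-wise norms
    V = W0                    # direction: the pre-trained weight itself
    return m, V

# Sanity check: m * V / ||V||_c reconstructs W0
W0 = torch.randn(16, 8)
m, V = decompose(W0)
W_rec = m * V / V.norm(p=2, dim=0)
assert torch.allclose(W_rec, W0, atol=1e-6)
```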
Step 3: Distributed Training
For each server $s$ in the decentralized network, perform the following (a training sketch follows this list):
- Distribution: Distribute the initialized $m$, $V$, $A$, and $B$ to all servers.
- Iterations: For each iteration $t$ from 1 to $T$:
- Weight Update Calculation: Compute the weight update using the low-rank adaptation method: $\Delta V = BA$.
- Magnitude Vector Update: Update the magnitude vector $m$ by gradient descent: $m \leftarrow m - \eta \nabla_{m} \mathcal{L}$.
- Directional Matrix Update: Update the low-rank matrices $A$ and $B$ with gradient descent: $A \leftarrow A - \eta \nabla_{A} \mathcal{L}$, $B \leftarrow B - \eta \nabla_{B} \mathcal{L}$, where $\eta$ is the learning rate and the loss $\mathcal{L}$ is computed with the merged weight $W' = m \cdot \frac{V + \Delta V}{\|V + \Delta V\|_c}$; the directional component is adapted through $\Delta V$ while $V$ itself remains frozen.
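A minimal per-server training sketch in PyTorch, assuming a single linear layer, a local `dataloader`, and a regression-style placeholder loss; the data pipeline, loss, and learning rate are illustrative assumptions only:

```python
import torch
import torch.nn.functional as F

def local_train(m, V, A, B, dataloader, num_iters, lr=1e-4):
    """One server's local updates: V stays frozen, while m, A, and B are trained."""
    m = m.clone().requires_grad_(True)
    A = A.clone().requires_grad_(True)
    B = B.clone().requires_grad_(True)
    opt = torch.optim.SGD([m, A, B], lr=lr)
    for _ in range(num_iters):
        x, y = next(iter(dataloader))                      # placeholder local batch
        delta_V = B @ A                                    # low-rank weight update
        adapted = V + delta_V
        W_prime = m * adapted / adapted.norm(p=2, dim=0)   # merged weight
        loss = F.mse_loss(x @ W_prime, y)                  # placeholder loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return m.detach(), V, A.detach(), B.detach()
```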
Step 4: Aggregation
After all iterations are completed, collect the updates from all servers and aggregate them (a simple averaging sketch follows this list):
- Magnitude Vector Aggregation: Aggregate the magnitude vectors $m_s$ received from the $S$ servers into a single vector $m$.
- Directional Matrix Aggregation: Aggregate the directional matrices $V_s$ reported by the servers into a single matrix $V$.
- Low-Rank Matrices Aggregation: Aggregate the low-rank matrices $A_s$ and $B_s$ into $A$ and $B$.
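A minimal aggregation sketch that simply averages each component across servers; plain averaging is an assumption here, since the exact aggregation rule (and any blockchain-based verification) is not specified in this section:

```python
import torch

def aggregate(server_results):
    """Average the (m, V, A, B) tuples returned by the servers.
    Plain averaging is an assumption; other aggregation rules could be used."""
    stacked = [torch.stack(parts).mean(dim=0) for parts in zip(*server_results)]
    m_agg, V_agg, A_agg, B_agg = stacked
    return m_agg, V_agg, A_agg, B_agg
```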
Step 5: Merging
Merge the updated components to form the final weight matrix:
- Final Weights Calculation: $W' = m \cdot \frac{V + BA}{\|V + BA\|_c}$, where $\|\cdot\|_c$ denotes the column-wise norm.
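A one-function merging sketch in PyTorch, following the formula above:

```python
import torch

def merge_weights(m, V, A, B):
    """Combine magnitude, direction, and low-rank update into the final weight W'."""
    direction = V + B @ A
    return m * direction / direction.norm(p=2, dim=0)
```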
Step 6: Return Updated Weights
Return the updated weights $W'$ as the final output of the algorithm; they serve as the weight parameters of Theia.
Summary
D^2ORA enhances the capabilities of DoRA by enabling decentralized training across multiple servers. This approach leverages blockchain technology to ensure trustworthiness, making it suitable for training AI models in a distributed and secure environment.