Generate to Defend (G2D)
Necessity
Adversarial attacks exploit weaknesses in a model's learned parameters: the attacker crafts inputs that cause the model to make incorrect predictions. Such attacks are particularly effective when the parameters remain static, because the attacker has ample time to probe the model and learn its weaknesses. The Generate to Defend (G2D) algorithm counters this by periodically regenerating the model parameters, so that no single parameter set stays exposed for long. By continually changing the parameters while preserving performance, G2D makes it harder for an attacker to exploit the model reliably, much as frequently changing a password limits the value of a leaked one.
Generate to Defend (G2D) Algorithm
The G2D algorithm periodically generates new model parameters so that no parameter set is exposed to attackers for an extended period. The key idea is to tailor a diffusion model so that it produces parameters that preserve the model's performance while being sufficiently different from the original parameters. This is achieved by adding a regularizer to the diffusion training objective that maximizes the L1 distance between the generated parameters and the original parameters.
1. Preliminaries of Diffusion Models:
Diffusion models consist of forward and reverse processes indexed by timesteps. We summarize these processes below:
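Forward Process: Given a sample $x_0 \sim q(x)$, Gaussian noise is progressively added for $T$ steps to obtain $x_1, x_2, \dots, x_T$. This process is described by:
$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\big), \qquad q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}),$$
where $\beta_t \in (0, 1)$ is the noise variance added at step $t$.
Reverse Process: The reverse process aims to train a denoising network with parameters $\theta$ to remove the noise from $x_t$, moving backward from $t = T$ to $t = 0$:
$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big), \qquad p_\theta(x_{0:T}) = p(x_T) \prod_{t=1}^{T} p_\theta(x_{t-1} \mid x_t).$$
The denoising network is optimized using the negative log-likelihood:
$$L_{\text{dm}} = D_{\mathrm{KL}}\big(q(x_{t-1} \mid x_t, x_0)\ \|\ p_\theta(x_{t-1} \mid x_t)\big).$$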
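A standard consequence of a Gaussian forward process is that a noisy sample at any step can be drawn directly from the clean sample, without iterating through the intermediate steps. The sketch below illustrates this using only the Python standard library. The linear noise schedule, its endpoint values, and all function names are assumptions made for illustration; the source does not specify them.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced per-step noise variances (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_s) for s = 1..t, one entry per step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_sample(x0, t, alpha_bars, rng):
    """Draw a noisy sample at step t (1-indexed) directly from the clean x0.

    Returns the noisy sample and the Gaussian noise that was mixed in.
    """
    a_bar = alpha_bars[t - 1]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
          for x, e in zip(x0, noise)]
    return xt, noise


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [0.5, -1.2, 3.0]                      # toy stand-in for a clean sample
x_mid, _ = forward_sample(x0, 500, alpha_bars, rng)
x_T, _ = forward_sample(x0, 1000, alpha_bars, rng)
```

Under this schedule the cumulative product shrinks toward zero as the step index grows, so the final sample is almost pure Gaussian noise, which is what allows generation to start from random noise.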
2. Embedding Model into Compact Space:
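To prepare the data, we train an autoencoder to extract latent representations of the model parameters. The encoding and decoding processes are formulated as:
$$Z = f_{\text{encoder}}(V + \xi_V,\ \sigma), \qquad V' = f_{\text{decoder}}(Z + \xi_Z,\ \rho),$$
where $V$ is the set of model parameters, $Z$ is the latent representation, $\xi_V$ and $\xi_Z$ are added Gaussian noise, and $\sigma$ and $\rho$ are the parameters of the encoder and decoder, respectively. The autoencoder is trained by minimizing the mean square error (MSE) loss between the input parameters and their reconstruction $V'$:
$$L_{\text{MSE}} = \frac{1}{K} \sum_{k=1}^{K} \lVert v_k - v'_k \rVert^2,$$
where $v_k$ and $v'_k$ denote the $k$-th of the $K$ parameter vectors in $V$ and its reconstruction.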
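The following minimal sketch shows one noise-augmented pass through an autoencoder and the resulting MSE reconstruction loss. It assumes linear encoder and decoder maps with hand-picked weights and arbitrary noise levels; these are hypothetical placeholders, not the architecture or settings used in the source, and no training loop is shown.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def add_noise(vector, std, rng):
    """Add zero-mean Gaussian noise with the given standard deviation."""
    return [x + rng.gauss(0.0, std) for x in vector]


def encode(params, enc_weights, noise_std, rng):
    """Perturb the flattened parameters, then project them to the latent space."""
    return matvec(enc_weights, add_noise(params, noise_std, rng))


def decode(latent, dec_weights, noise_std, rng):
    """Perturb the latent vector, then project it back to parameter space."""
    return matvec(dec_weights, add_noise(latent, noise_std, rng))


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
params = [0.2, -0.4, 1.0, 0.7]                      # flattened model parameters
enc_weights = [[1, 0, 0, 0], [0, 1, 0, 0]]          # hypothetical 4 -> 2 encoder
dec_weights = [[1, 0], [0, 1], [0, 0], [0, 0]]      # hypothetical 2 -> 4 decoder

latent = encode(params, enc_weights, noise_std=0.001, rng=rng)
recon = decode(latent, dec_weights, noise_std=0.05, rng=rng)
loss = mse(params, recon)                           # quantity minimized in training
```

Injecting noise on both the input and the latent is a common way to make the learned latent space more tolerant of small perturbations, which matters here because the diffusion model later generates latents that the decoder has never seen exactly.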
3. Training Diffusion Models with Regularizer:
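We modify the diffusion training objective to include a regularizer that maximizes the L1 distance between the generated parameters and the original parameters. The training objective for the diffusion model with the additional regularizer is given by:
$$L = L_{\text{dm}} - \lambda\, \lVert \hat{V} - V \rVert_1,$$
where $L_{\text{dm}}$ is the standard denoising loss of the diffusion model, $\lambda$ is a hyperparameter that controls the importance of the regularizer, $\hat{V}$ are the generated parameters, and $V$ are the original parameters. The regularizer enters with a negative sign because the objective is minimized while the distance is to be maximized.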
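As a rough illustration of how the two terms combine, the sketch below computes a regularized loss for one training example. It assumes the denoising term is the commonly used mean squared error between predicted and true noise; the function names and the toy numbers are hypothetical and are not taken from the source.

```python
def l1_distance(a, b):
    """Sum of absolute element-wise differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def regularized_loss(pred_noise, true_noise, generated, original, lam):
    """Denoising loss minus lam times the L1 distance to the original parameters.

    Subtracting the distance means that minimizing this loss pushes the
    generated parameters away from the original ones.
    """
    denoise = mse(pred_noise, true_noise)
    return denoise - lam * l1_distance(generated, original)


loss = regularized_loss(
    pred_noise=[0.1, -0.2],
    true_noise=[0.0, 0.0],
    generated=[1.0, 2.0, 3.0],
    original=[1.5, 2.0, 2.0],
    lam=0.01,
)
```

With these toy values the denoising term is 0.025 and the L1 distance is 1.5, so the regularizer lowers the loss by 0.015. A larger weight on the regularizer rewards greater separation from the original parameters, at the possible expense of denoising accuracy.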
4. Model Generation:
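During inference, random noise is fed into the trained diffusion model and decoder to generate new sets of model parameters:
$$x_T \sim \mathcal{N}(0, \mathbf{I}), \qquad \hat{Z} = \mathrm{Denoise}_\theta(x_T), \qquad \hat{V} = f_{\text{decoder}}(\hat{Z},\ \rho),$$
where $\mathrm{Denoise}_\theta$ denotes running the full reverse process from $t = T$ to $t = 0$, $\hat{Z}$ is the generated latent representation, $\rho$ are the decoder parameters, and $\hat{V}$ is the new set of model parameters.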
These generated parameters then replace the existing model parameters. Because the regularizer pushes the generated parameters away from the original ones, each replacement presents an attacker with a substantially different parameter set, while the diffusion objective is intended to keep the model's performance intact.
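The generation step can be sketched as a standard ancestral sampling loop followed by a decoder call. The denoiser and decoder below are toy stand-ins, not trained networks, and the 50-step linear schedule is an assumption chosen to keep the example short; only the structure of the loop is meant to carry over.

```python
import math
import random


def sample_latent(denoiser, betas, dim, rng):
    """Ancestral sampling: start from Gaussian noise and denoise for T steps."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)

    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in range(len(betas), 0, -1):          # t = T, ..., 1
        beta, a_bar = betas[t - 1], alpha_bars[t - 1]
        eps = denoiser(x, t)                    # predicted noise at step t
        coef = beta / math.sqrt(1.0 - a_bar)
        mean = [(xi - coef * ei) / math.sqrt(1.0 - beta)
                for xi, ei in zip(x, eps)]
        if t > 1:
            sigma = math.sqrt(beta)
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean                            # no noise is added at the last step
    return x


def toy_denoiser(x, t):
    """Hypothetical stand-in for the trained network: treats x itself as noise."""
    return list(x)


def toy_decoder(latent):
    """Hypothetical stand-in for the trained decoder: repeats each latent entry."""
    return [z for z in latent for _ in range(2)]


rng = random.Random(0)
betas = [1e-4 + i * (0.02 - 1e-4) / 49 for i in range(50)]
new_latent = sample_latent(toy_denoiser, betas, dim=3, rng=rng)
new_params = toy_decoder(new_latent)            # would be loaded into the model
```

Each call with fresh random noise yields a different latent and therefore a different parameter set, which is the property that periodic regeneration relies on.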
After the diffusion model is trained, Step 4 is executed periodically to randomize the AI model's parameters, which significantly increases the cost of an attack. The more frequently new parameters are generated, the less time an attacker has to exploit any single parameter set, and the safer the model becomes. Generating a new parameter set is cost-effective because it requires only inference through the diffusion model and decoder, with no retraining.
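One possible way to schedule the periodic regeneration is a small wrapper that swaps in fresh parameters once the current set exceeds a maximum age. The class name, the age-based policy, and the injected clock are assumptions made for illustration; the source does not prescribe a scheduling mechanism.

```python
import time


class ParameterRotator:
    """Swaps in freshly generated parameters once they exceed a maximum age."""

    def __init__(self, generate_fn, max_age_seconds, clock=time.monotonic):
        self._generate = generate_fn
        self._max_age = max_age_seconds
        self._clock = clock
        self._params = generate_fn()
        self._created_at = clock()
        self.rotations = 0

    def current(self):
        """Return the active parameters, regenerating them first if stale."""
        now = self._clock()
        if now - self._created_at >= self._max_age:
            self._params = self._generate()
            self._created_at = now
            self.rotations += 1
        return self._params


# Usage with a fake clock and a counter standing in for the generator.
fake_time = [0.0]
counter = iter(range(1000))
rotator = ParameterRotator(
    generate_fn=lambda: [float(next(counter))],
    max_age_seconds=60.0,
    clock=lambda: fake_time[0],
)
first = rotator.current()      # parameters created at construction time
fake_time[0] = 61.0            # pretend 61 seconds have passed
second = rotator.current()     # stale, so a new set is generated
```

Lowering the maximum age shortens the window in which any one parameter set is exposed, at the cost of more frequent generation runs.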