Necessity

For any AI model, adversarial attacks succeed because attackers craft inputs that exploit weaknesses tied to the model’s parameters, causing it to make incorrect predictions. Such attacks are particularly effective when the parameters remain static, giving attackers ample time to probe the model and map its weaknesses. The necessity of the Generate to Defend (G2D) algorithm arises from the need to counteract these attacks by periodically regenerating the model parameters, denying attackers prolonged access to a consistent parameter set. By continually altering the parameters while maintaining performance, G2D disrupts the attackers’ ability to reliably exploit the model, much as frequently changing passwords increases security.

Generate to Defend (G2D) Algorithm

The G2D algorithm periodically generates new model parameters so that no single parameter set is exposed to attackers for an extended period. The key idea is to tailor a diffusion model to create new parameters that maintain the model’s performance while being sufficiently different from the original parameters. This is achieved by adding a regularizer to the diffusion process that maximizes the L1 distance between the generated parameters and the original parameters.

1. Preliminaries of Diffusion Models:

Diffusion models consist of forward and reverse processes indexed by timesteps. We summarize these processes below:

Forward Process: Given a sample $x_0 \sim q(x)$, Gaussian noise is progressively added for $T$ steps to obtain $x_1, x_2, \ldots, x_T$. This process is described by:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\right)$$

Reverse Process: The reverse process aims to train a denoising network to remove the noise from $x_t$, moving backward from $x_T$ to $x_0$:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$$

The denoising network is optimized by minimizing the KL divergence between the forward process posterior and the learned reverse transition:

$$L_{dm} = \mathrm{KL}\!\left(q(x_{t-1} \mid x_t, x_0)\ \|\ p_\theta(x_{t-1} \mid x_t)\right)$$
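As a concrete illustration of the forward process, the following PyTorch sketch noises a sample using the closed-form $q(x_t \mid x_0)$ implied by the transition above. The linear beta schedule, the value of $T$, and the helper name `q_sample` are illustrative assumptions, not specified in this section.

```python
import torch

# Minimal sketch of the DDPM forward process under an assumed linear beta schedule.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)      # beta_t for t = 1..T (assumption)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # \bar{alpha}_t = prod_{s<=t} alpha_s

def q_sample(x0: torch.Tensor, t: int) -> torch.Tensor:
    """Draw x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, I)."""
    eps = torch.randn_like(x0)
    return alpha_bars[t].sqrt() * x0 + (1.0 - alpha_bars[t]).sqrt() * eps
```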

2. Embedding Model into Compact Space:

To prepare the data, we train an autoencoder to extract latent representations of the model parameters. The encoding and decoding processes are formulated as:

$$Z = f_{\text{encoder}}(V + \xi_V,\ \sigma), \qquad V' = f_{\text{decoder}}(Z + \xi_Z,\ \rho)$$

where $V$ is the set of model parameters, $Z$ is the latent representation, $\xi_V$ and $\xi_Z$ are added Gaussian noise, and $\sigma$ and $\rho$ are the parameters of the encoder and decoder, respectively. The autoencoder is trained by minimizing the mean squared error (MSE) loss:

$$L_{MSE} = \frac{1}{K} \sum_{k=1}^{K} \left\| v_k - v'_k \right\|^2$$
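A minimal sketch of what such a parameter autoencoder and its MSE training step could look like is shown below; the layer widths, noise scale, batch size, and class name are illustrative assumptions, not from this section.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParamAutoencoder(nn.Module):
    """Sketch of Step 2: embed flattened parameter vectors into a compact space."""
    def __init__(self, param_dim: int, latent_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(param_dim, 1024), nn.ReLU(), nn.Linear(1024, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 1024), nn.ReLU(), nn.Linear(1024, param_dim))

    def forward(self, v: torch.Tensor, noise_std: float = 0.01):
        z = self.encoder(v + noise_std * torch.randn_like(v))        # Z = f_enc(V + xi_V)
        v_recon = self.decoder(z + noise_std * torch.randn_like(z))  # V' = f_dec(Z + xi_Z)
        return z, v_recon

# One training step over a batch standing in for K flattened parameter sets.
ae = ParamAutoencoder(param_dim=4096)
opt = torch.optim.Adam(ae.parameters(), lr=1e-4)
v = torch.randn(8, 4096)               # placeholder parameter vectors
_, v_recon = ae(v)
loss = F.mse_loss(v_recon, v)          # L_MSE
opt.zero_grad(); loss.backward(); opt.step()
```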

3. Training Diffusion Models with Regularizer:

We modify the diffusion process to include a regularizer that maximizes the L1 distance between the generated parameters and the original parameters. The training objective for the diffusion model with the additional regularizer is given by:

$$\theta \leftarrow \theta - \nabla_\theta \left( \left\| \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\, z_k^0 + \sqrt{1 - \bar{\alpha}_t}\,\epsilon,\ t\right) \right\|^2 - \lambda \left\| \theta_{\text{gen}} - \theta_{\text{orig}} \right\|_1 \right)$$

where $\lambda$ is a hyperparameter that controls the importance of the regularizer, $\theta_{\text{gen}}$ are the generated parameters, and $\theta_{\text{orig}}$ are the original parameters. Note the minus sign on the regularizer: descending the overall objective therefore maximizes the L1 distance, pushing the generated parameters away from the originals.
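The following sketch shows how one such regularized training step might be implemented. This section does not specify how $\theta_{\text{gen}}$ is obtained during training, so, as an illustration, the sketch decodes the one-step estimate of $z_0$; `denoiser`, `decoder`, and the value of $\lambda$ are assumptions.

```python
import torch

def train_step(denoiser, decoder, z0, theta_orig, alpha_bars, opt, lam=0.1):
    """Sketch of Step 3: denoising loss minus an L1 push-away regularizer."""
    t = torch.randint(0, len(alpha_bars), (z0.shape[0],))
    eps = torch.randn_like(z0)
    ab = alpha_bars[t].view(-1, 1)                       # assumes z0 is (B, D)
    z_t = ab.sqrt() * z0 + (1 - ab).sqrt() * eps         # forward noising
    eps_hat = denoiser(z_t, t)
    denoise_loss = ((eps - eps_hat) ** 2).mean()

    # One-step estimate of z_0, decoded to candidate parameters theta_gen
    # (an illustrative choice, not specified in the text).
    z0_hat = (z_t - (1 - ab).sqrt() * eps_hat) / ab.sqrt()
    theta_gen = decoder(z0_hat)
    # Minus sign: descending this objective *maximizes* the L1 distance.
    loss = denoise_loss - lam * (theta_gen - theta_orig).abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```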

4. Model Generation:

During inference, random noise is fed into the trained diffusion model and decoder to generate new sets of model parameters:

$$\theta_{\text{gen}} = f_{\text{decoder}}\!\left(\text{ReverseProcess}(\text{Random Noise})\right)$$

These generated parameters then replace the existing model parameters. Because the modified diffusion process pushes the new parameters away from the original ones, each replacement strengthens the model’s defense against adversarial attacks while maintaining its performance.
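A possible implementation of this generation step, assuming a standard DDPM reverse sampler with $\sigma_t^2 = \beta_t$ and the `denoiser`/`decoder` components sketched above:

```python
import torch

@torch.no_grad()
def generate_parameters(denoiser, decoder, latent_shape, betas):
    """Sketch of Step 4: run the reverse process from pure noise,
    then decode the final latent into a fresh parameter vector."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    z = torch.randn(latent_shape)                        # z_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        eps_hat = denoiser(z, torch.tensor([t]))
        # DDPM posterior mean for p_theta(z_{t-1} | z_t)
        mean = (z - betas[t] / (1 - alpha_bars[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        z = mean + (betas[t].sqrt() * torch.randn_like(z) if t > 0 else 0.0)
    return decoder(z)                                    # theta_gen
```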

After the diffusion model is trained, Step 4 is executed periodically to randomize the AI model’s parameters, which significantly increases the cost of an attack. The more frequently new parameters are generated, the harder the model becomes to attack. Generating a new parameter set requires only a single inference pass through the diffusion process, making it cost-effective.
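For illustration, a periodic rotation loop built on the `generate_parameters` sketch above might look like this; `load_flat_params` is a hypothetical helper, and the ten-minute period is an assumption:

```python
import time

def rotate_parameters(model, denoiser, decoder, latent_shape, betas,
                      period_sec: int = 600):
    """Hot-swap the protected model's parameters on a fixed schedule."""
    while True:
        theta_gen = generate_parameters(denoiser, decoder, latent_shape, betas)
        load_flat_params(model, theta_gen)   # hypothetical helper: copy the
                                             # flat vector back into the model
        time.sleep(period_sec)
```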