Running Dreambooth in Stable Diffusion with Low VRAM
Updated with the latest stable diffusion web UI, sd_dreambooth_extension, and xformers as of 1/27/2023
In a recent whitepaper, researchers described a technique to take existing pre-trained text-to-image models and embed new subjects, adding the capability to synthesize photorealistic images of the subject contextualized in the model’s output.
A series of implementations were quickly built, finding their way to the Stable Diffusion web UI project in the form of the sd_dreambooth_extension.
This adds the ability to fine-tune models on images of particular subjects.
Images generated this way come out much better than those from the baseline model, especially when you start adding additional context.
Unfortunately, the process is resource-intensive, and if your system has low VRAM it’s easy for dreambooth to crash with an error like this:
RuntimeError: No executable batch size found, reached zero.
In this blog post, I’ll show a few of the optimizations I used to fix this error and get dreambooth running on my 8GB Nvidia GeForce GTX 1070. I’ll also show a few of the problems I encountered along the way and how I fixed them.
I used an Arch Linux system, but these optimizations should work on any OS, possibly with minor modifications.
Use xFormers for image generation
xFormers is a library written by Facebook Research that improves the speed and memory efficiency of image generation. To install it, stop stable-diffusion-webui if it’s running and build xformers from source by following these instructions.
Next, you’ll need to add a commandline parameter to enable xformers the next time you start the web ui, like in this line from my
export COMMANDLINE_ARGS="--medvram --xformers"
The next time you launch the web ui it should use xFormers for image generation.
You’ll also need to explicitly select xformers for training, inside the Dreambooth tab of the web UI under
Settings > Advanced > Memory Attention (select xformers)
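Before launching, it can help to confirm that xformers is actually installed in the Python environment the web UI uses. The following is a minimal sketch and not part of the web UI; has_module is a hypothetical helper name. It only checks that the package can be found, which does not prove that its CUDA kernels were built correctly.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


# Run this with the same Python interpreter (virtualenv) the web UI uses.
for module in ("torch", "xformers"):
    status = "found" if has_module(module) else "MISSING"
    print(f"{module}: {status}")
```

If xformers shows up as missing here, the --xformers flag and the Memory Attention setting have nothing to enable.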
Tweak dreambooth settings
Dreambooth ran successfully when I used the following settings in the Dreambooth tab of the web ui:
Use LORA: unchecked
Training Steps Per Image (Epochs): 150
Batch size: 1
Learning Rate Scheduler: constant with warmup
Learning Rate: 0.000002
Resolution: 512
Use EMA: unchecked
Use 8bit Adam: checked
Mixed precision: fp16
Memory Attention: xformers
Cache Latents: unchecked
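As a rough way to reason about how long a run with these settings will take, the total number of training steps can be estimated from the epoch count. This is only a sketch: it assumes each training image is seen once per epoch and that no gradient accumulation is used. The extension may count steps differently (for example when classification images are involved), and estimate_steps is a hypothetical helper.

```python
import math


def estimate_steps(num_images: int, epochs: int = 150, batch_size: int = 1) -> int:
    """Estimate total training steps as batches per epoch times epochs."""
    batches_per_epoch = math.ceil(num_images / batch_size)
    return batches_per_epoch * epochs


# 20 training images is an assumed number, used only for illustration.
print(estimate_steps(20))  # 3000
```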
Run Stable Diffusion without a graphical environment
I need to squeeze every last meg of VRAM out of my card, so I usually opt not to run a graphical environment on the machine running stable diffusion. On Linux, this means never starting the X server/Wayland compositor.
To do this, first you’ll need to configure the web ui to work from another machine on your local network.
export COMMANDLINE_ARGS="--medvram --listen --xformers"
You’ll want to disable your display manager or similar mechanism and then reboot to free up as much memory as possible.
Then, using another machine on your network (a wifi-connected smartphone works as well), navigate to [ip address of machine running webui]:[port shown in process output] and control it from there.
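If the page does not load from the other machine, a quick way to tell a network problem from a web UI problem is to test whether the port accepts TCP connections at all. The sketch below uses only the Python standard library; is_port_open is a hypothetical helper, and the address and port in the usage comment are placeholders for whatever your machine and the web UI's startup output actually show.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example usage from the second machine (placeholder address and port):
#   is_port_open("192.168.1.50", 7860)
```

A False result with the web UI running usually points at the --listen flag being missing or a firewall in between, rather than at Stable Diffusion itself.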
Reboot between attempted trainings
I’ve found that after one training attempt, the next attempt will often run out of memory (OOM) even when it should not.
Restarting the web ui, or using nvtop to confirm that no GPU resources are leaking, does not seem to solve the issue.
However, after a full reboot, the same configuration that ran OOM will succeed. Until I figure out what is leaking GPU memory like a sieve, I’ll probably continue doing full reboots between trainings.
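To put numbers on how much VRAM is in use between attempts, nvidia-smi can report used and total memory in a machine-readable form. The following sketch wraps that query; parse_memory_csv and query_gpu_memory are hypothetical helper names, and the script assumes nvidia-smi is installed and on the PATH.

```python
import subprocess
from typing import List, Tuple


def parse_memory_csv(text: str) -> List[Tuple[int, int]]:
    """Parse lines of 'used, total' (MiB) into (used, total) tuples, one per GPU."""
    rows = []
    for line in text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory() -> List[Tuple[int, int]]:
    """Ask nvidia-smi for used and total memory per GPU, in MiB."""
    output = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return parse_memory_csv(output)
```

Calling query_gpu_memory() before the first training and again before the second gives a like-for-like comparison of how much memory is reported as used at each point.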
Many guides online recommend using LORA to reduce memory usage. However, on my setup, using LORA will cause runs to fail that succeed when I disable it.
If LORA isn’t working for you, try turning it off.
Disabling preview images
Save model frequency, Save preview frequency, and Generate Classification Images Using txt2img give you preview images at epoch checkpoints that you can view to check for overtraining.
Unfortunately the preview images seem to use GPU memory and cause my trainings to fail.
As a result, I end up only setting Save model frequency. When training completes, I bisect the completed checkpoints and check for overtraining.
For example, if I generate 12 checkpoints, I’ll open up checkpoint 12 and run a few images that include my trained subject in a new context. If the images are incapable of including the new context, e.g. the image above showed Gillian Anderson’s face but didn’t respond to the “starfleet officer” prompt, that means the model has been overtrained.
So I would open checkpoint 6, and if that worked, I would try checkpoint 9, and so on.
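The bisection described above can be sketched as a binary search. It assumes that once a checkpoint is overtrained, every later checkpoint is too. The is_overtrained callback stands in for the manual step of loading a checkpoint, generating a few images, and judging them; both it and latest_usable_checkpoint are hypothetical names.

```python
from typing import Callable, Optional


def latest_usable_checkpoint(
    count: int, is_overtrained: Callable[[int], bool]
) -> Optional[int]:
    """Return the highest checkpoint number in 1..count that is not overtrained.

    Returns None if every checkpoint is overtrained (or count < 1).
    """
    if count < 1:
        return None
    if not is_overtrained(count):
        return count  # The final checkpoint is fine, so nothing to bisect.

    good, bad = 0, count  # 0 is a sentinel meaning "no usable checkpoint found yet".
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_overtrained(mid):
            bad = mid
        else:
            good = mid
    return good if good > 0 else None
```

With 12 checkpoints this inspects checkpoint 12 first, then 6, then 9 or 3 depending on the result, so at most five checkpoints need to be loaded and judged by hand.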
Below is a hodgepodge of problems I ran into along the way, along with how I overcame them.
Exception training model: ‘Could not run ‘xformers::efficient_attention_forward_cutlass’ with arguments from the ‘CUDA’ backend.
I got this when I did not properly follow the xformers installation steps.
Name ‘str2optimizer8bit_blockwise’ is not defined
This was caused by python being unable to find a CUDA shared library on my system. To fix it, you’ll need to make sure the folders where CUDA’s libraries are located are set on your system’s path.
On arch linux, I did that by setting environment variable
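To check which directories on a search path actually contain a CUDA library, a small script can list them. This is a sketch under assumptions: it reads LD_LIBRARY_PATH, which the Linux dynamic loader consults, and looks for files whose names start with libcudart.so. Both choices are illustrative and are not taken from the setup described here; dirs_containing is a hypothetical helper.

```python
import os
from typing import List


def dirs_containing(prefix: str, search_path: str) -> List[str]:
    """Return the directories in `search_path` holding a file that starts with `prefix`.

    `search_path` is a list of directories joined by os.pathsep (':' on Linux).
    """
    hits = []
    for directory in search_path.split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue  # Skip empty entries and paths that do not exist.
        if any(name.startswith(prefix) for name in os.listdir(directory)):
            hits.append(directory)
    return hits


# Assumption: LD_LIBRARY_PATH is the variable of interest on this system.
print(dirs_containing("libcudart.so", os.environ.get("LD_LIBRARY_PATH", "")))
```

An empty list means none of the listed directories holds a matching file, which is consistent with Python failing to find the library.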
Img should be PIL image
A number of posts online were saying to check “Apply Horizontal Flip” in the dreambooth settings, but doing so was giving me this error.
I didn’t really look into it much; I just unchecked it and the error no longer occurred.
zipfile.BadZipFile: File is not a zip file
After I initially got dreambooth to train successfully, I exported the result as a .ckpt file. When I tried to load the checkpoint to produce the image, the model wouldn’t load, and I could see zipfile.BadZipFile: File is not a zip file in the server’s output.
I looked at the .ckpt file that was being created and compared it to a valid one from a different model. The valid one looked like a zip archive, but the erroneous one I was generating was just a YAML file with a .ckpt filename.
I discovered that setting a custom model name in the “Saving” tab in dreambooth settings caused both a yaml and ckpt to be correctly created, and I could load and use the ckpt file.
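The comparison I did by hand can be automated with Python's standard zipfile module. The sketch below reports whether a file is a zip archive, other binary data, or plain text such as YAML; describe_checkpoint is a hypothetical helper, and the text-versus-binary test is a simple heuristic (it looks for NUL bytes in the first 200 bytes).

```python
import zipfile


def describe_checkpoint(path: str) -> str:
    """Classify a file as 'zip archive', 'binary, not a zip archive', or 'plain text'."""
    if zipfile.is_zipfile(path):
        return "zip archive"
    with open(path, "rb") as handle:
        head = handle.read(200)
    # Heuristic: text files such as YAML normally contain no NUL bytes.
    if b"\x00" in head:
        return "binary, not a zip archive"
    return "plain text"
```

A result of "plain text" for a .ckpt file matches the broken export described above. A non-zip binary file is not necessarily invalid, so treat that result as inconclusive.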