Running Dreambooth in Stable Diffusion with Low VRAM
In a recent paper, researchers described a technique for taking an existing pre-trained text-to-image model and embedding a new subject into it, adding the capability to synthesize photorealistic images of that subject contextualized in the model’s output.
A series of implementations were quickly built, finding their way to the Stable Diffusion web UI project in the form of the sd_dreambooth_extension.
This adds the ability to specify local images of a subject you want to train on, such as Gillian Anderson, and create a model that is more capable of depicting that subject in new contexts, such as being a Starfleet officer:
For people running systems with low VRAM, it’s easy for Dreambooth to use a large amount of memory and cause Stable Diffusion to crash with the following error:
RuntimeError: No executable batch size found, reached zero.
In this blog post, I’ll show a few of the optimizations I used to fix this error and get Dreambooth running on my 8GB Nvidia GeForce 1070. I’ll also show a few of the problems I encountered along the way and how I fixed them.
An Arch Linux system was used, but these optimizations should work on any OS, possibly with minor modifications.
Use xFormers for image generation
xFormers is a library written by Facebook Research that improves the speed and memory efficiency of image generation. To install it, stop stable-diffusion-webui if it’s running and build xFormers from source by following these instructions.
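At the time, building from source boiled down to roughly the following. This is a sketch only: the venv path and the use of ninja are assumptions about a typical stable-diffusion-webui setup, and the linked instructions are authoritative.

```shell
# Sketch only; follow the linked instructions for your setup.
# Assumes the web UI's virtualenv lives at ./venv (adjust as needed).
source venv/bin/activate
pip install ninja                        # optional: speeds up the CUDA build
git clone https://github.com/facebookresearch/xformers.git
cd xformers
git submodule update --init --recursive  # pull in third-party kernels
pip install -e .                         # compile and install into the venv
```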
Next, you’ll need to add a command-line parameter to enable xFormers the next time you start the web UI, like in this line from my webui-user.sh:
export COMMANDLINE_ARGS="--medvram --xformers"
The next time you launch the web ui it should use xFormers for image generation.
You’ll also need to explicitly select xFormers when training, in the Dreambooth tab of the web UI under
Settings > Advanced > Memory Attention (select xformers)
Tweak dreambooth settings
Dreambooth ran successfully when I used the following settings in the Dreambooth tab of the web UI:
Use LORA: checked
Training Steps Per Image (Epochs): 150
Batch size: 1
Lora UNET learning rate: 0.0008
Lora text encoder learning rate: 0.00006
LR scheduler: constant
Resolution: 512
Use EMA: unchecked
Use 8bit Adam: checked
Mixed precision: fp16
Memory Attention: xformers
Cache Latents: checked
Run Stable Diffusion without a graphical environment
When I need to squeeze every last megabyte of VRAM out of my card, I opt not to run a graphical environment on the machine running Stable Diffusion. On Linux, this means never starting the X server/Wayland compositor.
To do this, first you’ll need to configure the web ui to work from another machine on your local network.
export COMMANDLINE_ARGS="--medvram --listen --xformers"
You’ll want to disable your display manager or similar mechanism and then reboot to free up as much memory as possible.
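On a systemd-based distro, disabling the display manager is typically one command. The unit name varies by setup (gdm, sddm, lightdm, and so on), so the name below is an example, not a given:

```shell
# Example for a system running GDM; substitute your display manager's
# unit name (sddm, lightdm, ...). Re-enable later with `systemctl enable`.
sudo systemctl disable gdm
sudo reboot
```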
Then, using another machine on your network (a wifi-connected smartphone works as well), navigate to [ip address of machine running webui]:[port shown in process output] and control it from there.
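For example, if your machine’s LAN address were 192.168.1.50 and the web UI reported its default port of 7860 (both values here are placeholders), the URL to open would be built like this:

```shell
# Placeholder values; use your machine's IP and the port from the
# web UI's startup output (7860 is the default).
HOST=192.168.1.50
PORT=7860
echo "http://${HOST}:${PORT}"   # open this URL in the remote browser
```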
Check memory usage and close applications with nvtop
If you’re on Linux, you can check out nvtop. It’s a curses program that functions similarly to htop but lets you view and kill applications that use GPU resources:
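On Arch Linux, nvtop is available in the official repositories; most other distros package it under the same name:

```shell
# Install and run nvtop on Arch Linux (package names may differ elsewhere).
sudo pacman -S nvtop
nvtop
```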
I’m not aware of an equivalent program on Windows, but you can always try looking at the Task Manager and sorting by GPU usage.
Below are a hodgepodge of problems I ran into along the way, along with how I overcame them.
Exception training model: ‘Could not run ‘xformers::efficient_attention_forward_cutlass’ with arguments from the ‘CUDA’ backend.’
I got this error when I had not properly followed the xformers installation steps.
Name ‘str2optimizer8bit_blockwise’ is not defined
This was caused by Python being unable to find a CUDA shared library on my system. To fix it, you’ll need to make sure the folders where CUDA’s libraries are located are on your system’s library path.
On Arch Linux, I did that by setting the LD_LIBRARY_PATH environment variable.
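In my case the variable to set was LD_LIBRARY_PATH. The /opt/cuda path below is where Arch’s cuda package installs; adjust it to wherever CUDA lives on your system:

```shell
# Arch's cuda package installs under /opt/cuda; adjust for your system.
export LD_LIBRARY_PATH="/opt/cuda/lib64:$LD_LIBRARY_PATH"
```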
Img should be PIL image
A number of posts online suggested checking “Apply Horizontal Flip” in the Dreambooth settings, but doing so was giving me this error.
I didn’t look into it much; I just unchecked it and the error no longer occurred.
zipfile.BadZipFile: File is not a zip file
After I initially got Dreambooth to train successfully, I exported the result as a .ckpt file. When I tried to load the checkpoint to generate images, the model wouldn’t load, and I could see
zipfile.BadZipFile: File is not a zip file in the server’s output.
I looked at the .ckpt file that was being created and compared it to a valid one from a different model. The valid one looked like a zip archive, but the erroneous one I was generating was just a YAML file with a .ckpt filename.
I discovered that setting a custom model name in the “Saving” tab of the Dreambooth settings caused both a yaml and a ckpt file to be created correctly, and I could then load and use the ckpt file.
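You can tell the two apart without loading the model: a zip-based checkpoint starts with the zip magic bytes PK, while the broken export starts with plain YAML text. A quick simulation with throwaway files (the filenames are made up):

```shell
# Simulate both cases with throwaway files (names are made up).
printf 'PK\003\004' > good.ckpt              # zip-style header, like a valid checkpoint
printf 'model:\n  target: ...\n' > bad.ckpt  # YAML text saved under a .ckpt name
head -c 2 good.ckpt; echo   # prints: PK
head -c 2 bad.ckpt; echo    # prints: mo
```

On a real file, running `file` on the checkpoint and seeing “Zip archive data” is another quick check.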
Hopefully this helped dreambooth work on your machine. If you have any questions, corrections, or comments, you can send them to my public inbox.
Thanks for reading!