
Running Dreambooth in Stable Diffusion with Low VRAM

Introduction

In a recent paper, researchers described a technique for taking existing pre-trained text-to-image models and embedding new subjects, adding the capability to synthesize photorealistic images of the subject in new contexts.

A series of implementations quickly followed, finding their way into the Stable Diffusion web UI project in the form of the sd_dreambooth_extension.

The extension adds the ability to supply local images of a subject you want to train on, such as Gillian Anderson, and create a model that is better at depicting that subject in new contexts, such as being a Starfleet officer:

[Image: Gillian Anderson as a Starfleet officer]

On systems with low VRAM, it's easy for Dreambooth to use a large amount of memory and crash Stable Diffusion with the following error:

RuntimeError: No executable batch size found, reached zero.

In this blog post, I'll show a few of the optimizations I used to fix this error and get Dreambooth running on my 8GB Nvidia GeForce 1070. I'll also show a few of the problems I encountered along the way and how I fixed them.

An Arch Linux system was used, but these optimizations should work on any OS, possibly with minor modifications.

Optimizations

Use xFormers for image generation

xFormers is a library written by Facebook Research that improves the speed and memory efficiency of image generation. To install it, stop stable-diffusion-webui if it's running and build xFormers from source by following these instructions.
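For reference, a from-source build typically looks something like the sketch below. Treat it as an outline rather than exact steps; the xformers README is the authoritative source, and you'll want to run this from inside the web UI's Python environment so the webui can import the result.

# clone the repo and its submodules (the build needs them)
git clone https://github.com/facebookresearch/xformers.git
cd xformers
git submodule update --init --recursive
# build and install into the current Python environment
pip install -r requirements.txt
pip install -e .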

Next, you'll need to add a command-line parameter to enable xformers the next time you start the web UI, like in this line from my webui-user.sh:

export COMMANDLINE_ARGS="--medvram --xformers"

The next time you launch the web ui it should use xFormers for image generation.
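If you want to confirm that the web UI's Python environment can actually see the library, a quick sanity check is to import it from the venv. This assumes the default venv location inside the stable-diffusion-webui directory:

# prints the installed xformers version, or fails if the import is broken
./venv/bin/python -c 'import xformers; print(xformers.__version__)'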

You'll also need to explicitly select xformers when training, in the Dreambooth tab of the web UI under Settings > Advanced > Memory Attention.

Tweak dreambooth settings

Dreambooth ran successfully when I used the following settings in the Dreambooth tab of the web UI:

Use LORA: checked
Training Steps Per Image (Epochs): 150
Batch Size: 1
Lora UNET Learning Rate: 0.0008
Lora Text Encoder Learning Rate: 0.00006
Learning Rate Scheduler: constant
Resolution: 512
Use EMA: unchecked
Use 8bit Adam: checked
Mixed Precision: fp16
Memory Attention: xformers
Cache Latents: checked

Run Stable Diffusion without a graphical environment

When I need to squeeze every last meg of VRAM out of my card, I opt not to run a graphical environment on the machine running Stable Diffusion. On Linux, this means never starting the X server/Wayland compositor.

To do this, you'll first need to configure the web UI to accept connections from other machines on your local network by adding the --listen flag:

export COMMANDLINE_ARGS="--medvram --listen --xformers"

You’ll want to disable your display manager or similar mechanism and then reboot to free up as much memory as possible.
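On a systemd-based distro like Arch, one way to do this is to make the multi-user target the default so the system boots without a graphical session (the exact display manager setup varies, so adjust to taste):

# boot to a text console instead of the graphical session
sudo systemctl set-default multi-user.target
sudo reboot

To get your desktop back later, set the default to graphical.target and reboot again.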

Then, using another machine on your network (a wifi-connected smartphone works as well), navigate to [ip address of machine running webui]:[port shown in process output] (7860 by default) and control it from there.

Check memory usage and close applications with nvtop

If you’re on Linux, you can check out nvtop.

It’s a curses program that functions similarly to htop but lets you view and kill applications that use GPU resources:

[Screenshot: nvtop]
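On Arch Linux, nvtop is available in the official repositories:

sudo pacman -S nvtop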

I’m not aware of an equivalent program on Windows, but you can always try looking at the Task Manager and sorting by GPU usage.

[Screenshot: Windows Task Manager sorted by GPU usage]

Encountered Issues

Below is a hodgepodge of problems I ran into along the way, along with how I overcame them.

Exception training model: Could not run 'xformers::efficient_attention_forward_cutlass' with arguments from the 'CUDA' backend.

I got this when I did not properly follow the xformers installation steps described above.
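If redoing the installation doesn't resolve it, one thing that may be worth checking (an educated guess on my part, not something I hit myself) is that xformers was compiled for your card's compute capability, which is 6.1 for a GeForce 1070. A hypothetical rebuild pinned to that architecture would look like:

# rebuild from the xformers source checkout for compute capability 6.1
cd xformers
TORCH_CUDA_ARCH_LIST=6.1 pip install -e .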

Name 'str2optimizer8bit_blockwise' is not defined

This error comes from the bitsandbytes library behind the 8bit Adam optimizer, and was caused by Python being unable to find a CUDA shared library on my system. To fix it, you'll need to make sure the directories containing CUDA's shared libraries are on your system's library search path.

On Arch Linux, I did that by setting the environment variable LD_LIBRARY_PATH="/opt/cuda/targets/x86_64-linux/lib:$LD_LIBRARY_PATH".
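A convenient place to put the export is webui-user.sh, next to the other variables, so it's set every time the web UI starts (this assumes CUDA lives under /opt/cuda, as it does on Arch):

export LD_LIBRARY_PATH="/opt/cuda/targets/x86_64-linux/lib:$LD_LIBRARY_PATH"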

Img should be PIL image

A number of posts online said to check "Apply Horizontal Flip" in the Dreambooth settings, but doing so was giving me this error.

I didn't look into it much; unchecking the option made the error go away.

zipfile.BadZipFile: File is not a zip file

After I initially got Dreambooth to train successfully, I exported the result as a .ckpt file. When I tried to load the checkpoint to generate images, the model wouldn't load, and I could see zipfile.BadZipFile: File is not a zip file in the server's output.

I looked at the .ckpt file that was being created and compared it to a valid one from a different model. The valid one looked like a zip archive, but the erroneous one I was generating was just a yaml file with a .ckpt filename.
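Checkpoints saved with PyTorch's newer serialization format are zip archives under the hood, so a quick way to test a suspect file (here assuming it's named model.ckpt) is:

# prints True for a valid zip-based checkpoint, False for anything else
python -c "import zipfile; print(zipfile.is_zipfile('model.ckpt'))"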

I discovered that setting a custom model name in the "Saving" tab of the Dreambooth settings caused both a .yaml and a .ckpt file to be created correctly, and I could then load and use the .ckpt file.

Conclusion

Hopefully this post helped you get Dreambooth working on your machine. If you have any questions, corrections, or comments, you can send them to my public inbox.

Thanks for reading!