<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
    <channel>
        <title>Jon Eskin's Blog</title>
        <link>https://jeskin.net</link>
        <description><![CDATA[Jon Eskin's programming notes and blog posts]]></description>
        <atom:link href="https://jeskin.net/rss.xml" rel="self"
                   type="application/rss+xml" />
        <lastBuildDate>Fri, 16 May 2025 00:00:00 UT</lastBuildDate>
        <item>
    <title>Maybe Rust's Syntax is Good, Actually</title>
    <link>https://jeskin.net/blog/rust-syntax-is-good-actually.html</link>
    <description><![CDATA[<article>
    <section class="header">
        Posted on May 16, 2025
        
            by Jon Eskin
        
    </section>
    <section>
        <h2 id="introduction">Introduction</h2>
<p>In a <a href="https://github.com/jpe90/candle-pytorch-parity-testing">recent project</a>, I compared output from HuggingFace’s <a href="https://github.com/huggingface/candle">Candle</a> with <a href="https://github.com/pytorch/pytorch">PyTorch</a> equivalents to make sure that embeddings and calculations involving them are behaving as expected.</p>
<p>This led to comparing Python snippets that look like this:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> normalize_l2(embeddings):</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>    <span class="cf">return</span> F.normalize(embeddings, p<span class="op">=</span><span class="dv">2</span>, dim<span class="op">=</span><span class="dv">1</span>)</span></code></pre></div>
<p>to Rust that looks like this:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode rust"><code class="sourceCode rust"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">pub</span> <span class="kw">fn</span> normalize_l2(v<span class="op">:</span> <span class="op">&amp;</span>Tensor) <span class="op">-&gt;</span> <span class="dt">Result</span><span class="op">&lt;</span>Tensor<span class="op">&gt;</span> <span class="op">{</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>    <span class="cn">Ok</span>(v<span class="op">.</span>broadcast_div(<span class="op">&amp;</span>v<span class="op">.</span>sqr()<span class="op">?.</span>sum_keepdim(<span class="dv">1</span>)<span class="op">?.</span>sqrt()<span class="op">?</span>)<span class="op">?</span>)</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>You might think this isn’t a fair comparison, because the normalization is hand-rolled in Rust and just tucked away in a function call in the Python equivalent. But that’s beside the point; even a hand-rolled Python equivalent would not be this noisy. PyTorch semantics are fraught with unintuitive behavior behind primitive operators. As a matter of fact, I wrote an <a href="https://jeskin.net/blog/pytorch-broadcasting-mechanics/">earlier post</a> about getting confused by PyTorch broadcasting mechanics when trying to normalize a tensor. Can you spot the problem?</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>P <span class="op">=</span> N.<span class="bu">float</span>()</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>P <span class="op">/=</span> P.<span class="bu">sum</span>(<span class="dv">1</span>)</span></code></pre></div>
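<p>One way to see what goes wrong is to mimic the broadcasting by hand. The sketch below is pure Python rather than PyTorch, and it assumes <code>N</code> is a square 2-D matrix of counts (the snippet above does not say). <code>P.sum(1)</code> drops the summed dimension, so for a 2x2 input the result has shape <code>(2,)</code>, which broadcasting treats as a single row. Each element then gets divided by the row sum that matches its <em>column</em> index:</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python"># Hypothetical 2x2 matrix of counts, standing in for N.float()
N = [[1.0, 1.0],
     [2.0, 6.0]]
size = len(N)

# Like P.sum(1): one total per row, with the summed dimension dropped
row_sums = [sum(row) for row in N]  # [2.0, 8.0]

# What P /= P.sum(1) computes: element [i][j] is divided by row_sums[j]
buggy = [[N[i][j] / row_sums[j] for j in range(size)] for i in range(size)]

# What was intended: element [i][j] is divided by row_sums[i]
fixed = [[N[i][j] / row_sums[i] for j in range(size)] for i in range(size)]

print([sum(row) for row in buggy])  # [0.625, 1.75]
print([sum(row) for row in fixed])  # [1.0, 1.0]</code></pre></div>
<p>In PyTorch, passing <code>keepdim=True</code> to <code>sum</code> keeps the result at shape <code>(2, 1)</code>, which is what the second version imitates. With a non-square matrix the first version would raise a shape error instead of failing silently.</p>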
<h2 id="favoring-explicitness-over-implicitness">Favoring Explicitness over Implicitness</h2>
<p>When I was learning PyTorch (and I suppose NumPy, by extension), I found that I would often struggle to understand code that I read, because there were a lot of implicit interactions going on that I wasn’t aware of:</p>
<p><img src="/images/complaining.png" /></p>
<p>While these make for clean-looking code, they create issues:</p>
<ul>
<li>Failures often occur silently and cause calculations to have incorrect results, sometimes subtly. This is the worst kind of failure that you can have in software.</li>
<li>There’s a notable absence of a type system that could catch logic failures and serve as documentation.</li>
</ul>
<p>When I was learning, I found myself wishing that these operations were explicitly spelled out and enforced by a compiler. Without this, new developers must rely on someone (or more commonly, on documentation of inconsistent quality) to explain behavior that is not always very intuitive. It is always an option to determine this behavior by reading the source code, but unfortunately large projects such as PyTorch become extremely cumbersome to navigate once they grow past a certain size and complexity.</p>
<h2 id="forcing-the-issues">Forcing the Issues</h2>
<p>This brings me to the parts of Rust’s syntax which I used to perceive as noise. Let me paste in <code>normalize_l2</code> again:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode rust"><code class="sourceCode rust"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">pub</span> <span class="kw">fn</span> normalize_l2(v<span class="op">:</span> <span class="op">&amp;</span>Tensor) <span class="op">-&gt;</span> <span class="dt">Result</span><span class="op">&lt;</span>Tensor<span class="op">&gt;</span> <span class="op">{</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>    <span class="cn">Ok</span>(v<span class="op">.</span>broadcast_div(<span class="op">&amp;</span>v<span class="op">.</span>sqr()<span class="op">?.</span>sum_keepdim(<span class="dv">1</span>)<span class="op">?.</span>sqrt()<span class="op">?</span>)<span class="op">?</span>)</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>My early reaction to Rust code was to find the question marks, the <code>Result</code> wrapping of the return type, and the <code>Ok</code> to be a lot of unfamiliar visual noise. These are some of Rust’s ways of forcing the programmer to acknowledge the potential failure points in their code. For a single developer who is continuously working on a codebase, you could argue this is not a big deal. Similarly, you could argue that for a small team of like-minded and extremely talented developers, it creates a burden that is more trouble than it’s worth.</p>
<p>But the value quickly compounds when you consider other scenarios. If you ever want to onboard more junior developers, it is immensely helpful for them to be forced to understand where landmines are in software. If you walk away from the codebase for some period of time, or from software development entirely, some of those details will slip. Having a strict compiler enforcing hygiene is immensely beneficial for compensating for human error. The code is noisy because the task at hand is noisy.</p>
<p>Still, it would be nice if Rust could maintain its strictness with cleaner syntax. FP nerds probably recognize that the function is moving the input tensor through a monadic context, and if the library existed in Haskell, it would be able to drop some of the noise with its “do notation”, like this:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode haskell"><code class="sourceCode haskell"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="ot">normalizeL2 ::</span> <span class="dt">Tensor</span> <span class="ot">-&gt;</span> <span class="dt">Either</span> <span class="dt">Error</span> <span class="dt">Tensor</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>normalizeL2 v <span class="ot">=</span> <span class="kw">do</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>  squared <span class="ot">&lt;-</span> sqr v</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>  summed <span class="ot">&lt;-</span> sumKeepDim <span class="dv">1</span> squared</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a>  norms <span class="ot">&lt;-</span> <span class="fu">sqrt</span> summed</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a>  broadcastDiv v norms</span></code></pre></div>
<p>Rust adopted many ideas from ML-style languages, but I don’t think anyone has figured out how to let it take Haskell’s “do” without giving up imperative control flow like FP languages often do, which would be at odds with its goal of being a systems programming language. If you are forced to give up either explicitness, speed, or elegance, I think Rust makes a good compromise.</p>
<p><em>Thanks to Matt for turning me into “that guy that keeps bringing Rust into the conversation”</em></p>
    </section>
    <section class="comment-footer">
        <a href="mailto:eskinjp@gmail.com?subject=Re: Maybe Rust's Syntax is Good, Actually">Comment via email</a>
    </section>
</article>
]]></description>
    <pubDate>Fri, 16 May 2025 00:00:00 UT</pubDate>
    <guid>https://jeskin.net/blog/rust-syntax-is-good-actually.html</guid>
    <dc:creator>Jon Eskin</dc:creator>
</item>
<item>
    <title>Reproducing GPT-2 on Cloud GPUs</title>
    <link>https://jeskin.net/blog/reproducing-gpt2.html</link>
    <description><![CDATA[<article>
    <section class="header">
        Posted on September 27, 2024
        
            by Jon Eskin
        
    </section>
    <section>
        <h2 id="introduction">Introduction</h2>
<p>Reproducing GPT-2 from scratch is a great way to build an understanding of deep learning and large language models. A number of learning resources about GPT-2 architecture are available, including Andrej Karpathy’s <a href="https://karpathy.ai/zero-to-hero.html">free Zero to Hero course</a> and Sebastian Raschka’s book <a href="https://www.manning.com/books/build-a-large-language-model-from-scratch">Build a Large Language Model from Scratch</a>. While these resources provide excellent coverage of how GPT-2 works under the hood, there are a few details about challenges you’ll face in the wild when actually training on expensive hardware that aren’t extensively covered.</p>
<h2 id="understanding-cloud-gpu-costs">Understanding Cloud GPU Costs</h2>
<p>It’s important to understand the process of provisioning servers and the details of what you’re doing to avoid racking up an unnecessarily large bill. I recently used a modified version of the reference <a href="https://github.com/karpathy/llm.c/blob/master/train_gpt2.py">PyTorch implementation</a> from Karpathy’s <a href="https://github.com/karpathy/llm.c">llm.c</a> project to do my own reproduction of the model. (This version appeared to be more up to date than the NanoGPT and build-nanogpt variants at the time I ran it.)</p>
<p>Many cloud providers offer VPSs with GPUs attached. I’ve seen Lambda Labs recommended here and there, so I checked their service first. One thing that I made sure to do was to ask for some trial credits - if they weren’t available, I would have chosen a different service. This instinct paid off because I made a number of mistakes and would have felt bad about paying the bill afterwards.</p>
<p>Lambda Labs instances worked with no fuss. If you’ve used VPSs like Digital Ocean’s droplets or Linode, Lambda’s instances work the same way. You upload your SSH public key and remote in to an Ubuntu instance.</p>
<p>If, like me, you haven’t rented expensive GPUs before, it’s important to know how they are billed. In the past I used small Linux VPSs that charge $5-$10/mo, but servers set up for GPU workloads run for far more than that - about $24/hr on the instance I was interested in. Billing is a flat rate per hour, and it accumulates by the minute. It doesn’t matter whether you’re actively running any jobs, or even if the instance is currently running. The second you provision it the clock starts ticking, and you continue being billed until you destroy the instance and any data on it. This has major implications for how you’ll want to do your training.</p>
<h2 id="recommended-approach">Recommended Approach</h2>
<p>Let’s say you wanted to train the network on 8 x H100s. If you rented a server with this setup, which costs about $24/hr, you would soon discover that you spend a good chunk of your time on data preparation. It involves downloading, tokenizing, and sharding the data - which in my case was the 10B sample of <a href="https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu">fineweb-edu</a>. This is an IO bound operation, so all those expensive GPUs are not going to make it go any faster. If you run all that prep and your training run fails, you’re out $20.</p>
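<p>The arithmetic behind that figure is easy to sketch. The example below assumes per-minute billing at a flat hourly rate, as described above. The 50-minute preparation time is an assumption inferred from the $20 figure at $24/hr, and the $3/hr rate for a smaller instance is purely hypothetical:</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def rental_cost(rate_per_hour, minutes):
    """Cost of holding an instance for `minutes`, billed by the minute."""
    return rate_per_hour * minutes / 60

H100_RATE = 24.0    # $/hr for the 8 x H100 instance, from the text
PREP_MINUTES = 50   # assumption: implied by $20 at $24/hr
CHEAP_RATE = 3.0    # hypothetical $/hr for a smaller instance

print(rental_cost(H100_RATE, PREP_MINUTES))   # 20.0
print(rental_cost(CHEAP_RATE, PREP_MINUTES))  # 2.5</code></pre></div>
<p>Since the preparation step does not use the GPUs, the same work costs a fraction as much on whichever cheaper instance can run it.</p>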
<p>My recommendation would be to start out with the cheapest server that is close to what you want to use for a full run. For example, if you wanted to eventually run on 8 x H100s, you would ideally want a server with multiple GPUs with at least 40GB of memory. That way you can fiddle with distributing loads and hyperparameters to improve your chances of getting things right on a real run.</p>
<h2 id="my-run">My Run</h2>
<p>After going through Karpathy’s course and skimming Raschka’s book, I decided to take a stab at reproducing GPT-2 based on some of the reference code from <a href="https://github.com/karpathy/llm.c">llm.c</a>, with a few tweaks.</p>
<h3 id="data-preparation-phase">Data Preparation Phase</h3>
<p>What I ended up doing was renting 4x 48GB A6000s for cheap and then performing the data preparation. As soon as the data preparation kicked off, I started compressing a copy of the project + the sharded data with <code>tar -czvf llm.c.tar.gz llm.c</code>. While that was working, I opened an additional client and started running some experiments, sanity checks, and small runs to make sure everything was working as expected. This gave me a chance to work out a few kinks that I didn’t know about.</p>
<p>After <code>tar</code> completed, I copied the archive back to my machine with <code>scp -r -i ~/.ssh/id_ed25519 ubuntu@{Machine IP}:/home/ubuntu/llm.c/pylog_gpt2_124M .</code>. Once that finished, I was done, so I shut down the instance and destroyed it.</p>
<h3 id="training-phase">Training Phase</h3>
<p>Next, I tried spinning up an 8 x 80GB H100 instance. I moved my project and sharded data over with <code>scp -C -i ~/.ssh/id_ed25519 llm.c.tar.gz ubuntu@{Machine IP}:/home/ubuntu/</code>, and un-tar’d it with <code>tar -xzvf llm.c.tar.gz</code>, which took about half as long as the initial data preparation process. It probably would have been faster to upload it to an S3 bucket or something, but I don’t have one.</p>
<p>I tried running my code with a set of hyperparameters that I picked out, and the process failed with an OOM error. I was trying an aggressive mini-batch size, and unfortunately I didn’t feel comfortable tweaking the parameters to try to get it working since the bill racks up quickly. Instead, I blew up the instance to try different hyperparameters on a different set of hardware.</p>
<p>I had a hyperparameter set that I was sure would work with 8 x 40GB A100s, so I spun that up and repeated the process. This time, it ran successfully to completion. I added some code that saved a model and did a little inferencing, so after that finished I copied the logs and model back over to my home machine and blew up this instance as well.</p>
<h2 id="conclusion">Conclusion</h2>
<p>I’ve been slowly building up to understanding GPT-2’s architecture over the last few months and it was very fun to try out a training run. I can easily recommend the two learning resources mentioned above. If you have any questions about the process or get stuck, feel free to give me a holler.</p>
    </section>
    <section class="comment-footer">
        <a href="mailto:eskinjp@gmail.com?subject=Re: Reproducing GPT-2 on Cloud GPUs">Comment via email</a>
    </section>
</article>
]]></description>
    <pubDate>Fri, 27 Sep 2024 00:00:00 UT</pubDate>
    <guid>https://jeskin.net/blog/reproducing-gpt2.html</guid>
    <dc:creator>Jon Eskin</dc:creator>
</item>
<item>
    <title>Fine-tuning NLP transformers for task automation</title>
    <link>https://jeskin.net/blog/finetune-nlp-transformers.html</link>
    <description><![CDATA[<article>
    <section class="header">
        Posted on August 12, 2024
        
            by Jon Eskin
        
    </section>
    <section>
        <h2 id="introduction">Introduction</h2>
<p>In a <a href="https://jeskin.net/blog/leveraging-local-ai-for-task-automation/">previous post</a>, I explored using LLMs to perform text classification tasks. The idea was that they could enable automation of more complex tasks that are otherwise not automatable. Many people just use a model like ChatGPT for this, but after learning a bit about how they work, I was getting the sense that other approaches might be more efficient and accurate.</p>
<p>My experimental task was to classify messages sent from customers to an online store. Messages were classified as either refund requests, order inquiries, or general feedback.</p>
<table style="width:100%; border-collapse: collapse; margin-bottom: 20px;">
<thead>
<tr style="background-color: #f2f2f2;">
<th style="padding: 12px; text-align: left; border: 1px solid #ddd;">
Message
</th>
<th style="padding: 12px; text-align: left; border: 1px solid #ddd;">
Label
</th>
</tr>
</thead>
<tbody>
<tr>
<td style="padding: 12px; border: 1px solid #ddd;">
“The feedback submission form took a long time to load.”
</td>
<td style="padding: 12px; border: 1px solid #ddd;">
General Feedback
</td>
</tr>
<tr style="background-color: #f9f9f9;">
<td style="padding: 12px; border: 1px solid #ddd;">
“Why did you send me a cactus? I ordered a dishwasher.”
</td>
<td style="padding: 12px; border: 1px solid #ddd;">
Order Inquiry
</td>
</tr>
<tr>
<td style="padding: 12px; border: 1px solid #ddd;">
“I purchased a time machine, but it only goes forward at regular speed. I’d like a refund.”
</td>
<td style="padding: 12px; border: 1px solid #ddd;">
Refund Request
</td>
</tr>
</tbody>
</table>
<p>In that post, I used a wrapper around llama.cpp, running inference on the 8B parameter, 4-bit quantized version of Llama 3. Architecturally, this works like a tiny ChatGPT, and is capable of running on pretty much any consumer hardware. By directly prompting the base model I achieved <strong>~92% accuracy out of the box</strong>, improving to <strong>~95% with post-processing</strong> (retrying invalid responses).</p>
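<p>Retrying invalid responses can be sketched roughly as follows. This is not the code from that post: <code>classify_with_retry</code>, <code>ask_model</code>, and the retry limit are hypothetical names and values chosen for illustration, and the model is replaced by a stand-in that answers badly once before answering cleanly.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">VALID_LABELS = {"Refund Request", "Order Inquiry", "General Feedback"}

def classify_with_retry(message, ask_model, max_attempts=3):
    """Call `ask_model` until it returns a known label, or give up."""
    for _ in range(max_attempts):
        answer = ask_model(message).strip()
        if answer in VALID_LABELS:
            return answer
    return None  # the caller decides what to do with an unclassified message

# Hypothetical stand-in for the LLM: rambles once, then answers cleanly
replies = iter(["Sure! I think this is a refund.", "Refund Request"])

def fake_model(message):
    return next(replies)

print(classify_with_retry("I would like a refund.", fake_model))</code></pre></div>
<p>Any response outside the three expected labels counts as invalid here, so the first reply is discarded and the second is returned.</p>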
<p>The results seemed okay, but a general purpose instruction model didn’t really seem ideal for a specific task like this. I wanted to explore other options that might be more efficient or have better performance.</p>
<p>Two possible approaches to creating a task-specific model would be to either fine-tune an existing model or to train a smaller, specialized model from scratch.</p>
<p>Training a model from scratch would be extremely interesting, but it would also require a huge amount of data and computing power. On the other hand, fine-tuning is a technique that adapts pre-trained language models to specific tasks by further training on a smaller, task-specific dataset. This allows you to leverage the data and compute that went into the original pre-training of the model.</p>
<h2 id="model-selection">Model selection</h2>
<p>My first naive idea was to try to fine-tune Llama 3 itself. Models such as GPT or Llama are primarily designed for text generation. Mechanically, they are trained to predict the next token in a sequence using masked self-attention, where each token can only attend to context to its left. What this means is that the neural network is <em>only considering</em> the words that appear before the token it is trying to predict. Let’s look at an example.</p>
<p>Consider the sentence <strong>Live free or die hard</strong>. GPT/Llama will take this whole sentence and simultaneously learn to predict the next word given the input:</p>
<table style="width:100%; border-collapse: collapse; margin-bottom: 20px;">
<thead>
<tr style="background-color: #f2f2f2;">
<th style="padding: 12px; text-align: left; border: 1px solid #ddd;">
Input
</th>
<th style="padding: 12px; text-align: left; border: 1px solid #ddd;">
Target
</th>
</tr>
</thead>
<tbody>
<tr>
<td style="padding: 12px; border: 1px solid #ddd;">
Live
</td>
<td style="padding: 12px; border: 1px solid #ddd;">
free
</td>
</tr>
<tr style="background-color: #f9f9f9;">
<td style="padding: 12px; border: 1px solid #ddd;">
Live free
</td>
<td style="padding: 12px; border: 1px solid #ddd;">
or
</td>
</tr>
<tr>
<td style="padding: 12px; border: 1px solid #ddd;">
Live free or
</td>
<td style="padding: 12px; border: 1px solid #ddd;">
die
</td>
</tr>
<tr style="background-color: #f9f9f9;">
<td style="padding: 12px; border: 1px solid #ddd;">
Live free or die
</td>
<td style="padding: 12px; border: 1px solid #ddd;">
hard
</td>
</tr>
</tbody>
</table>
<p>This makes perfect sense if you’re training a model to generate text, but it doesn’t really align with what we’re trying to do. We’re more interested in looking at an entire chunk of text and <em>classifying</em> it. It turns out there are better ways to achieve this.</p>
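<p>For reference, the input/target pairs in the table above can be generated with a few lines of Python. This is a simplified sketch: it splits on whitespace, whereas real models operate on sub-word tokens, and <code>next_token_pairs</code> is a hypothetical helper name.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def next_token_pairs(sentence):
    """Return (context, target) pairs for next-word prediction."""
    words = sentence.split()
    return [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]

for context, target in next_token_pairs("Live free or die hard"):
    print(f"{context!r} predicts {target!r}")</code></pre></div>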
<p>Other models learn other types of tasks, such as <strong>Masked Language Modeling</strong> and <strong>Next Sentence Prediction</strong>. In Masked Language Modeling, the model is tasked with predicting the missing words in a sentence. For example, given the sentence “The [MASK] jumped over the fence,” the model would try to predict the masked word (likely “dog” or “cat”).</p>
<p>In Next Sentence Prediction, the model is tasked with determining if two sentences are adjacent in a text. For instance, given the sentences “I love ice cream” and “It’s my favorite dessert”, the model would determine that these sentences are relatively likely to be adjacent. For “The sky is blue” and “Elephants have trunks”, the model would determine that these sentences are relatively unlikely to be adjacent.</p>
<p>These approaches can be implemented through a mechanism called <strong>bidirectional self-attention</strong>. Unlike with masked self-attention, the model does not restrict itself to only consider tokens before something it is trying to predict. Instead, it considers all the tokens in the input and takes a stab at its task. This makes more sense if you’re trying to perform sentiment analysis, named entity recognition, or text classification. In those cases, you wouldn’t want all of the machinery associated with next-token prediction, because it would be dead weight.</p>
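<p>The difference between the two mechanisms comes down to which positions each token is allowed to look at. Here is a minimal sketch that writes this out as a grid of ones and zeros, where row <code>i</code> lists the positions token <code>i</code> may attend to. The function names are hypothetical, and real implementations also mask out padding tokens, which is ignored here.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">def causal_mask(n):
    """Masked self-attention: token i sees positions 0 through i only."""
    return [[1] * (i + 1) + [0] * (n - i - 1) for i in range(n)]

def bidirectional_mask(n):
    """Bidirectional self-attention: every token sees every position."""
    return [[1] * n for _ in range(n)]

tokens = "Live free or die hard".split()

print("masked (GPT/Llama style):")
for row in causal_mask(len(tokens)):
    print(row)

print("bidirectional (BERT style):")
for row in bidirectional_mask(len(tokens)):
    print(row)</code></pre></div>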
<h2 id="bert">BERT</h2>
<p>A model that uses bidirectional self-attention is <a href="https://arxiv.org/abs/1810.04805">Bidirectional Encoder Representations from Transformers (BERT)</a>. Introduced by Google in 2018, BERT is pre-trained on the Masked Language Modeling and Next Sentence Prediction tasks described above.</p>
<p>These pre-training tasks equip BERT with a strong grasp of language structure and context. In the words of the authors, <em>…the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.</em></p>
<p>My guess was that a model like this would be a better starting point for my use case.</p>
<h2 id="distilbert">DistilBERT</h2>
<p>In 2019, Hugging Face released a smaller, distilled version of BERT called <a href="https://arxiv.org/abs/1910.01108">DistilBERT</a>. It preserves 95% of BERT’s performance while using 40% fewer parameters and running 60% faster. For comparison, it is less than 1% of the size of Llama 3 8B.</p>
<p>DistilBERT achieves this efficiency through a process called knowledge distillation, where a smaller model (the student) is trained to mimic the behavior of a larger model (the teacher). This results in a more compact model that can still perform well on a wide range of tasks, making it well-suited for fine-tuning on specific applications, especially when computational resources are limited.</p>
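<p>To give a feel for what “mimicking the teacher” means, here is a rough sketch of a soft-target loss: the student is scored on how closely its output distribution matches the teacher’s, with both softened by a temperature. The logits and the temperature value are made up for illustration, and the actual DistilBERT training objective combines a term of this kind with others, so treat this as a simplification rather than the paper’s recipe.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened outputs."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))

# Hypothetical logits over three classes
teacher_logits = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]

print(distillation_loss(teacher_logits, close_student))
print(distillation_loss(teacher_logits, far_student))</code></pre></div>
<p>The student whose outputs resemble the teacher’s gets the lower loss, which is the signal training uses to pull the smaller model toward the larger one.</p>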
<h3 id="distilbert-architecture-overview">DistilBERT Architecture Overview</h3>
<p>DistilBERT is based on the Transformer architecture. The model consists of several key components that work together to process and understand text. Text input is first passed through an <strong>embedding layer</strong>. This layer is responsible for converting input tokens (which are essentially words or parts of words) into dense vector representations. These vectors contain floating point numbers that encode the semantic meaning of the words in a format that the model can work with. The values of the vectors change throughout training as the model iterates on its semantic understanding.</p>
<p>The embedding layer learns that words like “dog” and “cat” are semantically closer to each other than to words like “car” or “building”. It captures subtle relationships, such as understanding that “king” is to “man” as “queen” is to “woman”, or that certain adjectives tend to precede certain types of nouns (like “delicious” often appearing before food-related words).</p>
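<p>“Semantically closer” is usually measured with something like cosine similarity between the vectors. The sketch below uses hand-picked 3-dimensional vectors purely for illustration; they are not real DistilBERT embeddings, which have hundreds of dimensions and are learned during training.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, hand-picked for illustration
embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

print(cosine_similarity(embeddings["dog"], embeddings["cat"]))
print(cosine_similarity(embeddings["dog"], embeddings["car"]))</code></pre></div>
<p>With these made-up values, “dog” and “cat” point in nearly the same direction, while “dog” and “car” do not.</p>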
<p>Once the text is embedded, it passes through multiple <strong>Transformer blocks</strong>. These blocks contain layers of self-attention and feed-forward neural networks. The self-attention mechanism allows the model to weigh the importance of different words in the input. This means the model can understand context and relationships between words, much like how we understand language by considering words in relation to each other. After the self-attention layer, feed-forward neural networks further process the attention output, allowing the model to capture complex patterns in the data.</p>
<p>Consider the sentence “The man who crossed the street was hit by a car.” Self-attention allows the model to understand that “hit” is more strongly related to both “man” and “car” than to “crossed” or “street”. This helps the model correctly interpret who was hit (the man) and what hit him (the car), even though these words are not adjacent in the sentence. The feed-forward networks then process this contextual information, potentially learning higher-level concepts like “traffic accidents” or “pedestrian safety” from such examples.</p>
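<p>A stripped-down version of the weighting step looks like this. It computes scaled dot-product attention weights for a single query over a handful of keys. The 2-dimensional vectors are hand-picked so that “hit” lines up with “man” and “car”; in a real model the query and key vectors come from learned projections of the embeddings.</p>
<div class="sourceCode"><pre class="sourceCode python"><code class="sourceCode python">import math

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vectors, hand-picked for illustration
words = ["man", "crossed", "street", "car"]
keys = [[1.0, 0.9], [0.1, 0.0], [0.0, 0.2], [0.9, 1.0]]
hit_query = [1.0, 1.0]

for word, weight in zip(words, attention_weights(hit_query, keys)):
    print(f"{word:8s} {weight:.2f}")</code></pre></div>
<p>The weights sum to 1, and with these values most of the weight lands on “man” and “car”.</p>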
<p>These layers represent different relationships that sub-word tokens can have with each other. As it learns, the model updates its understanding of both of these relationships at the same time. And during inference, it will consider what it has learned about both of these relationships when it makes its prediction, which for our task is the most likely label for a message.</p>
<h3 id="fine-tuning-process">Fine-tuning Process</h3>
<p>The fine-tuning process involves adjusting the pre-trained DistilBERT model to our specific classification task.</p>
<p>The code below is implemented using Hugging Face’s <code>Transformers</code> library, which provides a high-level interface for working with DistilBERT and other transformer models. This interface includes pre-built model architectures, tokenizers, and data processing utilities. The library offers components like <code>DistilBertTokenizer</code> for text tokenization and <code>DistilBertForSequenceClassification</code> for the actual model architecture. The <code>Dataset</code> and <code>DataLoader</code> classes used for efficient data handling and batching come from PyTorch.</p>
<p>These high-level components abstract away much of the complexity involved in working with transformer models. The tokenizer handles the task of converting raw text into a format the model can understand, including subword tokenization and special token management. The model class encapsulates DistilBERT’s architecture, including the self-attention mechanisms and feed-forward networks, exposing simple methods for forward passes and loss computation.</p>
<p><code>Transformers</code> delegates the numerical computations to <a href="https://pytorch.org/">PyTorch</a>, a deep learning framework that handles low-level operations. PyTorch manages tensor operations, automatic differentiation for backpropagation, and GPU acceleration. It provides the foundation for defining and training neural networks, offering modules like <code>torch.nn</code> for neural network layers and <code>torch.optim</code> for optimization algorithms.</p>
<h4 id="data-preparation">Data Preparation</h4>
<p>The process begins by loading the labeled data. It consisted of three files, one for each category, each containing several hundred customer messages.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> load_data(file_path):</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>    <span class="cf">with</span> <span class="bu">open</span>(file_path, <span class="st">&#39;r&#39;</span>) <span class="im">as</span> f:</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>        <span class="cf">return</span> [line.strip() <span class="cf">for</span> line <span class="kw">in</span> f]</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a>feedback <span class="op">=</span> load_data(<span class="st">&#39;data/feedback.txt&#39;</span>)</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a>inquiries <span class="op">=</span> load_data(<span class="st">&#39;data/inquiries.txt&#39;</span>)</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a>refunds <span class="op">=</span> load_data(<span class="st">&#39;data/refunds.txt&#39;</span>)</span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a>all_texts <span class="op">=</span> feedback <span class="op">+</span> inquiries <span class="op">+</span> refunds</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a>all_labels <span class="op">=</span> [<span class="dv">0</span>] <span class="op">*</span> <span class="bu">len</span>(feedback) <span class="op">+</span> [<span class="dv">1</span>] <span class="op">*</span> <span class="bu">len</span>(inquiries) <span class="op">+</span> [<span class="dv">2</span>] <span class="op">*</span> <span class="bu">len</span>(refunds)</span></code></pre></div>
<p>This code loads the text strings and assigns numerical labels to each. In neural networks, labels refer to “the right answer” and are used during training to check the model’s guesses.</p>
<h4 id="dataset-and-dataloader">Dataset and DataLoader</h4>
<p>A custom dataset class is created to handle tokenization:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> TextClassificationDataset(Dataset):</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>    <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>, texts, labels, tokenizer, max_len):</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>        <span class="va">self</span>.texts <span class="op">=</span> texts</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>        <span class="va">self</span>.labels <span class="op">=</span> labels</span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a>        <span class="va">self</span>.tokenizer <span class="op">=</span> tokenizer</span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a>        <span class="va">self</span>.max_len <span class="op">=</span> max_len</span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a>    <span class="kw">def</span> <span class="fu">__getitem__</span>(<span class="va">self</span>, item):</span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a>        text <span class="op">=</span> <span class="bu">str</span>(<span class="va">self</span>.texts[item])</span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a>        label <span class="op">=</span> <span class="va">self</span>.labels[item]</span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a>        encoding <span class="op">=</span> <span class="va">self</span>.tokenizer.encode_plus(</span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a>            text,</span>
<span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a>            add_special_tokens<span class="op">=</span><span class="va">True</span>,</span>
<span id="cb2-15"><a href="#cb2-15" aria-hidden="true" tabindex="-1"></a>            max_length<span class="op">=</span><span class="va">self</span>.max_len,</span>
<span id="cb2-16"><a href="#cb2-16" aria-hidden="true" tabindex="-1"></a>            return_token_type_ids<span class="op">=</span><span class="va">False</span>,</span>
<span id="cb2-17"><a href="#cb2-17" aria-hidden="true" tabindex="-1"></a>            padding<span class="op">=</span><span class="st">&#39;max_length&#39;</span>,</span>
<span id="cb2-18"><a href="#cb2-18" aria-hidden="true" tabindex="-1"></a>            truncation<span class="op">=</span><span class="va">True</span>,</span>
<span id="cb2-19"><a href="#cb2-19" aria-hidden="true" tabindex="-1"></a>            return_attention_mask<span class="op">=</span><span class="va">True</span>,</span>
<span id="cb2-20"><a href="#cb2-20" aria-hidden="true" tabindex="-1"></a>            return_tensors<span class="op">=</span><span class="st">&#39;pt&#39;</span>,</span>
<span id="cb2-21"><a href="#cb2-21" aria-hidden="true" tabindex="-1"></a>        )</span>
<span id="cb2-22"><a href="#cb2-22" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-23"><a href="#cb2-23" aria-hidden="true" tabindex="-1"></a>        <span class="cf">return</span> {</span>
<span id="cb2-24"><a href="#cb2-24" aria-hidden="true" tabindex="-1"></a>            <span class="st">&#39;text&#39;</span>: text,</span>
<span id="cb2-25"><a href="#cb2-25" aria-hidden="true" tabindex="-1"></a>            <span class="st">&#39;input_ids&#39;</span>: encoding[<span class="st">&#39;input_ids&#39;</span>].flatten(),</span>
<span id="cb2-26"><a href="#cb2-26" aria-hidden="true" tabindex="-1"></a>            <span class="st">&#39;attention_mask&#39;</span>: encoding[<span class="st">&#39;attention_mask&#39;</span>].flatten(),</span>
<span id="cb2-27"><a href="#cb2-27" aria-hidden="true" tabindex="-1"></a>            <span class="st">&#39;labels&#39;</span>: torch.tensor(label, dtype<span class="op">=</span>torch.<span class="bu">long</span>)</span>
<span id="cb2-28"><a href="#cb2-28" aria-hidden="true" tabindex="-1"></a>        }</span></code></pre></div>
<p>A <strong>tokenizer</strong> is a tool that breaks down text into smaller units called tokens. These could be words, parts of words, or even punctuation. The tokenizer also handles converting these tokens into numbers that the model can understand. The <strong>DistilBertTokenizer</strong> is used so that text is tokenized the same way as the data the model was pre-trained on. Under the hood, it uses <a href="https://huggingface.co/learn/nlp-course/en/chapter6/6">WordPiece subword segmentation</a>.</p>
<p>The <code>encode_plus</code> method of this tokenizer is used. This method does several things: It tokenizes the input text, adds special tokens that DistilBERT expects (like [CLS] at the start and [SEP] at the end), pads or truncates the input to a specified maximum length, and creates an “attention mask” which tells the model which tokens are actual input and which are padding.</p>
<p>The <strong>torch Dataset</strong> is a PyTorch class that represents a dataset. By inheriting from this class, a custom dataset that PyTorch can easily work with is created. The <code>__getitem__</code> method is called when an item from the dataset needs to be accessed.</p>
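<p>The padding, truncation, and attention mask steps can be sketched in plain Python. This is a toy illustration under my own assumptions, not the tokenizer’s actual implementation: the <code>encode</code> helper is hypothetical, the token ids in the examples are arbitrary, and the special-token ids (101, 102, 0) are what I believe the BERT-style uncased vocabulary uses, so treat them as placeholders.</p>

```python
# Toy illustration only: real WordPiece tokenization is far more involved.
CLS, SEP, PAD = 101, 102, 0  # assumed special-token ids (placeholders)


def encode(token_ids, max_len):
    """Add special tokens, then truncate or pad to exactly max_len."""
    # Reserve two slots for [CLS] and [SEP] before truncating.
    body = token_ids[: max_len - 2]
    ids = [CLS] + body + [SEP]
    mask = [1] * len(ids)        # 1 marks a real token
    padding = max_len - len(ids)
    ids += [PAD] * padding       # padding ids fill the remaining slots
    mask += [0] * padding        # 0 marks padding
    return {"input_ids": ids, "attention_mask": mask}


short_example = encode([11, 12], max_len=6)
long_example = encode([1, 2, 3, 4, 5, 6, 7], max_len=6)
print(short_example)
print(long_example)
```

<p>The short input is padded and its mask ends in zeros, while the long input is cut down so that the special tokens still fit inside <code>max_len</code>.</p>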
<p>PyTorch’s DataLoader is used to efficiently batch and shuffle the data:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>train_dataloader <span class="op">=</span> DataLoader(train_dataset, batch_size<span class="op">=</span><span class="dv">16</span>, shuffle<span class="op">=</span><span class="va">True</span>)</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>test_dataloader <span class="op">=</span> DataLoader(test_dataset, batch_size<span class="op">=</span><span class="dv">16</span>)</span></code></pre></div>
<p>The DataLoader is a PyTorch utility that helps manage batching of data and provides an iterable over the Dataset. For the training data, <code>shuffle=True</code> is set, which means it will randomly shuffle the data at the start of each epoch. This shuffling helps prevent the model from learning any unintended patterns based on the order of the training data.</p>
<p>A batch size of 16 is used, meaning the DataLoader will yield batches of 16 examples at a time. This batch size is a balance between memory usage and training speed. For the test data, shuffling is not needed, so the <code>shuffle</code> parameter is omitted.</p>
<p>During training and evaluation, these DataLoaders will be iterated over, which will give batches of data in the format the model expects. This abstraction simplifies the training loop and makes it easier to work with large datasets that might not fit into memory all at once.</p>
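<p>To make the batching and shuffling concrete, here is a rough plain-Python approximation of what the DataLoader does. The <code>iterate_batches</code> helper is hypothetical; the real DataLoader also collates items into tensors and can load data in parallel, which this sketch ignores.</p>

```python
import random


def iterate_batches(dataset, batch_size, shuffle=False, seed=None):
    """Yield lists of up to batch_size items, optionally in random order."""
    indices = list(range(len(dataset)))
    if shuffle:
        # A fresh shuffle per call mimics reshuffling at each epoch.
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


data = list(range(10))
batches = list(iterate_batches(data, batch_size=4))
print(batches)  # the last batch is smaller when sizes do not divide evenly

shuffled = list(iterate_batches(data, batch_size=4, shuffle=True, seed=0))
print(shuffled)
```

<p>Shuffling only changes the order in which items are visited; every item still appears exactly once per pass.</p>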
<h4 id="train-test-split">Train-Test Split</h4>
<p>Before training begins, the data is split into training and testing sets:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> train_test_split(data, test_size<span class="op">=</span><span class="fl">0.1</span>):</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>    split_index <span class="op">=</span> <span class="bu">int</span>(<span class="bu">len</span>(data) <span class="op">*</span> (<span class="dv">1</span> <span class="op">-</span> test_size))</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>    <span class="cf">return</span> data[:split_index], data[split_index:]</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a>train_data, test_data <span class="op">=</span> train_test_split(<span class="bu">list</span>(<span class="bu">zip</span>(all_texts, all_labels)))</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a>train_texts, train_labels <span class="op">=</span> <span class="bu">zip</span>(<span class="op">*</span>train_data)</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a>test_texts, test_labels <span class="op">=</span> <span class="bu">zip</span>(<span class="op">*</span>test_data)</span></code></pre></div>
<p>A manual split is used where the first 90% of the data is used for training and the last 10% for testing. This approach was chosen to make it easy for me to place all of the samples evaluated in the previous blog post into the test set. That way, performance could be more directly compared, because none of those samples would have been seen by either model during training. More on that later.</p>
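<p>Here is the same split function run on twenty numbered placeholders instead of (text, label) pairs, just to show which positions land in each set. The placeholder data is made up for illustration.</p>

```python
def train_test_split(data, test_size=0.1):
    # Everything before split_index is training data, the rest is test data.
    split_index = int(len(data) * (1 - test_size))
    return data[:split_index], data[split_index:]


# Twenty numbered placeholders standing in for (text, label) pairs.
samples = list(range(20))
train, test = train_test_split(samples)
print(len(train), len(test))
print(test)
```

<p>Because the split is purely positional, the order of the input list decides which samples end up in the test set.</p>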
<h4 id="training-loop">Training Loop</h4>
<p>The training process involves iterating over the data multiple times and updating the model’s parameters to minimize the classification error.</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> train_epoch(model, data_loader, optimizer, device):</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>    model.train()</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>    total_loss <span class="op">=</span> <span class="dv">0</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>    total_correct <span class="op">=</span> <span class="dv">0</span></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a>    total_samples <span class="op">=</span> <span class="dv">0</span></span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a>    <span class="cf">for</span> batch <span class="kw">in</span> data_loader:</span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a>        input_ids <span class="op">=</span> batch[<span class="st">&#39;input_ids&#39;</span>].to(device)</span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a>        attention_mask <span class="op">=</span> batch[<span class="st">&#39;attention_mask&#39;</span>].to(device)</span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a>        labels <span class="op">=</span> batch[<span class="st">&#39;labels&#39;</span>].to(device)</span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a>        optimizer.zero_grad()</span>
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a>        outputs <span class="op">=</span> model(input_ids, attention_mask<span class="op">=</span>attention_mask, labels<span class="op">=</span>labels)</span>
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a>        loss <span class="op">=</span> outputs.loss</span>
<span id="cb5-15"><a href="#cb5-15" aria-hidden="true" tabindex="-1"></a>        logits <span class="op">=</span> outputs.logits</span>
<span id="cb5-16"><a href="#cb5-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-17"><a href="#cb5-17" aria-hidden="true" tabindex="-1"></a>        _, predicted <span class="op">=</span> torch.<span class="bu">max</span>(logits, <span class="dv">1</span>)</span>
<span id="cb5-18"><a href="#cb5-18" aria-hidden="true" tabindex="-1"></a>        total_correct <span class="op">+=</span> (predicted <span class="op">==</span> labels).<span class="bu">sum</span>().item()</span>
<span id="cb5-19"><a href="#cb5-19" aria-hidden="true" tabindex="-1"></a>        total_samples <span class="op">+=</span> labels.size(<span class="dv">0</span>)</span>
<span id="cb5-20"><a href="#cb5-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-21"><a href="#cb5-21" aria-hidden="true" tabindex="-1"></a>        loss.backward()</span>
<span id="cb5-22"><a href="#cb5-22" aria-hidden="true" tabindex="-1"></a>        optimizer.step()</span>
<span id="cb5-23"><a href="#cb5-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-24"><a href="#cb5-24" aria-hidden="true" tabindex="-1"></a>        total_loss <span class="op">+=</span> loss.item()</span>
<span id="cb5-25"><a href="#cb5-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-26"><a href="#cb5-26" aria-hidden="true" tabindex="-1"></a>    avg_loss <span class="op">=</span> total_loss <span class="op">/</span> <span class="bu">len</span>(data_loader)</span>
<span id="cb5-27"><a href="#cb5-27" aria-hidden="true" tabindex="-1"></a>    accuracy <span class="op">=</span> total_correct <span class="op">/</span> total_samples</span>
<span id="cb5-28"><a href="#cb5-28" aria-hidden="true" tabindex="-1"></a>    <span class="cf">return</span> avg_loss, accuracy</span></code></pre></div>
<p>It isn’t shown here, but the <code>model</code> passed in to this function is an instance of the <code>DistilBertForSequenceClassification</code> class. <code>input_ids</code>, <code>attention_mask</code>, and <code>labels</code> are determined during dataset preparation outside of this function call. The <code>input_ids</code> represent the tokenized input text, while <code>attention_mask</code> indicates which tokens should be attended to. <code>labels</code> represent the correct answers, while <code>predicted</code> represents the guesses.</p>
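<p>The step from logits to <code>predicted</code> to accuracy can be sketched without PyTorch. The logits below are invented numbers, and the <code>argmax</code> helper is a stand-in for what <code>torch.max(logits, 1)</code> returns as its second value.</p>

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=lambda i: row[i])


# Hypothetical logits for a batch of three texts and three classes.
logits = [
    [2.1, -0.3, 0.4],   # highest score at index 0
    [0.2, 1.7, -1.0],   # highest score at index 1
    [0.9, 0.1, 0.5],    # highest score at index 0
]
labels = [0, 1, 2]

# The predicted class is simply the index with the highest score.
predicted = [argmax(row) for row in logits]
correct = sum(p == y for p, y in zip(predicted, labels))
accuracy = correct / len(labels)
print(predicted, accuracy)
```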
<p>The function performs optimization - the process where the model adjusts its parameters to minimize the loss and iteratively improve its performance on the task. This involves a “forward pass” where the model processes the input and makes predictions. The loss is then calculated by comparing these predictions to the correct labels. Next, a “backward pass” (backpropagation) computes the gradients. Finally, the optimizer uses these gradients to update the model’s parameters, adjusting weights to reduce the loss.</p>
<p>Calling <code>model</code> in this manner is the PyTorch idiom for doing a forward pass (this has always been a strange design choice to me).</p>
<p>The backward pass is done by a call to <code>loss.backward()</code>, which computes the gradient of the loss with respect to each parameter (the direction and amount to nudge it). The call to <code>optimizer.step()</code> then applies those parameter updates.</p>
<p>Most of the actual number crunching is carried out by PyTorch’s autograd code. The attention mechanics are implemented in the <code>DistilBertForSequenceClassification</code> class of the Hugging Face <code>transformers</code> library. This library also provides <code>AdamW</code>, which performs the parameter updates (with additional optimizations, in AdamW’s case).</p>
<p>As you can see, there isn’t really a defined API for gradient descent. It’s on you to know what steps need to be done and do them.</p>
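<p>The forward pass, backward pass, and optimizer step can be shown on a model with a single parameter. This is a toy sketch with a hand-derived gradient and made-up data, not what autograd or AdamW actually do internally.</p>

```python
# Fit y = w * x to data generated with w = 3, using a hand-written gradient.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 6.0, 9.0]

w = 0.0      # the single "model parameter"
lr = 0.05    # learning rate

for step in range(100):
    # Forward pass: predictions and mean squared error loss.
    preds = [w * x for x in xs]
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)

    # Backward pass: d(loss)/dw, derived by hand instead of by autograd.
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)

    # Optimizer step: move the parameter against the gradient.
    w -= lr * grad

print(round(w, 4), round(loss, 6))
```

<p>The three commented stages map onto <code>model(...)</code>, <code>loss.backward()</code>, and <code>optimizer.step()</code> in the training function above.</p>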
<h4 id="model-configuration-and-training">Model Configuration and Training</h4>
<p>The model configuration and training takes place in the program’s main method, separate from the above helper methods.</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>tokenizer <span class="op">=</span> DistilBertTokenizer.from_pretrained(<span class="st">&#39;distilbert-base-uncased&#39;</span>)</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>model <span class="op">=</span> DistilBertForSequenceClassification.from_pretrained(<span class="st">&#39;distilbert-base-uncased&#39;</span>, num_labels<span class="op">=</span><span class="dv">3</span>)</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a>train_dataset <span class="op">=</span> TextClassificationDataset(train_texts, train_labels, tokenizer, max_len<span class="op">=</span><span class="dv">128</span>)</span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a>test_dataset <span class="op">=</span> TextClassificationDataset(test_texts, test_labels, tokenizer, max_len<span class="op">=</span><span class="dv">128</span>)</span></code></pre></div>
<p>First, the <strong>tokenizer</strong> and <strong>model</strong> are initialized. The pre-trained ‘distilbert-base-uncased’ model is used, which means it’s been trained on lowercase English text. The <code>num_labels=3</code> parameter tells the model it’s dealing with a three-class classification problem.</p>
<p><code>max_len=128</code> is set when creating the datasets. This <strong>maximum length</strong> parameter determines the longest sequence of tokens the model will process. Shorter sequences will be padded, and longer ones will be truncated.</p>
<p>The <strong>batch size</strong> of 16 in the DataLoader means 16 examples will be processed at a time during training. This is another balance between memory usage and training speed. Larger batch sizes can lead to faster training but require more memory.</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>device <span class="op">=</span> torch.device(<span class="st">&#39;mps&#39;</span> <span class="cf">if</span> torch.backends.mps.is_available() <span class="cf">else</span> <span class="st">&#39;cpu&#39;</span>)</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>model.to(device)</span></code></pre></div>
<p>Next, the model is moved to the appropriate device. A check is made if Apple’s Metal Performance Shaders (MPS) are available, which can accelerate training on compatible Mac hardware (this was only run on a Mac; this line would need to be changed if it were to be run optimally on different hardware).</p>
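<p>For other hardware, one possible approach is a fallback chain that prefers CUDA, then MPS, then the CPU. The <code>pick_device</code> helper below is hypothetical and takes plain booleans so that it runs without PyTorch installed; in real code I’d expect the flags to come from <code>torch.cuda.is_available()</code> and <code>torch.backends.mps.is_available()</code>.</p>

```python
def pick_device(cuda_available, mps_available):
    """Return a device name, preferring CUDA, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


# On a Mac with Metal support and no NVIDIA GPU:
print(pick_device(cuda_available=False, mps_available=True))
```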
<div class="sourceCode" id="cb8"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>optimizer <span class="op">=</span> AdamW(model.parameters(), lr<span class="op">=</span><span class="fl">2e-5</span>)</span></code></pre></div>
<p><strong>AdamW</strong> is an optimization algorithm for training neural networks. It features adaptive learning rates for each parameter and implements decoupled weight decay regularization. Weight decay is a regularization technique that encourages smaller parameter values in the model. Classic L2 regularization does this by adding a penalty term to the loss function based on the magnitude of the weights; AdamW’s decoupled variant instead shrinks the weights directly in the update step, separately from the gradient of the loss. The primary goal of weight decay is to prevent overfitting by discouraging the model from relying too heavily on any individual feature or learning overly complex patterns that may not generalize well to new data.</p>
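<p>A single AdamW update for one scalar parameter can be written out by hand. This is a sketch of the published update rule as I understand it, not PyTorch’s implementation. The <code>adamw_step</code> function is hypothetical, and the default values for the betas, epsilon, and weight decay are assumptions that I believe match common defaults.</p>

```python
import math


def adamw_step(p, grad, m, v, t, lr=2e-5, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a single scalar parameter (t starts at 1)."""
    # Decoupled weight decay: shrink the weight directly, not via the loss.
    p = p * (1 - lr * weight_decay)
    # Running averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for the early steps.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Adaptive step: scaled by the recent gradient magnitude.
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v


p, m, v = adamw_step(p=1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(p)

# With a zero gradient, only the weight decay moves the parameter.
decayed, _, _ = adamw_step(p=1.0, grad=0.0, m=0.0, v=0.0, t=1)
print(decayed)
```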
<p>A <strong>learning rate</strong> of 2e-5 (0.00002) is used here. It was selected by manually tuning the learning rate and number of epochs and evaluating network performance. It’s small enough to allow for fine adjustments to the pre-trained weights without causing drastic changes that could destroy the model’s pre-trained knowledge. Too high, and the model might overshoot optimal solutions; too low, and the model might train too slowly or get stuck in suboptimal solutions. This learning rate is much lower than what you would use for training a network from scratch, which is usually around 1e-3.</p>
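<p>The effect of the learning rate is easy to see on a toy problem. The sketch below runs plain gradient descent on f(x) = x², which has nothing to do with the DistilBERT loss surface; the <code>descend</code> helper and the three learning rates are chosen only for illustration.</p>

```python
def descend(lr, steps=20, x=1.0):
    """Run gradient descent on f(x) = x**2, whose gradient is 2 * x."""
    for _ in range(steps):
        x -= lr * 2 * x
    return x


# Too low barely moves, a moderate rate approaches the minimum at 0,
# and too high overshoots further on every step.
for lr in (0.001, 0.1, 1.1):
    print(lr, descend(lr))
```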
<div class="sourceCode" id="cb9"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>num_epochs <span class="op">=</span> <span class="dv">3</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> epoch <span class="kw">in</span> <span class="bu">range</span>(num_epochs):</span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a>    <span class="bu">print</span>(<span class="ss">f&#39;Epoch </span><span class="sc">{</span>epoch <span class="op">+</span> <span class="dv">1</span><span class="sc">}</span><span class="ss">/</span><span class="sc">{</span>num_epochs<span class="sc">}</span><span class="ss">&#39;</span>)</span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a>    train_loss, train_acc <span class="op">=</span> train_epoch(model, train_dataloader, optimizer, device)</span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a>    <span class="bu">print</span>(<span class="ss">f&#39;Train loss </span><span class="sc">{</span>train_loss<span class="sc">:.4f}</span><span class="ss"> accuracy </span><span class="sc">{</span>train_acc<span class="sc">:.4f}</span><span class="ss">&#39;</span>)</span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a>    val_loss, val_acc <span class="op">=</span> evaluate(model, test_dataloader, device)</span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a>    <span class="bu">print</span>(<span class="ss">f&#39;Val loss </span><span class="sc">{</span>val_loss<span class="sc">:.4f}</span><span class="ss"> accuracy </span><span class="sc">{</span>val_acc<span class="sc">:.4f}</span><span class="ss">&#39;</span>)</span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a>torch.save(model.state_dict(), <span class="st">&#39;model/distilbert_classifier.pth&#39;</span>)</span>
<span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Model saved successfully.&quot;</span>)</span></code></pre></div>
<p>The training loop runs for <strong>3 epochs</strong>, after which the accuracy and loss of both the training and validation sets reached desirable values (see “Results and Discussion” below).</p>
<p>An epoch is one complete pass through the entire training dataset. In each epoch, the model is trained on the training data and then evaluated on the validation (test) data. The loss and accuracy for both training and validation sets are printed out with 4 decimal places.</p>
<p>After training, the model is saved using <code>torch.save()</code>. This function saves the model’s <strong>state_dict</strong>. A state_dict in PyTorch is a Python dictionary that maps each layer to its parameter tensors. It contains all the learned weights and biases of the model. By saving the state_dict, all the knowledge the model has gained during training is essentially saved. The model is saved with a <strong>.pth</strong> file extension. This is a common convention in PyTorch for saved model files, standing for “PyTorch”.</p>
<p>To use this saved model later, it would be loaded like this:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>model <span class="op">=</span> DistilBertForSequenceClassification.from_pretrained(<span class="st">&#39;distilbert-base-uncased&#39;</span>, num_labels<span class="op">=</span><span class="dv">3</span>)</span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>model.load_state_dict(torch.load(<span class="st">&#39;model/distilbert_classifier.pth&#39;</span>))</span>
<span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>model.<span class="bu">eval</span>()</span></code></pre></div>
<h3 id="results-and-discussion">Results and Discussion</h3>
<p>For simplicity, I ran with the assumption that each string of text had one and only one classification. It would be much more useful if each string of text instead could have 0-3 classifications. This would also almost certainly bring down the accuracy of the model because determining whether a given string belongs to a single category at all seems like it would be a much more complex task than what it had to learn during training.</p>
<p>After 3 epochs, the model achieves an <strong>accuracy of 100% and loss of 0.0156 on the test set</strong> and an <strong>accuracy of 99.59% and loss of 0.0278 on the training set</strong>.</p>
<pre><code>Epoch 1/3
Train loss 0.6052 accuracy 0.8264
Val loss 0.1448 accuracy 1.0000
Epoch 2/3
Train loss 0.0708 accuracy 0.9959
Val loss 0.0314 accuracy 1.0000
Epoch 3/3
Train loss 0.0278 accuracy 0.9959
Val loss 0.0156 accuracy 1.0000</code></pre>
<p>The task was designed to be easy enough that a network could handle it, so the performance met my expectations. The 100% accuracy of the test set includes the entirety of the tasks that <strong>Llama 3 8B only achieved 92% accuracy</strong> on, with additional test data added in. And just for kicks, I later tried coming up with weird and contrived strings of text to try to confuse the model. I could not get it to incorrectly classify anything.</p>
<p>Once DistilBERT had been fine-tuned to distinguish between the three specific categories (refund requests, order inquiries, and general feedback), <strong>the fine-tuned model outperformed base Llama 3 for this specialized task at roughly 0.8% of its size</strong>. Llama 3 8B is subject to the variability of a general purpose instruction model, so it makes sense that it gets tripped up on even simple tasks from time to time.</p>
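<p>The size comparison is simple arithmetic. The parameter counts below are approximate public figures that I’m assuming here (about 66 million for DistilBERT base and about 8 billion for Llama 3 8B), not numbers measured in this project.</p>

```python
# Approximate public parameter counts (assumptions, not measured here).
distilbert_params = 66_000_000      # DistilBERT base, about 66M
llama3_8b_params = 8_000_000_000    # Llama 3 8B, about 8B

ratio = distilbert_params / llama3_8b_params
print(f"{ratio:.2%}")
```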
<p>I came away completely sold on the idea of fine-tuning small models for repetitive, special purpose tasks. It is inevitable that many suitable tasks will instead be run on models that are hundreds of billions of parameters or larger via commercial APIs. This presents a huge opportunity for cost savings and efficiency that will only grow over time.</p>
<p><em>The source code for this project is <a href="https://github.com/jpe90/fine-tuned-distilbert-classifier">available on GitHub</a>.</em></p>
    </section>
    <section class="comment-footer">
        <a href="mailto:eskinjp@gmail.com?subject=Re: Fine-tuning NLP transformers for task automation">Comment via email</a>
    </section>
</article>
]]></description>
    <pubDate>Mon, 12 Aug 2024 00:00:00 UT</pubDate>
    <guid>https://jeskin.net/blog/finetune-nlp-transformers.html</guid>
    <dc:creator>Jon Eskin</dc:creator>
</item>
<item>
    <title>Pytorch broadcasting mechanics</title>
    <link>https://jeskin.net/blog/pytorch-broadcasting-mechanics.html</link>
    <description><![CDATA[<article>
    <section class="header">
        Posted on July  6, 2024
        
            by Jon Eskin
        
    </section>
    <section>
        <p><em>Update 2025-05-18: Looking back, this post is a little embarrassing, but I’ll leave it up in case any other learners stumble across it and find it helpful.</em></p>
<p>In the video <a href="https://www.youtube.com/watch?v=PaCmpygFfXo">The spelled-out intro to language modeling: building makemore</a>, Andrej Karpathy briefly mentioned a bug that can occur when broadcasting tensors during mathematical operations.</p>
<p>In the lines below, N is a 27x27 tensor. The code is trying to normalize the rows of the tensor by dividing each row by the sum of the row.</p>
<p>Here is the incorrect calculation:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>P <span class="op">=</span> N.<span class="bu">float</span>()</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>P <span class="op">/=</span> P.<span class="bu">sum</span>(<span class="dv">1</span>)</span></code></pre></div>
<p>The correct calculation should be:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>P <span class="op">=</span> N.<span class="bu">float</span>()</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>P <span class="op">/=</span> P.<span class="bu">sum</span>(<span class="dv">1</span>,keepdim<span class="op">=</span><span class="va">True</span>)</span></code></pre></div>
<p>Andrej briefly explained why this happens, but his explanation didn’t land for me. I walked through a few smaller examples to understand the mechanics of the tensor operations.</p>
<h1 id="p-p.sum1-keepdimtrue"><code>P / P.sum(1, keepdim=True)</code></h1>
<div class="sourceCode" id="cb3"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>P <span class="op">=</span> [[<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">3</span>],</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>     [<span class="dv">4</span>, <span class="dv">5</span>, <span class="dv">6</span>],</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>     [<span class="dv">7</span>, <span class="dv">8</span>, <span class="dv">9</span>]]</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a>P <span class="op">=</span> torch.tensor(P)</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a>P_sum <span class="op">=</span> P.<span class="bu">sum</span>(<span class="dv">1</span>, keepdim<span class="op">=</span><span class="va">True</span>)</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a><span class="co"># P_sum: this is what a column vector looks like in PyTorch</span></span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a><span class="co"># [[6],</span></span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a><span class="co">#  [15],</span></span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a><span class="co">#  [24]]</span></span></code></pre></div>
<p>Here, <code>P.sum(1, keepdim=True)</code> computes the sum of each row in P and returns its results as a 3 element column vector. The <code>1</code> argument specifies the dimension to sum over: dimension 1 runs across the columns, so the elements within each row are added together. <code>keepdim=True</code> says that the output should keep the same number of dimensions as the input, with the summed dimension retained at size 1. In other words, the result has shape (3, 1) rather than being flattened to shape (3,) during the summation operation.</p>
<p>Now when you divide P by <code>P_sum</code>, each element in a row of P is divided by the sum of that row. This normalizes each row so that the sum of the elements in each row is 1.</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>P <span class="op">/</span> P.<span class="bu">sum</span>(<span class="dv">1</span>, keepdim<span class="op">=</span><span class="va">True</span>)  </span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="co"># [[1/6, 2/6, 3/6],</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="co">#  [4/15, 5/15, 6/15],</span></span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a><span class="co">#  [7/24, 8/24, 9/24]]</span></span></code></pre></div>
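<p>The same normalization can be mimicked with nested Python lists, which makes the (3, 1) shape of the row sums explicit. This is only a plain-Python analogy for the tensor operation, not how PyTorch computes it.</p>

```python
P = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# Like P.sum(1, keepdim=True): one single-element row per input row,
# which corresponds to shape (3, 1).
row_sums = [[sum(row)] for row in P]

# Each row is divided by its own sum, so every row of the result sums to 1.
normalized = [[x / row_sums[i][0] for x in row] for i, row in enumerate(P)]

print(row_sums)
for row in normalized:
    print(row, sum(row))
```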
<h1 id="p-p.sum1"><code>P / P.sum(1)</code></h1>
<div class="sourceCode" id="cb5"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>P <span class="op">=</span> [[<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">3</span>],</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>     [<span class="dv">4</span>, <span class="dv">5</span>, <span class="dv">6</span>],</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>     [<span class="dv">7</span>, <span class="dv">8</span>, <span class="dv">9</span>]]</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>     </span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a>P <span class="op">=</span> torch.tensor(P)</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a>P_sum <span class="op">=</span> P.<span class="bu">sum</span>(<span class="dv">1</span>) </span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a><span class="co"># [6, 15, 24]</span></span></code></pre></div>
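<p><code>P.sum(1)</code> computes the sum of each row in P, but without <code>keepdim=True</code> the summed dimension is dropped, so the result is a flat one-dimensional tensor of shape <code>(3,)</code> rather than a column of shape <code>(3, 1)</code>.</p>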
<p>When you divide P by <code>P.sum(1)</code>, PyTorch’s broadcasting rules align the shapes starting from the trailing dimension. The 3 elements of <code>P.sum(1)</code> are therefore lined up with the columns of P and repeated down every row, which is not the intended behavior and results in incorrect normalization. Notice how each row sum ends up dividing a column, rather than its own row. It’s operating on the wrong dimension.</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>P <span class="op">/</span> P.<span class="bu">sum</span>(<span class="dv">1</span>)  </span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="co"># [[1/6, 2/15, 3/24],</span></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a><span class="co">#  [4/6, 5/15, 6/24],</span></span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a><span class="co">#  [7/6, 8/15, 9/24]]</span></span></code></pre></div>
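<p>To make the alignment rule concrete, here’s a minimal sketch in plain Python (no PyTorch needed) of how two shapes get lined up from the trailing dimension, followed by the two divisions written out by hand. The <code>broadcast_shape</code> helper is a hypothetical function written for illustration, not a PyTorch API, and it only models the shape rule rather than the real tensor math.</p>

```python
from fractions import Fraction


def broadcast_shape(a, b):
    """Return the broadcast result shape of two shapes, or raise ValueError."""
    # Shapes are compared from the trailing (rightmost) dimension backwards;
    # a missing leading dimension is treated as size 1.
    ndim = max(len(a), len(b))
    a = (1,) * (ndim - len(a)) + tuple(a)
    b = (1,) * (ndim - len(b)) + tuple(b)
    out = []
    for x, y in zip(a, b):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
    return tuple(out)


P = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
row_sums = [sum(row) for row in P]  # plays the role of P.sum(1), shape (3,)

# keepdim=True gives shape (3, 1): row i is divided by row_sums[i].
with_keepdim = [[Fraction(v, row_sums[i]) for v in row]
                for i, row in enumerate(P)]

# Without keepdim, shape (3,) is treated as (1, 3):
# column j is divided by row_sums[j].
without_keepdim = [[Fraction(v, row_sums[j]) for j, v in enumerate(row)]
                   for row in P]

print(broadcast_shape((3, 3), (3, 1)))
print(broadcast_shape((3, 3), (3,)))
```

<p>Both shapes broadcast to <code>(3, 3)</code> without complaint, which is why the mistake is silent. Only the <code>keepdim</code> version produces rows that sum to 1.</p>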
<p>PyTorch mechanics can seem a little unintuitive and mind-bendy when you’re learning them, so it can be helpful to pick them apart like we did here to understand them.</p>
    </section>
    <section class="comment-footer">
        <a href="mailto:eskinjp@gmail.com?subject=Re: Pytorch broadcasting mechanics">Comment via email</a>
    </section>
</article>
]]></description>
    <pubDate>Sat, 06 Jul 2024 00:00:00 UT</pubDate>
    <guid>https://jeskin.net/blog/pytorch-broadcasting-mechanics.html</guid>
    <dc:creator>Jon Eskin</dc:creator>
</item>
<item>
    <title>Leveraging local AI for task automation</title>
    <link>https://jeskin.net/blog/leveraging-local-ai-for-task-automation.html</link>
    <description><![CDATA[<article>
    <section class="header">
        Posted on June 30, 2024
        
            by Jon Eskin
        
    </section>
    <section>
        <p>I’ve encountered many tasks that could almost be automated, except that they have some kind of ambiguous step that requires manual attention. For example, it’s extremely common to find instances where freeform text needs to be flagged if it contains sensitive or important information. For such tasks, it’s easy to write software that performs subsequent steps such as interacting with a database or moving files around, but ambiguous text-processing can be extremely difficult to implement programmatically. Solutions that use regular expressions or other simple text processing techniques can be finicky, inaccurate, and difficult to tune.</p>
<p>LLMs have presented an opportunity to make big gains in this problem space. Most programming languages can easily call into large commercial models like OpenAI’s GPT-4 or Google’s Gemini. You can read text in from your program, pass it to these services’ APIs, and perform actions based on the results. However, using external services for this can be problematic because of the cost and privacy implications. You may not want to depend on a paid service for an important task, and you <em>really</em> may not want to pass organizational data to it.</p>
<p>On the other hand, open source LLMs that run locally on your system have been rapidly improving. They aren’t quite as powerful as commercial models, but they can be surprisingly effective, particularly if you give them simple tasks and make efforts to constrain their responses. Additionally, you always have the option of fine-tuning to improve performance.</p>
<p>One of the biggest open source projects for running these models is <a href="https://github.com/ggerganov/llama.cpp">llama.cpp</a>, which performs inference on open source models. It can be built as a library and used from other languages through bindings to its API. I really like to experiment with new problem spaces in Clojure, so I used <a href="https://github.com/phronmophobic/llama.clj">llama.clj</a> to experiment with programmatically interfacing with LLMs to automate tasks.</p>
<p>I put together an example that simulates a case where we have a bunch of customer service inquiries and we want to figure out whether they are refund requests, order inquiries, or just general feedback. The goal of these experiments was only to perform this classification step.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode clojure"><code class="sourceCode clojure"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>(<span class="kw">require</span> &#39;[com.phronemophobic.llama <span class="at">:as</span> llama])</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>(<span class="kw">require</span> &#39;[com.phronemophobic.llama.util <span class="at">:as</span> llutil])</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="co">;; 8B parameter llama 3 model with 4bit quantization that easily runs on my Macbook M1</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a>(<span class="bu">def</span><span class="fu"> model-path </span><span class="st">&quot;/Users/jon/development/cpp/llama.cpp/models/Meta-Llama-3-8B-Instruct.Q4_0.gguf&quot;</span>)</span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a>(<span class="bu">def</span><span class="fu"> llama-context </span>(llama/create-context model-path {}))</span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a>(<span class="bu">def</span><span class="fu"> inquiries</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a>  [{<span class="at">:classification</span> <span class="st">&quot;Order Inquiry&quot;</span> <span class="at">:inquiry</span> <span class="st">&quot;Where is my order? It was supposed to arrive yesterday.&quot;</span>}</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a>   {<span class="at">:classification</span> <span class="st">&quot;Refund Request&quot;</span> <span class="at">:inquiry</span> <span class="st">&quot;I want to return an item and get a refund. Can you help me with that?&quot;</span>}</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a>   {<span class="at">:classification</span> <span class="st">&quot;General Feedback&quot;</span> <span class="at">:inquiry</span> <span class="st">&quot;I think your website could be more user-friendly.&quot;</span>}</span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a>   <span class="co">;; ... many more omitted</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a>])</span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a>(<span class="bu">def</span><span class="fu"> classifications </span>[<span class="st">&quot;Order Inquiry&quot;</span> <span class="st">&quot;Refund Request&quot;</span> <span class="st">&quot;General Feedback&quot;</span>])</span></code></pre></div>
<p>My setup was to run each inquiry through a classify function and attach the experimental result alongside the actual classification so that I could compare them.</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode clojure"><code class="sourceCode clojure"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> naive-classify </span>[inquiry]</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>  (llama/generate-string</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>   llama-context</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a>   (llama3-inquiry inquiry)))</span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> naive-greedy-classify </span>[inquiry]</span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a>  (llama/generate-string</span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a>   llama-context</span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a>   (llama3-inquiry inquiry)</span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a>   {<span class="at">:samplef</span> llama/sample-logits-greedy}))</span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> classify-inquiries </span>[classify-fn inquiries]</span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a>  (<span class="kw">map</span> (<span class="kw">fn</span> [inquiry]</span>
<span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a>         (<span class="kw">assoc</span> inquiry</span>
<span id="cb2-15"><a href="#cb2-15" aria-hidden="true" tabindex="-1"></a>                <span class="at">:experimental-classification</span></span>
<span id="cb2-16"><a href="#cb2-16" aria-hidden="true" tabindex="-1"></a>                (classify-fn (<span class="at">:inquiry</span> inquiry))))</span>
<span id="cb2-17"><a href="#cb2-17" aria-hidden="true" tabindex="-1"></a>       inquiries))</span>
<span id="cb2-18"><a href="#cb2-18" aria-hidden="true" tabindex="-1"></a>       </span>
<span id="cb2-19"><a href="#cb2-19" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> correct-results </span>[results]</span>
<span id="cb2-20"><a href="#cb2-20" aria-hidden="true" tabindex="-1"></a>  (<span class="kw">filter</span> #(<span class="kw">=</span> (<span class="at">:classification</span> <span class="va">%</span>) (<span class="at">:experimental-classification</span> <span class="va">%</span>)) results))</span>
<span id="cb2-21"><a href="#cb2-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-22"><a href="#cb2-22" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> incorrect-results </span>[results]</span>
<span id="cb2-23"><a href="#cb2-23" aria-hidden="true" tabindex="-1"></a>  (<span class="kw">filter</span> #(<span class="kw">not=</span> (<span class="at">:classification</span> <span class="va">%</span>) (<span class="at">:experimental-classification</span> <span class="va">%</span>)) results))</span>
<span id="cb2-24"><a href="#cb2-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-25"><a href="#cb2-25" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> summary-results </span>[results]</span>
<span id="cb2-26"><a href="#cb2-26" aria-hidden="true" tabindex="-1"></a>  {<span class="at">:correct</span> (<span class="kw">count</span> (correct-results results))</span>
<span id="cb2-27"><a href="#cb2-27" aria-hidden="true" tabindex="-1"></a>   <span class="at">:incorrect</span> (<span class="kw">count</span> (incorrect-results results))</span>
<span id="cb2-28"><a href="#cb2-28" aria-hidden="true" tabindex="-1"></a>   <span class="at">:accuracy</span> (<span class="kw">/</span> (<span class="kw">count</span> (correct-results results)) (<span class="kw">count</span> results))})</span></code></pre></div>
<p>llama.clj provides a few different sampling functions that you can use to generate text. Sampling functions are the means by which the text tokens which form the response are selected. In addition to giving a few options for pre-defined functions, the library also gives you the ability to create your own. For my experiments, I started with mirostat v2 (which <code>naive-classify</code> above uses, since it is llama.clj’s default) and a greedy sampling function (which <code>naive-greedy-classify</code> requests via the <code>:samplef</code> key).</p>
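<p>For intuition, the difference between a greedy sampler and a probabilistic one can be sketched in a few lines of Python. This is a toy illustration over a hand-written list of logits, not llama.clj’s implementation. The function names are made up for the example, and the library’s default sampler is more involved than the plain weighted draw shown here.</p>

```python
import math
import random


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_greedy(logits):
    """Always pick the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_random(logits, rng):
    """Pick a token index at random, weighted by its softmax probability."""
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [0.1, 2.5, -1.0]  # made-up scores for a three-token vocabulary
print(sample_greedy(logits))                    # deterministic
print(sample_random(logits, random.Random(0)))  # depends on the seed
```

<p>Greedy sampling is deterministic for a given context, while a weighted sampler can return different tokens from run to run.</p>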
<p>I noticed I would generally get about 92% accuracy with the model, data, and prompts I used. When I looked at the incorrect results, I noticed that many of them came from responses that didn’t follow the specific format I asked for, e.g. returning something like “[Order Inquiry]” instead of just “Order Inquiry”. Since llama.cpp is feeding you text straight from the model, you have to get a little creative to deal with this. Both llama.cpp and llama.clj provide different ways to constrain models to produce valid JSON, which you can then further validate. I decided to try constraining responses by first just retrying if the response does not belong to the set of strings I’m looking for:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode clojure"><code class="sourceCode clojure"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> retry-classify </span>[inquiry]</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>  (<span class="kw">loop</span> [attempts <span class="dv">0</span>]</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>    (<span class="kw">let</span> [response (llama/generate-string</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>                    llama-context</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a>                    (llama3-inquiry inquiry))]</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a>      (<span class="kw">if</span> (<span class="kw">some</span> #(<span class="kw">=</span> response <span class="va">%</span>) classifications)</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a>        response</span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a>        (<span class="kw">if</span> (<span class="kw">&gt;</span> attempts <span class="dv">3</span>)</span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a>          <span class="va">nil</span></span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a>          (<span class="kw">recur</span> (<span class="kw">inc</span> attempts)))))))</span></code></pre></div>
<p>This took care of those cases nicely and brought accuracy up to about 95%. It didn’t noticeably decrease performance because it doesn’t retry often - only in specific failure cases.</p>
<p>One potential issue with retrying is that if the model ever went haywire and started filling the context window with garbage (which does happen occasionally on some models), it would take inordinate amounts of time to generate responses and fail, which would slow everything to a crawl. To avoid this situation, I decided to try hand-rolling a greedy sampling function that can only ever produce a valid classification. The way that llama.cpp works is that it determines the relative probability of every possible text token at each point in the response. What I wanted my sampling function to do was constrain the response by forcing it to choose only tokens that appear in the pre-defined categories (defined by <code>classifications</code> above). For the first generated token, it can only select the first token of one of those classifications, and it picks whichever of those candidates the model deems most likely.</p>
<p>From there, at each step, we keep only the classifications which start with the response tokens we’ve already accumulated, and repeat the process until we have a full response.</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode clojure"><code class="sourceCode clojure"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> greedy-constrained-classify </span>[inquiry]</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>  (<span class="kw">let</span> [prompt (llama3-prompt (<span class="kw">str</span> <span class="st">&quot;Inquiries can be one of &#39;Order Inquiry&#39;, &#39;Refund Request&#39;, or &#39;General Feedback&#39;. What is the classification of the following inquiry? Reply with only the classification and nothing else: </span><span class="sc">\&quot;</span><span class="st">&quot;</span> inquiry <span class="st">&quot;</span><span class="sc">\&quot;</span><span class="st">&quot;</span>))</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>        prompt-tokens (llutil/tokenize llama-context prompt)]</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a>    (llama/llama-update llama-context (llama/bos) <span class="dv">0</span>)</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a>    (<span class="kw">doseq</span> [token prompt-tokens]</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a>      (llama/llama-update llama-context token))</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a>    (<span class="kw">loop</span> [classification-tokens (<span class="kw">map</span> #(llutil/tokenize llama-context <span class="va">%</span>) classifications)</span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a>           acc []]</span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a>      (<span class="kw">let</span> [logits (llama/get-logits llama-context)</span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a>            valid-tokens (<span class="kw">map</span> <span class="kw">first</span> classification-tokens)</span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a>            token (<span class="kw">-&gt;&gt;</span> logits</span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a>                       (map-indexed (<span class="kw">fn</span> [idx p]</span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a>                                      [idx p]))</span>
<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a>                       (<span class="kw">filter</span> #(<span class="kw">contains?</span> (<span class="kw">set</span> valid-tokens) (<span class="kw">first</span> <span class="va">%</span>)))</span>
<span id="cb4-15"><a href="#cb4-15" aria-hidden="true" tabindex="-1"></a>                       (<span class="kw">apply</span> <span class="kw">max-key</span> <span class="kw">second</span>)</span>
<span id="cb4-16"><a href="#cb4-16" aria-hidden="true" tabindex="-1"></a>                       <span class="kw">first</span>)</span>
<span id="cb4-17"><a href="#cb4-17" aria-hidden="true" tabindex="-1"></a>            next-acc (<span class="kw">conj</span> acc token)</span>
<span id="cb4-18"><a href="#cb4-18" aria-hidden="true" tabindex="-1"></a>            next-classification-tokens (<span class="kw">-&gt;&gt;</span> classification-tokens</span>
<span id="cb4-19"><a href="#cb4-19" aria-hidden="true" tabindex="-1"></a>                                            (<span class="kw">filter</span> #(<span class="kw">=</span> (<span class="kw">first</span> <span class="va">%</span>) token))</span>
<span id="cb4-20"><a href="#cb4-20" aria-hidden="true" tabindex="-1"></a>                                            (<span class="kw">map</span> <span class="kw">rest</span>)</span>
<span id="cb4-21"><a href="#cb4-21" aria-hidden="true" tabindex="-1"></a>                                            (<span class="kw">map</span> <span class="kw">vec</span>)</span>
<span id="cb4-22"><a href="#cb4-22" aria-hidden="true" tabindex="-1"></a>                                            (<span class="kw">into</span> []))]</span>
<span id="cb4-23"><a href="#cb4-23" aria-hidden="true" tabindex="-1"></a>        (<span class="kw">if</span> (<span class="kw">or</span> (<span class="kw">empty?</span> next-classification-tokens) (<span class="kw">every?</span> <span class="kw">empty?</span> next-classification-tokens))</span>
<span id="cb4-24"><a href="#cb4-24" aria-hidden="true" tabindex="-1"></a>          (llutil/untokenize llama-context next-acc)</span>
<span id="cb4-25"><a href="#cb4-25" aria-hidden="true" tabindex="-1"></a>          (<span class="kw">do</span></span>
<span id="cb4-26"><a href="#cb4-26" aria-hidden="true" tabindex="-1"></a>            (llama/llama-update llama-context token)</span>
<span id="cb4-27"><a href="#cb4-27" aria-hidden="true" tabindex="-1"></a>            (<span class="kw">recur</span> next-classification-tokens next-acc)))))))</span></code></pre></div>
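<p>Stripped of the llama.clj calls, the same filtering idea can be sketched in plain Python. This is only an illustration under assumptions: <code>fake_logits</code> stands in for the model, the token ids are invented, and <code>constrained_greedy</code> is a hypothetical name rather than anything from llama.clj or llama.cpp.</p>

```python
def constrained_greedy(next_logits, candidates):
    """Greedily pick tokens, allowing only tokens that continue a candidate.

    next_logits: callable taking the tokens chosen so far and returning a
                 dict of token id to score (a stand-in for the model).
    candidates:  list of token-id sequences, one per allowed classification.
    """
    remaining = [list(c) for c in candidates]
    chosen = []
    # Stop once no candidate has any tokens left to emit.
    while remaining and any(remaining):
        logits = next_logits(chosen)
        # Only the next token of each still-matching candidate is allowed.
        allowed = {seq[0] for seq in remaining if seq}
        token = max(allowed, key=lambda t: logits[t])
        chosen.append(token)
        # Keep candidates that start with the chosen token, minus that token.
        remaining = [seq[1:] for seq in remaining if seq and seq[0] == token]
    return chosen


def fake_logits(chosen):
    # Invented scores: token 99 is the favourite overall,
    # but it is not part of any candidate so it can never be picked.
    return {10: 0.2, 11: 1.5, 12: 0.7, 20: 0.1,
            21: 0.3, 22: 0.0, 23: 0.0, 99: 5.0}


# Invented token ids standing in for the three tokenized classifications.
candidates = [[10, 20], [11, 21], [12, 22, 23]]
result = constrained_greedy(fake_logits, candidates)
print(result)  # [11, 21]
```

<p>The first pick decides everything here: once token 11 wins, only the second candidate survives the filter. The Clojure function above follows the same steps against the real model’s logits.</p>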
<p>Unfortunately, in my testing, the constrained sampler brought accuracy way down to around 84%. When I dug into the issue, it seemed that in failing cases, the very first response token it generates is for the wrong classification. It’s kind of hard for me to tell what the issue is without more knowledge about llama.cpp or how the wrapper works. It’s possible that there may be an issue with the initial context setup while reading in the prompt. I emulated the setup steps that I found in llama.clj’s documentation, but perhaps llama.cpp has changed since the repository was written (the examples and tutorials were written with llama 2), or maybe llama 3 is fundamentally different and requires different processing from llama 2, or maybe I overlooked some other mistake in my code.</p>
<p>I’m planning to dig into llama.cpp more in depth next to learn more and continue experimenting. It would be interesting to look at llama.cpp’s internal mechanics to see why my sampling function failed. Before that, I think it might be a good idea to work through <a href="https://karpathy.ai/zero-to-hero.html">Andrej Karpathy’s neural network course</a> to try to get some more context and understanding of how these libraries work under the hood. It probably isn’t strictly necessary, but the more domain knowledge you have, the easier it is to understand the architecture of software projects.</p>
<p>Overall, llama.clj is extremely fun to learn and sketch out ideas with. Even without much knowledge about LLMs, you could probably use it to put together some interesting and effective applications. I came in expecting just a dry but functional wrapper to llama.cpp, but the documentation was fantastic and I learned a lot of new techniques and concepts from the Clojure code used throughout the project. From the little bit of llama.cpp that I looked at, it didn’t really seem to strive to maintain API stability, so it seems like it must be a challenge for wrapper libraries such as llama.clj to keep up to date. In any case, I had a great time and I’m looking forward to learning more about training and inference.</p>
    </section>
    <section class="comment-footer">
        <a href="mailto:eskinjp@gmail.com?subject=Re: Leveraging local AI for task automation">Comment via email</a>
    </section>
</article>
]]></description>
    <pubDate>Sun, 30 Jun 2024 00:00:00 UT</pubDate>
    <guid>https://jeskin.net/blog/leveraging-local-ai-for-task-automation.html</guid>
    <dc:creator>Jon Eskin</dc:creator>
</item>
<item>
    <title>Building llama chat in Go and Clojure</title>
    <link>https://jeskin.net/blog/llama-chat.html</link>
    <description><![CDATA[<article>
    <section class="header">
        Posted on June 20, 2024
        
            by Jon Eskin
        
    </section>
    <section>
        <p><code>ollama</code> is a software project which makes it easy to run LLMs on your local machine. Running <code>ollama run llama3</code> downloaded a 4-bit quantized model that could run on my Macbook M2, and then dropped me into a CLI where I could enter prompts and watch responses stream into my terminal.</p>
<center>
<video width=75% controls autoplay>
<source src="/videos/LlamaTerminal.m4v" type="video/mp4">
Your browser does not support the video tag.
</video>
</center>
<p>In the above clip, I’m running Meta’s latest open source model <a href="https://ai.meta.com/blog/meta-llama-3/">Llama 3</a>. These models are less powerful than OpenAI’s models (mainly GPT-4o, GPT-4, and GPT-3.5 Turbo as of writing), but they pack a serious punch. The fact that a model that can run on a laptop can get within throwing distance of GPT-3.5, which powered all of ChatGPT not long ago, is pretty insane.</p>
<p>There are a number of reasons you might want to run your own local AI instead of something like ChatGPT:</p>
<ol type="1">
<li>Privacy. With ChatGPT, OpenAI has access to every query and response you send. When someone else controls your data, despite their best intentions, sometimes it will leak. In OpenAI’s case, <a href="https://www.pluralsight.com/blog/security-professional/chatgpt-data-breach">it has already happened</a> at least once. Depending on the sensitivity of your prompts, this may be more or less of a concern.</li>
<li>Flexibility. Using your own models allows you to use special purpose models better suited for individual tasks. You can also fine-tune open source models on your own hardware, which might be incredibly useful if you have a lot of organizational data you would like the model to recognize.</li>
</ol>
<p>The project’s README shows some of its capabilities:</p>
<hr />
<h2 id="rest-api">REST API</h2>
<p>Ollama has a REST API for running and managing models.</p>
<h3 id="generate-a-response">Generate a response</h3>
<pre><code>curl http://localhost:11434/api/generate -d &#39;{
  &quot;model&quot;: &quot;llama3&quot;,
  &quot;prompt&quot;:&quot;Why is the sky blue?&quot;
}&#39;</code></pre>
<h3 id="chat-with-a-model">Chat with a model</h3>
<pre><code>curl http://localhost:11434/api/chat -d &#39;{
  &quot;model&quot;: &quot;llama3&quot;,
  &quot;messages&quot;: [
    { &quot;role&quot;: &quot;user&quot;, &quot;content&quot;: &quot;why is the sky blue?&quot; }
  ]
}&#39;</code></pre>
<p>See the <a href="./docs/api.md">API documentation</a> for all endpoints.</p>
<hr />
<p>We can try one of those curl commands and see what the responses look like:</p>
<pre><code>{&quot;model&quot;:&quot;llama3&quot;,&quot;created_at&quot;:&quot;2024-06-20T00:25:35.629748Z&quot;,&quot;response&quot;:&quot;The&quot;,&quot;done&quot;:false}
{&quot;model&quot;:&quot;llama3&quot;,&quot;created_at&quot;:&quot;2024-06-20T00:25:35.662856Z&quot;,&quot;response&quot;:&quot; sky&quot;,&quot;done&quot;:false}
{&quot;model&quot;:&quot;llama3&quot;,&quot;created_at&quot;:&quot;2024-06-20T00:25:35.695868Z&quot;,&quot;response&quot;:&quot; appears&quot;,&quot;done&quot;:false}
{&quot;model&quot;:&quot;llama3&quot;,&quot;created_at&quot;:&quot;2024-06-20T00:25:35.729704Z&quot;,&quot;response&quot;:&quot; blue&quot;,&quot;done&quot;:false}
{&quot;model&quot;:&quot;llama3&quot;,&quot;created_at&quot;:&quot;2024-06-20T00:25:35.763045Z&quot;,&quot;response&quot;:&quot; because&quot;,&quot;done&quot;:false}
...</code></pre>
<p>These stream in over time instead of being dumped out when the message is complete. Here’s the code from ollama that’s generating the stream:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode go"><code class="sourceCode go"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="kw">func</span> streamResponse<span class="op">(</span>c <span class="op">*</span>gin<span class="op">.</span>Context<span class="op">,</span> ch <span class="kw">chan</span> <span class="dt">any</span><span class="op">)</span> <span class="op">{</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>	c<span class="op">.</span>Header<span class="op">(</span><span class="st">&quot;Content-Type&quot;</span><span class="op">,</span> <span class="st">&quot;application/x-ndjson&quot;</span><span class="op">)</span></span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>	c<span class="op">.</span>Stream<span class="op">(</span><span class="kw">func</span><span class="op">(</span>w io<span class="op">.</span>Writer<span class="op">)</span> <span class="dt">bool</span> <span class="op">{</span></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a>		val<span class="op">,</span> ok <span class="op">:=</span> <span class="op">&lt;-</span>ch</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a>		<span class="cf">if</span> <span class="op">!</span>ok <span class="op">{</span></span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a>			<span class="cf">return</span> <span class="ot">false</span></span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a>		<span class="op">}</span></span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a>		bts<span class="op">,</span> err <span class="op">:=</span> json<span class="op">.</span>Marshal<span class="op">(</span>val<span class="op">)</span></span>
<span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a>		<span class="cf">if</span> err <span class="op">!=</span> <span class="ot">nil</span> <span class="op">{</span></span>
<span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a>			slog<span class="op">.</span>Info<span class="op">(</span>fmt<span class="op">.</span>Sprintf<span class="op">(</span><span class="st">&quot;streamResponse: json.Marshal failed with %s&quot;</span><span class="op">,</span> err<span class="op">))</span></span>
<span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a>			<span class="cf">return</span> <span class="ot">false</span></span>
<span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a>		<span class="op">}</span></span>
<span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-15"><a href="#cb4-15" aria-hidden="true" tabindex="-1"></a>		<span class="co">// Delineate chunks with new-line delimiter</span></span>
<span id="cb4-16"><a href="#cb4-16" aria-hidden="true" tabindex="-1"></a>		bts <span class="op">=</span> <span class="bu">append</span><span class="op">(</span>bts<span class="op">,</span> <span class="ch">&#39;\n&#39;</span><span class="op">)</span></span>
<span id="cb4-17"><a href="#cb4-17" aria-hidden="true" tabindex="-1"></a>		<span class="cf">if</span> _<span class="op">,</span> err <span class="op">:=</span> w<span class="op">.</span>Write<span class="op">(</span>bts<span class="op">);</span> err <span class="op">!=</span> <span class="ot">nil</span> <span class="op">{</span></span>
<span id="cb4-18"><a href="#cb4-18" aria-hidden="true" tabindex="-1"></a>			slog<span class="op">.</span>Info<span class="op">(</span>fmt<span class="op">.</span>Sprintf<span class="op">(</span><span class="st">&quot;streamResponse: w.Write failed with %s&quot;</span><span class="op">,</span> err<span class="op">))</span></span>
<span id="cb4-19"><a href="#cb4-19" aria-hidden="true" tabindex="-1"></a>			<span class="cf">return</span> <span class="ot">false</span></span>
<span id="cb4-20"><a href="#cb4-20" aria-hidden="true" tabindex="-1"></a>		<span class="op">}</span></span>
<span id="cb4-21"><a href="#cb4-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb4-22"><a href="#cb4-22" aria-hidden="true" tabindex="-1"></a>		<span class="cf">return</span> <span class="ot">true</span></span>
<span id="cb4-23"><a href="#cb4-23" aria-hidden="true" tabindex="-1"></a>	<span class="op">})</span></span>
<span id="cb4-24"><a href="#cb4-24" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p><code>application/x-ndjson</code> means “newline-delimited JSON”: each message is a complete JSON document on its own line. This detail makes it easy to tell where one message ends and the next begins. Since the messages are separated by newlines, you can process them with the line-reading functionality that’s present in many languages.</p>
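<p>As a rough sketch of what that line reading can look like in Go, the snippet below scans a newline-delimited body one line at a time and stitches the text back together. The <code>chunk</code> type and the <code>joinNDJSON</code> helper are names I made up for illustration, and the <code>response</code> and <code>done</code> fields are assumptions about the shape of each message.</p>

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// chunk holds the two fields this sketch cares about in each streamed message.
// The field names are assumptions about the message shape.
type chunk struct {
	Response string `json:"response"`
	Done     bool   `json:"done"`
}

// joinNDJSON (a hypothetical helper) reads newline-delimited JSON from body
// and concatenates the "response" field of every line.
func joinNDJSON(body string) (string, error) {
	var sb strings.Builder
	scanner := bufio.NewScanner(strings.NewReader(body))
	for scanner.Scan() {
		line := scanner.Bytes()
		if len(line) == 0 {
			continue // tolerate blank lines
		}
		var c chunk
		if err := json.Unmarshal(line, &c); err != nil {
			return "", fmt.Errorf("decoding line %q: %w", line, err)
		}
		sb.WriteString(c.Response)
		if c.Done {
			break
		}
	}
	return sb.String(), scanner.Err()
}

func main() {
	body := `{"response":"Hel","done":false}` + "\n" +
		`{"response":"lo","done":true}` + "\n"
	text, err := joinNDJSON(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(text) // prints "Hello"
}
```

<p>A real client would hand the scanner the HTTP response body instead of a string, but the loop would look the same.</p>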
<h1 id="a-simple-chatgpt-clone">A Simple ChatGPT Clone</h1>
<p>For fun, we can build a simple ChatGPT-style web application on top of this API with a few components:</p>
<ul>
<li>an HTML page that presents a form to collect a prompt with a button for submission, plus JavaScript that sends the prompt, listens for a response, and writes the responses to the page as they stream in</li>
<li>a server with two endpoints:
<ul>
<li>one rendering the page above,</li>
<li>one that receives a prompt, forwards it to ollama, and streams the response back to the client.</li>
</ul></li>
</ul>
<p>With that in mind, let’s get to work!</p>
<h1 id="go-implementation">Go Implementation</h1>
<p>We can do everything in Go using just the standard library.</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode go"><code class="sourceCode go"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="kw">package</span> main</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="kw">import</span> <span class="op">(</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>	<span class="st">&quot;bytes&quot;</span></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a>	<span class="st">&quot;encoding/json&quot;</span></span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a>	<span class="st">&quot;fmt&quot;</span></span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a>	<span class="st">&quot;html/template&quot;</span></span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a>	<span class="st">&quot;io&quot;</span></span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a>	<span class="st">&quot;log&quot;</span></span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a>	<span class="st">&quot;net/http&quot;</span></span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a><span class="op">)</span></span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a><span class="kw">func</span> main<span class="op">()</span> <span class="op">{</span></span>
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a>	tmpl <span class="op">:=</span> template<span class="op">.</span>Must<span class="op">(</span>template<span class="op">.</span>ParseFiles<span class="op">(</span><span class="st">&quot;index.html&quot;</span><span class="op">))</span></span>
<span id="cb5-15"><a href="#cb5-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-16"><a href="#cb5-16" aria-hidden="true" tabindex="-1"></a>	http<span class="op">.</span>HandleFunc<span class="op">(</span><span class="st">&quot;/&quot;</span><span class="op">,</span> <span class="kw">func</span><span class="op">(</span>w http<span class="op">.</span>ResponseWriter<span class="op">,</span> r <span class="op">*</span>http<span class="op">.</span>Request<span class="op">)</span> <span class="op">{</span></span>
<span id="cb5-17"><a href="#cb5-17" aria-hidden="true" tabindex="-1"></a>		<span class="cf">if</span> err <span class="op">:=</span> tmpl<span class="op">.</span>Execute<span class="op">(</span>w<span class="op">,</span> <span class="ot">nil</span><span class="op">);</span> err <span class="op">!=</span> <span class="ot">nil</span> <span class="op">{</span></span>
<span id="cb5-18"><a href="#cb5-18" aria-hidden="true" tabindex="-1"></a>			http<span class="op">.</span>Error<span class="op">(</span>w<span class="op">,</span> err<span class="op">.</span>Error<span class="op">(),</span> http<span class="op">.</span>StatusInternalServerError<span class="op">)</span></span>
<span id="cb5-19"><a href="#cb5-19" aria-hidden="true" tabindex="-1"></a>		<span class="op">}</span></span>
<span id="cb5-20"><a href="#cb5-20" aria-hidden="true" tabindex="-1"></a>	<span class="op">})</span></span>
<span id="cb5-21"><a href="#cb5-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-22"><a href="#cb5-22" aria-hidden="true" tabindex="-1"></a>	http<span class="op">.</span>HandleFunc<span class="op">(</span><span class="st">&quot;/api/generate&quot;</span><span class="op">,</span> <span class="kw">func</span><span class="op">(</span>w http<span class="op">.</span>ResponseWriter<span class="op">,</span> r <span class="op">*</span>http<span class="op">.</span>Request<span class="op">)</span> <span class="op">{</span></span>
<span id="cb5-23"><a href="#cb5-23" aria-hidden="true" tabindex="-1"></a>		<span class="kw">var</span> request <span class="kw">struct</span> <span class="op">{</span></span>
<span id="cb5-24"><a href="#cb5-24" aria-hidden="true" tabindex="-1"></a>			Model  <span class="dt">string</span> <span class="st">`json:&quot;model&quot;`</span></span>
<span id="cb5-25"><a href="#cb5-25" aria-hidden="true" tabindex="-1"></a>			Prompt <span class="dt">string</span> <span class="st">`json:&quot;prompt&quot;`</span></span>
<span id="cb5-26"><a href="#cb5-26" aria-hidden="true" tabindex="-1"></a>		<span class="op">}</span></span>
<span id="cb5-27"><a href="#cb5-27" aria-hidden="true" tabindex="-1"></a>		<span class="cf">if</span> err <span class="op">:=</span> json<span class="op">.</span>NewDecoder<span class="op">(</span>r<span class="op">.</span>Body<span class="op">).</span>Decode<span class="op">(&amp;</span>request<span class="op">);</span> err <span class="op">!=</span> <span class="ot">nil</span> <span class="op">{</span></span>
<span id="cb5-28"><a href="#cb5-28" aria-hidden="true" tabindex="-1"></a>			http<span class="op">.</span>Error<span class="op">(</span>w<span class="op">,</span> err<span class="op">.</span>Error<span class="op">(),</span> http<span class="op">.</span>StatusBadRequest<span class="op">)</span></span>
<span id="cb5-29"><a href="#cb5-29" aria-hidden="true" tabindex="-1"></a>			<span class="cf">return</span></span>
<span id="cb5-30"><a href="#cb5-30" aria-hidden="true" tabindex="-1"></a>		<span class="op">}</span></span>
<span id="cb5-31"><a href="#cb5-31" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-32"><a href="#cb5-32" aria-hidden="true" tabindex="-1"></a>		payload <span class="op">:=</span> <span class="kw">map</span><span class="op">[</span><span class="dt">string</span><span class="op">]</span><span class="dt">string</span><span class="op">{</span></span>
<span id="cb5-33"><a href="#cb5-33" aria-hidden="true" tabindex="-1"></a>			<span class="st">&quot;model&quot;</span><span class="op">:</span>  request<span class="op">.</span>Model<span class="op">,</span></span>
<span id="cb5-34"><a href="#cb5-34" aria-hidden="true" tabindex="-1"></a>			<span class="st">&quot;prompt&quot;</span><span class="op">:</span> request<span class="op">.</span>Prompt<span class="op">,</span></span>
<span id="cb5-35"><a href="#cb5-35" aria-hidden="true" tabindex="-1"></a>		<span class="op">}</span></span>
<span id="cb5-36"><a href="#cb5-36" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-37"><a href="#cb5-37" aria-hidden="true" tabindex="-1"></a>		payloadBytes<span class="op">,</span> err <span class="op">:=</span> json<span class="op">.</span>Marshal<span class="op">(</span>payload<span class="op">)</span></span>
<span id="cb5-38"><a href="#cb5-38" aria-hidden="true" tabindex="-1"></a>		<span class="cf">if</span> err <span class="op">!=</span> <span class="ot">nil</span> <span class="op">{</span></span>
<span id="cb5-39"><a href="#cb5-39" aria-hidden="true" tabindex="-1"></a>			http<span class="op">.</span>Error<span class="op">(</span>w<span class="op">,</span> err<span class="op">.</span>Error<span class="op">(),</span> http<span class="op">.</span>StatusInternalServerError<span class="op">)</span></span>
<span id="cb5-40"><a href="#cb5-40" aria-hidden="true" tabindex="-1"></a>			<span class="cf">return</span></span>
<span id="cb5-41"><a href="#cb5-41" aria-hidden="true" tabindex="-1"></a>		<span class="op">}</span></span>
<span id="cb5-42"><a href="#cb5-42" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-43"><a href="#cb5-43" aria-hidden="true" tabindex="-1"></a>		resp<span class="op">,</span> err <span class="op">:=</span> http<span class="op">.</span>Post<span class="op">(</span><span class="st">&quot;http://localhost:11434/api/generate&quot;</span><span class="op">,</span> <span class="st">&quot;application/json&quot;</span><span class="op">,</span> bytes<span class="op">.</span>NewBuffer<span class="op">(</span>payloadBytes<span class="op">))</span></span>
<span id="cb5-44"><a href="#cb5-44" aria-hidden="true" tabindex="-1"></a>		<span class="cf">if</span> err <span class="op">!=</span> <span class="ot">nil</span> <span class="op">{</span></span>
<span id="cb5-45"><a href="#cb5-45" aria-hidden="true" tabindex="-1"></a>			http<span class="op">.</span>Error<span class="op">(</span>w<span class="op">,</span> err<span class="op">.</span>Error<span class="op">(),</span> http<span class="op">.</span>StatusInternalServerError<span class="op">)</span></span>
<span id="cb5-46"><a href="#cb5-46" aria-hidden="true" tabindex="-1"></a>			<span class="cf">return</span></span>
<span id="cb5-47"><a href="#cb5-47" aria-hidden="true" tabindex="-1"></a>		<span class="op">}</span></span>
<span id="cb5-48"><a href="#cb5-48" aria-hidden="true" tabindex="-1"></a>		<span class="cf">defer</span> resp<span class="op">.</span>Body<span class="op">.</span>Close<span class="op">()</span></span>
<span id="cb5-49"><a href="#cb5-49" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-50"><a href="#cb5-50" aria-hidden="true" tabindex="-1"></a>		w<span class="op">.</span>Header<span class="op">().</span>Set<span class="op">(</span><span class="st">&quot;Content-Type&quot;</span><span class="op">,</span> <span class="st">&quot;application/json&quot;</span><span class="op">)</span></span>
<span id="cb5-51"><a href="#cb5-51" aria-hidden="true" tabindex="-1"></a>		w<span class="op">.</span>WriteHeader<span class="op">(</span>http<span class="op">.</span>StatusOK<span class="op">)</span></span>
<span id="cb5-52"><a href="#cb5-52" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-53"><a href="#cb5-53" aria-hidden="true" tabindex="-1"></a>		decoder <span class="op">:=</span> json<span class="op">.</span>NewDecoder<span class="op">(</span>resp<span class="op">.</span>Body<span class="op">)</span></span>
<span id="cb5-54"><a href="#cb5-54" aria-hidden="true" tabindex="-1"></a>		<span class="cf">for</span> <span class="op">{</span></span>
<span id="cb5-55"><a href="#cb5-55" aria-hidden="true" tabindex="-1"></a>			<span class="kw">var</span> response <span class="kw">map</span><span class="op">[</span><span class="dt">string</span><span class="op">]</span><span class="kw">interface</span><span class="op">{}</span></span>
<span id="cb5-56"><a href="#cb5-56" aria-hidden="true" tabindex="-1"></a>			<span class="cf">if</span> err <span class="op">:=</span> decoder<span class="op">.</span>Decode<span class="op">(&amp;</span>response<span class="op">);</span> err <span class="op">==</span> io<span class="op">.</span>EOF <span class="op">{</span></span>
<span id="cb5-57"><a href="#cb5-57" aria-hidden="true" tabindex="-1"></a>				<span class="cf">break</span></span>
<span id="cb5-58"><a href="#cb5-58" aria-hidden="true" tabindex="-1"></a>			<span class="op">}</span> <span class="cf">else</span> <span class="cf">if</span> err <span class="op">!=</span> <span class="ot">nil</span> <span class="op">{</span></span>
<span id="cb5-59"><a href="#cb5-59" aria-hidden="true" tabindex="-1"></a>				log<span class="op">.</span>Printf<span class="op">(</span><span class="st">&quot;decoding ollama response: %v&quot;</span><span class="op">,</span> err<span class="op">)</span> <span class="co">// status already sent, so just log</span></span>
<span id="cb5-60"><a href="#cb5-60" aria-hidden="true" tabindex="-1"></a>				<span class="cf">return</span></span>
<span id="cb5-61"><a href="#cb5-61" aria-hidden="true" tabindex="-1"></a>			<span class="op">}</span></span>
<span id="cb5-62"><a href="#cb5-62" aria-hidden="true" tabindex="-1"></a>			<span class="cf">if</span> err <span class="op">:=</span> json<span class="op">.</span>NewEncoder<span class="op">(</span>w<span class="op">).</span>Encode<span class="op">(</span>response<span class="op">);</span> err <span class="op">!=</span> <span class="ot">nil</span> <span class="op">{</span></span>
<span id="cb5-63"><a href="#cb5-63" aria-hidden="true" tabindex="-1"></a>				log<span class="op">.</span>Printf<span class="op">(</span><span class="st">&quot;writing to client: %v&quot;</span><span class="op">,</span> err<span class="op">)</span> <span class="co">// status already sent, so just log</span></span>
<span id="cb5-64"><a href="#cb5-64" aria-hidden="true" tabindex="-1"></a>				<span class="cf">return</span></span>
<span id="cb5-65"><a href="#cb5-65" aria-hidden="true" tabindex="-1"></a>			<span class="op">}</span></span>
<span id="cb5-66"><a href="#cb5-66" aria-hidden="true" tabindex="-1"></a>			w<span class="op">.(</span>http<span class="op">.</span>Flusher<span class="op">).</span>Flush<span class="op">()</span></span>
<span id="cb5-67"><a href="#cb5-67" aria-hidden="true" tabindex="-1"></a>		<span class="op">}</span></span>
<span id="cb5-68"><a href="#cb5-68" aria-hidden="true" tabindex="-1"></a>	<span class="op">})</span></span>
<span id="cb5-69"><a href="#cb5-69" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-70"><a href="#cb5-70" aria-hidden="true" tabindex="-1"></a>	fmt<span class="op">.</span>Println<span class="op">(</span><span class="st">&quot;Server is running on http://localhost:8080&quot;</span><span class="op">)</span></span>
<span id="cb5-71"><a href="#cb5-71" aria-hidden="true" tabindex="-1"></a>	log<span class="op">.</span>Fatal<span class="op">(</span>http<span class="op">.</span>ListenAndServe<span class="op">(</span><span class="st">&quot;:8080&quot;</span><span class="op">,</span> <span class="ot">nil</span><span class="op">))</span></span>
<span id="cb5-72"><a href="#cb5-72" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>Handling the chunked JSON is done in that last block with the decoder. The decoder is a state machine that handles its own buffering. The <code>for</code> loop checks for error conditions and, if none are found, re-encodes the message and flushes it to the client, which in this case is a JavaScript program running inside the browser. As the JavaScript program receives those messages, it writes them on the page.</p>
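<p>To see the decoder’s behavior in isolation, here is a minimal sketch that pulls consecutive JSON values out of a single stream and re-encodes each one. The <code>relay</code> and <code>relayString</code> functions are hypothetical helpers, not part of the app above. The sketch also uses the comma-ok form of the type assertion, so a writer that can’t flush is skipped rather than causing a panic.</p>

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// relay decodes consecutive JSON values from src and re-encodes each one to
// dst, flushing after every value when dst supports it.
// It returns the number of values relayed.
func relay(dst io.Writer, src io.Reader) (int, error) {
	// The comma-ok form avoids the panic an unchecked assertion would cause
	// if dst does not implement http.Flusher.
	flusher, canFlush := dst.(http.Flusher)

	dec := json.NewDecoder(src)
	enc := json.NewEncoder(dst) // Encode appends a newline after each value
	n := 0
	for {
		var msg map[string]interface{}
		err := dec.Decode(&msg)
		if errors.Is(err, io.EOF) {
			return n, nil // clean end of stream
		}
		if err != nil {
			return n, err
		}
		if err := enc.Encode(msg); err != nil {
			return n, err
		}
		if canFlush {
			flusher.Flush()
		}
		n++
	}
}

// relayString makes it possible to try relay without a real connection.
func relayString(in string) (string, int, error) {
	var out bytes.Buffer
	n, err := relay(&out, strings.NewReader(in))
	return out.String(), n, err
}

func main() {
	out, n, err := relayString(`{"response":"Hi"} {"response":"!"}`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d messages\n%s", n, out)
}
```

<p>Note that the decoder doesn’t need the values to be separated by newlines. It finds the boundaries between JSON values on its own.</p>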
<p>Decoding and immediately re-encoding the result looks a little silly; we could have just pointed the client directly at ollama and left all of this out of the server. But in a more realistic deployment, we would not want clients interacting directly with the model - it would be important for the server to sit in between and manage the process.</p>
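<p>As one hypothetical example of what “managing the process” could mean, the server might check each request before forwarding it. Everything in this sketch is an assumption made for illustration: the <code>checkRequest</code> name, the model allowlist, and the size limit are not part of the app above.</p>

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Hypothetical policy values, chosen only for illustration.
const maxPromptBytes = 4096

var allowedModels = map[string]bool{"llama3": true}

// checkRequest decides whether a client request may be forwarded to the model.
func checkRequest(model, prompt string) error {
	if !allowedModels[model] {
		return fmt.Errorf("model %q is not allowed", model)
	}
	if strings.TrimSpace(prompt) == "" {
		return errors.New("prompt is empty")
	}
	if len(prompt) > maxPromptBytes {
		return fmt.Errorf("prompt is %d bytes, limit is %d", len(prompt), maxPromptBytes)
	}
	return nil
}

func main() {
	fmt.Println(checkRequest("llama3", "Why is the sky blue?"))
	fmt.Println(checkRequest("some-other-model", "hello"))
}
```

<p>In the handler, a check like this would run right after decoding the request body, returning a 400 to the client when it fails.</p>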
<center>
<figure>
<video width=75% controls autoplay>
<source src="/videos/LlamaWebapp.m4v" type="video/mp4">
Your browser does not support the video tag.
</video>
<figcaption>
How the webapp looks while running
</figcaption>
</figure>
</center>
<p>That was all it took to build the Go implementation! I like how skimming the docs for standard library packages that sound like what you’re looking for is all it really takes to get up and running in the language. The fact that I can do that when I’m not very experienced in the language is a testament to the quality of its design.</p>
<p>The rest of this post will cover building this same functionality in Clojure instead of Go.</p>
<h1 id="clojure---http-kit-implementation">Clojure - HTTP kit Implementation</h1>
<p>Unlike Go, Clojure does not have a production-grade HTTP server and client available in the standard library.</p>
<p>I used <a href="https://http-kit.github.io">http-kit</a> for this task because it was the smallest library I knew of that could single-handedly meet my requirements: it has synchronous and asynchronous HTTP clients as well as synchronous and asynchronous Ring-compliant web servers. I also chose it because of its reputation for being small and focused with minimal dependencies.</p>
<p>In Clojure, Ring-compliant HTTP servers use handlers: functions that take a request map and return a response map containing keys for various HTTP constructs such as <code>:status</code>, <code>:headers</code>, and <code>:body</code>. I used this as a starting point and built a run-of-the-mill synchronous handler that serves the same page as the Go app above.</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode clojure"><code class="sourceCode clojure"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> read-html-template </span>[]</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>  (<span class="kw">println</span> <span class="st">&quot;fetching html template&quot;</span>)</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a>  (<span class="kw">slurp</span> (io/resource <span class="st">&quot;index.html&quot;</span>)))</span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> index-handler </span>[req]</span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a>  (<span class="kw">println</span> <span class="st">&quot;in index handler&quot;</span>)</span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a>  {<span class="at">:status</span>  <span class="dv">200</span></span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a>   <span class="at">:headers</span> {<span class="st">&quot;Content-Type&quot;</span> <span class="st">&quot;text/html&quot;</span>}</span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a>   <span class="at">:body</span>    (read-html-template)})</span>
<span id="cb6-10"><a href="#cb6-10" aria-hidden="true" tabindex="-1"></a>   </span>
<span id="cb6-11"><a href="#cb6-11" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> not-found-handler </span>[req]</span>
<span id="cb6-12"><a href="#cb6-12" aria-hidden="true" tabindex="-1"></a>  {<span class="at">:status</span>  <span class="dv">404</span></span>
<span id="cb6-13"><a href="#cb6-13" aria-hidden="true" tabindex="-1"></a>   <span class="at">:headers</span> {<span class="st">&quot;Content-Type&quot;</span> <span class="st">&quot;text/plain&quot;</span>}</span>
<span id="cb6-14"><a href="#cb6-14" aria-hidden="true" tabindex="-1"></a>   <span class="at">:body</span>    <span class="st">&quot;Page not found.&quot;</span>})</span>
<span id="cb6-15"><a href="#cb6-15" aria-hidden="true" tabindex="-1"></a>   </span>
<span id="cb6-16"><a href="#cb6-16" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> app </span>[req]</span>
<span id="cb6-17"><a href="#cb6-17" aria-hidden="true" tabindex="-1"></a>  (<span class="kw">let</span> [uri (<span class="at">:uri</span> req)</span>
<span id="cb6-18"><a href="#cb6-18" aria-hidden="true" tabindex="-1"></a>        method (<span class="at">:request-method</span> req)]</span>
<span id="cb6-19"><a href="#cb6-19" aria-hidden="true" tabindex="-1"></a>    (<span class="kw">cond</span></span>
<span id="cb6-20"><a href="#cb6-20" aria-hidden="true" tabindex="-1"></a>      (<span class="kw">and</span> (<span class="kw">=</span> uri <span class="st">&quot;/&quot;</span>) (<span class="kw">=</span> method <span class="at">:get</span>)) (index-handler req)</span>
<span id="cb6-21"><a href="#cb6-21" aria-hidden="true" tabindex="-1"></a>      <span class="at">:else</span> (not-found-handler req)))) </span></code></pre></div>
<p>That all works because you can pass any arbitrary string as your response body and, as long as its content type is set to text/html, the browser will render it correctly.</p>
<p>At this point, the client has a JavaScript program that wants to pass a prompt and listen for the streaming response. We will want an asynchronous handler to do this, so we’ll use http-kit’s <code>as-channel</code> function. It’s not covered in the project’s main documentation, but the docstring in the source itself has enough to get started with. Let’s use it and wire it up.</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode clojure"><code class="sourceCode clojure"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>(<span class="bu">def</span><span class="fu"> clients</span>_ (<span class="kw">atom</span> #{}))</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> my-async-handler </span>[ring-req]</span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a>  (http/as-channel ring-req</span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a>              {<span class="at">:on-open</span> (<span class="kw">fn</span> [ch]</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a>                          (<span class="kw">println</span> <span class="st">&quot;conn open!&quot;</span>)</span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a>                          (<span class="kw">println</span> ring-req)</span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a>                          (<span class="kw">swap!</span> clients_ <span class="kw">conj</span> ch))</span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a>               <span class="at">:on-close</span> (<span class="kw">fn</span> [ch]</span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a>                           (<span class="kw">println</span> <span class="st">&quot;conn close!&quot;</span>)</span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a>                           (<span class="kw">swap!</span> clients_ <span class="kw">disj</span> ch))}))</span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> app </span>[req]</span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a>  (<span class="kw">let</span> [uri (<span class="at">:uri</span> req)</span>
<span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a>        method (<span class="at">:request-method</span> req)]</span>
<span id="cb7-15"><a href="#cb7-15" aria-hidden="true" tabindex="-1"></a>    (<span class="kw">cond</span></span>
<span id="cb7-16"><a href="#cb7-16" aria-hidden="true" tabindex="-1"></a>      (<span class="kw">and</span> (<span class="kw">=</span> uri <span class="st">&quot;/&quot;</span>) (<span class="kw">=</span> method <span class="at">:get</span>)) (index-handler req)</span>
<span id="cb7-17"><a href="#cb7-17" aria-hidden="true" tabindex="-1"></a>      (<span class="kw">and</span> (<span class="kw">=</span> uri <span class="st">&quot;/api/generate&quot;</span>) (<span class="kw">=</span> method <span class="at">:post</span>)) (my-async-handler req)</span>
<span id="cb7-18"><a href="#cb7-18" aria-hidden="true" tabindex="-1"></a>      <span class="at">:else</span> (not-found-handler req))))</span></code></pre></div>
<p>When I evaluate these forms, switch back to my browser, and hit the “Submit” button, I see the print statements fire. When I evaluate <code>clients_</code>, I see it now has a new client.</p>
<p>To get the actual messages, we can start writing a function to send a POST request to the ollama server. After that, we want to find a way to access messages as they stream in and send them to the client.</p>
<p>In typical asynchronous programming, this kind of message passing is done with callbacks. http-kit has its own concept of channels that is separate from core.async, with its own semantics. When we get a message, we will want to send it to the client like this, passing <code>false</code> as the last argument so that the channel stays open after the send:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode clojure"><code class="sourceCode clojure"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>(http/send! ch {<span class="at">:status</span> <span class="dv">200</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>                <span class="at">:headers</span> {<span class="st">&quot;Content-Type&quot;</span> <span class="st">&quot;application/json&quot;</span>}</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a>                <span class="at">:body</span> json-encoded}</span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a>            <span class="va">false</span>)</span></code></pre></div>
<p>To kick off the HTTP request to ollama with the prompt, we want to use http-kit’s client functionality. There is documentation on making asynchronous requests with callbacks:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode clojure"><code class="sourceCode clojure"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="co">;fire and forget, returns immediately[1], returned promise is ignored</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>(http/get <span class="st">&quot;http://host.com/path&quot;</span>)</span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a>(<span class="bu">def</span><span class="fu"> options </span>{<span class="at">:timeout</span> <span class="dv">200</span>             <span class="co">; ms</span></span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a>              <span class="at">:basic-auth</span> [<span class="st">&quot;user&quot;</span> <span class="st">&quot;pass&quot;</span>]</span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a>              <span class="at">:query-params</span> {<span class="at">:param</span> <span class="st">&quot;value&quot;</span> <span class="at">:param2</span> [<span class="st">&quot;value1&quot;</span> <span class="st">&quot;value2&quot;</span>]}</span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a>              <span class="at">:user-agent</span> <span class="st">&quot;User-Agent-string&quot;</span></span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a>              <span class="at">:headers</span> {<span class="st">&quot;X-Header&quot;</span> <span class="st">&quot;Value&quot;</span>}})</span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a>(http/get <span class="st">&quot;http://host.com/path&quot;</span> options</span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a>          (<span class="kw">fn</span> [{<span class="at">:keys</span> [status headers body error]}] <span class="co">;; asynchronous response handling</span></span>
<span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a>            (<span class="kw">if</span> error</span>
<span id="cb9-12"><a href="#cb9-12" aria-hidden="true" tabindex="-1"></a>              (<span class="kw">println</span> <span class="st">&quot;Failed, exception is &quot;</span> error)</span>
<span id="cb9-13"><a href="#cb9-13" aria-hidden="true" tabindex="-1"></a>              (<span class="kw">println</span> <span class="st">&quot;Async HTTP GET: &quot;</span> status))))</span>
<span id="cb9-14"><a href="#cb9-14" aria-hidden="true" tabindex="-1"></a> <span class="co">; [1] may not always true, since DNS lookup maybe slow</span></span></code></pre></div>
<p>This will kind of work, but the callback only fires once all of the messages have been received. We’re looking for some way to access the messages as they stream in.</p>
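<p>To make that distinction concrete, here is a small Go analogy (not http-kit code). The made-up <code>chunkReader</code> type stands in for a connection that delivers one message per read and logs each read. Reading everything first and then handling it gives a very different event order from handling each line as it arrives.</p>

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// chunkReader hands out one small chunk per Read call and records each
// read in log. It stands in for a connection that delivers data over time.
type chunkReader struct {
	chunks []string
	log    *[]string
}

func (r *chunkReader) Read(p []byte) (int, error) {
	if len(r.chunks) == 0 {
		return 0, io.EOF
	}
	// Chunks are assumed to be shorter than p, which holds for this demo.
	n := copy(p, r.chunks[0])
	*r.log = append(*r.log, "read "+strings.TrimSpace(r.chunks[0]))
	r.chunks = r.chunks[1:]
	return n, nil
}

// buffered waits for the whole body before handling any message.
func buffered(chunks []string) []string {
	var events []string
	r := &chunkReader{chunks: chunks, log: &events}
	data, err := io.ReadAll(r)
	if err != nil {
		return events
	}
	for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") {
		events = append(events, "handle "+line)
	}
	return events
}

// streamed handles each line as soon as it has been read.
func streamed(chunks []string) []string {
	var events []string
	r := &chunkReader{chunks: chunks, log: &events}
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		events = append(events, "handle "+sc.Text())
	}
	return events
}

func main() {
	msgs := []string{"a\n", "b\n"}
	fmt.Println(buffered(msgs)) // prints [read a read b handle a handle b]
	fmt.Println(streamed(msgs)) // prints [read a handle a read b handle b]
}
```

<p>The second ordering is the one we want from the Clojure client: each message is handled before the next one is even read.</p>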
<p>I looked at the source for help and found the following. Without guidance from the documentation, the <code>:stream</code> option sounded like a good option to try.</p>
<pre><code>   Returned body type is controlled by `:as` option:

    Without automatic unzipping:
      `:none`           - org.httpkit.DynamicBytes
      `:raw-byte-array` - bytes[]

    With automatic unzipping:
      `:byte-array`     - bytes[]
      `:stream`         - ByteInputStream
      `:text`           - String (charset based on Content-Type header)
      `:auto`           - As `:text` or `:stream` (based on Content-Type header)</code></pre>
<p>There’s no further mention of how to use a ByteInputStream in the docs, so we can check the source for that. The class turns out to be named <code>BytesInputStream</code>:</p>
<pre><code>package org.httpkit;

import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/**
 * No synchronization, better toString
 */
public class BytesInputStream extends InputStream {
    private final byte[] buf;
    private final int count;
    private int mark = 0;

    private int pos;

    public BytesInputStream(byte[] data, int length) {
        this.buf = data;
        this.count = length;
        this.pos = 0;
    }

    /**
     * get the underlying bytes, copied
     *
     * @return
     */
    public byte[] bytes() {
        return Arrays.copyOf(buf, count);
    }

    public int read() throws IOException {
        return (pos &lt; count) ? (buf[pos++] &amp; 0xff) : -1;
    }
...</code></pre>
<p>Since this class subclasses InputStream, it can be read by an <a href="https://docs.oracle.com/javase%2F8%2Fdocs%2Fapi%2F%2F/java/io/InputStreamReader.html">InputStreamReader</a>.</p>
<p>The docs for InputStreamReader recommend wrapping it in a BufferedReader, so let’s do that. We want to read the input line by line, since each line of the response is one JSON message.</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode clojure"><code class="sourceCode clojure"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> send-prompt-to-ollama </span>[ch prompt]</span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>  (<span class="kw">let</span> [url <span class="st">&quot;http://localhost:11434/api/generate&quot;</span>]</span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a>    (client/post url</span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a>                 {<span class="at">:as</span> <span class="at">:stream</span></span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a>                  <span class="at">:headers</span> {<span class="st">&quot;Content-Type&quot;</span> <span class="st">&quot;application/json&quot;</span>}</span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a>                  <span class="at">:body</span> (json/write-str {<span class="at">:model</span> <span class="st">&quot;llama3&quot;</span> <span class="at">:prompt</span> prompt})}</span>
<span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a>                 (<span class="kw">fn</span> [{<span class="at">:keys</span> [status headers body error]}]</span>
<span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a>                   (<span class="kw">if</span> error</span>
<span id="cb12-9"><a href="#cb12-9" aria-hidden="true" tabindex="-1"></a>                     (<span class="kw">do</span></span>
<span id="cb12-10"><a href="#cb12-10" aria-hidden="true" tabindex="-1"></a>                       (http/send! ch {<span class="at">:status</span> <span class="dv">500</span> <span class="at">:body</span> (<span class="kw">str</span> <span class="st">&quot;Internal Server Error: &quot;</span> error)})</span>
<span id="cb12-11"><a href="#cb12-11" aria-hidden="true" tabindex="-1"></a>                       (http/close ch))</span>
<span id="cb12-12"><a href="#cb12-12" aria-hidden="true" tabindex="-1"></a>                     (<span class="kw">let</span> [stream ^java.io.InputStream body</span>
<span id="cb12-13"><a href="#cb12-13" aria-hidden="true" tabindex="-1"></a>                           reader (java.io.BufferedReader. (java.io.InputStreamReader. stream <span class="st">&quot;UTF-8&quot;</span>))]</span>
<span id="cb12-14"><a href="#cb12-14" aria-hidden="true" tabindex="-1"></a>                       (<span class="kw">loop</span> []</span>
<span id="cb12-15"><a href="#cb12-15" aria-hidden="true" tabindex="-1"></a>                         (<span class="kw">let</span> [line (.readLine reader)]</span>
<span id="cb12-16"><a href="#cb12-16" aria-hidden="true" tabindex="-1"></a>                           (<span class="kw">if</span> (<span class="kw">nil?</span> line)</span>
<span id="cb12-17"><a href="#cb12-17" aria-hidden="true" tabindex="-1"></a>                             (<span class="kw">do</span></span>
<span id="cb12-18"><a href="#cb12-18" aria-hidden="true" tabindex="-1"></a>                               (.close reader)</span>
<span id="cb12-19"><a href="#cb12-19" aria-hidden="true" tabindex="-1"></a>                               (http/close ch))</span>
<span id="cb12-20"><a href="#cb12-20" aria-hidden="true" tabindex="-1"></a>                             (<span class="kw">do</span></span>
<span id="cb12-21"><a href="#cb12-21" aria-hidden="true" tabindex="-1"></a>                               (<span class="kw">let</span> [response (<span class="at">:response</span> (json/read-json line))</span>
<span id="cb12-22"><a href="#cb12-22" aria-hidden="true" tabindex="-1"></a>                                     json-encoded (json/write-str {<span class="at">:response</span> response})]</span>
<span id="cb12-23"><a href="#cb12-23" aria-hidden="true" tabindex="-1"></a>                                 (http/send! ch {<span class="at">:status</span> <span class="dv">200</span></span>
<span id="cb12-24"><a href="#cb12-24" aria-hidden="true" tabindex="-1"></a>                                                 <span class="at">:headers</span> {<span class="st">&quot;Content-Type&quot;</span> <span class="st">&quot;application/json&quot;</span>}</span>
<span id="cb12-25"><a href="#cb12-25" aria-hidden="true" tabindex="-1"></a>                                                 <span class="at">:body</span> json-encoded}</span>
<span id="cb12-26"><a href="#cb12-26" aria-hidden="true" tabindex="-1"></a>                                             <span class="va">false</span>))</span>
<span id="cb12-27"><a href="#cb12-27" aria-hidden="true" tabindex="-1"></a>                               (<span class="kw">recur</span>)))))))))))</span></code></pre></div>
<p>Unfortunately, after writing this up, I observed the same behavior as earlier: responses were not streamed in, but rather returned all at once.</p>
<p>This appears to be a <a href="https://github.com/http-kit/http-kit/issues/90">known limitation</a> of http-kit’s client functionality. The comment thread mentioned that clj-http works for this use case, but part of the reason I went with http-kit in the first place was to minimize dependencies.</p>
<p>Another option to try without introducing a new dependency is to interop with <a href="https://docs.oracle.com/en/java/javase/11/docs/api/java.net.http/java/net/http/HttpClient.html">java.net.http.HttpClient</a>, which has been included with the JDK since Java 11.</p>
<p>Here’s how I initially got that working:</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode clojure"><code class="sourceCode clojure"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a>(<span class="kw">import</span> (java.net.http HttpClient HttpRequest HttpResponse HttpResponse$BodyHandlers HttpRequest$BodyPublishers)</span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a>        (java.net URI)</span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a>        (java.nio.charset StandardCharsets)</span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a>        (java.io InputStreamReader BufferedReader)</span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a>        (java.util.concurrent CompletableFuture))</span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a>        </span>
<span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> handle-response </span>[ch response]</span>
<span id="cb13-8"><a href="#cb13-8" aria-hidden="true" tabindex="-1"></a>  (<span class="kw">with-open</span> [reader (BufferedReader. (InputStreamReader. (.body response) StandardCharsets/UTF_8))]</span>
<span id="cb13-9"><a href="#cb13-9" aria-hidden="true" tabindex="-1"></a>    (<span class="kw">loop</span> []</span>
<span id="cb13-10"><a href="#cb13-10" aria-hidden="true" tabindex="-1"></a>      (<span class="kw">let</span> [line (.readLine reader)]</span>
<span id="cb13-11"><a href="#cb13-11" aria-hidden="true" tabindex="-1"></a>        (<span class="kw">if</span> (<span class="kw">nil?</span> line)</span>
<span id="cb13-12"><a href="#cb13-12" aria-hidden="true" tabindex="-1"></a>          (<span class="kw">do</span></span>
<span id="cb13-13"><a href="#cb13-13" aria-hidden="true" tabindex="-1"></a>            (.close reader)</span>
<span id="cb13-14"><a href="#cb13-14" aria-hidden="true" tabindex="-1"></a>            (http/close ch))</span>
<span id="cb13-15"><a href="#cb13-15" aria-hidden="true" tabindex="-1"></a>          (<span class="kw">do</span></span>
<span id="cb13-16"><a href="#cb13-16" aria-hidden="true" tabindex="-1"></a>            (<span class="kw">println</span> <span class="st">&quot;got line&quot;</span> line)</span>
<span id="cb13-17"><a href="#cb13-17" aria-hidden="true" tabindex="-1"></a>            (<span class="kw">let</span> [response (<span class="at">:response</span> (json/read-json line))</span>
<span id="cb13-18"><a href="#cb13-18" aria-hidden="true" tabindex="-1"></a>                  json-encoded (json/write-str {<span class="at">:response</span> response})]</span>
<span id="cb13-19"><a href="#cb13-19" aria-hidden="true" tabindex="-1"></a>              (<span class="kw">println</span> <span class="st">&quot;sending response&quot;</span> response)</span>
<span id="cb13-20"><a href="#cb13-20" aria-hidden="true" tabindex="-1"></a>              (http/send! ch {<span class="at">:status</span> <span class="dv">200</span></span>
<span id="cb13-21"><a href="#cb13-21" aria-hidden="true" tabindex="-1"></a>                              <span class="at">:headers</span> {<span class="st">&quot;Content-Type&quot;</span> <span class="st">&quot;application/json&quot;</span>}</span>
<span id="cb13-22"><a href="#cb13-22" aria-hidden="true" tabindex="-1"></a>                              <span class="at">:body</span> json-encoded}</span>
<span id="cb13-23"><a href="#cb13-23" aria-hidden="true" tabindex="-1"></a>                          <span class="va">false</span>))</span>
<span id="cb13-24"><a href="#cb13-24" aria-hidden="true" tabindex="-1"></a>            (<span class="kw">recur</span>)))))))</span>
<span id="cb13-25"><a href="#cb13-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-26"><a href="#cb13-26" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> send-async-request </span>[ch model prompt]</span>
<span id="cb13-27"><a href="#cb13-27" aria-hidden="true" tabindex="-1"></a>  (<span class="kw">let</span> [client (HttpClient/newHttpClient)</span>
<span id="cb13-28"><a href="#cb13-28" aria-hidden="true" tabindex="-1"></a>        body (json/write-str {<span class="at">:model</span> model <span class="at">:prompt</span> prompt})</span>
<span id="cb13-29"><a href="#cb13-29" aria-hidden="true" tabindex="-1"></a>        request (<span class="kw">-&gt;</span> (HttpRequest/newBuilder)</span>
<span id="cb13-30"><a href="#cb13-30" aria-hidden="true" tabindex="-1"></a>                    (.uri (URI/create <span class="st">&quot;http://localhost:11434/api/generate&quot;</span>))</span>
<span id="cb13-31"><a href="#cb13-31" aria-hidden="true" tabindex="-1"></a>                    (.header <span class="st">&quot;Content-Type&quot;</span> <span class="st">&quot;application/json&quot;</span>)</span>
<span id="cb13-32"><a href="#cb13-32" aria-hidden="true" tabindex="-1"></a>                    (.POST (HttpRequest$BodyPublishers/ofString body))</span>
<span id="cb13-33"><a href="#cb13-33" aria-hidden="true" tabindex="-1"></a>                    (.build))]</span>
<span id="cb13-34"><a href="#cb13-34" aria-hidden="true" tabindex="-1"></a>    (<span class="kw">-&gt;</span> (.sendAsync client request (HttpResponse$BodyHandlers/ofInputStream))</span>
<span id="cb13-35"><a href="#cb13-35" aria-hidden="true" tabindex="-1"></a>        (.thenAccept (<span class="kw">reify</span> java.util.function.Consumer</span>
<span id="cb13-36"><a href="#cb13-36" aria-hidden="true" tabindex="-1"></a>                       (accept [_ response]</span>
<span id="cb13-37"><a href="#cb13-37" aria-hidden="true" tabindex="-1"></a>                         (handle-response ch response))))</span>
<span id="cb13-38"><a href="#cb13-38" aria-hidden="true" tabindex="-1"></a>        (.exceptionally (<span class="kw">reify</span> java.util.function.Function</span>
<span id="cb13-39"><a href="#cb13-39" aria-hidden="true" tabindex="-1"></a>                          (<span class="kw">apply</span> [_ error]</span>
<span id="cb13-40"><a href="#cb13-40" aria-hidden="true" tabindex="-1"></a>                            (<span class="kw">println</span> (<span class="kw">str</span> <span class="st">&quot;Request failed: &quot;</span> error))</span>
<span id="cb13-41"><a href="#cb13-41" aria-hidden="true" tabindex="-1"></a>                            <span class="va">nil</span>))))))</span></code></pre></div>
<p>This gave me the behavior I was looking for: responses were sent to the client as soon as they came in.</p>
<p>The code is a little noisy because the library uses Java idioms that were introduced after Clojure was created.</p>
<p>Luckily, Clojure’s maintainers recently introduced language changes that make using these features less painful. From the release notes of Clojure 1.12.0-alpha12:</p>
<pre><code>Functional interfaces

Java programs define &quot;functions&quot; with Java functional interfaces (marked with the @FunctionalInterface annotation), which have a single method.

Clojure developers can now invoke Java methods taking functional interfaces by passing functions with matching arity. The Clojure compiler implicitly converts Clojure functions to the required functional interface by constructing a lambda adapter. You can explicitly coerce a Clojure function to a functional interface by hinting the binding name in a let binding, e.g. to avoid repeated adapter construction in a loop.

See: CLJ-2799
</code></pre>
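<p>The explicit coercion mentioned at the end of that note is not used in this post, but a minimal sketch of it might look like the following. It assumes Clojure 1.12 or later; the <code>ArrayList</code> data and the binding name <code>odd-pred</code> are made up for illustration.</p>
<pre class="sourceCode clojure"><code class="sourceCode clojure">(import &#39;(java.util ArrayList)
        &#39;(java.util.function Predicate))

;; Assumes Clojure 1.12+. Hinting the binding coerces the Clojure function
;; `odd?` to a Predicate once, instead of at every call site.
(let [^Predicate odd-pred odd?
      xs (ArrayList. [1 2 3 4 5])
      ys (ArrayList. [6 7 8 9])]
  (.removeIf xs odd-pred)   ; drops the odd numbers in place
  (.removeIf ys odd-pred)
  {:xs (vec xs) :ys (vec ys)})
;; =&gt; {:xs [2 4], :ys [6 8]}</code></pre>
<p>Hinting the binding means the adapter is constructed once and reused, which is the situation the release notes call out for loops.</p>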
<p>To take advantage of these changes, we can use a preview build of the language by updating the language version in <code>deps.edn</code>:</p>
<pre><code>org.clojure/clojure       {:mvn/version &quot;1.12.0-beta1&quot;}</code></pre>
<p>After doing so, we can change the above functions to this:</p>
<div class="sourceCode" id="cb16"><pre class="sourceCode clojure"><code class="sourceCode clojure"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> handle-response </span>[ch response]</span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a>  (<span class="kw">with-open</span> [reader (BufferedReader. (InputStreamReader. (.body response) StandardCharsets/UTF_8))]</span>
<span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a>    (<span class="kw">loop</span> []</span>
<span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a>      (<span class="kw">let</span> [line (.readLine reader)]</span>
<span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a>        (<span class="kw">if</span> (<span class="kw">nil?</span> line)</span>
<span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a>          (<span class="kw">do</span></span>
<span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a>            (.close reader)</span>
<span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a>            (http/close ch))</span>
<span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a>          (<span class="kw">let</span> [response (<span class="at">:response</span> (json/read-json line))</span>
<span id="cb16-10"><a href="#cb16-10" aria-hidden="true" tabindex="-1"></a>                json-encoded (json/write-str {<span class="at">:response</span> response})]</span>
<span id="cb16-11"><a href="#cb16-11" aria-hidden="true" tabindex="-1"></a>            (http/send! ch {<span class="at">:status</span> <span class="dv">200</span></span>
<span id="cb16-12"><a href="#cb16-12" aria-hidden="true" tabindex="-1"></a>                            <span class="at">:headers</span> {<span class="st">&quot;Content-Type&quot;</span> <span class="st">&quot;application/json&quot;</span>}</span>
<span id="cb16-13"><a href="#cb16-13" aria-hidden="true" tabindex="-1"></a>                            <span class="at">:body</span> json-encoded}</span>
<span id="cb16-14"><a href="#cb16-14" aria-hidden="true" tabindex="-1"></a>                        <span class="va">false</span>)</span>
<span id="cb16-15"><a href="#cb16-15" aria-hidden="true" tabindex="-1"></a>            (<span class="kw">recur</span>)))))))</span>
<span id="cb16-16"><a href="#cb16-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb16-17"><a href="#cb16-17" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> send-async-request </span>[ch model prompt]</span>
<span id="cb16-18"><a href="#cb16-18" aria-hidden="true" tabindex="-1"></a>  (<span class="kw">let</span> [client (HttpClient/newHttpClient)</span>
<span id="cb16-19"><a href="#cb16-19" aria-hidden="true" tabindex="-1"></a>        body (json/write-str {<span class="at">:model</span> model <span class="at">:prompt</span> prompt})</span>
<span id="cb16-20"><a href="#cb16-20" aria-hidden="true" tabindex="-1"></a>        request (<span class="kw">-&gt;</span> (HttpRequest/newBuilder)</span>
<span id="cb16-21"><a href="#cb16-21" aria-hidden="true" tabindex="-1"></a>                    (.uri (URI/create <span class="st">&quot;http://localhost:11434/api/generate&quot;</span>))</span>
<span id="cb16-22"><a href="#cb16-22" aria-hidden="true" tabindex="-1"></a>                    (.header <span class="st">&quot;Content-Type&quot;</span> <span class="st">&quot;application/json&quot;</span>)</span>
<span id="cb16-23"><a href="#cb16-23" aria-hidden="true" tabindex="-1"></a>                    (.POST (HttpRequest$BodyPublishers/ofString body))</span>
<span id="cb16-24"><a href="#cb16-24" aria-hidden="true" tabindex="-1"></a>                    (.build))]</span>
<span id="cb16-25"><a href="#cb16-25" aria-hidden="true" tabindex="-1"></a>    (<span class="kw">-&gt;</span> (.sendAsync client request (HttpResponse$BodyHandlers/ofInputStream))</span>
<span id="cb16-26"><a href="#cb16-26" aria-hidden="true" tabindex="-1"></a>        (.thenAccept (<span class="kw">fn</span> [response] (handle-response ch response)))</span>
<span id="cb16-27"><a href="#cb16-27" aria-hidden="true" tabindex="-1"></a>        (.exceptionally (<span class="kw">fn</span> [error] (<span class="kw">println</span> (<span class="kw">str</span> <span class="st">&quot;Request failed: &quot;</span> error)))))))</span></code></pre></div>
<p>From here, I just needed to wire up the request logic to my async handler and then update my main handler to route to it.</p>
<div class="sourceCode" id="cb17"><pre class="sourceCode clojure"><code class="sourceCode clojure"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> my-async-handler </span>[ring-req]</span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a>  (<span class="kw">let</span> [body (<span class="kw">slurp</span> (<span class="at">:body</span> ring-req))</span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a>        prompt (<span class="kw">try</span></span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a>                 (<span class="at">:prompt</span> (json/read-json body))</span>
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a>                 (<span class="kw">catch</span> Exception <span class="kw">e</span></span>
<span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a>                   (<span class="kw">println</span> <span class="st">&quot;Error parsing request body:&quot;</span> <span class="kw">e</span>)</span>
<span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"></a>                   <span class="va">nil</span>))]</span>
<span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"></a>    (<span class="kw">if</span> prompt</span>
<span id="cb17-9"><a href="#cb17-9" aria-hidden="true" tabindex="-1"></a>      (http/as-channel ring-req</span>
<span id="cb17-10"><a href="#cb17-10" aria-hidden="true" tabindex="-1"></a>                       {<span class="at">:on-open</span> (<span class="kw">fn</span> [ch]</span>
<span id="cb17-11"><a href="#cb17-11" aria-hidden="true" tabindex="-1"></a>                                   (<span class="kw">println</span> <span class="st">&quot;conn open!&quot;</span>)</span>
<span id="cb17-12"><a href="#cb17-12" aria-hidden="true" tabindex="-1"></a>                                   (<span class="kw">swap!</span> clients_ <span class="kw">conj</span> ch)</span>
<span id="cb17-13"><a href="#cb17-13" aria-hidden="true" tabindex="-1"></a>                                   (send-async-request ch <span class="st">&quot;llama3&quot;</span> prompt))</span>
<span id="cb17-14"><a href="#cb17-14" aria-hidden="true" tabindex="-1"></a>                        <span class="at">:on-close</span> (<span class="kw">fn</span> [ch]</span>
<span id="cb17-15"><a href="#cb17-15" aria-hidden="true" tabindex="-1"></a>                                    (<span class="kw">println</span> <span class="st">&quot;conn close!&quot;</span>)</span>
<span id="cb17-16"><a href="#cb17-16" aria-hidden="true" tabindex="-1"></a>                                    (<span class="kw">swap!</span> clients_ <span class="kw">disj</span> ch))})</span>
<span id="cb17-17"><a href="#cb17-17" aria-hidden="true" tabindex="-1"></a>      {<span class="at">:status</span> <span class="dv">400</span></span>
<span id="cb17-18"><a href="#cb17-18" aria-hidden="true" tabindex="-1"></a>       <span class="at">:headers</span> {<span class="st">&quot;Content-Type&quot;</span> <span class="st">&quot;application/json&quot;</span>}</span>
<span id="cb17-19"><a href="#cb17-19" aria-hidden="true" tabindex="-1"></a>       <span class="at">:body</span> (json/write-str {<span class="at">:error</span> <span class="st">&quot;Invalid request&quot;</span>})})))</span>
<span id="cb17-20"><a href="#cb17-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-21"><a href="#cb17-21" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> app </span>[req]</span>
<span id="cb17-22"><a href="#cb17-22" aria-hidden="true" tabindex="-1"></a>  (<span class="kw">let</span> [uri (<span class="at">:uri</span> req)</span>
<span id="cb17-23"><a href="#cb17-23" aria-hidden="true" tabindex="-1"></a>        method (<span class="at">:request-method</span> req)]</span>
<span id="cb17-24"><a href="#cb17-24" aria-hidden="true" tabindex="-1"></a>    (<span class="kw">cond</span></span>
<span id="cb17-25"><a href="#cb17-25" aria-hidden="true" tabindex="-1"></a>      (<span class="kw">and</span> (<span class="kw">=</span> uri <span class="st">&quot;/&quot;</span>) (<span class="kw">=</span> method <span class="at">:get</span>)) (index-handler req)</span>
<span id="cb17-26"><a href="#cb17-26" aria-hidden="true" tabindex="-1"></a>      (<span class="kw">and</span> (<span class="kw">=</span> uri <span class="st">&quot;/api/generate&quot;</span>) (<span class="kw">=</span> method <span class="at">:post</span>)) (<span class="va">#&#39;my-async-handler</span> req)</span>
<span id="cb17-27"><a href="#cb17-27" aria-hidden="true" tabindex="-1"></a>      <span class="at">:else</span> (not-found-handler req))))</span></code></pre></div>
<h1 id="clojure---pedestal-implementation">Clojure - Pedestal Implementation</h1>
<p>I was hoping this project would be a good place to use core.async, which I had only recently started using, but I realized while working through the previous implementation that http-kit has its own semantics for channels. It didn’t really make sense to use core.async there.</p>
<p>I heard that <a href="http://pedestal.io/pedestal/0.7/index.html">Pedestal</a> was built with core.async in mind, so I ended up throwing together another implementation with it.</p>
<p>In this version, my request function sends a request with the user’s prompt and returns a channel that has the messages queued.</p>
<div class="sourceCode" id="cb18"><pre class="sourceCode clojure"><code class="sourceCode clojure"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> handle-response </span>[response-ch ^HttpResponse response]</span>
<span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a>  (<span class="kw">let</span> [reader (BufferedReader. (InputStreamReader. (.body response) StandardCharsets/UTF_8))]</span>
<span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a>    (go-loop []</span>
<span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a>      (<span class="kw">if-let</span> [line (.readLine reader)]</span>
<span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a>        (<span class="kw">do</span></span>
<span id="cb18-6"><a href="#cb18-6" aria-hidden="true" tabindex="-1"></a>          (<span class="kw">println</span> line)</span>
<span id="cb18-7"><a href="#cb18-7" aria-hidden="true" tabindex="-1"></a>          (a/&gt;! response-ch line)</span>
<span id="cb18-8"><a href="#cb18-8" aria-hidden="true" tabindex="-1"></a>          (<span class="kw">recur</span>))</span>
<span id="cb18-9"><a href="#cb18-9" aria-hidden="true" tabindex="-1"></a>        (a/close! response-ch)))))</span>
<span id="cb18-10"><a href="#cb18-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb18-11"><a href="#cb18-11" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> send-async-request </span>[model prompt]</span>
<span id="cb18-12"><a href="#cb18-12" aria-hidden="true" tabindex="-1"></a>  (<span class="kw">let</span> [client (HttpClient/newHttpClient)</span>
<span id="cb18-13"><a href="#cb18-13" aria-hidden="true" tabindex="-1"></a>        body (generate-string {<span class="at">:model</span> model <span class="at">:prompt</span> prompt})</span>
<span id="cb18-14"><a href="#cb18-14" aria-hidden="true" tabindex="-1"></a>        request (<span class="kw">-&gt;</span> (HttpRequest/newBuilder)</span>
<span id="cb18-15"><a href="#cb18-15" aria-hidden="true" tabindex="-1"></a>                    (.uri (URI/create <span class="st">&quot;http://localhost:11434/api/generate&quot;</span>))</span>
<span id="cb18-16"><a href="#cb18-16" aria-hidden="true" tabindex="-1"></a>                    (.header <span class="st">&quot;Content-Type&quot;</span> <span class="st">&quot;application/json&quot;</span>)</span>
<span id="cb18-17"><a href="#cb18-17" aria-hidden="true" tabindex="-1"></a>                    (.POST (HttpRequest$BodyPublishers/ofString body))</span>
<span id="cb18-18"><a href="#cb18-18" aria-hidden="true" tabindex="-1"></a>                    (.build))</span>
<span id="cb18-19"><a href="#cb18-19" aria-hidden="true" tabindex="-1"></a>        response-ch (a/chan)]</span>
<span id="cb18-20"><a href="#cb18-20" aria-hidden="true" tabindex="-1"></a>    (<span class="kw">-&gt;</span> (.sendAsync client request (HttpResponse$BodyHandlers/ofInputStream))</span>
<span id="cb18-21"><a href="#cb18-21" aria-hidden="true" tabindex="-1"></a>        (.thenAccept (<span class="kw">fn</span> [response] (handle-response response-ch response)))</span>
<span id="cb18-22"><a href="#cb18-22" aria-hidden="true" tabindex="-1"></a>        (.exceptionally (<span class="kw">fn</span> [error] (<span class="kw">println</span> (<span class="kw">str</span> <span class="st">&quot;Request failed: &quot;</span> error)))))</span>
<span id="cb18-23"><a href="#cb18-23" aria-hidden="true" tabindex="-1"></a>    response-ch))</span></code></pre></div>
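<p>A possible variation, not part of the original implementation: <code>.readLine</code> is a blocking call, and core.async’s documentation advises against blocking I/O inside <code>go</code> blocks because they share a small thread pool. A rough sketch of moving the reads onto <code>a/thread</code> is below. The name <code>lines-&gt;chan</code> is hypothetical, and a <code>StringReader</code> stands in for the real response body.</p>
<pre class="sourceCode clojure"><code class="sourceCode clojure">(require &#39;[clojure.core.async :as a])
(import &#39;(java.io BufferedReader StringReader))

(defn lines-&gt;chan
  &quot;Reads `reader` line by line on a dedicated thread and puts each line on
  `ch`. Closes the reader and `ch` once the input is exhausted or the
  consumer has closed `ch`. (Hypothetical helper, for illustration only.)&quot;
  [^BufferedReader reader ch]
  (a/thread
    (try
      (with-open [r reader]
        (loop []
          (when-some [line (.readLine r)]
            ;; &gt;!! returns false when `ch` is already closed, so stop reading.
            (when (a/&gt;!! ch line)
              (recur)))))
      (finally
        (a/close! ch))))
  ch)

;; Usage with an in-memory reader instead of an HTTP response body:
(let [ch (a/chan 8)]
  (lines-&gt;chan (BufferedReader. (StringReader. &quot;one\ntwo\n&quot;)) ch)
  (a/&lt;!! (a/into [] ch)))
;; =&gt; [&quot;one&quot; &quot;two&quot;]</code></pre>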
<p>I used Pedestal’s <a href="http://pedestal.io/pedestal/0.6/reference/server-sent-events.html">Server Sent Events</a> functionality to make it work. The JSON interceptor is necessary to get access to the JSON params sent from the client to the asynchronous endpoint.</p>
<p><code>start-event-stream</code> takes a callback and returns an interceptor. The callback receives a channel as its first argument (along with the request context); you put messages on that channel to send them to the client. The channel’s buffer is maintained by the library. In my callback function, I take messages off the response channel and put them onto the event channel.</p>
<div class="sourceCode" id="cb19"><pre class="sourceCode clojure"><code class="sourceCode clojure"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> stream-ready </span>[event-ch ctx]</span>
<span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a>  (<span class="kw">let</span> [{<span class="at">:keys</span> [model prompt] <span class="at">:as</span> raw} (<span class="kw">-&gt;</span> ctx <span class="at">:request</span> <span class="at">:json-params</span>)</span>
<span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a>        response-chan (send-async-request model prompt)]</span>
<span id="cb19-4"><a href="#cb19-4" aria-hidden="true" tabindex="-1"></a>    (go-loop []</span>
<span id="cb19-5"><a href="#cb19-5" aria-hidden="true" tabindex="-1"></a>      (<span class="kw">if-let</span> [msg (&lt;! response-chan)]</span>
<span id="cb19-6"><a href="#cb19-6" aria-hidden="true" tabindex="-1"></a>        (<span class="kw">do</span></span>
<span id="cb19-7"><a href="#cb19-7" aria-hidden="true" tabindex="-1"></a>          (a/put! event-ch (generate-string {<span class="at">:response</span> msg}))</span>
<span id="cb19-8"><a href="#cb19-8" aria-hidden="true" tabindex="-1"></a>          (<span class="kw">recur</span>))</span>
<span id="cb19-9"><a href="#cb19-9" aria-hidden="true" tabindex="-1"></a>        (a/close! event-ch)))))</span>
<span id="cb19-10"><a href="#cb19-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb19-11"><a href="#cb19-11" aria-hidden="true" tabindex="-1"></a>(<span class="bu">def</span><span class="fu"> my-json-interceptor</span></span>
<span id="cb19-12"><a href="#cb19-12" aria-hidden="true" tabindex="-1"></a>  {<span class="at">:name</span>  <span class="at">::my-json-interceptor</span></span>
<span id="cb19-13"><a href="#cb19-13" aria-hidden="true" tabindex="-1"></a>   <span class="at">:enter</span> (<span class="kw">fn</span> [{<span class="at">:keys</span> [request] <span class="at">:as</span> ctx}]</span>
<span id="cb19-14"><a href="#cb19-14" aria-hidden="true" tabindex="-1"></a>            (<span class="kw">if</span> (#{<span class="at">:post</span> <span class="at">:put</span>} (<span class="at">:request-method</span> request))</span>
<span id="cb19-15"><a href="#cb19-15" aria-hidden="true" tabindex="-1"></a>              (<span class="kw">let</span> [raw-body-str (<span class="kw">slurp</span> (<span class="at">:body</span> request))</span>
<span id="cb19-16"><a href="#cb19-16" aria-hidden="true" tabindex="-1"></a>                    json-params (parse-string raw-body-str <span class="va">true</span>)]</span>
<span id="cb19-17"><a href="#cb19-17" aria-hidden="true" tabindex="-1"></a>                (<span class="kw">assoc-in</span> (<span class="kw">assoc-in</span> ctx [<span class="at">:request</span> <span class="at">:json-params</span>] json-params)</span>
<span id="cb19-18"><a href="#cb19-18" aria-hidden="true" tabindex="-1"></a>                          [<span class="at">:request</span> <span class="at">:raw-body-str</span>] raw-body-str))</span>
<span id="cb19-19"><a href="#cb19-19" aria-hidden="true" tabindex="-1"></a>              ctx))})</span>
<span id="cb19-20"><a href="#cb19-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb19-21"><a href="#cb19-21" aria-hidden="true" tabindex="-1"></a>(<span class="bu">def</span><span class="fu"> routes</span></span>
<span id="cb19-22"><a href="#cb19-22" aria-hidden="true" tabindex="-1"></a>  #{[<span class="st">&quot;/&quot;</span> <span class="at">:get</span></span>
<span id="cb19-23"><a href="#cb19-23" aria-hidden="true" tabindex="-1"></a>     [index-handler]</span>
<span id="cb19-24"><a href="#cb19-24" aria-hidden="true" tabindex="-1"></a>     <span class="at">:route-name</span> <span class="at">:index</span>]</span>
<span id="cb19-25"><a href="#cb19-25" aria-hidden="true" tabindex="-1"></a>    [<span class="st">&quot;/api/generate&quot;</span> <span class="at">:post</span></span>
<span id="cb19-26"><a href="#cb19-26" aria-hidden="true" tabindex="-1"></a>     [my-json-interceptor (sse/start-event-stream stream-ready)]</span>
<span id="cb19-27"><a href="#cb19-27" aria-hidden="true" tabindex="-1"></a>     <span class="at">:route-name</span> <span class="at">:stream</span>]})</span></code></pre></div>
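<p>The relay inside <code>stream-ready</code> relies on a core.async convention: taking from a closed, drained channel yields <code>nil</code>, which is what ends the loop. A small sketch of the same pattern with Pedestal left out is below. The function name <code>relay</code> and the sample lines are hypothetical, and <code>a/to-chan!</code> assumes a reasonably recent core.async release.</p>
<pre class="sourceCode clojure"><code class="sourceCode clojure">(require &#39;[clojure.core.async :as a])

(defn relay
  &quot;Copies each message from `in` to `out`, transformed by `f`, and closes
  `out` once `in` is closed and drained. (Hypothetical helper.)&quot;
  [in out f]
  (a/go-loop []
    (if-some [msg (a/&lt;! in)]
      (do
        (a/&gt;! out (f msg))
        (recur))
      ;; A nil take means `in` is closed, so signal the end downstream.
      (a/close! out))))

;; `to-chan!` returns a channel that yields the items and then closes.
(let [in  (a/to-chan! [&quot;line 1&quot; &quot;line 2&quot;])
      out (a/chan 8)]
  (relay in out (fn [line] {:response line}))
  (a/&lt;!! (a/into [] out)))
;; =&gt; [{:response &quot;line 1&quot;} {:response &quot;line 2&quot;}]</code></pre>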
<p>I think it makes a lot of sense to use Pedestal for async apps considering both core.async and Pedestal are maintained by the same folks (and I believe they use both libraries extensively in-house). That way you can passively benefit from any improvements or updates to core.async over time - plus it seems like there’s a lot of cool stuff you can do with core.async in general. That said, it can be difficult to use, so you pretty much need to be willing to read the source and ask for help on the Clojurians Slack when needed.</p>
<p>For a project as small as this, Go was way easier to work with. I’ve written Clojure on and off over the last few years and I still spent 10-20x longer getting those implementations working. With Go, it’s really easy to just pick the first dumb implementation that pops into your head and it will work. And of course, trivial deployment is always a relief.</p>
<p>I do still reach for Clojure from time to time because I enjoy the development process of building projects in small pieces without ceremony - i.e., you don’t have to create mini-projects or something similar to experiment with new ideas, you just immediately try them out from whatever file you’re currently working in. I also find it much easier to maintain a bird’s eye view of the codebase I’m working on. Clojure code tends to be much higher level and I find it easier to reason about when I’m focused, provided I’ve kept everything logically organized and somewhat tidy.</p>
    </section>
    <section class="comment-footer">
        <a href="mailto:eskinjp@gmail.com?subject=Re: Building llama chat in Go and Clojure">Comment via email</a>
    </section>
</article>
]]></description>
    <pubDate>Thu, 20 Jun 2024 00:00:00 UT</pubDate>
    <guid>https://jeskin.net/blog/llama-chat.html</guid>
    <dc:creator>Jon Eskin</dc:creator>
</item>
<item>
    <title>Clojure and Datomic for web applications</title>
    <link>https://jeskin.net/blog/datomic-webapp.html</link>
    <description><![CDATA[<article>
    <section class="header">
        Posted on April 16, 2024
        
            by Jon Eskin
        
    </section>
    <section>
        <p>A couple of years ago, I built an exploratory prototype of a web application to get my hands dirty with Clojure and Datomic. The frontend was written using Tailwind CSS and Alpine JS, and the backend was written using Pedestal and Datomic. I chose to hand-roll everything, including authentication and role-based access control for various user groups.</p>
<p>From the outside it looks like a typical webapp, but the application internals are unusual and worth discussing.</p>
<center>
<video width=75% autoplay>
<source src="/videos/tutoring.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
</center>
<h1 id="brief-overview-of-clojure">Brief Overview of Clojure</h1>
<p>Clojure is a JVM programming language with some distinguishing features:</p>
<ul>
<li>dynamic scripting language designed to keep a running process that can be manipulated (similar to Python, but with more interaction)</li>
<li>emphasizes functional programming, but allows for mutation where needed</li>
<li>uses Lisp syntax, i.e. <code>(println "hello")</code> instead of <code>System.out.println("hello")</code></li>
</ul>
<p>Clojure has a very solid selection of libraries for web development, lets you develop rapidly by incrementally defining functions without needing to recompile, and cuts away a lot of the boilerplate you find in older enterprise languages. As a result, web programming is one of Clojure’s strongest use cases.</p>
<p>For these types of applications, deployment works pretty much the same as it does for Java; you can package up a Clojure application into an archive that includes all dependencies, drop it onto a server, and run it. I spun up a $5/mo VPS, installed the JDK and Clojure, set up the Datomic peer library backed by Postgres, and had a functional starting point without much trouble.</p>
<p>Once deployed, Clojure applications can listen on a socket for connections. Developers can connect to the program from their editor and run commands to inspect or manipulate the program. In fact, it is common to develop applications while they are running, without the need for recompilation. For medium to large sized projects, this is significant. Java projects I worked on in the past could take fifteen minutes to compile, and larger projects took longer.</p>
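<p>As a rough sketch of the idea, Clojure ships with a socket REPL in <code>clojure.core.server</code>. The server name and port below are arbitrary values chosen for illustration, and many projects use nREPL for editor integration instead; this is just the smallest version that needs no extra dependencies:</p>
<div class="sourceCode"><pre class="sourceCode clojure"><code class="sourceCode clojure">(require '[clojure.core.server :as server])

;; Start a REPL listener inside the running application.
;; :name and :port are illustrative values, not taken from the project.
(server/start-server {:name   "dev-repl"
                      :port   5555
                      :accept 'clojure.core.server/repl})

;; ... later, when the listener is no longer needed:
(server/stop-server "dev-repl")</code></pre></div>
<p>While the listener is running, connecting to that port (for example with <code>nc localhost 5555</code>) gives you a prompt inside the live process, where you can redefine functions or inspect state.</p>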
<h2 id="templating">Templating</h2>
<p>In languages like Java, you tend to use HTML templating languages like Thymeleaf or JSPs to embed dynamic data from the server into pages. It generally looks something like this:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode html"><code class="sourceCode html"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="dt">&lt;!DOCTYPE</span> html<span class="dt">&gt;</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="dt">&lt;</span><span class="kw">html</span><span class="dt">&gt;</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="dt">&lt;</span><span class="kw">head</span><span class="dt">&gt;</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;</span><span class="kw">title</span><span class="dt">&gt;</span>Sample Page<span class="dt">&lt;/</span><span class="kw">title</span><span class="dt">&gt;</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a><span class="dt">&lt;/</span><span class="kw">head</span><span class="dt">&gt;</span></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="dt">&lt;</span><span class="kw">body</span><span class="dt">&gt;</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a>    <span class="dt">&lt;</span><span class="kw">div</span><span class="ot"> th:text</span><span class="op">=</span><span class="st">&quot;&#39;Hello, &#39; + ${user}&quot;</span><span class="dt">&gt;</span>Hello, World!<span class="dt">&lt;/</span><span class="kw">div</span><span class="dt">&gt;</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a><span class="dt">&lt;/</span><span class="kw">body</span><span class="dt">&gt;</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="dt">&lt;/</span><span class="kw">html</span><span class="dt">&gt;</span></span></code></pre></div>
<p>The Java template above is stored in a file, parsed, and processed by Java code. These work well and are instantly familiar to all web developers, but it’s easy to wind up with either heavy amounts of duplication or a rat’s nest of nested page includes that are difficult to work with.</p>
<p>Clojure, by contrast, has a popular library named Hiccup that renders HTML directly from Clojure data structures. Using Hiccup looks like the following:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode clojure"><code class="sourceCode clojure"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>[<span class="at">:html</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> [<span class="at">:head</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>  [<span class="at">:title</span> <span class="st">&quot;Sample Page&quot;</span>]]</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> [<span class="at">:body</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a>  [<span class="at">:div</span> <span class="st">&quot;Hello, &quot;</span> user]]]</span></code></pre></div>
<p>The Clojure definition is source code in the syntax of the language. You can access and manipulate it by directly operating on the underlying data structure - a <a href="https://clojuredocs.org/clojure.core/vector">vector</a>. In templating languages, you usually have to use a combination of A) splitting templates into many pages and B) using a weird templating language that is different from the main programming language. In practice, that can get very hairy.</p>
<p>Using Hiccup makes it easy to maintain page logic as it gets increasingly complex. A lot of the noise gets boiled out and you can operate on HTML output as data, rather than relying on template mechanisms and string processing. The value proposition mirrors that of Clojure vs traditional imperative languages.</p>
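<p>To make “HTML as data” a bit more concrete, here is a small sketch that uses nothing but plain Clojure vectors. The function names (<code>nav-item</code>, <code>nav</code>, <code>page</code>) and the links are made up for this example rather than taken from the project, and the final rendering step (handing the vector to Hiccup) is left out so the snippet runs without dependencies:</p>
<div class="sourceCode"><pre class="sourceCode clojure"><code class="sourceCode clojure">(defn nav-item
  "One navigation link as a Hiccup-style vector."
  [{:keys [href label]}]
  [:li [:a {:href href} label]])

(defn nav
  "Builds a list of links with ordinary sequence functions."
  [links]
  (into [:ul {:class "nav"}] (map nav-item links)))

(defn page
  "Wraps arbitrary body content in a shared layout."
  [title body]
  [:html
   [:head [:title title]]
   [:body
    (nav [{:href "/" :label "Home"}
          {:href "/about" :label "About"}])
    body]])

;; The result is still just nested vectors that can be inspected,
;; transformed, or tested before anything is rendered to a string.
(page "Sample Page" [:div "Hello, " "Jon"])</code></pre></div>
<p>The shared layout is an ordinary function and a page include is an ordinary function call, which is what takes the place of nested template includes.</p>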
<h2 id="datomic">Datomic</h2>
<p>Cognitect, the stewards of Clojure, created a database called Datomic that serves as an alternative to SQL. Datomic is built on the principles of immutability and stores data as a series of immutable events. More importantly (for me personally, at least) it provides a lot of extremely useful functionality out of the box that you don’t get in SQL.</p>
<p>As the database schema changes, and when data is added or removed, Datomic keeps a complete accounting of events. This approach contrasts with traditional SQL databases where updates or deletions overwrite the original data. In practice, between careful database migrations and database or server logs, you can sometimes stitch together a true historical record of what happened in a database. But as anyone who has worked as an application developer will know - it is far from easy.</p>
<p>If you’ve maintained a traditional database for an extended period of time, you know that it sees incremental changes as enhancements are added. So even if you’re able to look at past query results, it can be very difficult to keep track of the entire state of the database at a previous point in time: what the schema was at that time, what data was in there, and so on. In Datomic, all of this information is forever at your fingertips.</p>
<p>You can query the database for its transaction history, and you’ll get back a traversable Clojure sequence of results the same way you get back any other database result. Here’s one way you can do it:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode clojure"><code class="sourceCode clojure"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>(<span class="bu">defn</span><span class="fu"> collect-transactions </span>[conn]</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>  (<span class="kw">let</span> [tx-log (d/log conn)]</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>    (<span class="kw">map</span> (<span class="kw">fn</span> [tx]</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>           {<span class="at">:transaction-id</span> (<span class="at">:t</span> tx)</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a>            <span class="at">:timestamp</span> (get-tx-timestamp (d/db conn) (<span class="at">:t</span> tx))</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a>            <span class="at">:data</span> tx})</span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a>         (d/tx-range tx-log <span class="va">nil</span> <span class="va">nil</span>))))</span></code></pre></div>
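<p>The snippet above calls a helper, <code>get-tx-timestamp</code>, that isn’t shown. One plausible way to write it with the peer API, assuming it takes a database value and a <code>t</code> value, is to look up the transaction entity and read its <code>:db/txInstant</code> attribute. This is a guess at the helper, not the original implementation:</p>
<div class="sourceCode"><pre class="sourceCode clojure"><code class="sourceCode clojure">(require '[datomic.api :as d])

(defn get-tx-timestamp
  "Returns the instant recorded for the transaction at basis t.
  Hypothetical reconstruction of the helper used above."
  [db t]
  ;; d/t-&gt;tx converts a t value into the transaction's entity id;
  ;; every transaction entity carries a :db/txInstant attribute.
  (:db/txInstant (d/entity db (d/t-&gt;tx t))))</code></pre></div>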
<p>And this is what results look like through the eyes of the <a href="https://clojure.org/news/2023/04/28/introducing-morse">Morse</a> tool:</p>
<center>
<video width=75% controls autoplay>
<source src="/videos/morse.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
</center>
<p>Another key feature of Datomic is its use of a declarative query language similar to Clojure’s syntax, which can simplify the learning curve for those already familiar with Clojure.</p>
<h3 id="database-as-a-value">Database as a Value</h3>
<p>In Datomic, the database is treated as a “value.” This implies that the database, at any point in time, represents a specific state that does not change. When you query the database, you are essentially looking at a snapshot of the data as it existed at a particular moment.</p>
<p>There are a lot of cool consequences to this. One of my favorites is the ability to do speculative transactions. What this means is that you can develop experimental features for your application, pass your production database to them, and iterate on your concept with no extra work required.</p>
<p>Another benefit is that it is easy to write tests on the fly using live data. It is easy to explore and manipulate your database without worrying about persisting unwanted data - you can simply create and discard branches from any point of your database as desired.</p>
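<p>A minimal sketch of what this can look like with the peer API, assuming a connection named <code>conn</code> and a hypothetical <code>:user/email</code> attribute that already exists in the schema. <code>d/with</code> applies transaction data to a database value in memory and returns the result without writing anything, and <code>d/as-of</code> returns the database as it was at an earlier point:</p>
<div class="sourceCode"><pre class="sourceCode clojure"><code class="sourceCode clojure">(require '[datomic.api :as d])

(defn speculate
  "Applies tx-data to the current database value without persisting it.
  Returns the hypothetical database that would result."
  [conn tx-data]
  (:db-after (d/with (d/db conn) tx-data)))

(comment
  ;; :user/email is a made-up attribute for illustration.
  (def what-if
    (speculate conn [{:user/email "test@example.com"}]))

  ;; Query the speculative value like any other database.
  (d/q '[:find ?e
         :where [?e :user/email "test@example.com"]]
       what-if)

  ;; The real database is untouched. You can also look backwards in time.
  (d/as-of (d/db conn) #inst "2024-01-01"))</code></pre></div>
<p>Because <code>what-if</code> is an ordinary value, discarding the experiment is just a matter of letting it go out of scope.</p>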
<h3 id="modifying-schemas">Modifying Schemas</h3>
<p>Datomic allows you to change your schema as needed over time. This means you can add new attributes, rename existing ones, or change properties such as an attribute’s cardinality or uniqueness without disrupting the existing data (an attribute’s value type, however, cannot be changed once it is defined). These modifications can be done while the database is live and in use, enabling ongoing development and adjustment to evolving data requirements.</p>
<p>I think this is a big deal, because there are definitely projects where you know ahead of time that you’ll be changing direction quite a lot. In a traditional database, this can result in tech debt piling up fairly quickly, or painting yourself into a corner.</p>
<p>As you might expect, there are <a href="https://docs.datomic.com/pro/schema/schema-change.html">constraints</a> on this. For example, if your modified schema invalidates existing data, you will need to retract that data first. But overall, evolving your database is far less terrifying than it is in a SQL backed project.</p>
<h3 id="thoughts-on-datomic">Thoughts on Datomic</h3>
<p>In a traditional web app + database deployment, the webapp and the database are two separate behemoths, and each requires a lot of work to spin up and maintain. In enterprise environments, this work is commonly spread across multiple people, often on different teams.</p>
<p>Another issue with this setup is that SQL has quite a lot of arcane knowledge you have to lug around in order to execute even relatively simple projects. You need to be able to create tables, understand queries, joins, indexing, migration, and best practices associated with all of them. And when you work with your database, it’s generally separate from the application, either from the command line or with some kind of GUI tool.</p>
<p>Datomic has a bit of an initial learning curve because of its use of Datalog syntax and the fact that it operates quite differently from SQL, but I found it to be much simpler to use than SQL after that. You write your schema by passing in maps that lay out your data, update the schema by passing in new maps, seed the database by passing in maps that conform to the schema, and query the database by passing in Clojure vectors that describe what you’re looking for. You don’t need additional dependencies or ORMs, you just interface with the API directly from within your source code. If you’ve ever had to spend time bug fixing decades old ORMs, or migrating off database dependencies because they’re abandoned and are lighting up security scanners, you’ll understand this is a hugely welcome change.</p>
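<p>Here is a small sketch of that workflow against the peer library, using an in-memory database. The attribute names, roles, and email addresses are invented for the example and are not the schema from my prototype:</p>
<div class="sourceCode"><pre class="sourceCode clojure"><code class="sourceCode clojure">(require '[datomic.api :as d])

;; Schema is data: a vector of maps, one per attribute.
(def schema
  [{:db/ident       :user/email
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/unique      :db.unique/identity}
   {:db/ident       :user/role
    :db/valueType   :db.type/keyword
    :db/cardinality :db.cardinality/many}])

;; Seed data is also just maps that conform to the schema.
(def seed-data
  [{:user/email "tutor@example.com" :user/role [:role/tutor]}
   {:user/email "admin@example.com" :user/role [:role/admin :role/tutor]}])

(defn emails-with-role
  "Finds the email of every user holding the given role."
  [db role]
  (d/q '[:find [?email ...]
         :in $ ?role
         :where
         [?e :user/role ?role]
         [?e :user/email ?email]]
       db role))

(comment
  (def uri "datomic:mem://example")   ; in-memory database, for illustration
  (d/create-database uri)
  (def conn (d/connect uri))
  @(d/transact conn schema)
  @(d/transact conn seed-data)
  (emails-with-role (d/db conn) :role/tutor))</code></pre></div>
<p>Everything here is a plain Clojure value, so schema, seed data, and queries can be built, composed, and tested with the same tools as the rest of the application.</p>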
<p>When your application takes shape, it consists of pure functions in your application layer and pure functions in your UI layer through which you thread data you’re entering or retrieving. It comes together quite beautifully.</p>
<p>In many ways, I think that Datomic is to the traditional database as Clojure is to traditional web applications. The value proposition and tradeoffs of each with respect to its traditional alternative are similar: you get the benefits of functional programming at the cost of added space and performance constraints. In Clojure, this means that you (and the folks who build your web servers and other tooling) avoid many gnarly situations that arise from mutable state. In Datomic, you get to treat the database as an immutable value as described above.</p>
<p>Datomic is a bold departure from SQL, and I have a lot of respect for its choices. You may have noticed a lot of its benefits are still somewhat doable in a traditional database, but it often requires quite a lot of extra work, or extra dependencies that bring additional points of failure. Overall, I think Datomic aligns better with the real-world persistence requirements of applications than SQL.</p>
<p>It comes at a cost, of course. I’ve read that performance on Datomic Cloud can be quite bad. I personally used the on-prem “peer” library instead, which worked great for me, even when I threw a few stress tests at it.</p>
<p>Some other significant constraints are that Datomic uses a single process to write, and after 10 billion transactions it apparently becomes unwieldy. My intuition is that a large number of small to mid sized applications would stay well within its performance budget. The auditing you get for free would already have been such a life saver for a number of projects I’ve worked on in the past.</p>
<p>I ran into a handful of quirks and frustrating moments while building my prototype, mainly stemming from Clojure’s stack traces and error messages from Datomic that were hard to read. Since the language is not very widely used, it can also be much more difficult to get help with specific issues, because you’re less likely to find online posts from other people who ran into them.</p>
<p>Another big drawback is that while the database tracks its “value” over time, you cannot go back and add previous historical data. It only records data as of the moment it passes through the transactor. This presents challenges in situations that occur often enough in the real world that it bears mentioning. Being able to record when something actually happened separately from when it was written is called “bi-temporal history”, which is a feature of other similar database systems, but is not a feature of Datomic.</p>
<p>Overall, I really loved working with Datomic. If it fits your problem, it removes so many issues that arise with SQL. I’ll be keeping an eye on the library and looking for opportunities to use it in the future.</p>
    </section>
    <section class="comment-footer">
        <a href="mailto:eskinjp@gmail.com?subject=Re: Clojure and Datomic for web applications">Comment via email</a>
    </section>
</article>
]]></description>
    <pubDate>Tue, 16 Apr 2024 00:00:00 UT</pubDate>
    <guid>https://jeskin.net/blog/datomic-webapp.html</guid>
    <dc:creator>Jon Eskin</dc:creator>
</item>
<item>
    <title>Getting started with VR in C++ on Meta Quest</title>
    <link>https://jeskin.net/blog/meta-quest-sdk.html</link>
    <description><![CDATA[<article>
    <section class="header">
        Posted on November 26, 2023
        
            by Jon Eskin
        
    </section>
    <section>
        <p>Meta Quest headsets let you write 3D programs in C or C++, using OpenGL ES or Vulkan as a rendering backend, and deploy them using the <a href="https://developer.oculus.com/downloads/package/oculus-openxr-mobile-sdk/">XR Mobile SDK</a>.</p>
<p>I bought a Quest 3 at launch, and I’ve benefited a lot from studying the SDK and experimenting with the sample programs:</p>
<ol type="1">
<li>It gave me a thorough understanding of the OpenXR specification, from its design intent to the low-level details of how it is implemented on the hardware.</li>
<li>It significantly improved my understanding of OpenGL ES. I’ve skirted by with much less knowledge when writing other 3D programs in the past, but it was impossible for me to modify the SDK examples and get them to actually run correctly without filling in many knowledge gaps.</li>
<li>I learned a lot about how to structure programs in C and C-flavored C++ from the simple &amp; readable SDK example programs, which were written by extremely talented graphics programmers.</li>
<li>It was a lot of fun! The best projects are the ones you’re motivated to work on, and bringing your programs to life in VR is such a cool experience.</li>
</ol>
<p>Given how powerful these devices are, and how much fun they are to hack on, it’s kind of crazy to me how little community or discourse there is online surrounding them.</p>
<p>In this blog post I’ll introduce how developing on Quest devices works. I’ll touch on the inner workings of the SDK, cover the structure of sample projects, and go over a small set of changes I made to one of them.</p>
<h2 id="meta-xr-mobile-sdk">Meta XR Mobile SDK</h2>
<p>The SDK includes a collection of small programs, each consisting of an NDK project that focuses on a particular capability of the device.</p>
<center>
<figure>
<video width=75% controls autoplay>
<source src="/videos/xrkeyboard.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
<figcaption>
Building a working virtual keyboard from scratch
</figcaption>
</figure>
</center>
<center>
<figure>
<video width=75% controls autoplay>
<source src="/videos/xrpassthrough.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
<figcaption>
Manipulating passthrough
</figcaption>
</figure>
</center>
<center>
<figure>
<video width=75% controls autoplay>
<source src="/videos/spatial1.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
<figcaption>
Placing spatial anchors
</figcaption>
</figure>
</center>
<p>In total, you’ll find 18 such projects in the SDK.</p>
<p>For people who are curious about VR development, enjoy low-level programming, and have at least glanced at the core topics covered by OpenGL tutorials that are floating around online (<a href="https://learnopengl.com/Getting-started/OpenGL">1</a>,<a href="https://www.opengl-tutorial.org/">2</a>), these devices are a slam dunk for hands-on learning.</p>
<h3 id="meta-quest-android-beefy-gpu-sensors">Meta Quest = Android + Beefy GPU + Sensors</h3>
<p>Meta Quest devices run a modified version of the Android operating system. I think it’s helpful to build an understanding of how graphics work in the Android ecosystem at large to gain a deeper understanding of how Quest devices work.</p>
<p>There are three ways you can draw to the screen in Android-land: with the <a href="https://source.android.com/docs/core/graphics/arch-sh#canvas">Canvas API</a>, <a href="https://source.android.com/docs/core/graphics/arch-egl-opengl">OpenGL ES</a>, and <a href="https://source.android.com/docs/core/graphics/arch-vulkan">Vulkan</a>.</p>
<p>The Canvas API is built upon the <a href="https://skia.org/">Skia Graphics Library</a>, a cross-platform rendering engine. Android programs that use the Canvas API typically use the official SDK or a cross-platform toolkit like Flutter.</p>
<p>Such toolkits offer pre-built UI components that facilitate basic 2D rendering. They offer solutions for simple graphics, text rendering, and basic animations.</p>
<p>Instead of using the Canvas, you can opt to draw by directly calling into the OpenGL ES and Vulkan runtimes on the device. If you want this level of control over drawing, you probably want to use the “Native Development Kit” or NDK.</p>
<p>Unlike with the traditional SDK, the NDK has you write application code in C or C++ and bridge to it with a very small amount of boilerplate; basically a single Java entry point and a handful of config files. This is the approach taken by the XR Mobile SDK.</p>
<h2 id="minimal-complexity">Minimal Complexity</h2>
<p>The example projects make minimal use of C++ language features. You won’t find elaborate abstractions, excessive indirection, or extensive use of templates.</p>
<p>For each sample project, you get the impression that the developer knew exactly what they wanted to make and just wrote out a common sense implementation of that thing. Older projects are largely self-contained, while later ones started to factor out re-used code into libraries external to their respective projects.</p>
<p>The resulting code mostly takes the form of structs and functions that favor clarity and readability. You can tell what things do by looking at them.</p>
<p>A benefit of this readability is that you end up building a lot of intuition on “how VR works” by making superficial observations. Take for example the following code from the main render loop of one of the samples.</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">typedef</span> <span class="kw">struct</span> XrPosef <span class="op">{</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a>    XrQuaternionf    orientation<span class="op">;</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>    XrVector3f       position<span class="op">;</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="op">}</span> XrPosef<span class="op">;</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a><span class="co">// ...</span></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a>XrPosef xfLocalFromEye<span class="op">[</span>NUM_EYES<span class="op">];</span></span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> <span class="op">(</span><span class="dt">int</span> eye <span class="op">=</span> <span class="dv">0</span><span class="op">;</span> eye <span class="op">&lt;</span> NUM_EYES<span class="op">;</span> eye<span class="op">++)</span> <span class="op">{</span></span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a>    <span class="co">// LOG_POSE( &quot;viewTransform&quot;, &amp;projectionInfo.projections[eye].viewTransform );</span></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a>    XrPosef xfHeadFromEye <span class="op">=</span> projections<span class="op">[</span>eye<span class="op">].</span>pose<span class="op">;</span></span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a>    XrPosef_Multiply<span class="op">(&amp;</span>xfLocalFromEye<span class="op">[</span>eye<span class="op">],</span> <span class="op">&amp;</span>xfLocalFromHead<span class="op">,</span> <span class="op">&amp;</span>xfHeadFromEye<span class="op">);</span></span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a>    XrPosef xfEyeFromLocal <span class="op">=</span> XrPosef_Inverse<span class="op">(</span>xfLocalFromEye<span class="op">[</span>eye<span class="op">]);</span></span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true" tabindex="-1"></a>    XrMatrix4x4f viewMat<span class="op">{};</span></span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true" tabindex="-1"></a>    XrMatrix4x4f_CreateFromRigidTransform<span class="op">(&amp;</span>viewMat<span class="op">,</span> <span class="op">&amp;</span>xfEyeFromLocal<span class="op">);</span></span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true" tabindex="-1"></a>    <span class="at">const</span> XrFovf fov <span class="op">=</span> projections<span class="op">[</span>eye<span class="op">].</span>fov<span class="op">;</span></span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true" tabindex="-1"></a>    XrMatrix4x4f projMat<span class="op">;</span></span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true" tabindex="-1"></a>    XrMatrix4x4f_CreateProjectionFov<span class="op">(&amp;</span>projMat<span class="op">,</span> GRAPHICS_OPENGL_ES<span class="op">,</span> fov<span class="op">,</span> <span class="fl">0.1</span><span class="bu">f</span><span class="op">,</span> <span class="fl">0.0</span><span class="bu">f</span><span class="op">);</span></span>
<span id="cb1-23"><a href="#cb1-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-24"><a href="#cb1-24" aria-hidden="true" tabindex="-1"></a>    frameIn<span class="op">.</span>View<span class="op">[</span>eye<span class="op">]</span> <span class="op">=</span> OvrFromXr<span class="op">(</span>viewMat<span class="op">);</span></span>
<span id="cb1-25"><a href="#cb1-25" aria-hidden="true" tabindex="-1"></a>    frameIn<span class="op">.</span>Proj<span class="op">[</span>eye<span class="op">]</span> <span class="op">=</span> OvrFromXr<span class="op">(</span>projMat<span class="op">);</span></span>
<span id="cb1-26"><a href="#cb1-26" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>It becomes immediately clear when reading the above that the position for each eye is independently tracked, and as you might guess, independent frames are rendered for each eye.</p>
<p>You can verify this by looking at that project’s vertex shader (seen below) and note that each eye gets its own matrix transforms:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode glsl"><code class="sourceCode glsl"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="pp">#define NUM_VIEWS 2</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="pp">#define VIEW_ID gl_ViewID_OVR</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="pp">#extension GL_OVR_multiview2 : require</span></span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="kw">layout</span><span class="op">(</span>num_views<span class="op">=</span>NUM_VIEWS<span class="op">)</span> <span class="dt">in</span><span class="op">;</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a><span class="dt">in</span> <span class="dt">vec3</span> vertexPosition<span class="op">;</span></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a><span class="dt">in</span> <span class="dt">vec4</span> vertexColor<span class="op">;</span></span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a><span class="kw">uniform</span> <span class="dt">mat4</span> ModelMatrix<span class="op">;</span></span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a><span class="kw">uniform</span> <span class="dt">vec4</span> ColorScale<span class="op">;</span></span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a><span class="kw">uniform</span> <span class="dt">vec4</span> ColorBias<span class="op">;</span></span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a><span class="kw">uniform</span> SceneMatrices</span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a>    <span class="kw">uniform</span> <span class="dt">mat4</span> ViewMatrix<span class="op">[</span>NUM_VIEWS<span class="op">];</span></span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a>    <span class="kw">uniform</span> <span class="dt">mat4</span> ProjectionMatrix<span class="op">[</span>NUM_VIEWS<span class="op">];</span></span>
<span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a><span class="op">}</span> sm<span class="op">;</span></span>
<span id="cb2-15"><a href="#cb2-15" aria-hidden="true" tabindex="-1"></a><span class="dt">out</span> <span class="dt">vec4</span> fragmentColor<span class="op">;</span></span>
<span id="cb2-16"><a href="#cb2-16" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> <span class="fu">main</span><span class="op">()</span></span>
<span id="cb2-17"><a href="#cb2-17" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span>
<span id="cb2-18"><a href="#cb2-18" aria-hidden="true" tabindex="-1"></a>    <span class="bu">gl_Position</span> <span class="op">=</span> sm<span class="op">.</span><span class="fu">ProjectionMatrix</span><span class="op">[</span>VIEW_ID<span class="op">]</span> <span class="op">*</span> <span class="op">(</span> sm<span class="op">.</span><span class="fu">ViewMatrix</span><span class="op">[</span>VIEW_ID<span class="op">]</span> <span class="op">*</span> <span class="op">(</span> ModelMatrix <span class="op">*</span> <span class="dt">vec4</span><span class="op">(</span> vertexPosition<span class="op">,</span> <span class="fl">1.0</span> <span class="op">)</span> <span class="op">)</span> <span class="op">);</span></span>
<span id="cb2-19"><a href="#cb2-19" aria-hidden="true" tabindex="-1"></a>    fragmentColor <span class="op">=</span> vertexColor <span class="op">*</span> ColorScale <span class="op">+</span> ColorBias<span class="op">;</span></span>
<span id="cb2-20"><a href="#cb2-20" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
<p>In hindsight, it should be obvious that this is something you do for VR, but I hadn’t really thought about it. I find that the code conveys the concept more clearly and completely than words.</p>
<h2 id="structure-of-sample-projects">Structure of Sample Projects</h2>
<p>Below is an overview of some representative files in a sample project. To build one of the sample projects, you can grab a copy of Android Studio Bumblebee (2021.1.1) or earlier, and open settings.gradle. I ran projects in that IDE and worked on them in a separate text editor.</p>
<pre><code>XrVirtualKeyboard
├── Projects
│   └── Android
│       ├── AndroidManifest.xml
│       ├── build
│       ├── build.bat
│       ├── build.gradle
│       ├── build.py
│       ├── jni
│       │   ├── Android.mk
│       │   └── Application.mk
│       ├── local.properties
│       └── settings.gradle
├── Src
│   ├── VirtualKeyboardModelRenderer.cpp
│   ├── VirtualKeyboardModelRenderer.h
│   ├── XrHandHelper.h
│   ├── XrHelper.h
│   ├── XrRenderModelHelper.cpp
│   ├── XrRenderModelHelper.h
│   ├── XrVirtualKeyboardHelper.cpp
│   ├── XrVirtualKeyboardHelper.h
│   └── main.cpp
├── assets
│   └── panel.ktx
├── java
│   └── MainActivity.java
└── res
    └── values
        └── strings.xml</code></pre>
<p>Android.mk is the Makefile that you’ll be most interested in. When you add source code or libraries, you’ll be changing this file. These files generally look like this:</p>
<pre><code>LOCAL_PATH := $(call my-dir)
include $(CLEAR_VARS)

LOCAL_MODULE := xrvirtualkeyboard

include ../../../../cflags.mk

LOCAL_C_INCLUDES := \
                    $(LOCAL_PATH)/../../../../../SampleCommon/Src \
                    $(LOCAL_PATH)/../../../../../SampleXrFramework/Src \
                    $(LOCAL_PATH)/../../../../../1stParty/OVR/Include \
                    $(LOCAL_PATH)/../../../../../1stParty/utilities/include \
                    $(LOCAL_PATH)/../../../../../3rdParty/stb/src \
                    $(LOCAL_PATH)/../../../../../3rdParty/khronos/openxr/OpenXR-SDK/include \
                    $(LOCAL_PATH)/../../../../../3rdParty/khronos/openxr/OpenXR-SDK/src/common

#
LOCAL_SRC_FILES     :=  ../../../Src/main.cpp \
                        ../../../Src/VirtualKeyboardModelRenderer.cpp \
                        ../../../Src/XrRenderModelHelper.cpp \
                        ../../../Src/XrVirtualKeyboardHelper.cpp \

# include default libraries
LOCAL_LDLIBS            := -llog -landroid -lGLESv3 -lEGL
LOCAL_STATIC_LIBRARIES  := samplexrframework
LOCAL_SHARED_LIBRARIES  := openxr_loader

include $(BUILD_SHARED_LIBRARY)

$(call import-module,SampleXrFramework/Projects/Android/jni)
$(call import-module,OpenXR/Projects/AndroidPrebuilt/jni)</code></pre>
<p>The entry point of each project contains OpenGL and OXR initialization before entering the event loop.</p>
<p>Graphics initialization is done with EGL in the same manner as on other embedded systems (or the desktop):</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="kw">struct</span> ovrEgl <span class="op">{</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>    <span class="dt">void</span> Clear<span class="op">();</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>    <span class="dt">void</span> CreateContext<span class="op">(</span><span class="at">const</span> ovrEgl<span class="op">*</span> shareEgl<span class="op">);</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>    <span class="dt">void</span> DestroyContext<span class="op">();</span></span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a><span class="pp">#if defined(XR_USE_GRAPHICS_API_OPENGL_ES)</span></span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a>    EGLint MajorVersion<span class="op">;</span></span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a>    EGLint MinorVersion<span class="op">;</span></span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a>    EGLDisplay Display<span class="op">;</span></span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a>    EGLConfig Config<span class="op">;</span></span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a>    EGLSurface TinySurface<span class="op">;</span></span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a>    EGLSurface MainSurface<span class="op">;</span></span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a>    EGLContext Context<span class="op">;</span></span>
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a><span class="pp">#elif defined(XR_USE_GRAPHICS_API_OPENGL)</span></span>
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a>    HDC hDC<span class="op">;</span></span>
<span id="cb5-15"><a href="#cb5-15" aria-hidden="true" tabindex="-1"></a>    HGLRC hGLRC<span class="op">;</span></span>
<span id="cb5-16"><a href="#cb5-16" aria-hidden="true" tabindex="-1"></a><span class="pp">#endif </span><span class="co">//</span></span>
<span id="cb5-17"><a href="#cb5-17" aria-hidden="true" tabindex="-1"></a><span class="op">};</span></span>
<span id="cb5-18"><a href="#cb5-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-19"><a href="#cb5-19" aria-hidden="true" tabindex="-1"></a><span class="co">// ...</span></span>
<span id="cb5-20"><a href="#cb5-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-21"><a href="#cb5-21" aria-hidden="true" tabindex="-1"></a><span class="pp">#if defined(XR_USE_GRAPHICS_API_OPENGL_ES)</span></span>
<span id="cb5-22"><a href="#cb5-22" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> ovrEgl<span class="op">::</span>CreateContext<span class="op">(</span><span class="at">const</span> ovrEgl<span class="op">*</span> shareEgl<span class="op">)</span> <span class="op">{</span></span>
<span id="cb5-23"><a href="#cb5-23" aria-hidden="true" tabindex="-1"></a>    <span class="cf">if</span> <span class="op">(</span>Display <span class="op">!=</span> <span class="dv">0</span><span class="op">)</span> <span class="op">{</span></span>
<span id="cb5-24"><a href="#cb5-24" aria-hidden="true" tabindex="-1"></a>        <span class="cf">return</span><span class="op">;</span></span>
<span id="cb5-25"><a href="#cb5-25" aria-hidden="true" tabindex="-1"></a>    <span class="op">}</span></span>
<span id="cb5-26"><a href="#cb5-26" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb5-27"><a href="#cb5-27" aria-hidden="true" tabindex="-1"></a>    Display <span class="op">=</span> eglGetDisplay<span class="op">(</span>EGL_DEFAULT_DISPLAY<span class="op">);</span></span>
<span id="cb5-28"><a href="#cb5-28" aria-hidden="true" tabindex="-1"></a>    ALOGV<span class="op">(</span><span class="st">&quot;        eglInitialize( Display, &amp;MajorVersion, &amp;MinorVersion )&quot;</span><span class="op">);</span></span>
<span id="cb5-29"><a href="#cb5-29" aria-hidden="true" tabindex="-1"></a>    eglInitialize<span class="op">(</span>Display<span class="op">,</span> <span class="op">&amp;</span>MajorVersion<span class="op">,</span> <span class="op">&amp;</span>MinorVersion<span class="op">);</span></span>
<span id="cb5-30"><a href="#cb5-30" aria-hidden="true" tabindex="-1"></a>    <span class="co">// Do NOT use eglChooseConfig, because the Android EGL code pushes in multisample</span></span>
<span id="cb5-31"><a href="#cb5-31" aria-hidden="true" tabindex="-1"></a>    <span class="co">// flags in eglChooseConfig if the user has selected the &quot;force 4x MSAA&quot; option in</span></span>
<span id="cb5-32"><a href="#cb5-32" aria-hidden="true" tabindex="-1"></a>    <span class="co">// settings, and that is completely wasted for our warp target.</span></span>
<span id="cb5-33"><a href="#cb5-33" aria-hidden="true" tabindex="-1"></a>    <span class="at">const</span> <span class="dt">int</span> MAX_CONFIGS <span class="op">=</span> <span class="dv">1024</span><span class="op">;</span></span>
<span id="cb5-34"><a href="#cb5-34" aria-hidden="true" tabindex="-1"></a>    EGLConfig configs<span class="op">[</span>MAX_CONFIGS<span class="op">];</span></span>
<span id="cb5-35"><a href="#cb5-35" aria-hidden="true" tabindex="-1"></a>    EGLint numConfigs <span class="op">=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb5-36"><a href="#cb5-36" aria-hidden="true" tabindex="-1"></a>    <span class="cf">if</span> <span class="op">(</span>eglGetConfigs<span class="op">(</span>Display<span class="op">,</span> configs<span class="op">,</span> MAX_CONFIGS<span class="op">,</span> <span class="op">&amp;</span>numConfigs<span class="op">)</span> <span class="op">==</span> EGL_FALSE<span class="op">)</span> <span class="op">{</span></span>
<span id="cb5-37"><a href="#cb5-37" aria-hidden="true" tabindex="-1"></a>        ALOGE<span class="op">(</span><span class="st">&quot;        eglGetConfigs() failed: </span><span class="sc">%s</span><span class="st">&quot;</span><span class="op">,</span> EglErrorString<span class="op">(</span>eglGetError<span class="op">()));</span></span>
<span id="cb5-38"><a href="#cb5-38" aria-hidden="true" tabindex="-1"></a>        <span class="cf">return</span><span class="op">;</span></span>
<span id="cb5-39"><a href="#cb5-39" aria-hidden="true" tabindex="-1"></a>    <span class="op">}</span></span>
<span id="cb5-40"><a href="#cb5-40" aria-hidden="true" tabindex="-1"></a>    <span class="co">// blah blah</span></span></code></pre></div>
<p>Logging is done with a macro that delegates to the NDK’s logging facilities when the code is running on Android; otherwise it falls back on printf.</p>
<pre><code>ALOGV(&quot;Creating passthrough layer&quot;);</code></pre>
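<p>A minimal sketch of how such a macro could be put together, assuming the usual <code>__ANDROID__</code> compiler define and the NDK’s <code>__android_log_print</code>. The names <code>SKETCH_LOGV</code> and <code>sketch_printf_line</code> and the <code>"sketch"</code> log tag are made up for illustration; this is not the SDK’s actual definition.</p>

```cpp
#include <cstdarg>
#include <cstdio>

#if defined(__ANDROID__)
#include <android/log.h>
// On device, forward to the NDK logger so messages show up in logcat.
#define SKETCH_LOGV(...) \
    __android_log_print(ANDROID_LOG_VERBOSE, "sketch", __VA_ARGS__)
#else
// Hypothetical fallback helper: printf-style formatting plus a newline.
// Returns the number of characters written, not counting the newline.
static int sketch_printf_line(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int written = std::vprintf(fmt, args);
    va_end(args);
    std::putchar('\n');
    return written;
}
#define SKETCH_LOGV(...) sketch_printf_line(__VA_ARGS__)
#endif
```

<p>Call sites look the same on both platforms, for example <code>SKETCH_LOGV("Creating %s layer", "passthrough")</code>.</p>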
<p>OpenGL function calls follow the common idiom of being wrapped in a macro that executes the OpenGL function, checks the global error flag afterwards, and prints an error message if the flag is set.</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>GL<span class="op">(</span>glBindBuffer<span class="op">(</span>GL_ARRAY_BUFFER<span class="op">,</span> VertexBuffer<span class="op">));</span></span></code></pre></div>
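<p>As a rough sketch of that idiom (not the SDK’s actual macro): <code>SKETCH_GL</code>, <code>check_errors</code>, <code>fake_get_error</code> and <code>fake_bind_buffer</code> are names made up here, and the fake functions stand in for <code>glGetError</code> and a real GL call so the snippet compiles without a GL context.</p>

```cpp
#include <cstdio>

// Stand-in for the GL error state. In a real build you would call
// glGetError() and compare against GL_NO_ERROR instead.
static unsigned int g_fake_error = 0;
static int g_reported = 0; // counts reported errors, only for demonstration

// Like glGetError(): returns the current flag and clears it.
static unsigned int fake_get_error() {
    unsigned int e = g_fake_error;
    g_fake_error = 0;
    return e;
}

// Drain the error state; a loop is used because GL may record several flags.
static void check_errors(const char* call, const char* file, int line) {
    for (unsigned int e = fake_get_error(); e != 0; e = fake_get_error()) {
        std::fprintf(stderr, "%s:%d: %s failed with error 0x%04x\n",
                     file, line, call, e);
        ++g_reported;
    }
}

// Run the call, then check for errors, reporting the call's source text.
#define SKETCH_GL(call)                             \
    do {                                            \
        call;                                       \
        check_errors(#call, __FILE__, __LINE__);    \
    } while (0)

// Hypothetical GL-like function that can be told to fail.
static void fake_bind_buffer(bool fail) {
    if (fail) {
        g_fake_error = 0x0502; // numeric value of GL_INVALID_OPERATION
    }
}
```

<p>The <code>do { ... } while (0)</code> wrapper lets the macro be used like a single statement, including inside an unbraced <code>if</code>.</p>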
<p>Code that calls into OXR functionality works similarly to OpenGL in a few ways.</p>
<p><code>OXR</code> is an error checking macro that behaves in the same manner as <code>GL</code>.</p>
<p>And like with OpenGL, functions from the OpenXR specification are dynamically linked at runtime. This means you won’t find their implementation in the SDK; it lives in the OpenXR runtime installed on the device, which is shared by all applications.</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>OXR<span class="op">(</span>xrLocateViews<span class="op">(</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>    app<span class="op">.</span>Session<span class="op">,</span></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a>    <span class="op">&amp;</span>projectionInfo<span class="op">,</span></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a>    <span class="op">&amp;</span>viewState<span class="op">,</span></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a>    projectionCapacityInput<span class="op">,</span></span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a>    <span class="op">&amp;</span>projectionCountOutput<span class="op">,</span></span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a>    projections<span class="op">));</span></span></code></pre></div>
<h2 id="making-modifications-to-sample-projects">Making Modifications to Sample Projects</h2>
<p>Some codebases can be very daunting, and it can take a long time before you feel comfortable making changes.</p>
<p>In this case, after skimming the code for a few of the samples, I was able to dive in and start poking at stuff.</p>
<p>One of the first things I did was to modify the spatial anchors project shown above to change the dimensions of the anchors and to draw a texture on one side of the cube.</p>
<center>
<figure>
<video width=75% controls autoplay>
<source src="/videos/spatial2.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
</figure>
</center>
<p>Here is what the changes look like:</p>
<p>The dimensions and colors were single-line changes in the event loop.</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode diff"><code class="sourceCode diff"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="st">- persistedCube.Model *= OVR::Matrix4f::Scaling(0.01f, 0.01f, 0.05f);</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="va">+ persistedCube.Model *= OVR::Matrix4f::Scaling(0.05f, 0.05f, 0.005f);</span></span></code></pre></div>
<div class="sourceCode" id="cb10"><pre class="sourceCode diff"><code class="sourceCode diff"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="st">- persistedCube.ColorBias = OVR::Vector4f(1, 0.5, 0, 1); // Orange</span></span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="va">+ persistedCube.ColorBias = OVR::Vector4f(0.3, 0.2, 0.1, 1); // Brown</span></span></code></pre></div>
<p>I dropped stb_image.h into the project and added some code that loads textures from images on the device and maintains a pool of the loaded textures which is cycled through during rendering.</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im">&quot;TexturePool.h&quot;</span></span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im">&lt;android/log.h&gt;</span></span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="pp">#define STB_IMAGE_IMPLEMENTATION</span></span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im">&quot;stb_image.h&quot;</span></span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im">&lt;android/asset_manager_jni.h&gt;</span></span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a><span class="pp">#include </span><span class="im">&lt;android/asset_manager.h&gt;</span></span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a><span class="bu">std::</span>vector<span class="op">&lt;</span><span class="bu">std::</span>string<span class="op">&gt;</span> texture_filenames <span class="op">=</span> <span class="op">{</span><span class="st">&quot;NASA1.jpg&quot;</span><span class="op">,</span> <span class="st">&quot;NASA2.jpg&quot;</span><span class="op">,</span> <span class="st">&quot;NASA3.jpg&quot;</span><span class="op">,</span> <span class="st">&quot;NASA4.jpg&quot;</span><span class="op">,</span> <span class="st">&quot;NASA5.jpg&quot;</span><span class="op">,</span> <span class="st">&quot;NASA6.jpg&quot;</span><span class="op">,</span> <span class="st">&quot;NASA7.jpg&quot;</span><span class="op">,</span> <span class="st">&quot;NASA8.jpg&quot;</span>  <span class="op">};</span></span>
<span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-10"><a href="#cb11-10" aria-hidden="true" tabindex="-1"></a><span class="dt">size_t</span> texture_pointer <span class="op">=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb11-11"><a href="#cb11-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-12"><a href="#cb11-12" aria-hidden="true" tabindex="-1"></a><span class="bu">std::</span>vector<span class="op">&lt;</span>GLuint<span class="op">&gt;</span> texture_ids<span class="op">;</span></span>
<span id="cb11-13"><a href="#cb11-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-14"><a href="#cb11-14" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> texture_pool_init<span class="op">(</span><span class="bu">std::</span>vector<span class="op">&lt;</span><span class="bu">std::</span>string<span class="op">&gt;</span> filenames<span class="op">,</span> AAssetManager <span class="op">*</span>amgr<span class="op">)</span> <span class="op">{</span></span>
<span id="cb11-15"><a href="#cb11-15" aria-hidden="true" tabindex="-1"></a>	<span class="cf">for</span> <span class="op">(</span><span class="bu">std::</span>string filename <span class="op">:</span> filenames<span class="op">)</span> <span class="op">{</span></span>
<span id="cb11-16"><a href="#cb11-16" aria-hidden="true" tabindex="-1"></a>        GLuint texture_id<span class="op">;</span></span>
<span id="cb11-17"><a href="#cb11-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-18"><a href="#cb11-18" aria-hidden="true" tabindex="-1"></a>        GL<span class="op">(</span>glGenTextures<span class="op">(</span><span class="dv">1</span><span class="op">,</span> <span class="op">&amp;</span>texture_id<span class="op">));</span></span>
<span id="cb11-19"><a href="#cb11-19" aria-hidden="true" tabindex="-1"></a>        texture_ids<span class="op">.</span>push_back<span class="op">(</span>texture_id<span class="op">);</span></span>
<span id="cb11-20"><a href="#cb11-20" aria-hidden="true" tabindex="-1"></a>        GL<span class="op">(</span>glBindTexture<span class="op">(</span>GL_TEXTURE_2D<span class="op">,</span> texture_id<span class="op">));</span></span>
<span id="cb11-21"><a href="#cb11-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-22"><a href="#cb11-22" aria-hidden="true" tabindex="-1"></a>        GL<span class="op">(</span>glTexParameteri<span class="op">(</span>GL_TEXTURE_2D<span class="op">,</span> GL_TEXTURE_MIN_FILTER<span class="op">,</span></span>
<span id="cb11-23"><a href="#cb11-23" aria-hidden="true" tabindex="-1"></a>                           GL_LINEAR_MIPMAP_LINEAR<span class="op">));</span></span>
<span id="cb11-24"><a href="#cb11-24" aria-hidden="true" tabindex="-1"></a>        GL<span class="op">(</span>glTexParameteri<span class="op">(</span>GL_TEXTURE_2D<span class="op">,</span> GL_TEXTURE_MAG_FILTER<span class="op">,</span> GL_LINEAR<span class="op">));</span></span>
<span id="cb11-25"><a href="#cb11-25" aria-hidden="true" tabindex="-1"></a>        GL<span class="op">(</span>glTexParameteri<span class="op">(</span>GL_TEXTURE_2D<span class="op">,</span> GL_TEXTURE_WRAP_S<span class="op">,</span> GL_REPEAT<span class="op">));</span></span>
<span id="cb11-26"><a href="#cb11-26" aria-hidden="true" tabindex="-1"></a>        GL<span class="op">(</span>glTexParameteri<span class="op">(</span>GL_TEXTURE_2D<span class="op">,</span> GL_TEXTURE_WRAP_T<span class="op">,</span> GL_REPEAT<span class="op">));</span></span>
<span id="cb11-27"><a href="#cb11-27" aria-hidden="true" tabindex="-1"></a>        GLenum format<span class="op">;</span></span>
<span id="cb11-28"><a href="#cb11-28" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-29"><a href="#cb11-29" aria-hidden="true" tabindex="-1"></a>		AAsset<span class="op">*</span> asset <span class="op">=</span> AAssetManager_open<span class="op">(</span>amgr<span class="op">,</span> filename<span class="op">.</span>c_str<span class="op">(),</span> AASSET_MODE_BUFFER<span class="op">);</span></span>
<span id="cb11-30"><a href="#cb11-30" aria-hidden="true" tabindex="-1"></a>		<span class="cf">if</span> <span class="op">(</span>asset <span class="op">==</span> NULL<span class="op">)</span> <span class="op">{</span></span>
<span id="cb11-31"><a href="#cb11-31" aria-hidden="true" tabindex="-1"></a>			ALOGE<span class="op">(</span><span class="st">&quot;Failed to open </span><span class="sc">%s</span><span class="st">&quot;</span><span class="op">,</span> filename<span class="op">.</span>c_str<span class="op">());</span></span>
<span id="cb11-32"><a href="#cb11-32" aria-hidden="true" tabindex="-1"></a>			<span class="cf">continue</span><span class="op">;</span> <span class="co">// skip this file; AAsset_getBuffer(NULL) below would crash</span></span>
<span id="cb11-33"><a href="#cb11-33" aria-hidden="true" tabindex="-1"></a>		<span class="op">}</span></span>
<span id="cb11-34"><a href="#cb11-34" aria-hidden="true" tabindex="-1"></a>		ALOGV<span class="op">(</span><span class="st">&quot;opened file </span><span class="sc">%s</span><span class="st">&quot;</span><span class="op">,</span> filename<span class="op">.</span>c_str<span class="op">());</span></span>
<span id="cb11-35"><a href="#cb11-35" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-36"><a href="#cb11-36" aria-hidden="true" tabindex="-1"></a>		<span class="dt">int</span> width<span class="op">,</span> height<span class="op">,</span> channels<span class="op">;</span></span>
<span id="cb11-37"><a href="#cb11-37" aria-hidden="true" tabindex="-1"></a>        stbi_set_flip_vertically_on_load<span class="op">(</span><span class="kw">true</span><span class="op">);</span></span>
<span id="cb11-38"><a href="#cb11-38" aria-hidden="true" tabindex="-1"></a>		<span class="dt">void</span> <span class="op">*</span>data <span class="op">=</span> stbi_load_from_memory<span class="op">((</span><span class="dt">unsigned</span> <span class="dt">char</span><span class="op">*)</span>AAsset_getBuffer<span class="op">(</span>asset<span class="op">),</span> AAsset_getLength<span class="op">(</span>asset<span class="op">),</span> <span class="op">&amp;</span>width<span class="op">,</span> <span class="op">&amp;</span>height<span class="op">,</span> <span class="op">&amp;</span>channels<span class="op">,</span> <span class="dv">0</span><span class="op">);</span></span>
<span id="cb11-39"><a href="#cb11-39" aria-hidden="true" tabindex="-1"></a>		<span class="cf">if</span> <span class="op">(</span>data <span class="op">==</span> NULL<span class="op">)</span> <span class="op">{</span></span>
<span id="cb11-40"><a href="#cb11-40" aria-hidden="true" tabindex="-1"></a>			ALOGE<span class="op">(</span><span class="st">&quot;Failed to load </span><span class="sc">%s</span><span class="st">&quot;</span><span class="op">,</span> filename<span class="op">.</span>c_str<span class="op">());</span></span>
<span id="cb11-41"><a href="#cb11-41" aria-hidden="true" tabindex="-1"></a>			AAsset_close<span class="op">(</span>asset<span class="op">);</span></span>
<span id="cb11-42"><a href="#cb11-42" aria-hidden="true" tabindex="-1"></a>			<span class="cf">continue</span><span class="op">;</span> <span class="co">// channels is not set when decoding fails</span></span>
<span id="cb11-43"><a href="#cb11-43" aria-hidden="true" tabindex="-1"></a>		<span class="op">}</span></span>
<span id="cb11-44"><a href="#cb11-44" aria-hidden="true" tabindex="-1"></a>		ALOGV<span class="op">(</span><span class="st">&quot;loaded file </span><span class="sc">%s</span><span class="st">&quot;</span><span class="op">,</span> filename<span class="op">.</span>c_str<span class="op">());</span></span>
<span id="cb11-45"><a href="#cb11-45" aria-hidden="true" tabindex="-1"></a>        <span class="cf">if</span> <span class="op">(</span>channels <span class="op">==</span> <span class="dv">3</span><span class="op">)</span> <span class="op">{</span></span>
<span id="cb11-46"><a href="#cb11-46" aria-hidden="true" tabindex="-1"></a>            format <span class="op">=</span> GL_RGB<span class="op">;</span></span>
<span id="cb11-47"><a href="#cb11-47" aria-hidden="true" tabindex="-1"></a>        <span class="op">}</span> <span class="cf">else</span> <span class="cf">if</span> <span class="op">(</span>channels <span class="op">==</span> <span class="dv">4</span><span class="op">)</span> <span class="op">{</span></span>
<span id="cb11-48"><a href="#cb11-48" aria-hidden="true" tabindex="-1"></a>            format <span class="op">=</span> GL_RGBA<span class="op">;</span></span>
<span id="cb11-49"><a href="#cb11-49" aria-hidden="true" tabindex="-1"></a>        <span class="op">}</span> <span class="cf">else</span> <span class="op">{</span></span>
<span id="cb11-50"><a href="#cb11-50" aria-hidden="true" tabindex="-1"></a>            fprintf<span class="op">(</span>stderr<span class="op">,</span> <span class="st">&quot;Unsupported number of channels: </span><span class="sc">%d\n</span><span class="st">&quot;</span><span class="op">,</span> channels<span class="op">);</span></span>
<span id="cb11-51"><a href="#cb11-51" aria-hidden="true" tabindex="-1"></a>            exit<span class="op">(</span>EXIT_FAILURE<span class="op">);</span></span>
<span id="cb11-52"><a href="#cb11-52" aria-hidden="true" tabindex="-1"></a>        <span class="op">}</span></span>
<span id="cb11-53"><a href="#cb11-53" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-54"><a href="#cb11-54" aria-hidden="true" tabindex="-1"></a>        GL<span class="op">(</span>glTexImage2D<span class="op">(</span>GL_TEXTURE_2D<span class="op">,</span> <span class="dv">0</span><span class="op">,</span> format<span class="op">,</span> width<span class="op">,</span> height<span class="op">,</span> <span class="dv">0</span><span class="op">,</span> format<span class="op">,</span> GL_UNSIGNED_BYTE<span class="op">,</span></span>
<span id="cb11-55"><a href="#cb11-55" aria-hidden="true" tabindex="-1"></a>                        data<span class="op">));</span></span>
<span id="cb11-56"><a href="#cb11-56" aria-hidden="true" tabindex="-1"></a>        GL<span class="op">(</span>glGenerateMipmap<span class="op">(</span>GL_TEXTURE_2D<span class="op">));</span></span>
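<span id="cb11-57"><a href="#cb11-57" aria-hidden="true" tabindex="-1"></a>        stbi_image_free<span class="op">(</span>data<span class="op">);</span></span>
<span id="cb11-58"><a href="#cb11-58" aria-hidden="true" tabindex="-1"></a>        AAsset_close<span class="op">(</span>asset<span class="op">);</span> <span class="co">// release the asset now that the pixels are uploaded</span></span>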
<span id="cb11-59"><a href="#cb11-59" aria-hidden="true" tabindex="-1"></a>	<span class="op">}</span></span>
<span id="cb11-60"><a href="#cb11-60" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span>
<span id="cb11-61"><a href="#cb11-61" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-62"><a href="#cb11-62" aria-hidden="true" tabindex="-1"></a><span class="co">// reset at the start of each frame to keep the same rendering order</span></span>
<span id="cb11-63"><a href="#cb11-63" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> reset_texture_pointer<span class="op">()</span> <span class="op">{</span></span>
<span id="cb11-64"><a href="#cb11-64" aria-hidden="true" tabindex="-1"></a>    ALOGV<span class="op">(</span><span class="st">&quot;resetting texture from </span><span class="sc">%zu</span><span class="st"> to 0&quot;</span><span class="op">,</span> texture_pointer<span class="op">);</span></span>
<span id="cb11-65"><a href="#cb11-65" aria-hidden="true" tabindex="-1"></a>    texture_pointer <span class="op">=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb11-66"><a href="#cb11-66" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span>
<span id="cb11-67"><a href="#cb11-67" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb11-68"><a href="#cb11-68" aria-hidden="true" tabindex="-1"></a><span class="co">// if more textures are requested than are available, wrap around</span></span>
<span id="cb11-69"><a href="#cb11-69" aria-hidden="true" tabindex="-1"></a>GLuint next_texture<span class="op">()</span> <span class="op">{</span></span>
<span id="cb11-70"><a href="#cb11-70" aria-hidden="true" tabindex="-1"></a>    <span class="dt">size_t</span> last_texture_idx <span class="op">=</span> texture_pointer<span class="op">;</span></span>
<span id="cb11-71"><a href="#cb11-71" aria-hidden="true" tabindex="-1"></a>    <span class="dt">size_t</span> next_texture_idx <span class="op">=</span> <span class="op">(</span>texture_pointer <span class="op">+</span> <span class="dv">1</span><span class="op">)</span> <span class="op">%</span> texture_ids<span class="op">.</span>size<span class="op">();</span></span>
<span id="cb11-72"><a href="#cb11-72" aria-hidden="true" tabindex="-1"></a>    texture_pointer <span class="op">=</span> next_texture_idx<span class="op">;</span></span>
<span id="cb11-73"><a href="#cb11-73" aria-hidden="true" tabindex="-1"></a>    <span class="cf">return</span> texture_ids<span class="op">[</span>last_texture_idx<span class="op">];</span></span>
<span id="cb11-74"><a href="#cb11-74" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
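<p>The wrap-around behavior is easy to check in isolation. Here is a small sketch that mirrors <code>reset_texture_pointer</code> and <code>next_texture</code> but uses plain integers instead of GL texture names; <code>RoundRobinPool</code> and its members are hypothetical names, not part of the project.</p>

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the texture pool: hands out ids in order and
// wraps back to the first one when it runs past the end.
struct RoundRobinPool {
    std::vector<unsigned int> ids;
    std::size_t cursor = 0;

    // Call at the start of each frame so every frame hands out the same order.
    void reset() { cursor = 0; }

    // Precondition: ids is not empty (the modulo would divide by zero).
    unsigned int next() {
        unsigned int id = ids[cursor];
        cursor = (cursor + 1) % ids.size();
        return id;
    }
};
```

<p>With three ids in the pool, a fourth call to <code>next()</code> returns the first id again, and <code>reset()</code> restarts the sequence from the beginning.</p>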
<p>I added UVs to the stored vertices, and split out the indices into two groups: one group to be rendered with a set of shaders that use colors, and another group to be rendered with a set of shaders that use textures.</p>
<div class="sourceCode" id="cb12"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="dt">void</span> ovrGeometry<span class="op">::</span>CreateCube<span class="op">()</span></span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="op">{</span></span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a>    <span class="kw">struct</span> ovrCubeVertices</span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a>    <span class="op">{</span></span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a>        <span class="dt">signed</span> <span class="dt">char</span> positions<span class="op">[</span><span class="dv">8</span><span class="op">][</span><span class="dv">4</span><span class="op">];</span></span>
<span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a>        <span class="dt">unsigned</span> <span class="dt">char</span> colors<span class="op">[</span><span class="dv">8</span><span class="op">][</span><span class="dv">4</span><span class="op">];</span></span>
<span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a>        <span class="dt">float</span> uvs<span class="op">[</span><span class="dv">8</span><span class="op">][</span><span class="dv">2</span><span class="op">];</span></span>
<span id="cb12-9"><a href="#cb12-9" aria-hidden="true" tabindex="-1"></a>    <span class="op">};</span></span>
<span id="cb12-10"><a href="#cb12-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-11"><a href="#cb12-11" aria-hidden="true" tabindex="-1"></a>    <span class="at">static</span> <span class="at">const</span> ovrCubeVertices cubeVertices <span class="op">=</span> <span class="op">{</span></span>
<span id="cb12-12"><a href="#cb12-12" aria-hidden="true" tabindex="-1"></a>        <span class="co">// positions</span></span>
<span id="cb12-13"><a href="#cb12-13" aria-hidden="true" tabindex="-1"></a>        <span class="op">{</span></span>
<span id="cb12-14"><a href="#cb12-14" aria-hidden="true" tabindex="-1"></a>            <span class="op">{-</span><span class="dv">127</span><span class="op">,</span> <span class="op">-</span><span class="dv">127</span><span class="op">,</span> <span class="op">-</span><span class="dv">127</span><span class="op">,</span> <span class="op">+</span><span class="dv">127</span><span class="op">},</span></span>
<span id="cb12-15"><a href="#cb12-15" aria-hidden="true" tabindex="-1"></a>            <span class="op">{+</span><span class="dv">127</span><span class="op">,</span> <span class="op">-</span><span class="dv">127</span><span class="op">,</span> <span class="op">-</span><span class="dv">127</span><span class="op">,</span> <span class="op">+</span><span class="dv">127</span><span class="op">},</span></span>
<span id="cb12-16"><a href="#cb12-16" aria-hidden="true" tabindex="-1"></a>            <span class="op">{-</span><span class="dv">127</span><span class="op">,</span> <span class="op">+</span><span class="dv">127</span><span class="op">,</span> <span class="op">-</span><span class="dv">127</span><span class="op">,</span> <span class="op">+</span><span class="dv">127</span><span class="op">},</span></span>
<span id="cb12-17"><a href="#cb12-17" aria-hidden="true" tabindex="-1"></a>            <span class="op">{+</span><span class="dv">127</span><span class="op">,</span> <span class="op">+</span><span class="dv">127</span><span class="op">,</span> <span class="op">-</span><span class="dv">127</span><span class="op">,</span> <span class="op">+</span><span class="dv">127</span><span class="op">},</span></span>
<span id="cb12-18"><a href="#cb12-18" aria-hidden="true" tabindex="-1"></a>            <span class="op">{-</span><span class="dv">127</span><span class="op">,</span> <span class="op">-</span><span class="dv">127</span><span class="op">,</span> <span class="op">+</span><span class="dv">127</span><span class="op">,</span> <span class="op">+</span><span class="dv">127</span><span class="op">},</span></span>
<span id="cb12-19"><a href="#cb12-19" aria-hidden="true" tabindex="-1"></a>            <span class="op">{+</span><span class="dv">127</span><span class="op">,</span> <span class="op">-</span><span class="dv">127</span><span class="op">,</span> <span class="op">+</span><span class="dv">127</span><span class="op">,</span> <span class="op">+</span><span class="dv">127</span><span class="op">},</span></span>
<span id="cb12-20"><a href="#cb12-20" aria-hidden="true" tabindex="-1"></a>            <span class="op">{-</span><span class="dv">127</span><span class="op">,</span> <span class="op">+</span><span class="dv">127</span><span class="op">,</span> <span class="op">+</span><span class="dv">127</span><span class="op">,</span> <span class="op">+</span><span class="dv">127</span><span class="op">},</span></span>
<span id="cb12-21"><a href="#cb12-21" aria-hidden="true" tabindex="-1"></a>            <span class="op">{+</span><span class="dv">127</span><span class="op">,</span> <span class="op">+</span><span class="dv">127</span><span class="op">,</span> <span class="op">+</span><span class="dv">127</span><span class="op">,</span> <span class="op">+</span><span class="dv">127</span><span class="op">}},</span></span>
<span id="cb12-22"><a href="#cb12-22" aria-hidden="true" tabindex="-1"></a>        <span class="co">// colors</span></span>
<span id="cb12-23"><a href="#cb12-23" aria-hidden="true" tabindex="-1"></a>        <span class="op">{</span></span>
<span id="cb12-24"><a href="#cb12-24" aria-hidden="true" tabindex="-1"></a>            <span class="op">{</span><span class="bn">0x00</span><span class="op">,</span> <span class="bn">0x00</span><span class="op">,</span> <span class="bn">0x00</span><span class="op">,</span> <span class="bn">0xff</span><span class="op">},</span></span>
<span id="cb12-25"><a href="#cb12-25" aria-hidden="true" tabindex="-1"></a>            <span class="op">{</span><span class="bn">0xff</span><span class="op">,</span> <span class="bn">0x00</span><span class="op">,</span> <span class="bn">0x00</span><span class="op">,</span> <span class="bn">0xff</span><span class="op">},</span></span>
<span id="cb12-26"><a href="#cb12-26" aria-hidden="true" tabindex="-1"></a>            <span class="op">{</span><span class="bn">0x00</span><span class="op">,</span> <span class="bn">0xff</span><span class="op">,</span> <span class="bn">0x00</span><span class="op">,</span> <span class="bn">0xff</span><span class="op">},</span></span>
<span id="cb12-27"><a href="#cb12-27" aria-hidden="true" tabindex="-1"></a>            <span class="op">{</span><span class="bn">0xff</span><span class="op">,</span> <span class="bn">0xff</span><span class="op">,</span> <span class="bn">0x00</span><span class="op">,</span> <span class="bn">0xff</span><span class="op">},</span></span>
<span id="cb12-28"><a href="#cb12-28" aria-hidden="true" tabindex="-1"></a>            <span class="op">{</span><span class="bn">0x00</span><span class="op">,</span> <span class="bn">0x00</span><span class="op">,</span> <span class="bn">0xff</span><span class="op">,</span> <span class="bn">0xff</span><span class="op">},</span></span>
<span id="cb12-29"><a href="#cb12-29" aria-hidden="true" tabindex="-1"></a>            <span class="op">{</span><span class="bn">0xff</span><span class="op">,</span> <span class="bn">0x00</span><span class="op">,</span> <span class="bn">0xff</span><span class="op">,</span> <span class="bn">0xff</span><span class="op">},</span></span>
<span id="cb12-30"><a href="#cb12-30" aria-hidden="true" tabindex="-1"></a>            <span class="op">{</span><span class="bn">0x00</span><span class="op">,</span> <span class="bn">0xff</span><span class="op">,</span> <span class="bn">0xff</span><span class="op">,</span> <span class="bn">0xff</span><span class="op">},</span></span>
<span id="cb12-31"><a href="#cb12-31" aria-hidden="true" tabindex="-1"></a>            <span class="op">{</span><span class="bn">0xff</span><span class="op">,</span> <span class="bn">0xff</span><span class="op">,</span> <span class="bn">0xff</span><span class="op">,</span> <span class="bn">0xff</span><span class="op">}},</span></span>
<span id="cb12-32"><a href="#cb12-32" aria-hidden="true" tabindex="-1"></a>        <span class="co">// uvs</span></span>
<span id="cb12-33"><a href="#cb12-33" aria-hidden="true" tabindex="-1"></a>        <span class="op">{</span></span>
<span id="cb12-34"><a href="#cb12-34" aria-hidden="true" tabindex="-1"></a>            <span class="op">{</span><span class="fl">0.0</span><span class="bu">f</span><span class="op">,</span> <span class="fl">0.0</span><span class="bu">f</span><span class="op">},</span> <span class="co">// Bottom back left corner</span></span>
<span id="cb12-35"><a href="#cb12-35" aria-hidden="true" tabindex="-1"></a>            <span class="op">{</span><span class="fl">1.0</span><span class="bu">f</span><span class="op">,</span> <span class="fl">0.0</span><span class="bu">f</span><span class="op">},</span> <span class="co">// Bottom back right corner</span></span>
<span id="cb12-36"><a href="#cb12-36" aria-hidden="true" tabindex="-1"></a>            <span class="op">{</span><span class="fl">0.0</span><span class="bu">f</span><span class="op">,</span> <span class="fl">1.0</span><span class="bu">f</span><span class="op">},</span> <span class="co">// Top back left corner</span></span>
<span id="cb12-37"><a href="#cb12-37" aria-hidden="true" tabindex="-1"></a>            <span class="op">{</span><span class="fl">1.0</span><span class="bu">f</span><span class="op">,</span> <span class="fl">1.0</span><span class="bu">f</span><span class="op">},</span> <span class="co">// Top back right corner</span></span>
<span id="cb12-38"><a href="#cb12-38" aria-hidden="true" tabindex="-1"></a>            <span class="op">{</span><span class="fl">0.0</span><span class="bu">f</span><span class="op">,</span> <span class="fl">0.0</span><span class="bu">f</span><span class="op">},</span> <span class="co">// Bottom front left corner</span></span>
<span id="cb12-39"><a href="#cb12-39" aria-hidden="true" tabindex="-1"></a>            <span class="op">{</span><span class="fl">1.0</span><span class="bu">f</span><span class="op">,</span> <span class="fl">0.0</span><span class="bu">f</span><span class="op">},</span> <span class="co">// Bottom front right corner</span></span>
<span id="cb12-40"><a href="#cb12-40" aria-hidden="true" tabindex="-1"></a>            <span class="op">{</span><span class="fl">0.0</span><span class="bu">f</span><span class="op">,</span> <span class="fl">1.0</span><span class="bu">f</span><span class="op">},</span> <span class="co">// Top front left corner</span></span>
<span id="cb12-41"><a href="#cb12-41" aria-hidden="true" tabindex="-1"></a>            <span class="op">{</span><span class="fl">1.0</span><span class="bu">f</span><span class="op">,</span> <span class="fl">1.0</span><span class="bu">f</span><span class="op">},</span> <span class="co">// Top front right corner</span></span>
<span id="cb12-42"><a href="#cb12-42" aria-hidden="true" tabindex="-1"></a>        <span class="op">}</span></span>
<span id="cb12-43"><a href="#cb12-43" aria-hidden="true" tabindex="-1"></a>    <span class="op">};</span></span>
<span id="cb12-44"><a href="#cb12-44" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-45"><a href="#cb12-45" aria-hidden="true" tabindex="-1"></a>    <span class="at">static</span> <span class="at">const</span> <span class="dt">unsigned</span> <span class="dt">short</span> cube_indices<span class="op">[</span><span class="dv">30</span><span class="op">]</span> <span class="op">=</span> <span class="op">{</span></span>
<span id="cb12-46"><a href="#cb12-46" aria-hidden="true" tabindex="-1"></a>        <span class="dv">0</span><span class="op">,</span> <span class="dv">2</span><span class="op">,</span> <span class="dv">1</span><span class="op">,</span> <span class="dv">2</span><span class="op">,</span> <span class="dv">3</span><span class="op">,</span> <span class="dv">1</span><span class="op">,</span> <span class="co">// back</span></span>
<span id="cb12-47"><a href="#cb12-47" aria-hidden="true" tabindex="-1"></a>        <span class="dv">6</span><span class="op">,</span> <span class="dv">7</span><span class="op">,</span> <span class="dv">2</span><span class="op">,</span> <span class="dv">2</span><span class="op">,</span> <span class="dv">7</span><span class="op">,</span> <span class="dv">3</span><span class="op">,</span> <span class="co">// top</span></span>
<span id="cb12-48"><a href="#cb12-48" aria-hidden="true" tabindex="-1"></a>        <span class="dv">4</span><span class="op">,</span> <span class="dv">0</span><span class="op">,</span> <span class="dv">5</span><span class="op">,</span> <span class="dv">5</span><span class="op">,</span> <span class="dv">0</span><span class="op">,</span> <span class="dv">1</span><span class="op">,</span> <span class="co">// bottom</span></span>
<span id="cb12-49"><a href="#cb12-49" aria-hidden="true" tabindex="-1"></a>        <span class="dv">0</span><span class="op">,</span> <span class="dv">4</span><span class="op">,</span> <span class="dv">2</span><span class="op">,</span> <span class="dv">2</span><span class="op">,</span> <span class="dv">4</span><span class="op">,</span> <span class="dv">6</span><span class="op">,</span> <span class="co">// left</span></span>
<span id="cb12-50"><a href="#cb12-50" aria-hidden="true" tabindex="-1"></a>        <span class="dv">5</span><span class="op">,</span> <span class="dv">1</span><span class="op">,</span> <span class="dv">7</span><span class="op">,</span> <span class="dv">7</span><span class="op">,</span> <span class="dv">1</span><span class="op">,</span> <span class="dv">3</span>  <span class="co">// right</span></span>
<span id="cb12-51"><a href="#cb12-51" aria-hidden="true" tabindex="-1"></a>    <span class="op">};</span></span>
<span id="cb12-52"><a href="#cb12-52" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-53"><a href="#cb12-53" aria-hidden="true" tabindex="-1"></a>    <span class="at">static</span> <span class="at">const</span> <span class="dt">unsigned</span> <span class="dt">short</span> tex_indices<span class="op">[</span><span class="dv">6</span><span class="op">]</span> <span class="op">=</span> <span class="op">{</span></span>
<span id="cb12-54"><a href="#cb12-54" aria-hidden="true" tabindex="-1"></a>        <span class="dv">4</span><span class="op">,</span> <span class="dv">5</span><span class="op">,</span> <span class="dv">6</span><span class="op">,</span> <span class="dv">6</span><span class="op">,</span> <span class="dv">5</span><span class="op">,</span> <span class="dv">7</span><span class="op">,</span> <span class="co">// front</span></span>
<span id="cb12-55"><a href="#cb12-55" aria-hidden="true" tabindex="-1"></a>    <span class="op">};</span></span>
<span id="cb12-56"><a href="#cb12-56" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-57"><a href="#cb12-57" aria-hidden="true" tabindex="-1"></a>    VertexCount <span class="op">=</span> <span class="dv">8</span><span class="op">;</span></span>
<span id="cb12-58"><a href="#cb12-58" aria-hidden="true" tabindex="-1"></a>    ColorIndexCount <span class="op">=</span> <span class="dv">30</span><span class="op">;</span></span>
<span id="cb12-59"><a href="#cb12-59" aria-hidden="true" tabindex="-1"></a>    TextureIndexCount <span class="op">=</span> <span class="dv">6</span><span class="op">;</span></span>
<span id="cb12-60"><a href="#cb12-60" aria-hidden="true" tabindex="-1"></a>    IndexCount <span class="op">=</span> <span class="dv">0</span><span class="op">;</span></span>
<span id="cb12-61"><a href="#cb12-61" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-62"><a href="#cb12-62" aria-hidden="true" tabindex="-1"></a>    VertexAttribs<span class="op">[</span><span class="dv">0</span><span class="op">].</span>Index <span class="op">=</span> VERTEX_ATTRIBUTE_LOCATION_POSITION<span class="op">;</span></span>
<span id="cb12-63"><a href="#cb12-63" aria-hidden="true" tabindex="-1"></a>    VertexAttribs<span class="op">[</span><span class="dv">0</span><span class="op">].</span>Size <span class="op">=</span> <span class="dv">4</span><span class="op">;</span></span>
<span id="cb12-64"><a href="#cb12-64" aria-hidden="true" tabindex="-1"></a>    VertexAttribs<span class="op">[</span><span class="dv">0</span><span class="op">].</span>Type <span class="op">=</span> GL_BYTE<span class="op">;</span></span>
<span id="cb12-65"><a href="#cb12-65" aria-hidden="true" tabindex="-1"></a>    VertexAttribs<span class="op">[</span><span class="dv">0</span><span class="op">].</span>Normalized <span class="op">=</span> <span class="kw">true</span><span class="op">;</span></span>
<span id="cb12-66"><a href="#cb12-66" aria-hidden="true" tabindex="-1"></a>    VertexAttribs<span class="op">[</span><span class="dv">0</span><span class="op">].</span>Stride <span class="op">=</span> <span class="kw">sizeof</span><span class="op">(</span>cubeVertices<span class="op">.</span>positions<span class="op">[</span><span class="dv">0</span><span class="op">]);</span></span>
<span id="cb12-67"><a href="#cb12-67" aria-hidden="true" tabindex="-1"></a>    VertexAttribs<span class="op">[</span><span class="dv">0</span><span class="op">].</span>Pointer <span class="op">=</span> <span class="op">(</span><span class="at">const</span> GLvoid <span class="op">*)</span>offsetof<span class="op">(</span>ovrCubeVertices<span class="op">,</span> positions<span class="op">);</span></span>
<span id="cb12-68"><a href="#cb12-68" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-69"><a href="#cb12-69" aria-hidden="true" tabindex="-1"></a>    VertexAttribs<span class="op">[</span><span class="dv">1</span><span class="op">].</span>Index <span class="op">=</span> VERTEX_ATTRIBUTE_LOCATION_COLOR<span class="op">;</span></span>
<span id="cb12-70"><a href="#cb12-70" aria-hidden="true" tabindex="-1"></a>    VertexAttribs<span class="op">[</span><span class="dv">1</span><span class="op">].</span>Size <span class="op">=</span> <span class="dv">4</span><span class="op">;</span></span>
<span id="cb12-71"><a href="#cb12-71" aria-hidden="true" tabindex="-1"></a>    VertexAttribs<span class="op">[</span><span class="dv">1</span><span class="op">].</span>Type <span class="op">=</span> GL_UNSIGNED_BYTE<span class="op">;</span></span>
<span id="cb12-72"><a href="#cb12-72" aria-hidden="true" tabindex="-1"></a>    VertexAttribs<span class="op">[</span><span class="dv">1</span><span class="op">].</span>Normalized <span class="op">=</span> <span class="kw">true</span><span class="op">;</span></span>
<span id="cb12-73"><a href="#cb12-73" aria-hidden="true" tabindex="-1"></a>    VertexAttribs<span class="op">[</span><span class="dv">1</span><span class="op">].</span>Stride <span class="op">=</span> <span class="kw">sizeof</span><span class="op">(</span>cubeVertices<span class="op">.</span>colors<span class="op">[</span><span class="dv">0</span><span class="op">]);</span></span>
<span id="cb12-74"><a href="#cb12-74" aria-hidden="true" tabindex="-1"></a>    VertexAttribs<span class="op">[</span><span class="dv">1</span><span class="op">].</span>Pointer <span class="op">=</span> <span class="op">(</span><span class="at">const</span> GLvoid <span class="op">*)</span>offsetof<span class="op">(</span>ovrCubeVertices<span class="op">,</span> colors<span class="op">);</span></span>
<span id="cb12-75"><a href="#cb12-75" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-76"><a href="#cb12-76" aria-hidden="true" tabindex="-1"></a>    VertexAttribs<span class="op">[</span><span class="dv">2</span><span class="op">].</span>Index <span class="op">=</span> VERTEX_ATTRIBUTE_LOCATION_UV<span class="op">;</span> <span class="co">// UV attribute</span></span>
<span id="cb12-77"><a href="#cb12-77" aria-hidden="true" tabindex="-1"></a>    VertexAttribs<span class="op">[</span><span class="dv">2</span><span class="op">].</span>Size <span class="op">=</span> <span class="dv">2</span><span class="op">;</span></span>
<span id="cb12-78"><a href="#cb12-78" aria-hidden="true" tabindex="-1"></a>    VertexAttribs<span class="op">[</span><span class="dv">2</span><span class="op">].</span>Type <span class="op">=</span> GL_FLOAT<span class="op">;</span></span>
<span id="cb12-79"><a href="#cb12-79" aria-hidden="true" tabindex="-1"></a>    VertexAttribs<span class="op">[</span><span class="dv">2</span><span class="op">].</span>Normalized <span class="op">=</span> <span class="kw">false</span><span class="op">;</span></span>
<span id="cb12-80"><a href="#cb12-80" aria-hidden="true" tabindex="-1"></a>    VertexAttribs<span class="op">[</span><span class="dv">2</span><span class="op">].</span>Stride <span class="op">=</span> <span class="kw">sizeof</span><span class="op">(</span>cubeVertices<span class="op">.</span>uvs<span class="op">[</span><span class="dv">0</span><span class="op">]);</span></span>
<span id="cb12-81"><a href="#cb12-81" aria-hidden="true" tabindex="-1"></a>    VertexAttribs<span class="op">[</span><span class="dv">2</span><span class="op">].</span>Pointer <span class="op">=</span> <span class="op">(</span><span class="at">const</span> GLvoid <span class="op">*)</span>offsetof<span class="op">(</span>ovrCubeVertices<span class="op">,</span> uvs<span class="op">);</span></span>
<span id="cb12-82"><a href="#cb12-82" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-83"><a href="#cb12-83" aria-hidden="true" tabindex="-1"></a>    GL<span class="op">(</span>glGenBuffers<span class="op">(</span><span class="dv">1</span><span class="op">,</span> <span class="op">&amp;</span>VertexBuffer<span class="op">));</span></span>
<span id="cb12-84"><a href="#cb12-84" aria-hidden="true" tabindex="-1"></a>    GL<span class="op">(</span>glBindBuffer<span class="op">(</span>GL_ARRAY_BUFFER<span class="op">,</span> VertexBuffer<span class="op">));</span></span>
<span id="cb12-85"><a href="#cb12-85" aria-hidden="true" tabindex="-1"></a>    GL<span class="op">(</span>glBufferData<span class="op">(</span>GL_ARRAY_BUFFER<span class="op">,</span> <span class="kw">sizeof</span><span class="op">(</span>cubeVertices<span class="op">),</span> <span class="op">&amp;</span>cubeVertices<span class="op">,</span> GL_STATIC_DRAW<span class="op">));</span></span>
<span id="cb12-86"><a href="#cb12-86" aria-hidden="true" tabindex="-1"></a>    GL<span class="op">(</span>glBindBuffer<span class="op">(</span>GL_ARRAY_BUFFER<span class="op">,</span> <span class="dv">0</span><span class="op">));</span></span>
<span id="cb12-87"><a href="#cb12-87" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-88"><a href="#cb12-88" aria-hidden="true" tabindex="-1"></a>    GL<span class="op">(</span>glGenBuffers<span class="op">(</span><span class="dv">1</span><span class="op">,</span> <span class="op">&amp;</span>ColorIndexBuffer<span class="op">));</span></span>
<span id="cb12-89"><a href="#cb12-89" aria-hidden="true" tabindex="-1"></a>    GL<span class="op">(</span>glBindBuffer<span class="op">(</span>GL_ELEMENT_ARRAY_BUFFER<span class="op">,</span> ColorIndexBuffer<span class="op">));</span></span>
<span id="cb12-90"><a href="#cb12-90" aria-hidden="true" tabindex="-1"></a>    GL<span class="op">(</span>glBufferData<span class="op">(</span>GL_ELEMENT_ARRAY_BUFFER<span class="op">,</span> <span class="kw">sizeof</span><span class="op">(</span>cube_indices<span class="op">),</span> cube_indices<span class="op">,</span> GL_STATIC_DRAW<span class="op">));</span></span>
<span id="cb12-91"><a href="#cb12-91" aria-hidden="true" tabindex="-1"></a>    GL<span class="op">(</span>glBindBuffer<span class="op">(</span>GL_ELEMENT_ARRAY_BUFFER<span class="op">,</span> <span class="dv">0</span><span class="op">));</span></span>
<span id="cb12-92"><a href="#cb12-92" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb12-93"><a href="#cb12-93" aria-hidden="true" tabindex="-1"></a>    GL<span class="op">(</span>glGenBuffers<span class="op">(</span><span class="dv">1</span><span class="op">,</span> <span class="op">&amp;</span>TextureIndexBuffer<span class="op">));</span></span>
<span id="cb12-94"><a href="#cb12-94" aria-hidden="true" tabindex="-1"></a>    GL<span class="op">(</span>glBindBuffer<span class="op">(</span>GL_ELEMENT_ARRAY_BUFFER<span class="op">,</span> TextureIndexBuffer<span class="op">));</span></span>
<span id="cb12-95"><a href="#cb12-95" aria-hidden="true" tabindex="-1"></a>    GL<span class="op">(</span>glBufferData<span class="op">(</span>GL_ELEMENT_ARRAY_BUFFER<span class="op">,</span> <span class="kw">sizeof</span><span class="op">(</span>tex_indices<span class="op">),</span> tex_indices<span class="op">,</span> GL_STATIC_DRAW<span class="op">));</span></span>
<span id="cb12-96"><a href="#cb12-96" aria-hidden="true" tabindex="-1"></a>    GL<span class="op">(</span>glBindBuffer<span class="op">(</span>GL_ELEMENT_ARRAY_BUFFER<span class="op">,</span> <span class="dv">0</span><span class="op">));</span></span>
<span id="cb12-97"><a href="#cb12-97" aria-hidden="true" tabindex="-1"></a><span class="op">}</span></span></code></pre></div>
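<p>As a sanity check on the layout above, here is a minimal sketch (not part of the sample) that mirrors the struct under a hypothetical name, <code>CubeVertexLayout</code>, and asserts the offsets and strides that the three <code>VertexAttribs</code> entries rely on. It assumes a 4-byte <code>float</code>, which holds on the usual targets but is not guaranteed by the language:</p>

```cpp
#include <cstddef>

// Hypothetical mirror of ovrCubeVertices: each attribute lives in its own
// contiguous array (non-interleaved), and all three share one buffer.
struct CubeVertexLayout {
    signed char positions[8][4];
    unsigned char colors[8][4];
    float uvs[8][2];
};

// Assumption: float is 4 bytes wide on the target platform.
static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");

// Byte offset of each array inside the buffer (the value stored in Pointer).
constexpr std::size_t kPositionsOffset = offsetof(CubeVertexLayout, positions);
constexpr std::size_t kColorsOffset = offsetof(CubeVertexLayout, colors);
constexpr std::size_t kUvsOffset = offsetof(CubeVertexLayout, uvs);

// Bytes between consecutive vertices within one array (the value in Stride).
constexpr std::size_t kPositionStride = sizeof(CubeVertexLayout::positions[0]);
constexpr std::size_t kColorStride = sizeof(CubeVertexLayout::colors[0]);
constexpr std::size_t kUvStride = sizeof(CubeVertexLayout::uvs[0]);

static_assert(kPositionsOffset == 0, "positions start the buffer");
static_assert(kColorsOffset == 32, "colors follow 8 * 4 position bytes");
static_assert(kUvsOffset == 64, "uvs follow 8 * 4 color bytes");
static_assert(sizeof(CubeVertexLayout) == 128, "no padding expected");
```

<p>Because the arrays sit back to back rather than interleaved, each stride is the size of one element of its own array, not the size of a whole vertex.</p>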
<p>I re-used the existing color shaders and added new ones for the texture face:</p>
<div class="sourceCode" id="cb13"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="at">static</span> <span class="at">const</span> <span class="dt">char</span> CUBE_TEX_VERTEX_SHADER<span class="op">[]</span> <span class="op">=</span></span>
<span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a>    <span class="st">R&quot;GLSL(</span></span>
<span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a><span class="vs">        #define NUM_VIEWS 2</span></span>
<span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a><span class="vs">        #define VIEW_ID gl_ViewID_OVR</span></span>
<span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a><span class="vs">        #extension GL_OVR_multiview2 : require</span></span>
<span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a><span class="vs">        layout(num_views=NUM_VIEWS) in;</span></span>
<span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a><span class="vs">        in vec3 vertexPosition;</span></span>
<span id="cb13-8"><a href="#cb13-8" aria-hidden="true" tabindex="-1"></a><span class="vs">        in vec2 vertexUv;</span></span>
<span id="cb13-9"><a href="#cb13-9" aria-hidden="true" tabindex="-1"></a><span class="vs">        uniform mat4 ModelMatrix;</span></span>
<span id="cb13-10"><a href="#cb13-10" aria-hidden="true" tabindex="-1"></a><span class="vs">        uniform SceneMatrices</span></span>
<span id="cb13-11"><a href="#cb13-11" aria-hidden="true" tabindex="-1"></a><span class="vs">        {</span></span>
<span id="cb13-12"><a href="#cb13-12" aria-hidden="true" tabindex="-1"></a><span class="vs">            uniform mat4 ViewMatrix[NUM_VIEWS];</span></span>
<span id="cb13-13"><a href="#cb13-13" aria-hidden="true" tabindex="-1"></a><span class="vs">            uniform mat4 ProjectionMatrix[NUM_VIEWS];</span></span>
<span id="cb13-14"><a href="#cb13-14" aria-hidden="true" tabindex="-1"></a><span class="vs">        } sm;</span></span>
<span id="cb13-15"><a href="#cb13-15" aria-hidden="true" tabindex="-1"></a><span class="vs">        out vec2 UV;</span></span>
<span id="cb13-16"><a href="#cb13-16" aria-hidden="true" tabindex="-1"></a><span class="vs">        void main()</span></span>
<span id="cb13-17"><a href="#cb13-17" aria-hidden="true" tabindex="-1"></a><span class="vs">        {</span></span>
<span id="cb13-18"><a href="#cb13-18" aria-hidden="true" tabindex="-1"></a><span class="vs">            gl_Position = sm.ProjectionMatrix[VIEW_ID] * ( sm.ViewMatrix[VIEW_ID] * ( ModelMatrix * vec4( vertexPosition, 1.0 ) ) );</span></span>
<span id="cb13-19"><a href="#cb13-19" aria-hidden="true" tabindex="-1"></a><span class="vs">            UV = vertexUv;</span></span>
<span id="cb13-20"><a href="#cb13-20" aria-hidden="true" tabindex="-1"></a><span class="vs">        }</span></span>
<span id="cb13-21"><a href="#cb13-21" aria-hidden="true" tabindex="-1"></a><span class="vs">    </span><span class="st">)GLSL&quot;</span><span class="op">;</span></span>
<span id="cb13-22"><a href="#cb13-22" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-23"><a href="#cb13-23" aria-hidden="true" tabindex="-1"></a><span class="at">static</span> <span class="at">const</span> <span class="dt">char</span> CUBE_TEX_FRAGMENT_SHADER<span class="op">[]</span> <span class="op">=</span></span>
<span id="cb13-24"><a href="#cb13-24" aria-hidden="true" tabindex="-1"></a>    <span class="st">R&quot;GLSL(</span></span>
<span id="cb13-25"><a href="#cb13-25" aria-hidden="true" tabindex="-1"></a><span class="vs">        in vec2 UV;</span></span>
<span id="cb13-26"><a href="#cb13-26" aria-hidden="true" tabindex="-1"></a><span class="vs">        out vec4 color;</span></span>
<span id="cb13-27"><a href="#cb13-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb13-28"><a href="#cb13-28" aria-hidden="true" tabindex="-1"></a><span class="vs">        uniform sampler2D Texture0;</span></span>
<span id="cb13-29"><a href="#cb13-29" aria-hidden="true" tabindex="-1"></a><span class="vs">        void main()</span></span>
<span id="cb13-30"><a href="#cb13-30" aria-hidden="true" tabindex="-1"></a><span class="vs">        {</span></span>
<span id="cb13-31"><a href="#cb13-31" aria-hidden="true" tabindex="-1"></a><span class="vs">            color = vec4(texture( Texture0, UV ).rgb, 1.0);</span></span>
<span id="cb13-32"><a href="#cb13-32" aria-hidden="true" tabindex="-1"></a><span class="vs">        }</span></span>
<span id="cb13-33"><a href="#cb13-33" aria-hidden="true" tabindex="-1"></a><span class="vs">        </span><span class="st">)GLSL&quot;</span><span class="op">;</span></span></code></pre></div>
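<p>The vertex shader reads <code>vertexPosition</code> as floats even though the buffer stores signed bytes. With <code>Normalized</code> set to <code>true</code>, the conversion happens when the attribute is fetched. The sketch below reproduces that mapping on the CPU, assuming the OpenGL ES 3.0 rule <code>f = max(c / 127.0, -1.0)</code>; the helper name is hypothetical and nothing like it appears in the sample:</p>

```cpp
// Hypothetical helper: CPU-side version of the conversion applied to a
// normalized GL_BYTE attribute, assuming the OpenGL ES 3.0 rule
// f = max(c / 127.0, -1.0).
constexpr float normalized_byte_to_float(signed char c) {
    return (c / 127.0f < -1.0f) ? -1.0f : c / 127.0f;
}

// The cube's corner coordinates of -127 and +127 land exactly on -1 and +1.
static_assert(normalized_byte_to_float(127) == 1.0f, "+127 maps to +1");
static_assert(normalized_byte_to_float(-127) == -1.0f, "-127 maps to -1");
```

<p>Under that assumption, the cube's corners sit at exactly -1 and +1 on each axis before <code>ModelMatrix</code> is applied.</p>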
<p>The other substantive change was a new section in the frame rendering code that loops over the cubes and draws the textured faces:</p>
<div class="sourceCode" id="cb14"><pre class="sourceCode cpp"><code class="sourceCode cpp"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a>    GL<span class="op">(</span>glUseProgram<span class="op">(</span>Scene<span class="op">.</span>TextureProgram<span class="op">.</span>Program<span class="op">));</span></span>
<span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a>    GL<span class="op">(</span>glBindBufferBase<span class="op">(</span></span>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a>           GL_UNIFORM_BUFFER<span class="op">,</span></span>
<span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a>           Scene<span class="op">.</span>TextureProgram<span class="op">.</span>UniformBinding<span class="op">[</span>ovrUniform<span class="op">::</span>Index<span class="op">::</span>SCENE_MATRICES<span class="op">],</span></span>
<span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a>           Scene<span class="op">.</span>SceneMatrices<span class="op">));</span></span>
<span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a>    <span class="cf">if</span> <span class="op">(</span>Scene<span class="op">.</span>TextureProgram<span class="op">.</span>UniformLocation<span class="op">[</span>ovrUniform<span class="op">::</span>Index<span class="op">::</span>VIEW_ID<span class="op">]</span> <span class="op">&gt;=</span></span>
<span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a>        <span class="dv">0</span><span class="op">)</span> <span class="co">// </span><span class="al">NOTE</span><span class="co">: will not be present when multiview path is enabled.</span></span>
<span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a>    <span class="op">{</span></span>
<span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a>        GL<span class="op">(</span>glUniform1i<span class="op">(</span>Scene<span class="op">.</span>TextureProgram<span class="op">.</span>UniformLocation<span class="op">[</span>ovrUniform<span class="op">::</span>Index<span class="op">::</span>VIEW_ID<span class="op">],</span> <span class="dv">0</span><span class="op">));</span></span>
<span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a>    <span class="op">}</span></span>
<span id="cb14-11"><a href="#cb14-11" aria-hidden="true" tabindex="-1"></a>    <span class="cf">for</span> <span class="op">(</span><span class="at">const</span> <span class="kw">auto</span><span class="op">&amp;</span> c <span class="op">:</span> Scene<span class="op">.</span>CubeData<span class="op">)</span> <span class="op">{</span></span>
<span id="cb14-12"><a href="#cb14-12" aria-hidden="true" tabindex="-1"></a>        GL<span class="op">(</span>glBindVertexArray<span class="op">(</span>Scene<span class="op">.</span>Cube<span class="op">.</span>VertexArrayObject<span class="op">));</span></span>
<span id="cb14-13"><a href="#cb14-13" aria-hidden="true" tabindex="-1"></a>        GLint loc <span class="op">=</span> Scene<span class="op">.</span>TextureProgram<span class="op">.</span>UniformLocation<span class="op">[</span>ovrUniform<span class="op">::</span>Index<span class="op">::</span>MODEL_MATRIX<span class="op">];</span></span>
<span id="cb14-14"><a href="#cb14-14" aria-hidden="true" tabindex="-1"></a>        <span class="cf">if</span> <span class="op">(</span>loc <span class="op">&gt;=</span> <span class="dv">0</span><span class="op">)</span> <span class="op">{</span></span>
<span id="cb14-15"><a href="#cb14-15" aria-hidden="true" tabindex="-1"></a>            GL<span class="op">(</span>glUniformMatrix4fv<span class="op">(</span>loc<span class="op">,</span> <span class="dv">1</span><span class="op">,</span> GL_TRUE<span class="op">,</span> <span class="op">&amp;</span>c<span class="op">.</span>Model<span class="op">.</span>M<span class="op">[</span><span class="dv">0</span><span class="op">][</span><span class="dv">0</span><span class="op">]));</span></span>
<span id="cb14-16"><a href="#cb14-16" aria-hidden="true" tabindex="-1"></a>        <span class="op">}</span></span>
<span id="cb14-17"><a href="#cb14-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-18"><a href="#cb14-18" aria-hidden="true" tabindex="-1"></a>        GL<span class="op">(</span>glBindBuffer<span class="op">(</span>GL_ELEMENT_ARRAY_BUFFER<span class="op">,</span> Scene<span class="op">.</span>Cube<span class="op">.</span>TextureIndexBuffer<span class="op">));</span></span>
<span id="cb14-19"><a href="#cb14-19" aria-hidden="true" tabindex="-1"></a>        GL<span class="op">(</span>glActiveTexture<span class="op">(</span>GL_TEXTURE0<span class="op">));</span></span>
<span id="cb14-20"><a href="#cb14-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-21"><a href="#cb14-21" aria-hidden="true" tabindex="-1"></a>        GL<span class="op">(</span>glBindTexture<span class="op">(</span>GL_TEXTURE_2D<span class="op">,</span> c<span class="op">.</span>KaiTextureID<span class="op">));</span></span>
<span id="cb14-22"><a href="#cb14-22" aria-hidden="true" tabindex="-1"></a>        GL<span class="op">(</span>glDrawElements<span class="op">(</span>GL_TRIANGLES<span class="op">,</span> Scene<span class="op">.</span>Cube<span class="op">.</span>TextureIndexCount<span class="op">,</span> GL_UNSIGNED_SHORT<span class="op">,</span> <span class="kw">nullptr</span><span class="op">));</span></span>
<span id="cb14-23"><a href="#cb14-23" aria-hidden="true" tabindex="-1"></a>    <span class="op">}</span></span>
<span id="cb14-24"><a href="#cb14-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb14-25"><a href="#cb14-25" aria-hidden="true" tabindex="-1"></a>    glBindTexture<span class="op">(</span>GL_TEXTURE_2D<span class="op">,</span> <span class="dv">0</span><span class="op">);</span></span>
<span id="cb14-26"><a href="#cb14-26" aria-hidden="true" tabindex="-1"></a>    GL<span class="op">(</span>glBindVertexArray<span class="op">(</span><span class="dv">0</span><span class="op">));</span></span>
<span id="cb14-27"><a href="#cb14-27" aria-hidden="true" tabindex="-1"></a>    GL<span class="op">(</span>glUseProgram<span class="op">(</span><span class="dv">0</span><span class="op">));</span></span></code></pre></div>
<p>What I’m trying to illustrate is that the code you’re working with in this SDK looks and behaves very similarly to run-of-the-mill OpenGL rendering code. For the fancy stuff, like per-eye matrix transforms, the sample programs give you a strong foundation to build on, so you don’t have to implement them from scratch.</p>
<p>In addition to the above changes to rendering spatial anchors, I also had fun messing around with XR controls. You call into the OpenXR bindings to do so, but it feels largely the same as dealing with game controller inputs, although these are more varied and much cooler.</p>
<p>There’s a lot of overlap between the APIs, libraries, and build systems used in these projects and in other OpenGL code that has been authored over the last several decades: tutorials, example projects, open source games, and more. When you read those projects and learn from them, you eventually reach a plateau where it’s unclear what to do next to continue learning. The answer here is just to continue getting exposure to more code and to continue building and thinking critically about what you write. This is the point where developing with Meta Quest becomes extremely appealing.</p>
<p>I personally found it extremely cool to see how much is accomplished with code that is this simple. The code samples interface with numerous sensors and do 3D rendering on a tight frame budget, while maintaining a high frame rate, on constrained hardware, using a less performant graphics backend for demonstrative purposes (using Vulkan would cause a large portion of the surface area of these programs to be dedicated to setup and bookkeeping).</p>
<p>You probably already know if you would enjoy hacking on one. If so, I’d definitely recommend picking one up. Quest 2s are pretty cheap these days, and you’ve always got the return period if you’re not feeling it.</p>
    </section>
    <section class="comment-footer">
        <a href="mailto:eskinjp@gmail.com?subject=Re: Getting started with VR in C++ on Meta Quest">Comment via email</a>
    </section>
</article>
]]></description>
    <pubDate>Sun, 26 Nov 2023 00:00:00 UT</pubDate>
    <guid>https://jeskin.net/blog/meta-quest-sdk.html</guid>
    <dc:creator>Jon Eskin</dc:creator>
</item>
<item>
    <title>Running Dreambooth in Stable Diffusion with low VRAM</title>
    <link>https://jeskin.net/blog/dreambooth-low-vram.html</link>
    <description><![CDATA[<article>
    <section class="header">
        Posted on January 14, 2023
        
            by Jon Eskin
        
    </section>
    <section>
        <h2 id="introduction">Introduction</h2>
<p>In a <a href="https://arxiv.org/abs/2208.12242">recent whitepaper</a>, researchers described a technique to take existing pre-trained text-to-image models and embed new subjects, adding the capability to synthesize photorealistic images of the subject contextualized in the model’s output.</p>
<p>A series of implementations were quickly built, finding their way to the <a href="https://github.com/AUTOMATIC1111/stable-diffusion-webui">Stable Diffusion web UI</a> project in the form of the <a href="https://github.com/d8ahazard/sd_dreambooth_extension">sd_dreambooth_extension</a>.</p>
<p>This adds the ability to fine-tune models on images of particular subjects.</p>
<p>Images generated this way come out much better than the baseline model, especially when you start adding additional context.</p>
<p>Unfortunately, the process is resource-intensive, and if your system has low VRAM it’s easy for dreambooth to crash with an error like this:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="ex">RuntimeError:</span> No executable batch size found, reached zero.</span></code></pre></div>
<p>In this blog post, I’ll show a few of the optimizations I used to fix this error and get dreambooth running on my 8GB Nvidia GeForce 1070. I’ll also show a few of the problems I encountered along the way and how I fixed them.</p>
<p>I used an Arch Linux system, but these optimizations should work on any OS, possibly with minor modifications.</p>
<h2 id="optimizations">Optimizations</h2>
<h3 id="use-xformers-for-image-generation">Use xFormers for image generation</h3>
<p><a href="https://github.com/facebookresearch/xformers">xFormers</a> is a library written by Facebook Research that improves the speed and memory efficiency of image generation. To install it, stop stable-diffusion-webui if it’s running and build xformers from source by following <a href="https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Xformers">these instructions</a>.</p>
<p>Next, you’ll need to add a commandline parameter to enable xformers the next time you start the web ui, like in this line from my <code>webui-user.sh</code>:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="bu">export</span> <span class="va">COMMANDLINE_ARGS</span><span class="op">=</span><span class="st">&quot;--medvram --xformers&quot;</span></span></code></pre></div>
<p>The next time you launch the web ui it should use xFormers for image generation.</p>
<p>You’ll also need to make sure to explicitly select xformers when you’re training inside the Dreambooth tab of the web UI, under <code>Settings &gt; Advanced &gt; Memory Attention (select xformers)</code>.</p>
<h3 id="tweak-dreambooth-settings">Tweak dreambooth settings</h3>
<p>Dreambooth ran successfully when I used the following settings in the Dreambooth tab of the web ui:</p>
<pre><code>Use LORA: unchecked
Training Steps Per Image (Epochs): 150
batch size: 1
Learning Rate Scheduler: constant with warmup
Learning Rate: 0.000002
Resolution: 512
Use EMA: unchecked
Use 8bit Adam: checked
Mixed precision: fp16
Memory Attention: xformers
Cache Latents: unchecked</code></pre>
<h3 id="run-stable-diffusion-without-a-graphical-environment">Run Stable Diffusion without a graphical environment</h3>
<p>I need to squeeze every last meg of VRAM out of my card, so I usually opt not to run a graphical environment on the machine running stable diffusion. On Linux, this means never starting the X server/Wayland compositor.</p>
<p>To do this, first you’ll need to configure the web ui to work from another machine on your local network.</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="bu">export</span> <span class="va">COMMANDLINE_ARGS</span><span class="op">=</span><span class="st">&quot;--medvram --listen --xformers&quot;</span></span></code></pre></div>
<p>You’ll want to disable your display manager or similar mechanism and then reboot to free up as much memory as possible.</p>
<p>Then, using another machine on your network (a wifi-connected smartphone works as well), navigate to [ip address of machine running webui]:[port shown in process output] and control it from there.</p>
<h3 id="reboot-between-attempted-trainings">Reboot between attempted trainings</h3>
<p>I’ve found that after I attempt a training run, the follow-up will often run out of memory (OOM) even when it should not.</p>
<p>Neither restarting webui nor checking for leaked GPU resources with <a href="https://github.com/Syllo/nvtop">nvtop</a> seems to solve the issue.</p>
<p>However, after a full reboot, the same configuration that ran OOM will succeed. Until I figure out what is leaking GPU memory like a sieve, I’ll probably continue doing full reboots between trainings.</p>
<h3 id="avoiding-lora">Avoiding LORA</h3>
<p>Many guides online recommend using LORA to reduce memory usage. However, on my setup, using LORA will cause runs to fail that succeed when I disable it.</p>
<p>If LORA isn’t working for you, try turning it off.</p>
<h3 id="disabling-preview-images">Disabling preview images</h3>
<p>Setting <code>Save model frequency</code>, <code>Save preview frequency</code>, and <code>Generate Classification Images Using txt2img</code> gives you preview images at epoch checkpoints that you can view to check for overtraining.</p>
<p>Unfortunately the preview images seem to use GPU memory and cause my trainings to fail.</p>
<p>As a result, I end up only setting <code>Save model frequency</code>. When training completes, I bisect the completed checkpoints and check for overtraining.</p>
<p>For example, if I generate 12 checkpoints, I’ll open up checkpoint 12 and run a few images that include my trained subject in a new context. If the images are incapable of including the new context, e.g. the image above showed Gillian Anderson’s face but didn’t respond to the “starfleet officer” prompt, that means the model has been overtrained.</p>
<p>So I would open checkpoint 6, and if that worked, I would try checkpoint 9, and so on.</p>
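<p>The bisection above can be sketched in a few lines of Python. This is only an illustration of the search order, not part of the extension: <code>last_good_checkpoint</code> and the <code>is_overtrained</code> callback are hypothetical names, and the sketch assumes that once a checkpoint is overtrained, every later checkpoint is too.</p>

```python
def last_good_checkpoint(count, is_overtrained):
    """Return the highest checkpoint number that is not overtrained (0 if none).

    Assumes checkpoints are numbered 1..count and that once a checkpoint is
    overtrained, every later one is overtrained as well.
    """
    # Start with the final checkpoint, as described in the text.
    if count > 0 and not is_overtrained(count):
        return count
    # `good` is the highest checkpoint known to work (0 means "none yet"),
    # `bad` is the lowest checkpoint known to be overtrained.
    good, bad = 0, count
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_overtrained(mid):
            bad = mid
        else:
            good = mid
    return good
```

<p>In practice <code>is_overtrained</code> would be you, loading the checkpoint and judging a few generated images by eye. With 12 checkpoints the sketch asks about checkpoint 12 first, then 6, then 9 or 3, and so on.</p>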
<h2 id="encountered-issues">Encountered Issues</h2>
<p>Below is a hodgepodge of problems I ran into along the way, along with how I overcame them.</p>
<h3 id="exception-training-model-could-not-run-xformersefficient_attention_forward_cutlass-with-arguments-from-the-cuda-backend.">Exception training model: ‘Could not run ’xformers::efficient_attention_forward_cutlass’ with arguments from the ‘CUDA’ backend.</h3>
<p>I got this when I did not properly follow <a href="https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Xformers">xformers installation steps</a>.</p>
<h3 id="name-str2optimizer8bit_blockwise-is-not-defined">Name ‘str2optimizer8bit_blockwise’ is not defined</h3>
<p>This was caused by python being unable to find a CUDA shared library on my system. To fix it, you’ll need to make sure the folders where CUDA’s libraries are located are set on your system’s path.</p>
<p>On Arch Linux, I did that by setting the environment variable <code>LD_LIBRARY_PATH="/opt/cuda/targets/x86_64-linux/lib:$LD_LIBRARY_PATH"</code>.</p>
<h3 id="img-should-be-pil-image">Img should be PIL image</h3>
<p>A number of posts online were saying to check “Apply Horizontal Flip” in the dreambooth settings, but doing so was giving me this error.</p>
<p>I didn’t really look much into it, I just unchecked it and the error no longer occurred.</p>
<h3 id="zipfile.badzipfile-file-is-not-a-zip-file">zipfile.BadZipFile: File is not a zip file</h3>
<p>After I initially got dreambooth to train successfully, I exported the result as a .ckpt file. When I tried to load the checkpoint to produce the image, the model wouldn’t load, and I could see <code>zipfile.BadZipFile: File is not a zip file</code> in the server’s output.</p>
<p>I looked at the .ckpt file that was being created and compared it to a valid one from a different model. The valid one looked like a zip archive, but the erroneous one I was generating was just a yaml file with a .ckpt filename.</p>
<p>I discovered that setting a custom model name in the “Saving” tab in dreambooth settings caused both a yaml and ckpt to be correctly created, and I could load and use the ckpt file.</p>
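<p>If you want to make the same comparison without inspecting the file by hand, Python’s standard library can tell you whether a file is a zip archive. This is a minimal sketch; the helper name <code>looks_like_checkpoint_archive</code> is my own, and it only checks the container format, not whether the checkpoint contents are valid.</p>

```python
import zipfile


def looks_like_checkpoint_archive(path):
    """Return True if the file at `path` is a zip archive, False otherwise.

    A file that is really a yaml document with a .ckpt name returns False.
    """
    return zipfile.is_zipfile(path)
```

<p>Calling it on the exported .ckpt before loading the model in the web ui would catch the yaml-in-disguise case described above.</p>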
    </section>
    <section class="comment-footer">
        <a href="mailto:eskinjp@gmail.com?subject=Re: Running Dreambooth in Stable Diffusion with low VRAM">Comment via email</a>
    </section>
</article>
]]></description>
    <pubDate>Sat, 14 Jan 2023 00:00:00 UT</pubDate>
    <guid>https://jeskin.net/blog/dreambooth-low-vram.html</guid>
    <dc:creator>Jon Eskin</dc:creator>
</item>
<item>
    <title>FIFO based IRC with ii and lchat</title>
    <link>https://jeskin.net/blog/irc-ii-lchat.html</link>
    <description><![CDATA[<article>
    <section class="header">
        Posted on January  3, 2023
        
            by Jon Eskin
        
    </section>
    <section>
        <h1 id="introduction">Introduction</h1>
<p>lchat is a small program that is designed to interface with other programs such as ii that present FIFO interfaces. This blog post will briefly show what that means and how lchat works in practice.</p>
<p><img src="/images/lchat_preview.png" width="100%"></p>
<h1 id="ii">ii</h1>
<p>A typical IRC client works by negotiating a connection with an IRC server before presenting the user with some kind of interface to interact with the server.</p>
<p>For example, the popular client <a href="https://weechat.org/">weechat</a> draws a TUI that shows the server you’re connected to, what channels you’ve joined, and so on.</p>
<p>Similar to many other interactive programs, it continually listens for the user’s keyboard input. When the user enters messages or commands, weechat marshals information back and forth between the IRC server, updates its internal state, and redraws its interface.</p>
<p>ii (short for “irc it”) is an IRC client which works at a much lower level. It doesn’t draw anything or listen for keyboard or mouse input at all.</p>
<p>Instead, it represents your connection to an IRC server as a hierarchy of files.</p>
<p>Each entity that you interact with in IRC, be it the server (to issue commands), or a channel, or a user, gets its own directory and its own pair of files.</p>
<p>One file is named <code>in</code>, and anything written to that file is sent to the process. Another file is named <code>out</code>, and anything received from the process is written to it.</p>
<p>Reading and writing to these files is not done by <code>ii</code> itself, but by some other existing part of the system, such as <code>echo</code> for writing and <code>cat</code> for reading.</p>
<p>Representing processes by files in this manner is known as using a “Named Pipe” or a “FIFO” in Unix systems.</p>
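<p>To get a feel for how a named pipe behaves, here is a small Python sketch that is independent of ii. It creates a FIFO in a temporary directory, writes to it from one thread, and reads from it in another. The function name <code>fifo_round_trip</code> and the file name <code>in</code> are just illustrative choices, and <code>os.mkfifo</code> is only available on Unix-like systems.</p>

```python
import os
import tempfile
import threading


def fifo_round_trip(message):
    """Send `message` through a named pipe and return what the reader received."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "in")
        os.mkfifo(path)  # create the named pipe as an entry on the filesystem

        def writer():
            # Opening a FIFO for writing blocks until a reader opens it too.
            with open(path, "w") as pipe:
                pipe.write(message)

        thread = threading.Thread(target=writer)
        thread.start()
        # Opening for reading blocks until the writer has connected;
        # read() returns once the writer closes its end of the pipe.
        with open(path) as pipe:
            received = pipe.read()
        thread.join()
        return received
```

<p>Nothing is stored in the file itself: the data passes directly from the writer to the reader, which is why both ends have to be open at the same time.</p>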
<h1 id="using-ii">using ii</h1>
<p>If you run <code>ii</code> on its own, you’ll see a concise summary of its usage:</p>
<pre><code>usage: ii -s host [-p port | -u sockname] [-i ircdir]
[-n nickname] [-f fullname] [-k env_pass]</code></pre>
<p>When I run <code>ii -s irc.libera.chat -n eskin</code>, ii connects me to libera.chat. By default this creates a few folders in “~/irc”:</p>
<p><img src="/images/ii1.png" width="100%"></p>
<p>The tty that I ran <code>ii</code> from is now taken over by a process that maintains a connection to the server and displays all messages from the server. To interact with ii, I switch to another terminal and interact with the <code>in</code> and <code>out</code> files created for the server I joined.</p>
<p>Here’s me failing to login:</p>
<p><img src="/images/ii2.png" width="100%"></p>
<p>And here’s me joining a room:</p>
<p><img src="/images/ii3.png" width="100%"></p>
<p>Writing to <code>in</code> files and reading from <code>out</code> files with <code>echo</code> and <code>cat</code> is a little unwieldy, which is where <code>lchat</code> comes in.</p>
<h1 id="lchat">lchat</h1>
<p>lchat expects a command line argument containing the location of a directory containing <code>in</code> and <code>out</code> files.</p>
<p>The command starts a process that continually reads from the <code>out</code> file and displays its content in the terminal. At the same time, it provides a message area where you can type messages and submit them by pressing ‘Enter’. When you do, the contents of the message area are sent to the <code>in</code> file.</p>
<p>In the image below, I navigate to the #linux subdirectory, which is where the <code>in</code> and <code>out</code> files for the linux channel exist on my filesystem, and then enter <code>lchat .</code> to start lchat in that location.</p>
<p><img src="/images/lc.png" width="100%"></p>
<p>lchat will work with any program that uses FIFO this way, including a few programs listed on the <a href="https://tools.suckless.org/lchat/">lchat homepage</a>.</p>
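<p>The two halves of that interaction, sending a line to <code>in</code> and picking up whatever is new in <code>out</code>, can be sketched in Python as follows. This is not how lchat is implemented (lchat is a C program with its own line editing); <code>send_line</code> and <code>read_new</code> are hypothetical helpers that only assume a directory containing <code>in</code> and <code>out</code> files.</p>

```python
import os


def send_line(chat_dir, text):
    """Write one line to the `in` file, like pressing Enter in a chat prompt."""
    with open(os.path.join(chat_dir, "in"), "w") as pipe:
        pipe.write(text + "\n")


def read_new(chat_dir, offset=0):
    """Return (text, new_offset) for what was added to `out` since `offset`."""
    with open(os.path.join(chat_dir, "out"), "rb") as log:
        log.seek(offset)
        data = log.read()
    return data.decode("utf-8", errors="replace"), offset + len(data)
```

<p>A front end would call <code>read_new</code> in a loop, passing the returned offset back in each time, and call <code>send_line</code> whenever the user submits a message. Note that opening a real FIFO for writing blocks until some process, such as ii, has it open for reading.</p>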
<h1 id="installation">Installation</h1>
<p>If you’d like to try out lchat, you’ll first need to grab ii:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="fu">git</span> clone https://git.suckless.org/ii</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="bu">cd</span> ii</span></code></pre></div>
<p>If you want to use SSL, you’ll want to apply a patch to the repository:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="fu">wget</span> https://tools.suckless.org/ii/patches/ssl/ii-2.0-ssl.diff</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="fu">git</span> apply ii-2.0-ssl.diff</span></code></pre></div>
<p>Then build and install ii:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="fu">make</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="fu">sudo</span> make install</span></code></pre></div>
<p>Finally, you’ll want to clone, build, and install lchat:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode bash"><code class="sourceCode bash"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="fu">git</span> clone https://git.suckless.org/lchat</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a><span class="bu">cd</span> lchat</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="fu">make</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a><span class="fu">sudo</span> make install</span></code></pre></div>
<hr />
<p>I’m a fan of the file-based approach to IRC because I can keep it passively running without keeping track of an active window.</p>
<p><code>ii</code> and <code>lchat</code> are hackable projects that are fun to play around with. <code>lchat</code> was the first use I saw of <a href="https://libs.suckless.org/libgrapheme/">libgrapheme</a> in the wild, so that was cool to see. Check it out if you’re into that sort of stuff!</p>
    </section>
    <section class="comment-footer">
        <a href="mailto:eskinjp@gmail.com?subject=Re: FIFO based IRC with ii and lchat">Comment via email</a>
    </section>
</article>
]]></description>
    <pubDate>Tue, 03 Jan 2023 00:00:00 UT</pubDate>
    <guid>https://jeskin.net/blog/irc-ii-lchat.html</guid>
    <dc:creator>Jon Eskin</dc:creator>
</item>

    </channel>
</rss>
