Blog Posts
- The Naive Roofline Model in Performance Modeling
- Ring Attention - scaling attention across multiple devices
- The basic idea behind FlashAttention
- What is the Transformer KV Cache?
- Fun with x86-64 assembly
- CPU vs. GPU for neural networks
- Why do LLM input tokens cost less than output tokens?
- Condorcet paradox and its implications
- Molyneux's problem
- English and LLMs
- How are 2D and 3D thread blocks linearized into warps in CUDA?
- Profiling CUDA programs on WSL 2
- Compression of Base64
- BatchNorm and the curious case of training vs. inference variance
- Token selection strategies: Top-K, Top-P, and Temperature
- Avoid Early Stopping in A/B Tests
- Why internal documentation is lacking
- Error Bars
- What iteration order can you expect from a Java HashMap?
- Gummy Bears
- JShell: The Java REPL
- Leaf Plots for interpreting test results
- How accurate is your test?
- How is Vaccine Efficacy Measured?
- Don't dilute your A/B tests
- Dealing with Imposter Syndrome
- Reflections on 10 years of running
- Gaining a better understanding of statistical inference
- The friendship paradox and why your friends are more popular than you
- Building Binomial and Multinomial Samplers in Java
- Why a Java String may not be a String
- Running during shelter in place
- Testing for no effect
- Leaky Abstractions
- The Bus Factor
- Avoid accidental library functions
- How long is that string?
- Use BLUF to improve your written communication
- Java Date and Time APIs
- What every software engineer should know about characters/strings
- Target prices differ between online and in-store
- Hugo: First Impressions
- My Hugo Setup
- Welcome to the new site!