Open Source Contributions
This page tracks open source contributions I've made over the years. I'll use the classic excuse of "my best work is proprietary", so this is not a complete list.
Hopefully it at least paints a picture that I'm out in the world writing code. For open source projects I have made myself, see the "Own Projects" heading below, or my GitHub profile.
2025
XLA: [docs] note mlir-hlo deprecation
GitHub PR #8812. The mlir-hlo
tool is being deprecated; this PR updates the documentation to reflect that.
MLIR/LLVM: [mlir][python] add dict-style to IR attributes
GitHub PR #163200. Improves the usability of the MLIR Python bindings by allowing attributes to be accessed in a dictionary-like way.
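As a rough illustration of the ergonomics (a toy stand-in, not the actual MLIR binding code), "dict-style" access means supporting Python's mapping protocol on the attributes object:

```python
class Attributes:
    """Toy stand-in for an op's attribute store (not the real MLIR class)."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def __getitem__(self, name):
        # Dict-style lookup: attrs["sym_name"]
        return self._entries[name]

    def __contains__(self, name):
        # Dict-style membership test: "sym_name" in attrs
        return name in self._entries

    def __iter__(self):
        # Dict-style iteration over attribute names.
        return iter(self._entries)


attrs = Attributes({"sym_name": "main", "sym_visibility": "private"})
assert "sym_name" in attrs
assert attrs["sym_name"] == "main"
assert sorted(attrs) == ["sym_name", "sym_visibility"]
```

The attribute names here are just illustrative examples; the point is that lookup, membership, and iteration behave like a plain Python dict.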
dvc: [cli,lock] Allow wait_for_lock
GitHub PR #10784. DVC's purpose
is to act as a more robust version of Git LFS. However, I was integrating it
with Bazel, which meant running multiple DVC processes in parallel (to pull
data from a repo). DVC crashed in this case: it generates a lock file, and any
other process that encounters the lock crashes. This PR added a flag that lets
other DVC processes wait for the lock to free instead.
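The general pattern behind such a flag is the usual poll-until-free loop over a lock file. A minimal sketch of that behaviour (not DVC's actual implementation, whose locking is more involved):

```python
import os
import tempfile
import time


def acquire_lock_with_wait(lock_path, timeout=30.0, poll=0.25):
    """Wait for a lock file to free instead of crashing on contention.

    Illustrative only: sketches the behaviour a wait_for_lock-style
    option enables, not DVC's real locking code.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation atomic: it fails if the file exists,
            # so exactly one process can hold the lock at a time.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.write(fd, str(os.getpid()).encode())
            return fd
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"lock {lock_path!r} still held")
            time.sleep(poll)


def release_lock(fd, lock_path):
    os.close(fd)
    os.remove(lock_path)


# Demo: an uncontended acquisition succeeds immediately.
lock = os.path.join(tempfile.mkdtemp(), "repo.lock")
fd = acquire_lock_with_wait(lock, timeout=1.0)
release_lock(fd, lock)
```

A second process running the same loop would spin in the `FileExistsError` branch until the first releases the lock or the timeout expires.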
HuggingFace Transformers: fix: Fully remove legacy cache from Llama
GitHub PR #36958. In
the popular HuggingFace Transformers library, I found that the Llama model
still had partial support for the legacy cache system (a list of tensors),
rather than the new system (a specialised class). This PR fully removed the
legacy cache system, since attempting to use it would fail in a confusing way.
2024
MLIR/LLVM: [mlir,python] Fix case when FuncOp.arg_attrs is not set
GitHub PR #117188. In the
MLIR Python API, a KeyError was raised if one tried to access the arg_attrs of
a FuncOp that did not have any. This PR fixed that by returning an empty
dictionary in this case rather than raising an error. An alternative would have
been to return None, but I felt that an empty dictionary was more consistent
with the behaviour of the rest of the API, and better suited the requirements
of the Python API.
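The fix follows the same shape as Python's `dict.get`: return an empty container for the missing case rather than raising. A plain-dict analogy of the behaviour change (not the actual binding code):

```python
# Model a FuncOp with no arg_attrs set as a plain dict with no entry.
func_attrs = {}

# Old behaviour: direct indexing raises KeyError.
try:
    func_attrs["arg_attrs"]
    raised = False
except KeyError:
    raised = True
assert raised

# New behaviour: absent arg_attrs reads as empty rather than erroring,
# so callers can iterate or check membership without a try/except.
assert func_attrs.get("arg_attrs", {}) == {}
```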
MLIR/LLVM: [mlir,python] Expose replaceAllUsesExcept to Python bindings
GitHub PR #115850. MLIR's
Python bindings are great for quickly hacking and exploring IR transformations.
However, I found that a useful method
Value.replaceAllUsesExcept()
was not exposed. I added this, with appropriate tests.
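The semantics are easy to model: every use of the old value is rewritten to the new value unless its owner is in the exception set. A toy sketch of those semantics (not the MLIR implementation):

```python
def replace_all_uses_except(uses, old, new, exceptions):
    """Toy model of Value.replaceAllUsesExcept semantics.

    `uses` maps an operation name to the value it consumes; real MLIR
    walks an intrusive use-list instead of a dict.
    """
    for op, value in uses.items():
        if value == old and op not in exceptions:
            uses[op] = new


# Three ops all consume v_old; we swap in v_new everywhere except "return".
uses = {"add": "v_old", "mul": "v_old", "return": "v_old"}
replace_all_uses_except(uses, "v_old", "v_new", exceptions={"return"})
assert uses == {"add": "v_new", "mul": "v_new", "return": "v_old"}
```

This is the common pattern when inserting a new op that wraps an existing value: without the exception set, the wrapping op would end up consuming its own result.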
MLIR/LLVM: [mlir] Fix remove-dead-values pass throws error when module has a name
GitHub PR #109990.
Encountered a bug when using OpenXLA's
StableHLO, which gives MLIR modules names
(module @IrToHlo.6443). This caused the remove-dead-values pass to throw an
error, since it was expecting a module without a name. I could not find a good
reason for this restriction, so I fixed it, providing a test case to ensure it
doesn't regress.
Docker Suno API: Improved docker compose integration
GitHub PR #115. This repo provides an API to the Suno AI music generation tool. I used it as part of another project I was working on. However, this project used a containerised approach (for better modularity and security controls/clear trust boundaries). This PR made the project more amenable to this goal.
MLIR/LLVM: [mlir] Retain original identifier names for debugging
GitHub PR #79704. Currently
under active development, with a high volume of discussion on the LLVM
developer
forums.
This feature aims to add a flag to mlir-opt to keep the original names, which
can be helpful for debugging compiler pipelines. Right now, identifier names are
made anonymous, e.g., %my_input becomes %arg0. Meaningful variable names can
make it easier to reason about behaviour, which is why this feature is valuable.
Three designs are under
consideration;
since this touches the core of MLIR, caution and careful consideration are
needed before merging.
<!-- The motivation for this
came from trying to add my own dialect, and experiencing some friction since it had been a while. -->
Triton: [TESTING] Added precision option to benchmark CSV data saving
GitHub PR #2933. Triton has a
built-in benchmarking suite; however, I discovered that it was saving data with
an unusually low level of precision (.1f). In this patch, I made the precision
user-configurable, and set the default to 6. I do not think the downsides of
higher precision, namely larger file sizes for the CSVs, are significant
compared to the downside of losing data. Making the value configurable gives us
the best of both worlds.
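The difference is easy to see with Python's format specifiers (the .1f default and the new default of 6 are from the PR; the surrounding Triton code is omitted here):

```python
measurement = 0.123456789  # e.g. a kernel runtime in ms

# Old default: one decimal place destroys almost all of the signal.
assert f"{measurement:.1f}" == "0.1"

# New default precision of 6 keeps the measurement intact.
assert f"{measurement:.6f}" == "0.123457"

# User-configurable: the precision becomes part of the format string.
precision = 3
assert f"{measurement:.{precision}f}" == "0.123"
```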
Triton: [CLEANUP] Fix typos across the project
GitHub PR #2876. This PR came from
my initial reading of the documentation, and identification of a few spelling
errors that impacted readability. After a suggestion from one of the
maintainers, I used the automated spell checking tool
codespell to do a more
general cleanup of the codebase.
I was conservative in my correction criteria:
- codespell provided suggestions, but I used my own discretion when applying them
- I ignored anything in the third-party directory
- Corrections were only on comments and docs, no code (even if a variable name
  was clearly a typo). Exceptions to this include:
  - An error message string in AxisInfo.cpp
  - An error message string in hip.c
  - An error message string in WSPipeline.cpp
  - Docstrings in tablegen files (still documentation, but is compiled)
2023
Apache TVM: [fix][relay][qnn] Bug fix for 8-bit quantized mul
GitHub PR #14286. I identified a case where operations within quantized CNN models were not being supported adequately. I reproduced the error with this gist. Upon closer inspection, I found that the issue was related to the "Squeeze-and-Excitation block", where the output of a sigmoid is multiplied with an earlier output, found in models such as EfficientNet. This broke some of the assumptions of how quantized mul operations were implemented in TVM. I fixed the bug.
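For context, an elementwise multiply of two affine-quantized tensors follows the standard requantization identity: each real value is r = s·(q − z), and the product is re-expressed in the output's scale and zero-point. A scalar sketch of that arithmetic (the textbook formula, not TVM's implementation):

```python
def quantized_mul(q1, z1, s1, q2, z2, s2, z_out, s_out):
    """Elementwise multiply of two affine-quantized uint8 values.

    Each real value is r = s * (q - z); the product is requantized into
    the output scale/zero-point. Textbook formula, not TVM code.
    """
    real_product = (s1 * (q1 - z1)) * (s2 * (q2 - z2))
    q_out = round(real_product / s_out) + z_out
    return max(0, min(255, q_out))  # clamp to the uint8 range


# SE-block-style multiply: a sigmoid output (~0.9) scales a feature (~2.0).
q = quantized_mul(q1=230, z1=0, s1=1 / 255,   # sigmoid output in [0, 1]
                  q2=150, z2=100, s2=0.04,    # feature-map value
                  z_out=0, s_out=0.02)
assert q == 90  # represents 90 * 0.02 = 1.8 ~ 0.902 * 2.0
```

The SE-block case is awkward precisely because the two inputs have very different scales (one bounded in [0, 1] by the sigmoid, the other an arbitrary feature map), so assumptions that both inputs share quantization parameters break down.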
2022
Apache TVM: [docs] Update debugger.rst
GitHub PR #11231. TVM's debugger and profiler is a very powerful tool, but was (and still is) quite new and underutilised. The documentation did not reflect its correct usage, and I had to reverse engineer how it was implemented. My PR updated the documentation to reflect how the debugger can actually be used.
MLIR/LLVM: [mlir][docs] Broken link in MLIR Toy docs
Phabricator #D133977. A minor documentation fix, such that the Toy tutorial (many users' first experience of MLIR, and a common reference point for MLIR developers) linked to the correct location.
2021
Apache TVM: Better grouped convolution for CPU targets
GitHub PR #6137. This pull request replaced the original grouped convolution algorithm in TVM for x86 and Arm targets with the faster Grouped Spatial Pack Convolutions (GSPC) algorithm, which I developed in my ASAP 2020 paper "Optimizing Grouped Convolutions on Edge Devices". This is now the default algorithm used in TVM for grouped convolutions on all CPU targets.
pypylon: Update setup.py to fix #296 (deprecate version)
GitHub PR #314. This PR officially deprecated support for an old version of Pylon (5), since it was no longer supported in other parts of the system. This ensured that users with the old version installed would not encounter issues.
2020
Apache TVM: Asymmetric padding and dilation in conv2d workload
GitHub PR #7142. The goal of this
pull request was to make asymmetric padding and dilation a first-class citizen
in 2D convolution. The previous workload description had hpad and wpad;
however, these do not represent all of the possible configurations. Most
conv2d implementations in TVM already support asymmetric padding in their
algorithm, so allowing the workload description to reflect this lets that
support be exploited.
The process of developing this PR also uncovered a bug, where the output
dimensions were not being properly calculated for fallback_schedules. Neither
asymmetric padding nor dilation was being handled properly, leading to some
untested, incorrect behaviour. In some cases this could perhaps result in a
schedule with a performance regression, but I have not tested this. I fixed
the bug and added a test case.
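The output-size computation in question is standard; with asymmetric padding and dilation it looks like this (a sketch of the formula, not TVM's code):

```python
def conv2d_out_dim(size, kernel, pad_begin, pad_end, stride, dilation):
    """Output extent of one spatial axis of a 2D convolution.

    Standard formula: a dilated kernel covers dilation*(kernel-1)+1
    input elements, and padding may differ at each end of the axis.
    """
    dilated_kernel = dilation * (kernel - 1) + 1
    return (size + pad_begin + pad_end - dilated_kernel) // stride + 1


# Symmetric padding: 3x3 kernel, pad 1 each side, stride 1 keeps the size.
assert conv2d_out_dim(224, 3, 1, 1, 1, 1) == 224

# Asymmetric padding (0 before, 1 after), as used for "same"-style
# downsampling with stride 2.
assert conv2d_out_dim(224, 3, 0, 1, 2, 1) == 112

# Dilation widens the kernel's footprint and shrinks the output.
assert conv2d_out_dim(224, 3, 1, 1, 1, 2) == 222
```

Collapsing the two padding values into a single hpad/wpad cannot express the asymmetric case, which is exactly what the fallback-schedule bug tripped over.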
Own Projects
2025
- petit-pois: A tool to archive podcasts and create a feed for them.
- dnn64: A neural network inference engine for the Nintendo 64, using TVM and RSPL. See my blog posts.
2024
- RSPL Examples: Examples of using the RSPL language for the N64 RSP GPU.
- Triton Samples: Some home-brewed Triton kernels, with varying degrees of optimisation.