Open Source Contributions
This page tracks open source contributions I've made over the years. I'll use the classic excuse of "my best work is proprietary", so this is not a complete list.
Hopefully it at least paints a picture that I'm out in the world writing code. For open source projects I have made myself, see the "Own Projects" heading below, or my GitHub profile.
2025
XLA: [docs] note mlir-hlo deprecation
GitHub PR #8812. The mlir-hlo
tool is being deprecated; this PR updates the documentation to reflect that.
MLIR/LLVM: [mlir][python] add dict-style to IR attributes
GitHub PR #163200. Improves the usability of the MLIR Python bindings by allowing attributes to be accessed in a dictionary-like way.
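As a rough illustration of the ergonomics (a toy stand-in, not the actual MLIR binding code), "dict-style" access means supporting Python's mapping protocol on the attributes object:

```python
class Attributes:
    """Toy stand-in for an op's attribute store (not the real MLIR class)."""

    def __init__(self, entries):
        self._entries = dict(entries)

    def __getitem__(self, name):
        # Dict-style lookup: attrs["sym_name"]
        return self._entries[name]

    def __contains__(self, name):
        # Dict-style membership test: "sym_name" in attrs
        return name in self._entries

    def __iter__(self):
        # Dict-style iteration over attribute names.
        return iter(self._entries)


attrs = Attributes({"sym_name": "main", "sym_visibility": "private"})
assert "sym_name" in attrs
assert attrs["sym_name"] == "main"
assert sorted(attrs) == ["sym_name", "sym_visibility"]
```

The attribute names here are just illustrative examples; the point is that lookup, membership, and iteration behave like a plain Python dict.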
dvc: [cli,lock] Allow wait_for_lock
GitHub PR #10784. DVC's purpose
is to act as a more robust version of Git LFS. However, I was integrating it
with Bazel, which meant running multiple DVC processes in parallel (to pull
data from a repo). DVC crashed in this case: it generates a lock file, and any
other process that encounters the lock crashes. This PR added a flag that lets
other DVC processes wait for the lock to free instead.
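The general pattern behind such a flag is the usual poll-until-free loop over a lock file. A minimal sketch of that behaviour (not DVC's actual implementation, whose locking is more involved):

```python
import os
import tempfile
import time


def acquire_lock_with_wait(lock_path, timeout=30.0, poll=0.25):
    """Wait for a lock file to free instead of crashing on contention.

    Illustrative only: sketches the behaviour a wait_for_lock-style
    option enables, not DVC's real locking code.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation atomic: it fails if the file exists,
            # so exactly one process can hold the lock at a time.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.write(fd, str(os.getpid()).encode())
            return fd
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"lock {lock_path!r} still held")
            time.sleep(poll)


def release_lock(fd, lock_path):
    os.close(fd)
    os.remove(lock_path)


# Demo: an uncontended acquisition succeeds immediately.
lock = os.path.join(tempfile.mkdtemp(), "repo.lock")
fd = acquire_lock_with_wait(lock, timeout=1.0)
release_lock(fd, lock)
```

A second process running the same loop would spin in the `FileExistsError` branch until the first releases the lock or the timeout expires.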
HuggingFace Transformers: fix: Fully remove legacy cache from Llama
GitHub PR #36958. In
the popular HuggingFace Transformers library, I found that the Llama model
still had partial support for the legacy cache system (a list of tensors),
rather than the new system (a specialised class). This PR fully removed the
legacy cache system, since attempting to use it would fail in a confusing way.
2024
MLIR/LLVM: [mlir,python] Fix case when FuncOp.arg_attrs is not set
GitHub PR #117188. In the
MLIR Python API, a KeyError was raised if one tried to access the arg_attrs of
a FuncOp that did not have any. This PR fixed that by returning an empty
dictionary in this case rather than raising an error. An alternative would have
been to return None, but I felt that an empty dictionary was more consistent
with the behaviour of the rest of the API, and better suited the requirements
of the Python API.
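The fix follows the same shape as Python's `dict.get`: return an empty container for the missing case rather than raising. A plain-dict analogy of the behaviour change (not the actual binding code):

```python
# Model a FuncOp with no arg_attrs set as a plain dict with no entry.
func_attrs = {}

# Old behaviour: direct indexing raises KeyError.
try:
    func_attrs["arg_attrs"]
    raised = False
except KeyError:
    raised = True
assert raised

# New behaviour: absent arg_attrs reads as empty rather than erroring,
# so callers can iterate or check membership without a try/except.
assert func_attrs.get("arg_attrs", {}) == {}
```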
MLIR/LLVM: [mlir,python] Expose replaceAllUsesExcept to Python bindings
GitHub PR #115850. MLIR's
Python bindings are great for quickly hacking and exploring IR transformations.
However, I found that a useful method
Value.replaceAllUsesExcept()
was not exposed. I added this, with appropriate tests.
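The semantics are easy to model: every use of the old value is rewritten to the new value unless its owner is in the exception set. A toy sketch of those semantics (not the MLIR implementation):

```python
def replace_all_uses_except(uses, old, new, exceptions):
    """Toy model of Value.replaceAllUsesExcept semantics.

    `uses` maps an operation name to the value it consumes; real MLIR
    walks an intrusive use-list instead of a dict.
    """
    for op, value in uses.items():
        if value == old and op not in exceptions:
            uses[op] = new


# Three ops all consume v_old; we swap in v_new everywhere except "return".
uses = {"add": "v_old", "mul": "v_old", "return": "v_old"}
replace_all_uses_except(uses, "v_old", "v_new", exceptions={"return"})
assert uses == {"add": "v_new", "mul": "v_new", "return": "v_old"}
```

This is the common pattern when inserting a new op that wraps an existing value: without the exception set, the wrapping op would end up consuming its own result.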
MLIR/LLVM: [mlir] Fix remove-dead-values pass throws error when module has a name
GitHub PR #109990.
Encountered a bug when using OpenXLA's
StableHLO, which gives MLIR modules names
(module @IrToHlo.6443). This caused the remove-dead-values pass to throw an
error, since it was expecting a module without a name. I could not find a good
reason for this restriction, so I fixed it, providing a test case to ensure it
doesn't regress.
Docker Suno API: Improved docker compose integration
GitHub PR #115. This repo provides an API to the Suno AI music generation tool. I used it as part of another project I was working on. However, this project used a containerised approach (for better modularity and security controls/clear trust boundaries). This PR made the project more amenable to this goal.
MLIR/LLVM: [mlir] Retain original identifier names for debugging
GitHub PR #79704. Currently
under active development, with a high volume of discussion on the LLVM
developer
forums.
This feature aims to add a flag to mlir-opt to keep the original names, which
can be helpful for debugging compiler pipelines. Right now, identifier names are
made anonymous, e.g., %my_input becomes %arg0. Meaningful variable names can
make it easier to reason about behaviour, which is why this feature is valuable.
Three designs are under
consideration;
since this touches the core of MLIR, caution and careful consideration are
needed before merging.
<!-- The motivation for this
came from trying to add my own dialect, and experiencing some friction since it had been a while. -->
Triton: [TESTING] Added precision option to benchmark CSV data saving
GitHub PR #2933. Triton has a
built-in benchmarking suite; however, I discovered that it was saving data with
an unusually low level of precision (.1f). In this patch, I made the precision
user-configurable, and set the default to 6. I do not think the downsides of
higher precision, namely larger file sizes for the CSVs, are significant
compared to the downside of losing data. Making the value configurable gives us
the best of both worlds.
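The difference is easy to see with Python's format specifiers (the .1f default and the new default of 6 are from the PR; the surrounding Triton code is omitted here):

```python
measurement = 0.123456789  # e.g. a kernel runtime in ms

# Old default: one decimal place destroys almost all of the signal.
assert f"{measurement:.1f}" == "0.1"

# New default precision of 6 keeps the measurement intact.
assert f"{measurement:.6f}" == "0.123457"

# User-configurable: the precision becomes part of the format string.
precision = 3
assert f"{measurement:.{precision}f}" == "0.123"
```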
Triton: [CLEANUP] Fix typos across the project
GitHub PR #2876. This PR came from
my initial reading of the documentation, and identification of a few spelling
errors that impacted readability. After a suggestion from one of the
maintainers, I used the automated spell checking tool
codespell to do a more
general cleanup of the codebase.
I was conservative in my correction criteria:
- codespell provided suggestions, but I used my own discretion when applying them
- I ignored anything in the third-party directory
- Corrections were only on comments and docs, no code (even if a variable name
  was clearly a typo). Exceptions to this include:
  - An error message string in AxisInfo.cpp
  - An error message string in hip.c
  - An error message string in WSPipeline.cpp
  - Docstrings in tablegen files (still documentation, but is compiled)
2023
Apache TVM: [fix][relay][qnn] Bug fix for 8-bit quantized mul
GitHub PR #14286. I identified a case where operations within quantized CNN models were not being supported adequately. I reproduced the error with this gist. Upon closer inspection, I found that the issue was related to the "Squeeze-and-Excitation block", where the output of a sigmoid is multiplied with an earlier output, found in models such as EfficientNet. This broke some of the assumptions of how quantized mul operations were implemented in TVM. I fixed the bug.
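For context, an elementwise multiply of two affine-quantized tensors follows the standard requantization identity: each real value is r = s·(q − z), and the product is re-expressed in the output's scale and zero-point. A scalar sketch of that arithmetic (the textbook formula, not TVM's implementation):

```python
def quantized_mul(q1, z1, s1, q2, z2, s2, z_out, s_out):
    """Elementwise multiply of two affine-quantized uint8 values.

    Each real value is r = s * (q - z); the product is requantized into
    the output scale/zero-point. Textbook formula, not TVM code.
    """
    real_product = (s1 * (q1 - z1)) * (s2 * (q2 - z2))
    q_out = round(real_product / s_out) + z_out
    return max(0, min(255, q_out))  # clamp to the uint8 range


# SE-block-style multiply: a sigmoid output (~0.9) scales a feature (~2.0).
q = quantized_mul(q1=230, z1=0, s1=1 / 255,   # sigmoid output in [0, 1]
                  q2=150, z2=100, s2=0.04,    # feature-map value
                  z_out=0, s_out=0.02)
assert q == 90  # represents 90 * 0.02 = 1.8 ~ 0.902 * 2.0
```

The SE-block case is awkward precisely because the two inputs have very different scales (one bounded in [0, 1] by the sigmoid, the other an arbitrary feature map), so assumptions that both inputs share quantization parameters break down.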
2022
Apache TVM: [docs] Update debugger.rst
GitHub PR #11231. TVM's debugger and profiler is a very powerful tool, but was (and still is) quite new and underutilised. The documentation did not reflect its correct usage, and I had to reverse engineer how it was implemented. My PR updated the documentation to reflect how the debugger can actually be used.
MLIR/LLVM: [mlir][docs] Broken link in MLIR Toy docs
Phabricator #D133977. A minor documentation fix, such that the Toy tutorial (many users' first experience of MLIR, and a common reference point for MLIR developers) linked to the correct location.
2021
Apache TVM: Better grouped convolution for CPU targets
GitHub PR #6137. This pull request replaced the original grouped convolution algorithm in TVM for x86 and Arm targets with the faster Grouped Spatial Pack Convolutions (GSPC) algorithm, which I developed in my ASAP 2020 paper "Optimizing Grouped Convolutions on Edge Devices". This is now the default algorithm used in TVM for grouped convolutions on all CPU targets.
pypylon: Update setup.py to fix #296 (deprecate version)
GitHub PR #314. This PR officially deprecated support for an old version of Pylon (5), since it was no longer supported in other parts of the system. This ensured that users with the old version installed would not encounter issues.
2020
Apache TVM: Asymmetric padding and dilation in conv2d workload
GitHub PR #7142. The goal of this
pull request was to make asymmetric padding and dilation a first-class citizen
in 2D convolution. The previous workload description had hpad and wpad;
however, these do not represent all of the possible configurations. Most
conv2d implementations in TVM already support asymmetric padding in their
algorithm, so allowing the workload description to reflect this lets that
support be exploited.
The process of developing this PR also uncovered a bug, where the output
dimensions were not being properly calculated for fallback_schedules. Neither
asymmetric padding nor dilation was being handled properly, leading to some
untested, incorrect behaviour. In some cases this could perhaps result in a
schedule with a performance regression, but I have not tested this. I fixed
the bug and added a test case.
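The output-size computation in question is standard; with asymmetric padding and dilation it looks like this (a sketch of the formula, not TVM's code):

```python
def conv2d_out_dim(size, kernel, pad_begin, pad_end, stride, dilation):
    """Output extent of one spatial axis of a 2D convolution.

    Standard formula: a dilated kernel covers dilation*(kernel-1)+1
    input elements, and padding may differ at each end of the axis.
    """
    dilated_kernel = dilation * (kernel - 1) + 1
    return (size + pad_begin + pad_end - dilated_kernel) // stride + 1


# Symmetric padding: 3x3 kernel, pad 1 each side, stride 1 keeps the size.
assert conv2d_out_dim(224, 3, 1, 1, 1, 1) == 224

# Asymmetric padding (0 before, 1 after), as used for "same"-style
# downsampling with stride 2.
assert conv2d_out_dim(224, 3, 0, 1, 2, 1) == 112

# Dilation widens the kernel's footprint and shrinks the output.
assert conv2d_out_dim(224, 3, 1, 1, 1, 2) == 222
```

Collapsing the two padding values into a single hpad/wpad cannot express the asymmetric case, which is exactly what the fallback-schedule bug tripped over.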
Own Projects
2025
- petit-pois: A tool to archive podcasts and create a feed for them.
- dnn64: A neural network inference engine for the Nintendo 64, using TVM and RSPL. See my blog posts.
2024
- RSPL Examples: Examples of using the RSPL language for the N64 RSP GPU.
- Triton Samples: Some home-brewed Triton kernels, with varying degrees of optimisation.