Research | Arjun Dahal

Current Projects

My current focus is on diagnosis (finding which components are responsible for the bias), followed by repair (how best to intervene on the model to reduce bias with minimal impact on performance).

Bias Mitigation in LLMs

2026 - Current

Using mechanistic interpretability techniques to identify and reduce biased behavior in large language models.

Bias Mitigation in Neural Networks

2025 - Current

Diagnosing whether bias in a neural network is spurious or embedded in the learned representation, then making minimal interventions to reduce the bias.

Past Projects

A key gap in fairness testing for machine learning is the generation of natural test cases.

My research focuses on generating natural test cases for evaluating the fairness of machine learning models. Much of this work addresses how to test black-box models with natural test cases that respect the relationships between features. The model is then trained on the generated test cases to mitigate its bias.

TabChange: Precise Attribute Changes in Tabular Data

2025 - 2026

Generate realistic counterfactuals for tabular data on sensitive attributes while preserving relationships among attributes. Each counterfactual pair can be used to test machine learning models against the counterfactual definition of individual fairness.
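As a rough illustration of how a counterfactual pair is used in testing, the sketch below flags pairs on which a model's prediction changes. It is not TabChange itself: the pairs are written by hand, and `find_violations`, `toy_model`, and the `sex` and `income` fields are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Pair = Tuple[Record, Record]


def find_violations(predict: Callable[[Record], int], pairs: List[Pair]) -> List[Pair]:
    """Return the pairs whose two records receive different predictions."""
    return [(orig, cf) for orig, cf in pairs if predict(orig) != predict(cf)]


def toy_model(record: Record) -> int:
    # Hypothetical biased classifier: the approval threshold depends on `sex`.
    threshold = 50_000 if record["sex"] == "male" else 60_000
    return int(record["income"] >= threshold)


# Hand-written counterfactual pairs; a real generator would also adjust
# attributes that depend on the sensitive attribute.
pairs = [
    ({"sex": "male", "income": 55_000}, {"sex": "female", "income": 55_000}),
    ({"sex": "male", "income": 70_000}, {"sex": "female", "income": 70_000}),
]

violations = find_violations(toy_model, pairs)
print(f"{len(violations)} of {len(pairs)} pairs violate individual fairness")
```

Only the first pair is flagged here, because the two incomes of 55,000 fall on opposite sides of the group-specific thresholds.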

Notes

Fairness Testing of Machine Learning Models using Combinatorial Testing in Latent Space

2024 - 2025

Generate test instances using combinatorial testing in the latent space of a VAE to uncover discriminatory behavior in black-box machine learning systems. Combinatorial testing is used to search the latent space efficiently. The approach leverages the relative independence of the VAE's latent features and a decoder that has learned the constraints of the dataset to obtain test cases with high naturalness in the input domain. The black-box model is tested against the causal definition of individual fairness.
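The idea of searching a latent space with combinatorial testing can be sketched as follows. This is a minimal illustration rather than the method from the paper: it assumes 2-way (pairwise) coverage, four latent dimensions, and three hand-picked values per dimension, and it uses a simple greedy generator. The name `pairwise_tests` and the level values are hypothetical.

```python
from itertools import combinations, product
from typing import List, Tuple


def pairwise_tests(num_factors: int, levels: int) -> List[Tuple[int, ...]]:
    """Greedily build rows so every pair of levels for every pair of factors appears."""
    factor_pairs = list(combinations(range(num_factors), 2))
    uncovered = {
        (i, a, j, b)
        for i, j in factor_pairs
        for a, b in product(range(levels), repeat=2)
    }
    # Exhaustive candidate pool; only practical for small examples like this one.
    candidates = list(product(range(levels), repeat=num_factors))
    rows = []
    while uncovered:
        # Pick the candidate that covers the most still-uncovered pairs.
        best = max(
            candidates,
            key=lambda row: sum((i, row[i], j, row[j]) in uncovered for i, j in factor_pairs),
        )
        rows.append(best)
        for i, j in factor_pairs:
            uncovered.discard((i, best[i], j, best[j]))
    return rows


# Assumed discretization of each latent dimension (values are illustrative).
latent_levels = [-1.0, 0.0, 1.0]
rows = pairwise_tests(num_factors=4, levels=len(latent_levels))
latent_points = [[latent_levels[k] for k in row] for row in rows]

exhaustive = len(latent_levels) ** 4
print(f"{len(latent_points)} latent points instead of {exhaustive}")
```

Each latent point would then be passed through the VAE decoder to obtain a test instance, and the black-box model would be queried on that instance and on its counterpart with the sensitive attribute changed.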

Dahal, Arjun, Sunny Shree, Yu Lei, Raghu N. Kacker, and D. Richard Kuhn. "Fairness Testing of Machine Learning Models using Combinatorial Testing in Latent Space." In 2025 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), pp. 268-277. IEEE, 2025.

Paper Notes

Undergraduate Publication

Nepali Speech Recognition using RNN-CTC Model

2018 - 2019

Applied a Recurrent Neural Network with Connectionist Temporal Classification (CTC) loss for end-to-end Nepali speech recognition.

Regmi, Paribesh, Arjun Dahal, and Basanta Joshi. "Nepali Speech Recognition using RNN-CTC Model." International Journal of Computer Applications 178.31 (2019): 1-6.

Paper