Making Natural Changes in Tabular Data

Summary:

Modifying an attribute in tabular data often produces unnatural instances, because the change breaks the attribute's relationships with other attributes. This matters in applications such as fairness testing, where the modified instance (the counterfactual) must be natural and differ as little as possible from the original instance.
In this work, we address the challenge of generating such counterfactuals.
Our approach analyzes the relationship between the attribute of interest and the other attributes in the dataset. If the relationship is weak, it simply flips the attribute; if it is strong, it uses an adversarial framework to learn a latent representation that removes information about the attribute. Removing this information enables precise modifications: only the adjustments needed to maintain naturalness are made.
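
The weak/strong decision described above can be illustrated with a small sketch. The summary does not say how the relationship is measured, so everything specific here is an assumption made for illustration: the strength score (mutual information normalized by the attribute's entropy), the `threshold` value, the restriction to a binary 0/1 attribute, and the function names `relation_strength` and `change_attribute`. The adversarial latent-representation model used in the strong case is not implemented; the sketch only reports that it would be needed.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy (in nats) of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log((c * n) / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def relation_strength(rows, attr):
    """Assumed score: largest MI(attr, other) / H(attr) over the other columns."""
    target = [r[attr] for r in rows]
    h = entropy(target)
    if h == 0:
        return 0.0  # a constant attribute carries no information
    others = [k for k in rows[0] if k != attr]
    return max(
        (mutual_information(target, [r[k] for r in rows]) / h for k in others),
        default=0.0,
    )


def change_attribute(row, rows, attr, threshold=0.1):
    """Flip a binary (0/1) attribute when its relation to the rest is weak.

    In the strong case this sketch returns no instance: a learned model
    (not shown here) would have to adjust the dependent attributes.
    """
    if relation_strength(rows, attr) < threshold:
        flipped = dict(row)
        flipped[attr] = 1 - row[attr]
        return flipped, "flipped"
    return None, "needs_learned_model"


# Hypothetical toy data: "sex" is independent of "color" in the first table
# and fully determined by "dept" in the second.
weak_rows = [{"sex": s, "color": c} for s in (0, 1) for c in (0, 1)]
strong_rows = [{"sex": s, "dept": s} for s in (0, 1)] * 2

weak_result = change_attribute(weak_rows[0], weak_rows, "sex")
strong_result = change_attribute(strong_rows[0], strong_rows, "sex")
```

On the toy data, the independent table takes the simple flip path and leaves every other attribute untouched, while the fully dependent table is routed to the learned model. Real tabular data would likely need continuous attributes to be discretized before a score of this kind could be computed.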

Relevant Papers:

  1. Becker, Barry, and Ronny Kohavi. "Adult." UCI Machine Learning Repository. 1996. DOI: https://doi.org/10.24432/C5XW20.
  2. Bellamy, Rachel K. E., et al. "AI Fairness 360: An extensible toolkit for detecting and mitigating algorithmic bias." IBM Journal of Research and Development 63.4/5 (2019): 4:1–4:15.
  3. Chan, Jason, and Jing Wang. "Hiring preferences in online labor markets: Evidence of a female hiring bias." Management Science 64.7 (2018): 2973–2994.
  4. Cinquini, Martina, and Riccardo Guidotti. "Causality-aware local interpretable model-agnostic explanations." World Conference on Explainable Artificial Intelligence. 2024: 108–124.
  5. Downs, Michael, Jonathan L. Chu, Yaniv Yacoby, Finale Doshi-Velez, and Weiwei Pan. "Cruds: Counterfactual recourse using disentangled subspaces." ICML WHI 2020 (2020): 1–23.
  6. Fan, Ming, Wenying Wei, Wuxia Jin, Zijiang Yang, and Ting Liu. "Explanation-guided fairness testing through genetic algorithm." Proceedings of the 44th International Conference on Software Engineering. 2022: 871–882.
  7. Garg, Prateek, Lokesh Nagalapatti, and Sunita Sarawagi. "From Search To Sampling: Generative Models For Robust Algorithmic Recourse." arXiv preprint arXiv:2505.07351 (2025).
  8. Han, Jiawei, Jian Pei, and Yiwen Yin. "Mining frequent patterns without candidate generation." ACM SIGMOD Record 29.2 (2000): 1–12.
  9. Hofmann, Hans. "Statlog (German Credit Data)." UCI Machine Learning Repository. 1994. DOI: https://doi.org/10.24432/C5NC77.
  10. Joshi, Shalmali, Oluwasanmi Koyejo, Warut Vijitbenjaronk, Been Kim, and Joydeep Ghosh. "Towards realistic individual recourse and actionable explanations in black-box decision making systems." arXiv preprint arXiv:1907.09615 (2019).
  11. Kim, Hyemi, Seungjae Shin, JoonHo Jang, Kyungwoo Song, Weonyoung Joo, Wanmo Kang, and Il-Chul Moon. "Counterfactual fairness with disentangled causal effect variational autoencoder." Proceedings of the AAAI Conference on Artificial Intelligence 35 (2021): 8128–8136.
  12. Madaan, Nishtha, and Srikanta Bedathur. "Navigating the Structured What-If Spaces: Counterfactual Generation via Structured Diffusion." 2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). 2024: 710–722.
  13. Majumdar, Ayan. "On Computing Counterfactuals for Causal Fairness." Master’s Thesis, Saarland University. 2021.
  14. Moro, S., P. Rita, and P. Cortez. "Bank Marketing." UCI Machine Learning Repository. 2014. DOI: https://doi.org/10.24432/C5K306.
  15. Mothilal, Ramaravind K., Amit Sharma, and Chenhao Tan. "Explaining machine learning classifiers through diverse counterfactual explanations." Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 2020: 607–617.
  16. Mukerjee, Amitabha, Rita Biswas, Kalyanmoy Deb, and Amrit P. Mathur. "Multi-objective evolutionary algorithms for the risk-return trade-off in bank loan management." International Transactions in Operational Research 9.5 (2002): 583–597.
  17. Nemirovsky, Daniel, Nicolas Thiebaut, Ye Xu, and Abhishek Gupta. "CounterGAN: Generating realistic counterfactuals with residual generative adversarial nets." arXiv preprint arXiv:2009.05199 (2020).
  18. Panagiotou, Emmanouil, Manuel Heurich, Tim Landgraf, and Eirini Ntoutsi. "TabCF: Counterfactual explanations for tabular data using a transformer-based VAE." Proceedings of the 5th ACM International Conference on AI in Finance. 2024: 274–282.
  19. Patki, Neha, Roy Wedge, and Kalyan Veeramachaneni. "The synthetic data vault." 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 2016: 399–410.
  20. Pawelczyk, Martin, Klaus Broelemann, and Gjergji Kasneci. "Learning model-agnostic counterfactual explanations for tabular data." Proceedings of the Web Conference 2020. 2020: 3126–3132.
  21. Perarnau, Guim, Joost Van De Weijer, Bogdan Raducanu, and Jose M. Álvarez. "Invertible conditional GANs for image editing." arXiv preprint arXiv:1611.06355 (2016).
  22. Pfisterer, Florian. "national-longitudinal-survey-binary (OpenML dataset 43892), version 1." OpenML. 2022. https://www.openml.org/d/43892. Binarized extract from the U.S. Bureau of Labor Statistics National Longitudinal Surveys. Accessed: 2025-09-02.
  23. Poyiadzi, Rafael, Kacper Sokol, Raul Santos-Rodriguez, Tijl De Bie, and Peter Flach. "FACE: feasible and actionable counterfactual explanations." Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society. 2020: 344–350.
  24. ProPublica. "propublica/compas-analysis: Data and analysis for ‘Machine Bias’." GitHub repository. 2016. https://github.com/propublica/compas-analysis. Accessed: 2025-06-12.
  25. Qian, Zhaozhi, Bogdan-Constantin Cebere, and Mihaela van der Schaar. "SynthCity: facilitating innovative use cases of synthetic data in different data modalities." arXiv preprint arXiv:2301.07573 (2023). doi:10.48550/arXiv.2301.07573.
  26. Rajabi, Amirarsalan, and Ozlem Ozmen Garibay. "TabFairGAN: Fair tabular data generation with generative adversarial networks." Machine Learning and Knowledge Extraction 4.2 (2022): 488–501.
  27. Saleiro, Pedro, Benedict Kuester, Loren Hinkson, Jesse London, Abby Stevens, Ari Anisfeld, Kit T. Rodolfa, and Rayid Ghani. "Aequitas: A bias and fairness audit toolkit." arXiv preprint arXiv:1811.05577 (2018).
  28. Sohn, Kihyuk, Honglak Lee, and Xinchen Yan. "Learning structured output representation using deep conditional generative models." Advances in Neural Information Processing Systems 28 (2015).
  29. TabChange. "TabChange." Code repository: https://anonymous.4open.science/r/TabChange-AB63. 2025. Accessed: 2025-09-09.
  30. U.S. Census Bureau. "American Community Survey (ACS) 1-Year Estimates, 2018: S2704 — Public Health Insurance Coverage by Type and Selected Characteristics (Alabama)." data.census.gov. 2018. https://data.census.gov/table/ACSST1Y2018.S2704. Subject Table S2704. Geography: Alabama (state). Accessed: 2025-09-02.
  31. Ustun, Berk, Alexander Spangher, and Yang Liu. "Actionable recourse in linear classification." Proceedings of the Conference on Fairness, Accountability, and Transparency. 2019: 10–19.
  32. Van Breugel, Boris, Trent Kyono, Jeroen Berrevoets, and Mihaela van der Schaar. "DECAF: Generating fair synthetic data using causally-aware generative networks." Advances in Neural Information Processing Systems 34 (2021): 22221–22233.
  33. Xiao, Yisong, Aishan Liu, Tianlin Li, and Xianglong Liu. "Latent imitator: Generating natural individual discriminatory instances for black-box fairness testing." Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis. 2023: 829–841.
  34. Xu, Depeng, Shuhan Yuan, Lu Zhang, and Xintao Wu. "FairGAN+: Achieving fair data generation and classification through generative adversarial nets." 2019 IEEE International Conference on Big Data (Big Data). IEEE, 2019: 1401–1406.
  35. Xu, Lei, Maria Skoularidou, Alfredo Cuesta-Infante, and Kalyan Veeramachaneni. "Modeling tabular data using conditional GAN." Advances in Neural Information Processing Systems 32 (2019).
  36. Yang, Zeyu, Han Yu, Peikun Guo, Khadija Zanna, Xiaoxue Yang, and Akane Sano. "Balanced mixed-type tabular data synthesis with diffusion models." arXiv preprint arXiv:2404.08254 (2024).
  37. Yin, Ziqiang, Wentian Zhao, and Tian Song. "Boundary-guided black-box fairness testing." 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC). IEEE, 2024: 1230–1239.
  38. Zhang, Lingfeng, Yueling Zhang, and Min Zhang. "Efficient white-box fairness testing through gradient search." Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis. 2021: 103–114.
  39. Zhang, Peixin, Jingyi Wang, Jun Sun, Xinyu Wang, Guoliang Dong, Xingen Wang, Ting Dai, and Jin Song Dong. "Automatic fairness testing of neural classifiers through adversarial sampling." IEEE Transactions on Software Engineering 48.9 (2021): 3593–3612.
  40. Zheng, Haibin, Zhiqing Chen, Tianyu Du, Xuhong Zhang, Yao Cheng, Shouling Ji, Jingyi Wang, Yue Yu, and Jinyin Chen. "NeuronFair: Interpretable white-box fairness testing through biased neuron identification." Proceedings of the 44th International Conference on Software Engineering. 2022: 1519–1531.