BAT AI²: Bias Aversion Toolset for Application Information x AI
In my last year as Director of Data Projects for the Lindau Nobel Laureate Meetings, I completed an ambitious ML prediction project.
Problem Description
Every year, several thousand applications are reviewed by a large number of human reviewers. This process cannot entirely avoid unconscious bias: reviewers unknowingly disadvantage or favour certain applications because of subliminal preferences, for example views and expectations tied to gender, nationality, university, or field of research. The algorithm to be developed should alleviate this problem.
Solution Approach
The solution approach was two-fold:
- Analyse the characteristics and features of an application and develop metrics that provide at least some guidance on its quality. For example: the number of publications that are frequently cited within a certain time period (similar to the h-index), the extent of extracurricular activities, or the sentiment of the letters of recommendation.
- Train a machine learning classifier on existing data to predict which applications should be accepted (of course, this solution inherits some of the bias present in those past decisions).
- Compare the predictions from the first two steps with the reviewers' recommendations before the final decision. In case of a mismatch, the application is re-evaluated. This, of course, also allows for some evaluation of the evaluators.
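The publication metric mentioned in the first step resembles the h-index. A minimal sketch of that computation, using hypothetical citation counts, might look like this:

```python
def h_index(citation_counts):
    """Largest h such that the author has h papers with at least h citations each."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, citations in enumerate(counts, start=1):
        if citations >= rank:
            h = rank
        else:
            break
    return h

# Example: five papers with these citation counts yield an h-index of 3.
h_index([10, 8, 5, 2, 1])  # → 3
```

In the real metric, the citation counts would additionally be restricted to a certain time window, which this sketch omits.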
Results
The developed toolset focused on step 2. Step 1, while challenging, is not really a machine learning problem and was not my main interest. Step 3 is simple web programming with straightforward logic.
As is often the case, the actual model training was rather simple; I used the Python implementation of the XGBoost classifier. Most of the time and effort went into data preparation:
- data extraction from the SQL database
- anonymization (while keeping socio-cultural characteristics)
- removal of personally identifiable information
- converting PDFs to text
- stemming and lemmatization
- converting text to vectors
- categorisation of features
- feature selection
- etc.
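The PII-removal step can be illustrated with a minimal, hypothetical sketch based on regular expressions. The actual pipeline also had to handle names, affiliations, and similar identifiers, which a toy example like this does not cover:

```python
import re

# Hypothetical patterns: e-mail addresses and phone-like number sequences.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\+?\d[\d\s/-]{7,}\d")

def mask_pii(text: str) -> str:
    """Replace obvious PII with placeholder tokens before further processing."""
    text = EMAIL.sub("<EMAIL>", text)
    text = PHONE.sub("<PHONE>", text)
    return text

mask_pii("Contact jane.doe@example.org or +49 123 456789 for details.")
# → 'Contact <EMAIL> or <PHONE> for details.'
```

Replacing PII with placeholder tokens (rather than deleting it) keeps the sentence structure intact for the later text-to-vector steps.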
In the end, the solution achieved a prediction accuracy of 85%, which shows that the approach works, but may not be good enough to put into production. There are several reasons for this value, particularly the system and structure of the evaluations, which is not ideal for predicting binary decisions; this could be addressed with some changes in how the labelled data is generated. I also briefly explored training a neural network on the data, but the dataset was too small for that to pay off.
I aim to publish part of the project on GitHub, but much of it contains sensitive information that will need to be removed first.
Task
Create a bias-averting, privacy-compliant AI evaluation tool
Date
May 1, 2024
Skills
SQL, Python, Masking, NLP, Vectorisation, XGBoost
Client
Lindau Nobel Laureate Meetings



