Twitter’s algorithm for automatically cropping images attached to tweets often doesn’t focus on the important content in them. A bother, for sure, but it seems like a minor one on the surface. However, over the weekend, researchers found that the cropping algorithm might have a more serious problem: white bias.
Several users posted photos to show that when an image contains people with different skin tones, Twitter tends to show those with lighter skin after cropping the image to fit the display parameters of its site and embeds. Some of them even tried to reproduce the results with fictional characters and dogs.
If you tap on these images, you’ll see an uncropped version of the image which includes more details such as another person or character. What’s odd is that even if users flipped the order of where dark-skinned and light-skinned people appeared in the image, the results were the same.
The underrepresentation of disabilities in datasets, and the way disability-related text is processed in NLP tasks, is an important topic that is rarely studied empirically; the literature focuses primarily on other demographic groups. This gap has many consequences, especially for how text related to disabilities is classified, which in turn affects how people read, write, and seek information about disability.
Research from the World Bank indicates that about 1 billion people have a disability of some kind, and disability is often associated with strong negative social connotations. The authors take 56 linguistic expressions used in relation to disabilities and classify them into recommended and non-recommended uses (following the guidelines from the Anti-Defamation League, ACM SIGACCESS, and the ADA National Network). They then study how automated systems classify phrases that indicate disability, and whether the split between recommended and non-recommended uses makes a difference in how these snippets of text are perceived.
To quantify the biases in text classification models, the study uses perturbation. The authors start by collecting sentences that contain the naturally occurring pronouns he and she. They then replace the pronouns with the disability phrases identified in the previous paragraph and compare the classification scores of the original and perturbed sentences. The difference indicates how much impact a disability phrase has on the classification.
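The perturbation procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the pronoun regex, and the toy scorer are all assumptions, and a real experiment would pass in an actual toxicity or sentiment classifier as `score`.

```python
import re
from statistics import mean
from typing import Callable, Iterable

# Matches the standalone pronouns "he" or "she" (case-insensitive).
PRONOUN = re.compile(r"\b(he|she)\b", flags=re.IGNORECASE)


def perturb(sentence: str, phrase: str) -> str:
    """Replace the first standalone 'he' or 'she' with the given phrase."""
    return PRONOUN.sub(lambda _match: phrase, sentence, count=1)


def mean_score_shift(
    sentences: Iterable[str],
    phrase: str,
    score: Callable[[str], float],
) -> float:
    """Average change in classifier score caused by the perturbation.

    Sentences without a pronoun are skipped, since they cannot be perturbed.
    """
    shifts = [
        score(perturb(s, phrase)) - score(s)
        for s in sentences
        if PRONOUN.search(s)
    ]
    return mean(shifts) if shifts else 0.0


def toy_score(text: str) -> float:
    """Hypothetical stand-in for a real classifier, used only for illustration.

    It mimics a biased model that flags any text containing the word "blind".
    """
    return 1.0 if "blind" in text.lower() else 0.0


if __name__ == "__main__":
    corpus = ["She went home.", "I think he is kind."]
    print(mean_score_shift(corpus, "a blind person", toy_score))
```

A positive shift means the classifier scores the perturbed sentences higher than the originals, even though only the phrase referring to the person has changed.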
Using the Jigsaw tool that gives a toxicity score for sentences, they test the original and perturbed sentences and observe that the change in toxicity is lower for recommended phrases than for non-recommended ones. But when the results are disaggregated by category, some categories elicit a stronger response than others. The primary use of such a model might be online content moderation, especially now that more monitoring is automated as human staff has been thinning out because of pandemic-related closures. In that setting, this leads to a high rate of false positives, where the model suppresses content that is non-toxic and is merely discussing disability or replying to other hate speech that talks about disability.
To examine sentiment scores for disability-related phrases, the study turns to the popular BERT model and adopts a template-based fill-in-the-blank analysis. Given a query sentence with a missing word, BERT produces a ranked list of words that can fill the blank. Using a simple template perturbed with recommended disability phrases, the study looks at how BERT’s predictions change when disability phrases are used in the sentence. A large percentage of the words predicted by the model turn out to have negative sentiment scores. Since BERT is used widely across many NLP tasks, such negative sentiment scores can have hidden and unwanted effects on many downstream tasks.
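As a rough sketch of the fill-in-the-blank analysis, the snippet below computes the share of predicted words that carry negative sentiment. Everything here is an assumption made for illustration: the template wording, the `[MASK]` placeholder, the function names, the stub predictor, and the tiny sentiment lexicon. In practice `predict` would wrap a masked language model such as BERT and `sentiment` would come from a real sentiment lexicon or model.

```python
from typing import Callable, Mapping, Sequence

# Hypothetical template; the source only says a "simple template" was used.
TEMPLATE = "{phrase} is [MASK]."


def negative_fraction(
    phrase: str,
    predict: Callable[[str], Sequence[str]],
    sentiment: Mapping[str, float],
    top_k: int = 10,
    threshold: float = 0.0,
) -> float:
    """Fraction of the top-k predicted words whose sentiment is below threshold.

    Words missing from the sentiment lexicon are ignored.
    """
    query = TEMPLATE.format(phrase=phrase)
    words = list(predict(query))[:top_k]
    scores = [sentiment[w] for w in words if w in sentiment]
    if not scores:
        return 0.0
    return sum(s < threshold for s in scores) / len(scores)


def stub_predict(query: str) -> Sequence[str]:
    """Hypothetical stand-in for a masked language model's ranked predictions."""
    if "deaf" in query:
        return ["sad", "happy", "alone", "kind"]
    return ["happy", "kind", "tall", "sad"]


# Toy sentiment lexicon, invented for this example only.
TOY_SENTIMENT = {"sad": -0.6, "happy": 0.8, "alone": -0.3, "kind": 0.7, "tall": 0.1}

if __name__ == "__main__":
    for phrase in ["A person", "A deaf person"]:
        print(phrase, negative_fraction(phrase, stub_predict, TOY_SENTIMENT))
```

Comparing the fraction for a neutral phrase against the fraction for a disability phrase gives a simple measure of how much the model's predictions skew negative.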
Such models are trained on large corpora, which are analyzed to build “meaning” representations for words based on co-occurrence metrics, drawing from the idea that “you shall know a word by the company it keeps”. The study used the Jigsaw Unintended Bias in Toxicity Classification challenge dataset, which contains many mentions of disability phrases. After balancing for different categories and analyzing toxic and non-toxic comments, the authors manually inspected the top 100 terms in each category and found 5 key types: condition, infrastructure, social, linguistic, and treatment. In analyzing the strength of association, the authors found that condition phrases had the strongest association, followed by social phrases. The social type included topics like homelessness, drug abuse, and gun violence, all of which have negative valences. Because these terms are used when discussing disability, they negatively shape the way disability phrases are represented in NLP tasks.
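To make the idea of co-occurrence-based association concrete, here is a small sketch that scores how strongly each term is associated with comments mentioning a target phrase. The source does not say which association measure the authors used; the smoothed log-odds ratio below, the whitespace tokenization, and all names are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def association_scores(
    docs: Iterable[str], target: str, alpha: float = 0.5
) -> Dict[str, float]:
    """Smoothed log-odds of each term appearing in docs that mention `target`
    versus docs that do not. Higher values mean stronger association.
    """
    with_target: Counter = Counter()
    without_target: Counter = Counter()
    n_with = n_without = 0

    for doc in docs:
        text = doc.lower()
        terms = set(text.split())  # naive tokenization, document-level counts
        if target in text:
            n_with += 1
            with_target.update(terms)
        else:
            n_without += 1
            without_target.update(terms)

    scores = {}
    for term, count in with_target.items():
        # Additive smoothing keeps both probabilities strictly between 0 and 1.
        p = (count + alpha) / (n_with + 2 * alpha)
        q = (without_target[term] + alpha) / (n_without + 2 * alpha)
        scores[term] = math.log(p / (1 - p)) - math.log(q / (1 - q))
    return scores


if __name__ == "__main__":
    toy_docs = [
        "deaf people need shelter",
        "deaf people need care",
        "the weather is nice",
        "people like the weather",
    ]
    ranked = sorted(association_scores(toy_docs, "deaf").items(),
                    key=lambda kv: kv[1], reverse=True)
    for term, score in ranked:
        print(f"{term}: {score:.2f}")
```

Terms that appear mostly alongside the target phrase receive high scores, which is how negatively valenced topics can end up tied to disability terms in a learned representation.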
The authors make recommendations for those working on NLP tasks to think about the socio-technical considerations when deploying such systems and to consider the intended, unintended, voluntary, and involuntary impacts on people both directly and indirectly while accounting for long-term impacts and feedback loops.
Such indiscriminate censoring of content that contains disability phrases leads to an underrepresentation of people with disabilities in these corpora, since they are the ones who tend to use these phrases most often. It also negatively affects people who search for such content, who may be led to believe that some of these issues are less prevalent than they actually are. Finally, it reduces the autonomy and dignity of people with disabilities, which in turn has larger implications for how social attitudes are shaped.