
With deep learning at the helm, we have seen a huge increase in the amount of data being used in the field of NLP. However, controlling the quality of these datasets has become a challenging task. For example, dataset designers flock to crowdsourcing platforms to get help with the annotation task…


An intuitive mathematical introduction


[In my last article, I tried scratching the surface of some of the different reasons datasets in NLP end up biased. Feel free to go and take a look, as this article builds upon it!]

As seen earlier, datasets tend to get biased when certain terms get associated with one particular label…
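To make this concrete, here is a small, self-contained sketch on a made-up toy dataset (all names and examples are purely illustrative) that measures how strongly a term is associated with a label via pointwise mutual information. Terms with a high PMI towards one label are exactly the kind of spurious cues a model can latch onto.

from collections import Counter
from math import log

# Hypothetical toy dataset of (text, label) pairs -- purely illustrative.
data = [
    ("the movie was great fun", "positive"),
    ("great acting and a great plot", "positive"),
    ("the movie was boring", "negative"),
    ("a boring plot and bad acting", "negative"),
]

term_label = Counter()    # joint counts of (term, label)
term_counts = Counter()   # marginal counts of terms
label_counts = Counter()  # marginal counts of labels, per token
total = 0

for text, label in data:
    for term in text.split():
        term_label[(term, label)] += 1
        term_counts[term] += 1
        label_counts[label] += 1
        total += 1

def pmi(term, label):
    # Pointwise mutual information: log p(term, label) / (p(term) * p(label)).
    joint = term_label[(term, label)] / total
    if joint == 0:
        return float("-inf")
    return log(joint / ((term_counts[term] / total) * (label_counts[label] / total)))

print(pmi("great", "positive"))   # clearly positive -> "great" leans towards the positive label
print(pmi("boring", "negative"))  # clearly positive -> "boring" leans towards the negative label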



Step 1: Download the embeddings

Choose the embeddings that work for you. I chose the Wikipedia 2014 + Gigaword 5 variant.

You can execute this code as-is in a Jupyter environment (e.g., Google Colab). If one is not available, run the commands separately in a shell.
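For reference, the download step usually looks something like the following (the URL below is the Stanford NLP download link for the 6B-token release, which contains the Wikipedia 2014 + Gigaword 5 vectors; verify it against the GloVe project page if it has moved). The leading "!" runs the command from a notebook cell; drop it in a plain shell.

# Download and unzip the GloVe 6B (Wikipedia 2014 + Gigaword 5) embeddings.
!wget http://nlp.stanford.edu/data/glove.6B.zip
!unzip glove.6B.zip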

Step 2: Parse the vocabulary and…
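As a rough sketch of this step (assuming the 100-dimensional file glove.6B.100d.txt unzipped in Step 1): each line holds a word followed by its vector, so parsing reduces to splitting on spaces and collecting the entries into a dictionary.

import numpy as np

embeddings = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        word = parts[0]                                             # the vocabulary entry
        embeddings[word] = np.asarray(parts[1:], dtype="float32")   # its vector

print(len(embeddings))         # 400,000 words for the 6B variant
print(embeddings["king"][:5])  # first few dimensions of one vector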


A hands-on tutorial


About

Transformer-encoder based language models like BERT [1] have taken the NLP community by storm, with both researchers and practitioners heavily utilising these architectures to solve their tasks. They have become ubiquitous by displaying state-of-the-art results on a wide range of language tasks like text classification, next-sentence prediction, etc.
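As a minimal sketch of putting such a model to work (assuming the Hugging Face transformers library, which the tutorial may or may not use), the snippet below loads the standard bert-base-uncased checkpoint with a fresh two-label classification head; the head is randomly initialised and would still need fine-tuning on labelled data.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the pretrained encoder; the classification head starts untrained.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer("Transformer encoders have taken NLP by storm.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

print(logits.shape)  # torch.Size([1, 2]) -- one score per label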

The…


An Introduction

Introduction

The Perspective API from Jigsaw and Google aims to moderate toxic content on online social media platforms and promote a more civil and inclusive environment for everyone.
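As far as the public documentation goes, querying the API for a toxicity score looks roughly like the sketch below (YOUR_API_KEY is a placeholder for your own key; the endpoint and field names follow the publicly documented commentanalyzer interface).

import requests

API_KEY = "YOUR_API_KEY"  # placeholder -- request a key for the Perspective API
url = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

payload = {
    "comment": {"text": "You are a wonderful person!"},
    "languages": ["en"],
    "requestedAttributes": {"TOXICITY": {}},
}

# The response contains a summary toxicity score between 0 and 1.
scores = requests.post(url, json=payload).json()
print(scores["attributeScores"]["TOXICITY"]["summaryScore"]["value"])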

Although it seems to work pretty well at first, on a deeper look🕵️‍♂️ some serious cracks start to appear. …


A hands-on guide


Introduction

RoBERTa

Since BERT (Devlin et al., 2019) came out, the NLP community has been booming, with Transformer (Vaswani et al., 2017) encoder-based language models enjoying state-of-the-art (SOTA) results on a multitude of downstream tasks.

The RoBERTa model (Liu et al., 2019) introduces some key modifications above…
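As a quick, hedged sketch of using the model (again assuming the Hugging Face transformers library), the fill-mask pipeline with the roberta-base checkpoint also highlights one small but easy-to-miss difference from BERT: RoBERTa's mask token is <mask> rather than [MASK].

from transformers import pipeline

# Predict the most likely words for the masked position.
fill_mask = pipeline("fill-mask", model="roberta-base")
for prediction in fill_mask("The goal of pre-training is to learn good <mask> representations."):
    print(prediction["token_str"], round(prediction["score"], 3))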

Tanmay Garg

MTech, IIIT Delhi | NLP | ML | I am interested in knowing more about biased data/models
