No doubt, Artificial Intelligence has become a major part of our lives over the past few years. Chatbots and assistants like Alexa and Siri are dedicated to making our everyday lives simpler and have numerous use cases. But most of them are lacking in one major area, which remains a big challenge as of now. No prizes for guessing it! Of course, I am talking about language processing.

So, today I felt like having a little chat with the Google Allo assistant. This is how it went:

Looks like our little bot did not get my context.

So, folks, this is the importance of NLP in AI. Without NLP, artificial intelligence can only understand our questions when they are simple, and it is completely incapable of understanding the meaning of words in context. NLP allows us to speak in our own language and still have the computer understand it.

A pinch of history

The history of NLP is generally said to have started in the 1950s with the famous Turing test. Then came the first chatbot, ELIZA, followed by PARRY, Racter, Jabberwacky, A.L.I.C.E. etc. However, the real revolution took place with the arrival of machine learning algorithms, such as decision trees, followed by statistical models such as the hidden Markov models that part-of-speech tagging introduced to NLP. Recent research has increasingly focused on unsupervised and semi-supervised learning algorithms. Such algorithms are based on the idea of training a model on a particular dataset and expecting it to predict the probable result on a new dataset. We will talk about these algorithms in detail in my next article.

Statistical Natural Language Processing

Statistical approaches to processing natural language text have become dominant in recent years, and NLP research relies heavily on machine learning. These models make probabilistic decisions by attaching a weight to each input feature. Also, statistical inference algorithms can be used to build systems that cope with unfamiliar or erroneous input.
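
To make the idea of weighted features concrete, here is a minimal sketch in Python. The feature names, tags and weights are invented for illustration (a real model would learn the weights from data), and turning the scores into probabilities with a softmax is just one common choice.

```python
import math

# Hypothetical, hand-picked weights: (feature, tag) -> weight.
# A trained statistical model would learn these from annotated text.
WEIGHTS = {
    ("suffix=ing", "VERB"): 2.0,
    ("suffix=ing", "NOUN"): 0.5,
    ("prev=the", "NOUN"): 1.5,
    ("prev=the", "VERB"): -1.0,
}
TAGS = ["NOUN", "VERB"]


def tag_probabilities(features):
    """Turn summed feature weights into a probability for each tag (softmax)."""
    scores = {
        tag: sum(WEIGHTS.get((f, tag), 0.0) for f in features) for tag in TAGS
    }
    total = sum(math.exp(s) for s in scores.values())
    return {tag: math.exp(s) / total for tag, s in scores.items()}


# The word "building" right after "the": both features fire.
probs = tag_probabilities(["suffix=ing", "prev=the"])
print(probs)
```

With these made-up weights, the "prev=the" feature outweighs the "-ing" suffix, so the noun reading comes out as more probable. Features the model has never seen simply contribute a weight of zero, which is one reason such models degrade gracefully on unfamiliar input.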

Now, folks, if you remember, we had started with chatbots, right? Well, they are one of the most diverse applications of Natural Language Processing.

Human language is not precise and is certainly complicated. That’s why we need an NLP engine to decode human language into a form that the machine can understand. The NLP engine uses common libraries to perform tasks like tokenization and named entity recognition. Tokenization breaks sentences down into individual words, and named entity recognition looks for words that belong to predefined categories.
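
The two tasks can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: the regular expression is a naive tokenizer, and the ENTITY_CATEGORIES word lists are hypothetical. Real NLP libraries use far more sophisticated models for both steps.

```python
import re

# Hypothetical gazetteer: predefined words grouped by category.
ENTITY_CATEGORIES = {
    "CITY": {"london", "paris", "delhi"},
    "FOOD": {"pizza", "noodles"},
}


def tokenize(sentence):
    """Split a sentence into word tokens and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def find_entities(tokens):
    """Return (token, category) pairs for tokens found in the gazetteer."""
    found = []
    for token in tokens:
        for category, words in ENTITY_CATEGORIES.items():
            if token.lower() in words:
                found.append((token, category))
    return found


tokens = tokenize("Order a pizza in Paris, please!")
print(tokens)
print(find_entities(tokens))
```

A lookup table like this only finds entities it already knows about, which is why practical systems combine such lists with statistical models.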

For example, look at the conversation below:

The bot can greet me when I say hi. It responds with “greetings”, “hi there” or similar predefined words. But then I responded with thanks and asked another question in the same line. Our bot responded only to the first part, i.e. the thanks, and not to the second part. I had to ask the question again for it to answer.

What if someone is looking to build a bot that is really intelligent on its own and can break down complex human statements and sentences? To do so, your bot should understand context and intent. To establish context and intent, you’ll need some additional NLP tasks that allow the NLP engine to understand the relationships between words. One way to achieve this is through part-of-speech tagging, or POS tagging.

What is POS tagging?

In English Grammar, we have different parts of speech like Noun, Pronoun, Verb, Adverb, Adjective etc. POS tagging refers to the process of assigning one of the parts of speech to the given word.

Ex - River (noun)
Beautiful (adjective)

This might look pretty simple, but the real challenge is tagging each word with a single part of speech, as some words can represent more than one part of speech depending on the context in which they are used. For example, we all know that “house” is a noun, right? But its POS changes according to the context. Look at the following examples.

House (noun) - a building for human habitation
House (adjective) - relating to a firm, institution, or society. Ex - "a house journal"
House (verb) - provide with shelter or accommodation. Ex - "they converted a disused cinema to house twelve employees"

A part-of-speech tagger, or POS tagger, is a program that does this job. Taggers use several kinds of information: dictionaries, rules etc. Dictionaries list every category a single word can belong to. For example, "run" is both a noun and a verb. Taggers use probabilistic information to resolve this ambiguity.
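
Here is a rough sketch of how a dictionary plus probabilistic information might work together. The LEXICON counts are invented for illustration, and picking the most frequent tag while ignoring context is only the simplest possible baseline, not how a full tagger decides.

```python
# Hypothetical counts of how often each word carried each tag in some
# annotated corpus; the numbers are invented for illustration.
LEXICON = {
    "run": {"VERB": 80, "NOUN": 20},
    "house": {"NOUN": 90, "VERB": 7, "ADJ": 3},
    "river": {"NOUN": 100},
}


def possible_tags(word):
    """List every tag the dictionary allows for a word."""
    return sorted(LEXICON.get(word.lower(), {}))


def most_likely_tag(word):
    """Pick the tag with the highest relative frequency, ignoring context."""
    counts = LEXICON.get(word.lower())
    if not counts:
        return None, 0.0
    tag = max(counts, key=counts.get)
    return tag, counts[tag] / sum(counts.values())


print(possible_tags("run"))
print(most_likely_tag("run"))
```

This baseline would always call "house" a noun, so it gets "a house journal" wrong. Fixing that requires looking at the neighbouring words.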

POS-tagging algorithms fall into two distinct groups: rule-based and stochastic. Rule-based taggers use hand-written rules to resolve tag ambiguity. Stochastic taggers use either hidden Markov models or decision trees to tag each word.

Approaches to Tagging

HMM (hidden Markov model) tagging = The bold approach: 'Use all the information you have and guess'
CG (Constraint Grammar) tagging = The cautious approach: 'Don't guess, just eliminate the impossible!'
TB (transformation-based) tagging = The whimsical approach: 'Guess first, then change your mind if necessary'
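
The "guess first, then change your mind" idea can be sketched as follows. This is a toy illustration only: the DEFAULT_TAG guesses and the single correction rule are hypothetical and hand-written, whereas a real transformation-based tagger learns its rules from an annotated corpus.

```python
# Hypothetical initial guesses: each word's most common tag.
DEFAULT_TAG = {
    "they": "PRON", "want": "VERB", "to": "PART",
    "house": "NOUN", "the": "DET", "refugees": "NOUN",
}

# Hypothetical transformation rules: (from_tag, to_tag, required previous tag).
# This one says: a "noun" right after the particle "to" is really a verb.
RULES = [("NOUN", "VERB", "PART")]


def tb_tag(words):
    """Guess a default tag per word, then apply the correction rules in order."""
    tags = [DEFAULT_TAG.get(w.lower(), "NOUN") for w in words]
    for from_tag, to_tag, prev_tag in RULES:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return list(zip(words, tags))


print(tb_tag("They want to house the refugees".split()))
```

Here "house" is first guessed to be a noun and then corrected to a verb because it follows "to", while "refugees" keeps its initial noun tag.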

HMM Tagging Example:
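
As a toy illustration of HMM tagging, the sketch below finds the most probable tag sequence with the Viterbi algorithm. Every probability in START, TRANS and EMIT is invented for this example, and the two-tag, two-word vocabulary is deliberately tiny. A real tagger would estimate these tables from a tagged corpus.

```python
# Toy hidden Markov model; every probability below is invented for illustration.
TAGS = ["NOUN", "VERB"]
START = {"NOUN": 0.6, "VERB": 0.4}             # P(first tag)
TRANS = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},   # P(next tag | current tag)
         "VERB": {"NOUN": 0.8, "VERB": 0.2}}
EMIT = {"NOUN": {"fish": 0.5, "people": 0.5},  # P(word | tag)
        "VERB": {"fish": 0.8, "people": 0.2}}


def viterbi(words):
    """Return the most probable tag sequence for the words under the toy HMM."""
    # Each column maps tag -> (probability of the best path ending in that
    # tag, tag that precedes it on that path).
    best = [{t: (START[t] * EMIT[t].get(words[0], 0.0), None) for t in TAGS}]
    for word in words[1:]:
        column = {}
        for tag in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p][0] * TRANS[p][tag])
            prob = best[-1][prev][0] * TRANS[prev][tag] * EMIT[tag].get(word, 0.0)
            column[tag] = (prob, prev)
        best.append(column)
    # Trace back from the most probable final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))


print(viterbi(["people", "fish"]))
```

Taken alone, "fish" is slightly more likely to be a verb under these numbers, but the tag chosen for each word also depends on its neighbours, which is the "use all the information you have and guess" idea in action.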

What is a Tagset?

A tagset is the collection of tags from which the tagger chooses when labelling each word for a particular task. Typical tags for representing parts of speech include N (Noun), V (Verb), ADJ (Adjective), ADV (Adverb), PREP (Preposition), CONJ (Conjunction) etc.

POS Tagging Components


1.Tokenization:

A part-of-speech tagger, or POS tagger, processes a sequence of words and attaches a part-of-speech tag to each word. First, the given text is divided into tokens so that they can be used for further analysis. This process is called tokenization.


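import nltk
from nltk.tokenize import word_tokenize
text = word_tokenize("Let’s do something totally unique")  # split the sentence into tokens
print(nltk.pos_tag(text))  # attach a part-of-speech tag to each token
# Output as given in this article; exact tokens and tags may vary with the NLTK version:
# [('Let’s', 'RB'), ('do', 'VV'), ('something', 'NN'), ('totally', 'RB'), ('unique', 'JJ')]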


2.Lexical Analyzer:

The lexical analyzer, also called a scanner, is used to handle unknown words. It extracts token features: simple functions of the token that describe some specific property of the sequence of characters making up the token.
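
Token features of this kind might look like the sketch below. The particular features and the suffix rules in guess_tag are hypothetical choices made for illustration. They are not taken from any specific tagger.

```python
def token_features(token):
    """Describe a token using simple properties of its characters."""
    return {
        "lowercase": token.lower(),
        "is_capitalized": token[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in token),
        "has_hyphen": "-" in token,
        "suffix3": token[-3:].lower(),
    }


def guess_tag(token):
    """Hypothetical suffix rules for guessing the tag of an unknown word."""
    features = token_features(token)
    if features["has_digit"]:
        return "NUM"
    if features["suffix3"] == "ing":
        return "VERB"
    if features["lowercase"].endswith("ly"):
        return "ADV"
    return "NOUN"  # fall back to a common open-class tag


print(token_features("Googling"))
print(guess_tag("Googling"))
```

Even when a word is missing from the dictionary, clues like its suffix or capitalization give the tagger something to work with.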


3.Disambiguation:

Disambiguation means identifying which sense of a word (i.e. meaning) is used in a sentence when the word has multiple meanings. For example, taken on its own, “dance” is most likely to be used as a verb. Disambiguation also depends on the context of the sentence and the order in which the tags/words are used. For example, in the sentence “He lay on his back”, the word “back” is a noun, but in the sentence “He has his rich father backing him”, “backing” is a verb. It is one of the most complex parts of tagging.

The main problem of POS tagging is resolving ambiguities and choosing the proper tag for the context.

An alternative to POS tagging is dependency parsing, which identifies phrases, subjects, and objects. For example, the sentence “Please deliver my veg noodles with no onion” might not communicate the instructions effectively to a basic bot, but a dependency parser would hopefully recognize that “no onion” is meant to modify “veg noodles.” Understanding the sentence context allows bots to act appropriately, for example by asking the user additional questions until they understand the request. From there, you can add more complex NLP tasks like sentiment analysis, which can analyze the mood of the user and take care of escalations wherever applicable.
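
To show what a bot could do with such a parse, here is a small sketch. Note that the PARSE structure is written by hand to match the reading described above and was not produced by a real parser. The relation labels are illustrative assumptions too.

```python
# Hand-written dependency parse: (word, index of head word, relation).
# A head index of -1 marks the root. A real parser would produce a
# structure like this; here it is typed in manually for illustration.
PARSE = [
    ("Please", 1, "intj"),
    ("deliver", -1, "root"),
    ("my", 4, "poss"),
    ("veg", 4, "compound"),
    ("noodles", 1, "obj"),
    ("with", 4, "prep"),
    ("no", 7, "det"),
    ("onion", 5, "pobj"),
]


def subtree(parse, head_index):
    """Return the indices of head_index and every word that depends on it."""
    indices = [head_index]
    for i, (_, head, _) in enumerate(parse):
        if head == head_index:
            indices.extend(subtree(parse, i))
    return sorted(indices)


def phrase(parse, head_index):
    """Join the words of a subtree in sentence order."""
    return " ".join(parse[i][0] for i in subtree(parse, head_index))


# Find the object of the verb and print everything attached to it.
obj_index = next(i for i, (_, _, rel) in enumerate(PARSE) if rel == "obj")
print(phrase(PARSE, obj_index))
```

Because “with no onion” hangs off “noodles” in this parse, the bot can treat the whole phrase as a single order item instead of a loose list of words.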

In case you are venturing into building your first NLP machine, there are a lot of options. Python is often celebrated for its robust libraries, such as NLTK for basic NLP tasks, as well as libraries for more advanced applications like deep learning.