NLP: Explaining BERT to Business People

Jun 23rd, 2020 – Natural Language Processing: Explaining BERT to Business People

BERT is a significant step forward for NLP. Business activities such as topic detection and sentiment analysis become much easier to build and run, and the results much more accurate. But how did we get to BERT, and how exactly does the model work? Why is it so powerful? Last but not least, what benefits can it bring to the business, and why did we decide to integrate it into the sandsiv+ Customer Experience platform?

This time it is difficult: I have set myself an ambitious goal, to explain transformers to people who have no background in either programming or artificial intelligence. The challenge is great, but I will do my best.

This morning I got it into my head to learn a new language, Portuguese. I like the sound of the language and I said to myself, let’s learn it! The first thing that came to mind was to take some Portuguese words alongside their translations in my mother tongue, Italian, in order to build a first elementary vocabulary.

It was amusing because some of the words sounded a lot like Italian. Working from Italian, I tried to sort the new words into synonyms and antonyms to make better sense of them. In other words, I tried to understand the semantics: the meaning of the words and the relationships between them.

In practice, I used a language I know well – my mother tongue – associated the new Portuguese terms with it, and slowly learned the new language. Thanks to deep learning and the considerable increase in computing power, a similar process has taken place in the computational field. The computer knows only one language, mathematics, so you have to go through mathematics if you want to “teach” the machine to interpret a human language.

It is important to remember that any problem solved with Deep Learning is a mathematical problem. The computer, for example, “sees” thanks to a Convolutional Neural Network (CNN). The CNN receives images in the form of mathematical matrices, whether they are black and white or color, and then applies the rules of linear algebra. The same happens in tasks such as topic detection, sentiment analysis, etc. The problem is mathematical, not linguistic. If someone offers you an NLP solution that is language sensitive, know that it is probably four generations old or, even worse, a keyword search solution.

In the computer world, the task of taking an unknown word – a Portuguese word in my case – and relating it to a known language (Italian) has been tackled by systems that embed words in vectors. Algorithms like fastText, Word2Vec, and GloVe have done exactly this: they transform words of any language into mathematical vectors that computers can “understand” by applying linear algebra. Once again, a mathematical problem, not a linguistic one.
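
To make the idea of "words as vectors" concrete, here is a minimal sketch in Python. The three-dimensional vectors are invented purely for illustration (real embedding models learn vectors with hundreds of dimensions from large amounts of text), and the names toy_vectors and cosine_similarity are assumptions made for this example. The only point is that, once words are vectors, "how close are these two words?" becomes simple arithmetic.

```python
import math

# Toy word vectors, invented for illustration only.
# Real embeddings are learned from large text collections.
toy_vectors = {
    "gato": [0.9, 0.1, 0.0],      # Portuguese for "cat"
    "gatto": [0.85, 0.15, 0.05],  # Italian for "cat"
    "banco": [0.1, 0.9, 0.2],     # Portuguese for "bank"
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


similar = cosine_similarity(toy_vectors["gato"], toy_vectors["gatto"])
different = cosine_similarity(toy_vectors["gato"], toy_vectors["banco"])

print(f"gato vs gatto: {similar:.2f}")
print(f"gato vs banco: {different:.2f}")
```

With these made-up numbers, the two words for "cat" score much closer to each other than "cat" and "bank" do, which is the behavior a trained embedding model aims for.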

The next step in my effort to learn Portuguese was to translate short sentences. I listened carefully to each new Portuguese word in order to translate it into Italian. It was a linear operation: each new word triggered a new effort to translate. In the computer world, the same operation was done with models called Encoder and Decoder. The system “listens” sequentially to the “words” – mathematical vectors – and translates them into new output, whether language-to-language translations or computational models for “understanding” the language.

Of course, this allowed me to translate short sentences from Portuguese into Italian, but when the sentence became longer, or even became a whole document, the word-for-word system no longer worked very well. I had to concentrate harder, try to better understand the context in which each new Portuguese word appeared, and of course relate it to my knowledge of Italian. This was one step beyond the word-by-word approach I had used before. In the computing world too, the next step was to add to the Encoder-Decoder models what is called the “attention mechanism”, which allows the computer to pay more attention to each word in its context. In practice, it tries to apply the semantic rules that are missing in a linear process like Encoder and Decoder.
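
For readers who want to see what "paying attention" means numerically, here is a small sketch. It uses the scaled dot-product form of attention, which is one common variant and not necessarily the one used in any particular Encoder-Decoder system. The two-dimensional word vectors are invented for the example, and the function names are assumptions. Each word receives a weight between 0 and 1, and the weights add up to 1: the higher the weight, the more "attention" the model pays to that word.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Score every key vector against the query, then normalize with softmax."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


words = ["the", "ship", "approached", "the", "bank"]
# Toy 2-dimensional vectors, invented for illustration only.
vectors = [[0.1, 0.0], [1.0, 0.2], [0.3, 0.4], [0.1, 0.0], [0.9, 0.3]]

# How much attention does "bank" (the last word) pay to each word?
weights = attention_weights(vectors[4], vectors)
for word, weight in zip(words, weights):
    print(f"{word:>10}: {weight:.2f}")
```

In this toy setup, "bank" gives its largest weight to "ship", which is the kind of contextual hint that helps resolve an ambiguous word.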

The approach, despite the increased level of attention, is still sequential, and it clearly shows its limits. In my case, I try to read every new Portuguese word in Italian while paying a lot of attention, but I can hardly interpret certain ambiguities of the language correctly. The same happens on the computer, where the model hardly grasps complex semantic rules.

I have to say that my level of translation from Portuguese to Italian has improved considerably: I can translate, albeit with some errors, much longer sentences than I could with the previous methods. At this point, however, I need more. I want to be faster and more precise. I want to understand the context much better. I want to reduce ambiguities. I need some kind of parallel process as well as knowledge of context, and finally, I need to understand long-term dependencies. My computer has exactly the same needs as I do, and that’s where Transformers come in.

Let’s take a small example and look at these two sentences:

– I went to the bank to open an account.

– The ship had approached the bank.

The very same word, “bank”, has a different meaning in each context. You need to look at the sentence as a whole to understand the syntax and semantics. ELMo (Embeddings from Language Models) looks at the entire sentence to understand the syntax, semantics, and context, which increases the accuracy of NLP tasks.
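
The sketch below illustrates only the principle "same word, different vector depending on the sentence". It is emphatically not how ELMo works internally: here a word's contextual vector is simply a 50/50 blend of its own vector and the average of the other known words in the sentence. The vectors, the blending rule, and the function name contextual_vector are all assumptions made for this toy example.

```python
# Toy static vectors, invented for illustration.
# The two dimensions loosely stand for [finance, water].
static = {
    "bank": [0.5, 0.5],
    "account": [1.0, 0.0],
    "open": [0.6, 0.1],
    "ship": [0.0, 1.0],
    "approached": [0.1, 0.6],
}


def contextual_vector(word, sentence):
    """Blend a word's own vector with the average of its known neighbors."""
    own = static[word]
    others = [static[w] for w in sentence if w in static and w != word]
    average = [sum(vec[d] for vec in others) / len(others) for d in range(len(own))]
    return [0.5 * o + 0.5 * a for o, a in zip(own, average)]


money = contextual_vector("bank", "i went to the bank to open an account".split())
river = contextual_vector("bank", "the ship had approached the bank".split())

print("bank (account sentence):", money)
print("bank (ship sentence):   ", river)
```

The same word ends up leaning toward "finance" in the first sentence and toward "water" in the second, because its neighbors pull it in different directions.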

My next step to learn Portuguese better was to read lots of books, listen to Portuguese television, watch Portuguese-language movies, and so on. I tried to significantly increase my vocabulary and to understand the language and its dependencies. My computer did the same thing. It has, for example, “read and memorized” all of Wikipedia in Portuguese; reusing that knowledge for a new task is technically what is called Transfer Learning. In this way, my computer no longer starts from scratch when it has to perform a linguistic operation in Portuguese, because it has already built up a fairly vast knowledge of that language. One model that “learns” from a large body of text to gain a strong initial understanding of the language is called the Generative Pre-trained Transformer (GPT). The model uses only the decoder part of the Transformer. It uses what it has learned from reading, for example, Wikipedia (Transfer Learning) and “reads” words from left to right (uni-directional).

When you learn different aspects of a language, you realize that exposure to a variety of texts is very useful for applying Transfer Learning. You start by reading books to build a strong vocabulary and understanding of the language. When some words in a sentence are masked or hidden, you rely on your knowledge of the language and read the entire sentence both from left to right and from right to left (bidirectionally). Now you can predict the masked words more accurately (Masked Language Modeling). It is like filling in the blanks. You can also predict whether two sentences are related or not (Next Sentence Prediction). This, in simple terms, is BERT. BERT stands for Bidirectional Encoder Representations from Transformers and is a technique for NLP pre-training developed by Google. Its job is fairly self-explanatory from the acronym: bidirectional encoder representations from transformers.
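
The difference between reading in one direction and reading in both can be shown with a few lines of Python. This is only a sketch of which words are available as clues when guessing a hidden word; the function names are assumptions, and real models work on vectors rather than on lists of words.

```python
def left_to_right_context(tokens, position):
    """Words a uni-directional reader has seen before reaching `position`."""
    return tokens[:position]


def bidirectional_context(tokens, position):
    """Words a bidirectional reader can use: everything except the hidden word."""
    return tokens[:position] + tokens[position + 1:]


tokens = "I went to the bank to open an account".split()
hidden = tokens.index("bank")

one_way = left_to_right_context(tokens, hidden)
two_way = bidirectional_context(tokens, hidden)

print("Left to right:", " ".join(one_way), "[MASK]")
print("Bidirectional:", " ".join(tokens[:hidden]), "[MASK]", " ".join(tokens[hidden + 1:]))
```

Reading only from left to right, the clues stop at "I went to the". Reading in both directions, the words "open an account" are also available, and those are precisely the ones that tell you which kind of bank is meant.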

Are you confused? Don’t worry, so am I. I will only try to explain the real advantages of BERT from a practical point of view.

Compared with everything we have seen so far, BERT is a big evolution. It combines the features of the previous models, from word embedding to transformers, with all the advantages they bring. But it also brings other very interesting practical innovations:

BERT is bidirectional: it doesn’t just “read” from left to right, it also reads from right to left. This allows it to better “understand” words in context, not only ambiguous words but also related words. An example: “Mike has gone to the stage. He had a great time!” BERT understands that “he” refers to Mike, which is no small thing when solving language problems.
BERT, while training, not only “reads” but also hides 15% of the words and tries to “guess” them. In this way, it builds knowledge that goes beyond “reading”: it learns to predict a word from its surrounding context, and even to predict whether one sentence follows from the previous one. This is no small thing in an automatic question-and-answer system or a chatbot.
BERT offers several generic models that can be “loaded” and then fine-tuned to the specific case (e.g. topic detection or sentiment analysis) without needing a huge mass of data for the fine-tuning. This is no small thing for those who have already tried to train NLP models by labeling data.
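
The "hide and guess" exercise from the second point can be sketched as follows. This is a simplification under stated assumptions: the function name mask_tokens is hypothetical, the text is split on spaces rather than into the sub-word pieces BERT actually uses, and every selected word is replaced by [MASK], whereas the real training recipe also leaves some selected words unchanged or swaps them for random ones. The sketch only prepares the exercise; the guessing is what the model learns during training.

```python
import random


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Hide roughly `mask_rate` of the tokens and remember the hidden answers."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    how_many = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), how_many))

    masked = list(tokens)
    answers = {}
    for position in positions:
        answers[position] = masked[position]  # what the model must guess
        masked[position] = "[MASK]"
    return masked, answers


sentence = "Mike has gone to the stage He had a great time".split()
masked, answers = mask_tokens(sentence)

print(" ".join(masked))
print("Words to guess:", answers)
```

During pre-training, the model sees only the masked version and is scored on how well it recovers the hidden words, which forces it to use the context on both sides of each blank.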

I’ve been working in Natural Language Processing for several years – the real thing, not the keyword search that my competitors pass off as Text Mining – and I was impressed with BERT. Let me give you a small practical example. I built a sentiment model with a final F1 score of 89% from a dataset composed as follows:

1,270 happy
154 indifferent
26 angry
11 bored
3 frustrated

All this is only possible because, with Transfer Learning and the generic models available for BERT, the model can be fine-tuned even for very small classes (in our case, “frustrated”). It is practically as if you could load into your brain a model that summarizes the linguistic knowledge obtained by reading all of Wikipedia in Portuguese, and then just do a little fine-tuning for the specific case you want to solve. A leap forward in NLP!
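
To show just how unbalanced the dataset listed above is, the short sketch below computes each class's share of the total and recalls how an F1 score is defined. The class counts are taken from the list above; the precision and recall behind the 89% figure are not given in this article, nor is the way the score was averaged across classes, so the f1_score function only illustrates the definition for a single class and does not reproduce that result.

```python
# Class counts exactly as listed in the article.
class_counts = {
    "happy": 1270,
    "indifferent": 154,
    "angry": 26,
    "bored": 11,
    "frustrated": 3,
}

total = sum(class_counts.values())
shares = {label: count / total for label, count in class_counts.items()}


def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f"Total examples: {total}")
for label, share in shares.items():
    print(f"{label:<12} {class_counts[label]:>5}  {share:.1%}")
```

The smallest class makes up only a tiny fraction of the examples, which is why starting from a pre-trained model rather than from scratch matters so much here.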

I almost forgot: BERT provides a number of pre-trained models for Transfer Learning, and it covers a large number of languages. For example, the BERT-Base Multilingual Cased model “has read texts” in 104 different languages and can be refined with your own small dataset in each of them.

BERT will soon be available in our sandsiv+ solution, and our customers will be able to take advantage of all these benefits that this great innovation brings in topic detection and sentiment analysis.

Author: Federico Cesconi
