Automated QnA using deep learning

Sunil D Shashidhara

Mar 31, 2020

Automated Question Answering and machine comprehension have gathered a lot of momentum recently with advances in Deep Learning, which became an essential tool for NLP (Natural Language Processing) and NLU (Natural Language Understanding). Smart personal assistants like Apple’s Siri and Google Assistant are becoming an indispensable part of user experience, with natural language interfaces enabling users to get answers to their questions and delegate various tasks to AI-powered software.

How does Question Answering work?

QnA models predict the best answer to a query (Q), given a passage (P) or a set of passages that contain the answer to that query. The model does this by reading the passage and the query together and evaluating the contextual relationships between them. To support the development of state-of-the-art Machine Learning (ML) models for QnA, a number of large datasets have been created, including Stanford’s SQuAD for automated question answering. As SQuAD gained popularity, many deep learning architectures were explored, and many of these models reached human-level accuracy on the task.

The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowd-workers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage. With 100,000+ question-answer pairs on 500+ articles, SQuAD is significantly larger than previous reading comprehension datasets.
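To make the "answer is a span" idea concrete, here is a small sketch of how a record in the SQuAD v1.1 JSON layout can be read. The sample record is hand-written for illustration (it is not taken from the dataset), and the helper names iter_examples and is_valid_span are hypothetical.

```python
from typing import Dict, Iterator, Tuple

# Hand-written sample in the SQuAD v1.1 JSON layout; illustrative text only.
sample = {
    "data": [{
        "title": "Energy plan",
        "paragraphs": [{
            "context": "The minimum entry age is 18 years and the maximum entry age is 65 years.",
            "qas": [{
                "id": "example-1",
                "question": "What is the maximum entry age?",
                # answer_start is a character offset into the context
                "answers": [{"text": "65 years", "answer_start": 63}],
            }],
        }],
    }]
}


def iter_examples(dataset: Dict) -> Iterator[Tuple[str, str, str, int]]:
    """Flatten the nested layout into (context, question, answer, start) tuples."""
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    yield context, qa["question"], answer["text"], answer["answer_start"]


def is_valid_span(context: str, text: str, start: int) -> bool:
    """Check that the answer really is a slice of the passage."""
    return context[start:start + len(text)] == text


for context, question, text, start in iter_examples(sample):
    print(question, "->", text, is_valid_span(context, text, start))
```

Because every answer is a slice of the passage, a model trained on this data learns to point at text rather than to generate it.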

SQuAD dataset public leaderboard

DeepPavlov

Let’s say you’re an insurance company with a range of insurance products (life insurance, health insurance, motor insurance, etc.) and you want to improve customer service by using a QnA bot to answer frequently asked queries about any given product. Building a bot in a traditional setting requires collecting data (based on your requirements) and annotating it, which is an extremely time-consuming process for a simple QnA bot. In such cases, it is useful to build the bot from readily available pre-trained models. This is where DeepPavlov comes in.

DeepPavlov is an open-source conversational AI library built to make it easier for beginners and experts alike to create complex dialogue systems. The library provides many of the pre-trained components required to build end-to-end conversational systems, such as NER (identifying keywords), an intent classifier (understanding the intent of a sentence), and semantic sentence similarity (finding sentences with similar meaning). More information regarding the various pre-trained models can be found here. If you’re unaware of NER and intent classifiers (the building blocks of a chatbot), it would be advisable to go through this.

The part we are interested in is the BERT-based Question Answering pre-trained model, which is trained on the SQuAD dataset. You provide the pre-trained model with a context string and a question, and the model marks the start and end position of the answer in the context string. Below is an example of DeepPavlov in action.
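A minimal sketch of such a call is shown here. It assumes the deeppavlov package is installed and that the config name configs.squad.squad_bert matches the installed release (config names have changed between versions, so check the documentation). It also assumes the model returns batched lists of answer texts, start offsets and scores. The Answer type and the helper names are hypothetical.

```python
from typing import Callable, NamedTuple


class Answer(NamedTuple):
    text: str
    start: int    # character offset of the answer in the context
    end: int      # exclusive end offset, derived from start + len(text)
    score: float


def load_squad_model():
    """Load a pre-trained SQuAD model (downloads weights on first use).

    Assumption: this config name exists in the installed DeepPavlov version.
    """
    from deeppavlov import build_model, configs  # imported lazily: heavy dependency
    return build_model(configs.squad.squad_bert, download=True)


def ask(model: Callable, context: str, question: str) -> Answer:
    """Ask one question about one context string."""
    # Assumed output layout: [answer texts], [start offsets], [scores]
    texts, starts, scores = model([context], [question])
    text, start = texts[0], int(starts[0])
    return Answer(text, start, start + len(text), float(scores[0]))
```

Typical use would be model = load_squad_model() followed by ask(model, context, question). The returned start and end offsets can be used to slice the answer back out of the context.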

Applications

Let us examine the capabilities of the model. As an example, the product information for a health insurance product is used as the context string. The bot is asked a series of questions, and the model’s answers are compared subjectively against the expected answers.

Energy plan is a health insurance which covers your pre-existing conditions and complications and is targeted towards individuals with diabetes. The main features of this plan are that it has no waiting period for diabetes and hypertension related hospitalization.

Other benefits include:
1. The no claim bonus is an increase in your basic sum insured by 10% for every claim free year up to a maximum of 100%.

2. The restore benefit is that you get 100% instant addition of your basic sum insured, on your first claim.

3. The premium amount paid under this policy qualifies for a tax benefit and deduction under Section 80D of the Income Tax Act.

4. There is a wellness incentive where consultation, diagnostic, medicine or other health expenses will be reimbursed.

Some more information regarding the product:
1. Energy plan can be bought by individuals with Diabetes Type 1, Diabetes mellitus Type 2, pre-diabetes (IFG, IGT) and/ or hypertension.

2. Energy plan is only offered to persons diagnosed with Type 1 diabetes, Type 2 diabetes, IFG/ IGT and/or hypertension.

3. The sum insured options available are 2, 3, 5, 10, 15, 20, 25 & 50 lacs on individual sum insured basis.

4. The minimum entry age is 18 years and the maximum entry age is 65 years.

5. The waiting periods on this product are 2 years for specified illnesses, PEDs.

6. All insurance proposals would be subject to a Pre-policy Check. PPC would be done on a cashless basis.

7. This product offers 4 plans (Gold without Co-pay, Gold with 20% Co-pay, Silver without Co-pay, Silver with 20% Co-pay)

More information can be found here:
www.apollomunichinsurance.com/buyenergy/plan-details.html

- Health insurance product information -

Questions where model performed well

Question: What are the main features of this plan?

Expected answer - The main features of this plan are that it has no waiting period for diabetes and hypertension related hospitalization.

Model answer - it has no waiting period for diabetes and hypertension related hospitalization

Question: What is the no claim bonus for this product?

Expected answer - an increase in your basic sum insured by 10% for every claim free year up to a maximum of 100%.

Model answer - an increase in your basic sum insured by 10%

Question: What is the restore benefit on this product?

Expected answer - you get 100% instant addition of your basic sum insured, on your first claim.

Model answer - you get 100% instant addition of your basic sum insured

Question: Is there a tax benefit for this product?

Expected answer - The premium amount paid under this policy qualifies for a tax benefit and deduction under Section 80D of the Income Tax Act.

Model answer - The premium amount paid under this policy qualifies for a tax benefit

Question: What is the wellness program?

Expected answer - consultation, diagnostic, medicine or other health expenses will be reimbursed.

Model answer - consultation, diagnostic, medicine or other health expenses will be reimbursed

Question: What are sum insured options available?

Expected answer - 2, 3, 5, 10, 15, 20, 25 & 50 lacs on individual sum insured basis.

Model answer - 2, 3, 5, 10, 15, 20, 25 & 50 lacs

Questions where model performed poorly

Question: What is energy plan?

Expected answer - Energy plan is a health insurance plan which not just covers your condition and complications but it also partners you in living with diabetes successfully. A health plan that truly understands diabetes.

Model answer - a health insurance

Question: Does this product offer more plans?

Expected answer - This product offers 4 plans (Gold without Co-pay, Gold with 20% Co-pay, Silver without Co-pay, Silver with 20% Co-pay)

Model answer - More information can be found here www.apollomunichinsurance.com/buyenergy/plan-details.html

Question: What are the minimum and the maximum entry age?

Expected answer - The minimum entry age is 18 years and the maximum entry age is 65 years.

Model answer - 65 years
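The comparisons above are subjective. If a rough number is wanted, one option is the token-overlap F1 and exact-match scoring commonly used for SQuAD. The sketch below is an illustration of that idea and not the method used in this post; the function names are hypothetical.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, expected: str) -> bool:
    return normalize(prediction) == normalize(expected)


def f1_score(prediction: str, expected: str) -> float:
    """Harmonic mean of token precision and recall between the two answers."""
    pred, gold = normalize(prediction), normalize(expected)
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


expected = "The minimum entry age is 18 years and the maximum entry age is 65 years."
print(round(f1_score("65 years", expected), 3))
```

A terse answer such as "65 years" has perfect precision but low recall against the full expected sentence, which matches the subjective impression that the answer is correct but incomplete.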

Improvements

As the examples above show, the main drawback of the model is that its answers are very terse (this mostly has to do with the structure of the training data, where answers are short spans). One way to overcome this is to return the complete sentence that contains the model’s output phrase. Applying this to the three poorly answered questions gives the following results.

Question: What is energy plan?

Expected answer - Energy plan is a health insurance plan which not just covers your condition and complications but it also partners you in living with diabetes successfully. A health plan that truly understands diabetes.

Model answer - Energy plan is a health insurance which covers your pre-existing conditions and complications and is targeted towards individuals with diabetes.

Question: Does this product offer more plans?

Expected answer - This product offers 4 plans (Gold without Co-pay, Gold with 20% Co-pay, Silver without Co-pay, Silver with 20% Co-pay)

Model answer - This product offers 4 plans (Gold without Co-pay, Gold with 20% Co-pay, Silver without Co-pay, Silver with 20% Co-pay) More information can be found here (www.apollomunichinsurance.com/buyenergy/plan-details.html)

Question: What are the minimum and the maximum entry age?

Expected answer - The minimum entry age is 18 years and the maximum entry age is 65 years.

Model answer - The minimum entry age is 18 years and the maximum entry age is 65 years.
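The sentence-selection step described in this section can be sketched as follows. This is a minimal illustration that assumes the model reports the character offsets of its answer and that a sentence ends at ".", "!" or "?" followed by whitespace, or at a line break. The function name enclosing_sentence is hypothetical, and a real deployment might prefer a proper sentence tokenizer.

```python
import re

# Assumed sentence boundary: whitespace after ., ! or ?, or any line break.
# Periods inside URLs are not followed by whitespace, so they do not split.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+|\n+")


def enclosing_sentence(context: str, start: int, end: int) -> str:
    """Return the sentence(s) of `context` covering the span [start, end)."""
    left, right = 0, len(context)
    for match in _BOUNDARY.finditer(context):
        if match.end() <= start:
            left = match.end()       # last boundary before the answer
        elif match.start() >= end:
            right = match.start()    # first boundary after the answer
            break
        # boundaries inside the span are skipped, so a multi-sentence
        # answer is returned with all the sentences it touches
    return context[left:right].strip()


context = ("Energy plan is a health insurance. "
           "The minimum entry age is 18 years and the maximum entry age is 65 years. "
           "More at www.example.com/plan.html")
start = context.index("65 years")
print(enclosing_sentence(context, start, start + len("65 years")))
```

Where the context has no sentence-ending punctuation between two items, as with the plan list and the link line in the product information above, this approach returns both together.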

To conclude, DeepPavlov saves significant effort and provides a good starting point for creating a simple and effective QnA bot. It is not yet suitable for more complex cases where the bot is required to hold an active dialogue, but with the rapid advancements in deep learning research, we hope to see an open-source end-to-end conversational assistant capable of having complex conversations with the user in the near future.

Thanks to Arun Ghontale for putting together this blog post!