Stanford CS224n Natural Language Processing Course 10

Course 10 - Question Answering

Motivation/History

With massive collections of full-text documents, simply returning relevant documents is of limited use; we often want an actual answer to a question.

Two parts:

  1. find documents that might contain an answer
  2. find an answer in a paragraph or a document

MCTest Reading Comprehension: Passage + Question → Answer


The SQuAD dataset

Evaluation

  • Systems are scored on two metrics

    • exact match
    • F1: Precision = tp/(tp+fp), Recall = tp/(tp+fn), F1 = 2PR/(P+R) (taken as the primary metric)

    Both metrics ignore punctuation and articles (only a, an, the)
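
A minimal sketch of the token-overlap F1 used for SQuAD, assuming whitespace tokenization only (the official evaluation script additionally lowercases, strips punctuation and articles, and takes the max over all gold answers for each question):

```python
from collections import Counter

def f1_score(prediction: str, ground_truth: str) -> float:
    """Token-overlap F1, SQuAD-style (sketch: whitespace tokenization only)."""
    pred_tokens = prediction.split()
    gold_tokens = ground_truth.split()
    # Per-token overlap: intersection of the two token multisets
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)   # tp / (tp + fp)
    recall = num_same / len(gold_tokens)      # tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# e.g. f1_score("the Broncos", "Denver Broncos") -> P = 1/2, R = 1/2, F1 = 0.5
```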

Limitations

  • Only span-based answers
  • Questions were constructed looking at passages
  • Barely any multi-fact/multi-sentence inference beyond coreference

But still, it is a well-targeted, well-structured, clean dataset

The Stanford Attentive Reader model

BiDAF


Central idea: the Attention Flow layer

Idea: attention should flow both ways - from the context to the question and from the question to the context

Make the similarity matrix:
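
A sketch of the similarity matrix as in the BiDAF paper, assuming context hidden states $c_i \in \mathbb{R}^{2h}$ ($i = 1, \dots, N$), question hidden states $q_j \in \mathbb{R}^{2h}$ ($j = 1, \dots, M$), a learned weight vector $w_{\mathrm{sim}} \in \mathbb{R}^{6h}$, and $\circ$ the elementwise product:

$$S_{ij} = w_{\mathrm{sim}}^{\top}\,[\,c_i \,;\, q_j \,;\, c_i \circ q_j\,] \in \mathbb{R}$$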

Context-to-Question attention:
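
For each context position $i$, attend over the question words (a reconstruction in BiDAF's notation):

$$\alpha^{i} = \operatorname{softmax}(S_{i,:}) \in \mathbb{R}^{M}, \qquad a_i = \sum_{j=1}^{M} \alpha^{i}_{j}\, q_j \in \mathbb{R}^{2h}$$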


Question-to-Context attention:
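
Attend over the context positions using, for each one, its maximum similarity to any question word (again a reconstruction in BiDAF's notation):

$$m_i = \max_{j} S_{ij} \in \mathbb{R}, \qquad \beta = \operatorname{softmax}(m) \in \mathbb{R}^{N}, \qquad c' = \sum_{i=1}^{N} \beta_i\, c_i \in \mathbb{R}^{2h}$$

The layer output for each passage position then concatenates both directions:

$$b_i = [\,c_i \,;\, a_i \,;\, c_i \circ a_i \,;\, c_i \circ c'\,] \in \mathbb{R}^{8h}$$

A runnable NumPy sketch of the whole Attention Flow layer under the same assumptions (variable names are mine, not the paper's):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_flow(C, Q, w_sim):
    """BiDAF-style attention flow (sketch).
    C: (N, d) context hidden states, Q: (M, d) question hidden states,
    w_sim: (3*d,) learned similarity weights."""
    N, M = C.shape[0], Q.shape[0]
    # Similarity matrix: S[i, j] = w_sim . [c_i ; q_j ; c_i * q_j]
    S = np.array([[w_sim @ np.concatenate([C[i], Q[j], C[i] * Q[j]])
                   for j in range(M)] for i in range(N)])
    # Context-to-question: each context word attends over question words
    alpha = softmax(S, axis=1)            # (N, M)
    A = alpha @ Q                         # (N, d), rows are the a_i
    # Question-to-context: weight context words by max similarity to the question
    beta = softmax(S.max(axis=1))         # (N,)
    c_prime = beta @ C                    # (d,)
    # Per-position output b_i = [c_i ; a_i ; c_i * a_i ; c_i * c']
    return np.concatenate([C, A, C * A, C * c_prime[None, :]], axis=1)  # (N, 4*d)
```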

Recent, more advanced architectures

FusionNet

ELMo and BERT preview