ROUGE

ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a metric for evaluating natural language generation models, such as those used for automatic text summarization and machine translation.

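ROUGE is generally a metric based on n-gram recall. Loosely speaking, it can be thought of as a recall-oriented counterpart to BLEU, although the two differ in their details. Depending on the situation, results may be reported as recall, precision, or F1 score, so they are written as ROUGE-N recall, ROUGE-N precision, or ROUGE-N F1 score.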

The formula for ROUGE-N over n-grams is as follows.

$$\text{ROUGE-N} = \cfrac{\text{Number of overlapping n-grams}}{\text{Total n-grams in reference summary}}$$

The formula above gives the n-gram recall between the reference summary (the ground-truth summary) and the generated summary.
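For completeness, the precision and F1 variants mentioned earlier can be written in the same style. This is a sketch assuming the usual definitions of precision and F1, where $P$ and $R$ denote ROUGE-N precision and recall:

$$\text{ROUGE-N}_{\text{precision}} = \cfrac{\text{Number of overlapping n-grams}}{\text{Total n-grams in system summary}}$$

$$\text{ROUGE-N}_{\text{F1}} = \cfrac{2 \cdot P \cdot R}{P + R}$$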

Example

The following example computes the recall score using ROUGE-2.

System summary:

the cat was found under the bed

Reference summary:

the cat was under the bed

System summary (bigrams):

the cat, cat was, was found, found under, under the, the bed

Reference summary (bigrams):

the cat, cat was, was under, under the, the bed

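Four of the five reference bigrams (the cat, cat was, under the, the bed) also appear in the system summary, so the ROUGE-2 recall score is as follows.

$$\text{ROUGE-2} = \cfrac{\text{Number of overlapping bigrams}}{\text{Total bigrams in reference summary}} = \cfrac{4}{5}$$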
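The calculation above can be reproduced with a few lines of Python. This is a minimal sketch, not a reference implementation: the names `ngrams` and `rouge_n` are made up for illustration, tokenization is a plain whitespace split, and repeated n-grams are counted with clipping (the minimum of the two counts), which is an assumption about how overlaps are tallied.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(system, reference, n=2):
    """Return (recall, precision, f1) for ROUGE-N on whitespace-tokenized text."""
    sys_counts = ngrams(system.split(), n)
    ref_counts = ngrams(reference.split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipped overlap).
    overlap = sum((sys_counts & ref_counts).values())
    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(sys_counts.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f1


recall, precision, f1 = rouge_n(
    "the cat was found under the bed",
    "the cat was under the bed",
    n=2,
)
```

For the example sentences, the recall is 4/5, matching the hand calculation. The precision is 4/6 because the system summary has six bigrams, and the F1 score works out to 8/11.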

Other types of ROUGE

Beyond the plain n-gram-based metric, ROUGE comes in several other variants, summarized below.

ROUGE-L

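This variant uses the longest common subsequence (LCS): it measures the length of the longest sequence of words shared by the two texts and scores on that basis. In other words, it computes the recall of the longest common subsequence. The matched words must appear in the same order, but they do not need to be contiguous.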

Reference: police killed the gunman

System-1: police kill the gunman

System-2: the gunman kill police

$$\text{ROUGE-L}_{\text{system-1}} = \cfrac{3}{4} \ (\text{"police the gunman"})$$

$$\text{ROUGE-L}_{\text{system-2}} = \cfrac{2}{4} \ (\text{"the gunman"})$$
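As an illustration, ROUGE-L recall can be sketched with the standard dynamic-programming LCS. The function names `lcs_length` and `rouge_l_recall` are hypothetical, and the sketch assumes whitespace tokenization and a single-sentence reference.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # prev[j] holds the LCS length of the tokens of a seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Matching tokens extend the LCS of both shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of dropping x or dropping y.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_recall(system, reference):
    """LCS length divided by the number of reference tokens."""
    ref_tokens = reference.split()
    return lcs_length(system.split(), ref_tokens) / max(len(ref_tokens), 1)


reference = "police killed the gunman"
score_1 = rouge_l_recall("police kill the gunman", reference)
score_2 = rouge_l_recall("the gunman kill police", reference)
```

Here `score_1` is 3/4 and `score_2` is 2/4, the same values as in the example above.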

ROUGE-S

Given a window size, this variant pairs up the words that fall within the window and measures how many of those word pairs overlap between the two texts. For this reason, the technique is also called skip-gram co-occurrence.

Reference: cat in the hat

For a sentence like the one above, the skip-bigrams are “cat in”, “cat the”, “cat hat”, “in the”, “in hat”, and “the hat”. Each pair keeps the word order of the sentence. Recall is then computed over these skip-bigrams.
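The skip-bigram extraction can be sketched as follows. The names `skip_bigrams`, `rouge_s_recall`, and the `max_gap` parameter are assumptions made for illustration. `max_gap` is one possible way to express the window size, as the maximum number of words skipped between the two words of a pair, and `None` means no limit, which is the setting used in the "cat in the hat" example.

```python
from collections import Counter
from itertools import combinations


def skip_bigrams(tokens, max_gap=None):
    """Count in-order word pairs; max_gap limits the words skipped between them."""
    pairs = Counter()
    for i, j in combinations(range(len(tokens)), 2):
        # combinations yields i < j, so sentence order is preserved.
        if max_gap is None or j - i - 1 <= max_gap:
            pairs[(tokens[i], tokens[j])] += 1
    return pairs


def rouge_s_recall(system, reference, max_gap=None):
    """Overlapping skip-bigrams divided by the reference skip-bigram count."""
    sys_pairs = skip_bigrams(system.split(), max_gap)
    ref_pairs = skip_bigrams(reference.split(), max_gap)
    # Counter intersection keeps the minimum count per pair (clipped overlap).
    overlap = sum((sys_pairs & ref_pairs).values())
    return overlap / max(sum(ref_pairs.values()), 1)


pairs = skip_bigrams("cat in the hat".split())
```

With no gap limit, `pairs` contains the six skip-bigrams listed above. Setting `max_gap=0` reduces them to the three ordinary bigrams.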
