G-EVAL: More Human-Like NLG Evaluation
Evaluating text quality in Natural Language Generation (NLG) has always been challenging. Aligning automatic assessments with human intuition becomes particularly difficult for creative or open-ended tasks. Traditional metrics like BLEU and ROUGE are useful for quantifying performance...
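For illustration, here is a minimal sketch of how a surface-overlap metric such as BLEU scores a candidate against a reference, using NLTK; the sentences are made-up examples and not drawn from any benchmark:

```python
# Minimal BLEU sketch using NLTK (pip install nltk).
# The reference and candidate sentences are hypothetical examples.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the cat sat quietly on the warm mat".split()
candidate = "a cat was sitting quietly on the mat".split()

# BLEU counts overlapping n-grams between the candidate and reference(s);
# smoothing avoids zero scores when higher-order n-grams never match.
score = sentence_bleu(
    [reference],                      # list of tokenized references
    candidate,                        # tokenized system output
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU: {score:.3f}")
```

Because such metrics score lexical overlap rather than meaning, a fluent, semantically faithful output phrased differently from the reference can still receive a low score, which is part of the motivation for LLM-based evaluators like G-Eval.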