
Positive, multi-dimensional scoring of two answers makes it easier to determine which one is better.

The top-ranked Chinese-language prompt on LangChain Hub. It was released over a year ago and is used to evaluate the composite scores of different RAG strategies. It has since been translated and adapted for use in multiple languages.

 


Usage

Use the prompt to judge which of two answers is better, on the assumption that both answers are correct. When the composite scores differ by more than 1, the gap indicates which answer is "likely" to be problematic. An answer with a high probability of being correct can safely overwrite the corresponding entry in the knowledge base.
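
The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original prompt: the function name `compare_composite`, its return fields, and the choice to treat the lower-scoring answer as the suspect one are all assumptions made for the example. Only the threshold of 1 comes from the text.

```python
def compare_composite(reference_score, student_score, threshold=1.0):
    """Compare two composite scores and flag a likely problem.

    The threshold of 1 comes from the text; everything else here
    (names, return shape) is a hypothetical convention.
    """
    gap = student_score - reference_score
    if gap > 0:
        better = "student"
    elif gap < 0:
        better = "reference"
    else:
        better = "tie"

    # Assumption: when the gap exceeds the threshold, the answer with
    # the lower composite score is the one "likely" to be problematic.
    suspect = None
    if abs(gap) > threshold:
        suspect = "reference" if better == "student" else "student"

    return {"better": better, "gap": gap, "suspect": suspect}


if __name__ == "__main__":
    print(compare_composite(90, 85))    # large gap: the student answer is flagged
    print(compare_composite(88, 88.5))  # small gap: nothing is flagged
```

A gap at or below the threshold leaves `suspect` as `None`, which matches the idea that both answers are then treated as acceptable and only their ranking matters.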

Areas of application:

  1. Comparing different "QA-pair extraction" prompts to determine which prompt works better.
  2. Assessing whether the student answer (produced by a new RAG strategy) is better than the reference answer, which serves as the baseline standard answer.
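
For either use case, one plausible way to compare candidates is to average the composite score of each candidate over a set of test questions. The sketch below assumes the composite scores have already been collected; the function name `rank_by_mean_composite` and the candidate names and numbers in the usage example are placeholders, not results from the original article.

```python
from statistics import mean


def rank_by_mean_composite(scores_by_candidate):
    """Return (candidate, mean composite score) pairs, best first.

    `scores_by_candidate` maps a candidate name (a prompt or a RAG
    strategy) to the list of composite scores it received.
    Candidates with no scores are skipped.
    """
    averaged = {
        name: mean(scores)
        for name, scores in scores_by_candidate.items()
        if scores
    }
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder scores for two hypothetical extraction prompts.
    ranking = rank_by_mean_composite({
        "prompt_a": [82, 90, 76],
        "prompt_b": [88, 91, 85],
    })
    for name, score in ranking:
        print(name, round(score, 2))
```

Averaging is only one possible aggregation; a median or a per-question win count would fit the same structure.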

The definition of "better" is prone to the following misconceptions: that the better answer is the one that is absolutely correct, the richest in detail, the most succinct, or the one showing a complete thought process.

 

Chinese prompt

你对学生提问,学生给出了答案,你要对参考答案和学生答案分别评分。
您必须根据相关度、完整度、语义清晰度和歧义度分别对两个答案进行评分。
最后给两个答案进行综合评分。
\n\n
提问:
"""
{question}
"""
\n\n
请对以下答案给出数字1~100之间评分:
\n\n
参考答案:
"""
{reference_answer}
"""
\n
学生答案:
"""
{student_answer}
"""
\n\n
为每个值赋予1~100之间评分,以JSON格式回复,不要其他解释:
```json
"参考答案": 
"相关度": 
"完整度": 
"语义清晰度": 
"歧义度": 
"综合评分":

"学生答案": 
"相关度": 
"完整度": 
"语义清晰度": 
"歧义度": 
"综合评分": 
```
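
A minimal Python sketch of how the prompt above might be used: fill the three placeholders (`{question}`, `{reference_answer}`, `{student_answer}`), send the text to a model, and validate the reply. The helper names `fill_prompt` and `parse_scores` are hypothetical, the model call itself is left out, and the code assumes the model replies with a JSON object containing one nested object per answer, which the prompt requests but cannot guarantee.

```python
import json

ANSWERS = ("参考答案", "学生答案")  # reference answer, student answer
DIMENSIONS = ("相关度", "完整度", "语义清晰度", "歧义度", "综合评分")


def fill_prompt(template, question, reference_answer, student_answer):
    """Substitute the three placeholders used by the prompt template."""
    return template.format(
        question=question,
        reference_answer=reference_answer,
        student_answer=student_answer,
    )


def parse_scores(reply):
    """Extract and validate the score object from a model reply.

    Assumption: the reply contains exactly one JSON object, possibly
    wrapped in a code fence or surrounded by stray text.
    """
    start, end = reply.index("{"), reply.rindex("}") + 1
    data = json.loads(reply[start:end])
    for answer in ANSWERS:
        for dimension in DIMENSIONS:
            value = data[answer][dimension]  # KeyError if a field is missing
            if not isinstance(value, (int, float)) or not 1 <= value <= 100:
                raise ValueError(f"{answer}/{dimension} out of range: {value!r}")
    return data


if __name__ == "__main__":
    # Shortened stand-in template; the full Chinese prompt would go here.
    template = "提问:\n{question}\n参考答案:\n{reference_answer}\n学生答案:\n{student_answer}"
    print(fill_prompt(template, "What is RAG?", "answer A", "answer B"))

    # Hand-written example reply, not real model output.
    reply = json.dumps({
        name: {dimension: 80 for dimension in DIMENSIONS} for name in ANSWERS
    }, ensure_ascii=False)
    print(parse_scores(reply)["学生答案"]["综合评分"])
```

Validating every field before use guards against replies that omit a dimension or return a value outside the requested 1~100 range.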
May not be reproduced without permission: Chief AI Sharing Circle