The #1 Chinese-language prompt on LangChain Hub. Released over a year ago, it is used to compare the overall composite scores of different RAG strategies. It has been translated and adapted for use in multiple languages.
Usage
Decide which of two answers is better, on the premise that both answers are correct. If the composite scores differ by more than 1, determine which answer is "likely" to be problematic. Answers with a high probability of being correct can safely overwrite the corresponding knowledge base entry.
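The threshold rule above can be sketched in a few lines of Python. This is only an illustration: the function name `likely_problematic` is hypothetical, and treating the lower-scored answer as the "likely problematic" one is an assumption, since the text only says that a gap greater than 1 signals a likely problem.

```python
from typing import Optional

# Threshold taken from the text: a composite-score gap above 1 marks a likely problem.
SCORE_GAP_THRESHOLD = 1.0


def likely_problematic(reference_score: float, student_score: float,
                       threshold: float = SCORE_GAP_THRESHOLD) -> Optional[str]:
    """Return which answer is likely problematic, or None if the gap is small.

    Assumption (not stated in the source): the lower-scored answer is the suspect one.
    """
    gap = reference_score - student_score
    if abs(gap) <= threshold:
        return None  # scores are close enough; neither answer is flagged
    return "student" if gap > 0 else "reference"
```

For example, `likely_problematic(90, 85)` flags the student answer, while a gap of 0.5 flags neither.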
Areas of application:
- Comparing different "extract QA pair" prompts to determine which prompt is better.
- Assessing whether the student answer (produced by a new RAG strategy) is better, with the reference answer serving as the baseline standard answer.
The definition of "better" is prone to the following misconceptions: that it means the absolutely correct answer, the most detailed answer, the most succinct answer, or the answer with a complete thought process.
Prompt (translated from Chinese)
You ask the student a question, the student gives an answer, and you must grade the reference answer and the student's answer separately. Grade each answer on relevance, completeness, semantic clarity, and ambiguity, then give an overall rating for each answer.

Question: """ {question} """

Please rate the following answers by giving a number between 1 and 100:

Reference answer: """ {reference_answer} """
Student answer: """ {student_answer} """

Assign a rating between 1 and 100 to each item and reply in JSON format, with no other explanation:
```json
{
  "Reference answer": {
    "Relevance": <1-100>,
    "Completeness": <1-100>,
    "Semantic clarity": <1-100>,
    "Ambiguity": <1-100>,
    "Overall rating": <1-100>
  },
  "Student answer": {
    "Relevance": <1-100>,
    "Completeness": <1-100>,
    "Semantic clarity": <1-100>,
    "Ambiguity": <1-100>,
    "Overall rating": <1-100>
  }
}
```
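As a rough sketch of how the prompt might be used, the Python below fills in the three placeholders and parses the model's JSON reply. It uses only the standard library rather than any LangChain API. The helper names `build_prompt` and `parse_scores`, the shortened template text, and the English key names are assumptions made for illustration. Literal JSON braces are doubled so that `str.format` substitutes only the named placeholders.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable

# Shortened English rendering of the grading prompt (key names are assumed).
PROMPT_TEMPLATE = '''Grade the reference answer and the student answer separately
on relevance, completeness, semantic clarity and ambiguity (1 to 100 each),
then give an overall rating for each answer.

Question: """ {question} """

Reference answer: """ {reference_answer} """
Student answer: """ {student_answer} """

Reply in JSON format with no other explanation:
{{"Reference answer": {{"Overall rating": 0}}, "Student answer": {{"Overall rating": 0}}}}
'''

# Matches an optional fenced block around the JSON payload.
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def build_prompt(question: str, reference_answer: str, student_answer: str) -> str:
    """Substitute the three placeholders into the grading prompt."""
    return PROMPT_TEMPLATE.format(
        question=question,
        reference_answer=reference_answer,
        student_answer=student_answer,
    )


def parse_scores(reply: str) -> dict:
    """Parse the model reply, tolerating a surrounding code fence."""
    match = FENCED_JSON.search(reply)
    payload = match.group(1) if match else reply
    return json.loads(payload.strip())
```

The parsed dictionary can then be compared key by key, for example by reading the "Overall rating" entry of each answer.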