
LaWGPT: A Chinese legal knowledge large language model supporting legal Q&A and judicial exam training

General Introduction

LaWGPT is an open source project supported by the Machine Learning and Data Mining Research Group of Nanjing University, dedicated to building a large language model grounded in Chinese legal knowledge. Starting from general-purpose Chinese base models (e.g., Chinese-LLaMA and ChatGLM), it extends the vocabulary with legal-domain terms, then improves the model's semantic understanding and dialogue ability in legal scenarios through pre-training on a large-scale legal corpus and instruction fine-tuning on legal Q&A datasets. The project is maintained by multiple collaborators and is suited to scenarios such as legal dialogue and judicial exam training. The model is still limited by its data and capacity, and its output can be unreliable, but its open source nature and community support make it a useful resource for AI research in the legal field.


Feature List

  • Legal Q&A generation: Generates answers to legal questions entered by the user, suitable for consultation and study.
  • Judicial exam training: Provides Q&A training based on the China Judicial Exam dataset to help users prepare for the exam.
  • Legal corpus comprehension: Pre-training on legal text lets the model parse complex legal documents and statutes.
  • Command-line batch inference: Lets developers process law-related data in batches through scripts.
  • Interactive dialogue mode: Answers user questions interactively in real time when no input data file is specified.
  • Model weight support: LoRA weights are provided so that users can combine them with the original base model and make customized adjustments.

 

Usage Guide

Installation

LaWGPT is an open source project hosted on GitHub. Before using it, you need to set up the environment and install its dependencies. The detailed installation steps follow:

  1. Clone the project code
    Open a terminal and run the following commands to download the code:
git clone git@github.com:pengxiao-song/LaWGPT.git
cd LaWGPT

This clones the LaWGPT repository to your computer and changes into the project directory. The SSH URL above requires an SSH key registered with GitHub.

  2. Create a virtual environment
    Use Conda to create a separate Python environment and avoid dependency conflicts:
conda create -n lawgpt python=3.10 -y
conda activate lawgpt

Once the environment is activated, all subsequent commands run inside the lawgpt environment.

  3. Install the dependencies
    The project provides a requirements.txt file that lists the required libraries. Run the following command to install them:
pip install -r requirements.txt

The dependencies include transformers, peft, gradio, and others. Make sure your network connection is stable so the downloads can complete.

  4. Get the model weights
    Because the full weights of LLaMA and Chinese-LLaMA are not openly distributed, LaWGPT provides only LoRA weights. You need to:
  • Obtain the weights for Chinese-LLaMA or another base model from official sources.
  • Merge the LoRA weights with the base model (see the project documentation for details).
  5. Verify the installation
    Run the sample script to confirm that the environment is set up correctly:
bash scripts/infer.sh

If you successfully enter interactive mode, the installation is complete.
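
If the sample script fails at import time, a quick check of the libraries named in the dependency step (transformers, peft, gradio) can narrow the problem down. The following is a minimal sketch and not part of LaWGPT: the helper name `missing_packages` is hypothetical, and the list covers only the three libraries mentioned in this guide, while requirements.txt remains the authoritative list.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names from `names` that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Only the libraries named in this guide; requirements.txt is the full list.
    required = ["transformers", "peft", "gradio"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

Run it inside the activated lawgpt environment; a missing name usually means the pip install step did not finish.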

Usage

Main functions: legal Q&A and inference

  • Interactive mode
    When no test data path is specified, running bash scripts/infer.sh starts interactive mode. You can type legal questions directly, for example:
Please explain the content of article 10 of the Contract Law of the People's Republic of China.

The model generates answers in real time, which suits quick consultation or study.

  • Batch inference
    To process multiple questions, prepare a JSON file (see resources/example_instruction_train.json for the format), for example:
{"instruction": "How is property divided after a divorce?", "output": ""}

Pass the file path to the script:

bash scripts/infer.sh --infer_data_path ./test.json

The model processes the questions and outputs the results one by one; the results can be saved for later analysis.
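
For more than a handful of questions, the input file can be generated rather than typed by hand. The sketch below assumes the file is a JSON array of objects with the "instruction" and "output" keys shown in the example record; check resources/example_instruction_train.json for the layout the script actually expects. The function name `write_questions` is illustrative and not part of the project.

```python
import json
from pathlib import Path


def write_questions(questions, path):
    """Write one {"instruction", "output"} record per non-empty question; return the count."""
    # "output" is left empty, matching the example record in this guide.
    records = [{"instruction": q.strip(), "output": ""} for q in questions if q.strip()]
    # ensure_ascii=False keeps Chinese text readable instead of \u escapes.
    Path(path).write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return len(records)
```

Calling write_questions(["How is property divided after a divorce?"], "test.json") produces a file whose path can then be given to --infer_data_path.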

Featured function: judicial exam training

  • Preparing the dataset
    LaWGPT supports training on judicial exam datasets. You can download publicly available datasets listed in Awesome Chinese Legal Resources, or construct your own Q&A pairs in the following format:

    {"instruction": "Which of the following is not an element of a crime?", "output": "A. Subject of the crime B. Object of the crime C. Motive for the crime D. Objective aspects of the crime"}
    

    Save the data as a JSON file, e.g. exam_data.json.

  • Running training
    Use the finetune.py script for instruction fine-tuning, supplying your own paths after --base_model and --lora_weights:

    python finetune.py --data_path ./exam_data.json --base_model  --lora_weights 
    

    Parameter description:

    • --data_path: path to the dataset.
    • --base_model: path to the base model.
    • --lora_weights: path to the LoRA weights.
      Once training is complete, the model should be better adapted to judicial exam style questions.
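
Before starting a fine-tuning run, it can help to check that every record in the dataset has the two fields shown in the example record. This is a minimal sketch under the assumption that exam_data.json holds a JSON array of such records; the helpers `find_bad_records` and `check_file` are hypothetical and not part of finetune.py.

```python
import json


def find_bad_records(records):
    """Return indices of records lacking a non-empty string "instruction" or "output"."""
    bad = []
    for i, rec in enumerate(records):
        ok = isinstance(rec, dict) and all(
            isinstance(rec.get(key), str) and rec[key].strip()
            for key in ("instruction", "output")
        )
        if not ok:
            bad.append(i)
    return bad


def check_file(path):
    """Load a JSON array from `path` and return the indices of malformed records."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    return find_bad_records(records)
```

check_file("exam_data.json") returns an empty list when every record is well formed; otherwise the returned indices point at the entries to fix.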

Web Interface Usage

  • Starting the WebUI
    The project provides a graphical interface built with Gradio. Run:

    bash scripts/webui.sh
    

    After startup, the interface is available on a local page in your browser (usually http://127.0.0.1:7860).

  • Workflow
    1. Enter a legal question in the input box, e.g., "How do I apply for patent protection?"
    2. Click "Submit" and wait for the model to generate a response.
    3. View the output, which can be copied or saved.
      The web interface is suitable for non-technical users and is intuitive to use.
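
If the page does not load after running the WebUI script, it may simply still be starting. The sketch below polls the local address using only the Python standard library. It assumes the address mentioned above, http://127.0.0.1:7860, and the function name `webui_is_up` is hypothetical; adjust the URL if your setup serves the page on a different port.

```python
import urllib.error
import urllib.request


def webui_is_up(url="http://127.0.0.1:7860", timeout=3.0):
    """Return True if a server at `url` answers with a success status."""
    # An empty ProxyHandler bypasses any system proxy, since the page is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError:
        # The server answered, but with an error status (4xx/5xx).
        return False
    except OSError:
        # Nothing is listening yet, or the connection timed out.
        return False


if __name__ == "__main__":
    print("WebUI reachable:", webui_is_up())
```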

Caveats

  • Hardware requirements: A GPU (e.g. Tesla V100) is recommended to accelerate inference; running on a CPU may be slower.
  • Model selection: LaWGPT-7B-alpha is used by default. To use beta 1.0 or beta 1.1 instead, adjust the model parameters in the script.
  • Limitations: The model may generate inaccurate content because of data limitations, so verify its output before relying on it, especially in real legal scenarios.
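
To see whether a GPU is actually visible before running inference, a short check like the one below can be used. It is a sketch that assumes PyTorch is installed in the environment (it is normally pulled in alongside transformers and peft, but this guide does not list it explicitly); the function name `pick_device` is hypothetical, and the check does not change how the LaWGPT scripts select a device.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch  # assumption: installed as a dependency of the project
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    device = pick_device()
    print(f"PyTorch device available: {device}")
    if device == "cpu":
        print("No CUDA GPU detected; expect slower generation.")
```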

With these steps, you can get started with LaWGPT, whether you are asking legal questions or preparing for judicial exams.

May not be reproduced without permission: Chief AI Sharing Circle » LaWGPT: Chinese legal knowledge modeling, supporting legal quizzes and judicial exam training
