LaWGPT: A Chinese legal-knowledge large language model supporting legal Q&A and judicial exam training

Latest AI Resources · Released 5 months ago · AI Sharing Circle

General Introduction

LaWGPT is an open-source project supported by the Machine Learning and Data Mining Research Group at Nanjing University, dedicated to building a large language model grounded in Chinese legal knowledge. Starting from general-purpose Chinese base models (e.g., Chinese-LLaMA and ChatGLM), it extends the vocabulary with legal-domain terms, then improves the model's semantic understanding and dialogue ability in legal scenarios through pre-training on a large-scale legal corpus and instruction fine-tuning on legal Q&A datasets. The project is maintained by multiple collaborators and is applicable to scenarios such as legal dialogue and judicial exam training. The model is still limited by its data and capacity, so its output can be unreliable, but its open-source nature and community support make it a useful resource for AI research in the legal field.

Feature List

Legal Q&A generation: Generates answers to legal questions entered by the user, suitable for consultation and study.
Judicial exam training: Provides Q&A training based on Chinese judicial exam datasets to help users prepare for the exam.
Legal corpus comprehension: Pre-trained on a legal corpus so that it can parse complex legal documents and statutes.
Command-line batch inference: Lets developers process law-related data in batches through scripts.
Interactive dialogue mode: Answers user questions interactively in real time when no input data file is specified.
Model weight support: Provides LoRA weights that users combine with the original base model and can adapt further.

Usage Guide

Installation process

LaWGPT is an open-source project hosted on GitHub. You need to set up the environment and install the dependencies before using it. The detailed installation steps are as follows:

Cloning Project Code
Open a terminal and enter the following command to download the code locally:

git clone git@github.com:pengxiao-song/LaWGPT.git
cd LaWGPT

This clones the LaWGPT codebase to your computer and switches into the project directory. The git@ URL uses SSH, so it requires an SSH key registered with GitHub.

Creating a Virtual Environment
Use Conda to create a separate Python environment and avoid dependency conflicts:

conda create -n lawgpt python=3.10 -y
conda activate lawgpt

Once the environment is activated, all subsequent steps run inside the lawgpt environment.

Installation of dependencies
The project provides a requirements.txt file that lists the required libraries. Run the following command to install them:

pip install -r requirements.txt

Dependencies include transformers, peft, and gradio, among others. Make sure your network connection is stable so that the downloads can complete.
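After installation, a quick import check can confirm that the main libraries are visible to the active environment. The following is a minimal sketch using only the Python standard library; the helper name check_dependencies is hypothetical, and the package list is taken from the three names mentioned above rather than from the full requirements.txt.

```python
import importlib.util


def check_dependencies(packages):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    # Package names come from the article; extend the list to match requirements.txt.
    status = check_dependencies(["transformers", "peft", "gradio"])
    for name, installed in status.items():
        print(f"{name}: {'ok' if installed else 'MISSING'}")
```

If any package is reported as missing, re-run the pip command inside the activated lawgpt environment.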

Getting model weights
Because the full weights of LLaMA and Chinese-LLaMA are not openly released, LaWGPT provides only LoRA weights. You need to:

Obtain weights for Chinese-LLaMA or other base models from official sources.
Merge LoRA weights with the base model (see project documentation for details on how to do this).
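To make the merge step less abstract, here is a toy sketch of what merging a LoRA adapter into a base weight matrix means numerically: the low-rank product B·A, scaled by alpha / rank, is added to the original weight. This is only an illustration on a 2x2 matrix with made-up values, and the function names are hypothetical. It is not the project's merge procedure, which is described in the project documentation; in practice a library such as peft performs this for every adapted layer (for example via its merge_and_unload method).

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a) as a new matrix."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # same shape as `weight`
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 base weight with a rank-1 adapter (values are illustrative only).
base = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]  # shape 2 x 1
lora_a = [[0.5, 0.5]]    # shape 1 x 2
merged = merge_lora(base, lora_a=lora_a, lora_b=lora_b, alpha=2, rank=1)
print(merged)
```

After merging, the adapter is no longer needed at inference time because its contribution is already part of the weight matrix.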

Verify Installation
Run the sample script to confirm that the environment is correct:

bash scripts/infer.sh

If you successfully enter interactive mode, the installation is complete.

Usage

Main Operations: Legal Q&A and Inference

Interactive mode
When no test data path is specified, running bash scripts/infer.sh starts interactive mode. You can enter legal questions directly, for example:

请解释《中华人民共和国合同法》第十条的内容。 ("Please explain Article 10 of the Contract Law of the People's Republic of China.")

The model generates answers in real time and is suitable for quick consultations or learning.

Batch inference
To handle multiple questions, prepare a JSON file (see resources/example_instruction_train.json for the format), for example a record asking "How is property divided after a divorce?":

{"instruction": "离婚后财产如何分割？", "output": ""}

Pass the file path into the script:

bash scripts/infer.sh --infer_data_path ./test.json

The model processes the questions one by one and outputs the results, which can be saved for later analysis.
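A batch input file in the record format shown above could be generated from a plain list of questions. The sketch below is an assumption-laden example: the helper name write_infer_file is hypothetical, the second question is made up for illustration, and the file is written as a JSON array. Check resources/example_instruction_train.json to confirm whether the script expects an array or one object per line.

```python
import json
import tempfile
from pathlib import Path


def write_infer_file(questions, path):
    """Write questions as records with an empty "output" field and return the records."""
    records = [{"instruction": q.strip(), "output": ""} for q in questions if q.strip()]
    # ensure_ascii=False keeps the Chinese text readable in the file.
    Path(path).write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return records


# Demo: write to a temporary directory; in practice pass a path such as ./test.json.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "test.json"
    records = write_infer_file(
        ["离婚后财产如何分割？", "   ", "劳动合同到期不续签有补偿吗？"],
        out_path,
    )
    reloaded = json.loads(out_path.read_text(encoding="utf-8"))
```

Blank entries are skipped so that the model is not asked to answer empty prompts.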

Featured Operation: Judicial Exam Training

Preparing the dataset
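LaWGPT supports training on judicial exam datasets. You can download publicly available datasets listed in Awesome Chinese Legal Resources, or construct your own Q&A pairs in the following format (the sample instruction asks "Which of the following is not a constituent element of a crime?" and the output lists options A to D):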
```
{"instruction": "下列哪项不属于犯罪构成要件？", "output": "A. 犯罪主体 B. 犯罪客体 C. 犯罪动机 D. 犯罪客观方面"}
```
Save the data as a JSON file, e.g. exam_data.json.
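Before training, it can help to check that every record has the two fields used in the format above. The following is a minimal sketch, not part of LaWGPT: the function name validate_records is hypothetical, and the rule that both fields must be non-empty strings is an assumption about what useful training data looks like.

```python
import json


def validate_records(records):
    """Return a list of human-readable problems; an empty list means the data looks fine."""
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not a JSON object")
            continue
        for key in ("instruction", "output"):
            value = rec.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"record {i}: missing or empty '{key}'")
    return problems


# First record is the article's sample; the second is deliberately broken.
sample = json.loads(
    '[{"instruction": "下列哪项不属于犯罪构成要件？",'
    ' "output": "A. 犯罪主体 B. 犯罪客体 C. 犯罪动机 D. 犯罪客观方面"},'
    ' {"instruction": "", "output": "A"}]'
)
problems = validate_records(sample)
print(problems)
```

Running the same check on data loaded from exam_data.json catches empty questions or answers before they reach the fine-tuning script.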
Running training
Use the finetune.py script for instruction fine-tuning:
```
python finetune.py --data_path ./exam_data.json --base_model <path_to_base_model> --lora_weights <path_to_lora>
```
Parameter Description:
- --data_path: The dataset path.
- --base_model: The base model path.
- --lora_weights: The LoRA weight path.

Once training is complete, the model should be better adapted to judicial-exam-style questions.
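When training is launched from another Python script, the command above can be assembled as an argument list instead of a hand-written shell string. This is a sketch under stated assumptions: it uses only the three flags listed in this article, the helper name build_finetune_command is hypothetical, and the model paths are placeholders to be replaced with real locations.

```python
import shlex
import sys


def build_finetune_command(data_path, base_model, lora_weights):
    """Assemble the argument list for finetune.py using the three flags listed above."""
    return [
        sys.executable, "finetune.py",
        "--data_path", str(data_path),
        "--base_model", str(base_model),
        "--lora_weights", str(lora_weights),
    ]


# Placeholder paths; substitute the actual base model and LoRA directories.
cmd = build_finetune_command("./exam_data.json", "/path/to/base_model", "/path/to/lora")
print(shlex.join(cmd))
# To launch training from Python: subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when a path contains spaces.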

Web Interface Usage

Starting the WebUI
The project provides a graphical interface via Gradio. Run:
```
bash scripts/webui.sh
```
After startup, open the local page in your browser (usually http://127.0.0.1:7860).
Workflow
1. Enter a legal question in the input box, e.g., "How do I apply for patent protection?"
2. Click "Submit" and wait for the model to generate a response.
3. View the output, which can be copied or saved.
  The web interface is suitable for non-technical users and is intuitive to use.

Caveats

Hardware requirements: A GPU (e.g., Tesla V100) is recommended to accelerate inference; CPU-only inference will be slower.
Model selection: The default model is LaWGPT-7B-alpha. To use beta1.0 or beta1.1, adjust the model parameters in the script.
Limitations: The model may generate inaccurate content because of data limitations, so verify its output before relying on it, especially in real legal scenarios.

With these steps, you can get started with LaWGPT, whether you are running legal Q&A or preparing for judicial exams.