This guide describes how to leverage Claude's advanced natural language processing capabilities to efficiently summarize legal documents, extract key information and accelerate legal research. With Claude, you can streamline contract review, litigation preparation and compliance, saving time and ensuring accuracy in the legal process.
Visit our summarization cookbook to see an example implementation of legal summarization using Claude.
Before building with Claude
Deciding whether to use Claude for legal summarization
Here are some key indicators that you should use an LLM like Claude to summarize legal documents:
You want to review large volumes of documents efficiently and economically
Manual review of large volumes of legal documents can be time-consuming and costly. Claude can quickly process and summarize large volumes of legal documents, significantly reducing the time and cost required for review. This capability is particularly valuable in tasks such as due diligence, contract analysis, or litigation discovery, where efficiency is critical.
You need to automatically extract key metadata
Claude efficiently extracts and categorizes important metadata from legal documents, such as the parties involved, dates, contract terms, or specific clauses. This automated extraction can help organize information and make it easier to search, analyze and manage large document collections. It is particularly useful in contract management, compliance checking or creating searchable databases of legal information.
You want to generate clear, concise and standardized summaries
Claude generates structured summaries that follow a predetermined format, enabling legal professionals to quickly grasp the key points of various documents. These standardized summaries improve readability, facilitate comparisons between documents, and enhance overall understanding, especially when dealing with complex legal language or technical terminology.
You need to provide accurate citations for your summaries
When creating legal summaries, proper attribution and citation are critical to ensure credibility and compliance with legal standards. Claude can be prompted to provide accurate citations for all referenced points of law, making it easier for legal professionals to review and validate the summarized information.
You want to simplify and speed up the legal research process
Claude can assist with legal research by quickly analyzing large volumes of case law, statutes, and law reviews. It identifies relevant precedents, extracts key legal principles, and summarizes complex legal arguments. This capability can significantly speed up the research process, allowing legal professionals to focus on higher-level analysis and strategy development.
Determine the details you want the summary to extract
There is no single correct summary for any given document. Without clear direction, Claude may have difficulty determining what details to include. For best results, identify the specific information you wish to include in the summary.
For example, when summarizing a sublease agreement, you may want to extract the following key points:
details_to_extract = [
    'Parties involved (sublessor, sublessee, and original lessor)',
    'Property details (address, description, and permitted use)',
    'Term and rent (start date, end date, monthly rent, and security deposit)',
    'Responsibilities (utilities, maintenance, and repairs)',
    'Consent and notices (landlord consent and notification requirements)',
    'Special provisions (furniture, parking, and subletting restrictions)'
]
Establish success criteria
Assessing the quality of summaries is a notoriously challenging task. Unlike many other natural language processing tasks, summary evaluation often lacks clear-cut, objective metrics. The process tends to be highly subjective, and different readers may value different aspects of a summary. Here are some criteria you may want to consider when assessing how well Claude performs legal summarization.
Factual accuracy
The summary should accurately present the facts, legal concepts, and key points in the document.
Legal precision
Terminology and references to statutes, case law or regulations must be correct and comply with legal standards.
Conciseness
The summary should compress the legal document into its core points without leaving out important details.
Consistency
When summarizing multiple documents, the LLM should maintain a consistent structure and approach for each summary.
Readability
The text should be clear and easy to understand. If the audience is not a legal expert, the summary should not contain legal terms that may confuse the audience.
Bias and Impartiality
Summaries should present legal arguments and positions fairly and without bias.
Check out our guide on establishing success criteria to learn more.
How to use Claude to summarize legal documents
Selecting the right Claude model
When summarizing legal documents, model accuracy is critical. Claude 3.5 Sonnet is an excellent choice for use cases where high accuracy is required. If the size and number of documents make cost a concern, you can also try a smaller model such as Claude 3 Haiku.
To help estimate these costs, here is a comparison of the costs of summarizing 1,000 sublease agreements using Sonnet and Haiku:
- Scale of content
- Number of agreements: 1,000
- Characters per agreement: 300,000
- Total characters: 300M
- Estimated Tokens
- Input tokens: 86M (assuming 1 token per 3.5 characters)
- Output tokens per abstract: 350
- Total output tokens: 350,000
- Claude 3.5 Sonnet estimated costs
- Input token cost: 86 MTok * $3.00/MTok = $258
- Output token cost: 0.35 MTok * $15.00/MTok = $5.25
- Total cost: $258.00 + $5.25 = $263.25
- Claude 3 Haiku estimated costs
- Input token cost: 86 MTok * $0.25/MTok = $21.50
- Output token cost: 0.35 MTok * $1.25/MTok = $0.44
- Total cost: $21.50 + $0.44 = $21.94
Actual costs may differ from these estimates. The estimates above are based on the prompting example shown later in this guide.
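If it helps to sanity-check the arithmetic, here is a minimal sketch of the calculation; the characters-per-token ratio and the per-MTok prices are simply the assumptions stated in the estimate above:

# Rough cost estimate for summarizing 1,000 sublease agreements,
# using the assumptions stated above (1 token is roughly 3.5 characters).
num_agreements = 1_000
chars_per_agreement = 300_000
chars_per_token = 3.5
output_tokens_per_summary = 350

input_mtok = round(num_agreements * chars_per_agreement / chars_per_token / 1e6)  # ~86 MTok
output_mtok = num_agreements * output_tokens_per_summary / 1e6                    # 0.35 MTok

# Per-MTok prices used in the estimate above
prices = {
    "Claude 3.5 Sonnet": {"input": 3.00, "output": 15.00},
    "Claude 3 Haiku": {"input": 0.25, "output": 1.25},
}

for model_name, price in prices.items():
    total = input_mtok * price["input"] + output_mtok * price["output"]
    print(f"{model_name}: ~${total:,.2f}")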
Convert files to a format that Claude can handle
Before you can start summarizing the document, you need to prepare the data. This involves extracting the text from the PDF, cleaning up the text, and making sure it can be processed by Claude.
Below is a demonstration of this process on a sample PDF:
from io import BytesIO
import re

import pypdf
import requests

def get_llm_text(pdf_file):
    reader = pypdf.PdfReader(pdf_file)
    text = "\n".join([page.extract_text() for page in reader.pages])

    # Remove page numbers (lines that contain only a number),
    # before newlines are collapsed below
    text = re.sub(r'\n\s*\d+\s*\n', '\n', text)

    # Collapse extra whitespace
    text = re.sub(r'\s+', ' ', text)

    return text

# Create the full URL for the sample file in the GitHub repository
url = "https://raw.githubusercontent.com/anthropics/anthropic-cookbook/main/skills/summarization/data/Sample Sublease Agreement.pdf"
url = url.replace(" ", "%20")

# Download the PDF file into memory
response = requests.get(url)

# Load the PDF from memory
pdf_file = BytesIO(response.content)

document_text = get_llm_text(pdf_file)
print(document_text[:50000])
In this example, we first download a PDF of a sample sublease agreement from the summarization cookbook. This agreement is a publicly available sublease agreement sourced from sec.gov.
We use the pypdf library to extract the contents of the PDF and convert it to text. The text data is then cleaned up by removing extra spaces and page numbers.
Build a strong prompt
Claude can accommodate a variety of summarization styles. You can adjust the level of detail in the prompt to direct Claude to produce more or less detailed or concise output, include more or less technical terminology, or provide more or less background context.
Below is an example of how to create a prompt that ensures the generated summaries follow a consistent structure when analyzing sublease agreements:
import anthropic

# Initialize the Anthropic client
client = anthropic.Anthropic()

def summarize_document(text, details_to_extract, model="claude-3-5-sonnet-20240620", max_tokens=1000):

    # Format the details to extract to be placed within the prompt's context
    details_to_extract_str = '\n'.join(details_to_extract)

    # Prompt the model to summarize the sublease agreement
    prompt = f"""Summarize the following sublease agreement. Focus on these key aspects:

    {details_to_extract_str}

    Provide the summary in bullet points nested within the XML header for each section. For example:

    <parties involved>
    - Sublessor: [Name]
    // Add more details as needed
    </parties involved>

    If any information is not explicitly stated in the document, note it as "Not specified". Do not include a preamble.

    Sublease agreement content:
    {text}
    """

    response = client.messages.create(
        model=model,
        max_tokens=max_tokens,
        system="You are a legal analyst specializing in real estate law, known for highly accurate and detailed summaries of sublease agreements.",
        messages=[
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": "Here is the summary of the sublease agreement: <summary>"}
        ],
        stop_sequences=["</summary>"]
    )

    return response.content[0].text
sublease_summary = summarize_document(document_text, details_to_extract)
print(sublease_summary)
This code implements a summarize_document function that uses Claude to summarize the contents of a sublease agreement. The function accepts a text string and a list of details to extract as inputs. In this example, we call the function with the document_text and details_to_extract variables defined earlier.
Inside the function, a prompt is generated for Claude containing the document to be summarized, the details to extract, and specific instructions for summarizing the document. The prompt asks Claude to return the summary of each extracted detail nested within XML tags.
Since we decided to output each section of the summary within tags, each section can easily be parsed out in a post-processing step. This approach produces structured summaries that can be adapted to your use case, ensuring that every summary follows the same pattern.
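For example, here is a minimal sketch of that post-processing step; it assumes the section tags match the names requested in the prompt (such as <parties involved>), so adjust the lookups to the details you actually extract:

import re

def parse_summary_sections(summary_text):
    # Find every "<section name> ... </section name>" pair and return a dict
    # mapping section names to their bullet-point contents.
    pattern = r'<([^<>/]+)>(.*?)</\1>'
    return {name.strip(): body.strip() for name, body in re.findall(pattern, summary_text, re.DOTALL)}

sections = parse_summary_sections(sublease_summary)
print(sections.get("parties involved", "Not found"))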
Evaluate your prompt
Prompts often need testing and optimization before they are ready for production. To determine whether your solution is ready, evaluate the quality of the summaries with a systematic process that combines quantitative and qualitative methods. Building a strong empirical evaluation based on your defined success criteria will help you optimize your prompt. Here are some metrics you may want to include in your evaluation:
ROUGE score
BLEU score
Context embedding similarity
LLM-based scoring
Manual assessment
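Of these, LLM-based scoring is often the quickest to set up. Below is a minimal sketch in which Claude itself grades a summary against its source document; the rubric wording and the 1-to-5 scale are illustrative choices, not a prescribed method:

def grade_summary(document_text, summary, model="claude-3-5-sonnet-20240620"):
    # Ask Claude to grade a generated summary against the source document.
    grading_prompt = f"""Grade the following summary of a legal document.

    <document>
    {document_text}
    </document>

    <summary>
    {summary}
    </summary>

    Rate the summary from 1 to 5 on factual accuracy, legal precision, conciseness,
    and readability. Briefly justify each rating, then give an overall rating on the
    final line in the form "Overall: N".
    """
    response = client.messages.create(
        model=model,
        max_tokens=500,
        messages=[{"role": "user", "content": grading_prompt}]
    )
    return response.content[0].text

print(grade_summary(document_text, sublease_summary))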
Deployment Tips
Keep the following considerations in mind when deploying your solution to a production environment.
- Understand your liability: Understand the potential legal implications of errors in the summaries, which could result in legal liability for your organization or its clients. Provide a disclaimer or legal notice clarifying that the summaries were generated by AI and need to be reviewed by legal professionals.
- Handle multiple document types: In this guide, we discussed how to extract text from PDFs. In practice, documents may come in a variety of formats (PDFs, Word documents, text files, and so on). Make sure your data extraction pipeline can convert all of the file formats you expect to receive.
- Parallelize API calls to Claude: For long documents containing many tokens, Claude may take up to a minute to generate a summary. For large document collections, you may need to send API calls to Claude in parallel so that all summaries complete in a reasonable time frame; a minimal sketch follows this list. Refer to Anthropic's rate limits to determine the maximum number of API calls that can be executed in parallel.
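Here is a minimal sketch of parallelizing the earlier summarize_document calls with a thread pool; the worker count is an illustrative placeholder to set according to your rate limits:

from concurrent.futures import ThreadPoolExecutor

def summarize_documents_in_parallel(document_texts, details_to_extract, max_workers=5):
    # Fan the summarization calls out across a small pool of worker threads.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(
            lambda text: summarize_document(text, details_to_extract),
            document_texts
        ))

summaries = summarize_documents_in_parallel([document_text], details_to_extract)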
Improve performance
In complex scenarios, it may be beneficial to consider additional strategies beyond standard prompt engineering techniques to improve performance. Here are some advanced strategies:
Perform meta-summarization to summarize long documents
Legal summarization often involves working with long documents, or multiple related documents, that exceed Claude's context window. You can use a chunking technique known as meta-summarization to handle this situation. The technique involves splitting the documents into smaller, manageable chunks and processing each chunk separately. You can then combine the summaries of each chunk to produce a meta-summary of the entire document.
The following is an example of how to perform a meta-summary:
import anthropic

# Initialize the Anthropic client
client = anthropic.Anthropic()

def chunk_text(text, chunk_size=20000):
    return [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]

def summarize_long_document(text, details_to_extract, model="claude-3-5-sonnet-20240620", max_tokens=1000):

    # Format the details to extract to be placed within the prompt's context
    details_to_extract_str = '\n'.join(details_to_extract)

    # Iterate over the chunks and summarize each one individually
    chunk_summaries = [summarize_document(chunk, details_to_extract, model=model, max_tokens=max_tokens) for chunk in chunk_text(text)]

    final_summary_prompt = f"""
    You are looking at the chunked summaries of multiple documents that are all related.
    Combine the following summaries of the document from different truthful sources into a coherent overall summary:

    {"".join(chunk_summaries)}

    Focus on these key aspects:
    {details_to_extract_str}

    Provide the summary in bullet points nested within the XML header for each section. For example:

    <parties involved>
    - Sublessee: [Name]
    // Add more details as needed
    </parties involved>

    If any information is not explicitly stated in the document, note it as "Not specified". Do not include a preamble.
    """

    response = client.messages.create(
        model=model,
        max_tokens=max_tokens,
        system="You are a legal expert who specializes in summarizing document notes.",
        messages=[
            {"role": "user", "content": final_summary_prompt},
            {"role": "assistant", "content": "Here is the summary of the sublease agreement: <summary>"}
        ],
        stop_sequences=["</summary>"]
    )

    return response.content[0].text
long_summary = summarize_long_document(document_text, details_to_extract)
print(long_summary)
The summarize_long_document function builds on the earlier summarize_document function by splitting the document into smaller chunks and summarizing each chunk individually.
The code achieves this by applying summarize_document to each 20,000-character chunk of the original document. The individual chunk summaries are then combined to produce a final summary built from those chunk summaries.
Note that the summarize_long_document function is not strictly necessary for our example PDF, since the entire document fits within Claude's context window. However, it becomes essential for documents that exceed Claude's context window, or when summarizing multiple related documents together. In any case, this meta-summarization technique often captures additional important details in the final summary that a single-pass approach would miss.
Use summary indexed documents to explore large document collections
Searching a collection of documents with a Large Language Model (LLM) typically involves retrieval-augmented generation (RAG). However, in scenarios involving large documents or when precise information retrieval is critical, a basic RAG approach may be insufficient. Summary indexed documents is an advanced RAG approach that provides a more efficient way of ranking documents for retrieval, using less context than traditional RAG methods. In this approach, you first use Claude to generate a concise summary for each document in your corpus, and then use Claude to rank the relevance of each summary to the query being asked. For further details on this approach, including a code-based example, check out the summary indexed documents section of the summarization cookbook.
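As a rough illustration of the ranking step, here is a minimal sketch; it assumes you already have a doc_summaries dictionary mapping document IDs to their generated summaries, and that Claude is asked to reply with a bare relevance score, both of which are simplifications of the cookbook's approach:

def rank_summaries(query, doc_summaries, model="claude-3-5-sonnet-20240620"):
    # Score each document summary's relevance to the query, then sort descending.
    scored = []
    for doc_id, summary in doc_summaries.items():
        prompt = f"""On a scale of 0 to 10, how relevant is this document summary to the query?
        Respond with a single number and nothing else.

        Query: {query}

        Summary: {summary}
        """
        response = client.messages.create(
            model=model,
            max_tokens=5,
            messages=[{"role": "user", "content": prompt}]
        )
        scored.append((doc_id, float(response.content[0].text.strip())))
    return sorted(scored, key=lambda item: item[1], reverse=True)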
Fine-tuning Claude to learn your dataset
Another advanced technique for improving Claude's ability to generate summaries is fine-tuning. Fine-tuning involves training Claude on a customized dataset that is highly aligned with your legal summarization needs, ensuring that it adapts to your usage scenario. The following is an overview of performing fine-tuning:
- Identify shortcomings: Begin by collecting examples of Claude summaries that fall short of your requirements, such as omitting key legal details, misunderstanding context, or using inappropriate legal terminology.
- Curate a dataset: Once these issues are identified, compile a dataset containing examples of them. This dataset should pair the original legal documents with your corrected summaries, ensuring that Claude learns the desired behavior. A sketch of this step follows the list.
- Perform fine-tuning: Fine-tuning involves retraining the model on your curated dataset to adjust its weights and parameters. This retraining helps Claude better understand the specific requirements of your legal domain and improves its ability to summarize documents according to your criteria.
- Iterative improvement: Fine-tuning is not a one-time process. As Claude continues to generate summaries, you can iteratively add new examples where it underperformed, further refining its capabilities. Over time, this ongoing feedback loop produces a model that is highly specialized for your legal summarization tasks.
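To make the dataset-curation step concrete, here is a minimal sketch that writes (document, corrected summary) pairs to a JSONL file; the field names are purely illustrative, and the exact record format you need depends on your fine-tuning provider's requirements:

import json

# Hypothetical training examples: original documents paired with the corrected
# summaries you want the fine-tuned model to learn to produce.
training_examples = [
    {"document": document_text, "corrected_summary": sublease_summary},
    # ... add more (document, corrected summary) pairs here
]

with open("legal_summarization_finetuning.jsonl", "w") as f:
    for example in training_examples:
        # Illustrative schema only; consult your fine-tuning provider's
        # documentation for the required record format.
        record = {
            "prompt": f"Summarize the following sublease agreement:\n\n{example['document']}",
            "completion": example["corrected_summary"],
        }
        f.write(json.dumps(record) + "\n")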
Fine-tuning is currently only available through Amazon Bedrock. For more details, see the AWS launch blog.