
Foudinge Scrub: Building a Knowledge Graph from Restaurant Reviews

General Introduction

Foudinge Scrub is an open-source web tool hosted on GitHub, created by developer Théophile Cantelobre. It helps users clean and edit knowledge graph entities extracted from complex text, specifically data crawled from the restaurant review site LeFooding.com. Built with the Flask framework and plain JavaScript, the tool offers features such as full-text search for users who need to deal with duplicate entries or encoding issues. The entities are extracted using structured generation with a large language model (LLM), and Foudinge Scrub provides an intuitive interface for refining the extraction results while preserving the structural integrity of the data. The project code and related resources are publicly available on GitHub for developers to reuse or improve.


Feature List

  • Data cleansing and de-duplication: Identify and repair duplicate or erroneous entities extracted from text data.
  • Full-text search: Quickly search for specific entities or keywords in the editing interface.
  • Structured editing: Manually adjust entities in the knowledge graph through a visual interface while keeping the data structure consistent.
  • Encoding fixes: Resolve character encoding errors introduced by SQLite or other sources.
  • Open-source support: The project code is publicly available, and users can download, modify, or contribute code.

 

Usage Guide

Acquisition and Installation

Foudinge Scrub is an open-source project hosted on GitHub. Users need to download the code and run it locally. The installation process is as follows:

1. Prerequisites

  • Operating system: Windows, macOS, or Linux.
  • Software dependencies: Python 3.7+, Git, and a code editor (such as VS Code).
  • Network environment: Make sure you can reach GitHub and PyPI to download the code and install its dependencies.

2. Downloading the project

  • Open a terminal or command line tool.
  • Enter the following command to clone the repository:
    git clone https://github.com/theophilec/foudinge-scrub.git
  • Go to the project directory:
    cd foudinge-scrub
    

3. Installation of dependencies

  • The project is built with Flask and JavaScript, so its Python dependencies must be installed. Run the following command:
    pip install -r requirements.txt
    
  • If no requirements.txt file is provided, install the core dependency manually:
    pip install flask
    
  • The front end uses Jinja templates and plain JavaScript, so no additional installation is needed (Jinja is installed along with Flask). Just make sure you have a modern browser (e.g. Chrome, Firefox) available locally.

4. Running the application

  • Run the Flask application in the project root directory:
    python app.py
    
  • After a successful startup, the terminal displays a message such as Running on http://127.0.0.1:5000/.
  • Open your browser and go to http://127.0.0.1:5000/ to access the Foudinge Scrub interface.

5. Troubleshooting

  • If you encounter ModuleNotFoundError, check for dependencies that were not installed.
  • If the port is already in use, change the port number in app.py from 5000 to 5001.
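
Before editing app.py, it can help to confirm that port 5000 really is taken. The following is a minimal standard-library sketch and is not part of Foudinge Scrub; the helper names is_port_free and first_free_port are hypothetical.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding fails when another process already holds the port.
            return False
        return True


def first_free_port(candidates):
    """Return the first bindable port from candidates, or None if all are taken."""
    for port in candidates:
        if is_port_free(port):
            return port
    return None


if __name__ == "__main__":
    # 5000 is the default shown in the article; 5001 is the suggested fallback.
    print(first_free_port([5000, 5001]))
```

If the script prints 5001, port 5000 is in use and the port number in app.py needs to be changed as described above.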

Main Feature Workflows

Data cleansing and de-duplication

  1. Prepare data: Foudinge Scrub processes restaurant review data from LeFooding.com by default. For custom data, refer to the crawl code in the theophilec/foudinge repository (which uses SQLite, asyncio, and aiohttp) to generate compatible knowledge graph files.
  2. Import data: Place the data file in the directory the project expects (usually the root directory or the path specified in the configuration file).
  3. Start cleanup: When the web interface opens, the system automatically loads the data and displays the graph. Duplicate or erroneous entities are highlighted or flagged.
  4. Manual adjustment: Click on a duplicate entity, select "Merge" or "Delete", then confirm and save the changes.
  5. Validate the results: After cleaning, the graph is updated in real time, so you can check that nothing was missed.
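
The article does not say how the tool decides that two entities are duplicates. One common approach, sketched below as an assumption, is to group names that become identical after normalizing case, accents, and whitespace. The function names normalize_name and find_duplicate_groups are hypothetical.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace so variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.casefold().split())


def find_duplicate_groups(names):
    """Return lists of names that share the same normalized form."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    # Only keys with more than one spelling are merge candidates.
    return [variants for variants in groups.values() if len(variants) > 1]


entities = ["Grenat", "grenat ", "La Brasserie Communale", "Antoine Joannier"]
for group in find_duplicate_groups(entities):
    print("Possible duplicates:", group)
```

Each group is only a candidate for the manual "Merge" step; a person still confirms whether the variants refer to the same entity.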

Full-text search

  1. Enter search mode: Find the search box at the top of the interface (usually an input field next to a magnifying glass icon).
  2. Enter keywords: Type the name of the entity you are looking for (e.g., a restaurant or person name) or a keyword.
  3. View results: The system lists the matches; click one to jump to the corresponding entity.
  4. Advanced usage: Partial matches are supported, e.g. typing "Gren" matches "Grenat".
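
How the tool implements search is not described in the article. The sketch below shows one plausible behavior under that assumption: a case-insensitive substring match, with a fallback to close matches from Python's difflib for small typos. The function name search_entities and the cutoff value are hypothetical.

```python
import difflib


def search_entities(query, names, cutoff=0.8):
    """Return names containing the query, or close matches if none contain it."""
    needle = query.casefold()
    substring_hits = [name for name in names if needle in name.casefold()]
    if substring_hits:
        return substring_hits
    # Fallback: tolerate small typos by comparing against lowercased names.
    by_lowercase = {name.casefold(): name for name in names}
    close = difflib.get_close_matches(
        needle, list(by_lowercase), n=5, cutoff=cutoff
    )
    return [by_lowercase[match] for match in close]


names = ["Grenat", "La Brasserie Communale", "Antoine Joannier"]
print(search_entities("Gren", names))
```

The substring pass covers the "Gren" example from the steps above; the difflib pass is an optional extra for misspelled queries.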

Structured editing

  1. Open the editing screen: In the graph view, click on the node that needs to be edited (e.g., the "Chef" field for a restaurant).
  2. Modify the content: Enter the new value in the pop-up edit box, e.g. change the previous restaurant recorded for "Neil Mahatsry" from "La Brasserie Communale" to another value.
  3. Save changes: Click the "Save" button. The system verifies the data format to keep the structure consistent.
  4. Undo: If you made a mistake, click the "Undo" button to restore the previous state.
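
The save-time check and the undo step can be illustrated with a small sketch. This is not the tool's actual code: the EntityEditor class is hypothetical, and it assumes that "consistent structure" means an edit may only change existing keys and must keep each value's type.

```python
import copy


class EntityEditor:
    """Edit one entity while keeping its keys and value types unchanged."""

    def __init__(self, entity):
        self._entity = copy.deepcopy(entity)
        self._history = []  # snapshots taken before each successful edit

    @property
    def entity(self):
        return copy.deepcopy(self._entity)

    def set_field(self, key, value):
        if key not in self._entity:
            raise KeyError(f"unknown field: {key!r}")
        expected_type = type(self._entity[key])
        if not isinstance(value, expected_type):
            raise TypeError(f"{key!r} must be of type {expected_type.__name__}")
        self._history.append(copy.deepcopy(self._entity))
        self._entity[key] = value

    def undo(self):
        """Restore the state before the last edit; return False if none exists."""
        if not self._history:
            return False
        self._entity = self._history.pop()
        return True


editor = EntityEditor(
    {"name": "Neil Mahatsry", "previous_restaurants": ["La Brasserie Communale"]}
)
editor.set_field("previous_restaurants", ["Grenat"])
editor.undo()
print(editor.entity)
```

Rejecting an edit before it is stored means the history only ever contains valid states, so undo can never restore a malformed entity.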

Fixing encoding issues

  1. Identify the problem: If text in the interface is garbled (e.g. "Antoine Joannier" appears as "Antoine Joanniér"), there is an encoding error.
  2. Automatic repair: Select "Fix encoding" in the settings menu, and the system will try to normalize the text to UTF-8 or another encoding.
  3. Manual input: If the automatic fix fails, edit the garbled field by hand and enter the correct characters.
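
The article does not explain what "Fix encoding" does internally. A frequent cause of garbled accents is UTF-8 bytes that were decoded as Latin-1, and the sketch below reverses that one case, assuming it is the problem at hand. The function name fix_mojibake is hypothetical.

```python
def fix_mojibake(text: str) -> str:
    """Undo the case where UTF-8 bytes were mistakenly decoded as Latin-1.

    If the text does not fit that pattern, it is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Simulate the damage: encode correctly, then decode with the wrong codec.
garbled = "Joanniér".encode("utf-8").decode("latin-1")
print(garbled)
print(fix_mojibake(garbled))
```

Text that is already correct fails the round trip and is left alone, so the function can be applied to every field without corrupting clean values. Other kinds of damage still need the manual step.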

Featured Functions

Knowledge Graph Optimization with LLMs

The core feature of Foudinge Scrub is that it works on structured data generated by a large language model (LLM), which can then be refined through manual editing. For example, when extracting "Antoine Joannier worked at La Brasserie Communale before working at Grenat" from a restaurant review, the LLM generates JSON such as:

    {
      "Person": {
        "name": "Antoine Joannier",
        "role": "Host",
        "previous_restaurants": ["La Brasserie Communale"]
      }
    }

You can adjust this structure in the interface, for example by adding a new field "current_restaurant" with the value "Grenat", as follows:

  1. Select the node to view its JSON representation.
  2. Click "Add Field" and enter the key-value pair.
  3. After saving, the graph is updated to reflect the new relationship.
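
The same edit can be expressed in a few lines of Python. This is only an illustration of the resulting data, not how the interface is implemented; the add_field helper is hypothetical and assumes that "Add Field" should not silently overwrite an existing key.

```python
import json

llm_output = """
{
  "Person": {
    "name": "Antoine Joannier",
    "role": "Host",
    "previous_restaurants": ["La Brasserie Communale"]
  }
}
"""


def add_field(record, entity_type, key, value):
    """Add a new key to an entity, refusing to overwrite an existing one."""
    entity = record[entity_type]
    if key in entity:
        raise KeyError(f"{key!r} already exists on {entity_type}")
    entity[key] = value
    return record


record = json.loads(llm_output)
add_field(record, "Person", "current_restaurant", "Grenat")
print(json.dumps(record, indent=2, ensure_ascii=False))
```

Passing ensure_ascii=False keeps accented names readable in the printed JSON instead of escaping them.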

Open Source Collaboration

  • Contribute code: Users can fork the repository, modify the code, and submit a pull request, for example to add a new search algorithm or improve the interface.
  • Read the documentation: The README file in the project root provides basic instructions. For detailed code logic, refer to app.py and the JavaScript files.

Recommendations for use

  • First use: Run the sample data first to get familiar with the interface layout and workflow.
  • Large-scale data: When dealing with a large number of reviews, import them in batches to avoid browser lag.
  • Community support: Ask questions on the GitHub Issues page, where the developer or the community may be able to help.
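
The batch-import recommendation above can be sketched with a small generator that splits any sequence of reviews into fixed-size chunks. The function name batched and the batch size are illustrative assumptions; the article does not specify a batch size or an import API.

```python
from itertools import islice


def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


reviews = [f"review-{i}" for i in range(5)]  # placeholder data
for batch in batched(reviews, 2):
    print(len(batch), batch)
```

Because the generator pulls items lazily, it also works on a database cursor or file iterator without loading everything into memory first.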

With these steps, users can quickly get started with Foudinge Scrub and efficiently complete data cleaning and knowledge graph optimization tasks.
