General Introduction
SFT-data-builder is an open-source project designed to generate high-quality SFT training data by utilizing the free Big Model API combined with users' private domain data. The tool supports a variety of AI model formats and provides one-click generation, batch generation, flexible editing and local storage functions to help users quickly generate training data suitable for pre-training, fine-tuning, function calling and other scenarios.
Function List
- Generate training data with one click: Numerous OpenAI format calls for local or cloud-based models are supported.
- Batch Generation: Generate training data from multiple different perspectives at once, support batch URL articles to generate data automatically.
- Flexible editing: All generated data can be edited and adjusted at any time.
- local storage: Automatically saves all data locally.
- Easy to export: Export standard format JSON files with one click.
- Multi-model support: Supports a wide range of mainstream AI models, which can be customized.
- Multi-format support: Support for PDF, Word, TXT and other file formats.
Using Help
Installation process
- Installation of dependencies: Run in the project directory
npm install
The - Initiation of projects: Run
npm run start
Launching the project.
Guidelines for use
- Configuring the API::
- Click on the "Open Configuration" button.
- Set the API address and key.
- Select or customize the AI model.
- Sets the number of data entries generated at a time.
- input::
- Upload files (PDF, DOCX, TXT supported).
- or enter the text content directly.
- Generate data::
- Click the "Generate AI Response" button.
- Toggle through multiple generated results and edit the generated content as needed.
- Management data::
- Add to the data list.
- Preview all generated data.
- Delete unwanted data.
- Export as a JSON file.
Functional operation flow
- Generate training data with one click::
- Select or customize the AI model.
- Enter or upload text content.
- Click the "Generate AI Response" button, the system will automatically generate training data.
- Batch Generation::
- Set the batch generation parameters on the configuration page.
- Upload a file containing multiple URLs or enter multiple URLs.
- Click the "Batch Generate" button, the system will automatically generate multiple training data.
- Flexible editing::
- On the Generated Results screen, click the data entry to be edited.
- Modify the content in the editor to save the changes.
- Local storage and export::
- All generated data is automatically saved to local storage.
- On the data management screen, select the data to be exported and click the "Export to JSON" button.