General Introduction
UnDatas.IO is a platform focused on parsing and processing unstructured data. It utilizes advanced technology to automatically recognize document layouts and categorize tables, images, formulas and text, greatly simplifying the data processing process. The platform not only saves a lot of time in organizing data, but also helps users extract valuable insights from data and make more strategic decisions. UnDatas.IO provides powerful data support for academic research, business analysis and technology development.
Function List
- Automatic recognition of document layout
- Categorize tables, images, formulas and text
- Data extraction and conversion
- Supports multiple data formats
- Integration with large-scale language models for enhanced data processing capabilities
- Provide API interface for developers' convenience
Using Help
Installation process
- Visit the official UnDatas.IO website to register and get your API key.
- Install the UnDatas.IO Python API library:
pip install undatasio
- Install the OpenAI Python SDK:
pip install openai
- Configure environment variables to save the API key:
import os
os.environ['UNDATASIO_API_KEY'] = 'your_api_key'
os.environ['OPENAI_API_KEY'] = 'your_openai_api_key'
Usage Process
- Import the UnDatas.IO library and initialize it:
from undatasio.undatasio import UnDatasIO
undatasio_obj = UnDatasIO(os.getenv('UNDATASIO_API_KEY'))
- utilization
get_result_type
method to extract the data type:
result_type = undatasio_obj.get_result_type('your_document')
- utilization
show_version
method to view version information:
version_info = undatasio_obj.show_version()
Main Functions
- Automatic recognition of document layout: After uploading a document, the platform automatically recognizes and categorizes the tables, images, formulas and text in the document.
- Data extraction and conversion: The required data formats can be easily extracted and converted through API interfaces.
- Integration with large-scale language models: Enhance data processing and analysis capabilities with OpenAI's large-scale language models. For example, mathematical problems can be solved using the Qwen-max model:
from openai import OpenAI
openai_obj = OpenAI(os.getenv('OPENAI_API_KEY'))
response = openai_obj.Completion.create(
model="qwen2.5-math-72b-instruct", prompt="Solve the following math problem.
prompt="Solve the following math problem: ..." ,
max_tokens=100
)
print(response.choices[0].text)
Detailed Operation Procedure
- Data upload: Upload the documents to be parsed to UnDatas.IO through the platform's upload interface.
- Data classification: The platform automatically recognizes the different elements of a document and categorizes them for display.
- data extraction: Use the API interface to extract the desired data type, e.g., table data, image data, etc.
- data conversion: Convert the extracted data into the desired format for subsequent analysis and processing, as required.
- data analysis: Utilize the analytics tools provided by the platform to analyze data and extract valuable insights.
- Result Output: Export analysis results to reports or other formats for easy sharing and use.
With the above steps, users can easily get started with UnDatas.IO for unstructured data parsing and processing, improving data processing efficiency and saving time and effort.