Background
The Chief AI Sharing Circle site has accumulated a large number of practical prompts and AI tools, but the site's built-in keyword search often fails to turn up the exact resource a reader needs. Having excellent video generation tools on the site that simply cannot be found is hard to accept.
Since I cannot develop the website itself, I have to rely on external tools for search:
Relying on a search engine's "site search" operator (for example, searching site:aisharenet.com plus a keyword) works, but it is cumbersome and the site is not fully indexed:
Of course, I also cannot build semantic search over the site content myself, let alone wrap it in a good interface, so the problem boils down to:
How to convert website content into an easily retrievable knowledge base.
Content Analysis
For both AI tools and prompt guides, the header area already describes the content's key characteristics clearly, while the content area, though more detailed, is scattered text that hurts retrieval quality. The content also includes images, which I would like to surface as previews for readers.
Example of an AI tool post
Example of a prompt (instruction) post
Thinking Through the Retrieval Strategy
1. Mix the title and content into one block and use the whole thing for semantic retrieval
Pros: the content stays complete
Cons: too much content makes retrieval imprecise
2. Retrieve only the title, then reference the full content by title
Pros: accurate retrieval
Cons: the effective retrieval scope shrinks
3. Feed the title and content to a large model and split them into QA pairs
Pros: greatly improves effective retrieval coverage
Cons: higher processing and time costs; important content and the structure of the original text are lost
PS: without any development experience you can deploy the DIFY project to batch-generate QA pairs; not demonstrated here.
4. Knowledge graph
The content is not a good fit; ignore this option.
Besides, I plan to rely on free, open agent platforms, and they do not support knowledge graphs either.
I choose strategy 2: it is simple and efficient. Although the effective retrieval scope shrinks, it can be optimized incrementally through continuous iteration.
The body of each post does not really need to participate in retrieval either: retrieving semantically against the title alone reduces the anomalies a large model produces when handling long contexts, and returning the URL lets readers view the complete article.
Picking a Platform for the Search Tool
Which third-party platform should host the semantic search?
There are many free platforms that support knowledge bases, such as Tencent Yuanqi, Zhipu, Coze, and Wenxin. Here I will choose a platform that supports importing QA pairs for retrieval.
Retrieving QA pairs: the user's query is matched against stored question A, the corresponding answer B is returned to the large model, and B is used as reference content to answer the user's question.
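To make the mechanism concrete, here is a toy sketch of QA-pair retrieval. It only illustrates the idea, not any platform's actual implementation: similarity here is plain TF-IDF cosine rather than a real semantic embedding, and the sample titles and URLs are made up.

```python
# Toy illustration of QA-pair retrieval (not Wenxin's or any platform's real code).
# The user query is matched against the stored *questions*; the paired *answer*
# (here, a URL) is what gets handed to the LLM as reference content.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical QA pairs: question = post title, answer = post URL
qa_pairs = [
    ("Example video generation tool", "https://www.aisharenet.com/example-video-tool/"),
    ("Example prompt-writing guide", "https://www.aisharenet.com/example-prompt-guide/"),
]

questions = [q for q, _ in qa_pairs]
vectorizer = TfidfVectorizer().fit(questions)
question_vectors = vectorizer.transform(questions)

def retrieve(user_query: str) -> str:
    """Return the answer paired with the most similar stored question."""
    sims = cosine_similarity(vectorizer.transform([user_query]), question_vectors)[0]
    return qa_pairs[sims.argmax()][1]  # answer B for the best-matching question A

print(retrieve("video generation tool"))  # -> the matching post's URL
```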
Which platform has better semantic understanding is not a concern here; their baseline performance is all roughly up to standard.
Where will users access it?
My main channel is the WeChat official account, so users should be able to search from within the official account.
Zhipu is fine, but I choose the Wenxin Agent platform, which gives clearer operating guidance when configuring QA rules. A Wenxin agent can also be published to Baidu to acquire users. Recommended reading: Killer traffic portal: using AI agents to bring long-term external traffic to websites and official accounts
Step-by-Step Tutorial
1. Export the XML file from WordPress (in the admin dashboard, Tools → Export → All content downloads a WXR-format XML file)
2. Convert the XML to Markdown format
2.1 Click here to download the blog2md project, then unzip it to the directory D:\222\blog2md
2.2 In the blog2md directory, right-click and open a shell (PowerShell) terminal.
2.3 You will most likely need to install the dependency; enter the following commands
Installation command: npm install xml2js   Verification command: npm list xml2js
2.4 Rename the exported XML file to 111.xml, place it in the D:\222\blog2md directory, and run the following command
node index.js w 111.xml out
2.5 The directory D:\222\blog2md\out is now generated; open it to verify that the generated content is correct.
3. Convert the Markdown to Excel format
The Markdown content is well structured, so it is easy to extract. Here I had ChatGPT write a regular expression and ran it in Python.
I want to extract: the file name (the file name maps to the URL, e.g. https://www.aisharenet.com/anse/), the title, and the content area (everything below the --- separator).
3.1 After running the Python script, the file output.xlsx is generated in the current directory.
Script content:
Save the script with any file name, e.g. 111.py, and put it in any directory; here I put it in D:\222\blog2md.
Run it from the command line (by default the command line cannot execute 111.py directly; you must add the .\ prefix)
.\111.py
The script code is as follows; save it as 111.py (the original was generated by ChatGPT).
Directory the md files are read from: folder_path = "D:\\222\\blog2md\\out"
Excel file generated in the current directory: output_file = "output.xlsx"
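A minimal sketch of such a script, assuming blog2md's output format (a header block containing a title: line, terminated by a --- separator, with the post slug as the file name); it needs pandas and openpyxl installed (pip install pandas openpyxl) and is a reconstruction, not necessarily the author's exact code:

```python
# Minimal reconstruction of the conversion script (the author's original was
# generated by ChatGPT). Assumes blog2md output: a header block with a "title:"
# line, a "---" separator, and the post slug as the file name.
import os
import re
import pandas as pd

folder_path = "D:\\222\\blog2md\\out"   # directory containing the generated .md files
output_file = "output.xlsx"             # Excel file written to the current directory

rows = []
for name in os.listdir(folder_path):
    if not name.endswith(".md"):
        continue
    slug = os.path.splitext(name)[0]
    url = f"https://www.aisharenet.com/{slug}/"   # rebuild the post URL from the slug

    with open(os.path.join(folder_path, name), encoding="utf-8") as f:
        text = f.read()

    # Title: taken from the "title: ..." line in the header block
    m = re.search(r"^title:\s*['\"]?(.*?)['\"]?\s*$", text, re.MULTILINE)
    title = m.group(1) if m else slug

    # Content: everything after the first "---" separator line
    parts = re.split(r"^---\s*$", text, maxsplit=1, flags=re.MULTILINE)
    content = parts[1].strip() if len(parts) > 1 else text

    rows.append({"url": url, "title": title, "content": content})

# Requires openpyxl for .xlsx output
pd.DataFrame(rows).to_excel(output_file, index=False)
print(f"Wrote {len(rows)} rows to {output_file}")
```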
3.2 Organize output.xlsx into the knowledge base file to upload
Only the title is kept, and the complete URL is concatenated for each entry.
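As a sketch of that step (the column names follow the reconstruction above and are assumptions, not a template required by the platform):

```python
# Hypothetical follow-up: turn output.xlsx into the QA-pair sheet to upload,
# keeping only the title (as the question) and the full URL (as the answer).
import pandas as pd

df = pd.read_excel("output.xlsx")        # produced by the previous script
qa = pd.DataFrame({
    "question": df["title"],             # retrieval happens against the title only
    "answer": df["url"],                 # the URL lets readers open the full post
})
qa.to_excel("knowledge_base.xlsx", index=False)
```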
4. Upload the knowledge base to the Wenxin Agent platform
4.1 Open the Wenxin Agent platform and upload the knowledge base
4.2 Upload the Excel file
4.3 Customize which column is used for retrieval (this is the reason for choosing Wenxin; other tools lack this interface)
For more tips on organizing the knowledge base, read: Wenxin Intelligent Body Tutorial: (4) Processing Documents and Synchronizing to the Knowledge Base
5. Create the agent and publish it for use
5.1 Create the agent
Here we just do a simple configuration without getting bogged down in details. Start creating the agent...
You can also try low-code mode to build the agent and add logic for choosing among multiple knowledge bases; after all, the site has many categories. I will not demonstrate that here; readers interested in low-code can read: Wenxin Intelligent Body Tutorial: (5) Orchestrating the Agent Workflow
5.2 Configure the agent
Turn off features unrelated to the knowledge base to avoid anomalies; I leave the other settings at their defaults without fine-tuning.
Briefly test the hit rate of knowledge-base recall, otherwise it is easy to match irrelevant content.
5.3 Debugging and Previewing the Output
5.4 Publish the agent
Wrapping Up
In the end, you get an agent that can quickly look up AI tools right inside the official account, completely free! And thanks to the Wenxin distribution channels (Wenxin Intelligent Body Platform: Intelligent Body Applications Built on Complete Distribution Channels and a Commercial Closed Loop), this tool will also be published to the Baidu homepage, giving users another way to reach it.