Knowledge Base
Process your uploaded datasets in SimplAI’s Knowledge Base section using techniques such as chunking, embedding, vector database storage, re-ranking, and retrieval testing.
What is Knowledge Base Processing?
Knowledge base processing involves organizing and refining your data to make it useful and accessible to your tools and agents. This includes breaking data down into manageable chunks, embedding it for efficient retrieval, storing it in a vector database, and testing retrieval performance.
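The stages named above (chunk, embed, store, retrieve) can be pictured with a small, self-contained sketch. It does not use SimplAI's API. The `chunk`, `embed`, and `ToyVectorStore` names are hypothetical, and the bag-of-words "embedding" is only a stand-in for a real embedding model.

```python
import math
from collections import Counter


def chunk(text: str, size: int = 7) -> list[str]:
    """Split text into chunks of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database (hypothetical)."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        # Store the chunk text next to its vector.
        self.items.append((text, embed(text)))

    def search(self, query: str, top_k: int = 1) -> list[str]:
        # Rank stored chunks by similarity to the query vector.
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


document = (
    "Refunds are issued within five business days. "
    "Shipping is free for orders above fifty dollars."
)
store = ToyVectorStore()
for piece in chunk(document, size=7):
    store.add(piece)

best = store.search("when are refunds issued", top_k=1)
print(best)
```

A real setup would replace `embed` with the selected embedding model and `ToyVectorStore` with the chosen vector database; the flow of data between the stages stays the same.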
Create a Knowledge Base
1. Navigate to the Knowledge Base Page:
- Log in to SimplAI.
- From the main dashboard, go to the Knowledge Base section.
2. Create a New Knowledge Base
- Click + Create Knowledge Base at the top right of the page.
- Provide a name and description for the Knowledge Base.
3. Select the KB Type
In the Create Knowledge Base section, choose the KB type based on whether your data is Unstructured or Structured. This choice determines how your data is processed and stored.
4. Default Settings
SimplAI provides default settings with predefined parameters to simplify the setup process.
- A default embedding model with predefined parameters is available for quick setup.
- SimplAI offers a default vector database selection for optimized performance.
These defaults are optimized for general use cases. You can use them as they are or switch to Custom Settings based on your requirements.
5. Custom Settings
For more control over your knowledge base processing, SimplAI allows customization of various components, including the vector database, embedding model, and re-ranking parameters.
Custom Settings Overview:
- Vector Database:
  - In the custom settings of the Create Knowledge Base section, select Vector DB.
  - Choose from the available vector databases based on performance, scalability, and your specific requirements.
- Custom Embedding Model:
  - In the Create Knowledge Base section, select Deployed Embedding Model.
  - Choose the embedding model that best fits your data type and use case (e.g., BERT, GPT-3).
  - Configure parameters such as the similarity threshold and the number of top K results.
  - Adjust the settings to fine-tune the embedding process for improved accuracy and relevance.
- Re-Ranking:
  - Access the Re-Ranking tool within the Knowledge Base section.
  - Choose the re-ranking model that aligns with your data retrieval goals.
  - Define parameters such as the number of top K results to re-rank.
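To make the similarity threshold, top K, and re-ranking parameters concrete, here is a minimal sketch of how such settings typically interact in a two-stage retrieval flow. It is an illustration under stated assumptions, not SimplAI's implementation: `Candidate`, `retrieve`, `rerank`, and `overlap_score` are hypothetical names, and the term-overlap scorer stands in for a real re-ranking model.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    similarity: float  # score assumed to come from the embedding search stage


def retrieve(candidates: list[Candidate], similarity_threshold: float, top_k: int) -> list[Candidate]:
    """Drop candidates below the threshold, then keep the top_k most similar."""
    kept = [c for c in candidates if c.similarity >= similarity_threshold]
    kept.sort(key=lambda c: c.similarity, reverse=True)
    return kept[:top_k]


def overlap_score(query: str, text: str) -> float:
    """Stand-in for a re-ranking model: fraction of query terms found in the text."""
    q_terms = set(query.lower().split())
    t_terms = set(text.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def rerank(query: str, candidates: list[Candidate], top_k: int) -> list[Candidate]:
    """Re-order first-pass results with the second scorer and keep top_k."""
    ranked = sorted(candidates, key=lambda c: overlap_score(query, c.text), reverse=True)
    return ranked[:top_k]


candidates = [
    Candidate("reset your password from the login page", 0.82),
    Candidate("password rules require twelve characters", 0.78),
    Candidate("how to reset a forgotten password", 0.74),
    Candidate("billing cycles start on the first", 0.31),
]
query = "reset forgotten password"

first_pass = retrieve(candidates, similarity_threshold=0.5, top_k=3)
final = rerank(query, first_pass, top_k=2)
for c in final:
    print(c.text)
```

In this toy data the third candidate has the lowest embedding score of the three that pass the threshold, yet it moves to first place after re-ranking because it matches every query term. That reordering is the effect a re-ranking step is meant to produce.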
Benefits of Custom Settings:
- Tailored Performance: Customize settings to match the specific needs of your data and use cases.
- Enhanced Control: Greater flexibility in configuring how data is processed, stored, and retrieved.
- Optimized Relevance: Fine-tuning parameters helps prioritize the most relevant data, improving the overall effectiveness of your knowledge base.
6. Import a File from a Dataset
- To import a file from a Dataset, go to the Import from Dataset section at the top right of the page.
- Select the Dataset you want to import into your Knowledge Base.
7. Parsing
Parsing helps extract and structure data efficiently.
- Basic Parsing:
  - Access the Basic Parsing tool within the Knowledge Base section.
  - This uses predefined parsers to extract common data types and structures from your dataset.
- Advanced Parsing:
  - Navigate to Advanced Parsing in the Knowledge Base section.
  - Apply the advanced parser to extract all the data types and structures from your dataset.
For more detail, refer to Parsing.
8. Chunk Setting
Chunking is the process of dividing your data into smaller, manageable pieces.
- Access Chunking Tools:
  - Navigate to the Knowledge Base section.
  - Select Chunking Setting.
- Choose a Chunking Method:
  - Automatic Chunking: Select automatic chunking to let SimplAI divide your data based on predefined settings.
  - Manual Chunking: Choose manual chunking to customize how your data is divided. Select from different splitter types and configure parameters to achieve the best chunking for your use case.
    - Recursive character text splitter: Recursive Character Text Splitter breaks text down into smaller chunks by recursively attempting to split it using different separators.
    - Sentence splitter: Sentence Splitter divides text into smaller chunks, each containing a set number of complete sentences.
    - Token splitter: Token Splitter divides text into chunks based on a token count.
    - Semantic splitter: Semantic Splitter divides text into groups of sentences based on their semantic similarity, so that each group contains sentences that are closely related in meaning.
    - Markdown splitter: Markdown Splitter divides text along its Markdown structure, based on the granularity of the text.
  For a detailed guide to chunking, refer to Chunking Strategy.
- Testing Chunks:
  - On the right side of the screen, you can test the chunked data in the Chunk Preview.
  - Review the chunks that were split from the original data to ensure they meet your needs.
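To show what two of the splitter types described above do, here is a minimal sketch of a recursive character splitter and a sentence splitter. The function names, default separators, and parameters (`chunk_size`, `sentences_per_chunk`) are assumptions for illustration; SimplAI's actual splitters and their settings may differ.

```python
import re


def recursive_split(text: str, chunk_size: int, separators=("\n\n", "\n", " ")) -> list[str]:
    """Split on the coarsest separator first, merging neighbors that fit in
    chunk_size, and recurse with finer separators for oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut by character count.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging while the chunk still fits
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


def sentence_split(text: str, sentences_per_chunk: int) -> list[str]:
    """Group complete sentences into chunks of a fixed sentence count."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


sample = "Alpha beta.\n\nGamma delta epsilon zeta."
print(recursive_split(sample, chunk_size=20))
print(sentence_split("One. Two! Three? Four.", sentences_per_chunk=2))
```

The recursive splitter keeps paragraphs intact where they fit and only falls back to word-level splits for the oversized paragraph, which is the kind of result worth checking for in the Chunk Preview.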
Retrieval Testing
Test the performance of your knowledge base to ensure it returns relevant results.
1. Configure Retrieval Tests
Go to Retrieval Testing in your Knowledge Base.
2. Run Tests
- Execute retrieval tests to see how well your knowledge base returns relevant results.
- Analyze the test outcomes and refine your processing steps if necessary.
For more details, refer to Retrieval Testing.
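One common way to quantify "how well your knowledge base returns relevant results" is a hit rate at K: the share of test queries whose expected chunk appears in the top K results. The sketch below assumes a hypothetical `keyword_retrieve` function and a tiny in-memory set of chunks; it is not SimplAI's Retrieval Testing tool, only an illustration of the idea.

```python
def hit_rate_at_k(test_cases, retrieve, k: int) -> float:
    """Fraction of (query, expected_chunk_id) pairs whose expected chunk
    appears in the top-k results returned by `retrieve`."""
    if not test_cases:
        return 0.0
    hits = sum(1 for query, expected_id in test_cases if expected_id in retrieve(query, k))
    return hits / len(test_cases)


# Hypothetical knowledge base content, keyed by chunk id.
chunks = {
    "c1": "refunds are issued within five business days",
    "c2": "shipping is free above fifty dollars",
    "c3": "support is available on weekdays",
}


def keyword_retrieve(query: str, k: int) -> list[str]:
    """Toy retriever (assumption): rank chunks by number of shared words."""
    q_terms = set(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda cid: len(q_terms & set(chunks[cid].split())),
        reverse=True,
    )
    return ranked[:k]


test_cases = [
    ("when are refunds issued", "c1"),
    ("is shipping free", "c2"),
    ("contact hours", "c3"),  # shares no words with any chunk
]

print(hit_rate_at_k(test_cases, keyword_retrieve, k=1))
print(hit_rate_at_k(test_cases, keyword_retrieve, k=3))
```

A query that only succeeds at a large K, like the last test case here, is a signal to revisit chunking, the embedding model, or re-ranking settings before relying on the knowledge base.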
By following these steps, you can process and manage your knowledge base within SimplAI to support high-quality data retrieval and improved AI application performance.