Process Unstructured Files in SimplAI

Once you have created a Knowledge Base, the next step is to import data into it. For unstructured files, SimplAI provides several parsing and chunking options for extracting and organizing text from your documents.

Steps to Process Unstructured Data

1. Navigate to the Knowledge Base Section

  • Log in to SimplAI.
  • Go to the Knowledge Base section and select the Knowledge Base where you want to import data.
  • Ensure the selected Knowledge Base type is Document-Search.

2. Click on Import Data

  • In the selected Knowledge Base, click on Import Data at the top right of the page.

3. Select Dataset Containing Unstructured Files

  • Choose the dataset that contains unstructured files (PDFs, DOCX, TXT, etc.).

  • You can also select multiple files for processing.

4. Select Parsing Method

Parsing is the process of extracting content from unstructured files. SimplAI offers the following parsing methods:

  • Basic Parsing: Uses standard methods to extract text from documents.
  • OCR (Optical Character Recognition): Extracts text from scanned documents or images.
  • LVM (Large Vision Model): Uses AI-powered vision models to parse complex documents.
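
As a rough guide to choosing among the three methods above, the decision can be sketched as a small heuristic. This is only an illustration and not part of SimplAI: the function name, the flags, and the list of image extensions are all assumptions made for the example.

```python
from pathlib import Path

# Hypothetical labels mirroring the three parsing methods listed above.
BASIC, OCR, LVM = "basic", "ocr", "lvm"

# Assumed set of image extensions; adjust it to match your own datasets.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tiff", ".bmp"}


def suggest_parsing_method(filename: str,
                           is_scanned: bool = False,
                           has_complex_layout: bool = False) -> str:
    """Suggest a parsing method for one file (illustrative heuristic only)."""
    suffix = Path(filename).suffix.lower()
    # Complex documents are the case the guide associates with LVM.
    if has_complex_layout:
        return LVM
    # Scanned documents and images need text recognition.
    if is_scanned or suffix in IMAGE_EXTENSIONS:
        return OCR
    # Everything else falls back to Basic Parsing.
    return BASIC


if __name__ == "__main__":
    for name in ("notes.txt", "scan.PNG", "report.docx"):
        print(name, "->", suggest_parsing_method(name))
```

In practice you make this choice in the Select Parsing Method step of the UI; the sketch only makes the reasoning explicit.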

Using LVM for Parsing

If you choose LVM, follow these steps:

  • Select the vision model to use.

  • Provide the model API key for authentication.

  • Enter specific instructions for parsing the data.

5. Preview Parsed Data

  • After parsing, you can preview the extracted text using Text Preview.
  • Check that the extracted text is complete and accurate before proceeding.

6. Chunking the Data

After parsing, the next step is chunking, which divides the parsed text into smaller segments (chunks) that are easier to process.

Chunking Methods:

  • Automatic Chunking: Uses predefined rules to split data automatically.
  • Manual Chunking: Allows you to define custom chunking strategies.

For more details, refer to the Chunking User Guide.
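
To make the idea of chunking concrete, here is a minimal sketch of one common strategy: fixed-size chunks with a small overlap between neighbours. It does not describe how SimplAI's Automatic or Manual Chunking works internally. The function name and the `chunk_size` and `overlap` parameters are assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into character chunks of at most chunk_size,
    where each chunk repeats the last `overlap` characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

The overlap keeps a little shared context across chunk boundaries, so a sentence cut at a boundary still appears partly in both neighbouring chunks.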

7. Finalize Chunking Strategy

  • Review the chunked data preview.
  • If satisfied, click Confirm and Proceed to start the ingestion process.

8. Ingestion Process Status

Once ingestion begins, it progresses through three statuses:

  • In Queue: Waiting for processing to start.
  • Processing: Data is being parsed and chunked.
  • Completed: Ingestion has finished successfully.
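
The three statuses form a simple linear progression. If you track ingestion jobs in your own scripts, one way to model them is shown below. This is a sketch only: the enum, the helper names, and the assumption that statuses only move forward are illustrative and not taken from a SimplAI API.

```python
from enum import Enum
from typing import Optional


class IngestionStatus(Enum):
    """The three statuses shown in the UI (values mirror the displayed labels)."""
    IN_QUEUE = "In Queue"
    PROCESSING = "Processing"
    COMPLETED = "Completed"


# Assumed forward-only order: In Queue -> Processing -> Completed.
_NEXT = {
    IngestionStatus.IN_QUEUE: IngestionStatus.PROCESSING,
    IngestionStatus.PROCESSING: IngestionStatus.COMPLETED,
}


def next_status(current: IngestionStatus) -> Optional[IngestionStatus]:
    """Return the status that follows `current`, or None once completed."""
    return _NEXT.get(current)


def is_finished(status: IngestionStatus) -> bool:
    """True only when ingestion has completed."""
    return status is IngestionStatus.COMPLETED


if __name__ == "__main__":
    status: Optional[IngestionStatus] = IngestionStatus.IN_QUEUE
    while status is not None:
        print(status.value)
        status = next_status(status)
```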

9. Next Steps After Data Ingestion

After the data ingestion is completed, you can:

  • Perform Data Retrieval Testing to ensure the Knowledge Base returns relevant results.
  • Use the Knowledge Base directly in SimplAI Studio to build AI-powered applications.
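
One simple way to quantify retrieval testing is a hit rate: for a set of test queries, how often does the expected chunk appear among the top results? The sketch below assumes you can obtain a ranked list of chunk IDs for each query. The `retrieve` callable and `fake_retrieve` are hypothetical stand-ins, not SimplAI functions.

```python
from typing import Callable, List, Sequence, Tuple


def hit_rate_at_k(test_cases: Sequence[Tuple[str, str]],
                  retrieve: Callable[[str], List[str]],
                  k: int = 3) -> float:
    """Fraction of (query, expected_chunk_id) pairs whose expected chunk
    appears among the first k retrieved chunk IDs."""
    if not test_cases:
        return 0.0
    hits = 0
    for query, expected_id in test_cases:
        top_k = retrieve(query)[:k]
        if expected_id in top_k:
            hits += 1
    return hits / len(test_cases)


# Hypothetical stand-in for a real retrieval call: canned, ranked chunk IDs.
_CANNED_RESULTS = {
    "refund policy": ["c1", "c4", "c7"],
    "shipping time": ["c2", "c3", "c5"],
}


def fake_retrieve(query: str) -> List[str]:
    return _CANNED_RESULTS.get(query, [])


if __name__ == "__main__":
    cases = [("refund policy", "c1"), ("shipping time", "c9")]
    print(hit_rate_at_k(cases, fake_retrieve, k=2))
```

A low hit rate is a hint to revisit the parsing method or chunking strategy before building on the Knowledge Base.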

By following these steps, you can import and process unstructured data in your Knowledge Base so that it is organized and ready for use in AI applications.