Data Sources

Data Sources define sources of information that can be indexed and injected into prompts.

Create New Data Sources

To view the list of data sources or create a new data source, click on “Data Sources” under “Knowledge” in the left side menu.

Data Source List

Click “Create” in the top navigation bar to create a new data source.

New Data Source Form

To edit a source, click the “Edit” action next to the source item in the list.

Edit Data Source Form

Data Source Types

The following source types are currently supported.

API
Extract data from a general API.
Document
Extract data from a document. Supported document types include CSV, Word, PDF, Text, PowerPoint, Excel, EPUB, HTML, Outlook Item Message Format, Markdown, XML, JSON, and Rich Text Format (RTF).
Feature Store
Look up data from an online feature store. Feast, Vertex, and Anaml are currently supported.
SQL
Extract data from relational database tables.
Web Crawler
Crawl a website to extract data, given a Scraping Spec.

Scraping Spec

To scrape specific items from a website crawl, you can define a Scraping Spec. A Scraping Spec is a type of schema document that describes the structure of the information you want and the selectors for each item. A complex nested structure can be defined, for example:

Scraping Spec

which is a visual way of defining the following:

{
  "type": "object",
  "properties": {
    "product_title": {
      "type": "string",
      "title": "[data-content-id=product-title]"
    },
    "product_description": {
      "type": "string",
      "title": "[data-content-id=product-short-description]"
    },
    "product_features": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "title": ".EE,[class^=styles_assistiveText]"
    },
    "stock_status": {
      "type": "string",
      "title": "[data-content-id=stock-status]"
    },
    "stock_description": {
      "type": "string",
      "title": "[data-content-id=stock-description]"
    }
  },
  "title": ".product-page",
  "required": []
}
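
To make the shape of the spec easier to see, here is a minimal Python sketch that walks a spec and lists each field alongside its selector. It assumes, based only on the example above, that the "title" key of each node holds the CSS selector for that item and that the top-level "title" selects the enclosing container. The function name list_selectors and the "$" path notation are hypothetical and are not part of the product.

```python
import json

# Abbreviated copy of the spec above. Reading "title" as the CSS selector
# is an assumption inferred from the example, not a documented rule.
SPEC = json.loads("""
{
  "type": "object",
  "properties": {
    "product_title": {
      "type": "string",
      "title": "[data-content-id=product-title]"
    },
    "product_features": {
      "type": "array",
      "items": {"type": "string"},
      "title": ".EE,[class^=styles_assistiveText]"
    }
  },
  "title": ".product-page",
  "required": []
}
""")


def list_selectors(node, path="$"):
    """Return (path, type, selector) tuples for every node that has a selector."""
    rows = []
    if "title" in node:
        rows.append((path, node.get("type"), node["title"]))
    # Object nodes nest further fields under "properties".
    for name, child in node.get("properties", {}).items():
        rows.extend(list_selectors(child, f"{path}.{name}"))
    # Array nodes describe their elements under "items".
    if isinstance(node.get("items"), dict):
        rows.extend(list_selectors(node["items"], f"{path}[]"))
    return rows


ROWS = list_selectors(SPEC)
for row in ROWS:
    print(row)
```

Listing the selectors this way can be a quick check that every field in a nested spec has a selector before the crawl is run.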