Data Sources
Data Sources, as the name suggests, define sources of information that can be indexed and injected into prompts.
Create New Data Sources
To view the list of data sources or create a new data source, click on “Data Sources” under “Knowledge” in the left side menu.
Click “Create” in the top navigation bar to create a new data source.
To edit a source, click the “Edit” action next to the source item in the list.
Data Source Types
The following source types are currently supported.
- API
  - Extract data from a general API.
- Document
  - Extract data from a document. Document types include CSV, Word, PDF, Text, PowerPoint, Excel, EPUB, HTML, Outlook Item Message Format, Markdown, XML, JSON, and Rich Text Format (RTF).
- Feature Store
  - Look up data from an Online Feature Store. Feast, Vertex, and Anaml are currently supported.
- SQL
  - Extract data from relational database tables.
- Web Crawler
  - Crawl a website to extract data given a Scraping Spec.
Scraping Spec
To scrape specific items from a website crawl, you can define a Scraping Spec. A Scraping Spec is a schema document that describes the structure of the information you want and the selector for each item. Complex nested structures can be defined visually. For example, a visual definition of a product page is equivalent to the following, with each item's selector given in its “title” field:
{
  "type": "object",
  "properties": {
    "product_title": {
      "type": "string",
      "title": "[data-content-id=product-title]"
    },
    "product_description": {
      "type": "string",
      "title": "[data-content-id=product-short-description]"
    },
    "product_features": {
      "type": "array",
      "items": {
        "type": "string"
      },
      "title": ".EE,[class^=styles_assistiveText]"
    },
    "stock_status": {
      "type": "string",
      "title": "[data-content-id=stock-status]"
    },
    "stock_description": {
      "type": "string",
      "title": "[data-content-id=stock-description]"
    }
  },
  "required": [],
  "title": ".product-page"
}
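To check which selector is attached to which item before running a crawl, it can help to flatten a spec into a list of paths and selectors. The following is a minimal sketch that uses only the Python standard library. It assumes, based on the example above, that the “title” field of each node carries its selector. The helper name `collect_selectors` and the `$`-style path notation are hypothetical and not part of the product. The spec is shortened to three selectors for brevity.

```python
import json


def collect_selectors(spec, path="$"):
    """Return {path: selector} for every node in the spec that has a "title".

    Hypothetical helper for inspecting a spec; it does not crawl anything.
    """
    found = {}
    selector = spec.get("title")
    if selector is not None:
        found[path] = selector
    # Object nodes keep their children under "properties".
    for name, child in spec.get("properties", {}).items():
        found.update(collect_selectors(child, f"{path}.{name}"))
    # Array nodes describe their element type under "items".
    items = spec.get("items")
    if isinstance(items, dict):
        found.update(collect_selectors(items, f"{path}[]"))
    return found


# Shortened copy of the example spec (assumption: "title" holds the selector).
spec = json.loads("""
{
  "type": "object",
  "title": ".product-page",
  "properties": {
    "product_title": {
      "type": "string",
      "title": "[data-content-id=product-title]"
    },
    "product_features": {
      "type": "array",
      "items": {"type": "string"},
      "title": ".EE,[class^=styles_assistiveText]"
    }
  },
  "required": []
}
""")

selectors = collect_selectors(spec)
for path, selector in selectors.items():
    print(f"{path}: {selector}")
```

Listing the selectors this way makes it easier to spot an item that has no selector or a nested item attached to the wrong parent.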