Firecrawl Website Content Extractor (n8n Workflow)
This n8n automation workflow uses the Firecrawl API to extract structured data (e.g., quotes and authors) from web pages such as Quotes to Scrape, and it retries when extraction is delayed.
Workflow Overview
Purpose:
- Crawl and extract structured web data using Firecrawl
- Wait for asynchronous scraping to complete
- Retrieve and validate results
- Support retries if content is not ready
Step-by-Step Node Breakdown
1. Manual Trigger
- Node: When clicking "Test workflow"
- Used to manually test or execute the workflow during setup or debugging.
2. Firecrawl Extract API Request
- Node: Extract
- Sends a POST request to https://api.firecrawl.dev/v1/extract
- Payload includes:
  - urls: List of pages to crawl (https://quotes.toscrape.com/*)
  - prompt: "Extract all quotes and their corresponding authors from the website."
  - schema: JSON schema defining the expected structure (quotes[], each with text and author)
- Uses an HTTP Header Auth credential for the Firecrawl API
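As a rough illustration of step 2, the request could be assembled as follows. This is a minimal Python sketch, not the exported configuration of the n8n node: the helper name `build_extract_request`, the `Authorization: Bearer` header form, and the exact JSON Schema layout are assumptions. Only the endpoint, the `urls`, `prompt`, and `schema` fields, and the `quotes[]` items with `text` and `author` come from the description above.

```python
import json

EXTRACT_URL = "https://api.firecrawl.dev/v1/extract"

# JSON Schema describing the expected structure (layout is an assumption).
QUOTES_SCHEMA = {
    "type": "object",
    "properties": {
        "quotes": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "text": {"type": "string"},
                    "author": {"type": "string"},
                },
                "required": ["text", "author"],
            },
        }
    },
    "required": ["quotes"],
}


def build_extract_request(api_key: str) -> dict:
    """Return the parts of the POST request without sending anything."""
    return {
        "method": "POST",
        "url": EXTRACT_URL,
        "headers": {
            # Assumed header format for the HTTP Header Auth credential.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": {
            "urls": ["https://quotes.toscrape.com/*"],
            "prompt": "Extract all quotes and their corresponding authors from the website.",
            "schema": QUOTES_SCHEMA,
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to the extract endpoint.
    print(json.dumps(build_extract_request("YOUR_API_KEY")["body"], indent=2))
```

Keeping the request as plain data makes it easy to compare against the values entered in the Extract node before anything is sent.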
3. Wait for 30 Seconds
- Node: 30 Secs
- Gives Firecrawl time to finish processing in the background
- Prevents hitting the API before results are ready
4. Get Results
- Node: Get Results
- Performs a GET request to the status URL, using {{ $('Extract').item.json.id }} to retrieve the extraction results.
5. Condition Check
- Node: If
- Checks whether the data array is empty (i.e., no results yet)
- If data is empty:
  - Waits 10 more seconds and retries
- If data is available:
  - Passes data to the next step (e.g., processing or storage)
6. Retry Delay
- Node: 10 Seconds
- Waits briefly before sending another GET request to Firecrawl
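Steps 3 to 6 together form a poll-and-retry loop. The sketch below mirrors that control flow in Python under a few assumptions: `fetch_status` is a hypothetical callable standing in for the Get Results request, `max_retries` is an added safety cap that the workflow description does not mention, and `sleep` is injectable only so the loop can be exercised without real delays.

```python
import time
from typing import Callable


def poll_for_results(
    job_id: str,
    fetch_status: Callable[[str], dict],
    initial_wait: float = 30.0,
    retry_wait: float = 10.0,
    max_retries: int = 12,  # assumption: the workflow itself has no stated cap
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Wait, fetch the job status, and retry while `data` is empty."""
    sleep(initial_wait)  # corresponds to the "30 Secs" node
    for attempt in range(max_retries + 1):
        response = fetch_status(job_id)  # corresponds to "Get Results"
        if response.get("data"):  # corresponds to "If": data is not empty
            return response
        if attempt < max_retries:
            sleep(retry_wait)  # corresponds to the "10 Seconds" node
    raise TimeoutError(f"No data for job {job_id} after {max_retries} retries")


if __name__ == "__main__":
    # Fake responses: empty on the first call, populated on the second.
    replies = iter([{"data": []}, {"data": [{"text": "t", "author": "a"}]}])
    result = poll_for_results(
        "example-id",
        fetch_status=lambda _id: next(replies),
        sleep=lambda seconds: None,  # skip real waiting in the demo
    )
    print(result)
```

Without a cap, a job that never returns data would loop forever, which is why the sketch raises `TimeoutError`; whether the real workflow needs such a limit is a design choice left to the reader.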
7. Edit Fields (Optional Output Formatting)
- Node: Edit Fields
- Placeholder to structure or format the extracted results (quotes and authors)
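One possible shape for this formatting step is sketched below in Python. It is only an illustration: the function name `format_quotes`, the output keys `quote` and `author`, and the choice to drop incomplete entries are all assumptions, since the workflow leaves this node as a placeholder.

```python
from typing import Dict, List


def format_quotes(extracted: dict) -> List[Dict[str, str]]:
    """Flatten an object with a `quotes` array into trimmed rows."""
    rows = []
    for item in extracted.get("quotes", []):
        text = str(item.get("text", "")).strip()
        author = str(item.get("author", "")).strip()
        # Assumption: entries missing either field are skipped.
        if text and author:
            rows.append({"quote": text, "author": author})
    return rows


if __name__ == "__main__":
    sample = {"quotes": [{"text": "  Example text. ", "author": "Example Author"}]}
    print(format_quotes(sample))
```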
Sticky Note: Firecrawl Setup Guide
Included as an embedded reference:
- 10% Firecrawl Discount
- Instructions to:
  - Add Firecrawl API credentials in n8n
  - Use the Firecrawl Community Node for self-hosted instances
  - Set up the schema and prompt for targeted data extraction
Key Features
- API-based crawling with schema-structured output
- Smart waiting + retry mechanism
- AI prompt integration for intelligent data parsing
- Flexible for different URLs, prompts, and schemas
Sample Output Schema
{
  "quotes": [
    {
      "text": "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.",
      "author": "Albert Einstein"
    },
    {
      "text": "It is our choices, Harry, that show what we truly are, far more than our abilities.",
      "author": "J.K. Rowling"
    }
  ]
}
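The sample above is an instance of the output rather than the schema itself. To check that a retrieved result has this shape, a small structural check along these lines might help. It is a hand-rolled Python sketch that uses no schema library, and the function name `validate_quotes_output` and its error messages are illustrative assumptions.

```python
from typing import List


def validate_quotes_output(payload: object) -> List[str]:
    """Return a list of problems; an empty list means the shape is valid."""
    if not isinstance(payload, dict):
        return ["top level must be an object"]
    quotes = payload.get("quotes")
    if not isinstance(quotes, list):
        return ['"quotes" must be an array']
    problems = []
    for index, item in enumerate(quotes):
        if not isinstance(item, dict):
            problems.append(f"quotes[{index}] must be an object")
            continue
        for key in ("text", "author"):
            if not isinstance(item.get(key), str):
                problems.append(f"quotes[{index}].{key} must be a string")
    return problems


if __name__ == "__main__":
    sample = {
        "quotes": [
            {
                "text": "It is our choices, Harry, that show what we truly are, "
                        "far more than our abilities.",
                "author": "J.K. Rowling",
            }
        ]
    }
    print(validate_quotes_output(sample))
```

Returning a list of problems instead of a boolean makes it easier to see which entry failed when an extraction comes back partially filled.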