n8nflow.net logo

Extract Structured Data from D&B Company Reports with GPT-4o

by Robert Breenβ€’Updated: Last update 8 days agoβ€’Source: n8n.io
Loading workflow viewer...

Getting Started

Pull a Dun & Bradstreet Business Information Report (PDF) by DUNS, convert the response into a binary PDF file , extract readable text, and use OpenAI to return a clean, flat JSON with only the key fields you care about (e.g., report date, Paydex, viability score, credit limit). Includes Sticky Notes for quick setup help and guidance.


βœ… What this template does

  • Requests a D &B report (PDF) for a specific DUNS via HTTP
  • Converts the API response into a binary PDF file
  • Extracts the text from the PDF for analysis
  • Uses OpenAI with a Structured Output Parser to return a flat JSON
  • Designed to be extended to Sheets, databases, or CRMs

🧩 How it works (node-by-node)

  1. Manual Trigger β€” Runs the workflow on demand ("When clicking 'Execute workflow'").
  2. D &B Report (HTTP Request) β€” Calls the D&B Reports API for a Business Information Report (PDF).
  3. Convert to PDF File (Convert to File) β€” Turns the D&B response payload into a binary PDF.
  4. Extract Binary (Extract from File) β€” Extracts text content from the PDF.
  5. OpenAI Chat Model β€” Provides the language model context for the analyzer.
  6. Analyze PDF (AI Agent) β€” Reads the extracted text and applies strict rules for a flat JSON output.
  7. Structured Output (AI Structured Output Parser) β€” Enforces a schema and validates/auto-fixes the JSON shape.
  8. (Optional) Get Bearer Token (HTTP Request) β€” Template guidance for OAuth token retrieval (shown as disabled; included for reference if you prefer Bearer flows).

πŸ› οΈ Setup instructions (from the JSON)

1) D&B Report (HTTP Request)

Put your Authorization: Bearer <token> header inside this credential, not directly in the node.

2) Convert to PDF File (Convert to File)

  • Operation: toBinary
  • Source Property: contents[0].contentObject

This takes the PDF content from the D&B API response and converts it to a binary file for downstream nodes.

3) Extract Binary (Extract from File)

  • Operation: pdf

Produces a text field with the extracted PDF content, ready for AI analysis.

4) OpenAI Model(s)

  • OpenAI Chat Model
  • Model: gpt-4o (as configured in the JSON)
  • Credential: Your stored OpenAI API credential (do not hardcode keys)
  • Wiring:
    • Connect OpenAI Chat Model as ai_languageModel to Analyze PDF
    • Connect another OpenAI Chat Model (also gpt-4o) as ai_languageModel to Structured Output

5) Analyze PDF (AI Agent)

  • Prompt Type: define
  • Text:
    ={{ $json.text }}
  • System Message (rules):
    • You are a precision extractor. Read the provided business report PDF and return only a single flat JSON object with the fields below.
    • No arrays/lists.
    • No prose.
    • If a value is missing, output null.
    • Dates: YYYY-MM-DD.
    • Numbers: plain numerics (no commas or $).
    • Prefer most recent or highest-level overall values if multiple are shown.
    • Never include arrays, nested structures, or text outside of the JSON object.

6) Structured Output (AI Structured Output Parser)

  • JSON Schema Example:

    { "report_date": "", "company_name": "", "duns": "", "dnb_rating_overall": "", "composite_credit_appraisal": "", "viability_score": "", "portfolio_comparison_score": "", "paydex_3mo": "", "paydex_24mo": "", "credit_limit_conservative": "" }

  • Auto Fix: enabled

  • Wiring: Connect as ai_outputParser to Analyze PDF

7) (Optional) Get Bearer Token (HTTP Request) β€” Disabled example

If you prefer fetching tokens dynamically:

  • Auth: Basic Auth (D&B username/password)
  • Method: POST
  • URL: https://plus.dnb.com/v3/token
  • Body Parameters:
    • grant_type = client_credentials
  • Headers:
    • Accept: application/json
  • Downstream usage: Set header Authorization: Bearer {{$json["access_token"]}} in subsequent calls.

In this template, the D&B Report node uses Header Auth credential instead. Use one strategy consistently (credentials are recommended for security).


🧠 Output schema (flat JSON)

The analyzer + parser return a single flat object like:

{
  "report_date": "2024-12-31",
  "company_name": "Example Corp",
  "duns": "123456789",
  "dnb_rating_overall": "5A2",
  "composite_credit_appraisal": "Fair",
  "viability_score": "3",
  "portfolio_comparison_score": "2",
  "paydex_3mo": "80",
  "paydex_24mo": "78",
  "credit_limit_conservative": "25000"
}

πŸ§ͺ Test flow

  1. Click Execute workflow (Manual Trigger).
  2. Confirm D &B Report returns the PDF response.
  3. Check Convert to PDF File for a binary file.
  4. Verify Extract from File produces a text field.
  5. Inspect Analyze PDF β†’ Structured Output for valid JSON.

πŸ” Security notes

  • Do not hardcode tokens in nodes; use Credentials (HTTP Header Auth or Basic Auth).
  • Restrict who can execute the workflow if it's accessible from outside your network.
  • Avoid storing sensitive payloads in logs; mask tokens/headers.

🧩 Customize

  • Map the structured JSON to Google Sheets , Postgres/BigQuery , or a CRM.
  • Extend the schema with additional fields (e.g., number of employees, HQ address) β€” keep it flat.
  • Add validation (Set/IF nodes) to ensure required fields exist before writing downstream.

🩹 Troubleshooting

  • Missing PDF text? Ensure Convert to File source property is contents[0].contentObject.
  • Unauthorized from D &B? Refresh/verify token; confirm Header Auth credential contains Authorization: Bearer <token>.
  • Parser errors? Keep the agent output short and flat; the Structured Output node will auto-fix minor issues.
  • Different DUNS/product? Update the D&B Report URL query params (duns, productId, etc.).

πŸ—’οΈ Sticky Notes (included)

  • Overview: "Fetch D&B Company Report (PDF) β†’ Convert β†’ Extract β†’ Summarize to Structured JSON (n8n)"
  • Setup snippets for Data Blocks (optional) and Auth flow

πŸ“¬ Contact

Need help customizing this (e.g., routing the PDF to Drive, mapping JSON to your CRM, or expanding the schema)?

πŸ“§ [email protected]
πŸ”— https://www.linkedin.com/in/robert-breen-29429625/
🌐 https://ynteractive.com