Company Research Workflow
Overview
v2.45.0 Feature: Company Research is a Celery-orchestrated workflow that chains multiple SpiderIQ services together for comprehensive company intelligence:
Beyond Lead Generation: While orchestrated campaigns focus on lead capture, Company Research provides deep enrichment including registry data, social profiles, and company validation.
- SpiderMaps - Discover businesses from Google Maps
- SpiderSite - Crawl websites for contact info and company details
- SpiderCompanyData - Enrich with official registry data (US SEC, UK Companies House, EU VIES)
- Social Enrichment - Find Instagram, Facebook, LinkedIn profiles
- SpiderVerify - Verify extracted email addresses
One workflow, complete company intelligence. Submit locations or domains and receive enriched company profiles with verified contacts, registry data, and social presence.
How It Works
The Pipeline
| Step | Service | What Happens |
|---|---|---|
| 1 | SpiderMaps | Searches Google Maps for businesses (if using locations input) |
| 2 | Domain Filter | Removes social media, review sites, directories |
| 3 | SpiderSite | Crawls each business website for contacts |
| 4 | SpiderCompanyData | Fetches registry data (US/UK/EU) |
| 5 | Social Enrichment | Discovers Instagram, Facebook, LinkedIn profiles |
| 6 | SpiderVerify | Verifies each email via SMTP |
| 7 | Aggregate | Combines all data per company |
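Conceptually, the pipeline above is a sequential chain of stage functions, each enriching the company record produced by the previous one. A minimal sketch (stage names mirror the table; the stubs below are illustrative, not SpiderIQ internals):

```python
# Illustrative sketch of the pipeline's sequential stages.
# Each stage takes a company dict and returns an enriched copy;
# the real workflow runs these as chained Celery tasks.

def spidersite(company):
    # Crawl the website for contacts (stubbed here)
    return {**company, "emails_found": [f"info@{company['domain']}"]}

def spidercompanydata(company):
    # Attach official registry data (stubbed here)
    return {**company, "registry_data": {"status": "active"}}

def spiderverify(company):
    # Verify each extracted email (stubbed here)
    verified = [{"email": e, "status": "valid"} for e in company["emails_found"]]
    return {**company, "emails_verified": verified}

def run_pipeline(company):
    # Aggregate step: each stage folds its results into one profile
    for stage in (spidersite, spidercompanydata, spiderverify):
        company = stage(company)
    return company

profile = run_pipeline({"name": "Acme Software Ltd", "domain": "acme-software.co.uk"})
```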
Input Types
Company Research supports two input methods:
| Input Type | Description | Use Case |
|---|---|---|
| locations | List of locations to search | Discover new companies in target areas |
| domains | List of domains to research | Enrich known company domains directly |
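A submission uses one input type or the other. A small payload builder (hypothetical helper; it assumes locations and domains are mutually exclusive, which matches the two submission examples below):

```python
# Hypothetical helper: build a submission payload from exactly one input type.
def build_payload(name, locations=None, domains=None, config=None):
    if bool(locations) == bool(domains):
        raise ValueError("Provide exactly one of locations or domains")
    payload = {"name": name, "config": config or {}}
    if locations:
        payload["locations"] = locations   # triggers SpiderMaps discovery
    else:
        payload["domains"] = domains       # skips SpiderMaps entirely
    return payload
```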
Submitting Research Jobs
Location-Based Discovery
- cURL
- Python
- JavaScript
curl -X POST https://spideriq.ai/api/v1/company-research/submit \
-H "Authorization: Bearer <your_token>" \
-H "Content-Type: application/json" \
-d '{
"name": "UK Tech Companies Research",
"locations": [
{"city": "London", "country": "GB"},
{"city": "Manchester", "country": "GB"}
],
"search_query": "software companies",
"config": {
"spidersite": {
"enabled": true,
"max_pages": 10,
"extract_company_info": true
},
"spidercompanydata": {
"enabled": true,
"include_officers": true,
"include_financials": false
},
"social_enrichment": {
"enabled": true,
"platforms": ["instagram", "linkedin"]
},
"spiderverify": {
"enabled": true,
"max_emails_per_business": 5
}
}
}'
import requests
response = requests.post(
"https://spideriq.ai/api/v1/company-research/submit",
headers={"Authorization": "Bearer <your_token>"},
json={
"name": "UK Tech Companies Research",
"locations": [
{"city": "London", "country": "GB"},
{"city": "Manchester", "country": "GB"}
],
"search_query": "software companies",
"config": {
"spidersite": {
"enabled": True,
"max_pages": 10,
"extract_company_info": True
},
"spidercompanydata": {
"enabled": True,
"include_officers": True,
"include_financials": False
},
"social_enrichment": {
"enabled": True,
"platforms": ["instagram", "linkedin"]
},
"spiderverify": {
"enabled": True,
"max_emails_per_business": 5
}
}
}
)
result = response.json()
print(f"Research ID: {result['research_id']}")
print(f"Status: {result['status']}")
const response = await fetch(
'https://spideriq.ai/api/v1/company-research/submit',
{
method: 'POST',
headers: {
'Authorization': 'Bearer <your_token>',
'Content-Type': 'application/json'
},
body: JSON.stringify({
name: 'UK Tech Companies Research',
locations: [
{ city: 'London', country: 'GB' },
{ city: 'Manchester', country: 'GB' }
],
search_query: 'software companies',
config: {
spidersite: {
enabled: true,
max_pages: 10,
extract_company_info: true
},
spidercompanydata: {
enabled: true,
include_officers: true,
include_financials: false
},
social_enrichment: {
enabled: true,
platforms: ['instagram', 'linkedin']
},
spiderverify: {
enabled: true,
max_emails_per_business: 5
}
}
})
}
);
const result = await response.json();
console.log(`Research ID: ${result.research_id}`);
Response:
{
"success": true,
"research_id": "res_uk_tech_20260215_abc123",
"status": "queued",
"input_type": "locations",
"celery_workflow_id": "abc123-def456-ghi789",
"created_at": "2026-02-15T10:30:00Z"
}
Domain-Based Research
Skip SpiderMaps and research specific domains directly:
- cURL
- Python
curl -X POST https://spideriq.ai/api/v1/company-research/submit \
-H "Authorization: Bearer <your_token>" \
-H "Content-Type: application/json" \
-d '{
"name": "Competitor Analysis",
"domains": [
"acme-corp.com",
"competitor-inc.co.uk",
"rival-software.de"
],
"config": {
"spidersite": {
"enabled": true,
"extract_company_info": true,
"extract_team": true
},
"spidercompanydata": {
"enabled": true,
"include_officers": true,
"include_financials": true
}
}
}'
response = requests.post(
"https://spideriq.ai/api/v1/company-research/submit",
headers={"Authorization": "Bearer <your_token>"},
json={
"name": "Competitor Analysis",
"domains": [
"acme-corp.com",
"competitor-inc.co.uk",
"rival-software.de"
],
"config": {
"spidersite": {
"enabled": True,
"extract_company_info": True,
"extract_team": True
},
"spidercompanydata": {
"enabled": True,
"include_officers": True,
"include_financials": True
}
}
}
)
Configuration Reference
SpiderSite Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | true | Enable website crawling |
| max_pages | integer | 10 | Pages to crawl per site (1-50) |
| crawl_strategy | string | "bestfirst" | bestfirst, bfs, or dfs |
| extract_company_info | boolean | false | Extract company info with AI |
| extract_team | boolean | false | Extract team members |
| timeout | integer | 30 | Crawl timeout in seconds |
SpiderCompanyData Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | false | Enable registry lookup |
| countries_filter | array | null | Limit to specific countries (e.g., ["US", "GB"]) |
| include_officers | boolean | false | Fetch directors/officers |
| include_financials | boolean | false | Fetch financial data (10-K, UK accounts) |
| include_vat_validation | boolean | false | Validate EU VAT numbers |
| match_threshold | float | 0.7 | Name match confidence threshold |
Supported Countries: US (SEC EDGAR), GB (Companies House), EU (VIES VAT validation)
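For example, restricting registry lookups to the US and UK registries with a stricter name-match threshold might look like this (parameter names and defaults per the table above; the 0.85 threshold is an illustrative choice):

```python
# Example SpiderCompanyData config: US/UK registries only,
# with a name-match threshold stricter than the 0.7 default.
spidercompanydata_config = {
    "enabled": True,
    "countries_filter": ["US", "GB"],  # skip registries outside these countries
    "include_officers": True,
    "include_financials": False,
    "match_threshold": 0.85,           # require higher name-match confidence
}
```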
Social Enrichment Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | false | Enable social profile discovery |
| platforms | array | ["instagram", "facebook", "linkedin"] | Platforms to search |
SpiderVerify Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | true | Enable email verification |
| max_emails_per_business | integer | 10 | Max emails to verify (1-50) |
| check_gravatar | boolean | false | Check for Gravatar images |
Monitoring Progress
Status Endpoint
- cURL
- Python
curl https://spideriq.ai/api/v1/company-research/{research_id}/status \
-H "Authorization: Bearer <your_token>"
response = requests.get(
f"https://spideriq.ai/api/v1/company-research/{research_id}/status",
headers={"Authorization": "Bearer <your_token>"}
)
status = response.json()
print(f"Status: {status['status']}")
print(f"Stage: {status['current_stage']}")
print(f"Companies found: {status['progress']['companies_found']}")
print(f"Emails verified: {status['progress']['emails_verified']}")
Response:
{
"success": true,
"research_id": "res_uk_tech_20260215_abc123",
"status": "processing",
"current_stage": "spidercompanydata",
"progress": {
"companies_found": 45,
"companies_processed": 32,
"emails_found": 156,
"emails_verified": 89
},
"started_at": "2026-02-15T10:30:15Z"
}
Progress Fields
| Field | Description |
|---|---|
| companies_found | Total companies discovered |
| companies_processed | Companies through all stages |
| emails_found | Total emails extracted |
| emails_verified | Emails verified via SMTP |
Job Lifecycle
| Status | Description |
|---|---|
| queued | Job waiting to start |
| processing | Actively running pipeline stages |
| paused | Temporarily paused (can be resumed) |
| completed | All stages finished successfully |
| failed | Pipeline failed (check error details) |
| cancelled | Job was cancelled |
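Only the last three statuses are terminal, so a polling loop can stop as soon as it sees any of them. A small helper (hypothetical, mirroring the table above):

```python
# Terminal statuses per the job lifecycle table: no further progress is possible.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

def is_terminal(status):
    # True once a polling loop should stop checking this job
    return status in TERMINAL_STATUSES
```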
Getting Results
Results Endpoint
- cURL
- Python
curl https://spideriq.ai/api/v1/company-research/{research_id}/results \
-H "Authorization: Bearer <your_token>"
response = requests.get(
f"https://spideriq.ai/api/v1/company-research/{research_id}/results",
headers={"Authorization": "Bearer <your_token>"}
)
results = response.json()
for company in results['companies']:
print(f"\n=== {company['name']} ===")
print(f"Domain: {company['domain']}")
# Registry data
if company.get('registry_data'):
reg = company['registry_data']
print(f"Registration: {reg.get('registration_number')}")
print(f"Status: {reg.get('status')}")
# Verified emails
for email in company.get('emails_verified', []):
print(f"Email: {email['email']} ({email['status']})")
Response Structure:
{
"success": true,
"research_id": "res_uk_tech_20260215_abc123",
"status": "completed",
"progress": {
"companies_found": 45,
"companies_processed": 45,
"emails_found": 156,
"emails_verified": 142
},
"companies": [
{
"company_id": "comp_abc123",
"name": "Acme Software Ltd",
"domain": "acme-software.co.uk",
"website_data": {
"pages_crawled": 8,
"industry": "Software Development",
"key_services": ["Custom Software", "Cloud Solutions"],
"company_size": "50-200 employees"
},
"registry_data": {
"source": "uk_companies_house",
"registration_number": "12345678",
"legal_form": "Private Limited Company",
"status": "active",
"incorporation_date": "2015-03-21",
"registered_address": {
"line1": "123 Tech Street",
"city": "London",
"postcode": "EC1A 1BB"
},
"sic_codes": ["62012"],
"officers": [
{"name": "John Smith", "role": "director", "appointed": "2015-03-21"}
]
},
"social_profiles": {
"linkedin": "https://linkedin.com/company/acme-software",
"instagram": "@acmesoftware"
},
"emails_found": ["info@acme-software.co.uk", "sales@acme-software.co.uk"],
"emails_verified": [
{
"email": "info@acme-software.co.uk",
"status": "valid",
"score": 95,
"is_deliverable": true
}
],
"detected_country": "GB",
"country_detection_source": "domain_tld"
}
],
"summary": {
"total_companies": 45,
"with_registry_data": 38,
"with_social_profiles": 41,
"valid_emails": 89
},
"completed_at": "2026-02-15T11:45:30Z"
}
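Results nest registry, social, and verification data per company; for spreadsheet use you may want to flatten them. A sketch that keeps one row per company with its highest-scoring verified email (field names taken from the response structure above):

```python
import csv
import io

def companies_to_csv(companies):
    # Flatten nested company records into CSV rows:
    # one row per company, with the best-scoring verified email.
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["name", "domain", "registration_number", "best_email"]
    )
    writer.writeheader()
    for c in companies:
        verified = c.get("emails_verified", [])
        best = max(verified, key=lambda e: e.get("score", 0))["email"] if verified else ""
        writer.writerow({
            "name": c["name"],
            "domain": c["domain"],
            "registration_number": c.get("registry_data", {}).get("registration_number", ""),
            "best_email": best,
        })
    return buf.getvalue()
```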
Wait for Completion (Blocking)
For synchronous workflows, use the blocking endpoint:
- cURL
- Python
# Wait up to 5 minutes for completion
curl "https://spideriq.ai/api/v1/company-research/{research_id}/wait?timeout=300" \
-H "Authorization: Bearer <your_token>"
response = requests.get(
f"https://spideriq.ai/api/v1/company-research/{research_id}/wait",
params={"timeout": 300, "poll_interval": 5},
headers={"Authorization": "Bearer <your_token>"}
)
result = response.json()
if result['completed']:
print(f"Research complete! Found {len(result['companies'])} companies")
else:
print(f"Timeout - status: {result['status']}")
Managing Research Jobs
List All Jobs
curl "https://spideriq.ai/api/v1/company-research?status=processing&limit=20" \
-H "Authorization: Bearer <your_token>"
Pause a Job
curl -X POST https://spideriq.ai/api/v1/company-research/{research_id}/pause \
-H "Authorization: Bearer <your_token>"
Resume a Job
curl -X POST https://spideriq.ai/api/v1/company-research/{research_id}/resume \
-H "Authorization: Bearer <your_token>"
Cancel a Job
curl -X POST https://spideriq.ai/api/v1/company-research/{research_id}/cancel \
-H "Authorization: Bearer <your_token>"
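The pause, resume, and cancel endpoints share one URL pattern, so a small Python helper can build the target for any action (a sketch; endpoints as documented above, with the URL then POSTed using your bearer token):

```python
# Hypothetical helper: build the management endpoint URL for a research job.
# All three actions are POST requests to the same URL pattern.
def management_url(research_id, action, base="https://spideriq.ai/api/v1"):
    if action not in ("pause", "resume", "cancel"):
        raise ValueError(f"Unknown action: {action}")
    return f"{base}/company-research/{research_id}/{action}"
```

Usage: POST to the returned URL with your Authorization header, e.g. requests.post(management_url(research_id, "pause"), headers=HEADERS).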
Complete Python Example
import requests
import time
# Configuration
API_URL = "https://spideriq.ai/api/v1"
TOKEN = "your_token_here"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}
# 1. Submit Research Job
print("Submitting research job...")
submit_response = requests.post(
f"{API_URL}/company-research/submit",
headers=HEADERS,
json={
"name": "UK SaaS Company Research",
"locations": [
{"city": "London", "country": "GB"},
{"city": "Edinburgh", "country": "GB"}
],
"search_query": "saas software companies",
"config": {
"spidersite": {
"enabled": True,
"max_pages": 10,
"extract_company_info": True
},
"spidercompanydata": {
"enabled": True,
"include_officers": True
},
"social_enrichment": {
"enabled": True,
"platforms": ["linkedin", "instagram"]
},
"spiderverify": {
"enabled": True,
"max_emails_per_business": 5
}
}
}
)
research = submit_response.json()
research_id = research['research_id']
print(f"Research ID: {research_id}")
# 2. Monitor Progress
print("\nMonitoring progress...")
while True:
status_response = requests.get(
f"{API_URL}/company-research/{research_id}/status",
headers=HEADERS
)
status = status_response.json()
progress = status['progress']
print(f"Status: {status['status']} | "
f"Stage: {status.get('current_stage', 'N/A')} | "
f"Companies: {progress['companies_processed']}/{progress['companies_found']} | "
f"Emails: {progress['emails_verified']}")
if status['status'] in ('completed', 'failed', 'cancelled'):
break
time.sleep(10)
# 3. Get Results
print("\nFetching results...")
results_response = requests.get(
f"{API_URL}/company-research/{research_id}/results",
headers=HEADERS
)
results = results_response.json()
# 4. Process Results
print(f"\n=== Research Complete ===")
print(f"Total companies: {results['summary']['total_companies']}")
print(f"With registry data: {results['summary']['with_registry_data']}")
print(f"Valid emails: {results['summary']['valid_emails']}")
for company in results['companies'][:5]: # First 5
print(f"\n{company['name']}")
print(f" Domain: {company['domain']}")
if company.get('registry_data'):
reg = company['registry_data']
print(f" Reg #: {reg.get('registration_number')}")
print(f" Status: {reg.get('status')}")
if company.get('emails_verified'):
for email in company['emails_verified'][:2]:
print(f" Email: {email['email']} ({email['status']})")
Best Practices
Start with Domains: If you have a list of target companies, use domain-based input to skip discovery and get faster results.
Enable SpiderCompanyData: Registry data adds significant value for B2B research: company status, officers, and financials.
Country Detection: SpiderCompanyData automatically detects country from domain TLD and website content. Use countries_filter to restrict to specific registries.
Rate Limiting: Research jobs respect system capacity. Large jobs may queue during peak times.
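Because large jobs may sit in the queue for a while, polling the status endpoint with a capped exponential backoff is gentler than a fixed short interval. A sketch (the base/cap values are illustrative, not documented limits):

```python
# Illustrative backoff schedule for status polling:
# yields 5, 10, 20, 40, 60, 60, ... seconds between checks.
def backoff_schedule(base=5, cap=60, factor=2):
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, cap)
```

Combine with the status endpoint: sleep for each yielded interval between polls, and stop once the job reaches a terminal status.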
Recommended Settings for B2B Research
{
"config": {
"spidersite": {
"enabled": true,
"max_pages": 10,
"extract_company_info": true
},
"spidercompanydata": {
"enabled": true,
"include_officers": true,
"include_financials": false
},
"social_enrichment": {
"enabled": true,
"platforms": ["linkedin"]
},
"spiderverify": {
"enabled": true,
"max_emails_per_business": 3
}
}
}
Comparison: Campaigns vs Research
| Feature | Orchestrated Campaigns | Company Research |
|---|---|---|
| Input | Locations only | Locations or Domains |
| Control | Manual /next calls | Fully automated |
| Registry Data | No | Yes (US/UK/EU) |
| Social Enrichment | No | Yes |
| Pause/Resume | Yes | Yes |
| Best For | Lead generation loops | Deep company intelligence |