Company Research Workflow
Overview
v2.45.0 Feature: Company Research is a Celery-orchestrated workflow that chains multiple SpiderIQ services together for comprehensive company intelligence:
Beyond Lead Generation: While orchestrated campaigns focus on lead capture, Company Research provides deep enrichment including registry data, social profiles, and company validation.
- SpiderMaps - Discover businesses from Google Maps
- SpiderSite - Crawl websites for contact info and company details
- SpiderCompanyData - Enrich with official registry data (US SEC, UK Companies House, EU VIES)
- Social Enrichment - Find Instagram, Facebook, LinkedIn profiles
- SpiderVerify - Verify extracted email addresses
One workflow, complete company intelligence. Submit locations or domains and receive enriched company profiles with verified contacts, registry data, and social presence.
How It Works
The Pipeline
| Step | Service | What Happens |
|---|---|---|
| 1 | SpiderMaps | Searches Google Maps for businesses (if using locations input) |
| 2 | Domain Filter | Removes social media, review sites, directories |
| 3 | SpiderSite | Crawls each business website for contacts |
| 4 | SpiderCompanyData | Fetches registry data (US/UK/EU) |
| 5 | Social Enrichment | Discovers Instagram, Facebook, LinkedIn profiles |
| 6 | SpiderVerify | Verifies each email via SMTP |
| 7 | Aggregate | Combines all data per company |
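Conceptually, the pipeline above is a sequential chain of stage functions, each enriching the company record produced by the previous one. A minimal sketch (stage names mirror the table; the stubs below are illustrative, not SpiderIQ internals):

```python
# Illustrative sketch of the pipeline's sequential stages.
# Each stage takes a company dict and returns an enriched copy;
# the real workflow runs these as chained Celery tasks.

def spidersite(company):
    # Crawl the website for contacts (stubbed here)
    return {**company, "emails_found": [f"info@{company['domain']}"]}

def spidercompanydata(company):
    # Attach official registry data (stubbed here)
    return {**company, "registry_data": {"status": "active"}}

def spiderverify(company):
    # Verify each extracted email (stubbed here)
    verified = [{"email": e, "status": "valid"} for e in company["emails_found"]]
    return {**company, "emails_verified": verified}

def run_pipeline(company):
    # Aggregate step: each stage folds its results into one profile
    for stage in (spidersite, spidercompanydata, spiderverify):
        company = stage(company)
    return company

profile = run_pipeline({"name": "Acme Software Ltd", "domain": "acme-software.co.uk"})
```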
Input Types
Company Research supports two input methods:
| Input Type | Description | Use Case |
|---|---|---|
| locations | List of locations to search | Discover new companies in target areas |
| domains | List of domains to research | Enrich known company domains directly |
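A submission uses one input type or the other. A small payload builder (hypothetical helper; it assumes locations and domains are mutually exclusive, which matches the two submission examples below):

```python
# Hypothetical helper: build a submission payload from exactly one input type.
def build_payload(name, locations=None, domains=None, config=None):
    if bool(locations) == bool(domains):
        raise ValueError("Provide exactly one of locations or domains")
    payload = {"name": name, "config": config or {}}
    if locations:
        payload["locations"] = locations   # triggers SpiderMaps discovery
    else:
        payload["domains"] = domains       # skips SpiderMaps entirely
    return payload
```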
Submitting Research Jobs
Location-Based Discovery
- cURL
- Python
- JavaScript
curl -X POST https://spideriq.ai/api/v1/company-research/submit \
-H "Authorization: Bearer <your_token>" \
-H "Content-Type: application/json" \
-d '{
"name": "UK Tech Companies Research",
"locations": [
{"city": "London", "country": "GB"},
{"city": "Manchester", "country": "GB"}
],
"search_query": "software companies",
"config": {
"spidersite": {
"enabled": true,
"max_pages": 10,
"extract_company_info": true
},
"spidercompanydata": {
"enabled": true,
"include_officers": true,
"include_financials": false
},
"social_enrichment": {
"enabled": true,
"platforms": ["instagram", "linkedin"]
},
"spiderverify": {
"enabled": true,
"max_emails_per_business": 5
}
}
}'
import requests
response = requests.post(
"https://spideriq.ai/api/v1/company-research/submit",
headers={"Authorization": "Bearer <your_token>"},
json={
"name": "UK Tech Companies Research",
"locations": [
{"city": "London", "country": "GB"},
{"city": "Manchester", "country": "GB"}
],
"search_query": "software companies",
"config": {
"spidersite": {
"enabled": True,
"max_pages": 10,
"extract_company_info": True
},
"spidercompanydata": {
"enabled": True,
"include_officers": True,
"include_financials": False
},
"social_enrichment": {
"enabled": True,
"platforms": ["instagram", "linkedin"]
},
"spiderverify": {
"enabled": True,
"max_emails_per_business": 5
}
}
}
)
result = response.json()
print(f"Research ID: {result['research_id']}")
print(f"Status: {result['status']}")
const response = await fetch(
'https://spideriq.ai/api/v1/company-research/submit',
{
method: 'POST',
headers: {
'Authorization': 'Bearer <your_token>',
'Content-Type': 'application/json'
},
body: JSON.stringify({
name: 'UK Tech Companies Research',
locations: [
{ city: 'London', country: 'GB' },
{ city: 'Manchester', country: 'GB' }
],
search_query: 'software companies',
config: {
spidersite: {
enabled: true,
max_pages: 10,
extract_company_info: true
},
spidercompanydata: {
enabled: true,
include_officers: true,
include_financials: false
},
social_enrichment: {
enabled: true,
platforms: ['instagram', 'linkedin']
},
spiderverify: {
enabled: true,
max_emails_per_business: 5
}
}
})
}
);
const result = await response.json();
console.log(`Research ID: ${result.research_id}`);
Response:
{
"success": true,
"research_id": "res_uk_tech_20260215_abc123",
"status": "queued",
"input_type": "locations",
"celery_workflow_id": "abc123-def456-ghi789",
"created_at": "2026-02-15T10:30:00Z"
}
Domain-Based Research
Skip SpiderMaps and research specific domains directly:
- cURL
- Python
curl -X POST https://spideriq.ai/api/v1/company-research/submit \
-H "Authorization: Bearer <your_token>" \
-H "Content-Type: application/json" \
-d '{
"name": "Competitor Analysis",
"domains": [
"acme-corp.com",
"competitor-inc.co.uk",
"rival-software.de"
],
"config": {
"spidersite": {
"enabled": true,
"extract_company_info": true,
"extract_team": true
},
"spidercompanydata": {
"enabled": true,
"include_officers": true,
"include_financials": true
}
}
}'
response = requests.post(
"https://spideriq.ai/api/v1/company-research/submit",
headers={"Authorization": "Bearer <your_token>"},
json={
"name": "Competitor Analysis",
"domains": [
"acme-corp.com",
"competitor-inc.co.uk",
"rival-software.de"
],
"config": {
"spidersite": {
"enabled": True,
"extract_company_info": True,
"extract_team": True
},
"spidercompanydata": {
"enabled": True,
"include_officers": True,
"include_financials": True
}
}
}
)
Configuration Reference
SpiderSite Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | true | Enable website crawling |
| max_pages | integer | 10 | Pages to crawl per site (1-50) |
| crawl_strategy | string | "bestfirst" | bestfirst, bfs, or dfs |
| extract_company_info | boolean | false | Extract company info with AI |
| extract_team | boolean | false | Extract team members |
| timeout | integer | 30 | Crawl timeout in seconds |
SpiderCompanyData Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | false | Enable registry lookup |
| countries_filter | array | null | Limit to specific countries (e.g., ["US", "GB"]) |
| include_officers | boolean | false | Fetch directors/officers |
| include_financials | boolean | false | Fetch financial data (10-K, UK accounts) |
| include_vat_validation | boolean | false | Validate EU VAT numbers |
| match_threshold | float | 0.7 | Name match confidence threshold |
Supported Countries: US (SEC EDGAR), GB (Companies House), EU (VIES VAT validation)
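For example, restricting registry lookups to the US and UK registries with a stricter name-match threshold might look like this (parameter names and defaults per the table above; the 0.85 threshold is an illustrative choice):

```python
# Example SpiderCompanyData config: US/UK registries only,
# with a name-match threshold stricter than the 0.7 default.
spidercompanydata_config = {
    "enabled": True,
    "countries_filter": ["US", "GB"],  # skip registries outside these countries
    "include_officers": True,
    "include_financials": False,
    "match_threshold": 0.85,           # require higher name-match confidence
}
```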
Social Enrichment Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | false | Enable social profile discovery |
| platforms | array | ["instagram", "facebook", "linkedin"] | Platforms to search |
SpiderVerify Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| enabled | boolean | true | Enable email verification |
| max_emails_per_business | integer | 10 | Max emails to verify (1-50) |
| check_gravatar | boolean | false | Check for Gravatar images |
Monitoring Progress
Status Endpoint
- cURL
- Python
curl https://spideriq.ai/api/v1/company-research/{research_id}/status \
-H "Authorization: Bearer <your_token>"
response = requests.get(
f"https://spideriq.ai/api/v1/company-research/{research_id}/status",
headers={"Authorization": "Bearer <your_token>"}
)
status = response.json()
print(f"Status: {status['status']}")
print(f"Stage: {status['current_stage']}")
print(f"Companies found: {status['progress']['companies_found']}")
print(f"Emails verified: {status['progress']['emails_verified']}")
Response:
{
"success": true,
"research_id": "res_uk_tech_20260215_abc123",
"status": "processing",
"current_stage": "spidercompanydata",
"progress": {
"companies_found": 45,
"companies_processed": 32,
"emails_found": 156,
"emails_verified": 89
},
"started_at": "2026-02-15T10:30:15Z"
}
Progress Fields
| Field | Description |
|---|---|
| companies_found | Total companies discovered |
| companies_processed | Companies through all stages |
| emails_found | Total emails extracted |
| emails_verified | Emails verified via SMTP |
Job Lifecycle
| Status | Description |
|---|---|
| queued | Job waiting to start |
| processing | Actively running pipeline stages |
| paused | Temporarily paused (can be resumed) |
| completed | All stages finished successfully |
| failed | Pipeline failed (check error details) |
| cancelled | Job was cancelled |
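Only the last three statuses are terminal, so a polling loop can stop as soon as it sees any of them. A small helper (hypothetical, mirroring the table above):

```python
# Terminal statuses per the job lifecycle table: no further progress is possible.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}

def is_terminal(status):
    # True once a polling loop should stop checking this job
    return status in TERMINAL_STATUSES
```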
Getting Results
Results Endpoint
- cURL
- Python
curl https://spideriq.ai/api/v1/company-research/{research_id}/results \
-H "Authorization: Bearer <your_token>"
response = requests.get(
f"https://spideriq.ai/api/v1/company-research/{research_id}/results",
headers={"Authorization": "Bearer <your_token>"}
)
results = response.json()
for company in results['companies']:
print(f"\n=== {company['name']} ===")
print(f"Domain: {company['domain']}")
# Registry data
if company.get('registry_data'):
reg = company['registry_data']
print(f"Registration: {reg.get('registration_number')}")
print(f"Status: {reg.get('status')}")
# Verified emails
for email in company.get('emails_verified', []):
print(f"Email: {email['email']} ({email['status']})")
Response Structure:
{
"success": true,
"research_id": "res_uk_tech_20260215_abc123",
"status": "completed",
"progress": {
"companies_found": 45,
"companies_processed": 45,
"emails_found": 156,
"emails_verified": 142
},
"companies": [
{
"company_id": "comp_abc123",
"name": "Acme Software Ltd",
"domain": "acme-software.co.uk",
"website_data": {
"pages_crawled": 8,
"industry": "Software Development",
"key_services": ["Custom Software", "Cloud Solutions"],
"company_size": "50-200 employees"
},
"registry_data": {
"source": "uk_companies_house",
"registration_number": "12345678",
"legal_form": "Private Limited Company",
"status": "active",
"incorporation_date": "2015-03-21",
"registered_address": {
"line1": "123 Tech Street",
"city": "London",
"postcode": "EC1A 1BB"
},
"sic_codes": ["62012"],
"officers": [
{"name": "John Smith", "role": "director", "appointed": "2015-03-21"}
]
},
"social_profiles": {
"linkedin": "https://linkedin.com/company/acme-software",
"instagram": "@acmesoftware"
},
"emails_found": ["info@acme-software.co.uk", "sales@acme-software.co.uk"],
"emails_verified": [
{
"email": "info@acme-software.co.uk",
"status": "valid",
"score": 95,
"is_deliverable": true
}
],
"detected_country": "GB",
"country_detection_source": "domain_tld"
}
],
"summary": {
"total_companies": 45,
"with_registry_data": 38,
"with_social_profiles": 41,
"valid_emails": 89
},
"completed_at": "2026-02-15T11:45:30Z"
}
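Results nest registry, social, and verification data per company; for spreadsheet use you may want to flatten them. A sketch that keeps one row per company with its highest-scoring verified email (field names taken from the response structure above):

```python
import csv
import io

def companies_to_csv(companies):
    # Flatten nested company records into CSV rows:
    # one row per company, with the best-scoring verified email.
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["name", "domain", "registration_number", "best_email"]
    )
    writer.writeheader()
    for c in companies:
        verified = c.get("emails_verified", [])
        best = max(verified, key=lambda e: e.get("score", 0))["email"] if verified else ""
        writer.writerow({
            "name": c["name"],
            "domain": c["domain"],
            "registration_number": c.get("registry_data", {}).get("registration_number", ""),
            "best_email": best,
        })
    return buf.getvalue()
```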
Wait for Completion (Blocking)
For synchronous workflows, use the blocking endpoint:
- cURL
- Python
# Wait up to 5 minutes for completion
curl "https://spideriq.ai/api/v1/company-research/{research_id}/wait?timeout=300" \
-H "Authorization: Bearer <your_token>"
response = requests.get(
f"https://spideriq.ai/api/v1/company-research/{research_id}/wait",
params={"timeout": 300, "poll_interval": 5},
headers={"Authorization": "Bearer <your_token>"}
)
result = response.json()
if result['completed']:
print(f"Research complete! Found {len(result['companies'])} companies")
else:
print(f"Timeout - status: {result['status']}")
Managing Research Jobs
List All Jobs
curl "https://spideriq.ai/api/v1/company-research?status=processing&limit=20" \
-H "Authorization: Bearer <your_token>"
Pause a Job
curl -X POST https://spideriq.ai/api/v1/company-research/{research_id}/pause \
-H "Authorization: Bearer <your_token>"
Resume a Job
curl -X POST https://spideriq.ai/api/v1/company-research/{research_id}/resume \
-H "Authorization: Bearer <your_token>"
Cancel a Job
curl -X POST https://spideriq.ai/api/v1/company-research/{research_id}/cancel \
-H "Authorization: Bearer <your_token>"
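The pause, resume, and cancel endpoints share one URL pattern, so a small Python helper can build the target for any action (a sketch; endpoints as documented above, with the URL then POSTed using your bearer token):

```python
# Hypothetical helper: build the management endpoint URL for a research job.
# All three actions are POST requests to the same URL pattern.
def management_url(research_id, action, base="https://spideriq.ai/api/v1"):
    if action not in ("pause", "resume", "cancel"):
        raise ValueError(f"Unknown action: {action}")
    return f"{base}/company-research/{research_id}/{action}"
```

Usage: POST to the returned URL with your Authorization header, e.g. requests.post(management_url(research_id, "pause"), headers=HEADERS).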
Complete Python Example
import requests
import time
# Configuration
API_URL = "https://spideriq.ai/api/v1"
TOKEN = "your_token_here"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}
# 1. Submit Research Job
print("Submitting research job...")
submit_response = requests.post(
f"{API_URL}/company-research/submit",
headers=HEADERS,
json={
"name": "UK SaaS Company Research",
"locations": [
{"city": "London", "country": "GB"},
{"city": "Edinburgh", "country": "GB"}
],
"search_query": "saas software companies",
"config": {
"spidersite": {
"enabled": True,
"max_pages": 10,
"extract_company_info": True
},
"spidercompanydata": {
"enabled": True,
"include_officers": True
},
"social_enrichment": {
"enabled": True,
"platforms": ["linkedin", "instagram"]
},
"spiderverify": {
"enabled": True,
"max_emails_per_business": 5
}
}
}
)
research = submit_response.json()
research_id = research['research_id']
print(f"Research ID: {research_id}")
# 2. Monitor Progress
print("\nMonitoring progress...")
while True:
status_response = requests.get(
f"{API_URL}/company-research/{research_id}/status",
headers=HEADERS
)
status = status_response.json()
progress = status['progress']
print(f"Status: {status['status']} | "
f"Stage: {status.get('current_stage', 'N/A')} | "
f"Companies: {progress['companies_processed']}/{progress['companies_found']} | "
f"Emails: {progress['emails_verified']}")
if status['status'] in ('completed', 'failed', 'cancelled'):
break
time.sleep(10)
# 3. Get Results
print("\nFetching results...")
results_response = requests.get(
f"{API_URL}/company-research/{research_id}/results",
headers=HEADERS
)
results = results_response.json()
# 4. Process Results
print(f"\n=== Research Complete ===")
print(f"Total companies: {results['summary']['total_companies']}")
print(f"With registry data: {results['summary']['with_registry_data']}")
print(f"Valid emails: {results['summary']['valid_emails']}")
for company in results['companies'][:5]: # First 5
print(f"\n{company['name']}")
print(f" Domain: {company['domain']}")
if company.get('registry_data'):
reg = company['registry_data']
print(f" Reg #: {reg.get('registration_number')}")
print(f" Status: {reg.get('status')}")
if company.get('emails_verified'):
for email in company['emails_verified'][:2]:
print(f" Email: {email['email']} ({email['status']})")
Best Practices
Start with Domains: If you have a list of target companies, use domain-based input to skip discovery and get faster results.
Enable SpiderCompanyData: Registry data adds significant value for B2B research: company status, officers, and financials.
Country Detection: SpiderCompanyData automatically detects country from domain TLD and website content. Use countries_filter to restrict to specific registries.
Rate Limiting: Research jobs respect system capacity. Large jobs may queue during peak times.
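Because large jobs may sit in the queue for a while, polling the status endpoint with a capped exponential backoff is gentler than a fixed short interval. A sketch (the base/cap values are illustrative, not documented limits):

```python
# Illustrative backoff schedule for status polling:
# yields 5, 10, 20, 40, 60, 60, ... seconds between checks.
def backoff_schedule(base=5, cap=60, factor=2):
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, cap)
```

Combine with the status endpoint: sleep for each yielded interval between polls, and stop once the job reaches a terminal status.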
Recommended Settings for B2B Research
{
"config": {
"spidersite": {
"enabled": true,
"max_pages": 10,
"extract_company_info": true
},
"spidercompanydata": {
"enabled": true,
"include_officers": true,
"include_financials": false
},
"social_enrichment": {
"enabled": true,
"platforms": ["linkedin"]
},
"spiderverify": {
"enabled": true,
"max_emails_per_business": 3
}
}
}
Comparison: Campaigns vs Research
| Feature | Orchestrated Campaigns | Company Research |
|---|---|---|
| Input | Locations only | Locations or Domains |
| Control | Manual /next calls | Fully automated |
| Registry Data | No | Yes (US/UK/EU) |
| Social Enrichment | No | Yes |
| Pause/Resume | Yes | Yes |
| Best For | Lead generation loops | Deep company intelligence |