Company Research Workflow

Overview

v2.45.0 Feature: Company Research is a Celery-orchestrated workflow that chains multiple SpiderIQ services together for comprehensive company intelligence:

  1. SpiderMaps - Discover businesses from Google Maps
  2. SpiderSite - Crawl websites for contact info and company details
  3. SpiderCompanyData - Enrich with official registry data (US SEC, UK Companies House, EU VIES)
  4. Social Enrichment - Find Instagram, Facebook, and LinkedIn profiles
  5. SpiderVerify - Verify extracted email addresses

Tip: Beyond Lead Generation. While orchestrated campaigns focus on lead capture, Company Research provides deep enrichment, including registry data, social profiles, and company validation.

Info: One workflow, complete company intelligence. Submit locations or domains and receive enriched company profiles with verified contacts, registry data, and social presence.

How It Works

The Pipeline

| Step | Service | What Happens |
| --- | --- | --- |
| 1 | SpiderMaps | Searches Google Maps for businesses (if using locations input) |
| 2 | Domain Filter | Removes social media, review sites, directories |
| 3 | SpiderSite | Crawls each business website for contacts |
| 4 | SpiderCompanyData | Fetches registry data (US/UK/EU) |
| 5 | Social Enrichment | Discovers Instagram, Facebook, LinkedIn profiles |
| 6 | SpiderVerify | Verifies each email via SMTP |
| 7 | Aggregate | Combines all data per company |
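
The Domain Filter step (step 2) can be pictured with a short sketch. The blocklist, the helper names, and the de-duplication are illustrative assumptions; the list the service actually uses is not documented here.

```python
from urllib.parse import urlparse

# Hypothetical blocklist for illustration only; the real filter's list is not documented.
EXCLUDED_DOMAINS = {
    "facebook.com", "instagram.com", "linkedin.com",  # social media
    "yelp.com", "tripadvisor.com",                    # review sites
    "yell.com", "yellowpages.com",                    # directories
}


def normalize_host(url_or_domain: str) -> str:
    """Return the lowercase host without a leading 'www.' (empty string if none)."""
    candidate = url_or_domain.strip().lower()
    if "://" not in candidate:
        candidate = "//" + candidate  # lets urlparse treat the input as a host
    host = urlparse(candidate).hostname or ""
    return host[4:] if host.startswith("www.") else host


def is_company_domain(url_or_domain: str) -> bool:
    """True when the host is neither a blocked domain nor a subdomain of one."""
    host = normalize_host(url_or_domain)
    if not host:
        return False
    return not any(host == d or host.endswith("." + d) for d in EXCLUDED_DOMAINS)


def filter_domains(candidates):
    """Keep company hosts in first-seen order, dropping duplicates (an added convenience)."""
    seen, kept = set(), []
    for candidate in candidates:
        host = normalize_host(candidate)
        if host not in seen and is_company_domain(host):
            seen.add(host)
            kept.append(host)
    return kept
```

For example, a Google Maps listing whose website field points at a Facebook page would be dropped before SpiderSite runs, while `acme-software.co.uk` would pass through.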

Input Types

Company Research supports two input methods:

| Input Type | Description | Use Case |
| --- | --- | --- |
| locations | List of locations to search | Discover new companies in target areas |
| domains | List of domains to research | Enrich known company domains directly |
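
As a rough sketch, a client-side helper can build the request body for either input type. The function name is hypothetical, and the rule that a job takes exactly one of `locations` or `domains` is an assumption that this page does not state explicitly.

```python
def build_submit_payload(name, *, locations=None, domains=None,
                         search_query=None, config=None):
    """Build a submit body for one input type (hypothetical helper).

    Assumption: a job uses either 'locations' or 'domains', never both.
    """
    if bool(locations) == bool(domains):
        raise ValueError("Provide exactly one of 'locations' or 'domains'.")

    payload = {"name": name}
    if locations:
        payload["locations"] = list(locations)
        if search_query:
            # search_query only appears alongside locations in the examples on this page
            payload["search_query"] = search_query
    else:
        payload["domains"] = list(domains)
    if config:
        payload["config"] = config
    return payload
```

The returned dictionary can be passed as the `json=` argument of `requests.post` when calling the submit endpoint.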

Submitting Research Jobs

Location-Based Discovery

curl -X POST https://spideriq.ai/api/v1/company-research/submit \
  -H "Authorization: Bearer <your_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "UK Tech Companies Research",
    "locations": [
      {"city": "London", "country": "GB"},
      {"city": "Manchester", "country": "GB"}
    ],
    "search_query": "software companies",
    "config": {
      "spidersite": {
        "enabled": true,
        "max_pages": 10,
        "extract_company_info": true
      },
      "spidercompanydata": {
        "enabled": true,
        "include_officers": true,
        "include_financials": false
      },
      "social_enrichment": {
        "enabled": true,
        "platforms": ["instagram", "linkedin"]
      },
      "spiderverify": {
        "enabled": true,
        "max_emails_per_business": 5
      }
    }
  }'

Response:

{
  "success": true,
  "research_id": "res_uk_tech_20260215_abc123",
  "status": "queued",
  "input_type": "locations",
  "celery_workflow_id": "abc123-def456-ghi789",
  "created_at": "2026-02-15T10:30:00Z"
}

Domain-Based Research

Skip SpiderMaps and research specific domains directly:

curl -X POST https://spideriq.ai/api/v1/company-research/submit \
  -H "Authorization: Bearer <your_token>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Competitor Analysis",
    "domains": [
      "acme-corp.com",
      "competitor-inc.co.uk",
      "rival-software.de"
    ],
    "config": {
      "spidersite": {
        "enabled": true,
        "extract_company_info": true,
        "extract_team": true
      },
      "spidercompanydata": {
        "enabled": true,
        "include_officers": true,
        "include_financials": true
      }
    }
  }'

Configuration Reference

SpiderSite Options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | true | Enable website crawling |
| max_pages | integer | 10 | Pages to crawl per site (1-50) |
| crawl_strategy | string | "bestfirst" | Crawl order: bestfirst, bfs, or dfs |
| extract_company_info | boolean | false | Extract company info with AI |
| extract_team | boolean | false | Extract team members |
| timeout | integer | 30 | Crawl timeout in seconds |

SpiderCompanyData Options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | false | Enable registry lookup |
| countries_filter | array | null | Limit to specific countries (e.g., ["US", "GB"]) |
| include_officers | boolean | false | Fetch directors/officers |
| include_financials | boolean | false | Fetch financial data (10-K, UK accounts) |
| include_vat_validation | boolean | false | Validate EU VAT numbers |
| match_threshold | float | 0.7 | Name match confidence threshold |

Info: Supported Countries: US (SEC EDGAR), GB (Companies House), EU (VIES VAT validation)
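
To give a feel for what `match_threshold` controls, here is a minimal sketch of thresholded name matching. The service's actual scoring method is not documented on this page; `difflib.SequenceMatcher` from the Python standard library stands in for it, and the suffix list and helper names are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "inc", "llc", "plc", "gmbh"}


def normalize_name(name: str) -> str:
    """Lowercase, strip simple punctuation, and drop legal-form suffixes."""
    words = name.lower().replace(".", "").replace(",", "").split()
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; a stand-in for the undocumented confidence score."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def accept_match(website_name: str, registry_name: str,
                 match_threshold: float = 0.7) -> bool:
    """Accept a registry record only when the similarity reaches the threshold."""
    return name_similarity(website_name, registry_name) >= match_threshold
```

Under this sketch, raising the threshold trades recall for precision: fewer companies get registry data, but the records that are attached are less likely to belong to a similarly named entity.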

Social Enrichment Options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | false | Enable social profile discovery |
| platforms | array | ["instagram", "facebook", "linkedin"] | Platforms to search |

SpiderVerify Options

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| enabled | boolean | true | Enable email verification |
| max_emails_per_business | integer | 10 | Max emails to verify (1-50) |
| check_gravatar | boolean | false | Check for Gravatar images |
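
The documented ranges can be checked on the client before submitting. The following is a sketch of such a pre-flight check, not part of the API: the function name is hypothetical, and treating the three listed platforms as the only valid values is an assumption.

```python
ALLOWED_STRATEGIES = {"bestfirst", "bfs", "dfs"}
# Assumption: only the three platforms named on this page are accepted.
ALLOWED_PLATFORMS = {"instagram", "facebook", "linkedin"}


def validate_config(config: dict) -> list:
    """Return human-readable problems; an empty list means nothing was flagged."""
    errors = []

    site = config.get("spidersite", {})
    if not 1 <= site.get("max_pages", 10) <= 50:
        errors.append("spidersite.max_pages must be between 1 and 50")
    if site.get("crawl_strategy", "bestfirst") not in ALLOWED_STRATEGIES:
        errors.append("spidersite.crawl_strategy must be bestfirst, bfs, or dfs")

    social = config.get("social_enrichment", {})
    unknown = set(social.get("platforms", [])) - ALLOWED_PLATFORMS
    if unknown:
        errors.append(f"social_enrichment.platforms not recognized: {sorted(unknown)}")

    verify = config.get("spiderverify", {})
    if not 1 <= verify.get("max_emails_per_business", 10) <= 50:
        errors.append("spiderverify.max_emails_per_business must be between 1 and 50")

    return errors
```

Missing keys fall back to the defaults from the tables above, so an empty `config` passes the check.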

Monitoring Progress

Status Endpoint

curl https://spideriq.ai/api/v1/company-research/{research_id}/status \
  -H "Authorization: Bearer <your_token>"

Response:

{
  "success": true,
  "research_id": "res_uk_tech_20260215_abc123",
  "status": "processing",
  "current_stage": "spidercompanydata",
  "progress": {
    "companies_found": 45,
    "companies_processed": 32,
    "emails_found": 156,
    "emails_verified": 89
  },
  "started_at": "2026-02-15T10:30:15Z"
}

Progress Fields

| Field | Description |
| --- | --- |
| companies_found | Total companies discovered |
| companies_processed | Companies that have passed through all stages |
| emails_found | Total emails extracted |
| emails_verified | Emails verified via SMTP |
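
These counters are enough to derive a simple progress indicator. A minimal sketch follows, with hypothetical helper names. It assumes `companies_found` may still be growing while discovery runs, so the ratio is best read as an estimate.

```python
def completion_ratio(progress: dict) -> float:
    """Share of discovered companies that finished all stages, clamped to [0, 1]."""
    found = progress.get("companies_found", 0)
    if found <= 0:
        return 0.0
    return min(progress.get("companies_processed", 0) / found, 1.0)


def format_progress(progress: dict) -> str:
    """One-line summary built from the four documented progress fields."""
    pct = completion_ratio(progress) * 100
    return (
        f"{progress.get('companies_processed', 0)}/{progress.get('companies_found', 0)} "
        f"companies ({pct:.0f}%), "
        f"{progress.get('emails_verified', 0)}/{progress.get('emails_found', 0)} "
        f"emails verified"
    )
```

With the `progress` object from the status response above, `format_progress` yields `32/45 companies (71%), 89/156 emails verified`.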

Job Lifecycle

| Status | Description |
| --- | --- |
| queued | Job waiting to start |
| processing | Actively running pipeline stages |
| paused | Temporarily paused (can be resumed) |
| completed | All stages finished successfully |
| failed | Pipeline failed (check error details) |
| cancelled | Job was cancelled |
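
A polling loop only needs to stop on the three final statuses. The sketch below keeps the HTTP call outside the loop so the logic can be exercised without network access; `fetch_status` is a hypothetical callable that returns the parsed status response, and the `max_wait` cap is an addition of this sketch, not an API feature.

```python
import time

# Statuses after which a job no longer changes, per the lifecycle table.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def poll_until_terminal(fetch_status, interval=10.0, max_wait=3600.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the job reaches a terminal status.

    Raises TimeoutError once max_wait seconds have elapsed without one.
    """
    deadline = clock() + max_wait
    while True:
        status = fetch_status()
        if status["status"] in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Job still '{status['status']}' after {max_wait}s")
        sleep(interval)
```

Note that `paused` is not terminal: a paused job keeps the loop running until it is resumed and finishes, is cancelled, or `max_wait` expires.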

Getting Results

Results Endpoint

curl https://spideriq.ai/api/v1/company-research/{research_id}/results \
  -H "Authorization: Bearer <your_token>"

Response Structure:

{
  "success": true,
  "research_id": "res_uk_tech_20260215_abc123",
  "status": "completed",
  "progress": {
    "companies_found": 45,
    "companies_processed": 45,
    "emails_found": 156,
    "emails_verified": 142
  },
  "companies": [
    {
      "company_id": "comp_abc123",
      "name": "Acme Software Ltd",
      "domain": "acme-software.co.uk",
      "website_data": {
        "pages_crawled": 8,
        "industry": "Software Development",
        "key_services": ["Custom Software", "Cloud Solutions"],
        "company_size": "50-200 employees"
      },
      "registry_data": {
        "source": "uk_companies_house",
        "registration_number": "12345678",
        "legal_form": "Private Limited Company",
        "status": "active",
        "incorporation_date": "2015-03-21",
        "registered_address": {
          "line1": "123 Tech Street",
          "city": "London",
          "postcode": "EC1A 1BB"
        },
        "sic_codes": ["62012"],
        "officers": [
          {"name": "John Smith", "role": "director", "appointed": "2015-03-21"}
        ]
      },
      "social_profiles": {
        "linkedin": "https://linkedin.com/company/acme-software",
        "instagram": "@acmesoftware"
      },
      "emails_found": ["info@acme-software.co.uk", "sales@acme-software.co.uk"],
      "emails_verified": [
        {
          "email": "info@acme-software.co.uk",
          "status": "valid",
          "score": 95,
          "is_deliverable": true
        }
      ],
      "detected_country": "GB",
      "country_detection_source": "domain_tld"
    }
  ],
  "summary": {
    "total_companies": 45,
    "with_registry_data": 38,
    "with_social_profiles": 41,
    "valid_emails": 89
  },
  "completed_at": "2026-02-15T11:45:30Z"
}
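
One common use of this structure is to flatten it into a contact list. The sketch below assumes the field names shown in the response above; the `min_score` cutoff of 80 is an arbitrary illustration, and reading `score` as a 0-100 value is inferred from the single example rather than documented.

```python
def deliverable_contacts(results: dict, min_score: int = 80) -> list:
    """Flatten results into one row per deliverable email at or above min_score."""
    rows = []
    for company in results.get("companies", []):
        # registry_data may be absent when no registry match was found
        registry = company.get("registry_data") or {}
        for email in company.get("emails_verified") or []:
            score = email.get("score", 0)
            if email.get("is_deliverable") and score >= min_score:
                rows.append({
                    "company": company.get("name"),
                    "domain": company.get("domain"),
                    "registration_number": registry.get("registration_number"),
                    "email": email.get("email"),
                    "score": score,
                })
    return rows
```

Each row can then be written out with `csv.DictWriter` or loaded into a CRM import.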

Wait for Completion (Blocking)

For synchronous workflows, use the blocking endpoint:

# Wait up to 5 minutes for completion
curl "https://spideriq.ai/api/v1/company-research/{research_id}/wait?timeout=300" \
  -H "Authorization: Bearer <your_token>"
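
When calling the blocking endpoint from code, the client-side HTTP timeout should be longer than the server-side `timeout`, otherwise the client gives up first. A minimal sketch, assuming the endpoint returns a JSON body (its exact shape is not shown on this page): `http_get` is any callable with the same keyword arguments as `requests.get`, and the `margin` value is an arbitrary choice.

```python
API_URL = "https://spideriq.ai/api/v1"


def wait_for_research(http_get, research_id, token, timeout=300, margin=15):
    """Call the blocking wait endpoint and return its parsed JSON body.

    http_get: a requests.get-compatible callable (injected so it can be faked in tests).
    timeout:  server-side wait in seconds, sent as the documented query parameter.
    margin:   extra seconds the client waits beyond the server-side timeout.
    """
    response = http_get(
        f"{API_URL}/company-research/{research_id}/wait",
        params={"timeout": timeout},
        headers={"Authorization": f"Bearer {token}"},
        timeout=timeout + margin,  # client-side limit must outlast the server wait
    )
    response.raise_for_status()
    return response.json()
```

In real use this would be called as `wait_for_research(requests.get, research_id, TOKEN)`.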

Managing Research Jobs

List All Jobs

curl "https://spideriq.ai/api/v1/company-research?status=processing&limit=20" \
  -H "Authorization: Bearer <your_token>"

Pause a Job

curl -X POST https://spideriq.ai/api/v1/company-research/{research_id}/pause \
  -H "Authorization: Bearer <your_token>"

Resume a Job

curl -X POST https://spideriq.ai/api/v1/company-research/{research_id}/resume \
  -H "Authorization: Bearer <your_token>"

Cancel a Job

curl -X POST https://spideriq.ai/api/v1/company-research/{research_id}/cancel \
  -H "Authorization: Bearer <your_token>"
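
The pause, resume, and cancel endpoints share one URL pattern, so a thin wrapper can cover all three. This is a sketch with hypothetical helper names; it assumes each endpoint returns a JSON body, which this page does not show.

```python
API_URL = "https://spideriq.ai/api/v1"
JOB_ACTIONS = ("pause", "resume", "cancel")


def job_action_url(research_id: str, action: str) -> str:
    """Build the URL for a job-control action, rejecting unknown actions."""
    if action not in JOB_ACTIONS:
        raise ValueError(f"Unknown action {action!r}; expected one of {JOB_ACTIONS}")
    return f"{API_URL}/company-research/{research_id}/{action}"


def send_job_action(http_post, research_id: str, action: str, token: str) -> dict:
    """POST the action using a requests.post-compatible callable."""
    response = http_post(
        job_action_url(research_id, action),
        headers={"Authorization": f"Bearer {token}"},
    )
    response.raise_for_status()
    return response.json()  # assumption: the endpoint answers with JSON
```

For example, `send_job_action(requests.post, research_id, "pause", TOKEN)` mirrors the "Pause a Job" curl command above.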

Complete Python Example

import time

import requests

# Configuration
API_URL = "https://spideriq.ai/api/v1"
TOKEN = "your_token_here"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# 1. Submit Research Job
print("Submitting research job...")
submit_response = requests.post(
    f"{API_URL}/company-research/submit",
    headers=HEADERS,
    json={
        "name": "UK SaaS Company Research",
        "locations": [
            {"city": "London", "country": "GB"},
            {"city": "Edinburgh", "country": "GB"}
        ],
        "search_query": "saas software companies",
        "config": {
            "spidersite": {
                "enabled": True,
                "max_pages": 10,
                "extract_company_info": True
            },
            "spidercompanydata": {
                "enabled": True,
                "include_officers": True
            },
            "social_enrichment": {
                "enabled": True,
                "platforms": ["linkedin", "instagram"]
            },
            "spiderverify": {
                "enabled": True,
                "max_emails_per_business": 5
            }
        }
    }
)
submit_response.raise_for_status()  # Stop early on HTTP errors (e.g. an invalid token)

research = submit_response.json()
research_id = research['research_id']
print(f"Research ID: {research_id}")

# 2. Monitor Progress (poll every 10 seconds until a final status is reached)
print("\nMonitoring progress...")
while True:
    status_response = requests.get(
        f"{API_URL}/company-research/{research_id}/status",
        headers=HEADERS
    )
    status = status_response.json()

    progress = status['progress']
    print(f"Status: {status['status']} | "
          f"Stage: {status.get('current_stage', 'N/A')} | "
          f"Companies: {progress['companies_processed']}/{progress['companies_found']} | "
          f"Emails: {progress['emails_verified']}")

    if status['status'] in ('completed', 'failed', 'cancelled'):
        break

    time.sleep(10)

# Only a completed job is expected to have a full result set
if status['status'] != 'completed':
    raise SystemExit(f"Research ended with status: {status['status']}")

# 3. Get Results
print("\nFetching results...")
results_response = requests.get(
    f"{API_URL}/company-research/{research_id}/results",
    headers=HEADERS
)
results = results_response.json()

# 4. Process Results
print("\n=== Research Complete ===")
print(f"Total companies: {results['summary']['total_companies']}")
print(f"With registry data: {results['summary']['with_registry_data']}")
print(f"Valid emails: {results['summary']['valid_emails']}")

for company in results['companies'][:5]:  # First 5
    print(f"\n{company['name']}")
    print(f"  Domain: {company['domain']}")
    if company.get('registry_data'):
        reg = company['registry_data']
        print(f"  Reg #: {reg.get('registration_number')}")
        print(f"  Status: {reg.get('status')}")
    if company.get('emails_verified'):
        for email in company['emails_verified'][:2]:
            print(f"  Email: {email['email']} ({email['status']})")

Best Practices

Tip: Start with Domains. If you have a list of target companies, use domain-based input to skip discovery and get faster results.

Tip: Enable SpiderCompanyData. Registry data adds significant value for B2B research: company status, officers, and financials.

Tip: Country Detection. SpiderCompanyData automatically detects the country from the domain TLD and website content. Use countries_filter to restrict lookups to specific registries.
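
To illustrate the TLD half of country detection and how `countries_filter` interacts with it, here is a sketch. The mapping table and function names are hypothetical, and the service's real detection logic (including its fallback to website content) is not documented on this page.

```python
# Hypothetical TLD-to-country table; the service's own mapping is not documented.
TLD_TO_COUNTRY = {"uk": "GB", "de": "DE", "fr": "FR", "us": "US"}


def detect_country_from_tld(domain: str):
    """Return an ISO country code for country-code TLDs, or None for generic ones."""
    tld = domain.strip().lower().rstrip(".").rsplit(".", 1)[-1]
    return TLD_TO_COUNTRY.get(tld)


def registry_lookup_allowed(domain: str, countries_filter=None) -> bool:
    """Apply countries_filter to the detected country (None means no restriction)."""
    if countries_filter is None:
        return True
    return detect_country_from_tld(domain) in countries_filter
```

A generic TLD such as `.com` yields `None` here, which is the case where detection from website content would have to take over.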

Warning: Rate Limiting. Research jobs respect system capacity. Large jobs may queue during peak times.

{
  "config": {
    "spidersite": {
      "enabled": true,
      "max_pages": 10,
      "extract_company_info": true
    },
    "spidercompanydata": {
      "enabled": true,
      "include_officers": true,
      "include_financials": false
    },
    "social_enrichment": {
      "enabled": true,
      "platforms": ["linkedin"]
    },
    "spiderverify": {
      "enabled": true,
      "max_emails_per_business": 3
    }
  }
}

Comparison: Campaigns vs Research

| Feature | Orchestrated Campaigns | Company Research |
| --- | --- | --- |
| Input | Locations only | Locations or Domains |
| Control | Manual /next calls | Fully automated |
| Registry Data | No | Yes (US/UK/EU) |
| Social Enrichment | No | Yes |
| Pause/Resume | Yes | Yes |
| Best For | Lead generation loops | Deep company intelligence |
