
Get Job Results

GET /api/v1/jobs/{job_id}/results

Overview

Retrieve the complete results for a scraping job. This endpoint returns different status codes based on job state.

Path Parameters

job_id (string, required)

The unique identifier of the job (UUID format)

Example: 550e8400-e29b-41d4-a716-446655440000

Query Parameters

format (string)

Response format for AI agent integration (v2.60.0)

Options:

  • yaml - Token-efficient YAML format (40% savings)
  • md - Human-readable Markdown format (50% savings)

Default: JSON (no format parameter)
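As a sketch of how a client might opt into these formats, the helper below builds the results URL with an optional format parameter (the endpoint path and parameter name follow this page; the job ID is a placeholder):

```python
from urllib.parse import urlencode

def results_url(job_id, fmt=None):
    """Build the results URL, optionally requesting yaml or md output."""
    base = f"https://spideriq.ai/api/v1/jobs/{job_id}/results"
    return f"{base}?{urlencode({'format': fmt})}" if fmt else base

print(results_url("550e8400-e29b-41d4-a716-446655440000", fmt="yaml"))
```

Omitting `fmt` falls back to the default JSON response.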

Response Status Codes

200 OK

Job completed successfully - results available

202 Accepted

Job still processing - poll again later

410 Gone

Job failed or was cancelled

404 Not Found

Job ID does not exist

Response Structure

info

Flat Structure (v2.7.1): Responses now use a simplified 2-3 level nesting structure (previously 5 levels). All fields are always present - fields not applicable to your request will be null.

Top-Level Response Fields

success (boolean)

true if job completed successfully, false if failed

job_id (string)

Unique job identifier (UUID format)

type (string)

Job type: spiderSite or spiderMaps

status (string)

Job status: completed, failed, processing, queued, or cancelled

processing_time_seconds (number)

Time taken to process the job (null if not completed)

worker_id (string)

Worker identifier that processed the job

completed_at (string)

Completion timestamp in ISO 8601 format

message (string)

Additional context about job state (e.g., "Job is being processed")

data (object)

Job results data (structure varies by job type, see below)

error_message (string)

Error message if job failed (null otherwise)


SpiderSite Data Fields

info

Flat Structure: Social media fields are at the top level of data (e.g., data.linkedin), not nested under data.contact_info.social_media.linkedin.
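For illustration, flat access looks like this (the payload below is a hand-built sample, not a real response):

```python
# Hand-built sample payload illustrating the flat v2.7.1 layout.
data = {
    "emails": ["contact@example.com"],
    "linkedin": "https://linkedin.com/company/example",
    "twitter": None,
    "facebook": None,
}

# Social links sit directly on `data` -- no nested contact_info chain.
linkedin = data["linkedin"]

# Collect only the profiles that were actually found (non-null).
social_keys = ["linkedin", "twitter", "facebook"]
found = {k: data[k] for k in social_keys if data.get(k)}
```

Because absent profiles are null rather than missing keys, filtering on truthiness is enough to keep only discovered links.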

Basic Information

data.url (string)

Website URL that was crawled

data.pages_crawled (integer)

Number of pages successfully crawled

data.crawl_status (string)

Crawl result: success, partial, or failed

Contact Information (Flat - Top Level)

data.emails (array)

Email addresses found (filtered - tracking emails removed)

data.phones (array)

Phone numbers found

data.addresses (array)

Physical addresses found

Social Media Profiles (All Flat - Top Level)

data.linkedin (string)

LinkedIn company/profile URL (null if not found)

data.twitter (string)

Twitter/X profile URL (null if not found)

data.facebook (string)

Facebook page URL (null if not found)

data.instagram (string)

Instagram profile URL (null if not found)

data.youtube (string)

YouTube channel URL (null if not found)

data.github (string)

GitHub organization/user URL (null if not found)

data.tiktok (string)

TikTok profile URL (null if not found)

data.pinterest (string)

Pinterest profile URL (null if not found)

data.medium (string)

Medium profile URL (null if not found)

data.discord (string)

Discord server invite URL (null if not found)

data.whatsapp (string)

WhatsApp contact/business URL (null if not found)

data.telegram (string)

Telegram contact/channel URL (null if not found)

data.snapchat (string)

Snapchat profile URL (null if not found)

data.reddit (string)

Reddit profile/subreddit URL (null if not found)

Markdown Compendium (v2.14.0: SpiderMedia Storage)

data.markdown_compendium (string)

AI-generated markdown summary of the website (if enabled and included inline)

data.compendium (object)

Compendium metadata and storage information
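A small helper can decide whether to use the inline markdown or fetch from storage. This is a sketch assuming the field names above (`markdown_compendium`, `compendium.available`, `compendium.download_url`); the actual download is left to the caller so the helper stays side-effect free:

```python
def compendium_source(data):
    """Return ("inline", text), ("download", url), or (None, None)."""
    # Prefer the inline markdown when it was included in the response.
    if data.get("markdown_compendium"):
        return "inline", data["markdown_compendium"]
    # Otherwise fall back to the SpiderMedia download URL, if available.
    meta = data.get("compendium") or {}
    if meta.get("available") and meta.get("download_url"):
        return "download", meta["download_url"]
    return None, None
```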

AI Features (Always Present - Null or Empty If Not Enabled)

data.company_vitals (object)

Company information extracted with AI (null if extract_company_info: false)

data.pain_points (array)

Business pain points identified by AI (null if extract_pain_points: false)

data.team_members (array)

Team members found with AI extraction (empty array if extract_team: false)

data.lead_scoring (object)

CHAMP framework lead scoring (null if product/ICP not provided)

data.personalization_hooks (object)

Personalization data for outreach (null if not available)

Technical Metadata

data.metadata (object)

Crawl metadata and statistics including:

  • browser_rendering_available: Whether SPA rendering was used
  • spa_enabled: Whether SPA detection was enabled
  • sitemap_used: Whether sitemap-first crawling was used
  • crawl_strategy: Strategy used (sitemap, bestfirst, bfs, dfs)
  • total_emails_found: Total emails before filtering
  • total_phones_found: Total phone numbers found
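For instance, since total_emails_found counts emails before tracking-address filtering, a sketch comparing it against the filtered emails array can flag how many were removed (field names follow the list above):

```python
def emails_filtered_out(data):
    """How many emails the tracking filter removed (never negative)."""
    total = (data.get("metadata") or {}).get("total_emails_found", 0)
    kept = len(data.get("emails") or [])
    return max(total - kept, 0)
```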

SpiderMaps Data Fields

Basic Information

data.query (string)

Search query used for the scrape

data.results_count (integer)

Number of business listings returned

data.businesses (array)

Array of business listings (see structure below)

data.metadata (object)

Search metadata (max_results, extract_reviews, language, etc.)

Business Listing Structure

Each business in the businesses array contains:

name (string)

Business name

place_id (string)

Google Place ID

rating (number)

Average rating (1.0-5.0)

reviews_count (integer)

Number of reviews

address (string)

Full street address

phone (string)

Phone number

website (string)

Business website URL

categories (array)

Business categories/types

coordinates (object)

Latitude and longitude coordinates

link (string)

Google Maps link to the business

business_status (string)

Status: OPERATIONAL, CLOSED_TEMPORARILY, etc.

price_range (string)

Price range: $, $$, $$$, or $$$$

working_hours (object)

Working hours by day of week
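For example, a sketch that filters listings by rating and operational status (the sample listings are hand-built; field names match the structure above):

```python
def top_rated(businesses, min_rating=4.5):
    """Names of OPERATIONAL businesses at or above min_rating."""
    return [
        b["name"]
        for b in businesses
        # `or 0` guards against listings with a null rating.
        if (b.get("rating") or 0) >= min_rating
        and b.get("business_status") == "OPERATIONAL"
    ]

sample = [
    {"name": "Cafe A", "rating": 4.8, "business_status": "OPERATIONAL"},
    {"name": "Cafe B", "rating": 4.2, "business_status": "OPERATIONAL"},
    {"name": "Cafe C", "rating": 4.9, "business_status": "CLOSED_TEMPORARILY"},
]
```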

Example Request

curl https://spideriq.ai/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000/results \
-H "Authorization: Bearer <your_token>"

Example Responses

Basic contact extraction without AI features:

{
  "success": true,
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "type": "spiderSite",
  "status": "completed",
  "processing_time_seconds": 12.4,
  "worker_id": "spider-site-main-1",
  "completed_at": "2025-10-27T14:30:15Z",
  "message": null,
  "data": {
    "url": "https://example.com",
    "pages_crawled": 5,
    "crawl_status": "success",
    "emails": ["contact@example.com", "sales@example.com"],
    "phones": ["+1-555-123-4567"],
    "addresses": ["123 Main St, San Francisco, CA 94105"],
    "linkedin": "https://linkedin.com/company/example",
    "twitter": "https://twitter.com/example",
    "facebook": "https://facebook.com/example",
    "instagram": null,
    "youtube": null,
    "github": "https://github.com/example",
    "tiktok": null,
    "pinterest": null,
    "medium": null,
    "discord": null,
    "whatsapp": null,
    "telegram": null,
    "snapchat": null,
    "reddit": null,
    "markdown_compendium": "# Example Company\n\nLeading provider of...",
    "compendium": {
      "available": true,
      "chars": 8450,
      "cleanup_level": "fit",
      "storage_location": "spidermedia",
      "download_url": "https://media.spideriq.ai/client-xxx/compendiums/550e8400-e29b-41d4-a716-446655440000.md",
      "filename": "compendiums/550e8400-e29b-41d4-a716-446655440000.md",
      "size_bytes": 8450,
      "content_hash": "a1b2c3d4e5f6...",
      "estimated_tokens": 2100
    },
    "company_vitals": null,
    "pain_points": null,
    "lead_scoring": null,
    "team_members": [],
    "personalization_hooks": null,
    "metadata": {
      "spa_enabled": true,
      "sitemap_used": true,
      "browser_rendering_available": true,
      "crawl_strategy": "sitemap",
      "total_emails_found": 2,
      "total_phones_found": 1
    }
  },
  "error_message": null
}

Handling Different Status Codes

import requests
import time

def get_job_results(job_id, auth_token, max_retries=60):
    """Get job results with automatic polling."""
    url = f"https://spideriq.ai/api/v1/jobs/{job_id}/results"
    headers = {"Authorization": f"Bearer {auth_token}"}

    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)

        if response.status_code == 200:
            # Success - return results
            return response.json()

        elif response.status_code == 202:
            # Still processing - wait and retry
            print(f"Job processing... (attempt {attempt + 1}/{max_retries})")
            time.sleep(3)
            continue

        elif response.status_code == 410:
            # Job failed or was cancelled
            error_data = response.json()
            detail = error_data.get("error_message") or error_data.get("error")
            raise Exception(f"Job failed: {detail}")

        elif response.status_code == 404:
            raise Exception("Job not found")

        else:
            response.raise_for_status()

    raise TimeoutError("Job did not complete within maximum retries")

# Usage
try:
    results = get_job_results(
        "550e8400-e29b-41d4-a716-446655440000",
        "<your_token>"
    )
    print("Results:", results["data"])
except Exception as e:
    print(f"Error: {e}")

Data Storage

info

Screenshot Storage: SpiderSite job screenshots are stored in Cloudflare R2 and accessible via CDN at cdn.spideriq.ai. URLs are permanent and do not expire.

Best Practices

warning

Don't poll too frequently: Respect the 100 requests/minute rate limit. Poll every 3-5 seconds for optimal balance between responsiveness and rate limit compliance.

tip

Save job IDs: Store job IDs in your database to retrieve results later. Results remain available indefinitely.