
Get Job Results

GET /api/v1/jobs/{job_id}/results

Overview

Retrieve the complete results for a scraping job. This endpoint returns different status codes based on job state.

Path Parameters

job_id (string, required)

The unique identifier of the job (UUID format)

Example: 550e8400-e29b-41d4-a716-446655440000

Query Parameters

format (string)

Response format for AI agent integration (v2.60.0)

Options:

  • yaml - Token-efficient YAML format (40% savings)
  • md - Human-readable Markdown format (50% savings)

Default: JSON (no format parameter)
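As a sketch of how a client might opt into these formats, the helper below builds the results URL with an optional format parameter (the endpoint path and parameter name follow this page; the job ID is a placeholder):

```python
from urllib.parse import urlencode

def results_url(job_id, fmt=None):
    """Build the results URL, optionally requesting yaml or md output."""
    base = f"https://spideriq.ai/api/v1/jobs/{job_id}/results"
    return f"{base}?{urlencode({'format': fmt})}" if fmt else base

print(results_url("550e8400-e29b-41d4-a716-446655440000", fmt="yaml"))
```

Omitting `fmt` falls back to the default JSON response.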

Response Status Codes

200 OK

Job completed successfully - results available

202 Accepted

Job still processing - poll again later

410 Gone

Job failed or was cancelled

404 Not Found

Job ID does not exist

Response Structure

info

Flat Structure (v2.7.1): Responses now use a simplified 2-3 level nesting structure (previously 5 levels). All fields are always present - fields not applicable to your request will be null.

Top-Level Response Fields

success (boolean)

true if job completed successfully, false if failed

job_id (string)

Unique job identifier (UUID format)

type (string)

Job type: spiderSite or spiderMaps

status (string)

Job status: completed, failed, processing, queued, or cancelled

processing_time_seconds (number)

Time taken to process the job (null if not completed)

worker_id (string)

Worker identifier that processed the job

completed_at (string)

Completion timestamp in ISO 8601 format

message (string)

Additional context about job state (e.g., "Job is being processed")

data (object)

Job results data (structure varies by job type, see below)

error_message (string)

Error message if job failed (null otherwise)


SpiderSite Data Fields

info

Flat Structure: Social media fields are at the top level of data (e.g., data.linkedin), not nested under data.contact_info.social_media.linkedin.
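For illustration, flat access looks like this (the payload below is a hand-built sample, not a real response):

```python
# Hand-built sample payload illustrating the flat v2.7.1 layout.
data = {
    "emails": ["contact@example.com"],
    "linkedin": "https://linkedin.com/company/example",
    "twitter": None,
    "facebook": None,
}

# Social links sit directly on `data` -- no nested contact_info chain.
linkedin = data["linkedin"]

# Collect only the profiles that were actually found (non-null).
social_keys = ["linkedin", "twitter", "facebook"]
found = {k: data[k] for k in social_keys if data.get(k)}
```

Because absent profiles are null rather than missing keys, filtering on truthiness is enough to keep only discovered links.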

Basic Information

data.url (string)

Website URL that was crawled

data.pages_crawled (integer)

Number of pages successfully crawled

data.crawl_status (string)

Crawl result: success, partial, or failed

Contact Information (Flat - Top Level)

data.emails (array)

Email addresses found (filtered - tracking emails removed)

data.phones (array)

Phone numbers found

data.addresses (array)

Physical addresses found

Social Media Profiles (All Flat - Top Level)

data.linkedin (string)

LinkedIn company/profile URL (null if not found)

data.twitter (string)

Twitter/X profile URL (null if not found)

data.facebook (string)

Facebook page URL (null if not found)

data.instagram (string)

Instagram profile URL (null if not found)

data.youtube (string)

YouTube channel URL (null if not found)

data.github (string)

GitHub organization/user URL (null if not found)

data.tiktok (string)

TikTok profile URL (null if not found)

data.pinterest (string)

Pinterest profile URL (null if not found)

data.medium (string)

Medium profile URL (null if not found)

data.discord (string)

Discord server invite URL (null if not found)

data.whatsapp (string)

WhatsApp contact/business URL (null if not found)

data.telegram (string)

Telegram contact/channel URL (null if not found)

data.snapchat (string)

Snapchat profile URL (null if not found)

data.reddit (string)

Reddit profile/subreddit URL (null if not found)

Markdown Compendium (v2.14.0: SpiderMedia Storage)

data.markdown_compendium (string)

AI-generated markdown summary of the website (if enabled and included inline)

data.compendium (object)

Compendium metadata and storage information
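A small helper can decide whether to use the inline markdown or fetch from storage. This is a sketch assuming the field names above (`markdown_compendium`, `compendium.available`, `compendium.download_url`); the actual download is left to the caller so the helper stays side-effect free:

```python
def compendium_source(data):
    """Return ("inline", text), ("download", url), or (None, None)."""
    # Prefer the inline markdown when it was included in the response.
    if data.get("markdown_compendium"):
        return "inline", data["markdown_compendium"]
    # Otherwise fall back to the SpiderMedia download URL, if available.
    meta = data.get("compendium") or {}
    if meta.get("available") and meta.get("download_url"):
        return "download", meta["download_url"]
    return None, None
```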

AI Features (Always Present - Null or Empty If Not Enabled)

data.company_vitals (object)

Company information extracted with AI (null if extract_company_info: false)

data.pain_points (array)

Business pain points identified by AI (null if extract_pain_points: false)

data.team_members (array)

Team members found with AI extraction (empty array if extract_team: false)

data.lead_scoring (object)

CHAMP framework lead scoring (null if product/ICP not provided)

data.personalization_hooks (object)

Personalization data for outreach (null if not available)

Technical Metadata

data.metadata (object)

Crawl metadata and statistics including:

  • browser_rendering_available: Whether SPA rendering was used
  • spa_enabled: Whether SPA detection was enabled
  • sitemap_used: Whether sitemap-first crawling was used
  • crawl_strategy: Strategy used (sitemap, bestfirst, bfs, dfs)
  • total_emails_found: Total emails before filtering
  • total_phones_found: Total phone numbers found
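For instance, since total_emails_found counts emails before tracking-address filtering, a sketch comparing it against the filtered emails array can flag how many were removed (field names follow the list above):

```python
def emails_filtered_out(data):
    """How many emails the tracking filter removed (never negative)."""
    total = (data.get("metadata") or {}).get("total_emails_found", 0)
    kept = len(data.get("emails") or [])
    return max(total - kept, 0)
```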

SpiderMaps Data Fields

Basic Information

data.query (string)

Search query used for the scrape

data.results_count (integer)

Number of business listings returned

data.businesses (array)

Array of business listings (see structure below)

data.metadata (object)

Search metadata (max_results, extract_reviews, language, etc.)

Business Listing Structure

Each business in the businesses array contains:

name (string)

Business name

place_id (string)

Google Place ID

rating (number)

Average rating (1.0-5.0)

reviews_count (integer)

Number of reviews

address (string)

Full street address

phone (string)

Phone number

website (string)

Business website URL

categories (array)

Business categories/types

coordinates (object)

Latitude and longitude coordinates

link (string)

Google Maps link to the business

business_status (string)

Status: OPERATIONAL, CLOSED_TEMPORARILY, etc.

price_range (string)

Price range: $, $$, $$$, or $$$$

working_hours (object)

Working hours by day of week
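For example, a sketch that filters listings by rating and operational status (the sample listings are hand-built; field names match the structure above):

```python
def top_rated(businesses, min_rating=4.5):
    """Names of OPERATIONAL businesses at or above min_rating."""
    return [
        b["name"]
        for b in businesses
        # `or 0` guards against listings with a null rating.
        if (b.get("rating") or 0) >= min_rating
        and b.get("business_status") == "OPERATIONAL"
    ]

sample = [
    {"name": "Cafe A", "rating": 4.8, "business_status": "OPERATIONAL"},
    {"name": "Cafe B", "rating": 4.2, "business_status": "OPERATIONAL"},
    {"name": "Cafe C", "rating": 4.9, "business_status": "CLOSED_TEMPORARILY"},
]
```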

Example Request

curl https://spideriq.ai/api/v1/jobs/550e8400-e29b-41d4-a716-446655440000/results \
-H "Authorization: Bearer <your_token>"

Example Responses

Basic contact extraction without AI features:

{
  "success": true,
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "type": "spiderSite",
  "status": "completed",
  "processing_time_seconds": 12.4,
  "worker_id": "spider-site-main-1",
  "completed_at": "2025-10-27T14:30:15Z",
  "message": null,
  "data": {
    "url": "https://example.com",
    "pages_crawled": 5,
    "crawl_status": "success",
    "emails": ["contact@example.com", "sales@example.com"],
    "phones": ["+1-555-123-4567"],
    "addresses": ["123 Main St, San Francisco, CA 94105"],
    "linkedin": "https://linkedin.com/company/example",
    "twitter": "https://twitter.com/example",
    "facebook": "https://facebook.com/example",
    "instagram": null,
    "youtube": null,
    "github": "https://github.com/example",
    "tiktok": null,
    "pinterest": null,
    "medium": null,
    "discord": null,
    "whatsapp": null,
    "telegram": null,
    "snapchat": null,
    "reddit": null,
    "markdown_compendium": "# Example Company\n\nLeading provider of...",
    "compendium": {
      "available": true,
      "chars": 8450,
      "cleanup_level": "fit",
      "storage_location": "spidermedia",
      "download_url": "https://media.spideriq.ai/client-xxx/compendiums/550e8400-e29b-41d4-a716-446655440000.md",
      "filename": "compendiums/550e8400-e29b-41d4-a716-446655440000.md",
      "size_bytes": 8450,
      "content_hash": "a1b2c3d4e5f6...",
      "estimated_tokens": 2100
    },
    "company_vitals": null,
    "pain_points": null,
    "lead_scoring": null,
    "team_members": [],
    "personalization_hooks": null,
    "metadata": {
      "spa_enabled": true,
      "sitemap_used": true,
      "browser_rendering_available": true,
      "crawl_strategy": "sitemap",
      "total_emails_found": 2,
      "total_phones_found": 1
    }
  },
  "error_message": null
}

Handling Different Status Codes

import requests
import time

def get_job_results(job_id, auth_token, max_retries=60):
    """Get job results with automatic polling."""
    url = f"https://spideriq.ai/api/v1/jobs/{job_id}/results"
    headers = {"Authorization": f"Bearer {auth_token}"}

    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)

        if response.status_code == 200:
            # Success - return results
            return response.json()

        elif response.status_code == 202:
            # Still processing - wait and retry
            print(f"Job processing... (attempt {attempt + 1}/{max_retries})")
            time.sleep(3)
            continue

        elif response.status_code == 410:
            # Job failed or was cancelled
            error_data = response.json()
            detail = error_data.get("error_message") or error_data.get("error")
            raise Exception(f"Job failed: {detail}")

        elif response.status_code == 404:
            raise Exception("Job not found")

        else:
            response.raise_for_status()

    raise TimeoutError("Job did not complete within maximum retries")

# Usage
try:
    results = get_job_results(
        "550e8400-e29b-41d4-a716-446655440000",
        "<your_token>"
    )
    print("Results:", results["data"])
except Exception as e:
    print(f"Error: {e}")

Data Storage

info

Screenshot Storage: SpiderSite job screenshots are stored in Cloudflare R2 and accessible via CDN at cdn.spideriq.ai. URLs are permanent and do not expire.

Best Practices

warning

Don't poll too frequently: Respect the 100 requests/minute rate limit. Poll every 3-5 seconds for optimal balance between responsiveness and rate limit compliance.

tip

Save job IDs: Store job IDs in your database to retrieve results later. Results remain available indefinitely.