Guides Overview

Welcome to SpiderIQ Guides

This section provides comprehensive guides and tutorials to help you get the most out of SpiderIQ's web scraping and Google Maps data extraction capabilities.

What is SpiderIQ?

SpiderIQ is a high-performance API service that provides six specialized capabilities:

🌍 SpiderSite

Website Scraping

Extract content from any website using the Crawl4AI library with optional AI-powered data extraction.

  • Full-page markdown conversion
  • AI-powered content extraction
  • Screenshot capture
  • Metadata extraction
πŸ—ΊοΈ

SpiderMaps

Google Maps Scraping

Extract business information from Google Maps using Playwright browser automation.

  • Business details (name, address, phone)
  • Reviews and ratings
  • Business hours
  • Categories and photos
  • Campaign System (v2.14.0): Multi-location orchestration
πŸ“§

SpiderVerify

Email Verification

Verify email addresses at the SMTP level without sending actual emails.

  • Deliverability checking
  • Disposable email detection
  • Role account identification
  • Quality scoring (0-100)
πŸ‘€

SpiderPeople

Decision Maker Discovery (v2.17.0)

Find the right people behind companies using ICP-based search.

  • Natural language search by role + location
  • Profile lookup by LinkedIn URL
  • AI research reports
  • Experience & education data

🌐 SpiderBrowser

Anti-Detect Browser Management (v2.29.0)

Manage persistent authenticated browser sessions at scale with Camoufox.

  • C++-level fingerprint spoofing
  • VNC web access for manual login/CAPTCHA
  • Cookie export (Netscape format for yt-dlp)
  • SpiderProxy mobile IP integration
  • Profile warmup automation
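
The Netscape cookie-file layout mentioned above (the format yt-dlp consumes via its --cookies option) is a seven-column tab-separated text format. As a sketch, with illustrative cookie values:

```python
# Build one line of a Netscape-format cookies.txt file. The seven
# tab-separated columns are: domain, include-subdomains flag, path,
# secure flag, expiry (Unix timestamp), name, value. The example
# values below are made up for illustration.
def netscape_line(domain, path, secure, expires, name, value):
    include_subdomains = "TRUE" if domain.startswith(".") else "FALSE"
    return "\t".join([
        domain,
        include_subdomains,
        path,
        "TRUE" if secure else "FALSE",
        str(expires),
        name,
        value,
    ])

line = netscape_line(".example.com", "/", True, 1735689600, "session", "abc123")
```

A file of such lines (plus the conventional "# Netscape HTTP Cookie File" header) can be passed to yt-dlp with --cookies cookies.txt.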
🏒

SpiderCompanyData

Company Data Enrichment (v2.36.0)

Enrich leads with official company data from government registries.

  • US SEC EDGAR (public companies)
  • UK Companies House (5M+ companies)
  • EU VIES VAT validation
  • Directors and officers data
  • 24-hour caching

Available Guides

v2.18.0: SpiderFuzzer Deduplication

Automatic record deduplication across all job types with per-client data isolation.

  • Per-Record Unique Flag: Each record marked with fuzziq_unique: true/false
  • Standalone API: Check and manage records via /api/v1/fuzziq/* endpoints
  • Response Filtering: Use fuzziq_unique_only: true to return only new records
  • Isolated Schemas: Separate PostgreSQL schemas per client for complete data isolation
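
As a sketch of how a client might consume the per-record flag client-side (the fuzziq_unique field name comes from the docs above; the record contents are made up for illustration):

```python
# Hypothetical client-side helper: keep only records the deduplication
# layer flagged as new for this client.
def unique_records(records):
    """Keep only records marked fuzziq_unique: true."""
    return [r for r in records if r.get("fuzziq_unique")]

batch = [
    {"name": "Acme Ltd", "fuzziq_unique": True},
    {"name": "Acme Ltd", "fuzziq_unique": False},   # already seen in an earlier job
    {"name": "Globex Inc", "fuzziq_unique": True},
]
fresh = unique_records(batch)  # the two unique records remain
```

Alternatively, setting fuzziq_unique_only: true in the request asks the API to do this filtering server-side, as noted above.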

v2.15.0: Orchestrated Campaigns

Common Use Cases

Content Aggregation

Extract articles, blog posts, and documentation from multiple sources for content analysis or aggregation platforms.

Example: News monitoring, competitor content analysis, research aggregation

E-commerce Data

Scrape product information, prices, and reviews from e-commerce sites for price monitoring or market research.

Example: Price comparison tools, inventory monitoring, product catalog building

Local Business Research

Extract business information from Google Maps for lead generation, market research, or directory creation.

Example: B2B prospecting, competitive analysis, local SEO research

Real Estate & Property Data

Gather property listings, prices, and details for real estate analysis and market trends.

Example: Property aggregators, market analysis tools, investment research

Job Board Aggregation

Collect job postings from multiple sources to create comprehensive job search platforms.

Example: Job aggregators, salary analysis, hiring trend research

How SpiderIQ Works

Processing Flow

  1. Submit - Client submits a job via API
  2. Queue - Job is queued for processing
  3. Process - Available worker picks up and processes the job
  5. Store - Results are saved (screenshots to Cloudflare R2, structured data to the database)
  5. Retrieve - Client polls for results and receives data
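
The submit-then-poll flow above can be sketched as a small loop. This is a minimal illustration, not the official client: the shape of the job document (a "status" field with terminal values "completed" and "failed") is an assumption to check against the API reference.

```python
import time

def poll_for_result(fetch_status, interval=3.0, timeout=60.0, sleep=time.sleep):
    """Poll until the job reaches a terminal state or the timeout expires.

    fetch_status is any callable returning the job document as a dict,
    e.g. a wrapper around a GET on the job's status endpoint. A 2-5 s
    interval balances responsiveness against the rate limit.
    """
    waited = 0.0
    while waited <= timeout:
        job = fetch_status()
        if job.get("status") in ("completed", "failed"):
            return job
        sleep(interval)
        waited += interval
    raise TimeoutError("job did not finish within %.0fs" % timeout)

# Simulated job that completes on the third poll:
states = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "result": {"ok": True}},
])
done = poll_for_result(lambda: next(states), interval=0.01, sleep=lambda _: None)
```

Injecting the fetch and sleep callables keeps the loop testable without real HTTP calls; in production, fetch_status would issue the GET request and the default time.sleep would apply.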

Architecture

SpiderIQ is built on a scalable, distributed architecture:

  • API Gateway - FastAPI-based REST API
  • Message Queue - Job distribution system
  • Workers - Distributed scraping workers (Docker containers)
  • Database - Stores job metadata and results
  • Cache - Redis for performance optimization
  • CDN Storage - Cloudflare R2 for screenshots

Worker Types

  • SpiderSite Workers - 70 workers for website scraping
  • SpiderMaps Workers - 42 workers for Google Maps scraping
  • SpiderVerify Workers - 10 workers for email verification
  • SpiderPeople Workers - 1 worker for LinkedIn research
  • SpiderBrowser Workers - 1 worker for anti-detect browser management

Performance & Limits

Rate Limits

info

Standard Rate Limit: 100 requests per minute per client

A burst allowance of 20 requests covers occasional spikes. Contact us for higher limits.

Processing Times

Job Type                    Average Time   Range
SpiderSite (simple page)    5-15s          3-30s
SpiderSite (with AI)        10-25s         5-45s
SpiderMaps                  3-8s           2-15s
SpiderVerify (single)       2-5s           1-10s
SpiderVerify (bulk 100)     30-60s         20-120s
SpiderPeople (profile)      5-10s          3-15s
SpiderPeople (search)       5-15s          3-20s
SpiderPeople (research)     15-30s         10-45s

Queue Capacity

  • Normal load: < 20 jobs queued
  • Moderate load: 20-50 jobs queued
  • High load: > 50 jobs queued

Use the Queue Stats endpoint to monitor current load.

Best Practices

tip

Poll efficiently: Use 2-5 second intervals when polling for results to balance responsiveness and rate limit compliance.

tip

Handle rate limits: Implement exponential backoff when you receive 429 (Too Many Requests) responses.
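
A minimal sketch of such a backoff schedule, using full jitter; the base delay and cap are illustrative choices, not SpiderIQ requirements:

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based).

    Doubles the window each attempt, caps it, and draws a random point
    inside it ("full jitter") so many clients don't retry in lockstep.
    """
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

On a 429 response, sleep for backoff_delay(attempt) before retrying, and give up after a fixed number of attempts; honor a Retry-After header if one is present.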

tip

Check queue load: Use /system/queue-stats before submitting bulk jobs to avoid overwhelming the queue.
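
That pre-flight check can be sketched with the thresholds from the Queue Capacity section; the jobs_queued field name is an assumption about the queue-stats response shape, so verify it against the API reference:

```python
# Classify queue depth using the documented load bands
# (normal < 20, moderate 20-50, high > 50 jobs queued).
def queue_load(stats):
    depth = stats.get("jobs_queued", 0)
    if depth < 20:
        return "normal"
    if depth <= 50:
        return "moderate"
    return "high"

def should_submit_bulk(stats):
    """Only push a large batch when the queue is not under high load."""
    return queue_load(stats) != "high"
```

In practice, stats would be the parsed JSON from a GET on /system/queue-stats, and a client could sleep and re-check when should_submit_bulk returns False.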

tip

Store job IDs: Save job IDs in your database to retrieve results later if needed.

warning

Respect robots.txt: While SpiderIQ can scrape most sites, ensure you have permission and respect robots.txt directives.

Need Help?

Next Steps

Get Credentials

Contact admin@spideriq.ai to get your API credentials

Read the Quickstart

Follow our 5-minute quickstart guide to submit your first job

Explore Guides

Learn about website scraping and explore the API reference

Build Your Integration

Use the API reference to build your integration