Guides Overview

Welcome to SpiderIQ Guides

This section provides comprehensive guides and tutorials to help you get the most out of SpiderIQ's web scraping and Google Maps data extraction capabilities.

What is SpiderIQ?

SpiderIQ is a high-performance API service that provides six specialized capabilities:

🌍 SpiderSite

Website Scraping

Extract content from any website using the Crawl4AI library with optional AI-powered data extraction.

  • Full-page markdown conversion
  • AI-powered content extraction
  • Screenshot capture
  • Metadata extraction
πŸ—ΊοΈ

SpiderMaps

Google Maps Scraping

Extract business information from Google Maps using Playwright browser automation.

  • Business details (name, address, phone)
  • Reviews and ratings
  • Business hours
  • Categories and photos
  • Campaign System (v2.14.0): Multi-location orchestration
πŸ“§

SpiderVerify

Email Verification

Verify email addresses at the SMTP level without sending actual emails.

  • Deliverability checking
  • Disposable email detection
  • Role account identification
  • Quality scoring (0-100)
πŸ‘€

SpiderPeople

Decision Maker Discovery (v2.17.0)

Find the right people behind companies using ICP-based search.

  • Natural language search by role + location
  • Profile lookup by LinkedIn URL
  • AI research reports
  • Experience & education data

🌐 SpiderBrowser

Anti-Detect Browser Management (v2.29.0)

Manage persistent authenticated browser sessions at scale with Camoufox.

  • C++-level fingerprint spoofing
  • VNC web access for manual login/CAPTCHA
  • Cookie export (Netscape format for yt-dlp)
  • SpiderProxy mobile IP integration
  • Profile warmup automation
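
The Netscape cookie-file layout mentioned above (the format yt-dlp consumes via its --cookies option) is a seven-column tab-separated text format. As a sketch, with illustrative cookie values:

```python
# Build one line of a Netscape-format cookies.txt file. The seven
# tab-separated columns are: domain, include-subdomains flag, path,
# secure flag, expiry (Unix timestamp), name, value. The example
# values below are made up for illustration.
def netscape_line(domain, path, secure, expires, name, value):
    include_subdomains = "TRUE" if domain.startswith(".") else "FALSE"
    return "\t".join([
        domain,
        include_subdomains,
        path,
        "TRUE" if secure else "FALSE",
        str(expires),
        name,
        value,
    ])

line = netscape_line(".example.com", "/", True, 1735689600, "session", "abc123")
```

A file of such lines (plus the conventional "# Netscape HTTP Cookie File" header) can be passed to yt-dlp with --cookies cookies.txt.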
🏒

SpiderCompanyData

Company Data Enrichment (v2.36.0)

Enrich leads with official company data from government registries.

  • US SEC EDGAR (public companies)
  • UK Companies House (5M+ companies)
  • EU VIES VAT validation
  • Directors and officers data
  • 24-hour caching

Available Guides

v2.18.0: SpiderFuzzer Deduplication

Automatic record deduplication across all job types with per-client data isolation.

  • Per-Record Unique Flag: Each record marked with fuzziq_unique: true/false
  • Standalone API: Check and manage records via /api/v1/fuzziq/* endpoints
  • Response Filtering: Use fuzziq_unique_only: true to return only new records
  • Isolated Schemas: Separate PostgreSQL schemas per client for complete data isolation
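
As a sketch of how a client might consume the per-record flag client-side (the fuzziq_unique field name comes from the docs above; the record contents are made up for illustration):

```python
# Hypothetical client-side helper: keep only records the deduplication
# layer flagged as new for this client.
def unique_records(records):
    """Keep only records marked fuzziq_unique: true."""
    return [r for r in records if r.get("fuzziq_unique")]

batch = [
    {"name": "Acme Ltd", "fuzziq_unique": True},
    {"name": "Acme Ltd", "fuzziq_unique": False},   # already seen in an earlier job
    {"name": "Globex Inc", "fuzziq_unique": True},
]
fresh = unique_records(batch)  # the two unique records remain
```

Alternatively, setting fuzziq_unique_only: true in the request asks the API to do this filtering server-side, as noted above.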

v2.15.0: Orchestrated Campaigns

Common Use Cases

Content Aggregation

Extract articles, blog posts, and documentation from multiple sources for content analysis or aggregation platforms.

Example: News monitoring, competitor content analysis, research aggregation

E-commerce Data

Scrape product information, prices, and reviews from e-commerce sites for price monitoring or market research.

Example: Price comparison tools, inventory monitoring, product catalog building

Local Business Research

Extract business information from Google Maps for lead generation, market research, or directory creation.

Example: B2B prospecting, competitive analysis, local SEO research

Real Estate & Property Data

Gather property listings, prices, and details for real estate analysis and market trends.

Example: Property aggregators, market analysis tools, investment research

Job Board Aggregation

Collect job postings from multiple sources to create comprehensive job search platforms.

Example: Job aggregators, salary analysis, hiring trend research

How SpiderIQ Works

Processing Flow

  1. Submit - Client submits a job via API
  2. Queue - Job is queued for processing
  3. Process - Available worker picks up and processes the job
  5. Store - Results are saved (screenshots to Cloudflare R2, structured data to the database)
  5. Retrieve - Client polls for results and receives data
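
The submit-then-poll flow above can be sketched as a small loop. This is a minimal illustration, not the official client: the shape of the job document (a "status" field with terminal values "completed" and "failed") is an assumption to check against the API reference.

```python
import time

def poll_for_result(fetch_status, interval=3.0, timeout=60.0, sleep=time.sleep):
    """Poll until the job reaches a terminal state or the timeout expires.

    fetch_status is any callable returning the job document as a dict,
    e.g. a wrapper around a GET on the job's status endpoint. A 2-5 s
    interval balances responsiveness against the rate limit.
    """
    waited = 0.0
    while waited <= timeout:
        job = fetch_status()
        if job.get("status") in ("completed", "failed"):
            return job
        sleep(interval)
        waited += interval
    raise TimeoutError("job did not finish within %.0fs" % timeout)

# Simulated job that completes on the third poll:
states = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "result": {"ok": True}},
])
done = poll_for_result(lambda: next(states), interval=0.01, sleep=lambda _: None)
```

Injecting the fetch and sleep callables keeps the loop testable without real HTTP calls; in production, fetch_status would issue the GET request and the default time.sleep would apply.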

Architecture

SpiderIQ is built on a scalable, distributed architecture:

  • API Gateway - FastAPI-based REST API
  • Message Queue - Job distribution system
  • Workers - Distributed scraping workers (Docker containers)
  • Database - Stores job metadata and results
  • Cache - Redis for performance optimization
  • CDN Storage - Cloudflare R2 for screenshots

Worker Types

  • SpiderSite Workers - 70 workers for website scraping
  • SpiderMaps Workers - 42 workers for Google Maps scraping
  • SpiderVerify Workers - 10 workers for email verification
  • SpiderPeople Workers - 1 worker for LinkedIn research
  • SpiderBrowser Workers - 1 worker for anti-detect browser management

Performance & Limits

Rate Limits

info

Standard Rate Limit: 100 requests per minute per client

A burst allowance of 20 requests covers occasional spikes. Contact us for higher limits.

Processing Times

Job Type                    Average Time   Range
SpiderSite (simple page)    5-15s          3-30s
SpiderSite (with AI)        10-25s         5-45s
SpiderMaps                  3-8s           2-15s
SpiderVerify (single)       2-5s           1-10s
SpiderVerify (bulk 100)     30-60s         20-120s
SpiderPeople (profile)      5-10s          3-15s
SpiderPeople (search)       5-15s          3-20s
SpiderPeople (research)     15-30s         10-45s

Queue Capacity

  • Normal load: < 20 jobs queued
  • Moderate load: 20-50 jobs queued
  • High load: > 50 jobs queued

Use the Queue Stats endpoint to monitor current load.

Best Practices

tip

Poll efficiently: Use 2-5 second intervals when polling for results to balance responsiveness and rate limit compliance.

tip

Handle rate limits: Implement exponential backoff when you receive 429 (Too Many Requests) responses.
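
A minimal sketch of such a backoff schedule, using full jitter; the base delay and cap are illustrative choices, not SpiderIQ requirements:

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based).

    Doubles the window each attempt, caps it, and draws a random point
    inside it ("full jitter") so many clients don't retry in lockstep.
    """
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

On a 429 response, sleep for backoff_delay(attempt) before retrying, and give up after a fixed number of attempts; honor a Retry-After header if one is present.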

tip

Check queue load: Use /system/queue-stats before submitting bulk jobs to avoid overwhelming the queue.
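
That pre-flight check can be sketched with the thresholds from the Queue Capacity section; the jobs_queued field name is an assumption about the queue-stats response shape, so verify it against the API reference:

```python
# Classify queue depth using the documented load bands
# (normal < 20, moderate 20-50, high > 50 jobs queued).
def queue_load(stats):
    depth = stats.get("jobs_queued", 0)
    if depth < 20:
        return "normal"
    if depth <= 50:
        return "moderate"
    return "high"

def should_submit_bulk(stats):
    """Only push a large batch when the queue is not under high load."""
    return queue_load(stats) != "high"
```

In practice, stats would be the parsed JSON from a GET on /system/queue-stats, and a client could sleep and re-check when should_submit_bulk returns False.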

tip

Store job IDs: Save job IDs in your database to retrieve results later if needed.

warning

Respect robots.txt: While SpiderIQ can scrape most sites, ensure you have permission and respect robots.txt directives.

Need Help?

Next Steps

Get Credentials

Contact admin@spideriq.ai to get your API credentials

Read the Quickstart

Follow our 5-minute quickstart guide to submit your first job

Explore Guides

Learn about website scraping and explore the API reference

Build Your Integration

Use the API reference to build your integration