## Overview
SpiderFuzzer is SpiderIQ’s built-in deduplication system that prevents duplicate records across your scraping campaigns. Available since v2.18.0, SpiderFuzzer provides:

- Per-client isolation - Your data is stored in a separate PostgreSQL schema
- Automatic deduplication - Enable via payload flags on any job type
- Standalone API - Check and manage records directly via dedicated endpoints
- Multi-field matching - Match on email, google_place_id, linkedin_url, phone, or domain
## Two Ways to Use SpiderFuzzer
### 1. Automatic Mode (In Jobs)
Add `fuzziq_enabled: true` to any job payload and records are automatically checked. Each record in the results then carries a `fuzziq_unique` flag, as in the sketch below.
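A minimal sketch of the flow, assuming a synchronous job-submission endpoint (`/jobs`), base URL, auth header, and response shape, none of which are specified on this page; `fuzziq_enabled` and the per-record `fuzziq_unique` flag are the documented parts:

```python
import requests

BASE_URL = "https://api.spideriq.example/v1"        # hypothetical base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # auth scheme assumed

# Submit a job with deduplication enabled. The "/jobs" path, job type,
# and query field are illustrative; fuzziq_enabled is documented above.
job = requests.post(
    f"{BASE_URL}/jobs",
    headers=HEADERS,
    json={
        "type": "spidermaps",
        "query": "coffee shops in Austin",
        "fuzziq_enabled": True,  # turn on SpiderFuzzer for this job
    },
).json()

# Each result record carries a fuzziq_unique flag; keep only new records.
unique_records = [r for r in job.get("records", []) if r.get("fuzziq_unique")]
```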
### 2. Standalone API
Use SpiderFuzzer endpoints directly for custom workflows:

| Endpoint | Use Case |
|---|---|
| `POST /fuzziq/check` | Check single record |
| `POST /fuzziq/check-batch` | Check up to 100 records |
| `POST /fuzziq/canonical/import` | Bulk import up to 1000 records |
| `GET /fuzziq/canonical` | List your canonical records |
| `GET /fuzziq/canonical/stats` | Get database statistics |
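As a sketch of a standalone call (the base URL, auth header, request field names, and response shape are assumptions; the endpoint and the key fields come from this page):

```python
import requests

BASE_URL = "https://api.spideriq.example/v1"        # hypothetical base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # auth scheme assumed

# Check a single business record against your canonical database.
# Field names mirror the Record Types section; "record_type" is assumed.
resp = requests.post(
    f"{BASE_URL}/fuzziq/check",
    headers=HEADERS,
    json={
        "record_type": "business",
        "google_place_id": "ChIJ-example",  # highest-priority match field
        "company_name": "Blue Bottle Coffee",
        "phone": "+1 510-653-3394",
    },
)
print(resp.json())  # match result, e.g. with confidence 1.0 (shape assumed)
```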
## Record Types
SpiderFuzzer supports four record types, each with different matching priorities:

| Type | Description | Key fields | Best for |
|---|---|---|---|
| Business | Google Maps businesses | `google_place_id`, `company_name`, `phone` | SpiderMaps results |
| Contact | People/contacts | `email`, `full_name`, `linkedin_url` | SpiderSite extracted contacts |
| Email | Email-only records | `email` | SpiderVerify results |
| Profile | LinkedIn profiles | `linkedin_url`, `full_name` | SpiderPeople results |
## Match Types
SpiderFuzzer checks fields in priority order (first match wins):

| Priority | Field | Description |
|---|---|---|
| 1 | `google_place_id` | Exact match (best for businesses) |
| 2 | `email` | Exact match (normalized, lowercase) |
| 3 | `linkedin_url` | Exact match (normalized) |
| 4 | `phone` | Exact match (normalized, digits only) |
| 5 | `company_domain` | Exact match |
| 6 | `exact_hash` | SHA256 hash of all normalized fields |

All matches are returned with `confidence: 1.0` (exact matching only; no fuzzy matching yet).
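If you want to pre-deduplicate locally before calling the API, you can mirror the normalizations named in the table. This sketch applies exactly those rules (lowercased email, digits-only phone); the field ordering used for `exact_hash` below is an assumption, since the server's exact hashing scheme is not documented here:

```python
import hashlib

def normalize(record: dict) -> dict:
    """Apply the normalizations described in the match-type table."""
    out = dict(record)
    if "email" in out:
        out["email"] = out["email"].strip().lower()  # normalized, lowercase
    if "phone" in out:
        out["phone"] = "".join(c for c in out["phone"] if c.isdigit())  # digits only
    return out

def exact_hash(record: dict) -> str:
    """SHA256 over all normalized fields (key ordering here is an assumption)."""
    normalized = normalize(record)
    joined = "|".join(f"{k}={normalized[k]}" for k in sorted(normalized))
    return hashlib.sha256(joined.encode()).hexdigest()
```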
## Common Workflows
### Deduplicate Multi-Location Campaigns

When scraping the same business type across multiple cities, the same chains appear in each location. Enabling deduplication on every job filters out the repeats, as in the sketch below.
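A sketch of the pattern; the job-submission endpoint, job type, and query field are illustrative assumptions, while `fuzziq_enabled` and `fuzziq_unique_only` are the documented payload flags:

```python
import requests

BASE_URL = "https://api.spideriq.example/v1"        # hypothetical base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # auth scheme assumed

# The same chains show up in every city; dedupe across all three jobs.
for city in ["Austin", "Dallas", "Houston"]:
    requests.post(
        f"{BASE_URL}/jobs",
        headers=HEADERS,
        json={
            "type": "spidermaps",                # job type name assumed
            "query": f"coffee shops in {city}",
            "fuzziq_enabled": True,
            "fuzziq_unique_only": True,  # only return records not seen before
        },
    )
```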
### Pre-seed from CRM

Before running campaigns, import your existing customers so they’re automatically excluded (see the sketch below).
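For example (the endpoint is documented above; the request body shape and `record_type` field are assumptions):

```python
import requests

BASE_URL = "https://api.spideriq.example/v1"        # hypothetical base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # auth scheme assumed

# Import existing CRM contacts (up to 1000 per request) so future jobs
# treat them as already-known records.
crm_contacts = [
    {"record_type": "contact", "email": "jane@acme.com", "full_name": "Jane Doe"},
    {"record_type": "contact", "email": "bob@globex.com", "full_name": "Bob Roe"},
]
resp = requests.post(
    f"{BASE_URL}/fuzziq/canonical/import",
    headers=HEADERS,
    json={"records": crm_contacts},
)
```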
### Check Before Expensive Operations

Use the SpiderFuzzer batch check to filter records before running SpiderSite or SpiderVerify, as shown below.
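A sketch combining batch check with the `idempotency_key` retry safeguard described under Best Practices; the request and response field names other than `idempotency_key` and `fuzziq_unique` are assumptions:

```python
import uuid

import requests

BASE_URL = "https://api.spideriq.example/v1"        # hypothetical base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # auth scheme assumed

scraped_emails = ["a@example.com", "b@example.com"]  # output of a prior job

candidates = [{"record_type": "email", "email": e} for e in scraped_emails]

# Batch-check up to 100 records; the idempotency_key makes retries safe.
resp = requests.post(
    f"{BASE_URL}/fuzziq/check-batch",
    headers=HEADERS,
    json={"records": candidates[:100], "idempotency_key": str(uuid.uuid4())},
).json()

# Only pass records not already in the canonical database on to
# SpiderVerify (response shape assumed).
new_records = [r for r in resp.get("results", []) if r.get("fuzziq_unique")]
```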
## Job Payload Options

Add these to any job payload:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `fuzziq_enabled` | boolean | client default | Enable FuzzIQ for this job |
| `fuzziq_unique_only` | boolean | `false` | Only return unique records |
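As a payload fragment (the surrounding job fields are omitted; only the two flags are documented):

```python
payload = {
    # ...your usual job fields...
    "fuzziq_enabled": True,       # override the client default for this job
    "fuzziq_unique_only": False,  # keep duplicates, but flag them
}
```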
## Data Isolation
Each client has a completely isolated PostgreSQL schema.

## API Reference
- **Check Single** - Check one record for duplicates
- **Check Batch** - Check up to 100 records
- **List Records** - Browse your canonical database
- **Add Record** - Manually add a record
- **Bulk Import** - Import up to 1000 records
- **Statistics** - View database stats
## Best Practices
**Seed your database before campaigns**

Import existing customers, competitors, or blocked domains before running campaigns. This prevents wasting resources on records you already have.
**Use batch check for efficiency**

Instead of checking records one by one, use `check-batch` with up to 100 records per request. For imports, use `canonical/import` with up to 1000 records.
**Enable `fuzziq_unique_only` for clean results**
When you only want new records, set `fuzziq_unique_only: true` in your job payload. This filters duplicates server-side so you only process unique data.
**Use idempotency keys for retries**
When using `check-batch`, include an `idempotency_key` to safely retry failed requests without creating duplicate canonical entries.

SpiderFuzzer is automatically enabled for all clients. Contact support if you need to adjust your settings or have questions about your canonical database.
