# Instagram Profile Scraping

## Overview

SpiderPublicInstagram allows you to extract public profile data from Instagram without requiring login credentials. This is useful for:

- **Lead Enrichment**: Add Instagram presence and contact info to existing leads
- **Influencer Research**: Build databases with verified follower counts and engagement metrics
- **Contact Discovery**: Extract business emails and phone numbers from profiles
- **Brand Monitoring**: Track competitor Instagram presence

### No Login Required

SpiderPublicInstagram uses Instagram's public web API endpoint. It does not require Instagram login credentials, making it safe and compliant for public data extraction.
## Quick Start

### 1. Submit a Profile Scraping Job

```bash
curl -X POST "https://spideriq.ai/api/v1/jobs/spiderPublicInstagram/submit" \
  -H "Authorization: Bearer $CLIENT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "payload": {
      "username": "natgeo"
    }
  }'
```

### 2. Check Job Status

```bash
curl "https://spideriq.ai/api/v1/jobs/{job_id}/status" \
  -H "Authorization: Bearer $CLIENT_TOKEN"
```

### 3. Get Results

```bash
curl "https://spideriq.ai/api/v1/jobs/{job_id}/results" \
  -H "Authorization: Bearer $CLIENT_TOKEN"
```
## Input Formats

SpiderPublicInstagram accepts several input formats:

**Username only**

```json
{
  "payload": {
    "username": "natgeo"
  }
}
```

**Full URL**

```json
{
  "payload": {
    "instagram_url": "https://instagram.com/natgeo"
  }
}
```

**URL with parameters**

```json
{
  "payload": {
    "instagram_url": "https://www.instagram.com/natgeo/?igsh=abc123"
  }
}
```

**With @ symbol**

```json
{
  "payload": {
    "username": "@natgeo"
  }
}
```
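All four formats resolve to the same profile. The worker normalizes input server-side, but if you want to deduplicate inputs before submitting jobs, a minimal client-side sketch (the helper name is ours, not part of the API):

```python
from urllib.parse import urlparse

def normalize_username(value):
    """Reduce any accepted input format to a bare Instagram username."""
    value = value.strip()
    if value.startswith(("http://", "https://")):
        # Full URL, possibly with query parameters: take the first path segment
        path = urlparse(value).path
        value = path.strip("/").split("/")[0]
    # Strip a leading @ if present
    return value.lstrip("@")

print(normalize_username("@natgeo"))                                        # natgeo
print(normalize_username("https://www.instagram.com/natgeo/?igsh=abc123"))  # natgeo
```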
## What Data Can You Extract?

### Profile Information

| Field | Description | Always Available |
|---|---|---|
| `username` | Instagram handle | Yes |
| `full_name` | Display name | Yes |
| `bio` | Profile biography | Public profiles only |
| `external_url` | Website link | If configured |
| `profile_pic_url` | Profile image URL | Yes |

### Engagement Metrics

| Field | Description |
|---|---|
| `follower_count` | Number of followers |
| `following_count` | Number following |
| `post_count` | Total posts |

### Account Type Flags

| Field | Description |
|---|---|
| `is_business_account` | Business account |
| `is_professional_account` | Creator/professional |
| `is_verified` | Blue checkmark |
| `is_private` | Private profile |

### Business Information (Business Accounts Only)

| Field | Description |
|---|---|
| `business_category` | Category (e.g., "Restaurant") |
| `business_email` | Contact email |
| `business_phone` | Contact phone |

### Extracted Contacts

| Field | Description |
|---|---|
| `bio_emails` | Emails found in bio text |
| `bio_phones` | Phone numbers found in bio text |
## Contact Extraction

SpiderPublicInstagram extracts contact information from two sources:

### 1. Business Profile Settings

Business accounts can configure contact information in their profile settings. This appears as:

- `business_email`: Official contact email
- `business_phone`: Official contact phone

### 2. Bio Text Parsing

Many users include contact information directly in their bio text. SpiderPublicInstagram uses regex patterns to extract:

- **Email addresses**: Standard email format detection
- **Phone numbers**: US format, international format, and raw digits

Example bio:

```
Contact us: hello@company.com | +1 (555) 123-4567
```

Extracted:

```json
{
  "bio_emails": ["hello@company.com"],
  "bio_phones": ["+1 (555) 123-4567"]
}
```
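The worker's exact patterns are internal; a minimal sketch of the bio-parsing approach, using illustrative regexes of our own, looks like this:

```python
import re

# Illustrative patterns only — not the worker's actual internal regexes.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d{1,3}[\s.-]?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")

def extract_contacts(bio):
    """Pull email addresses and phone-like strings out of free-form bio text."""
    return {
        "bio_emails": EMAIL_RE.findall(bio),
        "bio_phones": PHONE_RE.findall(bio),
    }

print(extract_contacts("Contact us: hello@company.com | +1 (555) 123-4567"))
# {'bio_emails': ['hello@company.com'], 'bio_phones': ['+1 (555) 123-4567']}
```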
### Enable Contact Extraction

Contact extraction is enabled by default. To disable it, set `extract_contact_from_bio: false` in your payload.
## Profile Image Hosting

Instagram CDN URLs can expire. SpiderPublicInstagram can upload profile images to SpiderMedia for permanent hosting:

```json
{
  "payload": {
    "username": "natgeo",
    "store_profile_image": true
  }
}
```

The response includes both URLs:

```json
{
  "profile_pic_url": "https://scontent-xxx.cdninstagram.com/...",
  "profile_pic_url_hosted": "https://media.spideriq.ai/client-xxx/instagram_profile_natgeo.jpg"
}
```

| URL Type | Pros | Cons |
|---|---|---|
| `profile_pic_url` | Original quality | May expire |
| `profile_pic_url_hosted` | Permanent, fast CDN | Stored in your quota |
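In application code, a simple fallback keeps display working whether or not hosting was enabled for the job (a sketch; the helper name is ours):

```python
def display_image_url(profile):
    """Prefer the permanent SpiderMedia copy; fall back to the Instagram CDN URL."""
    return profile.get("profile_pic_url_hosted") or profile.get("profile_pic_url")

profile = {
    "profile_pic_url": "https://scontent-xxx.cdninstagram.com/...",
    "profile_pic_url_hosted": "https://media.spideriq.ai/client-xxx/instagram_profile_natgeo.jpg",
}
print(display_image_url(profile))
# https://media.spideriq.ai/client-xxx/instagram_profile_natgeo.jpg
```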
## Batch Processing

To process multiple profiles, submit jobs in a loop:

```python
import time

import requests

profiles = ["natgeo", "nasa", "google", "microsoft"]
job_ids = []

for username in profiles:
    response = requests.post(
        "https://spideriq.ai/api/v1/jobs/spiderPublicInstagram/submit",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json"
        },
        json={"payload": {"username": username}}
    )
    job_ids.append(response.json()["job_id"])
    time.sleep(1)  # Small delay between submissions

print(f"Submitted {len(job_ids)} jobs")
```
## Combining with Other Workers

### Instagram → SpiderSite Pipeline

Extract Instagram data, then scrape the linked website:

```python
# Step 1: Get the Instagram profile
instagram_job = submit_instagram_job("company_handle")
instagram_data = wait_for_results(instagram_job["job_id"])

# Step 2: Scrape the linked website
if instagram_data.get("external_url"):
    website_job = submit_spidersite_job(instagram_data["external_url"])
    website_data = wait_for_results(website_job["job_id"])
```
### Campaign Workflow Integration

SpiderPublicInstagram results can be enriched alongside SpiderMaps campaigns:

1. Run a SpiderMaps campaign to discover businesses
2. Extract Instagram URLs from the business data
3. Submit SpiderPublicInstagram jobs for each Instagram profile
4. Merge the results for comprehensive lead data
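The merge step can be sketched as a pure function. Field names for the Instagram side follow the result tables in this guide; the SpiderMaps business record shape is illustrative:

```python
def merge_instagram_data(business, ig):
    """Merge SpiderPublicInstagram results into a SpiderMaps business record."""
    enriched = dict(business)
    enriched.update({
        "instagram_username": ig.get("username"),
        "instagram_followers": ig.get("follower_count"),
        "instagram_verified": ig.get("is_verified"),
    })
    # Collect and deduplicate any contact info the worker extracted
    emails = [ig.get("business_email"), *ig.get("bio_emails", [])]
    enriched["emails"] = sorted({e for e in emails if e})
    return enriched

business = {"name": "Acme Cafe"}
ig = {
    "username": "acmecafe",
    "follower_count": 1200,
    "is_verified": False,
    "business_email": "hi@acme.com",
    "bio_emails": ["hi@acme.com", "sales@acme.com"],
}
print(merge_instagram_data(business, ig))
```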
## Rate Limits and Best Practices

### Instagram Rate Limits

| Limit | Value |
|---|---|
| Requests per hour per IP | ~200 |
| Built-in delay | 3-10 seconds |

### Best Practices

#### Use Mobile Proxies

Instagram blocks datacenter IPs quickly. SpiderProxy mobile proxies are automatically assigned for production jobs, providing carrier-grade IP addresses.

#### Respect Rate Limits

Don't submit more than 100-200 jobs per hour. The worker includes built-in delays, but submitting too many jobs can still trigger blocks.

#### Handle Private Profiles

Private profiles return limited data. Check for `is_private: true` in results and handle it accordingly in your application.
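A small sketch of that check. Which fields survive for private accounts is an assumption here, based on the Profile Information table marking `bio` as public-profiles-only:

```python
def summarize_profile(result):
    """Keep only the fields that are reliable for the account's privacy level."""
    if result.get("is_private"):
        # Bio text and extracted contacts are unavailable for private accounts
        return {
            "username": result["username"],
            "private": True,
            "follower_count": result.get("follower_count"),
        }
    return {
        "username": result["username"],
        "private": False,
        "follower_count": result.get("follower_count"),
        "bio": result.get("bio"),
        "bio_emails": result.get("bio_emails", []),
    }
```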
#### Use Hosted Images

Always use `profile_pic_url_hosted` for display in your application. Instagram CDN URLs can expire or be blocked.
## Error Handling

### Common Errors
| Error | Cause | Solution |
|---|---|---|
| Profile not found | Username doesn't exist | Verify username is correct |
| Rate limited | Too many requests | Wait and retry later |
| IP blocked | Datacenter IP detected | Use mobile proxy (automatic in production) |
### Retry Strategy

For rate-limit errors, implement exponential backoff:

```python
import time

def get_instagram_profile(username, max_retries=3):
    for attempt in range(max_retries):
        job = submit_job(username)
        result = wait_for_results(job["job_id"])
        if result.get("success"):
            return result["data"]
        if "rate limit" in result.get("error", "").lower():
            wait_time = (2 ** attempt) * 60  # 1, 2, 4 minutes
            time.sleep(wait_time)
        else:
            raise Exception(result.get("error"))
    raise Exception("Max retries exceeded")
```
## Example: Lead Enrichment

A complete example enriching leads with Instagram data:

```python
import time

import requests

API_BASE = "https://spideriq.ai/api/v1"
TOKEN = "your_token"

def enrich_lead_with_instagram(lead):
    """Add Instagram data to a lead record."""
    instagram_handle = lead.get("instagram")
    if not instagram_handle:
        return lead

    # Submit job
    response = requests.post(
        f"{API_BASE}/jobs/spiderPublicInstagram/submit",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"payload": {"username": instagram_handle}}
    )
    job_id = response.json()["job_id"]

    # Wait for results (with polling)
    for _ in range(30):  # Max 30 attempts
        status = requests.get(
            f"{API_BASE}/jobs/{job_id}/status",
            headers={"Authorization": f"Bearer {TOKEN}"}
        ).json()

        if status["status"] == "completed":
            results = requests.get(
                f"{API_BASE}/jobs/{job_id}/results",
                headers={"Authorization": f"Bearer {TOKEN}"}
            ).json()

            # Enrich lead with Instagram data
            lead["instagram_followers"] = results["data"]["follower_count"]
            lead["instagram_verified"] = results["data"]["is_verified"]
            lead["instagram_bio"] = results["data"]["bio"]

            # Add any discovered contacts
            if results["data"].get("business_email"):
                lead.setdefault("emails", []).append(results["data"]["business_email"])
            if results["data"].get("bio_emails"):
                lead.setdefault("emails", []).extend(results["data"]["bio_emails"])
            break
        elif status["status"] == "failed":
            lead["instagram_error"] = status.get("error")
            break

        time.sleep(2)

    return lead

# Usage
lead = {
    "company": "National Geographic",
    "instagram": "natgeo"
}

enriched_lead = enrich_lead_with_instagram(lead)
print(enriched_lead)
```