Overview

SpiderPublicInstagram allows you to extract public profile data from Instagram without requiring login credentials. This is useful for:
  • Lead Enrichment: Add Instagram presence and contact info to existing leads
  • Influencer Research: Build databases with verified follower counts and engagement metrics
  • Contact Discovery: Extract business emails and phone numbers from profiles
  • Brand Monitoring: Track competitor Instagram presence
No Login Required: SpiderPublicInstagram uses Instagram’s public web API endpoint. It does not require Instagram login credentials, making it safe and compliant for public data extraction.

Quick Start

1. Submit a Profile Scraping Job

curl -X POST "https://spideriq.ai/api/v1/jobs/spiderPublicInstagram/submit" \
  -H "Authorization: Bearer $CLIENT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "payload": {
      "username": "natgeo"
    }
  }'

2. Check Job Status

curl "https://spideriq.ai/api/v1/jobs/{job_id}/status" \
  -H "Authorization: Bearer $CLIENT_TOKEN"

3. Get Results

curl "https://spideriq.ai/api/v1/jobs/{job_id}/results" \
  -H "Authorization: Bearer $CLIENT_TOKEN"

Input Formats

SpiderPublicInstagram accepts various input formats:
{
  "payload": {
    "username": "natgeo"
  }
}

What Data Can You Extract?

Profile Information

| Field | Description | Always Available |
| --- | --- | --- |
| username | Instagram handle | Yes |
| full_name | Display name | Yes |
| bio | Profile biography | Public profiles only |
| external_url | Website link | If configured |
| profile_pic_url | Profile image URL | Yes |

Engagement Metrics

| Field | Description |
| --- | --- |
| follower_count | Number of followers |
| following_count | Number of accounts followed |
| post_count | Total posts |

Account Type Flags

| Field | Description |
| --- | --- |
| is_business_account | Business account |
| is_professional_account | Creator/professional |
| is_verified | Blue checkmark |
| is_private | Private profile |

Business Information (Business Accounts Only)

| Field | Description |
| --- | --- |
| business_category | Category (e.g., “Restaurant”) |
| business_email | Contact email |
| business_phone | Contact phone |

Extracted Contacts

| Field | Description |
| --- | --- |
| bio_emails | Emails found in bio text |
| bio_phones | Phone numbers found in bio text |
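
Because emails can arrive through both business_email and bio_emails, the same address may appear twice. The following is a minimal sketch of merging the two fields into one de-duplicated list. The helper name collect_emails is hypothetical, and it assumes the profile is a plain dict using the field names from the tables above.

```python
def collect_emails(profile: dict) -> list:
    """Merge business_email and bio_emails, dropping case-insensitive duplicates."""
    candidates = []
    if profile.get("business_email"):
        candidates.append(profile["business_email"])
    candidates.extend(profile.get("bio_emails") or [])

    seen = set()
    unique = []
    for email in candidates:
        cleaned = email.strip()
        key = cleaned.lower()
        # Keep the first spelling seen for each address.
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique
```

The business email is listed first on the assumption that a contact configured in profile settings is more deliberate than one parsed from free text.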

Contact Extraction

SpiderPublicInstagram extracts contact information from two sources:

1. Business Profile Settings

Business accounts can configure contact information in their profile settings. This appears as:
  • business_email: Official contact email
  • business_phone: Official contact phone

2. Bio Text Parsing

Many users include contact information directly in their bio text. SpiderPublicInstagram uses regex patterns to extract:
  • Email addresses: Standard email format detection
  • Phone numbers: US format, international format, and raw digits
Example bio:
Contact us: hello@company.com | +1 (555) 123-4567
Extracted:
{
  "bio_emails": ["hello@company.com"],
  "bio_phones": ["+1 (555) 123-4567"]
}
Enable Contact Extraction: Contact extraction is enabled by default. To disable it, set extract_contact_from_bio: false in your payload.
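
To illustrate the idea behind bio text parsing, here is a rough sketch of regex-based extraction. The patterns below are illustrative assumptions, not the patterns the worker actually uses, and the function name extract_contacts is hypothetical.

```python
import re

# Illustrative patterns only; the worker's real patterns are not documented here.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional "+", then digits mixed with spaces, dots, dashes, or parentheses.
PHONE_RE = re.compile(r"\+?\(?\d[\d\s().-]{6,}\d")


def extract_contacts(bio: str) -> dict:
    """Return emails and phone-like strings found in a bio."""
    emails = EMAIL_RE.findall(bio)
    # Keep only candidates with a plausible digit count (7 to 15 digits).
    phones = [
        match.strip()
        for match in PHONE_RE.findall(bio)
        if 7 <= sum(ch.isdigit() for ch in match) <= 15
    ]
    return {"bio_emails": emails, "bio_phones": phones}


print(extract_contacts("Contact us: hello@company.com | +1 (555) 123-4567"))
```

Simple patterns like these tend to produce false positives (for example, long numeric IDs that look like phone numbers), so it is worth validating extracted contacts before using them.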

Profile Image Hosting

Instagram CDN URLs can expire. SpiderPublicInstagram can upload profile images to SpiderMedia for permanent hosting:
{
  "payload": {
    "username": "natgeo",
    "store_profile_image": true
  }
}
Response includes both URLs:
{
  "profile_pic_url": "https://scontent-xxx.cdninstagram.com/...",
  "profile_pic_url_hosted": "https://media.spideriq.ai/client-xxx/instagram_profile_natgeo.jpg"
}
| URL Type | Pros | Cons |
| --- | --- | --- |
| profile_pic_url | Original quality | May expire |
| profile_pic_url_hosted | Permanent, fast CDN | Stored in your quota |
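
When store_profile_image is not set, the hosted URL may be missing from the response. A small sketch of choosing which URL to display, assuming the result is a dict with the two fields shown above (the helper name display_image_url is hypothetical):

```python
from typing import Optional


def display_image_url(profile: dict) -> Optional[str]:
    """Prefer the hosted copy, falling back to the Instagram CDN URL."""
    return profile.get("profile_pic_url_hosted") or profile.get("profile_pic_url")
```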

Batch Processing

For processing multiple profiles, submit jobs in a loop:
import time

import requests

token = "your_token"  # Replace with your client token
profiles = ["natgeo", "nasa", "google", "microsoft"]
job_ids = []

for username in profiles:
    response = requests.post(
        "https://spideriq.ai/api/v1/jobs/spiderPublicInstagram/submit",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json"
        },
        json={"payload": {"username": username}}
    )
    job_ids.append(response.json()["job_id"])
    time.sleep(1)  # Small delay between submissions

print(f"Submitted {len(job_ids)} jobs")

Combining with Other Workers

Instagram → SpiderSite Pipeline

Extract Instagram data, then scrape the linked website:
# Step 1: Get Instagram profile
instagram_job = submit_instagram_job("company_handle")
instagram_data = wait_for_results(instagram_job["job_id"])

# Step 2: Scrape the linked website
if instagram_data.get("external_url"):
    website_job = submit_spidersite_job(instagram_data["external_url"])
    website_data = wait_for_results(website_job["job_id"])
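
The pipeline above relies on helpers that are not defined in this guide. As one possible shape for the polling helper, here is a sketch built on the status and results endpoints from Quick Start. The get_json parameter is a hypothetical callable that takes an API path and returns the decoded JSON body, and the signature differs from the one-argument call in the pipeline, so you would bind get_json beforehand. The "completed" and "failed" status values are taken from the lead enrichment example; any other status is assumed to mean the job is still in progress.

```python
import time
from typing import Callable


def wait_for_results(
    job_id: str,
    get_json: Callable[[str], dict],
    poll_interval: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a job until it completes, then return the results body.

    get_json is a hypothetical callable: it receives a path such as
    "/jobs/<id>/status" and returns the parsed JSON response.
    """
    for _ in range(max_attempts):
        status = get_json(f"/jobs/{job_id}/status")
        if status.get("status") == "completed":
            return get_json(f"/jobs/{job_id}/results")
        if status.get("status") == "failed":
            raise RuntimeError(status.get("error") or "job failed")
        sleep(poll_interval)  # Job still in progress; wait before polling again
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")
```

With the requests library, get_json could be a small function that prepends the API base URL, sends the Authorization header, and returns response.json(). Passing the HTTP call in as a parameter keeps the polling logic easy to test without network access.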

Campaign Workflow Integration

SpiderPublicInstagram can enrich the businesses discovered by a SpiderMaps campaign:
  1. Run SpiderMaps campaign to discover businesses
  2. Extract Instagram URLs from business data
  3. Submit SpiderPublicInstagram jobs for each Instagram profile
  4. Merge results for comprehensive lead data
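
Steps 2 and 3 require turning an Instagram URL into the username the payload expects. The following sketch shows one way to do that with the standard library. How SpiderMaps stores Instagram links is not covered here, so the input format is an assumption, as are the helper name username_from_url and the list of non-profile path prefixes.

```python
from typing import Optional
from urllib.parse import urlparse

# Path prefixes that point at posts or features rather than profiles (assumed list).
NON_PROFILE_PREFIXES = {"p", "reel", "reels", "explore", "stories", "tv", "accounts"}


def username_from_url(url: str) -> Optional[str]:
    """Return the profile username from an Instagram URL, or None."""
    if "://" not in url:
        url = "https://" + url  # Allow bare "instagram.com/name" values
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host != "instagram.com" and not host.endswith(".instagram.com"):
        return None
    segments = [segment for segment in parsed.path.split("/") if segment]
    if not segments or segments[0].lower() in NON_PROFILE_PREFIXES:
        return None
    return segments[0].lstrip("@")
```

Links to individual posts or reels return None, so they can be skipped rather than submitted as invalid usernames.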

Rate Limits and Best Practices

Instagram Rate Limits

| Limit | Value |
| --- | --- |
| Requests per hour per IP | ~200 |
| Built-in delay | 3-10 seconds |

Best Practices

  • Instagram blocks datacenter IPs quickly. SpiderProxy mobile proxies are automatically assigned for production jobs, providing carrier-grade IP addresses.
  • Don’t submit more than 100-200 jobs per hour. The worker includes built-in delays, but submitting too many jobs can still trigger blocks.
  • Private profiles return limited data. Check is_private: true in results and handle accordingly in your application.
  • Always use profile_pic_url_hosted for display in your application. Instagram CDN URLs can expire or be blocked.
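
One simple way to stay within the hourly guidance is to space submissions evenly. The sketch below is a hypothetical helper (paced is not part of any SpiderIQ client) that yields items with a fixed pause between them, derived from an hourly budget you choose.

```python
import time
from typing import Callable, Iterable, Iterator


def paced(
    items: Iterable[str],
    max_per_hour: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield items no faster than max_per_hour, pausing between them."""
    interval = 3600.0 / max_per_hour  # Seconds between consecutive items
    for index, item in enumerate(items):
        if index:
            sleep(interval)  # No pause before the first item
        yield item
```

For example, iterating with for username in paced(profiles, max_per_hour=100) would submit one job roughly every 36 seconds. Even spacing is a conservative choice; it does not account for jobs submitted by other processes using the same account.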

Error Handling

Common Errors

| Error | Cause | Solution |
| --- | --- | --- |
| Profile not found | Username doesn’t exist | Verify username is correct |
| Rate limited | Too many requests | Wait and retry later |
| IP blocked | Datacenter IP detected | Use mobile proxy (automatic in production) |
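
The table above can be turned into a small dispatch function so that only retryable errors are retried. This is a sketch that assumes error messages contain the phrases shown in the table; the exact error strings returned by the API may differ, and the action labels are hypothetical.

```python
def next_action(error: str) -> str:
    """Map an error message to a suggested action label."""
    message = (error or "").lower()
    if "rate limit" in message:
        return "retry_later"   # Transient: wait, then resubmit
    if "not found" in message:
        return "fix_username"  # Permanent: retrying will not help
    if "blocked" in message:
        return "check_proxy"   # Environment issue: review proxy setup
    return "investigate"       # Unknown error: surface it to a human
```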

Retry Strategy

For rate limit errors, implement exponential backoff:
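import time

# submit_job and wait_for_results are helpers you define around the
# submit, status, and results endpoints shown in Quick Start.
def get_instagram_profile(username, max_retries=3):
    for attempt in range(max_retries):
        job = submit_job(username)
        result = wait_for_results(job["job_id"])

        if result.get("success"):
            return result["data"]

        error = result.get("error") or ""
        if "rate limit" not in error.lower():
            # Not a rate limit: retrying will not help
            raise RuntimeError(error or "Unknown error")

        # Back off before the next attempt; skip the wait after the last one
        if attempt < max_retries - 1:
            wait_time = (2 ** attempt) * 60  # 1 minute, then 2 minutes
            time.sleep(wait_time)

    raise RuntimeError("Max retries exceeded")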

Example: Lead Enrichment

Complete example enriching leads with Instagram data:
import time

import requests

API_BASE = "https://spideriq.ai/api/v1"
TOKEN = "your_token"

def enrich_lead_with_instagram(lead):
    """Add Instagram data to a lead record."""

    instagram_handle = lead.get("instagram")
    if not instagram_handle:
        return lead

    # Submit job
    response = requests.post(
        f"{API_BASE}/jobs/spiderPublicInstagram/submit",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"payload": {"username": instagram_handle}}
    )
    job_id = response.json()["job_id"]

    # Wait for results (with polling)
    for _ in range(30):  # Max 30 attempts
        status = requests.get(
            f"{API_BASE}/jobs/{job_id}/status",
            headers={"Authorization": f"Bearer {TOKEN}"}
        ).json()

        if status["status"] == "completed":
            results = requests.get(
                f"{API_BASE}/jobs/{job_id}/results",
                headers={"Authorization": f"Bearer {TOKEN}"}
            ).json()

            # Enrich lead with Instagram data
            data = results["data"]
            lead["instagram_followers"] = data.get("follower_count")
            lead["instagram_verified"] = data.get("is_verified")
            lead["instagram_bio"] = data.get("bio")  # May be absent for private profiles

            # Add any discovered contacts
            if data.get("business_email"):
                lead.setdefault("emails", []).append(data["business_email"])
            if data.get("bio_emails"):
                lead.setdefault("emails", []).extend(data["bio_emails"])

            break

        elif status["status"] == "failed":
            lead["instagram_error"] = status.get("error")
            break

        time.sleep(2)

    return lead

# Usage
lead = {
    "company": "National Geographic",
    "instagram": "natgeo"
}

enriched_lead = enrich_lead_with_instagram(lead)
print(enriched_lead)

Next Steps