LinkedIn Company & People Data
Overview
SpiderPublicLinkedin extracts LinkedIn company and people profile data through LinkedIn's Voyager API. This is useful for:
- B2B Lead Generation: Build databases of companies and decision-makers
- Company Research: Get company size, industry, specialties, and headquarters
- People Research: Extract experience, education, skills, and contact info
- Competitive Analysis: Monitor competitor companies and their teams
- Sales Intelligence: Find key contacts at target accounts
LinkedIn Accounts Required
Unlike SpiderPublicInstagram, SpiderPublicLinkedin requires authenticated LinkedIn accounts. You must add accounts via the admin API before submitting jobs.
Quick Start
1. Add a LinkedIn Account (Admin)
First, add at least one LinkedIn account to the pool:
curl -X POST "https://spideriq.ai/api/v1/admin/linkedin/accounts" \
  -H "X-Admin-Key: $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "account_email": "your-linkedin@example.com",
    "password": "your-password",
    "is_new_account": true
  }'
2. Submit a Job
# Get a company profile
curl -X POST "https://spideriq.ai/api/v1/jobs/spiderPublicLinkedin/submit" \
  -H "Authorization: Bearer $CLIENT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "payload": {
      "mode": "get_company",
      "public_id": "microsoft"
    }
  }'
3. Check Job Status
curl "https://spideriq.ai/api/v1/jobs/{job_id}/status" \
  -H "Authorization: Bearer $CLIENT_TOKEN"
4. Get Results
curl "https://spideriq.ai/api/v1/jobs/{job_id}/results" \
  -H "Authorization: Bearer $CLIENT_TOKEN"
Operation Modes
SpiderPublicLinkedin supports four operation modes:
get_company
Fetch detailed company profile including size, industry, specialties, and headquarters.
get_profile
Fetch person profile with experience, education, skills, and current position.
search_companies
Search for companies by keywords and industry filters.
search_people
Search for people by keywords, location, company, or title.
Company Profiles
Get Company by Public ID
{
  "payload": {
    "mode": "get_company",
    "public_id": "microsoft"
  }
}
Get Company by URL
{
  "payload": {
    "mode": "get_company",
    "linkedin_url": "https://linkedin.com/company/microsoft"
  }
}
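When input data mixes bare IDs and full URLs, it can help to normalize everything to one key before submitting, for example to deduplicate a company list. The sketch below assumes that the slug after /company/ or /in/ is the same value the API calls public_id, which the paired examples suggest but the source does not state outright. Both function names are hypothetical helpers, not part of the SpiderIQ API.

```python
from urllib.parse import urlparse


def public_id_from_url(linkedin_url: str) -> str:
    """Return the slug after /company/ or /in/ in a LinkedIn URL."""
    parts = [p for p in urlparse(linkedin_url).path.split("/") if p]
    if len(parts) < 2 or parts[0] not in ("company", "in"):
        raise ValueError(f"Not a company or profile URL: {linkedin_url}")
    return parts[1]


def build_payload(mode: str, identifier: str) -> dict:
    """Accept a public ID or a full URL and always submit by public_id."""
    # Assumption: the URL slug and the public_id are interchangeable.
    if identifier.startswith(("http://", "https://")):
        identifier = public_id_from_url(identifier)
    return {"payload": {"mode": mode, "public_id": identifier}}
```

If the slug assumption does not hold for your data, pass the URL through unchanged in the linkedin_url field instead.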
What Data Can You Extract?
| Field | Description | Example |
|---|---|---|
| name | Company name | Microsoft |
| description | About text | "Every company has a mission..." |
| website | Company website | https://microsoft.com |
| industry | Primary industry | Software Development |
| company_type | Legal type | Public Company |
| founded_year | Year founded | 1975 |
| specialties | Company specialties | ["Cloud", "AI", "Software"] |
| headquarters | HQ location | {"city": "Redmond", "country": "US"} |
| follower_count | LinkedIn followers | 22,000,000 |
| employee_count | Approximate employees | 221,000 |
| employee_range | Size bracket | "10001+" |
Company Result Example
{
  "type": "company",
  "public_id": "microsoft",
  "name": "Microsoft",
  "description": "Every company has a mission...",
  "website": "https://microsoft.com",
  "industry": "Software Development",
  "company_type": "Public Company",
  "founded_year": 1975,
  "specialties": ["Business Software", "Developer Tools", "Cloud Computing"],
  "headquarters": {
    "city": "Redmond",
    "country": "United States"
  },
  "follower_count": 22000000,
  "employee_count": 221000,
  "employee_range": "10001+",
  "logo_url": "https://media.licdn.com/...",
  "logo_url_hosted": "https://media.spideriq.ai/client-xxx/linkedin_company_microsoft.jpg",
  "linkedin_url": "https://www.linkedin.com/company/microsoft"
}
People Profiles
Get Profile by Public ID
{
  "payload": {
    "mode": "get_profile",
    "public_id": "satyanadella"
  }
}
Get Profile by URL
{
  "payload": {
    "mode": "get_profile",
    "linkedin_url": "https://linkedin.com/in/satyanadella"
  }
}
What Data Can You Extract?
| Field | Description | Example |
|---|---|---|
| first_name | First name | Satya |
| last_name | Last name | Nadella |
| headline | Professional headline | Chairman and CEO at Microsoft |
| summary | About text | "As Chairman and CEO..." |
| location | Geographic location | Greater Seattle Area |
| current_company | Current employer | Microsoft |
| current_position | Current job title | Chairman and CEO |
| connections | Connection count | 500+ |
| experience | Work history | Array of positions |
| education | Education history | Array of schools |
| skills | Listed skills | ["Leadership", "Strategy"] |
Profile Result Example
{
  "type": "profile",
  "public_id": "satyanadella",
  "first_name": "Satya",
  "last_name": "Nadella",
  "full_name": "Satya Nadella",
  "headline": "Chairman and CEO at Microsoft",
  "location": "Greater Seattle Area",
  "current_company": "Microsoft",
  "current_position": "Chairman and CEO",
  "connections": 500,
  "experience": [
    {
      "title": "Chairman and CEO",
      "company": "Microsoft",
      "start_date": {"year": 2014, "month": 2},
      "end_date": null
    }
  ],
  "education": [
    {
      "school": "University of Chicago Booth School of Business",
      "degree": "MBA"
    }
  ],
  "skills": ["Leadership", "Strategy", "Technology"],
  "profile_pic_url_hosted": "https://media.spideriq.ai/client-xxx/linkedin_profile_satyanadella.jpg"
}
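Each experience entry carries a structured start_date, and in the example above a null end_date appears to mark a role that is still current. As a rough sketch of how to turn that into a tenure figure, the hypothetical helper below counts whole months in a role. It assumes a finished role uses the same {"year", "month"} shape for end_date, which the source does not show, and treats a missing month as January.

```python
from datetime import date
from typing import Optional


def months_in_role(position: dict, today: Optional[date] = None) -> int:
    """Whole months between start_date and end_date (or today if still current)."""
    start = position["start_date"]
    end = position.get("end_date")
    if end is None:
        # Current role: measure up to the reference date.
        ref = today or date.today()
        end_year, end_month = ref.year, ref.month
    else:
        # Assumption: a finished role uses the same {"year", "month"} shape.
        end_year, end_month = end["year"], end.get("month", 1)
    return (end_year - start["year"]) * 12 + (end_month - start.get("month", 1))
```

Passing today explicitly keeps the calculation reproducible when you process a batch of profiles collected on different days.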
Searching Companies
Find companies matching your criteria:
{
  "payload": {
    "mode": "search_companies",
    "keywords": "AI startup San Francisco",
    "max_results": 20
  }
}
Search with Industry Filter
{
  "payload": {
    "mode": "search_companies",
    "keywords": "fintech",
    "industry": "Financial Services",
    "max_results": 10
  }
}
Search Results
{
  "type": "company_search",
  "keywords": "AI startup San Francisco",
  "total_results": 20,
  "results": [
    {
      "name": "OpenAI",
      "universal_name": "openai",
      "description": "OpenAI is an AI research and deployment company...",
      "industry": "Artificial Intelligence",
      "staff_count": 1500,
      "headquarters": {"city": "San Francisco"},
      "logo_url": "https://media.licdn.com/..."
    }
  ]
}
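Search hits use different field names from full company profiles: universal_name and staff_count here, versus public_id and employee_count in the get_company result. If you store both kinds of record in one table, a small mapping step keeps the columns aligned. The sketch below is a hypothetical helper; it assumes universal_name corresponds to the public_id used elsewhere, which the examples suggest but do not confirm.

```python
from typing import Optional


def normalize_company_hit(hit: dict) -> dict:
    """Map a search_companies hit onto the get_company field names."""
    # Assumption: universal_name is the same slug as public_id.
    slug: Optional[str] = hit.get("universal_name")
    return {
        "public_id": slug,
        "name": hit.get("name"),
        "industry": hit.get("industry"),
        "employee_count": hit.get("staff_count"),
        "headquarters": hit.get("headquarters") or {},
        "linkedin_url": f"https://www.linkedin.com/company/{slug}" if slug else None,
    }
```

A normalized hit can then be passed to a get_company job by public_id when you need the full profile.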
Searching People
Find people matching your criteria:
{
  "payload": {
    "mode": "search_people",
    "keywords": "CTO startup",
    "location": "San Francisco",
    "max_results": 20
  }
}
Search with Filters
{
  "payload": {
    "mode": "search_people",
    "keywords": "software engineer",
    "company": "Google",
    "title": "Senior",
    "location": "New York",
    "max_results": 10
  }
}
Available Filters
| Filter | Description | Example |
|---|---|---|
| keywords | Search terms | "CTO AI startup" |
| location | Geographic filter | "San Francisco" |
| company | Company filter | "Microsoft" |
| title | Title filter | "CEO" |
| max_results | Result limit (1-100) | 20 |
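The filters in the table can be assembled and checked on the client before a job is submitted, which avoids spending a request on a payload that is out of range. The following is a minimal sketch: build_people_search is a hypothetical helper, the default of 20 for max_results is an assumption, and treating keywords as required simply mirrors the examples above rather than a documented rule.

```python
from typing import Optional


def build_people_search(
    keywords: str,
    location: Optional[str] = None,
    company: Optional[str] = None,
    title: Optional[str] = None,
    max_results: int = 20,  # assumed default, not documented
) -> dict:
    """Build a search_people request body, omitting unset filters."""
    if not 1 <= max_results <= 100:
        raise ValueError("max_results must be between 1 and 100")
    payload = {"mode": "search_people", "keywords": keywords, "max_results": max_results}
    for name, value in (("location", location), ("company", company), ("title", title)):
        if value:
            payload[name] = value
    return {"payload": payload}
```

For example, build_people_search("software engineer", company="Google", title="Senior", location="New York", max_results=10) produces the same body as the filtered search shown earlier.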
Logo & Photo Hosting
LinkedIn CDN URLs can expire. Set store_logo: true to have SpiderPublicLinkedin upload the image to SpiderMedia for permanent hosting:
{
  "payload": {
    "mode": "get_company",
    "public_id": "microsoft",
    "store_logo": true
  }
}
Response includes both URLs:
{
  "logo_url": "https://media.licdn.com/...",
  "logo_url_hosted": "https://media.spideriq.ai/client-xxx/linkedin_company_microsoft.jpg"
}
| URL Type | Pros | Cons |
|---|---|---|
| logo_url | Original quality | May expire |
| logo_url_hosted | Permanent, fast CDN | Stored in your quota |
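Given that trade-off, downstream code will usually want the hosted URL when it exists and the LinkedIn CDN URL only as a fallback. A small sketch of that preference, using a hypothetical helper name:

```python
from typing import Optional


def best_logo_url(company: dict) -> Optional[str]:
    """Prefer the permanent hosted URL; fall back to the LinkedIn CDN URL."""
    return company.get("logo_url_hosted") or company.get("logo_url")
```

The function returns None when neither field is present, so callers can decide whether to show a placeholder image.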
Account Management
Adding Accounts
LinkedIn accounts are required for the worker to function. Add them via the admin API:
curl -X POST "https://spideriq.ai/api/v1/admin/linkedin/accounts" \
  -H "X-Admin-Key: $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "account_email": "linkedin@example.com",
    "password": "secure-password",
    "daily_request_limit": 50,
    "is_new_account": true
  }'
Account Warmup
New accounts should be warmed up gradually to avoid detection:
| Day | Daily Limit |
|---|---|
| 1 | 5 requests |
| 2 | 10 requests |
| 3 | 15 requests |
| 4 | 20 requests |
| 5 | 30 requests |
| 6 | 40 requests |
| 7+ | 50 requests |
Set is_new_account: true to enable automatic warmup.
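For capacity planning it can be useful to have the schedule above in code. The sketch below encodes the table as a lookup; it only mirrors the documented numbers and is not the worker's actual implementation, and the names are hypothetical.

```python
# Daily request limits during warmup, copied from the table above.
WARMUP_LIMITS = {1: 5, 2: 10, 3: 15, 4: 20, 5: 30, 6: 40}
STEADY_STATE_LIMIT = 50  # day 7 onward


def warmup_daily_limit(day: int) -> int:
    """Return the request limit for a given day since the account was added."""
    if day < 1:
        raise ValueError("day must be 1 or greater")
    return WARMUP_LIMITS.get(day, STEADY_STATE_LIMIT)


# Total requests a new account can make in its first week.
first_week_total = sum(warmup_daily_limit(d) for d in range(1, 8))
```

Summing the schedule shows that a new account contributes 170 requests in its first seven days, compared with 350 for an account already at 50 per day.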
Monitoring Accounts
# List all accounts
curl "https://spideriq.ai/api/v1/admin/linkedin/accounts" \
  -H "X-Admin-Key: $ADMIN_KEY"
# Get account statistics
curl "https://spideriq.ai/api/v1/admin/linkedin/accounts/stats" \
  -H "X-Admin-Key: $ADMIN_KEY"
Account Statuses
| Status | Description | Action Needed |
|---|---|---|
| active | Working normally | None |
| rate_limited | Hit daily limit | Wait for cooldown (30-120 min) |
| needs_verification | CAPTCHA required | Log in manually to clear |
| banned | Account blocked | Remove or replace |
| inactive | Disabled by admin | Re-enable if needed |
Scaling Capacity
Each LinkedIn account safely handles about 30-50 requests per day, so capacity scales linearly with the number of accounts:
| Accounts | Daily Capacity |
|---|---|
| 1 | 30-50 profiles |
| 5 | 150-250 profiles |
| 10 | 300-500 profiles |
| 20 | 600-1,000 profiles |
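The table is a straight multiplication of the 30-50 safe range by the account count, so the same arithmetic can be run in reverse to size a pool for a target volume. A minimal sketch with hypothetical helper names, sizing conservatively against the lower bound:

```python
import math
from typing import Tuple

SAFE_MIN_PER_ACCOUNT = 30  # lower bound of the safe daily range
SAFE_MAX_PER_ACCOUNT = 50  # upper bound of the safe daily range


def daily_capacity(accounts: int) -> Tuple[int, int]:
    """Return the (low, high) daily profile capacity for a pool of accounts."""
    return accounts * SAFE_MIN_PER_ACCOUNT, accounts * SAFE_MAX_PER_ACCOUNT


def accounts_needed(target_per_day: int) -> int:
    """Accounts required to reach a daily target at the conservative rate."""
    return math.ceil(target_per_day / SAFE_MIN_PER_ACCOUNT)
```

Sizing against the lower bound leaves headroom for accounts that are rate limited, warming up, or awaiting verification on a given day.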
Account Recommendations
- Use accounts at least 1 year old
- Complete profiles with photos and connections
- Spread accounts across different email providers
- Add 2FA for security (session cookies work with 2FA)
Rate Limits and Best Practices
LinkedIn Rate Limits
| Limit | Value |
|---|---|
| Safe requests per account per day | 30-50 |
| Built-in delay between requests | 5-10 seconds |
| Session break every | 20-30 requests |
Best Practices
Use Mobile Proxies
LinkedIn blocks datacenter IPs quickly. SpiderProxy mobile proxies are automatically assigned for production jobs, providing carrier-grade IP addresses.
Respect Rate Limits
Don't submit more than 50 jobs per account per day. The worker includes built-in delays, but excessive volume triggers detection.
Warmup New Accounts
New accounts should start with is_new_account: true to enable the 7-day warmup schedule. Jumping straight to 50 requests/day can trigger blocks.
Use Multiple Accounts
Spread load across multiple accounts. The worker automatically rotates using LRU (least recently used) selection.
Monitor Account Health
Check account stats regularly. Address needs_verification status promptly by logging in manually.
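To make the least-recently-used rotation mentioned above concrete, here is an illustrative sketch of how such a selection could work. It is not the worker's code: the Account fields and the pick_account function are assumptions chosen for the example, and only the status names come from this guide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Account:
    email: str
    status: str          # e.g. "active", "rate_limited", "banned"
    last_used: float     # epoch seconds of the most recent request
    requests_today: int
    daily_limit: int


def pick_account(accounts: List[Account], now: float) -> Optional[Account]:
    """Pick the eligible account that has been idle the longest."""
    eligible = [
        a for a in accounts
        if a.status == "active" and a.requests_today < a.daily_limit
    ]
    if not eligible:
        return None  # corresponds to the "No available accounts" error
    chosen = min(eligible, key=lambda a: a.last_used)
    # Record the usage so the next call rotates to a different account.
    chosen.last_used = now
    chosen.requests_today += 1
    return chosen
```

Because every pick updates last_used, consecutive jobs alternate across the pool instead of draining one account first, which is why adding accounts spreads load evenly.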
Error Handling
Common Errors
| Error | Cause | Solution |
|---|---|---|
| No available accounts | All accounts rate limited | Wait or add more accounts |
| Profile not found | Invalid public_id | Verify the LinkedIn URL is correct |
| Rate limited | Account hit daily limit | Automatic cooldown, job may retry |
| Needs verification | CAPTCHA triggered | Log in manually to clear |
| IP blocked | Datacenter IP detected | Use mobile proxy (automatic) |
Retry Strategy
For failed jobs, implement exponential backoff:
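import time


def get_linkedin_profile(public_id, max_retries=3):
    """Fetch a profile, backing off when the account pool is exhausted."""
    for attempt in range(max_retries):
        # submit_job and wait_for_results are the helpers defined in the
        # lead database example at the end of this guide.
        job = submit_job({"mode": "get_profile", "public_id": public_id})
        result = wait_for_results(job["job_id"])
        if result.get("success"):
            return result["data"]
        # The error field may be missing or None, so normalize it first.
        error = (result.get("error") or "").lower()
        if "rate limit" in error:
            wait_time = (2 ** attempt) * 60  # 1, 2, 4 minutes
            time.sleep(wait_time)
        elif "no available accounts" in error:
            time.sleep(300)  # Wait 5 minutes for cooldown
        else:
            # Any other error is not retryable.
            raise RuntimeError(result.get("error"))
    raise RuntimeError("Max retries exceeded")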
Combining with Other Workers
LinkedIn → SpiderSite Pipeline
Extract LinkedIn data, then scrape the company website:
# submit_linkedin_job, submit_spidersite_job, and wait_for_results are
# placeholder helpers that wrap the job submission and polling endpoints.

# Step 1: Get LinkedIn company profile
linkedin_job = submit_linkedin_job("get_company", "microsoft")
linkedin_data = wait_for_results(linkedin_job["job_id"])
company = linkedin_data["data"]

# Step 2: Scrape the company website, if LinkedIn lists one
site = {}  # stays empty when there is no website or the scrape fails
if company.get("website"):
    website_job = submit_spidersite_job(company["website"])
    website_data = wait_for_results(website_job["job_id"])
    site = website_data.get("data") or {}

# Step 3: Combine data
enriched = {
    "company": company["name"],
    "linkedin_followers": company["follower_count"],
    "employee_count": company["employee_count"],
    "website_emails": site.get("emails", []),
    "website_phones": site.get("phones", [])
}
SpiderMaps → LinkedIn Pipeline
Discover businesses on Google Maps, then enrich with LinkedIn:
# Step 1: Get businesses from Google Maps
maps_job = submit_spidermaps_job("AI companies San Francisco")
businesses = wait_for_results(maps_job["job_id"])

# Step 2: Enrich each business with LinkedIn
for business in businesses["data"]["results"]:
    company_name = business["name"]
    # Search for the company on LinkedIn
    linkedin_job = submit_linkedin_job(
        "search_companies",
        keywords=company_name,
        max_results=1
    )
    linkedin_data = wait_for_results(linkedin_job["job_id"])
    # A failed job has no "data" key, so fall back to an empty list.
    matches = (linkedin_data.get("data") or {}).get("results") or []
    if matches:
        business["linkedin"] = matches[0]
Example: Building a Lead Database
Complete example building a B2B lead database:
import requests
import time

API_BASE = "https://spideriq.ai/api/v1"
TOKEN = "your_token"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}


def submit_job(payload):
    """Submit a SpiderPublicLinkedin job and return the parsed JSON response."""
    response = requests.post(
        f"{API_BASE}/jobs/spiderPublicLinkedin/submit",
        headers=HEADERS,
        json={"payload": payload},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def wait_for_results(job_id, max_wait=60):
    """Poll the job every 2 seconds until it finishes or max_wait seconds pass."""
    for _ in range(max_wait // 2):
        status = requests.get(
            f"{API_BASE}/jobs/{job_id}/status", headers=HEADERS, timeout=30
        ).json()
        if status["status"] == "completed":
            return requests.get(
                f"{API_BASE}/jobs/{job_id}/results", headers=HEADERS, timeout=30
            ).json()
        elif status["status"] == "failed":
            return {"success": False, "error": status.get("error")}
        time.sleep(2)
    return {"success": False, "error": "Timeout"}


def build_lead_database(keywords, max_companies=10, people_per_company=5):
    leads = []

    # Step 1: Search for companies
    company_job = submit_job({
        "mode": "search_companies",
        "keywords": keywords,
        "max_results": max_companies
    })
    company_results = wait_for_results(company_job["job_id"])
    if not company_results.get("success"):
        print(f"Company search failed: {company_results.get('error')}")
        return leads
    companies = company_results["data"]["results"]
    print(f"Found {len(companies)} companies")

    # Step 2: For each company, find key people
    for company in companies:
        company_name = company.get("name")
        print(f"\nProcessing: {company_name}")

        # Search for decision makers at this company
        people_job = submit_job({
            "mode": "search_people",
            "keywords": "CEO CTO VP Director",
            "company": company_name,
            "max_results": people_per_company
        })
        people_results = wait_for_results(people_job["job_id"])
        if people_results.get("success"):
            for person in people_results["data"]["results"]:
                leads.append({
                    "company": company_name,
                    "company_industry": company.get("industry"),
                    "company_size": company.get("staff_count"),
                    "person_name": f"{person.get('first_name', '')} {person.get('last_name', '')}".strip(),
                    "person_title": person.get("headline"),
                    "person_linkedin": f"https://linkedin.com/in/{person.get('public_id')}",
                    "company_linkedin": f"https://linkedin.com/company/{company.get('universal_name')}"
                })

        # Respect rate limits between companies
        time.sleep(5)

    return leads


# Build database
leads = build_lead_database("AI startup San Francisco", max_companies=5)
print(f"\nBuilt database with {len(leads)} leads")
for lead in leads[:5]:
    print(f"- {lead['person_name']} at {lead['company']}")