SpiderFacebookPage extracts business information from Facebook pages. Submit a Facebook page URL and receive structured data including contact details, follower counts, ratings, and profile pictures hosted on SpiderMedia.
Single URL Per Job: Unlike SpiderMaps (which returns 100+ businesses per search), SpiderFacebookPage processes one Facebook page at a time. This is ideal for enriching existing leads with Facebook data.
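The examples on this page call a `scrape_facebook_page` helper that wraps the submit-and-poll cycle. A minimal sketch is shown below. The submit and results endpoints match the ones used elsewhere on this page, but the job-id field (`id`), the `completed` status value, and the nesting of page fields under a `data` key are assumptions to verify against the actual API responses:

```python
import time
import requests

API = "https://api.example.com/v1"  # Placeholder: use your real base URL
headers = {"Authorization": "Bearer YOUR_API_KEY"}

def scrape_facebook_page(url, poll_interval=5, timeout=120):
    """Submit one Facebook page URL and poll until the job finishes."""
    job = requests.post(
        f"{API}/jobs/spiderFacebookPage/submit",
        headers=headers,
        json={"payload": {"url": url}},
    )
    job_id = job.json()["id"]  # Assumption: submit response carries the job id as 'id'

    deadline = time.time() + timeout
    while time.time() < deadline:
        result = requests.get(f"{API}/jobs/{job_id}/results", headers=headers)
        data = result.json()
        if data.get("status") == "failed":
            return None
        if data.get("status") == "completed":  # Assumed terminal status value
            return data.get("data")  # Assumption: page fields nested under 'data'
        time.sleep(poll_interval)
    return None  # Timed out
```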
Collect and host profile pictures for a directory:
```python
facebook_pages = [
    "https://www.facebook.com/Nike",
    "https://www.facebook.com/Adidas",
    "https://www.facebook.com/Puma",
]

profile_pictures = {}
for url in facebook_pages:
    data = scrape_facebook_page(url)
    if data and data.get('profile_picture_stored'):
        profile_pictures[data['name']] = {
            'hosted_url': data['profile_picture_stored'],     # Permanent URL
            'original_url': data.get('profile_picture_url'),  # May expire
            'facebook_id': data.get('facebook_id'),
        }

# Use hosted URLs in your application
for name, pics in profile_pictures.items():
    print(f"{name}: {pics['hosted_url']}")
```
Some pages cannot be scraped due to privacy settings:
```python
result = requests.get(f"{API}/jobs/{job_id}/results", headers=headers)
data = result.json()

if data.get('status') == 'failed':
    error = data.get('error', '')
    if 'Library exited with code 1' in error:
        print("Page is private or restricted")
    else:
        print(f"Failed: {error}")
```
For transient failures, wrap the call in a simple retry loop:

```python
import time

def scrape_with_retry(url, max_retries=3):
    """Scrape with retry on failure."""
    for attempt in range(max_retries):
        result = scrape_facebook_page(url)
        if result:
            return result
        print(f"Attempt {attempt + 1} failed, retrying...")
        time.sleep(10)  # Wait before retry
    return None
```
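The fixed 10-second pause is the simplest policy. If failures are mostly transient (timeouts, temporary blocks), an exponential backoff such as `time.sleep(10 * 2 ** attempt)` recovers faster from short glitches while backing off harder when a page keeps failing.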
Recommended Rate: Submit no more than 10 Facebook page jobs per minute to avoid potential blocking.
```python
import time

urls = [...]  # Your list of Facebook URLs

for i, url in enumerate(urls):
    job = requests.post(
        f"{API}/jobs/spiderFacebookPage/submit",
        headers=headers,
        json={"payload": {"url": url}}
    )

    # Rate limit: max 10 per minute
    if (i + 1) % 10 == 0:
        print("Rate limit pause...")
        time.sleep(60)
```
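Note that this loop only submits work. Capture whatever job identifier each `job` response returns so you can fetch the output later from the `/jobs/{job_id}/results` endpoint shown in the error-handling example above.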
Use Business Pages: SpiderFacebookPage works best with Facebook Business Pages. Personal profiles often have privacy restrictions that prevent scraping.
Validate URLs First: Before submitting many jobs, verify your URLs are valid Facebook page URLs to avoid wasting API calls (a sketch combining this check with result caching follows this list).
Cache Results: Store Facebook page data with the facebook_id as a unique key to avoid re-scraping the same pages.
Handle Missing Data: Not all pages have all fields. Always check if a field exists before using it.
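To make these practices concrete, here is a small sketch that combines URL validation with facebook_id-keyed caching, reusing the `scrape_facebook_page` helper from above. The regex and the two in-memory dicts are illustrative choices, not part of the API:

```python
import re

# Illustrative pattern: accepts https://www.facebook.com/<page> style URLs
FACEBOOK_PAGE_URL = re.compile(r"^https?://(www\.)?facebook\.com/[^/?#]+", re.IGNORECASE)

url_to_id = {}  # URL -> facebook_id, so repeated URLs skip the API
pages = {}      # facebook_id -> page data (the unique cache key)

def get_page(url):
    """Validate the URL, then scrape with caching to avoid duplicate jobs."""
    if not FACEBOOK_PAGE_URL.match(url):
        return None  # Invalid URL: don't waste an API call
    if url in url_to_id:
        return pages[url_to_id[url]]
    data = scrape_facebook_page(url)
    if data and data.get('facebook_id'):
        url_to_id[url] = data['facebook_id']
        pages[data['facebook_id']] = data
    return data
```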
Combine SpiderFacebookPage with SpiderMaps to enrich map search results with Facebook data:

```python
# Find businesses on Google Maps, then enrich with Facebook
maps_job = submit_maps_job("coffee shops Berlin")
businesses = get_results(maps_job)

for biz in businesses['data']['businesses']:
    # Look for a Facebook link in the website field
    website = biz.get('website', '')
    if 'facebook.com' in website.lower():
        fb_data = scrape_facebook_page(website)
        if fb_data:
            biz['facebook_followers'] = fb_data.get('followers')
            biz['facebook_rating'] = fb_data.get('rating')
```