When data collection fails because the fetched page source is missing content, you can try the following solutions:
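Use a headless browser:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
url = "https://example.com"  # placeholder; replace with the target page
options = Options()
# options.headless is deprecated and removed in newer Selenium 4 releases;
# pass the Chrome flag instead
options.add_argument("--headless=new")
driver = webdriver.Chrome(options=options)
try:
    driver.get(url)
    html = driver.page_source  # HTML after JavaScript has run
finally:
    driver.quit()  # always release the browser process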
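Before paying the cost of a real browser, it may be worth checking whether the plain HTTP response is really missing content. The sketch below is one possible heuristic, not part of the original answer: it treats a page as JavaScript-rendered when it loads scripts but carries very little visible text. The name `looks_js_rendered` and the `min_text_chars` threshold of 200 are assumptions chosen for illustration and would need tuning per site.

```python
from html.parser import HTMLParser


class _VisibleTextCounter(HTMLParser):
    """Counts visible text characters and <script> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.text_chars = 0
        self.script_tags = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore script and style bodies; count only text a reader would see.
        if self._skip_depth == 0:
            self.text_chars += len(data.strip())


def looks_js_rendered(html, min_text_chars=200):
    """Heuristic (an assumption, not a rule): scripts present, little text."""
    counter = _VisibleTextCounter()
    counter.feed(html)
    counter.close()
    return counter.script_tags > 0 and counter.text_chars < min_text_chars
```

If `looks_js_rendered(response.text)` returns `True` for a plain `requests` response, the headless-browser route above is the more likely fix; if it returns `False`, the missing data is more likely caused by blocking or request headers.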
Set browser-like request headers:
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...',
    'Accept-Language': 'en-US,en;q=0.9'
}
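Add a retry mechanism:
import requests
from time import sleep
max_retries = 3
retry_delay = 5  # seconds to wait between attempts
response = None
# url is the target page, as in the headless-browser example
for attempt in range(max_retries):
    try:
        # timeout keeps a stalled connection from hanging the loop
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code == 200:
            break
        print(f"Attempt {attempt+1} returned HTTP {response.status_code}")
    except requests.RequestException as e:
        print(f"Attempt {attempt+1} failed: {e}")
    if attempt < max_retries - 1:
        sleep(retry_delay)  # wait before the next attempt, not after the last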
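A fixed delay can be too aggressive for a server that is rate limiting. As a hedged variation on the retry loop above, the sketch below doubles the wait after each failure (exponential backoff). It is not the original answer's method. The helper name `fetch_with_retry` and its parameters are hypothetical, and it accepts any zero-argument callable so the retry logic stays independent of the HTTP library.

```python
import time


def fetch_with_retry(fetch, max_retries=3, base_delay=5.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    fetch is any zero-argument callable that returns a result or raises.
    The wait doubles after each failure: base_delay, 2 * base_delay, ...
    """
    last_error = None
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception as exc:
            last_error = exc
            # Sleep only if another attempt will follow.
            if attempt < max_retries - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"all {max_retries} attempts failed") from last_error
```

With `requests`, one possible call is `fetch_with_retry(lambda: requests.get(url, headers=headers, timeout=10))`. Note that `requests.get` does not raise on a non-200 status by itself, so the callable would need to call `raise_for_status()` on the response if such statuses should also trigger a retry.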
If none of the methods above solves the problem, you may need to contact the site administrator or consider a paid data service provider.