Getting past anti-bot mechanisms
Here are four ways to get past most anti-bot mechanisms:
Basic bypassing methods
These three methods will get you past most bot detection mechanisms; see https://www.browserless.io/docs/bypass-bot-detection.
1) Use the &headless=false flag:
Some sites check your user-agent, which by default explicitly states that you're running headless Chrome. This is a dead giveaway. You can change it by setting a specific user-agent, but we highly recommend using the &headless=false flag instead, which changes your user-agent to a more credible one.
2) Use the &stealth flag:
This flag applies the puppeteer-extra-plugin-stealth plugin, which uses various techniques to make headless Puppeteer harder to detect.
3) Use our built-in proxy or a third-party proxy:
Finally, the hardest sites to crack check your IP address. Two types of blocks can occur: those based on the type of IP address, and those based on the frequency of requests (rate limits):
Sites that check the type of IP address will detect the data-center IP addresses you use with browserless. To overcome this, a proxy with residential IP addresses is the best option.
Sites that work the first few times and then stop working are probably rate-limiting you, so the type of IP address is not what gets you blocked. In these cases you don't necessarily need a residential proxy; rotating data-center IP addresses should be enough.
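As a rough illustration of the first two methods, the sketch below builds a connection URL that carries both the &headless=false and &stealth flags. The host name, the `token` query parameter, and the helper name `build_ws_endpoint` are assumptions made for this example and are not taken from this article; substitute the endpoint and credentials from your own account.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

# Assumption: placeholder host; use the WebSocket endpoint from your own account.
BASE_ENDPOINT = "wss://chrome.browserless.io"


def build_ws_endpoint(token: str, headful: bool = True, stealth: bool = True,
                      base: str = BASE_ENDPOINT) -> str:
    """Return a connection URL carrying the headless=false and stealth flags."""
    # Assumption: the API token is passed as a `token` query parameter.
    params = [("token", token)]
    if headful:
        params.append(("headless", "false"))
    query = urlencode(params)
    if stealth:
        # The article writes this flag as a bare `&stealth`, with no value.
        query += "&stealth"
    scheme, netloc, path, _, _ = urlsplit(base)
    return urlunsplit((scheme, netloc, path, query, ""))


if __name__ == "__main__":
    print(build_ws_endpoint("YOUR_API_TOKEN"))
```

The resulting string is what you would hand to your automation library's connect call. Keeping the flags in one helper makes it easy to switch them on or off per site while you work out which check is blocking you.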
Advanced bypassing method
Use our new /unblock API to access sites with stronger anti-bot mechanisms. Keep in mind that it consumes more units from your plan, but it does a great job of unblocking the harder sites. Read more about it here:
https://www.browserless.io/blog/2024/02/26/unblock-api
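The following is only a hedged sketch of how a call to the /unblock API might be prepared. Only the /unblock path comes from this article; the host name, the `token` query parameter, the JSON body with a `url` field, and the helper name `build_unblock_request` are assumptions, so check them against the blog post linked above before relying on them. The request is built but deliberately not sent, since sending it consumes units from your plan.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumption: placeholder host; use the endpoint from your own account.
BASE_URL = "https://chrome.browserless.io"


def build_unblock_request(token: str, target_url: str, base: str = BASE_URL) -> Request:
    """Prepare (but do not send) a POST request to the /unblock API."""
    # Assumption: the target site is passed in a JSON body under "url".
    body = json.dumps({"url": target_url}).encode("utf-8")
    return Request(
        f"{base}/unblock?token={quote(token, safe='')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_unblock_request("YOUR_API_TOKEN", "https://example.com")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Because this API costs more units, it is reasonable to try the basic methods first and fall back to a request like this only for sites that still block you.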
We still have a few tricks up our sleeves for paying customers, so feel free to reach out to support@browserless.io for more information.
Updated on: 19/03/2024