vouaobrasil 7 hours ago

> When companies like Cloudflare mischaracterize user-driven AI assistants as malicious bots, they're arguing that any automated tool serving users should be suspect

Strawman. They aren't arguing that any automated tool should be suspect; they're arguing that an automated tool with sufficient computing power should be. By Perplexity's reasoning, I should be able to set up a huge server farm and hit any website with 1,000,000 requests per second, because a single request is not seen as harmful. In this case, of course, the danger with AI is not a DoS attack but an attack against the way the internet is structured and the way websites are supposed to work.

> This overblocking hurts everyone. Consider someone using AI to research medical conditions,

Of course you will put medical conditions in there: appeal to the hypothetical person with a medical problem, a rather contemptible and revolting argument.

> This undermines user choice

What happens to user choice when website designers stop making websites or writing for websites because the lack of direct interaction makes it no longer worthwhile?

> An AI assistant works just like a human assistant.

That's like saying a Ferrari works like someone walking. Yes, they both go from A to B, but the Ferrari can go 400km down a highway much faster than a human. So, no, it has fundamental speed and power differences that change the way the ecosystem works, and you can't ignore the ecosystem.

> This controversy reveals that Cloudflare's systems are fundamentally inadequate for distinguishing between legitimate AI assistants and actual threats.

As a website designer and writer, I consider all AI assistants to be actual threats, along with the entirety of Perplexity and all AI companies. And I'm not the only one: many content creators feel the same and hope your AI assistants are neutralized with as much extreme prejudice as possible.

  • taskforcegemini 16 minutes ago

    As a sysadmin occasionally responsible for resolving load-spikes caused by bots/crawlers: well said!

avallach 5 hours ago

Cloudflare did explain a proper solution: "Separate bots for separate activities". E.g. here: one bot for scraping/indexing, and one for non-persistent user-driven retrieval.

Website owners have a right to block both if they wish. Isn't it obvious that bypassing a bot block is a violation of the owner's right to decide whom to admit?

Perplexity almost seems to believe that "robots.txt was only made for scraping bots, so if our bot is not scraping, it's fair for us to ignore it and bypass the enforcement". And their core business is a bot, so they really should have known better.
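The "separate bots for separate activities" idea is easy to express in robots.txt: a site owner can allow one user agent and block another, and a compliant client checks its own declared user agent before fetching. A minimal sketch with Python's standard-library parser (the bot names here are hypothetical):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt distinguishing a scraping/indexing bot
# from a non-persistent, user-driven fetcher run by the same company.
ROBOTS_TXT = """\
User-agent: ExampleBot-Crawler
Disallow: /

User-agent: ExampleBot-User
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved client consults the rules for its own UA string.
print(parser.can_fetch("ExampleBot-Crawler", "https://example.com/article"))  # False
print(parser.can_fetch("ExampleBot-User", "https://example.com/article"))     # True
```

The enforcement is entirely voluntary, which is the whole point of the dispute: robots.txt only expresses the owner's wishes, and the question is whether bypassing it (or masquerading as a different agent) is acceptable.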

just-tom 8 hours ago

While agents act on behalf of the user, they won't see or click any ads; they won't sign up for any newsletter; they won't buy the website owner a coffee. They don't act as humans just because humans triggered them. They simply take what they need and walk away.

  • Tokumei-no-hito 8 hours ago

    fair point.

    where's the front page CF callout for google search agent? they wouldn't dare. i don't remember the shaming for ad and newsletter pop up blockers.

    that being said, agree with you that sites are not being used the way they were intended. i think this is part of the evolution of the web. it all began with no monetization, then swung far too much into it to the point of abuse. and now legitimate content creators are stuck in the middle.

    what i disagree on is that CF has the right to, again allegedly, shame perplexity on false information. especially when OAI is solving captchas and google is also "misusing" websites.

    i wish i had an answer to how we can evolve the web sustainably. my main gripe is the shaming and virtue signaling.

    • skybrian 7 hours ago

      As far as I know Google's bot respects robots.txt and doesn't try to evade detection?

      • Tokumei-no-hito 7 hours ago

        maybe. the allegations against perplexity are being challenged and i haven't seen any research on google's agent. CF could demonstrate nonpartisanship, and lend credibility to their claims against perplexity, by being transparent about other players in the space.

        (as an aside, not to shift the goalpost to the elephant in the room, but i didn't see any blog posts on the shameless consumption of every single thing on the internet by OAI, google and anthropic. talk about misuse..)

  • 1gn15 6 hours ago

    Do you have an ad-blocker? If so, should website owners be able to disable your ad-blocker via a setting they send to you? It's their content, after all.

    • Rebelgecko 5 hours ago

      As a frugal person it's frustrating when websites block me for using an ad blocker, but I also don't think publishers should be required to send me data

    • beardyw 5 hours ago

      They may, and do, refuse to send you the page. That is a more realistic parallel.

      Without advertising the web would be largely unsupportable financially without per site subscriptions.

  • skybrian 7 hours ago

    Yep, also true of something like curl -s $url | llm --system 'summarize this article'

    • vouaobrasil 7 hours ago

      It's also true that you could dismantle a building with a hammer, which accomplishes the same as dynamite. So why not just sell dynamite at the local hardware store along with hammers?

SilverElfin 10 hours ago

> When companies like Cloudflare mischaracterize user-driven AI assistants as malicious bots, they're arguing that any automated tool serving users should be suspect—a position that would criminalize email clients and web browsers, or any other service a would-be gatekeeper decided they don’t like.

I wonder if Perplexity or others mix the traffic of the two types so they’re indistinguishable, specifically to make this argument.

bobbiechen 6 hours ago

Perplexity claims their traffic was confused with Browserbase's. Based on my experience working in this space, I think this is inevitable at scale without better ways to identify traffic (or, more specifically in this case, AI agents / fetchers).

Zooming out for a second, we might be in an analogous era to open email relays. In a few years, will you need to run an agent through a big service provider because other big service providers only trust each other?

astrange 9 hours ago

Why did Perplexity use ChatGPT to write this? That's a competitor.

Or are they just so bad at writing that their own style looks like it?

  • ronsor 9 hours ago

    Perplexity uses third-party models. They aren't a frontier AI lab like OpenAI, Anthropic, or DeepSeek.

minimaxir 10 hours ago

Official post on their company blog (no idea why they reposted it on Twitter verbatim, that's bad SEO, ironically): https://www.perplexity.ai/hub/blog/agents-or-bots-making-sen...

  • posperson 10 hours ago

    Perplexity using Cloudflare on their own website with the WAF security settings turned up is quite ironic

  • Tokumei-no-hito 10 hours ago

    if what they're saying is true then this was a huge fuckup on CFs part. i was already a bit suspicious when they started gloating about OAI agent since it's been shown to literally state that it's solving a captcha to complete the task.

    i guess it will come down to browserbase corroborating the claims.

    • anon7000 8 hours ago

      Hm, it’s still a tricky question. Do web admins have a right to block agentic AI? Of course they do. They have a right to block whoever they want. Distinguishing between scraper and agent is important, yes, but agents can still produce a lot more traffic than the average human, and it’s not like the agents are optimizing their access patterns.

      • Tokumei-no-hito 8 hours ago

        i agree. i think the point i agree with perplexity on is that CF is a central authority that is claiming to be the gatekeeper while (allegedly) being disingenuous and incapable.

        i also tend to agree with the concept that scraping != consuming on behalf of a user. they explicitly point out that they do not store the data for training, which would fall under scraping by proxy.

faragon 7 hours ago

That should be a per-website setting, e.g. a flag in the site's Cloudflare configuration. That way there would be competition between websites over whether or not to allow AI agents acting on user accounts.