Files
phabricator/src/applications/system/controller/PhabricatorRobotsController.php
epriestley 439821c7b2 Don't require one-time tokens to view file resources
Summary:
Ref T10262. This removes one-time tokens and makes file data responses always-cacheable (for 30 days).

The URI will stop working once any attached object changes its view policy, or the file view policy itself changes.

Files with `canCDN` (totally public data like profile images, CSS, JS, etc.) use "cache-control: public" so they can be CDN'd.

Files without `canCDN` use "cache-control: private" so they won't be cached by the CDN. They could still be cached by a misbehaving local cache, but if you don't want your users seeing one another's secret files, you should configure your local network properly.

Our "Cache-Control" headers were also from 1999 or something, update them to be more modern/sane. I can't find any evidence that any browser has done the wrong thing with this simpler ruleset in the last ~10 years.

Test Plan:
  - Configured alternate file domain.
  - Viewed site: stuff worked.
  - Accessed a file on primary domain, got redirected to alternate domain.
  - Verified proper cache headers for `canCDN` (public) and non-`canCDN` (private) files, as sketched below.
  - Uploaded a file to a task, edited task policy, verified it scrambled the old URI.
  - Reloaded task, new URI generated transparently.
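
The header check can be reproduced with PHP's built-in `get_headers()` (the URL here is a hypothetical placeholder for a real file resource on the alternate domain):

  <?php

  // Fetch the response headers and inspect Cache-Control: expect
  // "public, ..." for canCDN resources and "private, ..." otherwise.
  $url = 'https://files.example.com/file/data/example/profile.png';
  $headers = get_headers($url, 1);
  echo $headers['Cache-Control']."\n";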

Reviewers: chad

Reviewed By: chad

Maniphest Tasks: T10262

Differential Revision: https://secure.phabricator.com/D15642
2016-04-06 14:14:36 -07:00

<?php

final class PhabricatorRobotsController extends PhabricatorController {

  public function shouldRequireLogin() {
    return false;
  }

  public function processRequest() {
    $out = array();

    // Prevent indexing of '/diffusion/', since the content is not generally
    // useful to index, web spiders get stuck scraping the history of every
    // file, and much of the content is Ajaxed in anyway so spiders won't even
    // see it. These pages are also relatively expensive to generate.

    // Note that this still allows commits (at '/rPxxxxx') to be indexed.
    // They're probably not hugely useful, but suffer fewer of the problems
    // Diffusion suffers and are hard to omit with 'robots.txt'.

    $out[] = 'User-Agent: *';
    $out[] = 'Disallow: /diffusion/';

    // Add a small crawl delay (number of seconds between requests) for
    // spiders which respect it. The intent here is to prevent spiders from
    // affecting performance for users. The possible cost is slower indexing,
    // but that seems like a reasonable tradeoff, since most Phabricator
    // installs are probably not hugely concerned about cutting-edge SEO.
    $out[] = 'Crawl-delay: 1';

    $content = implode("\n", $out)."\n";

    // Robots rules are public and change rarely, so this response is safe
    // to cache (and serve from a CDN) for a couple of hours.
    return id(new AphrontPlainTextResponse())
      ->setContent($content)
      ->setCacheDurationInSeconds(phutil_units('2 hours in seconds'))
      ->setCanCDN(true);
  }

}
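
For reference, the response body this controller emits is:

  User-Agent: *
  Disallow: /diffusion/
  Crawl-delay: 1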