Add cached summarization to PhamePost

Summary: Restore summarization. Use the remarkup cache, and try to do it somewhat-intelligently (pick the first paragraph that looks like it's text).

Test Plan:
{F21323}

{F21324}

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T1373

Differential Revision: https://secure.phabricator.com/D3715
This commit is contained in:
epriestley
2012-10-17 08:36:33 -07:00
parent 83bbad8ba0
commit 44c6109bf2
7 changed files with 99 additions and 105 deletions

View File

@@ -495,4 +495,54 @@ final class PhabricatorMarkupEngine {
return $mentions;
}
/**
* Produce a corpus summary, in a way that shortens the underlying text
* without truncating it somewhere awkward.
*
* TODO: We could do a better job of this.
*
* @param string Remarkup corpus to summarize.
* @return string Summarized corpus.
*/
public static function summarize($corpus) {
// Major goals here are:
// - Don't split in the middle of a character (utf-8).
// - Don't split in the middle of, e.g., **bold** text, since
// we end up with hanging '**' in the summary.
// - Try not to pick an image macro, header, embedded file, etc.
// - Hopefully don't return too much text. We don't explicitly limit
// this right now.
$blocks = preg_split("/\n *\n\s*/", trim($corpus));
$best = null;
foreach ($blocks as $block) {
// This is a test for normal spaces in the block, i.e. a heuristic to
// distinguish standard paragraphs from things like image macros. It may
// not work well for non-latin text. We prefer to summarize with a
// paragraph of normal words over an image macro, if possible.
$has_space = preg_match('/\w\s\w/', $block);
// This is a test to find embedded images and headers. We prefer to
// summarize with a normal paragraph over a header or an embedded object,
// if possible.
$has_embed = preg_match('/^[{=]/', $block);
if ($has_space && !$has_embed) {
// This seems like a good summary, so return it.
return $block;
}
if (!$best) {
// This is the first block we found; if everything is garbage just
// use the first block.
$best = $block;
}
}
return $best;
}
}