php-readability

A fork of https://bitbucket.org/fivefilters/php-readability

Readability

This is an extract of the Readability class from a fork of full-text-rss. It can be seen as an improved version of the original php-readability.

Differences

The default php-readability lib is really old and needs to be improved. I found a great fork of full-text-rss from @Dither which improves the Readability class.

  • I've extracted the class from that fork so it can be used out of the box
  • I've added some simple tests
  • I've changed the coding style (CS), run php-cs-fixer, and added a namespace

But the code is still really hard to read and understand.

Requirements

By default, this lib will use the Tidy extension if it's available. Tidy is only used to clean up the given HTML and avoid problems caused by bad HTML structure. Composer will suggest installing it.

If you run into problems parsing content without Tidy installed, please install it and try again.
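
Before reporting a parsing problem, it can help to confirm whether Tidy is actually loaded in your PHP runtime. The snippet below is a minimal diagnostic sketch: `isTidyAvailable()` is a hypothetical helper name, not part of this lib, and it only wraps PHP's built-in `extension_loaded()`. The lib already falls back on its own when Tidy is missing, so this is for troubleshooting only.

```php
<?php

// Hypothetical helper (not part of php-readability): reports whether the
// Tidy extension is loaded in the current PHP runtime.
function isTidyAvailable(): bool
{
    return extension_loaded('tidy');
}

// Print a short diagnostic message for the current environment.
echo isTidyAvailable()
    ? "Tidy is loaded\n"
    : "Tidy is not loaded; badly structured HTML may parse less reliably\n";
```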

Usage

use Readability\Readability;

$url = 'http://www.medialens.org/index.php/alerts/alert-archive/alerts-2013/729-thatcher.html';

// you can use whatever you want to retrieve the html content (Guzzle, Buzz, cURL ...)
$html = file_get_contents($url);

$readability = new Readability($html, $url);
// or without Tidy
// $readability = new Readability($html, $url, 'libxml', false);
$result = $readability->init();

if ($result) {
    // display the title of the page
    echo $readability->getTitle()->textContent;
    // display the *readability* content
    echo $readability->getContent()->textContent;
} else {
    echo 'Looks like we couldn\'t find the content. :(';
}
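
The example above calls `file_get_contents()` directly, which gives no timeout and returns `false` on failure. As a rough sketch of a slightly more defensive fetch, the hypothetical helper `fetchHtml()` below (its name, the 10 second default, and the user agent string are all assumptions for illustration, not part of this lib) adds a timeout through a stream context and returns `null` when nothing usable comes back.

```php
<?php

/**
 * Hypothetical helper (not part of php-readability): fetch a page with a
 * timeout and return null instead of false or an empty body on failure.
 */
function fetchHtml(string $url, int $timeoutSeconds = 10): ?string
{
    // The "http" context options also apply to https:// URLs.
    $context = stream_context_create([
        'http' => [
            'timeout' => $timeoutSeconds,
            'follow_location' => 1,
            'user_agent' => 'php-readability-example/1.0', // assumed value
        ],
    ]);

    // "@" silences the PHP warning; failure is signalled by the return value.
    $html = @file_get_contents($url, false, $context);

    if ($html === false || trim($html) === '') {
        return null;
    }

    return $html;
}
```

When `fetchHtml($url)` returns a string, it can be passed to `new Readability($html, $url)` exactly as in the example above; when it returns `null`, there is nothing to parse and `init()` can be skipped.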

If you want to debug it, or check what's going on, you can inject a logger. It must implement Psr\Log\LoggerInterface (Monolog, for example):

use Readability\Readability;
use Monolog\Logger;
use Monolog\Handler\StreamHandler;

$url = 'http://www.medialens.org/index.php/alerts/alert-archive/alerts-2013/729-thatcher.html';
$html = file_get_contents($url);

$logger = new Logger('readability');
$logger->pushHandler(new StreamHandler('path/to/your.log', Logger::DEBUG));

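$readability = new Readability($html, $url);
$readability->setLogger($logger);
// run the extraction as usual; log messages are written while it runs
$result = $readability->init();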

Main metrics

Overview
Name with owner: j0k3r/php-readability
Primary language: PHP
Programming languages: PHP (language count: 1)
License: Apache License 2.0

Owner activity
Created at: 2014-12-12 09:27:16
Pushed at: 2025-06-03 08:03:47
Last commit at: 2025-06-03 09:22:21
Release count: 44
Last release name: 2.0.7
First release name: v1.0

User engagement
Stargazers count: 173
Watchers count: 8
Fork count: 39
Commits count: 189
Issues count: 30
Open issues count: 6
Pull requests count: 67
Open pull requests count: 4
Closed pull requests count: 6