Goutte

Goutte, a simple PHP Web Scraper

  • Owner: FriendsOfPHP/Goutte

Goutte is a screen scraping and web crawling library for PHP.

Goutte provides a nice API to crawl websites and extract data from the HTML/XML
responses.

Requirements

Goutte depends on PHP 7.1+.

Installation

Install Goutte with Composer_ by adding ``fabpot/goutte`` as a required
dependency in your ``composer.json`` file:

.. code-block:: bash

    composer require fabpot/goutte

Usage

Create a Goutte Client instance (which extends
``Symfony\Component\BrowserKit\HttpBrowser``):

.. code-block:: php

    use Goutte\Client;

    $client = new Client();

Make requests with the ``request()`` method:

.. code-block:: php

    // Go to the symfony.com website
    $crawler = $client->request('GET', 'https://www.symfony.com/blog/');

The method returns a ``Crawler`` object
(``Symfony\Component\DomCrawler\Crawler``).
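
As a hedged illustration of what comes back from ``request()``, the sketch
below runs without network access by swapping in Symfony's ``MockHttpClient``.
The URL ``https://example.test/`` and the HTML body are made up for the
example, and a Composer autoloader at ``vendor/autoload.php`` is assumed:

.. code-block:: php

    <?php
    require 'vendor/autoload.php'; // assumed Composer autoloader path

    use Goutte\Client;
    use Symfony\Component\DomCrawler\Crawler;
    use Symfony\Component\HttpClient\MockHttpClient;
    use Symfony\Component\HttpClient\Response\MockResponse;

    // Canned response standing in for a real page (hypothetical content).
    $response = new MockResponse(
        '<html><head><title>Example page</title></head><body></body></html>',
        ['response_headers' => ['Content-Type' => 'text/html']]
    );

    $client = new Client(new MockHttpClient($response));
    $crawler = $client->request('GET', 'https://example.test/');

    // The return value is a DomCrawler Crawler wrapping the parsed response.
    var_dump($crawler instanceof Crawler);

    // The last response stays available on the client.
    $status = $client->getResponse()->getStatusCode();
    $title = $crawler->filter('title')->text();

    echo $status, ' ', $title, "\n";

Replacing ``MockHttpClient`` with a real HTTP client gives the same flow
against a live site.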

To use your own HTTP settings, create an ``HttpClient`` instance and pass it
to the Goutte client. For example, to set a 60-second request timeout:

.. code-block:: php

    use Goutte\Client;
    use Symfony\Component\HttpClient\HttpClient;

    $client = new Client(HttpClient::create(['timeout' => 60]));

Click on links:

.. code-block:: php

    // Click on the "Security Advisories" link
    $link = $crawler->selectLink('Security Advisories')->link();
    $crawler = $client->click($link);
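
To see what ``click()`` would request, the ``Link`` object can be inspected
on its own. The following is a minimal sketch using an in-memory document; the
markup and the base URI ``https://example.test/blog/`` are hypothetical, and
a Composer autoloader at ``vendor/autoload.php`` is assumed:

.. code-block:: php

    <?php
    require 'vendor/autoload.php'; // assumed Composer autoloader path

    use Symfony\Component\DomCrawler\Crawler;

    // Hypothetical markup; the second argument is the base URI of the page,
    // which is needed to resolve relative links.
    $html = '<html><body>'
        .'<a href="/blog/category/security-advisories">Security Advisories</a>'
        .'</body></html>';
    $crawler = new Crawler($html, 'https://example.test/blog/');

    // selectLink() matches on the link text; link() wraps the first match.
    $link = $crawler->selectLink('Security Advisories')->link();

    echo $link->getMethod(), "\n"; // HTTP method a click would use
    echo $link->getUri(), "\n";    // absolute URI resolved against the base URI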

Extract data:

.. code-block:: php

    // Get the latest posts in this category and display their titles
    $crawler->filter('h2 > a')->each(function ($node) {
        print $node->text()."\n";
    });
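
The same extraction can be tried offline by building a ``Crawler`` from a
string. This sketch assumes made-up markup and a Composer autoloader at
``vendor/autoload.php``; it also collects the values returned by ``each()``
rather than printing inside the closure:

.. code-block:: php

    <?php
    require 'vendor/autoload.php'; // assumed Composer autoloader path

    use Symfony\Component\DomCrawler\Crawler;

    // Hypothetical markup standing in for a fetched page.
    $html = '<html><body>'
        .'<h2><a href="/blog/first">First post</a></h2>'
        .'<h2><a href="/blog/second">Second post</a></h2>'
        .'</body></html>';

    $crawler = new Crawler($html);

    // each() returns an array holding the closure's return value per node.
    $posts = $crawler->filter('h2 > a')->each(function (Crawler $node) {
        return [
            'title' => $node->text(),
            'href' => $node->attr('href'),
        ];
    });

    print_r($posts);

Returning data from the closure keeps extraction separate from output, which
tends to make the scraped values easier to store or test.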

Submit forms:

.. code-block:: php

    $crawler = $client->request('GET', 'https://github.com/');
    $crawler = $client->click($crawler->selectLink('Sign in')->link());
    $form = $crawler->selectButton('Sign in')->form();
    $crawler = $client->submit($form, ['login' => 'fabpot', 'password' => 'xxxxxx']);
    $crawler->filter('.flash-error')->each(function ($node) {
        print $node->text()."\n";
    });
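
Before submitting, it can help to check what a ``Form`` object would send.
The sketch below is an assumption-laden illustration: the login form markup,
its ``/session`` action, and the base URI ``https://example.test/login`` are
invented, and a Composer autoloader at ``vendor/autoload.php`` is assumed:

.. code-block:: php

    <?php
    require 'vendor/autoload.php'; // assumed Composer autoloader path

    use Symfony\Component\DomCrawler\Crawler;

    // Hypothetical login form; field names mirror the example above.
    $html = '<html><body>'
        .'<form action="/session" method="post">'
        .'<input type="text" name="login">'
        .'<input type="password" name="password">'
        .'<input type="submit" value="Sign in">'
        .'</form>'
        .'</body></html>';
    $crawler = new Crawler($html, 'https://example.test/login');

    // form() accepts the field values to set, keyed by field name.
    $form = $crawler->selectButton('Sign in')->form([
        'login' => 'fabpot',
        'password' => 'xxxxxx',
    ]);

    echo $form->getMethod(), ' ', $form->getUri(), "\n"; // request that would be made
    print_r($form->getValues());                         // field values that would be sent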

More Information

Read the documentation of the BrowserKit_, DomCrawler_, and HttpClient_
Symfony Components for more information about what you can do with Goutte.

Pronunciation

Goutte is pronounced ``goot``, i.e. it rhymes with ``boot`` and not ``out``.

Technical Information

Goutte is a thin wrapper around the following Symfony Components:
BrowserKit_, CssSelector_, DomCrawler_, and HttpClient_.

License

Goutte is licensed under the MIT license.

.. _Composer: https://getcomposer.org
.. _BrowserKit: https://symfony.com/components/BrowserKit
.. _DomCrawler: https://symfony.com/doc/current/components/dom_crawler.html
.. _CssSelector: https://symfony.com/doc/current/components/css_selector.html
.. _HttpClient: https://symfony.com/doc/current/components/http_client.html

Main metrics

Overview

  • Name with owner: FriendsOfPHP/Goutte
  • Primary language: PHP
  • Programming languages: PHP (language count: 1)
  • License: MIT License

Owner activity

  • Created at: 2010-04-21 19:21:54
  • Pushed at: 2023-04-01 09:06:44
  • Last commit at: 2023-04-01 11:06:44
  • Release count: 28
  • Last release: v4.0.3 (posted on 2023-04-01 11:06:01)
  • First release: v0.1.0 (posted on 2012-12-02 14:45:14)

User engagement

  • Stargazers: 9.2k
  • Watchers: 346
  • Forks: 1k
  • Commits: 321
  • Issues: 312
  • Open issues: 138
  • Pull requests: 98
  • Open pull requests: 0
  • Closed pull requests: 55