Goutte

Goutte, a simple PHP Web Scraper

  • Owner: FriendsOfPHP/Goutte
  • License: MIT License

Goutte is a screen scraping and web crawling library for PHP.

Goutte provides a nice API to crawl websites and extract data from the HTML/XML
responses.

Requirements

Goutte depends on PHP 7.1+.

Installation

Add ``fabpot/goutte`` as a required dependency in your ``composer.json`` file with Composer_:

.. code-block:: bash

    composer require fabpot/goutte

Usage

Create a Goutte ``Client`` instance (which extends
``Symfony\Component\BrowserKit\HttpBrowser``):

.. code-block:: php

    use Goutte\Client;

    $client = new Client();

Make requests with the request() method:

.. code-block:: php

    // Go to the symfony.com website
    $crawler = $client->request('GET', 'https://www.symfony.com/blog/');

The method returns a ``Crawler`` object
(``Symfony\Component\DomCrawler\Crawler``).

To use your own HTTP settings, you may create and pass an HttpClient
instance to Goutte. For example, to add a 60 second request timeout:

.. code-block:: php

    use Goutte\Client;
    use Symfony\Component\HttpClient\HttpClient;

    // The 'timeout' option is expressed in seconds
    $client = new Client(HttpClient::create(['timeout' => 60]));
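
Because the client accepts any HttpClient instance, the same constructor argument can be used to try out scraping logic without touching the network. The following is a minimal sketch, assuming the ``MockHttpClient`` and ``MockResponse`` classes shipped with the HttpClient component; the HTML, the ``example.test`` URL, and the autoloader path are made up for illustration.

```php
<?php
// Assumption: Composer's autoloader lives next to this script.
require __DIR__.'/vendor/autoload.php';

use Goutte\Client;
use Symfony\Component\HttpClient\MockHttpClient;
use Symfony\Component\HttpClient\Response\MockResponse;

// Canned page returned in place of a real HTTP response (illustrative content).
$html = '<html><head><title>Mock blog</title></head>'
    .'<body><h2><a href="/post">Hello</a></h2></body></html>';

$mockHttpClient = new MockHttpClient(new MockResponse($html, [
    'response_headers' => ['content-type' => 'text/html; charset=UTF-8'],
]));

// Pass the mock exactly like a client built with HttpClient::create().
$client = new Client($mockHttpClient);
$crawler = $client->request('GET', 'https://example.test/blog/');

// Inspect the last response and the parsed document.
$status = $client->getResponse()->getStatusCode();
$title = $crawler->filter('title')->text();

print $status.' '.$title."\n";
```

Swapping the mock for ``HttpClient::create([...])`` should leave the rest of the script unchanged, which is the main reason to inject the HTTP client rather than configure it globally.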

Click on links:

.. code-block:: php

    // Click on the "Security Advisories" link
    $link = $crawler->selectLink('Security Advisories')->link();
    $crawler = $client->click($link);
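
Before clicking, it can be useful to check where a link points. The sketch below works on an in-memory page with the DomCrawler component alone, so no request is sent; the HTML, the ``example.test`` base URI, and the autoloader path are assumptions made for illustration.

```php
<?php
// Assumption: Composer's autoloader lives next to this script.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Illustrative page containing a single link with a root-relative href.
$html = '<html><body>'
    .'<a href="/blog/category/security-advisories">Security Advisories</a>'
    .'</body></html>';

// The second argument is the URI of the page; links are resolved against it.
$crawler = new Crawler($html, 'https://example.test/blog/');

// selectLink() matches on the link text; link() wraps the first match.
$link = $crawler->selectLink('Security Advisories')->link();

// getUri() returns the absolute URI that click() would request.
$uri = $link->getUri();

print $uri."\n";
```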

Extract data:

.. code-block:: php

    // Get the latest posts in this category and display their titles
    $crawler->filter('h2 > a')->each(function ($node) {
        print $node->text()."\n";
    });
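
``each()`` also returns an array of whatever the callback returns, which makes it convenient for collecting structured data instead of printing it. This is a minimal sketch on an in-memory page; the HTML and the autoloader path are assumptions, and the ``h2 > a`` selector is taken from the example above.

```php
<?php
// Assumption: Composer's autoloader lives next to this script.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Illustrative markup with two post headings.
$html = <<<'HTML'
<html><body>
  <h2><a href="/blog/first">First post</a></h2>
  <h2><a href="/blog/second">Second post</a></h2>
</body></html>
HTML;

$crawler = new Crawler($html);

// Each callback result becomes one element of the returned array.
$posts = $crawler->filter('h2 > a')->each(function (Crawler $node) {
    return [
        'title' => $node->text(),
        'href' => $node->attr('href'),
    ];
});

foreach ($posts as $post) {
    print $post['title'].' -> '.$post['href']."\n";
}
```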

Submit forms:

.. code-block:: php

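    $crawler = $client->request('GET', 'https://github.com/');
    // Follow the "Sign in" link to reach the login page
    $crawler = $client->click($crawler->selectLink('Sign in')->link());
    // Select the form through its submit button, then submit it with field values
    $form = $crawler->selectButton('Sign in')->form();
    $crawler = $client->submit($form, ['login' => 'fabpot', 'password' => 'xxxxxx']);
    // Print any error message shown after the login attempt
    $crawler->filter('.flash-error')->each(function ($node) {
        print $node->text()."\n";
    });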
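
To see what a form submission would send before calling ``submit()``, the ``Form`` object can be inspected directly. The sketch below uses an in-memory page with the DomCrawler component, so nothing is submitted; the form markup, the ``/session`` action, the ``example.test`` URI, and the autoloader path are assumptions and do not reflect GitHub's real login page.

```php
<?php
// Assumption: Composer's autoloader lives next to this script.
require __DIR__.'/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical login form used only for illustration.
$html = <<<'HTML'
<html><body>
  <form action="/session" method="post">
    <input type="text" name="login" />
    <input type="password" name="password" />
    <input type="submit" value="Sign in" />
  </form>
</body></html>
HTML;

$crawler = new Crawler($html, 'https://example.test/login');

// form() accepts the field values to set, keyed by field name.
$form = $crawler->selectButton('Sign in')->form([
    'login' => 'fabpot',
    'password' => 'xxxxxx',
]);

$method = $form->getMethod(); // HTTP method taken from the form tag
$uri = $form->getUri();       // absolute URI the form would be sent to
$values = $form->getValues(); // field values that would be submitted

print $method.' '.$uri."\n";
print_r($values);
```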

More Information

Read the documentation of the BrowserKit_, DomCrawler_, and HttpClient_
Symfony Components for more information about what you can do with Goutte.

Pronunciation

Goutte is pronounced "goot", i.e. it rhymes with "boot" and not "out".

Technical Information

Goutte is a thin wrapper around the following Symfony Components:
BrowserKit_, CssSelector_, DomCrawler_, and HttpClient_.

License

Goutte is licensed under the MIT license.

.. _Composer: https://getcomposer.org
.. _BrowserKit: https://symfony.com/components/BrowserKit
.. _DomCrawler: https://symfony.com/doc/current/components/dom_crawler.html
.. _CssSelector: https://symfony.com/doc/current/components/css_selector.html
.. _HttpClient: https://symfony.com/doc/current/components/http_client.html

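Key Metrics

Overview

  • Name and owner: FriendsOfPHP/Goutte
  • Primary programming language: PHP
  • Programming languages: PHP (1 language)
  • License: MIT License

Owner Activity

  • Created: 2010-04-21 19:21:54
  • Last push: 2023-04-01 09:06:44
  • Last commit: 2023-04-01 11:06:44
  • Releases: 28
  • Latest release: v4.0.3 (published 2023-04-01 11:06:01)
  • First release: v0.1.0 (published 2012-12-02 14:45:14)

User Engagement

  • Stars: 9.2k
  • Watchers: 346
  • Forks: 1k
  • Commits: 321
  • Issues: 312
  • Open issues: 138
  • Pull requests: 98
  • Open pull requests: 0
  • Closed pull requests: 55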