Zend_Feed_Pubsubhubbub
Zend_Feed_Pubsubhubbub is an implementation of the PubSubHubbub Core
0.2 Specification (Working Draft). It offers implementations of a Pubsubhubbub
Publisher and Subscriber suited to Zend Framework and other PHP applications.
What is Pubsubhubbub?
Pubsubhubbub is an open, simple, web-scale pubsub protocol. A common use case is to
enable blogs (Publishers) to "push" updates from their RSS or Atom feeds (Topics) to
Subscribers. These Subscribers must have subscribed to the blog's RSS or Atom feed via
a Hub, a central server which is notified of any updates by the Publisher and which
then distributes these updates to all Subscribers. Any feed may advertise that it
supports one or more Hubs using an Atom namespaced link element with a rel attribute
of "hub".
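As a rough illustration of Hub discovery, the sketch below collects the href of every Atom link element with rel="hub" from a feed document. The function name findHubUrls() and the sample feed are hypothetical and not part of Zend_Feed_Pubsubhubbub; only PHP's DOM extension is assumed.

```php
<?php
// Hypothetical helper (not part of Zend_Feed_Pubsubhubbub): return the
// href of every Atom <link rel="hub"> element found in a feed document.
function findHubUrls($feedXml)
{
    $doc = new DOMDocument();
    if (!@$doc->loadXML($feedXml)) {
        return array(); // not well-formed XML, so nothing to discover
    }
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');

    $hubs = array();
    foreach ($xpath->query('//atom:link[@rel="hub"]') as $link) {
        $href = $link->getAttribute('href');
        if ($href !== '') {
            $hubs[] = $href;
        }
    }
    return array_values(array_unique($hubs));
}

// Sample feed invented for this example
$feed = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <link rel="self" href="http://www.example.net/atom"/>
  <link rel="hub" href="http://pubsubhubbub.appspot.com/"/>
  <link rel="hub" href="http://hubbub.example.com"/>
</feed>
XML;

print_r(findHubUrls($feed));
```

Because the XPath expression matches any element in the Atom namespace, the same approach should also pick up atom:link elements embedded in an RSS 2.0 feed.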
Pubsubhubbub has garnered attention because it is a pubsub protocol which is easy to
implement and which operates over HTTP. Its philosophy is to replace the traditional
model in which blog feeds are polled at regular intervals to detect and retrieve
updates. Depending on the polling frequency, it can take a long time for updates to
propagate to interested parties, from aggregators to desktop readers. With a pubsub
system in place, updates are not simply polled by Subscribers but pushed to them,
which eliminates the polling delay. For this reason, Pubsubhubbub forms part of what
has been dubbed the real-time web.
The protocol does not exist in isolation. Pubsub systems have been around for a while,
such as the familiar Jabber Publish-Subscribe protocol, XEP-0060, or the less well known
rssCloud (described in 2001). However, these have not achieved widespread adoption, typically
due to their complexity, poor timing, or lack of suitability for web applications.
rssCloud, which was recently revived as a response to the appearance of Pubsubhubbub, has
also seen its usage increase significantly though it lacks a formal specification and
currently does not support Atom 1.0 feeds.
Perhaps surprisingly given its relatively young age, Pubsubhubbub is already in use, including
in Google Reader and Feedburner, and there are plugins available for WordPress blogs.
Architecture
Zend_Feed_Pubsubhubbub implements two sides of the Pubsubhubbub
0.2 Specification: a Publisher and a Subscriber. It does not currently implement a Hub
Server though this is in progress for a future Zend Framework release.
A Publisher is responsible for notifying all supported Hubs (many can be supported to
add redundancy to the system) of any updates to its feeds, whether they be Atom or RSS
based. This is achieved by pinging the supported Hub Servers with the URL of the updated
feed. In Pubsubhubbub terminology, any updatable resource capable of being subscribed
to is referred to as a Topic. Once a ping is received, the Hub will request the updated
feed, process it for updated items, and forward all updates to all Subscribers
subscribed to that feed.
A Subscriber is any party or application which subscribes to one or more Hubs to receive
updates from a Topic hosted by a Publisher. The Subscriber never directly communicates
with the Publisher since the Hub acts as an intermediary, accepting subscriptions and
sending updates to subscribed Subscribers. The Subscriber therefore communicates only
with the Hub, either to subscribe/unsubscribe to Topics, or when it receives updates
from the Hub. This communication design ("Fat Pings") effectively removes the possibility of
a "Thundering Herd" issue. A Thundering Herd arises in a pubsub system where the Hub merely
informs Subscribers that an update is available, prompting all Subscribers to immediately
retrieve the feed from the Publisher and giving rise to a traffic spike. In Pubsubhubbub, the
Hub distributes the actual update in a "Fat Ping" so the Publisher is not subjected to any
traffic spike.
Zend_Feed_Pubsubhubbub implements Pubsubhubbub Publishers and
Subscribers with the classes Zend_Feed_Pubsubhubbub_Publisher and
Zend_Feed_Pubsubhubbub_Subscriber. In addition, the Subscriber
implementation may handle any feed updates forwarded from a Hub by using
Zend_Feed_Pubsubhubbub_Subscriber_Callback. These classes, their
use cases, and APIs are covered in subsequent sections.
Zend_Feed_Pubsubhubbub_Publisher
In Pubsubhubbub, the Publisher is the party who publishes a live feed and frequently updates
it with new content. This may be a blog, an aggregator, or even a web service with a public
feed based API. In order for these updates to be pushed to Subscribers, the Publisher
must notify all of its supported Hubs that an update has occurred using a simple HTTP POST
request containing the URI of the updated Topic (i.e. the updated RSS or Atom feed). The Hub
will confirm receipt of the notification, fetch the updated feed, and forward any updates to
any Subscribers who have subscribed to that Hub for updates from the relevant feed.
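To make the ping concrete, the following sketch builds the form-encoded body of such a notification. The parameter names hub.mode and hub.url are taken from the Pubsubhubbub 0.2 specification; the helper buildPublishBody() is hypothetical and is not how the component is necessarily implemented.

```php
<?php
// Hypothetical helper: build the application/x-www-form-urlencoded body
// of a Publisher ping. Only the parameter names come from the 0.2 spec.
function buildPublishBody(array $topicUrls)
{
    $pairs = array('hub.mode=publish');
    foreach ($topicUrls as $url) {
        // hub.url may be repeated, once per updated Topic
        $pairs[] = 'hub.url=' . urlencode($url);
    }
    return implode('&', $pairs);
}

echo buildPublishBody(array(
    'http://www.example.net/rss',
    'http://www.example.net/atom',
)), "\n";
```

The body is assembled by hand because http_build_query() cannot emit the same plain key twice.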
By design, this means the Publisher has very little to do except send these Hub pings
whenever its feeds change. As a result, the Publisher implementation is extremely
simple to use and requires very little work to set up.
Zend_Feed_Pubsubhubbub_Publisher implements a full Pubsubhubbub
Publisher. Its setup for use is also simple, requiring mainly that it is configured with
the URI endpoint for all Hubs to be notified of updates, and the URIs of all Topics to
be included in the notifications.
The following example shows a Publisher notifying a collection of Hubs about updates to
a pair of local RSS and Atom feeds. The class retains a collection of errors which
include the Hub URLs, so the notification can be re-attempted later and/or logged if any
notifications happen to fail. Each resulting error array also includes a "response" key
containing the related HTTP response object. In the event of any errors, it is strongly
recommended to attempt the operation for failed Hub Endpoints at least once more at a
future time. This may require the use of either a scheduled task for this purpose or
a job queue though such extra steps are optional.
$publisher = new Zend_Feed_Pubsubhubbub_Publisher;
$publisher->addHubUrls(array(
    'http://pubsubhubbub.appspot.com/',
    'http://hubbub.example.com',
));
$publisher->addUpdatedTopicUrls(array(
    'http://www.example.net/rss',
    'http://www.example.net/atom',
));
$publisher->notifyAll();

// Initialise up front so the list exists even when every ping succeeds
$failedHubs = array();
if (!$publisher->isSuccess()) {
    // check for errors
    $errors = $publisher->getErrors();
    foreach ($errors as $error) {
        $failedHubs[] = $error['hubUrl'];
    }
}

// reschedule notifications for the failed Hubs in $failedHubs
If you prefer having more concrete control over the Publisher, the methods
addHubUrls() and addUpdatedTopicUrls()
pass each array value to the singular addHubUrl() and
addUpdatedTopicUrl() public methods. There are also matching
removeUpdatedTopicUrl() and
removeHubUrl() methods.
You can also skip setting Hub URIs, and notify each in turn using the
notifyHub() method which accepts the URI of a Hub endpoint as
its only argument.
There are no other tasks to cover. The Publisher implementation is very simple since
most of the feed processing and distribution is handled by the selected Hubs. It is
however important to detect errors and reschedule notifications as soon as possible
(with a reasonable maximum number of retries) to ensure notifications reach all
Subscribers. In many cases as a final alternative, Hubs may frequently poll your
feeds to offer some additional tolerance for failures both in terms of their own
temporary downtime or Publisher errors/downtime.
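One possible way to cap retries is sketched below. It assumes error arrays shaped like those described earlier (each with a "hubUrl" key); the function hubsToRetry(), the $attempts bookkeeping array, and the default of three retries are all hypothetical choices, not part of the component.

```php
<?php
// Hypothetical retry bookkeeping. $errors mimics the error arrays
// described above (each has a 'hubUrl' key); $attempts maps a Hub URL
// to the number of failed notifications recorded so far.
function hubsToRetry(array $errors, array &$attempts, $maxRetries = 3)
{
    $retry = array();
    foreach ($errors as $error) {
        $hub = $error['hubUrl'];
        $attempts[$hub] = isset($attempts[$hub]) ? $attempts[$hub] + 1 : 1;
        if ($attempts[$hub] <= $maxRetries) {
            $retry[] = $hub; // still within the retry budget
        }
    }
    return $retry;
}
```

In practice the $attempts array would need to be persisted between runs (for example in a database table or the job queue payload) so that the counter survives across scheduled tasks.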
Zend_Feed_Pubsubhubbub_Subscriber
In Pubsubhubbub, the Subscriber is the party who wishes to receive updates to any Topic (RSS
or Atom feed). They achieve this by subscribing to one or more of the Hubs advertised by
that Topic, usually as a set of one or more Atom 1.0 links with a rel attribute of "hub". The
Hub from that point forward will send an Atom or RSS feed containing all updates to that
Subscriber's Callback URL when it receives an update notification from the Publisher. In
this way, the Subscriber need never actually visit the original feed (though it's still
recommended at some level to ensure updates are retrieved if ever a Hub goes offline). All
subscription requests must contain the URI of the Topic being subscribed and a Callback URL
which the Hub will use to confirm the subscription and to forward updates.
The Subscriber therefore has two roles. The first is to create and manage subscriptions,
including subscribing to new Topics with a Hub, unsubscribing (if necessary), and periodically
renewing subscriptions since they may have a limited validity as set by the Hub. This is handled
by Zend_Feed_Pubsubhubbub_Subscriber.
The second role is to accept updates sent by a Hub to the Subscriber's Callback URL, i.e.
the URI the Subscriber has assigned to handle updates. The Callback URL also handles events
where the Hub contacts the Subscriber to confirm all subscriptions and unsubscriptions.
This is handled by using an instance of
Zend_Feed_Pubsubhubbub_Subscriber_Callback when the Callback URL is
accessed.
Zend_Feed_Pubsubhubbub_Subscriber implements the Pubsubhubbub 0.2
Specification. As this is a new specification version not all Hubs currently implement
it. The new specification allows the Callback URL to include a query string which is
used by this class, but not supported by all Hubs. In the interests of maximising
compatibility it is therefore recommended that the query string component of the
Subscriber Callback URI be presented as a path element, i.e. recognised as a
parameter in the route associated with the Callback URI and used by the application's
Router.
Subscribing and Unsubscribing
Zend_Feed_Pubsubhubbub_Subscriber implements a full Pubsubhubbub
Subscriber capable of subscribing to, or unsubscribing from, any Topic via any Hub
advertised by that Topic. It operates in conjunction with
Zend_Feed_Pubsubhubbub_Subscriber_Callback which accepts requests
from a Hub to confirm all subscription or unsubscription attempts (to prevent
third-party misuse).
Any subscription (or unsubscription) requires the relevant information before
proceeding, i.e. the URI of the Topic (Atom or RSS feed) to be subscribed to for
updates, and the URI of the endpoint for the Hub which will handle the subscription and
forwarding of the updates. The lifetime of a subscription may be determined by the
Hub but most Hubs should support automatic subscription refreshes by checking with
the Subscriber. This is supported by Zend_Feed_Pubsubhubbub_Subscriber_Callback
and requires no other work on your part. It is still strongly recommended that you use
the Hub sourced subscription time to live (ttl) to schedule the creation of new subscriptions
(the process is identical to that for any new subscription) to refresh it with the Hub.
While it should not be necessary per se, it covers cases where a Hub may not support
automatic subscription refreshing and rules out Hub errors for additional redundancy.
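A minimal sketch of such scheduling follows. It assumes you have recorded when the subscription was confirmed and the lease granted by the Hub; the function renewalTimestamp() and the one-day safety margin are hypothetical.

```php
<?php
// Hypothetical scheduling helper: renew a fixed margin before the lease
// runs out. The one-day default margin is an arbitrary assumption.
function renewalTimestamp($subscribedAt, $leaseSeconds, $margin = 86400)
{
    // Never schedule the renewal before the subscription itself
    return $subscribedAt + max(0, $leaseSeconds - $margin);
}

// With a 30 day lease (2592000 seconds) the renewal is due after 29 days
echo renewalTimestamp(time(), 2592000), "\n";
```

A scheduled task could compare this timestamp with the current time and repeat the normal subscription request for any subscription that is due.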
With the relevant information to hand, a subscription can be attempted as
demonstrated below:
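$storage = new Zend_Feed_Pubsubhubbub_Model_Subscription;

$subscriber = new Zend_Feed_Pubsubhubbub_Subscriber;
$subscriber->setStorage($storage);
$subscriber->addHubUrl('http://hubbub.example.com');
$subscriber->setTopicUrl('http://www.example.net/rss.xml');
$subscriber->setCallbackUrl('http://www.mydomain.com/hubbub/callback');
$subscriber->subscribeAll();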
In order to store subscriptions and offer access to this data for general use,
the component requires a database (a schema is provided later in this section).
By default, it is assumed the table name is "subscription" and it utilises
Zend_Db_Table_Abstract in the background meaning it
will use the default adapter you have set for your application. You may also
pass a specific custom Zend_Db_Table_Abstract instance
into the associated model Zend_Feed_Pubsubhubbub_Model_Subscription.
This custom instance may be as simple as one that changes the table name, or as
complex as you deem necessary.
While this Model is offered as a default ready-to-roll solution, you may create your
own Model using any other backend or database layer (e.g. Doctrine) so long as the
resulting class implements the interface
Zend_Feed_Pubsubhubbub_Model_SubscriptionInterface.
Behind the scenes, the Subscriber above will send a request to the Hub endpoint containing the
following parameters (based on the previous example):
Subscription request parameters

Parameter: hub.callback
Value: http://www.mydomain.com/hubbub/callback?xhub.subscription=5536df06b5dcb966edab3a4c4d56213c16a8184
Explanation: The URI used by a Hub to contact the Subscriber and either request
confirmation of a (un)subscription request or send updates from
subscribed feeds. The appended query string contains a custom
parameter (hence the xhub designation). It is a query string
parameter preserved by the Hub and resent with all Subscriber
requests. Its purpose is to allow the Subscriber to identify and
look up the subscription associated with any Hub request in a
backend storage medium. This is a non-standard parameter used by
this component in preference to encoding a subscription key in the
URI path, which is more difficult to implement in a Zend Framework
application.
Nevertheless, since not all Hubs support query string parameters,
we still strongly recommend adding the subscription key as a path component
in the form http://www.mydomain.com/hubbub/callback/5536df06b5dcb966edab3a4c4d56213c16a8184.
This requires defining a route capable of parsing out the final
value of the key, then retrieving the value and passing it to the Subscriber
Callback object. The value would be passed into the method
Zend_Feed_Pubsubhubbub_Subscriber_Callback::setSubscriptionKey().
A detailed example is offered later.

Parameter: hub.lease_seconds
Value: 2592000
Explanation: The number of seconds for which the Subscriber would like a new
subscription to remain valid (i.e. a TTL). Hubs may enforce their own maximum
subscription period. All subscriptions should be renewed by simply
re-subscribing before the subscription period ends to ensure
continuity of updates. Hubs should additionally attempt to automatically
refresh subscriptions before they expire by contacting Subscribers (handled
automatically by the Callback class).

Parameter: hub.mode
Value: subscribe
Explanation: Simple value indicating this is a subscription request.
Unsubscription requests would use the "unsubscribe" value.

Parameter: hub.topic
Value: http://www.example.net/rss.xml
Explanation: The URI of the topic (i.e. Atom or RSS feed) which the Subscriber
wishes to subscribe to for updates.

Parameter: hub.verify (sent twice)
Value: sync, then async
Explanation: Indicates to the Hub the preferred mode of verifying subscriptions
or unsubscriptions. The parameter is sent twice, in order of preference. Technically
this component does not distinguish between the two modes and treats both
equally.

Parameter: hub.verify_token
Value: 3065919804abcaa7212ae89.879827871253878386
Explanation: A verification token returned to the Subscriber by the Hub when it
is confirming a subscription or unsubscription. Offers a measure of
assurance that the confirmation request originates from the correct
Hub to prevent misuse.
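For illustration only, the parameters above could be serialised into a request body along the following lines. The helper encodePairs() is hypothetical; the verify token and the xhub.subscription key are left out because the component generates them, and the real request is built internally by the Subscriber class.

```php
<?php
// Hypothetical helper: encode an ordered list of name/value pairs so
// that a repeated name such as hub.verify is emitted once per value.
function encodePairs(array $pairs)
{
    $out = array();
    foreach ($pairs as $pair) {
        list($name, $value) = $pair;
        $out[] = urlencode($name) . '=' . urlencode($value);
    }
    return implode('&', $out);
}

$body = encodePairs(array(
    array('hub.callback', 'http://www.mydomain.com/hubbub/callback'),
    array('hub.lease_seconds', '2592000'),
    array('hub.mode', 'subscribe'),
    array('hub.topic', 'http://www.example.net/rss.xml'),
    array('hub.verify', 'sync'),   // first preference
    array('hub.verify', 'async'),  // second preference
));
echo $body, "\n";
```

A list of pairs is used instead of an associative array because an associative array cannot hold the same key twice.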
You can modify several of these parameters to indicate a different preference. For
example, you can set a different lease seconds value using
Zend_Feed_Pubsubhubbub_Subscriber::setLeaseSeconds() or show a
preference for the async verify mode by using
setPreferredVerificationMode(Zend_Feed_Pubsubhubbub::VERIFICATION_MODE_ASYNC).
However the Hubs retain the capability to enforce their own preferences and for this
reason the component is deliberately designed to work across almost any set of options
with minimum end-user configuration required. Conventions are great when they work!
While Hubs may require the use of a specific verification mode (both are supported
by Zend_Feed_Pubsubhubbub), you may indicate a specific preference
using the setPreferredVerificationMode() method. In "sync"
(synchronous) mode, the Hub attempts to confirm a subscription as soon as it is
received, and before responding to the subscription request. In "async"
(asynchronous) mode, the Hub will return a response to the subscription request
immediately, and its verification request may occur at a later time. Since
Zend_Feed_Pubsubhubbub implements the Subscriber verification role
as a separate callback class and requires the use of a backend storage medium, it
supports both modes transparently. In terms of end-user performance, however,
asynchronous verification is very much preferred, since it prevents a
poorly performing Hub from tying up server resources and connections for
too long.
Unsubscribing from a Topic follows the exact same pattern as the previous example, with
the exception that we should call unsubscribeAll() instead. The
parameters included are identical to a subscription request with the exception that
"hub.mode" is set to "unsubscribe".
By default, a new instance of Zend_Feed_Pubsubhubbub_Subscriber will
attempt to use a database backed storage medium which defaults to using the default
Zend_Db adapter with a table name of "subscription".
It is recommended to set a custom storage solution where these defaults are not apt either
by passing in a new Model supporting the required interface or by passing a new instance
of Zend_Db_Table_Abstract to the default Model's constructor to change
the used table name.
Handling Subscriber Callbacks
Whenever a subscription or unsubscription request is made, the Hub must verify the
request by forwarding a new verification request to the Callback URL set in the
subscription/unsubscription parameters. To handle these Hub requests, which will include
all future communications containing Topic (feed) updates, the Callback URL should trigger the
execution of an instance of Zend_Feed_Pubsubhubbub_Subscriber_Callback
to handle the request.
The Callback class should be configured to use the same storage medium as the Subscriber
class. Using it is quite simple since most of its work is performed internally.
$storage = new Zend_Feed_Pubsubhubbub_Model_Subscription;
$callback = new Zend_Feed_Pubsubhubbub_Subscriber_Callback;
$callback->setStorage($storage);
$callback->handle();
$callback->sendResponse();

/**
 * Check if the callback resulted in the receipt of a feed update.
 * Otherwise it was either a (un)sub verification request or invalid request.
 * Typically we need do nothing other than add feed update handling - the rest
 * is handled internally by the class.
 */
if ($callback->hasFeedUpdate()) {
    $feedString = $callback->getFeedUpdate();
    /**
     * Process the feed update asynchronously to avoid a Hub timeout.
     */
}
It should be noted that
Zend_Feed_Pubsubhubbub_Subscriber_Callback may independently
parse any incoming query string and other parameters. This is necessary since PHP
alters the structure and keys of a query string when it is parsed into the
$_GET or $_POST superglobals. For example,
duplicate keys overwrite one another so that only the last value survives, and
periods in keys are converted to underscores.
Pubsubhubbub features both of these in the query strings it generates.
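The effect can be reproduced with parse_str(), which applies the same key mangling as the superglobals. The sketch below contrasts it with a hand-written parser that keeps keys and duplicates intact; parseRawQuery() is a hypothetical helper written for this example, and the component's own parsing code may differ.

```php
<?php
// Hypothetical helper illustrating the idea: split the raw query string
// by hand so that keys keep their periods and repeated keys keep every
// value.
function parseRawQuery($query)
{
    $params = array();
    if ($query === '') {
        return $params;
    }
    foreach (explode('&', $query) as $part) {
        $pieces = explode('=', $part, 2);
        $key    = urldecode($pieces[0]);
        $value  = isset($pieces[1]) ? urldecode($pieces[1]) : '';
        $params[$key][] = $value; // one list of values per key
    }
    return $params;
}

$query = 'hub.mode=subscribe&hub.verify=sync&hub.verify=async';

parse_str($query, $mangled);
print_r($mangled);              // keys become hub_mode and hub_verify
print_r(parseRawQuery($query)); // keys stay hub.mode and hub.verify
```

With parse_str() the "sync" value is lost and only "async" remains under the key hub_verify, whereas the hand-written parser returns both values under hub.verify.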
It is essential that developers recognise that Hubs are only concerned with sending
requests and receiving a response which verifies its receipt. If a feed update is
received, it should never be processed on the spot since this leaves the Hub waiting
for a response. Rather, any processing should be offloaded to another process or
deferred until after a response has been returned to the Hub. One symptom of a
failure to promptly complete Hub requests is that a Hub may continue to attempt
delivery of the update or verification request, leading to duplicated updates
being processed by the Subscriber. This is a practical concern: a
Hub may apply a timeout of just a few seconds, and if no response is received within
that time it may disconnect (assuming a delivery failure) and retry later. Note that
Hubs are expected to distribute vast volumes of updates, so their resources are
stretched; please process feeds asynchronously (e.g. in a separate process,
a job queue, or even a cron scheduled task) as much as possible.
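One simple way to defer the work is a spool directory: the callback only writes the raw feed to disk, and a cron job or worker processes the files later. The sketch below assumes a writable directory; the function names spoolFeedUpdate() and processSpool() and the file naming scheme are hypothetical.

```php
<?php
// Hypothetical spool-directory queue. The callback request only stores
// the raw feed; the expensive processing happens in a later run.
function spoolFeedUpdate($feedString, $spoolDir)
{
    $file = $spoolDir . '/' . uniqid('feed_', true) . '.xml';
    if (file_put_contents($file, $feedString, LOCK_EX) === false) {
        throw new RuntimeException('Could not spool feed update');
    }
    return $file;
}

// Intended for a cron job or worker: hand each spooled feed to $handler
// and delete the file afterwards. Returns the number of feeds processed.
function processSpool($spoolDir, $handler)
{
    $files = glob($spoolDir . '/feed_*.xml');
    if (!$files) {
        return 0; // nothing spooled (or the directory is unreadable)
    }
    foreach ($files as $file) {
        call_user_func($handler, file_get_contents($file));
        unlink($file);
    }
    return count($files);
}
```

In the callback script, spoolFeedUpdate($callback->getFeedUpdate(), $dir) would be called inside the hasFeedUpdate() check, keeping the time spent in the Hub's request short.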
Setting Up And Using A Callback URL Route
As noted earlier, the Zend_Feed_Pubsubhubbub_Subscriber_Callback
class receives the combined key associated with any subscription from the Hub via one
of two methods. The technically preferred method is to add this key to the Callback
URL employed by the Hub in all future requests using a query string parameter with
the key "xhub.subscription". However, for historical reasons, primarily that this was
not supported in Pubsubhubbub 0.1 (it was recently added in 0.2 only), it is strongly
recommended to use the most compatible means of adding this key to the Callback URL
by appending it to the URL's path.
Thus the URL http://www.example.com/callback?xhub.subscription=key would become
http://www.example.com/callback/key.
Since the query string method is the default in anticipation of a greater level
of future support for the full 0.2 specification, this requires some additional work
to implement.
The first step is to make the Zend_Feed_Pubsubhubbub_Subscriber_Callback
class aware of the subscription key contained in the path. The key must be injected
manually, which in turn requires manually defining a route for this purpose. This is achieved
simply by calling the method Zend_Feed_Pubsubhubbub_Subscriber_Callback::setSubscriptionKey()
with the parameter being the key value available from the Router. The example below
demonstrates this using a Zend Framework controller.
// Controller and action names are illustrative; match them to your route.
class CallbackController extends Zend_Controller_Action
{
    public function indexAction()
    {
        $storage = new Zend_Feed_Pubsubhubbub_Model_Subscription;
        $callback = new Zend_Feed_Pubsubhubbub_Subscriber_Callback;
        $callback->setStorage($storage);
        /**
         * Inject subscription key parsing from URL path using
         * a parameter from Router.
         */
        $subscriptionKey = $this->_getParam('subkey');
        $callback->setSubscriptionKey($subscriptionKey);
        $callback->handle();
        $callback->sendResponse();

        /**
         * Check if the callback resulted in the receipt of a feed update.
         * Otherwise it was either a (un)sub verification request or invalid
         * request. Typically we need do nothing other than add feed update
         * handling - the rest is handled internally by the class.
         */
        if ($callback->hasFeedUpdate()) {
            $feedString = $callback->getFeedUpdate();
            /**
             * Process the feed update asynchronously to avoid a Hub timeout.
             */
        }
    }
}
Actually adding the route which would map the path-appended key
to a parameter for retrieval from a controller can be accomplished using
a Route configuration such as the INI formatted example below for use
with Zend_Application bootstrapping.