Embedded Tweets, rate limits, and the oEmbed API

This is going to be a little winding, but bear with me; I’m thinking out loud here. In version 3.4, WordPress added a feature whereby Twitter URLs that are entered as plain text into a post are automatically converted to an embedded tweet. Apparently this has been a feature on wordpress.com hosted blogs for a while, and they decided to port it over to the free version.

All well and good, and actually quite a nifty feature, since WordPress’ tendency to strip out script elements from posts means you can’t easily put manually generated tweet markup into them. That said, their decision to remove the option to disable auto-embedding in version 3.5 caused some consternation. As one blog post on the change points out, the idea that you might sometimes want to post an actual plain old Twitter URL, rather than an embedded tweet, or that some people just might not want auto-embedding, apparently never occurred to the WordPress devs.

That’s not why I’m writing this though. Personally, I thought the feature was quite useful, and tried to use it on this very blog. However, for some reason, it didn’t seem to work. I copied and pasted in Twitter status URLs exactly as described in the documentation, ensuring they were on their own line and weren’t converted to hyperlinks or otherwise marked up, but rather than being converted into nicely formatted tweets, they stubbornly remained as URLs. Except, every so often, they didn’t. Once out of every twenty or so attempts, the auto-embed worked, and the embedded tweets remained in place until I deleted the page and tried again.

I was confused. There didn’t seem to be anything in the documentation, the application, or the published pages to suggest what was going on. In the end, I had to go into the WordPress code itself, and this is what I found: WordPress uses regular expressions to match a number of URL patterns within posts, and it then auto-embeds them using a type of API called oEmbed. This is a REST API which allows a web site or app to request, from a third-party site, an embeddable form of an identified resource. The example they give is embedding a photo from Flickr: a site can make an oEmbed API request to Flickr, and receive some XML or JSON back which contains the URL of the photo as well as some other info like the author name, the size, etc. The response can also include a chunk of HTML to serve as the actual embedded content, which is what Twitter does.
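To make that flow a bit more concrete, here is a minimal Python sketch of the three steps: spot a tweet URL sitting on its own line, build the oEmbed request for it, and pull the HTML out of the JSON that comes back. This is not the WordPress code. The regular expression, the function names and the endpoint URL are my own assumptions for illustration; only the `url`, `maxwidth` and `format` parameters and the `type`/`html` response fields come from the oEmbed spec.

```python
import json
import re
from typing import List, Optional
from urllib.parse import urlencode

# Hypothetical pattern: a tweet URL that is the only thing on its line.
TWEET_URL = re.compile(
    r"^https?://(?:www\.)?twitter\.com/(?:#!/)?\w+/status(?:es)?/\d+/?$",
    re.MULTILINE,
)

# Assumed endpoint, used here purely for illustration.
OEMBED_ENDPOINT = "https://publish.twitter.com/oembed"


def find_tweet_urls(post_text: str) -> List[str]:
    """Return every tweet URL that occupies a whole line of the post."""
    return TWEET_URL.findall(post_text)


def oembed_request_url(resource_url: str, max_width: int = 550) -> str:
    """Build the GET URL asking the provider for an embeddable form."""
    query = urlencode(
        {"url": resource_url, "maxwidth": max_width, "format": "json"}
    )
    return f"{OEMBED_ENDPOINT}?{query}"


def embed_html(response_body: str) -> Optional[str]:
    """Extract the embed markup from a JSON oEmbed response, if it has any."""
    data = json.loads(response_body)
    # Only the "rich" and "video" response types carry an html field.
    if data.get("type") in ("rich", "video"):
        return data.get("html")
    return None
```

A URL mentioned in the middle of a sentence is deliberately left alone, since the pattern is anchored to the start and end of a line.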

This oEmbed API is interesting. I’ve not dug into it a great deal, but I would argue that its existence reflects a failure of the declarative, semantic web. You shouldn’t need a standalone REST service to get an embeddable form of a resource: the embeddable form of the resource should be the resource itself. All the information regarding, say, a Flickr photo or a tweet (the author, the content, the size, etc.) should be available in the HTML of the resource itself. In fact, it probably is; it’s just not easily accessible because it’s all bunged up in unsemantic markup.

Now, the semantic web movement has taken a bit of a beating in the last few years. It’s been in retreat ever since the XHTML 2.0 debacle, dogged by the feeling that its principal adherents were extremists more interested in theoretical purity than in practical concerns and meeting real requirements. But that doesn’t mean its ideas are wholly without merit.

Before I expand on that though, I’ll return to the story of my broken auto-embedded tweets. After finding the WordPress code responsible for detecting Twitter URLs and making the necessary oEmbed calls (located in wp-includes/class-oembed.php, if you’re interested), I saw that any error returned by the GET request was not logged or otherwise indicated. Figuring this must be where things were going wrong, I added some logging to save the error details, and tried again.
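The change I made was in WordPress' PHP, but the idea is simple enough to sketch in Python: instead of quietly giving up when the request fails, record why. Everything here is hypothetical (the `fetch_embed` name, the logger name, and the injected `fetch` callable that stands in for the real HTTP call); it illustrates the pattern, not the actual patch.

```python
import logging
from typing import Callable, Optional, Tuple

logger = logging.getLogger("oembed")

# A fetcher takes a URL and returns (HTTP status code, response body).
Fetcher = Callable[[str], Tuple[int, str]]


def fetch_embed(request_url: str, fetch: Fetcher) -> Optional[str]:
    """Return the response body on success; otherwise log why and return None."""
    try:
        status, body = fetch(request_url)
    except OSError as exc:
        # Network-level failure: DNS error, timeout, refused connection.
        logger.warning("oEmbed request to %s failed: %s", request_url, exc)
        return None
    if status != 200:
        # Keep the status and the start of the body, which usually names the cause.
        logger.warning(
            "oEmbed request to %s returned HTTP %d: %s",
            request_url, status, body[:200],
        )
        return None
    return body
```

Passing the fetcher in as an argument keeps the sketch independent of any particular HTTP library, and makes it easy to try out with a fake that always returns an error.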

The auto-embed failed again, but this time I got a log entry. The error stated that the request had been refused due to the requester exceeding the rate limit. I was momentarily confused: I surely hadn’t made enough requests to exceed any limit, unless it was ridiculously low. Then the truth dawned: I was using shared hosting, so was likely sharing an IP address with any number of other WordPress installations, all making anonymous requests to the same API. Some Googling revealed others floating the same theory in similar situations, although there was scant official guidance on the issue either way.
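A toy simulation shows why this theory is plausible. Assume the API counts anonymous requests per IP address within some time window; the limit of 150, the 50 neighbouring sites and the addresses below are all made-up numbers for illustration, not Twitter's actual figures.

```python
from collections import Counter
from typing import Dict, List


def count_successes(
    limit: int, site_to_ip: Dict[str, str], request_order: List[str]
) -> Counter:
    """Count successful requests per site when the limit is applied per IP."""
    used = Counter()       # requests counted against each IP in this window
    succeeded = Counter()  # requests that got through, per site
    for site in request_order:
        ip = site_to_ip[site]
        if used[ip] < limit:
            used[ip] += 1
            succeeded[site] += 1
    return succeeded


SHARED_IP = "203.0.113.7"  # documentation-range address, purely illustrative
neighbours = [f"site{i}" for i in range(50)]
site_to_ip = {name: SHARED_IP for name in neighbours}
site_to_ip["myblog"] = SHARED_IP

# Each neighbour makes 5 requests, then my blog makes 5 of its own.
order = neighbours * 5 + ["myblog"] * 5
print(count_successes(150, site_to_ip, order)["myblog"])  # prints 0
```

My blog made only five requests, yet none got through, because the 250 requests from its neighbours had already used up the shared allowance. Give "myblog" an address of its own and all five succeed.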

I wondered how many other people must have run into the same problem, but written it off as general flakiness. Given the number of WordPress installations run on shared hosting, it seems likely that a great many other people have encountered the same thing. I can hardly blame the WordPress devs for Twitter’s API design, but it does seem they were a little remiss in adding this feature, making it mandatory in fact, without considering that it is likely to break for a substantial portion of their user base. They didn’t even put in some code to catch these errors and give the user some indication of what happened and the likely reason.

In the end then, I got to the root of the problem, although it didn’t really help me much in solving it. There’s nothing simple I can do to the WordPress code in question to make it work, and nor can I get a separate IP for my hosting just to get auto-embedding working. Given the shortage of IPv4 addresses, such a thing would be rather wasteful anyway. All I can probably hope to do is forgo embedded tweets, or generate them manually on twitter.com and try to find a way to work around WordPress’ markup stripping so I can insert them into posts.

Twitter have recently launched v1.1 of their APIs, which perhaps attempts to resolve this problem by requiring authorisation tokens on each request. So future versions of WordPress might require Twitter account details under which to make oEmbed requests. These requests would then be rate limited against the account, not the IP. The major downside of this is that, rather than a nice and simple GET request, apps utilising the new API would be required to implement OAuth just to do an embed. Apparently, this is not a prospect that the WordPress devs particularly relish, and who can blame them?

I don’t know what the answer will be for WordPress but, returning to my earlier problems with oEmbed itself, I think the main problem is that server-side apps like WordPress shouldn’t be responsible for doing this embedding in the first place. Really, it should be the responsibility of the client, e.g. the web browser. An HTML document should be able to indicate that it wants to embed another resource, such as a tweet, and the browser should know how to request and integrate that resource into the main document such that it embeds in a seamless fashion.

It should be possible to request a resource and receive HTML in which the main content, as well as things such as the author, title and other information, is made available in an easily parsable way. This main content, perhaps literally marked up with the new <main> element, should be suited for extraction and insertion into the hosting document in a sandboxed context. The resource to be embedded should also be able to specify the styling that should apply in an embedded context, via an appropriately linked stylesheet.
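As a rough illustration of what "easily parsable" could mean, here is a Python sketch that takes a page and pulls out the title, the author and the text inside <main>, using only the standard library. The class and function names are hypothetical, and the author being exposed through a meta element is my assumption. A real client would keep the markup rather than just the text, and would still need the sandboxing and styling described above.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class MainExtractor(HTMLParser):
    """Collect the <title>, the author <meta> and the text inside <main>."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.author: Optional[str] = None
        self.main_text = []
        self._in_title = False
        self._in_main = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") == "author":
            self.author = attributes.get("content")
        elif tag == "title":
            self._in_title = True
        elif tag == "main":
            self._in_main = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "main":
            self._in_main = False

    def handle_data(self, data):
        if self._in_main:
            self.main_text.append(data)
        elif self._in_title:
            self.title += data


def extract(html: str) -> Dict[str, Optional[str]]:
    """Return the embeddable parts of a page as a small dictionary."""
    parser = MainExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so the extracted text reads as one line.
    content = " ".join("".join(parser.main_text).split())
    return {"title": parser.title.strip(), "author": parser.author, "content": content}
```

Navigation, footers and anything else outside <main> are simply ignored, which is the whole attraction: the page itself says which part of it is the content.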

This setup would shift the responsibility for requesting embedded content onto the client, where it belongs, removing the need for services like Twitter to implement oEmbed services, and applications like WordPress to consume them. If they were deemed necessary, rate limits could be applied against clients, rather than servers, and embedding would become trivially easy from an authoring point of view, even within static HTML documents with no CMS behind them.

I know that work is being done within HTML and CSS to enable these kinds of scenarios, and there are people who have thought about it all in a much deeper fashion than I have. Upcoming features like seamless iframes and scoped stylesheets might help make this a reality eventually. But learning about the oEmbed API made me realise the need is more pressing, because if this is what people are coming up with to work around it, then we need a better solution as soon as possible.

Interesting Web Platform Bugs

This is mainly for my own purposes, so I don’t have to hunt through Bugzilla and Peter Beverloo’s blog to find the one I’m looking for.

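Feature            | WebKit bug | Rev. landed | Firefox bug | Rev. landed | Notes
-------------------|------------|-------------|-------------|-------------|---------------
Line Grid          | 76197      | 105176      | N/A         | N/A         | Kyoto Proposal
New Flexbox        | 62048      | N/A         | 666041      | N/A         |
Grid Layout        | 60731      | N/A         | 616605      | N/A         |
Regions            | 57312      | N/A         | 674802      | N/A         |
Calc               | 16662      | N/A         | Finished    |             |
Scoped Stylesheets | 49142      | N/A         | 508725      | N/A         |
CSS Filter Effects | #68469     | N/A         | N/A         | N/A         |
CSS Shaders        | #71392     | N/A         | N/A         | N/A         |
Web Components     | #52962     | N/A         | N/A         | N/A         |
CSS Variables      | #85580     | N/A         | #442864?    | N/A         |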