2010-06-06

My experience creating a Google Chrome extension

The Chrome extension for Oplop went up on the extension gallery today. I figured I would quickly jot down some thoughts on creating an extension in Chrome, lessons learned, etc.

First off, writing an extension is easy. Over the years I have toyed with the thought of creating a Firefox extension, but the prospect of having to use XUL always scared me away. Eventually I switched to Chrome as my primary browser (and I have been extremely happy, especially with the developer tools; I only use Firefox now to test offline stuff because of its "Work Offline" menu option) but I still had not thought much about trying an extension again. But then Chrome released details of their extension design and how it revolved around extensions being nothing more than web pages. That seemed reasonable; why not treat an extension like any other web page? There is less custom code just for extensions and people already have experience with HTML and JavaScript, so why not?

Eventually I started Oplop and my focus shifted to HTML5 web apps thanks to my Ph.D. (which I have started writing). But I still kept an eye on Chrome's approach to extensions (Mozilla Jetpack takes a similar approach). Finally I found the time to give it a shot.

Chrome extensions fall into three types: browser actions, page actions, and content scripts. Browser actions are the icons you see to the right of the omnibox. Those are extensions whose icons are always present no matter what the browser is showing you and will present a popup when you click the icon. A page action is an extension where the icon shows up within the omnibox at its far right edge. This is for extensions that conditionally want to present you with a popup (like Oplop only being useful on pages with a password field). Lastly there are content scripts, which run based on a URL match pattern for the page being viewed. They have no visible icon. You can mix and match any of these three approaches, e.g. Oplop is a page action as it only shows itself when needed, but it also uses a content script to find out which pages have a password field.
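
As a rough illustration of the content script half of that arrangement, here is a minimal sketch of how a script might check for a password field and report back. This is not Oplop's actual code: the helper name hasPasswordField and the shape of the message are made up for the example, and chrome.extension.sendRequest is the one-off messaging call as I understand it at the time of writing, so treat that as an assumption too.

```javascript
// Content-script sketch: report whether the current page has a password field.
// hasPasswordField is a hypothetical helper name, not taken from Oplop's source.
function hasPasswordField(doc) {
  // querySelector returns null when nothing matches the selector.
  return doc.querySelector('input[type="password"]') !== null;
}

// Only attempt to message the background page when running inside an
// extension; the guard keeps this file loadable anywhere else.
if (typeof chrome !== "undefined" && chrome.extension &&
    typeof chrome.extension.sendRequest === "function" &&
    typeof document !== "undefined") {
  // One-off message; the { hasPassword: ... } shape is an assumption.
  chrome.extension.sendRequest({ hasPassword: hasPasswordField(document) });
}
```

The background page would then decide whether to show the page action's icon for that tab.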

The other part of an extension is the background page. This is an HTML page that is constantly running headless in the background (why it is HTML and not just JavaScript I don't know, since it has no UI). This kind of gives you an MVC for extensions: popups from a page or browser action are the view, the background page is the main controller, and content scripts act somewhat like worker threads, one per tab.

The real trick in all of this is when you have to start communicating between the various parts. Content scripts run in their own space, as do background pages and popups. But content scripts run in a very restricted space since they can mutate the running tab's DOM. And things get a little bit more complicated because there is a single background page serving every tab, while content scripts and popups only run for their respective tab. So how the heck do you tie all of this stuff together?

Content scripts communicate through message passing. You can either send one-off messages or open a port that stays open. Anyone can communicate with content scripts, including popups, so you have to be careful when using one-time messages that you don't interleave communication through the extension's mailbox. You can use a long-lived message port if you want and store the port so that you can send multiple pieces of information. The trick here seems to be having the background page truly act as the central controller. Basically you have any communication dealing with content scripts either start or terminate in the background page. If a popup needs to communicate with a content script, it's best to have the background page implement a function for communicating and then have the popup call it through chrome.extension.getBackgroundPage(), so the background page sends the message on the popup's behalf. You can use chrome.tabs.getSelected() in a popup to find out what the current tab happens to be, so you can send the message to the right content script by having the background page store the port or callback response function in an object keyed on tab ID.
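
To make the "object keyed on tab ID" idea concrete, here is a minimal sketch of what the background page's side could look like. The names ports, registerPort, and sendToTab are hypothetical, and the wiring at the bottom assumes a connecting port exposes its tab as port.sender.tab, which may not hold for every version of Chrome.

```javascript
// Background-page sketch: keep one long-lived port per tab, keyed on tab ID.
// All names here (ports, registerPort, sendToTab) are hypothetical.
var ports = {};

function registerPort(tabId, port) {
  ports[tabId] = port;
  // Forget the port once the content script's end goes away.
  port.onDisconnect.addListener(function () {
    if (ports[tabId] === port) {
      delete ports[tabId];
    }
  });
}

// Sends a message to the content script in the given tab.
// Returns false when no content script is connected for that tab.
function sendToTab(tabId, message) {
  var port = ports[tabId];
  if (!port) {
    return false;
  }
  port.postMessage(message);
  return true;
}

// Only wire up the listener when running inside an extension.
if (typeof chrome !== "undefined" && chrome.extension &&
    chrome.extension.onConnect) {
  chrome.extension.onConnect.addListener(function (port) {
    // Assumption: the port exposes its originating tab as port.sender.tab.
    var tab = port.sender && port.sender.tab;
    if (tab) {
      registerPort(tab.id, port);
    }
  });
}

// From a popup, the call would look roughly like this (message shape made up):
//   chrome.tabs.getSelected(null, function (tab) {
//     chrome.extension.getBackgroundPage().sendToTab(tab.id, { fill: true });
//   });
```

Since only the background page ever holds a port, the popup never talks to a content script directly, which is what avoids the interleaving problem.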

Communication was the hardest part I found. Everything else made sense. It's actually so straightforward that I have a catch-all extension which I use to do stuff for me, e.g. skip full-screen ads. So if you are a Chrome user and have ever had an interest in writing an extension, I recommend giving it a try.