2008-02-10

My first program using Google's Social Graph API

I have blogged before about the possibilities of Google's Social Graph API. I thought what it could be used for was pretty great, and I looked forward to using the API.

Well, I finally got around to writing something, and it happened to be somewhat useful to me. Basically what I did was write a Python script that took a starting point in the social graph representing me, found all content that was mine, found all the people I linked to as friends or acquaintances, and then looked for all of their content. The reason I did this was that I figured I might find out about content of theirs online that I was unaware of.
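
Here is a stripped-down sketch of the idea, not my actual script: the starting URL is a placeholder, error handling is gone, and the parameter names (q, fme, edo) and response keys are worth double-checking against the API docs rather than trusting my memory of them:

    import urllib
    import simplejson

    LOOKUP = 'http://socialgraph.apis.google.com/lookup'

    def lookup(urls, **params):
        # One request to the API covering a whole batch of URLs at once.
        params['q'] = ','.join(urls)
        return simplejson.load(urllib.urlopen(LOOKUP + '?' + urllib.urlencode(params)))

    # Start from the node representing me, following rel="me" links (fme)
    # and asking for outbound edges (edo).
    me = 'http://www.example.com/'  # placeholder for my actual starting URL
    result = lookup([me], fme='1', edo='1')

    my_content = set()
    people = set()
    for url, node in result['nodes'].items():
        my_content.add(url)
        # nodes_referenced maps each linked URL to its XFN/FOAF rel types.
        for target, meta in node.get('nodes_referenced', {}).items():
            types = meta.get('types', [])
            if 'me' in types:
                my_content.add(target)
            elif 'friend' in types or 'acquaintance' in types:
                people.add(target)

    # Second round: what content do those people claim as theirs?
    # (This is where the per-query URL limit mentioned below comes in.)
    results = lookup(sorted(people), fme='1', edo='1')
    for url, node in results['nodes'].items():
        print url, node.get('claimed_nodes', [])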

And it actually worked. For instance, I didn't know about Django People before, and I discovered that Steve, Doug, and Jacob are on there. I also found out that Ian has a Twitter account that hasn't been updated in three weeks. Nothing earth-shattering, but still cool to find out about.

But obviously this is not a completely accurate picture. As an example, Steve does not link to holdenweb in a way that creates an edge in the social graph connecting his blog to his business site. As more people (hopefully) start to annotate links with XFN or set up FOAF files, the holes in the graph will be filled in.
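
The annotation itself is trivial, for what it is worth; XFN is just a rel attribute on an ordinary link (the URLs here are placeholders):

    <!-- rel="me" says this link points to another page that is mine. -->
    <a href="http://example.com/my-blog/" rel="me">my blog</a>

    <!-- rel="friend" (or "acquaintance", "met", etc.) describes the person being linked to. -->
    <a href="http://example.org/some-friend/" rel="friend met">a friend</a>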

Coding up the script was also interesting. It had been a while since I had to think about combining steps of an algorithm for performance reasons in order to minimize the number of queries I had to send to Google. While Google is damn fast and obviously has more processing power than I do, I doubt that the I/O delay over the Internet, on top of Google's processing time, beats me doing some extra work on my end. So I tried to minimize the queries I made and instead put more code into analyzing the results. I had to fight some strong urges to break the algorithm down into more queries. =)

One warning I do have for anyone who plays with the API: a single query is limited to 15 URLs. I thought I had read this somewhere, but I couldn't find any docs that explicitly state it; from experience, though, I know it to be true. So don't be shocked if you have to make multiple requests to get all the data you want.
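
If you do go over the limit, a small helper that chunks the URLs keeps the request count as low as the limit allows. Something like this, with the 15 kept in one obvious constant since it is only what I have observed:

    import urllib
    import simplejson

    LOOKUP = 'http://socialgraph.apis.google.com/lookup'
    MAX_URLS_PER_QUERY = 15  # observed limit; I could not find it documented

    def lookup_all(urls, **params):
        # Look up any number of URLs using as few requests as possible.
        urls = list(urls)
        nodes = {}
        for start in range(0, len(urls), MAX_URLS_PER_QUERY):
            params['q'] = ','.join(urls[start:start + MAX_URLS_PER_QUERY])
            response = simplejson.load(
                urllib.urlopen(LOOKUP + '?' + urllib.urlencode(params)))
            nodes.update(response.get('nodes', {}))
        return nodes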

And if I ever decide to do a more robust Python library for the API, I will definitely come up with an object representation of the nodes and cache the hell out of everything. Working directly off the JSON data in the native Python types that simplejson produces is fine, but I would prefer some more structured attributes. Plus, if I cache things I can use actual node instances instead of individual strings, making graph traversal easier. Then an abstracted query API could be layered on top at the right level to combine queries and minimize the network traffic needed to fulfill a lookup.
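
To give a rough idea of what I am picturing, here is a sketch of a design I have not actually built, reusing the same assumed endpoint and response keys as the snippets above:

    import urllib
    import simplejson

    LOOKUP = 'http://socialgraph.apis.google.com/lookup'

    class Node(object):
        """A node in the social graph, fetched lazily and cached by URL."""

        _cache = {}  # URL -> Node, so every traversal reuses the same objects

        @classmethod
        def get(cls, url):
            if url not in cls._cache:
                cls._cache[url] = cls(url)
            return cls._cache[url]

        def __init__(self, url):
            self.url = url
            self._data = None

        def _fetch(self):
            # Only hit the API the first time this node's data is needed.
            if self._data is None:
                query = urllib.urlencode({'q': self.url, 'edo': '1'})
                response = simplejson.load(urllib.urlopen(LOOKUP + '?' + query))
                # Assumes the response keys its nodes by the URL we asked about.
                self._data = response.get('nodes', {}).get(self.url, {})
            return self._data

        def friends(self):
            # Outbound edges marked friend/acquaintance become Node instances,
            # so walking the graph never deals in bare strings.
            referenced = self._fetch().get('nodes_referenced', {})
            return [Node.get(target) for target, meta in referenced.items()
                    if set(meta.get('types', [])) & set(['friend', 'acquaintance'])]

The classmethod constructor is the cache: anything that needs a node goes through Node.get(), so identical URLs always come back as the same object, and a combined-query layer could later sit behind it to prefetch whole batches of nodes at once.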

Overall I am happy with the thing. I have already gotten some use out of it. And it reminded me how working with the Internet makes you think in a different way.