Thanks for checking the page, bpfh. I'm not sure what's happening. I haven't tried viewing the Web site on a Mac, but I can view it on both Windows and Linux (Ubuntu) with Chrome, FF, and IE8. Can you please try again, and let me know if you still have a problem? The Web site (which runs on the same JVM as the Web service) uses Twitter Bootstrap, and it works from my Android devices as well. Thanks!
D*ng, LiquidSummer. I've been using this service for almost a year with no problem, and you broke it. :)
It appears that, because it recursively calls it, the call eventually times out. (Google App Engine has this time limit of 10~30 seconds.) I'm not sure if I'll have a solution for this, but I can at least catch the exception. I'll need to look into it further.
OK, I figured it out. :) The problem was that we only support HTML pages at this point. The targetUrl you specified did not return a valid HTML page (it's JSON), and the application just returned a 404 HTTP status code (since it couldn't find any HTML content), which was by design. (Note that this API is supposed to be used by a program, not from a Web browser.) Anyway, it had been a while since I actually looked at the code, and it was "fun" to look at it again. :) I have yet to find a "bug". grin
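For readers who want to see the idea, here is a minimal sketch of that kind of check. It is not the actual PageSynopsis code: the function names and the set of accepted media types are assumptions made for illustration. It only shows how a non-HTML target such as a JSON response could end up as a 404.

```python
# Hypothetical helper names; the real service's implementation is not shown in this thread.
HTML_MEDIA_TYPES = ("text/html", "application/xhtml+xml")  # assumed set of accepted types


def is_html_content_type(header_value: str) -> bool:
    """Return True if a Content-Type header value denotes an HTML document."""
    # Drop parameters such as "; charset=UTF-8" and normalize case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in HTML_MEDIA_TYPES


def status_for_target(content_type: str) -> int:
    """Mirror the behavior described above: non-HTML targets yield 404."""
    return 200 if is_html_content_type(content_type) else 404
```

A client calling such a service from a program would treat 404 as "no HTML content found at the target" rather than as a broken service.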
I don't get it. Is this a service for developers to get the page title of an HTML page? So I must make a request to this service and learn its API, instead of making the request directly to that page and running a very simple regex?
Givan, yes and no. Clearly, you're right in that it's one more thing to learn (a particular variant of REST API). But there can be many benefits to using a Web service like this (or another app/service/abstraction layer, etc.) in some situations. For example, suppose that you want to get the meta description of a certain Web page using JavaScript (from your own HTML page). The Web page may happen to be large, and there can be substantial network latency, etc. In many cases, you do not want to do it on the fly every time your page is loaded. You may want to implement a storage or caching layer on the server side, etc. PageSynopsis provides such a service "out of the box". It also supports "asynchronous" fetching, periodic refreshing, and so forth. This is a very simple service, but I use it from different apps of mine (and I don't have to replicate this functionality across different apps). Thanks for checking it out.
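To make the caching point concrete, here is a minimal sketch of a server-side cache with a time-to-live. It is not how PageSynopsis is implemented: `TTLCache`, the `fetch` callable, and the one-hour default are all hypothetical, and a real service would also need persistence and eviction.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache the result of an expensive fetch per URL for a fixed time-to-live."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical function that downloads and parses a page
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the cache can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}  # url -> (expires_at, value)

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: no network round trip
        value = self._fetch(url)     # missing or expired: fetch again and store
        self._entries[url] = (now + self._ttl, value)
        return value
```

With something like this in front of the fetch, a page that is loaded a thousand times an hour triggers one request to the target site per TTL window instead of a thousand.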
Why make it a service? Why not open source it and let anyone run it from their machines?
I'm sorry, but even for the client side, I'd feel happier with my users making a request to my own server, where I can define my own caching as per my application's needs.
Calling a third party service should be reserved for advertisements, tracking, and queries to proprietary data (e.g. Google Maps).
I'm not disparaging your efforts here; no doubt it's a fine service, but I'd want to run it myself, not have it as a SaaS.
True_religion, great point. As a matter of fact, PageSynopsis is just part of my larger effort.
There are many different ways we can use, and benefit from, other developers' work. Traditionally, using a library and linking it into your own code was the primary way to use other people's work. (There are pros and cons.) There have been many different efforts over the last decade or so to make "reusing" others' software "easier". You may recall things like component-based software development, or, more recently, certain architectural designs/paradigms such as SOA, and so forth. Open source software is another way in which you can "reuse" others' work.
I have no problem with any of these approaches. I use a lot of open source software, and I have open sourced a lot of my own software (even before open source was considered an important part of the development community). I was a big believer in component-based software development (which was never realized).
This is just a different effort. Do you really want to run hundreds of "small" services yourself? Just hypothetically, if you can find (virtually) every piece of functionality you need as a Web service, do you still need to code it into your own program or run it as your own service? This is a rhetorical question, but I see this as the future. I believe that having a uniform interface (e.g., REST-based WS) can make this dream a reality. REST-based Web services have been getting popular for the last several years, with wide "install bases". I think we are on a good track (so far).
More to the point, the program part of PageSynopsis is not that much. Any competent developer can probably implement this in a day (or even less), including all the extra functionality. But why spend a day, and then more hours maintaining it, if the service is already available? (Also, that's a hundred days for a hundred developers.) I believe that the real benefit of PageSynopsis (and the other services I'm currently developing) is that, if you so choose, you do not have to worry about anything other than just the "interface". I developed this service almost a year ago, and amazingly it still just works for me to this date. (Google does the heavy lifting for me.) No more extra jar files I have to worry about. No Maven or Ant scripts to maintain.
Parsing HTML with a RegEx pattern is considered very bad practice; there are other, more robust ways of scraping. For example, in Python, there's BeautifulSoup.
You do have a fair point though: this functionality does exist in most major programming languages.
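As a rough illustration of parser-based extraction, here is a minimal sketch that pulls the title and meta description out of a page. To stay dependency-free it uses Python's standard-library `html.parser` rather than BeautifulSoup, which is a swap made for this example; the `SynopsisParser` and `extract_synopsis` names are made up for it.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class SynopsisParser(HTMLParser):
    """Collect the first <title> text and the first meta description of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.title: Optional[str] = None
        self.description: Optional[str] = None
        self._in_title = False
        self._title_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
        elif tag == "meta" and self.description is None:
            attr = dict(attrs)  # attribute names arrive lowercased, values do not
            if (attr.get("name") or "").lower() == "description":
                self.description = attr.get("content")

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._title_parts).strip()

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)  # title text may arrive in several chunks


def extract_synopsis(html: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (title, description); either may be None if the page lacks it."""
    parser = SynopsisParser()
    parser.feed(html)
    parser.close()
    return parser.title, parser.description
```

Unlike a regex, this does not care about attribute order, quoting style, or letter case in the markup, which are the usual places a "very simple regex" falls over.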
On Tue May 22 20:03:29 EEST 2012, all I get is a page with navigation and a greyish pattern background. Could not figure out what to do with it. This is with Chrome and FF on a Mac.