All I could think while reading through the getting-started guide was: that is an awful lot of added text. After a little more thought: that is an awful lot of added work. And while it won't be hard to build tools that make the process easier, the sort of work that goes into adding this data can never be completely automated (otherwise, we would have no need for it). Given that all the search engines will be using it, all major sites basically have to implement this or they risk falling in their rankings.
So, at the end of the day, Google, Microsoft, and Yahoo just made web development more expensive. They probably also just made the web a better place, too.
Sites that don't employ this don't risk falling in their rankings. This added data allows for richer snippets (which absolutely increase clickthrough rate), but it won't directly make you rank higher (or rank lower in its absence).
If your website employs an SEO specialist or webstandardista, you should already have your sites marked up with metadata. Reviews, rich breadcrumbs, etc. have been around and supported for years now.
I suppose that since Google wants to solve these problems algorithmically first and foremost, many of these structures already get recognized. Right now you don't _have_ to mark up your breadcrumbs for them to appear as rich snippets in your search result listing; Google recognizes the structure without added markup.
For now, I will build the new types into my CMS like a good web developer. That won't cost me any time in the future, and now I'll have a way to separate myself from those who won't add schemas or metadata to their markup. So, at the end of the day, I just got more expensive :)
"Google currently supports rich snippets for people, events, reviews, products, recipes, and breadcrumb navigation, and you can use the new schema.org markup for these types, just as with our regular markup formats. Because we’re always working to expand our functionality and improve the relevance and presentation of our search results, schema.org contains many new types that Google may use in future applications."
"Google doesn’t use markup for ranking purposes at this time—but rich snippets can make your web pages appear more prominently in search results, so you may see an increase in traffic."
Google likely uses clickthrough rate (CTR) in their algo. If your site has a high CTR, it should hypothetically rank higher, so it makes sense for them to include it in their algorithm, at least to the extent that it can't be manipulated.
So, if the metadata doesn't directly increase rankings (which I'm pretty positive it won't), it can indirectly do so by grabbing the user's eye and improving CTR, which I am most certain it will.
Some people were already doing this before, but with other formats. I did a little comparison between this and the one Best Buy currently uses, GoodRelations.
Well, the two gists don't quite encode the same information. In the schema.org example, one would still have to write code to parse the string dates. In the RDFa example, one could link several sites through the shared purl.org URL for, e.g., Friday, to answer questions like "which restaurants are open on Monday?" That's the whole Linked Data idea.
On the other hand, the RDFa way is more painful when the opening hours differ between the days of the week.
The datetime attribute is designed to be easy to parse, and they stick to a consistent format on schema.org. They reference ISO 8601, which is quite a bit more complicated, but hopefully they'll add something saying they only support a tiny subset of what ISO 8601 allows, to make it easier to write tools.
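For what it's worth, a minimal Python sketch of what parsing such a subset could look like; the accepted formats here are my guess at the common cases, not anything schema.org has actually committed to.

    # Minimal sketch: try a few simple ISO 8601 profiles, from most to least
    # specific. The supported formats are an assumption, not schema.org's list.
    from datetime import datetime

    def parse_schema_datetime(value):
        for fmt in ("%Y-%m-%dT%H:%M:%S", "%Y-%m-%dT%H:%M", "%Y-%m-%d"):
            try:
                return datetime.strptime(value, fmt)
            except ValueError:
                continue
        raise ValueError("unsupported datetime format: %r" % value)

    print(parse_schema_datetime("2011-06-05T19:30"))  # 2011-06-05 19:30:00
    print(parse_schema_datetime("2011-06-05"))        # 2011-06-05 00:00:00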
Interesting idea; I'm a little rusty on my Django knowledge but I guess you would define a custom Model class for each data type, which in turn knows how to render itself with the correct markup.
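Something like this rough sketch is what I have in mind; the Event fields and the as_microdata() helper are invented for illustration, not part of Django or any schema.org tooling.

    # Rough sketch: one Django model per schema.org type, each knowing how to
    # render itself as microdata. Field names and as_microdata() are made up.
    from django.db import models
    from django.utils.html import format_html

    class Event(models.Model):
        name = models.CharField(max_length=200)
        url = models.URLField()
        start = models.DateTimeField()

        def as_microdata(self):
            return format_html(
                '<div itemscope itemtype="http://schema.org/Event">'
                '<a itemprop="url" href="{}"><span itemprop="name">{}</span></a> '
                '<time itemprop="startDate" datetime="{}">{}</time>'
                '</div>',
                self.url,
                self.name,
                self.start.isoformat(),
                self.start.strftime("%B %d, %Y"),
            )

A template could then just call {{ event.as_microdata }} instead of hand-writing the itemprops every time.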
I'm torn as well. It's a lot of new information to remember and creates a lot of extra work but if the content is more accessible as a result then it might be worth it. I'm also a little wary of the fact that it just seems tacked on to the HTML but I can't think of any other way to handle it.
I'm guessing this is like salt... a little dash'll do it.
Adding a few tags to your HTML templates is "expensive"? With Google, Bing, and Yahoo all supporting this, I don't think it will be long before formtastic-like plugins start popping up to make this really easy.
Exactly. We already have to support multiple browsers, multiple display sizes, ARIA roles, and more. I don't mind the extra work for accessibility (ARIA), but looking up and adding all this new meta information merely for search engines is going to be a pain. In the end, this extra work makes things easier for the search engine by requiring the developer to do more; I would prefer the reverse, otherwise maybe I need to get into the search business. They are creating a standard that will make it incredibly easy for future search engines.
It depends how you present the idea of adding machine-readable information to pages. After all, nobody is forced to do it, so you need to show some benefit before the "semantic web" will happen.
Schema.org is doing it with the SEO angle: mark up your pages like this, and they'll be presented better in search engines.
With VIE (https://github.com/bergie/VIE) we take a different angle: mark up your pages with RDFa, and they'll become editable.
This is going to sound curmudgeonly, but it seems like one more way search engines want to use your data without giving you the page view. It makes a lot of technical sense, and I can imagine some really great ways to use this data, but in the end I guess I would just need to ask: what's in it for us, the content providers?
I think that's shortsighted thinking. This is about improving the user's search experience. If that happens I think everyone wins.
Most mainstream users are not tech savvy. Imagine a traveling person arrives in a city and spontaneously decides to see a movie. They enter the search 'good movie playing in nowheresville'. As it stands now, their query will likely be matched by the keywords 'movie', 'playing', and 'nowheresville'. The returned results might include a news article about local theatres, with no actual focus on reviews. The searcher might get frustrated and just decide to rent a movie instead. However, with schemas in wide use, search engines will know exactly which websites are talking about movies and whether it's in the context of reviews. The searcher can then be passed on to the relevant site.
In other words, do you think it's better to tell search engines this is sort of what I have or this is exactly what I have?
Information in specific types — including reviews — exposed using microformats, RDFa or microdata has already been used by Google for over 2 years, they call it "Rich Snippets", and it does improve the quality of experience for users, assuming that you equate an increase in clickthroughs to mean that the user perceives that page to be more useful compared to other SERP results (and anecdotally I always go for results which include rich snippet information gleaned from pages with the required semantic enhancements).
This announcement is not the proposal of a new technique, but rather the extension of one which is already working and is a good thing for the web.
For most of my "quick searches", I already have the answer right in the search results listing. It looks like this will go one step further: we won't have to leave the results page to get complex answers.
I am not sure if I want to provide all my hard work in a format which will maybe help the search engines a bit, but mainly the spammers a lot as they will be able to automate the creation of content farms even more.
Mixed feelings... all the world's data in a well-structured format is a wonder, but at the same time, what is the incentive to create such easy-to-digest content if, in the end, the world doesn't even know you are the one who produced it?
Kind of the old media against new media dilemma but applied to the new media. Interesting.
Bingo, this was my first response as well. What happens when users stop clicking through to content because it's being served up by Google, Bing, or Yahoo?
I guess it could actually hurt them as well. If users aren't providing information back to the algorithm in the form of a click through related to a search term, don't the search engines also risk losing a key signal of relevance?
Why unilaterally assert this will decrease click-throughs?
If I see immediately relevant data for restaurant hours, movie times, a person's bio, etc, I'm far more likely to click-through and start looking at a menu, making a reservation, or getting more background.
They may well expect increased click-throughs leading to more site traffic for those who adopt.
Assuming that authoritative sites, at least, don't abuse these schemas, this will help all search engines and data mining/NLP researchers build better models. The biggest gain isn't the quick view of search results; it's that search results will now be better in general, because Google et al. will understand whether a page is really about a person and, in many cases, who specifically it is about, to give one concrete example.
Information extraction just got that much easier. Hello, baby semantic web.
If they actually wanted people to use this, they'd write better documentation.
If I picked 10 web devs off SitePoint and instructed them to add 10 assertions to an HTML page and didn't give them a validator, I'd be amazed if more than 3 got 80% of them right.
I like the taxonomy, though, but honestly I think instances are much more interesting than types... rather than saying "George Washington" is a :US_President, can't we say "George Washington" is :George_Washington, where :George_Washington is his identifier in Freebase?
Very cool, but why didn't they use the existing RDFa format/keywords instead of inventing their own?
[ see http://en.wikipedia.org/wiki/RDFa ]
[ itemprop vs property ??? ]
I guess when you are big G, you can do anything you want.
It seems that they chose Microdata over RDFa because the latter's syntax was deemed to be unwieldy.
It's not really true that RDFa is more extensible than microdata; there are a small number of missing features related to XML data, but nothing too significant for these use cases. See, for example, [1]
As far as my understanding goes, this is basically equivalent to a subset of RDFa.
The differences as I understand them are three:
* schema.org has an implicit vocabulary; if you want to use more than one, you can still use RDFa and reference the schema.org vocab explicitly (see the sketch below)
* some syntactic hacks are missing (CURIEs, chaining), but these do not remove expressiveness. Again, implicit schema.
* typed literals are missing. And once more, not really needed when there is only one schema.
I still would have preferred if they had used straight RDFa 1.1, but I think their main motivation is that the way the web is going (HTML5) does not seem to be the same as it was when RDFa was initially invented (XHTML).
This solves a concrete, finite set of problems now, while in the semweb world people still have to agree on how to express a person's name :/
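To make that concrete, here's a toy comparison: the same Person marked up as microdata and as RDFa 1.1 with the schema.org vocab declared explicitly, plus a few lines of Python that just collect the property names from each snippet (a toy, not a real microdata/RDFa processor).

    # Toy illustration: the same schema.org Person as microdata and as RDFa 1.1
    # with an explicit vocab. The parser below only collects property names to
    # show they line up; it ignores itemscope/typeof, nesting, datatypes, etc.
    from html.parser import HTMLParser

    MICRODATA = """
    <div itemscope itemtype="http://schema.org/Person">
      <span itemprop="name">George Washington</span>
      <span itemprop="jobTitle">President</span>
    </div>
    """

    RDFA = """
    <div vocab="http://schema.org/" typeof="Person">
      <span property="name">George Washington</span>
      <span property="jobTitle">President</span>
    </div>
    """

    class PropertyCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.properties = set()

        def handle_starttag(self, tag, attrs):
            for name, value in attrs:
                if name in ("itemprop", "property"):
                    self.properties.add(value)

    def collect(markup):
        collector = PropertyCollector()
        collector.feed(markup)
        return collector.properties

    print(collect(MICRODATA) == collect(RDFA))  # True: same properties either way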
Hi all:
Two comments:
1. The GoodRelations RDFa vocabulary remains the superior way of sending rich data to Google and Yahoo; even Bing just announced they will support it in the future.
2. As for tooling, here are two super-easy ways of adding rich GoodRelations data to your site:
- http://www.ebusiness-unibw.org/tools/grsnippetgen/
This creates a snippet of a few additional divs/spans based on your data; simply paste it before (!) the respective visible content. You are done ;) For products, see the effect in http://www.google.com/webmasters/tools/richsnippets
- if you are using a standard shop package, e.g. Magento, osCommerce, Joomla/Virtuemart, or WordPress/WPEC, there are free extension modules that add GoodRelations:
This sounds rather similar to Facebook's Open Graph protocol (http://developers.facebook.com/docs/opengraph). I wonder if this is related to Google's planned entry into social. It would help if they knew the context of search terms so they could match it up to ads in Gmail, for example. Or use your Gmail conversations to help reorder search results...
The problem I've found with Microformats is that they misappropriate the class attribute, which ends up causing problems on websites that have extensive templates and stylesheets. I'd personally rather have an attribute that doesn't already have another purpose, as it's not as likely to be abused, intentionally or not. I had been looking into using RDFa, but the syntax seemed burdensome and unwieldy. Microdata looks like a nice middle ground between the two previously supported rich snippet types.
The issue I found with microformats when I was evaluating them was that there was no way to automatically transform a microformatted file into a native data structure without first knowing the schema. Writing a microformat-to-JSON parser is hard because you have no way of knowing which classes are significant and which are just there for styling.
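A rough sketch of what I mean, using hCard: the extractor below only works because the significant class names (fn, org, tel) are hard-coded from the hCard spec, while the styling classes in the snippet are ones I made up.

    # Toy hCard extractor: nothing in the markup distinguishes semantic classes
    # from presentational ones, so the significant names must be known up front.
    from html.parser import HTMLParser

    HCARD_CLASSES = {"fn", "org", "tel"}  # known in advance, from the hCard spec

    SNIPPET = """
    <div class="vcard sidebar-box rounded">
      <span class="fn highlight">Jane Doe</span>
      <span class="org">Example Corp</span>
      <span class="tel">+1-555-0100</span>
    </div>
    """

    class HCardExtractor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.current = None
            self.data = {}

        def handle_starttag(self, tag, attrs):
            classes = (dict(attrs).get("class") or "").split()
            # "sidebar-box", "rounded", "highlight" are just styling hooks here;
            # only the whitelist tells us which classes carry hCard semantics.
            matches = HCARD_CLASSES.intersection(classes)
            self.current = matches.pop() if matches else None

        def handle_data(self, data):
            if self.current and data.strip():
                self.data[self.current] = data.strip()
                self.current = None

    extractor = HCardExtractor()
    extractor.feed(SNIPPET)
    print(extractor.data)  # {'fn': 'Jane Doe', 'org': 'Example Corp', 'tel': '+1-555-0100'}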
I am still not really convinced that it is possible to integrate handcoded schemas for a wide range of use cases into search results in a meaningful way.
The solution Google proposes here will also constrain the content of websites in a lot of ways if it becomes widely adopted. Look at the recipes example: it defines markup for including nutrition information for recipes:
"Can contain the following child elements: servingSize, calories, fat, saturatedFat, unsaturatedFat, carbohydrates, sugar, fiber, protein, cholesterol"
Every company that serves recipes on the web and decides not to offer this information because it deems other properties of recipes more important is now at a disadvantage. Google will show more information about the recipes of their competitors and presumably also rank them higher because they have included 'valuable' markup information in their recipe.
This approach favors shallow information resources over complex ones, as the former can be more easily parsed by metadata crawlers.
They neglected a format for job listings, which is unfortunate. We (LinkUp) put one together at http://wp.me/pJYG0-1H; it will be interesting to see if it gets any traction and whether they're actually seeking external input.
For a site put together by search engines, the URL structure for the site search is atrocious. "#q=Product" and not "?q=Product"? Who thought that was a good idea?
The site also looks a bit like spam. Needs more Firefox-esque awesome graphics, imo.
The same principle applies to pagination. If you can choose between JavaScript pagination (#page=2) and dynamic pagination (?page=2), you are nearly always better off with the hash pagination. If you do it right, you get the benefit of a single page, with the added bonus of being able to bookmark a certain page and having browser history work.
What's going to stop people from gaming this by doing things like adding fake 5-star reviews to their website? (especially brick and mortar stores that show up in google maps/places)
Putting aside the technology and schema decisions they've made, IMO it's great to see these three throwing their weight behind some common metadata, even if it does step on some toes.
Now if they'd only add some schema targeted towards downloadable public data sets. I'm dying for a good global public dataset search beyond competing data markets and data.gov.* sites.
There's a lot on schema.org for social media websites and business lookup, but there should be more for open data. I was looking for a linked data schema to represent financial transactions (X paid Y $999 for Z), but schema.org only goes so far as Sales. XBRL explicitly states it is not for "A transaction level activity".
I thought that figuring out the type of content on my pages was supposed to be their job. I'm so naive.
No problem, I'll add a few kbytes to every single page of my sites, so I can replicate the information I've already stated in a number of sitemaps, video sitemaps, headers and XML files.
P.S.: I'm not against standardization at all, I'm just saying this comes a bit late.
Nothing an XML data island could not have solved. Even an external XML data island with internal references, for better performance and less clutter. Even an external JSON data island would have been better for web consumption.
At this point, page speed is affected by things like HTTP requests and JavaScript; something as insignificant as a couple of kilobytes of compressible text would have an impact measured in microseconds.
It seems that if you use this, your documents won't be considered 'valid' by validators (tested on the W3C validator). Unless, perhaps, you just mark your doctype as html and be done with it?
Some tools will throw out misleading errors on 'invalid' markup, and you will have to spend time justifying to clients why these invalid bits of markup are OK. Accessibility tools might have problems too (not sure now, but I remember some tools years ago having problems with invalid markup).
The HTML data-* attributes are intended for private data only; i.e., to store data used as configuration for a JavaScript plugin, which doesn't hold semantic value and can't be represented as actual content, whereas microdata (which is what they're using, and which is also part of the HTML5 specification) is meant only for describing how the content of the page maps to some schema.