
February 23, 2009

Ask Is Going “Canonical”!

More often than not, web sites produce pages with the same content, but with different URLs. Usually, this is simply unintentional. For example, the following URLs usually lead to the same page (try it out):

1) http://www.mywebsearch.com
2) http://mywebsearch.com/
3) http://search.mywebsearch.com/mywebsearch/default.jhtml

Of course, sometimes it is necessary for web site designers to make the URL look different even if the contents are the same. For example, in the e-commerce world, it is common to use a dynamic string called a "SESSION ID" to differentiate one user from another for session-tracking purposes.
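To make the session ID case concrete, here is a minimal sketch of how two URLs that differ only in a session parameter reduce to the same address. The parameter name `sessionid` and the `shop.example.com` URLs are assumptions for illustration only; real sites use many different names and schemes.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter name; real sites use many different ones.
SESSION_PARAMS = {"sessionid"}


def strip_session_id(url):
    """Return the URL with any session-tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


first_visitor = "http://shop.example.com/cart?item=42&sessionid=abc123"
second_visitor = "http://shop.example.com/cart?item=42&sessionid=zzz999"
print(strip_session_id(first_visitor) == strip_session_id(second_visitor))
```

Both visitors see the same page, yet each one's URL is distinct until the session parameter is removed.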

Most importantly, from a search engine perspective, these are all different URLs and will all be indexed. This leads to a glut of duplicate content in the search index: the same web page under different URLs.
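As a rough illustration of the problem, the sketch below groups URLs whose page bodies are identical. Exact-match hashing is a deliberately naive stand-in chosen for brevity, not a description of how any search engine detects duplicates, and the page bodies are made-up placeholders.

```python
import hashlib
from collections import defaultdict


def group_by_content(pages):
    """Group URLs whose page bodies are byte-for-byte identical.

    `pages` maps each URL to the text fetched from it. Only groups
    holding more than one URL (that is, duplicates) are returned.
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]


# Placeholder bodies; in practice these would come from a crawl.
pages = {
    "http://www.mywebsearch.com": "<html>home</html>",
    "http://mywebsearch.com/": "<html>home</html>",
    "http://search.mywebsearch.com/mywebsearch/default.jhtml": "<html>home</html>",
    "http://www.mywebsearch.com/about": "<html>about</html>",
}
for duplicate_urls in group_by_content(pages):
    print(duplicate_urls)
```

Every group printed is one page of content occupying several slots in an index.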

At Ask, we employ sophisticated and proprietary algorithms to detect duplicate content, so that we can present the search consumer with as many useful results as possible. However, because each search engine uses its own proprietary algorithms, different engines may present different forms of the same URL to users, resulting in inconsistency.

To help meet this challenge, Ask is announcing today that we are joining forces with other major search engines in a timely partnership to support a special search index feature called the "canonical URL tag". This feature gives webmasters an opportunity to set a "preferred" name for a page.

For example, a webmaster would include a tag such as this (remembering the last '/')…

<link rel="canonical" href="http://www.mywebsearch.com"/>

...into the head section of both http://mywebsearch.com/ and http://search.mywebsearch.com/mywebsearch/default.jhtml.

Ask will use this information to combine characteristics of all three pages into one, and will attempt to take the canonical URL tag as the 'preferred' name for the page.

While the webmaster's preference hint will be strongly honored, other signals may be used in determining the final form when multiple forms exist in the index. If the canonical URL causes a 404 error during crawling, or if it specifies a URL on a different domain, the canonical tag will be ignored.
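The two rejection rules above can be sketched roughly as follows. This is only an illustration of the stated rules, not Ask's actual implementation: the function names, the `canonical_status` callback, and the shortcut of treating the last two host labels as "the domain" are all assumptions made for the example (that shortcut is wrong for suffixes such as .co.uk).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalLinkParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical_href is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonical_href = attr_map["href"]


def registered_domain(url):
    """Naive assumption: the last two host labels identify the domain."""
    host = (urlsplit(url).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def choose_preferred_url(page_url, html, canonical_status):
    """Return the canonical URL if the hint passes both checks, else page_url.

    `canonical_status` is a hypothetical callable that maps a URL to the
    HTTP status code observed when that URL was crawled.
    """
    parser = CanonicalLinkParser()
    parser.feed(html)
    if parser.canonical_href is None:
        return page_url  # no hint was given
    candidate = urljoin(page_url, parser.canonical_href)
    if registered_domain(candidate) != registered_domain(page_url):
        return page_url  # hint points at a different domain: ignored
    if canonical_status(candidate) == 404:
        return page_url  # canonical target is missing: ignored
    return candidate


page = '<html><head><link rel="canonical" href="http://www.mywebsearch.com"/></head></html>'
print(choose_preferred_url("http://mywebsearch.com/", page, lambda url: 200))
```

In this sketch a rejected hint simply falls back to the page's own URL. As noted above, a real engine may also weigh other signals when choosing the final form.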

The "canonical" feature represents a timely, relevant, and positive partnership between major search engines. It is a step toward ensuring more consistent treatment of duplicates across all of the engines. It will also give site designers more control over how their sites are represented within the search indexes.

At Ask, we look forward to adoption of "canonical" by more and more websites, and to working in conjunction with our search partners. It's an idea whose time has come, and we're excited to be a part of the effort to improve the user experience on the web!

Thanks for reading - and please post a note here if you have any questions at all.

Yufan Hu - VP, Core Search Systems, Ask.com

Posted by Ask.com Blog | Permalink | Comments (0) | TrackBack

February 18, 2009

SMX West, and the Keynote Interview with John Battelle...

Last week, I was fortunate to be a panelist at SMX West in Santa Clara on Danny Sullivan’s panel called 'Ask the Search Engines' (thanks, Danny). During the conference, I took the opportunity to sit in on several panels, where I learned a lot about what is going on in other parts of our industry. I was especially blown away by Danny's keynote interview with John Battelle, where they discussed the intersection of technology and media with a specific focus on the future of the search industry.


John started the discussion by laying a foundation with the travails of the music industry. He explained that, in all industries, successful companies have access to the tools of production, and access to or control over distribution of their product. Distribution in the music industry used to be targeted to a network of retail stores, and music was produced in very expensive studios. The content was then stamped onto physical media by massive machines. When low cost studio replacement software became available on personal computers, and with the Internet providing virtually free distribution, the music industry was forced to change the entire business model, which it did reluctantly.


Battelle also discussed the similar situation that the print newspaper industry is in today due to the proliferation of advertising on the Internet. He explained that a local newspaper employs between 100 and 400 journalists, and that the cost of labor, raw materials, and production makes it difficult for a newspaper to make a profit, especially given that the advertising base is defecting to the Internet. John believes that "newspapers" are necessary for maintaining public trust in government and industry, but with the business model unsustainable, he sees them transitioning to ownership by public trusts.


With the backdrop of monumental changes in the music and newspaper media industries, does John expect similar changes in the search industry? John says that search will transition from 'static search' to 'real time search'. He stressed how products like Twitter are changing the way that people are finding product recommendations, as well as finding answers to their support questions. He used Comcast as an example of a company that has improved its image with technology influencers by putting access to Comcast Care technicians on Twitter. John also sees the format of the search user input changing, from keywords to voice, from discrete query to conversation.


Thanks for reading!


- Keith Hogan, VP, Technology



February 10, 2009

Moving a Data Center - Big Project, and Big Results...

It has been almost 10 years since we first moved into Ask's original datacenter. Back then, a copy of the entire Internet fit in just a couple of hundred machines. We could be confident that even with the wildest Internet growth predictions, the vast expanses of emptiness of a datacenter would never be consumed.


It turned out that this prediction was indeed correct, but we realized a couple of years back that our ability to satisfy expansion requirements was not going to be limited by physical space, but instead by the availability of electricity. Limited power meant a cap on the number of servers, which restricted our ability to grow the search index. Our engineers were going to have to spend more time on improving efficiency, and less time on new feature development. So we've been in the planning stages for moving to new datacenter spaces since 2006, and we've executed the move of thousands of systems over the past several months.


If you are a web site owner who watches your access logs like a hawk, you most likely noticed that the Ask crawler has not been as active crawling your site. We curtailed our crawling after making the very difficult decision to reduce the search index size and freshness during the move in order to limit risks to our live traffic. We tried to do this in the smartest way possible, so that the impact on users was minimal. The internal metrics that we watch indicated that we met our goals. A few commentators noticed the reduced crawling level and speculated that this was an indication that Ask was in the process of exiting the search engine business. To the contrary, this quiet period was a necessary component of preparing Ask for future expansion and improving the overall search experience for our users. Now that the move is mostly complete, you should start to see Ask becoming more active in your logs.


We're excited about our new infrastructure position because it supports new and innovative core search technology initiatives, and allows us to continually improve the vital basics such as speed, freshness and relevancy. The direct benefits of the new data center spaces are:


 o The new locations are strategically positioned, resulting in an overall better response time for our users.
 o Reliability is higher, given that we're now operating well below peak availability limits.
 o Our new datacenter partners are invested in our success, and consequently are more responsive to change requests.
 o Electricity is cheaper and much more plentiful.
 o More of our power is provided by renewable resources.


I hope you'll keep an eye on what we're doing, and watch this space for more technology and infrastructure updates.


- Keith Hogan, VP, Technology


Opinions expressed here and in any corresponding comments are the personal opinions of the original authors, not of IAC Search & Media and may not have been reviewed in advance.
