Saturday, December 15, 2007

Google KNOL - Accurate Knowledge Source Coming Soon?

I just came across this announcement on the Official Google Blog: Encouraging people to contribute knowledge. Their latest project, currently called "knol", is quite an ambitious effort, and seems to be yet another avenue through which Google will continue to leverage content created by others in order to expand their ever-powerful Google AdWords / AdSense program (i.e., the primary money-making machine for Google corporate).

One excerpt from that Google Blog (by ) that I have to quote, and discuss further, is the following:
There are millions of people who possess useful knowledge that they would love to share, and there are billions of people who can benefit from it. We believe that many do not share that knowledge today simply because it is not easy enough to do that.
I agree with the Google VP's first sentence, as I am one of those millions willing to share all sorts of knowledge about things like software and programming, finance and investment, and even gluten-free baking. I have authored a commercial Gluten-Free Dessert Recipes book, and I share all sorts of Gluten-Free Diet and Baking advice on one of my blogs too -- all as part of that knowledge sharing Google envisions. In fact, I use their rather recently acquired Blogger/Blogspot service to host the latter content, and I don't mind creating content that they can leverage, since they host my content for free.

I completely disagree with the second quoted sentence. I, and many others, do share our knowledge freely. It is very simple to share our knowledge. Ease of sharing knowledge is not the problem! Ease of getting true "knowledge" to be found (via search engines) is the problem.

What frustrates people like me is how Google's own page-ranking system will repeatedly give higher precedence in their search results to "knowledge" that is anything but -- quite often presenting me and others with search results that lead to what are best described as "Internet SPAM pages" (pages full of junk content, keyword-propagation search-result-manipulation crap, or plagiarized content / trademarks / copyrights / etc., further used to attract "hits" from search engines for the sole purpose of... you guessed it... generating revenue for the site owner through Google Ads). You know it exists. We all know it exists. There really needs to be some better way to distinguish between junk and real, valuable content.

Google claims to have algorithms in place for preventing search index-SPAM from occurring, but many companies advertise services for raising the Google PageRank of websites for anyone who is willing to pay. Fact is, the index-spammers are able to succeed on many occasions. Most do it by linking from many different sites (owned by the same person or company) to a target site, using the search term for which they want that site to rank higher.

Let me clue you in on how to stop this practice, Google (or seriously slow it down) - and I won't even sue you for patent-infringement or anything: use the Internic WhoIs Database to cross-reference the owner / contact info for web-sites, and determine whether linking patterns are nothing more than intra-owner site-links (by people / firms owning hundreds or thousands of domains for the sole purpose of doing this link-promotion stuff)! This should eliminate much of the Internet index-SPAM, by de-prioritizing this type of content.
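
To make the cross-referencing idea a bit more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how Google or anyone else actually does it: the domain names, the owner names, and the `independent_link_counts` function are all hypothetical, and the owner table is hard-coded where a real system would have to gather (and clean up) registrant data from WhoIs lookups. The idea is simply to ignore links between sites with the same owner, and to count each outside owner only once per target.

```python
from collections import defaultdict

# Hypothetical registrant data, standing in for the results of WhoIs lookups.
OWNERS = {
    "spam-a.example": "Link Farm LLC",
    "spam-b.example": "Link Farm LLC",
    "spam-target.example": "Link Farm LLC",
    "blog.example": "Alice",
    "news.example": "Bob",
    "recipes.example": "Carol",
}

# (source domain, target domain) pairs, as a crawler might record them.
LINKS = [
    ("spam-a.example", "spam-target.example"),
    ("spam-b.example", "spam-target.example"),
    ("blog.example", "recipes.example"),
    ("news.example", "recipes.example"),
    ("blog.example", "spam-target.example"),
]


def independent_link_counts(links, owners):
    """Count the distinct outside owners linking to each target domain."""
    linking_owners = defaultdict(set)
    for source, target in links:
        # Assumption: a source with no known owner counts as its own owner.
        source_owner = owners.get(source, source)
        target_owner = owners.get(target)
        if source_owner == target_owner:
            continue  # intra-owner link: gives the target no credit
        linking_owners[target].add(source_owner)
    return {target: len(found) for target, found in linking_owners.items()}


counts = independent_link_counts(LINKS, OWNERS)
print(counts)
```

In this made-up data the "target" site has three inbound links but only one from an outside owner, while the recipe site has two links from two different owners. Real registrant data is messier (private registrations, shell companies, differently spelled contact info), so a sketch like this could only ever be one signal among many.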

Next, consider implementing a way to expose what I will call "coerced anchor-text links", which I regularly see on the web. What I mean by this is: try to stop the technique where a site owner offers you purported "exposure" or "search presence benefit", but only in exchange for you posting a reciprocal link to their site using the exact <a> tag / anchor-text they specify. It is rather widely known that this practice bumps the linked site up in Google search-results. The problem here is how to determine which links are "coerced" and which ones are made because content is worth linking to.
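
I don't have a real answer to that last question, but one rough heuristic can be sketched in Python: links that people create on their own tend to use varied wording, while dictated links repeat the same anchor text. The function name `anchor_text_concentration` and the sample anchor texts below are hypothetical, and a high score is only a hint, not proof, since a well-known brand name would also produce many identical anchors.

```python
from collections import Counter


def anchor_text_concentration(inbound_anchors):
    """Return the share of inbound links that use the most common anchor text."""
    if not inbound_anchors:
        return 0.0
    # Normalize case and surrounding whitespace so trivial variations match.
    normalized = [anchor.strip().lower() for anchor in inbound_anchors]
    most_common_count = Counter(normalized).most_common(1)[0][1]
    return most_common_count / len(normalized)


# Hypothetical anchor texts pointing at two different sites.
organic = ["great gluten-free brownies", "this recipe", "Carol's blog", "here"]
suspicious = ["cheap widgets online"] * 9 + ["widgets"]

print(anchor_text_concentration(organic))
print(anchor_text_concentration(suspicious))
```

Here the first site scores low because every link is worded differently, while nine out of ten links to the second site use the identical phrase. Where to draw the line between the two is a judgment call I won't pretend to have worked out.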

I wouldn't mind seeing a linking-relevance voting system managed through Google. Knol may be a long-term answer to this, at least in part. But, what about content not in "knol"? Once again, the problem is preventing abuse. You don't want index-spammers just finding another new way to "vote" their way to the top of search results with their anything-but-knowledge content, and you don't want them to get there by voting (negatively) against real content (likewise, you don't want competitors voting your content down either). I need to think about this more, but I'm pretty sure there is a simple answer, though it may require access to "private" data -- meaning, to prevent fraudulent votes, steps should be taken to ensure one-person/one-vote-per-link, and then algorithms should be run against those links to be sure they aren't just a new version of index-SPAM created by a network of people.
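
The one-person/one-vote-per-link part, at least, is easy to sketch in Python. This assumes the hard problem is already solved, namely that each `voter_id` really maps to one real person; the `tally_votes` function, the voter names, and the URLs are all hypothetical. Only the last vote a voter casts on a given link is counted, so repeating a vote changes nothing.

```python
from collections import defaultdict


def tally_votes(votes):
    """Sum votes per URL, keeping only each voter's latest vote on that URL."""
    latest = {}
    for voter_id, url, value in votes:
        if value not in (1, -1):
            raise ValueError("a vote must be +1 or -1")
        # A later vote by the same voter on the same URL replaces the earlier one.
        latest[(voter_id, url)] = value
    totals = defaultdict(int)
    for (_, url), value in latest.items():
        totals[url] += value
    return dict(totals)


# Hypothetical votes; "mallory" tries to stuff the ballot box in both directions.
votes = [
    ("alice", "recipes.example/brownies", 1),
    ("alice", "recipes.example/brownies", 1),
    ("bob", "recipes.example/brownies", 1),
    ("mallory", "recipes.example/brownies", -1),
    ("mallory", "recipes.example/brownies", -1),
    ("mallory", "spam.example/widgets", 1),
    ("mallory", "spam.example/widgets", 1),
]

totals = tally_votes(votes)
print(totals)
```

This does nothing about a network of people (or fake accounts) voting together; catching that is the second step I mentioned, and it is not covered here.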

Will "knol" be impervious to this type of search index-SPAM abuse? Perhaps, but only if it doesn't compete with those that are currently using Google ads as a significant source of revenue with all their internet-junk-content sites (if it competes, I fear junk-content-producers will find a new way to manipulate the system). I don't know what will stop people from emulating the same type of link-propagation technique with Knol, simply by voting their own content up, up, up. When Google makes statements like this:
If an author chooses to include ads, Google will provide the author with substantial revenue share from the proceeds of those ads.
...that is bound to also attract all sorts of lame content just so someone can generate cash from the ads.

I just don't know if "knol" is the answer. Fact is, connecting web-users with knowledge is doable, and can be done without "knol" as a product, using some of the techniques I hypothesized previously to keep irrelevant (or "junk") content from being returned in search results. But, this is unlikely. Why? Because I think Google has too much of a vested interest in all this current Internet search-engine SPAM that people are forced to navigate through (in hopes of finding real knowledge) only to click one of the Google AdWords / AdSense ads once they get there. Google may argue that, indirectly, this gets you to the knowledge and content you want, but I'm sorry, that is one seriously weak argument I hope they would never try to defend.

We (the users of the Internet who search for real knowledge) want to get right to what is most relevant, and do so without clicking through a bunch of sponsored ads. I can't help wondering if the new "knol" system Google proposes will improve access to valuable knowledge, or diminish access to knowledge by marginalizing otherwise valuable content just because the authors of some content do not partake in knol, or are unable to partake in knol. Time will tell. All I can do is hope for improvement, because I (like many others) am sick of search-results that are meaningless much of the time.

By the way, when knol does hit production, count me in: I will gladly be an author for gluten-free advice, software development expertise (especially SQL-Server) and more. Why not? I already blog about these topics. I would also go one step further with knol and really format my content as though it were part of an encyclopedia or such (I tend to ramble a bit too much on my blog for what I would hope to consider "knol"-ready content).

