Nik's Technology Blog

Travels through programming, networks, and computers

Use Browser Toolbar instead of Address Bar to Avoid Phishing Sites

I've just read a post over at Search Engine Journal about statistics from Hitwise UK suggesting that British users are increasingly using browser toolbars to search for domains they already know, such as tesco.com, rather than typing them directly into the browser address bar.

I use this technique a lot because I frequently misspell a domain name or get the wrong domain extension for a website. When this happens, more often than not you get a holding page, a cyber-squatter site, or worse still a site that attempts to mimic the intended destination in order to "phish" log-in details.
When you use a search toolbar to navigate to a domain, the top search result is most likely going to be the real domain.

Google Paid Link Policing and other more Democratic Ranking Methods

Google's Webmaster Help Center explains Google's policy on paid links and encourages people to report them to Google. Here's a snippet from Google's statement:

"Buying or selling links that pass PageRank is in violation of Google's webmaster guidelines and can negatively impact a site's ranking in search results.

Not all paid links violate our guidelines. Buying and selling links is a normal part of the economy of the web when done for advertising purposes, and not for manipulation of search results. Links purchased for advertising should be designated as such."

Google essentially want websites to designate paid links with the rel="nofollow" attribute on anchor tags, so that link juice, or PageRank, is not passed on to the website that bought the link. The rel="nofollow" attribute was originally conceived to stop comment SPAM on blogs and discussion boards, but its use has now spread to the policing of paid links.
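
As an illustration, a paid link designated in the way Google asks for might look something like this (the URL and anchor text are just placeholders):

<a href="http://www.example-advertiser.com/" rel="nofollow">Sponsored link</a>

The link remains perfectly clickable for visitors; the rel="nofollow" simply tells search engine spiders not to pass any PageRank through it.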

I understand the difficulties Google and the other search engines must have in determining when to pass link juice between websites, but leaving the webmaster in control of this is like asking Google to start ranking sites by meta keywords again.

I'm beginning to believe the future of web search lies in the democratic nature of StumbleUpon, Digg and other social bookmarking services (such as del.icio.us and my favourite, ma.gnolia), whereby users vote, tag and bookmark sites. Surely a combination of popularity and search algorithm is the way forward?

Updated: Shortly after I posted this blog entry, Google was spotted testing Digg-style voting buttons on their results pages!

Updated: Matt Cutts and Maile Ohye published a post on the Official Google Webmaster Central blog on 1 Dec 2007 that aims to clarify Google's stance on paid links.

Adsense Allowed Sites Flags Up Google Cache Views As Unauthorised

When I read about the new Google Adsense feature "Allowed Sites" a couple of weeks ago, I thought I'd set it up on my account just to make sure no other sites were displaying my Adsense code, which could end up getting my account banned or flagged as suspicious due to factors outside my control.
Let's face it, if they're displaying my Adsense code, they've probably scraped or copied my site content without my consent, so who knows what else they may be up to!

Anyway I logged into Adsense recently and decided to check out the Allowed Sites page, and this is what I read...

There are unauthorized sites that have displayed ads using your AdSense publisher ID within the last week. Please click here to view them.

So I did click here, but all I got were some IP addresses:

Site URL
72.14.253.104
64.233.183.104
72.14.235.104
209.85.129.104
66.102.9.104
216.239.59.104
209.85.135.104
64.233.169.104
64.233.167.104

A little intrigued as to what these IP addresses were, I decided to investigate further by issuing a trace route command to glean some more information.

C:\Documents and Settings\Nik>tracert 64.233.183.104

The trace route resolved all of the IP addresses to Google. I'm guessing that they are in my list because people have been viewing my sites through Google's cached pages, so panic over!
It would be good if Google could filter out its own IP addresses from the list though, so I don't have to check each IP individually.

Google add feature to stem stolen Adsense publisher code

Google have added an "Allowed Sites" feature to the Adsense console to stem a problem that has been talked about for a while.
Lots of publishers have had their site content stolen and re-purposed in an almost identical fashion on another domain, specifically so the criminal can earn money from advertising without spending the time and effort writing content themselves.
In some cases the copied HTML contained the victim's Adsense code, which, when uploaded to a "junk" domain alongside other duplicate content, essentially associated the original publisher with a bad site in Google's eyes.
To protect Adsense publishers from being associated with this crime and potentially having their accounts banned, Google has developed the "Allowed Sites" feature, which lets publishers tell Google which domains they publish to.

What this won't do is stop people stealing your content and code, nor will it stop people hacking into your web server and changing the account ID in the Adsense JavaScript to the criminal's Adsense ID, but it is definitely a step in the right direction.

The Rise and Fall of User Generated Content?

Another day, another ludicrous allegation about cyberspace. Apparently... "The vast majority of blogs on top social websites contain potentially offensive material."

This was the conclusion of a ScanSafe-commissioned report, which claims that sites such as MySpace, YouTube and Blogger, which are a "hit" among children, can host porn or adult language. According to the report, 1 in 20 blogs contains a virus or some sort of malicious spyware.

The Problem?

User-generated content is to blame, of course; the way this content is created and edited makes it very difficult to control and regulate.


Even if you were to monitor every post on a website as part of your process, how would you verify whether a particular portion of text, or a Photoshopped image, has violated anyone's copyright or intellectual property?


This is a problem the big search engines have as well, with so many SPAM sites scraping content from other sites, then republishing the resulting mashed-up content as their own work in order to cash in on affiliate income generated from SERPs. Is Google working on a solution to stem this SPAM?

EU Intellectual Property Ruling

Another potential blow to websites which rely on user-generated content is the European Union ruling on intellectual property which is making its way through the ratification process. This could see ISPs and website owners being charged for copyright infringements even if the data was posted by users of the site.

The Rel Attribute in HTML

The rel attribute is available for use in a few HTML tags, namely the <link> and <a> (anchor) tags, but until recently it has been fairly pointless to use because web browsers did not support the intended functionality of most of the values you could assign to it.

The rel attribute has been around since the HTML 3 specification and defines the relationship between the current document and the document specified in the href attribute of the same tag. If the href attribute is missing from the tag, the rel attribute is ignored.

For example:
<link rel="stylesheet" href="styles.css">

In this example the rel attribute specifies that the document referenced by the href attribute is the stylesheet for the current document.
This is probably the only recognised and supported use of the rel attribute by modern web browsers and by far the most common use for it to date.
There are other semantic uses for the rel attribute beyond those a browser might find useful; examples include social networking and describing relationships between people (see http://gmpg.org/xfn/intro). The other use, which has been talked about a lot recently, concerns search engine spiders.
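
As a quick illustration of the social networking use, XFN (XHTML Friends Network) values can be added to an ordinary link; the URL and name below are just placeholders:

<a href="http://www.example.com/jane" rel="friend met colleague">Jane</a>

Here the rel value describes the author's relationship to the person behind the linked page, rather than telling the browser to do anything in particular with the link.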

Search Engines and the rel Attribute

Recently Google has played a big part in finding another use for the rel attribute. This time the HTML tag in question was the humble anchor tag.
Google and the other major search engines (MSN and Yahoo!) have a constant battle with SERP SPAM, which clutters their results and makes them less useful. These pages make their way into the top results by using black hat SEO methods such as automated comment SPAM, link farms and so on.
Rather than adopt a complex algorithm to detect the SPAM links that increase a target page's search engine vote, sometimes called "PageRank" or "Web Rank", the search engines (Google, MSN and Yahoo!) have collectively decided that if blogging software, big directories, general links pages and the like use anchor tags with a rel="nofollow" attribute, those links will simply be ignored by search engine spiders, yet still be fully functional for end users.
Of course, using rel="nofollow" does not mean the links are deemed bad in any way; every link in a blog comment will be treated in the same fashion. The webmaster is essentially saying

"this link was not put here by me, so ignore it and do not pass any "link juice" on to it".

More on nofollow by Search Engine Watch.
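
In practice, blogging software simply adds the attribute to each link left in a comment. A minimal sketch of what the generated markup might look like (the URL is a placeholder):

<a href="http://www.example-commenter.com/" rel="nofollow">a commenter's site</a>

Visitors can still follow the link as normal; the spiders just don't count it as a vote for the target page.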

Putting Webmasters in Control

Putting this kind of control in webmasters' hands hasn't been without controversy. People will always try to experiment with ways of manipulating the intended outcome to favour their own goals, such as using nofollow on internal links within their own site. Others have welcomed the move as a way of reducing the problem of spamming.

Is there still a place for site newsletters in the web 2.0 world?

More and more sites are adopting XML syndication technologies such as RSS and ATOM which users can subscribe to.
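
A site will typically advertise its feed in the page head so that browsers and feed readers can discover it automatically; a minimal example (the feed URL is just a placeholder) looks like this:

<link rel="alternate" type="application/rss+xml" title="RSS feed" href="http://www.example.com/rss.xml">

Subscribers then add the feed to their reader of choice rather than handing over an email address.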

Pull Technology

Rather than being a push technology like email newsletters, RSS is a pull technology. The subscriber is in full control of the subscription; the publisher does not have a relationship with the subscriber, or even need to know their email address. This makes unsubscribing very easy, and because you don't need to supply an email address you don't need to worry about whether your details will be sold on by unscrupulous companies.

RSS Adoption

XML syndication has been around for over five years, but in the early days the RSS readers available weren't up to scratch, so it took a while for the technology to gather momentum. Nowadays there are plenty of good readers, such as Bloglines and Google Reader, which are very polished products that support all the major formats.

RSS Advertising

The final nail in the email newsletter's coffin will be the adoption of RSS advertising into the mainstream. Currently Google and Yahoo! are running tests with advertising in these syndication formats. As soon as these are released, the already strong relationship Google has with publishers will allow it to rapidly make RSS very lucrative for website publishers.

Syndication Analytics

Until recently, publishers syndicating their content via RSS had a hard time analysing their circulation; that's where companies such as Feedburner have found a niche, and they continue to provide publishers with additional services on top of basic subscription tracking.

Syndication SPAM

Of course, syndicating your content is just another method of publishing. First there was paper, then HTML, now XML. You can't eradicate SPAM with RSS; people can still set up SPAM blogs, but it's the subscribers who are in control of their subscriptions. So as a publisher you know that the 500 subscribers reported by your RSS analytics product of choice are actively reading your content, or else they'd simply click to unsubscribe from within their RSS reader application. Compare that to a database of registered subscribers dating back several years; are those users viewing your newsletter in their preview pane and pressing delete rather than unsubscribing via an unsubscribe link?

Content is King

The old adage that 'content is King' is truer than ever with RSS syndication. The problem with giving such power to the subscriber is that your content needs to be top-notch in order to keep your subscribers subscribing. Even though there are guidelines specifying opt-out and unsubscribe methods and practices, which newsletter senders must adhere to, the fact is unsubscribing from RSS is far easier and is not reliant on differing geographic data protection laws.

Contact forms, SPAM relay email and the CAPTCHA

Back in January this year I decided enough was enough with the increasing amount of automated SPAM coming into my inbox and originating from my site, and I decided to do something about it. My contact form had been attracting lots of SPAM bots, which were trying their best to relay their SPAM through my site.

However, my form has always had the To: and From: fields hard-coded, so I doubt anything ever got relayed, but the messages all got sent to me anyhow.

As a result, I now verify that the form was actually filled in by a HUMAN each time it gets sent! I've built an ASP CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) function to achieve this; more about CAPTCHAs here.
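
My own function isn't shown here, but a minimal sketch of this kind of server-side check, written as classic ASP with JScript, might look like the following (the Session key and form field names are just placeholders):

<%@ Language="JScript" %>
<%
// The CAPTCHA text is assumed to have been stored in the Session
// when the distorted image was generated (hypothetical key name).
var expected = "" + Session("CaptchaText");
var supplied = "" + Request.Form("CaptchaInput");

if (expected != "" && supplied.toUpperCase() == expected.toUpperCase()) {
    // Looks like a human read the image - safe to send the email here.
} else {
    Response.Write("The verification code did not match. Please try again.");
}
%>

Nothing more sophisticated than a string comparison is needed; the point is simply that an automated SPAM bot can't read the distorted text in the image.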

Accessibility and CAPTCHAs

There are, however, downsides to this SPAM-free existence. CAPTCHA images of the kind I am using are inherently inaccessible, so I intend to use another system in conjunction with my image CAPTCHA, as used by Matt Cutts on his blog.

Website Spam Avoidance - Javascript Code

Use the following JavaScript if you want to display an email address on your website but don't want to receive spam. Because the address is assembled at runtime, it avoids getting picked up by spammers' email-harvesting scripts.

<script type="text/javascript">
<!--
// Build the mailto: link at runtime so the full email address never
// appears as plain text in the page source for harvesters to scrape.
var LinkText = "click here";      // the visible link text
var e1 = "mail";
var e2 = "to:";
var EmailPart1 = "yourname";      // the part before the @
var EmailPart2 = "yourdomain";    // your domain name
var EmailPart3 = ".com";          // your domain extension

document.write('<a href="' + e1 + e2 + EmailPart1 + "@" + EmailPart2 + EmailPart3 + '">' + LinkText + "</a>");
//-->
</script>