cleverhack dot com  
Released:  11/25/2007 11:20:49 AM
RSS Link:  http://cleverhack.com/feed/



Description:



A Blog About Technology, Search Engine Optimization (SEO), Internet Marketing And More.


Contents:

Cuil referrer info (just because you like it)

Because of the hoopla around cuil today, I thought I’d take a peek at this newest search engine’s referrers.

Cuil crawler info. I know I’ve been seeing this bot for the past year or so. Cuil’s crawler is apparently called twiceler (is that a pun?) and the user agent string uses cuill.com, which 302 redirects to the cuil.com domain. As of this writing, the cuil Webmaster info URL differs from the one in the bot’s user agent string.

Host: 208.36.144.10
*
/
Http Code: 200 Date: Jul 28 15:02:12 Http Version: HTTP/1.0 Size in Bytes: 68965
Referer: -
Agent: Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)

As for cuil visitor referrer info, here you go…

[Visitor’s IP Address]
*
/
Http Code: 200 Date: Jul 28 17:31:24 Http Version: HTTP/1.1 Size in Bytes: 17773
Referer: http://www.cuil.com/search?q=cleverhack
Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.0.1) Gecko/2008070206 Firefox/3.0.1

If you happen to see a “&sl=long” appended to the referrer (e.g. http://www.cuil.com/search?q=cleverhack&sl=long), it indicates that the visitor was using the two column layout. If cuil ever gets significant market share, you can bet there will be SEOs stressing about how their sites show in the two column vs. three column layout.
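For anyone who wants to pull this out of their own logs, here is a minimal Python sketch. The function name is hypothetical, and treating every referrer without “sl=long” as the three column layout is my assumption, not something cuil documents.

```python
from urllib.parse import urlparse, parse_qs

def describe_cuil_referrer(referrer):
    """Return (query, layout) for a cuil search referrer, or None otherwise."""
    parts = urlparse(referrer)
    # Only treat cuil.com search URLs as cuil referrers.
    if parts.hostname not in ("cuil.com", "www.cuil.com") or parts.path != "/search":
        return None
    params = parse_qs(parts.query)
    query = params.get("q", [""])[0]
    # sl=long marks the two column layout, per the observation above.
    # Labeling everything else "three column" is an assumption.
    layout = "two column" if params.get("sl", [""])[0] == "long" else "three column"
    return query, layout

print(describe_cuil_referrer("http://www.cuil.com/search?q=cleverhack&sl=long"))
```

A referrer of “-” (no referrer at all) simply returns None, so the function can be run over every log line.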

Otherwise, a cuil visitor presents in your visitor logs pretty much as any other visitor from the big search engines. The IP address belongs to the user (not a proxy like ask.com) and so does the user agent.

As for my thoughts about cuil, I am not impressed with the image thumbnails shown with the search results, as nearly all I have seen so far have been wildly inappropriate for the results. As for information volume, I haven’t done a statistical survey, but google still returns a much larger volume of results than cuil.





Rogue SEO spells out oh so not awesome

So earlier today I was doing some catching up on Google Alerts for some domains that I manage.

And I kept on finding pages that look like the one below - same formatting, even.

When I first noticed these pages in the middle of last week, I took them for the work of a stupidly overzealous SEO who was planting link farms on sites he owns.

Now, I don’t think so - after examining a number of these rogue SEO pages, it looks like someone is taking advantage of an exploit in Apache to post directories full of these rogue SEO pages, to boost their page rank (while adding outside links on these rogue pages to, I guess, appear genuine).

All of the pages I’ve found are on machines running Apache in shared hosting settings with poorly maintained / designed parent sites. That sure as heck points to an exploit.

Take for example the page I posted above. The full URL looks like http://destinationconcerts.com/tmp416/cnf336/neurology_49.htm.

As I noted before, the site is poorly maintained, which means you can go ahead and browse the parent directories. The main Web site seems to be a homepage (created in Microsoft FrontPage) for a concert promoter in Allentown, PA. The hosting provider is E-Commerce, Inc. And this was just one out of a number of pages that I found hosted by E-Commerce, Inc. I also found other pages on sites hosted by The Planet and, irony abounding, The Institute for Intelligence Studies at Mercyhurst College.

So, just who is planting these pages and why?





Don’t like Shyftr? Block the IP.

This past weekend there’s been a conversation about Shyftr, a new RSS service that allows people to read and comment on full text stories on the Shyftr site, rather than making the reader click through to the originating blog to comment. The thought is that folks who care about pageviews for advertising will lose out in such a scenario.

So, in the spirit of helping the wider, feathers in a ruffle, blogging community out, I’ve pasted the Shyftr RSS bot info below. The good news is that you can block the Shyftr IP address from accessing your blog (if you have that capability through your blog hosting solution, etc.). At present, the IP address is 66.234.234.34.

Unlike other annoying bots, I would not block the user agent in your .htaccess file as the RSS bot software the Shyftr folks are using is the generic MagpieRSS toolset, which is used by other RSS services. Hopefully, the people at Shyftr will rename the user agent to something more uniquely identifiable in the future so you can block via .htaccess.

(Note: Blocking a future unique Shyftr user agent via robots.txt probably won’t work as the crawler would need to fetch the robots.txt file first before fetching your feed and I didn’t see that behavior tonight.)
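If your blog happens to be served by a Python WSGI application (an assumption; most hosting setups would do this in the server or firewall configuration instead), blocking by IP rather than by user agent could be sketched roughly as below. The names block_ips and feed_app are hypothetical, and feed_app only stands in for the real blog.

```python
def block_ips(app, blocked=frozenset({"66.234.234.34"})):
    """Wrap a WSGI app so requests from blocked addresses get a 403."""
    def wrapper(environ, start_response):
        # REMOTE_ADDR is the client address as seen by the server; behind a
        # proxy it would be the proxy's address instead.
        if environ.get("REMOTE_ADDR") in blocked:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return wrapper

def feed_app(environ, start_response):
    """Hypothetical stand-in for the blog itself."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"feed contents\n"]

# Requests from the Shyftr address are refused; everyone else is served,
# including other services that also identify themselves as MagpieRSS.
application = block_ips(feed_app)
```

The check deliberately ignores the user agent, for the reason given above: MagpieRSS is shared by other RSS services.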

Host: 66.234.234.34
*
/feed
Http Code: 200 Date: Apr 12 19:48:28 Http Version: HTTP/1.0 Size in Bytes: 6244
Referer: -
Agent: MagpieRSS/0.72 (+http://magpierss.sf.net)
*
/favicon.ico
Http Code: 200 Date: Apr 12 19:48:28 Http Version: HTTP/1.0 Size in Bytes: 1406
Referer: -
Agent: -





Some real people feedback about bookmarklets…

On the MSNBC developer blog, the question was posed: “How do you share?” Not in the grade school way, but in the newfangled Web 2.0 way.

Overall, the comments from MSNBC readers were pretty… negative. Aside from the “I’ll just paste the link I want to share in an email” or the “I’ll just add the page to my browser bookmarks” or the “they’re tracking your habits for nefarious purposes” comments, other commenters cited just one or two social bookmarking sites (the most popular seeming to be either del.icio.us or digg.com). And a few other commenters wondered, “Hey, MSNBC, don’t you own Newsvine?”

It appears that the zen habits of social bookmarking haven’t been widely accepted by the Internet populace at large.





Apple TV

For those of you with Apple TV, do you like it?

I’m thinking of springing for it, seeing as the idea of downloading movies and watching them on my TV (a nearly outdated, last of the Mohicans CRT) does appeal to me. I don’t watch broadcast TV, I don’t have on-demand anything, nor do I Netflix.

On the other hand, the iMac is in the family room too and I could, I suppose, hook that up to the TV negating the need for another product from Apple.

Thoughts?





Brute force SEO: NY Times using keyword tagging in the page title tag

Building upon a discussion elsewhere on the Web, here’s some brute force SEO for you.

Apparently, the NY Times is inserting keyword tags in the page title tag in instances where the article headline seems to lack sufficient keywords. Normally, the Times just carries the article’s headline into the page title tag.

For example, in the article headlined The Falling-Down Professions, the page title tag reads as “Economic Conditions-Economic trends-legal profession-lawyers-prestige-doctors - New York Times”.
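To make the pattern concrete, here is a rough Python sketch of how a content management system might build such a title. This is my guess at the shape of the logic, not the Times’ actual code: the function name is hypothetical, and the “headline - New York Times” format for the normal case is an assumption.

```python
def page_title(headline, tags=None, site="New York Times"):
    """Build a title tag from topic tags when given, else from the headline."""
    # Tags are joined with hyphens, matching the example title quoted above.
    lead = "-".join(tags) if tags else headline
    return f"{lead} - {site}"

tags = ["Economic Conditions", "Economic trends", "legal profession",
        "lawyers", "prestige", "doctors"]
print(page_title("The Falling-Down Professions", tags))
```

How the Times decides that a headline lacks sufficient keywords is not known, so the sketch leaves that decision to whoever supplies the tags.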

You see, the page title tag is important for SEO as Google in particular lends much weight to the text contained within the title tag.

All in all, the NY Times approach is definitely an interesting methodology for organizations that deploy content management systems and wish to build traffic from search engines.





Guy loses his domain due to a Gmail exploit

Has anyone else read this story of David Airey’s domains being stolen from him because of a Gmail exploit?

Both of David’s domains have been subsequently restored, thanks to the publicity he received this week.





Netscape Navigator End of Lifed, The Rest of Us Get A Little Nostalgic

Let’s all take a moment and remember the good old days of the Internet in the 1990s … the Netscape Web browser is being end of lifed as of Feb 2008.

If you didn’t catch Code Rush, a documentary on Netscape which was shown on PBS in 2000, I highly recommend you do so.





SWSE - Semantic Web Search Engine

This particular crawler is being deployed from the Semantic Web Search Engine (SWSE) project, which is attempting to crawl the nascent Semantic Web, including RSS and FOAF data.

This is yet another reason why deploying RSS is a good idea for any Web presence.

Here’s a link to the SWSE search demo.

Host: 140.203.154.196
/wp-rdf.php
Http Code: 304 Date: Dec 18 14:56:27 Http Version: HTTP/1.1 Size in Bytes: -
Referer: -
Agent: multicrawler (+http://sw.deri.org/2006/04/multicrawler/robots.html)





MSN Live Search - New activity

Has anyone else seen some different activity coming from MSN? I’m seeing the following entries in my logs, but they don’t look like traditional MSNBot crawler behavior.

Why this activity is different:
1) The originating IP address is from the MSN netblock.
2) There is an alleged referrer that looks like it is from an MSN search http://search.live.com/results.aspx?q=keyword&mrt=en-us&FORM=LIVSOP
3) The user agent is showing as a browser.
4) This activity is showing very close to when I see MSNBot entries in my logs.

And no, the behavior does not appear to come from a real-life user.

Host: 65.55.165.38
*
/2006/06/17/live-blogging-from-the-philly-blogger-meeting/
Http Code: 200 Date: Dec 17 02:59:16 Http Version: HTTP/1.0 Size in Bytes: 40839
Referer: http://search.live.com/results.aspx?q=podcasts&mrt=en-us&FORM=LIVSOP
Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)

Host: 65.55.165.42
*
/2006/06/20/how-e-commerce-will-be-affected-by-ie-7/
Http Code: 200 Date: Dec 17 03:13:02 Http Version: HTTP/1.0 Size in Bytes: 43238
Referer: http://search.live.com/results.aspx?q=podcasts&mrt=en-us&FORM=LIVSOP
Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)
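If you want to look for the same pattern in your own logs, the first three observations in the list above can be sketched as a simple Python check. The function name is hypothetical, and the network range is an assumption chosen only because it covers both addresses in the entries above; it is not an authoritative list of MSN netblocks.

```python
import ipaddress
from urllib.parse import urlparse

# Assumed range for illustration only: it covers both hosts logged above.
MSN_NETWORK = ipaddress.ip_network("65.55.0.0/16")

def looks_like_msn_probe(host, referrer, agent, network=MSN_NETWORK):
    """True when a hit matches the first three observations listed above."""
    from_msn = ipaddress.ip_address(host) in network
    live_referrer = urlparse(referrer).hostname == "search.live.com"
    # "Mozilla/" without "msnbot" is a rough stand-in for "claims to be a browser".
    browser_agent = agent.startswith("Mozilla/") and "msnbot" not in agent.lower()
    return from_msn and live_referrer and browser_agent

print(looks_like_msn_probe(
    "65.55.165.38",
    "http://search.live.com/results.aspx?q=podcasts&mrt=en-us&FORM=LIVSOP",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)",
))
```

The fourth observation, that these hits show up close in time to MSNBot entries, is not covered here; it would need timestamps from neighboring log lines.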








