Tag Archives: RSS

Spammers and Scammers

Even though I don’t get many comments (*sniff*), I know how many readers are out there from the server and feed stats. Unfortunately, I also know that I spend some time each day removing spam comments from the filters.

These come in two types. There’s the "I don’t agree 100% with what you say, but you have made a valid point, I shall have to consider this some more – meanwhile, buy some viagra" type of comment. There’s also the "I’m going to take your content, and republish it and/or link to it, hoping you’ll automatically give a reciprocal link to my site" type.

Neither type survives: the links rarely make it onto the live site, and the net result is that some of my time is wasted for no real gain by the spammer.

I know that the chance of a real spammer noting this and taking this site off whatever list it’s on is slim – but I felt like going ‘grr’. Please forgive.

There is one site in particular which publishes loads of excerpts from here. There are links back (which is good, I suppose) – but the net effect is that I see loads of trackback links and have to clear them out of the filters. I’m not going to link back to something which is essentially the same post that I wrote!

I can’t work out why they’re doing this – there are no obvious ads on the site. Possibly it’s someone who just wanted a site without the hassle of doing their own content. If it was an occasional thing, e.g. a Google Reader ‘share’, fair enough – but it seems to be a systematic use of any post on a particular topic with little discernment. Syndication like that may not be too bad, but if you are automatically republishing, it is nice to ask. I might have said yes. Right now? The answer is No. Stop it.

Feeding Sed with XML and producing a GUID

I’m trying to bring more exercise into my life (have been for a while!) and I use the website ‘mapmyrun’ to log my activities (it has sister sites, mapmyride and mapmytri, which are effectively just ‘skins’ on the basic design). Using these sites provides an unambiguous record of how much (or how little) I do, which is somewhat motivating.

There is an RSS feed of my activities, but it contains a little too much information (I don’t want to broadcast to the world how much weight I need to lose!)

Therefore, I ‘clean it up’ using a script.

Here is the script I currently have:


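# Work from the home directory so the relative paths below resolve.
cd ~ || exit 1
# Fetch the raw feed; the URL is quoted so the shell does not treat '?' as a glob.
wget -q "http://www.mapmyrun.com/rss/rss_workouts?u=xxxxxxx" --output-document=exercise.rss
# Drop the weight lines and the existing <link> lines.
grep -v "<b>Weight:</b>" exercise.rss | grep -v "<link>" > exercisejournal.rss
rm exercise.rss

# Strip the bold and line-break markup and give the channel a <link>.
sed '
s/<\/b>//g
s/<channel>/<channel><link>http:\/\/www.mapmyrun.com\/<\/link>/g
s/<br\/>//g
s/<b>//g' <exercisejournal.rss >public_html/finalfile.rss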

I then take the feed and give it to feedburner, which then tidies it up further. I decided to put the result into ‘friendfeed’ – but it wasn’t taking.

So I validated it:

This feed is valid, but interoperability with the widest range of feed readers could be improved by implementing the following recommendations.

line 9, column 43: item should contain a guid element (158 occurrences) [help]

Duration: 00:52:00 (HH:MM:SS)

Could it be simply that friendfeed doesn’t like the feed due to lack of guid?

No worries, I’ll write an extra sed line which takes the date, time and possibly other info to create a guid for each item…. but I cannot get the syntax right; I don’t even know if this is possible, given that the date and time are on different lines!

Why possibly include other info? This may be needed for when I do two bits of exercise on one day which have the same duration (I’d include kcal and the heart rates, I think).

There has to be a neat way to generate a unique guid for each item and insert it…. can anyone see how to do this? Is sed the correct command, even? I’d be grateful if you could let me know. The xml prior to feedburner doing its thing is here.
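One possible approach, offered as a minimal sketch rather than a definitive answer: because the date and time sit on different lines, it may be easier to skip sed for this step and checksum the whole text of each item instead. The function name `add_guids` and the `workout-` prefix are made up for illustration, and the sketch assumes that no two items have identical text (two workouts on the same day with the same duration would still differ in kcal or heart rate).

```shell
#!/bin/sh
# Sketch only: add a <guid> to every <item> of an RSS feed read from stdin.
# Assumption: each item's text differs from every other item's, so a
# checksum of that text can serve as the guid.
# The function name and the "workout-" prefix are hypothetical.
add_guids() {
  in_item=0
  item=""
  while IFS= read -r line || [ -n "$line" ]; do
    # Start collecting text when an item opens.
    case $line in *"<item>"*) in_item=1; item="" ;; esac
    if [ "$in_item" -eq 1 ]; then
      item="$item$line
"
    fi
    case $line in
      *"</item>"*)
        # cksum prints "<crc> <byte count>"; join the two with a dash.
        sum=$(printf '%s' "$item" | cksum | awk '{ print $1 "-" $2 }')
        # Split the line around the closing tag and put the guid before it.
        before=${line%%"</item>"*}
        after=${line#*"</item>"}
        printf '%s<guid isPermaLink="false">workout-%s</guid></item>%s\n' \
          "$before" "$sum" "$after"
        in_item=0
        ;;
      *)
        printf '%s\n' "$line"
        ;;
    esac
  done
}

# Usage (withguids.rss is a made-up output name):
#   add_guids < public_html/finalfile.rss > public_html/withguids.rss
```

The guid is derived from the item's content, so it stays the same from one run to the next as long as the item text does not change; if the source site later edits a workout, the item would get a new guid and readers may show it again. The `isPermaLink="false"` attribute tells feed readers that the guid is an opaque identifier and not a URL.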

Reader and Bloglines

As I’ve previously discussed, I’ve recently moved from Bloglines to Google Reader. So far, I’m loving it – the ‘share’ feature alone is great (and will be better the more of my acquaintances use the site… hint…hint…). As well as appearing on murky.org’s homepage (as I type), my shared links are available here, as well as within your own Reader account, if I’m one of your contacts.

Heather Hopkins has done an analysis of the two sites. Bloglines is ahead of Google Reader (but I seem to recall that it did have a head start). Interestingly, users of Google Reader tend to read much more news, and users of Bloglines tend to view much more photography.

What’s interesting to me is how (with a little wobble) the sites have tended to move up and down together. Neither is pulling ahead of the other, despite Bloglines deciding to (factually) spin it by pointing out they were ahead.

I’ve used Reader for about a week now. Long enough to get the feel, not long enough to be locked in. The ideal time to make a judgement.

Bloglines somehow feels, to me, a little more developed. One can keep items unread (though Google has the ‘star’) (UPDATE: This is actually wrong, there is an unread feature in Reader – thanks Aq) and the items are grouped by feed. In Google Reader the items seem to be arranged chronologically, which isn’t always desirable (I’d want to sort oldest first, or by feed with oldest first) (Update: I had a blind spot, you can change the sort order).

The general interface for Reader is just slick. You can select one feed at a time, but I just hit ‘all items’ and scroll with the mouse wheel. As items are displayed they’re marked as read (you can scroll up if something zoomed past too fast). Hitting the ‘star’ marks the item as something you want to look at later. Hitting ‘share’ will do the obvious thing and share it with your contacts (who use Reader), as well as adding it to a webpage.

When they first implemented ‘share’ they got it very wrong indeed: everything was shared. Bad, bad idea. What happens now is that one manually clicks ‘share’. Each feed can be ‘tagged’; I have tags like ‘cycling’, ‘comics’, ‘rugby’ and ‘science’. You can make it so that everything in a given tag is shared (my tags are all private). I don’t see any need to do this – if someone wants everything in a feed they can subscribe themselves! The sharing feature, right now, is pretty damned good – and it works ‘as expected’, which is about the best that any software feature can expect.

Instead of ‘sharing’ (for public consumption), one can ‘star’ (to make it easier to find something later). It is so easy to ‘star’ something – very easy. So easy that one can end up with loads of items to read ‘later’ (I’m starring something for more considered perusal)… but then, it’s very quick to scroll the starred items – and of course, there is a top notch search function. (There are keyboard shortcuts)

I’ve only found one real problem with Reader: occasionally it will say there is one new article, but refreshing everything doesn’t reveal it. This is a weird bug, but wait a bit and it generally clears as other articles come in.

The other issue is pretty minor. It’s the stats. There are a whole load of metrics which help you to pick out the feeds you read all the time, and those which you always miss. These seem to be based on whether an item is displayed (so I tend to read 100% of items as I view all items and scroll) – I’d much rather have a list based on how long each item was on screen. For example, I scroll through slashdot pretty fast, stopping occasionally, but I will tend to linger over sites like xkcd and Bike Noob, and I will click through to sites like Yehuda Moon. Stats based on clicks and time on screen would be much more useful to me than simply the fact that an article has been on screen for a few seconds.

Bloglines does share some of this, but on the whole, I think I’m a Google Reader convert. I did look at it once before – and moved back. This time, I gave it a chance and I’m staying. There’s something about the way it’s put together which is really nice – now, if only more of my friends were using it to ‘share’….

…. and it’s been built with the blind in mind too, that’s got to be a good thing.

Of course, this site’s feeds are easy to access:

Main Feed: Add Murky.org to Google | Subscribe with Bloglines

Comments: Add Murky.org Comments to Google | Subscribe with Bloglines

Del.icio.us Reader

For several years now, I have managed my RSS feeds by using bloglines. Bloglines is pretty good – but this won’t mean anything if you just thought ‘what’s an RSS feed?’

Many sites now have a link which is friendly for computers. This is called an RSS feed (or an ‘atom’ feed – the differences aren’t important). An RSS ‘aggregator’ is a program which can sit on your computer, or it can be a website which you log into, which monitors the RSS feeds you have subscribed to and then brings all that info together in one spot. It’s like an email inbox for the web. You don’t have to monitor all your favourite sites for changes, you just monitor the aggregator and all the changes appear there. This site has its own RSS feed, as do most sites, from the BBC to xkcd, Yehuda Moon and WWdN.

You can often tell if a site has a feed by spotting this icon (it might be in the address bar).

RSS Icon

This site also has feeds for comments left on the site.

Some websites don’t advertise the feeds, but if you put the address of the site into your feed reader, it can sometimes find the feed for you. (Remember though that not all sites have feeds).
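For the curious, feed readers usually find a feed by looking in the page’s HTML for a `<link>` tag whose type is `application/rss+xml` or `application/atom+xml`. A rough sketch of that lookup follows; the function name `find_feeds` is made up for illustration, and it assumes the tag’s attributes use double quotes.

```shell
#!/bin/sh
# Sketch only: print the feed addresses advertised by an HTML page on stdin.
# Assumption: attributes are double-quoted, as in
#   <link rel="alternate" type="application/rss+xml" href="...">
# The function name is hypothetical.
find_feeds() {
  # Join the page into one line so tags split across lines still match.
  tr '\n' ' ' |
    # Pull out each <link ...> tag onto its own line.
    grep -io '<link[^>]*>' |
    # Keep only the tags that advertise an RSS or Atom feed.
    grep -iE 'type="application/(rss|atom)\+xml"' |
    # Print the address held in the href attribute.
    sed -n 's/.*href="\([^"]*\)".*/\1/p'
}

# Usage:
#   wget -q -O - http://www.murky.org/ | find_feeds
```

A relative address in `href` would still need to be joined to the site’s own address before it could be pasted into an aggregator; the sketch prints it as found.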

Anyhow, as I mentioned, for some time I’ve been using bloglines. I’ve just moved over to Google Reader. (This was a simple job, involving ‘exporting’ my existing subscriptions from bloglines and then importing them to Reader).

Reader has some nice features. One of my favourites is ‘sharing’. If I see something I like, I can ‘share’ it. People who have me as a contact can see my ‘shared’ items if they use Reader. An RSS feed (surprise!) is also generated, and people can subscribe to this by copying the link and pasting it into their aggregator, and a webpage is generated too.

Another nice feature is that you can use a ‘bookmarklet’ to ‘share’ any item you find on the web. A bookmarklet is something that you add to your browser bookmarks, and it does stuff when you click on it.

This is pretty good, and has the capability of replacing some of the functionality of the excellent del.icio.us. Although, I think I’ll be using del.icio.us for quite some time still. “What is del.icio.us?” I hear you cry (well, possibly think).

It’s what’s called a ‘social bookmarking site’.

Essentially you have some ‘bookmarklets’ in your browser, and when you find a site you’d like to bookmark, you click the bookmarklet and it is added to del.icio.us.

Using ‘tags’ will allow you to find the site again. The nice thing is that you can search the site for tags added by other people (and yes, you can make sites you bookmark private to conceal some of your more esoteric tastes).

My bookmarks are here, and the eagle eyed might notice that there is a feed for the bookmarks (at least, for the public ones).

Del.icio.us also features something called an ‘inbox’. If I see a link that I think a friend might like, I can tag it as for:friend. E.g. someone who spots a link I might like can tag it for:murkee – and it will appear in my inbox – my inbox is private, but it too has its own RSS feed which I can subscribe to.

I don’t even have to remember the login names for my contacts to send them a link, as I can ‘add them to my network’ and it will remember them for me, and I can send a link to their inbox by clicking their name when I bookmark. It’s simple to add someone to your network (must be logged in).

I hardly ever visit del.icio.us itself, though I use it daily. My inbox is monitored via Reader, and I use handy buttons on my browser to search the bookmarks if I want to find a link again. It’s really nice.

The big advantage of all this is that wherever I go, I can find my bookmarks. I can keep up to date on sites I like to read. It’s good. (I also have a script which regularly backs up all my bookmarks by downloading, you guessed it, an RSS feed).