I've been fiddling with the various tools offered by Google for optimising my web presence. It's a rather new domain for me, coming from the primarily systems and applications layers, so I'm discovering all sorts of stuff. One thing that popped up was the requirement for a Sitemap file in order to optimise the indexing. This struck me as a good idea, so I went looking for a toolkit for generating one. I found a few, but it was still a fairly manual process unless I wanted to pay for a service, and even then I still had the issue of getting the resulting file onto my server.
I've been poking around with the file structures for the built-in Wiki and Blog recently in order to better understand how I can tweak the system and other stuff like backups. So after looking at the structure I noticed that I could generate a sitemap pretty easily by reading the plist file associated with each page on the site.
That led to the script below. Feel free to copy and modify it for your site.

#!/usr/bin/perl
#############################################################################
# SCRIPT: makesitemap.pl
#
# AUTHORS:
# alphageek@infrageeks.com
#
# DESCRIPTION:
# Quick and dirty script for scanning a given group directory for wiki and
# blog entries and spitting out a sitemap file suitable for indexing tools
# like Google.
#
# CHANGELOG
# 09-05-2008 : AG : Initial version
# 15-05-2008 : AG : Added a check for pages that exist physically, but have
# been deleted from the environment
use strict;
use Foundation;
my $header = '<?xml version="1.0" encoding="UTF-8"?>
<urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">';
my $footer = "</urlset>";
my $content = "";
# Locate every page.plist under the group directory (one per wiki or blog page)
my $files = `find /Library/Collaboration/Groups/infrageeks/ -name page.plist`;
foreach my $file (split(/\n/, $files)) {
    my $plist = NSDictionary->dictionaryWithContentsOfFile_($file);

    #### check for deleted entries first !
    my $deleted = 0;
    my $value = $plist->objectForKey_("deleted");
    if (substr(ref($value), 0, 11) eq "NSCFBoolean") {
        $deleted = $value->description()->UTF8String();
    }

    if (!$deleted) {
        # The page uid forms the path part of the public URL
        $value = $plist->objectForKey_("uid");
        my $uid = $value->description()->UTF8String();
        $content .= "<url>\n\t<loc>http://www.infrageeks.com/$uid/</loc>\n";

        # Keep only the first space-separated field of the timestamp (the date)
        $value = $plist->objectForKey_("modifiedDate");
        my $lastmoddate = $value->description()->UTF8String();
        my @mydate = split(" ", $lastmoddate);
        $content .= "\t<lastmod>$mydate[0]</lastmod>\n";
        $content .= "</url>\n";
    }
}

# Write the finished sitemap into the web server's document root
my $outfile = "/Library/WebServer/Documents/Infrageeks/sitemap.xml";
open(my $out, ">", $outfile) or die "Cannot write $outfile: $!";
print $out "$header\n$content$footer\n";
close $out;
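
One thing the script does not do is escape the uid before dropping it into the <loc> element, so a page whose uid contained a character like & would produce invalid XML. I haven't hit that on my site, but here is a minimal sketch of how the entry-building step could be pulled out into a helper that escapes its input. It is plain Perl with no Foundation calls so it can be tried anywhere; the names xml_escape and sitemap_entry, the example.com base URL and the sample uid are all made up for illustration, and the timestamp is assumed to look like the "date time zone" string the script above splits on.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace the five characters that are special in XML with entities.
sub xml_escape {
    my ($text) = @_;
    my %entity = (
        '&' => '&amp;',
        '<' => '&lt;',
        '>' => '&gt;',
        '"' => '&quot;',
        "'" => '&apos;',
    );
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

# Build one <url> entry. $base_url, $uid and $modified are plain strings;
# $modified is assumed to start with the date, followed by a space.
sub sitemap_entry {
    my ($base_url, $uid, $modified) = @_;
    my ($date) = split ' ', $modified;      # keep only the date part
    my $loc = xml_escape("$base_url/$uid/");
    return "<url>\n\t<loc>$loc</loc>\n\t<lastmod>$date</lastmod>\n</url>\n";
}

# Hypothetical values, only to show the call
print sitemap_entry('http://www.example.com', 'demo/a&b',
                    '2008-05-15 10:30:00 +0200');
```

In the main loop, the three lines that append to $content could then be replaced by a single call to sitemap_entry with the $uid and $lastmoddate values already read from the plist.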