blog import

Cook’s journal spans nearly three years, from August 1768 to June 1771. That’s over 1,000 blog entries if I create one post per day, which is far too many to enter by hand, so I’ve found some ways of automating the process.

I was able to find several copies of the journal in HTML format on the web. Each daily journal entry is typically a single <p> element in these files. By examining the structure of an exported WordPress blog file (.wxr), I was able to devise an XSL transform that converts the <p> elements into the <item> elements WordPress expects when importing a blog.

First, I needed to edit the HTML to give it the following structure, inserting higher-level elements for journal, year, and month:

<journal>
    <year value="1768">
        <month value="September">
            <p>Wednesday, 14th. First part fine, Clear weather, etc.</p>
            <p>Thursday, 15th. Squals of Wind from the Land, etc.</p>
            <p>Friday, 16th. The most part fine, Clear weather. etc.</p>
        </month>
    </year>
</journal>
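For anyone who would rather script this step than edit by hand, here is a minimal Python sketch of building the same nesting. It assumes the entries have already been pulled out of the HTML as (year, month, text) tuples in journal order; the function name build_journal and that tuple layout are my own illustration, not part of the original workflow.

```python
import xml.etree.ElementTree as ET


def build_journal(entries):
    """Nest flat (year, month, text) tuples into journal/year/month/p elements.

    Assumes the entries arrive in journal order, so a change of year or
    month simply opens a new wrapper element.
    """
    journal = ET.Element("journal")
    year_el = None
    month_el = None
    for year, month, text in entries:
        if year_el is None or year_el.get("value") != year:
            year_el = ET.SubElement(journal, "year", value=year)
            month_el = None  # a new year always starts a new month wrapper
        if month_el is None or month_el.get("value") != month:
            month_el = ET.SubElement(year_el, "month", value=month)
        ET.SubElement(month_el, "p").text = text
    return journal


sample = [
    ("1768", "September", "Wednesday, 14th. First part fine, Clear weather, etc."),
    ("1768", "September", "Thursday, 15th. Squals of Wind from the Land, etc."),
    ("1768", "October", "Saturday, 1st. etc."),
]
print(ET.tostring(build_journal(sample), encoding="unicode"))
```

The output is a single unindented line of XML, which is all the transform needs; whitespace between the elements makes no difference to it.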

Next, I applied the following XSL transform to generate the <item> elements. The transform creates a list of <item> elements, which I pasted into an RSS wrapper based on a sample .wxr file. The same categories and tags are applied to every post, and each one has a publication date 243 years later than the original, i.e. the journal entry for 26th August, 1768 is scheduled for publication on WordPress on 26th August, 2011.

(The farsighted among you will already have perceived that I will have a problem next year. 2012 is a leap year but 1769 was not, so after February the days of the week stop lining up: the journal entry for Wednesday, 1st March, 1769 matches Wed, 29 Feb 2012, not 1st March, 2012. I’m trying not to worry about that yet.)
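The weekday drift is easy to check with a few lines of Python. This is only a sketch: the 243-year offset comes from the post, while the function names are made up for illustration. Python's date type uses the Gregorian calendar throughout, which is fine for British dates from 1768 onwards.

```python
from datetime import date

YEAR_OFFSET = 243  # publication year minus journal year, as described above
DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def publication_date(journal_date):
    """Same day and month, YEAR_OFFSET years later.

    date.replace would raise ValueError for a 29 February that does not
    exist in the target year; the journal (Aug 1768 onwards) contains no
    29 February, so that case does not arise here.
    """
    return journal_date.replace(year=journal_date.year + YEAR_OFFSET)


def weekday_matches(journal_date):
    """True if the journal date and its publication date share a weekday."""
    return journal_date.weekday() == publication_date(journal_date).weekday()


def describe(journal_date):
    pub = publication_date(journal_date)
    return "{} {} -> {} {}".format(
        DAY_NAMES[journal_date.weekday()], journal_date.isoformat(),
        DAY_NAMES[pub.weekday()], pub.isoformat(),
    )


for d in (date(1768, 8, 26), date(1769, 2, 28), date(1769, 3, 1)):
    print(describe(d), "match" if weekday_matches(d) else "MISMATCH")
```

The first two dates keep their weekday; 1 March 1769 (a Wednesday) lands on a Thursday in 2012, and every entry after it is off by one day of the week in the same way.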

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="http://www.w3.org/2005/xpath-functions"
    xmlns:excerpt="http://wordpress.org/export/1.0/excerpt/"
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:wp="http://wordpress.org/export/1.0/">
<xsl:output omit-xml-declaration="no" method="xml"/>
<xsl:template match="journal">
    <items>
        <xsl:apply-templates select="year"/>
    </items>
</xsl:template>
<xsl:template match="year">
    <xsl:apply-templates select="month">
        <xsl:with-param name="year" select="@value"/>
    </xsl:apply-templates>
</xsl:template>
<xsl:template match="month">
    <xsl:param name="year"/>
    <xsl:apply-templates select="p">
        <xsl:with-param name="year" select="$year"/>
        <xsl:with-param name="month" select="@value"/>
    </xsl:apply-templates>
</xsl:template>
<xsl:template match="p">
    <xsl:param name="year"/>
    <xsl:param name="month"/>
    <xsl:variable name="day" select="substring-before(.,'.')"/>
    <xsl:variable name="d" select="substring(.,1,3)"/>
    <xsl:variable name="dn" select="format-number(number(translate(substring-after($day,', '),'dhnrst','')),'00')"/>
    <xsl:variable name="m" select="substring($month,1,3)"/>
    <xsl:variable name="pubYear" select="string(number($year) + 243)"/>
    <xsl:variable name="postMonth">
        <xsl:call-template name="getMonthNumber">
            <xsl:with-param name="monthName" select="$m"/>
        </xsl:call-template>
    </xsl:variable>
    <xsl:variable name="postDate" select="concat($pubYear,'-',$postMonth,'-',$dn,' 12:00:00')"/>
    <xsl:element name="item">
        <title>
            <xsl:value-of select="concat($day,' ',$month,', ',$year)"/>
        </title>
        <pubDate>
            <xsl:value-of select="concat($d,', ',$dn,' ',$m,' ',$pubYear,' 12:00:00 +0000')"/>
        </pubDate>
        <xsl:call-template name="boilerplate1"/>
        <content:encoded>CDATA1<xsl:value-of select="."/>CDATA2</content:encoded>
        <wp:post_date>
            <xsl:value-of select="$postDate"/>
        </wp:post_date>
        <wp:post_date_gmt>
            <xsl:value-of select="$postDate"/>
        </wp:post_date_gmt>
        <xsl:call-template name="boilerplate2"/>
    </xsl:element>
</xsl:template>
<xsl:template name="getMonthNumber">
    <xsl:param name="monthName"/>
    <xsl:choose>
        <xsl:when test="$monthName='Jan'">01</xsl:when>
        <xsl:when test="$monthName='Feb'">02</xsl:when>
        <xsl:when test="$monthName='Mar'">03</xsl:when>
        <xsl:when test="$monthName='Apr'">04</xsl:when>
        <xsl:when test="$monthName='May'">05</xsl:when>
        <xsl:when test="$monthName='Jun'">06</xsl:when>
        <xsl:when test="$monthName='Jul'">07</xsl:when>
        <xsl:when test="$monthName='Aug'">08</xsl:when>
        <xsl:when test="$monthName='Sep'">09</xsl:when>
        <xsl:when test="$monthName='Oct'">10</xsl:when>
        <xsl:when test="$monthName='Nov'">11</xsl:when>
        <xsl:when test="$monthName='Dec'">12</xsl:when>
    </xsl:choose>
</xsl:template>
<xsl:template name="boilerplate1">
<dc:creator><![CDATA[UUUUUUU]]></dc:creator>
<category><![CDATA[Australia]]></category>
<category domain="category" nicename="australia"><![CDATA[Australia]]></category>
<category domain="tag"><![CDATA[Captain Cook]]></category>
<category domain="tag" nicename="captain-cook"><![CDATA[Captain Cook]]></category>
<category><![CDATA[Discovery]]></category>
<category domain="category" nicename="discovery"><![CDATA[Discovery]]></category>
<category domain="tag"><![CDATA[Endeavour]]></category>
<category domain="tag" nicename="endeavour"><![CDATA[Endeavour]]></category>
<category><![CDATA[Exploration]]></category>
<category domain="category" nicename="exploration"><![CDATA[Exploration]]></category>
<category domain="tag"><![CDATA[First voyage round the world]]></category>
<category domain="tag" nicename="first-voyage-round-the-world"><![CDATA[First voyage round the world]]></category>
<category><![CDATA[History]]></category>
<category domain="category" nicename="history"><![CDATA[History]]></category>
<category domain="tag"><![CDATA[James Cook]]></category>
<category domain="tag" nicename="james-cook"><![CDATA[James Cook]]></category>
<category><![CDATA[Maritime]]></category>
<category domain="category" nicename="maritime"><![CDATA[Maritime]]></category>
<category><![CDATA[Naval]]></category>
<category domain="category" nicename="naval"><![CDATA[Naval]]></category>
<category><![CDATA[New Zealand]]></category>
<category domain="category" nicename="new-zealand"><![CDATA[New Zealand]]></category>
</xsl:template>
<xsl:template name="boilerplate2">
<excerpt:encoded><![CDATA[]]></excerpt:encoded>
<wp:comment_status>open</wp:comment_status>
<wp:ping_status>open</wp:ping_status>
<wp:post_name/>
<wp:status>draft</wp:status>
<wp:post_parent>0</wp:post_parent>
<wp:menu_order>0</wp:menu_order>
<wp:post_type>post</wp:post_type>
<wp:post_password/>
<wp:is_sticky>0</wp:is_sticky>
<wp:postmeta>
<wp:meta_key>_edit_lock</wp:meta_key>
<wp:meta_value><![CDATA[XXXXXXXXXX:YYYYYYY]]></wp:meta_value>
</wp:postmeta>
<wp:postmeta>
<wp:meta_key>_edit_last</wp:meta_key>
<wp:meta_value><![CDATA[YYYYYYY]]></wp:meta_value>
</wp:postmeta>
</xsl:template>
</xsl:stylesheet>
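To see what the p template derives from a single entry, the same string handling can be mirrored in a few lines of Python. This is an illustrative sketch rather than part of the import process: the function name item_fields is invented, and it looks the month up by its full name where the stylesheet matches on the three-letter abbreviation.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]
YEAR_OFFSET = 243  # same offset the stylesheet adds to the journal year


def item_fields(entry, month, year):
    """Derive the title and date strings the way the stylesheet does."""
    day = entry.split(".", 1)[0]       # like substring-before(., '.')
    weekday = entry[:3]                # like substring(., 1, 3)
    ordinal = day.split(", ", 1)[1]    # like substring-after($day, ', ')
    # Like translate(..., 'dhnrst', ''): strips st/nd/rd/th from "14th"
    day_number = int(re.sub("[dhnrst]", "", ordinal))
    month_number = MONTHS.index(month) + 1
    pub_year = year + YEAR_OFFSET
    return {
        "title": "{} {}, {}".format(day, month, year),
        "pubDate": "{}, {:02d} {} {} 12:00:00 +0000".format(
            weekday, day_number, month[:3], pub_year),
        "post_date": "{}-{:02d}-{:02d} 12:00:00".format(
            pub_year, month_number, day_number),
    }


fields = item_fields(
    "Wednesday, 14th. First part fine, Clear weather, etc.", "September", 1768)
for name, value in fields.items():
    print(name, "=", value)
```

For the sample entry this gives the title "Wednesday, 14th September, 1768", the pubDate "Wed, 14 Sep 2011 12:00:00 +0000" and the post date "2011-09-14 12:00:00". Note that the weekday in pubDate is copied from the journal text rather than computed from the new date.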
