[?] "HOW TO" Create a cumulative Archive for Blogger blogs

Table of Contents:
STEP 1: ASSIGNING CATEGORIES
STEP 2: CHEATING BLOGGER
STEP 3: MANIPULATING THE HTML FILE
STEP 3a: PREFLIGHT THE HTML FILE IN A BASIC TEXT EDITOR
STEP 3b: UNLEASHING THE POWER OF WORD
STEP 3c: UNLEASHING THE POWER OF EXCEL
STEP 4: NAMING #(NAMES)
STEP 5: FINALLY GETTING SATISFACTION FROM BLOGGER




I didn't really look until the other day, but my Cumulative Index by Category seems to be getting a bit of traffic all by itself, and my friend Hobster left a question over there about how one might go about doing such a thing.

Note to Blogger: it would be great if your software did this for us. I have noticed that TypePad, for example, will allow the user to assign categories and then categorizes each post. It's a nice feature for both the blogger and the bloggee. However, you're still free so I'm still only complaining in the theoretical sense.

Anyway, it is relatively easy to do what I did, and it takes about 30 minutes once a month to keep it current. However, it takes a LONG time to set up the first time -- for example, remember that week I spent updating all my posts? That's how long it can take. I'm posting this for all you freaky bloggers who are willing to spend your time doing things like this.
STEP 1: ASSIGNING CATEGORIES

This may actually be the hardest part. If you're a completely anal filing maven, you have to curb your enthusiasm in order to keep the filing system from having categories with only one or two posts in them -- which is a massive waste of filing, if you ask me. If you file by putting everything in one pile (cf. my desk at work), you have to come up with a couple of distinguishing features of your posts that will work to make filing simple.

I think that 5 or 6 categories is great. You might have a different opinion, and that's your problem. You use as many as you like.

For our example, I'm going to use 4 categories:
CATEGORY   #NAME   TEXT SIGNIFIER
--------   -----   --------------
Family     FAM     !
Friends    FRN     ?
Enemies    NME     #
Stuff      STF     %
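
For anyone who would rather script the later steps than click through them, the table above can be written down as data. This is only a sketch in Python (a choice made for these examples, not something the process requires), and the names CATEGORIES and lookup are made up for illustration.

```python
# The example category table: signifier -> (category name, 3-character #name).
# CATEGORIES and lookup are hypothetical names, used only in this sketch.
CATEGORIES = {
    "!": ("Family", "FAM"),
    "?": ("Friends", "FRN"),
    "#": ("Enemies", "NME"),
    "%": ("Stuff", "STF"),
}


def lookup(signifier):
    """Return (category name, #name) for a signifier, or None if it is unknown."""
    return CATEGORIES.get(signifier)
```

Keeping the signifier as the key makes it hard to assign the same symbol to two categories by accident.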

STEP 2: CHEATING BLOGGER

Once you have your categories, you have to find a way to cheat Blogger's lack of subtlety. That is, you have to find a way to make Blogger do something it normally wouldn't do. For example, what we need for this index is a set of hyperlinks for each post that are able to be sorted into said categories so that we can update one page without hand-keying every bloody character every time. Blogger doesn’t really do that, does it?

As a matter of fact, it does from your dashboard. If you go to the posting tab and click "EDIT POST", you get (by default) a list of your last 50 posts. "But cent," complains the n00b on the phone line with 3 posts on his blog, "you can't sort that list, and someone with a very long history of blogging can't see all his posts."

Well, n00b, allow me to explain. The default 50 posts can be changed via the dropdown to up to 300 posts. As I type this, I have 163 posts, so 300 is fine. So change the dropdown to 300 and click "go".

Now you have a list of all your posts sorted by date, last post first. "But cent," complains the n00b again, "I can't sort that list, and anyway I can't link to that page for others to see." Yes: there is a reason you are a n00b. Of course you cannot link to this page, and of course it is unsortable. But it has a very interesting feature: it can be saved as an HTML file by even the worst of browsers. In that format, we have a very powerful tool for making our index page.

But having that list is nowhere near enough -- because there's no way to index them except by date (which Blogger has already done for you) and alpha by title (which is interesting, I am sure, but your ability to create definitive subject lines is no better than mine).

That's where the funny symbols come in. You've seen my list of funny symbols -- ahead of every post is a keyboard character between two square brackets (e.g. [@]). The funny symbol alone would probably be enough, but I put it in brackets so new readers won't mistake it for some kind of series number. You can do whatever you like -- use numbers, letters, whatever keyboard characters you like. But you have to decide that some symbol always stands for some category. For example, [@] always signifies a post from me about orthodoxy; [!] always signifies a post about the Gospel; [?] always signifies a post about random thoughts; etc.

So in order to cheat Blogger, you have to put your signifiers at the front of the subject line of each post. Yes, it's monstrous, and if anyone is feeding from your site, they will go insane as you do this. But the net result is that when you're finished, the "EDIT POST" page becomes a treasure trove of HTML text.
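
To see why the bracketed signifier is so handy, here is a rough Python sketch of pulling it back out of a subject line. It assumes the signifier is exactly one character inside square brackets at the very start of the title; the function name split_subject is invented for this example.

```python
import re

# One character in square brackets at the start of the subject line,
# followed by the rest of the title. This pattern is an assumption based on
# the "[@] Title" convention described above.
SIGNIFIER = re.compile(r"^\[(.)\]\s*(.*)$")


def split_subject(subject):
    """Return (signifier, title). The signifier is None if the line has no marker."""
    match = SIGNIFIER.match(subject)
    if match is None:
        return None, subject
    return match.group(1), match.group(2)
```

A subject line without a marker comes back with None as its signifier, which is a quick way to spot posts you forgot to tag.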

When you have updated all your subject lines, load the Blogger "EDIT POST" page viewing 300 posts, and save this page to your desktop as an HTML FILE (NOT a Web Archive).
STEP 3: MANIPULATING THE HTML FILE

You're going to need EXCEL, WORD, and a basic text editor to do this next step, so bear with me as we go through this. If you're clever, you can build WORD MACROs to do some of this work so you won't be punished with having to read my blog every time you want to update your cumulative index.
STEP 3a: PREFLIGHT THE HTML FILE IN A BASIC TEXT EDITOR

See: I have WORD98 for MAC, so I don't have too many problems editing raw HTML. I open the file using the "Recover Text From Any File" open method, and I can open HTML without WORD trying to render it. When I use my PC at work to do this, I use WORDPad.

Anyway, the first step is to take out all the trash that this file has in it. You can delete everything from the "<!DOCTYPE" header to the "</tr> </thead>" which begins the actual table in the HTML for the posts and their links. Just highlight the whole jumbled mess, delete it, and then SAVE UNDER A NEW FILE NAME. You need to keep the original file lying around in case you make a mistake (and you prolly will). Save it as a plain text file.

After you save the file, replace all "<" with "{" and all ">" with "}". Why? So WORD and EXCEL will not try to render the HTML tables the tags describe. We will change them back at the end of this exercise, but they need to go away for now. Save your work and close the file.
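
The bracket swap, and the swap back that comes at the end, can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes the saved page contains no literal "{" or "}" of its own, since those would turn into angle brackets on the way back.

```python
def disable_tags(html):
    """Swap angle brackets for braces so nothing tries to render the markup."""
    return html.replace("<", "{").replace(">", "}")


def restore_tags(text):
    """Undo disable_tags. Only safe if the text had no braces to begin with."""
    return text.replace("{", "<").replace("}", ">")
```

Running restore_tags(disable_tags(page)) should hand back the original page unchanged, which makes a cheap sanity check before moving on.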
STEP 3b: UNLEASHING THE POWER OF WORD

OK. Now open the saved document in WORD. I use WORD to make the next set of edits because this file is usually pretty big and WORDPad usually locks up on me because of the number of substitutions it has to make. Follow these instructions VERY CAREFULLY.

Globally replace this text in your file:
{tr class=""} {td class="date"} {span class=""}

with "^p" (WORD's code for a paragraph mark, which gives you a line break). That will put each post listed on its own separate line of text, even though WORD will wrap the text. This will be important when we import this file to EXCEL.

Globally replace this text in your file:
{/span} {/td} {td class="edit"} {span}

with the character "|". We are going to use "|" in EXCEL as a cell delimiter, and this will put the DATE of your post in its own cell.

Globally replace this text in your file:
{table lang="safari-hack"} {tr} {td}

with the character "|". This will ultimately create a column of useless stuff we are going to delete, but it will also start the column in which the subject line of your post will appear.

Globally replace the text:
{/td} {/tr} {/table} {/td} {td class="author"}{span}

with the character "|". This completes the subject line cell and begins a cell of useless stuff.

Globally replace the text:
{/span}{/td} {td class="link"}{span class=""}

with the character "|". This isolates the useless text (which, btw, is your Blogger user ID), and begins the most important field in this file: the HREF tag that links to the post.

And last of all, globally replace the text:
View {/a} {/span}{/td} {td class="weaklink"} {span}

with the character "|". This sets off the HREF tag, and leaves us with a final column of junk text which is quickly dealt with in EXCEL. Save and close the file. You're doing great. If you're smart, you recorded these steps as MACROs.
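
If recording MACROs is not your thing, the six replacements above can be run in order by a short script. This Python sketch copies the search strings exactly as they appear in this step, spacing included; the markup in a page you save yourself may differ, so treat the strings as assumptions to check against your own file. REPLACEMENTS and pipe_delimit are made-up names.

```python
# The six find-and-replace pairs from this step, in the order given.
# "\n" stands in for WORD's "^p". The search strings are copied from the
# text above and may not match the spacing in your own saved page.
REPLACEMENTS = [
    ('{tr class=""} {td class="date"} {span class=""}', "\n"),
    ('{/span} {/td} {td class="edit"} {span}', "|"),
    ('{table lang="safari-hack"} {tr} {td}', "|"),
    ('{/td} {/tr} {/table} {/td} {td class="author"}{span}', "|"),
    ('{/span}{/td} {td class="link"}{span class=""}', "|"),
    ('View {/a} {/span}{/td} {td class="weaklink"} {span}', "|"),
]


def pipe_delimit(text):
    """Apply every replacement in order and return the pipe-delimited text."""
    for old, new in REPLACEMENTS:
        text = text.replace(old, new)
    return text
```

Each post should come out as one line with six fields separated by "|": date, junk, subject line, user ID, HREF tag, junk.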
STEP 3c: UNLEASHING THE POWER OF EXCEL

You now have a very keen text file that you can use to make your index page. All you need is the ability to sort the file. EXCEL does this like nobody's business. Open your file in EXCEL.

First, delete the first row -- it will either be blank or have the text " {tbody}" in it, and either way it is completely irrelevant. Now select column A, and under your "DATA" menu, select "TEXT TO COLUMNS…"

The first dialog will prompt you to select either FIXED WIDTH or DELIMITED columns. Select the "DELIMITED" radio button, and click "NEXT".

The next dialog will prompt you for DELIMITERS. UNCHECK "TAB", and then in the field next to "OTHER" type "|", and click next. The really savvy ones of you reading already get where this is going, but I'm going to document the whole process for the n00b who is going to e-mail complaints to me for giving him free advice.

When you click "next", something magical happens: all the text takes on columns. This last step will give you a file that you can work with easily to finish up your index. Follow my instructions carefully, young padawan.

Select the FIRST COLUMN header, and in the COLUMN DATA FORMAT dialog, click "DATE".

Select the SECOND COLUMN header, and in the COLUMN DATA FORMAT dialog, click "DO NOT IMPORT".

Select the THIRD COLUMN header, and in the COLUMN DATA FORMAT dialog, click "TEXT".

Select the FOURTH COLUMN header, and in the COLUMN DATA FORMAT dialog, click "DO NOT IMPORT".

Select the FIFTH COLUMN header, and in the COLUMN DATA FORMAT dialog, click "TEXT".

Select the SIXTH COLUMN header, and in the COLUMN DATA FORMAT dialog, click "DO NOT IMPORT".

Now click "FINISH". Voila! You have the makings of a subject index for your blog.
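
What the import dialog just did amounts to splitting each line on "|" and keeping the first, third, and fifth fields. A minimal Python sketch of the same idea, assuming every line has at least five fields (split_row is a hypothetical name):

```python
def split_row(line):
    """Split one pipe-delimited line and keep the date, subject, and HREF fields."""
    cells = line.split("|")
    # Columns 1, 3, and 5 are kept; columns 2, 4, and 6 are the junk
    # marked "DO NOT IMPORT" in the dialog.
    date, subject, href = cells[0], cells[2], cells[4]
    return date.strip(), subject.strip(), href.strip()
```

A line with fewer than five fields raises an IndexError here, which usually means one of the earlier replacements did not match that row.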

Save your work before the excitement causes you to do something stupid.

Now insert a column between columns A & B, and then select the column with the HREF tags in it and drag it into the inserted column. In the now-empty column "D", type "{/a}" into cell D1, and then fill the rest of the cells below it with that text -- use your favorite method, whether cut-and-paste, drag, or CMD-D on selected cells.

Here's the keen part: select all 4 columns and sort by column "C". If you stopped right here, you'd have a directory of posts and their links sorted by category, as determined by the signifier characters. However, we're not going to stop there. In order to manage the last bit of text editing we are going to have to do, we're going to insert blank rows into this file. You have to do this part by eyeball -- just select the first row of each series with the same signifier, and insert a row. When you're done, you can save your work.

At this point, the categories are sorted by type, but not by date. If you're quick about it, before we move on you might sort each category by date to show last post first or first post last.
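
The sort-then-separate routine can also be sketched in Python. This version takes rows shaped like the four EXCEL columns (date, HREF tag, subject line, "{/a}"), sorts them by signifier with the newest post first inside each category, and puts a None where the blank row would go. The date format "%m/%d/%Y" is an assumption about how the dates were saved, and sort_and_group is an invented name.

```python
from datetime import datetime
from itertools import groupby


def sort_and_group(rows, date_format="%m/%d/%Y"):
    """Sort rows by signifier, newest first within each, with None before each group.

    Each row is a (date_text, href, subject, close_tag) tuple. The signifier is
    taken to be the first three characters of the subject, such as "[!]".
    """
    def signifier(row):
        return row[2][:3]

    # Python's sort is stable, so sorting by date first and by signifier second
    # leaves each category internally ordered by date.
    newest_first = sorted(
        rows, key=lambda r: datetime.strptime(r[0], date_format), reverse=True
    )
    by_category = sorted(newest_first, key=signifier)

    output = []
    for _, group in groupby(by_category, key=signifier):
        output.append(None)  # stands in for the blank row inserted by eyeball
        output.extend(group)
    return output
```

Drop reverse=True if you would rather list the first post first.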

The last step in EXCEL is to save the file as TEXT. The best way to do that is to use the SAVE AS command and save this file as "Text (tab-delimited)" format. It will require a small amount of cleaning up in the last step, but you can do it.
STEP 4: NAMING #(NAMES)

Open your saved file in WORD without fear because all the HTML tags are disabled by virtue of there being no "<" or ">" in the file. However, we are about to fix that. Go ahead and globally replace "{" with "<", and then "}" with ">". Now that looks more like HTML, doesn't it? What you should find is that your file has a bunch of extra quotation marks (") and tab characters.

To fix that, first change all quotation marks (") to apostrophes ('). Next change every occurrence of ('') {that's two apostrophes, not a quotation mark} to (") {which is actually a quotation mark}. Then, change all occurrences of (') to nothing. What you have just done is "undone" the text formatting EXCEL did to your file so that it will work like normal HTML without rendering a bunch of misleading and irrelevant quotation marks. Last, change all (^t) {that's the tab character for you WORD replace n00bs} to a single space.

Now we have to insert the Category headers, which will take some basic HTML savvy. First, copy the following line of HTML into your clipboard:
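
The quotation-mark cleanup is easier to trust once you watch it run on one line. This Python sketch applies the same four replacements in the same order; the sample line in the usage note is made up, and undo_excel_quoting is a hypothetical name.

```python
def undo_excel_quoting(text):
    """Strip the wrapper quotes EXCEL added and restore the real ones."""
    text = text.replace('"', "'")    # 1. every quotation mark becomes an apostrophe
    text = text.replace("''", '"')   # 2. doubled apostrophes were real quotation marks
    text = text.replace("'", "")     # 3. leftover apostrophes were EXCEL's wrappers
    text = text.replace("\t", " ")   # 4. tabs become single spaces
    return text
```

For example, a saved line in which the HREF cell came out as "<a href=""http://example.com/p.html"">" is turned back into <a href="http://example.com/p.html">. One side effect to watch for: step 3 also removes genuine apostrophes, so a title like "Don't panic" comes out as "Dont panic".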

<a name="cat_abbv">Category Name</a>

I'm sure almost anyone reading this has used the <a href="URL"> tag before. The tag above is for placing "#name" tags in an HTML page so that rather than hyperlinking to just the page, we can hyperlink to a SPOT on the page. So if your page is:

http://www.targetpage.com/

You can link to:

http://www.targetpage.com/#name

and when the page loads, the browser will be pointed at #name on that page.

All that is said to say this: in the blank rows in your archive listing, paste the text from above, then replace "cat_abbv" with the 3-character "#name" from the category list you made in STEP 1, and replace "Category Name" with the matching category name. So using one of our examples above, the category "Enemies" gets the header:

<a name="NME">Enemies</a>
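
If you have more than a handful of categories, the headers and the links that point at them can be generated instead of typed. A small Python sketch, with hypothetical function names, that escapes the text in case a category name contains a character like "&":

```python
from html import escape


def category_header(abbr, name):
    """Build the named anchor that marks where a category starts on the page."""
    return '<a name="{}">{}</a>'.format(escape(abbr, quote=True), escape(name))


def category_link(page_url, abbr):
    """Build a URL that jumps straight to a category's anchor on the index page."""
    return page_url + "#" + abbr
```

So category_header("NME", "Enemies") gives the header shown above, and category_link("http://www.targetpage.com/", "NME") gives the address that lands on it.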

BTW, that's the way the Step Finder at the top of this post works. This is amazing to n00bs only, I am sure, but if we are going to hold class, let's make sure we make all students competent. Save your work as we are almost done.
STEP 5: FINALLY GETTING SATISFACTION FROM BLOGGER

The last step is to place your indexed file into Blogger, and the method is simple. Tell Blogger to create a new post for your blog, and set the date to 1/1/YYYY where YYYY is the year you are in. Set the time to 12:01 AM. Why do this? It is to make sure that your index post is the first post for the current year, or what appears to be the last post when you pull up the list of posts the next time. It gives you a visible mile marker for on-going blog indices.
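
For the record, the date trick is just "January 1 of the given year, one minute past midnight." In Python terms (index_post_timestamp is an invented name, and nothing here talks to Blogger):

```python
from datetime import datetime


def index_post_timestamp(year):
    """Return January 1 of the given year at 12:01 AM."""
    return datetime(year, 1, 1, 0, 1)
```

Any post you actually write that year will carry a later timestamp, so the index stays the earliest post of the year.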

Now copy and paste your file into the blog composition field. Save and Publish the post, and you should be good to go.

You should post your questions on this entry in the comments here so we can maintain a basic point of reference for the users of this obtuse and difficult process.
