The Notepad’s full of ballpoint hypertext

Pollen Footnotes: An Approach


This article assumes you are familiar with Pollen and the concept of tagged X-expressions. One of the things you get for free with Markdown that you have to cook from scratch with Pollen (or HTML for that matter) is footnotes. But this is fine, since it gives you more control over the results. Here is what I cooked up for use on the upcoming redesign of The Local Yarn weblog/book project.

Update, 2018-01-25

The Pollen discussion group has a thread on this post that is well worth reading. Matthew Butterick showed that you can get mostly the same results with clearer and more concise code by using normal tag functions, as opposed to doing everything top-down starting with root.

An aside: on the web, footnotes are something of an oddity. HTML doesn’t have any semantic notion of a footnote, so we typically make them up using superscripted links to an ordered list at the end of the article. I’m sympathetic to arguments that this makes for a poor reading experience, and am convinced that they are probably overused. Nonetheless, I’m converting a lot of old content that uses footnotes, and I know I’ll be resorting to them in the future. Some newer treatments of web footnotes use clever CSS to sprinkle them in the margins, which is nice, but comes with downsides: it isn’t accessible, it’s not intuitive to use and read on a phone, it renders the footnotes inline with the text in CSS-less environments (Lynx, e.g.), and the markup is screwy. (These reasons are listed in decreasing order of importance for the particular application I have in mind.) So I’m sticking with the old ordered-list-at-the-end approach (for this project, and for now, at least).

So I get to design my own footnote markup. Here’s what’s on my wishlist:

  1. I want each footnote’s contents to be defined in a separate place from the footnote references. This will keep the prose from getting too cluttered.
  2. I want to be able to define the footnote contents anywhere in the document, in any order, and have them properly collected and ordered at the end.
  3. I want to be able to use any mix of strings, symbols or numbers to reference footnotes, and have these all be converted to ordinal reference numbers.
  4. I want to be able to refer to the same footnote more than once. (Rare, but useful in some cases.) It was this requirement in particular that steered me away from using the otherwise-excellent pollen-count package.
  5. If I should happen to refer to a footnote that is never defined, I want a blank footnote to appear in the list in its place. (On the other hand if I define a footnote that isn’t referenced anywhere, I’m content to let it disappear from the output.)
  6. I want the footnote links not to interfere with each other when more than one footnote-using article is displayed on the same page. In other words, the URL for footnote #3 on article (A) should never be the same as the URL for footnote #3 on article (B).

In other words, I want to be able to do this:

Here is some text◊fn[1]. Later on the paragraph continues.

In another paragraph, I may◊fn[2] refer to another footnote.

◊fndef[1]{Here’s the contents of the first footnote.}
◊fndef[2]{And here are the contents of the second one.}

But I also want to be able to do this:

◊fndef["doodle"]{And here are the contents of the second one.}

Here is some text◊fn["wipers"]. Later on the paragraph continues.
◊fndef["wipers"]{Here’s the contents of the first footnote.}

In another paragraph, I may◊fn["doodle"] refer to another footnote.

And both of these should render essentially identically to the following (in the second example the fragment IDs would contain "wipers" and "doodle" in place of 1 and 2, but the visible numbering would be the same):

<p>Here is some text<sup><a href="#550b35-1" id="550b35-1_1">(1)</a></sup>. Later on the paragraph continues.</p>

<p>In another paragraph, I may<sup><a href="#550b35-2" id="550b35-2_1">(2)</a></sup> refer to another footnote.</p>

<section class="footnotes"><hr />
<ol>
  <li id="550b35-1">Here’s the contents of the first footnote. <a href="#550b35-1_1">↩</a></li>
  <li id="550b35-2">And here are the contents of the second one. <a href="#550b35-2_1">↩</a></li>
</ol>
</section>

You may be wondering, where did the 550b35 come from? Well, it’s an automatically generated identifier that’s (mostly, usually) unique to the current article. By using it as a prefix on our footnote links and backlinks, we prevent collisions with other footnote-using articles that may be visible on the same page. I’ll explain where it comes from at the end of this article.

This style of markup is a little more work to code in pollen.rkt, but it lets me be flexible and even a bit careless when writing the prose.

The output for footnotes (given my requirements) can’t very well be handled within individual tag functions; it demands a top-down approach. [Again, this turns out not to be true! see the Pollen group discussion.] So I will be leaving my ◊fn and ◊fndef tag functions undefined, and instead create a single function do-footnotes (and several helper functions nested inside it) that will transform everything at once. I’ll call it from my root tag like so:

📄 pollen.rkt
(require txexpr
         sugar/coerce
         openssl/md5
         pollen/decode
         pollen/template) ; That’s everything we need for this project

(define (root . elements)
  (do-footnotes `(root ,@elements)
                (fingerprint (first elements))))

The do-footnotes function takes a tagged X-expression (the body of the article) and a prefix to use in all the relative links and backlinks. You may have surmised that the fingerprint function call above is where the 550b35 prefix came from. Again, more on that later. Here are the general stages we’ll go through inside this function:

  1. Go through the footnote references. Transform them into reference links, giving each an incrementally higher reference number (or, if the footnote has been referenced before, using the existing number). For later use, keep a list of all references, in the order in which they’re found.
  2. Split out all the footnote definitions from the rest of the article. Get rid of the ones that aren’t referenced anywhere. Add empty ones to stand in for footnotes that are referenced but not defined.
  3. Sort the footnote definitions according to the order that they are first referenced in the article.
  4. Transform the footnote definitions into an ordered list with backlinks, and append them back on to the end of the article.

Here is the code for do-footnotes that implements the first stage:

📄 pollen.rkt
(define (do-footnotes tx prefix)
  (define fnrefs '())

  (define (fn-reference tx)
    (cond
      [(and (eq? 'fn (get-tag tx))
            (not (empty? (get-elements tx))))
       (define ref (->string (first (get-elements tx))))
       (set! fnrefs (append fnrefs (list ref)))
       (let* ([ref-uri (string-append "#" prefix "-" ref)]
              [ref-sequence (number->string (count (curry string=? ref) fnrefs))]
              [ref-backlink-id (string-append prefix "-" ref "_" ref-sequence)]
              [ref-ordinal (number->string (+ 1 (index-of (remove-duplicates fnrefs) ref)))]
              [ref-str (string-append "(" ref-ordinal ")")])
         `(sup (a [[href ,ref-uri] [id ,ref-backlink-id]] ,ref-str)))]
      [else tx]))

  (define tx-with-fnrefs (decode tx #:txexpr-proc fn-reference))
  …)

Looking at the last line in this example will help you understand the flow of control here: we can call decode and, using the #:txexpr-proc keyword argument, pass it a function to apply to every X-expression tag in the article. In this case, it’s a helper function we’ve just defined, fn-reference. The upshot: the body of fn-reference is going to be executed once for each ◊fn tag in the article.
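To make that hook concrete, here is a minimal standalone sketch of decode’s #:txexpr-proc mechanism (the em and strong tags are just for illustration, not part of this project):

(require pollen/decode txexpr)

; Rewrite every em tag into a strong tag; pass everything else through unchanged
(define (shout tx)
  (if (eq? 'em (get-tag tx))
      `(strong ,@(get-elements tx))
      tx))

> (decode '(root (p "be " (em "bold"))) #:txexpr-proc shout)
'(root (p "be " (strong "bold")))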

By defining fn-reference inside the do-footnotes function, it has access to identifiers defined in the enclosing scope, such as the prefix string but most importantly the fnrefs list. This means that every call to fn-reference will be able to check up on the results of all the other times it’s been called so far. And other helper functions we’ll be creating inside do-footnotes later on will also have easy access to the results of those calls.

So let’s examine the steps taken by fn-reference in more detail.

  1. First, using cond it checks to see if the current X-expression tx is a fn tag and has at least one element (the reference ID). This is necessary because decode is going to call fn-reference for every X-expression in the article, and we only want to operate on the ◊fn tags.
  2. Every time fn-reference finds a footnote reference, it has the side-effect of appending its reference ID (in string form) to the fnrefs list (the set! function call). Again, that list is the crucial piece that allows all the function calls happening inside do-footnotes to coordinate with each other.
  3. The function uses let* to set up a bunch of values for use in outputting the footnote reference link:

    1. ref-uri, the relative link to the footnote at the end of the article.
    2. ref-sequence, will be "1" if this is the first reference to this footnote, "2" if the second reference, etc. We get this by simply counting how many times ref appears in the fnrefs list so far.
    3. ref-backlink-id uses ref-sequence to make an id that will be the target of a ↩ back-link in the footnote definition.
    4. ref-ordinal is the footnote number as it will appear to the reader. To find it, we remove all duplicates from the fnrefs list, find the index of the current ref in that list, and add one (since we want footnote numbers to start with 1, not 0).
    5. ref-str is the text of the footnote reference that the reader sees. It’s only used because I wanted to put parentheses around the footnote number.
  4. Then, in the body of the let* expression, the function outputs the new footnote reference link as an X-expression that will transform neatly to HTML when the document is rendered. (A concrete example follows below.)
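Putting stage one together: if the prefix is "550b35" and ◊fn["wipers"] is the first reference encountered in the article, fn-reference returns this X-expression, which renders as a superscripted "(1)" linking down to the footnote:

'(sup (a ((href "#550b35-wipers") (id "550b35-wipers_1")) "(1)"))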

So after the call to decode, we have an X-expression, tx-with-fnrefs, that has all the footnote references (◊fn tags) properly transformed, and a list fnrefs containing all the footnote reference IDs in the order in which they are found in the text.

Let’s take a closer look at that list. In our first simple example above, it would end up looking like this: '("1" "2"). In the second example, it would end up as '("wipers" "doodle"). In a very complicated and sloppy document, it could end up looking like '("foo" "1" "7" "foo" "cite1" "1"). So when processing ◊fndef["foo"], for example, we can see by looking at that list that this should be the first footnote in the list, and that there are two references to it in the article.
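A quick REPL session shows how the ordinal and the reference count fall out of that sloppy list (count, curry, index-of and remove-duplicates all come with #lang racket):

> (define fnrefs '("foo" "1" "7" "foo" "cite1" "1"))
> (remove-duplicates fnrefs)
'("foo" "1" "7" "cite1")
> (+ 1 (index-of (remove-duplicates fnrefs) "foo"))
1
> (count (curry string=? "foo") fnrefs)
2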

All that said, we’re ready to move on to stages two through four.

📄 pollen.rkt
(define (do-footnotes tx prefix)
  ; … stage 1 above …

  (define (is-fndef? x) (and (txexpr? x) (equal? 'fndef (get-tag x))))

  ; Collect ◊fndef tags, filter out any that aren’t actually referenced
  (define-values (body fn-defs) (splitf-txexpr tx-with-fnrefs is-fndef?))
  (define fn-defs-filtered
    (filter (λ(f)
              (cond
                [(member (->string (first (get-elements f))) fnrefs) #t]
                [else #f]))
            fn-defs))

  ; Get a list of all the IDs of the footnote *definitions*
  (define fn-def-ids
    (for/list ([f (in-list fn-defs-filtered)]) (->string (first (get-elements f)))))

  ; Pad the footnote definitions to include empty ones for any that weren’t defined
  (define fn-defs-padded
    (cond [(set=? fnrefs fn-def-ids) fn-defs-filtered]
          [else (append fn-defs-filtered
                        (map (λ (x) `(fndef ,x (i "Missing footnote definition")))
                             (set-subtract fnrefs fn-def-ids)))]))
  ; … stage 3 and 4 …
)

We define a helper function is-fndef? and use it with splitf-txexpr to extract all the ◊fndef tags out of the article and put them in a separate list. Then we use filter, passing it an anonymous function that returns #f for any fndef whose ID doesn’t appear in fnrefs.

Now we need to deal with the case where the ◊fn tags in a document reference a footnote that is never defined with an ◊fndef tag. To test for this, we just need a list of the reference IDs used by the footnote definitions. The definition of fn-def-ids provides this for us, using for/list to loop through all the footnote definitions and grab out a stringified copy of the first element of each. We can then check if (set=? fnrefs fn-def-ids)—that is, do these two lists contain all the same elements (regardless of duplicates)? If not, we use set-subtract to get a list of which IDs are missing from fn-def-ids and for each one, append another fndef to the filtered list of footnote definitions.
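This works because Racket’s generic set operations accept plain lists, ignoring duplicates:

> (set=? '("wipers" "doodle") '("doodle" "wipers" "doodle"))
#t
> (set-subtract '("wipers" "doodle" "ghost") '("wipers" "doodle"))
'("ghost")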

📄 pollen.rkt
(define (do-footnotes tx prefix)
  ; … stages 1 and 2 above …

  (define (footnote<? a b)
    (< (index-of (remove-duplicates fnrefs) (->string (first (get-elements a))))
       (index-of (remove-duplicates fnrefs) (->string (first (get-elements b))))))

  (define fn-defs-sorted (sort fn-defs-padded footnote<?))

  ; … stage 4 …

The helper function footnote<? compares two footnote definitions to see which one should come first in the footnote list: it checks which one has the ID that appears first in fnrefs. We pass that function to sort, which uses it to sort the whole list of footnote definitions.
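As an illustration (pretending footnote<? and fnrefs were defined at the top level, with fnrefs as '("wipers" "doodle")):

> (sort '((fndef "doodle" "second") (fndef "wipers" "first")) footnote<?)
'((fndef "wipers" "first") (fndef "doodle" "second"))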

We are almost done. We just have to transform the now-ordered list of footnote definitions and append it back onto the end of the article:

📄 pollen.rkt
(define (do-footnotes tx prefix)
  ; … stages 1 to 3 above …

  (define (fn-definition tx)
    (let* ([ref (->string (first (get-elements tx)))]
           [fn-id (string-append prefix "-" ref)]
           [fn-elems (rest (get-elements tx))]
           [fn-backlinks
             (for/list ([r-seq (in-range (count (curry string=? ref) fnrefs))])
               `(a [[href ,(string-append "#" prefix "-" ref "_"
                                          (number->string (+ 1 r-seq)))]] "↩"))])
      `(li [[id ,fn-id]] ,@fn-elems ,@fn-backlinks)))

  (define footnotes-section
    `(section [[class "footnotes"]] (hr) (ol ,@(map fn-definition fn-defs-sorted))))

  (txexpr (get-tag body)
          (get-attrs body)
          (append (get-elements body)
                  (list footnotes-section)))
  ; Finis!
)

We need one more helper function, fn-definition, to transform an individual ◊fndef tag into a list item with the footnote’s contents and backlinks to its references. This helper uses let* in a way similar to fn-reference above, constructing each part of the list item and then pulling them all together at the end. Of these parts, fn-backlinks is worth examining. The expression (curry string=? ref) returns a function that compares any string to whatever ref currently is. (curry is basically a clever way of temporarily “pre-filling” some of a function’s arguments.) That function gets passed to count to tally how many times the current footnote is referenced in fnrefs, and the for/list comprehension uses that range to make a backlink for each reference.
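Isolating those moving parts in the REPL, supposing ref is "wipers" and it was referenced twice:

> (count (curry string=? "wipers") '("wipers" "doodle" "wipers"))
2
> (for/list ([r-seq (in-range 2)])
    (string-append "#550b35-wipers_" (number->string (+ 1 r-seq))))
'("#550b35-wipers_1" "#550b35-wipers_2")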

In defining the footnotes-section we map the helper function fn-definition onto each ◊fndef tag in our sorted list, and drop them inside an X-expression matching the HTML markup we want for the footnotes section. The last statement adds this section to the end of body (which was the other value given to us by splitf-txexpr way up in stage 2), and we’re done.

All that remains now is to show you where I got that 550b35 prefix from.

Ensuring unique footnote IDs

As mentioned before, I wanted to be able to give all the footnotes in an article some unique marker for use in their id attribute, to make sure the links for footnotes in different articles never collide with each other.

When the topic of “ensuring uniqueness” comes up it’s not long before we start talking about hashes.

I could generate a random hash once for each article, but then the footnote’s URI would change every time I rebuild the article, which would break any deep links people may have made to those footnotes. How often will people be deep-linking into my footnotes? Possibly never. But I would say, if you’re going to put a link to some text on the web, don’t make it fundamentally unstable.

So we need something unique (and stable) from each article that I can use to deterministically create a unique hash for that article. An obvious candidate would be the article’s title, but many of the articles on the site I’m making will not have titles.

Instead I decided to use an MD5 hash of the text of the article’s first element (in practice, this will usually mean its first paragraph):

📄 pollen.rkt
; Concatenate all the elements of a tagged x-expression into a single string
; (ignores attributes)
(define (txexpr->elements-string tx)
  (cond [(string? tx) tx]
        [(stringish? tx) (->string tx)]
        [(txexpr? tx)
         (apply string-append (map txexpr->elements-string (get-elements tx)))]))

(define (fingerprint tx)
  (let ([hash-str (md5 (open-input-string (txexpr->elements-string tx)))])
    (substring hash-str (- (string-length hash-str) 6))))

The helper function txexpr->elements-string will recursively drill through all the nested expressions in an X-expression, pulling out all the strings found in the elements of each and appending them into a single string. The fingerprint function then takes the MD5 hash of this string and returns just the last six characters, which are unique enough for our purposes.

If you paste the above into DrRacket (along with the requires at the beginning of this post) and run it as below, you’ll see:

> (fingerprint '(p "Here is some text" (fn 1) ". Later on the paragraph continues."))
"550b35"

This now explains where we were getting the prefix argument in do-footnotes:

📄 pollen.rkt
(define (root . elements)
  (do-footnotes `(root ,@elements)
                (fingerprint (first elements))))

Under this scheme, things could still break if I have two articles with exactly the same text in the first element. Also, if I ever edit the text in the first element in an article, the prefix will change (breaking any deep links that may have been made by other people). But I figure that’s the place where I’m least likely to make any edits. This approach brings the risk of footnote link collision and breakage down to a very low level, wasn’t difficult to implement and won’t be any work to maintain.

Summary and parting thoughts

When designing the markup you’ll be using, Pollen gives you unlimited flexibility. You can decide to adhere pretty closely to HTML structures in your markup (allowing your underlying code to remain simple), or you can write clever code to make your markup do more work for you later on.

One area where I could have gotten more clever would have been error checking. For instance, I could throw an error if a footnote is defined but never referenced. I could also do more work to validate the contents of my ◊fn and ◊fndef tags. If I were especially error-prone and forgetful, this could save me a bit of time when adding new content to my site. For now, on this project, I’ve opted instead for marginally faster code…and more cryptic error messages.
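For example, here’s a minimal sketch of that first check. It assumes we’re in stage 2, right after splitf-txexpr and before the unreferenced definitions get filtered away; all-def-ids is a name I’m introducing just for this sketch:

(define all-def-ids
  (for/list ([f (in-list fn-defs)]) (->string (first (get-elements f)))))
(for ([id (in-list (set-subtract all-def-ids fnrefs))])
  (error 'do-footnotes "footnote ~a is defined but never referenced" id))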

I will probably use a similar approach to allow URLs in hyperlinks to be specified separately from the links themselves. Something like this:

📄 chapter.html.pm
#lang pollen

For more information, see ◊a[1]{About the Author}. You can also
see ◊a[2]{his current favorite TV show}.

◊hrefs{
[1]: http://joeldueck.com
[2]: http://www.imdb.com/title/tt5834198/
}

Home IT Overhaul, Phase Zero: A Screen


My home network has had a pretty basic setup for the last six years. My wife and I each have a laptop, connected to the internet with an Asus wifi router and a cable modem. And we have a wifi B&W laser printer. That’s it.

Since we finished the basement and installed a 55″ TV, however, I’ve had my eye on some drastic improvements and additions to our home’s IT capabilities. I’ll outline the overall plan in another post; to get things rolling, I thought I’d just write about the first small step in that plan, which I took today: I bought a monitor.

A Coby TFTV1525 15″ monitor/TV, purchased used, missing remote

I have almost zero spare computer parts lying around my house, which is surely strange for an IT manager. I’ve made a point of getting rid of stuff I don’t need, which means I get to start from scratch.

This tiny used monitor will sit on my new IT “rack” when I need a direct-attached display for setting up or troubleshooting servers. It’s perfect for my setup for several reasons:

  1. It’s small
  2. It’s cheap ($20 on Craigslist)
  3. It has both HDMI and VGA inputs, so it will work with any computer, old or new, without any converters
  4. It also has a TV tuner, so I can use it to watch broadcast television on the rare occasions where that might come in handy (weather emergencies, etc)

Safari Speedbumps


For a long time now, I’ve had a problem with Safari taking a long time to load a website when I first navigate to it: there will be a long pause (5–10 sec) with no visible progress or network traffic. Then there will be a burst of traffic, the site will load fully inside of a second, and every page I visit within that site afterwards will be lightning fast.

The same thing happens whether I’m at work, at home, or on public wifi (using a VPN of course). I’ve tried disabling all extensions and I’ve also tried using Chrome. So this was mystifying to me.

But I think I might have finally found the source of the problem. I was in Safari’s preferences window and noticed this little warning on the Security tab:

Safari preferences pane showing a problem with the ‘Safe Browsing Service’

I unchecked that box, and the problem seems to have disappeared.

Now, I haven’t yet been able to find any official information on exactly how Safe Browsing Service works, but it’s not hard to make an educated guess. If it’s turned on, the first time you browse to a website, the name of that website would first get sent, in a separate request, to Apple’s servers, which would return a thumbs up/thumbs down type of response. A problem on Apple’s end would cause these requests to time out, making any website’s first load terribly slow. And as the screenshot shows, clearly there is a problem on Apple’s end, because the Safe Browsing Service is said to be “unavailable”. (It says it’s only been unavailable for 1 day but I have reason to believe that number just resets every day.)

The fact that disabling the setting on Safari fixed the problem in Chrome too leads me to believe that this is in fact an OS-level setting, enforced on all outgoing HTTP requests, not just a Safari preference.

Anyway, if you are having this problem, see if disabling Safe Browsing Service solves it for you.


Site Incident Report


Important notice: last Tuesday night all my sites went offline. In the interests of extreme transparency, I present this complete incident report and postmortem.

Impact

  1. All of my websites that still use Textpattern were broken from 11pm Jan 31 until about lunchtime the next day. (The Notepad does not use Textpattern and was mostly unaffected, except for #2 below.)
  2. All the traffic logged on all sites during that time was lost. So I have no record of how many people used my websites during the time that my websites were unusable.

Timeline

(All times are my time zone.)

  1. Tuesday Jan 31, late evening: I logged into my web server and noticed a message about an updated version of my Ubuntu distribution being available. I was in a good mood and ran the do-release-upgrade command, even knowing it would probably cause problems. Because breaking your personal web server’s legs every once in a while is a good way to learn stuff. If I’d noticed that this “update” proposed to take my server from version 14.04 all the way to 16.04, I’d have said Hell no.
  2. In about half an hour the process was complete and sure enough, all my DB-driven sites were serving up ugly PHP errors.

Recovery

  1. Soon determined that my Apache config referred to a non-existent PHP5 plugin. Installed PHP7 because why the hell not.
  2. More errors. The version of Textpattern I was using on all these sites doesn’t work with PHP7. Installed the latest version of Textpattern on one of the sites.
  3. Textpattern site still throwing errors because a few of my plugins didn’t like PHP7 either. Logged into the MySQL cli and manually disabled them in the database.
  4. Textpattern’s DB upgrade script kept failing because it doesn’t like something about my databases. I began the process of hand-editing each of the tables in one of the affected websites.
  5. Sometime around midnight my brother texted asking me to drive over and take him in to the emergency room. I judged it best to get over there in a hurry so I closed up my laptop and did that. His situation is a little dicey right now; it was possible that when I got there I’d find him bleeding or dying. That wasn’t it, thankfully. By four in the morning they had him stabilized and I was able to drive home.
  6. Morning of Feb 1st: I got out of bed at around eight, made myself some coffee and emailed my boss to tell him I wouldn’t be in the office until nine-thirty.
  7. After driving in to work, I remembered almost all of my websites were still busted. I started to think about the ramifications. I wondered if anyone had noticed. I opened Twitter for the first time since before the election and closed it again, appalled.
  8. At lunchtime I drove to the coffee shop for some more caffeine and a sandwich. I remember it got up to 30°F that day so I almost didn’t need a coat. After I ate my sandwich I pulled out my laptop and resumed poking around the same database and trying to swap in all the mental state from before the hospital trip.
  9. Towards the end of my lunch hour I decided that this wasn’t fun anymore. Maybe I could poke this one database until Textpattern would stop whining about it, but there was still the matter of the broken plugins, and then I’d have to go through the same rigmarole for the other three sites.
  10. Sometime between noon and 1pm I logged into my DigitalOcean dashboard and clicked a button to restore the automatic backup from 18 hours ago. In two minutes it was done and all the sites were running normally.

Problems Encountered

  1. In-place OS upgrades across major releases will always break your stack
  2. Textpattern 4.5.7 doesn’t support PHP7
  3. Textpattern 4.6.0 needs a bunch of hacks to work with newer versions of MySQL
  4. Emergency rooms always have so much waiting time in between tests and stuff

Post-Recovery Followup Tasks

  1. Leave the goddamn server alone
  2. Revisit shelved projects that involve getting rid of Textpattern and MySQL.

Advent of Code 2016


I’m giving this year’s Advent of Code event a shot.

Since I’m also using this as a way of learning a little about literate programming, the programs I write are also web pages describing themselves. I’m uploading those web pages to a subsection of this site, where you can read my solutions and watch my progress.


Flattening a Site: From Database to Static Files


I just finished converting a site from running on a database-driven CMS (Textpattern in this case) to a bunch of static HTML files. No, I don’t mean I switched to a static site generator like Jekyll or Octopress, I mean it’s just plain HTML files and nothing else. I call this “flattening” a site. (I wanted a way to refer to this process that would distinguish it from “archiving”, which to me also connotes taking the site offline. I passed on “embalming” and “mummifying” for similar reasons.)

In this form, a web site can run for decades with almost no maintenance or cost. It will be very tedious if you ever want to change it, but that is fine because the whole point is long-term preservation. It’s a considerate, responsible thing to do with a website when you’re pretty much done updating it forever. Keeping the site online prevents link rot, and you never know what use someone will make of it.

How to Flatpack

Before getting rid of your site’s CMS and its database, make use of it to simplify the site as much as possible. It’s going to be incredibly tedious to fix or change anything later on, so now’s the time to do it. In particular, you want to edit any templates that affect the content of multiple pages.

Next, on your web server, make a temp directory (outside the site’s own directory) and download static copies of all the site’s pages into it with the wget command:

wget --recursive --domains howellcreekradio.com --html-extension howellcreekradio.com/

This will download every page on the site and every file linked to on those pages. In my case it included images and MP3 files which I didn’t need. I deleted those until I had only the .html files left.

Digression: Mass-editing links and filenames from the command line

This bit is pretty specific to my own situation but perhaps some will find it instructive. At this point I was almost done, but there was a bit of updating to do that couldn’t be done from within my CMS. My home page on this site had “Older” and “Newer” links at the bottom in order to browse through the episodes, and I wanted to keep it this way. These older/newer links were generated by the CMS with query-string URLs: http://site.com/?pg=2 and so on. When wget downloads these links (and when the --html-extension option is invoked), it saves them as files of the form index.html?pg=2.html. These all needed to be renamed, and the pagination links that refer to them needed to be updated.

I happen to use ZSH, which comes with a pattern-aware alternative to the standard mv command called zmv (you may need to enable it first with autoload -U zmv):

zmv 'index.html\?pg=([0-9]).html' 'page$1.html'
zmv 'index.html\?pg=([0-9][0-9]).html' 'page$1.html'

So now these files were all named page1.html through page20.html, but they still contained links in the old ?pg= format. I was able to update these in one fell swoop with a one-liner:

grep -rl \?pg= . | xargs sed -i -E 's/\?pg=([0-9]+)/page\1.html/g'

To dissect this a bit:

  1. grep -rl \?pg= . recursively lists every file under the current directory that still contains a ?pg= link (-l prints just the filenames).
  2. xargs hands that list of filenames to sed as arguments.
  3. sed -i -E edits each file in place, using extended regular expressions.
  4. The expression s/\?pg=([0-9]+)/page\1.html/g captures each page number and rewrites every such link into the page\1.html form.

OK, digression over.

Back up the CMS and Database

Before actually switching, it’s a good idea to freeze-dry a copy of the old site, so to speak, in case you ever needed it again.

Export the database to a plain-text backup:

mysqldump -u username -pPASSWORD db_name > dbbackup.sql

Then save a gzip of that .sql file and the whole site directory before proceeding.

Shutting down the CMS and swapping in the static files

Final steps:

  1. Move the HTML files you downloaded and modified above into the site’s public folder.
  2. Add redirects or rewrite rules for every page on your site. For example, if your server uses Apache, you would edit the site’s .htaccess file so that URLs on your site like site.com/about/ would be internally rewritten as site.com/about.html. This is going to be different depending on what CMS was being used, but essentially you want to be sure that any URL that anyone might have used as a link to your site continues to work.
  3. Delete all CMS-related files from your site’s public folder (you saved that backup, right?) In my case I deleted index.php, css.php, and the whole textpattern/ directory.

Once you’re done

Watch your site’s logs for 404 errors for a couple of weeks to make sure you didn’t miss anything.

What to do now? You could leave your site running where it is. Or, long term, consider having it served from a place like NearlyFreeSpeech for pennies a month.


Splitting Pollen tags with Racket macros


This may be one of the nerdiest things I have ever written, but I know there may be three or five people who will find it useful. This post is specifically for people who are using Pollen to generate content in multiple output formats, and who may also be using a separate build system like make.

Normally when targeting multiple output formats in Pollen, you’d write a tag function something like this:

📄 pollen.rkt
; …

(define (strong . xs)
  (case (current-poly-target)
    [(ltx) (apply string-append `("\\textbf{" ,@xs "}"))]
    [else `(strong ,@xs)]))

; …

Here, everything for the strong tag is contained in a single tidy function that produces different output depending on the current output format. This is fine for simple projects, but not ideal for more complex ones, for a couple of reasons.

First there’s the issue of tracking dependencies. Let’s say every Pollen file in your project gets rendered as an HTML file and as part of a PDF file. Then one day you make a small change in your pollen.rkt file. Does this edit affect just the HTML files? Or the PDF files? Or both? Which ones now need to be rebuilt? If you’re doing things as shown above, there’s no straightforward way for Pollen (or make) to determine this; you’ll have to rebuild all the output files every time.

Then there’s the issue of readability. Even with two possible output formats, pollen.rkt gets much more difficult to read. I didn’t even want to think about how hairy it would get at three or four.

I decided to address this by having each output format get its own separate .rkt file, containing its own definitions for each tag function, prefixed by the output format:

📄 html-tags.rkt
(define (html-strong attrs elements)
  `(strong ,attrs ,@elements))
📄 pdf-tags.rkt
(define (pdf-strong attrs elements)
  (apply string-append `("\\textbf{" ,@elements "}")))

That part is simple enough. But you also need a way for pollen.rkt to branch to one tag or the other depending on the current poly target.

To handle this part, I wrote a macro, poly-branch-tag, which allows you to define a tag that will automatically call a different tag function depending on the current output format. The macro is rather long, but you can view it in the polytag.rkt file of this blog’s source code at Github.
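To give a feel for what it does, here is roughly how a tag defined with the macro behaves for a simple case. This is an approximation for intuition only, not the actual expansion (polytag.rkt also handles required attributes, defaults, and attribute packaging):

; (poly-branch-tag strong) behaves approximately like:
(define (strong . elements)
  (case (current-poly-target)
    [(html) (html-strong '() elements)]
    [(pdf)  (pdf-strong '() elements)]))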

Defining tag functions with poly-branch-tag

To use this macro, first copy the polytag.rkt file from this blog’s source code into your project.

You then include polytag.rkt and declare your Pollen tags using the macro. The first argument is the tag name, optionally followed by a single required attribute and/or any number of optional attributes with default values:

📄 pollen.rkt
#lang racket
(require pollen/setup)
(require "polytag.rkt")
(require "html-tags.rkt" "pdf-tags.rkt")

; Define our poly targets as usual
(module setup racket/base
    (provide (all-defined-out))
    (define poly-targets '(html pdf)))

; Simple tag with no required or default attributes
(poly-branch-tag strong)

; Tag with a single required attribute
(poly-branch-tag link url)

; Tag with required attribute + some optional attrs w/defaults
(poly-branch-tag figure src (fullwidth #f) (link-url ""))

For every tag function declared this way, write the additional functions needed for each output type in your (setup:poly-targets). E.g., for strong above, we would define html-strong and pdf-strong inside their respective .rkt files.

These tag functions should always accept exactly two arguments: a list of attributes and a list of elements. The macro will ensure that any required attribute is present and any default values are applied. Here’s an example:

📄 html-tags.rkt
(define (html-figure attrs elems)     ; Important! Tag name must have html- prefix
  (define src (attr-val 'src attrs))  ; You can grab what you need from attrs
  (if (attr-val 'fullwidth attrs)     ; (I made (attr-val) to accept boolean values in attributes)
      (make-fullwidth src)            ; [dummy example]
      (make-normal src)))             ; [dummy example]

The benefits, reiterated

If you use a dependency system like make in your Pollen project, you now have a clear separation between output files in a particular format and the code that produces output in that format. An edit to html-tags.rkt will only affect HTML files. An edit to pdf-tags.rkt will only affect PDF files. You can see this blog’s makefile for a detailed example.

It’s also easier to add output formats without losing your sanity. Each output format gets its own .rkt file where you can define your tag functions all the way up to root, and the logic for each output format is much easier to follow than if they were all jammed in together in one file.

Finally, I found that there’s a third benefit, delightful and unintended, that comes with this approach as well: pollen.rkt, stripped of all function definition code, becomes essentially a very readable, self-updating cheatsheet of your project’s tags. See what I mean in this blog’s pollen.rkt. This alone might almost tempt me to use poly-branch-tag even in projects where HTML is the only format being targeted.


Testing network switches


A couple of weeks ago, one of the two Netgear GS748T network switches in our main office failed. The lights were still blinking, but nothing connected to it was able to talk to anything else. We were able to plug almost everyone in to the other switch, the rest we put on a temporary 8-port switch until we could get a replacement.

We ordered two more of these switches off eBay (a replacement plus a spare), and those arrived today. After testing both of them, I was able to swap out the bad one and figure out exactly what had happened to it.

How I test a switch

This is pretty basic and generic, but maybe someone will find it useful.

  1. Grab the reference manual and hard-reset the switch to factory defaults.
  2. Connect directly to the switch with an ethernet cable. Does the port light up?
  3. Manually set your computer’s IP address to correspond to the switch’s defaults. In this case, the Netgear’s default IP is 192.168.0.239 with a subnet mask of 255.255.255.0, so I set my computer to 192.168.0.20 and the same subnet.
  4. Try to ping the switch at its default address. Does it respond? If not, plug in another computer and set its IP address manually as well. Can you ping it? Try it across several ports.
  5. From your browser, try to log in to the switch’s web interface. In this case I browsed to http://192.168.0.239 and was greeted with the login screen.
  6. Try transferring data between two computers connected through the switch. In my case I was testing with two Windows machines, so I used NetCPS to benchmark these transfers. Again, use several different ports. If the ports are visibly divided between “banks” of 4 or 8 ports, test each bank. (Testing each individual port is overkill in most cases.)
  7. Managed switches usually have their own OS with a command-line interface that you can open by connecting through a separate “console” port (either RJ-45 or serial DB-9). Try to log in through this interface and poke around; refer to the switch’s manual for details. (The GS748T doesn’t have a separate console port or a CLI, so this point wasn’t applicable in this particular case. On another occasion though, when I had an HP ProCurve switch that was acting up, connecting via the console port revealed a barrage of error messages and an endless cycle of rebooting. Having a saved copy of this output was very helpful when I was on the phone with the manufacturer demanding a warranty replacement.)

So what happened here?

In the case of our failing Netgear GS748T, after I pulled it out I found it was still “working”: I could connect to its web interface, and even send data between a couple of computers connected via the switch, but several things indicated something was wrong.

First of all, pinging the switch itself while plugged into it directly was yielding response times of 7–14ms. This may seem pretty fast, but an acceptable response time is more like 1ms, max.

Second, by looking at the error counters in the switch’s web interface, I noticed Rx errors piling up after only a few minutes:

Rx errors piling up after only a few minutes of traffic

An acceptable number of errors is zero, assuming there is no problem with the cables themselves.

All of this points towards some degradation that destroys performance when traffic increases past a certain point.

Finally, just for the heck of it, we opened up the switch’s casing and took a look at its innards.

The inside of the Netgear GS748T

The capacitors with flat tops (such as the group of four on the left) are in good shape, but the ones with bulging, rounded tops (there are three in this pic) have definitely gone bad. Hardware companies often try to save money by getting cheap, low-quality capacitors, and when they fail, they start to bulge like this.

The failed capacitors definitely seem to explain our problem. Personally, I would not have bothered unscrewing the casing on the failed switch, but it was a good way to confirm that we were in fact dealing with a hardware failure. (Nor would I have ordered the same make/model as a replacement. The “new” switches are a later revision than the originals, though, GS748Tv3H1 vs GS748Tv1H3, so hopefully that represents some improvement.) You might also want to do this if you ever order used network gear; if any of the capacitors are bulging like this you know to return the item immediately.


Shot in the Patoot


happy birthday to former President James Garfield, mortally shot in the patoot
reminder that one time we shoved whiskey and beef bouillon up a president’s butt until he died nytimes.com/2006/07/25/hea…

(Solved) DNS_PROBE_FINISHED error, degraded internet performance


Recently at the office we started having major network issues: browsers intermittently failing with DNS_PROBE_FINISHED errors, and generally degraded internet performance.

Troubleshooting

Google searches for the DNS_PROBE_FINISHED error invariably lead you to advice suggesting that you perform a netsh winsock reset and restart your computer. However this didn’t work in our case, unsurprisingly. The problem began affecting everyone at once, so unless there had been a bad Windows update or something (our IT support agency hadn’t heard of any) this would be unlikely to help.

We also ruled out the ISP as the cause. We have two WAN connections–one fiber and one cable–and switching to one or the other exclusively did not resolve the issue. Support tickets with ISPs confirmed there were no upstream connection or network problems.

Examining Switches

We had just that day moved a bunch of desks around one part of the office. Our IT support agency suggested we had some kind of switch-level spanning tree problem–a switch plugged into itself, perhaps, in some roundabout way. I tried rebooting the main switch used for non-VoIP traffic, and the problem immediately cleared up–for about ten minutes, and then it returned. We also tried disconnecting all the jacks for each person who had been affected by the move to rule out any subtle looping issues created (even though only one or two jacks had been affected); no dice.

I opened a support ticket with the switch company (Extreme Networks). They had me telnet into the switch and capture the output of a bunch of commands and send it to them, which allowed them to rule out any configuration or looping issues on the switch.

We upgraded the firmware, which dated from 2011, and restarted the switch. Again the problem cleared up and did not recur for the rest of the day. But by this point most people had gone home or to find somewhere else to work. I was curious if the problem would recur on Monday when everyone came back; sure enough, with 10 people in the office at 8:00 am Monday everything was fine, but by 8:30 we were having the same problem again.

At this point we were ready to try unplugging every person, port by port, waiting 5 seconds, and pinging Google, to see if we could narrow the problem down to a particular network jack/user. Thankfully it didn’t come to that.

The culprit

This time on our firewall I noticed that the “connection count” was hovering close to or even above the stated maximum of 10,000. Occasionally the connection utilization would drop to 5–6% and then the problem would go away. I used the firewall’s “packet capture” interface to look at a few seconds’ worth of network traffic and noticed a high number of UDP packets coming from a particular LAN IP address, with sequential foreign destination IPs.

I was able to track down the computer with this IP address; it belonged to one of our salespeople. The laptop was a Lenovo running Windows 8. In Task Manager I saw that it was sending 1.5 MBps over the wired Ethernet interface and 800–900 Kbps over the wireless interface, even with no apps running. (Task Manager did not show which process was causing this.) Upon disconnecting the CAT5e cable, the connection utilization on the firewall dropped to 40%. Disconnecting the wifi dropped it further to 7%.

Looking at CPU usage, the process discovery.exe appeared abnormally busy. A Google search finally turned up this article: Excessive network traffic and wifi drops linked to LenovoEMC Storage connector, which stated:

Corporate networks or ISPs may detect an excessive amount of unusual network traffic coming from ThinkPad systems preloaded with Microsoft Windows 8.1. The network traffic may be interpreted as a network flood or denial-of-service attack. As a result, the system may become restricted on the network or the network may stop functioning normally.

“LenovoEMC Storage Connector” is preloaded on some ThinkPad models to help customers discover and connect to LenovoEMC storage devices on their network. The process causing the network flood is discovery.exe, which is a component of “LenovoEMC Storage Connector”.

Uninstalling the Lenovo EMC Storage Connector from the offending laptop finally fixed the issue.