David Meyer <papa@sdf.org>

Nov. 15, 2018

Updated: Nov. 17, 2018

As part of my self-education in Lisp programming and inspired by my interest in the Latin language, I set myself the task of creating a function to convert Arabic numerals to Roman numerals, a task I have accomplished in other programming languages. While Roman numerals may not be a particularly hot topic, I have found a useful task for achieving an intermediate level of competency in a programming language.

Considering my implementation in Lisp has raised several questions on the best way to use Lisp and programming in general. These questions are expressed in terms of five alternative approaches to implementing the same functionality. I look forward to reading other programmers' insights into the pros and cons, clarity, efficiency, scalability, Lispiness, naivete, ... of each of the approaches.

The following code examples are in Scheme, but I believe most of my thoughts apply to all versions of Lisp, and in many cases to programming in any language.

In the code snippets below I have left out validity checks and in-line comments to remove distractions from the essential code. For the conversion functions, the input argument should be an integer value between 1 and 3999 (inclusive), or a character string of the same kind of number.

My first version is based on my earlier Javascript implementation.

The function takes the Arabic number place-by-place starting with the highest and concatenates the Roman symbol for the place value. There will be one recursive call for each digit in the original number.

I thought it interesting that part of the logic of the algorithm is implicit in the structure of the list r in function roman-digit -- the correlation between the place and value of each digit and the Roman symbol. Note that this algorithm ignore patterns within the 30 symbols in the list, for example the fact that the symbol for 300 "CCC" consists of the symbol for 100 "C" repeated three time.

I was pleased that the function is relatively concise, but started to think about refactoring to improve clarity and possible reduce some of the arithmetic processing, especially in the expression for initializing e to the power of 10 of the most significant digit of the input number. I also considered integrating the functionality of roman-digit into integer->rnum1 since adding the code to the main function wouldn't hurt clarity too much.

Library rnrs base provides function exact.

<<approach1.scm>>= (import (rnrs base)) (define (integer->rnum1 i) (if (< i 1) "" (let* ((e (exact (floor (/ (log i) (log 10))))) (p (expt 10 e)) (v (floor (/ i p))) (r (remainder i p))) (string-append (if (> v 0) (roman-digit e v) "") (integer->rnum1 r))))) (define (roman-digit e v) (let ((r '(("I" "II" "III" "IV" "V" "VI" "VII" "VIII" "IX") ("X" "XX" "XXX" "XL" "L" "LX" "LXX" "LXXX" "XC") ("C" "CC" "CCC" "CD" "D" "DC" "DCC" "DCCC" "CM") ("M" "MM" "MMM")))) (list-ref (list-ref r e) (- v 1)))) @

While thinking about how to refactor the code in Approach 1, I hit upon an idea for eliminating the knarly logarithm arithmetic. This approach replaces the arithmetic for calculating the place and value of each digit with a series of integer comparisons and subtractions.

Although the cond structure takes more lines of code than the first algorithm, the simplicity of each clause is an improvement in clarity.

This function will make a recursive call for each character (or character pair for 4s and 9s) in the output Roman numeral rather than for each digit in the input Arabic numeral, making for deeper nesting of calls on average. But perhaps the simpler arithmetic for each step gives this approach an advantage in performance.

This algorithm takes advantage of patterns of repeated symbols in the Roman numerals, but the correlation between place, value, and symbol is less clear.

An interesting aspect of the cond structure is that some of the clauses will be triggered at most once for each input number -- for example, if a input number contains 90, the clause that starts with the test (> i 89) will be triggered exactly once to insert the symbol "XC" into the Roman numeral string. On the other hand, some clause may be triggered multiple times -- the clause with test (> i 9) will be triggered for each multiple of 10 contained in the input number.

When I started writing pseudocode for this function, my first instinct was to write the once-and-done clauses as *if* statements and the multiple triggers as loops. Structuring the code this way would actually eliminate the need for a recursive call -- the number would be converted completely with a single pass through the mishmash of *if*s and loops. This might have performance advantages, but would create a knotted mass of pasta in the place of clean code. On the other hand, I realized that making recursive calls to the large cond block actually served a double purpose -- it handled both the digit-by-digit processing of the input number *and* the looping for multiple occurrences of the same value symbol. Recursion allows for both levels of processing to be controlled with a single, simple, and neat cond block.

<<approach2.scm>>= (define (integer->rnum2 i) (cond ((> i 999) (string-append "M" (integer->rnum2 (- i 1000)))) ((> i 899) (string-append "CM" (integer->rnum2 (- i 900)))) ((> i 499) (string-append "D" (integer->rnum2 (- i 500)))) ((> i 399) (string-append "CD" (integer->rnum2 (- i 400)))) ((> i 99) (string-append "C" (integer->rnum2 (- i 100)))) ((> i 89) (string-append "XC" (integer->rnum2 (- i 90)))) ((> i 49) (string-append "L" (integer->rnum2 (- i 50)))) ((> i 39) (string-append "XL" (integer->rnum2 (- i 40)))) ((> i 9) (string-append "X" (integer->rnum2 (- i 10)))) ((> i 8) (string-append "IX" (integer->rnum2 (- i 9)))) ((> i 4) (string-append "V" (integer->rnum2 (- i 5)))) ((> i 3) (string-append "IV" (integer->rnum2 (- i 4)))) ((> i 0) (string-append "I" (integer->rnum2 (- i 1)))) (else ""))) @

In Approach 2, while the function is running, the output Roman nuumeral string is held symbol-by-symbol as arguments in nested calls to string-append. The pieces of the output string are not assembled until the function has recursed all the way down to the level of the last character. Then the pieces are concatenated symbol-by-symbol as control returns back up the call chain.

I vaguely recall hearing somewhere that Lisp (or maybe it was just Scheme) offers advantages if instead of having program state information spread out in memory like the under-construction Roman numeral string in Approach 2, you can capture program state in the arguments to the recursive function. Because of tail recursion?

This approach has the same cond structure as the previous approach, but adds and argument to the recursive function integer->rnum-r for passing the concatenated Roman numeral string for the portion of the input number already processed. When the function detects that the conversion is complete, it simply returns the input character string with no further changes as it gets passed back up the call stack.

With this approach, instead of calls to the recursive function inside nested pending calls to string-append, there is a call to string-append inside each nested call to the recursive function.

<<approach3.scm>>= (define (integer->rnum3 i) (integer->rnum-r "" i)) (define (integer->rnum-r r i) (cond ((> i 999) (integer->rnum-r (string-append r "M" ) (- i 1000))) ((> i 899) (integer->rnum-r (string-append r "CM") (- i 900))) ((> i 499) (integer->rnum-r (string-append r "D" ) (- i 500))) ((> i 399) (integer->rnum-r (string-append r "CD") (- i 400))) ((> i 99) (integer->rnum-r (string-append r "C" ) (- i 100))) ((> i 89) (integer->rnum-r (string-append r "XC") (- i 90))) ((> i 49) (integer->rnum-r (string-append r "L" ) (- i 50))) ((> i 39) (integer->rnum-r (string-append r "XL") (- i 40))) ((> i 9) (integer->rnum-r (string-append r "X" ) (- i 10))) ((> i 8) (integer->rnum-r (string-append r "IX") (- i 9))) ((> i 4) (integer->rnum-r (string-append r "V" ) (- i 5))) ((> i 3) (integer->rnum-r (string-append r "IV") (- i 4))) ((> i 0) (integer->rnum-r (string-append r "I" ) (- i 1))) (else r))) @

Thinking about Approach 1, I wondered why the simple algorithm:

*Take the first digit of the number and look up the Roman numeral for the place and value, then repeat the process for the remaining portion of the number.*

... that even an elementary student can master requires so much arithmetic to program. I realized that there is a kind of processing that human beings can do effortlessly and unconsciously that is tricky for computers -- we can look at a numeral and think of it as either a number or a character string with no processing overhead.

With that in mind I wrote the functions for this approach and Approach 5 expecting a string representation of the Arabic number to convert for the input argument. Working through the number character-by-character makes quite straightforward code with very little arithmetic, even allowing for the need to convert each character to a numeric value for the roman-digit look-up.

Like Approach 1, this function makes a recursive call for each digit in the original number.

Function roman-digit is reused from Approach 1.

<<approach4.scm>>= (define (string->rnum1 s) (let ((p (string-length s))) (if (zero? p) "" (let ((v (string->number (substring s 0 1)))) (string-append (if (zero? v) "" (roman-digit (- p 1) v)) (string->rnum1 (substring s 1))))))) @

The last approach combines the string-based technique of Approach 4 with the state-carrying recursive function argument of Approach 3. With concise code, recursion depth proportional to the input number length, avoidance of advanced arithmetic, and tail recursion goodness, this approach may take the prize.

Function roman-digit is reused from Approach 1.

<<approach5.scm>>= (define (string->rnum2 s) (string->rnum-r "" s)) (define (string->rnum-r r s) (let ((p (string-length s))) (if (zero? p) r (let ((v (string->number (substring s 0 1)))) (string->rnum-r (if (zero? v) r (string-append r (roman-digit (- p 1) v))) (substring s 1))))))