As a father-to-be, I'm both thrilled and terrified to be expecting a baby. Of all the things I should do right now, the most practical would be to come up with a name. But with thousands, if not tens of thousands, of first names out there, how do I pick the one?
I am not very creative at naming people or pets, and I'm very picky at the same time. As we will likely have multiple children down the road, we definitely need a system to help us name them. Say the maximum number of children we would have is six: we would want each child's name to start with a different letter, and together these letters would form a word with a beautiful meaning. I take inspiration from Brian W. Kernighan's book UNIX: A History and a Memoir, in which he tells a story about using grep and regular expressions to help a friend find words (from the dictionary on his Unix system) that match an upside-down calculator screen. I thought it was funny when I first read it, but now I think it just might be what I need to try.
So here is the gist. I'm going to find all six-letter words in a dictionary and pick one. The word may start with a capital or a lowercase letter, but it must not contain any repeated letters.
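To make the two rules concrete, here is a minimal sketch of a checker for a single candidate word. The function name is_candidate is hypothetical (it is not part of the pipeline that follows), and the sample words are only for illustration.

```shell
# Hypothetical helper: succeeds if the word has exactly six letters
# and no letter occurs twice (compared case-insensitively).
is_candidate() {
    word=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    # Rule 1: exactly six lowercase letters after folding case.
    printf '%s\n' "$word" | grep -Eq '^[[:lower:]]{6}$' || return 1
    # Rule 2: a back-reference match means some letter appears twice.
    if printf '%s\n' "$word" | grep -Eq '(.).*\1'; then
        return 1
    fi
    return 0
}

for w in family Abdiel letter cats; do
    if is_candidate "$w"; then
        echo "$w: ok"
    else
        echo "$w: rejected"
    fi
done
# family: ok
# Abdiel: ok
# letter: rejected
# cats: rejected
```

Checking one word at a time like this is slow for a whole dictionary, which is why the rest of the post filters the file in bulk with grep.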
First, I run the following command against a dictionary file commonly found on Linux and BSD machines (on macOS it's /usr/share/dict/web2). The purpose is to extract all capitalized six-letter words and store them in a temp file called upper_unprocessed:
~$ grep -E '^[[:upper:]][[:lower:]]{5}$' /usr/share/dict/web2 > upper_unprocessed
Next, we need to process the upper_unprocessed file. I first convert the capital letters to lowercase, then pipe the results to grep to find the words that do not have any repeating letters (notice the -v flag). Finally, I capitalize the first letter of each word, essentially restoring the original look. The cleaned-up list is stored in the upper file.
~$ tr '[:upper:]' '[:lower:]' < upper_unprocessed | grep -Ev '(.)(.*\1){1}' | sed -E 's/(.)/\u\1/' > upper
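Why lowercase first? Here is a small sketch on four sample words (picked for illustration, not necessarily present in web2). It uses (.).*\1, a shorter pattern that should match the same lines as (.)(.*\1){1}, and it assumes a grep that accepts back-references with -E, as GNU grep does.

```shell
# Sample words chosen for illustration; not necessarily in web2.
words='Abdiel
Elbert
Beguin
Hebrew'

# Without lowercasing, "Elbert" slips through because "E" and "e" differ.
printf '%s\n' "$words" | grep -Ev '(.).*\1'
# Abdiel
# Elbert
# Beguin

echo '---'

# Lowercasing first lets the back-reference catch the repeated "e".
printf '%s\n' "$words" | tr '[:upper:]' '[:lower:]' | grep -Ev '(.).*\1'
# abdiel
# beguin
```

The group (.) captures any one character, and \1 requires that same character to show up again later on the line; -v then inverts the match so only words without a repeat survive.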
Now that the capitalized words are taken care of, let's look at the lowercase words. We find the six-letter words in the web2 dictionary, get rid of the ones with repeating letters, and store the rest in the lower file.
~$ grep -E '^[[:lower:]]{6}$' web2 | grep -Ev '(.)(.*\1){1}' > lower
Finally, we cat both the upper and lower files, sort all the words alphabetically (ignoring case with the -f flag), and output them to a file called combined.
~$ cat upper lower | sort -f > combined
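To see what -f buys us, here is a quick sketch on three words from the list. I pin the locale with LC_ALL=C only to make the difference reproducible; in other locales a plain sort may already ignore case.

```shell
# Plain byte-order sort: capital letters come before lowercase ones.
printf '%s\n' abduct Abdiel abdest | LC_ALL=C sort
# Abdiel
# abdest
# abduct

echo '---'

# With -f, case is folded before comparing, so "Abdiel" lands in the middle.
printf '%s\n' abduct Abdiel abdest | LC_ALL=C sort -f
# abdest
# Abdiel
# abduct
```

Since sort accepts file names directly, sort -f upper lower > combined should give the same result without the cat.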
Voila! Here is a sneak peek of the resulting combined file:
abdest
Abdiel
abduce
abduct
abeigh
abider
Abipon
abject
abjure
ablest
...
begoud
begowk
begray
begrim
Beguin
begulf
begunk
behalf
behind
behint
A quick wc -l shows over 8000 lines (or words, in this case), giving us plenty of choices. One thing I discovered during the process is how unused I am to the BSD versions of these tools. I learned the command line on Linux and am used to the GNU versions of things. Dealing with regex on a BSD machine is a little weird and frustrating. As a result, I grabbed the dictionary file from a Mac and did the processing inside a Debian system.
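One likely culprit is the \u in the sed replacement, which is a GNU sed extension and, as far as I know, is not understood by the BSD sed that ships with macOS. A minimal sketch of a more portable way to capitalize the first letter, assuming a POSIX awk is available (the function name capitalize is made up for this example):

```shell
# Hypothetical helper: upper-case the first character of every input line.
# toupper() and substr() are standard awk functions, so this should not
# depend on GNU-specific features.
capitalize() {
    awk '{ print toupper(substr($0, 1, 1)) substr($0, 2) }'
}

printf '%s\n' abdiel beguin | capitalize
# Abdiel
# Beguin
```

I have not re-run the whole pipeline with this on a Mac, so treat it as a starting point rather than a drop-in replacement.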
I can tell you that the final choice is the word family, which totally checks all the boxes and means a lot to, well, a family. Credit goes to Unix tools!
Wai, very Dai.