The Light of Human Reason

It’s strange…

… how, when working on one of your own projects, you soon reach the point where you feel a strong urge to re-write the whole thing from the start. You should usually resist this desire — there isn’t time! That said, it’s not necessarily a bad thing in itself. It means you’ve learned something new while working on the project. You may have been approaching a problem in one way, and then suddenly a new way presents itself. It may be an entirely new piece of functionality that you didn’t know the language or framework had, which could cut the length of your program by 25%. Or perhaps you just think ‘this is what I really meant when I wrote this’. Or perhaps you find you understand the business problem more thoroughly.

While it’s good to learn as you write, it’s better still to have a clear idea before you start. Discuss the problem you’re trying to solve with someone more experienced than you. They can see the ‘unknown unknowns’ that you can’t. It’s amazing how many of these there are, and it’s much easier for another person to see this than it is for you.

Another good thing to do is spend 15 minutes every day teaching yourself about some new functionality. Open the manual at random and see what comes up. This is a surprisingly good way to learn once you know the basics (but before you’re an expert). It hands you new tools, thereby preventing you from falling into the trap of seeing every problem as a nail just because the only tool you have is a hammer.

Deleting from a flat table

My current project is to build a dummy command prompt. The program reads in a list of files and directories from a text file (generated using the find command), and creates a basic directory tree and prompt based on the contents. It lets you create, move, copy or delete files within this pretend directory structure, completely isolated from the real one. The files don’t actually contain anything, and you can’t do anything with them. True, it’s not exciting from a user perspective, but that’s not the point. I’m learning a great deal about data structures, separation of the CLI user interface from the guts of the program (so I can build a web interface easily in the future), and other things. It’s also something I can easily build a Javascript front-end for. The point (at this stage!) isn’t precisely what my program does, but how well I write it.

Anyway, one problem I’m facing at the moment is deletion of files. To understand why this is a problem requires a bit of background explanation. My file/directory structure is based on a simple 1-dimensional array, and each file is referenced using its position in the array. Each element of the array (ie each file record) contains a hash with two values: the name of the file/directory, and its parent directory’s number in the array. (I’ve also added some other, custom fields, but I’m describing the bare bones.) This is a standard and very simple way of storing hierarchical data.

Let’s suppose we have an extremely simple file structure with the following files:

/ (root directory)
/home
/home/edward
/home/edward/somework.txt
/home/bob
/home/bob/email.eml

This will look like the following in the array:

Position in array   Name                         Parent’s position in array
0                   / (ie the root directory)    nil
1                   home                         0
2                   edward                       1
3                   somework.txt                 2
4                   bob                          1
5                   email.eml                    4

So far, so simple. This is enough information to reproduce the entire directory structure, whether we have six files, as in the example above, or six billion.
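
As a rough sketch, the table above might be represented in Ruby something like this (the :name and :parent keys and the full_path helper are illustrative names only; as mentioned, my real records carry a few extra custom fields):

    # The flat table above as an array of hashes: each record stores its
    # own name and the array position of its parent.
    files = [
      { name: '/',            parent: nil },  # 0: the root directory
      { name: 'home',         parent: 0 },    # 1
      { name: 'edward',       parent: 1 },    # 2
      { name: 'somework.txt', parent: 2 },    # 3
      { name: 'bob',          parent: 1 },    # 4
      { name: 'email.eml',    parent: 4 }     # 5
    ]

    # Rebuilding a full path is just a walk up the parent references.
    def full_path(files, index)
      parts = []
      while index
        parts.unshift(files[index][:name])
        index = files[index][:parent]
      end
      File.join(*parts)
    end

    full_path(files, 3)  # => "/home/edward/somework.txt"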

I’ve also added a further field to each array element, called ‘Children’, which holds an array of that record’s children’s positions. I know this isn’t strictly necessary, since it contains the same information as the Parent field in reverse. I did it on the same basis that databases have indexes: to speed things up. It would be too slow to search the entire table for a given parent’s children if I wanted to work with, say, the file tree of an entire Linux installation.

So our table now looks like this:

Position in array   Name                         Parent’s position in array   Children’s positions in array
0                   / (ie the root directory)    nil                          { 1 }
1                   home                         0                            { 2, 4 }
2                   edward                       1                            { 3 }
3                   somework.txt                 2                            { }
4                   bob                          1                            { 5 }
5                   email.eml                    4                            { }
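
In Ruby terms (again only a sketch, with an assumed :children key holding that array of positions), listing a directory becomes a direct lookup rather than a scan of the whole table:

    # The same records with a :children array added to each one.
    files = [
      { name: '/',            parent: nil, children: [1] },
      { name: 'home',         parent: 0,   children: [2, 4] },
      { name: 'edward',       parent: 1,   children: [3] },
      { name: 'somework.txt', parent: 2,   children: [] },
      { name: 'bob',          parent: 1,   children: [5] },
      { name: 'email.eml',    parent: 4,   children: [] }
    ]

    # The contents of /home, without searching every record for parent == 1:
    files[1][:children].map { |i| files[i][:name] }  # => ["edward", "bob"]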

The problem with deletion

Continuing with the example above, suppose I want to delete the directory /home/edward, and all its contents. The first step is obviously to delete record 2. The next step is to look up all record 2’s children (in this case, just record 3), and delete them. Finally, element 2 should be removed from the list of parent 1’s children.

All this is fairly simple. But there’s a problem. Once elements 2 and 3 have been removed from the array, elements 4 and 5 become elements 2 and 3, because they have been moved ‘down’ the array. (At least in Ruby.) Any references to 4 and 5 now point to non-existent elements. So we need to modify our references. All references to 4 need to become references to 2, and all references to 5 need to become references to 3. If we didn’t do this, /home would have references to children that didn’t really exist, since 4 (and 4’s child, 5) are now nonexistent records. In a larger table than our example, they would reference the wrong elements.
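
You can see the shift in a couple of lines of Ruby, since Array#delete_at removes an element and closes up the gap behind it:

    a = ['/', 'home', 'edward', 'somework.txt', 'bob', 'email.eml']
    a.delete_at(3)  # remove 'somework.txt' (the child) first
    a.delete_at(2)  # then 'edward'
    a               # => ["/", "home", "bob", "email.eml"]
    a[2]            # => "bob" -- what used to sit at position 4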

But then we hit a further snag. How do we tell our program how much to modify our references by? If we have a larger array and are deleting (say) elements 100 and 200, then references to elements between 101 and 199 need to be reduced by one, while references to elements above 200 need to be reduced by two. And if elements 100 and 200 have children (which perhaps have further children of their own), then keeping track of how much to subtract from each reference gets more complicated still.

So I went for the slow-but-simple solution: every time we delete an element, run through the entire array and subtract one from all references greater than that element’s number. There were several other solutions that would have been quicker, such as adding a ‘deleted = true’ tag and tidying up later, or making the entire element nil. But this option gave the least potential for something to go wrong later, and I was prepared to sacrifice some performance for that simplicity, at least at this stage.
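
In outline (a rough sketch rather than my exact code, using the :parent and :children keys from earlier, with fix_references! as an illustrative name), that fix-up pass looks something like this:

    # After deleting the record at removed_index, run through the whole
    # table and shift every reference that pointed past it down by one.
    def fix_references!(files, removed_index)
      files.each do |record|
        if record[:parent] && record[:parent] > removed_index
          record[:parent] -= 1
        end
        record[:children].map! { |c| c > removed_index ? c - 1 : c }
      end
    end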

I did this recursively on a deleted element’s children, so they would be removed in the same way before their parent. This worked pretty well, but caused one further problem which I’ll discuss in another post.
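
Roughly (again a sketch rather than my exact code, building on the fix_references! helper above), the recursive deletion looks like this. It leans on the fact that a table built from find output always lists a directory before its contents, so a child’s position is always higher than its parent’s and the parent’s own index isn’t disturbed while its subtree is removed:

    # Delete a record and everything beneath it, children first.
    def delete_entry!(files, index)
      # Re-read the live children list each time round the loop, because
      # every deletion renumbers whatever comes after it.
      until files[index][:children].empty?
        delete_entry!(files, files[index][:children].first)
      end

      # Detach the record from its parent's list of children ...
      parent = files[index][:parent]
      files[parent][:children].delete(index) if parent

      # ... then remove it and repair every remaining reference.
      files.delete_at(index)
      fix_references!(files, index)
    end

    delete_entry!(files, 2)  # removes /home/edward and everything inside it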


Theory and practice

From time to time, I dabble in philosophy, and much of that involves writing essays. As well as communicating an idea to others, writing helps you get your own thoughts in line. Writing allows — or forces — you to structure your thoughts and deal with any logical errors. It makes you put into clear writing what was previously only a mish-mash of vaguely connected ideas. Clear writing can flow only from clear thought, and therefore unclear thought must become clear before you write it down.

Writing flows from thought, but it also determines it. Your thoughts about a given topic will determine what you write, but they will (to a degree) be determined by it as well. That’s one reason that using language accurately is so important. Sloppy language not only results from sloppy thinking, but causes it, in yourself as well as others.

Programming can be similar. True, it differs from essay-writing in that you need a clear idea of the problem you’re trying to solve before you start. But very often, the actual details of how to get there must be worked out en route. You’ll have a rough idea of how to solve a problem, but you won’t have the path entirely laid out until it’s on the screen in front of you. Once you’ve typed out a rough solution, you can make improvements to something that already exists, instead of trying to deal with everything in theoryland.

This is fine when you’re tinkering, but when you’re limited by time, it can be a problem. Endless re-works are not what any business wants! So how do you work around it? Firstly, recognize the problem at hand. As a junior, you’re not going to know the best way to do something first time (or even, necessarily, a good way). But try your best anyway. Don’t fall into the trap of perfectionism, but do plan out a rough idea of how you’re going to solve a problem before you fire up your IDE. Write down, in logical steps, how you’re going to get from point A to point B. Once you’re clear on that, you can translate it into code.

Secondly, make sure you know the problem in the first place! Can you describe in clear English what your stakeholder/client/manager wants you to do? If not, talk to them until you can.

Thirdly, write unit tests as you go along.

Finally, comment your code as you go, for your own benefit as much as for others’.

Of course, you can’t change some things overnight — like your experience level. A 20-year-old undergrad won’t have the vocabulary or the complexity of thought of a 40-year-old professor. And a junior won’t have the knowledge of a senior programmer — neither of the language in question, nor of programming concepts in general. You should work on this in the long term, but meanwhile, accept these limitations, and seek others’ input.

Finally, you need humility, a necessary factor for success in any undertaking. Try new things. Be prepared to fail. If what you’re doing doesn’t work, change it.

The danger of perfectionism

Perfectionism can be a serious danger for programmers. It can manifest itself in all kinds of ways.

Often, you can be extremely dissatisfied with your code when it’s so far from perfection. This has been happening to me a lot lately. I’m in the early stages of learning Ruby, and my code is something of an ugly mix between functional and object-oriented. My code works, but it often seems to break the style and ‘spirit’ of the language. I often seem to think and write procedurally, rather than object-ly. I’m not happy with what I write, and wish it could be better.

The wrong response is to spend days and days seeking some kind of Platonic ideal of perfection. The correct response, of course, is to press on anyway, make incremental improvements, write something that seems reasonably good and does the job, and then return to it after a day or so. And then seek others’ comments on it.

Perfectionism shows up in other ways too. I did plenty of coding as a teenager, but recently returned to it as an adult. I spent a long time agonizing over which language to learn: which would teach me the most, which would be the most future-proof, which would have the best job prospects, etc. In the end, I took a promising online course which teaches Javascript and C#, while continuing Ruby in my spare time.

It’s easy to fall into the trap of seeking some kind of pure perfection. Logical people — that is, programmers — often fall into it, because the abstract logic they thrive on does have a kind of purity and perfection to it. Physical reality doesn’t: it doesn’t move so cleanly or in such a structured way. Of course, physical reality intersects with logic in all kinds of ways — otherwise it would be entirely unpredictable. But every situation is unique and changing, and so arriving at abstract perfection is impossible.

You want to do what you have to do in the best way you can, and then move on.