Thoughts on debugging
March 27, 2022 • [playbooks] #systems #debugging
I recently found myself working with some new graduates and helping them through a debugging process. They took their time with the code, and it was an interesting experience watching them go about it. I was in their place a decade ago and almost certainly just as bad. I have learnt a few tricks since then, and this gave me an opportunity to articulate them, so here goes.
1. Know what you want to achieve
First you must know what the system requirements are. Any bugs or problems you encounter are bugs or problems only because they prevent you from achieving your system goals.
2. Describe the problem(s)
Explain, in writing, what the current problems of the system are. It's important to write this down rather than just talk about it, because:
- You get more clarity on the problem. Sometimes it will make the solution obvious.
- You can use it for documentation.
When writing, remember to talk only about the problem. Don't confuse potential solutions with the problem statement. We will get to solutions later.
3. Pick one problem to solve
Tackle one problem at a time. This is actually the fastest way to get through your problems list. Don’t try to do more than one thing at a time. Slow is smooth, smooth is fast.
Debugging consists of two stages: figuring out what’s causing the bugs you see, and squashing those bugs.
4. Root cause analysis
a. Generate hypotheses
Generate a list of hypotheses about why you might be seeing the problem.
Your list will be a combination of what you know from experience and what Stack Overflow tells you.
You must have a model in your head of what’s happening before you start tackling the issue. You will constantly revisit and update this list.
b. Validate hypotheses
Conduct experiments to verify which of the hypotheses holds true.
If you are going to comment out parts of the code while doing this, make sure you do it in an organized manner. Keep a record of the experiments you run so that you don't waste effort repeating them.
Ideally, your code should already be modular and customizable, so you can turn off components with a flag when experimenting.
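As a rough sketch of what "turn off components with a flag" could look like, here is a tiny Python pipeline. The stage names (`normalize`, `deduplicate`, `sort_output`) and the `Flags` class are made up for illustration; your own components will look different.

```python
from dataclasses import dataclass


@dataclass
class Flags:
    """Hypothetical switches, one per component that can be turned off."""
    normalize: bool = True
    deduplicate: bool = True
    sort_output: bool = True


def run_pipeline(records, flags):
    """Run only the stages whose flag is on."""
    out = list(records)
    if flags.normalize:
        out = [r.strip().lower() for r in out]
    if flags.deduplicate:
        out = list(dict.fromkeys(out))  # drops repeats, keeps first-seen order
    if flags.sort_output:
        out = sorted(out)
    return out


records = [" Beta", "alpha ", "beta", "Alpha"]
full = run_pipeline(records, Flags())
no_dedup = run_pipeline(records, Flags(deduplicate=False))
```

Comparing `full` with `no_dedup` tells you what the deduplication stage contributes without touching or commenting out any code.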
To conduct experiments,
- Get your code in order like I have indicated above. The whole purpose is to isolate the problem you have picked. Turn off other components unrelated to this issue. We want to reproduce this one issue consistently, with the simplest set of inputs.
- Sometimes, you will have to write new, simpler code that exercises the components with the issue, especially if the code base is big.
- Increase complexity if you are unable to reproduce the issue with the simplest use case. Include more and more components.
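One way to arrive at "the simplest set of inputs" is to start from an input that fails and keep dropping pieces while the failure still reproduces. The sketch below is a naive greedy reduction, not a full delta-debugging algorithm, and `buggy_total` is a hypothetical stand-in for whatever component you are isolating.

```python
def buggy_total(values):
    """Stand-in for the component under test (hypothetical).

    It is meant to sum the values but crashes on a negative number.
    """
    total = 0
    for v in values:
        if v < 0:
            raise ValueError("negative value")
        total += v
    return total


def fails(values):
    """True if the issue reproduces with this input."""
    try:
        buggy_total(values)
    except ValueError:
        return True
    return False


def shrink(values):
    """Greedily drop elements one at a time while the failure persists."""
    current = list(values)
    i = 0
    while i < len(current):
        candidate = current[:i] + current[i + 1:]
        if fails(candidate):
            current = candidate  # element i was not needed; retry same index
        else:
            i += 1  # element i is needed to reproduce; keep it
    return current


failing_input = [3, 8, -2, 5, 13]
minimal = shrink(failing_input)
```

Here `minimal` ends up holding only the element that triggers the failure, which is a much easier starting point for an experiment than the original input.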
c. Repeat 4a and 4b
Repeat steps a and b till you know what's causing the problem with high certainty. At the end of this process you should know what the root cause of the issue is.
In each iteration, refine the hypotheses, the experiments, or both. Do not keep doing the same thing again and again and expect different results. You don’t want to waste time. Refine. Add new hypotheses, discard old ones. Update experiments.
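If a text file feels too loose for tracking this, a small structure like the one below can hold the hypotheses and the experiments run against each. It is only a sketch: the `Hypothesis` class, the status values and the example statements are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    statement: str
    status: str = "open"  # "open", "confirmed", or "rejected"
    experiments: list = field(default_factory=list)

    def record(self, experiment, observation, status=None):
        """Note what was tried and seen; optionally update the status."""
        self.experiments.append((experiment, observation))
        if status is not None:
            self.status = status


def open_hypotheses(log):
    """Statements that still need an experiment to settle them."""
    return [h.statement for h in log if h.status == "open"]


# Hypothetical debugging session.
log = [
    Hypothesis("Cache returns stale entries"),
    Hypothesis("Input file has mixed encodings"),
]
log[0].record("Ran with cache disabled", "Issue still occurs", status="rejected")
log.append(Hypothesis("Retry logic sends the request twice"))
remaining = open_hypotheses(log)
```

Rejected hypotheses stay in the log with their experiments attached, so you can see at a glance what has already been ruled out and why.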
5. Squash the bug
Now that you know what the root cause is, figure out a solution. Use Google. Ask your fellow engineers. Some solutions are readily apparent once you discover the root cause. Others need time and a night's sleep. Some can be solved with the tools we have at the moment. Others will require a different set of tools that may not be easy to acquire.
Getting to the root cause is 70% of debugging [1]. Solving the problem requires a different set of heuristics, but many problems can be tackled with just Stack Overflow and asking your fellow engineers.
6. Document the problem and the solution
Document the problem and the solution, both for yourself and for others. This is as important as the previous steps. In your programming and problem-solving career, you will keep hitting the same issues again and again, and you want to be able to refer to your previous attempts at solving them.
7. Repeat 3 - 6
Repeat steps 3 to 6 with the other problems in your list.
[1] I made this up. It needn't be 70%, but it's a not insignificant number and likely higher.