Our solutions engineering team at Noibu has crafted a data-driven approach to investigating and resolving errors. At its core, this approach consists of four steps:
- Step 1: Gather Data
- Step 2: Form Hypotheses
- Step 3: Investigate Hypotheses
- Step 4: Implement the Fix
We'll explore each of these steps in detail in other articles, but let's go through them at a high level to give you a bird's-eye view of the process from start to finish.
Step 1: Gather Data
The first step is obvious: gather as much data as possible about the error. It's natural to want to dive into the code and try to reproduce the error, but it can be a waste of time, or even dangerous, to plow forward based on early assumptions. Even if your first assumption turns out to be correct, it's valuable in the long term to see the complete picture. Errors rarely occur in isolation, and collecting a broad range of data may help you resolve other, related errors down the line.
This step relies on four key information-gathering questions:
- WHAT is the problem?
- WHERE is the error occurring?
- WHEN did/does the error occur?
- WHO has the information we need?
Once you have ample information to answer these questions, you're ready to move on to Step 2.
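One way to keep track of these four questions is a small record with one field per question, so it's easy to see which ones are still open. The sketch below is only an illustration: the `ErrorNotes` class, the `unanswered_questions` helper, and the sample values are hypothetical and are not part of Noibu.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ErrorNotes:
    """Hypothetical record of what is known so far about one error."""
    what: Optional[str] = None   # WHAT is the problem?
    where: Optional[str] = None  # WHERE is the error occurring?
    when: Optional[str] = None   # WHEN did/does the error occur?
    who: Optional[str] = None    # WHO has the information we need?


def unanswered_questions(notes: ErrorNotes) -> List[str]:
    """Return the questions (in field order) that have no answer yet."""
    return [f.name.upper() for f in fields(notes) if not getattr(notes, f.name)]


# Example with made-up values: two questions answered, two still open.
notes = ErrorNotes(
    what="Checkout button throws a TypeError",
    where="Checkout page, mobile browsers",
)
print(unanswered_questions(notes))
```

When the helper returns an empty list, all four questions have at least some answer recorded. Whether that answer is ample enough to move on is still a judgment call.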
Step 2: Form Hypotheses
The second step builds upon the first: use the data you've gathered to brainstorm what the problem COULD be. This is an opportunity to cast a wide net and consider obscure options along with likely contenders. Open-mindedness is key at this stage. Avoid ruling anything out or making assumptions without a solid justification from the data.
This step considers the remaining information-gathering questions:
- WHY is the error occurring?
- HOW might the error be resolved?
Once you have a solid list of hypotheses, you're ready to move on to Step 3.
Step 3: Investigate Hypotheses
The third step is all about proving or disproving your hypotheses from Step 2. There are a few ways to do this, including reproduction, which we'll discuss in another article. The goal here is to eliminate hypotheses one by one until you're left with the most likely culprit: the one supported by both your data and your investigation.
Once you've zeroed in on the error's cause and source, you can move on to Step 4.
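The elimination idea in this step can be sketched in a few lines: pair each hypothesis with a check against the evidence, and keep only the hypotheses the evidence doesn't contradict. Everything here is hypothetical and for illustration only, including the `Hypothesis` class, the evidence keys, and the sample values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Evidence = Dict[str, str]


@dataclass
class Hypothesis:
    """A candidate cause plus a check that the evidence does not contradict it."""
    description: str
    is_consistent: Callable[[Evidence], bool]


def surviving_hypotheses(
    hypotheses: List[Hypothesis], evidence: Evidence
) -> List[Hypothesis]:
    """Drop every hypothesis that the gathered evidence contradicts."""
    return [h for h in hypotheses if h.is_consistent(evidence)]


# Made-up evidence and hypotheses.
evidence = {"browser": "Safari", "page": "checkout"}
candidates = [
    Hypothesis("Safari-specific API gap", lambda e: e.get("browser") == "Safari"),
    Hypothesis("Broken script on the cart page", lambda e: e.get("page") == "cart"),
]
for h in surviving_hypotheses(candidates, evidence):
    print(h.description)
```

In practice the checks are rarely this mechanical, but listing each hypothesis next to the evidence that would rule it out keeps the investigation honest.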
Step 4: Implement the Fix
The fourth step is the juicy part: fixing the error. Once you've deployed the fix to your code, you can use Noibu to confirm that the error has been resolved. If the fix was successful, occurrences of the error will stop dead in their tracks. If the fix was unsuccessful, and occurrences continue apace, return to Step 3 and pursue another hypothesis.
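As a rough illustration of what "confirming" means here, the sketch below counts error occurrences before and after a deploy time. It assumes you have a list of occurrence timestamps exported from your monitoring tool. It does not use any Noibu API, and the function names and dates are made up.

```python
from datetime import datetime
from typing import Iterable, Tuple


def occurrences_around_deploy(
    timestamps: Iterable[datetime], deployed_at: datetime
) -> Tuple[int, int]:
    """Count occurrences strictly before, and at or after, the deploy time."""
    stamps = list(timestamps)  # allow any iterable, including generators
    before = sum(1 for t in stamps if t < deployed_at)
    after = sum(1 for t in stamps if t >= deployed_at)
    return before, after


def fix_looks_successful(
    timestamps: Iterable[datetime], deployed_at: datetime
) -> bool:
    """Treat the fix as successful only if nothing occurred after the deploy."""
    _, after = occurrences_around_deploy(timestamps, deployed_at)
    return after == 0


# Made-up data: two occurrences before the deploy, none after.
deployed_at = datetime(2024, 1, 10)
seen = [datetime(2024, 1, 8), datetime(2024, 1, 9)]
print(occurrences_around_deploy(seen, deployed_at))
print(fix_looks_successful(seen, deployed_at))
```

A zero count right after a deploy is only meaningful once enough time and traffic have passed for the error to have had a chance to recur.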
Now that you understand the process at a high level, let's circle back and do a deep dive into each of these steps, starting with Step 1: Gather Data.