Bug Tracker Hell and How To Get Out!

Whether you call it a defect, bug, change request, issue, or enhancement, you need an application to record and track the life cycle of these problems.  For brevity, let’s call it the Bug Tracker.

Bug trackers are like a roach motel: once defects check in, they don’t check out!  Because they are append-only, shouldn’t we be careful and disciplined when we add “tickets” to the bug tracker?  We should, but in the chaos of a release (especially at start-ups :-)) the bug tracker goes to hell.

Bug Tracker Hell happens when inconsistent usage of the tool leads to problems such as duplicate bugs and inconsistent priorities and severities.  While 80% of defects are straightforward to add to the Bug Tracker, it is the other 20% that cause real problems.

The most important attribute of a defect is its DefectLifecycleStatus; not surprisingly, every Bug Tracker makes this the primary field for sorting.  This field is used to generate reports and to manage the defect removal process.  If we manage it carefully we can generate reports that not only help the current version but also provide key feedback for post-mortem analysis.

Every Bug Tracker has at least the states Open, Fixed, and Closed; however, due to special cases we are tempted to create new statuses for problems that have nothing to do with the life cycle.  The creation of statuses that are not life cycle states is what causes inconsistent usage of the tool, because it then becomes unclear how to enter a defect.

It is much easier to have consistent life cycle states than a 10-page manual on how to enter a defect.


What Life Cycle States Do We Need?

Clearly we want to know how many Open defects need to be fixed for the current release; after all, management is often breathing down our neck to get this information.

Ideally we would get the defects outstanding report by finding out how many defects are Open. Unfortunately, there are numerous open defects that will not be fixed in the current release (or ever 🙁 ) and so we seek ways to remove those defects from the defects outstanding.

Why complicate life?

In particular, we are tempted to create states like Deferred, WontFix, and FunctionsAsDesigned to remove defects from the defects outstanding.  These states have the apparent effect of simplifying the defects outstanding report but end up complicating other matters.

For example, Deferred is simply an open defect that is not getting fixed in the current release; WontFix is an open defect that the business has decided not to fix; and FunctionsAsDesigned indicates that either the requirements were faulty or QA saw a phantom problem, but once this defect gets into the Bug Tracker you can’t get it out.

All three states above are variants of the Open life cycle state, and creating them will create more problems than they solve. The focus of this article is on how to fix the defect life cycle; however, other common issues are addressed as well.

 

Creating life cycle states for Deferred, WontFix, or FunctionsAsDesigned is like drawing a “Go directly to Bug Tracker Hell” card!

Each Defect Must Be Unambiguous

The ideal state of a Bug Tracker is to be able to look at any defect in the system and have a clear answer to each of the following questions.

  • Where is the defect in the life-cycle?
  • Has the problem been verified?
  • How consistently can the problem be reproduced or is it intermittent?
  • Which team role will resolve the issue? (team role, not person)

The first step to getting out of hell is to be consistent with the life cycle state.

Defect Life Cycle

All defects go through the following life cycle (DefectLifecycleStatus) regardless of whether we track all of these states or not:

  • New
  • Verified
  • Open
  • Work in Progress
  • Work complete
  • Fixed
  • Closed
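
These states form a simple state machine.  Below is a minimal sketch of how the transitions described in this article could be enforced; the state names come from the list above, but the transition table and the Python representation are illustrative assumptions, not any particular tool’s workflow:

```python
from enum import Enum

class LifecycleStatus(Enum):
    NEW = "New"
    VERIFIED = "Verified"
    OPEN = "Open"
    WORK_IN_PROGRESS = "Work in Progress"
    WORK_COMPLETE = "Work complete"
    FIXED = "Fixed"
    CLOSED = "Closed"

# Transitions implied by this article: phantoms and duplicates close straight
# from New; a partially fixed defect returns from Work complete to Verified.
ALLOWED = {
    LifecycleStatus.NEW:              {LifecycleStatus.VERIFIED, LifecycleStatus.CLOSED},
    LifecycleStatus.VERIFIED:         {LifecycleStatus.OPEN, LifecycleStatus.CLOSED},
    LifecycleStatus.OPEN:             {LifecycleStatus.WORK_IN_PROGRESS},
    LifecycleStatus.WORK_IN_PROGRESS: {LifecycleStatus.WORK_COMPLETE},
    LifecycleStatus.WORK_COMPLETE:    {LifecycleStatus.FIXED, LifecycleStatus.OPEN, LifecycleStatus.VERIFIED},
    LifecycleStatus.FIXED:            {LifecycleStatus.CLOSED},
    LifecycleStatus.CLOSED:           set(),
}

def transition(current: LifecycleStatus, target: LifecycleStatus) -> LifecycleStatus:
    """Move a defect to a new status, rejecting moves the workflow forbids."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

In practice you would configure these same rules in your Bug Tracker’s workflow editor rather than in code; the point is that the legal transitions are written down once, not negotiated per defect.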

Anyone should be able to enter a New defect, but just because someone thinks “I tawt I taw a defect!” doesn’t mean that the defect is real.  In poorly specified software systems QA will often perceive a defect where there is none, the famous functions-as-designed (FAD) issue.

Since there are duplicate and phantom issues that are entered into the Bug Tracker, we need to kick the tires on all New defects before assigning them to someone.  It is much faster and cheaper to verify defects than to simply throw them at the development team and assume that they can fix them.

Trust But Verify

New defects not entered by QA should be assigned to the QA role.  These defects should be verified by QA before the life cycle status is updated to Verified.  QA should also make sure that the steps to reproduce the defect are complete and accurate before moving the defect to the Verified life cycle status.  Ideally even defects entered by QA should be verified by someone else in QA to make sure that the defect is entered correctly.

By introducing a Verified state you separate out potential work from actual work. If a bug is a phantom then QA can mark it as Closed before we assign it to someone and waste their time.  If a bug is a duplicate then it can be marked as such, linked to the other defect, and Closed.

The advantage of the Verified status is that intermittent bugs get more attention to figure out how to reproduce them.  If QA discovers that a defect is intermittent, then a separate field in the Bug Tracker, Reproducibility, should be populated with one of the following values:

  • Always (default)
  • Sometimes
  • Rare
  • Can’t reproduce

Note: This means that bugs that cannot be reproduced stay in the New state until you can reproduce them.  If you can’t reproduce them then you can mark the issue as Closed without impacting the development team.
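
A minimal sketch of the Reproducibility field (the values and the Always default come from the list above; the enum representation is an assumption):

```python
from enum import Enum

class Reproducibility(Enum):
    ALWAYS = "Always"              # the default
    SOMETIMES = "Sometimes"
    RARE = "Rare"
    CANT_REPRODUCE = "Can't reproduce"

# Per the note above: a defect marked "Can't reproduce" keeps the New status,
# and can later be Closed without ever reaching the development team.
```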

Assign the Defect to a Role

QA has a tendency to assume that all defects are coding defects; however, analysis of 18,000+ projects does not confirm this.  In The Economics of Software Quality, Capers Jones and Olivier Bonsignour show that defects fall into different categories. Below we give the category, the frequency of defects in that category, and the business role that will address them.

Note, only the architecture, code, and database rows below are assigned to developers.

Defect Category | Frequency | Role
Requirements defect | 9.58% | BA/Product Management
Architecture or design defect | 14.58% | Architect
Code defect | 16.67% | Developer
Testing defect | 15.42% | Quality Assurance
Documentation defect | 6.25% | Technical Writer
Database defect | 22.92% | DBA
Website defect | 14.58% | Operations/Webmaster
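
Since the category-to-role mapping is fixed, triage can consult it mechanically.  Here is a sketch using the table above (the data comes from the Jones and Bonsignour table; the dictionary representation is an assumption):

```python
# Defect category -> (frequency, role that owns the fix), per the table above.
ROLE_BY_CATEGORY = {
    "Requirements defect":           (0.0958, "BA/Product Management"),
    "Architecture or design defect": (0.1458, "Architect"),
    "Code defect":                   (0.1667, "Developer"),
    "Testing defect":                (0.1542, "Quality Assurance"),
    "Documentation defect":          (0.0625, "Technical Writer"),
    "Database defect":               (0.2292, "DBA"),
    "Website defect":                (0.1458, "Operations/Webmaster"),
}

def role_for(category: str) -> str:
    """Route a triaged defect to the role that owns its category."""
    return ROLE_BY_CATEGORY[category][1]
```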

Defect role categories are important for accelerating your overall development speed!

Even if all architecture, design, coding, and database defects are handled by the development group, this only represents 54% of all defects.  So assigning any New defect to the development group without verification is likely to cause problems inside the team.

Note, 25% of all defects are caused by poor requirements and bad test cases, not bad code.  This means that the business analysts and QA folks are responsible for fixing them.

Given that 46% of all defects are not resolved by the development team, there needs to be a triage before a bug is assigned to a role.  Lack of bug triage is the root cause of ‘fire-fighting’ in software projects (see Root cause of ‘Fire-Fighting’ in Software Projects).

The Bug Tracker should be extended to record the DefectRole in addition to the assigned attribute.  This attribute alone will help to straighten out the Bug Tracker!
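
As a sketch of that extension, here is a hypothetical defect record (not any particular tool’s schema) showing DefectRole tracked separately from the assignment:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Defect:
    defect_id: int
    summary: str
    status: str = "New"                  # DefectLifecycleStatus
    defect_role: Optional[str] = None    # set at triage, e.g. "Developer", "DBA"
    assigned_to: Optional[str] = None    # a role by default; a person only when unavoidable
    reproducibility: str = "Always"      # see the Reproducibility values above
    resulted_from_defect: Optional[int] = None  # link discussed later, for regression tracking
```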

Non-development Defects

Most Bug Tracking systems have a category called enhancement.  Enhancements are simply defects in the requirements and should be recorded but not specified in the Bug Tracker; the defect should be Open with a DefectRole of ProductManagement.

Enhancements need to be assigned to product managers/BAs, who should document the requirement and include a reference to that documentation in the defect.  The description field of the defect is not the proper place to keep requirements documentation.  The life cycle of a product requirement is generally very different from that of a code defect because the requirement is likely to be deferred to a later release if you are late in your product cycle.

Business requirements may have to be confirmed with the end users and/or approved by the business.  As such, they generally take longer to become work items than code defects.

QA should not send enhancements to development without involvement of product management.

Note that 15.42% of defects are a QA problem and are fixed in the test plans and test cases.

Bug Triage

The only way to correctly assign resources to fix a defect is to have a triage team that meets regularly and identifies what each problem is.  A defect triage team needs to include a product manager, a QA person, and a developer.  The defect triage team should meet at least once a week during development and at least once a day during releases.  Defect triages save you time because only 54% of defects can be fixed by the developers; correctly assigning defects avoids miscommunication.

Bug triage meetings are efficient when their only purpose is to correctly assign defects.  Be aggressive and keep design discussions out of triages.

Defects should be assigned to a role and not a specific person to allow maximum flexibility in getting the work done; they should only be assigned to a specific person when there is only one person who can resolve an issue.

Assigning unverified and intermittent defects to the wrong person will start your team playing the blame game.

As defects are triaged, product management (not QA) should set the priority and severity, as they represent the business.  With a multi-functional team these two values will be set consistently.  In addition, the triage team should set the version in which the defect will be fixed.  Some teams like to put the actual version number where a defect gets fixed (i.e. ExpectedFixVersion); I prefer to use the following values:

  • Next bug fix
  • Next minor release
  • Next major  release
  • Won’t fix

I like ExpectedFixVersion because it is conditional; it represents a desire.  Like it or not, it is very hard to guess when every defect will be fixed.  The reality is that if the release date gets pulled in, or the work turns out to be more involved than expected, the fix version could be deferred (possibly indefinitely).  If you guess wrong then you will spend a considerable amount of time changing this field.

Getting the Defect Resolved

Once the defects are in the system, each functional role can assign the work to its resources.  At that point the defect life cycle state is Work in Progress.

All Work complete means is that the individual working on the defect believes that it is resolved.  When the work is resolved, the FixVersion should be set to the next version that will be released.  Note, if you use release numbers in the ExpectedFixVersion field then you should update that field if it is wrong 🙂

Of course the defect may or may not be resolved; however, the status of Work complete acts as a signal that someone else has work to do.

If a requirements defect is fixed then the issue should be moved to Fixed and assigned to the development manager who will give the work to their team.  Once the team has verified their understanding of the requirement, the defect can move from Fixed to Closed.

Work complete means that the fixer believes that the problem is resolved; Fixed means that the team has acknowledged the fix!

For code defects the Work complete status is a signal to QA to retest the issue.  If QA establishes that the defect is fixed they should move the issue to Fixed.  If the issue is not fixed at all then the defect should move back to Open; if the defect is partially fixed then the defect should move to Verified so that it goes back through the bug triage process (i.e. severity and priority may have changed).

Once a release is complete, all Fixed items can be moved to Closed.
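
The retest rules above reduce to a three-way branch.  A minimal sketch (the outcome labels are illustrative assumptions; the resulting statuses are the ones described in this article):

```python
def status_after_retest(outcome: str) -> str:
    """Next DefectLifecycleStatus after QA retests a Work complete code defect.
    outcome is one of "fixed", "partially_fixed", "not_fixed" (illustrative labels)."""
    return {
        "fixed": "Fixed",               # QA confirms; moves to Closed when the release ships
        "partially_fixed": "Verified",  # back through triage: severity/priority may have changed
        "not_fixed": "Open",            # straight back to the work queue
    }[outcome]
```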

Tracking Defects Caused by Fixing Defects

Virtually all Bug Trackers allow you to link one or more issues together.  However, it is extremely important to know why bugs are linked; in most cases bugs are linked because they are duplicates.

Bugs can also be linked together because fixing one defect may cause another.  On average this happens once for every 14 defects fixed, but in the worst organizations it can happen once every 4 defects fixed.  Keeping a field called ResultedFromDefect, where you record the number of the originating defect, allows you to determine how many new defects are the result of fixing other defects.
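
With ResultedFromDefect populated, the rate at which fixes cause new defects is a one-pass count.  A sketch, assuming defects are simple dicts with a resulted_from_defect key as in the record sketch earlier:

```python
def regression_rate(defects: list[dict]) -> float:
    """Fraction of defects caused by fixing another defect, i.e. defects whose
    ResultedFromDefect link is populated.  The article's benchmarks: roughly
    1 in 14 on average, as bad as 1 in 4 in the worst organizations."""
    if not defects:
        return 0.0
    caused = sum(1 for d in defects if d.get("resulted_from_defect") is not None)
    return caused / len(defects)
```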

Summary

Let’s recap how the above mechanisms will help you get out of hell.

  1. By introducing the Verified step you make sure that bugs are vetted before anyone gets pulled into a wild goose chase.
    1. This will also catch intermittent defects and give them a home while you figure out how often they occur and whether there is a reliable way to reproduce them.
    2. If you can’t reproduce a defect then at least you can annotate it as Can’t Reproduce; the status stays as New and it doesn’t clog the system.
  2. By conducting triage meetings with product management, QA, and development you will end up with very consistent uses of priority and severity
  3. Bug triages will end up categorizing defects according to the role that will fix them which will reduce or eliminate:
    1. The blame game
    2. Defects being assigned to the wrong people
  4. By having the ExpectedFixVersion be conditional, you won’t have to run around fixing version numbers for defects that did not get fixed in a particular release.  It also gives you a convenient way to tag a defect as Won’t Fix; the status should go back to Verified.
  5. By having the person who fixes a defect set the FixVersion, you will have an accurate picture of when defects are fixed.
  6. When partially fixed defects go back to Verified the priority and severity can be updated properly during the release.

Benefits of the Process

By implementing the defect life cycle process above you will get the following benefits:

  • Phantom bugs and duplicates won’t sandbag the team
  • Intermittent bugs will receive more attention to determine their reproducibility
    • Reproducible bugs are much easier to fix
  • Proper triages will direct defects to the appropriate role
  • You will discover how many defects you create by fixing other defects

By having an extended set of life cycle states you will be able to start reporting on the following:

  • % of defects introduced while fixing defects (value in ResultedFromDefect)
  • % of New bugs that are phantoms or duplicates, relates to QA efficiency
  • % of defects that are NOT development problems, relates to extended team efficiency (i.e. DefectRole <> Development)
  • % of requirements defects which relates to the efficiency of your product management (i.e. DefectRole = ProductManagement)
  • % of defects addressed but not confirmed (Work complete)
  • % of defects fixed and confirmed (Fixed)
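
Each of these reports is a simple filter over the tracker’s records.  A sketch, reusing the hypothetical field names from the record sketch above (a resolution field for phantoms and duplicates is also an assumption):

```python
def defect_report(defects: list[dict]) -> dict[str, float]:
    """Compute the report percentages listed above from a list of defect records."""
    total = len(defects) or 1  # guard against an empty tracker

    def pct(pred) -> float:
        return 100.0 * sum(1 for d in defects if pred(d)) / total

    return {
        "% introduced while fixing defects": pct(lambda d: d.get("resulted_from_defect") is not None),
        "% phantom or duplicate":            pct(lambda d: d.get("resolution") in ("Phantom", "Duplicate")),
        "% not development problems":        pct(lambda d: d.get("defect_role") != "Developer"),
        "% requirements defects":            pct(lambda d: d.get("defect_role") == "BA/Product Management"),
        "% addressed but not confirmed":     pct(lambda d: d.get("status") == "Work complete"),
        "% fixed and confirmed":             pct(lambda d: d.get("status") == "Fixed"),
    }
```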

It may sound like too much work to change your existing process, but if you are already in Bug Tracker hell, what is your alternative?

Need help getting out of Bug Tracker hell?  Write to me at dmahal@AcceleratedDevelopment.ca

Appendix: Importance of Capturing Requirements Defects

The report on the % of requirements defects is particularly important because it represents the amount of scope shift (creep) in your project; you can see this in the blog post Shift Happens.  Rates of scope shift of 2% per month are strong indicators of impending swarms of bugs and project failure.  Analysis shows that the probability of a project being canceled is highly correlated with the amount of scope shift.  Simply creating enhancements in the Bug Tracker hides this problem and does not help the team.


Efficiency is for Losers

Focusing on efficiency and ignoring effectiveness is the root cause of most software project failures.

Effectiveness is producing the intended or expected result. Efficiency is the ability to accomplish a job with a minimum expenditure of time and effort.

Effective software projects deliver code that the end users need; efficient projects deliver that code with a minimum number of resources and time.

Sometimes, we become so obsessed with things we can measure, i.e. project end date, kLOC, that we somehow forget what we were building in the first place.  When you’re up to your hips in alligators, it’s hard to remember you were there to drain the swamp.

Efficiency only matters if you are being effective.

After 50 years, the top three end-user complaints about software are:

  1. It took too long
  2. It cost too much
  3. It doesn’t do what we need

Salaries are the biggest cost of most software projects; hence, if it takes too long then it will cost too much, so we can reduce the complaints to:

  1. It took too long
  2. It doesn’t do what we need

The first issue is a complaint about our efficiency and the second is a complaint about our effectiveness. Let’s make sure that we have common definitions of these two issues before continuing to look at the interplay between efficiency and effectiveness.

Are We There Yet?

Are you late if you miss the project end date? 

That depends on your point of view; consider a well-specified project (i.e. good requirements) with a good work breakdown structure that is estimated by competent architects to take a competent team of 10 developers at least 15 months to build. Let’s consider 5 scenarios where this is true except as stated below:

Under which circumstances is a project late?

A. Senior management gives the team 6 months to build the software.
B. Senior management assigns a team of 5 competent developers instead of 10.
C. Senior management assigns a team of 10 untrained developers.
D. You have the correct team, but each developer needs to spend 20-35% of their time maintaining code on another legacy system.
E. The project is staffed as expected.

Here are the above scenarios in a table:

# | Team | Resource Commitment | Months Given | Result
A | 10 competent developers | 100% | 6 | Unrealistic estimate
B | 5 competent developers | 100% | 15 | Understaffed
C | 10 untrained developers | 100% | 15 | Untrained staff
D | 10 competent developers | 65-80% | 15 | Team under-committed
E | 10 competent developers | 100% | 15 | Late


Only the last project (E) is late because the estimation of the end date was consistent with the project resources available.

Other well-known variations in which the project is not late when the end date is missed:

  • Project end date is a SWAG or management-declared
  • Project has poor requirements
  • You tell the end-user 10 months when the estimate is 15 months

If any of the conditions of project E are missing then you have a problem in estimation.  You may still be late, but not based on a project end date computed with bad assumptions.

Of course, being late may be acceptable if you deliver a subset of the expected system.

It Doesn’t Work



“It doesn’t do what we need” is a failure to deliver what the end user needs. How do we figure out what the end user needs?

The requirements for a system come from a variety of sources:

  1. End-users
  2. Sales and marketing (includes competitors)
  3. Product management
  4. Engineering

These initial requirements will rarely be consistent with each other. In fact, each of these constituents will have a different impression of the requirements. You would expect the raw requirements to be contradictory in places. The beliefs are like the 4 circles to the left, and the intersection of their beliefs would be the black area.

The different sources of requirements do not agree because:

  • Everyone has a different point of view
  • Everyone has a different set of beliefs about what is being built
  • Everyone has a different capability of articulating their needs
  • Product managers have varying abilities to synthesize consistent requirements

It is the job of product management to synthesize the different viewpoints into a single set of consistent requirements. If engineering starts before requirements are consistent then you will end up with many fire-fighting meetings and lose time.

Many projects start before the requirements are consistent enough. We hope the initial requirements are a subset of what is required. In practice, we have missed requirements and included requirements that are not needed (see the bottom of this post, data from Capers Jones).

The yellow circle represents what we have captured; the black circle represents the real requirements.

We rarely have consistent requirements when we start a project; that is why there are different forms of the following cartoon lying around on the Internet.

If you don’t do all the following:

  • Interview all stakeholders for requirements
  • Have product management get end-users to articulate their real needs
  • Synthesize consistent requirements

Then you will fail to build the correct software.  If you skip any of this work then you are guaranteed to get the response, “It doesn’t do what we need”.

Effectiveness vs. Efficiency

So, let’s repeat our user complaints:
  1. It took too long
  2. It doesn’t do what we need

It’s possible to deliver the correct software late.

It’s impossible to deliver on time if the software doesn’t work.

Focusing on effectiveness is more important than efficiency if a software project is to be delivered successfully.


Ineffectiveness Comes from Poor Requirements

Most organizations don’t test the validity or completeness of their requirements before starting a software project. The requirements get translated into a project plan, and then the project manager will attempt to execute that plan. The project plan becomes the bible and everyone marches to it. As long as tasks are completed on time, everyone assumes that you are effective, i.e. doing the right thing.

That is until virtually all the tasks are jammed at 95% complete and the project is nowhere near completion.

At some point someone will notice something and say, “I don’t think this feature should work this way”. This will provoke discussions between developers, QA, and product management on correct program behavior. This will spark a series of fire-fighting meetings to resolve the inconsistency, issue a defect, and fix the problem. All of the extra meetings will start causing tasks on the project plan to slip.

We discussed the root causes of fire-fighting in a previous blog entry.

When fire-fighting starts productivity will grind to a halt. Developers will lose productivity because they will end up being pulled into the endless meetings. At this point the schedule starts slipping and we become focused on the project plan and deadline. Scope gets reduced to help make the project deadline; unfortunately, we tend to throw effectiveness out the window at this point.

With any luck the project and product manager can find a way to reduce scope enough to declare victory after missing the original deadline.

The interesting thing here is that the project failed before it started. The real cause of the failure would be the inconsistent requirements. But in the chaos of fire-fighting and endless meetings, no one will remember that the requirements were the root cause of the problem.

What is the cost of poor requirements? Fortunately, WWMCCS has an answer.  As a military organization they must track everything in a detailed fashion and perform root cause analysis for each defect (see the diagram).

This drawing shows what we know to be true.

The longer a requirement problem takes to discover, the harder and more expensive it is to fix!  A requirement that would take 1 hour to fix will take 900 hours to fix if it slips to system testing.

Conclusion

It is much more important to focus on effectiveness during a project than on efficiency. When it becomes clear that you will not make the project end date, you need to stay focused on building the correct software.

Are you tired of the cycle of:
  • Collecting inconsistent requirements?
  • Building a project plan based on the inconsistent requirements?
  • Estimating projects and having senior management disbelieve it?
  • Focusing on the project end date and not on end user needs?
  • Fire-fighting over inconsistent requirements?
  • Losing developer productivity from endless meetings?
  • Missing the end date and also failing to deliver what the end-users need?

The fact that organizations go through this cycle over and over while expecting successful projects is insanity, the stuff of real-world Dilbert cartoons.

How many times are you going to rinse and repeat this process until you try something different? If you want to break this cycle, then you need to start collecting consistent requirements.

Think about the impact to your career of the following scenarios:

  1. You miss the deadline but build a subset of what the end-user needs
  2. You miss the deadline and don’t have what the end-user needs

You can at least declare some kind of victory in scenario 1, and your resume will not take a big hit. It’s pretty hard to make up for scenario 2 no matter how you slice it.

Alternatively, you can save yourself wasted time by making sure the requirements are consistent before you start development. Inconsistent requirements will lead to fire-fighting later in the project.

As a developer, when you are handed the requirements, the entire team should make a point of going through them, looking for inconsistencies, and forcing product management to fix them before you start developing. It may sound like a waste of time, but it will push the problem of poor requirements back into product management and save you from being in endless meetings. Cultivating patience in holding out for good requirements will lower your blood pressure and help you sleep at night.

Of course, once you get good requirements, you should then hold out for proper project estimates 🙂

Want to see other sacred cows get tipped? Check out:

Moo?

Courtesy of Capers Jones via LinkedIn on 6/22

Customers themselves are often not sure of their requirements.

For a large system of about 10,000 function points, here is what might be seen for the requirements.

This is from a paper on requirements problems – send an email to capers.jones3@gmail.com if you want a copy.

Requirements specification pages = 2,500
Requirements words = 1,125,000
Requirements diagrams = 300
Specific user requirements = 7,407
Missing requirements = 1,050
Incorrect requirements = 875
Superfluous requirements = 375
Toxic harmful requirements = 18

Initial requirements completeness = < 60%

Total requirements creep = 2,687 function points

Deferred requirements to meet schedule = 1,522

Complete and accurate requirements are possible below 1,000 function points; above that, errors and missing requirements are endemic.


Uncertainty trumps Risk in Software Development

Successful software development involves understanding uncertainty, and uncertainty only comes from a few sources in a software project.  The uncertainties of a software project increase with the size of the project and the inexperience of the team with the domain and technologies.  The focus of this article is on uncertainty and not on risk.  In part 1 we discussed uncertainty and in part 2 we discussed risk, so it should be clear that:

All risks are uncertain, however, not all uncertainties are risks.

For example, scope creep is not a risk (see Shift Happens) because it is certain to happen in any non-trivial project.  Since risk is uncertain, a risk related to scope creep might be that the scope shifts so much that the project is canceled.  However, this is a useless risk to track because by the time it has triggered it is much too late for anything proactive to be done.

It is important to understand the uncertainties behind any software development and then to extract the relevant risks to monitor.  The key uncertainties of a software project are around:

  • requirements
  • technology
  • resources
  • estimating the project deadline

Uncertainty in Requirements
There are several methodologies for capturing requirements:

  • Business requirements document (BRD) or software requirement specification (SRS)
  • Contract-style requirement lists
  • Use cases (tutorial)
  • User stories

Regardless of the methodology used, your initial requirements will split up into several categories:

The blue area above represents what the final requirements will be once your project is completed, i.e. the System To Build.  The initial requirements that you capture are in the yellow area called Initial Requirements.

With a perfect requirements process the Initial Requirements area would be the same as the System To Build area.  When they don’t overlap we get the following requirement categories:

  1. Superfluous requirements
  2. Missing requirements

Superfluous initial requirements tend to happen in very large projects or projects where the initial requirements process is incomplete.  Due to scope shift, the Missing requirements category always has something in it (see Shift Happens).  If either of these two categories contains a requirement that affects your core architecture negatively, then you will increase your chance of failure by at least one order of magnitude.

For example, a superfluous requirement that causes the architecture to be too flexible will put the developers through a high learning curve and lead to slow development.

If scalability is an architectural requirement but is missing from the initial architecture, then you will discover that it is difficult and costly to add later.


The physical equivalent would be the apartment building here on the right.  The foundation was insufficient to the needs of the building and it is slowly collapsing on one side.  Imagine the cost of trying to fix the foundation now that the building is complete.

I’ve been in start-ups that did not plan for sufficient scalability in the initial architecture; subsequently, the necessary scalability was only added with serious development effort and a large cost in hardware. Believe me, throwing expensive hardware at a software problem late in the life cycle is not fun or a practical solution :-(.


The overlapping box, Inconsistent Requirements, categorizes known and missing requirements that turn out to conflict with other requirements.  This can happen when requirements are gathered from multiple user roles, all of whom have a different view of what the system will deliver.

It is much easier and faster to test your requirements and discover inconsistencies before you start developing the code rather than discover the inconsistencies in the middle of development.  When inconsistencies are discovered by developers and QA personnel then your project will descend into fire-fighting (see Root cause of ‘Fire-fighting’ in Software Projects).

The physical equivalent here is to have a balcony specified to one set of contractors but forget to notify another set that you need a sliding door (see right).  When the construction people stumble on the inconsistency you may have already started planning for one alternative only to discover that rework is necessary to get to the other requirement.

Note, if you consistently develop software releases and projects with less than 90% of your requirements documented before coding starts then you get what you deserve (N.B. I did not say before the project starts 🙂 ).   One of the biggest reasons that Agile software development stalls or fails is that the requirements in the backlog are not properly documented; if you Google “poor agile backlogs” you will get > 20M hits.

Requirements Risks
Some risks associated with requirements are:

  • Risk of a missing requirement that affects the core architecture
  • Risk that inconsistent requirements cause the critical path to slip

Uncertainty in Technology
Technical uncertainty comes from correctly using technology but failing to accomplish the goals as outlined by your requirements; lack of knowledge and/or skills will be handled in the next section (Uncertainty Concerning Resources).  Team resources that don’t have experience with a technology (poorly documented API, language, IDE, etc.) do not constitute a technical risk; that is a resource risk (i.e. lack of knowledge).

Technical uncertainty comes from only a few sources:

  • Defective APIs
  • Inability to develop algorithms


Unforeseen defects in APIs will impact one or more requirements and delay development.  If there is an alternative API with the same characteristics then there may be little or no delay in changing APIs, e.g. there are multiple choices for XML parsing in Java with the same API.

However, much of the time changing to another API will cause delays because the new API will be implemented differently than the defective one. There are also no guarantees that the new API will be bug free.

Mature organizations use production APIs, but even that does not protect you against defects.  The best-known example has to be the Pentium bug from Intel, discovered in 1994.  Although the bug did not seem to cause any real damage, any time you have an intermittent problem the source might be a subtle defect in one of the APIs that you are using.

Organizations that use non-production (alpha or beta) APIs for development run an extremely high risk of finding a defect in an API.  This generally only happens in poorly funded start-ups where the technical team might have excessive decision-making control over the choice of technologies.

The other source of technical uncertainty is the team’s inability to develop algorithms to accomplish the software goals.  These algorithms relate to the limitations of system resources such as CPU, memory, batteries, or bandwidth, i.e.:

  • Performance
  • Memory management
  • Power management
  • Volume of data concerns

Every technical uncertainty is associated with one or more requirements.  The inability to produce an algorithm to satisfy a requirement may have a work-around with acceptable behavior or might be infeasible.

If an infeasible requirement prevents a core goal from being accomplished then the project will get canceled.  If affected requirements have technical work-arounds then the project will be delayed while the work-around is being developed.

Technical Risks
Some risks associated with technology are:

  • Risk that a defective API will cause us to look for another API
  • Risk that we will be unable to find a feasible solution for a core project requirement

Uncertainty Concerning Resources
When using the same team to produce the next version of a software product there is little to no resource uncertainty.  Resource uncertainty exists if any of the following are present:

  • Any team member is unfamiliar with the technology you are using
  • Any member of the team is unfamiliar with the subject domain
  • You need to develop new algorithms to handle a technical issue (see previous section)
  • Any team member is not committed to the project because they maintain another system
  • Turnover robs you of a key individual

Resource uncertainty revolves around knowledge and skills, commonly this includes: 1) language, 2) APIs, 3) interactive development environments (IDEs), and 4) methodology (Agile, RUP, Spiral, etc).  If your team is less knowledgeable than required then you will underestimate some if not all tasks in the project plan.

When team members are unfamiliar with the subject domain then any misunderstandings that they have will cause delays in the project.  In particular, if the domain is new and the requirements are not well documented then you will probably end up with the wrong architecture, even if you have skilled architects.

The degree to which you end up with a bad architecture and a canceled project depends on how unfamiliar you are with the subject domain and technologies being used.  In addition, the size of your project will magnify all resource uncertainties above.

The majority of stand-alone applications are between 1,000 and 10,000 function points.  As you would expect, the percentage of the system that any one person can understand drops precipitously between 1,000 and 10,000 function points.  The number of canceled projects goes up as our understanding drops, because all uncertainties increase and issues fall between the cracks.

When there are team members committed to maintaining legacy systems, their productivity will be uncertain.  Unless your legacy system behaves in a completely predictable fashion, those resources will be pulled away to solve problems on an unpredictable basis.  They will not be able to commit to the team, and multi-tasking will lower both their own and the team’s productivity (see Multi-tasking Leads to Lower Productivity).

Resource Risks
Some risks associated with resources are:

  • The team is unable to build an appropriate architecture foundation for the project
  • A key resource leaves the project before the core architecture is complete

Uncertainty in Estimation
When project end dates are estimated formally you will have 3 dates: 1) earliest finish, 2) expected finish, and 3) latest finish.  This makes sense because each task in the project plan can finish in a range of time, i.e. earliest finish to latest finish.  When a project only talks about a single date for the end date, it is almost always the earliest possible finish so there is a 99.9% chance that you will miss it.  Risk in estimation makes the most sense if:

  • Formal methods are used to estimate the project
  • Senior staff accepts the estimate
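
The earliest/expected/latest range is what classic three-point (PERT) estimation produces.  The article does not name PERT, so treat this as a sketch of one standard formal method rather than the author’s prescription:

```python
def pert_estimate(optimistic: float, most_likely: float, pessimistic: float) -> tuple[float, float]:
    """Classic three-point (PERT) estimate: expected duration and standard deviation."""
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    return expected, std_dev

# Example: a task estimated at 10 days (best case), 15 (most likely), 24 (worst case).
e, s = pert_estimate(10, 15, 24)  # e = 15.67 days, s = 2.33 days
```

Summing the per-task expected values and variances along the critical path is what gives a project an earliest/latest finish range instead of a single, almost certainly wrong, date.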

There are numerous cost estimating tools that can do a capable job.  Capers Jones lists those methods, but also comments about how many companies don’t use formal estimates and those that do don’t trust them:

Although cost estimating is difficult there are a number of commercial software cost estimating tools that do a capable job: COCOMO II, KnowledgePlan, True Price, SEER, SLIM, SoftCost, CostXpert, and Software Risk Master are examples.

However, just because an accurate estimate can be produced using a commercial estimating tool, that does not mean that clients or executives will accept it.  In fact, from information presented during litigation, about half of the cases did not produce accurate estimates at all and did not use estimating tools.  Manual estimates tend towards optimism, predicting shorter schedules and lower costs than actually occur.

Somewhat surprisingly, the other half of the cases had accurate estimates, but they were rejected and replaced by forced estimates based on business needs rather than team abilities.

Essentially, senior staff have a tendency to ignore formal estimates and declare the project end date.  When this happens the project is usually doomed to end in disaster (see Why Senior Management Declared Deadlines lead to Disaster).

So estimation is guaranteed to be uncertain.  Let’s combine the requirements categories from before with the categories of technical uncertainty to see where our uncertainty is coming from.  Knowing the different categories of requirements uncertainty gives us strategies to minimize or eliminate that uncertainty.

Starting with the Initial Requirements, we can see that there are two categories of uncertainty that can be addressed before a project even starts:

  1. Superfluous initial requirements
  2. Inconsistent requirements

Both of these categories will waste time if they get into the development process, where they will cause a great deal of confusion inside the team.  At best these requirements will cause the team to waste time; at worst they will deceive the team into building the architecture incorrectly.  A quality assurance process on your initial requirements can ensure that both of these categories are empty.

The next categories of uncertainty that can be addressed before the project starts are:

  1. Requirements with Technical Risk
  2. Requirements Technically Infeasible

Technical uncertainty is usually relatively straightforward to find when a project starts.  It will generally involve non-functional requirements such as scalability, availability, extensibility, compatibility, portability, maintainability, reliability, etc.  Other technical uncertainties will be concerned with:

  1. algorithms to deal with limited resources, i.e. memory, CPU, battery power
  2. volume of data concerns, i.e. large files or network bandwidth
  3. strong security models
  4. improving compression algorithms

Any use cases that are called frequently, and any reports that tie up your major tables, are sources of technical uncertainty.  If there will be significant technical uncertainty in your project then you are better off splitting these technical uncertainties into a smaller project that the architects will handle before starting the main project.  That way, if there are too many technically infeasible issues, at least you can cancel the project.

However, the greatest source of uncertainty comes from the Missing Requirements section.  The larger the number of missing requirements the greater the risk that the project gets canceled.  If we look at the graph we presented above:

You can see that the chance of a project being canceled is highly correlated with the % of scope creep. Companies that routinely start projects with a fraction of the requirements identified are virtually guaranteed to have a canceled project.

In addition, even if you use formal methods for estimation, your project end date will not take into account the Missing Requirements.  If you have a significant number of missing requirements then your estimates will be way off.

Estimation Risks

The most talked about estimation risk is schedule risk.  Since most companies don’t use formal methods, and the estimates of those that do are often ignored, it makes very little sense to talk of schedule risk.

When people say “schedule risk”, they are making a statement that the project will miss the deadline.  But given that improper estimation is used in most projects, it is certain that the project will miss its deadline; the only useful question is “by how much?”

Schedule risk can only exist when formal methods are used and there is an earliest finish/latest finish range for the project.  Schedule risk then applies to any task that takes longer than its latest finish and compromises the critical path of the project.  The project manager needs to try to crash current and future tasks to see if he can get the project back on track.  If you don’t use formal methods then this paragraph probably makes no sense 🙂

Conclusion

The main sources of uncertainty in software development comes from:

  • requirements
  • technology
  • resources
  • estimates

Successful software projects look for areas of uncertainty and minimize them before the project starts. Some uncertainties can be qualified as risks and should be managed aggressively by the project manager during the project.

Uncertainty in requirements, technology, and resources will cause delays in your project.  If you are using formal methods then you need to pay attention to delays caused by uncertainties not accounted for in your model.  If you don’t use formal methods, then every time you hit a delay caused by an uncertainty, that delay needs to be tacked on to the project end date (of course, it won’t be 🙂 ).

If your project does not have strong architectural requirements and is not too big (i.e. < 1,000 function points) then you should be able to use Agile software development to set-up a process that grapples with uncertainty in an incremental fashion.  Smaller projects with strong architectural requirements should set up a traditional project to settle the technical uncertainties before launching into Agile development.

Projects that use more traditional methodologies need to add a quality assurance process to their requirements to ensure a level of completeness and consistency before starting development.  One way of doing this is to put requirements gathering into its own project.  Once you capture the requirements, if you establish that you have strong architectural concerns, then you can create a project to build out the technical architecture of the project.  Finally you would do the project itself.  By breaking projects into 2 or 3 stages this gives you the ability to cancel the project before too much effort is sunk into a project with too much uncertainty.

Regardless of your project methodology, being aware of the completeness of your requirements as well as the technical uncertainty of your non-functional requirements will help you reduce the chance of project cancellation by at least one order of magnitude.

It is much more important to understand uncertainty than it is to understand risk in software development.


Appendix: Traditional Software Risks
This list of software risks courtesy of Capers Jones.  Risks listed in descending order of importance.

  • Risk of missing toxic requirements that should be avoided
  • Risk of inadequate progress tracking
  • Risk of development tasks interfering with maintenance
  • Risk of maintenance tasks interfering with development
  • Risk that designs are not kept updated after release
  • Risk of unstable user requirements
  • Risk that requirements are not kept updated after release
  • Risk of clients forcing arbitrary schedules on team
  • Risk of omitting formal architecture for large systems
  • Risk of inadequate change control
  • Risk of executives forcing arbitrary schedules on team
  • Risk of not using a project office for large applications
  • Risk of missing requirements from legacy applications
  • Risk of slow application response times
  • Risk of inadequate maintenance tools and workbenches
  • Risk of application performance problems
  • Risk of poor support by open-source providers
  • Risk of reusing code without test cases or related materials
  • Risk of excessive feature “bloat”
  • Risk of inadequate development tools
  • Risk of poor help screens and poor user manuals
  • Risk of slow customer support
  • Risk of inadequate functionality

Root cause of ‘Fire-Fighting’ in Software Projects

Quite a few projects descend into ‘continual fire-fighting‘ after the first usable version of the software is produced. Suddenly there is an endless set of meetings involving the business analysts [1], developers, QA, and managers. Even when these meetings are well run, you cut the productivity of your developers, who can barely get a few contiguous hours to write code between the meetings.

Ever wonder what causes this scenario to occur in so many projects? Below we look at the root causes of fire-fighting in a project. We also suggest meeting strategies to maximize productivity and minimize developer disruption.

The first thing to notice is the composition of the meetings when fire-fighting starts. One common denominator is that it is rarely just the developers getting together to solve a problem involving some technical constraint; often these are cross-functional meetings that involve business analysts and QA. In larger organizations, this will involve end-users and customers. Fundamentally, fire-fighting is the result of poor coordinating mechanisms between team members and confused communication.

Common Scenarios that Waste Time

Typically an issue gets raised in a bug triage meeting about some feature that QA claims is improperly implemented. Development will then go on to explain how they implemented it and where they got the specific requirements.  At this point, the business analyst chimes in about what was actually required. There are several basic scenarios that could be happening here:

  1. The requirements are complete and QA is pointing out that development has implemented the feature incorrectly.
  2. The requirements are loose and development has coded the feature correctly but QA believes that the feature is incorrect.
  3. QA has insufficient requirements to know if the feature is implemented correctly or not.
  4. The requirements are loose and development and QA have different interpretations of what that means.

Scenario 1 is what you would expect to happen in a bug triage session. There is no wasted effort for this case as you would expect to need the business analyst, development, and QA to resolve this issue.

Scenario 2 is what happens when the requirements are not well written. Odds are the developer has made several voice calls and emails to the business analyst to resolve the functionality of a particular feature. This information exists purely in the heads of the business analyst and the developer and is buried in their email exchanges and does not make it back into the requirements. This scenario wastes the developer’s time.

Scenario 3 also happens when the requirements are not well written. Most competent QA personnel know how to write test plans and test cases. When the requirements are available to QA with enough time, they can generally determine if they have sufficient information to write the test cases for a given feature. If given the requirements with enough time, QA can resolve the ambiguity with the business analyst and make sure that the requirements are updated. When there is insufficient time, the problem surfaces in the bug triage meeting. This scenario wastes both QA and development’s time.

Scenario 4 occurs when you have requirements that can legitimately be implemented in many different ways. It is likely that QA did not get the requirements before coding started, if so they could have warned the business analyst to fix it. If development has implemented the feature incorrectly, then: 1) the business analyst needs to fix the requirement, 2) development needs to re-code the feature, and 3) QA needs to update their test cases. In this scenario everyone’s time is wasted.

If your scenarios are not 2, 3, or 4 then you are probably in fire-fighting mode because you have requirements that cannot be coded as specified due to unexpected technical constraints.  Explaining to the organization why something is technically infeasible can take up quite a few meetings.

As an example of unexpected technical constraints: at Way Systems (now Verifone) we were building a cell phone POS system.  Typically signal strength is shown as 5 bars; however, due to the 3rd-party libraries we were using, we could only display a number from 0-32 for the wireless signal strength.  There was no way to overcome the technical constraint because there were too many framework layers that we did not control.  Needless to say, there were quite a few (useless?) meetings while we informed everyone about the issue.

Strategies to Reduce Fire-Fighting

The best way to reduce fire-fighting is simply to have effective requirements when you start a project.  Once you are caught in fire-fighting the cure is the same – you need to fix the requirements and document them in a repository that everyone has access to.  By improving the synchronization mechanisms between the business analysts, development, and QA your fire fighting meetings will go away.  In particular, all those requirements discussions that the business analysts have had with QA and development need to be written down.

Centralize and Document Requirements

If you are using use cases then the changes need to be made in the use case documents.  If you don’t have a centralized repository then you need to create one.   You can use a formal collaboration tool such as SharePoint, an informal collaboration tool such as Google Sites, or simply use a Wiki to host and document all requirements.

In your document repository you will want to keep all requirements organized by scenario.  If you are using use cases or user stories then each of these is a scenario.  If you have more traditional requirements then you will need to derive the scenario names from your requirements.  Scenarios will be of the form ‘verb noun phrase’, i.e. ‘create person’, ‘notify customer of delivery’, etc.

Once you have a central repository for your requirements, ALL incremental requirements should be put on this site, not in cumbersome email chains.  If you need to send an email to someone, then add the requirement to the central site and email a link to the interested party; do not allow requirements to become buried in your email server.

Run Effective Meetings

Managers are often tempted to call meetings with everyone present ‘just in case‘.  There is no doubt that this will solve the occasional problem, but you are likely just to have a bunch of developers with ‘kill me now‘ expressions on their faces from beginning to end in the meeting.

Structure the meeting by grouping issues by developer and make an agenda so that each developer knows the order that they will be at the meeting.  No developer should have to go to a meeting that does not have an agenda! Next, use some IM tool from the conference room to let developers know when they are required to attend the meeting (not the 1st developer, obviously 🙂 ).  Issues generally run over time so don’t call anyone into the meeting before they are really required.  Give yourself breathing room by having meetings finish 10 minutes before the next generally used slot (i.e. 10:20 am or 2:50 pm).


Issues that really need multiple developers present should be delayed until the end of the meeting.  When all other items are handled, use IM to call in all the developers for those issues.  By keeping group issues at the end, you are unlikely to keep everyone around for a long time, since you will probably have to give up your conference room to someone else.

Conclusion

Not all fire-fighting involves bad requirements, but much of it does.  By producing better requirements at the start of a project and implementing centralized mechanisms for maintaining those requirements, you will reduce the fire-fighting later in your project. If you find yourself in fire-fighting mode, you can implement a centralized requirements mechanism to help fight your way out of the mess.


[1] Business analysts or product managers