Start tackling data science inefficiencies by properly defining waste

DJ Patil, the U.S. chief data scientist

 Image: Greylock Partners

Because of my background and experience in Lean Six Sigma, I'm often called in to lean out a process, which means to remove as much waste as possible. However, leaders are usually puzzled when I lean out a data science process, because my approach doesn't align with what they were taught in business school. That's because we're not dealing with assembly line workers in a factory; we're dealing with data scientists who are working through a problem-solving and development cycle. When tackling inefficiencies in your data science processes, you must be careful about how you define waste.

The biggest misconceptions about waste in data science involve refactoring, feedback, and imperfections. Refactoring is the biggest, so let's tackle it first.

Refactoring

Refactoring is cleaning up code without changing its functionality. It's the biggest area where traditional Lean techniques fall down, because it looks like a very conspicuous, non-value-added step in the process. Every time an organization brings in Lean experts, they froth when they discover refactoring. Refactoring is not waste -- in fact, if your team isn't spending enough time here, there's probably waste somewhere else.

If the team is not refactoring, then it's taking them too long to get functional code to end users for feedback. Remember coding best practices: make it work, make it look good, and then make it look better. To a typical Lean expert, only the first step is value-added; however, if the last two steps are removed, the cycle time for the first step will extend dramatically, because developers will clean up the code as they write it. In the best case, the first step extends to cover the cycle time that was removed from steps two and three. There's no overall improvement in cycle time, but that's not the worst part. The real sin is that you'll waste valuable feedback time because you're not putting the solution into the hands of end users until the code is clean -- this is a really bad idea. The value of early feedback far outweighs the cost of cleaning up the code afterward.

Feedback

Refactoring (or lack thereof) is not the only area where feedback waste can be found. End user feedback is the single most valuable part of your development process. You'll have to accept that your codebase will change -- this is why agile development techniques are far superior to waterfall techniques.

Your codebase will change for three reasons: the developers (or business analysts if they exist) misunderstand the end users, the developers need to refactor, or the end users change their requirements. Only the first reason is waste, and we've already covered the second. The third reason is perfectly acceptable in a mature development process. End users are supposed to change their minds, and this is why you need their feedback as often as possible.

Lack of feedback usually comes from the arrogance data scientists feel when they don't believe they have anything more to learn from end users. If you're paying attention, you can literally see it in their body language during a requirements gathering meeting. At some point, they stop processing input, because they think they've got it. The fix is easy; the hard part is detection.

It's important to make feedback waste (i.e., the absence of feedback) explicit. It's best to measure the amount of end user feedback that's present in the development process, and to attach positive consequences to increasing this measure. This will reinforce the best behaviors.
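One way to make this measure concrete is to count end user feedback sessions per iteration. The following is a minimal sketch in Python, assuming a hypothetical log in which each iteration records how many feedback sessions took place. The names Iteration, feedback_rate, and iterations_without_feedback, along with the sample numbers, are illustrative assumptions and not part of any standard tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Iteration:
    """One development iteration (hypothetical record format)."""
    name: str
    days: int               # working days in the iteration
    feedback_sessions: int  # end user feedback sessions held


def feedback_rate(iteration: Iteration) -> float:
    """Feedback sessions per working day for one iteration."""
    if iteration.days <= 0:
        raise ValueError("iteration must span at least one day")
    return iteration.feedback_sessions / iteration.days


def iterations_without_feedback(iterations: List[Iteration]) -> List[str]:
    """Names of iterations with no end user feedback at all."""
    return [it.name for it in iterations if it.feedback_sessions == 0]


# Illustrative data only, not taken from a real project
log = [
    Iteration("sprint-1", days=10, feedback_sessions=2),
    Iteration("sprint-2", days=10, feedback_sessions=0),
    Iteration("sprint-3", days=10, feedback_sessions=5),
]

for it in log:
    print(f"{it.name}: {feedback_rate(it):.2f} sessions/day")
print("No feedback:", iterations_without_feedback(log))
```

Iterations with no feedback at all are the ones to examine first; the rate gives a trend line the team can be rewarded for improving.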

Imperfections

Imperfections in this case are good. I'm not suggesting that you put out buggy code -- just not perfect solutions. This is an area that Lean experts miss, especially those who couple Lean with Six Sigma, which is very common. For them, imperfection is a defect -- for me, perfection is the defect. And it's an easy trap for data scientists to fall into, because it's their natural tendency to make everything perfect. They want to make sure all the buttons are perfectly aligned, all the colors have the right luminosity, and a period ends every sentence.

To lean out perfection, you must separate success from perfection in your development process. Sometimes it's hard to find the exit that separates the road to success from the path to perfection, though it's an important inflection point for waste reduction. There are some telltale signs to watch for. The clearest is that the time spent writing functional code will drop off precipitously. I'm not referring to time spent refactoring, but the time spent on building functionality for end users. Spending a month building and training a neural network to improve predictions where standard regression techniques are falling short is time well spent on the road to success. Spending a month deciding whether a heat map should be monochromatic or triadic is a waste of time on the path to perfection.
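The drop-off in functional coding time can be watched for with very little tooling. Here is a rough sketch in Python, assuming the team logs weekly hours spent building end user functionality alongside total hours. The function names, the 0.5 threshold, and the sample hours are all assumptions chosen for illustration; pick a threshold that suits your own team.

```python
from typing import List, Optional


def functional_share(functional_hours: float, total_hours: float) -> float:
    """Fraction of the week's hours spent building end user functionality."""
    return functional_hours / total_hours if total_hours > 0 else 0.0


def first_drop_off(shares: List[float], ratio: float = 0.5) -> Optional[int]:
    """Index of the first week whose share falls below `ratio` times the
    average of all earlier weeks, or None if no such week exists."""
    for week in range(1, len(shares)):
        baseline = sum(shares[:week]) / week
        if shares[week] < ratio * baseline:
            return week
    return None


# Illustrative weekly (functional_hours, total_hours) pairs
weeks = [(24, 40), (26, 40), (22, 40), (8, 40), (6, 40)]
shares = [functional_share(f, t) for f, t in weeks]
print(first_drop_off(shares))
```

A flagged week is only a prompt to ask what the team is working on; a legitimate stretch of refactoring would also lower the share, so the number should start a conversation rather than settle one.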

Summary

Leaning out your data science process is a worthwhile pursuit, but you must properly classify waste. To the untrained eye, refactoring looks like waste and perfection looks like quality, and yet just the opposite is true in data science.

Take some time today to survey your data science methods, and look for ways to balance refactoring, accelerate feedback, and eliminate perfection cycle time.

Disclaimer: TechRepublic and ZDNet are CBS Interactive properties.