Kaizen Express: Fundamentals for Your Lean Journey
Themes: Kaizen, Toyota Production System, Lean, Efficiency
Would I recommend this: No. This was hard to follow. I don’t think this book was meant to be read through cover to cover. It seemed like a companion textbook to a class. Half of each page was in English and the other half in Japanese, a translation of the content or maybe the other way around.
Note: Any examples below are generated by me.
Waste is any activity consuming resources without producing customer value. The resources can be equipment time, human work time, or raw materials. Customer value in this case is any product or service the customer is willing to pay for.
Not-waste: Engineers writing code for product or code for testing the product. The product is sold to the customer and increasing or maintaining the quality of the product ensures the customer will keep paying for it.
Waste: Engineers waiting for the computer to update during the time when they could be developing code. This could be done during a time period where developers aren’t using their computers.
There are different types of waste. The Japanese term for this is “muda” and each type of waste is written in a different form of Japanese script.
Type 1 (むだ): Waste within a process that can be resolved with point Kaizen or within the process by an individual. Example: Blocking software updates that could be scheduled to run off hours.
Type 2 (ムダ): System level waste that needs to be addressed via changes in different components. Example: High levels of inactive time between code review and production deployment.
Type 3 (無駄): Organizational or policy level waste that needs to be addressed by management. Example: Enforced time sheet logging for all employees when there is no variation in day to day time reports.
Accumulation, Batching, or Stockpiling
Accumulation between different processes, batching in large quantities, or stockpiling products or product components leads to waste. When you have large batches, it can lead to accumulation between processes or you may end up stockpiling a lot of one resource or product in anticipation of demand. Any type of accumulation hides defects.
You have a table factory and a production line creating screws for the tables. You’ve run the screw line to produce a large number of screws so you can shut off the line when you have several hundred thousand and start up the table assembly line to use this accumulation or stockpile of screws. Unfortunately, you find out shortly after starting the table assembly line that one of the dies was misaligned and none of the screws are usable. You may have found out minutes later, days later, or months later. Not only did this waste the material in the screws, but also the time on the machines, potentially some material from the tables, and the time of the humans operating the machinery.
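To put rough numbers on the screw example, here is a minimal sketch. It assumes (my assumption, not the book's) that a defect is only noticed when a finished batch is handed to the next process, so everything made from the moment the defect starts until the end of that batch is scrap. The function name and the numbers are hypothetical.

```python
def scrapped_units(defect_starts_at: int, batch_size: int) -> int:
    """Units scrapped when a defect begins at unit `defect_starts_at` (1-indexed)
    and is only caught when the batch containing that unit is handed downstream."""
    if defect_starts_at < 1 or batch_size < 1:
        raise ValueError("both arguments must be at least 1")
    # Round up to the end of the batch that contains the first defective unit.
    batch_end = -(-defect_starts_at // batch_size) * batch_size
    return batch_end - defect_starts_at + 1


# The die is misaligned from the very first screw.
print(scrapped_units(defect_starts_at=1, batch_size=300_000))  # 300000
print(scrapped_units(defect_starts_at=1, batch_size=10))       # 10
print(scrapped_units(defect_starts_at=1, batch_size=1))        # 1
```

The smaller the batch, the sooner the next process sees the problem and the less there is to throw away.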
The idea of Jidoka is self-stopping machinery such that it does not need to be constantly monitored by a human. One of the common examples of this in everyday life is motion detectors. Cars, garage doors, and assembly lines use motion detection to detect objects in places they should not be and automatically stop or create an alert.
Garage doors in the past were known to crush children or animals that were caught under them and needed to be watched carefully as they lowered. These days (in North American houses), new garage door openers have pressure detection when lowering as well as detection of objects near the ground that would interfere with closing. We don’t need to watch the garage door close anymore. Similarly, with Jidoka, it frees up human work time to allow us to do other things and check back only when needed.
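In software terms, the same idea can be sketched as a loop that checks every item and stops itself at the first anomaly instead of relying on someone watching. This is only an illustration: `run_line`, `LineResult`, and the check passed in as `is_ok` are hypothetical names I made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class LineResult:
    produced: List[int]        # items that passed the check before any stop
    stopped_at: Optional[int]  # the item that triggered the stop, or None


def run_line(items: Iterable[int], is_ok: Callable[[int], bool]) -> LineResult:
    """Process items one at a time and stop the line at the first anomaly."""
    produced: List[int] = []
    for item in items:
        if not is_ok(item):
            # The line stops itself; a human only needs to look now.
            return LineResult(produced, stopped_at=item)
        produced.append(item)
    return LineResult(produced, stopped_at=None)


# Hypothetical run: item 5 is defective, so items 6 and 7 are never made.
result = run_line(range(1, 8), is_ok=lambda item: item != 5)
print(result.produced, result.stopped_at)  # [1, 2, 3, 4] 5
```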
As named, just-in-time describes only producing items or taking action as they are needed. Recall in the “Waste” section that accumulation can hide defects. Instead of batching and stockpiling, you only create things just-in-time.
Your team is expanding and you know you need lots of engineers so you hire as many as you can in anticipation of future growth. Later on, after hiring 20 engineers, you find what you actually needed was 10 engineers, 5 data scientists, and 5 quality assurance engineers but you weren’t aware that the demand or requirements would change in the future. If you had hired “just-in-time”, you would have been able to save time and money.
What if some things can’t be done just-in-time? Take the example of packing for travel. If you’re at home, you are fine to buy toothpaste as you need it but when you travel, you want to pack your full supply for that duration because you may not be able to find the toothpaste you are looking for. This ties into more of the philosophy of the Toyota Production System, lean, and kaizen. This restriction should push you to find a way to build a system that can support just-in-time. In the toothpaste example, this might mean that you learn how to make your own toothpaste out of baking soda, which is more readily available internationally than your specific brand.
There are different types of efficiency:
- Apparent Efficiency: You can increase production without increasing the number of operators or equipment.
- True Efficiency: Producing the number of parts that can be sold with the minimum number of operators and equipment.
Apparent Efficiency: You are in a cookie factory. Apparent efficiency is being able to double your batch of cookies without needing more workers or bigger bowls.
True Efficiency: If you can only guarantee selling 10 cookies per hour, you operate with the minimum required operators and equipment to make 10 cookies per hour. This also applies to resources. If you only consume the flour, eggs, and other ingredients needed to produce 10 cookies per hour, you reduce waste in a system that has enough apparent efficiency to produce 20 cookies per hour.
Local vs. Total Efficiency
Efficiency should always be looked at in terms of the whole system. If you have high local efficiency, this leads to an accumulation between the efficient process and the next process downstream. As we know by now, accumulation leads to waste.
My favorite example of this is when software developers punch out code review after code review but there aren’t enough reviewers to keep up. This results in a build up of unreviewed code. The volume of code to review makes reviewers less likely to focus on individual details and more likely to miss bugs. This also increases the chances of conflicts with other local development and creates wasted time for other developers that later need to merge the changes as they come through. Perhaps counter-intuitively, this local efficiency of producing code results in less global efficiency because the high volume of code to review leads to more bugs and more time to develop for other developers.
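A minimal sketch of how that backlog grows, assuming constant daily rates (a simplification of mine; real teams are burstier). The function and parameter names are hypothetical.

```python
from typing import List


def review_backlog(days: int, submitted_per_day: int, reviewed_per_day: int) -> List[int]:
    """Return the number of unreviewed changes at the end of each day."""
    backlog = 0
    history: List[int] = []
    for _ in range(days):
        backlog += submitted_per_day
        # Reviewers cannot review more than what is waiting.
        backlog -= min(backlog, reviewed_per_day)
        history.append(backlog)
    return history


# Authors are locally "efficient" at 6 changes a day, reviewers manage 4.
print(review_backlog(5, submitted_per_day=6, reviewed_per_day=4))  # [2, 4, 6, 8, 10]
# Matching the upstream rate to the downstream rate keeps the queue empty.
print(review_backlog(5, submitted_per_day=4, reviewed_per_day=4))  # [0, 0, 0, 0, 0]
```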
A machine’s operating rate is the time a machine is used in a given time frame. The operating ability of a machine is the time the machine works well when it is needed. If you optimize operating rate, you will get overproduction and accumulation. Accumulation leads to waste.
The overall machine effectiveness is the availability (rate) times the performance (rate) times the quality (rate). Translates to: is it working (performance) well (quality) when I need it (availability)?
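As a quick worked example of that product, here is a small sketch. The three rates are made-up numbers, and the function name is my own.

```python
def overall_effectiveness(availability: float, performance: float, quality: float) -> float:
    """Multiply the three rates, each expressed as a fraction between 0 and 1."""
    for name, rate in (("availability", availability),
                       ("performance", performance),
                       ("quality", quality)):
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {rate}")
    return availability * performance * quality


# Hypothetical machine: up 90% of the time it is needed, runs at 80% of its
# ideal speed, and 95% of what it makes is good.
print(round(overall_effectiveness(0.90, 0.80, 0.95), 3))  # 0.684
```

Because the rates multiply, three individually decent numbers can still give a mediocre overall result.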
Continuous flow shows the production process as the life-cycle of a single item through a series of steps. In contrast, there is batch processing, which sees a collection of sub-processes that produce batches of items that move to other sub-processes.
Why is this good? Consider the idea of just-in-time above: if we are able to optimize our end-to-end system to produce a single item quickly, we will be able to produce items just-in-time more effectively. Further consider the idea of efficiency above: if we are able to understand the true efficiency of producing a unit and we have actual efficiency in our system, we can scale to only produce items according to demand and reduce waste.
From Flow to Pure Continuous Flow
So you’ve got a continuous flow system and you want to make it better or achieve pure continuous flow. There are a few ways to look at your continuous flow system to try and improve it further. A lot of these concepts are phrased for manufacturing but they can be translated to other applications.
- Process Village Layout: Relocate sequential processes to be together in product families. Example: put all your assembly stations for tables close together in a table factory rather than intermixing them with chair assembly. You can see this concept in how houses are laid out. We have kitchens for food related activities and bathrooms for hygiene related activities.
- Minimize Worker Movement: If you see workers or operators spending a lot of time walking and not producing something, you may want to consider your layout. An example of this is when bathroom layouts place sinks or wash basins far from the toilet stalls. In software, this can be physical distance, like the space between offices and meeting rooms. It can also be virtual distance. How many times do you need to SSH and enter your password through various redirects to find the document you need to review? That could be minimized with dashboards or single sign-on (SSO) solutions.
Pull vs. Push
When it comes to pull vs. push, it helps me to visualize a rope with markers every few inches or centimeters. If you are in a tug-of-war style situation where there are people pulling every few meters or feet on the rope, how do you make sure as many markers make it to the end of the line of people as possible? If you push the rope onto the next person, it will bunch up and only create movement where you are. If you pull the rope from the person down the line, it will not only pull your section but also the sections all the way down the line to the beginning. So, if you want to get a steady stream of markers making it to the end of the line, should you be pushing or pulling?
Pulling means you are starting a process based on a signal from the “end of the line” or the customer. When you are pushing, you are pushing what you produce downstream and eventually onto the customer whether they want it or not. If they don’t want the product at the rate you are producing, you end up with overproduction and accumulation. I think at this point if you’ve been reading sequentially, you know what that means: waste!
The 5 Ss are a guide to organizing the work space for maximum efficiency and effectiveness.
- Seiri: Separate needed from unneeded.
- Seiton: Arrange in order of consumption.
- Seiso: Keep clean and inspect regularly.
- Seiketsu: Don’t clutter or litter.
- Shitsuke: Sustain and maintain discipline.
The Wikipedia page explains this much more clearly than the book did. I’ll provide some examples in the context of a software development process following sprints:
- Seiri or Sorting: Only consider work or bugs that can be accomplished within a sprint and only work that is ready to be picked up.
- Seiton or Order and Simplify: Prioritize and ensure all items are understandable and as small as they can be.
- Seiso or Shine: Regularly revisit your sprint tasks to ensure they are still relevant and have been updated to reflect current state.
- Seiketsu or Standardize: Ensure all individuals are participating in maintaining the first 3 Ss and have daily stand-ups to go through the exercise of maintaining your sprint work.
- Shitsuke or Sustain: Continue to follow and maintain regular processes on the team such as sprint planning, stand-up, and retrospective.
Kanban is a tool for visualizing a pull system. It is not synonymous with Toyota Production System. The way kanban is used in production systems is a way to signal that work needs to be done and each signal or kanban card contains instructions and details needed to complete the work order. It doesn’t always need to be a card but it does need to indicate a “pull” is needed and should contain just the information needed, not more.
Rules of Kanban:
- Processes should be triggered by customer action and a pull from the customer side.
- Suppliers only produce what is requested on the card and not more.
- No stock or items should move in the system unless they are accompanied by a kanban card.
- Defects should not make it downstream.
- Limit the number of kanban in progress to lower stored inventory and find problems more quickly.
- Each item should not pass through the hands of the same person more than twice.
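A few of these rules (work starts only on a pull signal, and work in progress is capped) can be sketched in a handful of lines. This is a toy model, not how any particular kanban tool works; `KanbanLane` and its methods are hypothetical names.

```python
from collections import deque
from typing import Deque, List, Optional


class KanbanLane:
    """One process step that only starts work on request and caps work in progress."""

    def __init__(self, wip_limit: int) -> None:
        self.wip_limit = wip_limit
        self.requests: Deque[str] = deque()  # cards pulled by the customer side
        self.in_progress: List[str] = []

    def request(self, card: str) -> None:
        """A downstream pull signal: someone actually wants this item."""
        self.requests.append(card)

    def pull_next(self) -> Optional[str]:
        """Start the oldest requested card, unless nothing is requested or the lane is full."""
        if not self.requests or len(self.in_progress) >= self.wip_limit:
            return None
        card = self.requests.popleft()
        self.in_progress.append(card)
        return card

    def finish(self, card: str) -> None:
        self.in_progress.remove(card)


lane = KanbanLane(wip_limit=2)
for card in ("A", "B", "C"):
    lane.request(card)

print(lane.pull_next(), lane.pull_next(), lane.pull_next())  # A B None
lane.finish("A")
print(lane.pull_next())  # C
```

Nothing is produced without a card, and the third card waits until capacity frees up, which keeps inventory between steps small.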
The idea of heijunka is to target producing product quantities in shippable units at each stage. In batching models, you might produce all components of type A, then all of type B, and then pack them up. As you know, accumulation causes waste and hides defects. Heijunka will have the effect of shorter lead times, which helps with just-in-time production, and smaller inventories since you don’t need to stockpile as much before shipping a unit to a customer. This might seem like an obvious thing to do unless you consider that several production lines need to be converted between products. You can have a line that can do either screws or bolts but not both at the same time. This is similar to having one engineer doing both development and quality assurance.
You can see this idea in waterfall vs. agile software development methodologies. In waterfall, you will do all design, then all development, then all testing. In agile, you do design, development, and testing for one deliverable unit or product and then start on the others. In waterfall, you see waste when implementation forces a change to the design, setting off a cascade of alterations that may result in a redesign of all components. In agile, by completing one unit entirely, you are able to adjust the development of other units before their design and implementation have started. You just eliminated waste!
You may think “oh, not the cardiac pacemaker” but it’s not so far off. The pacemaker in kaizen regulates continuous flow, as a heart regulates blood flow. All requests from the customer go through the pacemaker so it can regulate rates of demand and enforce heijunka. It is the first place in the system where heijunka can be enforced.
In the software world, the Project or Program Manager would be the pacemaker, taking all customer requests and scheduling them into a roadmap for the product or project the development team is working on. The thing that’s key in this example is that the manager will not dump all the work on the development team at once and instead will try to create a sequence that will then be given to the development team either monthly or quarterly to deliver. In an agile development setting, this would happen in sprints as well and the manager of the project/program would contribute work and assist in prioritization.
One of the driving factors for batching in some production systems is the time cost of switching equipment over to make a different part. To reduce the need to batch and accumulate (oh no! waste!), aim to reduce and shorten changeover times as much as possible. Changeover time is defined as the difference in time from the last good old item produced to the first good new item produced.
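That definition of changeover time can be computed directly from a production log. A minimal sketch, assuming a hypothetical log of `(minute, product, is_good)` entries; the log format and function name are mine, not the book's.

```python
from typing import List, Optional, Tuple

Event = Tuple[float, str, bool]  # (minute on the clock, product name, passed inspection)


def changeover_minutes(log: List[Event], old: str, new: str) -> Optional[float]:
    """Time from the last good `old` item to the first good `new` item after it.

    Returns None if the log does not contain both endpoints.
    """
    last_good_old = max((t for t, p, ok in log if p == old and ok), default=None)
    if last_good_old is None:
        return None
    first_good_new = min(
        (t for t, p, ok in log if p == new and ok and t > last_good_old),
        default=None,
    )
    if first_good_new is None:
        return None
    return first_good_new - last_good_old


log = [
    (0, "screws", True),
    (10, "screws", True),   # last good old item
    (25, "bolts", False),   # first bolt after retooling is scrap
    (40, "bolts", True),    # first good new item
    (45, "bolts", True),
]
print(changeover_minutes(log, old="screws", new="bolts"))  # 30
```

Note that the scrapped bolt at minute 25 does not end the changeover; only the first good item does.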
You are running a catering company that makes cookies and meatballs. As such a small and specific company, you have only one working surface and you need to make sure none of the cookies or meat products touch each other. As a result, each time you change between making meatballs and cookies, you have to clean your entire surface. This will drive you to instead do all of one and then all of the other. However, as we know by now, this will lead to accumulation and potential defects. If it turns out one of your meat sections or set of eggs for your cookies were rotten, the entire set made on the shared surface with shared equipment might need to be thrown away. If the batch were smaller, this would be less waste. To reduce setup time, you can consider speeding up the cleaning process by wearing latex gloves that can be switched between batches and using parchment paper to protect the surfaces being used so it can easily be tossed and replaced.
Great, so we’re not making cookies and meatballs, we’re making software. Let’s talk about that.
Some companies have dedicated architects, dedicated development teams, and dedicated quality assurance teams. Moving a product from one to the other takes time with hand-offs, back and forth communication, scheduling of meetings, and reviews, especially in a waterfall model. This is a way to interpret the changeover time in software. How do we make this faster? What if the same person were the designer, implementer, and tester? This way, we wouldn’t pay any cost of hand-offs for switch over and testing or redesign can be done as the product develops at shorter intervals, looking more like agile. Another way to reduce the time is making the hand-offs smaller in incremental components rather than all at once, again, more agile. If there is accumulation (waste!) between the design and development or development and test phases, this is likely to hide bugs.
Andons are tools meant to visually highlight anomalies and operational status of each station. Through use of these you should be able to visually survey the work space and see if anything is going wrong. On the operator side, andons are activated simply and quickly by a button or signal cord that sometimes will stop the local production line to prevent further propagation of error.
Day to day we can see andon-like signals all over the place:
- Grocery store check-out areas have a green light above them if they are open and are off if not. With a glance you can identify which ones are working. Associates can turn off the light if there is a problem at the register.
- Highways with dynamic lanes show which lanes are operational and which ones are not. If an accident occurs, a lane can be shut down by using the signal to communicate to drivers not to use it.
- Your car’s dashboard lights that tell you if your tire is flat, your oil needs changing, or if your gas tank is empty.
These aren’t strictly andons because they are themselves a way to communicate to customers.
In software we typically see these mechanisms in operational areas with dashboards for our testing suites, environment health, or deployment health. The dashboards typically have green for “good”, red for “needs attention immediately”, and yellow for “needs attention but probably not immediately”. Further, these states are automatically detected or manually set when something wrong occurs. The faster you detect anomalies and stop the production line or rogue zombie process, the fewer defects and waste go down the line.
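A minimal sketch of that kind of dashboard roll-up, assuming each station reports one of the three colors and the board as a whole shows the worst of them. The station names and helper functions are hypothetical.

```python
from enum import IntEnum
from typing import Dict, List


class Status(IntEnum):
    GREEN = 0   # good
    YELLOW = 1  # needs attention, but probably not immediately
    RED = 2     # needs attention immediately


def board_status(stations: Dict[str, Status]) -> Status:
    """The board is only as healthy as its worst station."""
    return max(stations.values(), default=Status.GREEN)


def needs_attention(stations: Dict[str, Status]) -> List[str]:
    """Names of the stations that are not green, in alphabetical order."""
    return sorted(name for name, status in stations.items() if status != Status.GREEN)


stations = {
    "test suite": Status.GREEN,
    "deployment": Status.YELLOW,
    "environment": Status.RED,
}
print(board_status(stations).name)  # RED
print(needs_attention(stations))    # ['deployment', 'environment']
```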
Maximize Human Work Time
You can get more efficiency by maximizing human work time. By watching for gaps in activity, you can find ways to creatively fill them. An example in the book shows that production lines that auto-eject boxes and wait for them to be taken allow one worker to operate two lines. By contrast, lines that need a worker to watch for the right timing to remove a product from a conveyor belt can only have one line per worker.
I don’t see this a lot in software because micromanaging at this level is typically stifling. However, I often see it during a “planning phase” with little development, when the development team’s time gets filled with operational improvement tasks to keep everyone busy. In organizations that are more agile, there is typically an interleaving of operational and product work. That’s a different discussion though.
The idea of zone control is to make all defects localized and not allow them to propagate to other zones. Between zones there should be a defined inspection or quality assurance. This can be likened to running integration or health tests before deploying code to a new service host. This reduces the impact of a bug to a single host or stage instead of all versions of the system.
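Here is a minimal sketch of that deployment analogy: roll out one stage at a time and stop at the first stage that fails its health check, so later stages are never touched. The stage names and the `deploy` / `is_healthy` callbacks are hypothetical stand-ins for whatever your tooling provides.

```python
from typing import Callable, List, Optional, Tuple


def staged_rollout(
    stages: List[str],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Deploy stage by stage, inspecting at each zone boundary.

    Returns the stages that passed and the stage that failed (or None).
    """
    passed: List[str] = []
    for stage in stages:
        deploy(stage)
        if not is_healthy(stage):
            # Contain the defect here; do not continue to later zones.
            return passed, stage
        passed.append(stage)
    return passed, None


touched: List[str] = []
passed, failed = staged_rollout(
    ["canary", "staging", "production"],
    deploy=touched.append,
    is_healthy=lambda stage: stage != "staging",  # pretend staging fails its check
)
print(passed, failed)  # ['canary'] staging
print(touched)         # ['canary', 'staging']
```

The bug reached staging but never production, which is the whole point of the inspection between zones.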
All repeated work should be standardized and part of a cycle or defined procedure. Any work that occurs out of the cycle destroys continuous flow. Standardizing a cycle or procedure can be through written example or through enforced checks, like a machine not starting if a prerequisite is not met.
An example of how non-standardized work destroys flow is when the standard development process is skipped for a bug fix. Often when I’ve seen this done it means all tracking data for why that change was made is lost and often validation has been skipped. This inevitably leads to regressions later on and very complex bugs.
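The "machine not starting if a prerequisite is not met" style of enforced check can be sketched for the bug fix case above. This is an illustration only; the `Change` fields and the three prerequisites are assumptions I picked, not a standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Change:
    description: str
    ticket_id: str = ""         # where the reason for the change is tracked
    reviewed: bool = False
    tests_passed: bool = False


def missing_prerequisites(change: Change) -> List[str]:
    """List every step of the standard process this change has skipped."""
    missing: List[str] = []
    if not change.ticket_id:
        missing.append("tracking ticket")
    if not change.reviewed:
        missing.append("code review")
    if not change.tests_passed:
        missing.append("passing tests")
    return missing


def release(change: Change) -> str:
    """Refuse to start unless the standard process was followed."""
    missing = missing_prerequisites(change)
    if missing:
        raise RuntimeError("refusing to release: missing " + ", ".join(missing))
    return f"released: {change.description}"


hotfix = Change("quick bug fix", reviewed=True)
print(missing_prerequisites(hotfix))  # ['tracking ticket', 'passing tests']
```

Calling `release(hotfix)` here raises a `RuntimeError`, so even an urgent fix has to go through the same cycle as everything else.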
All components should be in plain view with their status clearly visible. The standardized process should be defined visually within this system. This visual management system should make it easy to assess status, error areas, defect rate, and performance of each area.
Kanban is a great example of a visual management system. In children’s schools, visual management systems are typically well-used to show where each student’s belongings are and what activities each student is doing. Old hotels use key hanging racks to show whether a room is occupied and if the occupant is in.
Genryo management refers to how you can expand or contract your production with efficiency. It allows you to make profits and lower costs as needed. It contains a few sub-concepts:
- Shojinka or Labor Linearity: Adding labor will increase production and decreasing labor will decrease production.
- Capital Linearity: Create a production line such that adding more machinery will increase production and removing machinery will decrease it. This is meant to avoid the initial cost of a lot of machinery leading to over investment and potentially to overproduction (and accumulation… and waste!).
Why would this be good? Consider working on a development team of 5 people that produces 1 feature per month. If I add 5 more people, does that mean I will get 2 features per month? In practice, no, we don’t see this happening unless the two features are completely independent of each other. In an ideal world, this is how it would work. In this example, the effect of adding a new developer is more complex than just adding more work capacity. This illustrates how all the other aspects of kaizen like standardization through documentation, continuous flow, and others need to be in play for genryo management to be effective.
- Shojinka: At the motor vehicle licensing office, adding an attendant to accept applications will result in more processed applications. Removing an attendant will decrease processed applications.
- Capital linearity: This is often seen in scaling services with commodity hardware. Adding more hosts means more customer requests can be processed. Decreasing the number of hosts results in fewer.
A process study will help you understand where you can improve. This involves listing all the steps in your process and tracking all the items that pass through it. By timing all the steps and understanding the touch points of items through the system, you can get an idea of where and how much you can improve. Process studies should be repeated until consistency in timings and performance is reached.
Employees should be motivated to connect to the flow of the production system. They will be able to contribute improvements with their expertise and familiarity that managers and designers will not see. Understand that you can develop abilities in employees that will help with improvement and growth of kaizen. Remember the discipline or sustain aspect of the 5Ss: every employee should be contributing to sustain. Hold kaizen workshops to allow any and all employees to participate regularly. Kaizen isn’t a fix once and forget solution; continually maintain kaizen and grow an improvement culture.
Kaizen: All of this together makes up kaizen.
Takt time: How often to produce a component based on customer demand. Takt time = (available time per shift) / (demand rate per shift). Ex. 8 hour work day / 4 code reviews per day = 2 hour takt time for completing code reviews or a developer averages completing a code review every 2 hours.
Toyota Production System (TPS): a philosophy of lean production founded by Taiichi Ohno
5 Whys: Keep asking ‘why’ until you get past the obvious problems and reach the real root cause of an issue.
Poka-Yoke: Any process or procedure should be designed to be hard to get wrong. This is a design style to reduce defects through human error. Make it hard for mistakes to happen.
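The takt time formula in the glossary above is simple enough to check in a couple of lines. A minimal sketch, with a function name of my own choosing:

```python
def takt_time(available_time: float, demand: float) -> float:
    """Available time per shift divided by the demand per shift.

    The result is in the same time unit as `available_time`.
    """
    if demand <= 0:
        raise ValueError("demand must be positive")
    return available_time / demand


# The code review example: an 8 hour day and 4 reviews demanded per day.
print(takt_time(available_time=8, demand=4))  # 2.0
```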
Other things about this book:
- There are tons of forms and worksheets showing how to implement some of these guidelines in the Appendices