Andon-Triggered Rescheduling: How Real-Time Machine Alerts Should Change Your Production Schedule

Q: What is an andon system in manufacturing and how does it connect to scheduling?

Andon is a lean visual alert system: originally a cord or button that stops the line when a problem is detected, triggering a yellow or red light visible to the whole production area. In a scheduling context, an andon event is any unplanned production interruption: a machine breakdown, quality hold, material shortage, operator absence, or safety stop. The alert itself is just the signal. The scheduling response (how the plan changes in response to lost capacity) is what determines the production outcome.

Q: How long should a breakdown last before triggering a schedule change?

The threshold depends on buffer capacity. If the affected machine has a downstream buffer of 2 hours and the breakdown is resolved in 45 minutes, no schedule change is needed because the buffer absorbs the disruption. If the buffer is not comfortably larger than the estimated downtime (the decision rule later in this post requires the buffer to exceed 1.5 times the estimate), trigger a reschedule. In practice, any breakdown with an unknown resolution time should trigger a conditional reschedule immediately. You can cancel the change if the machine recovers faster than expected, but you cannot recover from a late notification to downstream operations and customers.

Q: Which jobs should be prioritized during schedule recovery after a breakdown?

Priority during recovery follows this hierarchy: (1) jobs with customer commitments due within the remaining shift; (2) jobs that are feeding downstream operations with no buffer; (3) jobs with contractual penalties for lateness; (4) jobs for strategic customers regardless of due date. Jobs with flexible due dates or internal customers should yield capacity during recovery. The scheduler must also consider setup compatibility — prioritizing a job that can run with minimal setup on the recovered or alternate machine preserves more capacity for the recovery window.
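The hierarchy above can be expressed as a sort key. The following is a minimal sketch, not a prescribed implementation: the `Job` fields and the choice to break ties within a tier by setup time are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # All field names are hypothetical; map them to your own job records.
    job_id: str
    due_this_shift: bool = False       # customer commitment due within the remaining shift
    feeds_unbuffered_op: bool = False  # feeds a downstream operation with no buffer
    has_late_penalty: bool = False     # contractual penalty for lateness
    strategic_customer: bool = False
    setup_minutes: float = 0.0         # setup needed on the recovered or alternate machine


def priority_tier(job: Job) -> int:
    """Return 1-4 for the four tiers in the text, 5 for jobs that should yield."""
    if job.due_this_shift:
        return 1
    if job.feeds_unbuffered_op:
        return 2
    if job.has_late_penalty:
        return 3
    if job.strategic_customer:
        return 4
    return 5


def recovery_order(jobs: list[Job]) -> list[Job]:
    # Tier first; within a tier, lower setup time runs earlier (an assumed tie-break).
    return sorted(jobs, key=lambda j: (priority_tier(j), j.setup_minutes))


jobs = [
    Job("J-100", strategic_customer=True, setup_minutes=10),
    Job("J-101", due_this_shift=True, setup_minutes=45),
    Job("J-102", due_this_shift=True, setup_minutes=5),
    Job("J-103"),
    Job("J-104", feeds_unbuffered_op=True, setup_minutes=20),
]
ordered_ids = [j.job_id for j in recovery_order(jobs)]
print(ordered_ids)
```

A job that qualifies for more than one tier is placed in the highest one it matches. A real scheduler would weigh setup compatibility more carefully than a simple tie-break does.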

Q: How do andon events connect to downstream communication and customer notifications?

Effective andon-triggered communication follows a defined escalation protocol. Within the first 15 minutes: supervisor is notified, estimated resolution time is assessed. Within 30 minutes: if resolution is uncertain, scheduler runs impact analysis and identifies affected jobs. Within 60 minutes: operations manager and customer service are notified for any job whose due date is now at risk. Customer notification should happen before the due date is missed — not after. Proactive communication with a recovery date is far less damaging to customer relationships than a surprise late delivery.
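One way to keep this protocol honest is to check which steps are overdue at any point after the event. The sketch below encodes the 15, 30, and 60 minute deadlines from the text; the step keys and function names are hypothetical.

```python
# Deadlines (minutes after the andon event) taken from the protocol described above.
ESCALATION_STEPS = [
    (15, "supervisor", "Notify supervisor and assess estimated resolution time"),
    (30, "scheduler", "Run impact analysis and identify affected jobs"),
    (60, "ops_and_cs", "Notify operations manager and customer service of at-risk jobs"),
]


def required_steps(resolution_uncertain: bool, due_date_at_risk: bool) -> list[tuple[int, str, str]]:
    """Select the steps that apply to this event."""
    steps = [ESCALATION_STEPS[0]]          # the supervisor step always applies
    if resolution_uncertain:
        steps.append(ESCALATION_STEPS[1])  # impact analysis only if resolution is uncertain
    if due_date_at_risk:
        steps.append(ESCALATION_STEPS[2])  # notifications only if a due date is at risk
    return steps


def overdue_steps(
    minutes_elapsed: float,
    completed: set[str],
    resolution_uncertain: bool,
    due_date_at_risk: bool,
) -> list[str]:
    """Return keys of applicable steps whose deadline has passed without completion."""
    return [
        key
        for deadline, key, _ in required_steps(resolution_uncertain, due_date_at_risk)
        if minutes_elapsed > deadline and key not in completed
    ]


# 40 minutes in, only the supervisor step is done: the impact analysis is overdue.
print(overdue_steps(40, {"supervisor"}, resolution_uncertain=True, due_date_at_risk=True))
```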

The andon cord is one of lean manufacturing's most recognized symbols. Pull the cord when you see a problem — defect, machine issue, safety concern, missing material — and the line stops. A light goes on. A supervisor responds within 60 seconds. The problem is addressed before it propagates.

Toyota's andon system is a masterpiece of real-time problem detection and rapid response at the point of production. But it has a gap that most manufacturers who adopt andon overlook: the alert tells you there's a problem at a work center right now. It does not tell you what happens to the jobs that were supposed to run there today, tomorrow, and next week — and it doesn't update the production schedule.

The andon cord stops the line. What stops your delivery commitments from falling apart?

In 35 years of working with manufacturers across aerospace, defense, automotive, and contract manufacturing, User Solutions has observed a consistent pattern: manufacturers invest in lean visual management and get good at detecting and resolving problems at the point of occurrence. Far fewer invest in connecting that detection to schedule recovery — the systematic process of assessing how a disruption affects the plan and regenerating a viable schedule before downstream operations and customers are blindsided.

This post covers the complete andon-to-reschedule workflow: what andon events look like in a scheduling context, how to assess their impact, how to prioritize recovery, and how real-time dashboards like EDGEBI connect visual production management to scheduling response.

What "Andon" Means in a Scheduling Context

In Toyota's original implementation, andon is specifically a line-stop signal for assembly defects or process problems in a high-volume, highly synchronized production environment. In the broader lean manufacturing context that most manufacturers work in, andon refers to any visual alert that signals an unplanned production interruption.

For scheduling purposes, the relevant andon events are:

Machine breakdown: A work center is down — mechanical failure, electrical issue, tooling breakage, coolant leak. Capacity is reduced or eliminated for an unknown duration.

Quality hold: A process problem generates suspect parts that must be quarantined and assessed before production continues. Depending on the scope, this may affect only the current job or all parts produced since the last confirmed-good inspection.

Material shortage: Raw material or a purchased component needed for a scheduled job is not available as expected — supplier delay, receiving discrepancy, or consumption higher than planned.

Operator absence: A key operator — particularly one with specialized skills or machine qualifications not covered by other operators — is unexpectedly unavailable.

Safety stop: An unsafe condition requires work to halt at a work center until the condition is resolved and inspected.

Each of these event types has different characteristics relevant to scheduling: expected duration, whether alternative capacity is available, which specific jobs are affected, and what the communication protocol should be.
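If andon events are going to feed a scheduling process, it helps to record them in a consistent shape. Here is a minimal sketch of such a record, covering the five event types and the scheduling-relevant characteristics listed above. The class and field names are illustrative assumptions, not part of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class AndonEventType(Enum):
    MACHINE_BREAKDOWN = "machine_breakdown"
    QUALITY_HOLD = "quality_hold"
    MATERIAL_SHORTAGE = "material_shortage"
    OPERATOR_ABSENCE = "operator_absence"
    SAFETY_STOP = "safety_stop"


@dataclass
class AndonEvent:
    event_type: AndonEventType
    work_center: str
    raised_at: datetime
    estimated_downtime_min: Optional[float] = None   # None means duration is unknown
    alternate_work_centers: tuple[str, ...] = ()     # other centers able to run the operation
    affected_job_ids: tuple[str, ...] = ()

    @property
    def duration_known(self) -> bool:
        return self.estimated_downtime_min is not None

    @property
    def has_alternate_capacity(self) -> bool:
        return len(self.alternate_work_centers) > 0


event = AndonEvent(
    event_type=AndonEventType.MACHINE_BREAKDOWN,
    work_center="CNC-3",
    raised_at=datetime(2024, 1, 15, 9, 30),
    alternate_work_centers=("CNC-5",),
    affected_job_ids=("J-101", "J-102"),
)
print(event.duration_known, event.has_alternate_capacity)
```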

The Four Steps After an Andon Event

Step 1: Stop and Contain

The first response is the lean stop: halt the affected work center, contain any in-process work (particularly for quality holds), and prevent additional defective parts or at-risk work from advancing downstream. This step happens at the operator and supervisor level — it does not yet involve the scheduler.

The critical scheduling input from this step is the initial time estimate: how long does the first responder expect before the work center is available again? This estimate is always uncertain at the start, but it is the input that determines whether a schedule response is needed immediately or can wait for more information.

Guideline: If the initial estimate is under 30 minutes, monitor but don't reschedule yet — the buffer at most work centers absorbs short interruptions. If the estimate is over 30 minutes or is genuinely unknown, move to Step 2 immediately.

Step 2: Assess Schedule Impact

The scheduler's job begins with a structured impact assessment:

Which jobs are affected? The jobs currently in the queue at the down work center, plus any jobs upstream that were planned to arrive at this work center within the estimated downtime window.

What is the buffer? How much capacity exists between the current state and the moment downstream operations or customer commitments are affected? This buffer has two components: physical WIP buffer downstream of the affected work center, and schedule float (difference between projected completion and due date).

Is alternative capacity available? Are there other work centers that can perform the same operation, and are they available? Alternative routing is the fastest recovery option when it exists.

What is the revised completion date for affected jobs? Running the affected jobs through the available capacity after the work center recovers, accounting for any queue that has built up, gives the honest answer to "when will these be done?"

This assessment should take 15 to 30 minutes for an experienced scheduler with a current capacity model. Without a scheduling system that models capacity constraints, it relies on judgment and experience, takes longer, and produces a less certain result.
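The assessment questions above reduce to a small calculation when the work center is treated as a single queue. The sketch below is a deliberate simplification: it assumes one machine, first-come sequencing, times expressed as minutes from now, and hypothetical field names. It identifies affected jobs, projects completion after recovery, and computes schedule float.

```python
from dataclasses import dataclass


@dataclass
class QueuedJob:
    job_id: str
    arrival_min: float  # minutes until the job reaches the work center (0 = already queued)
    run_min: float      # setup plus run time at this work center
    due_min: float      # minutes until this operation must be complete


def affected_jobs(jobs: list[QueuedJob], downtime_min: float) -> list[QueuedJob]:
    """Jobs queued now or arriving within the estimated downtime window."""
    return [j for j in jobs if j.arrival_min <= downtime_min]


def revised_completions(jobs: list[QueuedJob], downtime_min: float) -> dict[str, tuple[float, float]]:
    """Run jobs in arrival order after recovery; return {job_id: (completion, float)}."""
    clock = downtime_min  # the work center becomes available at this time
    result = {}
    for job in sorted(jobs, key=lambda j: j.arrival_min):
        start = max(clock, job.arrival_min)
        clock = start + job.run_min
        result[job.job_id] = (clock, job.due_min - clock)  # negative float means late
    return result


jobs = [
    QueuedJob("A", arrival_min=0, run_min=60, due_min=240),
    QueuedJob("B", arrival_min=30, run_min=90, due_min=250),
    QueuedJob("C", arrival_min=200, run_min=30, due_min=400),
]
affected_ids = [j.job_id for j in affected_jobs(jobs, downtime_min=120)]
completions = revised_completions(jobs, downtime_min=120)
at_risk = [job_id for job_id, (_, slack) in completions.items() if slack < 0]
print(affected_ids, completions, at_risk)
```

Job C arrives after the work center recovers, so it is not directly affected, but it still starts later than planned because of the queue that built up ahead of it.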

Step 3: Generate Recovery Schedule

With the impact assessment complete, the scheduler generates a recovery schedule — a revised plan that accounts for the lost capacity and sequences the affected jobs in a way that minimizes damage to delivery commitments.

Recovery scheduling involves several simultaneous decisions:

Priority sequencing: Which jobs run first when the work center recovers? The answer is not always "earliest due date first" — setup compatibility, downstream urgency, and customer strategic importance all influence the sequence. A job that requires a 45-minute setup followed by a 2-hour run may yield to a job with a compatible setup that runs for 6 hours, recovering more total capacity in the window.

Alternate routing: Are any affected jobs good candidates for alternate work centers? Routing a job through a slower but available machine may produce a better on-time result than waiting for the primary machine to recover.

Sequence compression: Can any operations be overlapped, shifted to off-hours, or run on an expedited basis to recover lost time? Overtime authorization, weekend running, or parallel operations at other work centers may be available.

Sacrifice decisions: When capacity is genuinely insufficient to meet all commitments, which jobs get the recovered capacity and which are rescheduled? This decision should be made explicitly and communicated proactively — not discovered by a customer who calls asking about a late shipment.
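The alternate routing question in particular comes down to arithmetic: does a slower machine that is available sooner finish before the primary machine would? A minimal sketch follows, with all parameter names and numbers assumed for illustration.

```python
def completion_if_waiting(downtime_min: float, queue_ahead_min: float,
                          setup_min: float, run_min: float) -> float:
    """Minutes from now until the job finishes on the primary machine."""
    return downtime_min + queue_ahead_min + setup_min + run_min


def completion_on_alternate(available_in_min: float, setup_min: float,
                            run_min: float, speed_factor: float) -> float:
    """Minutes from now until the job finishes on the alternate machine.

    speed_factor is the alternate's speed relative to the primary
    (0.75 means it runs at 75% of the primary's rate).
    """
    return available_in_min + setup_min + run_min / speed_factor


def choose_route(wait_min: float, alternate_min: float) -> str:
    return "alternate" if alternate_min < wait_min else "primary"


# Primary is down for an estimated 3 hours; the alternate is free in 20 minutes
# but needs a longer setup and runs at 75% speed.
wait = completion_if_waiting(downtime_min=180, queue_ahead_min=0, setup_min=30, run_min=120)
alt = completion_on_alternate(available_in_min=20, setup_min=45, run_min=120, speed_factor=0.75)
print(wait, alt, choose_route(wait, alt))
```

In this example the alternate finishes at 225 minutes versus 330 minutes on the primary, so the slower machine wins. The comparison ignores the cost of displacing work already planned on the alternate, which a real what-if scenario would need to include.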

Step 4: Communicate and Monitor

The recovery schedule is only useful if it drives action. Communication must flow in three directions:

To the shop floor: Operators at downstream work centers need to know what's arriving and when, so they can sequence their own work appropriately. If Cell B was expecting a transfer from Cell A at 2:00 PM and the new ETA is 4:00 PM, Cell B's scheduler needs to fill that 2-hour gap with something else.

To customer service and sales: Any job whose delivery date has changed needs a proactive notification. The message is: here is the new delivery date, here is what happened, here is what we're doing to recover as much schedule as possible. This conversation is much easier to have before the due date passes than after.

To operations management: The andon event and recovery actions should be logged with the actual downtime, root cause (when known), and schedule impact. This data feeds preventive maintenance decisions, capacity buffer sizing, and continuous improvement prioritization.

How Long Before a Breakdown Triggers a Reschedule?

The threshold for triggering a formal reschedule — rather than absorbing the disruption within existing buffers — depends on buffer inventory at the downstream work center and schedule float on affected jobs.

A simple decision rule:

Buffer time > estimated downtime × 1.5: No reschedule needed. Monitor, and re-evaluate if the estimate changes.
Buffer time ≤ estimated downtime × 1.5: Trigger a conditional reschedule. Run the recovery scenario now; activate it if the situation doesn't resolve within the buffer window.
Downtime unknown: Trigger the reschedule immediately rather than waiting for an estimate. Running without a plan while duration is unknown is the highest-risk option.

The "× 1.5" factor provides margin for resolution time estimates being optimistic — as they usually are. Experienced maintenance teams know that "should be fixed in an hour" frequently becomes two hours.

The cost of an unnecessary reschedule (one you cancel when the machine recovers faster than expected) is low: a scheduler's time and some communication to operations. The cost of a late reschedule (triggered after the downstream work center has starved, or after a customer has received a surprise) is high. When in doubt, trigger earlier.
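The decision rule in this section is simple enough to write down as a function. This is a sketch under the assumptions stated in the text: a 1.5 safety factor, and unknown downtime treated as the immediate-reschedule case. The function name and return labels are hypothetical.

```python
from typing import Optional

SAFETY_FACTOR = 1.5  # margin for optimistic repair estimates, as described above


def reschedule_decision(buffer_min: float, estimated_downtime_min: Optional[float]) -> str:
    """Return 'monitor', 'conditional_reschedule', or 'reschedule_now'.

    estimated_downtime_min=None means the resolution time is unknown.
    """
    if estimated_downtime_min is None:
        return "reschedule_now"
    if buffer_min > estimated_downtime_min * SAFETY_FACTOR:
        return "monitor"
    return "conditional_reschedule"


print(reschedule_decision(120, 45))    # 2-hour buffer, 45-minute estimate
print(reschedule_decision(120, 90))    # same buffer, 90-minute estimate
print(reschedule_decision(120, None))  # duration unknown
```

With a 2-hour buffer, a 45-minute estimate stays in the monitor case because 45 × 1.5 is 67.5 minutes, while a 90-minute estimate crosses the threshold at 135 minutes.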

Connecting Andon to Real-Time Scheduling Software

The traditional andon system is a physical alert — cord, button, light. The information it generates is local and immediate: there's a problem here, right now. Without a bridge to the scheduling system, that information stays local.

Modern manufacturing facilities can extend the andon concept to their scheduling software through real-time production monitoring. When actual production data flows into a scheduling dashboard, deviations from plan become visible at the system level — not just at the work center level.

EDGEBI provides this bridge. By monitoring production actuals against the planned schedule, EDGEBI surfaces work centers where output has stopped or fallen below plan — functioning as a software andon that reaches the scheduler's dashboard, not just the plant floor light stack. The scheduler sees the deviation within minutes of the first sign, rather than discovering it when a downstream work center runs out of parts.
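To make the idea of a software andon concrete, here is a generic sketch of comparing actuals against plan. It is not EDGEBI's API or logic. The record fields and both thresholds (an 80% output ratio and a 15-minute stall window) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class WorkCenterStatus:
    work_center: str
    planned_units: float              # units the schedule expected by now
    actual_units: float               # units reported so far
    minutes_since_last_output: float


def deviations(statuses: list[WorkCenterStatus],
               shortfall_ratio: float = 0.8,
               stall_minutes: float = 15) -> list[tuple[str, str]]:
    """Flag work centers where output has stopped or fallen below plan."""
    alerts = []
    for s in statuses:
        if s.minutes_since_last_output >= stall_minutes:
            alerts.append((s.work_center, "stopped"))
        elif s.planned_units > 0 and s.actual_units / s.planned_units < shortfall_ratio:
            alerts.append((s.work_center, "below_plan"))
    return alerts


statuses = [
    WorkCenterStatus("WC-1", planned_units=100, actual_units=98, minutes_since_last_output=2),
    WorkCenterStatus("WC-2", planned_units=100, actual_units=60, minutes_since_last_output=5),
    WorkCenterStatus("WC-3", planned_units=50, actual_units=50, minutes_since_last_output=25),
]
alerts = deviations(statuses)
print(alerts)
```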

Once the deviation is detected, RMDB provides the rescheduling capability: model the affected jobs against revised capacity, run what-if scenarios for alternate routings and recovery sequences, and generate an updated schedule. The integration of real-time detection and scenario-based rescheduling creates the complete andon-to-reschedule workflow — detection, assessment, recovery planning, and communication — in a structured, repeatable process rather than an ad hoc scramble.

Building Andon Response Into Your Standard Operating Procedure

The most effective andon-triggered rescheduling systems work because the response is a defined process, not a reaction that depends on who happens to be in the building and how experienced they are.

A complete Andon Response SOP includes:

Trigger conditions: What events trigger each response level (monitor vs. conditional reschedule vs. immediate reschedule)?

Notification chain: Who is notified, by what mechanism, within what time window after each event type?

Impact assessment checklist: What does the scheduler assess — affected jobs, buffer, alternates, revised dates — and in what order?

Communication templates: Pre-written templates for shop floor, customer service, and management notifications that can be customized and sent quickly, rather than composed from scratch while the floor is waiting for direction.

Log requirements: What gets recorded for each andon event — timestamp, work center, event type, estimated and actual downtime, root cause, schedule impact, recovery actions taken?
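The log requirements above map naturally onto a structured record. The following sketch shows one possible shape. The class and field names are hypothetical, and the estimate-error helper is an added convenience for checking how optimistic repair estimates tend to be.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class AndonLogEntry:
    timestamp: datetime
    work_center: str
    event_type: str
    estimated_downtime_min: Optional[float]      # None if no estimate was available
    actual_downtime_min: Optional[float] = None  # filled in once the work center recovers
    root_cause: Optional[str] = None             # filled in when known
    schedule_impact: str = ""
    recovery_actions: list[str] = field(default_factory=list)

    def estimate_error_min(self) -> Optional[float]:
        """Actual minus estimated downtime; positive means the estimate was optimistic."""
        if self.estimated_downtime_min is None or self.actual_downtime_min is None:
            return None
        return self.actual_downtime_min - self.estimated_downtime_min


entry = AndonLogEntry(
    timestamp=datetime(2024, 1, 15, 9, 30),
    work_center="CNC-3",
    event_type="machine_breakdown",
    estimated_downtime_min=60,
    actual_downtime_min=110,
    root_cause="tooling breakage",
    schedule_impact="2 jobs delayed, 1 rerouted",
    recovery_actions=["rerouted J-102 to CNC-5", "authorized 2 hours overtime"],
)
print(entry.estimate_error_min())
print(asdict(entry)["work_center"])
```

Tracking the estimate error over many events gives a factual basis for tuning the safety margin applied to downtime estimates.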

The SOP converts andon response from a skill (possessed by experienced individuals) into a system (executable by anyone who follows the process). This is the lean principle of standard work applied to schedule recovery — and it is the difference between a plant that recovers gracefully from disruptions and one that spends days digging out from a cascade of late deliveries after every significant machine problem.

The Lean Connection: Andon Is a Problem-Detection System, Not a Problem-Resolution System

The most important conceptual shift for manufacturers extending andon to scheduling is understanding what the system is and isn't designed to do.

Andon detects and signals problems. It does not resolve them. The resolution comes from the rapid response team (for the root cause) and the scheduler (for the plan). These are two separate workflows that must both function for the plant to recover effectively.

Plants that have strong andon culture but weak scheduling response fix machine problems quickly — and then scramble to figure out what to do with the backlogged jobs. Plants that have good scheduling capability but no real-time detection run well until a disruption hits, then react too slowly. The full lean visual management system requires both: detect problems fast, and respond to their schedule implications with equal speed.

After 35 years of working with manufacturers through machine breakdowns, quality holds, material shortages, and every other form of production disruption, User Solutions has seen this truth clearly: the companies that meet their delivery commitments through adversity are not the ones with the best machines or the most favorable contracts. They're the ones with the best recovery systems — the processes, tools, and protocols that convert a disruption signal into a recovery plan before the customer feels it.

Want to connect real-time production monitoring to your schedule recovery process? Contact User Solutions to see how EDGEBI and RMDB work together to detect production deviations and support rapid rescheduling. Trusted by GE, Cummins, BAE Systems, and leading manufacturers for 35+ years.

For more on lean visual management and scheduling, see our guides on the lean manufacturing glossary, kanban systems in manufacturing, and lean manufacturing in job shops.

User Solutions Team
