How Laravel's Scheduler Locking Actually Works (and the Default You Want for Multi-Server Setups)

The Motivation

You run a Laravel app with a handful of scheduled commands. So far the scheduler has lived on a single instance, where cron gives you exactly one schedule:run per minute by definition. Now you're moving to a fleet of containers, multiple instances, or anything else where more than one process can fire schedule:run at the same time, e.g. for redundancy.

How do you ensure your scheduled commands don't run twice?

Laravel ships two methods to deal with this:

  • ->onOneServer(), which the docs describe as ensuring the task is executed on only one server.
  • ->withoutOverlapping(), which the docs describe as preventing tasks from overlapping.

Do you need both, or just one of them, and if so, which? We did a deep dive into the framework source and ran a multi-server test harness against shared Redis to see exactly what each one does. This post is the writeup.

TL;DR

For minute-frequency commands running on more than one server with shared Redis, this is the default we ship:

$schedule->command(MyCommand::class)
    ->everyMinute()
    ->onOneServer()
    ->withoutOverlapping(10);

Four points worth knowing up front:

  1. You should use both, not either one alone. onOneServer() means "exactly one execution for the same minute". withoutOverlapping() means "don't execute again while still running". They solve related but distinct problems, and a long-running command that spans more than one tick needs both protections to behave correctly across multiple servers. When in doubt, withoutOverlapping() is the more important of the two.
  2. Always pass an explicit TTL to withoutOverlapping($minutes). The default is 1440 minutes (24 hours). A hard kill of the running process leaves the lock stranded for an entire day.
  3. For a single-server topology, just ->withoutOverlapping($min) is enough. There's nothing for onOneServer() to do when only one server runs schedule:run.
  4. There are legitimate cases for skipping one or both. If your command processes a specific minute of data, or kicks off jobs that do, you might want onOneServer() alone. If your code already determines what data needs processing and other locks are in place, you might not need either.

The rest of the post explains why.

How the scheduler locks at all

Laravel's scheduler doesn't have its own lock implementation. It uses whatever your cache.default driver gives it through the Cache\LockProvider contract. With Redis (what we used), the lock is one atomic SET key value EX ttl NX per acquisition, and the value is a random 16-character owner string the framework hands out per process.
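
You can exercise the same primitive directly through the Cache facade; a minimal sketch, with a hypothetical lock name:

use Illuminate\Support\Facades\Cache;

// With the Redis store this is one atomic SET <prefix>demo-lock <owner> EX 60 NX.
$lock = Cache::lock('demo-lock', 60); // hypothetical name, 60-second TTL

if ($lock->get()) {
    try {
        // ... work that only one process should do at a time
    } finally {
        $lock->release(); // deletes the key only if we still own it
    }
}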

For a multi-server setup, the requirements collapse to:

  • Every server points at the same Redis host, the same database, and the same cache.prefix.
  • The cache driver supports atomic locks. Redis, DynamoDB, Memcached, and the database driver do. The file driver's locks live on local disk and are never shared between machines, so a multi-server setup on file cache will silently produce duplicates.
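
Concretely, that means these values (stock Laravel env names; host and prefix here are placeholders) must agree on every server:

CACHE_STORE=redis
CACHE_PREFIX=myapp-cache     # must be identical across the fleet
REDIS_HOST=redis.internal    # the shared Redis host
REDIS_DB=0                   # where locks live (see the next section)
REDIS_CACHE_DB=1             # where cache values live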

One subtlety: scheduler locks live on the default Redis connection

In a fresh Laravel 13 app, config/cache.php configures the Redis cache store with two distinct named connections:

'redis' => [
    'driver' => 'redis',
    'connection' => env('REDIS_CACHE_CONNECTION', 'cache'),
    'lock_connection' => env('REDIS_CACHE_LOCK_CONNECTION', 'default'),
],

Cache values go to cache (typically REDIS_CACHE_DB=1). Locks, including the scheduler's, go to default (typically REDIS_DB=0).

The split is intentional: cache:clear issues FLUSHDB against the cache-values connection, and you don't want that to drop live locks held by the scheduler or anything else using LockProvider.

The practical implication: if you ever go looking for a stuck scheduler lock with redis-cli --scan, you need the REDIS_DB number, not REDIS_CACHE_DB. Easy to miss if you are not aware of that split.
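
For example, with stock env values (REDIS_DB=0) that's:

redis-cli -n 0 --scan --pattern '*framework/schedule*'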

withoutOverlapping(): don't re-enter while still running

The mutex behind withoutOverlapping() (CacheEventMutex.php) uses a time-independent key. The same key is used every tick for the same scheduled command, and looks roughly like framework/schedule-<sha1(expression + command)>. Two ticks one minute apart hash to the same string.
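
In the framework, that key comes from Event::mutexName(), essentially a one-liner (paraphrased here); note that nothing time-dependent goes into the hash:

public function mutexName()
{
    return 'framework/schedule-'.sha1($this->expression.$this->command);
}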

The acquire is a single Redis call:

return $this->redis->set($this->name, $this->owner, 'EX', $this->seconds, 'NX') == true;

The NX flag is the entire point. SET ... NX only writes the key if it doesn't already exist, so it's an atomic compare-and-set across processes. The first server to call it during a contended tick wins, every other server gets false and bails out of Event::run() without spawning the command process.

Two things matter for using it correctly.

The default TTL is 1440 minutes (24 hours). The signature is withoutOverlapping(int $expiresAt = 1440), so calling it with no argument gives you a key that, if not released cleanly, blocks the command for a full day. Always pass an explicit value tuned to roughly twice the expected worst case runtime. We use 10 minutes for most commands, 30 for the longer ones.

Release happens on clean exit and on SIGTERM. Laravel registers a pcntl handler for SIGTERM, SIGINT, and SIGQUIT that releases the lock on the way out, so a normal Kubernetes pod shutdown (within terminationGracePeriodSeconds) cleans up after itself. SIGKILL bypasses this, and the lock stays put until its TTL expires.
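
The framework wires this up for you on schedule:run; if you ever need the same pattern in a long-running process of your own, here's an illustrative sketch (not the framework's implementation; the lock name is hypothetical):

use Illuminate\Support\Facades\Cache;

pcntl_async_signals(true);

$lock = Cache::lock('my-daemon', 600);

if ($lock->get()) {
    // SIGTERM/SIGINT/SIGQUIT get a clean release; SIGKILL never reaches this.
    foreach ([SIGTERM, SIGINT, SIGQUIT] as $signal) {
        pcntl_signal($signal, function () use ($lock) {
            $lock->release();
            exit(0);
        });
    }

    // ... long-running work ...

    $lock->release();
}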

onOneServer(): exactly one execution for the same minute

The mutex behind onOneServer() (CacheSchedulingMutex.php) uses a time-dependent key:

$mutexName = $event->mutexName().$time->format('Hi');

The Hi is a 4-digit hour-minute stamp (1745, 1746, and so on). The key changes every minute, has a hardcoded 1-hour TTL, and is not explicitly released. It just expires.
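
The full acquisition, paraphrased from CacheSchedulingMutex::create(), hardcoded 3600-second TTL included:

public function create(Event $event, DateTimeInterface $time)
{
    return $this->cache->store($this->store)->add(
        $event->mutexName().$time->format('Hi'), true, 3600
    );
}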

So at 17:45, every server tries to acquire ...1745. The first one wins and runs. The losers print a "Skipping ... because the command already ran on another server" notice and move on. At 17:46 the key name is now ...1746, the previous results are irrelevant, and everyone races again.

That's the defining property: onOneServer() is per-minute and throw-away. Once the minute passes, the key still sits in Redis for an hour, and a new race happens at the next tick.

Why long-running commands across servers should use both

This is the part that can be confusing. In a multi-server setup, you usually want both. Four short scenarios:

Scenario A: only onOneServer(), command shorter than a tick. Server A wins minute M, runs in 15 seconds, finishes well before M+1. At M+1 the key changes and the race starts fresh. No issues.

Scenario B: only onOneServer(), command longer than a tick. Server A wins minute M and starts a 90-second command. At M+1 the key is now ...M+1, a fresh race that A or B can win, and nothing prevents A from launching a second copy of the same command on top of the still-running first one. Or B does. onOneServer() alone does not prevent re-entry across minutes. This catches people out: the command is now running on two servers in parallel, just under different Hi minutes. The method is still valid on its own where you intentionally want overlap, e.g. if you kick off jobs each minute that process data for a specific timestamp, but never want the same timestamp processed twice.

Scenario C: only withoutOverlapping(). A wins at M, the key stays held while the 90-second command runs. At M+1, A's pre-check finds the key held and skips. B's pre-check finds the same. Only one instance runs across the window. Works correctly. The downside under contention: two servers can pass the pre-check simultaneously, but only one wins the atomic acquire. The loser does the framework-level startup work (event dispatch, signal handler registration) before silently bailing. Functional but a little noisy.

Scenario D: both. onOneServer() deduplicates the per-minute race cheaply, before the loser does any further work. withoutOverlapping() keeps the cross-tick guard so a long command can't be re-entered while still running. The skip filter on withoutOverlapping() runs first, so if the previous run hasn't finished, nobody even gets to the onOneServer() race.
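
That ordering falls out of how the guard is registered. Paraphrased and abbreviated from Event::withoutOverlapping(): the cheap exists() pre-check is a skip filter evaluated before anything else, while the atomic create() happens later, when the event actually runs:

public function withoutOverlapping($expiresAt = 1440)
{
    $this->withoutOverlapping = true;

    $this->expiresAt = $expiresAt;

    return $this->skip(fn () => $this->mutex->exists($this));
}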

That's the case for shipping both on every minute-frequency multi-server task. A further upside of Scenario D: you can deliberately run multiple scheduler instances so that different commands execute concurrently across servers, which helps keep long-running commands from delaying shorter ones.

What we only saw by instrumenting it

A useful side effect of the test harness was noticing that onOneServer() skips emit no Laravel event. Filter rejections fire ScheduledTaskSkipped, atomic-acquire denials in withoutOverlapping() fire ScheduledTaskStarting and then ScheduledTaskFinished with ~0ms duration, but the onOneServer() denial path is silent. Arguably that's correct behavior: the task for that Hi minute isn't skipped, it's simply not run a second time. For testing purposes, we worked around it with a small decorator on the SchedulingMutex binding that stamps a row to a database table on every create() call, granted or denied. The full source is in the companion repo; it's about 30 lines, and a sketch of its shape follows below.
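
The shape of that decorator, sketched under the assumption of a mutex_observations table (the repo version differs in detail):

use DateTimeInterface;
use Illuminate\Console\Scheduling\CacheSchedulingMutex;
use Illuminate\Console\Scheduling\Event;
use Illuminate\Console\Scheduling\SchedulingMutex;
use Illuminate\Support\Facades\DB;

class ObservedSchedulingMutex implements SchedulingMutex
{
    public function __construct(private SchedulingMutex $inner) {}

    public function create(Event $event, DateTimeInterface $time): bool
    {
        $granted = $this->inner->create($event, $time);

        // Stamp every acquisition attempt, granted or denied.
        DB::table('mutex_observations')->insert([
            'mutex_name' => $event->mutexName().$time->format('Hi'),
            'granted'    => $granted,
            'hostname'   => gethostname(),
            'created_at' => now(),
        ]);

        return $granted;
    }
}

// In a service provider's register(): the Schedule constructor prefers
// a bound SchedulingMutex over the default CacheSchedulingMutex.
$this->app->bind(SchedulingMutex::class, fn ($app) =>
    new ObservedSchedulingMutex($app->make(CacheSchedulingMutex::class)));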

Two findings worth flagging that fall out of running the harness across two servers:

Boot-time skew can split the per-minute key. ScheduleRunCommand captures Date::now() once in its constructor and uses that timestamp for every event in the tick, including the Hi suffix. If two servers start schedule:run a few seconds apart across a minute boundary, server A computes Hi=0111 and server B computes Hi=0112, both acquire (different keys), and both run. We reproduced it with a deliberate few-second stagger. It's a niche failure mode (you need bad clock sync or a launch straddling the boundary), and a reason to keep withoutOverlapping() as the cross-tick guard rather than relying on onOneServer() alone.
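
The whole mechanism is one constructor line (paraphrased from ScheduleRunCommand); every event in the tick reuses this single timestamp:

public function __construct()
{
    $this->startedAt = Date::now();

    parent::__construct();
}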

onOneServer() doubles as task distribution. When several due commands fire in the same tick, the server that loses the race for command 1 immediately tries command 2, and so on. With identical timing the result is effectively round-robin; with different machine speeds it leans toward the faster server. Useful in practice, but it means "onOneServer() was active" doesn't imply "the same server handled everything".

Crash recovery: SIGKILL and the 1440-minute default

The 24-hour default TTL on withoutOverlapping() is the part that bites in production. Concrete sequence we tested:

  1. Server A acquires the withoutOverlapping(5) lock and starts a long command. TTL is 300 seconds.
  2. The pod is hit with kill -9 mid-run. SIGKILL bypasses the framework's signal handlers. No release fires.
  3. The orphan key sits in Redis with the original owner string. Server B's next ticks hit command.skipped for every attempt.
  4. The lock disappears once Redis expires it. With withoutOverlapping(5), recovery takes at most 5 minutes from the kill.

If you'd called withoutOverlapping() with no argument, recovery would have taken 24 hours. That's the entire SLA on getting back to running, and it's set by a default that nothing in the docs nudges you to override.
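
If an orphaned lock is blocking you right now, you don't have to wait out the TTL. Laravel ships a command that forgets the mutexes of all currently defined events, and deleting the key with redis-cli (as shown earlier) works too:

php artisan schedule:clear-cache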

SIGKILL fires more often than you'd think in larger server clusters, especially on EKS. On a graceful EC2 reboot or shutdown, systemd sends SIGTERM and waits TimeoutStopSec (90 s by default) before escalating to SIGKILL. That is long enough for short commands, not long enough for everything. On Kubernetes the same pattern applies at the pod level: kubelet sends SIGTERM, waits terminationGracePeriodSeconds (30 s by default), then SIGKILL; node autoscaling is a common trigger. The OOM killer (system-level on EC2, cgroup-level on EKS when a container exceeds its memory limit) sends SIGKILL directly, with no chance to handle it. And in the worst case, e.g. a hardware failure, hypervisor force-stop, or spot interruption past the deadline, the process gets no signal at all; from the lock's perspective that's identical to SIGKILL.

The two practical rules:

  • Always pass withoutOverlapping($minutes) with an explicit value, sized to roughly twice the longest expected runtime.
  • Set your container shutdown grace period (terminationGracePeriodSeconds on Kubernetes, the equivalent on whatever you use) longer than the longest expected command duration. SIGTERM within the grace period releases locks cleanly through Laravel's signal handler. SIGKILL doesn't.

The full timeline, Redis snapshots, and SQL traces for the crash test are in the companion repo.

The default we ship

For a multi-server topology with shared Redis:

$schedule->command(MyCommand::class)
    ->everyMinute()
    ->onOneServer()
    ->withoutOverlapping(10);

A few notes on tuning:

  • Pick the withoutOverlapping TTL based on the command's expected runtime, not on a round number that feels nice. 2x the normal duration is a reasonable starting point. For commands with widely varying runtimes, start longer and measure the worst case. This is where APM tools like Laravel Nightwatch or New Relic help a lot.
  • Keep cluster clocks tightly synced. chrony or ntpd is fine for normal servers; in containerized environments it's the host clock that matters. Seconds-level skew is survivable when you have withoutOverlapping() as a backstop, but onOneServer() alone doesn't tolerate it.
  • All servers must point at the same Redis instance, the same database, and the same cache.prefix. If you use the cache for anything else, leave lock_connection on default so cache:clear doesn't take live locks down with it.

For a single-server cron host the picture is simpler: ->withoutOverlapping($min) does everything you need. onOneServer() is a no-op when there's only one server.

Wrapping up

onOneServer() and withoutOverlapping() look interchangeable from the outside and aren't. The first one is a per-minute race that uses a key incorporating H:i; the second is a long-lived re-entry guard that uses a time-independent key. For single-server topologies, only the second one matters. For anything with more than one server hitting schedule:run per minute, ship both, and pass an explicit TTL.

The full test harness (custom commands, the SchedulingMutex decorator, observation runs, MySQL schema, Redis traces) is on GitHub: black-bits/laravel-scheduler-locking-test. If you want a second pair of eyes on a multi-server scheduler setup, get in touch.
