Fix the Subtle Bug: Async Rate Limiter with Token Bucket Algorithm

The following Python async rate limiter uses a token bucket algorithm, but it has several subtle bugs that cause it to fail under concurrent usage. Tokens leak, requests occasionally bypass the rate limit, and it can deadlock under high concurrency. Find and fix ALL the bugs, and explain each one.

```python
import asyncio
import time

class TokenBucketRateLimiter:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate  # tokens per second
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.time()
        self._lock = asyncio.Lock()

    def _refill(self):
        now = time.time()
        elapsed = now - self.last_refill
        self.tokens += elapsed * self.rate
        self.tokens = min(self.tokens, self.capacity)
        self.last_refill = now

    async def acquire(self, tokens: int = 1):
        while True:
            async with self._lock:
                self._refill()
                if self.tokens >= tokens:
                    self.tokens -= tokens
                    return True
                wait_time = (tokens - self.tokens) / self.rate
                await asyncio.sleep(wait_time)

    async def __aenter__(self):
        self.acquire()
        return self

    async def __aexit__(self, *args):
        pass

async def process(limiter, task_id):
    async with limiter:
        print(f"Task {task_id} running at {time.monotonic():.2f}")
        await asyncio.sleep(0.1)

async def main():
    limiter = TokenBucketRateLimiter(rate=5, capacity=5)
    tasks = [process(limiter, i) for i in range(20)]
    await asyncio.gather(*tasks)

asyncio.run(main())
```

Identify every bug, explain why it's dangerous, and provide the corrected code. Bonus: suggest any design improvements for production use.

·Anthropic·15.9s·5:11 AM·#1

Token Bucket Rate Limiter: Bug Analysis and Fixes

I've identified 5 critical bugs in this implementation. Let me walk through each one:

Bug #1: __aenter__ doesn't await acquire()

Location: __aenter__ method

async def __aenter__(self):
    self.acquire()  # ❌ This returns a coroutine but doesn't await it!
    return self

Why it's dangerous: The coroutine is created but never executed. The context manager enters immediately without acquiring tokens, completely bypassing the rate limit.

Fix: Add await

async def __aenter__(self):
    await self.acquire()  # ✅
    return self

Bug #2: Time module mismatch

Location: _refill() and __init__ use time.time() (wall-clock time); only the demo's print statement uses time.monotonic()

Why it's dangerous: time.time() can be affected by system clock adjustments (NTP sync, DST changes). This can cause:

  • Negative elapsed time → tokens suddenly decrease
  • Large jumps → tokens leak rapidly
  • Inconsistent behavior across system time changes

Fix: Use time.monotonic() consistently (it's immune to system clock adjustments):

def __init__(self, rate: float, capacity: int):
    self.rate = rate
    self.capacity = capacity
    self.tokens = capacity
    self.last_refill = time.monotonic()  # ✅
    self._lock = asyncio.Lock()

def _refill(self):
    now = time.monotonic()  # ✅
    elapsed = now - self.last_refill
    self.tokens += elapsed * self.rate
    self.tokens = min(self.tokens, self.capacity)
    self.last_refill = now

Bug #3: Sleeping while holding the lock

Location: acquire() awaits asyncio.sleep(wait_time) inside the async with self._lock: block

Why it's dangerous: A task that cannot be served keeps the lock for its entire sleep, so every other task, including ones that could be served from already-available tokens, blocks behind it:

Task A: Acquires lock, sees 0 tokens, needs 1, sleeps 0.2s while still holding the lock
Tasks B-T: All block on the lock for that full 0.2s, even as tokens refill
As concurrency grows, the waits serialize and compound, and throughput collapses: the "deadlock under high concurrency" from the problem statement

Fix: Compute wait_time under the lock, release the lock before sleeping, and loop back to re-check with a fresh refill:

async def acquire(self, tokens: int = 1):
    while True:
        async with self._lock:
            self._refill()
            if self.tokens >= tokens:
                self.tokens -= tokens
                return True
            wait_time = (tokens - self.tokens) / self.rate
        # lock released here
        await asyncio.sleep(wait_time)

Because every iteration re-checks the balance under the lock, a stale wait_time can never cause over-admission; at worst a task wakes and has to wait again (see Bug #4).


Bug #4: Over-sleeping on a stale wait_time

Location: wait_time is computed once under the lock, but conditions change during the sleep

Why it's dangerous: The computed wait assumes the sleeping task is the only consumer. While it sleeps, other tasks can drain the freshly refilled tokens, so it wakes to an empty bucket and must sleep again, possibly many times (starvation under load). Conversely, if tokens become available early, the task still sleeps out its full interval.

Example: With rate=5/sec, a task that calculates a 0.2s wait can wake to find another task has already consumed the refilled token, and has to recompute and sleep again.

Fix: The sleep-and-recheck loop from Bug #3 is already correct, just occasionally wasteful. A cleaner approach uses asyncio.Condition with a timed wait, so the lock is released during the sleep and an explicit release path could wake waiters early:

async def acquire(self, tokens: int = 1):
    # assumes self._cond = asyncio.Condition(self._lock) in __init__
    async with self._cond:
        while True:
            self._refill()
            if self.tokens >= tokens:
                self.tokens -= tokens
                return True
            wait_time = (tokens - self.tokens) / self.rate
            try:
                # wait() releases the lock while sleeping; the timeout
                # caps the wait at the predicted refill time
                await asyncio.wait_for(self._cond.wait(), timeout=wait_time)
            except asyncio.TimeoutError:
                pass  # loop back and re-check with a fresh refill

This is the approach the production implementation below takes. Note that nesting a second async with self._lock inside acquire() would deadlock, since asyncio.Lock is not reentrant.


Bug #5: __aexit__ doesn't release tokens

Location: __aexit__ is a no-op

Why it's dangerous: It depends on what the tokens mean. For a pure rate limiter, a no-op __aexit__ is actually correct: a token bucket meters admission, not occupancy, and refunding tokens on exit would hand back a token for every completed request and defeat the limit. But if tokens are meant to represent concurrent slots (e.g. open connections), never releasing them means capacity is consumed permanently.

Fix: Make the semantics explicit:

async def __aexit__(self, *args):
    # Rate limiting: the no-op is correct -- do NOT refund tokens here.
    pass

If you also need to bound concurrent work, pair the bucket with an asyncio.Semaphore rather than overloading the token count.


Corrected Implementation

import asyncio
import time

class TokenBucketRateLimiter:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate  # tokens per second
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()  # Bug #2: use monotonic
        self._lock = asyncio.Lock()

    def _refill(self):
        now = time.monotonic()  # Bug #2: use monotonic
        elapsed = now - self.last_refill
        self.tokens += elapsed * self.rate
        self.tokens = min(self.tokens, self.capacity)
        self.last_refill = now

    async def acquire(self, tokens: int = 1) -> bool:
        while True:
            async with self._lock:
                self._refill()
                if self.tokens >= tokens:
                    self.tokens -= tokens
                    return True
                # Bug #3/#4: recalculation happens in next loop iteration
                wait_time = (tokens - self.tokens) / self.rate
            
            await asyncio.sleep(wait_time)

    async def __aenter__(self):
        await self.acquire()  # Bug #1: add await
        return self

    async def __aexit__(self, *args):
        pass  # Bug #5: intentional no-op -- tokens meter rate, not occupancy

async def process(limiter, task_id):
    async with limiter:
        print(f"Task {task_id} running at {time.monotonic():.2f}")
        await asyncio.sleep(0.1)

async def main():
    limiter = TokenBucketRateLimiter(rate=5, capacity=5)
    tasks = [process(limiter, i) for i in range(20)]
    await asyncio.gather(*tasks)

asyncio.run(main())

Production-Grade Implementation

For robustness, consider this enhanced version:

import asyncio
import time
from typing import Optional

class TokenBucketRateLimiter:
    """Production-grade token bucket with proper concurrency handling."""
    
    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = float(capacity)
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()
        self._lock = asyncio.Lock()
        self._not_empty = asyncio.Condition(self._lock)

    def _refill(self):
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now

    async def acquire(self, tokens: float = 1.0, timeout: Optional[float] = None) -> bool:
        """Acquire tokens with optional timeout."""
        deadline = None
        if timeout is not None:
            deadline = time.monotonic() + timeout
        
        async with self._not_empty:
            while True:
                self._refill()
                if self.tokens >= tokens:
                    self.tokens -= tokens
                    return True
                
                # Calculate wait time
                wait_time = (tokens - self.tokens) / self.rate
                
                if deadline is not None:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0:
                        return False  # Timeout
                    wait_time = min(wait_time, remaining)
                
                try:
                    await asyncio.wait_for(self._not_empty.wait(), timeout=wait_time)
                except asyncio.TimeoutError:
                    pass  # Loop continues, recalculates
    
    async def __aenter__(self):
        await self.acquire()
        return self
    
    async def __aexit__(self, *args):
        pass

async def main():
    limiter = TokenBucketRateLimiter(rate=5, capacity=5)
    tasks = [process(limiter, i) for i in range(20)]
    await asyncio.gather(*tasks)

async def process(limiter, task_id):
    async with limiter:
        print(f"Task {task_id} at {time.monotonic():.2f}s")
        await asyncio.sleep(0.1)

asyncio.run(main())

Key improvements:

  • Uses asyncio.Condition with a timed wait, so the lock is released while sleeping; nothing calls notify() here, but an explicit release path could use it to wake waiters early
  • Timeout support for non-blocking scenarios
  • Float tokens for sub-second precision
  • Better documentation of intent
