Introduction:

When writing a beacon or any other kind of malware, how we evade detection is often more important than what the malware does or what data it can exfiltrate. One of the toughest parts of evasion is delaying execution. If we simply rely on syscalls like NtDelayExecution() and NtWaitForSingleObject(), EDRs and memory scanning tools can easily spot and terminate our sleeping beacons. In this post, we’ll explore one of the simplest ways to delay execution without using either of these common syscalls, making it harder for EDRs to detect your beacon.

Why Does My Beacon’s Nap Keep Getting Interrupted by EDRs?

The first thing that raises suspicion for EDRs and memory scanning tools is the state of the thread. To better understand why, we need to look at what these syscalls actually do.

NtWaitForSingleObject():

This function waits for a specific object (like a mutex, event, or semaphore) to be signaled. Here’s what happens:

  1. Call to NtWaitForSingleObject():
  • The thread requests to wait for an object (e.g., a file, an event, or a resource lock).
  2. Thread Status Change:
  • The thread’s state changes to “waiting” while it waits for the object to be signaled (released).
  • The thread is not using CPU time while in this waiting state.
  3. Object Signaled:
  • Once the object becomes available (signaled), or if a timeout is specified and the time expires, the thread’s state changes back to “ready” or “running”.
  • The thread resumes execution.

NtDelayExecution():

This function delays the execution of a thread for a specified period of time:

  1. Call to NtDelayExecution():
  • The thread requests to be paused (delayed) for a given duration.
  2. Thread Status Change:
  • The thread’s state changes to “waiting” (a timed wait, with DelayExecution as the wait reason).
  • The thread is paused for the specified duration and does not use CPU time during this period.
  3. Delay Ends:
  • Once the delay expires, the thread’s state changes back to “ready” or “running”.
  • The thread continues execution.

Did you notice? Both of these system calls put the thread into the “waiting” state. Unfamiliar threads sitting in that state are one of the first things EDRs look for, and once they find one they scan the memory behind it.
Now that we know how EDRs catch us, we can figure out a way to avoid detection. We’ll do this by writing our own delay routine that doesn’t put the thread into a wait and doesn’t rely on system calls. The trade-off is that the thread keeps a CPU core busy for the whole delay.

The Clock Is The Key

The easiest technique is to use the system time, which we can read without making any API calls thanks to the KUSER_SHARED_DATA structure. This structure holds various information about the system, but we’re only interested in the time. It is mapped read-only into every process at the fixed address 0x7FFE0000, so all of its members sit at fixed addresses too, and we can read the time directly. The SystemTime member lives at offset 0x14, that is, at 0x7FFE0014. It holds a constantly updated 64-bit value in file time format: the number of 100-nanosecond intervals since January 1, 1601 (UTC).
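To make the layout a bit more concrete, here is a minimal sketch of how the time field can be read. It models the documented KSYSTEM_TIME layout (one low dword and two copies of the high dword) and uses the commonly seen read order, retrying if the two high copies disagree. The names ksystem_time and read_system_time are hypothetical, and the function takes a pointer so it can be tried against an ordinary struct. Treating 0x7FFE0014 as the address to pass on Windows is the assumption carried over from the paragraph above.

```c
#include <stdint.h>

/* Mirrors the documented KSYSTEM_TIME layout (12 bytes). */
typedef struct {
    uint32_t LowPart;    /* low 32 bits of the time */
    int32_t  High1Time;  /* high 32 bits */
    int32_t  High2Time;  /* second copy of the high 32 bits */
} ksystem_time;

/* Hypothetical helper: returns the 64-bit time stored in *t.
 * On Windows the caller would pass (const volatile ksystem_time *)0x7FFE0014
 * (assumption: SystemTime at offset 0x14 of KUSER_SHARED_DATA). */
static uint64_t read_system_time(const volatile ksystem_time *t)
{
    uint32_t low;
    int32_t high;
    do {
        high = t->High1Time;
        low  = t->LowPart;
    } while (high != t->High2Time);  /* copies differ only while an update is in flight */
    return ((uint64_t)(uint32_t)high << 32) | low;
}
```

The volatile qualifier matters here: without it a compiler is free to read the fields once and reuse the stale values.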

For this example, I’ll use assembly, which lets me keep everything in CPU registers instead of going through the stack the way compiler output often does. The same technique can be implemented in other languages like C/C++ or Rust.

High-Level Overview of the Delay Routine:

  • The routine takes one input: the number of seconds to delay execution.
  • It converts the input into file time units by multiplying it by 10,000,000 (the number of 100-nanosecond intervals in a second).
  • It then adds the current time to that value to get a stop time, and repeatedly checks whether the current time has reached or passed it. Once it has, the routine returns.
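
The arithmetic behind these steps can be sketched in isolation. This is only an illustration of the unit conversion and the stop condition, with the clock value passed in as a plain number. The names TICKS_PER_SECOND, compute_deadline and deadline_reached are hypothetical and not part of the routine that follows.

```c
#include <stdint.h>

/* 10,000,000 (0x989680): the number of 100 ns file time units in one second. */
#define TICKS_PER_SECOND 10000000ULL

/* Hypothetical helper: stop time for a delay of `seconds` starting at `now`.
 * The multiplication is done in 64 bits so long delays cannot overflow. */
static uint64_t compute_deadline(uint64_t now, uint32_t seconds)
{
    return now + (uint64_t)seconds * TICKS_PER_SECOND;
}

/* Hypothetical helper: nonzero once the clock has reached or passed the stop time. */
static int deadline_reached(uint64_t now, uint64_t deadline)
{
    return now >= deadline;
}
```

The 64-bit cast is the part that is easy to miss: a 300-second delay is 3,000,000,000 units, which no longer fits in a signed 32-bit integer.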

Here’s the routine in x64 assembly (NASM syntax, Windows calling convention):

section .text
    global DelayExecution

DelayExecution:
    mov rax, 0x989680       ; 10,000,000: the number of 100 ns file time units in one second
    mul rcx                 ; rax = seconds (first argument, in rcx) * 10,000,000; rdx is clobbered
    mov rcx, 0x7FFE0014     ; address of the current system time in KUSER_SHARED_DATA
    add rax, [rcx]          ; rax = stop time = current time + delay

.wait:                      ; "loop" is an instruction mnemonic in NASM, so the label needs another name
    cmp rax, [rcx]          ; compare the stop time with the current time
    ja .wait                ; stop time is still above the current time: keep spinning
    ret                     ; the current time has reached the stop time

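Here’s the equivalent C code:

void DelayExecution(unsigned int seconds) {
    // System time in KUSER_SHARED_DATA; volatile forces a fresh read on every iteration.
    volatile unsigned long long *CurrentTime = (volatile unsigned long long *)0x7FFE0014;
    // 0x989680 = 10,000,000 units of 100 ns per second; multiply in 64 bits to avoid int overflow.
    unsigned long long StopTime = *CurrentTime + (unsigned long long)seconds * 0x989680ULL;

    // Spin until the current time reaches the stop time (same condition as the assembly loop).
    while (*CurrentTime < StopTime) {
    }
}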

Closing Thoughts

While this technique is easy to implement, there are ways to make it even better. Instead of only checking whether we’ve reached the target time, we can add garbage code or function calls that do nothing, just to confuse analysts and EDRs and make it harder for them to understand what’s going on.
Alternatively, we can drop the clock altogether and rely solely on garbage code. In this case, we would use algorithms that take a long time to run, like large matrix operations or complex floating-point calculations. The downside is that we can’t control the exact delay time.