Session Hijacking Detection: A Data Science Approach

Why Data Science is Key for Session Hijacking Detection

Despite expectations that phishing would be a thing of the past by 2025, the Verizon 2025 Data Breach Investigations Report revealed that phishing still accounts for roughly 16% of breach entry points. Its continued prevalence can be attributed to the relative ease with which adversaries can launch phishing attacks. The goal of phishing is often to compromise credentials and session tokens. Once attackers hold either, they can impersonate legitimate users, gain unauthorized access to systems or sensitive data, and move laterally, putting data and operations at risk. Defenders therefore need ways to detect session hijacking early, before stolen credentials and session tokens can be misused.

While capable technologies exist to help detect and prevent user session hijacking attempts, they often fall short, failing to identify edge cases and leaving gaps in protection. We can potentially rely on signal-based detections to catch the activity at a later stage of the attack path, but this approach is neither timely nor guaranteed. As we will discuss later in this blog, these types of attacks are quite difficult to detect via normal means.

Given these challenges, it becomes imperative to explore alternative detection opportunities. This blog will examine how data science can be leveraged to build effective session hijacking detection models, and how BeyondTrust researchers built upon these models to move beyond traditional signals to more effectively detect malicious activity.

How Session Hijacking Works: Credential vs. Token Compromise

Let’s talk about two major types of user identity attacks: Credential Compromise and Token Compromise.

Credential Compromise

Credential compromise represents the most common outcome of phishing-based attacks. In these scenarios, threat actors trick users into entering their credentials into a service controlled by the adversary (for example, malicious login pages that mimic legitimate services). These phishing “lures” are typically delivered via email, SMS, and even voice communications (vishing), each designed to create a sense of urgency or legitimacy. Once the victim enters their username and password, the threat actor captures them and can reuse the credentials to gain unauthorized access, compromising the target account. Many protections exist to prevent this type of attack, such as multi-factor authentication and additional authentication requirements. Although this particular attack can become quite sophisticated, it often requires low effort and minimal cost to execute.

Token Compromise

Compromising a token is a bit more complex than compromising credentials. A session is a series of communications between a client and a server. After authentication occurs, the server typically returns a session token. This token allows the client to maintain authenticated communication with the server for the duration of its validity. Token compromise involves stealing this token and taking over the authenticated communication.

Another key aspect of token compromise is that a stolen session token was issued after the user successfully completed multi-factor authentication (MFA). Replaying the token effectively bypasses MFA, making token compromise considerably more dangerous than a regular credential compromise.

Performing token compromise can be technically complex because it requires an attacker to position themselves between the client and server to intercept communications. However, open-source tools like EvilGinx simplify many of the technical challenges, making it easier for attackers to intercept and potentially steal session tokens.

Why Session Hijacking Prevention Methods Matter

Once the adversary has compromised an identity—by one of the means above—they gain the ability to impersonate that identity on a target service. Attackers commonly target highly privileged users, allowing an adversary to obtain access to sensitive data and privileged actions. Even if an attacker compromises a low-privilege user, they can often perform detailed reconnaissance of the environment to find other opportunities for exploitation.

Why Signal-Based Detection Falls Short Against Session Hijacking

Now let’s go over some common detection rules for these types of attacks and how they often fall short. These detections can apply to both credential compromise and session hijacking. This is not an exhaustive list.

Detection Rule	What It Looks For	Challenges
Reused Session ID detection	Flags tokens used across multiple IPs	False positives from VPNs or user travel
User-Agent anomaly detection	Tracks changes in browser fingerprints	Easily spoofed by attackers
Geo-Location tracking	Detects impossible travel scenarios	Ineffective due to global remote work shift, BYOD, VPNs
New IP Detection	Looks for never-before-seen IPs in the environment	Requires a database of what has been seen
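
To make the first rule in the table concrete, here is a minimal Python sketch of a "reused session ID" check. The event layout (flat dictionaries with `SessionId` and `ClientIp` keys) and the sample values are assumptions for illustration, not a vendor rule.

```python
from collections import defaultdict

def sessions_with_multiple_ips(events):
    """Return {session_id: sorted IPs} for sessions seen from more than one IP."""
    ips_by_session = defaultdict(set)
    for event in events:
        ips_by_session[event["SessionId"]].add(event["ClientIp"])
    return {sid: sorted(ips) for sid, ips in ips_by_session.items() if len(ips) > 1}

# Hypothetical events; the IPs come from documentation-only address ranges.
events = [
    {"SessionId": "s1", "ClientIp": "198.51.100.7"},
    {"SessionId": "s1", "ClientIp": "203.0.113.9"},  # attacker, or just a VPN?
    {"SessionId": "s2", "ClientIp": "192.0.2.44"},
    {"SessionId": "s2", "ClientIp": "192.0.2.44"},
]

flagged = sessions_with_multiple_ips(events)
print(flagged)
```

The rule fires on session `s1` regardless of whether the second IP belongs to an attacker or to the user's own VPN, which is exactly the false-positive problem listed in the table.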

These examples highlight how difficult it is to detect these types of attacks without understanding what is normal in each environment. While these rules form the basis of many traditional session hijacking prevention methods, they often lack precision without contextual, behavior-based modeling. Let’s now explore how we can use data science to build more accurate behavioral profiles and detect truly anomalous activity.

Using Data Science for Session Hijacking Detection: A Behavioral Modeling Approach

To effectively detect session hijacking, we developed a multi-step approach that combines behavioral modeling with longitudinal analysis for anomaly detection. The goal is to identify rare and suspicious deviations in session activity that traditional detection methods may overlook.

Here’s how the process works:

1. Defining Suspicious Session Behavior

To detect potential session hijacking, we first determine the behaviors or properties that typically remain static within a user’s session, such as user agent, IP address, ASN, or location. For greater precision and reduced alert volume, it can be useful to construct a binary composite variable that indicates a simultaneous change in multiple properties. Such a change is likely to be a very rare event.
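
One way to build such a composite variable is sketched below. The property names, the choice to compare each event with the previous event in the same session, and the `min_changed=2` cutoff are all assumptions made for illustration.

```python
# Assumed property names; adjust to whatever the log source provides.
STATIC_PROPERTIES = ("UserAgent", "ClientIp", "Asn", "Country")

def add_composite_change_flag(events, min_changed=2):
    """Flag events where several 'static' properties change within a session.

    `events` must be sorted by time. Each event is compared with the previous
    event in the same session; the first event of a session is never flagged.
    """
    previous_by_session = {}
    flagged = []
    for event in events:
        previous = previous_by_session.get(event["SessionId"])
        changed = [] if previous is None else [
            prop for prop in STATIC_PROPERTIES if event[prop] != previous[prop]
        ]
        flagged.append({
            **event,
            "changed_properties": changed,
            "composite_change": int(len(changed) >= min_changed),
        })
        previous_by_session[event["SessionId"]] = event
    return flagged

# Hypothetical session: baseline, an IP-only change, then a multi-property change.
events = [
    {"SessionId": "s1", "UserAgent": "Mozilla/5.0", "ClientIp": "198.51.100.7", "Asn": 64500, "Country": "US"},
    {"SessionId": "s1", "UserAgent": "Mozilla/5.0", "ClientIp": "198.51.100.8", "Asn": 64500, "Country": "US"},
    {"SessionId": "s1", "UserAgent": "axios/1.7.9", "ClientIp": "203.0.113.9", "Asn": 64501, "Country": "DE"},
]

flags = [e["composite_change"] for e in add_composite_change_flag(events)]
print(flags)
```

Requiring two or more simultaneous changes keeps routine single-property changes, such as a DHCP lease renewal, out of the flagged set.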

2. Quantifying Anomalies with Probability Scores

With the behavior of interest defined, we then quantify how unexpected this behavior is. One way is to estimate the probability for this particular behavior. For instance, in the case of anomalous geo-location changes, we can create a variable representing the percentage of events linked to a particular location, such as city, state, or country. This variable represents the empirical probability of an event being associated with that location. Depending on the type of anomalous behavior we are interested in, this can be calculated at the account level or organization level. Very small probabilities indicate potential anomalies worth investigating further. To make this more concrete, consider the following table:

Account	Country	Number of Events	Probability	Anomaly Score
jdoe123	United States	900	0.9	0.1
jdoe123	Canada	97	0.097	0.903
jdoe123	Germany	3	0.003	0.997

Account jdoe123 has 1000 events of which the majority are from the United States or Canada. A very small number of events are associated with Germany. These can be considered anomalous. In our analysis, the behavior of interest is extremely rare, with a rate of occurrence of approximately 0.0004 or 4 in 10,000 events. The rate of events warranting further investigation is even less, roughly 1 in 30,000 events.
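
The table above can be reproduced with a few lines of Python. This is a minimal sketch that assumes the anomaly score is simply one minus the empirical probability, which matches the numbers shown for jdoe123; the function name is hypothetical.

```python
from collections import Counter

def location_anomaly_scores(event_countries):
    """Empirical probability and anomaly score (1 - probability) per country."""
    counts = Counter(event_countries)
    total = sum(counts.values())
    rows = []
    for country, n_events in counts.most_common():
        probability = n_events / total
        rows.append({
            "country": country,
            "events": n_events,
            "probability": probability,
            "anomaly_score": 1.0 - probability,
        })
    return rows

# The 1000 events for account jdoe123 from the table.
jdoe_events = ["United States"] * 900 + ["Canada"] * 97 + ["Germany"] * 3
rows = location_anomaly_scores(jdoe_events)

for row in rows:
    print(f"{row['country']:<15}{row['events']:>5}{row['probability']:>8.3f}{row['anomaly_score']:>8.3f}")
```

Passing all events for an organization instead of a single account would give the organization-level version of the same score.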

With such a low base rate, we require a way of filtering out less concerning events. We can increase precision by following the same approach for other variables, such as user agent, device, IP address, and ASN. This creates additional complexity: we now have multiple anomaly scores, each measuring a different type of anomalous behavior. We can approach this issue by aggregating these multiple anomaly scores into a single metric, as in the following equation: S꜀ = ƒ(S₁, …, Sₙ).

In other words, scores S₁ through Sₙ are aggregated via some function (ƒ) into a single combined score (S꜀).

Aggregating Multiple Anomaly Scores

There are various approaches to combining multiple variables into a single severity score:

Bonferroni correction: Works for any correlation structure, but the resulting anomaly score is underpowered (i.e. the event is actually rarer than the score would indicate)
Holm-Bonferroni method: A less conservative, iterative variation of the Bonferroni approach
Fisher's method: A technique for combining p-values from independent tests. Details on this and related methods are described in Heard 2017.
Harmonic mean p-value: Method of combining correlated p-values under the assumption of positive dependence.

The most appropriate method depends on the data, particularly the assumed correlation structure, as well as on the analyst’s objective. For instance, the analyst may want to flag all events where there is at least one small score or, conversely, discard any events with a single large score. In the statistics literature, this is basically the distinction between the Family-wise Error Rate and the False Discovery Rate. Whichever approach is used, the basic principle is that we aggregate multiple variables, each representing a different type of anomalous behavior, into a single, combined score representing the overall anomalousness of an event across each of these dimensions. This single score can be used to rank and threshold events according to their severity.
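
Three of the methods listed above can be sketched in plain Python. The sketch assumes each per-property empirical probability (one minus its anomaly score) is treated like a p-value; the function names are hypothetical, the harmonic mean shown is the raw, unadjusted form, and nothing here is claimed to be the exact aggregation used in our model. Fisher's method uses the closed-form chi-square tail probability that exists for even degrees of freedom.

```python
import math

P_FLOOR = 1e-300  # guard against log(0) and division by zero

def combine_bonferroni(p_values):
    """Smallest p-value multiplied by the number of tests, capped at 1."""
    return min(1.0, len(p_values) * min(p_values))

def combine_fisher(p_values):
    """Fisher's method; assumes the individual tests are independent.

    The statistic X = -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, whose upper tail has the closed form
    exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(max(p, P_FLOOR)) for p in p_values)
    tail = sum(half_stat ** i / math.factorial(i) for i in range(k))
    return min(1.0, math.exp(-half_stat) * tail)

def combine_harmonic_mean(p_values):
    """Raw (unweighted, unadjusted) harmonic mean of the p-values."""
    return len(p_values) / sum(1.0 / max(p, P_FLOOR) for p in p_values)

# Hypothetical probabilities for one event: location, user agent, IP address.
event_probabilities = [0.003, 0.02, 0.01]

for name, combine in [
    ("bonferroni", combine_bonferroni),
    ("fisher", combine_fisher),
    ("harmonic mean", combine_harmonic_mean),
]:
    combined_p = combine(event_probabilities)
    print(f"{name:<14} combined p = {combined_p:.6f}  score = {1 - combined_p:.6f}")
```

Bonferroni looks only at the single smallest value, whereas Fisher's method rewards events where several properties are unusual at once, so the choice changes which events rise to the top of the ranking.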

3. Adding Temporal Context with Longitudinal Data

A separate but complementary approach involves analyzing accounts over time, which helps separate behavior that looks anomalous but occurs frequently from truly anomalous behavior. The temporal analysis of account activity is critical in identifying deviations from normal behavior. More detail on this approach can be found in our previous blog post on Longitudinal Data Analysis (LDA).

One effective LDA technique is the use of lagged variables, which represent the number of times a behavior has occurred previously. This helps reduce the number of false positives by providing context about typical account behavior. For example, a within-session change in IP, user agent, and device might be an indicator of suspicious activity. However, by creating a variable that measures how often this behavior has occurred for a given account, we can recognize accounts where such changes are routine.
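
A lagged count of this kind can be computed in a single pass over time-ordered events. The sketch below assumes each event already carries a binary `composite_change` flag and a `UserId`; both field names and the sample accounts are illustrative.

```python
from collections import defaultdict

def add_prior_change_count(events):
    """Add 'prior_changes': how often the behavior occurred earlier for the account.

    `events` must be sorted by time. The count only includes events strictly
    before the current one, so the feature never leaks the current label.
    """
    running_count = defaultdict(int)
    enriched = []
    for event in events:
        account = event["UserId"]
        enriched.append({**event, "prior_changes": running_count[account]})
        running_count[account] += event["composite_change"]
    return enriched

# Hypothetical history: for alice such changes are routine, for bob they are not.
events = [
    {"UserId": "alice", "composite_change": 1},
    {"UserId": "bob", "composite_change": 0},
    {"UserId": "alice", "composite_change": 1},
    {"UserId": "bob", "composite_change": 0},
    {"UserId": "alice", "composite_change": 1},
    {"UserId": "alice", "composite_change": 1},
    {"UserId": "bob", "composite_change": 1},
]

enriched = add_prior_change_count(events)
print(enriched[-2])  # alice's latest change, preceded by three earlier ones
print(enriched[-1])  # bob's first ever change
```

The same change looks very different once the lag is attached: it is the fourth occurrence for alice but the first for bob.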

More formally, the conditional probability of such a change given previous instances of the same behavior can be estimated. Other relevant variables, such as department, organization, application ID, and time since the last event can also be included for more precise estimates. This model can be expressed in the following form: P(yₜ | yₜ₋ₙ, …, yₜ₋₁, Xₜ)

Here, yₜ represents the behavior of interest at the current time, yₜ₋ₙ, …, yₜ₋₁ represent its lagged values over the previous n time periods, and Xₜ represents the current state of any additional covariates.

For identifying anomalous changes within a user’s session, we estimated this conditional probability using a binary classification model. This can be viewed as a probabilistic form of anomaly detection where we are looking for large deviations from the model’s predictions (i.e. large errors where the model has predicted a small probability of an event in cases where the event has occurred). Using the same process described earlier, this probability can be converted into an anomaly score and combined with other anomaly scores.
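
As an illustration of this idea, the sketch below fits a tiny logistic regression that predicts whether a change occurs from a single lagged feature, the number of prior changes for the account. Logistic regression, the training data, and the hyperparameters are assumptions chosen to keep the example dependency-free; the text above does not specify which classifier was used.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, bias, row):
    """Estimated probability that the behavior occurs, given the features."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, row)))

def fit_logistic(X, y, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_rows, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, label in zip(X, y):
            error = predict(weights, bias, row) - label
            grad_b += error
            for j, value in enumerate(row):
                grad_w[j] += error * value
        bias -= learning_rate * grad_b / n_rows
        weights = [w - learning_rate * g / n_rows for w, g in zip(weights, grad_w)]
    return weights, bias

# Hypothetical training data: feature = prior change count, label = change now.
X = [[0]] * 5 + [[1]] * 3 + [[2]] * 3 + [[4]] * 5
y = [0, 0, 0, 0, 1] + [0, 0, 1] + [0, 1, 1] + [1, 1, 1, 1, 0]

weights, bias = fit_logistic(X, y)

# Score an observed change: the less the model expected it, the higher the score.
p_no_history = predict(weights, bias, [0])
p_routine = predict(weights, bias, [4])
score_no_history = 1.0 - p_no_history
score_routine = 1.0 - p_routine
print(f"no prior changes: p={p_no_history:.3f} score={score_no_history:.3f}")
print(f"routine changes:  p={p_routine:.3f} score={score_routine:.3f}")
```

An observed change on an account with no history of such changes receives the higher anomaly score, because the model assigned it a small probability. Additional covariates such as application ID or time since the last event would enter as extra columns in `X`.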

4. Detecting High-Risk Sessions with Combined Scoring

By this point, we’ve calculated individual anomaly scores for multiple session properties, such as IP address, user agent, and device, based on how rare or unexpected they are in the context of a user’s historical behavior. To identify truly high-risk events, we combine these scores into a single, aggregated anomaly score that reflects the overall severity of the session deviation.

This process is visualized in the following diagram. It shows how changes to specific session attributes at the current event time are evaluated, assigned individual scores, and then merged into a single risk score. At the current event time (t), we see 3 anomalous changes have occurred (changes in device name, IP address, and user agent). These individual scores are then combined into a single score. This particular event is flagged as highly concerning due to the presence of anomalous changes across multiple properties.

This general approach is useful in identifying only the most concerning events, such as those events with anomalous changes across many different properties. In the context of session hijacking detection, statistical methods, such as the use of LDA and probability-based approaches to anomaly scoring, help reduce event volume and prioritize incidents by severity, enabling security teams to focus on the most concerning threats.

Building a Session Hijacking Detection Model: Data First

Up to this point, we’ve examined the limitations of traditional methods for detecting anomalous account activity—methods that often require significant effort to maintain. We’ve also explored how data science can enhance detection capabilities. Now, let’s shift our focus to building something practical. Enter Microsoft Unified Audit Logs.

Although Microsoft’s logging landscape can be quite convoluted, it presents fantastic opportunities for discovering valuable data. The Microsoft 365 Unified Audit Log (UAL) is a centralized logging feature that enables organizations to collect user and administrator activities across various Microsoft 365 services, such as Exchange and Entra ID. One reason we chose this log source is that it includes a session identifier, which lets us track and model behavior across a user’s active session.

According to Microsoft, the SessionId field: “Represents a unique identifier for an entire session and is generated when a user does interactive authentication. This ID helps link all authentication artifacts issued from a single root authentication.”

Note: Microsoft has since added a session identifier to the Entra Sign-In Logs! However, when we built our model this was not yet the case, so we relied on the Unified Audit Logs. We plan to extend this research into the Entra Sign-In Logs in the future!

Threat actors frequently attempt to phish users for their Microsoft 365 credentials and tokens, making these logs a valuable source for modeling user behavior. Security researchers and data scientists need to collaborate to identify which fields are worth incorporating as features in the model.

Key fields in the development of a session hijacking detection model

Let’s discuss some of the key fields that will be useful in the development of our model:

Field	Description	Why
AppId	The GUID that represents the application that the user is requesting access to	Needed to baseline user application activity to find deviations
ClientIp	The IP address of the device that was used when the activity was logged	Need to baseline user connectivity activity to find deviations
SessionId	Represents a unique identifier for an entire session	Useful to track a single root authentication.
ErrorCode	For failed logins, this property contains the Azure Active Directory STS (AADSTS) error code. A value of 0 indicates a successful login.	Errors could reveal unusual activity in an account
UserAgent	Information about the user's browser. This information is provided by the browser.	Needed to baseline user browser activity to find deviations
UserId	The user who performed the action	Necessary to understand the subject
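
The sketch below shows one way to load these fields into a typed record. It assumes the audit data has already been normalized into flat JSON objects keyed by the field names in the table; a raw export may be shaped differently. The `SessionEvent` class, the `parse_event` helper, and the sample values are hypothetical.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class SessionEvent:
    user_id: str
    session_id: str
    app_id: str
    client_ip: str
    user_agent: str
    error_code: int

    @property
    def succeeded(self):
        # Per the table, an error code of 0 indicates a successful login.
        return self.error_code == 0

def parse_event(line):
    """Parse one normalized JSON record into a SessionEvent."""
    record = json.loads(line)
    return SessionEvent(
        user_id=record["UserId"],
        session_id=record["SessionId"],
        app_id=record["AppId"],
        client_ip=record["ClientIp"],
        user_agent=record["UserAgent"],
        error_code=int(record["ErrorCode"]),
    )

# Hypothetical record; only the field names follow the table above.
sample_line = (
    '{"UserId": "jdoe123@example.com", '
    '"SessionId": "00000000-0000-0000-0000-000000000001", '
    '"AppId": "4765445b-32c6-49b0-83e6-1d93765276ca", '
    '"ClientIp": "203.0.113.9", '
    '"UserAgent": "axios/1.7.9", '
    '"ErrorCode": "53000"}'
)

event = parse_event(sample_line)
print(event.session_id, event.error_code, event.succeeded)
```

Grouping the parsed events by `session_id` gives the per-session view that the behavioral features are built on.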

With the model features determined, we collaborated to build a model capable of detecting anomalous behavior in the Microsoft 365 user sessions. After several rounds of refinement and tuning, we ended up with a robust ML model. In fact, it even uncovered activity tied to several phishing kits. We’ll dive deeper into that discovery in the next section.

Real-World Results: How Data Science Detected a Phishing Kit Attack

To validate the model’s performance, we began investigating some of the surfaced anomalies to determine if they were malicious. Early into our analysis, we noticed several events where the user-agent was "axios/1.7.9".

Upon further investigation, we found that this user-agent was associated with the Tycoon2FA phishing kit. However, a single indicator wasn’t enough to confirm malicious activity. Pulling on the threat intelligence thread, we cross-referenced additional data and discovered that the ISP linked to this phishing kit matched the activity flagged by our model.

Along with the user-agent, there were some other interesting behaviors tied to the event:

Field	Description
AppId	The AppId we observed was 4765445b-32c6-49b0-83e6-1d93765276ca (OfficeHome). Threat actors commonly target this app when first compromising an account
ClientIp	The IP address observed was tied to an ISP named Global Connectivity Solutions LLP. This ISP has been associated with activity from several phishing-as-a-service (PhaaS) platforms
ErrorCode	The error code logged was 53000. This error code means that the authentication was unsuccessful because Conditional Access failed. Upon further investigation, this was because the attacker’s device was not compliant.
UserAgent	The UserAgent was “axios/1.7.9”. Although axios is a well-known HTTP client library, this was highly unusual: it was the first time this user agent had been seen for the user and the first time it had been seen for the organization.
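
The "first time seen" property noted for the user agent can be checked against historical events with a simple set lookup. This is a minimal sketch, not the model itself; the `OrgId` field, the function name, and the sample history are assumptions for illustration.

```python
def first_seen(history, event):
    """Report whether the event's user agent is new for the user and for the org."""
    agents_for_user = {h["UserAgent"] for h in history if h["UserId"] == event["UserId"]}
    agents_for_org = {h["UserAgent"] for h in history if h["OrgId"] == event["OrgId"]}
    return {
        "new_for_user": event["UserAgent"] not in agents_for_user,
        "new_for_org": event["UserAgent"] not in agents_for_org,
    }

# Hypothetical baseline of previously observed events.
history = [
    {"UserId": "jdoe123", "OrgId": "org1", "UserAgent": "Mozilla/5.0 (Windows)"},
    {"UserId": "asmith", "OrgId": "org1", "UserAgent": "Mozilla/5.0 (Macintosh)"},
]

suspicious = first_seen(history, {"UserId": "jdoe123", "OrgId": "org1", "UserAgent": "axios/1.7.9"})
benign = first_seen(history, {"UserId": "jdoe123", "OrgId": "org1", "UserAgent": "Mozilla/5.0 (Macintosh)"})
print(suspicious)
print(benign)
```

A user agent that is new for one user but common across the organization is much less interesting than one that is new for both, which is why the two flags are kept separate.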

While static IOC-based detections might have caught this activity, they would have required prior knowledge of the specific indicators and would have left the system open to evolving attacks. Our model, however, identified this event as anomalous without relying on predefined rules, allowing us to uncover highly suspicious activity that might have otherwise been overlooked.

Why Data Science Is Essential for Modern Session Hijacking Detection

Credential compromise and token compromise are among the most prevalent user identity attacks, yet they remain some of the hardest threats to defend against using static detections alone. While static detections can be effective when armed with a predefined list of indicators of compromise (IOCs), they fall short in the face of rapidly evolving threats. In today's dynamic cybersecurity landscape, where threats can rise and fall with little warning, static detections are slow to adapt, and adversaries exploit this gap.

The real advantage comes from integrating data science into detection engineering. By leveraging machine learning and behavioral analytics, we can surface anomalies that traditional detection methods would otherwise miss. Static detections rely on known patterns; data science enables us to uncover the unknown.

As documented in the previous section, BeyondTrust uncovered activity linked to several phishing kits. This real-world example proves that data science isn’t just a luxury—it’s a must-have tool for any defender. Adversaries adapt. So must we.

Ready to see advanced session hijacking detection in action? BeyondTrust Identity Security Insights® delivers real-time anomaly detection powered by data science—helping you stop session hijacking before it leads to compromise. Request a demo to learn more, or click here for a free red-team assessment of your identity infrastructure.

FAQ: Session Hijacking and Data Science in Cybersecurity

Attackers hijack user sessions by stealing valid session tokens—often through phishing or man-in-the-middle attacks. Once a session token is compromised, the attacker can impersonate the user without needing credentials or triggering MFA, making detection difficult.

Session hijacking involves stealing a token after authentication, allowing attackers to bypass MFA. Session fixation tricks a user into authenticating with a token the attacker already knows. Both exploit session management flaws but require different prevention methods.

Signal-based tools rely on rules like detecting new IPs or geo-anomalies, which can lead to false positives due to VPNs, travel, or BYOD. These tools often miss subtle, malicious session changes that only behavioral models can catch.

Data science models analyze session behavior over time to identify anomalies across factors like location, IP, user agent, and device. These models calculate anomaly scores to flag rare or suspicious patterns that static detection rules may overlook.

BeyondTrust researchers used Microsoft Unified Audit Logs (UAL), focusing on fields like SessionId, AppId, ClientIp, UserAgent, and ErrorCode. These fields help track user behavior within sessions and build high-precision anomaly detection models.

Yes. BeyondTrust’s model flagged activity tied to the Tycoon2FA phishing kit, including anomalous user-agent strings and known phishing infrastructure. The detection worked without relying on static IOCs, proving the effectiveness of a data science-based approach.

About the Author

Kyle Barboza

Sr Security Researcher

Kyle is a Senior Security Researcher at BeyondTrust, where he investigates emerging threats and develops high-impact detections. With over eight years of experience in cybersecurity, he has led efforts to mature detection programs and optimize detection engineering pipelines across diverse environments. Certified across multiple domains, Kyle is recognized for his technical depth and dedication to operational excellence.

Darren Maynard

Sr Data Scientist

Darren Maynard is a Senior Data Scientist at BeyondTrust with over a decade of experience in data science and machine learning. His work has focused on solving complex problems involving large datasets and rare event detection in both industry and government. He holds a master’s degree in statistics and previously worked within the defense sector.

Phantom Labs™

BeyondTrust

BeyondTrust Phantom Labs™ believes the best way to fully understand cybersecurity threats is to work closely with our customers and partners, conducting real world research into the attacks that matter most to them. By dissecting emerging attack methods and exploitation techniques of threat actors, as well as conducting novel research, the team’s mission is to help organizations defend against identity threats.
