# LOOKUP JOIN

The `LOOKUP JOIN` command combines data from a query results table with matching records from a specified lookup index. It adds fields from the lookup index as new columns to the results table based on matching values in the join field. This is particularly useful for enriching or correlating data across multiple indices, such as logs, IPs, user IDs, or hosts.

## Syntax

`LOOKUP JOIN <lookup_index> ON <join_condition>`

### Parameters

#### lookup_index

The name of the lookup index. This must be a specific index name; wildcards, aliases, and remote cluster references are not supported. Indices used for lookups must be created with the `index.mode` setting set to `lookup`.

#### join_condition

Can be one of the following:

- A single field name
- A comma-separated list of field names, for example `<field1>, <field2>, <field3>`
- An expression with one or more predicates linked by `AND`, for example `<left_field1> >= <lookup_field1> AND <left_field2> == <lookup_field2>`. Each predicate compares a field from the left index with a field from the lookup index using a binary operator (`==`, `>=`, `<=`, `>`, `<`, `!=`). Each field name in the join condition must exist in only one of the indices. Use `RENAME` to resolve naming conflicts.
- An expression that includes full text functions and other Lucene-pushable functions, for example `MATCH(<lookup_field>, "search term") AND <left_field> == <lookup_field>`. These functions can be combined with binary operators and logical operators (`AND`, `OR`, `NOT`) to create complex join conditions. At least one condition that relates a lookup index field to a field on the left side of the join is still required.

When joining on a single field or a field list, each field must exist in both the current query results and the lookup index. If a join field contains multi-valued entries, those entries will not match anything (the added fields will contain `null` for those rows).

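The multi-value behavior above can be sketched with a query that expands the join field before the join, so that each row carries a single value. This is a minimal illustration only: the index names `app_logs` and `hosts_lookup` and the field `host.name` are hypothetical and do not appear elsewhere on this page.

```esql
// Hypothetical indices and field, used only to illustrate the idea.
FROM app_logs
// Emit one row per value of host.name so that each row has a single value.
| MV_EXPAND host.name
// Join on the now single-valued field.
| LOOKUP JOIN hosts_lookup ON host.name
```

Note that `MV_EXPAND` increases the number of rows, so later aggregations count each expanded value separately.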

### Syntax Examples

```
LOOKUP JOIN <lookup_index> ON <field_name>

LOOKUP JOIN <lookup_index> ON <field_name1>, <field_name2>, <field_name3>

LOOKUP JOIN <lookup_index> ON <left_field1> >= <lookup_field1> AND <left_field2> == <lookup_field2>

LOOKUP JOIN <lookup_index> ON MATCH(<lookup_field>, "search term") AND <left_field> == <lookup_field>
```

## Query Examples

### Example 1: Enriching Firewall Logs with Threat Data

This example demonstrates how to enrich firewall logs with threat data from a lookup index.

#### Sample Data Setup

##### Create the `threat_list` index

```console
PUT threat_list
{
  "settings": {
    "index.mode": "lookup"
  },
  "mappings": {
    "properties": {
      "source.ip": { "type": "ip" },
      "dest.ip": { "type": "ip" },
      "threat_level": { "type": "keyword" },
      "threat_type": { "type": "keyword" },
      "last_updated": { "type": "date" }
    }
  }
}
```

##### Create the `firewall_logs` index

```console
PUT firewall_logs
{
  "mappings": {
    "properties": {
      "timestamp": { "type": "date" },
      "source.ip": { "type": "ip" },
      "destination.ip": { "type": "ip" },
      "action": { "type": "keyword" },
      "bytes_transferred": { "type": "long" }
    }
  }
}
```

##### Add sample data to `threat_list`

```console
POST threat_list/_bulk
{"index":{}}
{"source.ip":"203.0.113.5","threat_level":"high","threat_type":"C2_SERVER","last_updated":"2025-04-22", "dest.ip":"10.0.0.100"}
{"index":{}}
{"source.ip":"198.51.100.2","threat_level":"medium","threat_type":"SCANNER","last_updated":"2025-04-23", "dest.ip":"10.0.0.44"}
```

##### Add sample data to `firewall_logs`

```console
POST firewall_logs/_bulk
{"index":{}}
{"timestamp":"2025-04-23T10:00:01Z","source.ip":"192.0.2.1","destination.ip":"10.0.0.100","action":"allow","bytes_transferred":1024}
{"index":{}}
{"timestamp":"2025-04-23T10:00:05Z","source.ip":"203.0.113.5","destination.ip":"10.0.0.55","action":"allow","bytes_transferred":2048}
{"index":{}}
{"timestamp":"2025-04-23T10:00:08Z","source.ip":"198.51.100.2","destination.ip":"10.0.0.200","action":"block","bytes_transferred":0}
{"index":{}}
{"timestamp":"2025-04-23T10:00:15Z","source.ip":"203.0.113.5","destination.ip":"10.0.0.44","action":"allow","bytes_transferred":4096}
{"index":{}}
{"timestamp":"2025-04-23T10:00:30Z","source.ip":"192.0.2.1","destination.ip":"10.0.0.100","action":"allow","bytes_transferred":512}
```

#### Query the Data Using a Field Common to the Source and Lookup Indices

```esql
FROM firewall_logs
| LOOKUP JOIN threat_list ON source.ip
| WHERE threat_level IS NOT NULL
| SORT timestamp
| KEEP source.ip, action, threat_level, threat_type
| LIMIT 10
```

This query:
- Matches the `source.ip` field in `firewall_logs` with the `source.ip` field in `threat_list`.
- Filters rows to include only those with non-null `threat_level`.
- Sorts the results by `timestamp`.
- Keeps only the `source.ip`, `action`, `threat_level`, and `threat_type` fields.
- Limits the output to 10 rows.

#### Response

| source.ip     | action | threat_level | threat_type |
|---------------|--------|--------------|-------------|
| 203.0.113.5   | allow  | high         | C2_SERVER   |
| 198.51.100.2  | block  | medium       | SCANNER     |
| 203.0.113.5   | allow  | high         | C2_SERVER   |

In this example, the `source.ip` field from `firewall_logs` is matched with the `source.ip` field in `threat_list`, and the corresponding `threat_level` and `threat_type` fields are added to the output.

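Once the threat fields have been added, they can be used like any other column. As a sketch, the following variation of the query above summarizes the matched events per threat level; the column names `event_count` and `total_bytes` are illustrative choices, not part of the sample indices.

```esql
FROM firewall_logs
| LOOKUP JOIN threat_list ON source.ip
// Keep only events whose source IP appears in threat_list.
| WHERE threat_level IS NOT NULL
// event_count and total_bytes are illustrative output column names.
| STATS event_count = COUNT(*), total_bytes = SUM(bytes_transferred) BY threat_level
| SORT event_count DESC
```

With the sample data above, this should return two rows: `high` with 2 events and 6144 bytes, and `medium` with 1 event and 0 bytes.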

#### Query the Data Using Fields with Different Names in the Source and Lookup Indices

```esql
FROM firewall_logs
| LOOKUP JOIN threat_list ON destination.ip == dest.ip
| WHERE threat_level IS NULL
| SORT timestamp
| KEEP destination.ip, action, bytes_transferred
```

This query:
- Matches the `destination.ip` field in `firewall_logs` with the `dest.ip` field in `threat_list`.
- Filters rows to include only those with a null `threat_level`, which here means rows whose destination IP has no matching entry in `threat_list`.
- Sorts the results by `timestamp`.
- Keeps only the `destination.ip`, `action`, and `bytes_transferred` fields.

#### Response

| destination.ip | action | bytes_transferred |
|----------------|--------|-------------------|
| 10.0.0.55      | allow  | 2048              |
| 10.0.0.200     | block  | 0                 |

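The expression form requires each field name in the join condition to exist in only one of the indices. Both sample indices contain `source.ip`, so an expression that references it needs a `RENAME` first. The following is a sketch of that step, assuming the expression form is available in your version; the name `client.ip` is a hypothetical choice made for this illustration.

```esql
FROM firewall_logs
// client.ip is a hypothetical name; after the rename, source.ip exists only in threat_list.
| RENAME source.ip AS client.ip
| LOOKUP JOIN threat_list ON client.ip == source.ip
| WHERE threat_level IS NOT NULL
| SORT timestamp
| KEEP client.ip, action, threat_level
```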

## Limitations

`LOOKUP JOIN` currently has the following limitations:
- Indices in `lookup` mode are always single-sharded.
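- Joins on a field name or a field list match on equality only. Other comparison operators are available only through the expression form of the join condition.
- Each `LOOKUP JOIN` command joins against a single lookup index. Some versions also allow only a single match field, so confirm that multi-field and expression conditions are available in your version.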
- Wildcards, aliases, date math, and data streams are not supported.
- The query may trip a circuit breaker if there are too many matching documents in the lookup index or if the documents are too large. `LOOKUP JOIN` processes data in batches of approximately 10,000 rows, which can require significant heap space for large matching documents.
- Cross-cluster `LOOKUP JOIN` cannot be used after aggregations (`STATS`), `SORT` or `LIMIT` commands, or coordinator-side `ENRICH` commands.

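Because the join processes rows in batches, one possible mitigation for memory pressure is to filter and trim the left side before the join. The query below is a sketch of that idea using the sample indices from Example 1. It is a general suggestion, not a documented guarantee, and whether it helps depends on your data.

```esql
FROM firewall_logs
// Reduce the number of rows that reach the join.
| WHERE action == "allow"
// Drop columns that are not needed downstream.
| KEEP timestamp, source.ip, action
| LOOKUP JOIN threat_list ON source.ip
| WHERE threat_level IS NOT NULL
```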