Quickstart

file_re mirrors the shape of re, so existing regex patterns and call sites translate directly. The one new concept is max_span_lines, which controls how much of the file is held in memory while scanning.

Basic search

from pathlib import Path
from file_re import file_re

log = Path("server.log")

match = file_re.search(r"ERROR: (?P<msg>.+)", log)
if match:
    print(match.group("msg"))

file_re.search returns a Match or None, just like re.search().

Capture groups and named groups

from file_re import file_re

match = file_re.search(
    r"(?P<user>[\w.\-]+)@(?P<domain>[\w.\-]+)",
    "contacts.txt",
)
if match:
    print(match.group("user"), match.group("domain"))
    print(match.groupdict())
    print(match.groups())

Non-participating groups are reported as None (matching re).
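For reference, this is the behavior of the standard library's re that the note above refers to. The sketch uses only stdlib re on an in-memory string. The optional port group is a hypothetical addition for illustration and is not part of the pattern above.

```python
import re

# Stdlib re only. The optional "port" group is added here purely to
# demonstrate a group that does not participate in the match.
pattern = re.compile(
    r"(?P<user>[\w.\-]+)@(?P<domain>[\w.\-]+)(?::(?P<port>\d+))?"
)

match = pattern.search("contact: alice@example.com")

groups = match.groups()      # positional view of every group
named = match.groupdict()    # named view of every group

print(groups)
print(named)
```

The port group is defined in the pattern but absent from the input, so it appears as None in both views rather than being omitted.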

findall and finditer

from file_re import file_re

phones = file_re.findall(r"(\d{3})-(\d{3})-(\d{4})", "contacts.txt")

for match in file_re.finditer(r"\bERROR\b", "server.log"):
    print(match.span(), match.group())

finditer yields matches lazily and is the recommended choice when the number of matches is large or unknown.
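The difference between the two calls is easiest to see with the standard library's re on a small in-memory string. This sketch shows re's return shapes only. That file_re returns the same shapes is an assumption based on the statement that it mirrors re.

```python
import re

text = "Ann 555-123-4567\nBob 555-987-6543\n"
phone = re.compile(r"(\d{3})-(\d{3})-(\d{4})")

# findall builds the whole result list up front. With several groups in
# the pattern, each entry is a tuple of the group values.
all_at_once = phone.findall(text)

# finditer returns an iterator. Nothing is matched until it is advanced,
# and each item is a full Match object with span() and group().
lazy = phone.finditer(text)
first = next(lazy)

print(all_at_once)
print(first.span(), first.group())
```

Because finditer produces one Match at a time, a loop can stop early without paying for the remaining matches.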

Compiled patterns

For patterns reused across many files, compile once and reuse:

import re
from file_re import file_re

pattern = file_re.compile(r"error: (\w+)", flags=re.IGNORECASE)

# handle() is a placeholder for your own per-match processing function.
for log in ("server1.log", "server2.log", "server3.log"):
    for match in pattern.finditer(log):
        handle(match)

Flags

file_re accepts the same flag bitmask as re:

import re
from file_re import file_re

match = file_re.search(r"error", "server.log", flags=re.IGNORECASE | re.MULTILINE)

See Flags for details and divergences from re.
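As a reminder of what the combined bitmask means, here is how the same two flags behave in the standard library's re. This sketch describes stdlib re only. Any divergence in file_re is covered on the Flags page and is not shown here.

```python
import re

text = "boot ok\nError: disk full\n"

# IGNORECASE makes "error" match "Error". MULTILINE lets ^ and $ anchor
# at every line boundary instead of only at the ends of the input.
both = re.search(r"^error: (.+)$", text, flags=re.IGNORECASE | re.MULTILINE)

# Without MULTILINE, ^ can only match at the very start of the text,
# where the line is "boot ok", so the search fails.
ignorecase_only = re.search(r"^error: (.+)$", text, flags=re.IGNORECASE)

print(both.group(1))
print(ignorecase_only)
```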

Compressed files

.gz and .xz files are decoded transparently; no special call is needed:

from file_re import file_re

matches = file_re.findall(r"(\d{3})-(\d{3})-(\d{4})", "logs/2026-04.log.gz")
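For comparison, this is roughly the manual route that transparent decoding replaces, written with only the standard library (gzip and re). It is a sketch of the stdlib equivalent, not a description of how file_re decodes files internally. The sample file name and contents are made up for the example.

```python
import gzip
import re
import tempfile
from pathlib import Path

phone = re.compile(r"(\d{3})-(\d{3})-(\d{4})")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "contacts.log.gz"

    # Create a small gzip-compressed sample file to search.
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        fh.write("Ann 555-123-4567\nBob 555-987-6543\n")

    # Manual route: open a decompressed text stream, read it, then search.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        matches = phone.findall(fh.read())

print(matches)
```

The stdlib lzma module offers a similar lzma.open for .xz files.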

Large files

For files that do not fit comfortably in memory, pass max_span_lines:

from file_re import file_re

# Stream line by line. Matches cannot cross a newline.
for match in file_re.finditer(r"ERROR", "huge.log", max_span_lines=1):
    handle(match)

# Slide a 5-line window. Patterns may span up to 5 lines.
for match in file_re.finditer(
    r"BEGIN TXN\n(?:.*\n){0,3}END TXN",
    "huge.log",
    max_span_lines=5,
):
    handle(match)

See Large files and max_span_lines for the full guide, including the multiprocessing pattern used for 50 GB log scans.
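To make the sliding-window idea concrete, here is a minimal pure-Python sketch of how a bounded window of lines can be searched without loading the whole input. It is an illustration only and not file_re's actual implementation. The function name iter_window_matches and its return shape are hypothetical.

```python
import re
from collections import deque
from itertools import islice


def iter_window_matches(lines, pattern, max_span_lines):
    """Yield (absolute_start, text) for matches spanning at most
    max_span_lines lines, holding only that many lines in memory."""
    it = iter(lines)
    window = deque(islice(it, max_span_lines))
    window_start = 0  # absolute offset of the first character in the window
    resume_at = 0     # absolute offset where the next match may begin
    while window:
        buf = "".join(window)
        first_len = len(window[0])
        pos = max(resume_at - window_start, 0)
        # Only report matches that begin on the first line of the window;
        # later lines get their turn once the window slides forward.
        while pos < first_len:
            m = pattern.search(buf, pos)
            if m is None or m.start() >= first_len:
                break
            yield window_start + m.start(), m.group()
            # Step past the match (one extra character for empty matches).
            pos = m.end() if m.end() > m.start() else m.end() + 1
            resume_at = window_start + pos
        # Slide the window forward by one line.
        window_start += first_len
        window.popleft()
        nxt = next(it, None)
        if nxt is not None:
            window.append(nxt)


sample = [
    "noise\n",
    "BEGIN TXN\n", "step 1\n", "step 2\n", "END TXN\n",
]
txn = re.compile(r"BEGIN TXN\n(?:.*\n){0,3}END TXN")
found = list(iter_window_matches(sample, txn, max_span_lines=5))
print(found)
```

Any iterable of lines works as input, including an open file object, so memory use is bounded by the window size rather than the file size. With max_span_lines=1 the window is a single line, which is why matches cannot cross a newline in that mode.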