Quickstart
file_re mirrors the interface of the standard re module, so existing
regex patterns and call sites translate directly. The one new concept is
max_span_lines, which controls how much of the file is held in memory
while scanning.
Basic search
from pathlib import Path
from file_re import file_re
log = Path("server.log")
match = file_re.search(r"ERROR: (?P<msg>.+)", log)
if match:
    print(match.group("msg"))
file_re.search returns a Match object or None, just
like re.search().
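For comparison, the standard-library contract that this refers to can be seen on an in-memory string. The sketch below uses only re; the sample text is made up for illustration and no file_re call is involved.

```python
import re

# Made-up sample text standing in for the contents of a log file.
text = "INFO: started\nERROR: disk full\nINFO: stopped\n"

hit = re.search(r"ERROR: (?P<msg>.+)", text)
miss = re.search(r"FATAL: (?P<msg>.+)", text)

# A successful search returns a Match object; a failed one returns None,
# which is why the result is guarded with `if` before calling .group().
if hit:
    print(hit.group("msg"))
print(miss)
```

Because None is falsy, the `if match:` guard used above is enough to skip files with no hit.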
Capture groups and named groups
from file_re import file_re
match = file_re.search(
    r"(?P<user>[\w.\-]+)@(?P<domain>[\w.\-]+)",
    "contacts.txt",
)
if match:
    print(match.group("user"), match.group("domain"))
    print(match.groupdict())
    print(match.groups())
Non-participating groups are reported as None (matching re).
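A group is non-participating when it sits in an optional part of the pattern that did not take part in the match. As a minimal sketch of the re behaviour being matched, using plain re on made-up host strings (the pattern and inputs are illustrative, not part of file_re):

```python
import re

# The port group is optional, so it does not participate when absent.
pattern = re.compile(r"(?P<host>[\w.\-]+)(?::(?P<port>\d+))?")

with_port = pattern.fullmatch("example.org:8080")
without_port = pattern.fullmatch("example.org")

print(with_port.groupdict())
print(without_port.groupdict())
print(without_port.groups())
```

In the second match, the port entry is None in both groupdict() and groups(), so callers should check for None before converting a captured value.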
findall and finditer
from file_re import file_re
phones = file_re.findall(r"(\d{3})-(\d{3})-(\d{4})", "contacts.txt")
for match in file_re.finditer(r"\bERROR\b", "server.log"):
    print(match.span(), match.group())
finditer yields matches lazily and is the recommended choice when the
number of matches is large or unknown.
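The return shapes are worth seeing once. In the standard re module, findall with several capture groups returns a list of tuples, while finditer returns an iterator of Match objects. The sketch below shows this with plain re on a made-up string; that file_re returns the same shapes is an assumption based on its stated goal of mirroring re.

```python
import re

# Made-up sample text standing in for a contacts file.
text = "Ann 555-867-5309, Bob 555-123-4567"
pattern = r"(\d{3})-(\d{3})-(\d{4})"

# With several capture groups, findall builds the full list of tuples up front.
phones = re.findall(pattern, text)

# finditer returns an iterator; matches are produced as it is advanced.
matches = re.finditer(pattern, text)
first = next(matches)

print(phones)
print(first.span(), first.group())
```

Stopping after the first few results from finditer avoids building a list of every match, which is the reason it suits large or unknown match counts.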
Compiled patterns
For patterns reused across many files, compile once and reuse:
import re
from file_re import file_re
pattern = file_re.compile(r"error: (\w+)", flags=re.IGNORECASE)
for log in ("server1.log", "server2.log", "server3.log"):
    for match in pattern.finditer(log):
        handle(match)  # handle() stands in for your own processing
Flags
file_re accepts the same flag bitmask as re:
import re
from file_re import file_re
match = file_re.search(r"error", "server.log", flags=re.IGNORECASE | re.MULTILINE)
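Note that re.MULTILINE only changes the meaning of ^ and $, so it matters once the pattern is anchored. A small sketch with plain re on a made-up string shows the difference; it illustrates the standard flag semantics and does not call file_re.

```python
import re

# Made-up sample text standing in for the contents of a log file.
text = "info: ok\nError: disk full\nERROR: timeout\n"

# Without MULTILINE, ^ anchors only at the very start of the text.
plain = re.findall(r"^error", text, flags=re.IGNORECASE)

# With MULTILINE, ^ also anchors immediately after every newline.
combined = re.findall(r"^error", text, flags=re.IGNORECASE | re.MULTILINE)

print(plain)
print(combined)
```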
Compressed files
.gz and .xz files are decoded transparently; no special call is
needed:
from file_re import file_re
matches = file_re.findall(r"(\d{3})-(\d{3})-(\d{4})", "logs/2026-04.log.gz")
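To see what this saves you, here is a rough sketch of doing the same by hand with the standard library. The helpers open_text and findall_in_file are hypothetical names introduced for this example, and choosing the decoder by file suffix is an assumption; it is not a description of how file_re is implemented.

```python
import gzip
import lzma
import re
import tempfile
from pathlib import Path

# Hypothetical mapping from file suffix to a stdlib opener.
OPENERS = {".gz": gzip.open, ".xz": lzma.open}


def open_text(path):
    """Open a plain, .gz, or .xz file in text mode."""
    opener = OPENERS.get(Path(path).suffix, open)
    return opener(path, "rt", encoding="utf-8")


def findall_in_file(pattern, path):
    """Decode the whole file into memory and run re.findall on it."""
    with open_text(path) as handle:
        return re.findall(pattern, handle.read())


# Demo on a throwaway gzip file with made-up contents.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "contacts.txt.gz"
    with gzip.open(sample, "wt", encoding="utf-8") as out:
        out.write("Ann 555-867-5309\nBob 555-123-4567\n")
    phones = findall_in_file(r"(\d{3})-(\d{3})-(\d{4})", sample)

print(phones)
```

This version reads the entire decoded file into memory, so it is only suitable for small inputs.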
Large files
For files that do not fit comfortably in memory, pass
max_span_lines:
from file_re import file_re
# Stream line by line. Matches cannot cross a newline.
# handle() stands in for your own processing.
for match in file_re.finditer(r"ERROR", "huge.log", max_span_lines=1):
    handle(match)
# Slide a 5-line window. A match may span up to 5 lines.
for match in file_re.finditer(
    r"BEGIN TXN\n(?:.*\n){0,3}END TXN",
    "huge.log",
    max_span_lines=5,
):
    handle(match)
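The idea of a sliding line window can be sketched in plain Python. The function windowed_finditer below is a hypothetical illustration written for this page, not file_re's implementation: it keeps at most max_span_lines lines in a deque, reports only matches that start on the first line of the current window, and tracks a resume offset so matches never overlap. Anchors such as ^ and $ are evaluated against the window rather than the whole file, which is one way a real implementation may differ.

```python
import re
from collections import deque


def windowed_finditer(pattern, lines, max_span_lines):
    """Yield (start, end, text) for matches spanning at most max_span_lines lines.

    Only the current window is held in memory. Offsets are character
    offsets from the start of the input.
    """
    regex = re.compile(pattern)
    window = deque()
    source = iter(lines)
    exhausted = False
    base = 0    # absolute offset of the first line in the window
    resume = 0  # absolute offset where the next match may start

    while True:
        # Top the window up to max_span_lines lines.
        while not exhausted and len(window) < max_span_lines:
            line = next(source, None)
            if line is None:
                exhausted = True
            else:
                window.append(line)
        if not window:
            return

        text = "".join(window)
        first_end = len(window[0])
        pos = max(resume - base, 0)

        # Report matches that start on the first line of the window.
        while pos < first_end:
            match = regex.search(text, pos)
            if match is None or match.start() >= first_end:
                break
            yield base + match.start(), base + match.end(), match.group()
            # Step past the match; an empty match advances by one character.
            pos = max(match.end(), match.start() + 1)

        resume = max(resume, base + pos)
        base += first_end
        window.popleft()


# Made-up log lines; a real caller would pass an open file object instead.
log = [
    "INFO boot\n",
    "BEGIN TXN\n",
    "debit 10\n",
    "credit 10\n",
    "END TXN\n",
    "INFO idle\n",
]
pattern = r"BEGIN TXN\n(?:.*\n){0,3}END TXN"

spans = list(windowed_finditer(pattern, log, max_span_lines=5))
too_small = list(windowed_finditer(pattern, log, max_span_lines=3))
print(spans)
print(too_small)
```

With a 5-line window the 4-line transaction is found; with a 3-line window it is missed, which is the trade-off max_span_lines expresses: a larger value allows longer matches at the cost of more memory per window.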
See "Large files and max_span_lines" for the full guide, including the multiprocessing pattern used for 50 GB log scans.