Sift Reference

Complete documentation

Sift operates in two modes: as a command-line tool for direct text processing, and as an MCP server exposing 78 tools for AI agents. This reference covers both. I’ve organized it by subsystem, since that’s how I think about the tool when I’m actually using it.


CLI Modes

--for "QUERY"

Execute SQL on standard input. Each line becomes a row in the lines table with line_number and content columns.

cat data.csv | sift --for "SELECT content FROM lines WHERE line_number > 1"
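The `lines` model is easy to reproduce for experimentation. The Python sketch below is not Sift's implementation. It loads text into an in-memory SQLite table with the same two columns and runs a query against it. The helper name `query_lines` is hypothetical. The 1-based numbering is also an assumption, though it fits the `line_number > 1` header-skipping idiom above.

```python
import sqlite3


def query_lines(text: str, sql: str) -> list[tuple]:
    """Load text into a `lines` table (one row per line) and run sql on it.

    Hypothetical helper emulating Sift's model; 1-based numbering is assumed.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE lines (line_number INTEGER PRIMARY KEY, content TEXT)"
        )
        conn.executemany(
            "INSERT INTO lines (line_number, content) VALUES (?, ?)",
            enumerate(text.splitlines(), start=1),
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


data = "name,age\nalice,30\nbob,25\n"
# Skip the CSV header row, as in the shell example above.
print(query_lines(data, "SELECT content FROM lines WHERE line_number > 1"))
# prints [('alice,30',), ('bob,25',)]
```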

--dig --for "QUERY"

Search across multiple files. Reads file paths from stdin, indexes their contents with FTS5, and exposes files, lines, and search_fts tables.

find . -name "*.c" | sift --dig --for \
          "SELECT filepath, content FROM search_fts WHERE content MATCH 'malloc NEAR free'"
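Here is a rough model of what `--dig` builds, written in Python against SQLite's FTS5 extension (present in most CPython builds). Sift's actual `files`, `lines`, and `search_fts` schemas aren't documented here, so this sketch makes two assumptions:

- There is one FTS5 row per line, keyed by file path.
- Input is a hypothetical `{path: text}` mapping rather than paths read from stdin.

The queries stick to `AND` and `NOT`.

```python
import sqlite3


def dig(files: dict[str, str], match: str) -> list[tuple[str, str]]:
    """Index {filepath: text} one row per line in FTS5, then run a MATCH query.

    Hypothetical emulation; Sift itself reads file paths from stdin.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # filepath is stored alongside each line but not tokenized.
        conn.execute(
            "CREATE VIRTUAL TABLE search_fts USING fts5(filepath UNINDEXED, content)"
        )
        for path, text in files.items():
            conn.executemany(
                "INSERT INTO search_fts (filepath, content) VALUES (?, ?)",
                ((path, line) for line in text.splitlines()),
            )
        return conn.execute(
            "SELECT filepath, content FROM search_fts WHERE content MATCH ? "
            "ORDER BY filepath, rowid",
            (match,),
        ).fetchall()
    finally:
        conn.close()


sources = {
    "a.c": "char *p = malloc(8);\nfree(p);\nchar *q = malloc(4); free(q);\n",
    "b.c": "int unused;\n",
}
print(dig(sources, "malloc AND free"))
# prints [('a.c', 'char *q = malloc(4); free(q);')]
```

Because each line is its own row in this model, `AND` only matches when both terms share a line.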

--refine FILE --for "QUERY"

Transform an entire file. The query must return a content column; the file is rewritten with the results.
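The following sketches the `--refine` contract, assuming the file's lines are loaded into the same `lines` table: the query's `content` column becomes the new file, row by row. The helper `refine` is hypothetical. It works on a string rather than writing FILE, and it assumes the output ends with a trailing newline.

```python
import sqlite3


def refine(text: str, sql: str) -> str:
    """Rebuild a whole file from the `content` column that sql returns.

    Hypothetical emulation of --refine; rows are joined with newlines.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE lines (line_number INTEGER PRIMARY KEY, content TEXT)"
        )
        conn.executemany(
            "INSERT INTO lines VALUES (?, ?)", enumerate(text.splitlines(), start=1)
        )
        cur = conn.execute(sql)
        names = [d[0] for d in cur.description]
        if "content" not in names:
            raise ValueError("query must return a content column")
        idx = names.index("content")
        return "".join(f"{row[idx]}\n" for row in cur)
    finally:
        conn.close()


# Drop blank lines and upper-case the rest.
print(refine("a\n\nb\n", "SELECT upper(content) AS content FROM lines "
             "WHERE content != '' ORDER BY line_number"), end="")
```

Because the query's rows are the entire new file, a `WHERE` clause deletes lines and `ORDER BY` fixes their order. Without `ORDER BY`, row order is left to SQLite.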

--pick FILE --for "QUERY"

Surgical line editing. The query must return line_number and content columns; only those lines are modified.
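Unlike `--refine`, lines the query does not return survive untouched. Here is a minimal Python sketch of that rule. The hypothetical helper `pick` stands in for Sift, and 1-based line numbers are assumed.

```python
import sqlite3


def pick(text: str, sql: str) -> str:
    """Rewrite only the lines the query returns; all others pass through.

    Hypothetical emulation of --pick: sql must yield line_number and
    content columns (names.index raises ValueError if either is missing).
    """
    lines = text.splitlines()
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE lines (line_number INTEGER PRIMARY KEY, content TEXT)"
        )
        conn.executemany("INSERT INTO lines VALUES (?, ?)", enumerate(lines, start=1))
        cur = conn.execute(sql)
        names = [d[0] for d in cur.description]
        num_i, text_i = names.index("line_number"), names.index("content")
        for row in cur:
            n = row[num_i]
            if not 1 <= n <= len(lines):
                raise ValueError(f"line_number {n} is out of range")
            lines[n - 1] = row[text_i]  # 1-based line number -> list index
    finally:
        conn.close()
    return "".join(f"{line}\n" for line in lines)


src = "x = 1\ny = foo()\nz = foo()\n"
print(pick(src, "SELECT line_number, replace(content, 'foo', 'bar') AS content "
                "FROM lines WHERE content LIKE '%foo%'"), end="")
```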

--sweep --for "QUERY"

Batch editing across multiple files read from stdin.

--drop-after N FILE / --drop-before N FILE

Insert content read from stdin after (--drop-after) or before (--drop-before) line N of FILE.
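This sketch shows the insertion arithmetic, assuming 1-based line numbers:

- `--drop-after N` places the new lines after line N.
- `--drop-before N` places them before it.

`drop` is a hypothetical helper. How Sift treats N = 0 or N past the end of the file isn't documented, so this version simply rejects those values.

```python
def drop(lines: list[str], n: int, new: list[str], before: bool = False) -> list[str]:
    """Insert new lines after (default) or before 1-based line n.

    Hypothetical model of --drop-after / --drop-before; only n in
    1..len(lines) is accepted here.
    """
    if not 1 <= n <= len(lines):
        raise ValueError(f"line {n} is out of range 1..{len(lines)}")
    cut = n - 1 if before else n  # list index where the new lines start
    return lines[:cut] + new + lines[cut:]


doc = ["#include <stdio.h>", "int main(void) {", "}"]
print(drop(doc, 2, ['    puts("hi");']))
print(drop(doc, 1, ["// header"], before=True))
```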

--peek FILE

Display a file with line numbers.

--quarry [ACTION]

Manage the persistent workspace index. Actions: init, status, refresh, rebuild.

--mcp

Run as an MCP server, exposing all tools via JSON-RPC over stdio.

Output Options

--grain FORMAT sets output format: plain, tsv, csv, json, ndjson, grep.

--count outputs only the row count. --head N and --tail N limit output.

--shake previews changes without writing. --diff shows unified diff output.
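Together, `--shake` and `--diff` preview an edit as a unified diff. Sift's exact header format isn't specified here. This Python sketch uses the standard library's `difflib` to show what that format looks like. The `a/` and `b/` path prefixes are a git-style assumption, and `preview` is a hypothetical helper.

```python
import difflib


def preview(path: str, old: str, new: str) -> str:
    """Return a unified diff of a proposed change without touching any file."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


print(preview("greet.txt", "hello\nworld\n", "hello\nthere\n"), end="")
```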

Search Tools (2)

sift_search

FTS5 full-text search, 30-195x faster than grep. Supports boolean queries (AND, OR, NOT, NEAR), prefix matching, and file glob filtering. Auto-initializes the workspace index on first use.

sift_search(pattern: "malloc AND free", files: "*.c", context: 3)
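The `context` parameter presumably works like `grep -C`: each hit is shown with that many surrounding lines. Here is a small Python sketch of that windowing under that assumption. The helper `with_context` is hypothetical, and it merges overlapping windows so each line appears only once.

```python
def with_context(lines: list[str], hits: list[int], context: int) -> list[tuple[int, str]]:
    """Return (line_number, text) for each hit plus `context` lines either side.

    Line numbers are 1-based; overlapping windows are merged.
    """
    keep: set[int] = set()
    for hit in hits:
        lo = max(1, hit - context)
        hi = min(len(lines), hit + context)
        keep.update(range(lo, hi + 1))
    return [(n, lines[n - 1]) for n in sorted(keep)]


text = [f"line {i}" for i in range(1, 11)]
print(with_context(text, [2, 4], context=1))
```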

sift_workspace

Manage the workspace index. Actions: init (create), status (check info), refresh (update changed files), rebuild (full reindex).

File Tools (5)

sift_read

Read a file with line numbers. Supports start_line and end_line for partial reads. Large files stream automatically.

sift_write

Create or overwrite a file. Creates parent directories automatically.

sift_update

Find/replace with fuzzy whitespace matching. Fails if old_string isn’t found or isn’t unique (unless replace_all: true).
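The docs don't define "fuzzy whitespace matching" further, so this Python sketch assumes the simplest reading. Each run of whitespace in `old_string` matches any run of whitespace (including newlines) in the file, and everything else must match literally. `fuzzy_update` is a hypothetical helper that mirrors the documented failure rules for missing and non-unique matches.

```python
import re


def fuzzy_update(text: str, old: str, new: str, replace_all: bool = False) -> str:
    """Find/replace where whitespace runs in `old` match any whitespace run.

    Raises ValueError if `old` is absent, or matches more than once
    without replace_all.
    """
    tokens = old.split()
    if not tokens:
        raise ValueError("old_string is empty")
    pattern = re.compile(r"\s+".join(re.escape(t) for t in tokens))
    count = len(pattern.findall(text))
    if count == 0:
        raise ValueError("old_string not found")
    if count > 1 and not replace_all:
        raise ValueError(f"old_string matches {count} times; pass replace_all")
    # A function replacement keeps any backslashes in `new` literal.
    return pattern.sub(lambda _m: new, text, count=0 if replace_all else 1)


code = "if (x  ==\n    y) {\n"
print(fuzzy_update(code, "if (x == y)", "if (x != y)"), end="")
```

In this reading, whitespace can only stretch where `old_string` already has some. Searching for `if(x` would not match `if (x`.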

sift_edit

File editing with multiple modes.

sift_batch

Execute multiple edit operations atomically. All succeed or all fail. Supports delete_lines, replace_range, insert_after, insert_before, append, prepend, and replace actions.
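A common way to get all-or-nothing behavior is to apply every operation to an in-memory copy, then replace the file with a single `os.replace` call. This Python sketch assumes that approach, since Sift's actual mechanism isn't stated. It makes three further assumptions:

- Only three of the seven documented actions are modeled.
- The field names (`start`, `end`, `line`, `text`) are hypothetical.
- Each operation sees the line numbering left by the previous one.

```python
import os
import tempfile


def apply_batch(path: str, ops: list[dict]) -> None:
    """Apply every op to an in-memory copy, then swap the file in one step.

    Hypothetical op format; if any op fails, the exception propagates
    before the file on disk is touched.
    """
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    for op in ops:
        action = op["action"]
        if action == "delete_lines":
            start, end = op["start"], op["end"]  # 1-based, inclusive
            if not 1 <= start <= end <= len(lines):
                raise ValueError(f"bad range {start}-{end}")
            del lines[start - 1:end]
        elif action == "insert_after":
            n = op["line"]
            if not 0 <= n <= len(lines):
                raise ValueError(f"bad line {n}")
            lines[n:n] = op["text"].splitlines()
        elif action == "append":
            lines.extend(op["text"].splitlines())
        else:
            raise ValueError(f"unsupported action: {action}")
    # Write to a temp file in the same directory, then atomically replace.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write("".join(line + "\n" for line in lines))
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

Keeping the temp file in the same directory matters. `os.replace` is only atomic within a single filesystem.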

SQL Tools (2)

sift_sql

Execute SQL on text input. Query the lines table, which has line_number and content columns. Supports standard SQL plus the CSV, regex, and encoding functions listed under SQL Functions.

sift_sql(input: "a,b\n1,2", sql: "SELECT csv_field(content, 0) FROM lines")

sift_transform

SQL-based file transformation. Modifies the file in place based on SQL query results.

Memory Tools (38)

The memory subsystem stores persistent knowledge across sessions.

Core CRUD

sift_memory_add creates a memory of one of seven types: task, note, plan, step, pattern, gotcha, or preference.

sift_memory_get, sift_memory_update, sift_memory_archive, sift_memory_list handle retrieval and management.

sift_memory_search provides FTS5 search with automatic synonym expansion and relevance scoring based on access frequency and recency.
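The ranking formula isn't published. This Python sketch only illustrates the two ideas named above, under invented assumptions:

- A hypothetical synonym table expands each query term into an FTS5 `OR` group.
- A score multiplies text relevance by a log-damped access-count boost and an exponential recency decay.

The weights, the 30-day half-life, and all names are illustrative, not Sift's.

```python
import math

# Hypothetical synonym table; Sift's actual expansion list is not documented.
SYNONYMS = {"auth": ["authentication", "login"], "bug": ["defect", "issue"]}


def expand_query(query: str) -> str:
    """Turn each term into an FTS5 OR-group of the term and its synonyms."""
    groups = []
    for term in query.split():
        alts = [term, *SYNONYMS.get(term.lower(), [])]
        groups.append("(" + " OR ".join(alts) + ")" if len(alts) > 1 else term)
    return " AND ".join(groups)


def score(text_rank: float, access_count: int, age_days: float,
          half_life_days: float = 30.0) -> float:
    """Blend text relevance with access frequency and recency (illustrative)."""
    frequency = 1.0 + math.log1p(access_count)  # diminishing returns per access
    recency = 0.5 ** (age_days / half_life_days)  # halves every half-life
    return text_rank * frequency * recency


print(expand_query("auth bug"))
# prints (auth OR authentication OR login) AND (bug OR defect OR issue)
```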

Synthesis

sift_memory_synthesize consolidates multiple memories into a single synthesis. sift_memory_expand retrieves the original sources from a synthesis.

Decisions

sift_memory_decide records a decision with question, answer, and rationale. sift_memory_decisions queries past decisions. sift_memory_supersede replaces a decision with a new one.

Reflections

sift_memory_reflect logs reasoning, observations, or corrections. sift_memory_reflections queries past reflections. sift_memory_reflect_trajectory captures insights about chains of memories over time.

Dependencies

sift_memory_link and sift_memory_unlink create relationships (blocks, related, parent). sift_memory_deps queries blockers. sift_memory_ready finds tasks with no open blockers.
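Finding ready tasks reduces to a simple check: a task is ready if it is open and every task that blocks it is closed. Here is a Python sketch of that rule. It uses hypothetical `status` and `blocks` structures in place of Sift's memory links.

```python
def ready_tasks(status: dict[str, str], blocks: list[tuple[str, str]]) -> list[str]:
    """Return open tasks whose blockers are all closed, sorted by id.

    status maps task id -> "open" or "done"; each (a, b) pair in blocks
    means a blocks b. Both shapes are hypothetical.
    """
    blockers: dict[str, list[str]] = {}
    for a, b in blocks:
        blockers.setdefault(b, []).append(a)

    def unblocked(task: str) -> bool:
        # A blocker only counts while it is still open.
        return all(status.get(x) != "open" for x in blockers.get(task, []))

    return sorted(t for t, s in status.items() if s == "open" and unblocked(t))


tasks = {"schema": "done", "api": "open", "ui": "open", "docs": "open"}
links = [("schema", "api"), ("api", "ui")]
print(ready_tasks(tasks, links))
# prints ['api', 'docs']
```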

Graph Analysis

sift_memory_network explores memory topology in four modes: hubs (most connected), neighbors (direct connections), cluster (related memories), bridges (connecting separate areas).
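If links are treated as undirected edges, three of these modes map onto standard graph operations:

- Hubs are the highest-degree nodes.
- Neighbors are a node's adjacency set.
- A cluster can be read as the connected component around a node.

This Python sketch assumes those interpretations (Sift may define clusters differently) and leaves out bridges.

```python
from collections import deque


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency sets from link pairs."""
    graph: dict[str, set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def hubs(graph: dict[str, set[str]], k: int) -> list[str]:
    """The k most connected nodes, ties broken by name."""
    return sorted(graph, key=lambda n: (-len(graph[n]), n))[:k]


def cluster(graph: dict[str, set[str]], start: str) -> set[str]:
    """Every node reachable from start, found by breadth-first search."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


g = build_graph([("m1", "m2"), ("m1", "m3"), ("m1", "m4"), ("m5", "m6")])
print(hubs(g, 1), sorted(g["m2"]), sorted(cluster(g, "m2")))
# prints ['m1'] ['m1'] ['m1', 'm2', 'm3', 'm4']
```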

sift_memory_traverse walks the memory chain. sift_memory_origin finds the root of a chain. sift_memory_context generates rich session context.

Challenge

sift_memory_challenge searches for counterevidence to a claim by generating adversarial queries. sift_memory_challenge_evidence retrieves detailed results.

Fingerprints

sift_fingerprint_generate creates a fingerprint capturing engagement patterns. sift_fingerprint_load loads it at session start. sift_fingerprint_compare shows evolution between fingerprints. sift_fingerprint_drift detects deviation from baseline.

Maintenance

sift_memory_stats returns counts, active patterns, preferences, and corrections. sift_memory_stale finds old memories. sift_memory_cache_status shows eviction candidates. sift_memory_config and sift_memory_tune adjust ranking weights. sift_memory_backups and sift_memory_restore handle backup management.

Context Tools (10)

The context subsystem preserves conversation history.

sift_context_session manages sessions (start, end, get, current). sift_context_save stores messages and tool calls. sift_context_search provides FTS5 search across all conversations. sift_context_query allows raw SQL queries. sift_context_link connects messages to memories. sift_context_synthesize creates session summaries. sift_context_archive moves old sessions to cold storage. sift_context_stats returns database statistics. sift_context_stale finds sessions needing consolidation. sift_context_memory retrieves conversation context for a specific memory.

Web Tools (9)

Crawl and cache documentation for instant local search.

sift_web_crawl crawls a website respecting robots.txt and sitemaps. sift_web_fetch retrieves a single URL. sift_web_search performs FTS5 search on cached content. sift_web_query allows SQL queries on the pages table. sift_web_stats and sift_web_manifest show database information. sift_web_refresh updates stale pages. sift_web_search_multi searches across multiple databases. sift_web_merge combines databases.

sift_web_crawl(url: "https://docs.example.com", max_pages: 100)
sift_web_search(db: "docs.db", query: "authentication AND oauth")
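Sift's crawler internals aren't documented. This Python sketch only shows what respecting robots.txt means in practice, using the standard library's `urllib.robotparser` on an inline example file with no network access. The user-agent string and URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Inline example robots.txt; a real crawler would fetch it from the site root.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

AGENT = "example-docs-crawler"  # hypothetical user-agent string
for url in ("https://docs.example.com/guide/", "https://docs.example.com/private/keys"):
    print(url, "allowed" if parser.can_fetch(AGENT, url) else "disallowed")
print("crawl delay (s):", parser.crawl_delay(AGENT))
```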

Repository Tools (5)

Clone and index git repositories for code search.

sift_repo_clone clones a repository and indexes it into a SQLite database. sift_repo_search performs FTS5 search. sift_repo_query allows raw SQL on the repo_files table. sift_repo_stats shows repository statistics. sift_repo_list lists indexed repositories.

sift_repo_clone(url: "https://github.com/org/repo")
sift_repo_search(db: "repo.db", query: "error AND handling", language: "c")

Hardware Tools (7)

Monitor resources and adapt under pressure.

sift_hardware_status returns multi-dimensional resource state (memory, I/O, process metrics) with state levels (normal, elevated, critical, survival). sift_hardware_patterns shows learned tool access patterns. sift_hardware_events retrieves logged resource events.

sift_budget_request requests a resource budget before expensive operations. sift_budget_stats shows budget utilization.

sift_stream_read and sift_stream_close handle streaming for large results.

SQL Functions

Regex (PCRE2)

regex_match(pattern, text) returns 1 if pattern matches.

regex_replace(pattern, text, replacement) performs substitution.

regex_extract(pattern, text, group) extracts capture groups.
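To experiment with the same call shapes outside Sift, the functions can be registered as SQLite user-defined functions from Python. This is a sketch, not Sift's implementation. It uses Python's `re` module rather than PCRE2, so advanced PCRE2 syntax may not carry over. Returning NULL for NULL input is also an assumption.

```python
import re
import sqlite3


def regex_match(pattern, text):
    # 1 if pattern matches anywhere in text, 0 otherwise; NULL stays NULL.
    if text is None:
        return None
    return int(re.search(pattern, text) is not None)


def regex_replace(pattern, text, replacement):
    # Replace every match; Python's re uses \1 for backreferences.
    if text is None:
        return None
    return re.sub(pattern, replacement, text)


def regex_extract(pattern, text, group):
    # The requested capture group of the first match, or NULL if no match.
    if text is None:
        return None
    m = re.search(pattern, text)
    return m.group(group) if m else None


conn = sqlite3.connect(":memory:")
conn.create_function("regex_match", 2, regex_match)
conn.create_function("regex_replace", 3, regex_replace)
conn.create_function("regex_extract", 3, regex_extract)

row = conn.execute(
    r"SELECT regex_match('\d+', 'v42'), "
    r"regex_replace('(\w+)@', 'bob@example.com', '<\1>@'), "
    r"regex_extract('v(\d+)', 'v42', 1)"
).fetchone()
print(row)
# prints (1, '<bob>@example.com', '42')
```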

Encoding

base64_encode(text), base64_decode(text)

hex_encode(text), hex_decode(text)

url_encode(text), url_decode(text)
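The encodings themselves are standard, so their expected output can be checked against Python's standard library. This sketch registers stand-ins under the same names in SQLite. It assumes UTF-8 text, and that `url_encode` percent-encodes every reserved character, including spaces as `%20`. Sift's exact choices here are not documented.

```python
import base64
import binascii
import sqlite3
from urllib.parse import quote, unquote

# Hypothetical Python stand-ins for Sift's encoding functions (UTF-8 assumed).
FUNCS = {
    "base64_encode": lambda s: base64.b64encode(s.encode()).decode("ascii"),
    "base64_decode": lambda s: base64.b64decode(s).decode(),
    "hex_encode": lambda s: binascii.hexlify(s.encode()).decode("ascii"),
    "hex_decode": lambda s: binascii.unhexlify(s).decode(),
    "url_encode": lambda s: quote(s, safe=""),
    "url_decode": unquote,
}

conn = sqlite3.connect(":memory:")
for name, fn in FUNCS.items():
    conn.create_function(name, 1, fn)

print(conn.execute(
    "SELECT base64_encode('hi'), hex_encode('hi'), url_encode('a b&c'), "
    "base64_decode('aGk='), hex_decode('6869'), url_decode('a%20b%26c')"
).fetchone())
# prints ('aGk=', '6869', 'a%20b%26c', 'hi', 'hi', 'a b&c')
```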

CSV (RFC 4180)

csv_field(line, index) extracts a field by position.

csv_count(line) returns field count.

csv_parse(line) returns JSON array of fields.

csv_escape(text) escapes text per RFC 4180 for use as a CSV field.
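Python's `csv` module follows the same RFC 4180 quoting rules, which makes it a convenient reference model. This sketch registers the four functions in SQLite. Zero-based indexing for `csv_field` is an assumption suggested by the `csv_field(content, 0)` example under sift_sql. Returning NULL for an out-of-range index is a guess.

```python
import csv
import io
import json
import sqlite3


def _fields(line: str) -> list[str]:
    # Parse one CSV record; csv.reader handles quoted commas and "" escapes.
    return next(csv.reader([line]), [])


def csv_field(line, index):
    fields = _fields(line)
    return fields[index] if 0 <= index < len(fields) else None


def csv_count(line):
    return len(_fields(line))


def csv_parse(line):
    return json.dumps(_fields(line))


def csv_escape(text):
    # QUOTE_MINIMAL: quote only when the value contains a comma, quote, or
    # line break, doubling embedded quotes; then drop the writer's "\r\n".
    buf = io.StringIO()
    csv.writer(buf).writerow([text])
    return buf.getvalue()[:-2]


conn = sqlite3.connect(":memory:")
conn.create_function("csv_field", 2, csv_field)
conn.create_function("csv_count", 1, csv_count)
conn.create_function("csv_parse", 1, csv_parse)
conn.create_function("csv_escape", 1, csv_escape)

line = 'widget,"1,200",ok'
print(conn.execute(
    "SELECT csv_field(?, 1), csv_count(?), csv_parse(?), csv_escape(?)",
    (line, line, line, 'say "hi"'),
).fetchone())
```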

The source is on GitHub. It is currently proprietary while features mature, and the plan is to open-source it once the API stabilizes.
