Sift Reference
Sift operates in two modes: as a command-line tool for direct text processing, and as an MCP server exposing 78 tools for AI agents. This reference covers both. I’ve organized it by subsystem, since that’s how I think about the tool when I’m actually using it.
Contents
- CLI Modes
- Search Tools
- File Tools
- SQL Tools
- Memory Tools
- Context Tools
- Web Tools
- Repository Tools
- Hardware Tools
- SQL Functions
CLI Modes
--for "QUERY"
Execute SQL on standard input. Each line becomes a row in the lines table with line_number and content columns.
cat data.csv | sift --for "SELECT content FROM lines WHERE line_number > 1"
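To make the lines table concrete, here is a minimal Python sketch of what --for does conceptually. It loads each input line into an in-memory SQLite table named lines and runs the query against it. This is an illustration with Python's built-in sqlite3 module, not Sift's implementation. The helper name run_for is hypothetical, and the sketch assumes 1-based line numbers, which matches the header-skipping example above.

```python
import sqlite3


def run_for(text: str, query: str) -> list[tuple]:
    """Hypothetical stand-in for `sift --for`: expose input lines as a SQL table."""
    conn = sqlite3.connect(":memory:")
    # Same shape the reference describes: one row per line, 1-based numbering.
    conn.execute("CREATE TABLE lines (line_number INTEGER PRIMARY KEY, content TEXT)")
    conn.executemany(
        "INSERT INTO lines (line_number, content) VALUES (?, ?)",
        enumerate(text.splitlines(), start=1),
    )
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    data = "name,qty\napple,3\npear,5\n"
    # Mirrors: cat data.csv | sift --for "SELECT content FROM lines WHERE line_number > 1"
    for (content,) in run_for(data, "SELECT content FROM lines WHERE line_number > 1"):
        print(content)
```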
--dig --for "QUERY"
Search across multiple files. Reads file paths from stdin, indexes their contents with FTS5, and exposes files, lines, and search_fts tables.
find . -name "*.c" | sift --dig --for \
"SELECT filepath, content FROM search_fts WHERE content MATCH 'malloc NEAR free'"
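The search_fts table is an FTS5 index, so its MATCH queries follow SQLite's FTS5 syntax. The Python sketch below builds a comparable index so the query behavior can be explored locally. Its schema is an assumption: it indexes one row per line, with filepath and line_number stored but not searchable, and it omits the files and lines tables. In plain SQLite FTS5, a proximity query is written as a NEAR group, NEAR(malloc free, N). The sketch also requires a sqlite3 build with FTS5 enabled, which most Python distributions include.

```python
import sqlite3


def build_index(files: dict[str, str]) -> sqlite3.Connection:
    """Index each line of each file in an FTS5 table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # filepath and line_number are stored for display but excluded from the full-text index.
    conn.execute(
        "CREATE VIRTUAL TABLE search_fts "
        "USING fts5(filepath UNINDEXED, line_number UNINDEXED, content)"
    )
    for path, text in files.items():
        conn.executemany(
            "INSERT INTO search_fts (filepath, line_number, content) VALUES (?, ?, ?)",
            [(path, n, line) for n, line in enumerate(text.splitlines(), start=1)],
        )
    return conn


if __name__ == "__main__":
    conn = build_index({
        "a.c": "p = malloc(16);\nfree(p);\n",
        "b.c": "q = malloc(8); free(q);\n",
    })
    # FTS5 proximity group: "malloc" and "free" with at most 5 tokens between them.
    query = (
        "SELECT filepath, line_number, content FROM search_fts "
        "WHERE search_fts MATCH ? ORDER BY filepath, line_number"
    )
    for row in conn.execute(query, ("NEAR(malloc free, 5)",)):
        print(row)
```

Because this sketch indexes line by line, a.c does not match the NEAR query: its malloc and free calls sit on different lines.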
--refine FILE --for "QUERY"
Transform an entire file. The query must return a content column; the file is rewritten with the results.
--pick FILE --for "QUERY"
Surgical line editing. The query must return line_number and content columns; only those lines are modified.
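--refine and --pick differ in one respect: the query's rows either replace the whole file or patch individual lines. The Python sketch below models both against an in-memory lines table. It operates on strings rather than files, the function names are hypothetical, and it assumes --refine writes rows in the order the query returns them.

```python
import sqlite3


def _load(text: str) -> sqlite3.Connection:
    """Build the lines(line_number, content) table for one file's text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lines (line_number INTEGER PRIMARY KEY, content TEXT)")
    conn.executemany("INSERT INTO lines VALUES (?, ?)", enumerate(text.splitlines(), start=1))
    return conn


def refine(text: str, query: str) -> str:
    """Whole-file rewrite: the result's content column becomes the new file."""
    cur = _load(text).execute(query)
    names = [d[0] for d in cur.description]
    idx = names.index("content")  # the query must return a content column
    return "".join(row[idx] + "\n" for row in cur)


def pick(text: str, query: str) -> str:
    """Surgical edit: only the line numbers present in the result are replaced."""
    lines = text.splitlines()
    cur = _load(text).execute(query)
    names = [d[0] for d in cur.description]
    num_i, con_i = names.index("line_number"), names.index("content")
    for row in cur:
        lines[row[num_i] - 1] = row[con_i]
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    src = "alpha\nbeta\ngamma\n"
    print(refine(src, "SELECT upper(content) AS content FROM lines WHERE line_number != 2"))
    print(pick(src, "SELECT line_number, upper(content) AS content FROM lines WHERE line_number = 2"))
```

In this model, a --refine query that omits a line deletes it, while a --pick query that omits a line leaves it alone.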
--sweep --for "QUERY"
Batch editing across multiple files read from stdin.
--drop-after N FILE / --drop-before N FILE
Insert content read from stdin after (or before) line N of FILE.
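A minimal Python sketch of the insertion logic, assuming 1-based line numbers. The function name drop and its after flag are hypothetical, and the sketch works on strings rather than files.

```python
def drop(text: str, n: int, insert: str, after: bool = True) -> str:
    """Insert `insert` after (or before) 1-based line n of `text` (hypothetical helper)."""
    lines = text.splitlines(keepends=True)
    if not 1 <= n <= len(lines):
        raise ValueError(f"line {n} is out of range (file has {len(lines)} lines)")
    if not lines[-1].endswith("\n"):
        lines[-1] += "\n"  # keep inserted text from being glued onto the last line
    new = insert.splitlines(keepends=True)
    if new and not new[-1].endswith("\n"):
        new[-1] += "\n"
    pos = n if after else n - 1  # list index where the new lines start
    return "".join(lines[:pos] + new + lines[pos:])


if __name__ == "__main__":
    print(drop("a\nb\nc\n", 1, "X"), end="")                # after line 1
    print(drop("a\nb\nc\n", 1, "X", after=False), end="")   # before line 1
```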
--peek FILE
Display a file with line numbers.
--quarry [ACTION]
Manage the persistent workspace index. Actions: init, status, refresh, rebuild.
--mcp
Run as an MCP server, exposing all tools via JSON-RPC over stdio.
Output Options
--grain FORMAT sets output format: plain, tsv, csv, json, ndjson, grep.
--count outputs only the row count. --head N and --tail N limit output.
--shake previews changes without writing. --diff shows unified diff output.
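The structured --grain formats are standard encodings, so their shape can be sketched with Python's csv and json modules. This is an approximation, not Sift's output code. Whether Sift emits a header row is not stated, so the sketch omits one, and the plain and grep formats are left out because their layout isn't described here. The render function is hypothetical.

```python
import csv
import io
import json


def render(columns: list[str], rows: list[tuple], grain: str) -> str:
    """Format query rows in a few of the --grain formats (hypothetical sketch)."""
    if grain == "tsv":
        return "".join("\t".join(map(str, row)) + "\n" for row in rows)
    if grain == "csv":
        buf = io.StringIO()
        # csv.writer quotes fields containing commas, quotes, or newlines.
        csv.writer(buf, lineterminator="\n").writerows(rows)
        return buf.getvalue()
    records = [dict(zip(columns, row)) for row in rows]
    if grain == "json":
        return json.dumps(records) + "\n"  # one JSON array holding every row
    if grain == "ndjson":
        return "".join(json.dumps(r) + "\n" for r in records)  # one object per line
    raise ValueError(f"unsupported grain: {grain}")


if __name__ == "__main__":
    cols = ["line_number", "content"]
    rows = [(1, "a,b"), (2, "c")]
    for grain in ("tsv", "csv", "json", "ndjson"):
        print(render(cols, rows, grain), end="")
```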
Search Tools (2)
sift_search
FTS5 full-text search, 30-195x faster than grep. Supports boolean queries (AND, OR, NOT, NEAR), prefix matching, and file glob filtering. Auto-initializes the workspace index on first use.
sift_search(pattern: "malloc AND free", files: "*.c", context: 3)
sift_workspace
Manage the workspace index. Actions: init (create), status (check info), refresh (update changed files), rebuild (full reindex).
File Tools (5)
sift_read
Read a file with line numbers. Supports start_line and end_line for partial reads. Large files stream automatically.
sift_write
Create or overwrite a file. Creates parent directories automatically.
sift_update
Find/replace with fuzzy whitespace matching. Fails if old_string isn’t found or isn’t unique (unless replace_all: true).
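Sift's exact fuzzy-matching rules aren't documented here. One plausible way to get whitespace-tolerant matching plus the uniqueness rule is to turn old_string into a regular expression in which any run of whitespace matches any other run. The Python sketch below does that. fuzzy_update is a hypothetical name, and the treatment of whitespace is an assumption.

```python
import re


def fuzzy_update(text: str, old: str, new: str, replace_all: bool = False) -> str:
    """Replace `old` in `text`, treating any run of whitespace as equivalent."""
    # Split on whitespace, escape each chunk literally, and rejoin with \s+ so that
    # "if (x) {" also matches "if (x)  {" or a version split across lines.
    chunks = old.split()
    if not chunks:
        raise ValueError("old_string must contain non-whitespace text")
    pattern = re.compile(r"\s+".join(map(re.escape, chunks)))
    matches = list(pattern.finditer(text))
    if not matches:
        raise ValueError("old_string not found")
    if len(matches) > 1 and not replace_all:
        raise ValueError(f"old_string is not unique ({len(matches)} matches)")
    # A callable replacement inserts `new` literally, with no backslash processing.
    return pattern.sub(lambda _m: new, text)


if __name__ == "__main__":
    src = "if (x)  {\n    return 1;\n}\n"
    print(fuzzy_update(src, "if (x) {", "if (y) {"), end="")
```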
sift_edit
File editing with multiple modes:
- find+replace: Simple text replacement
- insert_after / insert_before: Add content at line numbers
- delete_lines: Remove lines by number or range
- replace_range: Replace a range of lines
- sql: SQL-powered line transformation
- patch: Apply unified diff patches
sift_batch
Execute multiple edit operations atomically. All succeed or all fail. Supports delete_lines, replace_range, insert_after, insert_before, append, prepend, and replace actions.
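The all-or-nothing guarantee can be modeled by applying every operation to a working copy and handing back the result only if all of them succeed. The Python sketch below does this for a subset of the listed actions. The operation dictionaries (action, line, start, end, content) are a hypothetical format, not Sift's MCP schema. The sketch also assumes each operation sees the line numbers left by the operations before it, which Sift may handle differently.

```python
def apply_batch(lines: list[str], ops: list[dict]) -> list[str]:
    """Apply edit operations all-or-nothing (hypothetical op format, 1-based lines).

    Operations run in order against a working copy. If any operation raises,
    the exception propagates and the caller's list is left untouched.
    """
    work = list(lines)

    def check(line: int) -> None:
        if not 1 <= line <= len(work):
            raise IndexError(f"line {line} out of range (1-{len(work)})")

    for op in ops:
        action = op["action"]
        if action == "delete_lines":
            start, end = op["start"], op.get("end", op["start"])
            check(start)
            check(end)
            if end < start:
                raise ValueError(f"empty range {start}-{end}")
            del work[start - 1:end]
        elif action == "insert_after":
            check(op["line"])
            work[op["line"]:op["line"]] = op["content"]
        elif action == "insert_before":
            check(op["line"])
            work[op["line"] - 1:op["line"] - 1] = op["content"]
        elif action == "append":
            work.extend(op["content"])
        elif action == "prepend":
            work[:0] = op["content"]
        else:
            raise ValueError(f"unsupported action: {action}")
    return work
```

The caller writes the returned list to disk only after apply_batch returns, so a failure partway through never produces a half-edited file.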
SQL Tools (2)
sift_sql
Execute SQL on text input. Query the lines table, which has line_number and content columns. Supports CSV parsing, regex, and the functions listed under SQL Functions.
sift_sql(input: "a,b\n1,2", sql: "SELECT csv_field(content, 0) FROM lines")
sift_transform
SQL-based file transformation. Modifies the file in place based on SQL query results.
Memory Tools (38)
The memory subsystem stores persistent knowledge across sessions.
Core CRUD
sift_memory_add creates memories of types: task, note, plan, step, pattern, gotcha, preference.
sift_memory_get, sift_memory_update, sift_memory_archive, sift_memory_list handle retrieval and management.
Search
sift_memory_search provides FTS5 search with automatic synonym expansion and relevance scoring based on access frequency and recency.
Synthesis
sift_memory_synthesize consolidates multiple memories into a single synthesis. sift_memory_expand retrieves the original sources from a synthesis.
Decisions
sift_memory_decide records a decision with question, answer, and rationale. sift_memory_decisions queries past decisions. sift_memory_supersede replaces a decision with a new one.
Reflections
sift_memory_reflect logs reasoning, observations, or corrections. sift_memory_reflections queries past reflections. sift_memory_reflect_trajectory captures insights about chains of memories over time.
Dependencies
sift_memory_link and sift_memory_unlink create relationships (blocks, related, parent). sift_memory_deps queries blockers. sift_memory_ready finds tasks with no open blockers.
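The idea behind sift_memory_ready can be stated as a small graph query: a task is ready when it is open and none of the tasks blocking it are still open. The Python sketch below uses a hypothetical data model (a status map plus a list of blocks edges); it is not Sift's storage schema.

```python
def ready_tasks(status: dict[str, str], blocks: list[tuple[str, str]]) -> list[str]:
    """Return open tasks whose blockers are all closed (hypothetical data model).

    status maps task id -> "open" or "closed"; each (a, b) in blocks means a blocks b.
    """
    open_blockers = {t: 0 for t in status}
    for blocker, blocked in blocks:
        if status.get(blocker) == "open":
            open_blockers[blocked] = open_blockers.get(blocked, 0) + 1
    return sorted(t for t, s in status.items() if s == "open" and open_blockers[t] == 0)


if __name__ == "__main__":
    status = {"design": "closed", "build": "open", "test": "open", "docs": "open"}
    blocks = [("design", "build"), ("build", "test")]
    print(ready_tasks(status, blocks))
```

Here build is ready because its only blocker is closed, docs has no blockers, and test waits on build.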
Graph Analysis
sift_memory_network explores memory topology in four modes: hubs (most connected), neighbors (direct connections), cluster (related memories), bridges (connecting separate areas).
sift_memory_traverse walks the memory chain. sift_memory_origin finds the root of a chain. sift_memory_context generates rich session context.
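Two of the graph operations above map onto textbook algorithms. Finding an origin amounts to following parent links until a memory has none, and the hubs mode resembles ranking memories by degree (how many links touch them). The Python sketch below assumes both models; the link representation and function names are hypothetical, and Sift's actual scoring may differ.

```python
from collections import Counter


def origin(parent: dict[str, str], start: str) -> str:
    """Follow parent links from `start` until reaching a memory with no parent."""
    seen = {start}
    node = start
    while node in parent:
        node = parent[node]
        if node in seen:
            raise ValueError(f"cycle detected at {node}")
        seen.add(node)
    return node


def hubs(links: list[tuple[str, str]], top: int = 3) -> list[tuple[str, int]]:
    """Rank memories by how many links touch them (degree centrality)."""
    degree = Counter()
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    return degree.most_common(top)


if __name__ == "__main__":
    print(origin({"m3": "m2", "m2": "m1"}, "m3"))
    print(hubs([("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")], top=2))
```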
Challenge
sift_memory_challenge searches for counterevidence to a claim by generating adversarial queries. sift_memory_challenge_evidence retrieves detailed results.
Fingerprints
sift_fingerprint_generate creates a fingerprint capturing engagement patterns. sift_fingerprint_load loads it at session start. sift_fingerprint_compare shows evolution between fingerprints. sift_fingerprint_drift detects deviation from baseline.
Maintenance
sift_memory_stats returns counts, active patterns, preferences, and corrections. sift_memory_stale finds old memories. sift_memory_cache_status shows eviction candidates. sift_memory_config and sift_memory_tune adjust ranking weights. sift_memory_backups and sift_memory_restore handle backup management.
Context Tools (10)
The context subsystem preserves conversation history.
- sift_context_session manages sessions (start, end, get, current).
- sift_context_save stores messages and tool calls.
- sift_context_search provides FTS5 search across all conversations.
- sift_context_query allows raw SQL queries.
- sift_context_link connects messages to memories.
- sift_context_synthesize creates session summaries.
- sift_context_archive moves old sessions to cold storage.
- sift_context_stats returns database statistics.
- sift_context_stale finds sessions needing consolidation.
- sift_context_memory retrieves conversation context for a specific memory.
Web Tools (9)
Crawl and cache documentation for instant local search.
sift_web_crawl crawls a website respecting robots.txt and sitemaps. sift_web_fetch retrieves a single URL. sift_web_search performs FTS5 search on cached content. sift_web_query allows SQL queries on the pages table. sift_web_stats and sift_web_manifest show database information. sift_web_refresh updates stale pages. sift_web_search_multi searches across multiple databases. sift_web_merge combines databases.
sift_web_crawl(url: "https://docs.example.com", max_pages: 100)
sift_web_search(db: "docs.db", query: "authentication AND oauth")
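sift_web_crawl respects robots.txt. Python's standard urllib.robotparser module illustrates the per-URL check a polite crawler performs before fetching a page. This is a generic illustration, not Sift's crawler code; the agent name "sift" and the sample robots.txt are assumptions.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content (hypothetical) that blocks one directory for all agents.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(url: str, robots_txt: str, agent: str = "sift") -> bool:
    """Return True if robots.txt permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in ("https://docs.example.com/guide/intro", "https://docs.example.com/private/keys"):
        print(url, allowed(url, ROBOTS_TXT))
```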
Repository Tools (5)
Clone and index git repositories for code search.
sift_repo_clone clones a repository and indexes it into a SQLite database. sift_repo_search performs FTS5 search. sift_repo_query allows raw SQL on the repo_files table. sift_repo_stats shows repository statistics. sift_repo_list lists indexed repositories.
sift_repo_clone(url: "https://github.com/org/repo")
sift_repo_search(db: "repo.db", query: "error AND handling", language: "c")
Hardware Tools (7)
Monitor resources and adapt under pressure.
sift_hardware_status returns multi-dimensional resource state (memory, I/O, process metrics) with state levels (normal, elevated, critical, survival). sift_hardware_patterns shows learned tool access patterns. sift_hardware_events retrieves logged resource events.
sift_budget_request requests a resource budget before expensive operations. sift_budget_stats shows budget utilization.
sift_stream_read and sift_stream_close handle streaming for large results.
SQL Functions
Regex (PCRE2)
regex_match(pattern, text) returns 1 if pattern matches.
regex_replace(pattern, text, replacement) performs substitution.
regex_extract(pattern, text, group) extracts capture groups.
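Functions like these are typically exposed to SQLite as user-defined functions. The Python sketch below registers look-alikes with sqlite3's create_function, keeping the argument order documented above. It uses Python's re module instead of PCRE2, and the two dialects differ in places (Python replacement strings use \1 rather than $1). It also assumes regex_match uses search semantics, so a match anywhere in the text counts.

```python
import re
import sqlite3


def regex_match(pattern, text):
    """1 if `pattern` matches anywhere in `text`, else 0 (NULL text never matches)."""
    return int(text is not None and re.search(pattern, text) is not None)


def regex_replace(pattern, text, replacement):
    """Replace every match of `pattern` in `text`; NULL text stays NULL."""
    return None if text is None else re.sub(pattern, replacement, text)


def regex_extract(pattern, text, group):
    """Return capture group `group` of the first match, or NULL if nothing matches."""
    m = re.search(pattern, text) if text is not None else None
    return m.group(group) if m else None


def connect_with_regex() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.create_function("regex_match", 2, regex_match)
    conn.create_function("regex_replace", 3, regex_replace)
    conn.create_function("regex_extract", 3, regex_extract)
    return conn


if __name__ == "__main__":
    conn = connect_with_regex()
    print(conn.execute(r"SELECT regex_extract('(\d+)ms', 'request took 42ms', 1)").fetchone()[0])
```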
Encoding
base64_encode(text), base64_decode(text)
hex_encode(text), hex_decode(text)
url_encode(text), url_decode(text)
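The encoding helpers correspond to standard library routines in most languages. The Python sketch below registers look-alikes in SQLite. It assumes text is converted to UTF-8 bytes before base64 or hex encoding, and that url_encode percent-encodes every reserved character (spaces become %20, not +). Sift's exact choices on both points are not stated here.

```python
import base64
import sqlite3
from urllib.parse import quote, unquote


def _nullsafe(fn):
    # SQL NULL arrives as None; pass it through instead of raising.
    return lambda value: None if value is None else fn(value)


def register_encoding(conn: sqlite3.Connection) -> None:
    """Register hypothetical Python equivalents of the encoding functions."""
    funcs = {
        "base64_encode": lambda s: base64.b64encode(s.encode("utf-8")).decode("ascii"),
        "base64_decode": lambda s: base64.b64decode(s).decode("utf-8"),
        "hex_encode": lambda s: s.encode("utf-8").hex(),
        "hex_decode": lambda s: bytes.fromhex(s).decode("utf-8"),
        "url_encode": lambda s: quote(s, safe=""),
        "url_decode": unquote,
    }
    for name, fn in funcs.items():
        conn.create_function(name, 1, _nullsafe(fn))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    register_encoding(conn)
    print(conn.execute("SELECT base64_encode('hello'), hex_encode('hi'), url_encode('a b&c')").fetchone())
```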
CSV (RFC 4180)
csv_field(line, index) extracts a field by position.
csv_count(line) returns field count.
csv_parse(line) returns JSON array of fields.
csv_escape(text) escapes text for safe use as a CSV field.
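The CSV helpers can be approximated with Python's csv module, which handles quoted fields and doubled quotes as RFC 4180 describes. The Python sketch below assumes csv_field uses 0-based indexes, as the sift_sql example's csv_field(content, 0) suggests, and returns NULL for an out-of-range index; both are assumptions. register_csv is a hypothetical helper that wires the functions into SQLite so the sift_sql example can be reproduced.

```python
import csv
import json
import sqlite3


def _fields(line):
    # csv.reader parses one record, handling quoted commas and "" escapes.
    return next(csv.reader([line]), [])


def csv_field(line, index):
    fields = _fields(line)
    return fields[index] if 0 <= index < len(fields) else None


def csv_count(line):
    return len(_fields(line))


def csv_parse(line):
    return json.dumps(_fields(line))  # JSON array of field strings


def csv_escape(text):
    # Quote only when needed; embedded quotes are doubled.
    if any(c in text for c in ',"\r\n'):
        return '"' + text.replace('"', '""') + '"'
    return text


def register_csv(conn):
    conn.create_function("csv_field", 2, csv_field)
    conn.create_function("csv_count", 1, csv_count)
    conn.create_function("csv_parse", 1, csv_parse)
    conn.create_function("csv_escape", 1, csv_escape)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    register_csv(conn)
    conn.execute("CREATE TABLE lines (line_number INTEGER PRIMARY KEY, content TEXT)")
    conn.executemany("INSERT INTO lines VALUES (?, ?)", enumerate("a,b\n1,2".splitlines(), start=1))
    print(conn.execute("SELECT csv_field(content, 0) FROM lines ORDER BY line_number").fetchall())
```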
The source is on GitHub, currently proprietary while features mature, with plans to open-source it once the API stabilizes.