How It Works

Lien provides local-first semantic code search through a simple four-step process:

The Journey of Your Code

1. 🔍 Indexing

When you run lien index, Lien scans your codebase and breaks it down into manageable chunks. Each chunk contains a logical piece of code - a function, a class, or a related block of logic.
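To make the idea of a "chunk" concrete, here is a minimal sketch that splits a file into fixed-size line windows with a small overlap. This is a simplification: it ignores code structure entirely, whereas the chunks described above follow logical units such as functions and classes. The `CodeChunk` shape, the `chunkByLines` name, and the default sizes are assumptions made for illustration, not Lien's actual schema or settings.

```typescript
// Hypothetical chunk shape; field names are illustrative, not Lien's schema.
interface CodeChunk {
  file: string;
  startLine: number; // 1-based, inclusive
  endLine: number; // 1-based, inclusive
  content: string;
}

// Split a file into fixed-size line windows that overlap slightly,
// so logic spanning a window boundary appears in both neighbours.
function chunkByLines(file: string, source: string, size = 40, overlap = 5): CodeChunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const lines = source.split("\n");
  const chunks: CodeChunk[] = [];
  const step = size - overlap;
  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(start + size, lines.length);
    chunks.push({
      file,
      startLine: start + 1,
      endLine: end,
      content: lines.slice(start, end).join("\n"),
    });
    if (end === lines.length) break; // the last window reached the end of the file
  }
  return chunks;
}
```

An AST-aware chunker would replace the fixed windows with the start and end lines of each function or class, but the resulting records would look much the same.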

2. 🧠 Embedding

Each code chunk is converted into a vector embedding - a mathematical representation that captures its semantic meaning. This happens entirely on your machine using a local ML model (all-MiniLM-L6-v2). No external API calls, no cloud services.
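As a rough illustration of what "converted into a vector embedding" involves: in its standard sentence-transformers configuration, all-MiniLM-L6-v2 produces one vector per token, averages them (mean pooling), and scales the result to unit length, giving a 384-dimensional vector. The sketch below shows only that pooling and normalization arithmetic on plain arrays. The function names are hypothetical, padding masks are ignored, and whether Lien performs these steps itself or delegates them to the library is an assumption.

```typescript
// Average per-token vectors into a single vector (mean pooling).
// Simplification: every token counts equally; padding masks are ignored.
function meanPool(tokenVectors: number[][]): number[] {
  if (tokenVectors.length === 0) throw new Error("no token vectors");
  const dim = tokenVectors[0].length;
  const pooled: number[] = [];
  for (let i = 0; i < dim; i++) pooled.push(0);
  for (const vec of tokenVectors) {
    if (vec.length !== dim) throw new Error("dimension mismatch");
    for (let i = 0; i < dim; i++) pooled[i] += vec[i] / tokenVectors.length;
  }
  return pooled;
}

// Scale a vector to unit length so that a dot product equals cosine similarity.
function l2Normalize(vec: number[]): number[] {
  let sumSquares = 0;
  for (const x of vec) sumSquares += x * x;
  const norm = Math.sqrt(sumSquares);
  return norm === 0 ? vec.slice() : vec.map((x) => x / norm);
}
```

Normalizing at indexing time means later comparisons between a query and a chunk reduce to a simple dot product.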

3. 💾 Storage

These embeddings are stored in a vector database: LanceDB locally in ~/.lien/indices/ by default, or optionally Qdrant for cross-repo search. Think of it as a semantic index of your entire codebase that can be searched by meaning rather than by exact text.

4. 🎯 Search

When you ask your AI assistant a question like "how does authentication work?", Lien:

Converts your query into a vector embedding
Finds the most semantically similar code chunks
Returns relevant results to your AI assistant
Your AI assistant uses this context to give you better answers

All in under 500ms! ⚡
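The "most semantically similar" step above can be sketched as a cosine-similarity ranking over stored vectors. This is a brute-force illustration that scans every chunk; a vector database would normally answer the same question through its own index. The `IndexedChunk` shape and the function names are hypothetical.

```typescript
// Hypothetical record shape for a stored chunk.
interface IndexedChunk {
  id: string;
  vector: number[];
}

// Cosine similarity: 1 for the same direction, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Score every chunk against the query vector and keep the k best matches.
function topK(query: number[], chunks: IndexedChunk[], k: number): { id: string; score: number }[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

For example, with a query vector `[1, 0]` and chunks whose vectors are `[1, 0]`, `[0, 1]`, and `[1, 1]`, the first chunk ranks highest, the third comes second, and the second scores zero.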

Why Semantic Search?

Traditional text search looks for exact matches. Semantic search understands meaning:

Text search: "JWT authentication" → only finds code with those exact words
Semantic search: "user login security" → finds JWT auth, OAuth, session management, and more!

Privacy First

Everything runs locally:

✅ Your code never leaves your machine
✅ No external API calls
✅ No telemetry or tracking
✅ No internet required (after initial model download)

Architecture

Lien is built with modern, performant tools:

TypeScript for type-safe development
@huggingface/transformers for local embeddings (runs in worker thread)
LanceDB for local vector storage (default), Qdrant for cross-repo search (optional)
Model Context Protocol (MCP) for AI assistant integration (Cursor, Claude Code, etc.)

Want to Learn More?

For detailed technical architecture, flow diagrams, and implementation details, see the Architecture Documentation on GitHub.

Ecosystem-Aware & Monorepo Support

Lien automatically detects your project type via 12 ecosystem presets:

Node.js/TypeScript - via package.json
Python - via pyproject.toml, setup.py, requirements.txt
PHP - via composer.json
Laravel - via artisan
Django - via manage.py
Ruby - via Gemfile
Rails - via bin/rails
Rust - via Cargo.toml
JVM (Java/Kotlin/Scala) - via pom.xml, build.gradle
Swift - via Package.swift, *.xcodeproj
.NET - via *.csproj, *.sln
Astro - via astro.config.*
Monorepos - Multiple ecosystems in one repo (e.g., Node.js frontend + Laravel backend)

Each ecosystem preset applies appropriate file exclusions (e.g., ignoring node_modules or vendor). Additionally, more than 15 languages (including Liquid, Go, and C/C++) are indexed out of the box via the default scan pattern.
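Marker-file detection of this kind can be sketched in a few lines. The marker files below are copied from the list above; the data structure, the `detectEcosystems` name, and the matching logic are assumptions for illustration, and Lien's real presets may work differently.

```typescript
// Marker files taken from the preset list; the structure itself is hypothetical.
const ECOSYSTEM_MARKERS: { name: string; patterns: RegExp[] }[] = [
  { name: "Node.js/TypeScript", patterns: [/^package\.json$/] },
  { name: "Python", patterns: [/^pyproject\.toml$/, /^setup\.py$/, /^requirements\.txt$/] },
  { name: "PHP", patterns: [/^composer\.json$/] },
  { name: "Laravel", patterns: [/^artisan$/] },
  { name: "Django", patterns: [/^manage\.py$/] },
  { name: "Ruby", patterns: [/^Gemfile$/] },
  { name: "Rails", patterns: [/^bin\/rails$/] },
  { name: "Rust", patterns: [/^Cargo\.toml$/] },
  { name: "JVM", patterns: [/^pom\.xml$/, /^build\.gradle$/] },
  { name: "Swift", patterns: [/^Package\.swift$/, /\.xcodeproj$/] },
  { name: ".NET", patterns: [/\.csproj$/, /\.sln$/] },
  { name: "Astro", patterns: [/^astro\.config\./] },
];

// Given paths relative to a project root, return every ecosystem whose
// marker file is present. More than one match suggests a mixed project.
function detectEcosystems(paths: string[]): string[] {
  return ECOSYSTEM_MARKERS
    .filter((eco) => paths.some((p) => eco.patterns.some((re) => re.test(p))))
    .map((eco) => eco.name);
}
```

A project root containing both package.json and artisan would be reported as Node.js/TypeScript and Laravel, which is the mixed case the Monorepos entry describes.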

Supported Languages

Lien indexes and understands code in:

Full AST Support (function detection, complexity analysis):

TypeScript, JavaScript (JSX/TSX)
Python
PHP
Rust
Go
Java
C#

Semantic Search (chunking and embeddings):

All of the above, plus C/C++, Vue, Ruby, Swift, Kotlin, Scala, and more!

Complexity Analysis

Lien tracks four complementary complexity metrics:

Metric	What it Measures	Best For
Cyclomatic	Decision paths (if, for, switch)	Testability - how many tests needed?
Cognitive	Mental effort (nesting depth, breaks)	Understandability - how hard to read?
Halstead Effort	Estimated mental effort based on operator/operand counts	Learning curve - how long to understand?
Halstead Bugs	Predicted bug count (Effort^(2/3) / 3000)	Reliability - how bug-prone is this?

All metrics are calculated during indexing using Tree-sitter AST parsing. Cognitive complexity is based on SonarSource's specification; Halstead metrics are based on Maurice Halstead's "Elements of Software Science" (1977).
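For reference, the Halstead figures can be derived from four counts using the standard textbook formulas, ending with the bug estimate Effort^(2/3) / 3000 shown in the table. The sketch below assumes the operator and operand counts have already been extracted from the AST; the interface and function names are hypothetical, and Lien's implementation may differ in detail.

```typescript
// Counts that a parser would extract from one function (names are illustrative).
interface HalsteadCounts {
  distinctOperators: number; // n1
  distinctOperands: number; // n2
  totalOperators: number; // N1
  totalOperands: number; // N2
}

function halstead(c: HalsteadCounts) {
  const vocabulary = c.distinctOperators + c.distinctOperands; // n = n1 + n2
  const length = c.totalOperators + c.totalOperands; // N = N1 + N2
  // Volume V = N * log2(n)
  const volume = vocabulary > 0 ? length * (Math.log(vocabulary) / Math.LN2) : 0;
  // Difficulty D = (n1 / 2) * (N2 / n2)
  const difficulty =
    c.distinctOperands > 0
      ? (c.distinctOperators / 2) * (c.totalOperands / c.distinctOperands)
      : 0;
  const effort = difficulty * volume; // E = D * V
  const bugs = Math.pow(effort, 2 / 3) / 3000; // B = E^(2/3) / 3000
  return { vocabulary, length, volume, difficulty, effort, bugs };
}
```

With 4 distinct operators, 4 distinct operands, and 8 total occurrences of each, the volume is 48, the difficulty is 4, and the effort is 192, which gives a predicted bug count of roughly 0.011.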

Performance

Query time: < 500ms
Small projects (1k files): ~5 minutes to index
Medium projects (10k files): ~20 minutes to index
Large projects (50k files): ~30-60 minutes to index
Disk usage: ~500MB per 100k chunks
RAM usage: ~200-500MB during indexing, ~100-200MB during queries
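The disk figure above works out to roughly 5 KB per chunk, which makes it easy to estimate index size for a given project. The helper below is a back-of-the-envelope sketch that simply scales the quoted ratio linearly; the function name is hypothetical, and real usage will vary with chunk size and metadata.

```typescript
// Ratio quoted above: ~500 MB of disk per 100,000 chunks.
const MB_PER_100K_CHUNKS = 500;

// Linear estimate of index size in megabytes for a given number of chunks.
function estimateIndexDiskMB(chunkCount: number): number {
  if (chunkCount < 0) throw new Error("chunkCount must be non-negative");
  return (chunkCount * MB_PER_100K_CHUNKS) / 100000;
}
```

For instance, an index of 250,000 chunks would be estimated at about 1,250 MB.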

Ready to get started? Check out our Quick Start Guide!
