Technical Deep Dive

Technical Architecture

A complete technical reference for the Sansqrit quantum simulation engine. This document covers every layer of the system, from the DSL lexer through sparse matrix mathematics to distributed 10-qubit sharding. It is intended for researchers extending the engine and for AI models learning the codebase.

System Overview

Sansqrit is a quantum-classical computing platform implemented entirely in Rust. The system consists of four major components that work together to execute quantum programs written in the Sansqrit DSL (Domain-Specific Language). The first component is the language frontend — a lexer, parser, and tree-walking interpreter that processes .sq source files into an Abstract Syntax Tree (AST) and executes them. The second component is the quantum engine — a three-tier simulation system that selects the optimal execution strategy based on qubit count. The third component is the standard library — a collection of classical computing utilities including collections, file I/O, statistics, and regular expressions. The fourth component is the domain packages — eight specialized packages for chemistry, biology, medical, physics, genetics, machine learning, mathematics, and QASM export.

The engine supports 46 quantum gates, 19 fully-implemented quantum algorithms, and 17 circuit constructors. It can simulate up to 100+ qubits on a standard laptop by exploiting the sparsity structure of most practical quantum circuits. The key insight behind Sansqrit's performance is that most quantum states encountered in real-world algorithms (GHZ states, VQE ansatz outputs, QAOA intermediate states) have far fewer non-zero amplitudes than the theoretical maximum of 2^n. By storing only non-zero entries in a HashMap, Sansqrit achieves memory savings of up to 10^28× compared to dense simulation.

Quantum gates: 46
Algorithms: 19
Circuits: 17
Qubit capacity: 100+
Lines of Rust: 7.5K
Source files: 32

Workspace Structure

The Sansqrit project uses a Rust workspace with 11 crates. The workspace root Cargo.toml sets rust-version = "1.80" as the minimum supported compiler version (required by rayon-core 1.13.0 for parallel computation). The crate structure is designed so that adding a new domain package (e.g., climate science) requires only creating a new crates/sansqrit-climate/ directory and adding one line to the workspace members list — zero changes to existing code.

sansqrit/
├── Cargo.toml                    # Workspace root: rust-version = "1.80"
├── crates/
│   ├── sansqrit-core/src/        # Quantum engine: 10 modules
│   │   ├── complex.rs            # Complex64 arithmetic: c(), c_real(), c_exp_i()
│   │   ├── sparse.rs             # SparseStateVec: HashMap<u128, Complex64>
│   │   ├── gates.rs              # 46 gates: matrices + sparse application
│   │   ├── lookup.rs             # O(1) memory-mapped gate lookup tables
│   │   ├── engine.rs             # QuantumEngine: 3-tier auto-selection
│   │   ├── measurement.rs        # Shot-based histograms, expectation values
│   │   ├── distributed.rs        # Rayon parallel chunks, TCP protocol
│   │   ├── qasm_export.rs        # OpenQASM2/3, IBM, IonQ, Cirq, Braket
│   │   ├── algorithms.rs         # 19 quantum algorithms (Grover, Shor, VQE...)
│   │   └── circuits.rs           # 17 circuit constructors (W, Dicke, QEC...)
│   ├── sansqrit-lang/src/        # DSL compiler: lexer → parser → interpreter
│   │   ├── lexer.rs              # Tokenization: keywords, literals, operators
│   │   ├── ast.rs                # Abstract Syntax Tree: 15 statement types
│   │   ├── parser.rs             # Recursive-descent parser: expressions, control flow
│   │   ├── interpreter.rs        # Tree-walking interpreter: classical + quantum dispatch
│   │   └── main.rs               # CLI: sansqrit run/qasm/version commands
│   ├── sansqrit-stdlib/src/      # Standard library: 7 modules
│   └── sansqrit-{chemistry,biology,medical,physics,genetics,ml,math,qasm}/
├── tools/precompute/generate_blobs.py  # Gate lookup table generator
├── samples/                      # Example .sq programs
└── .github/workflows/ci.yml      # CI: format → build → test → clippy

Source code: github.com/sansqrit/sansqritPy

Execution Pipeline

When a user runs sansqrit run program.sq, the source code passes through four stages before producing results. Each stage is implemented as a separate Rust module in the sansqrit-lang crate. The pipeline is designed to be single-pass — no intermediate compilation step is needed. The interpreter directly executes the AST nodes, dispatching quantum operations to the engine as they are encountered.


Stage 1: Lexer (lexer.rs)

The lexer (also called the tokenizer or scanner) reads the raw .sq source text character by character and produces a stream of tokens. Each token has a type (keyword, identifier, number, string, operator, etc.) and a span indicating its position in the source file for error reporting. The Sansqrit lexer supports the following token categories:

Keywords: let, const, fn, class, struct, if, else, for, while, loop, match, return, break, continue, import, simulate, circuit, molecule, true, false, None, and, or, not, in, try, catch, finally, raise, extends

Operators: +, -, *, /, //, %, **, ==, !=, <, >, <=, >=, =, +=, -=, *=, /=, |> (pipeline), &, |, ^, <<, >>

Literals: Integers (42, -7), floats (3.14, 1e-6), strings ("hello", 'world'), f-strings (f"Energy: {e:.6f}"), triple-quoted multiline strings

Comments: Single-line (# comment or -- comment), multi-line (/* ... */), documentation (/// docstring)

-- Input source code:
let q = quantum_register(4)
H(q[0])

-- Lexer output (token stream):
-- [KW_LET, IDENT("q"), ASSIGN, IDENT("quantum_register"), LPAREN, INT(4), RPAREN, NEWLINE,
--  IDENT("H"), LPAREN, IDENT("q"), LBRACKET, INT(0), RBRACKET, RPAREN, NEWLINE]
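
The token stream above can be reproduced with a very small hand-written scanner. The following is a minimal sketch, not the actual lexer.rs: the `Token` enum and `tokenize` function are hypothetical names, only the token kinds used in the example are covered, and source spans are omitted.

```rust
/// Hypothetical token set covering only the example above; the real
/// lexer.rs presumably has many more variants plus source spans.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    KwLet,
    Ident(String),
    Int(i64),
    Assign,
    LParen,
    RParen,
    LBracket,
    RBracket,
    Newline,
}

/// Scan `src` one character at a time and return the token stream.
pub fn tokenize(src: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        // Single-character tokens and skippable whitespace.
        let single = match c {
            ' ' | '\t' | '\r' => {
                i += 1;
                continue;
            }
            '\n' => Some(Token::Newline),
            '=' => Some(Token::Assign),
            '(' => Some(Token::LParen),
            ')' => Some(Token::RParen),
            '[' => Some(Token::LBracket),
            ']' => Some(Token::RBracket),
            _ => None,
        };
        if let Some(tok) = single {
            tokens.push(tok);
            i += 1;
            continue;
        }
        let start = i;
        if c.is_ascii_digit() {
            // Integer literal: consume the whole run of digits.
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            let value = text.parse::<i64>().map_err(|e| e.to_string())?;
            tokens.push(Token::Int(value));
        } else if c.is_alphabetic() || c == '_' {
            // Identifier or keyword: letters, digits, underscores.
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            let word: String = chars[start..i].iter().collect();
            tokens.push(if word == "let" { Token::KwLet } else { Token::Ident(word) });
        } else {
            return Err(format!("unexpected character {c:?} at position {i}"));
        }
    }
    Ok(tokens)
}
```

Calling `tokenize` on the two-line input shown above (with a trailing newline) yields the same 16 tokens in the same order.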

Stage 2: Parser & AST (parser.rs, ast.rs)

The parser consumes the token stream and builds an Abstract Syntax Tree (AST). Sansqrit uses a recursive-descent parser with precedence climbing for expressions. The AST defines 15 statement types (LetDecl, Assign, ExprStmt, FnDef, ClassDef, StructDef, IfChain, ForLoop, WhileLoop, Import, Return, Match, Simulate, TryCatch, Circuit) and expression types including IntLit, FloatLit, StringLit, BoolLit, Ident, BinOp, UnaryOp, FnCall, Index, Member, ListLit, DictLit, FString, Lambda, ListComp, and Pipeline.

The operator precedence (from lowest to highest) is: pipeline (|>), logical or, logical and, comparison, bitwise or, bitwise xor, bitwise and, shift, addition/subtraction, multiplication/division/modulo, power, unary, member access/index/call.
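
To make the precedence order concrete, here is a small precedence-climbing sketch. It is an illustration under assumptions rather than the code in parser.rs: the numeric levels in `precedence` are arbitrary (only their order follows the list above), `**` is assumed to be right-associative, operands are limited to integer literals, and only `+`, `-`, `*` and `**` are evaluated.

```rust
/// Binding strength of a binary operator, following the order listed above
/// (larger binds tighter). The numeric levels are illustrative assumptions.
pub fn precedence(op: &str) -> Option<u8> {
    Some(match op {
        "|>" => 1,
        "or" => 2,
        "and" => 3,
        "==" | "!=" | "<" | ">" | "<=" | ">=" => 4,
        "|" => 5,
        "^" => 6,
        "&" => 7,
        "<<" | ">>" => 8,
        "+" | "-" => 9,
        "*" | "/" | "//" | "%" => 10,
        "**" => 11,
        _ => return None,
    })
}

/// Evaluate a pre-split token list such as ["2", "+", "3", "*", "4"].
/// Returns None on malformed input, overflow, or an unsupported operator.
pub fn eval(tokens: &[&str]) -> Option<i64> {
    let mut pos = 0;
    let value = climb(tokens, &mut pos, 0)?;
    if pos == tokens.len() { Some(value) } else { None }
}

fn climb(tokens: &[&str], pos: &mut usize, min_prec: u8) -> Option<i64> {
    // Operand: this sketch only handles integer literals.
    let mut lhs = tokens.get(*pos)?.parse::<i64>().ok()?;
    *pos += 1;
    while let Some(&op) = tokens.get(*pos) {
        // Stop when the next operator binds less tightly than required.
        let prec = match precedence(op) {
            Some(p) if p >= min_prec => p,
            _ => break,
        };
        *pos += 1;
        // Assumed: `**` is right-associative, everything else left-associative.
        let next_min = if op == "**" { prec } else { prec + 1 };
        let rhs = climb(tokens, pos, next_min)?;
        lhs = match op {
            "+" => lhs.checked_add(rhs)?,
            "-" => lhs.checked_sub(rhs)?,
            "*" => lhs.checked_mul(rhs)?,
            "**" => lhs.checked_pow(u32::try_from(rhs).ok()?)?,
            _ => return None, // in the table, but not evaluated by this sketch
        };
    }
    Some(lhs)
}
```

A real parser would build AST nodes (for example BinOp) instead of evaluating on the spot, but the control flow of the climb is the same.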

Stage 3: Interpreter (interpreter.rs)

The tree-walking interpreter executes AST nodes directly. It maintains an environment (scope stack) for variable bindings and dispatches quantum operations to the QuantumEngine. When the interpreter encounters a Simulate { engine, body } block, it creates a new QuantumEngine instance with the specified engine tier (or auto-selects based on qubit count), executes the body statements, and captures measurement results. All quantum gate function calls (H, CNOT, Rx, etc.) within a simulate block are dispatched to the engine's convenience methods.
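
The scope stack mentioned above can be pictured as a vector of hash maps, searched from the innermost scope outward. This is a minimal sketch under assumptions: `Environment`, `Value`, and the method names are hypothetical and are not taken from interpreter.rs.

```rust
use std::collections::HashMap;

/// Hypothetical runtime value; the real interpreter presumably has more variants.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Float(f64),
    Str(String),
}

/// A stack of scopes; the last element is the innermost scope.
pub struct Environment {
    scopes: Vec<HashMap<String, Value>>,
}

impl Environment {
    /// Start with a single global scope.
    pub fn new() -> Self {
        Environment { scopes: vec![HashMap::new()] }
    }

    /// Enter a block (function body, loop body, simulate block, ...).
    pub fn push_scope(&mut self) {
        self.scopes.push(HashMap::new());
    }

    /// Leave a block; the global scope is never popped.
    pub fn pop_scope(&mut self) {
        if self.scopes.len() > 1 {
            self.scopes.pop();
        }
    }

    /// Bind `name` in the innermost scope, shadowing outer bindings.
    pub fn define(&mut self, name: &str, value: Value) {
        if let Some(scope) = self.scopes.last_mut() {
            scope.insert(name.to_string(), value);
        }
    }

    /// Resolve `name`, searching from the innermost scope outward.
    pub fn lookup(&self, name: &str) -> Option<&Value> {
        self.scopes.iter().rev().find_map(|scope| scope.get(name))
    }
}
```

With this layout, a binding made inside a block shadows an outer one only until the block's scope is popped.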

The interpreter supports 80+ built-in functions organized into categories: quantum gates (46), quantum operations (quantum_register, measure, measure_all, probabilities, expectation_z, expectation_zz, engine_nnz, statevector), math functions (sqrt, sin, cos, log, exp, abs, ceil, floor, round, pow), collection operations (len, range, enumerate, zip, map, filter, reduce, sort, sum, mean, min, max), type functions (int, float, str, bool, type), I/O functions (print, read_csv, write_csv, read_json, write_json), and string methods (len, upper, lower, contains, replace, split, join).

Three-Tier Quantum Engine

The QuantumEngine struct is the primary interface for all quantum operations. It auto-selects the optimal simulation tier based on qubit count, but users can force a specific tier using simulate(engine="chunked") { ... }. All three tiers use the same SparseStateVec data structure under the hood — the difference is in how they manage memory and parallelism.
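
The selection rule can be sketched as a plain function of the qubit count, using the thresholds given in this section (≤ 20, 21–28, > 28). The names `EngineTier`, `select_tier` and `parse_tier` are hypothetical; only the string "chunked" appears in the text above, so "dense" and "sparse" are assumed spellings.

```rust
/// Hypothetical enum for the three simulation tiers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EngineTier {
    Dense,
    Sparse,
    Chunked,
}

/// Auto-select a tier from the qubit count.
pub fn select_tier(n_qubits: usize) -> EngineTier {
    match n_qubits {
        0..=20 => EngineTier::Dense,
        21..=28 => EngineTier::Sparse,
        _ => EngineTier::Chunked,
    }
}

/// Resolve an explicit override such as simulate(engine="chunked").
/// "dense" and "sparse" are assumed names, by analogy with "chunked".
pub fn parse_tier(name: &str) -> Option<EngineTier> {
    match name {
        "dense" => Some(EngineTier::Dense),
        "sparse" => Some(EngineTier::Sparse),
        "chunked" => Some(EngineTier::Chunked),
        _ => None,
    }
}

/// Use the override when given and valid, otherwise fall back to auto-selection.
pub fn resolve_tier(n_qubits: usize, forced: Option<&str>) -> EngineTier {
    forced
        .and_then(parse_tier)
        .unwrap_or_else(|| select_tier(n_qubits))
}
```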

[Figure: automatic engine selection by n_qubits. ≤ 20: Dense (full state vector, 2^n amplitudes in memory, at most 16 MB at 20 qubits, single-threaded). 21–28: Sparse (HashMap<u128, Complex64> holding only non-zero amplitudes; a 100-qubit GHZ state takes about 100 bytes, a 10^28× saving over dense; single-threaded). > 28: Chunked (10-qubit shards of at most 1,024 states each, cross-chunk gate protocol, parallel via Rayon par_iter_mut).]

Tier 1: Dense Engine (≤ 20 qubits)

For small circuits (up to 20 qubits), the state vector contains at most 2^20 = 1,048,576 amplitudes, requiring only 16 MB of RAM. The dense engine uses the SparseStateVec but expects it to be fully populated. Gate application iterates over all non-zero entries, which for dense states means all 2^n entries. This is the fastest tier for small circuits because memory access patterns are sequential and cache-friendly. The 20-qubit threshold was chosen because 16 MB fits comfortably in L3 cache on modern processors.

Tier 2: Sparse Engine (21–28 qubits)

For medium circuits, the sparse engine exploits the fact that most quantum states have far fewer non-zero amplitudes than 2^n. A GHZ state, for example, has exactly 2 non-zero entries regardless of qubit count, even at 100 qubits. The sparse engine stores only non-zero entries in a HashMap<u128, Complex64>, achieving massive memory savings. Gate application iterates over only the non-zero entries, computing new amplitudes by applying the gate's unitary matrix. After each gate, amplitudes below a pruning tolerance (default 10^-15) are removed to prevent numerical noise from accumulating.

Tier 3: Chunked Engine (> 28 qubits)

For large circuits, the chunked engine splits the quantum register into chunks of 10 qubits each. Each chunk maintains its own SparseStateVec with at most 1,024 basis states (2^10). Chunks execute in parallel using the Rayon library. Gates operating within a single chunk are applied locally. Cross-chunk gates (where control and target qubits are in different chunks) use a coordination protocol that temporarily merges the affected subspaces, applies the gate, and redistributes. The 10-qubit chunk size was chosen as the optimal balance between memory per chunk (16 KB max) and the number of cross-chunk operations needed for typical quantum circuits.

Engine | Qubits | Memory (30q) | Strategy | Parallelism
Dense | ≤ 20 | 16 GB | Full state vector | Single-thread
Sparse | 21–28 | ~100 bytes (GHZ) | HashMap of non-zero entries | Single-thread
Chunked | > 28 | 10 × 16 KB chunks | 10-qubit shards, parallel | Rayon threads

Sparse Matrix Mathematics

The core mathematical innovation in Sansqrit is the sparse representation of quantum state vectors. Traditional quantum simulators allocate a dense vector of 2^n complex numbers. For 50 qubits, this requires 16 petabytes of RAM, which is clearly impossible. Sansqrit's sparse engine stores only the non-zero amplitudes, which for most practical quantum circuits is a tiny fraction of the full state space.
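
The dense figures quoted in this document follow from one formula: a dense vector holds 2^n amplitudes at 16 bytes each (two 64-bit floats), so 50 qubits need 2^50 × 16 = 2^54 bytes, which is 16 PiB. A short sketch of that arithmetic, with `dense_bytes` as a hypothetical helper name:

```rust
/// Bytes needed for a dense state vector of `n_qubits`, assuming
/// 16 bytes per amplitude (two f64 values: real and imaginary part).
/// Returns None when the result does not fit in a u128.
pub fn dense_bytes(n_qubits: u32) -> Option<u128> {
    1u128.checked_shl(n_qubits)?.checked_mul(16)
}
```

The same function gives 16 MiB for 20 qubits and 16 GiB for 30 qubits, matching the numbers used for the dense tier.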

SparseStateVec Data Structure

The SparseStateVec is defined in crates/sansqrit-core/src/sparse.rs. It uses a Rust HashMap<u128, Complex64> where the key is the basis state index (supporting up to 128 qubits) and the value is the complex amplitude. Key operations include:

pub struct SparseStateVec {
    pub n_qubits: usize,           // number of qubits
    entries: HashMap<u128, Complex64>, // ONLY non-zero amplitudes
    prune_tol: f64,                // prune below 1e-15
}

// O(1) operations (average-case HashMap access):
get(index) -> Complex64       // lookup amplitude (0 if absent)
set(index, amp)               // insert or remove if near-zero
nnz() -> usize                // count of non-zero entries

// O(nnz) operations:
drain() -> Vec<(u128, Complex64)>  // take all entries
total_probability() -> f64    // Σ|aᵢ|² (should be 1.0 up to rounding)

// Bit manipulation:
bit_of(state, qubit) -> 0|1   // extract bit at position
flip_bit(state, qubit) -> u128 // toggle bit at position
set_bit(state, qubit, val) -> u128
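
The three bit helpers are thin wrappers around shifts and masks on the u128 basis index. The sketch below assumes that qubit 0 is the least significant bit, which the listing above does not state, and uses `u8` and `bool` where the listing writes `0|1` and `val`.

```rust
/// Value (0 or 1) of `qubit` in basis index `state`.
/// Assumes qubit 0 is the least significant bit and qubit < 128.
pub fn bit_of(state: u128, qubit: u32) -> u8 {
    ((state >> qubit) & 1) as u8
}

/// Basis index with `qubit` toggled; this is the "partner" state
/// used when applying a single-qubit gate.
pub fn flip_bit(state: u128, qubit: u32) -> u128 {
    state ^ (1u128 << qubit)
}

/// Basis index with `qubit` forced to `val`.
pub fn set_bit(state: u128, qubit: u32, val: bool) -> u128 {
    if val {
        state | (1u128 << qubit)
    } else {
        state & !(1u128 << qubit)
    }
}
```

For example, `flip_bit(0, 3)` gives 8, and flipping the same qubit twice returns the original index.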

Gate Application Algorithm

Applying a single-qubit gate to a sparse vector works by iterating over all existing non-zero entries. For each entry, the algorithm extracts the target qubit's bit value, computes the partner state (same state but with the target bit flipped), looks up the 2×2 gate matrix elements, and distributes the amplitude between the original and partner states. This is O(nnz), where nnz is the number of non-zero entries, rather than O(2^n).

// For each non-zero entry (state, amplitude):
//   bit = extract target qubit's value from state
//   partner = state with target bit flipped
//   matrix = [[m00, m01], [m10, m11]] for the gate
//
//   if bit == 0:
//     new[state]   += m00 * amplitude
//     new[partner] += m10 * amplitude
//   else:
//     new[partner] += m01 * amplitude
//     new[state]   += m11 * amplitude
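
The pseudocode above translates almost line for line into Rust. The following is a dependency-free sketch rather than the code in gates.rs: complex numbers are plain `(re, im)` tuples standing in for Complex64, `apply_single_qubit` is a hypothetical name, qubit 0 is assumed to be the least significant bit, and pruning of near-zero results is left to a separate step.

```rust
use std::collections::HashMap;

/// Minimal complex number as (re, im); a stand-in for Complex64.
type C = (f64, f64);

fn mul(a: C, b: C) -> C {
    (a.0 * b.0 - a.1 * b.1, a.0 * b.1 + a.1 * b.0)
}

/// Add `v` to the amplitude stored under `key`, creating it if absent.
fn accumulate(out: &mut HashMap<u128, C>, key: u128, v: C) {
    let entry = out.entry(key).or_insert((0.0, 0.0));
    entry.0 += v.0;
    entry.1 += v.1;
}

/// Apply the 2x2 matrix `m` (indexed m[row][col]) to qubit `target`.
/// Visits only the stored entries, so the cost is O(nnz).
pub fn apply_single_qubit(
    state: &HashMap<u128, C>,
    target: u32,
    m: [[C; 2]; 2],
) -> HashMap<u128, C> {
    let mut out = HashMap::new();
    for (&basis, &amp) in state {
        let partner = basis ^ (1u128 << target);
        if (basis >> target) & 1 == 0 {
            // Column 0 of the matrix: |0> -> m00|0> + m10|1>
            accumulate(&mut out, basis, mul(m[0][0], amp));
            accumulate(&mut out, partner, mul(m[1][0], amp));
        } else {
            // Column 1 of the matrix: |1> -> m01|0> + m11|1>
            accumulate(&mut out, partner, mul(m[0][1], amp));
            accumulate(&mut out, basis, mul(m[1][1], amp));
        }
    }
    out
}
```

Applying a Hadamard matrix to |0⟩ produces two entries of magnitude 1/√2; applying it again returns amplitude 1 on |0⟩ and leaves a zero-valued entry on |1⟩, which is the kind of entry the pruning step removes.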

Amplitude Pruning

After each gate application, amplitudes with magnitude below prune_tol (default 10^-15) are removed from the HashMap. This prevents numerical noise from accumulating over long circuits and keeps the nnz count as low as possible. The tolerance sits well below the level at which it could affect measurement probabilities (an amplitude of 10^-15 contributes a probability of only |amplitude|² ≈ 10^-30) while still catching floating-point noise.
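
A pruning pass of this kind fits in a single `retain` call on the map. The sketch below is an assumption about the shape of the operation, not the code in sparse.rs: `prune` is a hypothetical name and amplitudes are `(re, im)` tuples standing in for Complex64.

```rust
use std::collections::HashMap;

/// Minimal complex number as (re, im); a stand-in for Complex64.
type C = (f64, f64);

/// Remove every entry whose magnitude is below `tol` and
/// return how many entries were dropped.
pub fn prune(entries: &mut HashMap<u128, C>, tol: f64) -> usize {
    let before = entries.len();
    // hypot(re, im) is the magnitude |a| of the complex amplitude.
    entries.retain(|_, a| a.0.hypot(a.1) >= tol);
    before - entries.len()
}
```

With the default tolerance of 1e-15, an entry such as (1e-17, 0) is dropped while any amplitude of physical significance is kept.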

O(1) Gate Lookup Tables

Sansqrit's second performance innovation is pre-computed gate lookup tables. Instead of computing gate matrix multiplications at runtime, the system pre-computes every possible gate result for 10-qubit chunks and stores them in memory-mapped binary files. At runtime, applying a gate to a chunk basis state is a single memory read: an O(1) table lookup in place of per-entry matrix arithmetic.

Table Generation (generate_blobs.py)

The Python script tools/precompute/generate_blobs.py generates the lookup tables. It iterates over all 27 single-qubit gate types, 10 qubit positions within a chunk, and all 1,024 possible chunk states (2^10). For each combination, it pre-computes the output states and amplitudes, writing them to binary files. Two-qubit gates require iterating over 90 qubit pairs × 1,024 states. Generation takes approximately 30 seconds and produces ~52 MB of binary data.

python3 tools/precompute/generate_blobs.py --verify
# Output:
# single_qubit_all.bin   ~20 MB   (27 gates × 10 positions × 1024 states)
# two_qubit_all.bin      ~31 MB   (10 gates × 90 pairs × 1024 states)
# phase_table.bin        ~1 MB    (65536 pre-computed e^(iθ) values)
# manifest.json          <1 KB    (gate name → byte offset mapping)

Binary File Layout

Each entry in the binary lookup table consists of 36 bytes: two 16-bit output state indices (out0, out1) and four 64-bit floats (real and imaginary parts of both output amplitudes). The files are memory-mapped using the memmap2 crate, so the OS handles paging — only the needed portions are loaded into physical RAM.

Runtime Lookup

At runtime, applying a gate to a chunk state requires: (1) compute the byte offset: gate_id × 10 × 1024 × 36 + qubit × 1024 × 36 + state × 36, (2) read 36 bytes from the memory-mapped file, (3) deserialize into output states and amplitudes. This is a single memory read — no floating-point arithmetic required.
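
The three lookup steps can be sketched over a plain byte slice (in the engine this slice would come from the memory-mapped file). The offset formula is the one given above; the field order (out0, out1, re0, im0, re1, im1) and little-endian encoding are assumptions, since the text specifies only the field sizes. `entry_offset`, `decode_entry` and `LookupEntry` are hypothetical names.

```rust
pub const ENTRY_BYTES: usize = 36;
pub const CHUNK_QUBITS: usize = 10;
pub const CHUNK_STATES: usize = 1 << CHUNK_QUBITS; // 1024

/// Step 1: byte offset of the entry for (gate, qubit, chunk state).
pub fn entry_offset(gate_id: usize, qubit: usize, state: usize) -> usize {
    (gate_id * CHUNK_QUBITS * CHUNK_STATES + qubit * CHUNK_STATES + state) * ENTRY_BYTES
}

/// One decoded table entry: two output states and their amplitudes as (re, im).
#[derive(Debug, Clone, PartialEq)]
pub struct LookupEntry {
    pub out0: u16,
    pub out1: u16,
    pub amp0: (f64, f64),
    pub amp1: (f64, f64),
}

/// Steps 2 and 3: read 36 bytes at `offset` and deserialize them.
/// Assumed layout: out0, out1, re0, im0, re1, im1, all little-endian.
pub fn decode_entry(table: &[u8], offset: usize) -> Option<LookupEntry> {
    let raw = table.get(offset..offset.checked_add(ENTRY_BYTES)?)?;
    let u16_at = |i: usize| u16::from_le_bytes([raw[i], raw[i + 1]]);
    let f64_at = |i: usize| {
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(&raw[i..i + 8]);
        f64::from_le_bytes(bytes)
    };
    Some(LookupEntry {
        out0: u16_at(0),
        out1: u16_at(2),
        amp0: (f64_at(4), f64_at(12)),
        amp1: (f64_at(20), f64_at(28)),
    })
}
```

Returning `None` for an out-of-range offset keeps a truncated or mismatched table file from causing a panic.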

10-Qubit Sharding

The chunked engine divides the quantum register into chunks of 10 qubits each. For a 100-qubit register, this creates 10 chunks, each with its own SparseStateVec of at most 1,024 entries. The global qubit index i maps to chunk i / 10 and local qubit i % 10. Chunks execute in parallel using Rayon's par_iter_mut().
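
The index mapping is simple integer division and remainder. A minimal sketch, with hypothetical function names; rounding the chunk count up for registers that are not a multiple of 10 is an assumption, since the text only covers the 100-qubit case.

```rust
pub const CHUNK_QUBITS: usize = 10;

/// Map a global qubit index to (chunk index, local qubit index).
pub fn locate(global: usize) -> (usize, usize) {
    (global / CHUNK_QUBITS, global % CHUNK_QUBITS)
}

/// Number of chunks for a register (assumed: rounded up).
pub fn chunk_count(n_qubits: usize) -> usize {
    (n_qubits + CHUNK_QUBITS - 1) / CHUNK_QUBITS
}

/// True when a two-qubit gate on `a` and `b` spans two chunks.
pub fn is_cross_chunk(a: usize, b: usize) -> bool {
    locate(a).0 != locate(b).0
}
```

For instance, qubit 9 maps to (0, 9) and qubit 10 to (1, 0), so a gate on that pair crosses a chunk boundary.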

[Figure: 100 qubits split into 10 chunks running in parallel via Rayon. Chunk 0 holds q[0..9], chunk 1 holds q[10..19], and so on up to chunk 9 with q[90..99]; CNOT(9,10) is shown as a cross-chunk gate. Memory: each chunk holds at most 1024 states × 16 bytes = 16 KB, for a total of 10 × 16 KB = 160 KB, whereas a dense 100-qubit vector is impossible and a sparse 100-qubit GHZ state takes about 100 bytes. Optional distributed mode: chunks are spread across TCP nodes (for example Node A: chunks 0-4, Node B: chunks 5-9), cross-node gates use merge-apply-split over TCP, and the layout is described by DistributedConfig { nodes: [addr1, addr2], chunk_assignment: [...] }.]

Cross-chunk gates (where control and target qubits are in different chunks) are handled by temporarily merging the affected chunks, applying the gate on the combined state, and redistributing the results. For a CNOT(q[9], q[10]) spanning chunks 0 and 1, the engine: (1) collects the non-zero entries from both chunks, (2) constructs the combined 20-qubit subspace, (3) applies CNOT on the combined vector, (4) decomposes back into individual chunk states, (5) writes back to each chunk. This is transparent to the user — no special code is needed for cross-chunk gates.

The 10-qubit chunk size was selected as the optimal trade-off between memory per chunk (16 KB max, fitting in L1 cache), cross-chunk gate frequency (nearest-neighbor circuits rarely cross boundaries), and parallelism efficiency (10 chunks for 100 qubits gives excellent load balance). The distributed mode extends this across TCP-connected nodes, enabling simulation of thousands of qubits for sparse circuits.

Measurement Engine

Sansqrit supports both single-qubit measurement (which collapses the state) and shot-based measurement (which samples from the probability distribution without collapsing). The measure(qubit) function computes P(0) = Σ|aᵢ|² over all states where bit qubit is 0, generates a random number, and collapses to 0 or 1 accordingly. The measure_all(shots) function builds a cumulative probability distribution and samples shots times, returning a histogram of bitstring outcomes.
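
Both measurement modes can be sketched over a sparse map of amplitudes. This is an illustration under assumptions, not the code in measurement.rs: amplitudes are `(re, im)` tuples, the uniform random draws in [0, 1) are passed in by the caller so the functions stay deterministic, qubit 0 is assumed to be the least significant bit, and the histogram is keyed by basis index rather than by formatted bitstring.

```rust
use std::collections::HashMap;

/// Minimal complex number as (re, im); a stand-in for Complex64.
type C = (f64, f64);

fn norm_sqr(a: C) -> f64 {
    a.0 * a.0 + a.1 * a.1
}

/// P(0) for `qubit`: sum of |a|^2 over entries whose bit is 0.
pub fn prob_zero(state: &HashMap<u128, C>, qubit: u32) -> f64 {
    state
        .iter()
        .filter(|(k, _)| (**k >> qubit) & 1 == 0)
        .map(|(_, a)| norm_sqr(*a))
        .sum()
}

/// Collapsing measurement. `u` is a uniform draw in [0, 1).
pub fn measure(state: &mut HashMap<u128, C>, qubit: u32, u: f64) -> u8 {
    let p0 = prob_zero(state, qubit);
    let outcome: u8 = if u < p0 { 0 } else { 1 };
    let p = if outcome == 0 { p0 } else { 1.0 - p0 };
    // Keep only entries consistent with the outcome, then renormalize.
    state.retain(|k, _| ((*k >> qubit) & 1) as u8 == outcome);
    let scale = 1.0 / p.sqrt();
    for a in state.values_mut() {
        a.0 *= scale;
        a.1 *= scale;
    }
    outcome
}

/// Non-collapsing sampling: one outcome per draw in `draws`.
pub fn sample(state: &HashMap<u128, C>, draws: &[f64]) -> HashMap<u128, usize> {
    let mut hist = HashMap::new();
    // Sort keys so the cumulative distribution has a fixed order.
    let mut keys: Vec<u128> = state.keys().copied().collect();
    keys.sort_unstable();
    if keys.is_empty() {
        return hist;
    }
    let mut cumulative = Vec::with_capacity(keys.len());
    let mut acc = 0.0;
    for k in &keys {
        acc += norm_sqr(state[k]);
        cumulative.push(acc);
    }
    for &u in draws {
        // First index whose cumulative probability exceeds u.
        let idx = cumulative.partition_point(|&c| c <= u).min(keys.len() - 1);
        *hist.entry(keys[idx]).or_insert(0) += 1;
    }
    hist
}
```

In a real engine the draws would come from a random number generator such as the rand crate listed among the dependencies; passing them in explicitly is a choice made here to keep the sketch testable.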

Hardware Export (5 Backends)

Sansqrit can export circuits to five real quantum hardware backends: OpenQASM 2.0/3.0 (standard text format), IBM Quantum JSON (for IBM Cloud), IonQ JSON (for IonQ trapped-ion hardware), Google Cirq Python (for Google Sycamore), and Amazon Braket Python (for AWS quantum services). The qasm_export.rs module records all gate operations in the circuit_log during execution and serializes them to the target format on demand.
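
The record-then-serialize approach can be illustrated for the OpenQASM 2.0 target. This is a minimal sketch, not the code in qasm_export.rs: the `GateOp` enum and `to_openqasm2` function are hypothetical, and only a handful of gates are covered. The emitted statements (`qreg`, `creg`, `h`, `x`, `cx`, `rz`, `measure`) are standard OpenQASM 2.0 with the qelib1.inc gate library.

```rust
/// Hypothetical entry in the recorded gate log.
#[derive(Debug, Clone, PartialEq)]
pub enum GateOp {
    H(usize),
    X(usize),
    Cnot(usize, usize), // (control, target)
    Rz(usize, f64),     // (qubit, angle in radians)
    Measure(usize),
}

/// Serialize a recorded gate log to OpenQASM 2.0 text.
pub fn to_openqasm2(n_qubits: usize, log: &[GateOp]) -> String {
    let mut out = String::from("OPENQASM 2.0;\ninclude \"qelib1.inc\";\n");
    out.push_str(&format!("qreg q[{n_qubits}];\ncreg c[{n_qubits}];\n"));
    for op in log {
        let line = match op {
            GateOp::H(q) => format!("h q[{q}];"),
            GateOp::X(q) => format!("x q[{q}];"),
            GateOp::Cnot(c, t) => format!("cx q[{c}],q[{t}];"),
            GateOp::Rz(q, theta) => format!("rz({theta}) q[{q}];"),
            GateOp::Measure(q) => format!("measure q[{q}] -> c[{q}];"),
        };
        out.push_str(&line);
        out.push('\n');
    }
    out
}
```

Because the log is a plain list of operations, the other targets can be produced from the same data by swapping the formatting step.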

Crate Architecture (11 crates)

Crate | Purpose | Key Dependencies
sansqrit-core | Quantum engine: 10 modules, 3,116 LOC | num-complex, rayon, dashmap, memmap2, bytemuck, rand
sansqrit-lang | DSL frontend: lexer, parser, interpreter, CLI | sansqrit-core, sansqrit-stdlib, regex, env_logger
sansqrit-stdlib | Standard library: 7 modules | csv, regex, serde_json, rand
sansqrit-chemistry | VQE, PES, Trotter, molecular Hamiltonians | sansqrit-core
sansqrit-biology | DNA/RNA, protein folding, alignment | sansqrit-core
sansqrit-medical | Drug screening, vaccine design, binding energy | sansqrit-core
sansqrit-physics | Ising model, Heisenberg chain, time evolution | sansqrit-core
sansqrit-genetics | CRISPR guide design, GWAS, variant calling | sansqrit-core
sansqrit-ml | QNN, QSVM, QPCA, variational classifiers | sansqrit-core
sansqrit-math | Shor factoring, Grover search, HHL solver | sansqrit-core
sansqrit-qasm | OpenQASM import/export utilities | sansqrit-core

Full source code: github.com/sansqrit/sansqritPy
