Graphista: Dynamic Graph-Based LLM-Powered Memory
A proof-of-concept that uses LLM-driven loops for graph-based memory management. Two specialized loops — SmartNodeProcessor for ingestion and SmartRetrievalTool for querying — enable dynamic knowledge management with chain-of-thought reasoning.
Clone the repository and install dependencies:
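A typical setup might look like the following; the repository URL is a placeholder, since the source does not name one, and the OpenAI key is needed because ingestion generates embeddings via OpenAI:

```shell
# Placeholder URL — substitute the actual repository location
git clone https://github.com/your-org/graphista.git
cd graphista
pip install -r requirements.txt

# Required for embedding generation during ingestion
export OPENAI_API_KEY="sk-..."
```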
Project structure:
Initialize Memory, ingest text, and query — in three steps:
Import the Memory class and configure your ontology, extraction rules, and LLM settings. The backend defaults to local JSON for testing.
The ingest() method processes raw text through the SmartNodeProcessor loop: extracting entities, deduplicating, generating embeddings, and creating edges automatically.
The ask() method uses the SmartRetrievalTool to traverse the graph with chain-of-thought reasoning, returning both the answer and its reasoning steps.
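The three steps can be sketched end to end. Since the real package is not importable here, the snippet below uses a toy stand-in that mimics the documented surface (`Memory(backend=...)`, `ingest()`, `retrieve()`) with plain keyword search, no LLM loop or embeddings, just to show the call shapes:

```python
import uuid

class ToyMemory:
    """Toy stand-in for the documented Memory API (local JSON backend).
    The real class runs LLM loops; this only stores and keyword-searches."""

    def __init__(self, backend="local", db_path="graph.json"):
        self.db_path = db_path
        self.nodes = {}  # node_id -> {"label": ..., "props": {...}}

    def ingest(self, text):
        # Real ingest() runs the SmartNodeProcessor pipeline; here we
        # just store the raw text as a Document node.
        node_id = str(uuid.uuid4())
        self.nodes[node_id] = {"label": "Document", "props": {"content": text}}
        return {"id": node_id, "processing_result": "stored"}

    def retrieve(self, keyword):
        # Simple keyword search across node properties, per the API table.
        return [n for n in self.nodes.values()
                if any(keyword.lower() in str(v).lower()
                       for v in n["props"].values())]

memory = ToyMemory(backend="local", db_path="graph.json")
result = memory.ingest("Ada Lovelace worked with Charles Babbage.")
hits = memory.retrieve("Babbage")
print(len(hits))  # 1
```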
Full read/write access to the graph database. Runs an LLM loop that processes incoming text: extracting entities, checking for duplicates via vector similarity, merging or creating nodes, and establishing edges. Exits when the LLM returns a finish action.
Read-only graph access. Runs a chain-of-thought loop: the LLM iteratively decides which read tools to call (vector search, get edges, find similar nodes), building context until it has enough to synthesize a final answer.
Database abstraction layer that provides a unified API across multiple backends. Handles node/edge CRUD, vector storage, batch operations, and path queries. Supports Local JSON, Neo4j, and FalkorDB.
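The unified backend API could be sketched as a small interface that each backend implements; the method names below follow the write/read tool tables, while the `Protocol` shape itself is an assumption:

```python
from typing import Protocol

class GraphBackend(Protocol):
    """Hypothetical shape of the unified backend API described above."""
    def create_node(self, label: str, props: dict) -> str: ...
    def create_edge(self, src: str, tgt: str, rel: str) -> str: ...
    def get_edges(self, node_id: str) -> list: ...

class LocalJSONBackend:
    """In-memory sketch of the 'local' backend; the real one persists to JSON."""

    def __init__(self):
        self.nodes, self.edges = {}, []
        self._next = 0

    def _new_id(self):
        self._next += 1
        return f"n{self._next}"

    def create_node(self, label, props):
        nid = self._new_id()
        self.nodes[nid] = {"label": label, "props": props}
        return nid

    def create_edge(self, src, tgt, rel):
        self.edges.append({"src": src, "tgt": tgt, "rel": rel})
        return f"e{len(self.edges)}"

    def get_edges(self, node_id):
        # All edges touching the node, in either direction.
        return [e for e in self.edges if node_id in (e["src"], e["tgt"])]

db: GraphBackend = LocalJSONBackend()
a = db.create_node("Person", {"name": "Ada"})
b = db.create_node("Company", {"name": "Analytical Engines Ltd"})
db.create_edge(a, b, "WORKS_AT")
print(len(db.get_edges(a)))  # 1
```

Swapping in Neo4j or FalkorDB then only requires another class satisfying the same interface.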
The Memory class is the unified entry point. It ties together the graph database, ontology, LLM, and processing loops.
| Method | Description | Returns |
|---|---|---|
| Memory(backend, ontology_config, ...) | Initialize with ontology, extraction rules, LLM config | Memory instance |
| ingest(text) | Process text → extract entities → create graph nodes/edges | {id, processing_result} |
| ask(question) | Natural language query via chain-of-thought retrieval | {final_answer, chain_of_thought} |
| retrieve(keyword) | Simple keyword search across node properties | list[Node] |
| query(Query) | Advanced query object with filters, vector search, paths | list[Node] |
When you call memory.ingest(text), the following pipeline executes:
- Document node created with raw text stored as content
- SmartNodeProcessor loop starts — LLM reads text and decides actions
- Entity extraction — identifies Person, Company, Concept entities
- Deduplication — vector similarity search against existing nodes (threshold 0.92)
- Node creation or merge — new entities become nodes, duplicates get merged
- Edge creation — relationships between entities become directed edges
- Embedding generation — all new nodes get vector embeddings via OpenAI
- Loop exits — when the LLM returns a finish action with a processing summary
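The deduplication step hinges on a cosine-similarity check against the 0.92 threshold. A minimal sketch of that check, with toy vectors standing in for OpenAI embeddings:

```python
import math

DEDUP_THRESHOLD = 0.92  # threshold from the pipeline description above

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_duplicate(new_embedding, existing):
    """Return (node_id, score) of the best match above threshold, else None."""
    best = None
    for node_id, emb in existing.items():
        score = cosine_similarity(new_embedding, emb)
        if score >= DEDUP_THRESHOLD and (best is None or score > best[1]):
            best = (node_id, score)
    return best

existing = {"n1": [1.0, 0.0, 0.0], "n2": [0.0, 1.0, 0.0]}
print(find_duplicate([0.99, 0.05, 0.0], existing))  # matches n1
print(find_duplicate([0.5, 0.5, 0.0], existing))    # None — below 0.92
```

When a match is found, the pipeline merges into the existing node instead of creating a new one.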
When you call memory.ask(question), the SmartRetrievalTool loop executes:
- Question parsed — LLM analyzes the query intent
- Tool selection — LLM picks from find_similar_nodes, get_edges, get_connected_nodes, vector_search, and query
- Iterative traversal — each tool result is added to context, LLM decides next step
- Chain-of-thought recorded — every action logged for transparency
- Synthesis — LLM combines gathered context into a natural language answer
- Finish — returns {final_answer, chain_of_thought}
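The loop structure above can be sketched with stubs: the tool names come from the read-only tool table below, but `decide()` is a hypothetical stand-in for the LLM's next-action choice, and both tools return canned data:

```python
def vector_search(query, k=3):
    # Stub: the real tool embeds the query and runs a similarity search.
    return [{"id": "n1", "name": "Ada Lovelace"}]

def get_edges(node_id):
    # Stub: the real tool reads edges from the graph backend.
    return [{"src": "n1", "tgt": "n2", "rel": "WORKS_WITH"}]

TOOLS = {"vector_search": vector_search, "get_edges": get_edges}

def decide(question, context):
    """Hypothetical stand-in for the LLM's next-action decision."""
    if not context:
        return ("vector_search", {"query": question})
    if len(context) == 1:
        return ("get_edges", {"node_id": context[0]["result"][0]["id"]})
    return ("finish", {})

def ask(question, max_steps=8):
    context, chain_of_thought = [], []
    for _ in range(max_steps):
        action, args = decide(question, context)
        chain_of_thought.append({"action": action, "args": args})
        if action == "finish":
            break
        result = TOOLS[action](**args)
        context.append({"tool": action, "result": result})
    # The real system asks the LLM to synthesize; we just summarize.
    answer = f"Gathered {len(context)} tool results for: {question}"
    return {"final_answer": answer, "chain_of_thought": chain_of_thought}

out = ask("Who does Ada work with?")
print(out["chain_of_thought"][-1]["action"])  # finish
```

Note the `max_steps` cap: a bounded loop is a common safeguard against an LLM that never emits a finish action.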
Read-Only Tools (used by SmartRetrievalTool + SmartNodeProcessor):
| Function | Description | Returns |
|---|---|---|
| find_similar_nodes(embedding, k) | Vector cosine similarity search | list[Node] |
| get_node_by_property(key, val) | Exact property match lookup | Node \| null |
| get_connected_nodes(node_id) | All nodes connected by edges | list[Node] |
| get_edges(node_id) | All edges from/to a node | list[Edge] |
| vector_search(query, k) | Text → embed → similarity search | list[Node] |
| query(cypher_str) | Raw Cypher/GQL execution | ResultSet |
Write Tools (SmartNodeProcessor only):
| Function | Description | Returns |
|---|---|---|
| create_node(label, props) | Create node with auto-embedding | node_id |
| update_node(id, props) | Merge properties + refresh embedding | void |
| create_edge(src, tgt, rel) | Create directed relationship | edge_id |
| batch_create_nodes(list) | Bulk insert with parallel embeddings | list[id] |
| batch_create_edges(list) | Bulk edge creation | list[id] |
Efficiently create thousands of nodes in a single call with parallel embedding generation.
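Since embedding calls are network-bound, parallelizing them is where the batch speedup comes from. A sketch of that pattern, with `embed()` as a stand-in for the real OpenAI call:

```python
from concurrent.futures import ThreadPoolExecutor

def embed(text):
    # Stand-in for an OpenAI embedding request (network-bound, so a
    # thread pool pays off for large batches).
    return [float(len(text)), 0.0]

def batch_create_nodes(node_specs, max_workers=8):
    """Embed all nodes in parallel, then insert; returns new node ids."""
    texts = [spec["props"].get("name", "") for spec in node_specs]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        embeddings = list(pool.map(embed, texts))  # preserves input order
    ids = []
    for i, (spec, emb) in enumerate(zip(node_specs, embeddings)):
        spec["embedding"] = emb  # attach before insert
        ids.append(f"n{i}")      # a real backend returns db-generated ids
    return ids

specs = [{"label": "Person", "props": {"name": f"user{i}"}} for i in range(100)]
ids = batch_create_nodes(specs)
print(len(ids))  # 100
```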
Discover indirect relationships between nodes with configurable depth. Find how Person connects to Company through intermediate nodes.
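Depth-bounded path discovery is a breadth-first search over the edge set. A minimal sketch with a toy adjacency map (the node names are illustrative):

```python
from collections import deque

EDGES = {  # toy adjacency: Person -> Project -> Company
    "ada": ["engine_project"],
    "engine_project": ["acme_corp"],
    "acme_corp": [],
}

def find_path(start, goal, max_depth=3):
    """Breadth-first search up to max_depth hops; returns a node path or None."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 >= max_depth:
            continue  # depth budget exhausted on this branch
        for nxt in EDGES.get(path[-1], []):
            if nxt not in path:  # avoid cycles
                queue.append(path + [nxt])
    return None

print(find_path("ada", "acme_corp"))               # two hops, found
print(find_path("ada", "acme_corp", max_depth=1))  # None — too shallow
```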
Combine vector similarity with property filters for precise results. Filter by type/industry then rank by embedding similarity.
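The filter-then-rank pattern can be sketched in a few lines: exact property filters prune the candidate set, then cosine similarity against the query embedding orders the survivors (toy vectors in place of real embeddings):

```python
import math

NODES = [
    {"id": "n1", "type": "Company", "industry": "AI",     "embedding": [1.0, 0.0]},
    {"id": "n2", "type": "Company", "industry": "Retail", "embedding": [0.9, 0.1]},
    {"id": "n3", "type": "Person",  "industry": "AI",     "embedding": [1.0, 0.0]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def hybrid_query(nodes, filters, query_embedding, k=5):
    """Filter by exact properties first, then rank survivors by similarity."""
    candidates = [n for n in nodes
                  if all(n.get(key) == val for key, val in filters.items())]
    candidates.sort(key=lambda n: cosine(n["embedding"], query_embedding),
                    reverse=True)
    return candidates[:k]

top = hybrid_query(NODES, {"type": "Company", "industry": "AI"}, [1.0, 0.0])
print([n["id"] for n in top])  # ['n1']
```

Filtering first keeps the similarity ranking cheap: only nodes that already match the structured constraints get scored.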
| Backend | Config | Status |
|---|---|---|
| Local JSON | backend="local", db_path="graph.json" | ✓ Stable |
| Neo4j | backend="neo4j", uri, auth | ⚠ Experimental |
| FalkorDB | backend="falkordb", host, port | ⚠ Experimental |
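Based on the Config column, initialization for each backend would look roughly like the following; the concrete values (URIs, credentials, port) are illustrative, and any parameter name not in the table is an assumption:

```python
# Local JSON — stable, good for testing
memory = Memory(backend="local", db_path="graph.json")

# Neo4j — experimental; illustrative URI and credentials
memory = Memory(backend="neo4j", uri="bolt://localhost:7687",
                auth=("neo4j", "password"))

# FalkorDB — experimental; illustrative host and port
memory = Memory(backend="falkordb", host="localhost", port=6379)
```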