tenso package¶
Subpackages¶
Submodules¶
tenso.async_core module¶
Async I/O Support for Tenso.
Provides coroutines for reading and writing Tenso packets using asyncio stream readers and writers.
- async tenso.async_core.aread_stream(reader)[source]¶
Asynchronously read a Tenso packet from a StreamReader.
- Parameters:
reader (StreamReader) – The stream reader source.
- Returns:
The deserialized array.
- Return type:
Optional[ndarray]
- async tenso.async_core.awrite_stream(tensor, writer, strict=False, check_integrity=False)[source]¶
Asynchronously write a tensor to a StreamWriter.
- Parameters:
tensor (ndarray) – The array to write.
writer (StreamWriter) – The stream writer destination.
strict (bool) – Strict contiguous check.
check_integrity (bool) – Include checksum.
- Return type:
None
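The read pattern behind a coroutine such as aread_stream can be sketched with plain asyncio. This is a minimal illustration only: the 4-byte little-endian length prefix, the MAX_BODY cap, and the read_frame helper are assumptions made for the example and do not describe the actual Tenso wire format.

```python
import asyncio
import struct

MAX_BODY = 1 << 20  # illustrative size cap, not a Tenso constant


async def read_frame(reader):
    """Read one length-prefixed frame; return None on a clean end of stream."""
    try:
        prefix = await reader.readexactly(4)
    except asyncio.IncompleteReadError as exc:
        if not exc.partial:
            return None  # the stream ended before any data was read
        raise EOFError("stream ended inside the length prefix") from exc
    (length,) = struct.unpack("<I", prefix)
    if length > MAX_BODY:
        raise ValueError("frame exceeds the configured size cap")
    try:
        return await reader.readexactly(length)
    except asyncio.IncompleteReadError as exc:
        raise EOFError("stream ended inside the frame body") from exc


async def demo():
    # Feed an in-memory StreamReader so the sketch needs no real socket.
    reader = asyncio.StreamReader()
    reader.feed_data(struct.pack("<I", 5) + b"hello")
    reader.feed_eof()
    first = await read_frame(reader)
    second = await read_frame(reader)
    return first, second


result = asyncio.run(demo())
```

Returning None only when the stream ends before the first byte mirrors the Optional return type documented above, while a stream that ends mid-frame is treated as an error.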
tenso.cache module¶
TensoCache: In-process tensor cache backed by shared memory.
Provides an embeddable, tensor-aware cache with mutable entries, zero-copy reads, LRU eviction, TTL, in-place updates, and metadata inspection without deserialization.
- Memory Layout (single SHM segment):
┌──────────────────────────────────────────┐
│ POOL HEADER (4096 bytes)                 │
│ magic | version | pool_size | max_entries│
│ active_entries | free_bytes | watermark  │
│ lru_head | lru_tail | free_list_head     │
│ lock_word | generation | hits | misses   │
├──────────────────────────────────────────┤
│ ENTRY INDEX TABLE (max_entries × 256 b)  │
├──────────────────────────────────────────┤
│ DATA REGION (rest of pool)               │
│ 64-byte aligned Tenso packets + free     │
└──────────────────────────────────────────┘
- class tenso.cache.TensoCache(max_memory='256MB', name=None, create=True)[source]¶
Bases: object
In-process tensor cache backed by a single shared memory pool.
Supports mutable entries, zero-copy reads, LRU eviction, TTL, in-place updates, and metadata inspection without deserialization.
Example:
import numpy as np
from tenso import TensoCache

with TensoCache("64MB") as cache:
    cache.put("weights", np.random.randn(1000, 1000).astype(np.float32))
    arr = cache.get("weights")    # zero-copy view into SHM
    print(cache.info("weights"))  # metadata without deserialization
    print(cache.stats)            # hit/miss counts, memory usage
- delete(key)[source]¶
Delete an entry from the cache.
- Parameters:
key (str) – Cache key.
- Returns:
True if the key was found and deleted.
- Return type:
bool
- get(key, copy=False, device=None)[source]¶
Retrieve a tensor from the cache.
- Parameters:
key (str) – Cache key.
copy (bool) – If True, return a writeable copy. Default returns zero-copy view.
device (str) – Target device for the result. Format: “framework” or “framework:device_spec” (e.g. “torch”, “torch:cuda:0”, “jax”, “cupy:0”). When set, the result is converted from numpy. Implies copy (SHM buffer cannot be shared with frameworks).
- Returns:
The tensor, or None if not found or expired.
- Return type:
Optional[np.ndarray or framework tensor]
- info(key)[source]¶
Get metadata about a cached entry without deserializing.
- Parameters:
key (str) – Cache key.
- Returns:
Dictionary with ‘shape’, ‘dtype’, ‘ndim’, ‘size_bytes’, ‘ttl’, ‘age’, or None if key not found.
- Return type:
Optional[dict]
- property name: str¶
- put(key, tensor, ttl=None, quantize=None)[source]¶
Store a tensor in the cache.
- Parameters:
key (str) – Cache key (max 128 bytes UTF-8).
tensor (np.ndarray, QuantizedTensor, or framework tensor) – Tensor to store. PyTorch, JAX, and CuPy tensors are automatically converted to numpy before caching.
ttl (float) – Time-to-live in seconds. None means no expiry.
quantize (str) – Quantization dtype (‘qint8’, ‘quint8’, ‘qint4’, ‘quint4’).
- Returns:
Number of bytes written.
- Return type:
int
- Raises:
ValueError – If key exceeds 128 bytes.
MemoryError – If pool is exhausted after eviction attempts.
- property stats: dict¶
Cache statistics.
- Returns:
Keys: entries, max_entries, pool_size, used_bytes, free_bytes, data_region_size, hits, misses, hit_rate
- Return type:
dict
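The interplay of LRU eviction, TTL expiry, and hit/miss counters described for TensoCache can be illustrated with a small dictionary-based model. This is a sketch under stated assumptions: TinyLruTtlCache, its injectable clock, and the entry-count limit are hypothetical and say nothing about how the shared-memory pool is actually organized.

```python
from collections import OrderedDict


class TinyLruTtlCache:
    """Toy model of LRU eviction plus TTL expiry (not the TensoCache internals)."""

    def __init__(self, max_entries, clock):
        self._max = max_entries
        self._clock = clock          # callable returning the current time in seconds
        self._data = OrderedDict()   # key -> (value, expires_at or None)
        self.hits = 0
        self.misses = 0

    def put(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)
        self._data.move_to_end(key)             # newest entry is most recently used
        while len(self._data) > self._max:
            self._data.popitem(last=False)      # evict the least recently used entry

    def get(self, key):
        item = self._data.get(key)
        if item is not None and item[1] is not None and self._clock() >= item[1]:
            del self._data[key]                 # expired entries count as misses
            item = None
        if item is None:
            self.misses += 1
            return None
        self._data.move_to_end(key)             # a read refreshes recency
        self.hits += 1
        return item[0]


now = [0.0]                                     # fake clock so the demo is deterministic
cache = TinyLruTtlCache(max_entries=2, clock=lambda: now[0])
cache.put("a", 1)
cache.put("b", 2, ttl=10.0)
cache.get("a")                                  # "a" becomes the most recently used key
cache.put("c", 3)                               # evicts "b", the least recently used key
```

Passing the clock in as a callable keeps expiry testable without sleeping.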
tenso.client module¶
High-level HTTP client for Tenso-powered FastAPI endpoints.
Provides TensoFastAPIClient that natively streams and unpacks TensoResponse chunks, handling all protocol details under the hood.
Requires httpx (pip install httpx).
- class tenso.client.TensoFastAPIClient(base_url, timeout=30.0, check_integrity=False, headers=None)[source]¶
Bases: object
Client for communicating with FastAPI endpoints that use TensoResponse and get_tenso_data.
Example:
import numpy as np
from tenso.client import TensoFastAPIClient

client = TensoFastAPIClient("http://localhost:8000")
tensor = np.random.randn(1, 224, 224, 3).astype(np.float32)
result = client.predict("/infer", tensor)
print(result.shape)

# Async usage (from within a coroutine)
result = await client.apredict("/infer", tensor)
- async apredict(endpoint, tensor, compress=False)[source]¶
Async version of predict().
- Return type:
Any
- predict(endpoint, tensor, compress=False)[source]¶
Send a tensor to a Tenso-powered endpoint and return the deserialized response.
- Parameters:
endpoint (str) – The API path (e.g. “/infer”).
tensor (Union[ndarray, dict]) – The input tensor or bundle.
compress (bool) – Whether to LZ4-compress the request body.
- Returns:
The deserialized response (np.ndarray, dict, or sparse matrix).
- Return type:
Any
tenso.config module¶
Configuration and Protocol Constants for Tenso.
This module defines the binary protocol version, magic numbers, memory alignment requirements, and feature flags used across the library.
- tenso.config.DTYPE_BYTES = 21¶
Variable-length raw byte elements (int)
- tenso.config.DTYPE_STRING = 20¶
Variable-length UTF-8 string elements (int)
- tenso.config.FLAG_ALIGNED = 1¶
Packet uses 64-byte alignment (int)
- tenso.config.FLAG_BUNDLE = 16¶
Packet contains a collection (dict) of tensors (int)
- tenso.config.FLAG_COMPRESSION = 4¶
Packet body is compressed using LZ4 (int)
- tenso.config.FLAG_CUST_ALIGN = 128¶
Packet uses custom alignment (exponent byte follows shape) (int)
- tenso.config.FLAG_INTEGRITY = 2¶
Packet includes an 8-byte XXH3 checksum footer (int)
- tenso.config.FLAG_RAGGED = 512¶
Packet contains ragged/jagged array (int)
- tenso.config.FLAG_SPARSE = 8¶
Packet contains a Sparse COO tensor (int)
- tenso.config.FLAG_SPARSE_CSC = 64¶
Packet contains a Sparse CSC tensor (int)
- tenso.config.FLAG_SPARSE_CSR = 32¶
Packet contains a Sparse CSR tensor (int)
- tenso.config.FLAG_STRING = 256¶
Packet contains packed variable-length string tensor (int)
- tenso.config.MAX_ELEMENTS = 1000000000¶
Maximum elements per tensor (int)
- tenso.config.MAX_NDIM = 32¶
Maximum number of dimensions (int)
- tenso.config.QDTYPE_QINT4 = 18¶
4-bit signed quantized (packed, 2 per byte) (int)
- tenso.config.QDTYPE_QINT8 = 16¶
8-bit signed quantized (int)
- tenso.config.QDTYPE_QUINT4 = 19¶
4-bit unsigned quantized (packed, 2 per byte) (int)
- tenso.config.QDTYPE_QUINT8 = 17¶
8-bit unsigned quantized (int)
- tenso.config.QUANT_PER_CHANNEL = 1¶
One scale/zero_point per channel slice (int)
- tenso.config.QUANT_PER_GROUP = 2¶
One scale/zero_point per group of elements (int)
- tenso.config.QUANT_PER_TENSOR = 0¶
Single scale/zero_point for the entire tensor (int)
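Because each FLAG_* constant above is a distinct power of two, several features can be combined in one integer and tested with bitwise operations. The sketch below copies the listed values into a local table; the encode_flags and decode_flags helpers are hypothetical names introduced for illustration and are not part of tenso.config.

```python
# Values copied from the FLAG_* constants listed above.
FLAGS = {
    "ALIGNED": 1,
    "INTEGRITY": 2,
    "COMPRESSION": 4,
    "SPARSE": 8,
    "BUNDLE": 16,
    "SPARSE_CSR": 32,
    "SPARSE_CSC": 64,
    "CUST_ALIGN": 128,
    "STRING": 256,
    "RAGGED": 512,
}


def encode_flags(*names):
    """Combine flag names into a single integer bitmask."""
    value = 0
    for name in names:
        value |= FLAGS[name]
    return value


def decode_flags(value):
    """List the flag names whose bit is set in value."""
    return [name for name, bit in FLAGS.items() if value & bit]


packed = encode_flags("ALIGNED", "INTEGRITY", "COMPRESSION")
names = decode_flags(packed)
```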
tenso.core module¶
Core Serialization Engine for Tenso.
This module provides high-performance functions for converting NumPy arrays, Sparse matrices, and Dictionaries to the Tenso binary format. It supports zero-copy memory mapping, LZ4 compression, and XXH3 integrity verification.
- tenso.core.dump(tensor, fp, strict=False, check_integrity=False)[source]¶
Serialize a tensor and write it to an open binary file.
Optimized for large arrays by writing the complete packet in a single system call instead of multiple small writes.
- Parameters:
tensor (ndarray) – The array to serialize.
fp (BinaryIO) – Open binary file object.
strict (bool) – If True, raises error for non-contiguous arrays.
check_integrity (bool) – If True, includes XXH3 hash for verification.
- Return type:
None
- tenso.core.dumps(tensor, strict=False, check_integrity=False, compress=False, alignment=64)[source]¶
Serialize an object (Array, Sparse Matrix, or Dict) to a Tenso packet.
- Parameters:
tensor (Any) – The object to serialize.
strict (bool) – If True, raises error for non-contiguous arrays.
check_integrity (bool) – If True, includes XXH3 hash for verification.
compress (bool) – If True, uses LZ4 compression on the data body.
alignment (int) – Memory alignment boundary (must be power of 2).
- Returns:
A view of the complete Tenso packet bytes.
- Return type:
memoryview
- tenso.core.iter_dumps(tensor, strict=False, check_integrity=False)[source]¶
Vectored serialization: Yields packet parts to avoid memory copies.
- Parameters:
tensor (ndarray) – The array to serialize.
strict (bool) – If True, raises ValueError for non-contiguous arrays.
check_integrity (bool) – If True, includes an XXH3 checksum footer.
- Yields:
Union[bytes, memoryview] – Sequential chunks of the Tenso packet.
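The idea of vectored serialization, yielding a small header followed by a view of the existing buffer instead of concatenating everything into a new bytes object, can be sketched as follows. The header layout, the iter_parts name, and the use of CRC32 in place of XXH3 are all assumptions made so the example runs with the standard library alone.

```python
import io
import struct
import zlib


def iter_parts(payload, with_checksum=False):
    """Yield packet parts one by one without copying the payload."""
    body = memoryview(payload)
    yield struct.pack("<Q", body.nbytes)            # hypothetical 8-byte length header
    yield body                                      # zero-copy view of the caller's buffer
    if with_checksum:
        yield struct.pack("<I", zlib.crc32(body))   # CRC32 stand-in for an XXH3 footer


payload = bytearray(b"\x01\x02\x03\x04")
parts = list(iter_parts(payload, with_checksum=True))

# A consumer writes the parts sequentially; no combined buffer is ever built.
sink = io.BytesIO()
for part in parts:
    sink.write(part)
```

The second part still refers to the original bytearray, which is what lets a writer hand it to the operating system without an intermediate copy.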
- tenso.core.load(fp, mmap_mode=False, copy=False)[source]¶
Deserialize an object from an open binary file.
- Parameters:
fp (BinaryIO) – Open binary file object.
mmap_mode (bool) – Use memory mapping for large files.
copy (bool) – Return a writeable copy.
- Returns:
The reconstructed object.
- Return type:
Any
- tenso.core.loads(data, copy=False)[source]¶
Deserialize a Tenso packet into its original Python object.
- Parameters:
data (Union[bytes, bytearray, memoryview, ndarray, mmap]) – The raw Tenso packet data.
copy (bool) – If True, returns a writeable copy. Otherwise returns a read-only view.
- Returns:
The reconstructed NumPy array, Dictionary, or Sparse Matrix.
- Return type:
Any
- tenso.core.read_stream(source)[source]¶
Read and deserialize an object from a stream source with DoS protection.
This function supports streaming deserialization for dense NumPy arrays, multi-tensor bundles (dictionaries), and sparse matrices (COO, CSR, CSC). It avoids loading the entire packet into memory before parsing, making it suitable for large-scale data ingestion.
- Parameters:
source (Any) – Stream source to read from (must support .read() or .recv()).
- Returns:
The deserialized NumPy array, Sparse matrix, or Dictionary. Returns None if the stream ended before any data was read.
- Return type:
Optional[Any]
- Raises:
ValueError – If the packet is invalid or exceeds security limits.
EOFError – If the stream ends prematurely during reading.
ImportError – If scipy is missing during sparse matrix deserialization.
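Two ingredients of a defensive stream reader, reading an exact byte count from a source exposing .read() or .recv() and validating an untrusted shape before allocating memory, might look like the sketch below. The helper names are hypothetical; only the two limit values are taken from the tenso.config listing, and the real read_stream checks may differ.

```python
import io
import math

MAX_NDIM = 32                  # value listed for tenso.config.MAX_NDIM
MAX_ELEMENTS = 1_000_000_000   # value listed for tenso.config.MAX_ELEMENTS


def read_exact(source, n):
    """Read exactly n bytes from an object exposing .read() or .recv()."""
    pull = getattr(source, "read", None) or getattr(source, "recv")
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = pull(remaining)
        if not chunk:
            raise EOFError(f"stream ended with {remaining} bytes still expected")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def checked_element_count(shape):
    """Validate an untrusted shape before allocating anything for it."""
    if len(shape) > MAX_NDIM:
        raise ValueError("too many dimensions")
    if any(dim < 0 for dim in shape):
        raise ValueError("negative dimension")
    count = math.prod(shape)
    if count > MAX_ELEMENTS:
        raise ValueError("too many elements")
    return count


stream = io.BytesIO(b"abcdef")
head = read_exact(stream, 4)
count = checked_element_count((1000, 1000))
```

Checking the element count before reading the body means a malicious header cannot trigger a huge allocation.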
- tenso.core.write_stream(tensor, dest, strict=False, check_integrity=False)[source]¶
Write a tensor to a destination using memory-efficient streaming. Supports both file-like objects (.write) and sockets (.sendall).
- Parameters:
tensor (ndarray) – The array to serialize.
dest (Any) – Destination supporting .write() or .sendall().
strict (bool) – Strict contiguous check.
check_integrity (bool) – Include integrity hash.
- Returns:
The total number of bytes written.
- Return type:
int
tenso.fastapi module¶
FastAPI Integration for Tenso.
Allows zero-copy streaming of tensors from API endpoints and high-performance ingestion of incoming Tenso packets.
- class tenso.fastapi.TensoResponse(tensor, filename=None, strict=False, check_integrity=False, **kwargs)[source]¶
Bases: StreamingResponse
FastAPI Response for zero-copy tensor streaming.
- Parameters:
tensor (ndarray) – The tensor to stream.
filename (str) – Filename for Content-Disposition header.
strict (bool) – Strict contiguous check.
check_integrity (bool) – Include checksum.
**kwargs – Passed to StreamingResponse.
- async tenso.fastapi.get_tenso_data(request)[source]¶
Dependency to extract a Tenso object from an incoming FastAPI Request.
- Parameters:
request (Request) – The FastAPI request object.
- Returns:
The deserialized array, bundle, or sparse matrix.
- Return type:
Any
- Raises:
HTTPException – If the payload is invalid or headers are missing.
tenso.gpu module¶
GPU Acceleration for Tenso.
Implements fast transfers between device memory (CuPy/PyTorch/JAX) and Tenso streams using pinned host memory.
- class tenso.gpu.GPUDirectTransfer(device_id=0)[source]¶
Bases: object
Abstraction layer for GPU-Direct Storage and RDMA transfers.
When available, uses NVIDIA GPUDirect Storage (GDS) to bypass CPU staging entirely — network/storage data lands directly in GPU memory. Falls back gracefully to pinned-memory staging when GDS is unavailable.
Example:
import numpy as np
from tenso.gpu import GPUDirectTransfer

gdt = GPUDirectTransfer(device_id=0)

# From a file descriptor (e.g., NVMe SSD or network socket)
tensor = gdt.read_from_fd(fd, shape=(1, 3, 224, 224), dtype=np.float32)

# From a Tenso packet buffer already in host memory
tensor = gdt.read_packet(packet_bytes)
- property gds_available: bool¶
- read_from_fd(fd, shape, dtype, offset=0)[source]¶
Read raw tensor data from a file descriptor directly into GPU memory.
Uses GPUDirect Storage (kvikio/cuFile) when available, bypassing CPU RAM entirely. Falls back to pinned memory staging otherwise.
- Parameters:
fd (int) – OS-level file descriptor.
shape (tuple) – Tensor shape.
dtype (dtype) – Element data type.
offset (int) – Byte offset into the file where data begins.
- Return type:
Any
- tenso.gpu.read_to_device(source, device_id=0)[source]¶
Read a Tenso packet from a stream directly into GPU memory.
- Parameters:
source (Any) – Stream-like object (file, socket).
device_id (int) – The target GPU device ID.
- Returns:
The GPU tensor.
- Return type:
Any
- Raises:
ValueError – If packet is invalid or integrity check fails.
EOFError – If stream ends prematurely.
- tenso.gpu.write_from_device(tensor, dest, check_integrity=False)[source]¶
Serialize a GPU tensor directly to an I/O stream using pinned memory staging.
- Parameters:
tensor (Any) – A GPU-resident array (CuPy, PyTorch, or JAX).
dest (Any) – Destination with .write() method.
check_integrity (bool) – Include XXH3 checksum.
- Returns:
Number of bytes written.
- Return type:
int
tenso.quantize module¶
Quantized Tensor support for Tenso.
Provides QuantizedTensor for 4-bit and 8-bit quantized representations with per-tensor, per-channel, and per-group quantization schemes.
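As background for the schemes named above, a common per-tensor affine quantization to unsigned 8-bit and a two-codes-per-byte packing for 4-bit values can be sketched in plain Python. The formulas, the low-nibble-first packing order, and all function names are assumptions chosen for illustration; the exact rounding and layout used by QuantizedTensor are not specified here.

```python
def quantize_uint8(values):
    """Per-tensor affine quantization: one scale and zero_point for all values."""
    lo = min(min(values), 0.0)          # widen the range so 0.0 stays representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255 or 1.0      # avoid a zero scale for constant input
    zero_point = round(-lo / scale)
    codes = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Reconstruct a float approximation from the integer codes."""
    return [(c - zero_point) * scale for c in codes]


def pack_nibbles(codes):
    """Pack 4-bit codes two per byte, low nibble first (an assumed order)."""
    codes = list(codes)
    if len(codes) % 2:
        codes.append(0)                 # pad to an even number of codes
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_nibbles(packed, count):
    """Inverse of pack_nibbles; count drops the padding code if one was added."""
    out = []
    for byte in packed:
        out.append(byte & 0x0F)
        out.append(byte >> 4)
    return out[:count]


values = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale, zero_point = quantize_uint8(values)
approx = dequantize(codes, scale, zero_point)
packed = pack_nibbles([1, 15, 7])
```

Per-channel and per-group schemes apply the same arithmetic with a separate scale and zero point for each slice or group rather than one pair for the whole tensor.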
- class tenso.quantize.QuantizedTensor(data, scales, zero_points, shape, dtype_code, quant_scheme=0, group_size=0)[source]¶
Bases: object
A quantized tensor with scale/zero_point metadata.
- dequantize()[source]¶
Reconstruct a float32 approximation of the original tensor.
- Return type:
ndarray
- property dtype_name: str¶
- property is_4bit: bool¶
- property is_signed: bool¶
- property nbytes: int¶
- classmethod quantize(tensor, dtype, scheme='per_tensor', group_size=0, axis=0)[source]¶
Quantize a float tensor.
- Parameters:
tensor (ndarray) – Input tensor (will be converted to float32).
dtype (str) – Target quantized dtype name: “qint8”, “quint8”, “qint4”, “quint4”.
scheme (str) – Quantization scheme: “per_tensor”, “per_channel”, “per_group”.
group_size (int) – Group size for per_group scheme.
axis (int) – Channel axis for per_channel scheme.
- Return type:
QuantizedTensor
tenso.ragged module¶
String Tensor and Ragged Array support for Tenso.
Provides efficient serialization of variable-length string batches and ragged/jagged arrays without padding, suitable for NLP pipelines and dynamic batching in LLM inference.
- String Tensor Format:
Header (8 bytes) + offsets array (n+1 uint64) + packed UTF-8 data
- Ragged Array Format:
Header (8 bytes) + shape_0 (4 bytes) + offsets (shape_0+1 uint64) + flat values as dense Tenso packet
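The offsets-plus-packed-data layout described above can be illustrated for strings as follows. The sketch leaves out the 8-byte header, whose fields are not described here, and assumes little-endian uint64 offsets; pack_strings and unpack_strings are hypothetical helper names.

```python
import struct


def pack_strings(strings):
    """Pack strings as (n + 1) uint64 byte offsets followed by UTF-8 data."""
    encoded = [s.encode("utf-8") for s in strings]
    offsets = [0]
    for chunk in encoded:
        offsets.append(offsets[-1] + len(chunk))   # offsets count bytes, not characters
    offsets_blob = struct.pack(f"<{len(offsets)}Q", *offsets)
    return offsets, offsets_blob + b"".join(encoded)


def unpack_strings(blob, count):
    """Recover count strings from a blob produced by pack_strings."""
    offsets = struct.unpack_from(f"<{count + 1}Q", blob, 0)
    data = memoryview(blob)[8 * (count + 1):]
    return [
        bytes(data[offsets[i]:offsets[i + 1]]).decode("utf-8")
        for i in range(count)
    ]


offsets, blob = pack_strings(["hello", "wörld", "foo"])
restored = unpack_strings(blob, 3)
```

Storing n + 1 offsets lets the length of item i be computed as offsets[i + 1] - offsets[i] without a separate length array. A ragged array of numbers can follow the same idea, with row boundaries indexing into one flat value buffer.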
- class tenso.ragged.RaggedArray(arrays)[source]¶
Bases: object
A ragged (jagged) array: a sequence of variable-length 1-D arrays stored without padding.
Useful for dynamic batching in LLM inference where sequence lengths vary.
Example:
import numpy as np
from tenso.ragged import RaggedArray

ra = RaggedArray([
    np.array([1.0, 2.0, 3.0]),
    np.array([4.0, 5.0]),
    np.array([6.0]),
])
packet = ra.dumps()
restored = RaggedArray.loads(packet)
assert len(restored) == 3
assert list(restored[1]) == [4.0, 5.0]
- property dtype: numpy.dtype¶
- dumps(check_integrity=False)[source]¶
Serialize to a Tenso packet using bundle format internally.
- Return type:
memoryview
- property flat_values: numpy.ndarray¶
- classmethod loads(data)[source]¶
Deserialize from a Tenso packet or already-deserialized dict.
- Return type:
RaggedArray
- property row_splits: numpy.ndarray¶
- class tenso.ragged.StringTensor(strings)[source]¶
Bases: object
A batch of variable-length strings stored as packed UTF-8 with offsets.
This is a drop-in replacement for object-dtype numpy arrays of strings, but serializes compactly without padding.
Example:
from tenso.ragged import StringTensor

st = StringTensor(["hello", "world", "foo"])
packet = st.dumps()
restored = StringTensor.loads(packet)
assert restored[0] == "hello"
assert len(restored) == 3
- property nbytes: int¶
- property shape: tuple¶
tenso.ray module¶
Ray Integration for Tenso.
Registers Tenso as a custom serializer for Ray, replacing pickle-based serialization with zero-copy tensor transfer for numpy arrays and optionally PyTorch tensors.
Usage:
import numpy as np
import ray
from tenso.ray import register

ray.init()
register()  # Register Tenso as the serializer for numpy arrays

# All ray.put/get operations now use Tenso for numpy arrays
ref = ray.put(np.zeros((1000, 1000)))
arr = ray.get(ref)  # Deserialized via Tenso (46x less CPU than pickle)

# Works transparently with remote functions and actors
@ray.remote
def process(tensor):
    return tensor.mean()

ray.get(process.remote(np.random.randn(1000, 1000)))
- tenso.ray.register(include_torch=False, include_jax=False)[source]¶
Register Tenso as the custom serializer for tensor types in Ray.
After calling this, all ray.put(), ray.get(), remote function arguments, and actor method arguments involving registered types will be serialized using Tenso instead of pickle.
- Parameters:
include_torch (bool) – Also register serializers for torch.Tensor. Requires PyTorch.
include_jax (bool) – Also register serializers for JAX arrays. Requires JAX.
- Raises:
ImportError – If ray is not installed or if optional frameworks are not available.
- Return type:
None
Examples
>>> import numpy as np
>>> import ray
>>> from tenso.ray import register
>>> ray.init()
>>> register()
>>> ref = ray.put(np.zeros((100, 100)))
>>> arr = ray.get(ref)
- tenso.ray.unregister()[source]¶
Remove Tenso serializers from Ray, reverting to default pickle behavior.
This deregisters all types that were registered by register().
- Return type:
None
tenso.serve module¶
tenso.serve: Quick-start utilities for spinning up Tenso-optimized servers.
Provides helpers that wrap FastAPI/gRPC with sensible defaults for high-throughput tensor serving: correct content types, worker pool sizing based on CPU/GPU resources, and built-in Tenso request/response handling.
Example:
from tenso.serve import create_app, run

app = create_app()

@app.tenso_endpoint("/infer")
def infer(tensor):
    return tensor * 2.0

run(app, workers=4)
- class tenso.serve.TensoApp(title='Tenso Server', check_integrity=False, compress_response=False)[source]¶
Bases: object
A lightweight wrapper around FastAPI pre-configured for Tenso serving.
Automatically handles Tenso binary request parsing and response serialization for registered endpoints.
- property app¶
The underlying FastAPI application instance.
- tenso_endpoint(path, method='POST', check_integrity=None)[source]¶
Decorator to register a function as a Tenso-powered endpoint.
The decorated function receives a deserialized tensor/bundle and should return a numpy array or dict. The return value is automatically serialized as a TensoResponse.
- Parameters:
path (str) – The URL path for the endpoint.
method (str) – HTTP method (default: POST).
check_integrity (Optional[bool]) – Override per-endpoint integrity checking.
- Return type:
Callable
- tenso.serve.create_app(title='Tenso Server', check_integrity=False, health_check=True)[source]¶
Create a new TensoApp with sensible defaults.
- Parameters:
title (str) – Server title.
check_integrity (bool) – Enable XXH3 integrity checking on all endpoints.
health_check (bool) – Add a /health endpoint.
- Returns:
The configured application.
- Return type:
TensoApp
- tenso.serve.run(app, host='0.0.0.0', port=8000, workers=None, log_level='info')[source]¶
Run a TensoApp using uvicorn with optimized settings.
- Parameters:
app (TensoApp) – The TensoApp to serve.
host (str) – Bind address.
port (int) – Bind port.
workers (Optional[int]) – Number of worker processes. Auto-detected if None.
log_level (str) – Uvicorn log level.
- tenso.serve.run_grpc(servicer_class, port=50051, max_workers=None, max_message_length=268435456)[source]¶
Run a gRPC server with Tenso-optimized settings.
- Parameters:
servicer_class (Any) – A TensorInferenceServicer subclass instance.
port (int) – gRPC listen port.
max_workers (Optional[int]) – Thread pool size. Auto-detected if None.
max_message_length (int) – Maximum message size (default 256MB for large tensors).
tenso.shm module¶
Shared Memory Transport for Tenso.
This module provides high-performance Inter-Process Communication (IPC) capabilities using POSIX Shared Memory. It allows zero-copy transfer of tensors between local processes.
- class tenso.shm.TensoShm(name, create=False, size=0)[source]¶
Bases: object
A wrapper around SharedMemory for Tenso-protocol objects.
Example:
import numpy as np
from tenso.shm import TensoShm

# Writer
ary = np.random.rand(100, 100)
with TensoShm.create_from("my_tensor", ary) as shm:
    print("Wrote to SHM")
    input("Press enter to cleanup...")

# Reader (in another process, while the writer is still waiting)
with TensoShm("my_tensor") as shm:
    ary = shm.get()
    print(ary.shape)
- property buffer: memoryview¶
- classmethod create_from(name, obj, check_integrity=False, compress=False, alignment=64)[source]¶
Create a new SharedMemory segment sized to fit the object and write it.
Supports NumPy arrays, sparse matrices (COO/CSR/CSC), and dicts of arrays.
- Return type:
Self
- get()[source]¶
Deserialize the object currently in shared memory.
Returns a zero-copy view into the shared memory buffer. The view remains valid as long as the underlying SHM segment has not been unlinked.
- Returns:
The reconstructed object (zero-copy view).
- Return type:
Union[ndarray,dict,None]
- property name: str¶
- put(obj, check_integrity=False, compress=False, alignment=64)[source]¶
Serialize an object directly into the shared memory.
Supports NumPy arrays, sparse matrices, and dicts via the Rust fast-path. Falls back to Python serialization + copy if the Rust extension is unavailable.
- Returns:
Number of bytes written.
- Return type:
int
- property size: int¶
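The underlying mechanism, one side creating a named shared memory segment and another attaching to it by name, can be sketched with the standard library alone. This uses multiprocessing.shared_memory directly rather than TensoShm, keeps both sides in one process for brevity, and moves raw bytes instead of a Tenso packet.

```python
from multiprocessing import shared_memory

payload = b"tenso-demo-bytes"

# "Writer" side: create a segment large enough for the payload and fill it.
writer = shared_memory.SharedMemory(create=True, size=len(payload))
try:
    writer.buf[:len(payload)] = payload

    # "Reader" side: attach to the same segment by its name.
    reader = shared_memory.SharedMemory(name=writer.name)
    try:
        # The segment may be rounded up to a page size, so slice explicitly.
        view = reader.buf[:len(payload)]
        received = bytes(view)   # copy out so the view can be released
        view.release()           # outstanding views would block close()
    finally:
        reader.close()
finally:
    writer.close()
    writer.unlink()              # remove the segment once nobody needs it
```

Views into the buffer stay usable only while the segment is attached, which matches the validity note on get() above.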
tenso.tenso_rs module¶
- tenso.tenso_rs.dump_to_buffer_rs(array, buffer, check_integrity=False, compress=False, alignment=64)¶
- tenso.tenso_rs.dump_to_fd_rs(array, fd, check_integrity=False, compress=False, alignment=64)¶
- Return type:
int
- tenso.tenso_rs.dumps_rs(array, check_integrity=False, compress=False, alignment=64)¶
- Return type:
bytes
- tenso.tenso_rs.get_packet_info_rs(data)¶
- Return type:
tuple
- tenso.tenso_rs.loads_rs(data)¶
- Return type:
ndarray[tuple[Any,...],dtype[TypeVar(_ScalarT, bound=generic)]]
- tenso.tenso_rs.shm_mutex_destroy(buffer, offset)¶
Destroy a POSIX process-shared mutex.
- tenso.tenso_rs.shm_mutex_init(buffer, offset)¶
Initialize a POSIX process-shared mutex at a given offset in a buffer. The buffer must be backed by shared memory and have at least 64 bytes available at the given offset.
- tenso.tenso_rs.shm_mutex_lock(buffer, offset, timeout_secs=5.0)¶
Lock a POSIX process-shared mutex. Returns True if the lock was recovered from a dead owner (Linux robust mutex), False otherwise.
- tenso.tenso_rs.shm_mutex_size()¶
Return the size in bytes needed for one POSIX process-shared mutex.
- tenso.tenso_rs.shm_mutex_unlock(buffer, offset)¶
Unlock a POSIX process-shared mutex.
tenso.utils module¶
- tenso.utils.get_packet_info(data)[source]¶
Extract metadata from a Tenso packet without deserializing the full tensor.
This function parses the header of a Tenso packet to provide information about the tensor’s properties, such as dtype, shape, and flags.
- Parameters:
data (bytes) – The raw bytes of the Tenso packet.
- Returns:
A dictionary containing packet information with keys:
- ‘version’: Protocol version
- ‘dtype’: NumPy dtype of the tensor
- ‘shape’: Tuple representing tensor shape
- ‘ndim’: Number of dimensions
- ‘flags’: Raw flags byte
- ‘aligned’: Boolean indicating if packet uses alignment
- ‘integrity_protected’: Boolean indicating if integrity check is enabled
- ‘total_elements’: Total number of elements in the tensor
- ‘data_size_bytes’: Size of the tensor data in bytes
- Return type:
dict
- Raises:
ValueError – If the packet is too short or invalid.
Note: Uses the Rust implementation for performance if available, otherwise falls back to Python.
- tenso.utils.is_aligned(data, alignment=64)[source]¶
Check if the given bytes data is aligned to the specified boundary.
- Parameters:
data (bytes) – The bytes object to check alignment for.
alignment (int) – The alignment boundary in bytes. Default is 64.
- Returns:
True if the data is aligned, False otherwise.
- Return type:
bool
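For power-of-two boundaries such as the default of 64, alignment arithmetic reduces to bit masking. The sketch below works on integer offsets; padding_for and is_offset_aligned are hypothetical helpers, and what exactly is_aligned inspects on a bytes object is not specified here.

```python
def padding_for(offset, alignment=64):
    """Bytes to add so that offset lands on the next alignment boundary."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return (-offset) & (alignment - 1)


def is_offset_aligned(offset, alignment=64):
    """True if offset is a multiple of a power-of-two alignment."""
    return offset & (alignment - 1) == 0


pad = padding_for(10)                      # distance from offset 10 to the next boundary
aligned = is_offset_aligned(10 + pad)
```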
Module contents¶
Tenso: High-performance tensor serialization and streaming.
This package provides efficient serialization, deserialization, and streaming of numpy arrays (tensors), with optional support for asynchronous and GPU-accelerated workflows.
- Main API:
dumps, loads, dump, load: Core serialization/deserialization functions.
read_stream, write_stream: Stream-based I/O.
aread_stream: Async stream reader (if available).
read_to_device: GPU direct transfer (if available).
get_packet_info, is_aligned: Utilities for packet inspection and alignment.
- class tenso.QuantizedTensor(data, scales, zero_points, shape, dtype_code, quant_scheme=0, group_size=0)[source]¶
Bases: object
A quantized tensor with scale/zero_point metadata.
- dequantize()[source]¶
Reconstruct a float32 approximation of the original tensor.
- Return type:
ndarray
- property dtype_name: str¶
- property is_4bit: bool¶
- property is_signed: bool¶
- property nbytes: int¶
- classmethod quantize(tensor, dtype, scheme='per_tensor', group_size=0, axis=0)[source]¶
Quantize a float tensor.
- Parameters:
tensor (ndarray) – Input tensor (will be converted to float32).
dtype (str) – Target quantized dtype name: “qint8”, “quint8”, “qint4”, “quint4”.
scheme (str) – Quantization scheme: “per_tensor”, “per_channel”, “per_group”.
group_size (int) – Group size for per_group scheme.
axis (int) – Channel axis for per_channel scheme.
- Return type:
QuantizedTensor
- class tenso.RaggedArray(arrays)[source]¶
Bases: object
A ragged (jagged) array: a sequence of variable-length 1-D arrays stored without padding.
Useful for dynamic batching in LLM inference where sequence lengths vary.
Example:
import numpy as np
from tenso import RaggedArray

ra = RaggedArray([
    np.array([1.0, 2.0, 3.0]),
    np.array([4.0, 5.0]),
    np.array([6.0]),
])
packet = ra.dumps()
restored = RaggedArray.loads(packet)
assert len(restored) == 3
assert list(restored[1]) == [4.0, 5.0]
- property dtype: numpy.dtype¶
- dumps(check_integrity=False)[source]¶
Serialize to a Tenso packet using bundle format internally.
- Return type:
memoryview
- property flat_values: numpy.ndarray¶
- classmethod loads(data)[source]¶
Deserialize from a Tenso packet or already-deserialized dict.
- Return type:
RaggedArray
- property row_splits: numpy.ndarray¶
- class tenso.StringTensor(strings)[source]¶
Bases: object
A batch of variable-length strings stored as packed UTF-8 with offsets.
This is a drop-in replacement for object-dtype numpy arrays of strings, but serializes compactly without padding.
Example:
from tenso import StringTensor

st = StringTensor(["hello", "world", "foo"])
packet = st.dumps()
restored = StringTensor.loads(packet)
assert restored[0] == "hello"
assert len(restored) == 3
- property nbytes: int¶
- property shape: tuple¶
- class tenso.TensoCache(max_memory='256MB', name=None, create=True)[source]¶
Bases: object
In-process tensor cache backed by a single shared memory pool.
Supports mutable entries, zero-copy reads, LRU eviction, TTL, in-place updates, and metadata inspection without deserialization.
Example:
import numpy as np
from tenso import TensoCache

with TensoCache("64MB") as cache:
    cache.put("weights", np.random.randn(1000, 1000).astype(np.float32))
    arr = cache.get("weights")    # zero-copy view into SHM
    print(cache.info("weights"))  # metadata without deserialization
    print(cache.stats)            # hit/miss counts, memory usage
- delete(key)[source]¶
Delete an entry from the cache.
- Parameters:
key (str) – Cache key.
- Returns:
True if the key was found and deleted.
- Return type:
bool
- get(key, copy=False, device=None)[source]¶
Retrieve a tensor from the cache.
- Parameters:
key (str) – Cache key.
copy (bool) – If True, return a writeable copy. Default returns zero-copy view.
device (str) – Target device for the result. Format: “framework” or “framework:device_spec” (e.g. “torch”, “torch:cuda:0”, “jax”, “cupy:0”). When set, the result is converted from numpy. Implies copy (SHM buffer cannot be shared with frameworks).
- Returns:
The tensor, or None if not found or expired.
- Return type:
Optional[np.ndarray or framework tensor]
- info(key)[source]¶
Get metadata about a cached entry without deserializing.
- Parameters:
key (str) – Cache key.
- Returns:
Dictionary with ‘shape’, ‘dtype’, ‘ndim’, ‘size_bytes’, ‘ttl’, ‘age’, or None if key not found.
- Return type:
Optional[dict]
- property name: str¶
- put(key, tensor, ttl=None, quantize=None)[source]¶
Store a tensor in the cache.
- Parameters:
key (str) – Cache key (max 128 bytes UTF-8).
tensor (np.ndarray, QuantizedTensor, or framework tensor) – Tensor to store. PyTorch, JAX, and CuPy tensors are automatically converted to numpy before caching.
ttl (float) – Time-to-live in seconds. None means no expiry.
quantize (str) – Quantization dtype (‘qint8’, ‘quint8’, ‘qint4’, ‘quint4’).
- Returns:
Number of bytes written.
- Return type:
int
- Raises:
ValueError – If key exceeds 128 bytes.
MemoryError – If pool is exhausted after eviction attempts.
- property stats: dict¶
Cache statistics.
- Returns:
Keys: entries, max_entries, pool_size, used_bytes, free_bytes, data_region_size, hits, misses, hit_rate
- Return type:
dict
- class tenso.TensoShm(name, create=False, size=0)[source]¶
Bases: object
A wrapper around SharedMemory for Tenso-protocol objects.
Example:
import numpy as np
from tenso import TensoShm

# Writer
ary = np.random.rand(100, 100)
with TensoShm.create_from("my_tensor", ary) as shm:
    print("Wrote to SHM")
    input("Press enter to cleanup...")

# Reader (in another process, while the writer is still waiting)
with TensoShm("my_tensor") as shm:
    ary = shm.get()
    print(ary.shape)
- property buffer: memoryview¶
- classmethod create_from(name, obj, check_integrity=False, compress=False, alignment=64)[source]¶
Create a new SharedMemory segment sized to fit the object and write it.
Supports NumPy arrays, sparse matrices (COO/CSR/CSC), and dicts of arrays.
- Return type:
Self
- get()[source]¶
Deserialize the object currently in shared memory.
Returns a zero-copy view into the shared memory buffer. The view remains valid as long as the underlying SHM segment has not been unlinked.
- Returns:
The reconstructed object (zero-copy view).
- Return type:
Union[ndarray,dict,None]
- property name: str¶
- put(obj, check_integrity=False, compress=False, alignment=64)[source]¶
Serialize an object directly into the shared memory.
Supports NumPy arrays, sparse matrices, and dicts via the Rust fast-path. Falls back to Python serialization + copy if the Rust extension is unavailable.
- Returns:
Number of bytes written.
- Return type:
int
- property size: int¶
- async tenso.aread_stream(reader)[source]¶
Asynchronously read a Tenso packet from a StreamReader.
- Parameters:
reader (StreamReader) – The stream reader source.
- Returns:
The deserialized array.
- Return type:
Optional[ndarray]
- tenso.dump(tensor, fp, strict=False, check_integrity=False)[source]¶
Serialize a tensor and write it to an open binary file.
Optimized for large arrays by writing the complete packet in a single system call instead of multiple small writes.
- Parameters:
tensor (ndarray) – The array to serialize.
fp (BinaryIO) – Open binary file object.
strict (bool) – If True, raises error for non-contiguous arrays.
check_integrity (bool) – If True, includes XXH3 hash for verification.
- Return type:
None
- tenso.dumps(tensor, strict=False, check_integrity=False, compress=False, alignment=64)[source]¶
Serialize an object (Array, Sparse Matrix, or Dict) to a Tenso packet.
- Parameters:
tensor (Any) – The object to serialize.
strict (bool) – If True, raises an error for non-contiguous arrays.
check_integrity (bool) – If True, includes an XXH3 hash for verification.
compress (bool) – If True, uses LZ4 compression on the data body.
alignment (int) – Memory alignment boundary (must be a power of 2).
- Returns:
A view of the complete Tenso packet bytes.
- Return type:
memoryview
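The power-of-two requirement on alignment makes padding cheap to compute with bit masks. The sketch below shows the general arithmetic only. The helper names `is_power_of_two` and `padding_for` are hypothetical, and nothing here is taken from Tenso's actual packet layout.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ...; False for zero, negatives, and everything else."""
    return n > 0 and (n & (n - 1)) == 0


def padding_for(offset: int, alignment: int = 64) -> int:
    """Bytes to add after `offset` so the next byte starts on an aligned boundary."""
    if not is_power_of_two(alignment):
        raise ValueError("alignment must be a power of 2")
    # For a power-of-two alignment, masking is equivalent to (-offset) % alignment.
    return (-offset) & (alignment - 1)
```

For example, a 70-byte header followed by `padding_for(70)` bytes of padding would place the body at offset 128.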
- tenso.get_packet_info(data)[source]¶
Extract metadata from a Tenso packet without deserializing the full tensor.
This function parses the header of a Tenso packet to provide information about the tensor’s properties, such as dtype, shape, and flags.
- Parameters:
data (bytes) – The raw bytes of the Tenso packet.
- Returns:
A dictionary containing packet information with keys:
  - 'version': Protocol version
  - 'dtype': NumPy dtype of the tensor
  - 'shape': Tuple representing tensor shape
  - 'ndim': Number of dimensions
  - 'flags': Raw flags byte
  - 'aligned': Boolean indicating if packet uses alignment
  - 'integrity_protected': Boolean indicating if integrity check is enabled
  - 'total_elements': Total number of elements in the tensor
  - 'data_size_bytes': Size of the tensor data in bytes
- Return type:
dict
- Raises:
ValueError – If the packet is too short or invalid.
Uses the Rust implementation for performance if available; otherwise falls back to Python.
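The derived keys follow from the shape and element size alone, which is why they can be reported without touching the data body. The sketch below assumes the usual definitions (element count is the product of the dimensions, and data size is count times item size). The helper `derived_sizes` is hypothetical and is not part of Tenso.

```python
import math


def derived_sizes(shape, itemsize):
    """Compute ndim, element count, and byte size from header-level facts."""
    total = math.prod(shape)  # the product of an empty shape is 1 (a scalar)
    return {
        "ndim": len(shape),
        "total_elements": total,
        "data_size_bytes": total * itemsize,
    }
```

Under these assumptions, a 100 × 100 array of 8-byte floats has 10,000 elements and 80,000 bytes of data.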
- tenso.is_aligned(data, alignment=64)[source]¶
Check if the given bytes data is aligned to the specified boundary.
- Parameters:
data (bytes) – The bytes object to check alignment for.
alignment (int) – The alignment boundary in bytes. Default is 64.
- Returns:
True if the data is aligned, False otherwise.
- Return type:
bool
- tenso.iter_dumps(tensor, strict=False, check_integrity=False)[source]¶
Vectored serialization: Yields packet parts to avoid memory copies.
- Parameters:
tensor (ndarray) – The array to serialize.
strict (bool) – If True, raises ValueError for non-contiguous arrays.
check_integrity (bool) – If True, includes an XXH3 checksum footer.
- Yields:
Union[bytes, memoryview] – Sequential chunks of the Tenso packet.
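The idea behind vectored serialization can be sketched without Tenso: emit a small header as bytes, then hand out the payload as a memoryview so it is never concatenated into a new buffer. The 8-byte length header and the names `iter_chunks` and `write_all` below are hypothetical and do not reflect Tenso's wire format.

```python
import io
import struct
from array import array


def iter_chunks(values):
    """Yield a small header, then the payload as a zero-copy byte view."""
    body = memoryview(values).cast("B")  # reinterprets the array's own buffer
    yield struct.pack("<Q", body.nbytes)  # hypothetical 8-byte length header
    yield body


def write_all(values, sink):
    """Write each chunk in turn and return the total number of bytes written."""
    total = 0
    for chunk in iter_chunks(values):
        total += sink.write(chunk)
    return total


if __name__ == "__main__":
    data = array("d", [1.0, 2.0, 3.0])
    print(write_all(data, io.BytesIO()))
```

The consumer decides how to send the parts, for example one write per chunk as shown here or a single scatter-gather call where the platform offers one.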
- tenso.load(fp, mmap_mode=False, copy=False)[source]¶
Deserialize an object from an open binary file.
- Parameters:
fp (BinaryIO) – Open binary file object.
mmap_mode (bool) – Use memory mapping for large files.
copy (bool) – Return a writeable copy.
- Returns:
The reconstructed object.
- Return type:
Any
- tenso.loads(data, copy=False)[source]¶
Deserialize a Tenso packet into its original Python object.
- Parameters:
data (Union[bytes,bytearray,memoryview,ndarray,mmap]) – The raw Tenso packet data.
copy (bool) – If True, returns a writeable copy. Otherwise returns a read-only view.
- Returns:
The reconstructed NumPy array, Dictionary, or Sparse Matrix.
- Return type:
Any
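The trade-off behind the copy flag can be shown with plain buffers. The sketch below is an analogy and is not Tenso's implementation: the hypothetical helper `view_or_copy` returns either a read-only memoryview that aliases the source or an independent, writeable bytearray.

```python
def view_or_copy(buffer, copy=False):
    """Return a read-only alias of `buffer`, or a private writeable copy."""
    if copy:
        return bytearray(buffer)  # independent storage, safe to mutate
    return memoryview(buffer).toreadonly()  # shares storage, rejects writes


if __name__ == "__main__":
    source = bytearray(b"abc")
    view = view_or_copy(source)
    source[0] = ord("z")
    print(bytes(view))  # the view tracks the change made to the source
```

A view costs nothing to create but is only valid while the source buffer is alive and unchanged. A copy costs memory and time but can be modified freely.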
- tenso.read_stream(source)[source]¶
Read and deserialize an object from a stream source with DoS protection.
This function supports streaming deserialization for dense NumPy arrays, multi-tensor bundles (dictionaries), and sparse matrices (COO, CSR, CSC). It avoids loading the entire packet into memory before parsing, making it suitable for large-scale data ingestion.
- Parameters:
source (Any) – Stream source to read from (must support .read() or .recv()).
- Returns:
The deserialized NumPy array, sparse matrix, or dictionary. Returns None if the stream ended before any data was read.
- Return type:
Optional[Any]
- Raises:
ValueError – If the packet is invalid or exceeds security limits.
EOFError – If the stream ends prematurely during reading.
ImportError – If scipy is missing during sparse matrix deserialization.
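The documented contract (None on a clean end of stream, EOFError on truncation, ValueError when a size limit is exceeded) can be sketched with a simple length-prefixed frame. Everything below is illustrative: the 8-byte header, the `MAX_BODY_BYTES` limit, and the helpers `_read_exact` and `read_frame` are hypothetical and are not Tenso's protocol or internals.

```python
import io
import struct

MAX_BODY_BYTES = 1 << 20  # hypothetical cap on the declared body size


def _read_exact(source, n):
    """Read up to n bytes via .read() or .recv(); stops early only at end of stream."""
    pull = getattr(source, "read", None) or getattr(source, "recv")
    buf = bytearray()
    while len(buf) < n:
        chunk = pull(n - len(buf))
        if not chunk:  # empty result signals end of stream
            break
        buf += chunk
    return bytes(buf)


def read_frame(source):
    """Return the frame body, or None if the stream ended before any data arrived."""
    header = _read_exact(source, 8)
    if not header:
        return None
    if len(header) < 8:
        raise EOFError("stream ended inside the header")
    (size,) = struct.unpack("<Q", header)
    # Reject oversized claims before allocating or reading anything further.
    if size > MAX_BODY_BYTES:
        raise ValueError("declared size exceeds the configured limit")
    body = _read_exact(source, size)
    if len(body) < size:
        raise EOFError("stream ended inside the body")
    return body


if __name__ == "__main__":
    print(read_frame(io.BytesIO(struct.pack("<Q", 3) + b"abc")))
```

Checking the declared size before reading the body is what keeps a malicious or corrupt header from forcing a large allocation.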
- tenso.read_to_device(source, device_id=0)[source]¶
Read a Tenso packet from a stream directly into GPU memory.
- Parameters:
source (Any) – Stream-like object (file, socket).
device_id (int) – The target GPU device ID.
- Returns:
The GPU tensor.
- Return type:
Any
- Raises:
ValueError – If packet is invalid or integrity check fails.
EOFError – If stream ends prematurely.
- tenso.write_stream(tensor, dest, strict=False, check_integrity=False)[source]¶
Write a tensor to a destination using memory-efficient streaming. Supports both file-like objects (.write) and sockets (.sendall).
- Parameters:
tensor (ndarray) – The array to serialize.
dest (Any) – Destination supporting .write() or .sendall().
strict (bool) – Strict contiguous check.
check_integrity (bool) – Include integrity hash.
- Returns:
The total number of bytes written.
- Return type:
int
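Supporting both file-like objects and sockets usually comes down to choosing the right method once and then streaming chunks through it. The sketch below shows that dispatch in isolation. The helper `write_chunks` is hypothetical and says nothing about how Tenso implements it.

```python
import io


def write_chunks(chunks, dest):
    """Send each chunk via .sendall() if present, else .write(); return total bytes."""
    send = getattr(dest, "sendall", None) or getattr(dest, "write")
    total = 0
    for chunk in chunks:
        send(chunk)
        total += memoryview(chunk).nbytes  # counts bytes for bytes and memoryview alike
    return total


if __name__ == "__main__":
    sink = io.BytesIO()
    print(write_chunks([b"ab", memoryview(b"cde")], sink))
```

This assumes a buffered destination whose write() accepts the whole chunk. A raw, unbuffered stream may write only part of it and would need a retry loop.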