add read me

2026-01-09 10:28:44 +11:00
commit edaf914b73
13417 changed files with 2952119 additions and 0 deletions


@@ -0,0 +1,167 @@
Metadata-Version: 2.4
Name: ctranslate2
Version: 4.6.2
Summary: Fast inference engine for Transformer models
Home-page: https://opennmt.net
Author: OpenNMT
Project-URL: Documentation, https://opennmt.net/CTranslate2
Project-URL: Forum, https://forum.opennmt.net
Project-URL: Gitter, https://gitter.im/OpenNMT/CTranslate2
Project-URL: Source, https://github.com/OpenNMT/CTranslate2
Keywords: opennmt nmt neural machine translation cuda mkl inference quantization
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: GPU :: NVIDIA CUDA :: 12 :: 12.4
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: setuptools
Requires-Dist: numpy
Requires-Dist: pyyaml<7,>=5.3
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: project-url
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary
[![CI](https://github.com/OpenNMT/CTranslate2/workflows/CI/badge.svg)](https://github.com/OpenNMT/CTranslate2/actions?query=workflow%3ACI) [![PyPI version](https://badge.fury.io/py/ctranslate2.svg)](https://badge.fury.io/py/ctranslate2) [![Documentation](https://img.shields.io/badge/docs-latest-blue.svg)](https://opennmt.net/CTranslate2/) [![Gitter](https://badges.gitter.im/OpenNMT/CTranslate2.svg)](https://gitter.im/OpenNMT/CTranslate2?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge) [![Forum](https://img.shields.io/discourse/status?server=https%3A%2F%2Fforum.opennmt.net%2F)](https://forum.opennmt.net/)
# CTranslate2
CTranslate2 is a C++ and Python library for efficient inference with Transformer models.
The project implements a custom runtime that applies many performance optimization techniques, such as weight quantization, layer fusion, and batch reordering, to [accelerate and reduce the memory usage](#benchmarks) of Transformer models on CPU and GPU.
The following model types are currently supported:
* Encoder-decoder models: Transformer base/big, M2M-100, NLLB, BART, mBART, Pegasus, T5, Whisper
* Decoder-only models: GPT-2, GPT-J, GPT-NeoX, OPT, BLOOM, MPT, Llama, Mistral, Gemma, CodeGen, GPTBigCode, Falcon, Qwen2
* Encoder-only models: BERT, DistilBERT, XLM-RoBERTa
Compatible models must first be converted into an optimized model format. The library includes converters for multiple frameworks:
* [OpenNMT-py](https://opennmt.net/CTranslate2/guides/opennmt_py.html)
* [OpenNMT-tf](https://opennmt.net/CTranslate2/guides/opennmt_tf.html)
* [Fairseq](https://opennmt.net/CTranslate2/guides/fairseq.html)
* [Marian](https://opennmt.net/CTranslate2/guides/marian.html)
* [OPUS-MT](https://opennmt.net/CTranslate2/guides/opus_mt.html)
* [Transformers](https://opennmt.net/CTranslate2/guides/transformers.html)
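For example, a supported Transformers model can be converted from the command line with the bundled converter script (the model name, output directory, and optional `--quantization` value below are example values, not requirements):

```bash
# Download a Hugging Face Transformers model and convert it to the
# CTranslate2 format, optionally quantizing the weights to int8.
ct2-transformers-converter --model Helsinki-NLP/opus-mt-en-de \
    --output_dir opus-mt-en-de-ct2 --quantization int8
```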
The project is production-oriented and comes with [backward compatibility guarantees](https://opennmt.net/CTranslate2/versioning.html), but it also includes experimental features related to model compression and inference acceleration.
## Key features
* **Fast and efficient execution on CPU and GPU**<br/>The execution [is significantly faster and requires less resources](#benchmarks) than general-purpose deep learning frameworks on supported models and tasks thanks to many advanced optimizations: layer fusion, padding removal, batch reordering, in-place operations, caching mechanism, etc.
* **Quantization and reduced precision**<br/>The model serialization and computation support weights with [reduced precision](https://opennmt.net/CTranslate2/quantization.html): 16-bit floating points (FP16), 16-bit brain floating points (BF16), 16-bit integers (INT16), 8-bit integers (INT8) and AWQ quantization (INT4).
* **Multiple CPU architectures support**<br/>The project supports x86-64 and AArch64/ARM64 processors and integrates multiple backends that are optimized for these platforms: [Intel MKL](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onemkl.html), [oneDNN](https://github.com/oneapi-src/oneDNN), [OpenBLAS](https://www.openblas.net/), [Ruy](https://github.com/google/ruy), and [Apple Accelerate](https://developer.apple.com/documentation/accelerate).
* **Automatic CPU detection and code dispatch**<br/>One binary can include multiple backends (e.g. Intel MKL and oneDNN) and instruction set architectures (e.g. AVX, AVX2) that are automatically selected at runtime based on the CPU information.
* **Parallel and asynchronous execution**<br/>Multiple batches can be processed in parallel and asynchronously using multiple GPUs or CPU cores.
* **Dynamic memory usage**<br/>The memory usage changes dynamically depending on the request size while still meeting performance requirements thanks to caching allocators on both CPU and GPU.
* **Lightweight on disk**<br/>Quantization can make the models 4 times smaller on disk with minimal accuracy loss.
* **Simple integration**<br/>The project has few dependencies and exposes simple APIs in [Python](https://opennmt.net/CTranslate2/python/overview.html) and C++ to cover most integration needs.
* **Configurable and interactive decoding**<br/>[Advanced decoding features](https://opennmt.net/CTranslate2/decoding.html) allow autocompleting a partial sequence and returning alternatives at a specific location in the sequence.
* **Tensor parallelism for distributed inference**<br/>Very large models can be split across multiple GPUs. Follow this [documentation](docs/parallel.md#model-and-tensor-parallelism) to set up the required environment.
Some of these features are difficult to achieve with standard deep learning frameworks and are the motivation for this project.
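The reduced-precision idea behind the INT8 mode above can be illustrated with a minimal pure-Python sketch of symmetric quantization. This is only an illustration of the general technique, not CTranslate2's actual kernels (which use more sophisticated schemes, e.g. per-row scales):

```python
# A minimal sketch of symmetric int8 quantization: map float weights
# into [-127, 127] with a single per-tensor scale, then dequantize to
# approximate the originals.
def quantize_int8(weights):
    """Return the quantized integers and the per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.8, -1.27, 0.031, 0.5]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
# Each int8 value takes 1 byte instead of 4 for float32 (hence models
# roughly 4 times smaller on disk), at the cost of a small rounding error
# bounded by the scale.
assert all(abs(w - r) <= scale for w, r in zip(weights, restored))
```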
## Installation and usage
CTranslate2 can be installed with pip:
```bash
pip install ctranslate2
```
The Python module is used to convert models and can translate or generate text with a few lines of code:
```python
import ctranslate2

# Translate a batch of pre-tokenized source sentences.
translator = ctranslate2.Translator(translation_model_path)
translator.translate_batch(tokens)

# Generate text from a batch of prompt tokens.
generator = ctranslate2.Generator(generation_model_path)
generator.generate_batch(start_tokens)
```
See the [documentation](https://opennmt.net/CTranslate2) for more information and examples.
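A fuller end-to-end sketch, assuming a converted En→De model directory and its SentencePiece tokenizer (the paths below are placeholders, and the `sentencepiece` package is a separate install):

```python
import ctranslate2
import sentencepiece as spm

# Placeholder paths: a SentencePiece model file and a directory produced
# by one of the converters above.
sp = spm.SentencePieceProcessor(model_file="sentencepiece.model")
translator = ctranslate2.Translator("ende_ctranslate2/", device="cpu")

# Tokenize, translate, then detokenize the best hypothesis.
tokens = sp.encode("Hello world!", out_type=str)
results = translator.translate_batch([tokens])
print(sp.decode(results[0].hypotheses[0]))
```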
## Benchmarks
We translate the En->De test set *newstest2014* with multiple models:
* [OpenNMT-tf WMT14](https://opennmt.net/Models-tf/#translation): a base Transformer trained with OpenNMT-tf on the WMT14 dataset (4.5M lines)
* [OpenNMT-py WMT14](https://opennmt.net/Models-py/#translation): a base Transformer trained with OpenNMT-py on the WMT14 dataset (4.5M lines)
* [OPUS-MT](https://github.com/Helsinki-NLP/OPUS-MT-train/tree/master/models/en-de#opus-2020-02-26zip): a base Transformer trained with Marian on all OPUS data available on 2020-02-26 (81.9M lines)
The benchmark reports the number of target tokens generated per second (higher is better). The results are aggregated over multiple runs. See the [benchmark scripts](tools/benchmark) for more details and to reproduce these numbers.
**Please note that the results presented below are only valid for the configuration used during this benchmark: absolute and relative performance may change with different settings.**
#### CPU
| | Tokens per second | Max. memory | BLEU |
| --- | --- | --- | --- |
| **OpenNMT-tf WMT14 model** | | | |
| OpenNMT-tf 2.31.0 (with TensorFlow 2.11.0) | 209.2 | 2653MB | 26.93 |
| **OpenNMT-py WMT14 model** | | | |
| OpenNMT-py 3.0.4 (with PyTorch 1.13.1) | 275.8 | 2012MB | 26.77 |
| - int8 | 323.3 | 1359MB | 26.72 |
| CTranslate2 3.6.0 | 658.8 | 849MB | 26.77 |
| - int16 | 733.0 | 672MB | 26.82 |
| - int8 | 860.2 | 529MB | 26.78 |
| - int8 + vmap | 1126.2 | 598MB | 26.64 |
| **OPUS-MT model** | | | |
| Transformers 4.26.1 (with PyTorch 1.13.1) | 147.3 | 2332MB | 27.90 |
| Marian 1.11.0 | 344.5 | 7605MB | 27.93 |
| - int16 | 330.2 | 5901MB | 27.65 |
| - int8 | 355.8 | 4763MB | 27.27 |
| CTranslate2 3.6.0 | 525.0 | 721MB | 27.92 |
| - int16 | 596.1 | 660MB | 27.53 |
| - int8 | 696.1 | 516MB | 27.65 |
Executed with 4 threads on a [*c5.2xlarge*](https://aws.amazon.com/ec2/instance-types/c5/) Amazon EC2 instance equipped with an Intel(R) Xeon(R) Platinum 8275CL CPU.
#### GPU
| | Tokens per second | Max. GPU memory | Max. CPU memory | BLEU |
| --- | --- | --- | --- | --- |
| **OpenNMT-tf WMT14 model** | | | | |
| OpenNMT-tf 2.31.0 (with TensorFlow 2.11.0) | 1483.5 | 3031MB | 3122MB | 26.94 |
| **OpenNMT-py WMT14 model** | | | | |
| OpenNMT-py 3.0.4 (with PyTorch 1.13.1) | 1795.2 | 2973MB | 3099MB | 26.77 |
| FasterTransformer 5.3 | 6979.0 | 2402MB | 1131MB | 26.77 |
| - float16 | 8592.5 | 1360MB | 1135MB | 26.80 |
| CTranslate2 3.6.0 | 6634.7 | 1261MB | 953MB | 26.77 |
| - int8 | 8567.2 | 1005MB | 807MB | 26.85 |
| - float16 | 10990.7 | 941MB | 807MB | 26.77 |
| - int8 + float16 | 8725.4 | 813MB | 800MB | 26.83 |
| **OPUS-MT model** | | | | |
| Transformers 4.26.1 (with PyTorch 1.13.1) | 1022.9 | 4097MB | 2109MB | 27.90 |
| Marian 1.11.0 | 3241.0 | 3381MB | 2156MB | 27.92 |
| - float16 | 3962.4 | 3239MB | 1976MB | 27.94 |
| CTranslate2 3.6.0 | 5876.4 | 1197MB | 754MB | 27.92 |
| - int8 | 7521.9 | 1005MB | 792MB | 27.79 |
| - float16 | 9296.7 | 909MB | 814MB | 27.90 |
| - int8 + float16 | 8362.7 | 813MB | 766MB | 27.90 |
Executed with CUDA 11 on a [*g5.xlarge*](https://aws.amazon.com/ec2/instance-types/g5/) Amazon EC2 instance equipped with an NVIDIA A10G GPU (driver version: 510.47.03).
## Additional resources
* [Documentation](https://opennmt.net/CTranslate2)
* [Forum](https://forum.opennmt.net)
* [Gitter](https://gitter.im/OpenNMT/CTranslate2)


@@ -0,0 +1,65 @@
../../../bin/ct2-fairseq-converter,sha256=NXI5i8K6n5EFy_7489HAU9B2LwAL8QSsl6MSX9o0fNQ,252
../../../bin/ct2-marian-converter,sha256=B2u0P4fItkVCzPulzSX5PYel7In6wYH_95F9Fo9DN_Q,251
../../../bin/ct2-openai-gpt2-converter,sha256=wAEd4Spd_NjEhYOQmnOpY8KfHv2mwXJK6hRyQu09ySk,256
../../../bin/ct2-opennmt-py-converter,sha256=YGKAF2fJPlKaUx0fDmdjnygUPnQ6aSQEcdfwGQxTk0g,255
../../../bin/ct2-opennmt-tf-converter,sha256=yW7HweYs9VSD_wc82TmBtFE_tUEMXKm-IAbikYMX43k,255
../../../bin/ct2-opus-mt-converter,sha256=1XSeVOwuisT8Ka0jOdXnDk-3M687cCWsb9MCv5W2gVY,252
../../../bin/ct2-transformers-converter,sha256=0oq6JpuyR0vGpeVY0glkwg_cxDqhHT_rsFWJStdVmoM,257
ctranslate2-4.6.2.dist-info/INSTALLER,sha256=zuuue4knoyJ-UwPPXg8fezS7VCrXJQrAP7zeNuwvFQg,4
ctranslate2-4.6.2.dist-info/METADATA,sha256=lgGm2CuQLH_CZL2nM6oN3twjmLJbPc8hxndwlKru_Bk,10187
ctranslate2-4.6.2.dist-info/RECORD,,
ctranslate2-4.6.2.dist-info/WHEEL,sha256=aSgG0F4rGPZtV0iTEIfy6dtHq6g67Lze3uLfk0vWn88,151
ctranslate2-4.6.2.dist-info/entry_points.txt,sha256=ZHkojut_TmVRHl0bJIGm2b9wqr98GAJqxN9rlJtQshs,466
ctranslate2-4.6.2.dist-info/top_level.txt,sha256=1hUaWzcFIuSo2BAIUHFA3Osgsu6S1giq0y6Rosv8HOQ,12
ctranslate2.libs/libctranslate2-e54a6950.so.4.6.2,sha256=mctSDY89UdadpZgmToBt-AhdB9WZ3ambn7DYFjrHxp0,69379977
ctranslate2.libs/libcudnn-74a4c495.so.9.1.0,sha256=jYZAt-vsR5nwmKbNxJuY4PXLKKIptD3vQ-QpdXwfOXI,126449
ctranslate2.libs/libgomp-a34b3233.so.1.0.0,sha256=On6uznIxkRvi-7Gz58tMtcLg-E4MK7c3OUcrWh_uyME,168193
ctranslate2/__init__.py,sha256=oLRlZk-gl_mm2q_KyFObjc2ps4AYd9ztGPSxttfujis,1671
ctranslate2/__pycache__/__init__.cpython-312.pyc,,
ctranslate2/__pycache__/extensions.cpython-312.pyc,,
ctranslate2/__pycache__/logging.cpython-312.pyc,,
ctranslate2/__pycache__/version.cpython-312.pyc,,
ctranslate2/_ext.cpython-312-x86_64-linux-gnu.so,sha256=_FMnkvjicdEnHJsDsXteAF9JdFx0Ib-9vWbHC4HwfM4,65036649
ctranslate2/converters/__init__.py,sha256=T_yJMns_XXV6pJy3SWpTtgrvbT0owRnxcMn_7i5qop4,499
ctranslate2/converters/__pycache__/__init__.cpython-312.pyc,,
ctranslate2/converters/__pycache__/converter.cpython-312.pyc,,
ctranslate2/converters/__pycache__/eole_ct2.cpython-312.pyc,,
ctranslate2/converters/__pycache__/fairseq.cpython-312.pyc,,
ctranslate2/converters/__pycache__/marian.cpython-312.pyc,,
ctranslate2/converters/__pycache__/openai_gpt2.cpython-312.pyc,,
ctranslate2/converters/__pycache__/opennmt_py.cpython-312.pyc,,
ctranslate2/converters/__pycache__/opennmt_tf.cpython-312.pyc,,
ctranslate2/converters/__pycache__/opus_mt.cpython-312.pyc,,
ctranslate2/converters/__pycache__/transformers.cpython-312.pyc,,
ctranslate2/converters/__pycache__/utils.cpython-312.pyc,,
ctranslate2/converters/converter.py,sha256=xE3CYPDvNzUneBV_NuPJ59SxeNKRUcyBnS_MsxOD3IM,3492
ctranslate2/converters/eole_ct2.py,sha256=9Odl1bYU2wNYl3ND679BMCB8Q3cbHa3FanrKmZ1alpo,12212
ctranslate2/converters/fairseq.py,sha256=tSIi2Yg_U6wwFcMq6g_hBqopYor2AnHpTScA1yex2Bs,12420
ctranslate2/converters/marian.py,sha256=i5piS_4zSM9Fmp12FwIHuot0pM-VPhq2MNxut-4Ua80,10959
ctranslate2/converters/openai_gpt2.py,sha256=Rmvxy6Uqa5f9YW9RQJfF4XCeNbXh-Aw61p5N_E139hM,3209
ctranslate2/converters/opennmt_py.py,sha256=OkajME-2qKad1dpohU4oYYIfmjQ66aZYxDIuC8ogPMs,12824
ctranslate2/converters/opennmt_tf.py,sha256=gie0t84R9vUXH8Q1y-vu6hJBU8ljiOPWdT8xB9LXgxM,15767
ctranslate2/converters/opus_mt.py,sha256=XCAb3X4afRCtaZErb7bvIvJBmfVgC9NWa_IF5tOBSmQ,1210
ctranslate2/converters/transformers.py,sha256=LmpypFuWjAcGrdKZbLja1jDBjSi9oZJ3f3D-uQRl6T0,122783
ctranslate2/converters/utils.py,sha256=u51jg3U-zQRMMhMOp-KKHsHWm6tQQ4TArlXiRvLpEyQ,3690
ctranslate2/extensions.py,sha256=d6Dzj17649mNAmJaxGK7iQL57tcJsatdjKCIkYkcfps,21197
ctranslate2/logging.py,sha256=xmx2LlryOhAPbMmJwMAr-JuXZZogR9j62ONNZZ2fFOQ,1176
ctranslate2/models/__init__.py,sha256=ssMbhmQ4v0C5AwkYopmYN_U9rtTFJvNsdlecAcBmHek,479
ctranslate2/models/__pycache__/__init__.cpython-312.pyc,,
ctranslate2/specs/__init__.py,sha256=XE_GwmsYNpaomZwaYJgbJxsXgdKuvGN73yaf3eBCmgs,635
ctranslate2/specs/__pycache__/__init__.cpython-312.pyc,,
ctranslate2/specs/__pycache__/attention_spec.cpython-312.pyc,,
ctranslate2/specs/__pycache__/common_spec.cpython-312.pyc,,
ctranslate2/specs/__pycache__/model_spec.cpython-312.pyc,,
ctranslate2/specs/__pycache__/transformer_spec.cpython-312.pyc,,
ctranslate2/specs/__pycache__/wav2vec2_spec.cpython-312.pyc,,
ctranslate2/specs/__pycache__/wav2vec2bert_spec.cpython-312.pyc,,
ctranslate2/specs/__pycache__/whisper_spec.cpython-312.pyc,,
ctranslate2/specs/attention_spec.py,sha256=P0Xr5ngG9sgzMxZCZVEbZW3tFnxHhdzI73Uy6-S8LJQ,3346
ctranslate2/specs/common_spec.py,sha256=7UuQ3YsO-fLAhwFuWl41VbNLQLzNlYTfjrZSgF_4gDk,1542
ctranslate2/specs/model_spec.py,sha256=Vl3Edm8-Oej3i6mUVE6Rda46e553GSi2A_U5UomnU8s,24969
ctranslate2/specs/transformer_spec.py,sha256=7iFAqZUS0BkSVKVnW8A4fwY2QkJyNUvcjInmsk9aEX0,29646
ctranslate2/specs/wav2vec2_spec.py,sha256=D6D-N_fbsvOEVe802jS03pd6pt5bOi71yg_1Mfd_hjc,2034
ctranslate2/specs/wav2vec2bert_spec.py,sha256=NKTC8nNdxMeTBW0Y_EKUNrDpKhPwL67P1l3dwmT-ayk,3432
ctranslate2/specs/whisper_spec.py,sha256=vxxrAsHYGItYC2K9Yw2geFEUbXmeGtcqFYXDMqca0lA,2370
ctranslate2/version.py,sha256=1bGLETTvfrp9sPLwX_oI7KiWLF21UU9i1631AFLMb9k,50


@@ -0,0 +1,6 @@
Wheel-Version: 1.0
Generator: setuptools (80.9.0)
Root-Is-Purelib: false
Tag: cp312-cp312-manylinux_2_17_x86_64
Tag: cp312-cp312-manylinux2014_x86_64


@@ -0,0 +1,8 @@
[console_scripts]
ct2-fairseq-converter = ctranslate2.converters.fairseq:main
ct2-marian-converter = ctranslate2.converters.marian:main
ct2-openai-gpt2-converter = ctranslate2.converters.openai_gpt2:main
ct2-opennmt-py-converter = ctranslate2.converters.opennmt_py:main
ct2-opennmt-tf-converter = ctranslate2.converters.opennmt_tf:main
ct2-opus-mt-converter = ctranslate2.converters.opus_mt:main
ct2-transformers-converter = ctranslate2.converters.transformers:main


@@ -0,0 +1 @@
ctranslate2