PyGPT4All API Reference

pygpt4all.models.gpt4all

GPT4All with the llama.cpp backend, through pyllamacpp

GPT4All

GPT4All(
    model_path,
    prompt_context="",
    prompt_prefix="",
    prompt_suffix="",
    log_level=logging.ERROR,
    n_ctx=512,
    seed=0,
    n_parts=-1,
    f16_kv=False,
    logits_all=False,
    vocab_only=False,
    use_mlock=False,
    embedding=False,
)

Bases: pyllamacpp.model.Model

GPT4All model

Example usage

from pygpt4all.models.gpt4all import GPT4All

model = GPT4All('path/to/gpt4all/model')
for token in model.generate("Tell me a joke ?"):
    print(token, end='', flush=True)

Parameters:

Name            Type   Description                                                         Default
model_path      str    the path to the gpt4all model                                       required
prompt_context  str    the global context of the interaction                               ''
prompt_prefix   str    the prompt prefix                                                   ''
prompt_suffix   str    the prompt suffix                                                   ''
log_level       int    logging level, set to ERROR by default                              logging.ERROR
n_ctx           int    LLaMA context                                                       512
seed            int    random seed                                                         0
n_parts         int    LLaMA n_parts                                                       -1
f16_kv          bool   use fp16 for KV cache                                               False
logits_all      bool   the llama_eval() call computes all logits, not just the last one    False
vocab_only      bool   only load the vocabulary, no weights                                False
use_mlock       bool   force system to keep model in RAM                                   False
embedding       bool   embedding mode only                                                 False
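
The keyword arguments above map directly onto the constructor. A minimal sketch, assuming a hypothetical local model file:

from pygpt4all.models.gpt4all import GPT4All

# './models/gpt4all-converted.bin' is a placeholder path, not a file shipped with the package
model = GPT4All('./models/gpt4all-converted.bin',
                n_ctx=512,       # LLaMA context size
                seed=42,         # fix the seed for repeatable sampling
                use_mlock=True)  # ask the OS to keep the model weights in RAM
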
Source code in pygpt4all/models/gpt4all.py
def __init__(self,
             model_path: str,
             prompt_context: str = '',
             prompt_prefix: str = '',
             prompt_suffix: str = '',
             log_level: int = logging.ERROR,
             n_ctx: int = 512,
             seed: int = 0,
             n_parts: int = -1,
             f16_kv: bool = False,
             logits_all: bool = False,
             vocab_only: bool = False,
             use_mlock: bool = False,
             embedding: bool = False):
    """
    :param model_path: the path to the gpt4all model
    :param prompt_context: the global context of the interaction
    :param prompt_prefix: the prompt prefix
    :param prompt_suffix: the prompt suffix
    :param log_level: logging level, set to ERROR by default
    :param n_ctx: LLaMA context
    :param seed: random seed
    :param n_parts: LLaMA n_parts
    :param f16_kv: use fp16 for KV cache
    :param logits_all: the llama_eval() call computes all logits, not just the last one
    :param vocab_only: only load the vocabulary, no weights
    :param use_mlock: force system to keep model in RAM
    :param embedding: embedding mode only
    """
    # set logging level
    set_log_level(log_level)
    super(GPT4All, self).__init__(model_path=model_path,
                                  prompt_context=prompt_context,
                                  prompt_prefix=prompt_prefix,
                                  prompt_suffix=prompt_suffix,
                                  log_level=log_level,
                                  n_ctx=n_ctx,
                                  seed=seed,
                                  n_parts=n_parts,
                                  f16_kv=f16_kv,
                                  logits_all=logits_all,
                                  vocab_only=vocab_only,
                                  use_mlock=use_mlock,
                                  embedding=embedding)

pygpt4all.models.gpt4all_j

GPT4All-J with the ggml backend, through pygptj

GPT4All_J

GPT4All_J(
    model_path,
    prompt_context="",
    prompt_prefix="",
    prompt_suffix="",
    log_level=logging.ERROR,
)

Bases: pygptj.model.Model

GPT4ALL-J model

Example usage

from pygpt4all.models.gpt4all_j import GPT4All_J

model = GPT4All_J('path/to/gpt4all-j/model')
for token in model.generate("Tell me a joke ?"):
    print(token, end='', flush=True)

Parameters:

Name            Type   Description                               Default
model_path      str    The path to a gpt4all-j model             required
prompt_context  str    the global context of the interaction     ''
prompt_prefix   str    the prompt prefix                         ''
prompt_suffix   str    the prompt suffix                         ''
log_level       int    logging level, set to ERROR by default    logging.ERROR
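
The prompt_context, prompt_prefix and prompt_suffix arguments can prepend a global context and wrap each prompt in a fixed template. A sketch with a hypothetical model path and an illustrative chat-style template:

from pygpt4all.models.gpt4all_j import GPT4All_J

# the path and the template strings below are illustrative only
model = GPT4All_J('./models/ggml-gpt4all-j.bin',
                  prompt_context="You are a helpful assistant.\n",
                  prompt_prefix="\nUser: ",
                  prompt_suffix="\nAssistant: ")

for token in model.generate("What is the capital of France?"):
    print(token, end='', flush=True)
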
Source code in pygpt4all/models/gpt4all_j.py
def __init__(self,
             model_path: str,
             prompt_context: str = '',
             prompt_prefix: str = '',
             prompt_suffix: str = '',
             log_level: int = logging.ERROR):
    """
    :param model_path: The path to a gpt4all-j model
    :param prompt_context: the global context of the interaction
    :param prompt_prefix: the prompt prefix
    :param prompt_suffix: the prompt suffix
    :param log_level: logging level, set to ERROR by default
    """
    # set logging level
    set_log_level(log_level)
    super(GPT4All_J, self).__init__(model_path=model_path,
                                    prompt_context=prompt_context,
                                    prompt_prefix=prompt_prefix,
                                    prompt_suffix=prompt_suffix,
                                    log_level=log_level)

Bases

pyllamacpp.model

This module contains a simple Python API around llama.cpp

Model

Model(
    model_path,
    prompt_context="",
    prompt_prefix="",
    prompt_suffix="",
    log_level=logging.ERROR,
    n_ctx=512,
    seed=0,
    n_parts=-1,
    f16_kv=False,
    logits_all=False,
    vocab_only=False,
    use_mlock=False,
    embedding=False,
)

A simple Python class on top of llama.cpp

Example usage

from pyllamacpp.model import Model

model = Model(model_path='path/to/ggml/model')
for token in model.generate("Tell me a joke ?"):
    print(token, end='', flush=True)

Parameters:

Name            Type   Description                                                         Default
model_path      str    the path to the ggml model                                          required
prompt_context  str    the global context of the interaction                               ''
prompt_prefix   str    the prompt prefix                                                   ''
prompt_suffix   str    the prompt suffix                                                   ''
log_level       int    logging level, set to ERROR by default                              logging.ERROR
n_ctx           int    LLaMA context                                                       512
seed            int    random seed                                                         0
n_parts         int    LLaMA n_parts                                                       -1
f16_kv          bool   use fp16 for KV cache                                               False
logits_all      bool   the llama_eval() call computes all logits, not just the last one    False
vocab_only      bool   only load the vocabulary, no weights                                False
use_mlock       bool   force system to keep model in RAM                                   False
embedding       bool   embedding mode only                                                 False
Source code in pyllamacpp/model.py
def __init__(self,
             model_path: str,
             prompt_context: str = '',
             prompt_prefix: str = '',
             prompt_suffix: str = '',
             log_level: int = logging.ERROR,
             n_ctx: int = 512,
             seed: int = 0,
             n_parts: int = -1,
             f16_kv: bool = False,
             logits_all: bool = False,
             vocab_only: bool = False,
             use_mlock: bool = False,
             embedding: bool = False):
    """
    :param model_path: the path to the ggml model
    :param prompt_context: the global context of the interaction
    :param prompt_prefix: the prompt prefix
    :param prompt_suffix: the prompt suffix
    :param log_level: logging level, set to ERROR by default
    :param n_ctx: LLaMA context
    :param seed: random seed
    :param n_parts: LLaMA n_parts
    :param f16_kv: use fp16 for KV cache
    :param logits_all: the llama_eval() call computes all logits, not just the last one
    :param vocab_only: only load the vocabulary, no weights
    :param use_mlock: force system to keep model in RAM
    :param embedding: embedding mode only
    """

    # set logging level
    set_log_level(log_level)
    self._ctx = None

    if not Path(model_path).is_file():
        raise Exception(f"File {model_path} not found!")

    self.llama_params = pp.llama_context_default_params()
    # update llama_params
    self.llama_params.n_ctx = n_ctx
    self.llama_params.seed = seed
    self.llama_params.n_parts = n_parts
    self.llama_params.f16_kv = f16_kv
    self.llama_params.logits_all = logits_all
    self.llama_params.vocab_only = vocab_only
    self.llama_params.use_mlock = use_mlock
    self.llama_params.embedding = embedding

    self._ctx = pp.llama_init_from_file(model_path, self.llama_params)

    # gpt params
    self.gpt_params = pp.gpt_params()

    self.res = ""

    self._n_ctx = pp.llama_n_ctx(self._ctx)
    self._last_n_tokens = [0] * self._n_ctx  # n_ctx elements
    self._n_past = 0
    self.prompt_cntext = prompt_context
    self.prompt_prefix = prompt_prefix
    self.prompt_suffix = prompt_suffix

    self._prompt_context_tokens = []
    self._prompt_prefix_tokens = []
    self._prompt_suffix_tokens = []

    self.reset()

reset

reset()

Resets the context
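
A short sketch of one typical use, assuming model is an already-constructed pyllamacpp Model (or GPT4All) instance: clearing the accumulated context between two unrelated prompts.

# prompts are illustrative; `model` is assumed to be already loaded
first = ''.join(model.generate("Summarize the plot of Hamlet.", n_predict=64))

model.reset()  # drop the accumulated context before switching topics

second = ''.join(model.generate("Write a haiku about spring.", n_predict=32))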

Source code in pyllamacpp/model.py
def reset(self) -> None:
    """Resets the context"""
    self._prompt_context_tokens = pp.llama_tokenize(self._ctx, self.prompt_cntext, True)
    self._prompt_prefix_tokens = pp.llama_tokenize(self._ctx, self.prompt_prefix, True)
    self._prompt_suffix_tokens = pp.llama_tokenize(self._ctx, self.prompt_suffix, True)
    self._last_n_tokens = [0] * self._n_ctx  # n_ctx elements
    self._n_past = 0

tokenize

tokenize(text)

Returns a list of tokens for the text

Parameters:

Name   Type   Description            Default
text   str    text to be tokenized   required

Returns:

List of tokens

Source code in pyllamacpp/model.py
def tokenize(self, text:str):
    """
    Returns a list of tokens for the text
    :param text: text to be tokenized
    :return: List of tokens
    """
    return pp.llama_tokenize(self._ctx, text, True)

detokenize

detokenize(tokens)

Returns the text represented by a list of tokens

Parameters:

Name     Type   Description                                   Default
tokens   list   the list of tokens to convert back to text    required

Returns:

A string representing the text extracted from the tokens
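
Together with tokenize(), this gives a simple round trip; a sketch, assuming model is a loaded pyllamacpp Model:

tokens = model.tokenize("Hello, world!")  # list of token ids
text = model.detokenize(tokens)           # back to a single string
print(len(tokens), repr(text))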

Source code in pyllamacpp/model.py
def detokenize(self, tokens:list):
    """
    Returns the text represented by a list of tokens
    :param tokens: the list of tokens to convert back to text
    :return: A string representing the text extracted from the tokens
    """
    return pp.llama_tokens_to_str(self._ctx, tokens)

generate

generate(
    prompt,
    n_predict=None,
    antiprompt=None,
    infinite_generation=False,
    n_threads=4,
    repeat_last_n=64,
    top_k=40,
    top_p=0.95,
    temp=0.8,
    repeat_penalty=1.1,
)

Runs llama.cpp inference and yields new predicted tokens from the prompt provided as input

Parameters:

Name                 Type               Description                                                       Default
prompt               str                The prompt :)                                                     required
n_predict            Union[None, int]   if not None, inference stops once n_predict tokens have been     None
                                        generated; otherwise it continues until EOS
antiprompt           str                aka the stop word; generation stops if this word is predicted.   None
                                        Keep it None to handle stopping in your own way
infinite_generation  bool               set to True to let the generation run indefinitely               False
n_threads            int                The number of CPU threads                                         4
repeat_last_n        int                last n tokens to penalize                                         64
top_k                int                top K sampling parameter                                          40
top_p                float              top P sampling parameter                                          0.95
temp                 float              temperature                                                       0.8
repeat_penalty       float              repeat penalty sampling parameter                                 1.1

Returns:

Type        Description
Generator   Tokens generator
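
A sketch of a bounded, streamed generation using the parameters above; the model path and prompt are illustrative:

from pyllamacpp.model import Model

model = Model('./models/ggml-model.bin')  # placeholder path

for token in model.generate("Q: Name three planets.\nA:",
                            n_predict=64,      # stop after at most 64 new tokens
                            antiprompt="Q:",   # also stop if the model starts a new question
                            n_threads=8,
                            temp=0.7,
                            top_p=0.9):
    print(token, end='', flush=True)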

Source code in pyllamacpp/model.py
def generate(self,
             prompt: str,
             n_predict: Union[None, int] = None,
             antiprompt: str = None,
             infinite_generation: bool = False,
             n_threads: int = 4,
             repeat_last_n: int = 64,
             top_k: int = 40,
             top_p: float = 0.95,
             temp: float = 0.8,
             repeat_penalty: float = 1.10) -> Generator:
    """
    Runs llama.cpp inference and yields new predicted tokens from the prompt provided as input

    :param prompt: The prompt :)
    :param n_predict: if n_predict is not None, the inference will stop if it reaches `n_predict` tokens, otherwise
                      it will continue until `EOS`
    :param antiprompt: aka the stop word, the generation will stop if this word is predicted,
                       keep it None to handle it in your own way
    :param infinite_generation: set it to `True` to make the generation go infinitely
    :param n_threads: The number of CPU threads
    :param repeat_last_n: last n tokens to penalize
    :param top_k: top K sampling parameter
    :param top_p: top P sampling parameter
    :param temp: temperature
    :param repeat_penalty: repeat penalty sampling parameter
    :return: Tokens generator
    """
    input_tokens = self._prompt_prefix_tokens + pp.llama_tokenize(self._ctx, prompt,
                                                                  True) + self._prompt_suffix_tokens
    if len(input_tokens) > self._n_ctx - 4:
        raise Exception('Prompt too long!')
    predicted_tokens = []
    predicted_token = 0

    # add global context for the first time
    if self._n_past == 0:
        for tok in self._prompt_context_tokens:
            predicted_tokens.append(tok)
            self._last_n_tokens.pop(0)
            self._last_n_tokens.append(tok)

    # consume input tokens
    for tok in input_tokens:
        predicted_tokens.append(tok)
        self._last_n_tokens.pop(0)
        self._last_n_tokens.append(tok)

    n_remain = 0
    if antiprompt is not None:
        sequence_queue = []
        stop_word = antiprompt.strip()

    while infinite_generation or predicted_token != pp.llama_token_eos():
        if len(predicted_tokens) > 0:
            if (pp.llama_eval(self._ctx,
                              predicted_tokens,
                              len(predicted_tokens),
                              self._n_past,
                              n_threads)):
                raise Exception("failed to eval the model!")
            self._n_past += len(predicted_tokens)
            predicted_tokens.clear()

        predicted_token = pp.llama_sample_top_p_top_k(self._ctx,
                                                      self._last_n_tokens[self._n_ctx - repeat_last_n:],
                                                      repeat_last_n,
                                                      top_k,
                                                      top_p,
                                                      temp,
                                                      repeat_penalty)

        predicted_tokens.append(predicted_token)
        # tokens come as raw undecoded bytes,
        # and we decode them, replacing those that can't be decoded.
        # i decoded here for fear of breaking the stopword logic, 
        token_str = pp.llama_token_to_str(self._ctx, predicted_token).decode('utf-8', "replace")
        if antiprompt is not None:
            if token_str == '\n':
                sequence_queue.append(token_str)
                continue
            if len(sequence_queue) != 0:
                if stop_word.startswith(''.join(sequence_queue).strip()):
                    sequence_queue.append(token_str)
                    if ''.join(sequence_queue).strip() == stop_word:
                        break
                    else:
                        continue
                else:
                    # consume sequence queue tokens
                    while len(sequence_queue) != 0:
                        yield sequence_queue.pop(0)
                    sequence_queue = []
        self._last_n_tokens.pop(0)
        self._last_n_tokens.append(predicted_token)
        yield token_str
        if n_predict is not None:
            if n_remain == n_predict:
                break
            else:
                n_remain += 1

cpp_generate

cpp_generate(
    prompt,
    n_predict=128,
    new_text_callback=None,
    n_threads=4,
    repeat_last_n=64,
    top_k=40,
    top_p=0.95,
    temp=0.8,
    repeat_penalty=1.1,
    n_batch=8,
    n_keep=0,
    interactive=False,
    antiprompt=[],
    ignore_eos=False,
    instruct=False,
    verbose_prompt=False,
)

The generate function from llama.cpp

Parameters:

Name               Type                       Description                                               Default
prompt             str                        the prompt                                                required
n_predict          int                        number of tokens to generate                              128
new_text_callback  Callable[[bytes], None]    a callback function called when new text is generated    None
n_threads          int                        The number of CPU threads                                 4
repeat_last_n      int                        last n tokens to penalize                                 64
top_k              int                        top K sampling parameter                                  40
top_p              float                      top P sampling parameter                                  0.95
temp               float                      temperature                                               0.8
repeat_penalty     float                      repeat penalty sampling parameter                         1.1
n_batch            int                        GPT params n_batch                                        8
n_keep             int                        GPT params n_keep                                         0
interactive        bool                       interactive communication                                 False
antiprompt         List                       list of anti prompts                                      []
ignore_eos         bool                       Ignore LLaMA EOS                                          False
instruct           bool                       Activate instruct mode                                    False
verbose_prompt     bool                       verbose prompt                                            False

Returns:

Type   Description
str    the new generated text
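
A sketch of driving cpp_generate with a streaming callback; per the signature above the callback receives raw bytes, and the model path is illustrative:

from pyllamacpp.model import Model

def on_text(chunk: bytes) -> None:
    # decode leniently before printing the streamed chunk
    print(chunk.decode('utf-8', errors='replace'), end='', flush=True)

model = Model('./models/ggml-model.bin')  # placeholder path
full_text = model.cpp_generate("Tell me a joke ?",
                               n_predict=64,
                               new_text_callback=on_text,
                               n_threads=8)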

Source code in pyllamacpp/model.py
def cpp_generate(self, prompt: str,
                 n_predict: int = 128,
                 new_text_callback: Callable[[bytes], None] = None,
                 n_threads: int = 4,
                 repeat_last_n: int = 64,
                 top_k: int = 40,
                 top_p: float = 0.95,
                 temp: float = 0.8,
                 repeat_penalty: float = 1.10,
                 n_batch: int = 8,
                 n_keep: int = 0,
                 interactive: bool = False,
                 antiprompt: List = [],
                 ignore_eos: bool = False,
                 instruct: bool = False,
                 verbose_prompt: bool = False,
                 ) -> str:
    """
    The generate function from `llama.cpp`

    :param prompt: the prompt
    :param n_predict: number of tokens to generate
    :param new_text_callback: a callback function called when new text is generated, default `None`
    :param n_threads: The number of CPU threads
    :param repeat_last_n: last n tokens to penalize
    :param top_k: top K sampling parameter
    :param top_p: top P sampling parameter
    :param temp: temperature
    :param repeat_penalty: repeat penalty sampling parameter
    :param n_batch: GPT params n_batch
    :param n_keep: GPT params n_keep
    :param interactive: interactive communication
    :param antiprompt: list of anti prompts
    :param ignore_eos: Ignore LLaMA EOS
    :param instruct: Activate instruct mode
    :param verbose_prompt: verbose prompt
    :return: the new generated text
    """
    self.gpt_params.prompt = prompt
    self.gpt_params.n_predict = n_predict
    # update other params if any
    self.gpt_params.n_threads = n_threads
    self.gpt_params.repeat_last_n = repeat_last_n
    self.gpt_params.top_k = top_k
    self.gpt_params.top_p = top_p
    self.gpt_params.temp = temp
    self.gpt_params.repeat_penalty = repeat_penalty
    self.gpt_params.n_batch = n_batch
    self.gpt_params.n_keep = n_keep
    self.gpt_params.interactive = interactive
    self.gpt_params.antiprompt = antiprompt
    self.gpt_params.ignore_eos = ignore_eos
    self.gpt_params.instruct = instruct
    self.gpt_params.verbose_prompt = verbose_prompt

    # assign new_text_callback
    self.res = ""
    Model._new_text_callback = new_text_callback

    # run the prediction
    pp.llama_generate(self._ctx, self.gpt_params, self._call_new_text_callback)
    return self.res

get_params staticmethod

get_params(params)

Returns a dict representation of the params

Returns:

Type   Description
dict   params dict
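
A sketch of inspecting the generation parameters, assuming model is a loaded pyllamacpp Model (gpt_params and llama_params are the attributes set in __init__ above):

params = model.get_params(model.gpt_params)  # also works with model.llama_params
for name, value in params.items():
    print(f"{name} = {value}")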

Source code in pyllamacpp/model.py
@staticmethod
def get_params(params) -> dict:
    """
    Returns a `dict` representation of the params
    :return: params dict
    """
    res = {}
    for param in dir(params):
        if param.startswith('__'):
            continue
        res[param] = getattr(params, param)
    return res

pygptj.model

This module contains a simple Python API around gpt-j

Model

Model(
    model_path,
    prompt_context="",
    prompt_prefix="",
    prompt_suffix="",
    log_level=logging.ERROR,
)

GPT-J model

Example usage

from pygptj.model import Model

model = Model(model_path='path/to/ggml/model')
for token in model.generate("Tell me a joke ?"):
    print(token, end='', flush=True)

Parameters:

Name            Type   Description                              Default
model_path      str    The path to a gpt-j ggml model           required
prompt_context  str    the global context of the interaction    ''
prompt_prefix   str    the prompt prefix                        ''
prompt_suffix   str    the prompt suffix                        ''
log_level       int    logging level                            logging.ERROR
Source code in pygptj/model.py
def __init__(self,
             model_path: str,
             prompt_context: str = '',
             prompt_prefix: str = '',
             prompt_suffix: str = '',
             log_level: int = logging.ERROR):
    """
    :param model_path: The path to a gpt-j `ggml` model
    :param prompt_context: the global context of the interaction
    :param prompt_prefix: the prompt prefix
    :param prompt_suffix: the prompt suffix
    :param log_level: logging level
    """
    # set logging level
    set_log_level(log_level)
    self._ctx = None

    if not Path(model_path).is_file():
        raise Exception(f"File {model_path} not found!")

    self.model_path = model_path

    self._model = pp.gptj_model()
    self._vocab = pp.gpt_vocab()

    # load model
    self._load_model()

    # gpt params
    self.gpt_params = pp.gptj_gpt_params()
    self.hparams = pp.gptj_hparams()

    self.res = ""

    self.logits = []

    self._n_past = 0
    self.prompt_cntext = prompt_context
    self.prompt_prefix = prompt_prefix
    self.prompt_suffix = prompt_suffix

    self._prompt_context_tokens = []
    self._prompt_prefix_tokens = []
    self._prompt_suffix_tokens = []

    self.reset()

generate

generate(
    prompt,
    n_predict=None,
    antiprompt=None,
    seed=None,
    n_threads=4,
    top_k=40,
    top_p=0.9,
    temp=0.9,
)

Runs GPT-J inference and yields new predicted tokens

Parameters:

Name        Type               Description                                                     Default
prompt      str                The prompt :)                                                   required
n_predict   Union[None, int]   if not None, inference stops once n_predict tokens have been   None
                               generated; otherwise it continues until the end-of-text token
antiprompt  str                aka the stop word; generation stops if this word is predicted. None
                               Keep it None to handle stopping in your own way
seed        int                random seed                                                     None
n_threads   int                The number of CPU threads                                       4
top_k       int                top K sampling parameter                                        40
top_p       float              top P sampling parameter                                        0.9
temp        float              temperature                                                     0.9

Returns:

Type        Description
Generator   Tokens generator
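
A sketch of a reproducible, bounded generation with the GPT-J backend; the model path is illustrative:

from pygptj.model import Model

model = Model('./models/ggml-gptj.bin')  # placeholder path

for token in model.generate("Tell me a joke ?",
                            n_predict=48,  # stop after at most 48 new tokens
                            seed=1234,     # fixed seed for repeatable sampling
                            n_threads=8):
    print(token, end='', flush=True)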

Source code in pygptj/model.py
def generate(self,
             prompt: str,
             n_predict: Union[None, int] = None,
             antiprompt: str = None,
             seed: int = None,
             n_threads: int = 4,
             top_k: int = 40,
             top_p: float = 0.9,
             temp: float = 0.9,
             ) -> Generator:
    """
     Runs GPT-J inference and yields new predicted tokens

    :param prompt: The prompt :)
    :param n_predict: if n_predict is not None, the inference will stop if it reaches `n_predict` tokens, otherwise
                      it will continue until `end of text` token
    :param antiprompt: aka the stop word, the generation will stop if this word is predicted,
                       keep it None to handle it in your own way
    :param seed: random seed
    :param n_threads: The number of CPU threads
    :param top_k: top K sampling parameter
    :param top_p: top P sampling parameter
    :param temp: temperature

    :return: Tokens generator
    """
    if seed is None or seed < 0:
        seed = int(time.time())

    logging.info(f'seed = {seed}')

    if self._n_past == 0 or antiprompt is None:
        # add the prefix to the context
        embd_inp = self._prompt_prefix_tokens + pp.gpt_tokenize(self._vocab, prompt) + self._prompt_suffix_tokens
    else:
        # do not add the prefix again as it is already in the previous generated context
        embd_inp = pp.gpt_tokenize(self._vocab, prompt) + self._prompt_suffix_tokens

    if n_predict is not None:
        n_predict = min(n_predict, self.hparams.n_ctx - len(embd_inp))
    logging.info(f'Number of tokens in prompt = {len(embd_inp)}')

    embd = []
    # add global context for the first time
    if self._n_past == 0:
        for tok in self._prompt_context_tokens:
            embd.append(tok)

    # consume input tokens
    for tok in embd_inp:
        embd.append(tok)

    # determine the required inference memory per token:
    mem_per_token = 0
    logits, mem_per_token = pp.gptj_eval(self._model, n_threads, 0, [0, 1, 2, 3], mem_per_token)

    i = len(embd) - 1
    id = 0
    if antiprompt is not None:
        sequence_queue = []
        stop_word = antiprompt.strip()

    while id != 50256:  # end of text token
        if n_predict is not None:  # break the generation if n_predict
            if i >= (len(embd_inp) + n_predict):
                break
        i += 1
        # predict
        if len(embd) > 0:
            try:
                logits, mem_per_token = pp.gptj_eval(self._model, n_threads, self._n_past, embd, mem_per_token)
                self.logits.append(logits)
            except Exception as e:
                print(f"Failed to predict\n {e}")
                return

        self._n_past += len(embd)
        embd.clear()

        if i >= len(embd_inp):
            # sample next token
            n_vocab = self.hparams.n_vocab
            t_start_sample_us = int(round(time.time() * 1000000))
            id = pp.gpt_sample_top_k_top_p(self._vocab, logits[-n_vocab:], top_k, top_p, temp, seed)
            if id == 50256:  # end of text token
                break
            # add the token to the context
            embd.append(id)
            token = self._vocab.id_to_token[id]
            # antiprompt
            if antiprompt is not None:
                if token == '\n':
                    sequence_queue.append(token)
                    continue
                if len(sequence_queue) != 0:
                    if stop_word.startswith(''.join(sequence_queue).strip()):
                        sequence_queue.append(token)
                        if ''.join(sequence_queue).strip() == stop_word:
                            break
                        else:
                            continue
                    else:
                        # consume sequence queue tokens
                        while len(sequence_queue) != 0:
                            yield sequence_queue.pop(0)
                        sequence_queue = []

            yield token

cpp_generate

cpp_generate(
    prompt,
    new_text_callback=None,
    logits_callback=None,
    n_predict=128,
    seed=-1,
    n_threads=4,
    top_k=40,
    top_p=0.9,
    temp=0.9,
    n_batch=8,
)

Runs inference through the C++ generate function

Parameters:

Name               Type                            Description                                                     Default
prompt             str                             the prompt                                                      required
new_text_callback  Callable[[str], None]           a callback function called when new text is generated          None
logits_callback    Callable[[np.ndarray], None]    a callback function to access the logits on every inference    None
n_predict          int                             number of tokens to generate                                    128
seed               int                             The random seed                                                  -1
n_threads          int                             Number of threads                                                4
top_k              int                             top_k sampling parameter                                         40
top_p              float                           top_p sampling parameter                                         0.9
temp               float                           temperature sampling parameter                                   0.9
n_batch            int                             batch size for prompt processing                                 8

Returns:

Type   Description
str    the new generated text
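
A sketch that streams text and also captures the per-step logits through logits_callback; the model path is illustrative:

import numpy as np
from pygptj.model import Model

collected = []

def on_text(text: str) -> None:
    print(text, end='', flush=True)

def on_logits(logits: np.ndarray) -> None:
    collected.append(np.array(logits, copy=True))  # keep a copy of each step's logits

model = Model('./models/ggml-gptj.bin')  # placeholder path
model.cpp_generate("Tell me a joke ?",
                   n_predict=32,
                   new_text_callback=on_text,
                   logits_callback=on_logits)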

Source code in pygptj/model.py
def cpp_generate(self,
                 prompt: str,
                 new_text_callback: Callable[[str], None] = None,
                 logits_callback: Callable[[np.ndarray], None] = None,
                 n_predict: int = 128,
                 seed: int = -1,
                 n_threads: int = 4,
                 top_k: int = 40,
                 top_p: float = 0.9,
                 temp: float = 0.9,
                 n_batch: int = 8,
                 ) -> str:
    """
    Runs the inference to cpp generate function

    :param prompt: the prompt
    :param new_text_callback: a callback function called when new text is generated, default `None`
    :param logits_callback: a callback function to access the logits on every inference
    :param n_predict: number of tokens to generate
    :param seed: The random seed
    :param n_threads: Number of threads
    :param top_k: top_k sampling parameter
    :param top_p: top_p sampling parameter
    :param temp: temperature sampling parameter
    :param n_batch: batch size for prompt processing

    :return: the new generated text
    """
    self.gpt_params.prompt = prompt
    self.gpt_params.n_predict = n_predict
    self.gpt_params.seed = seed
    self.gpt_params.n_threads = n_threads
    self.gpt_params.top_k = top_k
    self.gpt_params.top_p = top_p
    self.gpt_params.temp = temp
    self.gpt_params.n_batch = n_batch

    # assign new_text_callback
    self.res = ""
    Model._new_text_callback = new_text_callback

    # assign _logits_callback used for saving logits, token by token
    Model._logits_callback = logits_callback

    # run the prediction
    pp.gptj_generate(self.gpt_params, self._model, self._vocab, self._call_new_text_callback,
                     self._call_logits_callback)
    return self.res

braindump

braindump(path)

Dumps the logits to .npy

Parameters:

Name   Type   Description   Default
path   str    Output path   required

Returns:

None
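
A sketch of the intended flow, assuming model is a pygptj Model: run generate() (which appends the logits of each step to model.logits) and then dump them to disk:

# consume the generator so the per-step logits accumulate on the model
_ = ''.join(model.generate("Tell me a joke ?", n_predict=32))

model.braindump("logits.npy")  # saves the collected logits with numpy.save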

Source code in pygptj/model.py
def braindump(self, path: str) -> None:
    """
    Dumps the logits to .npy
    :param path: Output path
    :return: None
    """
    np.save(path, np.asarray(self.logits))

reset

reset()

Resets the context

Returns:

None

Source code in pygptj/model.py
def reset(self) -> None:
    """
    Resets the context
    :return: None
    """
    self._n_past = 0
    self._prompt_context_tokens = pp.gpt_tokenize(self._vocab, self.prompt_cntext)
    self._prompt_prefix_tokens = pp.gpt_tokenize(self._vocab, self.prompt_prefix)
    self._prompt_suffix_tokens = pp.gpt_tokenize(self._vocab, self.prompt_suffix)

get_params staticmethod

get_params(params)

Returns a dict representation of the params

Returns:

Type   Description
dict   params dict

Source code in pygptj/model.py
@staticmethod
def get_params(params) -> dict:
    """
    Returns a `dict` representation of the params
    :return: params dict
    """
    res = {}
    for param in dir(params):
        if param.startswith('__'):
            continue
        res[param] = getattr(params, param)
    return res