LLM Parameters
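LLM parameters are the settings that control and optimize a large language model’s output and behavior. They include values learned during training, such as weights and biases, and hyperparameters that are set from outside the model.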
Weights
Weights are numerical values that represent the importance the LLM assigns to a specific input. They are configured automatically during AI model training. The artificial intelligence (AI) model does not treat all inputs equally when generating responses: the higher an input’s weight, the more that input contributes to the model’s output.
Biases
Like weights, biases are configured automatically during AI model training. A bias is a constant value added to the weighted signal arriving from the previous layer. Biases let a neuron activate even when the weighted inputs alone would not be enough to pass through the activation function.
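To make the roles of weights and biases concrete, here is a minimal sketch of a single artificial neuron. The function names, the input values, and the choice of ReLU as the activation function are assumptions made for illustration; real LLMs apply the same idea across very large matrices.

```python
def relu(z: float) -> float:
    """ReLU activation: passes positive values through, outputs 0 otherwise."""
    return max(0.0, z)


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs, plus the bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(z)


# Illustrative values only (not taken from any real model).
inputs = [1.0, 2.0]
weights = [0.5, -0.5]  # the second input pushes the weighted sum negative

# Weighted sum alone: 0.5*1.0 + (-0.5)*2.0 = -0.5, so ReLU outputs 0.0.
without_bias = neuron(inputs, weights, bias=0.0)

# Adding a bias of 1.0 shifts the sum to 0.5, so the neuron now activates.
with_bias = neuron(inputs, weights, bias=1.0)

print(without_bias, with_bias)
```

With the same inputs and weights, the neuron stays silent without a bias and activates once the bias shifts the weighted sum above zero.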
Hyperparameters
Hyperparameters are external settings that determine a model’s behavior, shape, size, resource use and other characteristics.
Types of Hyperparameters
- Architecture hyperparameters, such as the number of layers and the dimensionality of the hidden layers, determine a model’s size and shape.
- Training hyperparameters, such as the learning rate and batch size, guide the model’s training process. Training hyperparameters strongly affect model performance and whether a model meets the required LLM benchmarks.
- Inference hyperparameters, such as temperature and top-p sampling, decide how a generative AI model produces its outputs.
- Memory and compute hyperparameters, such as the context window, the maximum number of tokens in an output sequence, and the number of stop sequences, balance model performance and capabilities with resource requirements.
- Hyperparameters for output quality, such as presence and frequency penalties, help LLMs generate more varied and interesting outputs while controlling costs.
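The inference and output-quality hyperparameters above can be sketched in a few lines of plain Python. This is a simplified illustration, not how any particular library implements sampling: the function names and example values are hypothetical, and the penalty formula is one common formulation (subtracting from a token's score based on how often it has already appeared).

```python
import math
import random


def apply_penalties(logits, counts, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the score of tokens that already appeared in the output.

    counts[i] is how many times token i has been generated so far.
    The frequency penalty scales with the count; the presence penalty
    is applied once if the token has appeared at all.
    """
    return [
        logit - count * frequency_penalty - (presence_penalty if count > 0 else 0.0)
        for logit, count in zip(logits, counts)
    ]


def softmax_with_temperature(logits, temperature=1.0):
    """Turn scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass  # renormalize the surviving tokens
    return filtered


def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Sample one token index using temperature scaling followed by top-p filtering."""
    probs = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical scores for a four-token vocabulary.
logits = [2.0, 1.0, 0.1, -1.0]
sharp = softmax_with_temperature(logits, temperature=0.5)  # more deterministic
flat = softmax_with_temperature(logits, temperature=2.0)   # more varied
```

Lowering the temperature concentrates probability on the top-scoring token, while a smaller top-p value removes unlikely tokens from consideration before sampling.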
We will discuss some frequently used hyperparameters in the next article.
References / Resources
What are LLM parameters? - IBM