Post-Training Palettization
The `palettize_weights` function discretizes the values of all weights in the ML program and constructs the lookup table (LUT) according to the algorithm you specify as `mode` in the `OpPalettizerConfig`. Each float weight is then replaced by an `nbits`-bit index into the LUT, and the LUT is saved alongside each weight. The `const` ops that were storing weight values are replaced by `constexpr_lut_to_dense` ops.
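The following minimal NumPy sketch makes the LUT representation concrete. It is an illustration, not coremltools code. It maps each weight to the index of its nearest entry in a hypothetical 2-bit LUT, then expands the indices back into a dense tensor, which is conceptually what `constexpr_lut_to_dense` reconstructs. The helper names `palettize_with_lut` and `lut_to_dense` are hypothetical. Real Core ML storage also packs the indices into n-bit fields, and this sketch skips that step.

```python
import numpy as np

def palettize_with_lut(weights: np.ndarray, lut: np.ndarray) -> np.ndarray:
    """Replace each weight with the index of its nearest LUT entry (hypothetical helper)."""
    # Broadcast |w - lut[k]| over a new trailing axis, then pick the closest k.
    distances = np.abs(weights[..., np.newaxis] - lut)
    return distances.argmin(axis=-1).astype(np.uint8)

def lut_to_dense(lut: np.ndarray, indices: np.ndarray) -> np.ndarray:
    """Decompress by looking up every index in the LUT (hypothetical helper)."""
    return lut[indices]

weights = np.array([[0.48, -0.12, 0.07], [-0.52, 0.11, 0.35]], dtype=np.float32)
# Hypothetical 2-bit palette: 2**2 = 4 entries, so every index fits in 2 bits.
lut = np.array([-0.5, -0.1, 0.1, 0.5], dtype=np.float32)

indices = palettize_with_lut(weights, lut)
restored = lut_to_dense(lut, indices)
print(indices)
print(restored)
```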
The following example shows how to palettize the weights of a Core ML model:
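```python
import coremltools as ct
from coremltools.optimize.coreml import (
    OpPalettizerConfig,
    OptimizationConfig,
    palettize_weights,
)

# Load an ML program model (hypothetical path; substitute your own model).
model = ct.models.MLModel("MyModel.mlpackage")

op_config = OpPalettizerConfig(mode="kmeans", nbits=6, weight_threshold=512)
config = OptimizationConfig(global_config=op_config)
compressed_6_bit_model = palettize_weights(model, config=config)
compressed_6_bit_model.save("MyModel_6bit.mlpackage")
```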
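A back-of-the-envelope estimate shows why 6-bit palettization shrinks a model. The sketch relies on several simplifying assumptions:

- The original weights are float16.
- The LUT entries are also 16-bit.
- Each tensor has a single LUT.
- File-format overhead is ignored.

The 1M-element tensor is hypothetical.

```python
def palettized_size_bits(num_elements: int, nbits: int, lut_entry_bits: int = 16) -> int:
    """Approximate storage for one palettized tensor: n-bit indices plus one LUT."""
    index_bits = num_elements * nbits           # one n-bit index per weight
    lut_bits = (2 ** nbits) * lut_entry_bits    # 2**nbits palette entries
    return index_bits + lut_bits

def dense_size_bits(num_elements: int, dtype_bits: int = 16) -> int:
    """Storage for the uncompressed tensor (float16 by default)."""
    return num_elements * dtype_bits

n = 1024 * 1024  # hypothetical 1M-element weight tensor
for nbits in (8, 6, 4, 2):
    ratio = dense_size_bits(n) / palettized_size_bits(n, nbits)
    print(f"nbits={nbits}: about {ratio:.2f}x smaller than float16")
```

For large tensors the LUT cost is negligible, so under these assumptions the ratio approaches 16 / nbits, roughly 2.67x for the 6-bit model above.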
Specify how the LUT is constructed by choosing one of the following as the `mode`:

- `"kmeans"` (default): The LUT is generated by k-means clustering, with the number of clusters set to 2^nbits. `nbits` can be one of 1, 2, 4, 6, 8.
- `"uniform"`: The LUT is generated by computing uniformly spaced intervals between the minimum and maximum values in the weight tensor.
- `"unique"`: In this mode, `np.unique` is applied to the weight values, and if 256 or fewer unique values are found, they are converted into lookup table form. Nothing is done if there are more than 256 unique values.
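The three modes can be approximated with plain NumPy. The helpers below are simplified stand-ins written for illustration, not the coremltools implementation. The k-means helper is a basic Lloyd's-algorithm loop whose random initialization is an assumption of this sketch. All function names are hypothetical. Each helper returns a LUT plus one index per weight.

```python
import numpy as np

def kmeans_lut(weights, nbits, iters=25, seed=0):
    """1-D k-means (Lloyd's algorithm) with 2**nbits clusters; returns (lut, indices)."""
    flat = weights.ravel().astype(np.float64)
    k = 2 ** nbits
    rng = np.random.default_rng(seed)
    # Initialize centroids from distinct weight values (assumes at least k distinct values).
    lut = rng.choice(np.unique(flat), size=k, replace=False)
    for _ in range(iters):
        # Assign every weight to its nearest centroid, then move centroids to cluster means.
        idx = np.abs(flat[:, None] - lut[None, :]).argmin(axis=1)
        for c in range(k):
            members = flat[idx == c]
            if members.size:  # leave an empty cluster's centroid where it is
                lut[c] = members.mean()
    idx = np.abs(flat[:, None] - lut[None, :]).argmin(axis=1)
    return lut, idx.reshape(weights.shape)

def uniform_lut(weights, nbits):
    """2**nbits evenly spaced values spanning [min, max]; each weight maps to the nearest."""
    lut = np.linspace(weights.min(), weights.max(), 2 ** nbits)
    idx = np.abs(weights[..., None] - lut).argmin(axis=-1)
    return lut, idx

def unique_lut(weights, max_entries=256):
    """Use the distinct values as the LUT; return None when there are too many."""
    lut, inverse = np.unique(weights, return_inverse=True)
    if lut.size > max_entries:
        return None  # mirrors the documented "nothing is done" case
    return lut, inverse.reshape(weights.shape)

rng = np.random.default_rng(42)
w = rng.normal(size=(64, 64)).astype(np.float32)

lut_k, idx_k = kmeans_lut(w, nbits=2)
lut_u, idx_u = uniform_lut(w, nbits=2)
print(lut_k, lut_u)
print(unique_lut(w) is None)  # random floats have far more than 256 distinct values
print(unique_lut(np.array([0.0, 0.5, 0.5, -1.0], dtype=np.float32)))
```

On bell-shaped weights, k-means places more palette entries near the dense center of the distribution than the evenly spaced `"uniform"` grid does. This usually gives lower reconstruction error at the same `nbits`.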
The `weight_threshold` parameter specifies the minimum size a weight tensor must have for palettization to take place. In the previous code sample, `weight_threshold=512` was specified. As a result, weight tensors with fewer than 512 elements are left untouched, while tensors with more than 512 elements are palettized.
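The threshold acts as a simple size filter. The hypothetical helper below applies it to a dictionary of made-up tensors. This sketch assumes that a tensor with exactly `weight_threshold` elements is skipped, so check the API Reference for the exact boundary.

```python
import numpy as np

def tensors_to_palettize(weights: dict, weight_threshold: int) -> list:
    """Names of tensors with more than weight_threshold elements (hypothetical helper)."""
    return [name for name, w in weights.items() if w.size > weight_threshold]

# Hypothetical tensors standing in for a model's weights.
weights = {
    "conv1.weight": np.zeros((16, 3, 3, 3)),  # 432 elements: skipped
    "conv1.bias": np.zeros((16,)),            # 16 elements: skipped
    "fc.weight": np.zeros((10, 256)),         # 2560 elements: palettized
}
print(tensors_to_palettize(weights, weight_threshold=512))
```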
For options on how to set different palettization configs for different weights in the same network, see Customizing Ops to Compress.
For more details on the parameters available in the config, see `OpPalettizerConfig` and `palettize_weights` in the API Reference.
Post-Training Palettization Works Well for nbits = 6, 8
Results are model and task dependent, but in most cases, palettizing with `optimize.coreml.palettize_weights` preserves accuracy well at the 6-bit and 8-bit settings. At lower bit widths you will likely see a sharp drop in accuracy. In that case, consider using Training-Time Palettization with nbits = 2, 4.
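One way to see why low bit widths hurt is to measure how far the palettized weights drift from the originals. The sketch below applies uniform palettization to a random Gaussian tensor, which stands in for a real model. It then reports the mean squared reconstruction error for several `nbits` values. Reconstruction error is only a rough proxy for task accuracy, which, as noted above, depends on the model. The helper name is hypothetical.

```python
import numpy as np

def uniform_palettize_error(weights: np.ndarray, nbits: int) -> float:
    """MSE after mapping weights onto 2**nbits evenly spaced values (hypothetical helper)."""
    lut = np.linspace(weights.min(), weights.max(), 2 ** nbits)
    idx = np.abs(weights[..., None] - lut).argmin(axis=-1)  # nearest LUT entry per weight
    return float(np.mean((weights - lut[idx]) ** 2))

rng = np.random.default_rng(0)
w = rng.normal(size=(128, 128))  # hypothetical weight tensor
for nbits in (8, 6, 4, 2):
    print(f"nbits={nbits}: MSE={uniform_palettize_error(w, nbits):.2e}")
```

Each 2-bit reduction cuts the number of palette entries by a factor of four, so the error grows quickly as `nbits` drops.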