optimize.torch API Overview
Use `coremltools.optimize.torch` (training-time compression) to train your model in a compression-aware fashion, or start from a pre-trained float-precision model and fine-tune it with training data.
API Format and Interface
All model optimizers use a similar API format:
import coremltools.optimize.torch as cto
pruner = cto.pruning.MagnitudePruner(model, config)
quantizer = cto.quantization.LinearQuantizer(model, config)
palettizer = cto.palettization.DKMPalettizer(model, config)
- `model` is the `torch.nn.Module` instance you want to optimize.
- `config` specifies how the model will be configured for optimization. These configuration objects share the same API among the different optimization techniques.
All model optimizers also provide the same interface for optimizing the models; a minimal end-to-end sketch follows this list:

- `prepare`: Insert model optimization layers in the model.
- `step`: Step through the optimization schedule.
- `report`: Create a report with information about the current state of the optimization, such as the current sparsity of a layer.
- `finalize`: Create model weights from the learned optimization parameters, and make the model ready for export using `coremltools.convert`.
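To make these four calls concrete, here is a minimal, self-contained sketch of the full lifecycle using `MagnitudePruner` on a toy model. The model, data, and configuration values are illustrative placeholders, and the sketch assumes that `MagnitudePrunerConfig.from_dict` accepts the same structure as the YAML files shown later on this page:

import torch
import coremltools.optimize.torch as cto

# Toy model, optimizer, and loss; purely illustrative.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3, padding=1),
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(8, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

# Assumption: from_dict mirrors the YAML structure shown later on this page.
config = cto.pruning.MagnitudePrunerConfig.from_dict(
    {"global_config": {"scheduler": {"update_steps": [2, 4]}, "target_sparsity": 0.5}}
)
pruner = cto.pruning.MagnitudePruner(model, config)

model = pruner.prepare()  # insert pruning (mask) layers
for _ in range(6):
    inputs = torch.rand(4, 3, 16, 16)  # random stand-in training batch
    labels = torch.randint(0, 10, (4,))
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()
    optimizer.step()
    pruner.step()  # advance the pruning schedule

print(pruner.report())  # inspect, for example, the current per-layer sparsity
model = pruner.finalize()  # bake the learned masks into the weights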
The following examples show how these optimizers can be created and integrated into PyTorch code.
Creating Configurations
To initialize a model optimizer such as `MagnitudePruner`, create it using a YAML file or programmatically.
Create a YAML File
Create a YAML configuration file (`config.yaml`) describing how different modules are to be configured. The file can contain up to three sections:
- `global_config`: Used to specify a configuration that is applied globally, on all supported module types (`torch.nn.Conv2d`, `torch.nn.Linear`, `torch.nn.ConvTranspose2d`, and so on).
- `module_type_configs`: Used to specify a common configuration for all modules of the same type (for example, `torch.nn.Conv2d`).
- `module_name_configs`: In a `torch.nn.Module`, all submodules have a unique name. Using this option you can set a particular submodule's configuration. (A skeleton combining all three sections follows this list.)
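Taken together, a single `config.yaml` can combine all three sections. The skeleton below uses placeholder pruner options purely for illustration; when more than one section matches a module, the most specific entry applies, with `module_name_configs` taking precedence over `module_type_configs`, which in turn takes precedence over `global_config`:

global_config:
  target_sparsity: 0.8
module_type_configs:
  Linear:
    target_sparsity: 0.5
module_name_configs:
  module2.linear: null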
Example 1: Configure Globally
The following sample `config.yaml` file configures the parameters of the pruner globally. These parameters are used to configure all supported modules:
global_config:
  scheduler:
    update_steps: [100, 200, 300, 500]
  target_sparsity: 0.8
Example 2: Configure More Granularly
The following sample `config.yaml` file configures the parameters of the pruner on a more granular level, using different sparsity types (`block` and `n:m`) for convolution and linear layers, respectively. The configuration also sets `module2.linear` to `null` in order to leave it alone and not prune it.
module_type_configs:
  Linear:
    scheduler:
      update_steps: [100, 200, 300, 500]
    n_m_ratio: [3, 4]
  Conv2d:
    scheduler:
      update_steps: [100, 200, 300, 500]
    target_sparsity: 0.5
    block_size: 2

module_name_configs:
  module2.conv1:
    scheduler:
      update_steps: [100, 200, 300, 500]
    target_sparsity: 0.75
  module2.linear: null
Using the Configuration
The following example shows how to use the `config.yaml` configuration file with a `MagnitudePruner`:
import torch
import coremltools as ct
from coremltools.optimize.torch.pruning import MagnitudePruner, MagnitudePrunerConfig

model, loss_fn, optimizer = create_model_and_optimizer()
data = create_data()

# Initialize pruner and configure it
config = MagnitudePrunerConfig.from_yaml("config.yaml")

pruner = MagnitudePruner(model, config)

# Insert pruning layers in the model
model = pruner.prepare()

for inputs, labels in data:
    optimizer.zero_grad()  # clear gradients from the previous batch
    output = model(inputs)
    loss = loss_fn(output, labels)
    loss.backward()
    optimizer.step()
    pruner.step()

# Commit pruning masks to model parameters
pruner.finalize(inplace=True)

# Export
example_input = torch.rand(1, 3, 224, 224)
traced_model = torch.jit.trace(model, example_input)

coreml_model = ct.convert(
    traced_model,
    inputs=[ct.TensorType(shape=example_input.shape)],
    pass_pipeline=ct.PassPipeline.DEFAULT_PRUNING,
    minimum_deployment_target=ct.target.iOS16,
)
coreml_model.save("~/pruned_model.mlpackage")
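As a quick sanity check before deploying, you can verify that the finalized PyTorch weights are actually sparse. The snippet below continues the example above; it is an illustrative check using plain PyTorch, not part of the `coremltools` API:

# Illustrative check: fraction of zero-valued entries per weight tensor
for name, param in model.named_parameters():
    if name.endswith("weight"):
        sparsity = (param == 0.0).float().mean().item()
        print(f"{name}: {sparsity:.1%} zeros")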
Initialize Programmatically
You may prefer to configure a model optimizer programmatically, which is especially useful if you have complex conditions for configuring the optimizer that are hard to express in a YAML file.
Programmatic Example 1
import torch
import coremltools as ct
from coremltools.optimize.torch.palettization import (
    DKMPalettizer,
    DKMPalettizerConfig,
    ModuleDKMPalettizerConfig,
)

# Code that defines the PyTorch model, loss, and optimizer
model, loss_fn, optimizer = create_model_and_optimizer()
data = create_data()

# Initialize the palettizer
config = DKMPalettizerConfig(
    global_config=ModuleDKMPalettizerConfig(n_bits=4, cluster_dim=4)
)

palettizer = DKMPalettizer(model, config)

# Prepare the model to insert FakePalettize layers for palettization
model = palettizer.prepare(inplace=True)

# Use palettizer in the PyTorch training loop
for inputs, labels in data:
    optimizer.zero_grad()  # clear gradients from the previous batch
    output = model(inputs)
    loss = loss_fn(output, labels)
    loss.backward()
    optimizer.step()
    palettizer.step()

# Fold LUT and indices into weights
model = palettizer.finalize(inplace=True)

# Export
example_input = torch.rand(1, 3, 224, 224)
traced_model = torch.jit.trace(model, example_input)

coreml_model = ct.convert(
    traced_model,
    inputs=[ct.TensorType(shape=example_input.shape)],
    pass_pipeline=ct.PassPipeline.DEFAULT_PALETTIZATION,
    minimum_deployment_target=ct.target.iOS16,
)
coreml_model.save("~/palettized_model.mlpackage")
Programmatic Example 2
If you want to quantize only the convolution modules that have a kernel size of 1, while letting other supported layers such as linear layers fall back to the global configuration, you can do so:
import torch
import coremltools as ct
from coremltools.optimize.torch.quantization import (
    LinearQuantizer,
    LinearQuantizerConfig,
    ModuleLinearQuantizerConfig,
    ObserverType,
    QuantizationScheme,
)

model, loss_fn, optimizer = create_model_and_optimizer()
data = create_data()

# Initialize the quantizer
global_config = ModuleLinearQuantizerConfig(
    quantization_scheme=QuantizationScheme.symmetric
)

config = LinearQuantizerConfig().set_global(global_config)

# Quantize convolution layers with a kernel size of 1 using a per-channel
# min-max observer; skip all other convolutions. Remaining supported layers,
# such as linear layers, use the global config.
for name, m in model.named_modules():
    if isinstance(m, torch.nn.Conv2d):
        if m.kernel_size == (1, 1):
            config = config.set_module_name(
                name,
                ModuleLinearQuantizerConfig(
                    weight_observer=ObserverType.min_max, weight_per_channel=True
                ),
            )
        else:
            config = config.set_module_name(name, None)

quantizer = LinearQuantizer(model, config)

# Prepare the model to insert FakeQuantize layers for QAT
example_input = torch.rand(1, 3, 224, 224)
model = quantizer.prepare(example_inputs=example_input, inplace=True)

# Use quantizer in your PyTorch training loop
for inputs, labels in data:
    optimizer.zero_grad()  # clear gradients from the previous batch
    output = model(inputs)
    loss = loss_fn(output, labels)
    loss.backward()
    optimizer.step()
    quantizer.step()

# Convert operations to their quantized counterparts using parameters learned via QAT
model = quantizer.finalize(inplace=True)

traced_model = torch.jit.trace(model, example_input)
coreml_model = ct.convert(
    traced_model,
    inputs=[ct.TensorType(shape=example_input.shape)],
    minimum_deployment_target=ct.target.iOS17,
)
coreml_model.save("~/quantized_model.mlpackage")
Tutorials
- Magnitude Pruning Tutorial: Learn how to train a simple convolutional neural network using `MagnitudePruner`.
- Palettization Using Differentiable K-Means Tutorial: Learn how to palettize a neural network using `DKMPalettizer`, which clusters the weights using a differentiable version of `k-means`, allowing the lookup table (LUT) and indices of palettized weights to be learned using a gradient-based optimization algorithm.
- Linear Quantization Tutorial: Learn how to train a simple convolutional neural network using `LinearQuantizer`. This algorithm simulates the effects of quantization during training, by quantizing and dequantizing the weights and/or activations during the model's forward pass.
From each tutorial you can download a Jupyter Notebook version and the source code.