coremltools

Use Core ML to integrate machine learning models into your app. Core ML provides a unified representation for all models. Your app uses Core ML APIs and user data to make predictions, and to train or fine-tune models, all on the user’s device.

Core ML optimizes on-device performance by leveraging the CPU, GPU, and Neural Engine while minimizing its memory footprint and power consumption. Running a model strictly on the user’s device removes any need for a network connection, which helps keep the user’s data private and your app responsive.

This example demonstrates how to convert a PyTorch segmentation model to Core ML. The model takes an image and outputs a class prediction for each pixel of the image.

Converting a PyTorch Segmentation Model to Core ML

  1. Add the import statements:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
from PIL import Image
from IPython.display import display  # display() shows images inline in a Jupyter notebook
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms

import coremltools as ct
  2. Load a sample model. For this example, load the TorchVision deeplabv3 segmentation model.

Note: This model is publicly available via TorchVision.

model = torch.hub.load('pytorch/vision:v0.6.0', 'deeplabv3_resnet101', pretrained=True).eval()
  3. Load the sample image:
input_image = Image.open("dog_and_cat.jpg")
display(input_image)
  4. Normalize the image using the mean and standard deviation values that were applied to the model's training data.

This converts the sample image into a form that works with the segmentation model when testing the model's output.

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0)
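As a quick optional check (not part of the original example), confirm that the preprocessed tensor has the batched NCHW layout the model expects:

# Optional sanity check: the model expects a batched tensor of shape (1, 3, H, W).
print(input_batch.shape)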
  5. Get predictions from the model. Running the normalized image through the model computes a score for each object class per pixel, and each pixel is assigned the class with the maximum score.
with torch.no_grad():
    output = model(input_batch)['out'][0]
torch_predictions = output.argmax(0)
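To see which of the 21 classes actually occur in the prediction, one optional check (not in the original example) is to list the unique class indices:

# Optional: list the distinct class indices predicted across the image.
print(torch.unique(torch_predictions))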
  6. Plot the predictions, overlaid with the original image:
def display_segmentation(input_image, output_predictions):
    # Create a color palette, selecting a color for each class
    palette = torch.tensor([2 ** 25 - 1, 2 ** 15 - 1, 2 ** 21 - 1])
    colors = torch.as_tensor([i for i in range(21)])[:, None] * palette
    colors = (colors % 255).numpy().astype("uint8")

    # Plot the semantic segmentation predictions of 21 classes in each color
    r = Image.fromarray(
        output_predictions.byte().cpu().numpy()
    ).resize(input_image.size)
    r.putpalette(colors)

    # Overlay the segmentation mask on the original image
    alpha_image = input_image.copy()
    alpha_image.putalpha(255)
    r = r.convert("RGBA")
    r.putalpha(128)
    seg_image = Image.alpha_composite(alpha_image, r)
    display(seg_image)

display_segmentation(input_image, torch_predictions)
  7. Now that the PyTorch model appears to segment the image correctly, convert it to the Core ML format. Before conversion, trace the PyTorch model using a sample input.

This example uses the cat and dog image. Note: A random input of the same shape would also work.

trace = torch.jit.trace(model, input_batch)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-7-eef9789cb590> in <module>
      1 # First attempt at tracing
----> 2 trace = torch.jit.trace(model, input_batch)

~/miniconda3/envs/coreml/lib/python3.7/site-packages/torch/jit/__init__.py in trace(func, example_inputs, optimize, check_trace, check_inputs, check_tolerance, _force_outplace, _module_class, _compilation_unit)
    880         return trace_module(func, {'forward': example_inputs}, None,
    881                             check_trace, wrap_check_inputs(check_inputs),
--> 882                             check_tolerance, _force_outplace, _module_class)
    883 
    884     if (hasattr(func, '__self__') and isinstance(func.__self__, torch.nn.Module) and

~/miniconda3/envs/coreml/lib/python3.7/site-packages/torch/jit/__init__.py in trace_module(mod, inputs, optimize, check_trace, check_inputs, check_tolerance, _force_outplace, _module_class, _compilation_unit)
   1032             func = mod if method_name == "forward" else getattr(mod, method_name)
   1033             example_inputs = make_tuple(example_inputs)
-> 1034             module._c._create_method_from_trace(method_name, func, example_inputs, var_lookup_fn, _force_outplace)
   1035             check_trace_method = module._c._get_method(method_name)
   1036 

RuntimeError: Only tensors or tuples of tensors can be output from traced functions (getOutput at ../torch/csrc/jit/tracer.cpp:212)
frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 135 (0x10c1ec787 in libc10.dylib)
frame #1: torch::jit::tracer::TracingState::getOutput(c10::IValue const&, unsigned long) + 1587 (0x11d8502d3 in libtorch.dylib)
frame #2: torch::jit::tracer::trace(std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >, std::__1::function<std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> > (std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >)> const&, std::__1::function<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > (at::Tensor const&)>, bool, torch::jit::script::Module*) + 1865 (0x11d850ee9 in libtorch.dylib)
frame #3: torch::jit::tracer::createGraphByTracing(pybind11::function const&, std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >, pybind11::function const&, bool, torch::jit::script::Module*) + 361 (0x10b7daa49 in libtorch_python.dylib)
frame #4: void pybind11::cpp_function::initialize<torch::jit::script::initJitScriptBindings(_object*)::$_13, void, torch::jit::script::Module&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, pybind11::function, pybind11::tuple, pybind11::function, bool, pybind11::name, pybind11::is_method, pybind11::sibling>(torch::jit::script::initJitScriptBindings(_object*)::$_13&&, void (*)(torch::jit::script::Module&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, pybind11::function, pybind11::tuple, pybind11::function, bool), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::'lambda'(pybind11::detail::function_call&)::__invoke(pybind11::detail::function_call&) + 319 (0x10b81bd9f in libtorch_python.dylib)
frame #5: pybind11::cpp_function::dispatcher(_object*, _object*, _object*) + 3372 (0x10b14e9fc in libtorch_python.dylib)
<omitting python frames>

When running this code, the tracer reports the error "Only tensors or tuples of tensors can be output from traced functions", because the model returns a dictionary.

Get around this limitation by wrapping the model in a module that extracts the output we want from the dictionary:

class WrappedDeeplabv3Resnet101(nn.Module):

    def __init__(self):
        super(WrappedDeeplabv3Resnet101, self).__init__()
        self.model = torch.hub.load(
            'pytorch/vision:v0.6.0',
            'deeplabv3_resnet101',
            pretrained=True
        ).eval()

    def forward(self, x):
        res = self.model(x)
        # Extract the tensor we want from the output dictionary
        x = res["out"]
        return x

Now the trace runs without errors:

traceable_model = WrappedDeeplabv3Resnet101().eval()
trace = torch.jit.trace(traceable_model, input_batch)
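As an optional sanity check (an addition to the original example), confirm that the traced wrapper reproduces the original model's output:

# Optional: the traced wrapper should match the original model's 'out' tensor.
with torch.no_grad():
    traced_output = trace(input_batch)
assert torch.allclose(traced_output[0], output, atol=1e-4)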
  8. Use the Core ML converter by passing in the traced model, and describe the inputs to provide to the model. See the Flexible inputs section to learn more; an illustrative flexible-shape variant appears after the conversion code below.

Note: This example provides a name for the input ("input"), which makes it easier to build the input dictionary when requesting predictions from the Core ML model. The output keeps its automatically generated name, which is used in the prediction step below.

mlmodel = ct.convert(
    trace,
    inputs=[ct.TensorType(name="input", shape=input_batch.shape)],
)
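The shape above is fixed to the sample image's size. As an illustrative sketch (the bounds and variable names below are placeholders, not part of the original example), coremltools can also accept a range of input sizes using ct.RangeDim:

# Illustrative only: allow any spatial size from 256 to 1024 pixels on each side.
flexible_shape = ct.Shape(shape=(1, 3, ct.RangeDim(256, 1024), ct.RangeDim(256, 1024)))
mlmodel_flexible = ct.convert(
    trace,
    inputs=[ct.TensorType(name="input", shape=flexible_shape)],
)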
  9. Set the model's metadata. This metadata is used when the model is opened in Xcode and integrated into an app.

For this example, set the model's preview type to image segmentation, and list the class labels in the order the model predicts them.

mlmodel.user_defined_metadata['com.apple.coreml.model.preview.type'] = 'imageSegmenter'
mlmodel.user_defined_metadata['com.apple.coreml.model.preview.params'] = 'background,aeroplane,bicycle,bird,boat,bottle,bus,car,cat,chair,cow,dining table,dog,horse,motorbike,person,potted plant,sheep,sofa,train,tv/monitor'
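You can also attach general descriptive metadata; the values below are illustrative placeholders rather than part of the original example:

# Optional descriptive metadata (illustrative values).
mlmodel.short_description = "Semantic segmentation with 21 output classes"
mlmodel.author = "Converted from the TorchVision deeplabv3_resnet101 model"
mlmodel.version = "1.0"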
  10. Get predictions from the Core ML model using the cat and dog image.

To run inference with the Core ML model, create a dictionary that pairs the input name with its NumPy array value.

coreml_inputs = {"input": input_batch.numpy()}
prediction_dictionary = mlmodel.predict(coreml_inputs)
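The converter assigns an automatically generated name to the model's output; in this run it is "3464", which appears in the next step. If you are unsure of the name, list the prediction dictionary's keys:

# The output name is auto-generated by the converter; list the keys to find it.
print(prediction_dictionary.keys())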
  11. Extract the output from the Core ML prediction dictionary and display the results:
coreml_output = torch.from_numpy(prediction_dictionary["3464"]).squeeze(0).argmax(0)
display_segmentation(input_image, coreml_output)
# Check that the converted output is identical to the original output.
assert(torch.all(torch.eq(coreml_output, torch_predictions)))

Congratulations, the output of the Core ML model is identical to the output of the PyTorch model!
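As a final step (an addition to the original example), you would typically save the converted model so it can be added to an Xcode project; the file name is a placeholder:

# Save the Core ML model to disk so it can be dragged into an Xcode project.
mlmodel.save("SegmentationModel.mlmodel")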
