Transformers — Bert: Fill the Missing Word

Apr 16, 2022

Many of you have probably heard of BERT, or of transformers in general.
You may also know Hugging Face. In this tutorial, let’s play with its PyTorch transformer model and serve it with Pinferencia.

Pinferencia makes it super easy to serve any model with just three extra lines.
Hugging Face makes it easy to use a pretrained model with just a few lines.

How does the model work?

Given an incomplete sentence as input, the model infers the missing word. For example, given "Paris is the [MASK] of France.", it fills in "capital".
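Conceptually, a masked language model scores every word in its vocabulary for the blank, and the best-scoring word wins. The toy sketch below mimics only that selection step, using hand-picked, made-up scores. The names `toy_scores`, `softmax` and `fill_mask` are illustrative and are not part of BERT or Hugging Face.

```python
import math

# Made-up scores for a handful of candidate words (NOT real BERT outputs).
toy_scores = {"capital": 9.0, "heart": 5.5, "center": 5.0, "city": 4.0}


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - top) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


def fill_mask(sentence, scores, mask="[MASK]"):
    """Replace the first mask with the highest-probability candidate."""
    probs = softmax(scores)
    best = max(probs, key=probs.get)
    return sentence.replace(mask, best, 1), probs[best]


filled, prob = fill_mask("Paris is the [MASK] of France.", toy_scores)
print(filled)  # Paris is the capital of France.
```

The real model does the same kind of pick, only over its full vocabulary and with scores computed from the surrounding words.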

Cool~let’s try it now~

Prerequisites

For Mac users

If you’re working on an M1 Mac like me, you need to install cmake and rust.

brew install cmake
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Install dependencies

You can install dependencies using pip.

pip install tqdm boto3 requests regex sentencepiece sacremoses

Or you can use a Docker image instead:

docker run -it -p 8000:8000 -v $(pwd):/opt/workspace huggingface/transformers-pytorch-cpu:4.18.0 bash

Load the model

This will load the tokenizer and the model. It may take some time to download.

import torch

# load tokenizer
tokenizer = torch.hub.load(
    "huggingface/pytorch-transformers",
    "tokenizer",
    "bert-base-cased",
)
# load masked model
masked_lm_model = torch.hub.load(
    "huggingface/pytorch-transformers",
    "modelForMaskedLM",
    "bert-base-cased",
)

Define the predict function

The input text is: Paris is the [MASK] of France.

input_text = "Paris is the [MASK] of France."

First we need to tokenize the input text:

tokens = tokenizer(input_text)

Let’s find the index of the masked token:

mask_index = [
    i
    for i, token_id in enumerate(tokens["input_ids"])
    if token_id == tokenizer.mask_token_id
]

Prepare the tensors:

segments_tensors = torch.tensor([tokens["token_type_ids"]])
tokens_tensor = torch.tensor([tokens["input_ids"]])

Predict:

with torch.no_grad():
    predictions = masked_lm_model(
        tokens_tensor, token_type_ids=segments_tensors
    )

Now, let’s have a look at the result:

pred_tokens = torch.argmax(predictions[0][0], dim=1)

# replace the initial input text's mask with the predicted token
# (.item() turns the 0-dim tensor into a plain int)
for i in mask_index:
    tokens["input_ids"][i] = pred_tokens[i].item()

tokenizer.decode(tokens["input_ids"], skip_special_tokens=True)

Output:

Paris is the capital of France.

Let’s organize the code into a predict function:
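Putting the steps above together, a `predict` function might look like the sketch below. It is one possible arrangement, not the only one. `load_bert` and `fill_masks` are helper names introduced here for illustration, and the lazy, cached loading via `lru_cache` is a choice of this sketch so that the model is downloaded only once, on the first call.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def load_bert(name="bert-base-cased"):
    """Load the tokenizer and masked-LM model once and cache them."""
    import torch  # imported lazily so that importing this file stays cheap

    repo = "huggingface/pytorch-transformers"
    tokenizer = torch.hub.load(repo, "tokenizer", name)
    model = torch.hub.load(repo, "modelForMaskedLM", name)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def fill_masks(input_ids, predicted_ids, mask_token_id):
    """Return a copy of input_ids with each mask id replaced by the
    predicted id at the same position."""
    return [
        pred if tok == mask_token_id else tok
        for tok, pred in zip(input_ids, predicted_ids)
    ]


def predict(text):
    """Fill every [MASK] in `text` with the model's top-scoring token."""
    import torch

    tokenizer, model = load_bert()
    tokens = tokenizer(text)
    tokens_tensor = torch.tensor([tokens["input_ids"]])
    segments_tensors = torch.tensor([tokens["token_type_ids"]])

    with torch.no_grad():
        predictions = model(tokens_tensor, token_type_ids=segments_tensors)

    # predictions[0] holds the logits: (batch, sequence, vocabulary)
    predicted_ids = torch.argmax(predictions[0][0], dim=1).tolist()
    filled = fill_masks(
        tokens["input_ids"], predicted_ids, tokenizer.mask_token_id
    )
    return tokenizer.decode(filled, skip_special_tokens=True)
```

Keeping the mask replacement in its own small function makes that step easy to check without downloading the model.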

Run:

predict(“Paris is the [MASK] of France.”)

Output:

Paris is the capital of France.

Serve it through REST API

First, let’s install Pinferencia.

pip install "pinferencia[uvicorn]"

With Pinferencia, just add three more lines and your model goes online!

Never heard of Pinferencia? It’s not too late. Go to its GitHub page to take a look. Don’t forget to give it a star.

Let’s save our predict function into a file `app.py` and add some lines to register it.

Run the service, and wait for it to load the model and start the server:

uvicorn app:service --reload

Test the service

Using curl:
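The same kind of request can also be sent from Python with only the standard library. The sketch below rests on two assumptions that you should adapt to your own setup: that the service exposes Pinferencia's default route `/v1/models/<model_name>/predict` and accepts a JSON body with the input under `data`, and that the model was registered under the hypothetical name `bert`. `build_request` and `ask` are helper names made up for this example.

```python
import json
from urllib import request

# Assumptions: default Pinferencia route, hypothetical model name "bert".
URL = "http://127.0.0.1:8000/v1/models/bert/predict"


def build_request(text, url=URL):
    """Build a JSON POST request carrying the sentence under "data"."""
    body = json.dumps({"data": text}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(text, url=URL):
    """Send the request and return the decoded JSON response."""
    with request.urlopen(build_request(text, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `ask("Paris is the [MASK] of France.")` should return the service's JSON reply, which is expected to contain the completed sentence.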

Response:

Cool~~ Not yet, even cooler:

You can use the Swagger UI at http://127.0.0.1:8000 (the server’s address) to try the prediction.