Transformers - BERT: Fill the Missing Word
Many of you have probably heard of BERT, or of transformers.
You may also know Hugging Face. In this tutorial, let’s play with its PyTorch transformer model and serve it with Pinferencia.
Pinferencia makes it easy to serve any model with just three extra lines.
Hugging Face makes it easy to use a pretrained model with just a few lines.
How does the model work?
Given an incomplete sentence as input, the model infers the missing word. For example, given “Paris is the [MASK] of France.”, it returns “Paris is the capital of France.”
Cool~let’s try it now~
For Mac users
If you’re working on an M1 Mac like me, you need to install cmake and Rust:
brew install cmake
curl --proto '=https' --tlsv1.2 -sSf <Rust installer script URL> | sh
Install dependencies
You can install dependencies using pip.
pip install tqdm boto3 requests regex sentencepiece sacremoses
Or you can use a Docker image instead:
docker run -it -p 8000:8000 -v $(pwd):/opt/workspace huggingface/transformers-pytorch-cpu:4.18.0 bash
Load the model
This will load the tokenizer and the model. It may take some time to download.
import torch

# load tokenizer
tokenizer = torch.hub.load(
# load masked model
masked_lm_model = torch.hub.load(
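A minimal sketch of the two complete `torch.hub.load` calls, assuming the `huggingface/pytorch-transformers` hub repository with its `tokenizer` and `modelForMaskedLM` entry points. The checkpoint name `bert-base-cased` and the helper name `load_tokenizer_and_model` are assumptions; any BERT checkpoint with a `[MASK]` token should behave the same way.

```python
import torch

# Assumptions: the hub repository and checkpoint name are not given in the text.
HUB_REPO = "huggingface/pytorch-transformers"
CHECKPOINT = "bert-base-cased"


def load_tokenizer_and_model(checkpoint: str = CHECKPOINT):
    """Download (or load from the local cache) a BERT tokenizer and masked-LM model."""
    tokenizer = torch.hub.load(HUB_REPO, "tokenizer", checkpoint)
    masked_lm_model = torch.hub.load(HUB_REPO, "modelForMaskedLM", checkpoint)
    masked_lm_model.eval()  # inference mode: disables dropout
    return tokenizer, masked_lm_model
```

Calling `tokenizer, masked_lm_model = load_tokenizer_and_model()` once at startup keeps the download out of the prediction path.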
Define the predict function
The input text is: Paris is the [MASK] of France.
input_text = "Paris is the [MASK] of France."
First we need to tokenize the input text:
tokens = tokenizer(input_text)
Let’s find the index of the masked token:
mask_index = [
    i
    for i, token_id in enumerate(tokens["input_ids"])
    if token_id == tokenizer.mask_token_id
]
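To see what this lookup does without loading a tokenizer, here is a small sketch in plain Python. The token ids are made up for illustration, and `find_mask_positions` is a hypothetical helper name.

```python
def find_mask_positions(input_ids, mask_token_id):
    """Return every position in input_ids that holds the mask token id."""
    return [i for i, token_id in enumerate(input_ids) if token_id == mask_token_id]


# Made-up ids for illustration only: 1 = [CLS], 2 = [SEP], 9 = [MASK].
# Layout: [CLS] Paris is the [MASK] of France . [SEP]
example_ids = [1, 40, 41, 42, 9, 43, 44, 45, 2]
mask_positions = find_mask_positions(example_ids, mask_token_id=9)
print(mask_positions)  # [4]
```

The result is a list rather than a single index, so a sentence with several `[MASK]` tokens is handled the same way.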
Prepare the tensors and run the model:
segments_tensors = torch.tensor([tokens["token_type_ids"]])
tokens_tensor = torch.tensor([tokens["input_ids"]])

with torch.no_grad():
    predictions = masked_lm_model(
        tokens_tensor, token_type_ids=segments_tensors
    )
Now, let’s have a look at the result:
pred_tokens = torch.argmax(predictions[0][0], dim=1)

# replace the initial input text's mask with the predicted token
for i in mask_index:
    tokens["input_ids"][i] = pred_tokens[i]

tokenizer.decode(tokens["input_ids"], skip_special_tokens=True)
Paris is the capital of France.
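The argmax-and-replace step can be sketched in plain Python to make the logic visible: for each masked position, pick the vocabulary id with the highest score and write it back. The toy vocabulary, the scores, and the `fill_masks` helper are all made up for illustration; the real model scores every position against its full vocabulary.

```python
def fill_masks(input_ids, scores, mask_positions):
    """Replace each masked position with the id of its highest-scoring token.

    scores[pos][token_id] is the model's score for token_id at position pos.
    """
    filled = list(input_ids)  # copy, so the caller's list is left untouched
    for pos in mask_positions:
        row = scores[pos]
        filled[pos] = max(range(len(row)), key=row.__getitem__)  # argmax
    return filled


# Toy vocabulary and made-up scores, for illustration only.
vocab = ["Paris", "is", "the", "[MASK]", "capital", "of", "France"]
input_ids = [0, 1, 2, 3, 5, 6]  # "Paris is the [MASK] of France"
scores = [[0.0] * len(vocab) for _ in input_ids]
scores[3][4] = 5.0  # pretend the model strongly prefers "capital" at the mask

filled = fill_masks(input_ids, scores, mask_positions=[3])
print(" ".join(vocab[i] for i in filled))  # Paris is the capital of France
```

Only the masked positions are overwritten, which is why the rest of the sentence comes back unchanged.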
Let’s organize the code into a predict function:
predict("Paris is the [MASK] of France.")
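The call above relies on a `predict` function that bundles the previous steps. A minimal sketch, assuming the tokenizer and model are passed in explicitly so the block stands on its own. The `make_predict` factory is a hypothetical helper rather than part of the original code, and `.item()` is added so the token list keeps plain integers.

```python
import torch


def make_predict(tokenizer, masked_lm_model):
    """Build a predict(text) function bound to a tokenizer and a masked-LM model."""

    def predict(input_text: str) -> str:
        tokens = tokenizer(input_text)

        # positions of every [MASK] token in the input
        mask_index = [
            i
            for i, token_id in enumerate(tokens["input_ids"])
            if token_id == tokenizer.mask_token_id
        ]

        segments_tensors = torch.tensor([tokens["token_type_ids"]])
        tokens_tensor = torch.tensor([tokens["input_ids"]])

        with torch.no_grad():
            predictions = masked_lm_model(
                tokens_tensor, token_type_ids=segments_tensors
            )

        # best-scoring vocabulary id for each position of the single sentence
        pred_tokens = torch.argmax(predictions[0][0], dim=1)

        # replace each mask with the predicted token id
        for i in mask_index:
            tokens["input_ids"][i] = pred_tokens[i].item()

        return tokenizer.decode(tokens["input_ids"], skip_special_tokens=True)

    return predict
```

With the tokenizer and model loaded earlier, `predict = make_predict(tokenizer, masked_lm_model)` gives a one-argument function like the one called above.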
Serve it through REST API
First, let’s install Pinferencia.
pip install "pinferencia[uvicorn]"
With Pinferencia, just add three more lines and your model goes online!
Never heard of Pinferencia? It’s not too late. Go to its GitHub page to take a look, and don’t forget to give it a star.
Let’s save our predict function into a file `app.py` and add some lines to register it.
Run the service, and wait for it to load the model and start the server:
uvicorn app:service --reload
Test the service
Using curl:
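The same kind of request can also be sketched in Python with only the standard library. The endpoint path `/v1/models/transformer/predict`, the `{"data": ...}` payload shape, the host and port, and the model name `transformer` are assumptions about the service setup, so adjust them to match yours. `build_predict_request` is a hypothetical helper that builds the request without sending it.

```python
import json
import urllib.request


def build_predict_request(
    text, base_url="http://127.0.0.1:8000", model_name="transformer"
):
    """Build (but do not send) a POST request for one prediction."""
    url = f"{base_url}/v1/models/{model_name}/predict"  # assumed endpoint path
    body = json.dumps({"data": text}).encode("utf-8")  # assumed payload shape
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request("Paris is the [MASK] of France.")
print(request.get_method(), request.full_url)

# To send it once the server is running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes it easy to inspect the URL and body before the server is up.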
Cool~~ Not yet, even cooler:
You can use the Swagger UI at the server’s address to try the prediction: