Getting started with Gemini API with NestJS

Introduction

In this blog post, I build several generative AI examples using NestJS and the Gemini API. The examples generate text from 1) a text prompt, 2) a prompt and an image, and 3) a prompt and two images to analyze them. The Google team provided the examples in Node.js, and I ported several of them to NestJS, my favorite framework, which builds on top of the popular Express framework.

Let's go!

Generate Gemini API Key

Go to https://aistudio.google.com/app/apikey to generate an API key for a new or an existing Google Cloud project.

Create a new NestJS Project

nest new nestjs-gemini-api-demo

Install dependencies

npm i --save-exact @google/generative-ai @nestjs/swagger class-transformer class-validator dotenv compression

npm i --save-exact --save-dev @types/multer

Generate a Gemini Module

nest g mo gemini
nest g co gemini/presenters/http/gemini --flat
nest g s gemini/application/gemini --flat

These commands generate a Gemini module, a controller, and a service for the API.

Define Gemini environment variables

The .env.example file defines environment variables for the Gemini API key, the Gemini Pro model, the Gemini Pro Vision model, and the port number.

// .env.example

GEMINI_API_KEY=<google_gemini_api_key>
GEMINI_PRO_MODEL=gemini-pro
GEMINI_PRO_VISION_MODEL=gemini-pro-vision
PORT=3000

Copy .env.example to .env and replace the GEMINI_API_KEY placeholder with the real API key.

Add .env to the .gitignore file to ensure we don't accidentally commit the Gemini API key to the GitHub repo.
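
The entry itself is a single line:

# .gitignore

.env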

Add configuration files

The project has three configuration files. validate.config.ts validates that the request payload is valid before the request is routed to a controller.

// validate.config.ts

import { ValidationPipe } from '@nestjs/common';

export const validateConfig = new ValidationPipe({
  whitelist: true,
  stopAtFirstError: true,
});

env.config.ts extracts the environment variables from process.env and stores the values in the env object.

// env.config.ts

import dotenv from 'dotenv';

dotenv.config();

export const env = {
  PORT: parseInt(process.env.PORT || '3000'),
  GEMINI: {
    KEY: process.env.GEMINI_API_KEY || '',
    PRO_MODEL: process.env.GEMINI_PRO_MODEL || 'gemini-pro',
    PRO_VISION_MODEL: process.env.GEMINI_PRO_VISION_MODEL || 'gemini-pro-vision',
  },
};

gemini.config.ts defines the generation and safety options for the Gemini API.

// gemini.config.ts

import { GenerationConfig, HarmBlockThreshold, HarmCategory, SafetySetting } from '@google/generative-ai';

export const GENERATION_CONFIG: GenerationConfig = { maxOutputTokens: 1024, temperature: 1, topK: 32, topP: 1 };

export const SAFETY_SETTINGS: SafetySetting[] = [
  {
    category: HarmCategory.HARM_CATEGORY_HATE_SPEECH,
    threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
  },
  {
    category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
    threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
  },
  {
    category: HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
    threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
  },
  {
    category: HarmCategory.HARM_CATEGORY_HARASSMENT,
    threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
  },
];

Bootstrap the application

// main.ts

import { NestFactory } from '@nestjs/core';
import { NestExpressApplication } from '@nestjs/platform-express';
import { DocumentBuilder, SwaggerModule } from '@nestjs/swagger';
import compression from 'compression';
import express from 'express';
import { AppModule } from './app.module';
import { env } from '~configs/env.config';
import { validateConfig } from '~configs/validate.config';

function setupSwagger(app: NestExpressApplication) {
  const config = new DocumentBuilder()
    .setTitle('Gemini example')
    .setDescription('The Gemini API description')
    .setVersion('1.0')
    .addTag('google gemini')
    .build();
  const document = SwaggerModule.createDocument(app, config);
  SwaggerModule.setup('api', app, document);
}

async function bootstrap() {
  const app = await NestFactory.create<NestExpressApplication>(AppModule);
  app.enableCors();
  app.useGlobalPipes(validateConfig);
  app.use(express.json({ limit: '1000kb' }));
  app.use(express.urlencoded({ extended: false }));
  app.use(compression());
  setupSwagger(app);
  await app.listen(env.PORT);
}
bootstrap();

The bootstrap function registers middleware on the application, sets up the Swagger documentation, and applies a global validation pipe to incoming payloads.

I have laid the groundwork; the next step is to add routes that receive generative AI inputs and generate text.

Example 1: Generate text from a prompt

// generate-text.dto.ts

import { ApiProperty } from '@nestjs/swagger';
import { IsNotEmpty, IsString } from 'class-validator';

export class GenerateTextDto {
  @ApiProperty({
    name: 'prompt',
    description: 'prompt of the question',
    type: 'string',
    required: true,
  })
  @IsNotEmpty()
  @IsString()
  prompt: string;
}

The DTO accepts a text prompt to generate text.

// gemini.constant.ts

export const GEMINI_PRO_MODEL = 'GEMINI_PRO_MODEL';
export const GEMINI_PRO_VISION_MODEL = 'GEMINI_PRO_VISION_MODEL';

// gemini.provider.ts

import { GenerativeModel, GoogleGenerativeAI } from '@google/generative-ai';
import { Provider } from '@nestjs/common';
import { env } from '~configs/env.config';
import { GENERATION_CONFIG, SAFETY_SETTINGS } from '~configs/gemini.config';
import { GEMINI_PRO_MODEL, GEMINI_PRO_VISION_MODEL } from './gemini.constant';

export const GeminiProModelProvider: Provider<GenerativeModel> = {
  provide: GEMINI_PRO_MODEL,
  useFactory: () => {
    const genAI = new GoogleGenerativeAI(env.GEMINI.KEY);
    return genAI.getGenerativeModel({
      model: env.GEMINI.PRO_MODEL,
      generationConfig: GENERATION_CONFIG,
      safetySettings: SAFETY_SETTINGS,
    });
  },
};

export const GeminiProVisionModelProvider: Provider<GenerativeModel> = {
  provide: GEMINI_PRO_VISION_MODEL,
  useFactory: () => {
    const genAI = new GoogleGenerativeAI(env.GEMINI.KEY);
    return genAI.getGenerativeModel({
      model: env.GEMINI.PRO_VISION_MODEL,
      generationConfig: GENERATION_CONFIG,
      safetySettings: SAFETY_SETTINGS,
    });
  },
};

I define two providers that supply the Gemini Pro model and the Gemini Pro Vision model, respectively. I can then inject them into the Gemini service.
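
The module file is not shown in the post, but the controller, service, and providers must be registered in the GeminiModule for the injections to resolve. A minimal sketch, assuming the file paths produced by the nest g commands above and a gemini.provider.ts next to the module:

// gemini.module.ts

import { Module } from '@nestjs/common';
import { GeminiController } from './presenters/http/gemini.controller';
import { GeminiService } from './application/gemini.service';
import { GeminiProModelProvider, GeminiProVisionModelProvider } from './gemini.provider';

@Module({
  controllers: [GeminiController],
  // register the model providers so the service can inject them by token
  providers: [GeminiService, GeminiProModelProvider, GeminiProVisionModelProvider],
})
export class GeminiModule {}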

// content.helper.ts

import { Content, Part } from '@google/generative-ai';

export function createContent(text: string, ...images: Express.Multer.File[]): Content[] {
  const imageParts: Part[] = images.map((image) => {
    return {
      inlineData: {
        mimeType: image.mimetype,
        data: image.buffer.toString('base64'),
      },
    };
  });

  return [
    {
      role: 'user',
      parts: [
        ...imageParts,
        {
          text,
        },
      ],
    },
  ];
}

createContent is a helper function that builds the Content array for the model, converting each uploaded image into a base64-encoded inline data part.

// gemini.service.ts

// ... omit the import statements to save space

@Injectable()
export class GeminiService {

  constructor(
    @Inject(GEMINI_PRO_MODEL) private readonly proModel: GenerativeModel,
    @Inject(GEMINI_PRO_VISION_MODEL) private readonly proVisionModel: GenerativeModel,
  ) {}

  async generateText(prompt: string): Promise<GenAiResponse> {
    const contents = createContent(prompt);

    const { totalTokens } = await this.proModel.countTokens({ contents });
    const result = await this.proModel.generateContent({ contents });
    const response = await result.response;
    const text = response.text();

    return { totalTokens, text };
  }
  
  ...
}

The generateText method accepts a prompt and calls the Gemini API to generate text. It returns the total token count and the generated text to the controller.
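
The GenAiResponse type is not shown in the post; judging from the fields returned here, a minimal definition (with a hypothetical file name) could be:

// gen-ai-response.interface.ts (hypothetical file name)

// Shape inferred from the service methods: total token count plus the generated text.
export interface GenAiResponse {
  totalTokens: number;
  text: string;
}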

// gemini.controller.ts

// omit the import statements to save space

@ApiTags('Gemini')
@Controller('gemini')
export class GeminiController {
  constructor(private service: GeminiService) {}

  @ApiBody({
    description: 'Prompt',
    required: true,
    type: GenerateTextDto,
  })
  @Post('text')
  generateText(@Body() dto: GenerateTextDto): Promise<GenAiResponse> {
    return this.service.generateText(dto.prompt);
  }

  ... other routes....
}
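
With the application running locally on port 3000, a quick way to try this route is a curl request (the prompt value is just an example):

curl -X POST http://localhost:3000/gemini/text \
  -H 'Content-Type: application/json' \
  -d '{ "prompt": "What is NestJS?" }'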

Example 2: Generate text from a prompt and an image

This example requires both a prompt and an image file.

// gemini.service.ts

// ... omit the import statements to save space

@Injectable()
export class GeminiService {

  constructor(
    @Inject(GEMINI_PRO_MODEL) private readonly proModel: GenerativeModel,
    @Inject(GEMINI_PRO_VISION_MODEL) private readonly proVisionModel: GenerativeModel,
  ) {}

 ... other methods ...


  async generateTextFromMultiModal(prompt: string, file: Express.Multer.File): Promise<GenAiResponse> {
    try {
      const contents = createContent(prompt, file);

      const { totalTokens } = await this.proVisionModel.countTokens({ contents });
      const result = await this.proVisionModel.generateContent({ contents });
      const response = await result.response;
      const text = response.text();

      return { totalTokens, text };
    } catch (err) {
      if (err instanceof Error) {
        throw new InternalServerErrorException(err.message, err.stack);
      }
      throw err;
    }
  }
}

// file-validator.pipe.ts

import { FileTypeValidator, MaxFileSizeValidator, ParseFilePipe } from '@nestjs/common';

export const fileValidatorPipe = new ParseFilePipe({
  validators: [
    new MaxFileSizeValidator({ maxSize: 1 * 1024 * 1024 }),
    new FileTypeValidator({ fileType: /image\/(jpeg|png)/ }),
  ],
});

The fileValidatorPipe validates that the uploaded file is either a JPEG or a PNG and that it does not exceed 1 MB.

// gemini.controller.ts

  @ApiConsumes('multipart/form-data')
  @ApiBody({
    schema: {
      type: 'object',
      properties: {
        prompt: {
          type: 'string',
          description: 'Prompt',
        },
        file: {
          type: 'string',
          format: 'binary',
          description: 'Binary file',
        },
      },
    },
  })
  @Post('text-and-image')
  @UseInterceptors(FileInterceptor('file'))
  async generateTextFromMultiModal(
    @Body() dto: GenerateTextDto,
    @UploadedFile(fileValidatorPipe)
    file: Express.Multer.File,
  ): Promise<GenAiResponse> {
    return this.service.generateTextFromMultiModal(dto.prompt, file);
  }
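
Assuming a local image named cat.jpg (a made-up file name), the endpoint can be exercised with a multipart request such as:

curl -X POST http://localhost:3000/gemini/text-and-image \
  -F 'prompt=Describe this image' \
  -F 'file=@cat.jpg;type=image/jpeg'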

Example 3: Analyze two images

This example is similar to example 2, except that it requires a prompt and two images to compare and contrast.

// gemini.service.ts

async analyzeImages({ prompt, firstImage, secondImage }: AnalyzeImage): Promise<GenAiResponse> {
    try {
      const contents = createContent(prompt, firstImage, secondImage);

      const { totalTokens } = await this.proVisionModel.countTokens({ contents });
      const result = await this.proVisionModel.generateContent({ contents });
      const response = await result.response;
      const text = response.text();

      return { totalTokens, text };
    } catch (err) {
      if (err instanceof Error) {
        throw new InternalServerErrorException(err.message, err.stack);
      }
      throw err;
    }
  }

// gemini.controller.ts

  @ApiConsumes('multipart/form-data')
  @ApiBody({
    schema: {
      type: 'object',
      properties: {
        prompt: {
          type: 'string',
          description: 'Prompt',
        },
        first: {
          type: 'string',
          format: 'binary',
          description: 'Binary file',
        },
        second: {
          type: 'string',
          format: 'binary',
          description: 'Binary file',
        },
      },
    },
  })
  @Post('analyse-the-images')
  @UseInterceptors(
    FileFieldsInterceptor([
      { name: 'first', maxCount: 1 },
      { name: 'second', maxCount: 1 },
    ]),
  )
  async analyseImages(
    @Body() dto: GenerateTextDto,
    @UploadedFiles()
    files: {
      first?: Express.Multer.File[];
      second?: Express.Multer.File[];
    },
  ): Promise<GenAiResponse> {
    if (!files.first?.length) {
      throw new BadRequestException('The first image is missing');
    }

    if (!files.second?.length) {
      throw new BadRequestException('The second image is missing');
    }
    return this.service.analyzeImages({ prompt: dto.prompt, firstImage: files.first[0], secondImage: files.second[0] });
  }
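
Similarly, with two hypothetical local images first.png and second.png:

curl -X POST http://localhost:3000/gemini/analyse-the-images \
  -F 'prompt=Compare and contrast these two images' \
  -F 'first=@first.png;type=image/png' \
  -F 'second=@second.png;type=image/png'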

Test the endpoints

I can test the endpoints with Postman or the Swagger UI after starting the application:

npm run start:dev

The Swagger documentation is available at http://localhost:3000/api.

(Bonus) Deploy to Google Cloud Run

Install the gcloud CLI on the machine according to the official documentation. On my machine, the installation path is ~/google-cloud-sdk.

Then, I open a new terminal and change to the root of the project. On the command line, I set the environment variables as part of the deployment:

$ ~/google-cloud-sdk/bin/gcloud run deploy \
  --update-env-vars GEMINI_API_KEY=<replace with your own key>,GEMINI_PRO_MODEL=gemini-pro,GEMINI_PRO_VISION_MODEL=gemini-pro-vision

If the deployment is successful, the NestJS application will run on Google Cloud Run.

This is the end of the blog post on getting started with the Gemini API in NestJS. I hope you like the content and continue to follow my learning experience in Angular, NestJS, and other technologies.

Resources:

  1. GitHub repo: https://github.com/railsstudent/nestjs-gemini-api-demo
  2. Node.js Gemini tutorial: https://ai.google.dev/tutorials/node_quickstart
  3. Cloud Run deployment documentation: https://cloud.google.com/run/docs/deploying-source-code