@tokenizer/s3

The tokenizer-s3 module enables seamless integration with Amazon Web Services (AWS) S3, allowing you to read and tokenize data from S3 objects in a streaming fashion. This module extends the functionality of the strtok3 tokenizer by providing support for chunked S3 data access.

Features

- Streaming Support: Efficiently read and tokenize data from Amazon S3 objects using streaming, which is ideal for handling large files without loading them entirely into memory.
- Integration with strtok3: Works with the strtok3 tokenizer to process S3 data streams, making it easy to handle various tokenization tasks.
- Flexible Access: Provides options to configure S3 access, allowing for customized tokenization workflows based on your specific needs.
- Promise-Based API: Uses a promise-based API for easy integration into modern asynchronous workflows.

Installation

npm install @tokenizer/s3

Sponsor

If you appreciate my work and want to support the development of open-source projects like music-metadata, file-type, and listFix(), consider becoming a sponsor or making a small contribution. Your support helps sustain ongoing development and improvements. Become a sponsor to Borewit.

API Documentation

`makeChunkedTokenizerFromS3`

Initialize a tokenizer, with the option for random access, from an Amazon S3 client for use in extracting metadata from media files.

Function Signature

function makeChunkedTokenizerFromS3(s3: S3Client, objRequest: GetObjectRequest): Promise<IRandomAccessTokenizer>

Reads the S3 object in chunks, which allows random access.

Parameters

s3 (S3Client):

The S3 client used to make requests to Amazon S3.

Note: To configure AWS client authentication, see Configuration and credential file settings.
objRequest (GetObjectRequest):

The S3 object request containing details about the S3 object to fetch. This includes properties like the bucket name and object key.
options (IS3Options, optional):

Returns

Promise<IRandomAccessTokenizer>:

A Promise that resolves to an instance of IRandomAccessTokenizer. This tokenizer can be used to extract metadata from the specified media file in the S3 object. It supports random access reads.
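
Because this tokenizer supports random access, a caller can read from near the end of an object without fetching everything before it. The sketch below is only an illustration and not part of this module: `readTail` is a hypothetical helper, and it assumes the tokenizer follows the strtok3 conventions of exposing `fileInfo.size` and accepting a `position` option in `readBuffer`. An in-memory stand-in replaces the real S3 tokenizer so the snippet runs without AWS credentials.

```javascript
// Hypothetical helper (not part of @tokenizer/s3): read the last `length` bytes.
// Assumes strtok3-style `fileInfo.size` and `readBuffer(buffer, { position })`.
async function readTail(tokenizer, length) {
  const size = tokenizer.fileInfo.size;
  if (size === undefined) {
    throw new Error('Object size is unknown; cannot read the tail');
  }
  const start = Math.max(0, size - length);
  const buffer = new Uint8Array(size - start);
  const bytesRead = await tokenizer.readBuffer(buffer, { position: start });
  return buffer.subarray(0, bytesRead);
}

// In-memory stand-in for a random-access tokenizer, for illustration only.
function makeFakeRandomAccessTokenizer(data) {
  return {
    fileInfo: { size: data.length },
    async readBuffer(buffer, options = {}) {
      const position = options.position ?? 0;
      const chunk = data.subarray(position, position + buffer.length);
      buffer.set(chunk);
      return chunk.length;
    },
  };
}
```

With a real tokenizer, a call such as `await readTail(s3Tokenizer, 128)` would, under the same assumptions, request only the trailing 128 bytes, which is where an ID3v1 tag is stored.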

`makeStreamingTokenizerFromS3`

Initialize a tokenizer from an Amazon S3 client for use in extracting metadata from media files.

Function Signature

function makeStreamingTokenizerFromS3(s3: S3Client, objRequest: GetObjectRequest): Promise<ITokenizer>

Reads from the S3 object as a stream.

Parameters

s3 (S3Client):

The S3 client used to make requests to Amazon S3.

Note: To configure AWS client authentication, see Configuration and credential file settings.
objRequest (GetObjectRequest):

The S3 object request containing details about the S3 object to fetch. This includes properties like the bucket name and object key.

Returns

Promise<ITokenizer>:

A Promise that resolves to an instance of ITokenizer. This tokenizer can be used to extract metadata from the specified media file in the S3 object.
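
A streaming tokenizer is consumed front to back. As a rough sketch of what that means for a caller, the hypothetical `readHeader` helper below reads the first bytes and then releases the tokenizer. It assumes the strtok3-style `readBuffer` (with its `mayBeLess` option) and `close` methods; the fake tokenizer exists only so the snippet runs without S3 access.

```javascript
// Hypothetical helper (not part of @tokenizer/s3): read the first `length` bytes
// sequentially, then close the tokenizer so the underlying stream is released.
// Assumes strtok3-style `readBuffer(buffer, options)` and `close()` methods.
async function readHeader(tokenizer, length) {
  const buffer = new Uint8Array(length);
  try {
    // `mayBeLess` tolerates objects shorter than `length`.
    const bytesRead = await tokenizer.readBuffer(buffer, { mayBeLess: true });
    return buffer.subarray(0, bytesRead);
  } finally {
    await tokenizer.close();
  }
}

// In-memory stand-in for a forward-only tokenizer, for illustration only.
function makeFakeStreamingTokenizer(data) {
  let position = 0;
  let closed = false;
  return {
    get closed() {
      return closed;
    },
    async readBuffer(buffer) {
      const chunk = data.subarray(position, position + buffer.length);
      buffer.set(chunk);
      position += chunk.length;
      return chunk.length;
    },
    async close() {
      closed = true;
    },
  };
}
```

Closing in a `finally` block is a design choice here: it releases the tokenizer even when the read throws.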

Compatibility

Version 0.3.0 migrated from CommonJS to a pure ECMAScript Module (ESM). The distributed JavaScript codebase is compliant with the ECMAScript 2020 (11th Edition) standard.

This module requires Node.js ≥ 16. It can also be used in a browser environment when bundled with a module bundler.

For backward compatibility with TypeScript projects that compile to CommonJS, you can use load-esm.

Examples

Determine S3 file type

Determine the file type (based on its content) of a file stored in Amazon S3:

import { fileTypeFromTokenizer } from 'file-type';
import { fromEnv } from '@aws-sdk/credential-providers';
import { S3Client } from '@aws-sdk/client-s3';
import { makeChunkedTokenizerFromS3 } from '@tokenizer/s3';

(async () => {

  // Initialize S3 client
  const s3 = new S3Client({
    region: 'eu-west-2',
    credentials: fromEnv(),
  });

  // Initialize S3 tokenizer
  const s3Tokenizer = await makeChunkedTokenizerFromS3(s3, {
    Bucket: 'affectlab',
    Key: '1min_35sec.mp4'
  });

  // Figure out what kind of file it is
  const fileType = await fileTypeFromTokenizer(s3Tokenizer);
  console.log(fileType);
})();
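
`fileTypeFromTokenizer` resolves to `undefined` when the content matches no known format, so callers usually want to handle that case. A small sketch, where `describeFileType` is a hypothetical helper and the sample values are made up for illustration:

```javascript
// Hypothetical helper: turn a file-type result into a printable label.
// file-type results carry `ext` and `mime`; `undefined` means "not recognized".
function describeFileType(fileType) {
  if (fileType === undefined) {
    return 'unknown file type';
  }
  return `${fileType.ext} (${fileType.mime})`;
}

console.log(describeFileType({ ext: 'mp4', mime: 'video/mp4' })); // → mp4 (video/mp4)
console.log(describeFileType(undefined)); // → unknown file type
```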

Reading audio metadata from Amazon S3

Retrieve audio metadata from an S3 object using music-metadata:

import { makeChunkedTokenizerFromS3 } from '@tokenizer/s3';
import { S3Client } from '@aws-sdk/client-s3';
import { parseFromTokenizer } from 'music-metadata';

/**
 * Retrieve metadata from an Amazon S3 object
 * @param s3 S3 client
 * @param objRequest S3 object request
 * @param options `music-metadata` parse options
 * @return Metadata
 */
async function parseS3Object(s3, objRequest, options) {
  const s3Tokenizer = await makeChunkedTokenizerFromS3(s3, objRequest);
  return parseFromTokenizer(s3Tokenizer, options);
}

(async () => {
  const s3 = new S3Client({});

  const metadata = await parseS3Object(s3, {
    Bucket: 'standing0media',
    Key: '01 Where The Highway Takes Me.mp3'
  });

  console.log(metadata);
})();

A module implementation of this example can be found in @music-metadata/s3.
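
The object returned by `parseFromTokenizer` groups container details under `format` and tag values under `common`. As an illustration only, the hypothetical `summarizeMetadata` helper below picks a few of those fields; the sample object is invented and far smaller than a real result.

```javascript
// Hypothetical helper: reduce a music-metadata result to a few display fields.
function summarizeMetadata(metadata) {
  const { title, artist } = metadata.common;
  const { duration } = metadata.format;
  return {
    title: title ?? '(untitled)',
    artist: artist ?? '(unknown artist)',
    // `duration` is in seconds and may be absent when it cannot be determined.
    durationSeconds: duration === undefined ? undefined : Math.round(duration),
  };
}

// Invented sample data, shaped like the `common` and `format` sections.
const sample = {
  common: { title: 'Example Track' },
  format: { duration: 95.4 },
};
console.log(summarizeMetadata(sample));
```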
