aigve.utils

`LoadVideoFromFile`

Bases: BaseTransform

Load a video from file.

Required Keys:

- video_path_pd

Modified Keys

video_pd

Parameters:

Name	Type	Description	Default
`height`	`int`	int, default is -1 Desired output height of the video, unchanged if `-1` is specified.	`-1`
`width`	`int`	int, default is -1 Desired output width of the video, unchanged if `-1` is specified. See details in: https://github.com/dmlc/decord/blob/master/python/decord/video_reader.py#L18	`-1`

Source code in aigve/utils/loading.py

@TRANSFORMS.register_module()
class LoadVideoFromFile(BaseTransform):
    """Load a video from file.

    Required Keys:

        - video_path_pd

    Modified Keys:
        - video_pd

    Args:
        height: int, default is -1
            Desired output height of the video, unchanged if `-1` is specified.
        width: int, default is -1
            Desired output width of the video, unchanged if `-1` is specified.
            See details in: https://github.com/dmlc/decord/blob/master/python/decord/video_reader.py#L18
    """

    def __init__(self, height: int = -1, width: int = -1):
        self.height = height
        self.width = width


    def transform(self, results: dict) -> Optional[dict]:
        """Functions to load video. 
        Referred to 'https://github.com/Vchitect/VBench/blob/master/vbench/utils.py#L103'

        The function supports loading video in GIF (.gif), PNG (.png), and MP4 (.mp4) formats.
        Depending on the format, it processes and extracts frames accordingly.

        Args:
            results (dict): Result dict from
                :class:`mmengine.dataset.BaseDataset`.

        Returns:
            dict: The dict contains loaded video in shape (F, C, H, W) and 
            meta information if needed. F is the number of frames, C is the 
            number of channels, H is the height, and W is the width.

        Raises:
            - NotImplementedError: If the video format is not supported.

        The function first determines the format of the video file by its extension.
        For GIFs, it iterates over each frame and converts them to RGB.
        For PNGs, it reads the single frame, converts it to RGB.
        For MP4s, it reads the frames using the VideoReader class and converts them to NumPy arrays.
        If a data_transform is provided, it is applied to the buffer before converting it to a tensor.
        Finally, the tensor is permuted to match the expected (F, C, H, W) format.
        """

        video_path = results['video_path_pd']
        if video_path.endswith('.gif'):
            frame_ls = []
            img = Image.open(video_path)
            for frame in ImageSequence.Iterator(img):
                frame = frame.convert('RGB')
                frame = np.array(frame).astype(np.uint8)
                frame_ls.append(frame)
            buffer = np.array(frame_ls).astype(np.uint8) # (F, H, W, C), np.uint8
        elif video_path.endswith('.png'):
            frame = Image.open(video_path)
            frame = frame.convert('RGB')
            frame = np.array(frame).astype(np.uint8)
            frame_ls = [frame]
            buffer = np.array(frame_ls) # (1, H, W, C), np.uint8
        elif video_path.endswith('.mp4'):
            import decord
            decord.bridge.set_bridge('native')
            if self.width and self.height:
                video_reader = VideoReader(video_path, width=self.width, height=self.height, num_threads=1)
            else:
                video_reader = VideoReader(video_path, num_threads=1)
            frames = video_reader.get_batch(range(len(video_reader)))  # (F, H, W, C), torch.uint8
            buffer = frames.asnumpy().astype(np.uint8) # (F, H, W, C), np.uint8
        else:
            raise NotImplementedError

        frames = torch.Tensor(buffer)
        frames = frames.permute(0, 3, 1, 2) # (F, C, H, W), torch.uint8
        results['video_pd'] = frames

        return results

    def __repr__(self):
        repr_str = (f'{self.__class__.__name__}, '
                    f'height={self.height}, '
                    f'width={self.width}')

`transform(results)`

Functions to load video. Referred to 'https://github.com/Vchitect/VBench/blob/master/vbench/utils.py#L103'

The function supports loading video in GIF (.gif), PNG (.png), and MP4 (.mp4) formats. Depending on the format, it processes and extracts frames accordingly.

Parameters:

Name	Type	Description	Default
`results`	`dict`	Result dict from :class:`mmengine.dataset.BaseDataset`.	required

Returns:

Name	Type	Description
`dict`	`Optional[dict]`	The dict contains loaded video in shape (F, C, H, W) and
	`Optional[dict]`	meta information if needed. F is the number of frames, C is the
	`Optional[dict]`	number of channels, H is the height, and W is the width.

Raises:

Type	Description
`-NotImplementedError`	If the video format is not supported.

The function first determines the format of the video file by its extension. For GIFs, it iterates over each frame and converts them to RGB. For PNGs, it reads the single frame, converts it to RGB. For MP4s, it reads the frames using the VideoReader class and converts them to NumPy arrays. If a data_transform is provided, it is applied to the buffer before converting it to a tensor. Finally, the tensor is permuted to match the expected (F, C, H, W) format.

Source code in aigve/utils/loading.py

def transform(self, results: dict) -> Optional[dict]:
    """Functions to load video. 
    Referred to 'https://github.com/Vchitect/VBench/blob/master/vbench/utils.py#L103'

    The function supports loading video in GIF (.gif), PNG (.png), and MP4 (.mp4) formats.
    Depending on the format, it processes and extracts frames accordingly.

    Args:
        results (dict): Result dict from
            :class:`mmengine.dataset.BaseDataset`.

    Returns:
        dict: The dict contains loaded video in shape (F, C, H, W) and 
        meta information if needed. F is the number of frames, C is the 
        number of channels, H is the height, and W is the width.

    Raises:
        - NotImplementedError: If the video format is not supported.

    The function first determines the format of the video file by its extension.
    For GIFs, it iterates over each frame and converts them to RGB.
    For PNGs, it reads the single frame, converts it to RGB.
    For MP4s, it reads the frames using the VideoReader class and converts them to NumPy arrays.
    If a data_transform is provided, it is applied to the buffer before converting it to a tensor.
    Finally, the tensor is permuted to match the expected (F, C, H, W) format.
    """

    video_path = results['video_path_pd']
    if video_path.endswith('.gif'):
        frame_ls = []
        img = Image.open(video_path)
        for frame in ImageSequence.Iterator(img):
            frame = frame.convert('RGB')
            frame = np.array(frame).astype(np.uint8)
            frame_ls.append(frame)
        buffer = np.array(frame_ls).astype(np.uint8) # (F, H, W, C), np.uint8
    elif video_path.endswith('.png'):
        frame = Image.open(video_path)
        frame = frame.convert('RGB')
        frame = np.array(frame).astype(np.uint8)
        frame_ls = [frame]
        buffer = np.array(frame_ls) # (1, H, W, C), np.uint8
    elif video_path.endswith('.mp4'):
        import decord
        decord.bridge.set_bridge('native')
        if self.width and self.height:
            video_reader = VideoReader(video_path, width=self.width, height=self.height, num_threads=1)
        else:
            video_reader = VideoReader(video_path, num_threads=1)
        frames = video_reader.get_batch(range(len(video_reader)))  # (F, H, W, C), torch.uint8
        buffer = frames.asnumpy().astype(np.uint8) # (F, H, W, C), np.uint8
    else:
        raise NotImplementedError

    frames = torch.Tensor(buffer)
    frames = frames.permute(0, 3, 1, 2) # (F, C, H, W), torch.uint8
    results['video_pd'] = frames

    return results

`read_image_detectron2(file_name, format=None)`

Read an image into the given format. Will apply rotation and flipping if the image has such exif information.

Parameters:

Name	Type	Description	Default
`file_name`	`str`	image file path	required
`format`	`str`	one of the supported image modes in PIL, or "BGR" or "YUV-BT.601".	`None`

Returns:

Name	Type	Description
`image`	`ndarray`	an HWC image in the given format, which is 0-255, uint8 for supported image modes in PIL or "BGR"; float (0-1 for Y) for YUV-BT.601.

Source code in aigve/utils/image_reading.py

def read_image_detectron2(file_name, format=None):
    """
    Read an image into the given format.
    Will apply rotation and flipping if the image has such exif information.

    Args:
        file_name (str): image file path
        format (str): one of the supported image modes in PIL, or "BGR" or "YUV-BT.601".

    Returns:
        image (np.ndarray):
            an HWC image in the given format, which is 0-255, uint8 for
            supported image modes in PIL or "BGR"; float (0-1 for Y) for YUV-BT.601.
    """
    try:
        import detectron2
    except ImportError:
        print("detectron2 is not installed. Installing...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", "detectron2"])

        return detectron2.data.detection_utils.read_image(img_src, format="BGR")