How does the coordinate frame of the transformation matrix defined? #1199

AIBluefisher · 2022-05-15T14:20:54Z

❓ How is the coordinate frame of the transformation matrix defined?

Hi, I noticed that the nerf project uses a different coordinate frame from the original NeRF, which uses the OpenGL coordinate frame. However, I have trouble figuring out how the transformation matrices are derived from the original ones provided in the JSON files. For example, when I print the camera parameters of the Fern dataset, the transformation matrix of the first camera in the original file is:

            "transform_matrix": [
                [
                    -0.9999021887779236,
                    0.004192245192825794,
                    -0.013345719315111637,
                    -0.05379832163453102
                ],
                [
                    -0.013988681137561798,
                    -0.2996590733528137,
                    0.95394366979599,
                    3.845470428466797
                ],
                [
                    -4.656612873077393e-10,
                    0.9540371894836426,
                    0.29968830943107605,
                    1.2080823183059692
                ],
                [
                    0.0,
                    0.0,
                    0.0,
                    1.0
                ]
            ]

However, the nerf project gives the transformation matrix as:

R: tensor([[[ 9.9990e-01,  4.1922e-03,  1.3346e-02],
         [ 1.3989e-02, -2.9966e-01, -9.5394e-01],
         [ 4.6566e-10,  9.5404e-01, -2.9969e-01]]])
T: tensor([[-3.7253e-09,  1.1101e-07,  4.0311e+00]])

AIBluefisher · 2022-05-15T16:13:39Z

It's clear that the original NeRF uses the convention that camera poses unproject image points to the world (camera-to-world), whereas in PyTorch3D the camera poses project a point from the world to the camera. Thus, we first need to invert the original transformation matrix, then apply a transformation that aligns the OpenGL coordinate frame with PyTorch3D's coordinate frame. It's easy to see that the rotation for this is [[-1, 0, 0], [0, 1, 0], [0, 0, -1]], which flips the x- and z-axes, so that the z-axis points forward, the x-axis points left, and the y-axis points up. But I still cannot obtain the correct translation. If the translation is zero, I obtain:

[[ 9.9990219e-01  1.3988681e-02  4.6566129e-10 -5.6255717e-10]
 [ 4.1922452e-03 -2.9965907e-01  9.5403719e-01  2.3841858e-07]
 [ 1.3345719e-02 -9.5394367e-01 -2.9968831e-01  4.0311279e+00]
 [ 0.0000000e+00  0.0000000e+00  0.0000000e+00  1.0000000e+00]]

The translation part (the last column above) differs from the expected one in both the x and y components, since the correct translation is [-3.7253e-09, 1.1101e-07, 4.0311e+00]. Now the question is: where does the translation come from?
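The two steps described in this comment (invert the camera-to-world pose, then flip the x- and z-axes) can be sketched in plain NumPy. This is a minimal sketch, not the code PyTorch3D itself uses: the helper name `opengl_c2w_to_pytorch3d` is hypothetical, and the final transpose of the rotation assumes the row-vector convention discussed later in the thread.

```python
import numpy as np

# Camera-to-world pose of the first Fern camera, copied from the JSON above.
c2w = np.array([
    [-0.9999021887779236, 0.004192245192825794, -0.013345719315111637, -0.05379832163453102],
    [-0.013988681137561798, -0.2996590733528137, 0.95394366979599, 3.845470428466797],
    [-4.656612873077393e-10, 0.9540371894836426, 0.29968830943107605, 1.2080823183059692],
    [0.0, 0.0, 0.0, 1.0],
])

# Flips the x and z camera axes (the matrix quoted in this thread).
FLIP_XZ = np.diag([-1.0, 1.0, -1.0, 1.0])


def opengl_c2w_to_pytorch3d(c2w):
    """Hypothetical helper: OpenGL-style camera-to-world pose -> (R, T)."""
    # Step 1: invert to get world-to-camera; step 2: flip the camera axes.
    w2c = FLIP_XZ @ np.linalg.inv(c2w)
    # Assumption: R is stored transposed because points are row vectors.
    R = w2c[:3, :3].T
    T = w2c[:3, 3]
    return R, T


R, T = opengl_c2w_to_pytorch3d(c2w)
print(np.round(R, 4))
print(T)
```

With this pose, the x and y components of `T` come out many orders of magnitude smaller than the z component, so small differences between two such results in x and y may simply be floating-point rounding rather than a real offset.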

ricshaw · 2022-05-17T16:02:34Z

Any update on this? I'm struggling to get the correct translation too

AIBluefisher · 2022-05-18T01:26:08Z

After reading the documentation that introduces the camera coordinate system, I still can't obtain the correct translation on the lego dataset. However, on the fern dataset, by simply left-multiplying the original poses by the same transformation matrix [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]], I get the same poses as those downloaded from PyTorch3D. It's really weird.
Besides, I wonder why PyTorch3D uses the focal length in world units. Also, the principal points are all [0, 0], since we often assume the origin of the local camera system is at the center of the image.

ricshaw · 2022-05-18T12:01:13Z

That is really weird. I wonder if it has anything to do with the lego data being 360 degrees vs. the fern data, which is frontal? Also, I noticed that the rotation matrix of the lego data does not have a determinant of exactly 1, whereas the fern rotation matrix does.
Going back to your solution, the rotation matrix needs an additional transpose at the end to make it the same as PyTorch3D's, right?
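The determinant observation can be checked numerically. The sketch below uses only the Fern rotation quoted earlier in this thread (the lego matrices are not shown here), and the variable names are illustrative. It reports the determinant together with how far the matrix is from being orthonormal.

```python
import numpy as np

# Rotation block of the first Fern camera, copied from the JSON quoted in this thread.
R = np.array([
    [-0.9999021887779236, 0.004192245192825794, -0.013345719315111637],
    [-0.013988681137561798, -0.2996590733528137, 0.95394366979599],
    [-4.656612873077393e-10, 0.9540371894836426, 0.29968830943107605],
])

det = np.linalg.det(R)
# Largest deviation of R^T R from the identity; zero for an exactly orthonormal matrix.
ortho_err = np.abs(R.T @ R - np.eye(3)).max()

print(f"det(R) = {det:.8f}, max |R^T R - I| = {ortho_err:.2e}")
```

Running the same check on a pose from the lego dataset would show whether its deviation from 1 is at rounding level or something larger.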

AIBluefisher · 2022-05-18T12:53:57Z

I still cannot figure out why, but I'll check the poses one by one later.
As for my solution: yes, we need to transpose the poses at the end, because PyTorch3D stores the poses in Transform3d, which assumes points are row vectors (so its matrix is the transpose of the usual layout), while the original poses are stored in the column-vector form. You can also refer to the docstring of the Transform3d class:

    This class assumes that transformations are applied on inputs which
    are row vectors. The internal representation of the Nx4x4 transformation
    matrix is of the form:

    .. code-block:: python

        M = [
                [Rxx, Ryx, Rzx, 0],
                [Rxy, Ryy, Rzy, 0],
                [Rxz, Ryz, Rzz, 0],
                [Tx,  Ty,  Tz,  1],
            ]

    To apply the transformation to points which are row vectors, the M matrix
    can be pre multiplied by the points:

    .. code-block:: python

        points = [[0, 1, 2]]  # (1 x 3) xyz coordinates of a point
        transformed_points = points * M
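The relationship between the two layouts can be illustrated with plain NumPy. This is a sketch with a made-up rotation and translation, not PyTorch3D code. Note that the `*` in the docstring denotes a matrix product, which is written `@` in NumPy.

```python
import numpy as np

# Hypothetical transform in the column-vector convention:
# a 90-degree rotation about z followed by a translation.
R_col = np.array([
    [0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
])
t = np.array([1.0, 2.0, 3.0])

M_col = np.eye(4)
M_col[:3, :3] = R_col
M_col[:3, 3] = t  # translation in the last column

# Row-vector layout as in the docstring: rotation transposed,
# translation in the last row.
M_row = M_col.T

point = np.array([0.0, 1.0, 2.0, 1.0])  # homogeneous xyz1

col_result = M_col @ point  # column-vector convention: M * p
row_result = point @ M_row  # row-vector convention: p * M

print(col_result, row_result)
```

Both products give the same transformed point, which is why a pose computed in the column-vector form needs one transpose before it matches the row-vector storage.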

github-actions · 2022-06-18T05:31:59Z

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions · 2022-06-23T05:35:25Z

This issue was closed because it has been stalled for 5 days with no activity.

github-actions bot added the Stale label Jun 18, 2022

github-actions bot closed this as completed Jun 23, 2022

3a1b2c3 mentioned this issue Jan 5, 2023

Add opengl camera conversion to utils, update camera_conversion_util to FoVPerspectiveCameras #1413

Open
