How is the coordinate frame of the transformation matrix defined? #1199

Closed
AIBluefisher opened this issue May 15, 2022 · 7 comments
❓ How is the coordinate frame of the transformation matrix defined?

Hi, I noticed that the nerf project uses a different coordinate frame from the original NeRF, which uses the OpenGL coordinate frame. However, I have trouble figuring out how the transformation matrices are converted from the original ones provided in the json files. For example, when I print the camera parameters of the Fern dataset, the transformation matrix of the first camera in the original file is:

            "transform_matrix": [
                [
                    -0.9999021887779236,
                    0.004192245192825794,
                    -0.013345719315111637,
                    -0.05379832163453102
                ],
                [
                    -0.013988681137561798,
                    -0.2996590733528137,
                    0.95394366979599,
                    3.845470428466797
                ],
                [
                    -4.656612873077393e-10,
                    0.9540371894836426,
                    0.29968830943107605,
                    1.2080823183059692
                ],
                [
                    0.0,
                    0.0,
                    0.0,
                    1.0
                ]
            ]

However, the nerf project gives the transformation matrix as:

R: tensor([[[ 9.9990e-01,  4.1922e-03,  1.3346e-02],
         [ 1.3989e-02, -2.9966e-01, -9.5394e-01],
         [ 4.6566e-10,  9.5404e-01, -2.9969e-01]]])
T: tensor([[-3.7253e-09,  1.1101e-07,  4.0311e+00]])

AIBluefisher commented May 15, 2022

The original NeRF uses the convention that a camera pose maps points from the camera frame to the world, whereas in PyTorch3D the camera pose projects a point from the world to the camera. Thus, we first need to invert the original transformation matrix, then apply a transformation that aligns the OpenGL coordinate frame with PyTorch3D's coordinate frame. The aligning rotation matrix is easy to obtain: [[-1, 0, 0], [0, 1, 0], [0, 0, -1]]. It flips the x- and z-axes of the OpenGL camera frame, so the z-axis points forward, the x-axis points left, and the y-axis points up when looking from the camera towards the scene. But I still cannot obtain the correct translation. Using this rotation with zero translation for the alignment, I obtained:

[[ 9.9990219e-01  1.3988681e-02  4.6566129e-10 -5.6255717e-10]
 [ 4.1922452e-03 -2.9965907e-01  9.5403719e-01  2.3841858e-07]
 [ 1.3345719e-02 -9.5394367e-01 -2.9968831e-01  4.0311279e+00]
 [ 0.0000000e+00  0.0000000e+00  0.0000000e+00  1.0000000e+00]]

The translation part (the last column above) differs in both the x and y components from the correct translation, which is [-3.7253e-09, 1.1101e-07, 4.0311e+00]. Now the question is: where does the translation come from?
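
The conversion described above can be sketched in NumPy as a sanity check. This is a minimal sketch, not the loader's actual code: it assumes the json pose is camera-to-world, that the axis flip is diag(-1, 1, -1), and that the final rotation is transposed for a row-vector convention (X_cam = X_world @ R + T). Under those assumptions it reproduces the R and T quoted earlier up to rounding. The x and y translation components stay at floating-point noise level (around 1e-7 or smaller) in both results.

```python
import numpy as np

# Camera-to-world pose copied from the "transform_matrix" entry quoted above.
c2w = np.array([
    [-0.9999021887779236, 0.004192245192825794, -0.013345719315111637, -0.05379832163453102],
    [-0.013988681137561798, -0.2996590733528137, 0.95394366979599, 3.845470428466797],
    [-4.656612873077393e-10, 0.9540371894836426, 0.29968830943107605, 1.2080823183059692],
    [0.0, 0.0, 0.0, 1.0],
])

# Assumed axis flip from the OpenGL camera frame (negates x and z).
flip = np.diag([-1.0, 1.0, -1.0, 1.0])

# Invert to get world-to-camera, then flip the camera axes.
w2c = flip @ np.linalg.inv(c2w)

# Row-vector convention: transpose the rotation block, keep the translation.
R = w2c[:3, :3].T
T = w2c[:3, 3]

print(np.round(R, 4))
print(np.round(T, 4))
```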

ricshaw commented May 17, 2022

Any update on this? I'm struggling to get the correct translation too.

AIBluefisher commented May 18, 2022

After reading the documentation that introduces the camera coordinate system, I still can't obtain the correct translation on the lego dataset. However, on the fern dataset, simply left-multiplying the original poses by the same transformation matrix [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]] gives the same poses as those downloaded from PyTorch3D. It's really weird.
Besides, I wonder why PyTorch3D uses the focal length in world units. The principal points are all [0, 0], presumably because the origin of the local camera system is assumed to be at the center of the image.
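
One possible reading, which the thread does not confirm, is that the intrinsics are expressed in PyTorch3D's NDC convention rather than in pixels. The sketch below uses the screen-to-NDC relation as given in the PyTorch3D camera documentation, to the best of my knowledge. The helper name `screen_to_ndc` and the numeric values are hypothetical. With a principal point at the image center, the NDC principal point comes out as (0, 0).

```python
def screen_to_ndc(focal_px, principal_px, image_size):
    """Convert pixel-space intrinsics to NDC-style intrinsics.

    image_size is (width, height). The scale is half of the shorter
    image side (assumed from the PyTorch3D camera documentation).
    """
    width, height = image_size
    scale = min(width, height) / 2.0
    focal_ndc = focal_px / scale
    # The sign flip reflects the +X left, +Y up NDC axes.
    px_ndc = (width / 2.0 - principal_px[0]) / scale
    py_ndc = (height / 2.0 - principal_px[1]) / scale
    return focal_ndc, (px_ndc, py_ndc)


# Hypothetical values: 800x800 image, 1111-pixel focal length,
# principal point at the image center.
focal_ndc, principal_ndc = screen_to_ndc(1111.0, (400.0, 400.0), (800, 800))
print(focal_ndc, principal_ndc)
```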

ricshaw commented May 18, 2022

That is really weird. I wonder if it has anything to do with the lego data being 360 degrees vs the fern data which is frontal?... Also I noticed that the rotation matrix of the lego data does not have a determinant of exactly 1, whereas the fern rotation matrix does.
Going back to your solution, the rotation matrix needs an additional transpose at the end to make it the same as pytorch3d right?
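
The determinant observation can be checked numerically. As a minimal sketch, the hypothetical helper `rotation_error` below reports the determinant and the largest deviation of R Rᵀ from the identity. It is applied here to the rotation block of the `transform_matrix` quoted at the top of the thread; the same check could be run on any other pose.

```python
import numpy as np


def rotation_error(R):
    """Return (determinant, max |R @ R.T - I|) for a 3x3 matrix."""
    R = np.asarray(R, dtype=np.float64)
    det = np.linalg.det(R)
    ortho_err = np.abs(R @ R.T - np.eye(3)).max()
    return det, ortho_err


# Rotation block of the transform_matrix quoted at the top of the thread.
R_quoted = np.array([
    [-0.9999021887779236, 0.004192245192825794, -0.013345719315111637],
    [-0.013988681137561798, -0.2996590733528137, 0.95394366979599],
    [-4.656612873077393e-10, 0.9540371894836426, 0.29968830943107605],
])

det, ortho_err = rotation_error(R_quoted)
print(det, ortho_err)
```

A determinant near +1 with a small orthogonality error indicates a proper rotation. A value near -1 would indicate a reflection instead.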

@AIBluefisher (Author)

> That is really weird. I wonder if it has anything to do with the lego data being 360 degrees vs the fern data which is frontal?... Also I noticed that the rotation matrix of the lego data does not have a determinant of exactly 1, whereas the fern rotation matrix does. Going back to your solution, the rotation matrix needs an additional transpose at the end to make it the same as pytorch3d right?

I still cannot figure out why, but I'll check the poses one by one later.
As for my solution: yes, we need to transpose the poses at the end, because PyTorch3D stores the poses in Transform3d, which assumes row vectors (the point is multiplied on the left of the matrix), while the original poses are stored for column vectors. You can also refer to the docstring of the Transform3d class:

    This class assumes that transformations are applied on inputs which
    are row vectors. The internal representation of the Nx4x4 transformation
    matrix is of the form:

    .. code-block:: python

        M = [
                [Rxx, Ryx, Rzx, 0],
                [Rxy, Ryy, Rzy, 0],
                [Rxz, Ryz, Rzz, 0],
                [Tx,  Ty,  Tz,  1],
            ]

    To apply the transformation to points which are row vectors, the M matrix
    can be pre multiplied by the points:

    .. code-block:: python

        points = [[0, 1, 2]]  # (1 x 3) xyz coordinates of a point
        transformed_points = points * M
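
The two conventions can be compared in plain NumPy. This is a minimal sketch with made-up values (a 90-degree rotation about z and an arbitrary translation), not PyTorch3D code. It shows that the row-vector matrix is the transpose of the column-vector one and that both give the same transformed point.

```python
import numpy as np

# Made-up transform: 90-degree rotation about z plus a translation.
R_col = np.array([
    [0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
])
t = np.array([1.0, 2.0, 3.0])

# Column-vector convention: translation in the last column, p' = M_col @ p.
M_col = np.eye(4)
M_col[:3, :3] = R_col
M_col[:3, 3] = t

# Row-vector convention: translation in the last row, p' = p @ M_row.
M_row = M_col.T

point = np.array([0.0, 1.0, 2.0])
p_h = np.append(point, 1.0)  # homogeneous coordinates

out_col = (M_col @ p_h)[:3]
out_row = (p_h @ M_row)[:3]

print(out_col, out_row)
```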

@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions bot added the Stale label Jun 18, 2022
@github-actions

This issue was closed because it has been stalled for 5 days with no activity.
