Inaccurate camera centre obtained by Pytorch3d #294


Closed
eckertzhang opened this issue Jul 30, 2020 · 13 comments
Labels: how to (How to use PyTorch3D in my project)

@eckertzhang

I have some files including an obj, camera extrinsics (a world2cam matrix, i.e. R and T), and a photo of the object from the same viewpoint. I tried to render this obj with PyTorch3D, but the rendered result has a slightly shifted viewpoint.
For comparison, I show the original photo, OpenGL rendering result and Pytorch3d rendering result.
The original photo in the same viewpoint:
https://i.loli.net/2020/07/30/ysbBakuDZFrSi9C.png
The OpenGL rendering result:
https://i.loli.net/2020/07/30/Lk7w6SOqiAWyoMF.png
The Pytorch3d rendering result:
https://i.loli.net/2020/07/30/lWmNhfPxIDU3JyG.png
For comparison, I made a GIF with the original photo, OpenGL rendering result and Pytorch3d rendering result.
https://s1.ax1x.com/2020/07/30/aM9ZlR.gif

I think there may be a difference in how the camera centre is computed. I noticed that when PyTorch3D calculates the camera centre, it inverts the world2cam_pytorch3d matrix, where:

```
world2cam_pytorch3d = [R, 0; T, 1]
camera_center = world2cam_pytorch3d.inverse()[3, :3]
```

while OpenGL calculates the camera centre by inverting the world2cam matrix, where:

```
world2cam = [R, T; 0, 1]
camera_center = world2cam.inverse()[:3, 3]
```

These work out to C = -T @ R^-1 (row-vector form) and C = -R^-1 @ T (column-vector form) respectively, so the two computations agree only if the two R matrices are transposes of each other.

I printed the results of the two calculation methods and found that they are not exactly the same:
https://i.loli.net/2020/07/30/yVYuoPgceEpIfKd.png
So I am confused about this issue: is it really correct to pass R and T to PyTorch3D in this form?
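
To make the mismatch concrete, here is a minimal self-contained sketch (the example R and t are arbitrary; only the two matrix layouts come from the discussion above):

```python
import numpy as np

# Example extrinsics: R rotates 30 degrees about y, t is an arbitrary translation.
theta = np.radians(30.0)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.5, -0.2, 2.0])

# Column-vector (OpenGL-style) convention: world2cam = [R, T; 0, 1]
world2cam = np.eye(4)
world2cam[:3, :3] = R
world2cam[:3, 3] = t
centre_col = np.linalg.inv(world2cam)[:3, 3]

# Row-vector (PyTorch3D-style) convention with the SAME R: [R, 0; T, 1]
world2cam_p3d = np.eye(4)
world2cam_p3d[:3, :3] = R
world2cam_p3d[3, :3] = t
centre_row = np.linalg.inv(world2cam_p3d)[3, :3]

print(np.allclose(centre_col, centre_row))  # False: the centres disagree

# They agree once R is transposed for the row-vector layout:
world2cam_p3d[:3, :3] = R.T
print(np.allclose(centre_col, np.linalg.inv(world2cam_p3d)[3, :3]))  # True
```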

@nikhilaravi nikhilaravi self-assigned this Jul 30, 2020
@eckertzhang
Author

I have solved this problem!
I found the source of the error in the code during the function call. I believe the original intention was to transpose the world2cam matrix to obtain the PyTorch3D form, but only T is effectively transposed during the composition; see line 839 in cameras.py https://github.com/facebookresearch/pytorch3d/blob/07d7e12644ee48ea0b544c07becacdafe93c260a/pytorch3d/renderer/cameras.py#L804. After Rotate(), the content of R is unchanged (i.e. it is not transposed), so R.compose(T) does not give the transposed world2cam matrix when we call .get_matrix(). I changed line 515 of transform3d.py https://github.com/facebookresearch/pytorch3d/blob/master/pytorch3d/transforms/transform3d.py to mat[:, :3, :3] = R.transpose(1, 2), and it works! I think T and R should undergo similar operations (a transpose) after Translate() and Rotate().
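
To see the row-vector convention concretely, here is a small illustration (my own sketch, not library code) of the composition that get_world_to_view_transform builds internally; with an identity rotation the layout of get_matrix() is easy to read off:

```python
import torch
from pytorch3d.transforms import Rotate, Translate

R = torch.eye(3)[None]               # (1, 3, 3): identity rotation for readability
T = torch.tensor([[1.0, 2.0, 3.0]])  # (1, 3) translation

# Mirrors the composition inside get_world_to_view_transform:
tfm = Rotate(R).compose(Translate(T))
print(tfm.get_matrix())
# tensor([[[1., 0., 0., 0.],
#          [0., 1., 0., 0.],
#          [0., 0., 1., 0.],
#          [1., 2., 3., 1.]]])
```

The translation ends up in the last row, i.e. the matrix has the form [R, 0; T, 1] and points transform as x_cam = x @ R + T, so R is stored exactly as passed in and is never transposed.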

@gkioxari gkioxari self-assigned this Jul 31, 2020
@gkioxari gkioxari added the how to How to use PyTorch3D in my project label Jul 31, 2020
@gkioxari
Copy link
Contributor

Hi @Eckert-ZJB,
Could you give us reproducible code so we can verify where the error is? It's hard for me to follow what is going on from your description. A compact and reproducible test case where the error happens would be best for us to investigate further.

@eckertzhang
Author

Sorry about my poor description. Here is a demo that calculates the camera centre using the two different methods.
The results differ unless you change https://github.com/facebookresearch/pytorch3d/blob/07d7e12644ee48ea0b544c07becacdafe93c260a/pytorch3d/transforms/transform3d.py#L515 to mat[:, :3, :3] = R.transpose(1, 2).
demo.zip

@eckertzhang
Author

@gkioxari

@eckertzhang
Author

@gkioxari @nikhilaravi
Because my explanation last time was unclear, there has been no reply from the PyTorch3D team. Let me give a clearer and more detailed explanation of this issue.

Firstly, I need to render a mesh from a given camera pose. The information I have includes: the mesh (an obj file with texture), the camera intrinsics, the camera extrinsics (a world2cam matrix, i.e. R and T), and a real picture of the object taken from this viewpoint. I then use two different renderers to render this mesh: one is Pyrender (based on OpenGL, used as the reference result), the other is PyTorch3D. Since their camera coordinate system definitions differ, I rotate the world2cam matrix 180 degrees around the z axis before sending it to the PyTorch3D renderer; in other words, I perform world2cam = Rz @ world2cam, where Rz is the rotation matrix.
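
For concreteness, here is a minimal sketch of that conversion (assuming the source extrinsics use the common OpenCV-style camera axes, +X right, +Y down, +Z forward, which map to PyTorch3D's +X left, +Y up, +Z forward by flipping the x and y camera axes):

```python
import numpy as np

# 180-degree rotation about the camera z axis: flips the x and y camera axes.
Rz = np.diag([-1.0, -1.0, 1.0, 1.0])

# Left-multiplying changes the camera frame while leaving the world frame alone.
# world2cam is the 4x4 extrinsic matrix loaded from disk.
world2cam = Rz @ world2cam
```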

After this, I rendered the mesh with PyTorch3D:

```python
R = torch.tensor(world2cam[:3, :3].reshape(1, 3, 3))
T = torch.tensor(world2cam[:3, 3].reshape(1, 3))
cameras = OpenGLPerspectiveCameras(R=R, T=T)
raster_settings = RasterizationSettings(image_size=640, blur_radius=0.0, faces_per_pixel=8)
lights = PointLights(
    location=[[0.0, 0.0, -3.0]],
    ambient_color=((1, 1, 1),),
    diffuse_color=((0, 0, 0),),
    specular_color=((0, 0, 0),),
    device=device,
)
renderer = MeshRenderer(
    rasterizer=MeshRasterizer(cameras=cameras, raster_settings=raster_settings),
    shader=TexturedSoftPhongShader(device=device, cameras=cameras, lights=lights),
)
images = renderer(mesh)
```

In fact, I got a wrong rendering result after performing the previous process. Why? I carefully checked the intermediate steps of the program and found that an expected transposition is missing in the sub-function get_world_to_view_transform(R, T) of OpenGLPerspectiveCameras().
In this sub-function, the input R and T are used to create a Transform3d object via T = Translate(T, device=T.device) and R = Rotate(R, device=R.device), and the function returns R.compose(T). This return value is the key to the wrong pipeline above.
Let me explain it in detail.
This sub-function get_world_to_view_transform is called twice in the whole process: first in get_camera_center to compute the camera centre, and second when the rasterizer builds the world-to-view transform.

(1) Let's look at the first call.
The original code is:

```python
w2v_trans = self.get_world_to_view_transform(**kwargs)
P = w2v_trans.inverse().get_matrix()
C = P[:, 3, :3]  # C is the camera centre
```

This looks right, but only under the assumption that the returned w2v_trans is right; in other words, w2v_trans should be the transpose of the world2cam matrix. But it isn't!
See the following slide for details.
[slide illustrating the mismatch]

(2) The second call to get_world_to_view_transform proceeds similarly to the first. Both produce a wrong result because of the wrong return value of get_world_to_view_transform.

My suggestion to solve the problem above:
In get_world_to_view_transform there is a call to a function named Rotate(). If we change the line mat[:, :3, :3] = R inside Rotate() to mat[:, :3, :3] = R.transpose(1, 2), this problem is solved.

@gkioxari gkioxari added bug Something isn't working and removed how to How to use PyTorch3D in my project labels Sep 12, 2020
@gkioxari
Contributor

gkioxari commented Sep 23, 2020

Hi @Eckert-ZJB,
Sorry for the late response. I just got to your issue.

I think your confusion stems from the fact that R in your definition is supposed to transform column vectors. Note that in PyTorch3D's Transform3d we assume that transformations are applied to row vectors:

```
This class assumes that transformations are applied on inputs which
are row vectors. The internal representation of the Nx4x4 transformation
matrix is of the form:

M = [
        [Rxx, Ryx, Rzx, 0],
        [Rxy, Ryy, Rzy, 0],
        [Rxz, Ryz, Rzz, 0],
        [Tx,  Ty,  Tz,  1],
    ]
```

Hence, before passing your R to the camera you need to transpose it. This is essentially what your change does, but the clean solution is to transpose R before you pass it to the camera.

As a side note, 3D is complicated. When you switch between libraries (in this case OpenGL and PyTorch3D), you need to make sure that the data you transfer obeys each library's world coordinate conventions, and that the transformation matrices assume the same vector order (row vs. column).
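
For example, a minimal sketch of that clean fix (assuming world2cam is a column-vector OpenGL/OpenCV-style 4x4 extrinsic matrix and the imports from the snippets above):

```python
import torch

# Column-vector convention: x_cam = R @ x + t.
# The row-vector form of the same map is x_cam = x @ R.T + t,
# so transpose R and keep t unchanged before building the camera.
R = torch.tensor(world2cam[:3, :3].T[None], dtype=torch.float32)  # (1, 3, 3)
T = torch.tensor(world2cam[:3, 3][None], dtype=torch.float32)     # (1, 3)
cameras = OpenGLPerspectiveCameras(R=R, T=T)  # no library patch needed
```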

@gkioxari gkioxari added how to How to use PyTorch3D in my project and removed bug Something isn't working labels Sep 23, 2020
@gkioxari
Contributor

gkioxari commented Oct 1, 2020

Closing this!

@gkioxari gkioxari closed this as completed Oct 1, 2020
@YaroslavShchekaturov

Hi @gkioxari @Eckert-ZJB

I have reproduced your demo and I believe I got the expected results.

```python
import numpy as np
import torch
import matplotlib.pyplot as plt
from pytorch3d.renderer import (
    PerspectiveCameras, PointLights, RasterizationSettings,
    MeshRenderer, MeshRasterizer, SoftPhongShader,
)
from pytorch3d.renderer.cameras import get_world_to_view_transform

device = torch.device("cuda")

w2c = np.loadtxt('./data/demo/00000_pose.txt')
intrinsic = np.loadtxt('./data/demo/intrinsic.txt')
width = 640
height = 480

# Unify the camera coordinate system
print(w2c)
Rz = np.array([[-1, 0, 0, 0],
               [0, -1, 0, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 1]])
world2cam = np.dot(Rz, w2c)

print('The true camera position is:\n', np.linalg.inv(world2cam)[:3, 3])

R = torch.tensor(np.transpose(world2cam[:3, :3]).reshape(1, 3, 3))
T = torch.tensor(world2cam[:3, 3].reshape(1, 3))

# Calculate the camera position using the PyTorch3D method
world_to_view_transform = get_world_to_view_transform(R=R, T=T)
P = world_to_view_transform.inverse().get_matrix()
camera_position_pytorch3d = P[:, 3, :3]

print('The camera position calculated by PyTorch3D:\n', camera_position_pytorch3d)
```

This prints:

```
The true camera position is:
 [ 0.6659276  -0.26834545 -1.02891755]
The camera position calculated by PyTorch3D:
 tensor([[ 0.6659, -0.2683, -1.0289]])
```

```python
img_size = (480, 640)
camera = PerspectiveCameras(
    R=R, T=T,
    focal_length=intrinsic[0, 0],
    principal_point=torch.tensor([intrinsic[0, 2], intrinsic[1, 2]]).unsqueeze(0),
    image_size=(img_size,),
    in_ndc=False,
)

lights = PointLights(device=device, location=[[0.0, 0.0, 5.0]])
raster_settings = RasterizationSettings(
    image_size=(480, 640),
    blur_radius=0.0,
    faces_per_pixel=1,
    perspective_correct=False,
)
renderer = MeshRenderer(
    rasterizer=MeshRasterizer(cameras=camera, raster_settings=raster_settings),
    shader=SoftPhongShader(device=device, cameras=camera, lights=lights),
)

# Render the mesh from this viewing angle (mesh is the Meshes object loaded from the obj)
target_images = renderer(mesh, cameras=camera.cuda(), lights=lights)
plt.imshow(target_images[0].cpu())
plt.show()
```

[rendered image compared with the reference photo 00000_color]
But I am completely stuck on understanding the orientation conventions in PyTorch3D.
I tried to import the mesh and camera pose from the demo provided by @Eckert-ZJB, but the result does not match:
[screenshot of the mismatched render]
Am I right that OpenGL and Blender use the same coordinate system? The question comes up because I have a mesh model and a set of cameras with a 4x4 matrix each, and when I want to render images it gives unexpected results.

@3a1b2c3

3a1b2c3 commented Jan 3, 2023

Would love some native support for this, pretty please. Shouldn't it be rotated around y rather than z?

[coordinate-system diagram comparing camera conventions]

@bottler
Contributor

bottler commented Jan 4, 2023

@3a1b2c3 This is an old, closed issue. If you want help, especially on a new question, please open a new issue. I don't understand your question. It does look like the difference in views on the diagram is approximately around y.

@3a1b2c3

3a1b2c3 commented Jan 4, 2023

Thanks, is there a place for feature request?

@bottler
Contributor

bottler commented Jan 4, 2023

Just create a new issue asking for the feature :).

