
torch.optim.LBFGS issue #126625

Closed
komingsu opened this issue May 18, 2024 · 2 comments

Comments

@komingsu

🐛 Describe the bug

I apologize for not including all of the code, but the main content of the code is as follows.

    def net_output(self, x, y):
        """
        [input] x, y [Coordinates]

        [output] u, v, p, and the stress components s11, s22, s12
        """
        # Implementation based on self.forward
        xy = torch.cat([x, y], dim=1)  # Concatenate the input data
        uv_pred = self.model(xy)  # Perform prediction using the modified MLP

        # Separate u, v, p, s11, s22, s12 values from uv_pred
        psi = uv_pred[:,0:1]
        p = uv_pred[:,1:2]
        s11 = uv_pred[:, 2:3]
        s22 = uv_pred[:, 3:4]
        s12 = uv_pred[:, 4:5]
        
        u = torch.autograd.grad(psi.sum(), y, create_graph=True)[0]
        v = -torch.autograd.grad(psi.sum(), x, create_graph=True)[0]
        return u, v, p, s11, s22, s12

    def net_func(self, x, y):
        """
        [input]
            x (torch.Tensor): The input tensor for x-coordinate.
            y (torch.Tensor): The input tensor for y-coordinate.

        [output]
            Tuple[torch.Tensor]: f_u, f_v, f_s11, f_s22, f_s12, f_p, and uvp_pred[:3].
        """
        u, v, p, s11, s22, s12 = self.net_output(x, y)

        u_x = torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
        u_y = torch.autograd.grad(u, y, grad_outputs=torch.ones_like(u), create_graph=True)[0]
        v_x = torch.autograd.grad(v, x, grad_outputs=torch.ones_like(v), create_graph=True)[0]
        v_y = torch.autograd.grad(v, y, grad_outputs=torch.ones_like(v), create_graph=True)[0]

        s11_x = torch.autograd.grad(s11, x, grad_outputs=torch.ones_like(s11), create_graph=True)[0]
        s12_y = torch.autograd.grad(s12, y, grad_outputs=torch.ones_like(s12), create_graph=True)[0]
        s22_y = torch.autograd.grad(s22, y, grad_outputs=torch.ones_like(s22), create_graph=True)[0]
        s12_x = torch.autograd.grad(s12, x, grad_outputs=torch.ones_like(s12), create_graph=True)[0]

        f_u = self.pde_config["rho"] * (u * u_x + v * u_y) - (s11_x + s12_y)
        f_v = self.pde_config["rho"] * (u * v_x + v * v_y) - (s12_x + s22_y)
        f_s11 = -p + 2 * self.pde_config["mu"] * u_x - s11
        f_s22 = -p + 2 * self.pde_config["mu"] * v_y - s22
        f_s12 = self.pde_config["mu"] * (u_y + v_x) - s12

        f_p = p + (s11 + s22) / 2

        return f_u, f_v, f_s11, f_s22, f_s12, f_p

    def loss_fn(self, X_c, Y_c, wall, inlet, outlet):
        f_u, f_v, f_s11, f_s22, f_s12, f_p = self.net_func(X_c, Y_c)

        u_INLET, v_INLET, _, _, _, _ = self.net_output(inlet[:, 0:1], inlet[:, 1:2])
        _, _, p_OUTLET, _, _, _ = self.net_output(outlet[:, 0:1], outlet[:, 1:2])
        u_WALL, v_WALL, _, _, _, _ = self.net_output(wall[:, 0:1], wall[:, 1:2])

        loss_f = torch.mean(f_u**2 + f_v**2 + f_s11**2 + f_s22**2 + f_s12**2 + f_p**2)
        
        loss_INLET = torch.mean((u_INLET - inlet[:, 2:3])**2 + (v_INLET - inlet[:, 3:4])**2)
        loss_WALL = torch.mean(u_WALL**2 + v_WALL**2)  # u and v should be 0 at the wall
        loss_OUTLET = torch.mean((p_OUTLET - torch.zeros_like(p_OUTLET))**2)  # p should be 0 at the outlet

        total_loss = loss_f + 2*(loss_INLET + loss_WALL + loss_OUTLET)
        return total_loss
        
    def train(self):
        """
        some code
        """    
        def closure():
            optimizer_lbfgs.zero_grad()
            total_loss = self.loss_fn(self.X_c,self.Y_c,self.wall,self.inlet,self.outlet)
            total_loss.backward()
            self.writer.add_scalar('Loss/train', total_loss.item(), self.iter)
            self.iter += 1
            return total_loss

        optimizer_lbfgs.step(closure)

Versions

I am using PyTorch version '2.2.0', and I encountered the following error:

    256         view = p.grad.to_dense().view(-1)
    257     else:
--> 258         view = p.grad.view(-1)
    259     views.append(view)
    260 return torch.cat(views, 0)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

When I replaced .view(-1) with .reshape(-1) in the code, it worked without any issues.
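For reference, a possible user-side workaround (just a sketch, assuming the non-contiguous gradient belongs to one of the tensors passed to the LBFGS optimizer, which is what the traceback suggests) is to make the gradients contiguous at the end of the closure, before LBFGS flattens them, instead of editing the PyTorch source. The TensorBoard logging lines are omitted here:

    def closure():
        optimizer_lbfgs.zero_grad()
        total_loss = self.loss_fn(self.X_c, self.Y_c, self.wall, self.inlet, self.outlet)
        total_loss.backward()
        # Make every gradient handed to LBFGS contiguous so that its internal
        # p.grad.view(-1) call succeeds; .contiguous() is a no-op for gradients
        # that are already contiguous.
        for group in optimizer_lbfgs.param_groups:
            for p in group["params"]:
                if p.grad is not None and not p.grad.is_contiguous():
                    p.grad = p.grad.contiguous()
        return total_loss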

Thank you for reading.

@douyipu

douyipu commented May 19, 2024

GPT4:

This problem occurs because the .view() method requires the tensor to be contiguous in memory. If the tensor's underlying data is not stored contiguously, then .view() will fail with an error message like the one you received. This often happens after performing operations that change the layout of the tensor in memory, such as transposing, slicing, or using operations like torch.cat.

When you replace .view(-1) with .reshape(-1), the issue is resolved because .reshape() can handle non-contiguous tensors by internally making a copy of the data if necessary, ensuring that the resultant tensor is contiguous.
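A minimal, self-contained snippet (unrelated to the model above) that illustrates the difference; it uses a transposed tensor, since not every non-contiguous tensor makes .view() fail:

    import torch

    a = torch.arange(1., 10.).reshape(3, 3)  # contiguous 3x3 tensor
    b = a.t()                                 # transpose -> non-contiguous view of the same storage

    print(b.is_contiguous())                  # False

    try:
        b.view(-1)                            # the flattened order cannot be expressed as a view
    except RuntimeError as e:
        print(e)                              # "view size is not compatible with input tensor's size and stride ..."

    flat = b.reshape(-1)                      # works: reshape silently copies when a view is impossible
    print(flat.data_ptr() == a.data_ptr())    # False -> reshape made a copy here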

Let me explain.

def net_output(self, x, y):
    # Implementation based on self.forward
    xy = torch.cat([x, y], dim=1)  # Concatenate the input data
    uv_pred = self.model(xy)  # Perform prediction using the modified MLP

    p = uv_pred[:,1:2]

'''
omit other irrelevant code
'''

view = p.grad.view(-1)

As we can see, uv_pred is a tensor that contains the predictions of your MLP model. Normally, uv_pred is a brand-new tensor, so it is contiguous. Then tensor p is created and assigned the value of uv_pred[:, 1:2]. However, p does not get its own storage; it is a view into uv_pred's storage, and it is NOT a contiguous tensor.

If uv_pred is:

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

Then the storage of uv_pred in memory is:

+-------+  <-- uv_pred's data pointer    offset: 0
|   1   |      size: torch.Size([3, 3])  strides: (3, 1)
+-------+ 
|   2   |
+-------+
|   3   |
+-------+
|   4   |
+-------+
|   5   |
+-------+
|   6   |
+-------+
|   7   |
+-------+
|   8   |
+-------+
|   9   |
+-------+

Because p = uv_pred[:, 1:2], p is:

tensor([[2],
        [5],
        [8]])

Then the storage of p in memory is:

+-------+  <-- uv_pred's data pointer    offset: 0
|   1   |      size: torch.Size([3, 3])  strides: (3, 1)
+-------+  <-- p's data pointer          offset: 1
|  *2*  |      size: torch.Size([3, 1])  strides: (3, 1)
+-------+
|   3   |
+-------+
|   4   |
+-------+
|  *5*  |
+-------+
|   6   |
+-------+
|   7   |
+-------+
|  *8*  |
+-------+
|   9   |
+-------+

So, p is not a contiguous tensor.

>>> p.is_contiguous()
False
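These layout claims can be checked directly (a small verification snippet using the same 3x3 example; element_size() is 8 bytes for the default int64 dtype here):

    import torch

    uv_pred = torch.tensor([[1, 2, 3],
                            [4, 5, 6],
                            [7, 8, 9]])
    p = uv_pred[:, 1:2]

    print(p.data_ptr() == uv_pred.data_ptr() + uv_pred.element_size())  # True: p starts one element into uv_pred's storage
    print(p.storage_offset())    # 1
    print(p.size(), p.stride())  # torch.Size([3, 1]) (3, 1)
    print(p.is_contiguous())     # False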

I recommend this blog post about PyTorch internals: http://blog.ezyang.com/2019/05/pytorch-internals/


@drisspg
Contributor

drisspg commented May 20, 2024

I am going to close this. @douyipu provided a good description of the data layout of tensors, and specifically of what can or can't be a view. If you have further questions, I would suggest moving this conversation over to dev-discuss:
https://dev-discuss.pytorch.org/

@drisspg drisspg closed this as completed May 20, 2024