
torch.optim.LBFGS issue #126625

Closed
komingsu opened this issue May 18, 2024 · 2 comments

Comments

@komingsu

🐛 Describe the bug

I apologize for not including all of the code, but the main content of the code is as follows.

    def net_output(self, x, y):
        """
        [input] x, y [Coordinates]

        [output] u, v, p, and the stress components s11, s22, s12
        """
        # Implementation based on self.forward
        xy = torch.cat([x, y], dim=1)  # Concatenate the input data
        uv_pred = self.model(xy)  # Perform prediction using the modified MLP

        # Separate u, v, p, s11, s22, s12 values from uv_pred
        psi = uv_pred[:,0:1]
        p = uv_pred[:,1:2]
        s11 = uv_pred[:, 2:3]
        s22 = uv_pred[:, 3:4]
        s12 = uv_pred[:, 4:5]
        
        u = torch.autograd.grad(psi.sum(), y, create_graph=True)[0]
        v = -torch.autograd.grad(psi.sum(), x, create_graph=True)[0]
        return u, v, p, s11, s22, s12

    def net_func(self, x, y):
        """
        [input]
            x (torch.Tensor): The input tensor for x-coordinate.
            y (torch.Tensor): The input tensor for y-coordinate.

        [output]
            Tuple[torch.Tensor]: f_u, f_v, f_s11, f_s22, f_s12, f_p, and uvp_pred[:3].
        """
        u, v, p, s11, s22, s12 = self.net_output(x, y)

        u_x = torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u), create_graph=True)[0]
        u_y = torch.autograd.grad(u, y, grad_outputs=torch.ones_like(u), create_graph=True)[0]
        v_x = torch.autograd.grad(v, x, grad_outputs=torch.ones_like(v), create_graph=True)[0]
        v_y = torch.autograd.grad(v, y, grad_outputs=torch.ones_like(v), create_graph=True)[0]

        s11_x = torch.autograd.grad(s11, x, grad_outputs=torch.ones_like(s11), create_graph=True)[0]
        s12_y = torch.autograd.grad(s12, y, grad_outputs=torch.ones_like(s12), create_graph=True)[0]
        s22_y = torch.autograd.grad(s22, y, grad_outputs=torch.ones_like(s22), create_graph=True)[0]
        s12_x = torch.autograd.grad(s12, x, grad_outputs=torch.ones_like(s12), create_graph=True)[0]

        f_u = self.pde_config["rho"] * (u * u_x + v * u_y) - (s11_x + s12_y)
        f_v = self.pde_config["rho"] * (u * v_x + v * v_y) - (s12_x + s22_y)
        f_s11 = -p + 2 * self.pde_config["mu"] * u_x - s11
        f_s22 = -p + 2 * self.pde_config["mu"] * v_y - s22
        f_s12 = self.pde_config["mu"] * (u_y + v_x) - s12

        f_p = p + (s11 + s22) / 2

        return f_u, f_v, f_s11, f_s22, f_s12, f_p

    def loss_fn(self, X_c, Y_c, wall, inlet, outlet):
        f_u, f_v, f_s11, f_s22, f_s12, f_p = self.net_func(X_c, Y_c)

        u_INLET, v_INLET, _, _, _, _ = self.net_output(inlet[:, 0:1], inlet[:, 1:2])
        _, _, p_OUTLET, _, _, _ = self.net_output(outlet[:, 0:1], outlet[:, 1:2])
        u_WALL, v_WALL, _, _, _, _ = self.net_output(wall[:, 0:1], wall[:, 1:2])

        loss_f = torch.mean(f_u**2 + f_v**2 + f_s11**2 + f_s22**2 + f_s12**2 + f_p**2)
        
        loss_INLET = torch.mean((u_INLET - inlet[:, 2:3])**2 + (v_INLET - inlet[:, 3:4])**2)
        loss_WALL = torch.mean(u_WALL**2 + v_WALL**2)  # u and v should be 0 at the wall
        loss_OUTLET = torch.mean((p_OUTLET - torch.zeros_like(p_OUTLET))**2)  # p should be 0 at the outlet

        total_loss = loss_f + 2*(loss_INLET + loss_WALL + loss_OUTLET)
        return total_loss
        
    def train(self):
        """
        some code
        """    
        def closure():
            optimizer_lbfgs.zero_grad()
            total_loss = self.loss_fn(self.X_c,self.Y_c,self.wall,self.inlet,self.outlet)
            total_loss.backward()
            self.writer.add_scalar('Loss/train', total_loss.item(), self.iter)
            self.iter += 1
            return total_loss

        optimizer_lbfgs.step(closure)

Versions

I am using PyTorch version '2.2.0', and I encountered the following error:

    256         view = p.grad.to_dense().view(-1)
    257     else:
--> 258         view = p.grad.view(-1)
    259     views.append(view)
    260 return torch.cat(views, 0)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

When I replaced .view(-1) with .reshape(-1) in the code, it worked without any issues.
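For reference, a possible user-side workaround (just a sketch, assuming the non-contiguous gradient belongs to one of the tensors passed to the LBFGS optimizer, which is what the traceback suggests) is to make the gradients contiguous at the end of the closure, before LBFGS flattens them, instead of editing the PyTorch source. The TensorBoard logging lines are omitted here:

    def closure():
        optimizer_lbfgs.zero_grad()
        total_loss = self.loss_fn(self.X_c, self.Y_c, self.wall, self.inlet, self.outlet)
        total_loss.backward()
        # Make every gradient handed to LBFGS contiguous so that its internal
        # p.grad.view(-1) call succeeds; .contiguous() is a no-op for gradients
        # that are already contiguous.
        for group in optimizer_lbfgs.param_groups:
            for p in group["params"]:
                if p.grad is not None and not p.grad.is_contiguous():
                    p.grad = p.grad.contiguous()
        return total_loss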

Thank you for reading.

@douyipu

douyipu commented May 19, 2024

GPT4:

This problem occurs because the .view() method requires the tensor to be contiguous in memory. If the tensor's underlying data is not stored contiguously, then .view() will fail with an error message like the one you received. This often happens after performing operations that change the layout of the tensor in memory, such as transposing, slicing, or using operations like torch.cat.

When you replace .view(-1) with .reshape(-1), the issue is resolved because .reshape() can handle non-contiguous tensors by internally making a copy of the data if necessary, ensuring that the resultant tensor is contiguous.
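A minimal, self-contained snippet (unrelated to the model above) that illustrates the difference; it uses a transposed tensor, since not every non-contiguous tensor makes .view() fail:

    import torch

    a = torch.arange(1., 10.).reshape(3, 3)  # contiguous 3x3 tensor
    b = a.t()                                 # transpose -> non-contiguous view of the same storage

    print(b.is_contiguous())                  # False

    try:
        b.view(-1)                            # the flattened order cannot be expressed as a view
    except RuntimeError as e:
        print(e)                              # "view size is not compatible with input tensor's size and stride ..."

    flat = b.reshape(-1)                      # works: reshape silently copies when a view is impossible
    print(flat.data_ptr() == a.data_ptr())    # False -> reshape made a copy here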

Let me explain.

def net_output(self, x, y):
    # Implementation based on self.forward
    xy = torch.cat([x, y], dim=1)  # Concatenate the input data
    uv_pred = self.model(xy)  # Perform prediction using the modified MLP

    p = uv_pred[:,1:2]

'''
omit other irrelevant code
'''

view = p.grad.view(-1)

As we can see, uv_pred is a tensor that contains the predictions of your MLP model. Normally, uv_pred is a brand-new tensor, so it is contiguous. Then tensor p is created and assigned the value of uv_pred[:, 1:2]. However, p does not get its own storage; it is a view into uv_pred's storage, and it is NOT a contiguous tensor.

If uv_pred is:

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

Then the storage of uv_pred in memory is:

+-------+  <-- uv_pred's data pointer    offset: 0
|   1   |      size: torch.Size([3, 3])  strides: (3, 1)
+-------+ 
|   2   |
+-------+
|   3   |
+-------+
|   4   |
+-------+
|   5   |
+-------+
|   6   |
+-------+
|   7   |
+-------+
|   8   |
+-------+
|   9   |
+-------+

Because p = uv_pred[:, 1:2], p is:

tensor([[2],
        [5],
        [8]])

Then the storage of p in memory is:

+-------+  <-- uv_pred's data pointer    offset: 0
|   1   |      size: torch.Size([3, 3])  strides: (3, 1)
+-------+  <-- p's data pointer          offset: 1
|  *2*  |      size: torch.Size([3, 1])  strides: (3, 1)
+-------+
|   3   |
+-------+
|   4   |
+-------+
|  *5*  |
+-------+
|   6   |
+-------+
|   7   |
+-------+
|  *8*  |
+-------+
|   9   |
+-------+

So, p is not a contiguous tensor.

>>> p.is_contiguous()
False
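These layout claims can be checked directly (a small verification snippet using the same 3x3 example; element_size() is 8 bytes for the default int64 dtype here):

    import torch

    uv_pred = torch.tensor([[1, 2, 3],
                            [4, 5, 6],
                            [7, 8, 9]])
    p = uv_pred[:, 1:2]

    print(p.data_ptr() == uv_pred.data_ptr() + uv_pred.element_size())  # True: p starts one element into uv_pred's storage
    print(p.storage_offset())    # 1
    print(p.size(), p.stride())  # torch.Size([3, 1]) (3, 1)
    print(p.is_contiguous())     # False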

I recommend this blog post about PyTorch internals: http://blog.ezyang.com/2019/05/pytorch-internals/


@drisspg
Contributor

drisspg commented May 20, 2024

I am going to close this. @douyipu provided a good description of the data layout of tensors, and specifically of what can or can't be a view. If you have further questions, I would suggest moving this conversation over to dev-discuss:
https://dev-discuss.pytorch.org/

@drisspg drisspg closed this as completed May 20, 2024