lululxvi/deepxde

boundary number error

bmemlh opened this issue · 1 comment

Discussed in #1788

Originally posted by bmemlh June 29, 2024
Hello everyone,

I have an error related to the number of boundary points for my problem. If the number of boundary points stays at or below roughly 100000, no error occurs. However, as soon as the number is increased slightly beyond that, the error message shown below appears:


ValueError Traceback (most recent call last)
Cell In[13], line 10
8 bc_C_n2 = dde.icbc.NeumannBC(geomtime, lambda x:0.0, boundary5)
9 ic_C = dde.icbc.IC(geomtime, lambda x:C0/C_scale, initial, component=0)
---> 10 data = dde.data.TimePDE(geomtime, pde, [bc_C_d1, bc_C_d2, bc_C_d3, bc_C_n1, bc_C_n2, ic_C], num_domain=300000, num_boundary=110000, num_test=100000, num_initial=100000)

File ~\anaconda3\lib\site-packages\deepxde\data\pde.py:322, in TimePDE.__init__(self, geometryxtime, pde, ic_bcs, num_domain, num_boundary, num_initial, train_distribution, anchors, exclusions, solution, num_test, auxiliary_var_function)
306 def __init__(
307 self,
308 geometryxtime,
(...)
319 auxiliary_var_function=None,
320 ):
321 self.num_initial = num_initial
--> 322 super().__init__(
323 geometryxtime,
324 pde,
325 ic_bcs,
326 num_domain,
327 num_boundary,
328 train_distribution=train_distribution,
329 anchors=anchors,
330 exclusions=exclusions,
331 solution=solution,
332 num_test=num_test,
333 auxiliary_var_function=auxiliary_var_function,
334 )

File ~\anaconda3\lib\site-packages\deepxde\data\pde.py:127, in PDE.__init__(self, geometry, pde, bcs, num_domain, num_boundary, train_distribution, anchors, exclusions, solution, num_test, auxiliary_var_function)
124 self.test_x, self.test_y = None, None
125 self.train_aux_vars, self.test_aux_vars = None, None
--> 127 self.train_next_batch()
128 self.test()

File ~\anaconda3\lib\site-packages\deepxde\utils\internal.py:38, in run_if_all_none.<locals>.decorator.<locals>.wrapper(self, *args, **kwargs)
36 x = [getattr(self, a) for a in attr]
37 if all(i is None for i in x):
---> 38 return func(self, *args, **kwargs)
39 return x if len(x) > 1 else x[0]

File ~\anaconda3\lib\site-packages\deepxde\data\pde.py:175, in PDE.train_next_batch(self, batch_size)
173 @run_if_all_none("train_x", "train_y", "train_aux_vars")
174 def train_next_batch(self, batch_size=None):
--> 175 self.train_x_all = self.train_points()
176 self.bc_points() # Generate self.num_bcs and self.train_x_bc
177 if self.bcs and config.hvd is not None:

File ~\anaconda3\lib\site-packages\deepxde\utils\internal.py:38, in run_if_all_none.<locals>.decorator.<locals>.wrapper(self, *args, **kwargs)
36 x = [getattr(self, a) for a in attr]
37 if all(i is None for i in x):
---> 38 return func(self, *args, **kwargs)
39 return x if len(x) > 1 else x[0]

File ~\anaconda3\lib\site-packages\deepxde\data\pde.py:338, in TimePDE.train_points(self)
336 @run_if_all_none("train_x_all")
337 def train_points(self):
--> 338 X = super().train_points()
339 if self.num_initial > 0:
340 if self.train_distribution == "uniform":

File ~\anaconda3\lib\site-packages\deepxde\utils\internal.py:38, in run_if_all_none.<locals>.decorator.<locals>.wrapper(self, *args, **kwargs)
36 x = [getattr(self, a) for a in attr]
37 if all(i is None for i in x):
---> 38 return func(self, *args, **kwargs)
39 return x if len(x) > 1 else x[0]

File ~\anaconda3\lib\site-packages\deepxde\data\pde.py:265, in PDE.train_points(self)
263 tmp = self.geom.uniform_boundary_points(self.num_boundary)
264 else:
--> 265 tmp = self.geom.random_boundary_points(
266 self.num_boundary, random=self.train_distribution
267 )
268 X = np.vstack((tmp, X))
269 if self.anchors is not None:

File ~\anaconda3\lib\site-packages\deepxde\geometry\timedomain.py:147, in GeometryXTime.random_boundary_points(self, n, random)
145 t = self.timedomain.random_points(n, random=random)
146 t = np.random.permutation(t)
--> 147 return np.hstack((x, t))

File <__array_function__ internals>:180, in hstack(*args, **kwargs)

File ~\anaconda3\lib\site-packages\numpy\core\shape_base.py:345, in hstack(tup)
343 return _nx.concatenate(arrs, 0)
344 else:
--> 345 return _nx.concatenate(arrs, 1)

File <__array_function__ internals>:180, in concatenate(*args, **kwargs)

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 109999 and the array at index 1 has size 110000


In addition, when I set the number of boundary points to 300000, the size mismatch between the arrays changes, as shown below:

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 299997 and the array at index 1 has size 300000
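
Looking at the bottom of the traceback, the mismatch happens inside GeometryXTime.random_boundary_points, which samples the spatial boundary points and the time points separately and then hstacks them. Below is a minimal sketch of that logic as I read it from the traceback (the line that produces x is not shown there, so the geometry.random_boundary_points call is my assumption about how the library builds it):

import numpy as np

def random_boundary_points_sketch(geomtime, n, random="pseudo"):
    # Spatial boundary points; this call is not visible in the traceback,
    # so it is an assumption about how DeepXDE obtains `x`.
    x = geomtime.geometry.random_boundary_points(n, random=random)
    # Time points, as shown at timedomain.py lines 145-147 in the traceback.
    t = geomtime.timedomain.random_points(n, random=random)
    t = np.random.permutation(t)
    # If the spatial sampler returns fewer rows than requested
    # (109999 instead of 110000 in my case), this hstack raises the
    # "dimensions must match exactly" ValueError above.
    return np.hstack((x, t))

Since the array at index 0 is the smaller one, it looks like the spatial boundary sampler is the part that returns slightly fewer points than requested.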

I have no idea why this is happening; please help me.
Thank you!

I tried this example with exactly these points: num_domain=300000, num_boundary=110000, num_test=100000, num_initial=100000. It works fine and starts training. You can ignore the CUDA error at the end, as my GPU can't accommodate this many points.

╰─ python try.py
Using backend: pytorch

/home/hell/anaconda3/lib/python3.9/site-packages/skopt/sampler/sobol.py:246: UserWarning: The balance properties of Sobol' points require n to be a power of 2. 0 points have been previously generated, then: n=0+300002=300002. 
  warnings.warn("The balance properties of Sobol' points require "
/home/hell/anaconda3/lib/python3.9/site-packages/skopt/sampler/sobol.py:246: UserWarning: The balance properties of Sobol' points require n to be a power of 2. 0 points have been previously generated, then: n=0+110002=110002. 
  warnings.warn("The balance properties of Sobol' points require "
/home/hell/anaconda3/lib/python3.9/site-packages/skopt/sampler/sobol.py:246: UserWarning: The balance properties of Sobol' points require n to be a power of 2. 0 points have been previously generated, then: n=0+100002=100002. 
  warnings.warn("The balance properties of Sobol' points require "
Warning: 100000 points required, but 100352 points sampled.
Compiling model...
'compile' took 0.000165 s

Training model...

Step      Train loss                        Test loss                         Test metric   
0         [1.77e+01, 6.23e-02, 3.59e-01]    [1.70e+01, 6.23e-02, 3.59e-01]    [8.92e-01]    
Traceback (most recent call last):
  File "/home/hell/Desktop/temp/zsh/try.py", line 75, in <module>
    losshistory, train_state = model.train(iterations=10000)
  File "/home/hell/anaconda3/lib/python3.9/site-packages/deepxde/utils/internal.py", line 22, in wrapper
    result = f(*args, **kwargs)
  File "/home/hell/anaconda3/lib/python3.9/site-packages/deepxde/model.py", line 573, in train
    self._train_sgd(iterations, display_every)
  File "/home/hell/anaconda3/lib/python3.9/site-packages/deepxde/model.py", line 590, in _train_sgd
    self._train_step(
  File "/home/hell/anaconda3/lib/python3.9/site-packages/deepxde/model.py", line 494, in _train_step
    self.train_step(inputs, targets)
  File "/home/hell/anaconda3/lib/python3.9/site-packages/deepxde/model.py", line 329, in train_step
    self.opt.step(closure)
  File "/home/hell/anaconda3/lib/python3.9/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/home/hell/anaconda3/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/hell/anaconda3/lib/python3.9/site-packages/torch/optim/adam.py", line 100, in step
    loss = closure()
  File "/home/hell/anaconda3/lib/python3.9/site-packages/deepxde/model.py", line 326, in closure
    total_loss.backward()
  File "/home/hell/anaconda3/lib/python3.9/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/hell/anaconda3/lib/python3.9/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA out of memory. Tried to allocate 88.00 MiB (GPU 0; 3.94 GiB total capacity; 2.43 GiB already allocated; 49.19 MiB free; 2.51 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
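
If it still fails on your setup, you could compare the two samplers directly before building the TimePDE data. This is only a diagnostic sketch, not a fix: it reuses the geomtime object from your script and assumes it exposes .geometry and .timedomain (the latter appears in your traceback).

n = 110000
# geomtime is the GeometryXTime object already defined in your script.
# Pass the same sampler your TimePDE run ends up using (the Sobol warnings
# in my log above suggest Sobol here; adjust if your train_distribution differs).
x = geomtime.geometry.random_boundary_points(n, random="Sobol")
t = geomtime.timedomain.random_points(n, random="Sobol")
# A row-count mismatch between x and t here reproduces the ValueError.
print(x.shape, t.shape)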