MidcurveNN

Computation of Midcurve of Thin Polygons using Neural Networks

MidcurveNN is a project aimed at solving the challenging problem of finding the midcurve of a 2D closed shape using neural networks. The goal is to transform a closed polygon, represented by a set of points or connected lines, into another set of points or connected lines, where the output may be an open or branched polyline. To run the project, follow the steps in the "Instructions to Run" section below.

Midcurve

  • Goal: Given a 2D closed shape (closed polygon), find its midcurve (polyline, closed or open).
  • Input: Set of points or set of connected lines, forming a non-intersecting, simple, closed polygon.
  • Output: Another set of points or set of connected lines, open/branched polygons possible.

If you are interested in working on or contributing to this project voluntarily, do have a look at the issues.

Instructions to Run

cd src
conda env create -f environment.yml
conda activate midcurvenn
python simpleencoderdecoder/main_simple_encoderdecoder.py

Thoughts

What is it?

Graph Summarization/Dimension-Reduction/Compression: reducing a large graph to a smaller graph while preserving its underlying structure, similar to text summarization, which attempts to keep the essence of a text.

Representation Issue

  • Shapes cannot be modeled as sequences. Although a polygon shape such as 'L' may appear to be a sequence of points, it is not.
  • Not all shapes can be drawn without lifting the pencil, which is a prerequisite for sequences. Shapes like 'Y' or concentric 'O's, say, cannot be modeled as sequences. So the midcurve transformation cannot be modeled as a sequence-to-sequence (Seq2Seq) network.
  • How do we represent a geometric figure as input to a machine/deep learning model, given that such models need numeric data in vector form?
  • How do we convert geometric shapes (even after restricting the domain to 2D linear profile shapes) into vectors?
  • The closest data structure is a graph, but graphs are predominantly topological, not geometrical: they represent only connectivity, not spatial positions. Here, nodes would have coordinates and arcs would have curvilinear shapes. So even Graph Neural Networks, which convolve over the neighbors of a node and pool the outputs to generate vectors, are not directly suitable.
  • R&D is needed to come up with a way to generate a geometric-graph embedding that convolves at nodes (having coordinates) over incident arcs (having geometry, say, a set of sampled point coordinates), then pools them into a meaningful representation. The real crux is how to formulate pooling that aggregates all the incident-curve information plus the node-coordinate information into a single value; a toy sketch follows below.
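As a thought experiment only, here is a minimal sketch of such a node-centric embedding, assuming numpy: every node carries its (x, y) coordinates, every incident arc carries a fixed number of sampled points, and a simple permutation-invariant pooling (a mean here; max/sum or learned attention are equally plausible) aggregates them. All names and the pooling choice are hypothetical placeholders, not project code.

import numpy as np

def node_embedding(node_xy, incident_arcs):
    """Toy geometric-graph embedding for one node.

    node_xy:       (2,) array with the node's coordinates.
    incident_arcs: list of (n_samples, 2) arrays, points sampled along
                   each arc incident to this node.
    Returns a fixed-size vector regardless of how many arcs meet here.
    """
    # Flatten each arc's sampled geometry into one feature vector.
    arc_feats = [arc.reshape(-1) for arc in incident_arcs]  # each (2*n_samples,)
    # Permutation-invariant pooling over the incident arcs.
    pooled = np.mean(arc_feats, axis=0)
    # Concatenate the node's own coordinates with the pooled arc geometry.
    return np.concatenate([node_xy, pooled])

# Example: a corner node at (5, 5) with two incident straight segments.
arc1 = np.linspace([5, 5], [10, 5], 8)  # horizontal edge, 8 sampled points
arc2 = np.linspace([5, 5], [5, 20], 8)  # vertical edge, 8 sampled points
print(node_embedding(np.array([5.0, 5.0]), [arc1, arc2]).shape)  # (18,)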

Variable Lengths Issue

  • This is a dimension-reduction problem wherein, in 2D, the input constitutes the sketch profile (parametrically 2D), while the output represents the midcurve (parametrically 1D). Input points are ordered, mostly forming a closed loop known as a manifold. Conversely, output points may lack a defined order and can exhibit branches, referred to as non-manifold structures.
  • It poses a variable input and variable output challenge, given that the number of points and lines differs in the input and output.
  • This presents a network-to-network problem (not sequence-to-sequence) with variable-sized inputs and outputs.
  • Encoder-decoder networks, like those in TensorFlow, need fixed-length inputs. Variable lengths pose a significant hurdle: padding with a supposedly unused value, such as (0, 0), is not feasible, since it could represent a valid point.
  • The limitation with Seq2Seq models is notable; the input polygons and output branched midcurves are not linearly connected. They may exhibit loops or branches, necessitating further consideration for a more suitable solution. (More details on limitations below)
  • Instead of adopting a point list as input/output, let's consider the well-worked format of images. These images are standardized to a constant size, say 64x64 pixels. The sketch profile is represented as a color profile in a bitmap (black and white only), and similarly, the midcurve appears in the output bitmap. This image-based format allows for the application of LSTM encoder-decoder Seq2Seq models. To diversify training data, one can introduce variations by shifting, rotating, and scaling both input and output. The current focus is on 2D sketch profiles, limited to linear segments within a single, simple polygon without holes.
  • The vectorization process represents each point as 2 floats/ints, so the total input vector for a polygon with 'm' points is 2m floats/ints. For closed polygons, the first point is repeated as the last one. The output is similarly a vector of 2n floats, again repeating the first point at the end for a closed figure (a small sketch of both the bitmap and vector representations follows this list). Training data is prepared from data files from MIDAS. For augmentation, input and output can be scaled by different factors, and entries can be randomly shuffled. Determining the maximum number of points in a profile establishes a fixed length for both input and output.
  • Resources and further exploration for Seq2Seq models can be found at Tensorflow Seq2Seq Tutorial, Seq2Seq Video Tutorial, A Neural Representation of Sketch Drawings, and Sketch RNN GitHub Repository.
  • Additionally, consider incorporating plotting capabilities to visualize polygons and their midcurves, facilitating easy debugging and testing of unseen figures.
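To make the two representations above concrete, the sketch below (assuming numpy and Pillow; sizes, names, and the NaN padding scheme are illustrative choices, not the project's actual code) rasterizes a polygon into a fixed-size black-and-white bitmap and, alternatively, flattens it into a fixed-length coordinate vector, repeating the first point to close the loop. NaN is used as the padding sentinel precisely because, as noted above, (0, 0) could be a valid point.

import numpy as np
from PIL import Image, ImageDraw

def polygon_to_bitmap(points, size=64):
    """Rasterize a closed polygon outline into a size x size 1-bit bitmap."""
    img = Image.new("1", (size, size), 0)  # black background
    ImageDraw.Draw(img).polygon([tuple(p) for p in points], outline=1)
    return np.array(img, dtype=np.uint8)

def polygon_to_vector(points, max_points, closed=True):
    """Flatten m (x, y) points into a fixed-length vector of 2*max_points floats."""
    pts = list(points)
    if closed:
        pts.append(pts[0])  # repeat the first point as the last one
    assert len(pts) <= max_points, "profile exceeds the chosen maximum"
    vec = np.full(2 * max_points, np.nan, dtype=np.float32)  # NaN padding, not (0, 0)
    flat = np.asarray(pts, dtype=np.float32).reshape(-1)
    vec[:flat.size] = flat
    return vec

profile = [(5, 5), (10, 5), (10, 20), (5, 20)]
print(polygon_to_bitmap(profile).shape)                # (64, 64)
print(polygon_to_vector(profile, max_points=10)[:10])  # 5 points of the closed loop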

Dilution to Images

  • Images of geometric shapes address both the representation and the variable-size issues. The big dilution is that true geometric shapes are like vector images, whereas the images used here are raster images, so approximation creeps in.

  • Even after modeling, the predicted output image needs to be post-processed back into geometric form, which is challenging in itself.

  • Thus, this project is divided into phases:

    • Phase I: Image to Image transformation learning
      • Img2Img: input/output as fixed-size 100x100 bitmaps
      • Populate many samples by scaling/rotating/translating both input and output shapes within the fixed size
      • Use an encoder-decoder network, like those for semantic segmentation or Pix2Pix on images, to learn the dimension reduction
    • Phase II: Geometry to Geometry transformation learning
      • Build both input and output polyline graphs, with (x, y) coordinates as node features and edges specified as node-id pairs. For polylines, edges are straight lines, so there is no need to store intermediate geometric points as features; for curves, store, say, a fixed number 'n' of sampled points.
      • Build an image-segmentation-like encoder-decoder network, with graph convolution layers from DGL in place of the usual image-based 2D convolution layers in a standard PyTorch encoder-decoder model (see the sketch at the end of this section).
      • Generate a variety of input-output polyline pairs using geometric transformations (not image transformations as in Phase I).
      • See if Variational Graph Auto-Encoders https://github.com/dmlc/dgl/tree/master/examples/pytorch/vgae can help.
    • Phase III: Using Large Language Models (LLMs)
      • Represent the Profile and Midcurve as text (json-brep) so that LLMs can be leveraged
      • Prepare an instruction-based fine-tuning dataset that can be used with Ludwig or the classical Hugging Face way of fine-tuning
  • Phase I has been implemented in a simplistic manner.

  • Phase II is looking for a good representation to store geometry/graph/network as text so that NLP (Natural Language Processing) techniques can be applied. The paper "Talk like a graph: encoding graphs for large language models" surveys many such representations and benchmarks them, but none of them looked appropriate for geometry. This phase needs to wait until such graph-to-graph networks are available.

  • Phase III: We leverage a geometry representation similar to that found in 3D B-rep (Boundary representation), but in 2D. It can be shown as:

{
    'ShapeName': 'I',
    'Profile': [(5.0, 5.0), (10.0, 5.0), (10.0, 20.0), (5.0, 20.0)],
    'Midcurve': [(7.5, 5.0), (7.5, 20.0)],
    'Profile_brep': {
        'Points': [(5.0, 5.0), (10.0, 5.0), (10.0, 20.0), (5.0, 20.0)],  # list of (x, y) coordinates
        'Lines': [[0, 1], [1, 2], [2, 3], [3, 0]],  # list of point ids (i.e. indices in the Points list)
        'Segments': [[0, 1, 2, 3]],  # list of line ids (i.e. indices in the Lines list)
    },
    'Midcurve_brep': {
        'Points': [(7.5, 5.0), (7.5, 20.0)],
        'Lines': [[0, 1]],
        'Segments': [[0]],
    },
}

Column information is as below:

  • ShapeName (text): name of the shape, just for visualization/reports.
  • Profile (text): list of point coordinates (x, y) representing the outer profile shape, typically closed.
  • Midcurve (text): list of point coordinates (x, y) representing the inner midcurve shape, typically open.
  • Profile_brep (text): dictionary in brep format representing the outer profile shape, typically closed.
  • Midcurve_brep (text): dictionary in brep format representing the inner midcurve shape, typically open.

Each segment is a continuous list of lines. In the case of, say, 'Midcurve T', as there is an intersection, we can treat each line as a separate segment. In the case of 'Profile O', there will be two segments, one for the outer lines and another for the inner lines. Each line is a pair of point ids and, for now, linear. Points is a list of coordinates (x, y), which can later become (x, y, z). A small sketch of resolving these indices follows below.
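To see how the three levels compose, here is a minimal sketch (a hypothetical helper, not project code) that resolves one segment into the coordinate polyline it denotes by following segment → line ids → point ids:

def segment_to_polyline(brep, segment_id):
    """Resolve one segment of a brep dict into a list of (x, y) coordinates."""
    points, lines = brep['Points'], brep['Lines']
    coords = []
    for line_id in brep['Segments'][segment_id]:
        start_id, end_id = lines[line_id]
        if not coords:  # the first line contributes both of its endpoints
            coords.append(points[start_id])
        coords.append(points[end_id])
    return coords

profile_brep = {
    'Points': [(5.0, 5.0), (10.0, 5.0), (10.0, 20.0), (5.0, 20.0)],
    'Lines': [[0, 1], [1, 2], [2, 3], [3, 0]],
    'Segments': [[0, 1, 2, 3]],
}
print(segment_to_polyline(profile_brep, 0))
# [(5.0, 5.0), (10.0, 5.0), (10.0, 20.0), (5.0, 20.0), (5.0, 5.0)]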

Once we have these brep representations of both the profile and the corresponding midcurve in text form, we can try various machine-translation approaches, LLM-based fine-tuning, or few-shot prompt engineering, as sketched below.
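As one illustration of what such a text-to-text training sample could look like (the field names and instruction wording are invented for this sketch; neither Ludwig nor Hugging Face mandates this exact schema), each profile/midcurve pair can be serialized into an instruction record:

import json

def to_instruct_record(shape):
    """Turn one profile/midcurve pair into an instruction-tuning record."""
    return {
        'instruction': 'Given a 2D profile in brep JSON, output its midcurve in brep JSON.',
        'input': json.dumps(shape['Profile_brep']),
        'output': json.dumps(shape['Midcurve_brep']),
    }

A list of such records, one per shape, can then be dumped as JSONL and fed to whichever fine-tuning toolchain is chosen.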

One major advantage of the text-based method over the image-based method is that the image output still has stray pixels, and cleaning them up is a complex task. The text-based method produces exact points; it may just give the odd spurious line, which can be removed easily.

For a wider discussion on the use of deep learning in CAD (Computer-aided Design), refer to the Notes.
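Circling back to Phase II, the sketch below shows how graph convolution layers from DGL could stand in for image convolutions in an encoder-decoder, as suggested above. It is a minimal sketch under strong assumptions: layer sizes and class names are arbitrary, and it predicts one (x, y) per input node, so it deliberately sidesteps the variable-output-size problem that remains the real research gap.

import torch
import dgl
from dgl.nn import GraphConv

class GraphEncoderDecoder(torch.nn.Module):
    """Encoder-decoder over a graph: compress per-node features, then expand."""
    def __init__(self, in_feats=2, hidden=32):
        super().__init__()
        self.enc1 = GraphConv(in_feats, hidden)
        self.enc2 = GraphConv(hidden, hidden // 2)  # bottleneck
        self.dec1 = GraphConv(hidden // 2, hidden)
        self.dec2 = GraphConv(hidden, in_feats)     # back to (x, y) per node

    def forward(self, g, xy):
        h = torch.relu(self.enc1(g, xy))
        h = torch.relu(self.enc2(g, h))
        h = torch.relu(self.dec1(g, h))
        return self.dec2(g, h)

# The 'I' profile as a bidirectional cycle graph; self-loops keep GraphConv happy.
src = torch.tensor([0, 1, 2, 3, 1, 2, 3, 0])
dst = torch.tensor([1, 2, 3, 0, 0, 1, 2, 3])
g = dgl.add_self_loop(dgl.graph((src, dst)))
xy = torch.tensor([[5.0, 5.0], [10.0, 5.0], [10.0, 20.0], [5.0, 20.0]])
print(GraphEncoderDecoder()(g, xy).shape)  # torch.Size([4, 2])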

Publications/Talks

Citations

Regarding the state of the art, the closest work related to this paper is from Kulkarni [28]. Kulkarni proposed MidcurveNN, an encoder-decoder neural network to extract mid-curves from polygonal 2D shapes. The principle is to train the network with both a pixel image of the polygonal shape and of the final desired mid-curves. Although in an early stage of research, the network is able to produce reasonably well the mid-curves of simple L-shaped polygons. The limitations of this work remain in the noisiness of the produced results. It has not been tested on a large diversity of shapes and is performed on the full shape globally, potentially requiring a high-resolution pixel grid for large models.

[28] Y. H. Kulkarni, MidcurveNN: Encoder-decoder neural network for computing midcurve of a thin polygon, in: Open Data Science Conference, 2019.

Disclaimer:

The author (yogeshkulkarni@yahoo.com) gives no guarantee of the results of the program. It is just a fun script; lots of improvements are still to be made, so don't depend on it at all.