Abstract
A number of machine learning methods have recently been proposed to circumvent the high computational cost of gradient-based topology optimization solvers. By and large, these methods show limited generalizability to unseen boundary and external loading conditions, require prohibitively large datasets for training, and do not account for topological constraints on the predictions, which results in solutions with unpredictable connectivity. To address these limitations, we propose a design exploration framework for topology optimization that exploits the knowledge transfer capability of transfer learning methods and the generative power of conditional generative adversarial networks (GANs). We show that the proposed framework significantly exceeds the generalization ability of current methods. Moreover, the proposed architecture is capable of reusing the knowledge learned on low-resolution and computationally inexpensive samples, which notably reduces both the size of the required high-resolution training datasets and the demand on the computational infrastructure needed to generate the training data. Finally, we propose and evaluate novel approaches for improving the structural connectivity of the predicted optimal topology by including topological metrics in the loss function. We show that including the bottleneck distance between the persistence diagrams of the predicted and ground truth structures significantly improves the connectivity of the prediction. Together, our results reveal the ability of generative adversarial networks implemented in a transfer learning environment to serve as powerful and practical real-time design exploration tools in topology optimization.
Introduction
Topology optimization (TO) is an iterative procedure that outputs the optimum material distribution of a structure within a specified domain subject to prescribed boundary conditions and external loads [1,2]. Typically, gradient-based techniques combined with model parameter continuation are used to promote convergence to optimal solutions. This process often requires hundreds of iterations, each involving a complete finite element solve, and collectively, these iterations are computationally extremely demanding [3] for any non-trivial problem. For example, the SIMP solver presented in Ref. [4] needs about 9 min (1.85 s per iteration) to solve the simple MBB 2D rectangular beam with resolution 300 × 100, and the computational cost grows exponentially with the resolution and the dimension of the space. Since these methods produce one optimal solution for every formulation of the problem, they can in principle be used to explore the design space [5] to investigate how the optimal solution changes as the problem formulation changes. In addition, evaluating the effects of uncertainty in the input [6] substantially raises the complexity and computational cost.
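To make the iterative nature of SIMP concrete, the following minimal sketch shows the classic optimality-criteria density update at the core of solvers such as Ref. [4]; the finite element solve, sensitivity computation, and filtering that surround it in a complete solver are omitted, and the move limit and tolerance are conventional choices rather than values taken from Ref. [4].

```python
import numpy as np

def oc_update(x, dc, dv, volfrac, move=0.2):
    """One optimality-criteria (OC) density update of a SIMP solver.
    x: element densities; dc: compliance sensitivities (negative);
    dv: volume sensitivities (positive)."""
    l1, l2 = 0.0, 1e9
    # Bisection on the Lagrange multiplier of the volume constraint
    while (l2 - l1) / (l1 + l2) > 1e-3:
        lmid = 0.5 * (l1 + l2)
        # Multiplicative update clipped by move limits and box constraints
        xnew = np.clip(x * np.sqrt(-dc / (dv * lmid)),
                       np.maximum(0.0, x - move),
                       np.minimum(1.0, x + move))
        if xnew.mean() > volfrac:
            l1 = lmid  # too much material: increase the multiplier
        else:
            l2 = lmid
    return xnew
```

Each SIMP iteration performs a complete finite element solve to obtain dc before applying an update of this kind, which is precisely why hundreds of iterations become computationally demanding.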
In line with the increased popularity and capability of machine learning algorithms in engineering and science,2 various machine learning approaches have been proposed over the past few years to reduce the computational cost of gradient-based topology optimization. The next section provides a detailed discussion of current work in this area. Notably, these methods make their predictions much more efficiently than gradient-based optimization solvers, but they (1) universally require vast training datasets that are very difficult to produce in 2D and prohibitively difficult in 3D for any problem of real complexity; (2) have not been shown to generalize well outside of their training domain; and (3) invariably lack the ability to enforce the topological connectivity3 of their predictions. Moreover, most published methods have been tested only on simple 2D rectangular domains with one external load, with the exception, perhaps, of Ref. [9] as well as of our prior work [10]. Furthermore, only a few of the proposed algorithms have been shown to handle boundary conditions that are not part of the training data, and in all these cases, the unseen boundary and loading conditions are in the neighborhood of those used in the training data [11,12].
Without a doubt, a practical machine-learning based topology optimization tool that adequately supports the exploration of the associated design space, possibly in the presence of uncertainties [5], needs to address these limitations.
Generative Adversarial Networks (GANs) are one of the most promising developments in machine learning because they can learn low- and high-level patterns in datasets and use them to generate new data that bears a (statistical) resemblance to the original dataset. GANs have been used to generate realistic artificial images, and these same generative properties inspired researchers to use GANs to predict optimal topologies. However, standard GANs need very large training datasets to be able to generate new images, and their training can be notoriously difficult due to mode collapse and diminished gradients.
In this paper, we combine the generative power of conditional GANs with the knowledge transfer capabilities of transfer learning methods to predict optimal topologies for unseen boundary conditions. We show that the knowledge transfer capabilities embedded in the design of the proposed architecture significantly reduce the size of the training dataset compared to traditional deep learning neural or adversarial networks, and that our architecture significantly exceeds the generalization ability of current data-driven topology optimization methods. In fact, we show that by using the same amount of data as in Ref. [10], the proposed GANTL can generate structures with new types of boundary conditions as well as with up to a 10-fold increase in the number of external loads that are not included in the training data, thus further reducing the need to fine-tune the target model for unseen boundary and loading conditions. Specifically, we train GANTL on one external force on a rectangular domain and show that it maintains its predictive performance on one, two, five, and even ten external forces and on other domain geometries without additional retraining. Finally, we propose and evaluate approaches for improving the structural connectivity of the predicted optimal topology by including topological metrics in the loss function. Specifically, we show that adding a term to the loss function that depends on the bottleneck distance between the persistence diagrams of the predicted and ground truth structures significantly improves the connectivity of the prediction.
To the best of our knowledge, this is the first time that a topologically aware loss function is combined with a GAN architecture to tackle the connectivity issue of the predicted optimum structures. Together, our results reveal the ability of generative adversarial networks implemented in a transfer learning environment to serve as powerful and practical real-time design exploration tools in topology optimization.
Background
The significant computational cost of the traditional gradient-based TO prompted the investigation of various alternatives to the highly iterative process. For instance, parallelized algorithms running on high-performance computing platforms with a multigrid preconditioner were discussed in Ref. [13] to optimize “ultra large” scale shell models composed of over 11 M elements in 17 h using 800 compute cores. The parallelized level set method [14] solves a 96 × 48 × 24 cantilever beam in 45 min on a desktop computer with 4 Intel i7-6700 CPU cores and 16 GB memory. Moreover, the work described in Ref. [15] employs the Portable and Extendable Toolkit for Scientific Computing (PETSc) to produce the optimal topology starting from a cantilever beam with 480 × 240 × 240 elements in 14 h using 144 CPU cores.
The high computational cost of the gradient-based TO algorithms prompted the development of a number of machine learning methods that can make predictions almost instantaneously once trained. Recently proposed CNN-based algorithms can predict the optimum structure for low-resolution 2D domains (50 × 50, and 40 × 40, respectively) [16,17] and low-resolution 3D domains (24 × 12 × 12) [9].
Several recent papers in the realm of topology optimization were inspired by the ability of GANs to synthesize artificial images, such as human faces that do not belong to a real person [18]—see also the "Problem Formulation" section. Despite this generative power, the published TO approaches built with GANs do not explore this important aspect. For example, conditional Wasserstein generative adversarial networks (CWGAN), which minimize an approximation of the Earth Mover's distance [19], were used in Ref. [20] for 2D TO. That method was trained on 3024 cantilever beams (120 × 120) with design variables that include the volume fraction, penalty factor, and filter radius, and for what appears to be a unique set of boundary and loading conditions with a single fixed external load. The design variables are concatenated with the noise vector and fed to the CWGAN, which predicts the structure corresponding to the three design variables. Moreover, a two-stage hierarchical prediction–refinement GAN-based framework [21] is used to predict a low-resolution near-optimal structure and its corresponding refined structure in high resolution for a single heat sink and single heat source problem. Their training dataset contains 19,800 cases (9900 low-resolution (40 × 40) and 9900 high-resolution (160 × 160)), and the method achieves a 9.1% MSE for high-resolution predictions. It is worth noting that the optimal topologies for this type of heat sink problem have a tree-like structure rather than the graph-like structure of the optimal topologies seen in structural design. Similarly, Ref. [22] describes a two-stage refinement cGAN, trained with 64,000 low-resolution (32 × 32) cases and their high-resolution (128 × 128) counterparts, that predicts the high-resolution optimum structure for domains with a single force. Their training dataset uses a unique rectangular domain with a single but varying concentrated force and identical boundary conditions. We note that, while promising, these algorithms require large training datasets for the chosen boundary conditions, do not explore the generalization abilities of the proposed architectures, and do not have a mechanism to enforce or control the topology of their predictions.
Two recent papers do provide some evidence of generalizability. Specifically, the method described in Ref. [11] uses 64,000 cantilever beams with different numbers of external loads to train a deep CNN to predict the optimum structure for low-resolution (40 × 80) rectangular 2D domains. Their network uses the displacement and strain fields as inputs, and the authors show the ability to generate solutions for simply supported and two-span continuous boundary conditions that are not included in the training dataset, under the same type of external loading. Moreover, TopologyGAN [12], a conditional GAN, used around 40,000 2D beams (64 × 128) with 38 displacement boundary conditions, which appear to be variations of six different types, and a single external load to train its model. Their results display a nearly threefold reduction in MSE, relative to a baseline cGAN, on test problems involving boundary conditions that are not part of the training dataset and one external force. Based on the data presented in the paper, the specific differences between the test and training BCs are unclear. Nevertheless, these papers show the positive impact on prediction performance of encoding the explicit boundary conditions into physical fields that are used as network inputs. However, both methods require large training datasets, display the ability to generalize only in the neighborhood of the prescribed boundary and loading conditions, and often produce predictions with disconnected (tree-like) branches because they do not include topological considerations in their formulations.
In our recent paper [10], we introduced a transfer learning approach based on convolutional neural networks (CNNs) that was capable of handling 2D and 3D design domains of various shapes and topologies. We showed that the knowledge transfer capabilities of that network significantly reduced the size of the high-resolution training dataset compared to state-of-the-art deep learning networks; specifically, we observed a reduction by at least a factor of 6 compared to Ref. [21], which used 9900 cases for each of the low- and high-resolution datasets, compared to our 8000 low-resolution and 1500 high-resolution training cases. Consequently, we showed that the proposed architecture was capable of handling boundary conditions that were unseen by the source network by only fine-tuning the target network with a much smaller training dataset. While that method illustrates the knowledge transfer capabilities of transfer learning applied to topology optimization, it requires fine-tuning of the target network for unseen domains and boundary conditions, which, in turn, requires the generation of a new training dataset, albeit one much smaller than those required by deep learning methods.
To date, the only known methods that produce well-connected predictions of optimal topologies involve variants of the gradient-based TO methods, which trade computational efficiency for improved connectivity. For instance, Refs. [23,24] proposed an iterative topology optimization method using neural networks for single-material (TOuNN) and multi-material (MM-TOuNN) structures for the direct execution of TO. In this work, the NN weights and biases parameterize the density function, thus decoupling the density from the finite element mesh and producing a crisp and differentiable material interface and boundary. The NN predicts the density values at each iteration, and part of the associated sensitivity analysis is computed via backpropagation. Importantly, the iterations performed by this approach are still driven by FEA. Clearly, this variation of gradient-based TO is flexible in terms of the domains and boundary conditions it can tackle, but it remains computationally expensive and appears to produce solutions that are distinct from the optimal solutions generated by SIMP. For example, some test cases from Ref. [23] require twice as much time per iteration compared to SIMP [4].
In this paper, we integrate notions from persistent homology [25,26] to consider the connectivity of the predicted structure; our process is described in the "A Topology-Aware Loss Function" section. Generating predictions with improved topological properties has recently been explored in image processing. For example, Ref. [27] uses feature maps from pre-trained deep convolutional networks, obtained for the ground truth and the predicted image, to construct a topology-aware loss function tailored to linear structures such as those appearing in roads and material cracks. They showed that minimizing this loss function significantly improves the topological features of the predicted image. The work described in Ref. [28] used a modified Wasserstein distance between the persistence diagrams corresponding to the ground truth and the prediction to construct the loss function. This method improves the topological properties of the predictions compared to prior methods, as measured by the Betti number4 error between the prediction and the ground truth.
Contributions.
The data-driven topology optimization method proposed in this paper combines the generative power of conditional GANs with the knowledge transfer capabilities of transfer learning methods. We show that our approach outperforms state-of-the-art methods in terms of (1) the required size of the training dataset; (2) the generalization ability to unseen boundary conditions and numbers of external forces; and (3) the ability to control the connectivity of the predictions on a range of 2D rectangular and non-rectangular domains. One of the key contributions of this work is to show that augmenting the loss function with the bottleneck distance between the persistence diagrams of the ground truth and predicted structures produces a visible and significant improvement in the topological connectivity of the predicted structures, which can also be quantified in terms of the Betti number error.
Problem Formulation
The dataset used in this paper was generated with available codes implementing the well-known SIMP approach [30] for compliance-based topology optimization. However, our preliminary experiments indicate that the proposed approach maintains its prediction quality for other physics-based objective functions, such as stress-based objectives [31,32] or those based on heat diffusion [10,21].
Generative Adversarial Networks and Transfer Learning.
Generative Adversarial Networks are among the most exciting and promising innovations in machine learning: they learn the patterns and statistical distributions of the input data in order to synthesize new samples [33] that do not exist in the original data. The synthesizing ability of GANs has been used in semantic image editing [34], data augmentation [35], and style transfer [36]. Furthermore, one can use the patterns learned by GANs for other data-intensive tasks, such as classification [37].
GANs consist of a generator and a discriminator, both of which are neural networks. During training, the generator learns to generate plausible data, while the discriminator learns to distinguish the generator's fake data from real data and penalizes the generator for producing implausible results. As training progresses, the generator becomes better at generating plausible data, and the discriminator has a harder time distinguishing between real and fake data. Using GANs in practice requires overcoming several challenges, including those induced by vanishing gradients, mode collapse, and convergence failures. A good and recent review of GAN algorithms, theory, and applications can be found in Ref. [38].
Deep neural networks achieve promising results on image classification problems as long as large datasets are available for training; for example, Xception [39] is trained on 350M images. However, many practical use cases do not have the abundance of data needed to train these networks. This is where transfer learning comes in.
As the name implies, transfer learning reuses the knowledge learned by a source network to improve the performance of a target network trained on a different but related task. Transfer learning can significantly increase network performance in situations where data is scarce [40].
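The basic mechanics are simple to state in code. The sketch below is a minimal Keras illustration, assuming source and target models whose first layers match one-to-one; the function name and the layer-matching scheme are hypothetical and serve only to show the reuse-and-freeze idea.

```python
from tensorflow.keras import Model

def transfer_knowledge(source: Model, target: Model, n_reused: int) -> Model:
    """Copy the weights of the first n_reused layers from the source model
    to the target model and freeze them, so that subsequent fine-tuning
    only updates the newly added layers."""
    for src, tgt in zip(source.layers[:n_reused], target.layers[:n_reused]):
        tgt.set_weights(src.get_weights())
        tgt.trainable = False  # keep the transferred knowledge fixed
    return target
```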
Recent results of applying transfer learning in conjunction with generative adversarial networks [41–43] show that transferring pre-trained knowledge can effectively reduce the amount of training data needed for image generation. In addition, this knowledge increases the GAN convergence rate and mitigates convergence failures due to the well-known mode collapse [33].
Proposed Generative Adversarial Network Architecture.
As mentioned earlier, GANs consist of a generator and a discriminator, and in our case, each contains a source as well as a target neural network. The overall architecture of the source generator is inspired by Ref. [44] and separates the subnet handling feature extraction from the one handling synthesis. The architecture proposed in Ref. [44] was shown to be capable of synthesizing very high-quality images. The synthesis subnet is responsible for generating the optimum structure from a noise vector using the feature maps obtained by the feature extraction subnet. The latter is inspired by the SE-ResNet introduced in Ref. [45], but we remove the addition layer [46] from the SE-Res block and use a concatenation layer instead to increase the information flow between the layers and reuse the extracted features [47].
The architecture of our source generator is shown in Fig. 1, and the corresponding input data are described in the next section. The synthesis subnet uses a series of synthesis blocks, described in Fig. 2, that combine the feature maps of the boundary conditions with those of the previous layer and determine how much feature information should be retained for the downstream blocks. The SE-Concat block contains a squeeze-and-excitation (SE) block [45], which improves channel interdependencies and determines the important features of the input, as shown in Fig. 2. Unlike typical networks, which weigh all feature maps equally, the SE block learns to weigh each feature map according to its importance to, and effect on, the final predictions. It is shown in Ref. [45] that adding SE blocks can boost model performance at almost no computational cost.
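For concreteness, a minimal Keras sketch of the SE-Concat block is shown below. The reduction ratio of 16 is the conventional choice from Ref. [45], and the exact layer sizes are assumptions of this sketch rather than the values used in Fig. 2.

```python
import tensorflow as tf
from tensorflow.keras import layers

def se_concat_block(x, ratio=16):
    """Squeeze-and-excitation followed by concatenation (a sketch).
    The residual addition of SE-ResNet is replaced by concatenation
    to increase the information flow between layers [47]."""
    c = x.shape[-1]
    # Squeeze: collapse the spatial dimensions into one descriptor per channel
    s = layers.GlobalAveragePooling2D()(x)
    # Excitation: learn per-channel importance weights [45]
    s = layers.Dense(c // ratio, activation="relu")(s)
    s = layers.Dense(c, activation="sigmoid")(s)
    s = layers.Reshape((1, 1, c))(s)
    # Recalibrate the input feature maps channel-wise
    y = layers.multiply([x, s])
    # Concatenate the recalibrated maps with the input instead of adding them
    return layers.Concatenate()([x, y])
```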
The source discriminator is built using convolutional layers and a dense layer that decides whether a given optimum structure is real or was produced by the generator, as shown in Fig. 3. The discriminator's inputs are a structure and its corresponding boundary information, and its output is 0 for fake structures and 1 for real structures.
The target generator is built by adding a convolutional transpose layer and four convolutional layers on top of the source generator to upsample the output of the source network to the high resolution. Before reusing the source generator in the target GAN, we remove its last two layers. Furthermore, we add a downsampling function at the front of the target generator to reduce the high-resolution input to the low-resolution input required by the pre-trained generator. The architecture of the target discriminator is very similar to that of the source discriminator, with the exception of an additional convolutional layer and a convolutional transpose layer tasked with matching the output resolution. The structures of the target generator and discriminator are shown in Fig. 4. We built and trained our GAN model using the open source TensorFlow library [48].
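The sketch below illustrates how such a target generator can be assembled from a pre-trained source generator. It is a simplified illustration under several assumptions: the source generator is treated as a model with a single image-like input, the downsampling is performed with a Resizing layer, and the filter counts and kernel sizes are hypothetical; the actual layer parameters are those shown in Fig. 4.

```python
from tensorflow.keras import layers, Model

def build_target_generator(source_gen, low_res=(40, 80), hi_res=(200, 400)):
    # Reuse the pre-trained source generator without its last two layers
    trunk = Model(source_gen.input, source_gen.layers[-3].output)
    inp = layers.Input(shape=(*hi_res, 5))
    # Downsample the high-resolution input to the source resolution
    x = layers.Resizing(*low_res)(inp)
    x = trunk(x)
    # One transposed convolution upsamples back to the target resolution,
    # followed by four convolutional layers
    x = layers.Conv2DTranspose(64, 4, strides=hi_res[0] // low_res[0],
                               padding="same", activation="relu")(x)
    for filters, act in [(64, "relu"), (32, "relu"), (16, "relu"), (1, "sigmoid")]:
        x = layers.Conv2D(filters, 3, padding="same", activation=act)(x)
    return Model(inp, x)
```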
Dataset.
We used a SIMP-based topology optimization code [4] with its default convergence settings to generate our training and test cases. The code was modified to automate data generation for different domains and boundary conditions, and the inputs to the SIMP algorithm were the discretized domain geometry, volume fraction, filter radius, load, and displacement boundary conditions. The volume fraction and the radius of the sensitivity filter were prescribed to be 0.5 and 1.5, respectively.
The magnitudes of the force components were sampled uniformly at random within the range [−100, 100] N. The location of the external load was selected by uniform random sampling within prescribed ranges along the coordinate axes. For example, the force components Fx and Fy for a 2D beam domain are applied at locations within the ranges [0, bx] and [0, by], respectively, where bx and by are the beam dimensions in the x and y directions. We also used discrete random sampling to select one of the displacement boundary conditions shown in Figs. 5(a) and 5(b).
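A minimal sketch of this sampling scheme is given below; the function names and the bc_catalog structure are hypothetical, and only the sampling logic mirrors the description above.

```python
import numpy as np

rng = np.random.default_rng()

def sample_load(bx, by):
    """Sample one external load: components uniform in [-100, 100] N,
    application point uniform within the beam extents."""
    fx, fy = rng.uniform(-100.0, 100.0, size=2)
    px, py = rng.uniform(0.0, bx), rng.uniform(0.0, by)
    return (px, py), (fx, fy)

def sample_bc(bc_catalog):
    # Discrete uniform choice among the displacement BCs of Figs. 5(a)-5(b)
    return bc_catalog[rng.integers(len(bc_catalog))]
```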
It was shown in Ref. [12] that using physical fields, such as the von Mises stress and the strain energy density, instead of explicitly representing the boundary conditions increases the prediction accuracy. The authors hypothesized that these particular physical fields are rich in information that correlates the explicit boundary conditions with the final optimal structure. Thus, we adopt these same fields as input to our architecture, since they allow us to "distribute" the effects of the prescribed boundary conditions throughout the structure. Moreover, we add the force channels to allow the network to discover relationships between the force characteristics and the physical fields and to focus on the structural patterns that serve as effective load paths for the prescribed boundary conditions.
Our inputs are captured in five equally sized channels for the 2D cases represented as matrices:
First channel: Initial density value for each unit cell.
Second channel: Force component in the x direction at each node.
Third channel: Force component in the y direction at each node.
Fourth channel: The von Mises stress of the initial domain.
Fifth channel: The strain energy density of the initial domain, defined as W = (1/2) Σi σi εi, where σi and εi are the stress and strain in direction i, respectively.

The stress and strain fields of the initial domain were computed with the SolidsPy library [49] in Python. The dimensions of all channels are nodex × nodey, and the first channel was padded to reach this dimension.
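As an illustration, the sketch below assembles the five input channels. It assumes the stress and strain components are already available on the nodal grid (e.g., from a SolidsPy solve); the component-wise expansion of W is the standard 2D form of the expression above, and the padding mode is a hypothetical choice.

```python
import numpy as np

def strain_energy_density(s_xx, s_yy, s_xy, e_xx, e_yy, e_xy):
    # W = (1/2) sum_i sigma_i * eps_i, expanded in 2D components
    return 0.5 * (s_xx * e_xx + s_yy * e_yy + 2.0 * s_xy * e_xy)

def build_input_channels(density, fx, fy, von_mises, w):
    """Stack the five equally sized channels (rho, Fx, Fy, von Mises, W).
    The element-wise density grid is padded by one row and one column to
    match the nodal grid dimensions node_x x node_y."""
    density = np.pad(density, ((0, 1), (0, 1)), mode="edge")
    return np.stack([density, fx, fy, von_mises, w], axis=-1)
```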
Based on the information given earlier, a low-resolution dataset (11,000 cases) and a much smaller high-resolution dataset (1500 cases) were generated to train the source and target GAN, respectively.
Training Process and Performance Metrics.
We followed a training process similar to that of Ref. [33] for rectangular domains with the boundary conditions shown in Figs. 5(a) and 5(b), although additional strategies for training GANs are discussed in Ref. [50]. The loss functions for both the generator and the discriminator use binary cross-entropy with logits, and Adam is used as the optimizer. We used the same criteria described in Ref. [10] to evaluate the quality of the GAN predictions.
The evolution of the loss function during training indicates that the proposed GAN is stable. Based on our experiments, this stability is due to the fact that we use binary cross-entropy (BCE) with the logistic unit (logits) rather than BCE alone.
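In TensorFlow terms, this choice corresponds to the standard adversarial losses built on BinaryCrossentropy(from_logits=True), as sketched below; the learning rate shown is an assumption rather than a value reported here.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_logits, fake_logits):
    # Real structures are labeled 1 and generated structures 0
    return (bce(tf.ones_like(real_logits), real_logits)
            + bce(tf.zeros_like(fake_logits), fake_logits))

def generator_loss(fake_logits):
    # The generator is rewarded when the discriminator labels its output real
    return bce(tf.ones_like(fake_logits), fake_logits)

# Adam is used for both networks; the learning rate is an assumed value
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)
```

Operating on logits keeps the discriminator loss away from the numerically saturated regions of the sigmoid, which is the stability mechanism discussed above.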
Experiments and Evaluation
The data generation and GAN training were performed on the UConn HPC facility running the Red Hat Enterprise Linux 7 (RHEL7) operating system. All predictions were performed on a Dell workstation with an Intel Xeon E5-2650 v3 CPU, 64 GB RAM, and an Nvidia Quadro K2200 4 GB GPU.
Evaluation With Traditional Cross-Entropy Loss.
We used a freely available MATLAB code [4] to generate 11,000 low-resolution cases (40 × 80) to train the generator and discriminator of the source GAN, and 1500 high-resolution cases, according to the scenarios outlined in Table 1, to train the generator and discriminator of the target GAN. We augmented the training data by mirroring each SIMP-generated case with respect to the x and y axes.
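A sketch of this mirroring augmentation is given below. Negating the force component normal to the mirror axis is an assumption of this sketch (a mirrored load must remain physically consistent); the channel ordering follows the five-channel layout described in the "Dataset" section.

```python
import numpy as np

def mirror_cases(channels, structure):
    """Augment one SIMP case by mirroring about the x and y axes.
    channels: (H, W, 5) input tensor; structure: (H, W) optimal density."""
    cases = [(channels, structure)]
    # Mirror about the x axis (flip rows): Fy (channel 2) changes sign
    ud = channels[::-1].copy()
    ud[..., 2] *= -1.0
    cases.append((ud, structure[::-1]))
    # Mirror about the y axis (flip columns): Fx (channel 1) changes sign
    lr = channels[:, ::-1].copy()
    lr[..., 1] *= -1.0
    cases.append((lr, structure[:, ::-1]))
    return cases
```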
Training: The source GAN is trained only once, and the same trained source network is reused for the target models trained at higher resolutions. As with the training of deep neural networks, the target model must be trained for each of the high resolutions being considered, albeit with a much smaller dataset thanks to the knowledge transferred from the source network. The training of our source and target GANs used two randomly selected boundary conditions corresponding to those shown in Figs. 5(a) and 5(b). All cases used for training had a rectangular domain and a single externally applied force with randomized location, orientation, and magnitude within the prescribed ranges.
Testing: The remaining three boundary conditions illustrated in Figs. 5(c)–5(e) are therefore unseen to both the source and target GANs and are used exclusively for testing. Recall that training was completed with only one external force applied, while we tested cases with one, two, five, and ten external forces. Moreover, we evaluated the model trained exclusively on rectangular domains on L-shaped as well as curved beams. We extensively tested the performance of our predictions and the generalization abilities of our GANTL framework in the following testing scenarios.
Discussion
Figures 6–11 show the side-by-side comparisons for all testing scenarios summarized in Table 1, and the corresponding evaluation data, captured by the performance metrics described in the "Training Process and Performance Metrics" section, are collected in Tables 2–5. The low-resolution (40 × 80) predictions in all these figures show the prediction of our source GAN and the comparison with the equivalent low-resolution ground truth provided by SIMP.
Table 1. Summary of the testing scenarios.

| Scenario # | BCs from Figure # | # of external forces | Predictions in Figure # | Data in Table # |
|---|---|---|---|---|
| 1 | 5(a) and 5(b) | 1 | 6 | 2 |
| 2 | 5(c)–5(e) | 1 | 7 | 3 |
| 3 | 5(a) and 5(b) | 2 | 8 | 4 |
| 4 | 5(c)–5(e) | 2 | 9 | 5 |
| 5 | 5(a) and 5(b) | 5 and 10 | 10 | – |
| 6 | 5(a) and 5(b) | 1 | 11 | – |
Note: Training was performed with one external force, while testing was completed for 1, 2, 5, and 10 randomized external forces and unseen domain geometries.
Table 2. Performance metrics for testing scenario #1 (seen BCs, one external force).

| Design domain | Resolution | Number of test cases | MSE | Binary accuracy | Compliance error | Compliance error std. |
|---|---|---|---|---|---|---|
| From Fig. 6(a) (source network) | 40 × 80 | 7473 | 0.48% | 98.51% | 0.32% | 1.7% |
| From Fig. 6(b) | 80 × 160 | 350 | 1.92% | 96.66% | 1.03% | 3.3% |
| From Fig. 6(c) | 120 × 160 | 463 | 2.41% | 96.36% | 1.08% | 3.8% |
| From Fig. 6(d) | 120 × 240 | 388 | 2.28% | 96.56% | 0.98% | 1.9% |
| From Fig. 6(e) | 160 × 320 | 350 | 2.90% | 96.06% | 1.14% | 2.6% |
| From Fig. 6(f) | 200 × 400 | 350 | 2.77% | 96.20% | 1.45% | 1.9% |
| Average | – | – | 2.12% | 96.72% | 1.00% | 2.5% |
Table 3. Performance metrics for testing scenario #2 (unseen BCs, one external force).

| Design domain | Resolution | Number of test cases | MSE | Binary accuracy | Compliance error | Compliance error std. |
|---|---|---|---|---|---|---|
| From Fig. 7(a) (source network) | 40 × 80 | 1000 | 8.24% | 88.52% | 8.31% | 11.4% |
| From Fig. 7(b) | 80 × 160 | 500 | 8.81% | 88.91% | 6.65% | 9.43% |
| From Fig. 7(c) | 120 × 160 | 500 | 8.67% | 89.64% | 8.09% | 12.43% |
| From Fig. 7(d) | 120 × 240 | 500 | 8.92% | 89.32% | 7.03% | 11.92% |
| From Fig. 7(e) | 160 × 320 | 500 | 8.97% | 89.65% | 6.91% | 11.80% |
| From Fig. 7(f) | 200 × 400 | 250 | 10.07% | 88.53% | 9.47% | 13.51% |
| Average | – | – | 8.94% | 89.10% | 7.74% | 11.74% |
Table 4. Performance metrics for testing scenario #3 (seen BCs, two external forces).

| Design domain | Resolution | Number of test cases | MSE | Binary accuracy | Compliance error | Compliance error std. |
|---|---|---|---|---|---|---|
| From Fig. 8(a) (source network) | 40 × 80 | 1000 | 5.84% | 90.98% | 7.74% | 13.2% |
| From Fig. 8(b) | 80 × 160 | 500 | 7.50% | 90.04% | 16.12% | 20.85% |
| From Fig. 8(c) | 120 × 160 | 500 | 8.91% | 90.01% | 9.74% | 16.88% |
| From Fig. 8(d) | 120 × 240 | 500 | 7.67% | 90.49% | 9.50% | 15.30% |
| From Fig. 8(e) | 160 × 320 | 500 | 8.32% | 90.13% | 8.86% | 14.62% |
| From Fig. 8(f) | 200 × 400 | 250 | 8.42% | 90.09% | 10.92% | 16.52% |
| Average | – | – | 7.77% | 90.29% | 10.48% | 16.22% |
Table 5. Performance metrics for testing scenario #4 (unseen BCs, two external forces).

| Design domain | Resolution | Number of test cases | MSE | Binary accuracy | Compliance error | Compliance error std. |
|---|---|---|---|---|---|---|
| From Fig. 9(a) (source network) | 40 × 80 | 1000 | 9.40% | 87.95% | 10.25% | 14.1% |
| From Fig. 9(b) | 80 × 160 | 500 | 10.60% | 86.89% | 12.73% | 16.98% |
| From Fig. 9(c) | 120 × 160 | 500 | 10.81% | 87.20% | 11.83% | 16.61% |
| From Fig. 9(d) | 120 × 240 | 500 | 10.85% | 87.18% | 11.21% | 14.87% |
| From Fig. 9(e) | 160 × 320 | 500 | 11.68% | 86.66% | 11.96% | 16.11% |
| From Fig. 9(f) | 200 × 400 | 250 | 11.86% | 86.47% | 11.23% | 15.81% |
| Average | – | – | 10.86% | 87.05% | 11.53% | 15.74% |
The performance data show that our network achieves better prediction performance than the state-of-the-art deep learning-based methods. In addition, none of the state-of-the-art methods has been shown to generate structures for unseen boundary conditions, unseen external loading conditions, and unseen initial geometries.
Specifically, for scenario #1 (seen BCs with one randomly selected external force), the average MSE is 2.12% and the average binary accuracy is 96.72%, with a 1.00% average compliance error and a 2.5% standard deviation. This implies that our network captures very well the changes in the topological patterns induced by a continuous variation in the location, direction, and magnitude of the external force.
For scenario #2 (unseen BCs with one randomly selected external force), the average MSE is 8.9% and the average binary accuracy is 89.1%, with an average compliance error of 7.7% and a standard deviation of 11.7%. Given that the GAN was trained on only the two boundary conditions illustrated in Figs. 5(a) and 5(b), this test scenario indicates that the GANTL network does capture some high-level dependence of the topological patterns on the discrete boundary conditions. We note that the errors primarily come from incorrect predictions of the thin members of the ground truth, which is not surprising given that the network learned the topological patterns from boundary conditions that are dissimilar from those considered in this scenario.
The next two test scenarios use an unseen loading case, i.e., two randomly selected external forces, with both seen and unseen boundary conditions, as described earlier. Recall that all training cases considered only one externally applied force. We observe that the performance metrics for scenarios #3 and #4 from Table 1, which include partially or totally unseen information, are comparable: average MSE (7.8% versus 10.9%), binary accuracy (90% versus 87%), average compliance error (10.5% versus 11.5%), and standard deviation (16.2% versus 15.7%).
In order to evaluate the performance of our predictions as the number of external forces increases, we included test cases with five and ten external forces for domains with a 200 × 400 resolution and the boundary conditions displayed in Figs. 5(a) and 5(b). The side-by-side comparison between our predictions and the SIMP ground truths is shown in Fig. 10. The average binary accuracy of the 250 test cases having five and ten external forces is around 84% and 82%, respectively, only slightly lower than the data for scenarios #3 and #4. These figures indicate that as the number of external forces increases, the confidence of our network in predicting the thin members decreases slightly, as one would expect. This behavior follows the known limitations of extrapolating approximations of non-linear functions, including the extrapolation of neural networks outside of the support of the training distribution [51]. In some sense, this is conceptually similar to teaching a student the basic concepts of differentiation and integration and then asking the student to solve differential equations: the student may solve simple ODEs but would have a much harder time with complex ODEs or PDEs. Nevertheless, it is worth noting that a 10-fold increase in the number of external forces compared to the training dataset only decreases the binary accuracy of the predictions by about 10%.
For the last scenario, we tested the performance of the proposed framework on unseen initial domains with a 200 × 400 resolution and the boundary conditions displayed in Figs. 5(a) and 5(b). The new domains include an L-shaped and a curved beam, and the side-by-side comparison between the prediction and the ground truth is shown in Fig. 11. As mentioned above, our model was trained only on simple rectangular domains with one external force. The average binary accuracy of the 100 test cases for the L-shaped and the curved beams is 90% and 83.5%, respectively. These figures show that the model can generate reasonably accurate structures for domains that are substantially different from the rectangular domains used in all training cases.
Collectively, test scenarios #1–6 demonstrate that the proposed network is capable of high-quality predictions of the overall optimal structure and clearly illustrate the generalization ability of our proposed GANTL. These test scenarios also indicate that the GANTL network makes accurate predictions of the thicker members of the optimal structure. At the same time, we observe that our network has somewhat limited confidence in predicting the thinnest members of the optimal structures, particularly for completely unseen boundary and loading conditions, and that this confidence decreases as the number of forces increases. This, however, is an expected behavior that is consistent with observations from other machine learning-based predictors, and it shows that the topological patterns learned by GANTL from the single external force used in training change as the number of external forces increases. Nevertheless, our results also suggest that constructing a training dataset that includes a reasonable sample of practical loading cases would rapidly improve the quality of the network's predictions for design exploration.
Finally, it is worth noting that GANTL's average prediction time is 0.026 s for a 200 × 400 resolution, which is in the realm of real-time prediction rates; for the same resolution, the SIMP algorithm requires 350 s to produce the optimal solution. Such efficiency is, in fact, shared by most other data-driven algorithms. However, the combination of real-time prediction and improved generalizability enables the use of GANTL as a 2D design space exploration tool with reasonably good accuracy.
A related and important problem is that of choosing the distribution of the training cases across the design space of interest. Even though we used random sampling based on a uniform distribution, it makes sense to sample more densely those regions of the design space in which the optimal solutions are more sensitive to changes in the domain geometry and in the boundary and loading conditions. These investigations could benefit from the extensive body of work on adaptive sampling in the engineering optimization community, although reaching meaningful insights would require investigations that are outside the scope of this paper.
A Topology-Aware Loss Function.
One common challenge of all published machine learning approaches to topology optimization, which for now exclusively use an image-based approach, is predicting connected structural members. This is important because graph-like optimal topologies are preferred in structural design over those with tree-like (dangling) members, as the latter are simply unable to carry loads. A quick review of the existing literature shows that all published machine learning-based TO approaches simply ignore the topological connectivity of their predictions. Notably, even the best-performing methods attempt to improve the quality of the predictions merely by diversifying the training set, an approach that has not produced consistent improvements in connectivity to date. Instead, this strategy increases the number of required training cases and, therefore, the computational cost of generating the dataset and of training the network, and it may also impact the convergence of the training process.
Motivated by this observation, we propose a promising alternative for stimulating the network to produce well-connected, and therefore load-carrying, structural members. Specifically, we construct and evaluate a custom loss function based on concepts from persistent homology [25,26], specifically on topological persistence diagrams. Informally, these persistence diagrams record the "birth" and "death" of topological features (e.g., connected components) as the structure evolves according to predefined filtrations. By correlating the persistence diagram of the ground truth with that of the prediction, we push the network to make predictions that are topologically similar to the ground truth.
The comparison of two persistence diagrams is typically performed with one of two specialized "distances," namely the Wasserstein and the bottleneck distance [52]. It is important to note that the Wasserstein distance can be seen as an lp-type metric, while the bottleneck distance can be considered an l∞-type metric. In more practical terms, the former accounts for all points in the persistence diagrams, so it is more sensitive to noise, which is prevalent in machine learning predictors, while the latter captures only the global structure of these diagrams.
Given that a predictor of optimal topology produces material occupancy data that can be interpreted as noisy, it makes sense to use the more robust of the two distances between persistence diagrams. Therefore, in our experiments, we implemented the bottleneck distance between the persistence diagrams corresponding to the ground truth and the network prediction. This distance can be thought of as the shortest "distance" b for which there exists an optimal matching between the points of the two persistence diagrams such that any pair of matched points are at distance at most b [25,53].
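A minimal sketch of this computation, using the GUDHI library (an assumption of this sketch; any persistent homology package with cubical complexes and bottleneck distances would do), is shown below. The resulting distance can then be added to the generator's loss as a weighted penalty term.

```python
import numpy as np
import gudhi  # assumed: the GUDHI library with cubical complex support

def persistence_diagram(density, dim=0):
    """Persistence diagram of a 2D density image, filtered by (1 - density)
    so that solid material enters the filtration first."""
    cc = gudhi.CubicalComplex(top_dimensional_cells=1.0 - np.asarray(density))
    cc.persistence()
    diag = cc.persistence_intervals_in_dimension(dim)
    # Truncate essential (infinite) intervals at the maximum filtration
    # value so that the bottleneck matching stays finite
    return np.minimum(diag, 1.0)

def bottleneck_term(pred, truth, dim=0):
    # The shortest b admitting a matching that moves no point more than b
    return gudhi.bottleneck_distance(persistence_diagram(pred, dim),
                                     persistence_diagram(truth, dim))
```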
We reuse the already trained source GAN without the topological loss function and only fine-tune the target GAN for our highest resolution (200 × 400) domains with the new topology-aware loss function. We note that these higher resolution problems also produced the predictions with the highest number of disconnected members in the experiments detailed earlier. Prior to displaying the predicted structures, we round the density values to the nearest integer, either 0 or 1. We tested the performance of the topology-aware loss function for both seen and unseen displacement boundary conditions.
As shown in Fig. 12, the proposed topology-aware loss function significantly reduces the number of disconnected members for both seen and unseen boundary conditions and also eliminates many small holes in our predictions. This visible improvement can be quantified as a decrease in the average Betti number error, which directly measures the difference in the number of handles between the ground truth and the predicted structures, from 1.54 to 0.768.
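For completeness, the sketch below shows one way to compute Betti numbers, and hence the Betti number error, for binary 2D structures; the use of scipy.ndimage labeling and the implied connectivity conventions are assumptions of this sketch rather than details of our implementation.

```python
import numpy as np
from scipy import ndimage

def betti_numbers_2d(img):
    """b0 and b1 of a binary 2D structure. Note that the result depends on
    the 4- vs 8-connectivity convention chosen for solid and void."""
    img = np.asarray(img, dtype=int)
    # b0: number of connected components of the solid phase
    b0 = ndimage.label(img)[1]
    # b1: number of holes, i.e., bounded components of the void phase
    lbl, n = ndimage.label(1 - img)
    border = (set(lbl[0]) | set(lbl[-1]) | set(lbl[:, 0]) | set(lbl[:, -1])) - {0}
    b1 = n - len(border)
    return b0, b1

def betti_error(pred, truth):
    # Difference in the number of handles between prediction and ground truth
    return abs(betti_numbers_2d(pred)[1] - betti_numbers_2d(truth)[1])
```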
Comparing Generative Adversarial Network- and Convolutional Neural Network-Based Architectures.
To illustrate the advantages of a GAN architecture over one that uses a deep CNN, we compare the method proposed here with the one discussed in our recent paper [10], which uses a deep CNN within a transfer learning framework. We present two comparisons. First, we fine-tune the target CNN with the same dataset that the target GAN was trained on, namely rectangular domains with one external force. Second, we fine-tune the target CNN with 1500 samples of rectangular domains with five forces. For both test scenarios, the target GAN was trained on rectangular domains with one external force (as explained in the "Proposed Generative Adversarial Network Architecture" section). Moreover, the source networks of both the CNN- and GAN-based frameworks are trained with the same low-resolution dataset. The results illustrated in Fig. 13 show that GANTL outperforms the CNN-based transfer learning architecture in both scenarios. Furthermore, in the first scenario (third column of Fig. 13), the CNN-based network has difficulty extrapolating the information learned from the one-force case to the five-force case. It is also interesting to observe that the CNN-based architecture fine-tuned with five forces has a lower performance than the GAN architecture trained only on samples involving a single external force. We hypothesize that this lower performance is due to the fact that the source network of the CNN architecture was trained on single-force samples, and the features learned from these samples are sufficiently different from those needed by the domains with five forces.
Conclusions
We proposed a highly efficient and accurate generative design tool that combines the power of GANs with the efficiency provided by the knowledge reuse of transfer learning. The proposed method uses 11,000 low-resolution cases to train a source GAN, which is trained only once for all examples shown in this paper and whose knowledge is then transferred to a target GAN. Consequently, the training (fine-tuning) of the target network for high-resolution domains needs a much smaller dataset (1500 cases), and we note that our target GAN was also trained only once for each resolution. Transferring the knowledge learned on the low-resolution cases to the high-resolution cases allowed us to significantly reduce the number of high-resolution cases without losing prediction accuracy. A discussion of the associated computational costs of generating the training cases is presented in Ref. [10].
We produced numerous examples to show that the proposed GANTL approach can generate highly accurate predictions for seen boundary conditions, with a relatively small performance decrease for domains with unseen boundary conditions that were not part of the training data. In addition, while all training cases had a single external force applied to the domain, our test data also included external loadings with two, five, and ten external forces.
An analysis of these examples shows that the confidence of our network in predicting the thin members decreases as the number of forces and the domain resolution increase. This behavior is consistent with that observed for all image-based machine learning algorithms, including all data-driven topology optimization algorithms. Given the documented limitations of extrapolating approximations of non-linear functions outside of their sampled domain, published data-driven algorithms for topology optimization show a limited ability to generalize, and our experiments show that GANTL outperforms the state-of-the-art algorithms in terms of generalization ability. Specifically, the topological patterns learned by GANTL from training data using a single external force on rectangular domains allow the network to continue to produce meaningful predictions as the number of external forces increases and as the geometry of the domain changes, which no other published data-driven algorithm has demonstrated.
Motivated by these observations, we also proposed a novel and promising alternative for stimulating the topological connectivity of the predictions produced by the network by constructing a custom loss function that correlates the persistence diagram of the ground truth with that of the prediction. We showed that this topology-aware loss function adequately promotes predictions that are topologically similar to the ground truth and results in a significant improvement in the connectivity of the predicted structural members. We posit that enforcing topological properties of the optimal structures predicted by GANTL can be achieved by developing specialized loss functions that integrate appropriate topological measures. However, a detailed investigation of these important aspects is outside the scope of this paper.
Training GANs can be notoriously difficult, as discussed earlier, and our network is no exception. We addressed these challenges by employing binary cross-entropy (BCE) with the logistic unit (logits) rather than BCE alone. This modification helps the discriminator avoid loss values that are near zero and consequently improves the GAN's stability.
One key limitation in promoting predictions with topological properties similar to those of the ground truth structures is the added computational overhead of the topology-aware loss function. To compute this loss function, we have to compute the persistence diagrams of all ground truth structures and of the corresponding predicted structures, as well as their pairwise bottleneck distances. This, in turn, significantly increased the training time of our GAN model, although the ground truth diagrams were computed only once for the training set.
As we demonstrated in Ref. [10], transfer learning not only significantly reduces the size of the dataset required to train the networks but also improves the generalization ability of deep learning models, as long as the target network is fine-tuned with domains and boundary conditions that are not part of the training dataset of the source model. Importantly, by combining the generative power of GANs with the knowledge reuse of transfer learning, we train both the source and target networks only once and therefore eliminate the need to fine-tune the target network every time we want to explore new boundary conditions. This is a critical feature of the proposed GANTL that significantly reduces the training time and directly improves the practicality of the approach for design space exploration.
To the best of our knowledge, this is the first attempt to combine transfer learning with generative adversarial networks in the area of topology optimization. Furthermore, our proposed topology-aware loss function pioneers the integration of powerful topological measures into the TO predictions made by machine learning algorithms, which will eventually lead to practical, powerful, and interactive design exploration tools based on topology optimization.
Footnotes
2. Neural network-based models have very recently been developed to solve one of the most difficult open pattern matching problems in science today, namely that of protein folding structure prediction [7], with impressive results.
3. Informally, topological connectivity can be defined in terms of "one-sided" structural branches that do not serve as load paths. The concept can be formalized in terms of cycles of the associated graphs [8].
4. The kth Betti number captures the number of k-dimensional holes of a topological surface.
Acknowledgment
This work was supported in part by the National Science Foundation grants IIS-1526249 and CMMI-1635103. The responsibility for any errors and omissions lies solely with the authors.
Conflict of Interest
There are no conflicts of interest.