The Mule

Unsupervised Learning and Neural Architecture Search (NAS)


Greetings everyone,

I am new to this forum and excited to be here. Another forum I frequent - The Fossil Forum - seems to run on the same software or platform as this one; I find that familiarity comforting, alongside the fact that there are, supposedly, many people interested in science here. Now for what I am creating this post for.

My topic here concerns a paper I am trying to reproduce for a challenge. I have about a month remaining before I need to finish my reproduction, and I am still a ways off from the solid foundation I need before I can begin running my tests.

The paper I am trying to reproduce - https://arxiv.org/pdf/2006.06936v1.pdf - is Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?, and it uses architecture embeddings pre-trained without supervision, as opposed to embeddings learned jointly with accuracies, to cluster the latent search space in such a way as to slightly increase the efficiency of the search algorithms used later on (this is all under the umbrella of Neural Architecture Search). In the authors' words,

"To achieve this, arch2vec uses a variational graph isomorphism autoencoder to learn architecture representations using only neural architectures without their accuracies. As such, it injectively captures the local structural information of neural architectures and makes architectures with similar structures (measured by edit distance) cluster better and distribute more smoothly in the latent space, which facilitates the downstream architecture search."

I have had many troubles thus far in tackling this reproduction: my understanding of the mathematics, my ability to use Google Cloud's computing services, my creativity in devising tests of robustness, and my judgment in selecting which parts of their code base (https://github.com/MSU-MLSys-Lab/arch2vec) to port from PyTorch to TensorFlow. And if I manage all of that, I still have to write everything up in a scholarly fashion, which I do not think will be too difficult in itself, but which will require much editing, and my available time is but a month. So I come to you today to look at one of these troubles: my understanding of the mathematics.

I will introduce the first part of their paper that I think is important: the Variational Graph Isomorphism Autoencoder. Before I do, a recap of Neural Architecture Search (NAS) is due (I am certainly not qualified to introduce it, but will do the best I can with the knowledge available to me).

In NAS, the goal is to generate and/or find the best neural architecture for a particular dataset; there is also the related goal of searching for the best-performing architecture within a dataset of architectures. All of this involves many steps. First, a neural architecture must be represented in some way. How would you naturally break down a CNN? Researchers use graph "cells" whose nodes are operations and whose edges connect nodes to one another. An operation can be something such as a 3x3 convolution or 3x3 max-pooling. A single architecture consists of some number of these nodes (how many depends on the particular architecture dataset; the two important ones in the paper are NAS-Bench-101 and NAS-Bench-201). From the former dataset, here is a representation:

[Screenshot: an example cell representation from NAS-Bench-101]

And from the latter dataset

[Screenshot: an example cell representation from NAS-Bench-201]

Mathematically, the "node by node" matrix is the upper triangular adjacency matrix \(\mathbf{A} \in \mathbb{R}^{N \times N}\), and the "node by operation" matrix can be represented by a one-hot operation matrix \(\mathbf{X} \in \mathbb{R}^{N \times K}\), where \(N\) is the number of nodes and \(K\) is the number of predefined operations (remember, like a 3x3 convolution). One thing this group of researchers does is augment the adjacency matrix, \(\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{A}^T\), to allow for bidirectional information flow, i.e. to "transfer original directed graphs into undirected graphs". I do not really understand this and would appreciate a contextual description of directed versus undirected graphs in the context of isomorphisms.
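To make this encoding concrete for myself, here is a toy example in plain NumPy (the three operation labels are made up purely for illustration):

```python
import numpy as np

# Toy cell with N = 3 nodes and K = 3 candidate operations.
# These labels are hypothetical, just to have something concrete:
OPS = ["input", "conv3x3", "output"]
N, K = 3, len(OPS)

# Upper-triangular adjacency matrix A: an edge i -> j exists only
# for i < j, so the cell is a directed acyclic graph.
A = np.zeros((N, N))
A[0, 1] = 1  # input -> conv3x3
A[1, 2] = 1  # conv3x3 -> output

# One-hot operation matrix X: row i marks node i's operation.
X = np.eye(K)[[0, 1, 2]]

# The augmentation from the paper: A~ = A + A^T symmetrises the
# matrix, turning the directed graph into an undirected one so that
# information can flow both ways during message passing.
A_tilde = A + A.T
```

My current reading is that the symmetrisation only changes how messages propagate in the encoder; the one-hot rows of \(\mathbf{X}\) still carry the per-node operation information.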

Moving on, the rest of NAS consists of developing embeddings that a search algorithm (random search, the reinforcement learning REINFORCE algorithm, Bayesian optimization) can use to find the best-performing architectures in the architecture datasets.
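To check my own understanding of this step, here is a minimal sketch of the simplest of those algorithms, random search, with a made-up dictionary standing in for a NAS benchmark's accuracy lookup (the names and numbers are purely hypothetical):

```python
import random

random.seed(0)

# Hypothetical stand-in for a NAS benchmark: each architecture id
# maps to a pre-computed validation accuracy.
benchmark = {f"arch_{i}": random.random() for i in range(1000)}

def random_search(benchmark, budget=50):
    """Query `budget` random architectures and return the best seen."""
    candidates = random.sample(list(benchmark), budget)
    return max(candidates, key=benchmark.get)

best = random_search(benchmark)
print(best, benchmark[best])
```

The more interesting algorithms (REINFORCE, Bayesian optimization) replace the uniform sampling with something that exploits structure in the embedding space, which is exactly where the pre-trained embeddings are supposed to help.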

The researchers here use a Variational Graph Isomorphism Encoder:

[Screenshot: the Variational Graph Isomorphism Encoder equations from the paper]

I am having some trouble mentally turning this into TensorFlow code. I feel somewhat lost at the moment, but will probably return to this post with a more strategic mindset in a little while. For now: are there any resources on graph theory, autoencoders, or graph isomorphism autoencoders that you think would benefit me? Alternatively, does anyone have a broad explanation of the intuition behind graph isomorphism autoencoders and of why they are used here instead of some other method?
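To start untangling it, I tried writing what I believe is the core computation in plain NumPy before attempting TensorFlow. This is a heavily simplified single-layer sketch based on my reading of the GIN update and the reparameterisation trick; the weight shapes, the single linear map standing in for the MLP, and the omission of batch normalisation and of stacking several layers are all my simplifications, not the paper's design:

```python
import numpy as np

rng = np.random.default_rng(0)

def gin_layer(A_tilde, H, W, eps=0.0):
    """One simplified GIN update: mix each node with the sum of its
    neighbours, then apply a linear map and a ReLU."""
    M = (1 + eps) * H + A_tilde @ H
    return np.maximum(M @ W, 0.0)

# Toy sizes: N nodes, K operations, hidden dim D, latent dim Z.
N, K, D, Z = 3, 3, 8, 4

# Symmetrised adjacency and one-hot operations for a 3-node chain.
A_tilde = np.array([[0., 1., 0.],
                    [1., 0., 1.],
                    [0., 1., 0.]])
X = np.eye(K)

# Randomly initialised weights (training is out of scope here).
W1 = rng.normal(size=(K, D))
H = gin_layer(A_tilde, X, W1)

# Variational head: two linear maps give the mean and log-variance
# of q(z | X, A); z is sampled with the reparameterisation trick.
W_mu = rng.normal(size=(D, Z))
W_logvar = rng.normal(size=(D, Z))
mu, logvar = H @ W_mu, H @ W_logvar
z = mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)
```

If this per-node version is roughly right, then the TensorFlow port should mostly be a matter of swapping `np` for `tf` and making the weights trainable variables.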

Thanks to everyone reading this and I hope I can find interesting things to contribute in the future, hopefully in a less verbose manner.

Have a nice day and stay safe!

 


I find the topic interesting! I'll try to read through the material and see if I can contribute. I must say though that this is outside my area of expertise so I can't guarantee that I'll be successful within a reasonable time. I do not have enough experience to intuitively address your questions. 

That said, there are plenty of members quite skilled in mathematics. If necessary we may be able to extract some mathematics specific question and post in that section of the forums.


I unfortunately am not of any help, but am curious to see how this advances and hope some people can help you out! This is a well-written account of what is up, although I do think you may need to elaborate on exactly what type of help you want; the resources on various topics you ask for may be too broad (but you never know!). It would probably also help if you explained in more detail what you have done so far regarding the TensorFlow implementation and where you cannot see a good way forward. Otherwise it may feel to some members that you are asking them to walk you through from start to finish (though as long as you continue providing well-written posts, that probably won't happen). Good luck!


@Ghideon and @Dagl1, thank you both for your input and ears. My intent for this thread now, seeing as it has gained some traction and I have a small audience of two, is to lay out what I can figure out on my own and, through this process, home in more precisely on the nature of my mental shortcomings pertaining to this task. Hopefully, documenting my progress with this reproduction will be informative and potentially useful to people who are also uncertain about how to proceed in such things. As @Dagl1 mentions, I want to avoid unnecessarily calling upon others to hold my hand along the way, and, quite frankly, I need to learn how to do this sort of thing myself.

I think that later tonight I will begin by presenting an overview of some pieces of the research. In addition, I may provide a tutorial-like explanation of how to use Google Cloud. Learning how to use this service is probably the first step since, unlike Google Colaboratory, which can stop at any point and is not too robust, the Cloud affords great precision and a wide array of services for understanding the machine learning jobs people run through it. Thank you for listening thus far, and I look forward to detailing further developments as they come.

On 12/15/2020 at 10:33 PM, The Mule said:

I do not really understand this and would appreciate a contextual description of un/directed graphs in context of isomorphisms.

I gave this a try but I haven't been successful in gathering a good description. How far have you got with the Weisfeiler-Lehman heuristic for graph isomorphism testing mentioned in the paper? As far as I know, it works for undirected graphs. Chapter 7 of this paper discusses the excessive running time:

https://www.lics.rwth-aachen.de/global/show_document.asp?id=aaaaaaaaabbtcqu

I am not in a position to speculate whether the researchers used undirected graphs for performance reasons and/or to cluster graphs with similar properties together. In any case, I find this interesting, so if you already have an answer, or anything to add to continue the discussion, I am interested.

 

In case you are interested, these links may help:

Expressive power of graph neural networks and the Weisfeiler-Lehman test: https://towardsdatascience.com/expressive-power-of-graph-neural-networks-and-the-weisefeiler-lehman-test-b883db3c7c49

Introduction to Weisfeiler-Lehman: https://davidbieber.com/post/2019-05-10-weisfeiler-lehman-isomorphism-test/
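To get a feeling for what the test actually does, here is a small sketch of 1-WL colour refinement in Python (simplified; a real implementation would use a proper injective relabelling rather than Python's built-in hash):

```python
from collections import Counter

def wl_colors(adj, rounds=3):
    """Run 1-WL colour refinement on an undirected graph given as an
    adjacency list; return the multiset of final node colours."""
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        colors = {v: hash((colors[v],
                           tuple(sorted(colors[u] for u in adj[v]))))
                  for v in adj}
    return Counter(colors.values())

# Two isomorphic triangles get identical colour multisets...
c3a = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
c3b = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
assert wl_colors(c3a) == wl_colors(c3b)

# ...while a 3-node path is distinguished from the triangle.
p3 = {0: [1], 1: [0, 2], 2: [1]}
assert wl_colors(c3a) != wl_colors(p3)
```

Equal multisets only mean the test could not distinguish the graphs; as the linked posts explain, 1-WL can fail on some non-isomorphic pairs.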

 

