[1] Sadia Afrin. Weight initialization in neural network, inspired by Andrew Ng, https://medium.com/@safrin1128/weight-initialization-in-neural-network-inspired-by-andrew-ng-e0066dc4a566, 2020.
[2] Armen Aghajanyan, Sonal Gupta, and Luke Zettlemoyer. Intrinsic dimensionality explains the effectiveness of language model fine-tuning. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pages 7319–7328, Online, Aug. 2021. Association for Computational Linguistics.
[3] Yuval Alaluf, Or Patashnik, and Daniel Cohen-Or. Only
a matter of style: Age transformation using a style-based
regression model. ACM Transactions on Graphics (TOG),
40(4), 2021.
[4] Yuval Alaluf, Omer Tov, Ron Mokady, Rinon Gal, and Amit Bermano. Hyperstyle: Stylegan inversion with hypernetworks for real image editing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18511–18521, 2022.
[5] Alembics. Disco diffusion, https://github.com/alembics/disco-diffusion, 2022.
[6] Omri Avrahami, Thomas Hayes, Oran Gafni, Sonal Gupta, Yaniv Taigman, Devi Parikh, Dani Lischinski, Ohad Fried, and Xi Yin. Spatext: Spatio-textual representation for controllable image generation. arXiv preprint arXiv:2211.14305, 2022.
[7] Omri Avrahami, Dani Lischinski, and Ohad Fried. Blended diffusion for text-driven editing of natural images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18208–18218, 2022.
[8] Omer Bar-Tal, Lior Yariv, Yaron Lipman, and Tali Dekel.
Multidiffusion: Fusing diffusion paths for controlled image
generation. arXiv preprint arXiv:2302.08113, 2023.
[9] Dina Bashkirova, Jose Lezama, Kihyuk Sohn, Kate Saenko,
and Irfan Essa. Masksketch: Unpaired structure-guided
masked image generation. arXiv preprint arXiv:2302.05496,
2023.
[10] Tim Brooks, Aleksander Holynski, and Alexei A Efros. Instructpix2pix: Learning to follow image editing instructions. arXiv preprint arXiv:2211.09800, 2022.
[11] John Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, (6):679–698, 1986.
[12] Z. Cao, G. Hidalgo Martinez, T. Simon, S. Wei, and Y. A. Sheikh. Openpose: Realtime multi-person 2d pose estimation using part affinity fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
[13] Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Chao Xu, and Wen Gao. Pre-trained image processing transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12299–12310, 2021.
[14] Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong
Lu, Jifeng Dai, and Yu Qiao. Vision transformer adapter
for dense predictions. International Conference on Learning
Representations, 2023.
[15] Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8789–8797, 2018.
[16] darkstorm2150. Protogen x3.4 (photorealism) official release, https://civitai.com/models/3666/protogen-x34-photorealism-official-release, 2022.
[17] Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. Advances in Neural Information Processing Systems, 34:8780–8794, 2021.
[18] Tan M. Dinh, Anh Tuan Tran, Rang Nguyen, and Binh-Son Hua. Hyperinverter: Improving stylegan inversion via hypernetwork. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11389–11398, 2022.
[19] Patrick Esser, Robin Rombach, and Björn Ommer. Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12873–12883, 2021.
[20] Oran Gafni, Adam Polyak, Oron Ashual, Shelly Sheynin, Devi Parikh, and Yaniv Taigman. Make-a-scene: Scene-based text-to-image generation with human priors. In European Conference on Computer Vision (ECCV), pages 89–106. Springer, 2022.
[21] Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H Bermano, Gal Chechik, and Daniel Cohen-Or. An image is worth one word: Personalizing text-to-image generation using textual inversion. arXiv preprint arXiv:2208.01618, 2022.
[22] Rinon Gal, Or Patashnik, Haggai Maron, Amit H Bermano, Gal Chechik, and Daniel Cohen-Or. Stylegan-nada: Clip-guided domain adaptation of image generators. ACM Transactions on Graphics (TOG), 41(4):1–13, 2022.
[23] Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao
Fang, Yongfeng Zhang, Hongsheng Li, and Yu Qiao. Clip-
adapter: Better vision-language models with feature adapters.
arXiv preprint arXiv:2110.04544, 2021.
[24] Geonmo Gu, Byungsoo Ko, SeoungHyun Go, Sung-Hyun
Lee, Jingeun Lee, and Minchul Shin. Towards light-weight
and real-time line segment detection. In Proceedings of the
AAAI Conference on Artificial Intelligence, 2022.
[25] David Ha, Andrew M. Dai, and Quoc V. Le. Hypernetworks.
In International Conference on Learning Representations,
2017.
[26] Heathen. Hypernetwork style training, a tiny guide, stable-diffusion-webui, https://github.com/automatic1111/stable-diffusion-webui/discussions/2670, 2022.