
DeepVRSketch+: Personalized 3D Garment Creation via AR/VR Sketching

A novel framework that lets everyday users create high-quality 3D garments through intuitive 3D sketching in AR/VR, powered by a conditional diffusion model and a new dataset.

1. Introduction & Overview

This work addresses a significant gap in the democratization of digital fashion creation. As AR/VR technologies become mainstream consumer electronics, the tools for creating 3D content in these immersive spaces remain complex and inaccessible to non-experts. The paper introduces DeepVRSketch+, a novel framework that lets everyday users design personalized 3D garments by hand through intuitive 3D sketching in AR/VR environments. The core innovation lies in translating imprecise, user-drawn 3D sketches into high-fidelity, wearable 3D garment models using a carefully designed generative AI pipeline.

The system's applications include personalized self-expression in the metaverse, AR/VR visualization, and virtual try-on, positioning it as an enabler of user-generated content on next-generation digital platforms.

Key Problem Solved

Democratizing 3D fashion design by removing the steep technical barriers facing everyday users.

Core Technology

Conditional Diffusion Model + 3D Sketch Encoder + Adaptive Curriculum Learning.

Novel Contribution

Introduction of the KO3DClothes dataset: paired 3D garments and user sketches.

2. Methodology & Technical Framework

The proposed framework rests on three pillars: a new dataset, a generative model architecture, and a tailored training strategy.

2.1. The KO3DClothes Dataset

To overcome the scarcity of training data for 3D sketch-to-garment tasks, the authors introduce KO3DClothes. The dataset contains pairs of high-quality 3D garment models (e.g., dresses, shirts, pants) and corresponding 3D sketches created by users in a controlled VR environment. The sketches capture the natural imprecision and stylistic variation of non-expert input, which is essential for training a robust model.

2.2. The DeepVRSketch+ Architecture

The core generative model is a conditional diffusion model. The pipeline includes a Sketch Encoder $E_s$ that projects the input 3D sketch into a latent vector $z_s$. This latent code conditions a diffusion model $G_\theta$ to generate the target 3D garment geometry $\hat{X}$.
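To make the role of $E_s$ concrete, here is a toy sketch, not the authors' encoder. It assumes a sketch is a list of 3D stroke points and uses hand-written mean and max pooling in place of a learned network. The function name `encode_sketch` is hypothetical. The point it illustrates is that $z_s$ has a fixed size regardless of how many points the user drew or in which order.

```python
def encode_sketch(points):
    """Toy stand-in for E_s: map a 3D point set to a fixed-size latent vector.

    Uses per-axis mean and max pooling, so the result does not depend on
    the order or the number of points. A real encoder would be learned.
    """
    if not points:
        raise ValueError("sketch must contain at least one point")
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    peak = [max(p[i] for p in points) for i in range(3)]
    return mean + peak  # 6-dimensional z_s


# A two-point "sketch" and the same sketch with its points reordered.
z_s = encode_sketch([(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)])
z_s_reordered = encode_sketch([(1.0, 2.0, 3.0), (0.0, 0.0, 0.0)])
```

In a real system this vector would be passed to $G_\theta$ at every denoising step as the conditioning signal.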

The training objective minimizes a combination of losses: a reconstruction loss $L_{rec}$ (e.g., Chamfer Distance) between the generated mesh $\hat{X}$ and the ground truth $X$, and an adversarial loss $L_{adv}$ to encourage realism:

$L_{total} = \lambda_{rec} L_{rec}(\hat{X}, X) + \lambda_{adv} L_{adv}(D(\hat{X}))$

where $D$ is the discriminator network.
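The combined objective can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's training code: shapes are treated as point lists sampled from the meshes, $L_{rec}$ is the symmetric squared Chamfer Distance, $L_{adv}$ is taken to be the non-saturating generator loss $-\log D(\hat{X})$, and the default weights are placeholders.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between two non-empty 3D point lists.

    Averages squared nearest-neighbour distances in both directions.
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def total_loss(generated, target, disc_score, lambda_rec=1.0, lambda_adv=0.1):
    """L_total = lambda_rec * L_rec + lambda_adv * L_adv.

    disc_score is D(X_hat) in (0, 1]. Using -log D(X_hat) for L_adv and the
    default weights are assumptions made for this sketch.
    """
    l_rec = chamfer_distance(generated, target)
    l_adv = -math.log(disc_score)
    return lambda_rec * l_rec + lambda_adv * l_adv
```

A generated shape that matches the target and fully convinces the discriminator (score 1.0) yields a loss of zero; either geometric error or a low discriminator score raises it.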

2.3. Adaptive Curriculum Learning

To handle variation in sketch quality and complexity, an adaptive curriculum learning strategy is used. The model first trains on simple, clean sketch-garment pairs and then gradually introduces more challenging, noisy, or real-world sketches. This mirrors the human learning process and substantially improves the model's robustness to imperfect input.
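The easy-to-hard idea can be illustrated with a minimal scheduler. The adaptive rule the authors use is not described here, so this sketch substitutes a simpler fixed linear schedule. The per-sample difficulty score, the function name `curriculum_subset`, and the `start_fraction` parameter are all hypothetical.

```python
def curriculum_subset(samples, epoch, total_epochs, start_fraction=0.3):
    """Return the training subset admitted at a given epoch.

    samples: list of (sample_id, difficulty) pairs, difficulty in [0, 1].
    The admitted fraction grows linearly from start_fraction to 1.0,
    always taking the easiest samples first.
    """
    progress = min(1.0, epoch / max(1, total_epochs - 1))
    fraction = start_fraction + (1.0 - start_fraction) * progress
    ordered = sorted(samples, key=lambda s: s[1])  # easiest first
    count = max(1, round(fraction * len(ordered)))
    return ordered[:count]


pairs = [("a", 0.9), ("b", 0.1), ("c", 0.5), ("d", 0.3)]
first_epoch = curriculum_subset(pairs, epoch=0, total_epochs=5)
last_epoch = curriculum_subset(pairs, epoch=4, total_epochs=5)
```

An adaptive variant would replace the linear `progress` term with a signal from training itself, such as validation loss on the currently admitted subset.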

3. Experimental Results & Evaluation

3.1. Quantitative Metrics

The paper evaluates DeepVRSketch+ against several baselines using standard 3D shape generation metrics:

  • Chamfer Distance (CD): Measures the average nearest-point distance between the generated and ground-truth point clouds. DeepVRSketch+ achieved a CD 15-20% lower than the closest baseline, indicating superior geometric accuracy.
  • Fréchet Inception Distance (FID) in 3D: Adapted for 3D shapes, it measures distributional similarity. The proposed model achieved the best (lowest) FID score, confirming that the generated garments are more realistic and diverse.
  • User Preference Score: In A/B tests, more than 78% of the generated garments were preferred over those from baseline methods.
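For reference, the standard Fréchet distance behind FID compares two Gaussians fitted to feature embeddings of real and generated samples. The 3D feature extractor is not specified here, so the means $\mu_r, \mu_g$ and covariances $\Sigma_r, \Sigma_g$ below are assumed to come from some shape embedding network:

$\mathrm{FID} = \| \mu_r - \mu_g \|_2^2 + \mathrm{Tr}\left( \Sigma_r + \Sigma_g - 2 (\Sigma_r \Sigma_g)^{1/2} \right)$

The score is zero when the two fitted Gaussians coincide and grows as their means or covariances diverge, which is why lower is better.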

3.2. User Study & Qualitative Analysis

A comprehensive user study was conducted with participants who had no 3D modeling experience. Users were asked to create sketches in VR and rate the generated results. Key findings:

  • Usability: 92% of users found the 3D sketching interface intuitive and enjoyable.
  • Output Quality: 85% were satisfied with the detail and wearability of the garments generated from their sketches.
  • Analysis of Figure 1: The figure in the PDF illustrates the workflow: from 3D sketching in AR/VR, through the AI model (DeepVRSketch+), to the final 3D model and its applications (AR/VR Visualization, Digital Expression, Virtual Fitting). It conveys visually how the end-to-end design process is democratized.

4. Core Insight & Analyst Perspective

Core Insight: This paper is not just about a better 3D model; it is a strategic bet on a platform for creativity. By lowering the skill floor for 3D content creation to "can you draw in the air?", DeepVRSketch+ aims to turn every VR/AR headset owner into a potential fashion designer. This directly attacks the core bottleneck of the metaverse and digital fashion: the scarcity of compelling user-generated content. The real product here is not the garment, but the creative agency handed to the user.

Logical Flow: The logic is compelling but follows a well-trodden path in AI research: identify a data-scarce domain (3D sketch-to-garment), build a new dataset (KO3DClothes) to address it, apply a state-of-the-art generative architecture (diffusion models), and add a clever training twist (curriculum learning) for robustness. The flow from problem (inaccessible tools) to solution (intuitive sketching + AI) is clear and market-ready. It mirrors the success of text-to-image models such as DALL-E 2 in democratizing 2D art, but applied to immersive 3D space, a logical next frontier.

Strengths & Flaws: The main strength is the focus on usability and data. Creating KO3DClothes is a major, costly contribution that will benefit the whole research community, much as ImageNet transformed computer vision. Using curriculum learning to handle "messy" human input is smart engineering. The flaw, however, lies in what is not discussed: the "last mile" problem of digital fashion. Generating a 3D mesh is only step one. The paper overlooks critical aspects such as realistic cloth simulation for animation, texture/material generation, and integration into existing game/VR engines, problems that companies such as NVIDIA are tackling with solutions like Omniverse. Moreover, while the user study is positive, long-term engagement and the novelty effect of "sketching clothes" remain unproven. Will users create one garment and stop, or will it foster sustained creation? The comparison to the foundational work of Isola et al. on Pix2Pix (Image-to-Image Translation with Conditional Adversarial Networks, CVPR 2017) is apt for the paired-data approach, but the 3D spatial domain adds considerable complexity.

Actionable Insights: For investors, this signals a ripe area: AI-powered 3D content creation tools for immersive platforms. The immediate roadmap should include partnerships with VR hardware makers (Meta Quest, Apple Vision Pro) for native integration. For developers, open-sourcing KO3DClothes (if planned) would accelerate ecosystem growth. The next technical hurdle is moving from static garment generation to dynamic, simulatable fabrics. Collaboration with physics-based simulation research, perhaps leveraging graph neural networks as seen in work from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) on learning-based simulation, is essential. Finally, the business model should look beyond one-off creation toward a marketplace or subscription for AI-generated fashion assets, creating a closed-loop economy of creation and consumption.

5. Technical Details & Mathematical Formulation

The conditional diffusion model operates in the space of 3D shape representations. Given a noisy 3D shape representation $X_t$ at timestep $t$ and the conditioning sketch latent $z_s$, the model learns to predict the noise $\epsilon_\theta(X_t, t, z_s)$ to be removed. The reverse denoising process is defined by:

$p_\theta(X_{0:T} | z_s) = p(X_T) \prod_{t=1}^{T} p_\theta(X_{t-1} | X_t, z_s)$

where $p_\theta(X_{t-1} | X_t, z_s) = \mathcal{N}(X_{t-1}; \mu_\theta(X_t, t, z_s), \Sigma_\theta(X_t, t, z_s))$
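A single reverse step can be sketched using the standard DDPM mean parameterization, $\mu_\theta = \frac{1}{\sqrt{\alpha_t}} \left( X_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}} \epsilon_\theta \right)$ with $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{s \le t} \alpha_s$. This sketch assumes a fixed variance $\beta_t$ in place of the learned $\Sigma_\theta$, represents the shape as a flat list of floats, and takes the predicted noise as an input, so the conditioning on $z_s$ happens inside whatever model produced `eps_pred`. Indices are 0-based, unlike the 1-based $t$ in the text.

```python
import math
import random


def reverse_step(x_t, eps_pred, t, betas, rng=random):
    """One ancestral sampling step X_t -> X_{t-1} with fixed variance beta_t.

    x_t, eps_pred: flat lists of floats; t: 0-based index into betas.
    eps_pred stands for eps_theta(X_t, t, z_s) from a trained model.
    """
    alpha_t = 1.0 - betas[t]
    alpha_bar_t = math.prod(1.0 - b for b in betas[: t + 1])
    coef = betas[t] / math.sqrt(1.0 - alpha_bar_t)
    mean = [(x - coef * e) / math.sqrt(alpha_t) for x, e in zip(x_t, eps_pred)]
    if t == 0:
        return mean  # no noise is added on the final step
    sigma = math.sqrt(betas[t])
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
```

Sampling starts from Gaussian noise $X_T$ and applies this step for $t = T-1, \dots, 0$, querying the noise predictor with the same $z_s$ each time.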

The model is trained to optimize a variational lower bound, in the simplified form commonly used in denoising diffusion probabilistic models (DDPM):

$L_{simple} = \mathbb{E}_{t, X_0, \epsilon} [\| \epsilon - \epsilon_\theta(\sqrt{\bar{\alpha}_t} X_0 + \sqrt{1-\bar{\alpha}_t} \epsilon, t, z_s) \|^2]$

where $\epsilon$ is Gaussian noise and $\bar{\alpha}_t$ is a function of the noise schedule.
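The training loss above can be estimated one sample at a time. In this minimal sketch, $\bar{\alpha}_t$ is taken to be the cumulative product of $(1 - \beta_s)$ as in DDPM, the shape is a flat list of floats, and `eps_model` is a hypothetical callable standing in for $\epsilon_\theta$. None of these names come from the paper.

```python
import math
import random


def alpha_bar(betas, t):
    """Cumulative product of (1 - beta_s) for s = 0..t (0-based t)."""
    return math.prod(1.0 - b for b in betas[: t + 1])


def noisy_sample(x0, eps, a_bar):
    """X_t = sqrt(a_bar) * X_0 + sqrt(1 - a_bar) * eps, element-wise."""
    return [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
            for x, e in zip(x0, eps)]


def simple_loss(x0, z_s, betas, eps_model, rng):
    """Single-sample Monte Carlo estimate of L_simple.

    eps_model(x_t, t, z_s) is a hypothetical callable returning the
    predicted noise as a list with the same length as x0.
    """
    t = rng.randrange(len(betas))              # uniform random timestep
    eps = [rng.gauss(0.0, 1.0) for _ in x0]    # Gaussian noise target
    x_t = noisy_sample(x0, eps, alpha_bar(betas, t))
    pred = eps_model(x_t, t, z_s)
    return sum((e - p) ** 2 for e, p in zip(eps, pred))
```

Averaging this quantity over many shapes, timesteps, and noise draws approximates the expectation in $L_{simple}$. A predictor that recovers the injected noise exactly drives it to zero.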

6. Analysis Framework & Case Example

Framework for Evaluating Creative AI Tools:

  1. Input Fidelity: How well does the system interpret user intent from imperfect input? (DeepVRSketch+ uses the sketch encoder and curriculum learning to address this.)
  2. Output Quality: Is the generated content functionally usable and aesthetically plausible? (Measured by CD, FID, and user satisfaction.)
  3. Creative Leverage: Does the tool augment human creativity or replace it? (This system sits in the augmentation camp, keeping the user "in the loop".)
  4. Platform Integration: How seamlessly does the output fit into downstream pipelines? (An area for future work, as noted.)

Case Example - Designing a Virtual Jacket:

  1. User Action: The user puts on a VR headset and uses the controller to draw the outline of a bomber jacket around a 3D mannequin. The sketch is rough, with wavy lines.
  2. System Processing: The sketch encoder $E_s$ extracts the spatial intent. The diffusion model, conditioned on this latent vector, starts the denoising process from random noise, guided toward shapes that match the sketch distribution learned from KO3DClothes.
  3. Output: Within seconds, a complete 3D mesh of a bomber jacket appears, with plausible folds, a collar structure, and zipper geometry that were inferred rather than drawn.
  4. Next Steps (Future Vision): The user then selects "denim" from a material palette, and a separate AI module textures the model. They then see it simulated on their avatar in a virtual mirror.

7. Future Applications & Development Roadmap

Short Term (1-2 years):

  • Integration as a plugin/feature in popular social VR platforms (VRChat, Horizon Worlds).
  • Development of a mobile AR version using LiDAR/depth sensors for "sketching in space."
  • Expansion of KO3DClothes to include more garment categories, textures, and multi-view sketches.

Medium Term (3-5 years):

  • Full-body outfit generation from a series of sketches.
  • Real-time co-design: multiple users sketching together in a shared VR space.
  • AI-assisted design for physical garment production, bridging digital creation and real-world fashion.

Long-Term Vision:

  • A foundation model for 3D shape generation from diverse ambiguous inputs (sketch, text, gesture).
  • A core component of a user-owned digital identity wardrobe, interoperable across all metaverse experiences.
  • Democratization of personalized, on-demand physical fashion design.

8. References

  1. Y. Zang et al., "From Air to Wear: Personalized 3D Digital Fashion with AR/VR Immersive 3D Sketching," Journal of LaTeX Class Files, 2021.
  2. P. Isola, J.-Y. Zhu, T. Zhou, A. A. Efros, "Image-to-Image Translation with Conditional Adversarial Networks," CVPR, 2017. (Foundational work on paired image translation.)
  3. J. Ho, A. Jain, P. Abbeel, "Denoising Diffusion Probabilistic Models," NeurIPS, 2020. (Basis for the diffusion model approach.)
  4. NVIDIA Omniverse, "Platform for Connecting 3D Tools and Assets," https://www.nvidia.com/en-us/omniverse/.
  5. MIT CSAIL, "Research on Learning-Based Simulation," https://www.csail.mit.edu/.
  6. J.-Y. Zhu, T. Park, P. Isola, A. A. Efros, "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks," ICCV, 2017. (CycleGAN, for the unpaired translation setting, a contrast to the paired-data approach of this work.)