Marker Based AR Collaboration
Overview
The Marker Based AR Collaboration sample demonstrates how to build augmented reality use cases based on the detection of markers (QR Codes or ArUco).
In the first example scene, we show how a marker can be used in a colocation scenario where multiple users are physically in the same room.
When one user detects a marker, the application creates a network anchor linked to that marker. This network anchor is positioned in 3D space at the marker's real-world location.
When a second user detects the same marker, the application associates it with the existing network anchor. The second user's scene is then realigned so that the newly detected anchor matches the previously created network anchor.
Because each user is aligned to the same marker synchronized over the network, all users become colocated without needing to scan the room beforehand.
The second scenario illustrates how two users in different physical locations can collaborate effectively using a marker.
This is particularly relevant in the case of an on-site technician requesting technical support from a remote engineer.
The on-site technician is in a location with equipment that has a marker attached. First, the technician performs a calibration so the position of the marker on the real object can be determined.
After calibration is saved, each time the marker is detected, the system can automatically determine the technician's exact position relative to the equipment.
When the remote support engineer connects, they can precisely see the technician's location with respect to the equipment.
Additionally, a giant mode lets the user change scale to obtain an overall bird's-eye view of the scene.
Each user can also stream the video captured by their headset for richer remote assistance.

Technical Info
This sample uses the Shared Authority topology. The project has been developed with Unity 6 & Fusion 2 and tested with the following packages:
- Meta XR Core SDK 78.0.0 : com.meta.xr.sdk.core
- Meta MR Utility Kit 78.0.0 : com.meta.xr.mrutilitykit
- Unity OpenXR Meta 2.1.1 : com.unity.xr.meta-openxr
- OpenCV for Unity 2.6.6 (optional)
Headset firmware versions: v79 & v81
The video broadcast is done using Photon Video SDK v2.58. Please note that a specific patch has been applied here because the official Video SDK v2.58 preview resolution does not match the resolution of the video stream. This will be supported in an upcoming version of the Video SDK.
Compilation: the SubsampledLayoutDesactivation editor script automatically disables the Meta XR Subsampled Layout option; please remove it if this is not the desired behaviour.
Before you start
To run the sample:
- Create a Fusion AppId in the PhotonEngine Dashboard and paste it into the App Id Fusion field in Real Time Settings (reachable from the Fusion menu).
- Create a Voice AppId in the PhotonEngine Dashboard and paste it into the App Id Voice field in Real Time Settings.
Download
Version | Release Date | Download
--- | --- | ---
2.0.7 | October 10, 2025 | Fusion Marker Based AR Collaboration 2.0.7

Version | Release Date | Download
--- | --- | ---
2.0.7 | October 10, 2025 | Fusion Marker Based AR Collaboration Without Video SDK 2.0.7
Folder Structure
- The main folder /Marker_Based_AR_Collaboration contains all elements specific to this sample.
- The /Photon folder contains the Fusion and Photon Voice/Video SDK.
- The /Photon/FusionAddons folder contains the Industries Addons used in this sample.
- The /Photon/FusionAddons/FusionXRShared folder contains the rig and grabbing logic coming from the VR shared sample, creating a FusionXRShared light SDK that can be shared with other projects.
- The /XR folder contains configuration files for virtual reality.
Architecture overview
The Marker Based AR Collaboration sample is based on the same code base as that described in the VR Shared page, notably for the rig synchronization.
Aside from this base, the sample, like the other Industries samples, contains some extensions to the Industries Addons, to handle some reusable features like synchronized rays, locomotion validation, touching, teleportation smoothing or a gazing system.
For both scenarios in this sample, the architecture is very similar.
Marker tracking
Marker types
The project is compatible with two types of visual markers:
- QR Codes
- ArUco
QR code detection is based on the Meta MR Utility Kit (MRUK), which now supports Trackables of QR code type as an experimental API. Please see the "Track QR Codes in MR Utility Kit" documentation page on the Meta website for the current status, limitations, and how to enable it.
For ArUco marker detection and tracking, a modified version of the Quest ArUco Marker Tracking project has been used. It requires the OpenCV for Unity asset available on the Unity Asset Store.
The project can operate with a single marker type (QR code or ArUco) or both simultaneously.
Note:
- We do not distribute OpenCV for Unity with the project. You must purchase it yourself and add it to the project. Once installed, it is automatically detected, enabling ArUco marker support.
- Using ArUco markers requires compiling the application, whereas QR codes work directly in the Unity editor.
- If QR code detection doesn't work with a compiled application, try enabling the experimental mode with the adb command adb shell setprop debug.oculus.experimentalEnabled 1.
- Due to temporary technical constraints with the Video SDK, ArUco marker tracking is disabled when a video stream is started.
Marker management
The management of markers is divided into several layers:
The first layer, IRLAnchorTracking, is responsible for the visual detection of the markers located in the user's room in order to generate the associated anchors. It notifies listeners through events as soon as a change occurs on an anchor and computes a stabilized position for the anchors.
A second layer then handles the processing of this information. For the first colocation scenario, this is done by the IRLRoomManager class, whereas for the remote-assistance scenario it is done by the AnchorBasedObjectSynchronization class. These two classes implement the IRLAnchorTracking.IIRLAnchorTrackingListener interface so that they are notified as soon as a change has been detected on an anchor.
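As a simplified illustration of this layering, the sketch below defines a hypothetical listener interface and a component implementing it. The interface shape and member names are assumptions for illustration only, not the actual IIRLAnchorTrackingListener API.

```csharp
using UnityEngine;

// Hypothetical, simplified analogue of the sample's layering: the tracking layer raises
// events when an anchor changes, and higher layers implement a listener interface to
// react. All names below are illustrative assumptions, not the sample's exact API.
public interface IAnchorTrackingListener
{
    void OnAnchorDetected(string anchorId, Vector3 position, Quaternion rotation);
    void OnAnchorUpdated(string anchorId, Vector3 position, Quaternion rotation);
    void OnAnchorLost(string anchorId);
}

public class AnchorLoggingListener : MonoBehaviour, IAnchorTrackingListener
{
    public void OnAnchorDetected(string anchorId, Vector3 position, Quaternion rotation)
        => Debug.Log($"Anchor {anchorId} detected at {position}");

    public void OnAnchorUpdated(string anchorId, Vector3 position, Quaternion rotation)
        => Debug.Log($"Anchor {anchorId} updated to {position}");

    public void OnAnchorLost(string anchorId)
        => Debug.Log($"Anchor {anchorId} lost");
}
```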
For each detected marker, several prefabs are associated:
- detectedIrlAnchorTagPrefab: the current position of the marker reported by the detection mechanism (MRUK for QR codes or OpenCV for ArUco markers). This prefab is not intended to be displayed except for debugging purposes, in particular when using ArUco markers, because the current position may show large variations depending on the user's head movements.
- stabilizedIrlAnchorTagPrefab: a stabilized version of the marker position, calculated as an average of the previous positions (the length of the position history used in the calculation is configurable).
- representationIrlAnchorTagPrefab: the visual element displayed when an anchor has remained stabilized at the same position for a sufficient amount of time.
The information related to an anchor is grouped in the IRLAnchorInfo class, while the visual aspect of the anchors is handled by the AnchorTag class, which represents a point detected in space: it can be used to track marker detection results, or to visualize a stabilized version of those detected positions.
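The sketch below illustrates the stabilization idea: average the last N reported positions and consider the anchor stable once the average has stayed within a small radius for a given duration. It is a minimal sketch; the field names (historySize, stabilityRadius, stabilityDuration) are illustrative assumptions, not the addon's actual parameters.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Minimal sketch of anchor stabilization: average the recent detections to smooth out
// jitter, and report "stable" once the average stops moving for a configurable duration.
public class AnchorStabilizer
{
    private readonly Queue<Vector3> history = new Queue<Vector3>();
    private readonly int historySize;
    private readonly float stabilityRadius;
    private readonly float stabilityDuration;
    private float stableSince = -1f;

    public AnchorStabilizer(int historySize = 30, float stabilityRadius = 0.01f, float stabilityDuration = 2f)
    {
        this.historySize = historySize;
        this.stabilityRadius = stabilityRadius;
        this.stabilityDuration = stabilityDuration;
    }

    public Vector3 StabilizedPosition { get; private set; }
    public bool IsStable { get; private set; }

    // Call this every time the detection mechanism reports a new marker position.
    public void Report(Vector3 detectedPosition, float time)
    {
        history.Enqueue(detectedPosition);
        if (history.Count > historySize) history.Dequeue();

        // Average of the recent detections smooths out per-frame jitter.
        Vector3 sum = Vector3.zero;
        foreach (var p in history) sum += p;
        Vector3 average = sum / history.Count;

        // Restart the stability timer whenever the averaged position moves too much.
        if (Vector3.Distance(average, StabilizedPosition) > stabilityRadius)
            stableSince = time;

        StabilizedPosition = average;
        IsStable = stableSince >= 0f && (time - stableSince) >= stabilityDuration;
    }
}
```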
Network Connection
The network connection is managed by the Meta building blocks [BuildingBlock] Network Manager and [BuildingBlock] Auto Matchmaking.
[BuildingBlock] Auto Matchmaking sets the room name and the Fusion topology (Shared mode).
[BuildingBlock] Network Manager contains Fusion's NetworkRunner. The UserSpawner component, placed on it, spawns the user prefab when the user joins the room and handles the Photon Voice connection.
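As an illustration of this pattern, here is a minimal Shared Mode spawner sketch placed on the same game object as the NetworkRunner. It is not the sample's actual UserSpawner (which also handles the Photon Voice connection); the userPrefab field is an illustrative assumption.

```csharp
using Fusion;
using UnityEngine;

// Minimal sketch of a Shared Mode user spawner: each client spawns its own user prefab
// when it joins the session. Not the sample's UserSpawner implementation.
public class UserSpawnerSketch : SimulationBehaviour, IPlayerJoined
{
    [SerializeField] private NetworkObject userPrefab;

    public void PlayerJoined(PlayerRef player)
    {
        // In Shared Mode, each client only spawns the prefab for its own local player.
        if (player == Runner.LocalPlayer)
        {
            Runner.Spawn(userPrefab, Vector3.zero, Quaternion.identity, player);
        }
    }
}
```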
Network parameters
The user can change the network settings using the radial menu button under the left hand.

This can be useful to select a specific room or region, or to use a local server.
Please note that a watch is attached to the user's HardwareRig, so the user can interact with the watch and open the network settings menu even when not connected to the network.
Watch Interaction
The watch menu opens when the user looks at the watch. Buttons appear, and the user can trigger actions by pressing them.


For both scenarios, the buttons displayed above the watch trigger actions directly, while the settings buttons below the watch open configuration windows.
See Watch Menu Addon for more details.
Please note that the prefab spawned for each user contains 2 watches:
- one for the hand model driven by the controllers
- one for the hand model driven by the finger tracking
The RigPartVisualizer enables/disables the watches according to the current hand-tracking mode.

Colocation scenario
Regarding user colocation, the following cases are covered by the Anchors addon used here:
- Multiple people located in a single room, with one or more markers in the room.
- Several groups of people located in different rooms. Each room has one or more markers. A person does not change rooms.
- Multiple people moving from room to room in the same building, each room equipped with a marker.
In this sample, we illustrate the first case: the goal in the demo scene is to correctly position people located in the same room.
For this, at least one marker must be visible in the room where the participants are located.
As soon as a person connects and detects a marker, a network anchor is created at the marker's position as detected by the headset.
When another person connects, thanks to real-time network synchronization, they detect that another user has already detected this marker.
They are then repositioned in the scene so that their position corresponds to their actual location in the room.
The Marker_Based_Colocation scene is very simple, because passthrough is enabled and there is no 3D environment.
Each user who connects is represented by an avatar.
Before colocation is performed, the position of the other users' avatars does not match their actual position in the room.
Once colocation is completed, each user's avatar should correspond to their real-world position in the room.
Note: the same scene can in fact also be used for the second case listed above (several groups of people in several rooms), with no change.
Colocalization logic
The detailed mechanism of the colocalization is described in the colocalization chapter of the Anchors add-on documentation.
To summarize, several classes support this feature:
NetworkIRLRoomMember:
- This component, located on the network rig, manages a user's presence in a room.
- When a user connects, a random room identifier (RoomId) is assigned to them and registered in the IRLRoomManager.
NetworkIRLRoomAnchor:
- The markers detected by a user in a room are represented by this class. The AnchorId parameter corresponds to the marker's payload. The RoomId is based on the room id of the user who detects it (NetworkIRLRoomMember.RoomId).
IRLRoomManager:
- The scene has a room manager game object with this class, to manage all users and anchors detected in the room by those users.
- It implements the IRLAnchorTracking.IIRLAnchorTrackingListener interface in order to be notified whenever a marker is detected by the headset.
- It tracks all real-life rooms detected, and stores which anchors and members are present in each of them.
- It takes care of triggering colocation when a member sees an anchor with an anchor id previously detected in another room by another user: the two rooms are merged, and the networked anchors and members in the merged room are moved to make the real-life anchor match its pre-existing virtual counterpart's position.
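To give an intuition of the realignment step, the sketch below computes the yaw-and-translation transform that maps a locally detected marker pose onto the pre-existing networked anchor pose and applies it to a root transform. This is only an illustration of the math under the assumption of a yaw-only correction; it is not the add-on's implementation, which moves the networked anchors and members of the merged room instead.

```csharp
using UnityEngine;

// Minimal sketch of the alignment math: given the pose of the marker as detected
// locally and the pose of the pre-existing networked anchor for the same marker,
// compute the rigid transform that maps one onto the other and apply it to a root
// transform (here a hypothetical "rigRoot").
public static class ColocationAlignment
{
    public static void Align(Transform rigRoot, Pose locallyDetected, Pose networkedAnchor)
    {
        // Only rotate around the vertical axis so the floor stays level.
        float yawDelta = networkedAnchor.rotation.eulerAngles.y - locallyDetected.rotation.eulerAngles.y;
        Quaternion rotation = Quaternion.Euler(0f, yawDelta, 0f);

        // Rotate the rig around the detected marker, then translate it so that the
        // detected marker lands exactly on the networked anchor position.
        Vector3 pivotToRig = rigRoot.position - locallyDetected.position;
        rigRoot.position = networkedAnchor.position + rotation * pivotToRig;
        rigRoot.rotation = rotation * rigRoot.rotation;
    }
}
```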
Tracking settings
The user can modify the tracking settings using the radial-menu button located under the left hand.
The tracking settings UI is managed by the TrackingSettingsMenu class.
In this menu, they can choose which type of marker will be used for tracking. Note that the button related to ArUco markers will be interactable only if OpenCV is installed in the project.
The Marker Stability parameter sets the amount of time required for a marker to be considered stable at a fixed position (the expectedDetectedAnchorsStabilityDuration variable of IRLAnchorTracking).
Finally, for ArUco markers, it is necessary to specify the size of the markers being used. If the selected size does not match the actual marker, the anchor will appear either in front of or behind the real marker.

Scene mapping
In addition to colocalization, another use case for visual markers is to map virtual elements onto the real-world scene observed in AR (for example, to change the room's décor).
If the position of a marker in the real environment is known, then once the headset detects that marker we can deduce the user's position in the room and precisely overlay graphical elements.
A simple way to test this feature is to place an IRLRoomAnchor directly in the Unity scene and attach the associated visual as a child, so it depends on the position of that marker.
For more complex scenarios that use multiple markers, where it is difficult to measure their exact real-world positions in advance, you need to implement a calibration mechanism, which consists of:
- Placing the virtual elements in the real room at the desired positions.
- Detecting the positions of the markers with the headset.
- Saving the associated information (the positions of the virtual elements relative to the markers) so that later, when the associated marker is detected, the virtual element can be positioned correctly.
A similar calibration mechanism is used in the remote collaboration scenario.
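The core of such a calibration is storing the pose of the virtual element expressed in the marker's local frame, then replaying it whenever the marker is detected. The sketch below illustrates that math only; the type and method names are illustrative assumptions, not the sample's API. The saved struct could, for instance, be serialized with JsonUtility so it survives an application restart.

```csharp
using UnityEngine;

// Minimal sketch of marker-relative calibration: save the virtual element's pose in
// the marker frame, and reapply it later from any newly detected marker pose.
public static class MarkerCalibration
{
    [System.Serializable]
    public struct CalibrationData
    {
        public Vector3 localPosition;    // virtual element position in the marker frame
        public Quaternion localRotation; // virtual element rotation in the marker frame
    }

    // Called once, when the calibration is saved.
    public static CalibrationData Save(Pose markerPose, Transform virtualElement)
    {
        var inverseRotation = Quaternion.Inverse(markerPose.rotation);
        return new CalibrationData
        {
            localPosition = inverseRotation * (virtualElement.position - markerPose.position),
            localRotation = inverseRotation * virtualElement.rotation
        };
    }

    // Called every time the marker is detected afterwards.
    public static void Apply(Pose markerPose, Transform virtualElement, CalibrationData data)
    {
        virtualElement.position = markerPose.position + markerPose.rotation * data.localPosition;
        virtualElement.rotation = markerPose.rotation * data.localRotation;
    }
}
```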
Remote Collaboration scenario
The remote collaboration scenario is illustrated by the Marker_Based_AR_Collaboration scene.
This scenario offers several features to enable effective collaboration when users are in different locations:
- Real-time 3D object repositioning using a marker, so that a remote participant can know the on-site person's position relative to the equipment with the marker.
- Giant mode, which allows changing scale to gain an overall bird's-eye view of the scene, helping users understand the environment.
- The 'World Move' feature, which lets the remote user correctly position themselves in the scene relative to the on-site user.
- Streaming of the video captured by the headset camera.
Tracking settings
Like for the colocation scenario, the user can modify the tracking settings using the radial-menu button located under the left hand.
The tracking settings UI is managed by the RepositioningTrackingSettingsMenu class.
In this menu, they can choose which type of marker will be used for tracking. Note that the button related to ArUco markers will be interactable only if OpenCV is installed in the project.
For ArUco markers, it is necessary to specify the size of the markers being used. If the selected size does not match the actual marker, the anchor will appear either in front of or behind the real marker.
The Marker Stability parameter sets the amount of time required for a marker to be considered stable at a fixed position (the expectedDetectedAnchorsStabilityDuration variable of IRLAnchorTracking).
To perform the calibration, a 3D model representing the real equipment is spawned when the window opens (see the ModelManager section below to change the 3D model).
It is possible to adjust its transparency or even completely disable the visual.
The size of the object is also configurable, mainly for debugging purposes when the object is very large.

Calibration

The calibration process simply consists of positioning the virtual object at the same location as the real object and pressing the Save Calibration button.
This button is enabled only when at least one marker is visible to the headset camera (if the button is never interactable, check that at least one tracking system, QR code or ArUco, is enabled).
Once the calibration is saved, it does not need to be repeated on the next launch of the application: as soon as the marker used during the calibration is detected by the headset, the 3D model is spawned and repositioned according to the detected marker's position and the calibration data.
ModelManager
The ModelManager centralizes all information related to the 3D model that must be spawned during calibration or whenever the marker associated with a calibration is detected.
Any changes the user makes to the 3D model through the tracking settings interface are stored in this component (model scale, visibility & transparency).

To change the 3D model, simply modify the AnchoredModelSettings scriptable object reference.
The scriptable object allows you to specify:
- the prefab to spawn
- its scale
- the material's transparency
- the spawn position relative to the user's head position
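As an illustration, a settings asset of this kind could look like the sketch below. The field names mirror the list above but are assumptions, not the sample's exact AnchoredModelSettings definition.

```csharp
using UnityEngine;

// Hypothetical sketch of a settings ScriptableObject for the anchored model.
[CreateAssetMenu(menuName = "Samples/AnchoredModelSettingsSketch")]
public class AnchoredModelSettingsSketch : ScriptableObject
{
    [Tooltip("Prefab spawned during calibration or when the calibrated marker is detected")]
    public GameObject modelPrefab;

    [Tooltip("Uniform scale applied to the spawned model")]
    public float scale = 1f;

    [Range(0f, 1f), Tooltip("Transparency applied to the model's material")]
    public float transparency = 0.5f;

    [Tooltip("Spawn offset relative to the user's head position")]
    public Vector3 spawnOffsetFromHead = new Vector3(0f, -0.2f, 0.8f);
}
```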

The prefab of the 3D model object should contain the following components:
- Transform & NetworkTransform to synchronize the object position,
- Grabbable & NetworkGrabbable to grab the object (required during the calibration process),
- NetworkVisibilty to change & synchronize the model visibility (enabled or disabled),
- ModelScaleChanger to change the model scale (the SyncScale option in the NetworkTransform component should be enabled),
- ModelPositionChanger to control the object position & rotation.
In addition, it can be useful to add MagnetCoordinator and AttractableMagnet components so that the object can be easily positioned horizontally (on the floor or any other object with an AttractorMagnet).
Object repositioning
If a calibration has been done, when a marker is detected, the application will spawn the 3D model and reposition it according to the detected marker's position and the calibration data.
This repositioning is managed by the AnchorBasedObjectSynchronization class.

To prevent the model from being repositioned every time the anchor shifts slightly, a threshold is defined.
Depending on the size of the detected object, it may be necessary to adjust the minPositionChangeForUpdate parameter.
For example, a precision of 1 cm might be appropriate for small objects, whereas a precision of 10 cm may be sufficient for larger ones.
Please note that the repositioning algorithm is currently optimized to detect a static object and to compensate for head movements using a stabilization algorithm.
This is especially important when tracking ArUco markers, where marker positions, reported at a high frequency, can vary significantly.
It is less necessary with QR code detection through the Meta API, because the reported position is already stabilized.
To track a moving object, the history used to calculate a stabilized anchor position would need to be reduced, or even removed, so the system can react more quickly.
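The threshold test itself is simple: only reposition the model when the stabilized anchor pose has drifted further than minPositionChangeForUpdate from the pose used for the last repositioning. The sketch below illustrates this; the class and member names other than minPositionChangeForUpdate are illustrative assumptions.

```csharp
using UnityEngine;

// Minimal sketch of the repositioning threshold: ignore small anchor drifts and only
// report a change when the stabilized position has moved beyond the configured distance.
public class RepositioningThresholdSketch : MonoBehaviour
{
    [SerializeField] private float minPositionChangeForUpdate = 0.05f; // meters
    private Vector3 lastAppliedAnchorPosition;
    private bool hasAppliedOnce;

    public bool ShouldReposition(Vector3 stabilizedAnchorPosition)
    {
        if (!hasAppliedOnce ||
            Vector3.Distance(stabilizedAnchorPosition, lastAppliedAnchorPosition) > minPositionChangeForUpdate)
        {
            lastAppliedAnchorPosition = stabilizedAnchorPosition;
            hasAppliedOnce = true;
            return true; // caller moves the 3D model to match the new anchor pose
        }
        return false;
    }
}
```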
Giant mode
This feature allows the user to scale up and view the entire scene from a towering perspective.
When the user presses the 'Giant mode' button, it calls the Swap() method of the ChangeScale component (see the ScaleChanger game object in the scene).
The scale of the ScaleChanger game object in the scene defines the target scale when the feature is enabled.
To synchronize the user's scale for remote participants, the SyncScale option in the NetworkTransform component of the network rig is enabled.
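A minimal sketch of such a scale swap is shown below, assuming a rig root whose local scale is toggled between its normal value and a target "giant" scale. It stands in for the sample's ChangeScale component; network replication via NetworkTransform's SyncScale is omitted here.

```csharp
using UnityEngine;

// Minimal sketch of a "giant mode" toggle: swap the rig root scale between normal and
// a configurable giant scale each time Swap() is called.
public class ScaleSwapSketch : MonoBehaviour
{
    [SerializeField] private Transform rigRoot;
    [SerializeField] private Vector3 giantScale = new Vector3(10f, 10f, 10f);
    private readonly Vector3 normalScale = Vector3.one;
    private bool isGiant;

    public void Swap()
    {
        isGiant = !isGiant;
        rigRoot.localScale = isGiant ? giantScale : normalScale;
    }
}
```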
Locomotion
To effectively assist the on-site user, the remote user may need to move to a specific position, especially to perform hand-gesture interactions.
However, because users are in augmented reality, using traditional virtual reality style teleportation can feel strange, since their real environment does not visually move.
That's why we developed a different approach here: instead of the person moving themselves, the remote user moves the world.
But from the on-site user's perspective, the remote person appears to move normally.
When the 'World Move' feature is enabled, a grid is displayed (see the DisplayWorldLocomotionAnchor class) and the user simply needs to grab in front of them and move their hand to initiate movement.
They will have the visual impression of moving the world (hence the name of the feature), but technically, this action moves the user within the scene.
The distance the user moves is proportional to the velocity of their hand movements (see the SelfLocomotionGrabbable class for more details).
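The sketch below captures the core idea: while grabbing, the rig is moved by the opposite of the hand displacement (optionally amplified), which reads as dragging the world past you. It assumes the hand transform is a child of the rig root; the names and gain factor are illustrative assumptions, not the SelfLocomotionGrabbable implementation.

```csharp
using UnityEngine;

// Minimal "move the world" locomotion sketch: grabbing and moving the hand translates
// the rig in the opposite direction, so the world appears to be dragged by the hand.
public class WorldMoveSketch : MonoBehaviour
{
    [SerializeField] private Transform rigRoot;
    [SerializeField] private Transform hand;
    [SerializeField] private float gain = 2f; // amplification of the hand motion
    private Vector3 lastHandLocalPosition;
    private bool grabbing;

    public void BeginGrab()
    {
        grabbing = true;
        lastHandLocalPosition = rigRoot.InverseTransformPoint(hand.position);
    }

    public void EndGrab() => grabbing = false;

    private void Update()
    {
        if (!grabbing) return;
        // Track the hand in rig-local space so moving the rig does not feed back
        // into the measured hand displacement.
        Vector3 handLocal = rigRoot.InverseTransformPoint(hand.position);
        Vector3 handDeltaWorld = rigRoot.TransformVector(handLocal - lastHandLocalPosition);
        rigRoot.position -= handDeltaWorld * gain;
        lastHandLocalPosition = handLocal;
    }
}
```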
Camera Streaming
Each user can decide to start/stop streaming their camera to the other users using the watch radial menu.
When streaming is active, in addition to the text on the watch, visual feedback informs the user that they are sharing their camera.
Other remote users can then view this video stream on the screen that appears when the stream is launched.
This screen is located in front of the user sharing their camera, giving the impression of having a kind of portal onto their real environment.
By pressing the watch streaming button again, you can place the screen at a fixed position in the scene, avoiding having a screen that is always moving.
The Meta camera and streaming resolution can be changed using the settings button under the left hand.
Note that several users can share their camera simultaneously.
To stream the camera, we use the Photon Video SDK. It is a special version of the Photon Voice SDK, including support for video streaming.
As the requirement is very similar to screen sharing, we are reusing the developments that were included in the ScreenSharing Addon.
Notably, being able to use Android video surfaces alongside XR single pass rendering requires a specific shader, that is included in the add-on, as well as some additional components handling those specific textures.
Camera Permission
In order to access the Meta Quest camera, the application must request the required permissions.
This is managed by the WebCamTextureManager & PassthroughCameraPermissions components located on the WebCamTextureManagerPrefab game object.
Both scripts are provided by Meta in the Unity-PassthroughCameraAPISamples.
The WebCamTextureManagerPrefab scene game object is disabled by default and is automatically activated by the VoiceConnectionFadeManager when the Photon Voice connection is established. This is required to prevent running several authorization requests at the same time.
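For reference, a bare-bones runtime permission request could look like the sketch below. The sample relies on Meta's PassthroughCameraPermissions script instead; the horizonos.permission.HEADSET_CAMERA string is the permission used by the Passthrough Camera API samples and should be verified against the current SDK documentation.

```csharp
using UnityEngine;
using UnityEngine.Android;

// Minimal sketch of a runtime permission request for the passthrough camera on Quest.
public class CameraPermissionSketch : MonoBehaviour
{
    private const string HeadsetCameraPermission = "horizonos.permission.HEADSET_CAMERA";

    private void Start()
    {
        if (!Permission.HasUserAuthorizedPermission(HeadsetCameraPermission))
        {
            // Triggers the system permission dialog on device.
            Permission.RequestUserPermission(HeadsetCameraPermission);
        }
    }
}
```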
Camera Video Emission
Camera streaming is managed by the WebCamEmitter class of the ScreenSharing addon.
The user can start/stop the camera streaming using the watch button.
In addition, the streaming user can decide whether the screen can be anchored in the scene, which avoids having a screen that is always moving.
For this, the streaming watch button calls the WatchUIManager ToggleEmitting() method when the user touches the watch.
This toggles between the following 3 states:
- no streaming,
- streaming screen following the user's head,
- streaming screen at a fixed position.
To start or stop the camera streaming, the NetworkWebcam component located on the user prefab calls the WebCamEmitter ToggleEmitting() method.
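A minimal sketch of this three-state cycle is shown below. The enum values and class simply mirror the states listed above and are illustrative assumptions, not the sample's actual WatchUIManager / NetworkWebcam implementation.

```csharp
// Minimal sketch of the three-state streaming toggle.
public enum StreamingState
{
    NotStreaming,        // camera is off
    ScreenFollowingHead, // streaming, screen follows the user's head
    ScreenAnchored       // streaming, screen fixed in the scene
}

public class StreamingToggleSketch
{
    public StreamingState State { get; private set; } = StreamingState.NotStreaming;

    // Called each time the user presses the streaming watch button.
    public StreamingState ToggleEmitting()
    {
        State = State switch
        {
            StreamingState.NotStreaming => StreamingState.ScreenFollowingHead,
            StreamingState.ScreenFollowingHead => StreamingState.ScreenAnchored,
            _ => StreamingState.NotStreaming
        };
        return State;
    }
}
```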
Camera Video Reception
The ScreensharingReceiver scene game object manages the reception of the camera stream.
It waits for new voice connections, on which the video stream will be transmitted. Upon such a connection, it creates a video user and a material containing the video user texture.
Then, with EnablePlayback(), it passes it to the ScreenSharingScreen, which manages the screen renderer visibility: the screen will then change its renderer material to this new one.
The special case to be managed in this sample, compared with the default ScreenSharing addon, is that there is one video reception screen per user.
So, the ScreenSharingScreen and NetworkWebcam components are located on the networked user prefab, and the ScreenSharingEmitter passes the network object Id of the object containing the target screen in the communication channel user data. This way, the ScreensharingReceiver looks for this object instead of using a default static screen.
The NetworkWebcam component:
- references itself on the WebCamEmitter as the networkScreenContainer,
- configures the screen mode (whether or not the video stream is displayed for the local user),
- determines whether users stream the camera by default when they join the room.
Camera Resolution
The user can change the Meta camera resolution at runtime using the streaming settings menu (button under the left hand).

A UI is then displayed, showing the different resolutions supported by the Meta Camera API.
The streaming resolution is automatically adapted according to the Meta camera resolution setting thanks to the WebcamEmitter InitializeRecorder() method, which is called when a new stream is started.
If a stream is in progress when the user changes the resolution, the transmission is stopped and then automatically restarted.
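The restart behavior follows a simple stop / apply / restart pattern, sketched below. The class and method names are illustrative assumptions standing in for the sample's WebCamEmitter / InitializeRecorder logic.

```csharp
using UnityEngine;

// Minimal sketch of restarting the stream when the camera resolution changes.
public class ResolutionChangeSketch : MonoBehaviour
{
    public bool IsStreaming { get; private set; }

    public void OnResolutionSelected(int width, int height)
    {
        bool wasStreaming = IsStreaming;
        if (wasStreaming) StopStreaming();
        ApplyCameraResolution(width, height);
        if (wasStreaming) StartStreaming(); // recorder would be re-initialized here
    }

    private void ApplyCameraResolution(int width, int height)
        => Debug.Log($"Requesting camera resolution {width}x{height}");

    private void StartStreaming() => IsStreaming = true;
    private void StopStreaming() => IsStreaming = false;
}
```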
Important notes about application configuration and deployment
The Video SDK is incompatible with some options that the Meta tooling might suggest you activate (see the included PhotonVoice/readme-video.md for more details on the supported configurations).
To configure the project properly, you need:
- Graphics jobs disabled
- Multithreading rendering disabled
Used XR Addons & Industries Addons
To make it easy for everyone to get started with their 3D/XR project prototyping, we provide a comprehensive list of reusable addons.
See Industries Addons for more details.
Here are the addons we've used in this sample.
XRShared
The XRShared addon provides the base components to create an XR experience compatible with Fusion.
It is in charge of the users' rig parts synchronization, and provides simple features such as grabbing and teleport.
See XRShared for more details.
Anchors
We use the Anchors addon to detect visual markers.
See Anchors Addon for more details.
Watch Menu
We use the Watch menu addon to allow the user to access the features provided by the sample.
Note that we use subclasses of RadialMenuButtonAction & RadialMenuButtonWindows so that the buttons related to streaming are displayed only if the Photon Video SDK is installed.
See Watch Menu Addon for more details.
Voice Helpers
We use the VoiceHelpers addon for the voice integration.
See VoiceHelpers Addon for more details.
Screen Sharing
We use the ScreenSharing addon to stream the Meta Quest camera.
See ScreenSharing Addon for more details.
Meta Core Integration
We use the MetaCoreIntegration addon to synchronize users' hands.
See MetaCoreIntegration Addon for more details.
XRHands synchronization
The XR Hands Synchronization addon shows how to synchronize the hand state of hands tracked by the XR Hands package (including finger tracking), with high data compression.
See XRHands synchronization Addon for more details.
Feedback
We use the Feedback addon to centralize the sounds used in the application and to manage haptic & audio feedback.
See Feedback Addon for more details.
3rd Party Assets and Attributions
The sample is built around several awesome third party assets:
- Oculus Integration
- Oculus Lipsync
- Oculus Sample Framework hands
- Meta Unity Passthrough Camera API Samples: "Copyright Meta Platform Technologies, LLC and its affiliates. All rights reserved."
- Meta Unity MRUK Sample: "Copyright Meta Platform Technologies, LLC and its affiliates. All rights reserved."
- QuestArUcoMarkerTracking: MIT License, Copyright (c) 2025 Takashi Yoshinaga
- OpenCV for Unity
- Sounds