perceptor.models.stable_diffusion

class perceptor.models.stable_diffusion.StableDiffusion(name: str = 'runwayml/stable-diffusion-v1-5', decoder_name: Optional[str] = 'stabilityai/sd-vae-ft-mse', fp16: bool = True, auth_token: Union[bool, str] = True, flash_attention: bool = True, attention_slicing: Optional[Union[int, Literal['auto']]] = None)[source]

Bases: Module

alphas(indices) Tensor[source]
conditioning(texts: List[str] = [''], inpainting_masks: Optional[InheritTensor] = None, inpainting_images: Optional[InheritTensor] = None, mask_blur=4.0) Conditioning[source]

Create a conditioning object from a list of texts. Pass an empty string for the unconditional case.

Parameters
  • texts – A list of texts to condition on. An empty string means unconditional.

  • inpainting_masks – A tensor of masks to condition on. Must be 1-channel with values between 0 and 1.

  • inpainting_images – A tensor of images to condition on. Must be 3-channel with values between 0 and 1.

  • mask_blur – The amount of blur to apply to the inpainting masks.

decode(latents: InheritTensor) InheritTensor[source]
property device
diffuse_latents(denoised_latents, indices, noise=None) Tensor[source]
encode(images: InheritTensor, method='mode') InheritTensor[source]
finetuneable_vae()[source]
with diffusion_model.finetuneable_vae():
    images = diffusion_model.decode(latents)

forward(diffused_latents: Tensor, indices: Tensor, conditioning: Optional[Conditioning] = None) Predictions[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

images(latents: InheritTensor) InheritTensor[source]
indices(indices) Tensor[source]
latent_masks(masks, blur)[source]
latents(images: InheritTensor) InheritTensor[source]
predicted_noise(diffused_latents, from_indices, conditioning: Conditioning) Tensor[source]
predictions(diffused_latents, indices, conditioning) Predictions[source]
random_diffused_latents(shape) Tensor[source]
sample(text: str, from_index: int = 999, to_index: int = 0, n_steps: int = 50, guidance_scale: float = 7.0, n_resample: int = 0, init_image: Optional[Tensor] = None, inpainting_mask: Optional[Tensor] = None, mask_blur: float = 4.0, replace_diffused: bool = True)[source]

Helper function to sample a single image.

Parameters
  • text – The text to condition on

  • from_index – The index to start sampling from

  • to_index – The index to end sampling at

  • n_steps – The number of steps to take between from_index and to_index

  • guidance_scale – The scale of the guidance signal

  • n_resample – The number of times to resample at each step

  • init_image – The initial image to start sampling from (also used for inpainting)

  • inpainting_mask – The mask to use for inpainting

  • mask_blur – The amount of blur to apply to the inpainting mask

  • replace_diffused – Whether to replace the diffused latents at each step (peeks into the init image so it’s not true inpainting)
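The guidance_scale parameter is easiest to read through the usual classifier-free guidance formula. The following is a minimal sketch of that formula on plain Python lists, not the library's implementation; the function name and list-based inputs are illustrative assumptions, and it assumes the guided noise is extrapolated from the unconditional prediction toward the text-conditioned one.

```python
def classifier_free_guidance(unconditional_noise, conditional_noise, guidance_scale=7.0):
    """Blend two noise predictions (hypothetical helper, not part of perceptor).

    guidance_scale = 0 returns the unconditional prediction,
    guidance_scale = 1 returns the conditional prediction,
    larger values push further in the direction of the text conditioning.
    """
    return [
        u + guidance_scale * (c - u)
        for u, c in zip(unconditional_noise, conditional_noise)
    ]


# Toy per-element noise predictions standing in for latent tensors.
unconditional = [0.0, 1.0]
conditional = [1.0, 1.0]
guided = classifier_free_guidance(unconditional, conditional, guidance_scale=7.0)
```

Elements where both predictions agree are left unchanged, so only the parts of the prediction that depend on the text are amplified.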

schedule_indices(n_steps=500, from_index=999, to_index=0, rho=3.0) Tensor[source]
property shape
sigmas(indices) Tensor[source]
text_encodings(texts)[source]
training: bool
class perceptor.models.stable_diffusion.Predictions(*, from_diffused_latents: InheritTensor, from_indices: InheritTensor, predicted_noise: InheritTensor, schedule_alphas: Tensor, schedule_sigmas: Tensor, encode: Callable[[InheritTensor], InheritTensor], decode: Callable[[InheritTensor], InheritTensor])[source]

Bases: FunctionalBase

alphas(indices) Tensor[source]
classifier_free_guidance(positive_predictions: Predictions, guidance_scale=7.0) Predictions[source]
correction(previous: Predictions) Predictions[source]
decode: Callable[[InheritTensor], InheritTensor]
property denoised_images: Tensor
property denoised_latents: Tensor
property device
dynamic_threshold(quantile=0.95) Predictions[source]

Dynamic thresholding heuristic from the Imagen paper.
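As a rough illustration of the heuristic, the sketch below applies dynamic thresholding to a flat list of values. It assumes values are centered on zero with a nominal range of [-1, 1], as in the Imagen paper; the helper names and the simple interpolated quantile are assumptions made for this example, and how the library maps its own image range onto this is not shown here.

```python
def quantile(values, q):
    """Linearly interpolated quantile of a non-empty list (illustrative helper)."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] * (1 - fraction) + ordered[upper] * fraction


def dynamic_threshold(values, q=0.95):
    """Clip to [-s, s] and rescale by s, where s is the q-quantile of |values|.

    s is never allowed below 1, so predictions that are already in range
    pass through unchanged.
    """
    s = max(quantile([abs(v) for v in values], q), 1.0)
    return [max(-s, min(s, v)) / s for v in values]


# Toy denoised prediction with two out-of-range values.
thresholded = dynamic_threshold([-4.0, -0.5, 0.0, 0.5, 2.0], q=1.0)
```

Compared with a fixed clamp to [-1, 1], rescaling by s keeps the relative contrast of saturated predictions instead of flattening them.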

encode: Callable[[InheritTensor], InheritTensor]
forced_denoised_latents(denoised_latents) Predictions[source]
forced_predicted_noise(predicted_noise) Predictions[source]
property from_alphas: Tensor
from_diffused_latents: InheritTensor
from_indices: InheritTensor
property from_sigmas: Tensor
guided(guiding, guidance_scale=0.5, clamp_value=1e-06) Predictions[source]
indices(indices) Tensor[source]
latent_dynamic_threshold(quantile=0.95) Predictions[source]
noisy_reverse_step(to_indices) Tensor[source]
predicted_noise: InheritTensor
resample(resample_indices) Tensor[source]

Harmonizing resampling from https://github.com/andreas128/RePaint
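RePaint-style resampling jumps back from a less noisy level to a noisier one by adding fresh noise, then denoises again so the inpainted and known regions can harmonize. The sketch below shows only that re-noising jump, under the assumption of a variance-preserving schedule (alpha² + sigma² = 1). The function name, argument names, and list-based latents are hypothetical and are not taken from the library.

```python
import math


def renoise(latents, noise, alpha_low, sigma_low, alpha_high, sigma_high):
    """Diffuse latents from a low noise level back to a higher one.

    (alpha_low, sigma_low) describe the current, less noisy level and
    (alpha_high, sigma_high) the noisier level to jump back to. The signal is
    rescaled and just enough fresh noise is added that the total noise
    standard deviation equals sigma_high.
    """
    ratio = alpha_high / alpha_low
    extra_sigma = math.sqrt(sigma_high**2 - (ratio * sigma_low) ** 2)
    return [ratio * x + extra_sigma * n for x, n in zip(latents, noise)]


# With zero fresh noise the jump only rescales the latents.
rescaled = renoise([1.0, -2.0], [0.0, 0.0], 0.8, 0.6, 0.6, 0.8)
```

In a full resampling loop this jump would alternate with reverse steps; n_resample in sample() presumably controls how many times that happens per step.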

resample_noise(resample_indices) Tensor[source]
reverse_step(to_indices) Tensor[source]
schedule_alphas: Tensor
schedule_sigmas: Tensor
sigmas(indices) Tensor[source]
step(to_indices, eta=0.0) Tensor[source]

Reduce noise level to to_indices

Parameters
  • to_indices – Union[Tensor, Tensor.shape("N"), float]

  • eta – float

Returns

diffused_images

Return type

torch.Tensor.shape("NCHW")
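The eta parameter suggests a DDIM-style update, where eta = 0 gives a deterministic step and larger values inject fresh noise. The sketch below works on scalars and assumes a variance-preserving schedule (alpha² + sigma² = 1); it illustrates the standard DDIM formula rather than this method's actual implementation, and all function and argument names are assumptions.

```python
import math


def denoised(diffused, predicted_noise, alpha, sigma):
    """Estimate the clean value from diffused = alpha * clean + sigma * noise."""
    return (diffused - sigma * predicted_noise) / alpha


def ddim_step(diffused, predicted_noise, alpha_from, sigma_from,
              alpha_to, sigma_to, eta=0.0, fresh_noise=0.0):
    """Move from noise level (alpha_from, sigma_from) to (alpha_to, sigma_to)."""
    clean = denoised(diffused, predicted_noise, alpha_from, sigma_from)
    # Standard deviation of the fresh noise injected by this step (0 when eta = 0).
    step_sigma = eta * (sigma_to / sigma_from) * math.sqrt(
        1 - (alpha_from / alpha_to) ** 2
    )
    # Remaining noise budget is carried by the predicted noise direction.
    direction_scale = math.sqrt(sigma_to**2 - step_sigma**2)
    return alpha_to * clean + direction_scale * predicted_noise + step_sigma * fresh_noise


# A clean value of 2.0 diffused with noise 0.5 at alpha = 0.6, sigma = 0.8.
diffused = 0.6 * 2.0 + 0.8 * 0.5
next_value = ddim_step(diffused, 0.5, 0.6, 0.8, 0.8, 0.6)
```

When the noise prediction is exact and eta is 0, the result equals the same clean value diffused to the target level, which is what makes the deterministic step consistent across different step counts.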

wasserstein_distance() Tensor[source]
wasserstein_square_distance() Tensor[source]
class perceptor.models.stable_diffusion.Conditioning(model_name: str, encodings: Tensor, inpainting_latent_masks: Optional[InheritTensor] = None, inpainting_latents: Optional[InheritTensor] = None)[source]

Bases: Module

property device
input(diffused_latents)[source]
training: bool