
Conversation

@natestemen
Member

Description: This PR adds a new compiler, "UCC+DDD", just to start seeing how error mitigation can fit into this workflow. As you can see from the code, it's quite straightforward, since no post-processing of results is needed.

fixes #184 (at least a first pass at it; maybe more follow-on work).

Question: I wasn't sure where to add this new compiler to the existing benchmarks. Any recommendations?

@natestemen natestemen requested a review from Copilot November 5, 2025 05:53
Contributor

Copilot AI left a comment


Pull Request Overview

This PR introduces a new compiler that combines UCC compilation with Digital Dynamical Decoupling (DDD) error mitigation from Mitiq. The implementation creates a UCCDDDCompiler class that extends the base compiler interface to apply DDD after UCC compilation, providing a straightforward integration of error mitigation into the existing workflow.

Key Changes:

  • Added UCCDDDCompiler class implementing UCC compilation followed by Mitiq's Digital Dynamical Decoupling
  • Integrated the new compiler into the project's compiler registry and simulation benchmarks
  • Added mitiq as a project dependency

Reviewed Changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.

  • src/ucc_bench/compilers/ucc_ddd_compiler.py: New compiler implementation combining UCC with DDD error mitigation
  • src/ucc_bench/compilers/__init__.py: Export the new UCCDDDCompiler class
  • pyproject.toml: Add mitiq dependency for DDD functionality
  • benchmarks/simulation_benchmarks.toml: Register the ucc-ddd compiler in simulation benchmarks
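The two-stage flow the review describes (UCC compilation, then DDD insertion, with no post-processing of results) can be sketched structurally. This is a hypothetical sketch, not the PR's code: the pass functions below are stubs standing in for the real `ucc.compile` and `mitiq.ddd.insert_ddd_sequences` calls, and the "circuit" is just a list of gate names.

```python
# Structural sketch (hypothetical): a composite compiler that applies a
# UCC-style optimization pass followed by a DDD insertion pass. The passes
# here are toy stubs; the real compiler would wrap ucc.compile and
# mitiq.ddd.insert_ddd_sequences.
from typing import Callable, List


class CompositeCompiler:
    """Applies a sequence of circuit-to-circuit passes in order."""

    def __init__(self, passes: List[Callable[[list], list]]):
        self.passes = passes

    def compile(self, circuit: list) -> list:
        for p in self.passes:
            circuit = p(circuit)
        return circuit


def ucc_compile(circuit: list) -> list:
    # Toy optimization: cancel adjacent self-inverse pairs of H gates.
    out = []
    for gate in circuit:
        if out and out[-1] == "h" and gate == "h":
            out.pop()
        else:
            out.append(gate)
    return out


def insert_ddd(circuit: list) -> list:
    # Toy DDD: replace each idle slot ("id") with an H-H sequence.
    out = []
    for gate in circuit:
        if gate == "id":
            out.extend(["h", "h"])
        else:
            out.append(gate)
    return out


compiler = CompositeCompiler([ucc_compile, insert_ddd])
print(compiler.compile(["h", "h", "cx", "id", "rz"]))  # → ['cx', 'h', 'h', 'rz']
```

Because DDD only rewrites the circuit, no extra result post-processing step is needed, which is what makes this composition straightforward.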


@natestemen natestemen requested a review from bachase November 5, 2025 06:04
Collaborator

@bachase bachase left a comment


This looks great overall, and I think you added it in good spots.

However, when I ran, I see the following error in the logs

Task failed for context: {'compiler_id': 'ucc-ddd', 'compiler_version': 'UCC:0.4.11+mitiq:0.48.0', 'benchmark_id': 'qv', 'target_device': None}. Error: 'Compiled circuit contained unsupported gates: ['id', 'x']'

Would it help to add a unit test by adding the compiler here?


@classmethod
def version(cls) -> str:
    return f"UCC:{ucc_version}+mitiq:{mitiq_version}"
Collaborator


Interesting question this raises: all other compilers return the single version string of that compiler. This change is reasonable, but it would complicate the labels in the performance-over-time plots.

I believe seeing how performance changes with new software versions is the main use of this version string. For composite compilers like this one, it becomes tricky to define a version ordering.

That being said, I'm OK to leave that as an open question, but I wanted to flag it.
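One way to make the ordering concern concrete: a composite version string can be split into per-component version tuples that each order naturally, even though the combined string does not. This is a hypothetical helper, not ucc-bench code, and it assumes purely numeric version parts.

```python
# Hypothetical helper: split a composite version string like
# "UCC:0.4.11+mitiq:0.48.0" into per-component version tuples, so plots
# could order points component-by-component instead of by the raw string.
def parse_composite_version(version: str) -> dict:
    parts = {}
    for component in version.split("+"):
        name, _, ver = component.partition(":")
        parts[name] = tuple(int(x) for x in ver.split("."))
    return parts


print(parse_composite_version("UCC:0.4.11+mitiq:0.48.0"))
# → {'UCC': (0, 4, 11), 'mitiq': (0, 48, 0)}
```

Tuples compare lexicographically, so `(0, 4, 11) < (0, 48, 0)` holds per component; what stays open is how to order points when the two components move independently.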

@natestemen
Member Author

Thanks for your help getting this running, Brad!

I took a stab at adding a more complex noise model, but I wasn't able to find a particular noise model/set of parameters where DDD improved performance. If the criterion to merge is improved performance, then I'll need to continue playing with noise models and parameters, but it's a bit of a slow iteration loop currently.

I should note that I attempted using a fake IBM device, but I was not able to find one that was both simulable on my computer and that fit the current benchmarks. E.g., the benchmarks fit on the 27-qubit QPUs, but I was not able to simulate them in a reasonable time (~45 min).

Contributor

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 2 comments.



@bachase
Collaborator

bachase commented Nov 18, 2025

wasn't able to find a particular noise model/set of parameters where DDD improved performance

Is this because the simulations are taking too long? Guessing it won't be any better for me, but happy to try running it as well.

criteria to merge is improved performance

Not necessarily, as there could be value in an example of combining tools as you do here. Is there a "negative" result here that is still worth tracking over time?

#97 might also be relevant here.

@natestemen
Member Author

wasn't able to find a particular noise model/set of parameters where DDD improved performance

Is this because the simulations are taking too long?

No; after fiddling with parameters, I just didn't stumble upon anything that made UCC+DDD perform better than UCC alone. I wonder if running it on a real device (not the fake simulator) might show an improvement. Is that something ucc-bench can handle if I specify something like this in the toml file?

[[target_devices]]
id = "ibm_algiers"

#97 might also be relevant here.

Good point. cc @jordandsullivan, since the plots in that issue look way better. Where can I find that plotting script? It might also be worth running all the sim benchmarks with this more complex noise model to see how things perform.

@natestemen
Member Author

natestemen commented Nov 21, 2025

Some results from my experiments are below. As you can see, there isn't a noise model config that consistently produces better results with DDD, but as @jordandsullivan suggested, using coherent noise did show that DDD works well on the square heisenberg and QCNN problems. Potentially QV as well, but I'm somewhat skeptical of those results since there are very few places to insert DD sequences.1

[image: description]
[image: depolarizing noise]
[image: mixed depolarizing/dephasing (there's some issue with the QV circuits here, so they did not run)]
[image: depolarizing + coherent $R_z$ (general rule)]
[image: depolarizing + coherent $R_z$ (repeated rule)]

Footnotes

  1. A potentially interesting question would be how performant DDD is as a function of circuit sparsity. E.g., having no places to insert might yield poor performance, but at the other extreme, when circuits are very sparse, perhaps DD sequences aren't meaningful either.
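The sparsity question in the footnote can be made concrete with a toy measure of per-qubit idle windows, which is where DD sequences can go. This is an illustration on a made-up moment-list circuit, not a mitiq or ucc-bench API.

```python
# Toy sketch (not a real API): a circuit is a list of moments, each the set
# of busy qubits. Per-qubit idle windows are the candidate slots for DD
# sequences; their number and length is one possible sparsity measure.
def idle_windows(moments, qubit):
    """Lengths of maximal idle stretches for one qubit."""
    runs, run = [], 0
    for moment in moments:
        if qubit in moment:
            if run:
                runs.append(run)
            run = 0
        else:
            run += 1
    if run:
        runs.append(run)
    return runs


# 3-qubit, 5-moment toy circuit: qubit 2 idles for the first four moments.
moments = [{0, 1}, {0}, {1}, {0, 1}, {0, 1, 2}]
for q in range(3):
    print(q, idle_windows(moments, q))
# → 0 [1]
# → 1 [1]
# → 2 [4]
```

A dense circuit yields only length-1 windows (too short for a two-gate sequence), while a very sparse one yields long windows whose qubits may not need decoupling, matching both extremes in the footnote.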

@bachase
Collaborator

bachase commented Nov 21, 2025

Nice digging further, @natestemen. A few questions on the results:

  1. Given your comment on challenges in inserting DDD -- do you know how much the circuits were changed? Or what a good measure of that would be? Not a blocker, but given your comment on sparsity, maybe some performance can be quantitatively explained that way?
  2. What is the difference between repeated and general rule for DDD?
  3. Do you have an existing circuit and noise model, where DDD is known to show an improvement? Like this mitiq example but switch to the HH rule? I'm wondering if testing that circuit would help isolate performance that comes from compiling versus the circuit itself.

Collaborator

@bachase bachase left a comment


I think these results are interesting, but I'm still torn on exactly what to check in and what a good milestone would be. If it's not too hard, testing this out on the known-good case in the mitiq docs would be nice. If it shows improvements, then we have something to check in as a benchmark that will give useful results.

If it doesn't, then we can either dig into why, or make a call on what to check in.

Collaborator


Not a blocker for this PR, but it's worth a design discussion of how best to enable users to select varying noise models. My instinct is that this would go down the path of a target device, just one with no connectivity constraints. But it is worth discussing how you might want to compose different types of models, sweep parameters, etc., and how much of that can live in the configuration file versus needing to be in code.

@natestemen
Member Author

Given your comment on challenges in inserting DDD -- do you know how much the circuits were changed? Or what a good measure of that would be? Not a blocker, but given your comment on sparsity, maybe some performance can be quantitatively explained that way?

Hard to easily find that out, but that's a great mitiq feature request. unitaryfoundation/mitiq#2882

I hacked about a bit and was able to get the log below, but inspecting it isn't giving me much insight, unfortunately.

[ucc-ddd] DDD inserted 49 sequences (98 H gates) into circuit 'qaoa'
[ucc-ddd] DDD inserted 17 sequences (34 H gates) into circuit 'qv'
[ucc-ddd] DDD inserted 37 sequences (74 H gates) into circuit 'qft'
[ucc-ddd] DDD inserted 17 sequences (34 H gates) into circuit 'square_heisenberg'
[ucc-ddd] DDD inserted 13 sequences (26 H gates) into circuit 'ghz'
[ucc-ddd] DDD inserted 877 sequences (1754 H gates) into circuit 'prep_select'
[ucc-ddd] DDD inserted 13 sequences (26 H gates) into circuit 'qcnn'

What is the difference between repeated and general rule for DDD?

Ah, sorry, I glossed over that. A general rule inserts the gate sequence only once; for our example that means inserting two $H$ gates even if more could fit. The repeated rule inserts as many sequences as possible, filling the idle time.
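A toy illustration of that difference (this is not mitiq's implementation; mitiq's `general_rule` and `repeated_rule` build actual gate sequences): counting how many sequence gates land in an idle window of a given length, for a length-2 sequence like $HH$.

```python
# Illustration only (not mitiq code): gates inserted into an idle window of
# slack_length moments, for a sequence of seq_length gates.
def general_rule_gates(slack_length, seq_length=2):
    # General rule: one sequence, inserted only if it fits.
    return seq_length if slack_length >= seq_length else 0


def repeated_rule_gates(slack_length, seq_length=2):
    # Repeated rule: as many whole sequences as fit in the window.
    return (slack_length // seq_length) * seq_length


print(general_rule_gates(7))   # → 2
print(repeated_rule_gates(7))  # → 6
```

This matches the log above, where each inserted sequence accounts for exactly two $H$ gates.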

Do you have an existing circuit and noise model, where DDD is known to show an improvement? Like this mitiq example but switch to the HH rule? I'm wondering if testing that circuit would help isolate performance that comes from compiling versus the circuit itself.

The latest noise model added here was the one from this mitiq example, and in the latest commit I added the GHZ circuit to the mix to see how the performance looks there. You can see below that the performance is quite bad compared to no DDD, which hints that one or both of the following hold:

  1. the noise model is subtly different than the one defined via cirq
  2. the $HH$ rule we are using here is fundamentally different than $XX$
[image: Screenshot 2025-11-24 at 12 23 07 PM]

I'm still torn on exactly what to check in and what a good milestone would be.

Same here. Maybe good to get some fresh eyes on it from @jordandsullivan for another opinion?

@jordandsullivan
Contributor

the $HH$ rule we are using here is fundamentally different than $XX$

Hmmmm, did you play around with different DDD rules? E.g., I used $YY$ sequences in the example you linked above, and I did have to play around to find which sequence actually helped. @natestemen

@natestemen
Member Author

I did, but the repo is set up so that everything has to be compiled to {"id", "rx", "ry", "rz", "h", "cx"}, so there aren't many options for DD sequences :/
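A small aside on that basis (my own sanity check with hand-rolled 2x2 matrix math, not repo code): an $XX$ pair multiplies to the identity, and `rx(pi)` squared is minus the identity, i.e. the same operation up to a global phase, so an $XX$-style sequence is at least mathematically expressible via the `rx` gate in this basis.

```python
# Sanity check (illustration only): X·X = I, and Rx(pi)·Rx(pi) = -I, which is
# the identity up to a global phase. Hand-rolled 2x2 complex matrix math.
import math


def matmul(a, b):
    """2x2 complex matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


X = [[0, 1], [1, 0]]


def rx(theta):
    """Standard Rx rotation matrix."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]


print(matmul(X, X))  # → [[1, 0], [0, 1]]
rr = matmul(rx(math.pi), rx(math.pi))
print(round(rr[0][0].real, 9), round(rr[1][1].real, 9))  # → -1.0 -1.0
```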

@jordandsullivan
Contributor

Ah that is a very interesting finding! The decomposition itself prevents the DDD from working. This will be a good thing to test again when #209 is merged.

@bachase
Collaborator

bachase commented Nov 28, 2025

@natestemen -- Would it work to hack your branch to

  1. Disable the check
    validate_circuit_gates(compiled_circuit, DEFAULT_GATESET)
  2. Change the target basis here to include $X$ gates?

That should isolate

the $HH$ rule we are using here is fundamentally different than $XX$

from

the noise model is subtly different than the one defined via cirq

IMHO, finding out why this differs from the performance in the mitiq tutorial is more important than having a check-in-ready PR.

@natestemen
Member Author

finding out why this differs from the performance in the mitiq tutorial is more important than having a check-in ready PR.

I did some more digging, and after disabling the gate-set check and switching to the $XX$ rule, I was still seeing degraded performance of UCC+DDD vs. UCC on the GHZ example. This points to the major difference being the noise model. In cirq, noise is applied after every layer, independent of there being a gate in the layer, whereas in qiskit, noise is applied after every gate. My attempt to mimic this noise model in qiskit by adding identity gates at every moment has so far failed, but it's progress toward understanding the difference.

My next step was to convert the mitiq DDD + ZNE example from cirq to qiskit as accurately as I (and an LLM) could. Sure enough, I no longer see an improvement when using DDD with qiskit. This again points to the way noise is applied in qiskit's noise models. In the meantime, I've asked on qiskit-aer whether it's possible to build a qiskit noise model more similar to the cirq one (Qiskit/qiskit-aer#2392).
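The per-layer versus per-gate distinction can be illustrated by counting noise events on a toy circuit. This is only an illustration of the difference described above, not a cirq or qiskit API; the "circuit" is a list of moments, each the set of busy qubits.

```python
# Toy illustration (not a real API): per-moment noise (cirq style) touches
# every qubit in every moment, idle or not, while per-gate noise (qiskit
# style) only follows actual gates, so idle qubits see no noise at all.
def noise_event_counts(moments, num_qubits):
    per_moment = len(moments) * num_qubits   # cirq-style: every qubit, every moment
    per_gate = sum(len(m) for m in moments)  # qiskit-style: gates only
    return per_moment, per_gate


# 3 qubits, 4 moments, with qubit 2 idle until the last moment.
moments = [{0, 1}, {0}, {1}, {0, 1, 2}]
print(noise_event_counts(moments, num_qubits=3))  # → (12, 7)
```

Under the per-gate model, an idle qubit accumulates no error for DDD to cancel, which is consistent with DDD showing no improvement after the qiskit conversion.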

@bachase
Collaborator

bachase commented Dec 4, 2025

@natestemen nice digging! As I read that other issue, this sounds broadly relevant to how we model noise in the benchmarks.


Development

Successfully merging this pull request may close these issues.

Add mitiq to simulation benchmarks