2020

1 — 2001.00133

\caption{Additional \aastex\ symbols}

2 — 2001.00401

\caption{Two-body contributions $N_2$ to the full normalization integral (equal to 1) for some coupling constants $\alpha$ and corresponding binding energies $B=2m-M$. The states Nos. 5 (normal), \textcolor{red}{6} and \textcolor{red}{7} (both abnormal) correspond to the solutions shown in Fig.~\ref{fig1}. \label{tab1}}

\caption{Left, middle and right are the elastic form factors of the states No. 5 (normal), \textcolor{red}{6} and \textcolor{red}{7} (both abnormal) from Table~\protect{\ref{tab1}} (calculated with the left, middle and right $g(z)$ in Fig.~\protect{\ref{fig1}}, respectively).}

\caption{Left is the transition form factor between the (normal) state No. 5 from Table~\protect{\ref{tab1}} and the first abnormal one No. \textcolor{red}{6}; middle is the transition form factor between the (normal) state No. 5 and the second abnormal one No. \textcolor{red}{7}; right is the transition form factor between the two abnormal states No. \textcolor{red}{6} and \textcolor{red}{7}. \label{fig3}}

3 — 2001.00558

\caption{The mean and the worst-case (WC) hyperspectral image reconstruction error in $\Delta E$ and MRAE under original, half and double exposure settings. Best results are shown in \textcolor{red}{red} and the second-best results are shown in \textcolor{blue}{blue}.}

4 — 2001.00733

\caption{An illustration of the connecting words (in \textcolor{blue}{blue}) for target \textit{love} and source \textit{lottery} (in \textcolor{red}{red}) by different part-of-speech (POS) tags. Plots (a), (b), and (c) respectively show adjectives, verbs, and nouns in the underlying word vector space. Numbers on the dotted lines represent the semantic distance (defined as $1 - \mathrm{cosine}(\cdot,\cdot)$) between a pair of words.}

\caption{Examples of generated metaphors in decreasing order of the smoothness, properness, and novelty scores. Targets (in \textcolor{red}{red}), sources (in \textcolor{orange}{orange}), and connecting words (underlined) are highlighted.}

5 — 2001.00987

\caption{Single image results obtained on test images from the Make3D dataset. Each result contains the following four images (from left to right): original photograph, ground truth depth from the dataset, our inferred depth, and our synthesized anaglyph \protect\includegraphics[height=5pt]{fig/anaglyph.pdf} image. The depth images are shown in log scale. Darker pixels indicate nearby objects (black is roughly 1m away) and lighter pixels indicate objects farther away (white is roughly 80m away). Each pair of ground truth and inferred depths is displayed at the same scale.}

\caption{Video results obtained on test images for each building in our stereo-RGBD dataset (buildings 1-4, from left to right and top to bottom). For each result (from left to right): original photograph, ground truth depth, our inferred depth, ground truth anaglyph \protect\includegraphics[height=5pt]{fig/anaglyph.pdf} image, and our synthesized anaglyph image. Because the ground truth 3D images were recorded with a fixed interocular distance (roughly 5cm), we cannot control the amount of ``pop-out,'' and the 3D effect is subtle. However, this is a parameter we can set using our automatic approach to achieve a desired effect, which allows for an enhanced 3D experience. Note also that our algorithm can handle multiple moving objects (\emph{top}). \vspace{2mm}}

\caption{Several clips from the feature film {\it Charade}. Each result contains (from top to bottom): the original frames, estimated depth, and estimated anaglyph \protect\includegraphics[height=5pt]{fig/anaglyph.pdf} automatically generated by our algorithm. Some imperfections in depth are conveniently masked in the 3D image due to textureless or less salient regions.}

6 — 2001.01656

\caption{Illustration of audio-visual fusion methods for AVSR systems: (a) feature concatenation based fusion: acoustic and video features are concatenated before being fed into the RecogNet; (b) visual modality driven gated fusion; (c) audio-visual modality driven gated fusion. ``{\color{red}$\otimes$}'' denotes the Hadamard product. The dashed arrow denotes concatenating the gated hidden outputs with the visual features.}

7 — 2001.01744

\caption{We compare our method with state-of-the-art approaches, both traditional and learning-based, using two metrics. For each metric we report mean/median values over all of the objects reconstructed, and across multiple levels of noise. {\color{darkgreen}{Green}} and {\color{darkred}{Red}} indicate the best and second best method respectively.}

\caption[]{Mesh reconstructions for objects in pose ({\color{darkgreen}P}) and out of pose (\cP), under low ({\color{darkgreen}N}) and moderate (\cN) noise, and for objects in ({\color{darkgreen}T}) and out (\cT) of training. GT+PC: ground truth mesh and the point cloud used by the various methods to estimate the mesh. Traditional methods introduce a noise vs smoothness trade-off (Laplacian low/high~\cite{laplacian}). State-of-the-art, deep-learning methods (AtlasNet~\cite{Groueix_2018_CVPR} and OccNet~\cite{occNet}) learn object-level priors, which causes them to fail on objects not seen in training (\cT), or even on objects that are \emph{just rotated} w.r.t. the training set (\cP). Our method learns local priors and forces global consistency with the point cloud.}

8 — 2001.02585

\caption{{\footnotesize An Exemplary Realization of DDP. Each node corresponds to a disease ICD-10 code} {\scriptsize (D64.9: Anemia, N18.9: Renal failure, I63.9: Cerebral infarction, E11.9: Diabetes, I70.209: Atherosclerosis, M81.0: Osteoporosis, and I25.10: Heart disease.)} {\footnotesize Edge weights are depicted via their thickness. The left panels show the evolution of the disease network, and the rightmost panel shows the intensity functions of three selected nodes. Onset of \textcolor{blue}{heart disease} (at $t_1$) triggers spikes in the intensity functions of \textcolor{green}{diabetes} and \textcolor{red}{anemia}, making them more likely in the future. The onset of \textcolor{green}{diabetes} (at $t_2$) elevates the risk of \textcolor{red}{anemia} (i.e., thicker edge), which consequently occurs at $t_3$. Edge weights are modulated by a neural network over time.}}

9 — 2001.02619

\caption{ Overview of the flavor of declarative specification adopted in \dwa and the construction of \dba on top of it. Parts of this image have been reproduced with permission from \cite{muise2019planning}. {\bf Demo link}: \textcolor{blue}{\url{https://bit.ly/33RNMiR}}. }

10 — 2001.03188

\caption{$K$, $\Theta$ (indicated by dotted ovals), $K/\Theta$, and the herring\-bone lattice $H$; note that \red{$\otimes$} in the figure is not in $K$}

11 — 2001.03360

\caption{The leaderboard of the counting performance on the NWPU-Crowd test set. In the ranking strategy, the Overall MAE is the primary key. ``FS'' indicates that the model is trained From Scratch, without any pre-trained model. $S0 \sim S4$ respectively indicate five categories according to the different number range: $0$, $(0,100]$, $(100,500]$, $(500,5000]$, and $\geq5000$. $L0 \sim L2$ respectively denote three luminance levels on \emph{the test set}: $[0,0.25]$, $(0.25,0.5]$, and $(0.5,0.75]$. Limited by the paper length, only MAE is reported in the category-wise results. The speed and FLOPs are computed on the input size of $576 \times 768$. The {\color{red}{red}}, {\color{blue}{blue}} and {\color{green}{green}} colors respectively represent the {\color{red}{first}}, {\color{blue}{second}} and {\color{green}{third}} place of the leaderboard.}

12 — 2001.03373

\caption{\label{fig:KE} VMI images obtained after filtering using the partial covariance method on the TOF peaks corresponding to the fragments C$_2$H$_5^+$ (a) and C$_8$H$_{11}^+$ (b) (left part: raw data and right part: inverted data). Artefacts come from intense signals (helium and parent ion) that were not filtered out by covariance analysis (see SI for more details). (c) and (d) Ion kinetic energy distributions of the respective fragments obtained by angular integration of the inverted data, avoiding the artefact signals, and energy calibrated using ion trajectory simulations (SIMION\textregistered \cite{Dahl2000}). (e) Kinetic energy release distribution (KERd) for the channel C$_2$H$_5^+$ / C$_8$H$_{11}^+$ obtained by convolution of the kinetic energy distributions of the two fragments.}

13 — 2001.03604

\caption{Results of quasi-static analysis for model~(\ref{Eq:NumericalResults:IdentifiedModel:Cms}) with input $u_k{=}70\sin(2\pi k)\,{\rm V}$. The hysteresis loop indicated with (\textcolor{black}{$\cdots$})~is a result of the interaction of (\textcolor{black}{---})~attracting ($\tilde{y}_{\rm L}^{\rm a}$, $\tilde{y}_{\rm U}^{\rm a}$) and (\textcolor{red}{-$\,\cdot\,$-})~repelling ($\tilde{y}_{\rm L}^{\rm r}$, $\tilde{y}_{\rm U}^{\rm r}$) sets. (\textcolor{magenta}{\chemarrow}) indicates the orientation of the hysteresis loop.}

\caption{Free-run simulation of model (\ref{Eq:NumericalResults:IdentifiedModel:Cms}). The figure is arranged in columns: (a) sinusoidal voltage input $u_k{=}40\sin(2\pi k)\,{\rm V}$ and (b) the case where this input becomes constant during loading ($\bullet$) and unloading (\textcolor{gray}{$\blacklozenge$}) regimes with a final value of $16.8\,{\rm V}$. The corresponding temporal responses are shown in (c) and (d), and the hysteresis loops in (e) and (f), respectively. (\textcolor{green}{---}) represents the measured data and (\textcolor{red}{- -}) is the estimated output of the model. The full records have $N=50000$ data points.}

\caption{Hysteresis compensation for the piezoelectric actuator (\ref{Eq:NumericalResults:System}). (a)~Compensation inputs, (b)~temporal responses and (c)~hysteresis loops. \hbox{(\textcolor{red}{- -})}~results obtained with compensator (\ref{Eq:NumericalResults:InputCompensation:Cms}), (\textcolor{black}{$\cdots$})~results with compensator (\ref{Eq:NumericalResults:InputCompensation:Cci}), (\textcolor{green}{-$\,\cdot\,$-})~system output without compensation, and (\textcolor{blue}{---})~displacement reference $r=40\sin (2\pi t)\,\mu$m.}

\caption{MAPE index (\ref{Eq:MAPE}) computed for the models and compensators described, respectively, by equations (a)~(\ref{Eq:NumericalResults:IdentifiedModel:Cms}) and (\ref{Eq:NumericalResults:InputCompensation:Cms}); (b)~(\ref{Eq:NumericalResults:IdentifiedModel:Cci}) and (\ref{Eq:NumericalResults:InputCompensation:Cci}). (\textcolor{blue}{$\circ$})~model and (\textcolor{red}{$\bullet$})~tracking accuracies. (\textcolor{green}{$\blacktriangle$})~accuracy of uncompensated system.}

\caption{Left column refers to model (\ref{Eq:ExperimentalResults:IdentifiedModel:Cms}) and right column to model (\ref{Eq:ExperimentalResults:IdentifiedModel:Cci}). (a)~input $u_k{=}0.56\sin(0.2\pi k)+3\,{\rm V}$ and (c)~the corresponding measured output (\textcolor{green}{---}) $y$ and (\textcolor{red}{- -}) model (\ref{Eq:ExperimentalResults:IdentifiedModel:Cms}) free-run simulation; (b)~smoothed version of $y$ in (c); (d)~the corresponding output which is $u_k$ in (a) and (\textcolor{red}{- -}) model (\ref{Eq:ExperimentalResults:IdentifiedModel:Cci}) free-run simulation. (e) and (f) show the same data as (c) and (d), respectively.}

\caption{Hysteresis compensation for the pneumatic valve. (a)~Compensation inputs, (b) and (c)~its temporal responses and in (d) and (e)~ the hysteresis loops. \hbox{(\textcolor{red}{- -})}~illustrates the results obtained with compensator (\ref{Eq:ExperimentalResults:InputCompensation:Cms}), (\textcolor{black}{$\cdots$})~refers to the results by using compensator (\ref{Eq:ExperimentalResults:InputCompensation:Cci}), (\textcolor{green}{-$\,\cdot\,$-})~the system output without compensation, and (\textcolor{blue}{---})~the reference $r{=}0.56\sin (0.2\pi t){+}3\,{\rm V}$.}

14 — 2001.03615

\caption{\textbf{From regions to grids.} \textbf{Left}: We convert the original region feature extractor used by bottom-up attention~\cite{anderson2018bottom} back to the ResNet~\cite{he2016deep} grid feature extractor for the \emph{same} layer (see Sec.~\ref{sub_sec:same_layer}, weights in {\color{RoyalBlue} blue} are transferred), and find it works surprisingly well for VQA~\cite{goyal2017making}. \textbf{Right}: We build a detector based on 1{\x}1 \roi while keeping the output architecture \emph{fixed} for grid features (see Sec.~\ref{sub_sec:improved_grids}), and the resulting grid features consistently perform on par with region features.}

\caption{\textbf{Main comparison}. `\butd' stands for region features as in bottom-up attention~\cite{anderson2018bottom}. `\ours' stands for grid features. All results reported on VQA 2.0 \valminusminival. We show that: \textbf{1)} by simply extracting grid features from the \emph{same} layer $C_5$ of the same model, the VQA accuracy is already much closer to bottom-up attention than ImageNet pre-trained ones (row 1, 3 \& 5); \textbf{2)} 1{\x}1 \roi based detector pre-training improves the grid features accuracy while the region features get worse (row 1, 2 \& 4). Last column is the gap compared to the original bottom-up features (underlined).}

15 — 2001.03667

\caption{DDCS for net ionization as a function of the electron emission angle in degrees for $E_{\rm{el}}=$ 200 eV separated into the five MO contributions. In the upper panels results are shown for C$^{6+}$ and in the lower ones for Si$^{13+}$ projectiles, both with initial velocity of 12.65 a.u.; with static (a), (c) and with time-dependent (b), (d) screening. (---) $1b_1$, ({\textcolor{red}{$--$}}) $3a_1$, ({\textcolor{green}{$\cdot-\cdot$}}) $1b_2$, ({\textcolor{red}{$\cdot\cdot-\cdot\cdot$}}) $2a_1$, ({\textcolor{orange}{$--\cdot--$}}) $1a_1$.\label{ddcs20mo6-13}}

\caption{SDCS for net ionization as a function of electron emission angle in degrees for Si$^{13+}$, with static (a) and with (b) time-dependent screening at 112 MeV. (---) Net, ({\textcolor{red}{$\cdot\cdot\cdot$}}) $\frac{{\rm{d}} \sigma_1}{{\rm{d}} \Omega_{\rm{el}}}$, ({\textcolor{green}{$--$}}) $2\frac{{\rm{d}} \sigma_2}{{\rm{d}} \Omega_{\rm{el}}}$, ({\textcolor{red}{$\cdot\cdot-\cdot\cdot$}}) $3\frac{{\rm{d}} \sigma_3}{{\rm{d}} \Omega_{\rm{el}}}$, ({\textcolor{orange}{$\cdot-\cdot$}}) $4\frac{{\rm{d}} \sigma_4}{{\rm{d}} \Omega_{\rm{el}}}$, ({\textcolor{brown}{$--\cdot--$}}) $5\frac{{\rm{d}} \sigma_5}{{\rm{d}} \Omega_{\rm{el}}}$, ($\times$) $\sum\limits_{q=1}^5q\frac{{\rm{d}} \sigma_q}{{\rm{d}} \Omega_{\rm{el}}}$.\label{sdcstq}}

\caption{SDCS for net ionization as a function of electron energy for Si$^{13+}$, with static (a) and with time-dependent (b) screening at 112 MeV. (---) Net, ({\textcolor{red}{$\cdot\cdot\cdot$}}) $\frac{{\rm{d}} \sigma_1}{{\rm{d}} E}$, ({\textcolor{green}{$--$}}) $2\frac{{\rm{d}} \sigma_2}{{\rm{d}} E}$, ({\textcolor{red}{$\cdot\cdot-\cdot\cdot$}}) $3\frac{{\rm{d}} \sigma_3}{{\rm{d}} E}$, ({\textcolor{orange}{$\cdot-\cdot$}}) $4\frac{{\rm{d}} \sigma_4}{{\rm{d}} E}$, ({\textcolor{brown}{$--\cdot--$}}) $5\frac{{\rm{d}} \sigma_5}{{\rm{d}} E}$, ($\times$) $\sum\limits_{q=1}^5q\frac{{\rm{d}} \sigma_q}{{\rm{d}} E}$.\label{sdcseq}}

16 — 2001.03743

\caption{Protocols in \textsf{SGX-SE2}. The new instructions of \textsf{SGX-SE2} are in \textcolor{blue}{blue}.}

17 — 2001.03799

\caption{Quantitative comparison of T2 reconstructions from different undersampling patterns and methods at R = 5. Best results with and without T1 prior are marked in \textcolor{red}{red} and \textcolor{blue}{blue}, respectively.}

18 — 2001.03960

\caption{\textbf{Qualitative results of Attention Flow.} Bounding boxes on sample test images {\color{green} (green)} show the ground truth attentional focus. The second and third columns are our estimated Attention Flow. In the third column, we also depict the estimated bounding boxes {\color{cyan} (cyan)}. Figure best viewed in color. }

19 — 2001.03991

\caption{(color online) Motor's mean square displacement $\langle r^{2}(t)\rangle$ (a) at a temperature for which the medium without stimuli is solid, $T=30$~K, and (b) at a temperature for which the medium is a viscous liquid, $T=50$~K. In each panel, from bottom to top, the curves correspond to: the motor off and no field (dark green curve); the motor on with no field ($F.d=0$), time-symmetric motor (red curve), then asymmetric (dark red curve); $F.d=6.6\times10^{2}$~pN.\AA, time symmetric (blue curve), then asymmetric (dark blue curve), these two curves superimposing almost perfectly at both temperatures; $F.d=6.6\times10^{3}$~pN.\AA, time symmetric (gray curve), then asymmetric (black curve).}

20 — 2001.04015

\caption{(a) Examples of the stochastic time evolution of the angle $\varphi(t)$ obtained from numerical simulations of fractional Brownian motion for different values of the Hurst exponent $H$. From bottom to top: $H = 0.1, 0.3, 0.5, 0.7, 0.9$. (b) Corresponding mean-squared angular displacements $\left\langle \Delta\varphi(t)^{2}\right\rangle$. The symbols correspond to the numerical results obtained from the simulated trajectories in (a) whereas the solid lines are computed from Equation~(\ref{msdfGmotion}). The resulting 2D trajectories of the particle position $(x,y)$ are plotted for (c) $H = 0.1$, (d) $H=0.3$, (e) $H = 0.5$, (f) $H = 0.7$, and (g) $H = 0.9$. All trajectories start at $[x(0)=0, y(0) = 0]$. (h) Expanded view of the active trajectory in (g), showing the loops resulting from the superdiffusive angular motion for $H = 0.9$. The arrows represent the instantaneous orientation $\hat{\boldsymbol{v}}(t) = [\cos \varphi(t), \sin \varphi(t)]$ at different times $t$.}
\label{fig:trajectories}
\end{figure*}

\subsection{Numerical analysis}
In order to gain insight into the statistics of active particles subject to fractional rotational Brownian motion, we have simulated stochastic trajectories evolving according to Eqs.~(\ref{LangevinPosition})-(\ref{LangevinDirection}) for different values of $H$. To this end, fractional Brownian motion, with autocorrelation function given by Eq.~(\ref{fGmotion}), is generated using the circulant embedding method of the covariance matrix \cite{dietrich1997}, whereas the 2D particle position is integrated by means of an Euler-Cromer scheme. In the numerical results presented throughout the paper, velocities, timescales, length-scales, and translational diffusion coefficients are normalized by $v_0$, $\tau_{H}\equiv D_H^{-\frac{1}{2H}}$, $\ell_{H}\equiv v_0 D_H^{-\frac{1}{2H}}$, and $\mathcal{D}_{H}\equiv v_0^2 D_H^{-\frac{1}{2H}}$, respectively.
In particular, $\tau_{1/2} \equiv D_{1/2}^{-1}$ and $\ell_{1/2} \equiv v_0 D_{1/2}^{-1}$ correspond to the rotational diffusion time and the swimming-persistence length for active Brownian motion driven by rotational diffusion ($H = \frac{1}{2}$), respectively. In Fig.~\ref{fig:trajectories}(a), we show some examples of the temporal evolution of the angle $\varphi(t)$, in the extended domain $(-\infty,\infty)$, for different values of the Hurst exponent over the time interval $0 \le t \le 10^{4}\, D_H^{-\frac{1}{2H}}$. The corresponding mean-squared angular displacements are plotted in Fig.~\ref{fig:trajectories}(b) and agree with the expression given in Eq.~(\ref{msdfGmotion}). 2D trajectories of the active particle position $[x(t),y(t)]$, resulting from the fractional rotational Brownian motion, are shown in Figs.~\ref{fig:trajectories}(c)-(h). For the sake of simplicity, and in order to better appreciate the separate effect of fractional rotational Brownian noise on the 2D active motion, here we focus on the case without translational fluctuations, i.e., $D_{T}=0$. We stress that this is not an approximation, but rather a consequence of the separate dynamics given by Eqs.~(\ref{LangevinPosition})-(\ref{LangevinDirection}), as shown in Section~\ref{Sect:FP}. For $0<H<\frac{1}{2}$, the anti-persistence of the stochastic rotational dynamics leads to a highly persistent translational motion, as seen in Figs.~\ref{fig:trajectories}(c) and \ref{fig:trajectories}(d) for $H=0.1$ and $0.3$, respectively. This translates into actual persistence lengths much larger than the one expected for $\delta$-correlated rotational diffusion, $\ell_{1/2}$. This effect vanishes for $H=\frac{1}{2}$, for which the rotational dynamics results in diffusive translational motion with an effective active diffusion coefficient $D_{1/2}^{\mathrm{eff}} = \frac{1}{2} \mathcal{D}_{1/2} = \frac{1}{2}v_0^2 D_{1/2}^{-1}$.
Then, for observation times much larger than $\tau_{1/2}=D_{1/2}^{-1}$, the swimming persistence is lost and the particle performs an effective memoryless random walk, as seen in Fig.~\ref{fig:trajectories}(e). Similarly, for $\frac{1}{2}<H<1$, an \emph{active random walk} also emerges at timescales much larger than $D_H^{-\frac{1}{2H}}$, as shown in Figs.~\ref{fig:trajectories}(f) and~\ref{fig:trajectories}(g) for $H = 0.7$ and $H=0.9$, respectively. However, a close inspection of the active trajectories reveals that the short-time motion is qualitatively distinct from the active Brownian motion for $H = 0.5$: looped trajectories are formed as $H$ increases and become more conspicuous as $H$ approaches the value 1, see Fig.~\ref{fig:trajectories}(g) for $H = 0.9$. We point out that such looped trajectories differ in nature from the ones developed by chiral self-propelled particles driven by a constant torque~\cite{loewen2016}, for which the sense of rotation remains fixed over time. Instead, the trajectories obtained in this paper resemble the stochastic circular orbits performed by active colloids moving in viscoelastic fluids~\cite{narinder2018} and the meandering and chaotic motion predicted for self-phoretic particles at large P\'eclet numbers~\cite{hu2019}.

\section{Statistics of active motion}\label{Sect:FP}
\begin{figure*}
\includegraphics[width=\textwidth]{Fig2.eps}
\caption{Profiles of the probability density function of the particle orientation angle $\phi$ for different values of $H$: (a) $H = 0.1$ (subdiffusive regime), (b) $H = 0.5$ (diffusive regime), (c) $H = 0.9$ (superdiffusive regime), at the times: $t = 0.01$ (thick solid line), $t = 0.1$ (dashed line), $t = 1$ (dotted line), $t = 10$ (dotted-dashed line) and $t = 100$ (thin solid line).
The insets in (a) and (b) are expanded views of the main plots for $P(\phi,t) \le 0.3$.}\label{fig:pdfangle}
\end{figure*}

\subsection{Fokker-Planck equation}
The Fokker-Planck equation for the one-particle probability density $p({\boldsymbol{x}},\varphi ;t)\equiv \langle \delta[{\boldsymbol{x}}-{\boldsymbol{x}}(t)]\delta[\varphi -\varphi (t)]\rangle$ that corresponds to the Langevin equations (\ref{modelo}) can be derived by standard methods, see for instance \cite{sevilla2014,sevilla2015}. By taking the derivative with respect to time $t$, we get
\begin{eqnarray}\label{eq:derivFP}
\frac{\partial}{\partial t} p({\boldsymbol{x}},\varphi;t) & = & - v_0 \hat{\boldsymbol{v}} \cdot \nabla p({\boldsymbol{x}},\varphi;t) \nonumber\\
& & - \nabla \cdot \langle \boldsymbol{ \xi}_{T}(t) \delta[\boldsymbol{x} - \boldsymbol{x}(t)] \delta[\varphi - \varphi(t)]\rangle \nonumber\\
& & - \frac{\partial}{\partial \varphi} \langle \xi_R(t) \delta[\boldsymbol{x} - \boldsymbol{x}(t)] \delta[\varphi - \varphi(t)]\rangle,
\end{eqnarray}
where $\nabla = \left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y} \right)$ and the brackets stand for an average over realizations of both the translational and rotational noises, $\boldsymbol{\xi}_{T}(t)$ and $\xi_R(t)$, respectively. The second and the third terms on the right-hand side of Eq.~(\ref{eq:derivFP}), namely, the mean value of the product of the functional $\delta[\boldsymbol{x} - \boldsymbol{x}(t)] \delta[\varphi - \varphi(t)]$ with the Gaussian noises $\boldsymbol{ \xi}_{T}(t)$ and $\xi_R(t)$, respectively, can be evaluated by applying the Furutsu-Novikov theorem~\cite{furutsu1963,novikov1965}.
A straightforward calculation leads to the Fokker-Planck equation for the Langevin model (\ref{modelo})
\begin{equation}\label{eq:FPfGn}
\frac{\partial }{\partial t}p({\boldsymbol{x}},\varphi ;t)+v_{0}\hat{\boldsymbol{v}} \cdot \nabla p({\boldsymbol{x}},\varphi;t)= D_{T}\nabla^{2}p({\boldsymbol{x}},\varphi ;t) +\Omega(t)\frac{\partial^{2}}{\partial\varphi^{2}}p({\boldsymbol{x}},\varphi;t),
\end{equation}
where $\Omega(t)=\int_{0}^{t}ds\, \omega(s)$ for an arbitrary stationary correlation function $\omega(t)$ of the rotational noise $\xi_R(t)$. In the case of fractional Gaussian noise, the correlation function given in Eq.~(\ref{fGnoise}) leads to
\begin{equation}\label{eq:intcorrfGnoise}
\Omega(t)=2HD_{H}t^{2H-1}.
\end{equation}
Note that the parameter $\Omega(t)$ given by Eq.~(\ref{eq:intcorrfGnoise}) plays the role of a time-dependent rotational diffusion coefficient in the Fokker-Planck equation (\ref{eq:FPfGn}), where the Brownian rotational diffusion coefficient, $\Omega(t) = D_{1/2}$, is recovered for $H = \frac{1}{2}$. We point out here that the net diffusion of a freely active particle described by Eq.~(\ref{eq:FPfGn}) can be split into the free-self-induced diffusion by orientational fluctuations, and the free-induced diffusion by thermal fluctuations.
By writing \begin{equation} p(\boldsymbol{x},\varphi,t)=\int d^{2}x^{\prime}G(\boldsymbol{x}-\boldsymbol{x}^{\prime};t)p_{a}(\boldsymbol{x}^{\prime},\varphi,t), \end{equation} where $G(\boldsymbol{x};t)$ is the bivariate Gaussian distribution that solves the diffusion equation $\partial_{t}G(\boldsymbol{x};t)=D_{T}\nabla^{2}G(\boldsymbol{x};t),$ it can be shown that the diffusion induced by the orientational fluctuations is described by \begin{equation} \frac{\partial }{\partial t}p_{a}({\boldsymbol{x}},\varphi ;t)+v_{0}\hat{ \boldsymbol{v}} \cdot \nabla p_{a}({\boldsymbol{x}},\varphi;t)=\Omega(t)\frac{\partial^{2}}{\partial\varphi^{2}}p_{a}({\boldsymbol{x}},\varphi;t), \label{eq:ActiveFPfGn} \end{equation} where $p_{a}(\boldsymbol{x},\varphi;t)$ gives the probability density of finding a particle at $\boldsymbol{x}$, moving in the direction $\varphi$ at time $t$, due to self-propulsion only. Notice that~(\ref{eq:ActiveFPfGn}) can be obtained from Eq.~(\ref{eq:FPfGn}) by simply putting $D_{T}=0$. \subsection{Angular probability density function}\label{subsect:angPDF} By integrating Eq.~(\ref{eq:FPfGn}) with respect to $\boldsymbol{x}$ over the entire two-dimensional spatial domain, we find the Fokker-Planck equation for the probability density of the angle $\varphi$ at time $t$, i.e., for $P(\varphi,t) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} dx\,dy\, p(\boldsymbol{x},\varphi;t)$ \begin{equation}\label{eq:FPangle} \frac{\partial}{\partial t}P(\varphi,t) = \Omega(t)\frac{\partial^2}{\partial \varphi^2}P(\varphi,t). \end{equation} This corresponds to the diffusion equation with time-dependent rotational diffusion coefficient $\Omega(t)$. The periodicity of $P(\varphi,t)$ with respect to the variable $\varphi$ is imposed by requiring that $P(\varphi,t)=P(\varphi+2\pi,t)$. The domain of description for the particle orientation is restricted to the interval $[0,2\pi)$ (or sometimes $(-\pi,\pi]$) if we introduce the new angle $\phi = \mathrm{mod}(\varphi,2\pi)$. 
Then, the probability density that a single active particle switches from moving along the direction $\phi^{\prime}$ at time $t^{\prime}$ to moving along the direction $\phi$ after the time interval $t-t^{\prime}$, $\mathcal{P}(\phi,t-t^{\prime}\vert\phi^{\prime})$, is given by the solution of Eq.~(\ref{eq:FPangle}) for time $t \ge t^{\prime}$. Given the initial condition $\mathcal{P}(\phi, 0\vert\phi^{\prime}) = \delta(\phi-\phi^{\prime})$, such a solution is
\begin{eqnarray}\label{eq:solFPangle}
\mathcal{P}(\phi,t-t^{\prime}\vert\phi^{\prime})&=&\frac{1}{2\pi} \sum_{n=-\infty}^{\infty} e^{in(\phi-\phi^{\prime})}e^{- n^2 \overline{\Omega}(t-t^{\prime})},\\
& = & \frac{1}{2\pi} \left(1+2\sum_{n=1}^{\infty} \cos[n(\phi-\phi^{\prime})] e^{-n^2 \overline{\Omega}(t-t^{\prime})} \right),\nonumber
\end{eqnarray}
where $\overline{\Omega}(t-t^{\prime})=\int_{0}^{t-t^{\prime}} ds\, \Omega(s)$ for an arbitrary autocorrelation function of the rotational noise $\xi_R(t)$. Eq.~(\ref{eq:solFPangle}) can be written in terms of the Jacobi theta function $\vartheta_{3}(z,q)=\sum_{n=-\infty}^{\infty}q^{n^{2}}e^{2inz}=1+2\sum_{n=1}^{\infty}q^{n^{2}}\cos2nz$ \cite{abramowitzBook}, namely
\begin{equation}
\mathcal{P}\left(\phi,t\vert\phi^{\prime}\right)= \frac{1}{2\pi}\vartheta_{3}\left(\frac{\phi-\phi^{\prime}}{2},e^{-\overline{\Omega}(t)}\right).
\end{equation}
Using the Poisson summation formula \cite{guinand1941}, $\mathcal{P}\left(\phi,t\vert\phi^{\prime}\right)$ can be rewritten as
\begin{equation}
\mathcal{P}\left(\phi,t\vert\phi^{\prime}\right)=\sqrt{\frac{1}{4\pi \overline{\Omega}(t)}}\exp\left[-\frac{(\phi-\phi^{\prime})^{2}}{4\overline{\Omega}(t)}\right] \vartheta_{3}\left(\frac{\pi(\phi-\phi^{\prime})}{2 i\overline{\Omega}(t)},e^{-\frac{\pi^2}{\overline{\Omega}(t)}}\right),
\end{equation}
from which the Gaussian distribution appears explicitly as a factor.
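As a brief consistency check between the two representations of $\mathcal{P}(\phi,t\vert\phi^{\prime})$, the following sketch assumes the standard Jacobi imaginary transformation of $\vartheta_{3}$ in the $(z,q)$ convention defined above; the shorthand $a>0$ and the image index $k$ are introduced here only for illustration. For real $a>0$,
\[
\vartheta_{3}\left(z,e^{-a}\right)=\sqrt{\frac{\pi}{a}}\,e^{-z^{2}/a}\,\vartheta_{3}\left(\frac{\pi z}{ia},e^{-\pi^{2}/a}\right)
=\sqrt{\frac{\pi}{a}}\sum_{k=-\infty}^{\infty}e^{-(z-k\pi)^{2}/a}.
\]
Setting $z=(\phi-\phi^{\prime})/2$ and $a=\overline{\Omega}(t)$, and using $\frac{1}{2\pi}\sqrt{\pi/\overline{\Omega}(t)}=\left[4\pi\overline{\Omega}(t)\right]^{-1/2}$, one obtains
\[
\mathcal{P}\left(\phi,t\vert\phi^{\prime}\right)=\frac{1}{\sqrt{4\pi\overline{\Omega}(t)}}\sum_{k=-\infty}^{\infty}\exp\left[-\frac{(\phi-\phi^{\prime}-2\pi k)^{2}}{4\overline{\Omega}(t)}\right],
\]
i.e., a Gaussian of variance $2\overline{\Omega}(t)$ wrapped onto the circle. The $k=0$ term is the Gaussian factor quoted above, and the remaining periodic images are what the second theta function accounts for; they become negligible when $\overline{\Omega}(t)\ll\pi^{2}$.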
Notice that, in a short time interval $t$, $\mathcal{P}(\phi,t\vert\phi^{\prime})$ peaks sharply around $\phi^{\prime}$, meaning that the transition from the direction of motion $\phi^{\prime}$ to the new one $\phi$ occurs more frequently in the forward direction, i.e., around the direction of motion $\phi^{\prime}$. As the duration $t$ of the time interval of the transition becomes larger, the peak is smoothed out, thus converging to a uniform transition distribution in the asymptotic limit $t\rightarrow\infty$. The mean value of quantities of the form $f(\phi-\phi^{\prime})$, defined by
\begin{equation}\label{eq:meantrans}
\langle f(\phi-\phi^{\prime})\rangle_{t}=\int_{0}^{2\pi}d\phi\int_{0}^{2\pi}d\phi^{\prime}f(\phi-\phi^{\prime})\mathcal{P}(\phi,t\vert\phi^{\prime})P(\phi^{\prime},0),
\end{equation}
is of special interest. In particular, it can be shown from (\ref{eq:solFPangle}) and (\ref{eq:meantrans}) that
\begin{equation}
\left\langle e^{in(\phi-\phi^{\prime})}\right\rangle_{t}=e^{-n^{2}\overline{\Omega}(t)}
\end{equation}
or equivalently
\numparts
\begin{eqnarray}
\left\langle \cos n\left[\phi-\phi^{\prime}\right]\right\rangle_{t}&=&e^{-n^{2}\overline{\Omega}(t)},\\
\left\langle \sin n\left[\phi-\phi^{\prime}\right]\right\rangle_{t}&=&0,
\end{eqnarray}
\endnumparts
which give the contributions to the moment expansion of $\mathcal{P}(\phi,t\vert\phi^{\prime})$ when this is written as
\begin{equation}
\mathcal{P}(\phi,t\vert\phi^{\prime})=\frac{1}{2\pi}\sum_{n=-\infty}^{\infty}e^{in(\phi-\phi^{\prime})}\left\langle e^{in(\phi-\phi^{\prime})}\right\rangle_{t}.
\end{equation}
The probability density $P(\phi,t)$, independent of the initial angle $\phi^{\prime}$, is obtained from $\mathcal{P}(\phi,t\vert\phi^{\prime})$ as
\begin{equation}
P(\phi,t)=\int_{0}^{2\pi}d\phi^{\prime}\mathcal{P}(\phi,t\vert\phi^{\prime})P(\phi^{\prime},0),
\end{equation}
where $P(\phi^{\prime},0)$ denotes the initial distribution of the particle direction of motion.
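The identity $\left\langle e^{in(\phi-\phi^{\prime})}\right\rangle_{t}=e^{-n^{2}\overline{\Omega}(t)}$ stated above can be checked in one step. The following is a sketch that assumes only that $P(\phi^{\prime},0)$ is normalized on $[0,2\pi)$; the summation index $m$ is introduced here for illustration. Inserting the Fourier series (\ref{eq:solFPangle}) into the definition (\ref{eq:meantrans}) with $f=e^{in(\phi-\phi^{\prime})}$ gives
\begin{eqnarray*}
\left\langle e^{in(\phi-\phi^{\prime})}\right\rangle_{t}
&=&\sum_{m=-\infty}^{\infty}e^{-m^{2}\overline{\Omega}(t)}\int_{0}^{2\pi}d\phi^{\prime}\,P(\phi^{\prime},0)\,\frac{1}{2\pi}\int_{0}^{2\pi}d\phi\,e^{i(n+m)(\phi-\phi^{\prime})}\\
&=&\sum_{m=-\infty}^{\infty}e^{-m^{2}\overline{\Omega}(t)}\,\delta_{m,-n}\int_{0}^{2\pi}d\phi^{\prime}\,P(\phi^{\prime},0)
=e^{-n^{2}\overline{\Omega}(t)},
\end{eqnarray*}
since the integral over $\phi$ vanishes unless $m=-n$. The result is real, so its real part reproduces the cosine average and its imaginary part gives the vanishing sine average.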
In the case of fractional Gaussian noise with autocorrelation given by Eq.~(\ref{fGnoise}), Eq.~(\ref{eq:intcorrfGnoise}) leads to \begin{equation}\label{eq:intintcorrfGnoise} \overline{\Omega}(t) = D_Ht^{2H}, \end{equation} which corresponds to half the variance of the fluctuations of $\varphi$, see Eq.~(\ref{msdfGmotion}), thus yielding \begin{eqnarray}\label{eq:angdistr} P(\phi,t) & = & \frac {1}{2\pi} \vartheta_3 \left(\frac{\phi}{2},e^{-D_Ht^{2H}}\right),\nonumber\\ & = & \frac{e^{ -\frac{\phi^2}{4D_H t^{2H}}}}{\sqrt{4\pi D_H t^{2H}}}\vartheta_3 \left( \frac{\pi \phi}{2iD_H t^{2H}}, e^{-\frac{\pi^2}{D_Ht^{2H}}} \right), \end{eqnarray} for the initial angular distribution $P(\phi^{\prime},0)=\delta(\phi^{\prime})$. The angular probability density given by Eq.~(\ref{eq:angdistr}) retains a Gaussian-like shape at sufficiently short time-scales $\bigl(D_H^{\frac{1}{2H}}t \ll 1\bigr)$, then spreads over the entire interval $0 \le \phi < 2\pi$ as $t$ increases and converges to the uniform distribution $P(\phi,t) = \frac{1}{2\pi}$ as $D_H^{\frac{1}{2H}}t \rightarrow \infty$ for all $0 < H < 1$. However, depending on the specific value of $H$, different profiles of $P(\phi,t)$ are observed at a given time $t > 0$, for the same initial condition $P(\phi,0) = \delta(\phi)$. This is illustrated in Figs.~\ref{fig:pdfangle}(a),~\ref{fig:pdfangle}(b) and~\ref{fig:pdfangle}(c) where we plot the angular density $P(\phi,t)$ for different values of the Hurst exponent, $H = 0.1$ (antipersistent orientational dynamics), 0.5 (Brownian orientational dynamics), and 0.9 (persistent orientational dynamics), respectively, at different times $D_H^{\frac{1}{2H}}t = 0.01,0.1,1,10,100$. For antipersistent rotational noise ($H < \frac{1}{2}$), $P(\phi,t)$ broadens quickly over the full angular domain $[0,2\pi)$ during $0 < t < D_H^{-\frac{1}{2H}}$ and markedly peaks around $\phi=0$ [Fig.~\ref{fig:pdfangle}(a)]. 
This indicates that the particle moves more frequently in the forward direction even though the probability density is finite for any change of the orientation (\emph{rectification of motion}), thereby causing a highly correlated motion. In contrast, $P(\phi,t)$ converges very slowly to the uniform angular density $(2\pi)^{-1}$ for time intervals $t > D_H^{-\frac{1}{2H}}$, retaining a smooth peak at $\phi=0$ [see Fig.~\ref{fig:pdfangle}(a) for $H = 0.1$] and thus leading to a strong persistence of translational motion, as shown in Fig.~\ref{fig:trajectories}(c). The opposite trend is observed for persistent fractional noise ($H > \frac{1}{2}$). For instance, in Fig.~\ref{fig:pdfangle}(c) we show that, for $H = 0.9$, the initial delta peak $\delta(\phi)$ at $t = 0$ broadens rather slowly during $0 < t < D_H^{-\frac{1}{2H}}$. As a result, the particle retains its direction of motion, whose persistence causes the translational looped trajectories shown in Figs.~\ref{fig:trajectories}(g)-(h). On the other hand, a very fast convergence to the steady-state uniform value $ (2\pi)^{-1}$ occurs for time intervals $t > D_H^{-\frac{1}{2H}}$. For such a large value of $H$, the typical timescale needed to observe such a convergence is $t \sim 10 D_H^{-\frac{1}{2H}}$, see dotted-dashed line in Fig.~\ref{fig:pdfangle}(c). Only for the specific time interval $t = D_H^{-\frac{1}{2H}}$ is the angular density profile the same for all values of the Hurst exponent, being given by $P\left(\phi,D_H^{-\frac{1}{2H}}\right)= \frac{1}{2\pi} \vartheta_3 \left(\frac{\phi}{2},e^{-1} \right)$, see dotted lines in Figs.~\ref{fig:pdfangle}(a)-(c). We point out that the convergence of $P(\phi,t)$ to a uniform angular distribution suggests that an active particle with fractional rotational Brownian motion must exhibit active diffusion at sufficiently long timescales, for both persistent and antipersistent rotational noise, as explicitly shown in Section~\ref{Sect:AD}. 
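The $H$-independent profile at $t = D_H^{-\frac{1}{2H}}$ mentioned above can be checked in one line by inserting this time into Eq.~(\ref{eq:intintcorrfGnoise}):
\[
\overline{\Omega}\left(D_H^{-\frac{1}{2H}}\right) = D_H \left(D_H^{-\frac{1}{2H}}\right)^{2H} = D_H\, D_H^{-1} = 1,
\]
so that the second argument of the theta function in Eq.~(\ref{eq:angdistr}) equals $e^{-1}$ for every $0<H<1$. Writing $D_Ht^{2H} = \bigl(D_H^{\frac{1}{2H}}t\bigr)^{2H}$ also makes the ordering of the curves transparent: for $D_H^{\frac{1}{2H}}t<1$ this exponent grows as $H$ decreases, whereas for $D_H^{\frac{1}{2H}}t>1$ it grows as $H$ increases.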
\section{Active diffusion}\label{Sect:AD} \subsection{Velocity autocorrelation function} \begin{figure}[t] \includegraphics[width=0.8\textwidth]{Fig3.eps} \caption{(a) Velocity autocorrelation function computed from Eq.~(\ref{eq:velcorr}), for different values of $H$: $0.1$ (dashed line), $0.3$ (dotted line), $0.5$ (thick solid line), $0.7$ (dotted-dashed line), and $0.9$ (thin solid line). Inset: expanded view for $0 \le t \le 1$. (b) Velocity autocorrelation function computed from simulated active trajectories with fractional rotational Brownian motion for the same values of $H$ as in (a), plotted with same line style.}\label{fig:velautocorr} \end{figure} We now compute the autocorrelation function of the swimming velocity, i.e., $\langle \boldsymbol{v}_{s}(s) \cdot \boldsymbol{v}_{s}(s') \rangle = v_0^2 \langle \hat{\boldsymbol{v}}(s) \cdot \hat{\boldsymbol{v}}(s')\rangle$, where the orientational correlation function can be expressed in terms of the angular coordinate $\phi$ as \begin{equation}\label{eq:orientcorr} \langle \hat{\boldsymbol{v}}(s) \cdot \hat{\boldsymbol{v}}(s')\rangle = \left\langle\cos[ \phi(s) - \phi(s')] \right\rangle, \end{equation} which is equivalent to $\langle \cos [\phi - \phi'] \rangle_{s-s^{\prime}}$ for $s\ge s^{\prime}$. Therefore, Eq.~(\ref{eq:orientcorr}) can be explicitly computed by means of \begin{equation}\label{eq:angcorr} \langle \hat{\boldsymbol{v}}(s) \cdot \hat{\boldsymbol{v}}(s')\rangle = \int_0^{2\pi} \int_0^{2\pi} d\phi \, d\phi'\,\cos(\phi - \phi')\mathcal{P}(\phi,s-s'|\phi')P(\phi',s'), \end{equation} where $\mathcal{P}(\phi,s-s'|\phi')$ is the transition probability density from $\phi'$ at time $s'$ to $\phi$ at time $s$, as was introduced in Sect.~\ref{subsect:angPDF}, whereas $P(\phi',s')$ is the angular probability density at time $s' \ge 0$ given by Eq.~(\ref{eq:angdistr}). 
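The double integral in Eq.~(\ref{eq:angcorr}) can be evaluated mode by mode. As a sketch, assuming the Fourier representation of Eq.~(\ref{eq:solFPangle}) for the transition density and a normalized $P(\phi',s')$, the orthogonality of the angular modes,
\[
\int_0^{2\pi} d\phi\,\cos(\phi-\phi')\,e^{in(\phi-\phi')} = \pi\left(\delta_{n,1}+\delta_{n,-1}\right),
\]
leaves only the terms $n=\pm1$ of the series, so that
\[
\langle \hat{\boldsymbol{v}}(s) \cdot \hat{\boldsymbol{v}}(s')\rangle = \frac{1}{2\pi}\,2\pi\,e^{-\overline{\Omega}(s-s')}\int_0^{2\pi}d\phi'\,P(\phi',s') = e^{-\overline{\Omega}(s-s')}.
\]
Within this sketch the orientational correlation depends only on the time lag $s-s'$, through $\overline{\Omega}$.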
For $s \ge s' \gg D_H^{-\frac{1}{2H}}$, $\langle \hat{\boldsymbol{v}}(s) \cdot \hat{\boldsymbol{v}}(s')\rangle$ becomes stationary, where $P(\phi',s') \rightarrow (2\pi)^{-1}$, while $\mathcal{P}(\phi,s-s'|\phi') = P(\phi-\phi',s-s')$. Using the expressions given in Eqs.~(\ref{eq:solFPangle}), (\ref{eq:intintcorrfGnoise}) and~(\ref{eq:angcorr}), we find that the velocity autocorrelation function is given explicitly by \begin{equation}\label{eq:velcorr} \langle \boldsymbol{v}_{s}(s) \cdot \boldsymbol{v}_{s}(s') \rangle = v_0^2 \exp \left[-D_H (s-s')^{2H} \right]. \end{equation} Eq.~(\ref{eq:velcorr}) corresponds to: a \emph{stretched exponential} when $0 < H < \frac{1}{2}$ describing a highly correlated motion; a \emph{pure exponential} if $H = \frac{1}{2}$ describing Brownian correlations of the direction of motion; and a \emph{compressed exponential} when $\frac{1}{2} < H < 1$ that describes short-ranged correlations of the direction of motion. In Fig.~\ref{fig:velautocorr}(a) we plot the autocorrelation function of the swimming velocity given by Eq.~(\ref{eq:velcorr}), $\langle \boldsymbol{v}_{s}(t) \cdot \boldsymbol{v}_{s}(0) \rangle = \langle \boldsymbol{v}_{s}(s'+t) \cdot \boldsymbol{v}_{s}(s') \rangle$, as a function of the time lag $t = s - s'$ for different values of $H$. We check that they perfectly agree with the numerical results shown in Fig.~\ref{fig:velautocorr}(b). In addition, we find that, regardless of $H$, the velocity autocorrelation attains the value $v_0^2e^{-1}$ at $t = D_H^{-\frac{1}{2H}}$. Nevertheless, for other values of $t$, different regimes are observed depending on $H$. For instance, for $0 < H < \frac{1}{2}$, $\langle \boldsymbol{v}_{s}(t) \cdot \boldsymbol{v}_{s}(0) \rangle$ decays sharply from $v_0^2$ to $v_0^2 e^{-1}$ for $0 \le t < D_H^{-\frac{1}{2H}}$, as highlighted in the insets of Figs.~\ref{fig:velautocorr}(a) and \ref{fig:velautocorr}(b), followed by a very slow decrease for $t \ge D_H^{-\frac{1}{2H}}$. 
On the other hand, for $H=\frac{1}{2}$ we find that a purely exponential decay is recovered, i.e., $\langle \boldsymbol{v}_{s}(t) \cdot \boldsymbol{v}_{s}(0) \rangle = v_0^2 \exp(-D_{1/2} t)$, for which the particle orientation is driven by Gaussian white noise with rotational diffusion coefficient $D_{1/2}$ and decorrelation time set by $\tau_{1/2} = D_{1/2}^{-1}$. Finally, for $\frac{1}{2} < H < 1$ (persistent rotational noise) the velocity autocorrelation function decreases more slowly in time for $0 \le t < D_H^{-\frac{1}{2H}}$, whereas it quickly goes to 0 for $t \ge D_H^{-\frac{1}{2H}}$. In particular, as $H \rightarrow 1$, the velocity autocorrelation approaches a Gaussian decay, i.e., $v_0^2\exp(-D_1 t^2)$. Consequently, for $\frac{1}{2} \le H < 1$, the typical decorrelation time of the swimming velocity is $\lesssim D_H^{-\frac{1}{2H}}$, whereas for $0 < H < \frac{1}{2}$, long-range temporal correlations of the particle orientation lead to a rather high persistence of the swimming velocity. \begin{figure*} \includegraphics[width=0.9\textwidth]{Fig4.eps} \caption{(a) Translational mean-squared displacement given by Eq.~(\ref{eq:msdtrans}), for different values of the Hurst exponent: $H = 0.1$ (thick solid line), $H = 0.3$ (dashed line), $H = 0.5$ (dotted line), $H = 0.7$ (dashed-dotted line), and $H = 0.9$ (thin solid line). The top-left and bottom-right insets show a linear-linear representation of the main plot at short and long time-scales, respectively. (b) Translational mean-squared displacements obtained from simulated trajectories. Same color code and same line style as in Fig.~\ref{fig:MSD2dtrans}(a). (c) Long-time behavior of the translational mean-squared displacements for an active particle driven by antipersistent rotational fractional noise. From top to bottom: $H = 0.1, 0.125, 0.15, 0.2, 0.25$. 
The colored symbols correspond to the curves obtained from numerical simulations, whereas the black solid lines represent the analytical expression given by Eq.~(\ref{eq:msdtrans}). The triangles ($\triangleright$) depict the corresponding location of the effective rotational time, $\tau_H^{\mathrm{eff}}$, above which active diffusion emerges. Inset: translational mean-squared displacement for $H=0.1$. The triangles ($\triangleleft$) and ($\triangleright$) indicate the location of the persistence time $\tau_H$ and the effective rotational time $\tau_H^{\mathrm{eff}}$, respectively, which define the time interval $[\tau_H,\tau_H^{\mathrm{eff}}]$ in which anomalous diffusion is observed. (d) Active diffusion coefficient (solid line, left axis) and effective rotational diffusion time (dashed line, right axis), as a function of $H$. The horizontal solid and dashed lines represent the limit values as $H \rightarrow 1$, $D^{\mathrm{eff}}_H = \frac{\sqrt{\pi}}{4}$ and $\tau_H^{\mathrm{eff}} = \frac{1}{\sqrt{\pi}}$, respectively, whereas the squares are numerical values of $D_H^{\mathrm{eff}}$ computed from simulations. The vertical dotted line separates the two distinct regimes of active motion: I) highly-correlated swimming velocity, and II) fast decay of the swimming-velocity autocorrelation function.} \label{fig:MSD2dtrans} \end{figure*} \subsection{Mean-squared displacement} The translational mean-squared displacement of the particle position, $\boldsymbol{x} = (x,y)$, can be determined by employing the relation \numparts \begin{eqnarray} \langle |{\boldsymbol{x}}(t)|^2 \rangle & = & \int_0^t \int_0^t ds \, ds'\left\langle\frac{d}{ds}\boldsymbol{x}(s) \cdot \frac{d}{ds'}\boldsymbol{x}(s') \right\rangle, \label{eq:msd2Dtrans} \label{eq:transmsd2D} \\ & = & \int_0^t \int_0^t ds \, ds'\langle \boldsymbol{\xi}_{T}(s) \cdot \boldsymbol{\xi}_{T}(s') \rangle + \int_0^t \int_0^t ds \, ds'\langle \boldsymbol{v}_{s}(s) \cdot \boldsymbol{v}_{s}(s') \rangle. 
\label{eq:transmsd2Dcomponents} \end{eqnarray} \endnumparts The first term on the right hand-side of Eq.~(\ref{eq:transmsd2Dcomponents}), which will be denoted by $\langle |{\boldsymbol{x}}(t)|^2 \rangle_p$, represents the \emph{passive} component of the mean-squared displacement due to translational velocity fluctuations, $\boldsymbol{\xi}_T(t)$. Since we assume that $\boldsymbol{\xi}_T(t)$ is delta-correlated in the model (\ref{modelo}), this yields trivially the diffusive contribution \begin{equation}\label{eq:passmsdtrans} \langle |{\boldsymbol{x}}(t)|^2 \rangle_p = 4D_T t. \end{equation} On the other hand, the second term on the right hand-side of Eq.~(\ref{eq:transmsd2Dcomponents}), which will be denoted by $\langle |{\boldsymbol{x}}(t)|^2 \rangle_a$, originates from the orientational changes in the swimming velocity driven by fractional rotational Brownian noise and can be rewritten as \begin{equation}\label{eq:transmsd2Dact} \langle |{\boldsymbol{x}}(t)|^2 \rangle_a = 2v_{0}^{2}\int_0^t ds\int_0^s ds' \langle\cos(\phi-\phi^{\prime})\rangle_{s-s^{\prime}}. \end{equation} Henceforth, we focus on the nontrivial \emph{active} contribution to the translational mean-squared displacement, i.e., $\langle |{\boldsymbol{x}}(t)|^2 \rangle_a = \langle |{\boldsymbol{x}}(t)|^2 \rangle - \langle |{\boldsymbol{x}}(t)|^2 \rangle_p$. Thus, from Eqs.~(\ref{eq:velcorr}) and~(\ref{eq:transmsd2Dact}), we can derive in a straightforward manner the general expression for this active component for all $0 < H < 1$, namely \begin{eqnarray} \langle |{\boldsymbol{x}}(t)|^2 \rangle_a &= &2v_{0}^{2}\int_0^t ds\int_0^s ds'e^{-D_{H}{s^{\prime}}^{2H}}\nonumber\\ &= &v_0^2 t^2 \sum_{k = 0}^{\infty} \frac{\bigl(-D_H t^{2H}\bigr)^k}{k!(1+k H)(1+2k H)}\label{eq:msdtrans1}. 
\end{eqnarray} Eq.~(\ref{eq:msdtrans1}) can be rewritten as \begin{equation}\label{eq:msdtrans} \langle |{\boldsymbol{x}}(t)|^2 \rangle_a = \frac{v_0^2}{H D_H^{\frac{1}{H}}}\left[\gamma\left(\frac{1}{2H},D_H t^{2H}\right)D_H^{\frac{1}{2H}}t - \gamma\left(\frac{1}{H},D_Ht^{2H}\right)\right], \end{equation} where $\gamma(\nu,z) = \int_0^z t^{\nu - 1} e^{-t}dt$ is the lower incomplete gamma function. In Figs.~\ref{fig:MSD2dtrans}(a) and~\ref{fig:MSD2dtrans}(b), we plot some exemplary mean-squared displacements for different $H$, computed by means of Eq.~(\ref{eq:msdtrans}) and from the simulated trajectories, respectively. We verify that the analytic expression (\ref{eq:msdtrans}) for arbitrary $H$ agrees very well with the numerical results. An explicit expression in terms of elementary functions can be readily derived from Eq.~(\ref{eq:msdtrans}) for the particular case $H = \frac{1}{2n}$, where $n = 1,2,\ldots$. In such a case, the mean-squared displacement can be expressed as the following finite sum \begin{eqnarray}\label{eq:msdtrans2n} \langle |{\boldsymbol{x}}(t)|^2 \rangle_a & = & \frac{2n v_0^2}{D_{{1}/{2n}}^{2n}}\left\{(n-1)!\left[1 - e^{-D_{{1}/{2n}}t^{\frac{1}{n}}} \sum_{k=0}^{n-1}\frac{D^k_{{1}/{2n}}t^{\frac{k}{n}}}{k!} \right]D_{{1}/{2n}}^n t \right.\nonumber\\ &&\left.+ (2n-1)!\left[e^{-D_{{1}/{2n}}t^{\frac{1}{n}}}\sum_{k=0}^{2n-1}\frac{D^k_{{1}/{2n}}t^{\frac{k}{n}}}{k!} -1 \right]\right\}. \end{eqnarray} For $n = 1$, i.e., 
$H = \frac{1}{2}$, we recover the well-known expression of persistent Brownian motion with rotational diffusion coefficient $D_{1/2}$ \begin{equation}\label{eq:msdABM} \langle |{\boldsymbol{x}}(t)|^2 \rangle_a =\frac{2v_{0}^{2}}{D_{1/2}^{2}}\left(D_{1/2}t+e^{-D_{1/2}t}-1\right), \end{equation} whereas for $n = 2$, i.e., $H = \frac{1}{4}$ (antipersistent rotational noise), we get \begin{equation}\label{eq:msdtransH0_25} \langle |{\boldsymbol{x}}(t)|^2 \rangle_a = \frac{4v_0^2}{D_{1/4}^{4}}\left[2e^{-D_{1/4}\sqrt{t}}\left(3 + 3D_{1/4} \sqrt{t} + D_{1/4}^2 t \right)+ D_{1/4}^2 t - 6 \right]. \end{equation} From Eq.~(\ref{eq:msdtrans1}), it can be easily seen that in the limit of fully antipersistent rotational noise, $H \rightarrow 0$, the mean-squared displacement is ballistic at all times, i.e., \begin{equation}\label{eq:fullpersist} \langle |{\boldsymbol{x}}(t)|^2 \rangle_a \rightarrow \frac{1}{e} v_0^2 t^2. \end{equation} Eq.~(\ref{eq:fullpersist}) represents the limit of infinite persistence of self-propelled motion, characterized by a constant value of the velocity autocorrelation function~$ \langle \boldsymbol{v}_{s}(s) \cdot \boldsymbol{v}_{s}(s') \rangle \rightarrow v_0^2 / e$ for all $s > s'$, and an effective swimming speed $v_0/\sqrt{e}$. The other extreme limit corresponds to fully persistent rotational fractional Brownian noise, $H \rightarrow 1$, for which we find \begin{equation}\label{eq:msdtransH1} \langle |{\boldsymbol{x}}(t)|^2 \rangle_a \rightarrow \frac{v_0^2}{D_1}\left[\sqrt{\pi}\,{\mathrm{erf}}\left({D_1}^{\frac{1}{2}} t\right)D_1^{\frac{1}{2}}t + e^{-D_1 t^2} -1 \right], \end{equation} where $\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}}\int_{0}^{z} dt e^{-t^2}$ is the error function. Note that, in this case, the velocity autocorrelation approaches the Gaussian decay $\langle \boldsymbol{v}_{s}(s) \cdot \boldsymbol{v}_{s}(s') \rangle \rightarrow v_0^2\exp \left[-D_1 (s-s')^2 \right]$. 
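Eq.~(\ref{eq:msdtransH1}) can be checked directly from Eq.~(\ref{eq:msdtrans}). As a sketch, setting $H=1$ and using two standard special cases of the lower incomplete gamma function,
\[
\gamma\left(\frac{1}{2},z\right)=\sqrt{\pi}\,\mathrm{erf}\left(\sqrt{z}\right), \qquad \gamma(1,z)=1-e^{-z},
\]
with $z=D_1t^2$, the prefactor of Eq.~(\ref{eq:msdtrans}) becomes $v_0^2/D_1$ and the bracket reduces to $\sqrt{\pi}\,\mathrm{erf}\bigl(D_1^{\frac{1}{2}}t\bigr)D_1^{\frac{1}{2}}t-\bigl(1-e^{-D_1t^2}\bigr)$, which is the right-hand side of Eq.~(\ref{eq:msdtransH1}).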
In the long-time regime $D_{1}^{1/2}t\gg1$, ${\mathrm{erf}}({D_1}^{\frac{1}{2}} t)\approx1$ and hence the linear dependence $\sqrt{\pi/D_{1}}v_{0}^{2}\,t$ is obtained. For all values of $0 < H < 1$, two important limiting cases are observed. First, since $\gamma(\nu,z) \rightarrow \nu^{-1}z^{\nu}$ as $z \rightarrow 0$, then for $t \ll D_H^{-\frac{1}{2H}}$ \begin{equation}\label{eq;shortmsd} \langle |{\boldsymbol{x}}(t)|^2 \rangle_a \approx v_0^2 t^2. \end{equation} This limit corresponds to the characteristic ballistic behavior, which is expected to happen due to the persistence of the swimming velocity ${\boldsymbol{v}}_s(t)$ at sufficiently short timescales. However, at intermediate timescales two qualitatively distinct regimes can be distinguished depending on the behavior of the mean-squared displacement with respect to the value of $H$: \begin{itemize} \item[I)] First, for $0 < H < \frac{1}{2}$ the short-time ballistic regime is rapidly hindered at $t \lesssim D_H^{-\frac{1}{2H}}$ by the antipersistence of the rotational noise. This results in an intermediate \emph{anomalous} regime where $\langle |{\boldsymbol{x}}(t)|^2 \rangle_a$ grows with time $t$ slower than $\sim t^2$ but faster than $\sim t$ over a broad temporal interval up to several times $D_H^{-\frac{1}{2H}}$, as shown in Figs.~\ref{fig:MSD2dtrans}(a) and~\ref{fig:MSD2dtrans}(b) for $H = 0.1,0.3$, and in Fig.~\ref{fig:MSD2dtrans}(c) for $H = 0.1,0.125,0.15,0.2$. This is also consistent with the long-range temporal correlations of the swimming velocity, which persist even for timescales comparatively larger than $D_H^{-\frac{1}{2H}}$, as shown in Fig.~\ref{fig:velautocorr}. \item[II)] On the other hand, for $\frac{1}{2}\le H <1$, the persistence of the rotational noise allows the ballistic behavior to be fully preserved up to timescales $t \approx D_H^{-\frac{1}{2H}}$, see Figs.~\ref{fig:MSD2dtrans}(a) and~\ref{fig:MSD2dtrans}(b) for $H =0.5,0.7,0.9$. 
For $t \gtrsim D_H^{-\frac{1}{2H}}$, $\langle |{\boldsymbol{x}}(t)|^2 \rangle_a$ reaches quickly a diffusive behavior, caused by the complete decorrelation of the particle orientation, as verified in Fig.~\ref{fig:velautocorr}. Note that in this case, the resulting slope of linear behavior of the mean-squared displacement varies very weakly with $H$, as observed in the bottom-right insets of Figs.~\ref{fig:MSD2dtrans}(a) and~\ref{fig:MSD2dtrans}(b). \end{itemize} Furthermore, the second important limit is obtained at sufficiently long-time scales ($t \gg D_H^{-\frac{1}{2H}}$), for which we find \begin{equation}\label{eq:actdiff} \langle |{\boldsymbol{x}}(t)|^2 \rangle_a \approx \frac{v_0^2}{H D_H^{\frac{1}{2H}}}\Gamma\left( \frac{1}{2H} \right) t, \end{equation} where $\Gamma(\nu) = \int_0^{\infty} t^{\nu - 1} e^{-t}dt$ is the complete gamma function. Remarkably, Eq.~(\ref{eq:actdiff}) reveals that active diffusion emerges in the long-time limit for all values of the Hurst exponent, $0 < H < 1$: $\langle |{\boldsymbol{x}}(t)|^2 \rangle_a \approx 4D_H^{\mathrm{eff}} t$, where the resulting active diffusion coefficient is \begin{eqnarray}\label{eq:effdiffcoeff} D^{\mathrm{eff}}_H & = & \frac{v_0^2}{4 H D_H^{\frac{1}{2H}}}\Gamma\left( \frac{1}{2H} \right),\nonumber\\ & = & \frac{1}{4H} \Gamma\left( \frac{1}{2H} \right) \mathcal{D}_H. \end{eqnarray} Indeed, in Fig.~\ref{fig:MSD2dtrans}(c), we show that, even for regime I ($0<H<\frac{1}{2}$), for which an anomalous growth of the mean-squared displacement occurs at intermediate timescales, a diffusive behavior is reached at sufficiently long timescales. In such a case, the slope of the long-time linear behavior of $\langle |{\boldsymbol{x}}(t)|^2 \rangle_a$ becomes very sensitive to small variations of the Hurst exponent: the smaller the value of $H$, the larger the resulting active diffusion coefficient, as illustrated in Fig.~\ref{fig:MSD2dtrans}(c). 
By performing a linear fit of the long-time behavior of mean-squared displacements obtained from the simulated trajectories, we compute the numerical values of $D_H^{\mathrm{eff}}$, which are plotted as squares in Fig.~\ref{fig:MSD2dtrans}(d). For comparison, we also plot as a solid line the dependence of $D_H^{\mathrm{eff}}$ on $H$ given by Eq.~(\ref{eq:effdiffcoeff}), thereby showing a very good agreement with the numerical results. Once again, two distinct behaviors of $D^{\mathrm{eff}}_H$ are observed as a function of $H$, which coincide with the existence of the two different regimes (I and II) previously identified. For regime I, the active diffusion coefficient exhibits a sharp monotonic increase with decreasing $H$, and diverges as $H \rightarrow 0$. In addition, it approaches the value $D_{1/2}^{\mathrm{eff}} = \frac{1}{2} \mathcal{D}_{1/2} = \frac{1}{2}v_0^2 D_{1/2}^{-1}$ as $H \rightarrow \frac{1}{2}$. On the other hand, for regime II, $D^{\mathrm{eff}}_H$ varies very weakly with $H$: starting from $D_{1/2}^{\mathrm{eff}}$ it decreases monotonically as $H$ increases and converges to the value $D^{\mathrm{eff}}_1 = \frac{\sqrt{\pi}}{4} \mathcal{D}_{1} = \frac{v_0^2}{4}\sqrt{\frac{\pi}{D_1}}$ as $H \rightarrow 1$. The previous results suggest that two relevant timescales are necessary to describe active motion driven by fractional rotational Brownian noise. The first is the natural timescale \begin{equation}\label{eq:persisttime} \tau_H \equiv D_H^{-\frac{1}{2H}}, \end{equation} which represents a \emph{persistence time} over which the active particle is able to keep on average a constant swimming velocity despite the angular fluctuations. In contrast, a second timescale, which will be denoted by $\tau_H^{\mathrm{eff}}$, represents the time needed for the particle orientation to become completely decorrelated and uniformly distributed over $[0,2\pi)$. 
Therefore, $\tau_H^{\mathrm{eff}}$ can be interpreted as an \emph{effective rotational time}, similar to $\tau_{1/2} = D_{1/2}^{-1}$ defined for active Brownian motion ($H=0.5$) as the timescale at which the autocorrelation function of the particle orientation decays to $1/e$. In fact, for this particular value of the Hurst exponent, both timescales coincide: $\tau_{1/2} = \tau_{1/2}^{\mathrm{eff}}$. However, for $H \neq 1/2$, it is expected that $\tau_{H}^{\mathrm{eff}}$ could be different from $\tau_H$ due to the non-exponential decay of the velocity autocorrelation function. In order to determine $\tau_H^{\mathrm{eff}}$, we realize that a diffusive behavior of $\langle |{\boldsymbol{x}}(t)|^2 \rangle_a$ must be observed for $t \gtrsim \tau_H^{\mathrm{eff}}$. Thus, taking into account that $\gamma(\nu,z) \rightarrow \Gamma(\nu)$ as $z \rightarrow \infty$, by applying the condition $\Gamma \left(\frac{1}{2H} \right)D_H^{\frac{1}{2H}}t \gg \Gamma\left(\frac{1}{H} \right)$ to Eq.~(\ref{eq:msdtrans}) we find \begin{eqnarray}\label{eq:rottime} \tau_H^{\mathrm{eff}} & = & \frac{\Gamma\left(\frac{1}{H}\right)}{\Gamma \left(\frac{1}{2H} \right)} D_H^{-\frac{1}{2H}},\nonumber\\ & = & \frac{\Gamma\left(\frac{1}{H}\right)}{\Gamma \left(\frac{1}{2H} \right)} \tau_H. \end{eqnarray} Indeed, for $H = \frac{1}{2}$, Eq.~(\ref{eq:rottime}) reduces to the well known expression $\tau_{1/2}^{\mathrm{eff}} = D_{1/2}^{-1} = \tau_{1/2}$ for pure rotational Brownian noise in two dimensions. In Fig.~\ref{fig:MSD2dtrans}(d) we show as a dashed line the dependence of $\tau_H^{\mathrm{eff}}$ on $H$ given by Eq.~(\ref{eq:rottime}). For regime I, $\tau_H^{\mathrm{eff}}$ exhibits a very pronounced increase as $H$ decreases, and diverges as $H \rightarrow 0$. It should be noted that, in the case of antipersistent rotational noise, the increase of $\tau_H^{\mathrm{eff}}$ on $H$ is much more pronounced than that of $D_H^{\mathrm{eff}}$, as shown in Fig.~\ref{fig:MSD2dtrans}(d). 
For instance, for $H = 0.1$, $\tau_{0.1}^{\mathrm{eff}} = 15120 \,\tau_{0.1}$, whereas $D_{0.1}^{\mathrm{eff}} = 60 \,\mathcal{D}_{0.1}$. In Fig.~\ref{fig:MSD2dtrans}(c) we represent as triangles the location of $\tau_H^{\mathrm{eff}}$ on the mean-squared displacement curves , $\langle |{\boldsymbol{x}}(t)|^2 \rangle_a$ vs. $t$, for different $0 < H < \frac{1}{2}$. We verify that active diffusion emerges if the elapsed time $t$ is only slightly larger than the values of $\tau_H^{\mathrm{eff}}$ determined by means of Eq.~(\ref{eq:rottime}). Note that the separation between the persistence time and the effective rotational time opens a time interval $[\tau_H,\tau_H^{\mathrm{eff}}]$ over which the active motion is neither ballistic nor diffusive, as illustrated in the inset of Fig.~\ref{fig:MSD2dtrans}(c). The length of this interval where anomalous active motion occurs broadens as $H$ decreases, as illustrated in Fig.~\ref{fig:MSD2dtrans}(c) for different values of $0 < H < \frac{1}{2}$. The opposite behavior is observed for regime II: with increasing $H$, $\tau_H^{\mathrm{eff}}$ decreases monotonically from the value $\tau_{1/2}^{\mathrm{eff}} = D_{1/2}^{-1}$ at $H = 1/2$, thus becoming smaller than $\tau_H$. In this case, the dependence of $\tau_H^{\mathrm{eff}}$ on $H$ is much less pronounced than in I, where the limiting value as $H \rightarrow 1$ is $\tau_1^{\mathrm{eff}} = \tau_{1}/\sqrt{\pi} = {1}/{\sqrt{\pi D_1}}$. Note that in this regime the time interval for the possible appearance of anomalous active motion, $[\tau_H^{\mathrm{eff}},\tau_H]$, is quite narrow. In fact, the maximum relative difference between $\tau_H$ and $\tau_H^{\mathrm{eff}}$ is $(\tau_H - \tau_H^{\mathrm{eff}}) / \tau_H \approx 0.44$ as $H \rightarrow 1$. This implies that a rather abrupt transition from ballistic to active diffusion must occur in this regime at $t \approx D_H^{-\frac{1}{2H}}$, as verified in Figs.~\ref{fig:MSD2dtrans}(a) and~\ref{fig:MSD2dtrans}(b). 
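The numerical factors quoted above follow from Eqs.~(\ref{eq:effdiffcoeff}) and~(\ref{eq:rottime}) with elementary values of the gamma function. As a worked check for $H=0.1$, where $\frac{1}{2H}=5$ and $\frac{1}{H}=10$,
\begin{eqnarray*}
\frac{D_{0.1}^{\mathrm{eff}}}{\mathcal{D}_{0.1}} & = & \frac{\Gamma(5)}{4\times 0.1} = \frac{24}{0.4} = 60,\\
\frac{\tau_{0.1}^{\mathrm{eff}}}{\tau_{0.1}} & = & \frac{\Gamma(10)}{\Gamma(5)} = \frac{362880}{24} = 15120,
\end{eqnarray*}
whereas for $H\rightarrow 1$ one has $\Gamma(1)/\Gamma\left(\frac{1}{2}\right)=1/\sqrt{\pi}\approx 0.564$, which gives $(\tau_H-\tau_H^{\mathrm{eff}})/\tau_H\approx 0.436$.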
\section{Summary and final remarks}\label{Sect:Conc} In this paper, we have investigated a two-dimensional model for an overdamped self-propelled particle moving at constant swimming speed, whose orientation is driven by fractional Brownian noise. The resulting dynamics of the swimming direction of the particle has deep consequences for its translational pattern of motion. Remarkably, for positively correlated rotational noise, circular-like motion can be observed even in the absence of external elements that break the rotational symmetry, as found for active colloids swimming in viscoelastic media~\cite{narinder2018} or at large P\'eclet number~\cite{hu2019}. We have derived the corresponding Fokker-Planck equations, as well as the solution for the probability density function of the particle orientation for arbitrary values of the Hurst exponent $H$ of the fractional rotational noise. This in turn has allowed us to find analytical expressions for the swimming-velocity autocorrelation function and the translational mean-squared displacement, which reduce for $H = 0.5$ to the widely-known expressions of the conventional ABP model. By analyzing the behavior of the derived quantities for different values of the Hurst exponent, we have identified two distinct regimes of active motion, marked by the influence of either the antipersistence or the persistence of the rotational noise. We have demonstrated that active diffusion effectively emerges in the asymptotic long-time limit regardless of the nature of the rotational noise. Moreover, we have provided an analytical expression for the active diffusion coefficient as a function of $H$, and checked that our results are in excellent agreement with numerical simulations of active trajectories evolving according to the proposed model. 
One remarkable finding of our work is the emergence of an $H$-dependent timescale which plays the role of an effective rotational-diffusive time, even though the orientational dynamics of the particle is not exponentially correlated if $H \neq 0.5$. The existence of such a timescale, in addition to the well-known persistence time, sets an interval over which the active motion exhibits anomalous diffusion. This is markedly apparent for antipersistent rotational noise with small Hurst exponent. In such a case, there exists a broad time interval characterized by long-range temporal correlations of the swimming velocity and an anomalous growth of the mean-squared displacement. To our knowledge, our work is the first investigation of the effects of non-exponential orientational correlations in the motion of self-propelled particles. Thus, we expect that the results presented here will contribute to a better understanding of active motion in complex media with anomalous rotational diffusion, such as those found in many biological systems. Further steps of our work could also address the influence of retarded memory effects in the rotational friction~\cite{sandev2014,rodriguez2015}, which could also modify the active diffusive behavior that emerges in the asymptotic limit. One more possible aspect to investigate is the influence of geometrical confinements, as it is known that rotational memory can significantly modify, e.g., the rectification of active particles in asymmetric periodic channels \cite{hu2017}. \section*{Acknowledgements} J.R.G.-S. acknowledges support from DGAPA-UNAM PAPIIT Grant No. IA103320. F.J.S. acknowledges support from DGAPA-UNAM PAPIIT-114717 and PAPIIT-IN110120. \section*{References} \begin{thebibliography}{99} \bibitem{bechinger2016} Bechinger C, Di Leonardo R, L\"owen H, Reichhardt C, Volpe G and Volpe G 2016 {\it Rev. Mod. Phys.} \textbf{88}, 045006 \bibitem{ramaswamy2010} Ramaswamy S 2010 {\it Annu. Rev. Condens. 
Matter Phys.} \textbf{1}, 323 \bibitem{elgeti2015} Elgeti J, Winkler R G and Gompper G 2015 {\it Rep. Prog. Phys.} \textbf{78}, 056601 \bibitem{taktikos2013} Taktikos J, Stark H and Zaburdaev V, 2013 {\it PLoS ONE} \textbf{8}, e81936 \bibitem{darnton2007} Darnton N C and Berg H C, 2007 {\it Biophys. J.} \textbf{92}, 2230 \bibitem{howse2007} Howse J R, Jones R A L, Ryan A J, Gough T, Vafabakhsh R and Golestanian R 2007 {\it Phys. Rev. Lett.} \textbf{99}, 048102 \bibitem{saragosti2012} Saragosti J, Silberzan P and Buguin A 2012 {\it PLoS One} \textbf{7}, e35412 \bibitem{cates2013} Cates M E and Tailleur J 2013 {\it EPL} \textbf{101}, 20010 \bibitem{tenhagen2011} Ten Hagen B, van Teeffelen S and L\"owen H 2011{\it J. Phys.: Condens. Matter} \textbf{23} 194119 \bibitem{pototsky2012} Pototsky A and Stark H 2012 {\it EPL} \textbf{98}, 50004 \bibitem{redner2013} Redner G S, Hagan M F and Baskaran A 2013 {\it Phys. Rev. Lett.} \textbf{110}, 055701 \bibitem{bialke2013} Bialk\'e J, L\"owen H, and Speck T 2013{\it EPL} \textbf{103}, 30008 \bibitem{sevilla2014} Sevilla F J and Gomez Nava L A 2014 {\it Physical Review E} \textbf{90}, 022130 \bibitem{sevilla2015} Sevilla F J and Sandoval M 2015 {\it Phys. Rev. E} \textbf{91}, 052150 \bibitem{basu2018} Basu U, Majumdar S N, Rosso A and Schehr G 2018 {\it Phys. Rev. E} \textbf{98}, 062121 \bibitem{bregulla2015} Bregulla A P and Cichos F 2015 {\it Faraday Discuss.} \textbf{184}, 381 \bibitem{gomezsolano2017} Gomez-Solano J R, Samin S, Lozano C, Ruedas-Batuecas P, van Roij R and Bechinger C 2017 {\it Sci. Rep.} \textbf{7}, 14891 \bibitem{solon2015} Solon A P, Cates M E and Tailleur J 2015 {\it Eur. Phys. J. Spec. Top.} \textbf{224}, 1231 \bibitem{vachier2019} Vachier J and Mazza M G 2019 {\it Eur. Phys. J. E} \textbf{42}, 11 \bibitem{woillez2019} Woillez E, Zhao Y, Kafri Y, Lecomte V and Tailleur J, 2019 {\it Phys. Rev. Lett.} \textbf{122}, 258001 \bibitem{tenhagenpre2011} Ten Hagen B, Wittkowski R and L\"owen H 2011{\it Phys. Rev. 
E }\textbf{84}, 031105 \bibitem{zoettl2012} Z\"oettl A and Stark H 2012{\it Phys. Rev. Lett.} \textbf{108}, 218104 \bibitem{li2017} Li Y, Marchesoni F, Debnath T and Ghosh P K 2017 {\it Phys. Rev. E} \textbf{96}, 062138 \bibitem{hu2017} Hu C-T, Wu J-C and Ai B-Q 2017 {\it J. Stat. Mech.} 053206. \bibitem{duzgun2018} Duzgun A and Selinger J V 2018 {\it Phys. Rev. E} \textbf{97}, 032606 \bibitem{wagner2019} Wagner C G, Hagan M H and Baskaran A 2019 {\it Phys. Rev. E} \textbf{100}, 042610 \bibitem{wysocki2014} Wysocki A, Winkler R G and Gompper G 2014 {\it EPL} \textbf{105}, 48004 \bibitem{stenhammar2014} Stenhammar J, Marenduzzo D, Allen R J and Cates M E 2014 {\it Soft Matter} \textbf{10}, 1489 \bibitem{richard2016} Richard D, L\"owen H and Speck T 2016{\it Soft Matter} \textbf{12}, 5257 \bibitem{speck2016} Speck T 2016 {\it EPL} \textbf{114} 30006 \bibitem{pietzonka2016} Pietzonka P, Kleinbeck K and Seifert U 2016 {\it New J. Phys.} \textbf{18} 052001 \bibitem{falasco2016} Falasco G, Pfaller R, Bregulla A P, Cichos F and Kroy K 2016 {\it Phys. Rev. E} \textbf{94}, 030602(R) \bibitem{gaspard2017} Gaspard P and Kapral R 2017 {\it J. Chem. Phys} \textbf{147} 211101 \bibitem{shankar2018} Shankar S and Marchetti M C 2018 {\it Phys. Rev. E} 98, 020604(R) \bibitem{peruani2007} Peruani F and Morelli L G 2007 {\it Phys. Rev. Lett.} 99, 010602 \bibitem{gosh2015} Ghosh P K, Li Y, Marchegiani G and Marchesoni F 2015 {\it J. Chem. Phys.} \textbf{143}, 211101 \bibitem{debnath2016} Debnath D, Ghosh P K, Li Y, Marchesoni F and Li B 2016 {\it Soft Matter} \textbf{12} \bibitem{narinder2018} Narinder N, Bechinger C and Gomez-Solano J R 2018 {\it Phys. Rev. Lett.} \textbf{121}, 078003 \bibitem{sevilla2019} Sevilla F J, Rodríguez R F and Gomez-Solano J R 2019 {\it Phys. Rev. E} \textbf{100}, 032123 \bibitem{gomezsolano2016} Gomez-Solano J R, Blokhuis A and Bechinger C 2016 {\it Phys. Rev. 
Lett.} \textbf{116}, 138301 \bibitem{lozano2018} Lozano C, Gomez-Solano J R and Bechinger C 2018 {\it New J. Phys.} \textbf{20}, 015008 \bibitem{lozano2019} Lozano C, Gomez-Solano J R and Bechinger C 2019 {\it Nat. Mater.} \textbf{18}, 1118 \bibitem{saad2019} Saad S and Natale G 2019 {\it Soft Matter} \textbf{15}, 9909 \bibitem{narinder2019} Narinder N, Gomez-Solano J R and Bechinger C 2019 {\it New J. Phys.} \textbf{21}, 093058 \bibitem{chepizhko2013} Chepizhko O and Peruani F 2013 {\it Phys. Rev. Lett.} \textbf{111}, 160604 \bibitem{tolic2004} Toli\'c-N{\o}rrelykke I M, Munteanu E-L, Thon G, Oddershede L and Berg-S{\o}rensen K 2004 {\it Phys. Rev. Lett.} \textbf{93}, 078102 \bibitem{wong2004} Wong I Y, Gardel M L, Reichman D R, Weeks E R, Valentine M T, Bausch A R and Weitz D A 2004 {\it Phys. Rev. Lett.} \textbf{92}, 178101 \bibitem{jeon2013} Jeon J-H, Leijnse N, Oddershede L B and Metzler R 2013 {\it New J. Phys.} \textbf{15} 045011 \bibitem{thapa2019} Thapa S, Lukat N, Selhuber-Unkel C, Cherstvy A G and Metzler R 2019 {\it J. Chem. Phys.} \textbf{150}, 144901 \bibitem{deschenes2001} Deschenes L A and Vanden Bout D A2001 {\it Science} \textbf{292}, 255 \bibitem{cote2010} Cote Y, Senet P, Delarue P, Maisuradze G G and Scheraga H A 2010 {\it PNAS} \textbf{107}, 19844 \bibitem{andabloreyes2005} Andablo-Reyes E, Díaz-Leyva P and Arauz-Lara J L 2005 {\it Phys. Rev. Lett.} \textbf{94}, 106001 \bibitem{gutierrezsosa2018} Gutierrez-Sosa C, Merino-Gonzalez A, Sanchez R, Kozina A and Diaz-Leyva P 2018 {\it Macromolecules} \textbf{51}, 9203 \bibitem{oliveira2019} Oliveira F A, Ferreira R M S, Lapas L C and Vainstein M H, 2019 {\it Front. Phys.} \textbf{7}, 1 \bibitem{qian2003} Qian H 2003 {\it Processes with Long-Range Correlations: Theory and Applications} ed G Rangarajan and M Z Ding (Springer-Verlag), p 22. \bibitem{dietrich1997} Dietrich C R and Newsam G N 1997 {\it SIAM J. Sci. Comput.} \textbf{18}, 1088 \bibitem{loewen2016} L\"owen H 2016{\it Eur. Phys. J. 
Special Topics} \textbf{225}, 2319 \bibitem{hu2019} Hu W-F, Lin T-S , Rafai S and Misbah C 2019 {\it Phys. Rev. Lett.} \textbf{123}, 238004 \bibitem{furutsu1963} Furutsu K 1963 {\it J. Res. Natl. Inst. Stand. Technol.} \textbf{67(D)}, 303 \bibitem{novikov1965} Novikov E A 1965 {\it Sov. Phys. JETP} \textbf{20}, 1290 \bibitem{abramowitzBook} Abramowitz M and Stegun I A 1964 {\it Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables} ( New York: Dover Publications Inc.) \bibitem{guinand1941} Guinand A P 1941 {\it Ann. Math.} \textbf{42}, (3), 591 \bibitem{sandev2014} Sandev T, Metzler R and Tomovski Z 2014 {\it J. Math. Phys.} \textbf{55}, 023301 \bibitem{rodriguez2015} Rodriguez R F, Fujioka J and Salinas-Rodriguez E 2015 {\it Physica A} \textbf{427}, 326 \end{thebibliography} \end{document} }}}}\end{eqnarray}}

21 — 2001.04541

\caption{Qualitative visualization of narrative stories generated by humans and by different methods. Similar sentences for similar images are annotated in {\color{blue} blue}.}

\caption{Qualitative visualization of good-quality stories generated with anchor words. The sentences containing anchor words are annotated in {\color{red}red} in the generated stories.}

22 — 2001.04611

\caption{The optimized mean-field parameters ($\chi=1$), energy, and central charge $c$ obtained by the MPO-MPS method with $L=60$ at several different bond dimensions. Maximally localized Wannier orbitals are used and the unit of energy is $J=1$. The deviation is computed with respect to the precise ground-state energies obtained from exact solutions or DMRG, which read (1) $\frac{2}{3}$ for AKLT, (2) $-4$ for TB, (3) 0.2962 for ULS, and (4) 1.4015 for Heisenberg. {\color{red}\bf Why this table is inconsistent with Table~\ref{tb:BBQ}} ? }

23 — 2001.04928

\caption{Direct transfer (SHRED) and unsupervised domain adaptation (SHRED+ktCUDA) performance on benchmark re-ID datasets compared to published methods. $1^\text{st}$/$2^\text{nd}$/$3^\text{rd}$ best results are in \textbf{\color{red}red}/\textbf{\color{blue}blue}/\textbf{\color{cyan}cyan}. Multisource domain method in {\color{magenta}magenta}.}

\caption{Comparison of SHRED direct transfer results with state-of-the-art unsupervised direct transfer methods on Market-1501. $1^\text{st}$/$2^\text{nd}$/$3^\text{rd}$ best results are in \textbf{\color{red}red}/\textbf{\color{blue}blue}/\textbf{\color{cyan}cyan}. Multisource domain method in {\color{magenta}magenta}. }

\caption{Domain adaptation (ktCUDA) and direct transfer (SHRED) comparison for Market-1501. $1^\text{st}$/$2^\text{nd}$/$3^\text{rd}$ best results are in \textbf{\color{red}red}/\textbf{\color{blue}blue}/\textbf{\color{cyan}cyan}. Multisource domain method in {\color{magenta}magenta}. }

24 — 2001.04952

\caption{ \label{fig:PrincipalBundle} A principal $G$-bundle $P(X,G)$ provides a natural arena for geometry realized through a \emph{connection}, i.e., a smooth direct sum decomposition $T_p P = {\color{blue}V_p P} \oplus {\color{red}H_p P}$ of tangent spaces into {\color{blue}``vertical''} and {\color{red}``horizontal''} components that is \emph{equivariant} under an action of $G$. In the figure, $\pi : P \rightarrow X$ is the bundle projection map. %V and H stand for vertical and horizontal. π is the projection map from a fiber to a point in the base space. The connection is the specification of these vector spaces at each point of the total bundle, and the equivariance criterion ensures that they fit together nicely. }

25 — 2001.05357

\caption{Top-4 extracted symptoms of each method for the disease \textit{appendicitis}. The retrieved \colorbox{yellow!30}{relevant symptoms} and \colorbox{green!30}{primary symptoms} are highlighted.}

26 — 2001.05458

\caption{In the Coin Game, two agents (\textcolor{red}{Red} and \textcolor{blue}{Blue}) appear at random positions in a $3\times3$ grid. A coin of color Red or Blue appears at a random location. The agents traverse the grid to pick up coins so as to maximize their reward. For each agent, eating a coin of any color gives $+1$ reward, but eating a coin of the opponent's color penalizes the opponent with $-2$ reward. The best strategy (the one that maximizes long-term reward) requires cooperating with the opponent.}

\caption{Converting a Na\"ive Learner into a Status-Quo aware Learner using the imaginary $\eta$-Stationary Environment intuition. In the figure, \textcolor{red}{K} refers to the parameter $\kappa$ as used in Section~\ref{sec:approach:SQLoss}.}

\caption{\label{fig:ipd_results_with_sticky}Instantaneous probability of cooperation for both agents in the IPD game when the environment enforces the stationary criterion. The \textcolor{green}{Green} line indicates the probability of cooperation, while the \textcolor{blue}{Blue} line indicates the probability of defection. \todopb{Figure is hard to read. Define epoch.} }

27 — 2001.05691

\caption{Examples of video and text pairs. The first two rows are movie clips and their associated scripts from LSMDC~\cite{RohrbachTRTPLCS17} and the last two rows are web videos with their titles from YouTube. This textual information provides semantic information about the video content (e.g., {\color{green}{green}} words), but also contains a lot of irrelevant noise (e.g., {\color{red}{red}} words). Best viewed in color.}

28 — 2001.05729

\caption{ {\color{red} CAPTION CAPTION CAPTION CAPTION CAPTION } }

29 — 2001.05731

\caption{Illustration of the setup. The three red arrows represent the unit wave vectors $\mathbf{e}_{\mathbf{k}_i}$ ($i\in\left\{1,2,3\right\}$) of the pump field. They form a right triangular pyramid whose lateral edges are given by these three unit wave vectors $\mathbf{e}_{\mathbf{k}_i}$. The angle between each pair of them is $90^{\circ}$, and the angle between each of them and the axis perpendicular to the base is $\alpha_{\mathrm{c}}\approx54.74^{\circ}$. In addition, the blue arrow symbolizes the unit wave vector $\mathbf{e}_{\mathbf{k}_0}$ of the probe beam; it encloses the angle $\alpha_{\mathrm{p}}\approx125.26^{\circ}$ with each pump unit wave vector.}
\label{fig:pyramid}
\end{figure}

For the pump laser beams we choose the wave vectors $\mathbf{k}_i=\nu_i \omega_0\,\mathbf{e}_{\mathbf{k}_i}$ with $i\in\left\{1,2,3\right\}$, where the unit wave vectors are
\begin{equation}
\mathbf{e}_{\mathbf{k}_1} = \left(- \sqrt{\frac{{2}}{{3}}},\, 0,\,\frac{1}{\sqrt{3}}\right)\,, \qquad
\mathbf{e}_{\mathbf{k}_2} = \left(\frac{1}{\sqrt{6}},\, -\frac{1}{\sqrt{2}},\,\frac{1}{\sqrt{3}}\right)\,, \qquad \text{and} \qquad
\mathbf{e}_{\mathbf{k}_3} = \left(\frac{1}{\sqrt{6}},\,\frac{1}{\sqrt{2}},\,\frac{1}{\sqrt{3}}\right)\,.
\end{equation}
All pump wave vectors meet at the same spot and enclose an angle of $90^{\circ}$ with each other. We define that spot as the origin of the coordinate system. Furthermore, the angle between each beam and the $z$-axis is $\alpha_{\mathrm{c}}=\arctan\sqrt{2}$. The associated electric and magnetic fields point in the $\mathbf{e}_{\boldsymbol{\mathcal{E}_i}}$ and $\mathbf{e}_{\mathbf{B}_i}$ directions. The overall profile of the field strength of the $i$th beam is given by the function $\mathcal{E}_i\left(x\right)$. In the chosen coordinate system, the field vectors of the $i$th pump beam are $\boldsymbol{\mathcal{E}}_i = \mathcal{E}_i\left(x\right)\mathbf{e}_{\boldsymbol{\mathcal{E}_i}}$ and $\mathbf{B}_i = \mathcal{E}_i\left(x\right)\mathbf{e}_{\mathbf{B}_i}$.
We choose
\begin{equation}
\mathbf{e}_{\boldsymbol{\mathcal{E}}_1} = \left(\frac{1}{\sqrt{3}},\,0,\, \frac{\sqrt{2}}{\sqrt{3}}\right) \qquad \text{and} \qquad
\mathbf{e}_{\boldsymbol{\mathcal{E}}_2} = \mathbf{e}_{\boldsymbol{\mathcal{E}}_3} = \left(\frac{\sqrt{2}}{\sqrt{3}},\,0,\, - \frac{1}{\sqrt{3}}\right)\,.
\end{equation}
The unit vectors of the magnetic field are determined by $\mathbf{e}_{\mathbf{B}_i} = \mathbf{e}_{\mathbf{k}_i} \times \mathbf{e}_{\boldsymbol{\mathcal{E}}_i}$.

Now we want to probe that high-intensity volume with another laser beam, the probe beam. We again use a petawatt-class laser with frequency $\omega_0$ and pulse duration $\tau$. To increase the signature of nonlinearities we want to maximize the angle between the probe beam and all pump beams. For the proposed setup the only way to achieve that maximum angle is to let the probe beam point at the tip of the pyramid formed by the pump beams, see figure \ref{fig:pyramid}. We denote the unit wave vector of the probe field by $\mathbf{e}_{\mathbf{k}_0}=-\mathbf{e}_z$; it encloses an angle $\alpha_{\mathrm{p}}$ with each pump-field unit wave vector $\mathbf{e}_{\mathbf{k}_i}$, $i\in\left\{1,2,3\right\}$. That angle is related to $\alpha_{\mathrm{c}}$ by $\alpha_{\mathrm{p}}=\pi-\alpha_{\mathrm{c}}\approx 125.26^{\circ}$. In addition, we choose the polarization of the probe beam as $\mathbf{e}_{\boldsymbol{\mathcal{E}}_0} = \mathbf{e}_y$.

We assume that all laser beams are aligned such that the intensity maxima of all beams, including the probe beam, meet at the same point in spacetime. We define the collision center as the origin of our coordinate system. Each laser beam has a Gaussian profile. To boost the signal we focus all beams, including the higher harmonics obtained by frequency doubling, to the same beam waist size $w_{i}=\lambda$ at the collision center.
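
As a quick consistency check of this geometry (an illustrative sketch that uses only the unit vectors defined above and is not part of the derivation), one can verify that the pump wave vectors are mutually orthogonal, that the chosen polarization vectors are transverse, and that the quoted angles follow:
\begin{align*}
\mathbf{e}_{\mathbf{k}_1}\cdot\mathbf{e}_{\mathbf{k}_2} &= -\sqrt{\tfrac{2}{3}}\,\tfrac{1}{\sqrt{6}} + 0 + \tfrac{1}{3} = -\tfrac{1}{3}+\tfrac{1}{3} = 0\,, &
\mathbf{e}_{\mathbf{k}_1}\cdot\mathbf{e}_{\mathbf{k}_3} &= -\tfrac{1}{3} + 0 + \tfrac{1}{3} = 0\,, &
\mathbf{e}_{\mathbf{k}_2}\cdot\mathbf{e}_{\mathbf{k}_3} &= \tfrac{1}{6}-\tfrac{1}{2}+\tfrac{1}{3} = 0\,, \\
\mathbf{e}_{\mathbf{k}_1}\cdot\mathbf{e}_{\boldsymbol{\mathcal{E}}_1} &= -\tfrac{\sqrt{2}}{3} + \tfrac{\sqrt{2}}{3} = 0\,, &
\mathbf{e}_{\mathbf{k}_{2}}\cdot\mathbf{e}_{\boldsymbol{\mathcal{E}}_{2}} &= \tfrac{1}{3} + 0 - \tfrac{1}{3} = 0\,, &
\mathbf{e}_{\mathbf{k}_{3}}\cdot\mathbf{e}_{\boldsymbol{\mathcal{E}}_{3}} &= \tfrac{1}{3} + 0 - \tfrac{1}{3} = 0\,, \\
\cos\alpha_{\mathrm{c}} &= \mathbf{e}_{\mathbf{k}_i}\cdot\mathbf{e}_z = \tfrac{1}{\sqrt{3}}
\;\Rightarrow\; \alpha_{\mathrm{c}} = \arctan\sqrt{2}\approx 54.74^{\circ}\,, &
\cos\alpha_{\mathrm{p}} &= \mathbf{e}_{\mathbf{k}_0}\cdot\mathbf{e}_{\mathbf{k}_i} = -\tfrac{1}{\sqrt{3}}
\;\Rightarrow\; \alpha_{\mathrm{p}} \approx 125.26^{\circ}\,. &&
\end{align*}
The probe polarization $\mathbf{e}_{\boldsymbol{\mathcal{E}}_0}=\mathbf{e}_y$ is likewise orthogonal to $\mathbf{e}_{\mathbf{k}_0}=-\mathbf{e}_z$.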
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Results}
\label{sec:results}
In this section we analyze the setup introduced in the previous section, calculate the differential number of signal photons analytically and discuss its advantages.

\subsection{Derivation of the signal}
\label{sec:DerivSig}
Let us compute the differential number of signal photons $\mathrm{d}^3N$ analytically. We can decompose the electric fields $\boldsymbol{\mathcal{E}}_i = \mathcal{E}_i\left(x\right) \mathbf{e}_{\boldsymbol{\mathcal{E}}_i}$ and the magnetic fields $\mathbf{B}_i = \mathcal{E}_i\left(x\right) \mathbf{e}_{\mathbf{B}_i}$ into the spacetime-dependent overall field profile $\mathcal{E}_i\left(x\right)$ and the spacetime-independent unit vectors. The signal amplitude $S_{\left(p\right)}\left(\mathbf{k}\right)$, cf. Eq. \eqref{eq:S1_FG}, then becomes
\begin{equation}
S_{\left(p\right)} \left(\mathbf{k}\right)= \frac{1}{\mathrm{i}} \frac{e^2}{4\pi} \frac{m_e^2}{45} \sqrt{\frac{k^{0}}{2}} \left(\frac{e}{m_e^2}\right)^3 \sum_{i,j,l=0}^3 \mathcal{I}_{ijl} \left(\mathbf{k}\right)g_{\left(p\right);ijl} \left(\hat{\mathbf{k}}\right)\label{eq:S_p}
\end{equation}
%\sum_{\substack{i,j,l=0\\ i\neq j}}^3
with the Fourier integral
\begin{equation}
\mathcal{I}_{ijl}\left(\mathbf{k}\right)\equiv \int\!\mathrm{d}^4x \mathrm{e}^{\mathrm{i}k_{\mu}x^{\mu}} \mathcal{E}_i\left(x\right)\mathcal{E}_j\left(x\right)\mathcal{E}_l\left(x\right)\,, \label{eq:FourierInt}
\end{equation}
and an additional function $g_{\left(p\right);ijl}\left(\vartheta,\varphi\right)$ that depends only on the signal photon angles $\vartheta$ and $\varphi$ and on the polarization.
This function is determined solely by the geometry of the unit vectors of all electromagnetic fields, including the unit field vectors of the signal photon; we define
\begin{equation}
g_{\left(1\right);ijl}\left(\vartheta,\varphi\right) \equiv 2 \left( \mathbf{e}_{\left(1\right)} \cdot \mathbf{e}_{\boldsymbol{\mathcal{E}}_l} - \mathbf{e}_{\left(2\right)} \cdot \mathbf{e}_{\mathbf{B}_l} \right) \left( \mathbf{e}_{\mathbf{B}_i} \cdot \mathbf{e}_{\mathbf{B}_j} - \mathbf{e}_{\boldsymbol{\mathcal{E}}_i} \cdot \mathbf{e}_{\boldsymbol{\mathcal{E}}_j} \right) - \frac{7}{2} \left( \mathbf{e}_{\left(1\right)} \cdot \mathbf{e}_{\mathbf{B}_l} + \mathbf{e}_{\left(2\right)} \cdot \mathbf{e}_{\boldsymbol{\mathcal{E}}_l} \right) \left(\mathbf{e}_{\mathbf{B}_i} \cdot \mathbf{e}_{\boldsymbol{\mathcal{E}}_j} + \mathbf{e}_{\mathbf{B}_j} \cdot \mathbf{e}_{\boldsymbol{\mathcal{E}}_i} \right) \label{eq:geo_func}
\end{equation}
and analogously $g_{\left(2\right);ijl}\left(\vartheta,\varphi\right)=\left.g_{\left(1\right);ijl}\left(\vartheta,\varphi\right)\right|_{\substack{\mathbf{e}_{\left(1\right)}\rightarrow\mathbf{e}_{\left(2\right)}\\\mathbf{e}_{\left(2\right)}\rightarrow-\mathbf{e}_{\left(1\right)}}}$. The indices $i$, $j$, $l$ in the Fourier integral $\mathcal{I}_{ijl}\left(\mathbf{k}\right)$ and the geometry function $g_{\left(p\right);ijl}\left(\vartheta,\varphi\right)$ run over all field configurations appearing in the signal photon amplitude. Three laser fields (or field profiles) always have to be combined: the signal arises from the interaction of three background laser fields with the field of the signal photon $\gamma_{\left(p\right)}$. As mentioned in section \ref{sec:geo}, we use a Gaussian beam profile in the limit of an infinite Rayleigh range \cite{Svelto10,Siegman86,Robertson1954}.
Under this assumption, it can be represented as
\begin{equation}
\mathcal{E}_i \left(x\right) = \frac{1}{2}A_i\, \mathcal{E}_{\star}\, \mathrm{e}^{-4 \frac{\left(r_i-t\right)^2}{\tau^2}}\, \mathrm{e}^{- \frac{x_{\perp,i}^2}{w_{i}^2\left(r_i\right)}} \left( \mathrm{e}^{\mathrm{i} \nu_i\, \omega_0 \left(r_i -t \right)} + \mathrm{e}^{-\mathrm{i} \nu_i\, \omega_0 \left(r_i -t \right)} \right) \,, \label{eq:E_x_infRay}
\end{equation}
where we use the abbreviations $r_i = \mathbf{e}_{\mathbf{k}_i}\cdot\mathbf{x}$ and $x_{\perp,i}^2 = \left|\mathbf{e}_{\mathbf{k}_i}\times \mathbf{x} \right|^2$. The infinite Rayleigh range approximation is valid for weakly focused laser beams. This is well justified for pump laser beams generated by higher harmonics.

To obtain observables, we use the signal amplitude $S_{\left(p\right)}\left(\mathbf{k}\right)$, see Eq. \eqref{eq:S_p}, together with the beam profile $\mathcal{E}_i\left(x\right)$ and the geometry introduced in section \ref{sec:geo} to calculate the differential number of signal photons
\begin{equation}
\mathrm{d}^3N_{\left(p\right)} \left(\mathbf{k}\right) = \mathrm{d} \mathrm{k} \mathrm{d} \cos\vartheta\, \mathrm{d}\varphi \frac{\mathrm{k}^2}{\left(2\pi\right)^3} \left|S_{\left(p\right)}\left(\mathbf{k}\right)\right|^2\,. \label{eq:d3N_result}
\end{equation}
We can define a number density for photons in a given frequency range $\mathrm{k}_i$ to $\mathrm{k}_f$.
This number density $\rho_{\left(p\right)}\left(\mathrm{k}_i,\mathrm{k}_f,\vartheta,\varphi\right)$ follows from integrating Eq. \eqref{eq:d3N_result} over this frequency range, including the factor $\mathrm{k}^2$ of the volume element:
\begin{equation}
\rho_{\left(p\right)}\left(\mathrm{k}_i,\mathrm{k}_f,\vartheta,\varphi\right)= \frac{1}{\left(2\pi\right)^3} \int_{\mathrm{k}_i}^{\mathrm{k}_f} \!\mathrm{dk} \left|\mathrm{k}\, S_{\left(p\right)}\left(\mathbf{k}\right)\right|^2\,,\label{eq:rho_result}
\end{equation}
and we define $\rho_{\left(p\right)}\left(\vartheta,\varphi\right)\equiv\rho_{\left(p\right)}\left(0,\infty,\vartheta,\varphi\right)$ in the all-optical regime. Finally, we sum over both polarizations and integrate over the solid angle. This leads us to the total number of signal photons
\begin{equation}
N_{\text{tot}} = \sum_{p=1}^2 \int_0^{2\pi}\!\mathrm{d}\varphi \int_{-1}^{1} \mathrm{d}\!\cos\vartheta \; \rho_{\left(p\right)} \left(\vartheta,\varphi\right) \,. \label{eq:N_tot}
\end{equation}

\subsection{Semi-analytic results}
\label{sec:results_sa}
In the next step we use Eqs. \eqref{eq:d3N_result} and \eqref{eq:rho_result} to derive results which can be measured in an actual experiment. The main focus lies on the distinguishability of the predicted signal photons from the background photons of the driving laser beams. First we provide estimates for the differential numbers of driving laser photons. Afterwards, we present the attainable numbers of signal photons encoding the signature of quantum vacuum nonlinearity based on the results derived in section \ref{sec:DerivSig}.

\subsubsection{Driving laser beams}
\label{sec:driving_laser}
In section \ref{sec:geo} we introduced a specific laser beam configuration that creates a narrow, spatially confined scattering center of high intensity. This configuration is based on petawatt-class lasers reaching strong electromagnetic field strengths.
As we assumed Gaussian beam profiles, the far-field angular decay of the differential number of laser photons constituting a given driving laser beam follows a Gaussian distribution. For the $i$th laser this quantity is given by \cite{Svelto10,Siegman86,Robertson1954}
\begin{equation}
\mathrm{d}^2 N_{i} = \mathrm{d}\varphi\,\mathrm{d}\cos\vartheta\; \nu_i A_i^2 N_{\star} \mathrm{e}^{-2\nu_i^2\pi^2\vartheta_i^2\left(\vartheta,\varphi\right)}\,.
\end{equation}
Here, $\vartheta_i\left(\vartheta,\varphi\right)$ parameterizes the angular decay of the laser photons with respect to the unit wave vector $\mathbf{e}_{\mathbf{k}_i}$. The factor $N_{\star}=2\pi W/\omega_0$ is determined by the laser properties.

\subsubsection{Signal Photons}
To obtain the total number of signal photons $N_{\text{tot}}$, we have to combine the results for both polarizations; see Eq. \eqref{eq:N_tot}. Furthermore, we use the parameters encoding geometric and laser properties introduced in sections \ref{sec:geo} and \ref{sec:DerivSig} to determine the analytical expressions of $\mathrm{d}^3N_{\left(1,2\right)}$ and $\rho_{\left(1,2\right)}\left(\mathrm{k}_i,\mathrm{k}_f,\vartheta,\varphi\right)$. Using $\rho\left(\vartheta,\varphi\right)=\sum_{p=1}^2\rho_{\left(p\right)}\left(\vartheta,\varphi\right)$ we perform the integral over the solid angle numerically, which yields the total number of signal photons in the all-optical regime. We find $N=325.29$ signal photons for the considered setup.

For a more detailed analysis we subdivide the frequencies of the resulting signal photons into several intervals, allowing for a spectrally resolved analysis of the signal. To this end, we use a frequency range $\mathrm{k}_i$ to $\mathrm{k}_f$ in the number density and integrate over the solid angle. We are in particular interested in the number of signal photons emitted in the frequency ranges of the driving laser beams.
In table \ref{tab:ki_kf_N} we list the total numbers of signal photons associated with different frequency ranges.
\begin{table}[H]
\caption{Total number of signal photons attainable with the suggested setup based on three pump laser beams of frequencies $\omega_0=1.55\,\mathrm{eV}$, $2\omega_0=3.1\,\mathrm{eV}$ and $4\omega_0=6.2\,\mathrm{eV}$ and one probe beam of frequency $\omega_0=1.55\,\mathrm{eV}$. All beams are pulsed and feature a pulse duration of $\tau=25\,\mathrm{fs}$. Moreover, they are focused to a beam waist of $w_{i}=\lambda=800\,\mathrm{nm}$. We assume two one-petawatt lasers at our disposal: one generates the pump fields and the other the probe. This table provides the number of signal photons for different frequency ranges $\mathrm{k}_i$ to $\mathrm{k}_f$.}
\label{tab:ki_kf_N}
\centering
\begin{tabular}{ccc}
\toprule
\textbf{initial frequency $\mathrm{k}_i$ in $\mathrm{eV}$} & \textbf{final frequency $\mathrm{k}_f$ in $\mathrm{eV}$} & \textbf{number of signal photons $N$}\\
\midrule
0.97 & 2.13 & 192.69\\
2.52 & 3.68 & 81.23\\
5.62 & 6.78 & 51.27\\
0.00 & $\infty$ & 325.29\\
\bottomrule
\end{tabular}
\end{table}

Moreover, we study the angularly resolved signal photon emission characteristics. A Mollweide projection allows us to map the spherical data onto a flat chart. Because Mollweide projections preserve areas, they are particularly suited to illustrate the spatial distribution of the signal photons. Note, however, that these projections are not conformal and thus do not preserve angles. We present results for the spatial distribution of the signal photons for three frequency regimes, namely $\mathrm{k}_{i,1}=0.97\,\mathrm{eV}$ to $\mathrm{k}_{f,1}=2.13\,\mathrm{eV}$, $\mathrm{k}_{i,2}=2.52\,\mathrm{eV}$ to $\mathrm{k}_{f,2}=3.68\,\mathrm{eV}$, and $\mathrm{k}_{i,3}=5.62\,\mathrm{eV}$ to $\mathrm{k}_{f,3}=6.78\,\mathrm{eV}$. For each regime we determine $\rho\left(\mathrm{k}_i,\mathrm{k}_f,\vartheta,\varphi\right)$.
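As an arithmetic remark based solely on the numbers in table \ref{tab:ki_kf_N} (added for orientation, not an additional result), each of the three frequency regimes is a window of half-width $0.58\,\mathrm{eV}$ centered at one of the driving frequencies, and together the windows contain almost the entire signal:
\begin{align*}
1.55\,\mathrm{eV}\pm0.58\,\mathrm{eV} &= \left[0.97,\,2.13\right]\mathrm{eV}\,, \qquad
3.10\,\mathrm{eV}\pm0.58\,\mathrm{eV} = \left[2.52,\,3.68\right]\mathrm{eV}\,, \qquad
6.20\,\mathrm{eV}\pm0.58\,\mathrm{eV} = \left[5.62,\,6.78\right]\mathrm{eV}\,, \\
192.69 + 81.23 + 51.27 &= 325.19\,, \qquad \frac{325.19}{325.29}\approx 99.97\,\%\,.
\end{align*}
Hence only about $0.10$ of the $325.29$ signal photons lie outside these three windows.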
Figure \ref{fig:Mollweide_Signal} shows these number densities. Here, the colors distinguish between the different frequency regimes and the brightness indicates the relative number density. As signal photons of different frequencies are emitted into complementary directions, they can be depicted in one plot.
\begin{figure}[H]
\raggedright%\centering
\begin{minipage}{0.7\textwidth}
\raggedright%\centering
\includegraphics[scale=0.85]{Figures/MollweideSignal.png}
\end{minipage}
\begin{minipage}{0.2\textwidth}
\raggedright
\includegraphics[scale=0.6]{Figures/dNBarRed.pdf}
\includegraphics[scale=0.6]{Figures/dNBarGreen.pdf}
\includegraphics[scale=0.6]{Figures/dNBarBlue.pdf}
\end{minipage}
\caption{Mollweide projection of the differential signal photon number $\rho\left(\mathrm{k}_i,\mathrm{k}_f,\vartheta,\varphi\right)$. The longitude gives the coordinate $\varphi$ and the latitude $\vartheta$. The three different colors denote the considered frequency regimes, i.e. $\mathrm{k}_{i,1}=0.97\,\mathrm{eV}$ to $\mathrm{k}_{f,1}=2.13\,\mathrm{eV}$ (red), $\mathrm{k}_{i,2}=2.52\,\mathrm{eV}$ to $\mathrm{k}_{f,2}=3.68\,\mathrm{eV}$ (green) and $\mathrm{k}_{i,3}=5.62\,\mathrm{eV}$ to $\mathrm{k}_{f,3}=6.78\,\mathrm{eV}$ (blue). The color scale is linear and normalized to the maximum value $\rho_{\text{max}}$ of each frequency regime. Besides the main peaks coinciding with the propagation directions of the driving beams, there are additional, less pronounced peaks in other directions.}
\label{fig:Mollweide_Signal}
\end{figure}

\subsubsection{Signal-to-background separation}
In the previous sections we studied the far-field distributions of both the driving laser photons and the signal photons encoding the signature of quantum vacuum nonlinearities. If we naively compare their total numbers, the signature of the QED vacuum seems to be undetectable in an experiment. The driving lasers produce photon numbers of the order of $10^{20}$ photons; the signal is made up of $325$ photons.
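The quoted order of magnitude can be cross-checked with a back-of-the-envelope estimate. This is a sketch that assumes the full pulse energy $W=25\,\mathrm{J}$ of one laser is carried by photons of energy $\omega_0=1.55\,\mathrm{eV}$, neglects losses and frequency conversion, and uses $1\,\mathrm{eV}\approx1.602\times10^{-19}\,\mathrm{J}$:
\begin{equation*}
\frac{W}{\omega_0} \approx \frac{25\,\mathrm{J}}{1.55\times1.602\times10^{-19}\,\mathrm{J}} \approx 1.0\times10^{20}\,, \qquad
\frac{325}{10^{20}} \approx 3\times10^{-18}\,.
\end{equation*}
The signal thus amounts to only a few parts in $10^{18}$ of the photons in one driving pulse.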
However, taking into account additional properties of the signal, we find possibilities to distinguish the signal from the background of the driving laser photons. One possibility is to analyze the spatial distributions of the driving laser photons and of the signal photons. The Mollweide projection in figure \ref{fig:Mollweide_DiffLog} highlights where the signal dominates over the driving laser photons. The driving laser photons dominate in the red shaded areas, while the signal dominates in the green shaded areas. Hence, in all green colored regions of figure \ref{fig:Mollweide_DiffLog} it is in principle possible to distinguish the signal photons from the background.

In all frequency ranges, the main peaks in the signal photon distribution coincide with the directions of the driving laser beams. In addition, the signal photon distribution exhibits further peaks. These peaks can be attributed to effective photon-photon interactions. With the suggested setup we managed to scatter signal photons into areas of lower driving laser intensity, i.e. areas with a much lower background. Using figure \ref{fig:Mollweide_Signal} we identify the frequency regime of the detectable signal photons: our analysis implies that, especially for the scattered signal photons with frequencies around $4\omega_0=6.2\,\mathrm{eV}$, the differential signal photon number surpasses the background.
\begin{figure}[H]
\raggedright%\centering
\begin{minipage}{0.7\textwidth}
\raggedright%\centering
\includegraphics[scale=0.85]{Figures/MollweideDiffLog.png}
\end{minipage}
\begin{minipage}{0.2\textwidth}
\raggedright
\includegraphics[scale=0.6]{Figures/dNBarRedLog.pdf}
\includegraphics[scale=0.6]{Figures/dNBarGreenLog.pdf}
\end{minipage}
\caption{Mollweide projection of the differential number of signal photons and driving laser photons in the all-optical regime. The longitude gives the coordinate $\varphi$ and the latitude $\vartheta$.
In the red shaded areas the driving laser photons dominate, while in the green shaded areas the signal photons dominate. The color scale is logarithmic and normalized to the maximum values $\rho_{\text{max}}$ of each type of signal.}
\label{fig:Mollweide_DiffLog}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Conclusions and outlook}
We used the theoretical basis of QED in strong fields to derive an analytical expression for the differential numbers of signal photons encoding the signatures of quantum vacuum nonlinearities. To achieve a measurable result we introduced a special configuration based on two state-of-the-art optical petawatt lasers with frequency $\omega_0=1.55\,\mathrm{eV}$, pulse duration $\tau=25\,\mathrm{fs}$, and field energy $W=25\,\mathrm{J}$. The first laser beam is split into three different beams, two of which are converted to the higher frequencies $2\omega_0$ and $4\omega_0$ by means of higher harmonic generation. Aligned along the edges of a right triangular pyramid with an angle of $90^{\circ}$ between each pair of unit wave vectors, these beams constitute the pump field. The second laser acts as the probe beam and propagates towards the tip of that pyramid. We derived analytical expressions accounting for the experimental parameters and loss factors and obtained the differential number of signal photons and the number density. After numerical evaluation we compared these results with the background of the driving laser beams. In particular, we could identify angular regimes where the differential signal photon number dominates the background, thereby constituting a prospective signature of QED nonlinearity in experiments.

The presented results represent the current state of the analysis. Further properties of the signal are under investigation and will be published in the foreseeable future. One example is the spectral differential number, which contains additional information besides the spatial distribution.
In the latter, a widening of the spectral signal can be observed. The spectral width of the signal photons surpasses the spectral width of the driving lasers. In addition, we can change the beam properties and geometries in prospective studies, e.g. we can account for different loss factors. Another interesting modification is to use different pulse durations or beam widths in the focus for the beams with different frequencies. Both of these quantities sensitively influence the scattering behavior of the signal photons.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\vspace{6pt}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\funding{This research was funded by the Deutsche Forschungsgemeinschaft (DFG) under grant number 416611371 within the Research Unit FOR 2783/1.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\acknowledgments{I thank Holger Gies, Felix Karbstein, Christian Kohlf\"urst, and Elena A. Mosman for the discussion and collaboration. In addition, I thank the team of the Dubna Summer School 2019 ``Quantum Field Theory at the Limits: from Strong Fields to Heavy Quarks'', with special thanks to David Blaschke and Mikhail Ivanov.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%\conflictsofinterest{The authors declare no conflict of interest.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% optional
%\abbreviations{The following abbreviations are used in this manuscript:\\
%
%\noindent
%\begin{tabular}{@{}ll}
%QED & Quantum electrodynamics\\
%LCFA & Locally constant field approximation\\
%CHF & Coherent harmonic focusing
%\end{tabular}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\reftitle{References}
% Please provide either the correct journal abbreviation (e.g. according to the “List of Title Word Abbreviations” http://www.issn.org/services/online-services/access-to-the-ltwa/) or the full name of the journal.
% Citations and References in Supplementary files are permitted provided that they also appear in the reference list here.
%=====================================
% References, variant A: external bibliography
%=====================================
\externalbibliography{yes}
\bibliography{bab}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{document} }}\end{equation}}}}}}}}}\end{equation}}}

30 — 2001.05900

\caption[Weak flavor charges]{ $e$ (\tikz{\draw[Purple, fill=Purple] (0,0) circle (.4ex);}), $\mu$ and $\tau$ (\tikz{\fill[Red] (0.1,0.1) rectangle (0.2,0.2);}) weak flavor charges of the elements with $(Z,N)$ in the valley of stability, as well as their gravitational coupling, approximately proportional to $Z+N$ (\tikz{\fill[black, rotate=45] (0.1,0.1) rectangle (0.2,0.2);}). Beware a minus sign in the $\mu, \tau$ flavor charges. }

\caption[2nu-mediated long range potential]{ 2$\nu$-mediated long-range potential between two atoms of $^{56}$Fe, relative to their gravitational potential. Numerical results for $m_\mathrm{min} = 0$ (superimposed \textcolor{DiracColor}{Dirac}/\textcolor{MajoranaColor}{Majorana} dotted lines) and $m_\mathrm{min}=0.1$~eV (dashed \textcolor{DiracColor}{Dirac} and \textcolor{MajoranaColor}{Majorana} lines), together with analytic limits (solid lines) in Eqs.~(\ref{eq:Vshort}) and (\ref{eq:Vlong}). }

31 — 2001.06122

\caption{Results of the feature extractor comparison experiment on the smaller dataset of 44,612 images taken from the larger Indonesian national election dataset, organized graphically. Each circle is a cluster with a size proportional to the number of images in that cluster. They are arranged along the x-axis with respect to the accuracy rating for that cluster on the impostor-host task. The largest cluster of each method is labelled with the percentage of total images it contains. The most significant finding from this experiment is that the PHASH and VGG global feature extraction and matching methods put nearly all of the images in the dataset into a single cluster, and are not suitable for use with a dataset containing related images of diverse visual appearance. Normalized average accuracy is plotted as {\color{red}$\diamond$}. }

32 — 2001.06129

\caption{Additional \aastex\ symbols}

33 — 2001.06590

\caption{Illustration of our video compression architecture. The {\color{red}red} (S1-S2) and {\color{green}green} (S3-S7) modules are responsible for the compression of the background and foreground, respectively. The {\color{blue}blue} (S8) module represents a coarse-to-fine two-stage module, which achieves the composition and enhancement of frames. It should be noted that the {\color[rgb]{0.988,0.812,0.325}yellow} lines (imaginary lines) in the figure represent the decoding process, which reconstructs video frames based on the received bitstreams.}

34 — 2001.06972

\caption{Parameter values are $J_{1} = J_{2} = \pi/3, M = 1, N_{x} = 400$. States localized at the left (red) and right (green) edge for (a) $J_{3} = 0.8\pi$, (b) $J_{3} = 1.2\pi$, (c) $J_{3} = 1.5\pi$, and (d) $J_{3} = 1.8\pi$ are shown. Two-terminal conductance, dynamical winding number and number of counter-propagating edge states around zero and $\pi$ quasienergy gap can be seen in Fig.~\blue{\ref{Fig:Phase}} for each distinct case.}

35 — 2001.07078

\caption{Additional \aastex\ symbols}

36 — 2001.07161

\caption{The entropy $S$ (per unit {\red transverse} area on the boundary), produced during a symmetric collision of thin gravitational shockwaves in AdS$_5$ (both shocks have width $ w=0.075/\mu$, where $w$ is the width of the single Gaussian shock waves before the collision), as a function of time $t$, which is given in units of $[t]=[\mu^{-1}]$, where $\mu^3$ is the transverse energy density of the shock fronts. The gauge/gravity duality relates the entropy density $s$ to the volume element of the apparent horizon. To estimate the entropy production we integrate over the longitudinal coordinate. $S$ is given in units of $\mu^2$. For large enough times linear growth seems to be a good approximation. The shock fronts touch at $\mu t=0$. The linear fit, plotted as a red dashed line, is included to guide the eye. Due to the finitely sized spatial box, in which we study the gravitational collision, we could not follow the time evolution long enough to observe a potential saturation regime for the entropy.}

37 — 2001.07394

\caption{Illustrative 2D example depicting different optimization domains (see \lfsecref{sec:domain_selection_for_bo}) and the domains' growth due to \gls{acr:dda} (see \lfsecref{sec:dynamic_domain_adaptation}), as well as the evaluated points ({\color{tableaublue}$\bm{\times}$}). When an estimated optimum (\protect\markerestimatedoptimum) is on the domain's boundary ({\color{black} \rule[1.25 pt]{10 pt}{2pt}}), the domain grows in the respective direction. The global optimum is marked by \protect\markerglobaloptimum.}

\caption{Sketch depicting the relevant parameters used for \gls{acr:dda}. The true objective ({\color{black} \rule[1.25 pt]{10 pt}{2pt}}) is approximated by a \gls{acr:gp}~({\color{tableaured} \rule[1.25 pt]{10 pt}{2pt}}) and the estimated optimum (\protect\markerestimatedoptimum) is on the domain's boundary. Consequently, the domain grows in the direction of the global optimum (\protect\markerglobaloptimum) with stepsize $\Delta \bm{\Theta}$ which is proportional to the \gls{acr:gp}'s lengthscale $\lambda$, the \gls{acr:gp}'s gradient {\color{tableaublue} $\nabla_\theta \mu_{GP}$} at the estimated optimum and the size of the current domain $\bm{\Theta}$. }

\caption{Parameter space view showing the growing domain boundaries~({\color{black} \rule[1.25 pt]{10 pt}{2pt}}) and evaluated points ({\color{tableaublue}$\bm{\times}$}) during optimization. The global optimum is marked by \markerglobaloptimum.}

\caption{Left: Comparison of performance when policy is optimized on the large domain ({\color{tableaublue} \rule[1.25 pt]{10 pt}{2pt}}) and the independence domain ({\color{tableauorange} \rule[1.25 pt]{10 pt}{2pt}}). 10 independent runs were performed. Right: Parameter space showing large domain (\protect\dashedline), independence domain (\protect\dottedline), evaluated policies (\protect\bluecircle /\protect\orangecircle) and nominal LQR ($\bm{\times}$).}

38 — 2001.07437

\caption{\small \textbf{Impact of hyperparameters for feature erasing.} Scatter plots for HaS, ACoL, and ADL. Color and size of the circles indicate the performance at the corresponding hyperparameters. Red crosses ({\color{red}\ding{53}}): non-convergent training sessions. Green crosses ({\color{greencross}\ding{54}}): hyperparameters suggested by the original papers. }

39 — 2001.07549

\caption{A timeline of course \fcolorbox{lightgray}{colorProje}{\ssmall projects} and \fcolorbox{black}{colorPeerR}{\ssmall \color{white}student-driven feedback}. The \rule[2pt]{10pt}{1pt} solid edges are directly utilized feedback in future projects, while \rule[2pt]{3pt}{1pt} \rule[2pt]{3pt}{1pt} \rule[2pt]{3pt}{1pt} dashed edges are indirect utilization. Projects are in 4 broad categories of objectives, including: \setlength{\fboxrule}{2pt}\fcolorbox{colorFamil}{white}{\ssmall familiarization}; \fcolorbox{colorFound}{white}{\ssmall foundation building}; \fcolorbox{colorSkill}{white}{\ssmall applying skills in new contexts}; and \fcolorbox{colorSyste}{white}{\ssmall software engineering}. }

40 — 2001.07810

\caption{Relevant pictures taken during the \textit{in vitro} experiments. From left to right the instant of bolus front out (top row) and tail out (bottom row) for a) 1.19\% w/w TUC (\textit{Nectar-thick}), b) 0.3\% w/w cereal extract, c) 1\% w/w cereal extract and d)-g) aqueous solutions of 2 to 5\% w/w PEO. \textcolor{black}{Solutions a), c) and d) were all characterised as IDDSI Level 2 and showed similar shear viscosity at 50 and 300 s$^{-1}$, but solution c) showed significantly stronger extensional properties. These snapshots show that the \textit{in vitro} bolus of solution c) is more compact when leaving the oral cavity}.}

\caption{Ratio of extensional to zero-shear viscosity for the liquids considered in this study.}

41 — 2001.07960

\caption{A quantitative comparison between six state-of-the-art SOD models on \textbf{F-360iSOD}, where $F_{\beta}^{w}$ means Fbw, $S$ represents S-measure. Note that the top three results of each column are highlighted in \textcolor{red}{red}, \textcolor{green}{green} and \textcolor{blue}{blue}, respectively.}

\caption{A fixation-based complexity analysis of the proposed \textbf{F-360iSOD}. The F-360iSOD-train, F-360iSOD-testA/B are annotated in \textcolor{black}{black}, \textcolor{blue}{blue} and \textcolor{red}{red}, respectively.}

42 — 2001.08210

\caption{{\bf Examples of Unsupervised MT via Language Transfer} between {\bf Ja}, {\bf Ko}, {\bf Zh} $\rightarrow$ {\bf En}. We mark the supervised settings in {\color{red} red}. All three languages have quite different character sets (Ja and Zh share part of the Chinese characters) and syntactic structures. However, they are still culturally and historically correlated, which we assume can be captured through pre-training. In all cases, if we fine-tune the mBART25 model on any pair, the resulting model directly translates well in the other two pairs without seeing any corresponding parallel sentences. We also see failure cases. For instance (the 3rd example), only the supervised model translates ``자석'' into ``magnets'' correctly, while the Ja-En and Zh-En models guess the irrelevant words ``cushions'' and ``jellyfish'', respectively. Also, in the 2nd example, the Ko-En model fails to translate ``developed'' and copies the source tokens. We suspect this is because the pre-training stage biases the output distribution. }

43 — 2001.08435

\caption{Features of big social data analysis cyber-infrastructures. \fullcircle~fully, \halfcircle~partially, and \emptycircle~not supported.}

44 — 2001.08480

\caption{Segmentation results of the proposed approach of two retinal B-scans of two individual patients with retina (\swatch{red}) and PED (\swatch{green}): B-scan (a)/(e), ground truth (b)/(f), U-Net segmentation (c)/(g), CDAE segmentation (d)/(h). Areas with low SNR or motion artifacts (white arrow) (a)/(e) can lead to a false segmentation (c)/(g) and are corrected by the CDAE refinement (d)/(h). \label{fig:qualitativeresult}}

45 — 2001.08740

\caption{\textbf{AVA per-class average precision}. AVSlowFast (27.8 mAP) \vs its SlowFast counterpart (26.3 mAP). The highlighted categories are the 5 highest absolute increases (\textbf{bold}) and top 5 relative increases over SlowFast ({\color{orange}{\textbf{orange}}}). Best viewed in color with zoom. }

46 — 2001.08873

\caption{\changemarker{Effect of adaptive weighting scheme on the proposed regularizers. Methods that do not adopt the adaptive weighting scheme adopt the fixed weighting scheme of $c_t=1, \forall t$. Performance is reported in \textit{one-class} mode.} }

\caption{\changemarker{Performance on \textit{one-vs-all} and \textit{strokes} anomaly test sets for the CIFAR10 dataset. Results for \textit{one-vs-all} are the average of $10$ normal classes. Performance of the deep SVDD and its variants is reported in \textit{one-class} mode.}}

\caption{Performance of the semi-supervised deep SVDD model and its variants in \textit{one-class} mode. \changemarker{Performance for the \textit{one-vs-all} setup is the average of $10$ normal classes. For each normal class, only a single anomaly class (out of nine) was included during training and each anomaly class setting was run five times, resulting in a total of $9\times 5$ runs.}}

47 — 2001.09169

\caption{\textbf{Dynamics of the populations and correlations: Comparison between theory and experiment for cosine and flat potentials of qubits $Q_7$ to $Q_{12}$.} To explore localization due to the competition between kinetic and potential energy, we consider a cosine potential for the qubits $Q_7$ to $Q_{12}$. Similarly, to investigate localization due to disorder, we consider a flat potential for the qubits $Q_7$ to $Q_{12}$. The panels \boldsymbol{$\textbf{a}_{1,2})$} to \boldsymbol{$\textbf{i}_{1,2})$} and \boldsymbol{$\textbf{b}_{1,2})$} to \boldsymbol{$\textbf{j}_{1,2})$} depict the results without $(W=0)$ and with disorder $(W=5J)$, respectively. \boldsymbol{$\textbf{a}_{1,2})$} and \boldsymbol{$\textbf{b}_{1,2})$} show the qubit frequency setups used in the experiment. \boldsymbol{$\textbf{c}_{1,2})$} to \boldsymbol{$\textbf{f}_{1,2})$} depict the population dynamics $\langle\hat{n}_l\rangle$. Correspondingly, the panels \boldsymbol{$\textbf{g}_{1,2})$} to \boldsymbol{$\textbf{j}_{1,2})$} show the dynamics of the correlation function $C_{ZZ}(l,7,t)$. Due to the shape of the cosine potential, the excitation can penetrate a small region of the disordered domain, even in the absence of disorder. In contrast, the excitation can propagate ballistically for a flat potential in the absence of disorder. }

48 — 2001.09193

\caption{\textbf{Overview of \emph{anduin}0.1}: A schematic of the semi-automated and interactive spine processing pipeline developed in-house. The \textbf{thick-black} lines indicate automated steps and the \textcolor{gray}{dotted-grey} lines indicate an interactive step.}

49 — 2001.09252

\caption{Overall network architecture of our PSC-Net. It consists of a pedestrian detection (PD) branch and a part spatial co-occurrence (PSC) module. In contrast to the baseline standard PD branch where the RoI features are used for box regression and classification (\textcolor{red}{X}), the RoI features in our PSC-Net are fed into the proposed PSC module to integrate both intra- and inter-part spatial co-occurrence information. The resulting enriched features are then deployed for final bounding-box regression and classification.}

50 — 2001.09364

\caption{Two CKPS-decorations based on \drawit{3}{$ $,4}{2,3}{0} giving $2$-faces are shown, corresponding to an octagon and a triangle. The triangle decoration is obtained by first circling the middle vertex. The leftmost vertex is then uncrossed by the transformation rule, so it may be circled.}

51 — 2001.09437

\caption{(a) Initial vorticity field used for the evolutions in figure \ref{fig:perevol}. Case 0 in table \ref{tab:caseid}, and $\| u\|_2$ norm used for classification. The cells outlined in black (less significant) and red (more significant) have relatively similar initial perturbation intensities, but very different later evolutions. (b) Premultiplied spectra of the initial flow fields used for the experiments. The dashed vertical lines are the first minimum of the transfer function of a box convolution window corresponding to the cell size of experiments with $N_c=10$, 6 and 4, from left to right. \solid, Enstrophy spectrum; \dashed, energy; \chndot, transfer function for $N_c=10$, scaled to fit the plot. }

\caption{Optimum classification lines for different initial perturbations, in terms of the kinetic energy and of the enstrophy. (a) Case 0. (b) Case 3. (c) Case 10. (d) Case 13. In all cases, 768 flows. Classification norm, $\|u\|_2$. $N_c=10$, $n_{keep}=5$. The contour lines contain 50\%, 70\% and 90\% of the joint probability density functions (p.d.f.) of the diagnostic variables for: \solid, most significant cases; \dashed, least significant cases. }

\caption{Joint p.d.f. of the relative approximation error, as defined in table \ref{tab:temperr}, versus the experimental divergence at $\omega'_0 T_{{\mbox{\tiny\it ref}}} =4.5$. (a) Case 0 in table \ref{tab:caseid}; template is a vortex. (b) Case 10; template is a dipole. Black lines use all the experimental cells. Red lines only use the most significant $n_{keep}=5$ cells in each experiment. 768 experiments, classified using $\|\uvec\|_2$. Contours contain 50\% and 95\% of the probability. \solid, Tested using the training set; \dashed, using an independent test set. }

52 — 2001.09456

\caption{Number of links per day (top), and proportion of those that are new (bottom), after $20$ days of observation of the LANL computer network. {\bf\color{corallo2} Solid red} curve: \textit{User -- Source}. {\bf\color{corallo} Dashed blue} curve: \textit{User -- Destination}.}

53 — 2001.09625

\caption{Folded-node/Homoclinic bursting. We take $f(x)=x^3-3x^2$ and \bluebis{$G(x,y,z)=g(x)-y$ with $g(x)=1-5x^2$. The parameter values are: $a=1$, $c=1$, $\alpha=0.3$, $\gamma=1$, $\delta=1.2$, $\eps=0.002$, $\mu=0.033$, $\gamma_y=0.0005$ and $\gamma_{\beta}=-0.008$}. Panels (a-b) show the spike-adding transition in system~\eqref{eq:fnburster}: (a) in the $(z,x)$ plane; (b) associated bifurcation diagram with respect to parameter $\beta$. Panels (c-d) show a folded-node/homoclinic bursting orbit: (c) in the $(\beta,z,x)$ space; (d) $x$-time series. The bottom panels show a comparison between this folded-node bursting orbit from~\eqref{eq:fnburster} and experimental data from~\cite{roy84}.}

\caption{Folded-node/Hopf bursting. We take $f(x)=x^3-3x^2$ and \bluebis{$G(x,y,z)=g(x)-y$ with $g(x)=1-5x^2$. The parameter values are: $a=1$, $c=2$, $\alpha=0.3$, $\gamma=1$, $\delta=1$, $\eps=0.004$, $\mu=0.0104$, $\gamma_y=0.0003$ and $\gamma_{\beta}=-0.05$}. Panels (a-b) show the spike-adding transition in system~\eqref{eq:fnburster}: (a) in the $(z,x)$ plane; (b) associated bifurcation diagram with respect to parameter $\beta$. Panels (c-d) show a folded-node/Hopf bursting orbit: (c) in the $(\beta,z,x)$ space; (d) $x$-time series.}

\caption{Folded-node/Fold of cycles bursting. We take $f(x)=0$ and \bluebis{$G(x,y,z)=-x^3+A_1(z)x+A_2(z)-y(A_3(z)-x+x^2)$}, where \bluebis{$A_1(z)=0.1201z+0.1871$, $A_2(z)=0.0906z-0.0251$, $A_3(z)=0.105z-0.3526$}. \bluebis{The parameter values are: $a=0$, $c=1$, $\alpha=0$, $\gamma=-1$, $\delta=1$, $\eps=0.01$, $\mu=-0.00012$, $\gamma_y=-0.003$, $\gamma_{\beta}=0.0001$}. Panels (a-b) show the spike-adding transition in system~\eqref{eq:fnburster}: (a) in the $(z,x)$ plane; (b) associated bifurcation diagram with respect to parameter $\beta$. Panels (c-d) show a folded-node/fold of cycles bursting orbit: (c) in the $(\beta,z,x)$ space; (d) $x$-time series.}

54 — 2001.09638

\caption{Analytical description of the voltage limitation for $V_0$\,=\,40~V. Green dashed line: $d_V$\,=\,1~V and $b$\,=\,40 (the right and left sigmoid functions meet in the origin continuously differentiable). Red solid line: $d_V$\,=\,0.01~V and $b$\,=\,20 (used below).}

55 — 2001.09691

\caption{Proposed architecture: feature extractors {\rgb{$F^{RGB}$}} and {\flow{$F^{Flow}$}} are shared for both {\target{target}} and {\textcolor{blue}{source}} domains. Domain Discriminators, {\rgb{$D^{RGB}$}} and {\flow{$D^{Flow}$}}, are applied to each modality. Self-supervised correspondence of modalities, $C$, is trained from both \textcolor{blue}{source} and \target{unlabelled target} data. Classifiers, {\rgb{$G^{RGB}$}} and {\flow{$G^{Flow}$}} are trained using {\textcolor{blue}{source}} domain examples only from the average pooled classification scores of each modality. During inference, multimodal \target{target} data is classified.}

\caption{t-SNE plots of RGB and Flow feature spaces produced by source-only, self-supervised alignment and our proposed model MM-SADA. \target{target} is shown in red and {\color{blue}source} in blue. Our method better aligns both modalities.}

56 — 2001.09876

\caption{\label{fig:method} (a) \emph{Visual illustration of the POLAR framework}. A set of pre-trained embeddings ($\mathbf{R}^{V\times d}$) represents the input to our approach, and we assume that an Oracle provides us with a list of polar opposites with which we generate the polar opposite space ($\mathbf{R}^{d\times N}$). We apply change of basis transform to obtain the final embeddings ($\mathbf{R}^{V\times N}$). Note that $V$ is the size of the vocabulary, $N$ is the number of polar opposites and $d$ is the dimension of the pre-trained embeddings. (b) \emph{POLAR transformation.} In this example the original size of the embeddings is three and we consider two polar opposites (cold, hot) and (hard, soft). In the first step (1) we obtain the direction of the polar opposites (vectors in the original space represented in \textcolor{blue}{blue}) which also represent the change of basis vectors for the polar subspace (represented by \textcolor{red}{red} dashed lines). In the second step (2) we project the original word vectors (`Alaska' in this case) to this polar subspace. After the transformation, `Alaska' gets aligned more to the (cold, hot) direction which is much more related to `Alaska' than the (hard, soft) direction (3).}
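The three steps in panel (b) of the caption above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the toy 3-dimensional vectors are invented, the direction of a pair is assumed to be the difference of its two word vectors, and the change of basis is assumed to be a least-squares projection onto the two polar directions. The function name `polar_coordinates` is hypothetical and is not taken from the source.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sub(u, v):
    """Component-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def polar_coordinates(word, pairs):
    """Least-squares coordinates of `word` along two polar directions.

    `pairs` holds two (first_pole, second_pole) tuples of word vectors.
    Assumption: each direction is first_pole - second_pole.
    """
    (a1, b1), (a2, b2) = pairs
    d1, d2 = sub(a1, b1), sub(a2, b2)
    # Gram matrix of the two directions (normal equations of the projection).
    g11, g12, g22 = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    det = g11 * g22 - g12 * g12
    if abs(det) < 1e-12:
        raise ValueError("polar directions are (nearly) collinear")
    r1, r2 = dot(d1, word), dot(d2, word)
    # Solve the 2x2 system G x = r by Cramer's rule.
    return [(g22 * r1 - g12 * r2) / det, (g11 * r2 - g12 * r1) / det]


# Invented toy embeddings (d = 3) for two polar opposites and one word.
cold, hot = [1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]
hard, soft = [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]
alaska = [0.9, 0.1, 0.3]

coords = polar_coordinates(alaska, [(cold, hot), (hard, soft)])
print(coords)  # first entry: (cold, hot) axis; second entry: (hard, soft) axis
```

With these toy vectors the coordinate along (cold, hot) is much larger than the one along (hard, soft), which mirrors the qualitative statement in the caption; the numbers themselves carry no meaning beyond the illustration.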

57 — 2001.10178

\caption{Average frontiers. The proposed method is in \textcolor{violet}{purple}, and the comparison method TPOT is in \textcolor{orange}{orange}. The training frontier is given as a dashed line, and the true testing frontier as a solid line. The training points in the frontier are indicated with a `.', and the testing points with a `*'. The ideal position is the bottom right (i.e. a score of 100 on the x-axis, and a complexity of 1 on the y-axis).}

\caption{Adaptive hyperparameter (y) over generations (x). Each grey line indicates the result from a single run. The dotted \textcolor{orange}{orange} line indicates the default (expert chosen) value in TPOT. The dotted \textcolor{violet}{purple} line shows the average rate for the adaptive method.}

58 — 2001.10484

\caption{Lossless bit rates of 14 bit mosaic test images (\textcolor{ao(english)}{\textit{green}} percentages show the improvement relative to PNG compression)}

\caption{Lossless bit rates of 8 bit mosaic test images (\textcolor{ao(english)}{\textit{green}} percentages show the improvement relative to PNG compression)}

59 — 2001.10507

\caption{Real part of eigenmode $\phi_{4,-5}$ in $\Omega = [0,2\pi)^2$ with associated eigenvalue $\omega_{4,-5}=0.11305798$ for \textred{$\vb = \left( 1.165939761, 1\right)^\top $} (black dashed line).}

60 — 2001.10525

\caption{\label{fig:CaOH_Levels} (a) Laser cooling scheme for CaOH. The vibrational structure depicted here indicates all levels that are addressed with lasers in order to limit the branching ratio to other vibrational states to \red{$4.5\times 10^{-4}$}. (b) Rotational structure of CaOH illustrating the 52 MHz spin-rotation splitting in the electronic ground state as well as the unresolved hyperfine structure (1.5 MHz and 7 kHz in the $J''=\frac{3}{2}$ and $J''=\frac{1}{2}$ states respectively \cite{scurlock1993hyperfine}). The $\tilde{X}^2\Sigma^+\left(v_1'' v_2'' v_3''\right) \rightarrow\tilde{A}^2\Pi_{1/2}\left(v_1' v_2' v_3'\right)$ $P_{1}(J''=\frac{3}{2})$ and $^PQ_{12}(J''=\frac{1}{2})$ rotationally closed transitions are shown \cite{di2004laser}. The parity of the ground states is indicated by the sign to the right of the $N''$ value while the parity of the excited states is indicated to the right of the $J'$ value. The rotational structure of the $\tilde{B}^2\Sigma^+(000)$ state is analogous to that of the $\tilde{X}^2\Sigma^+$ states and is not pictured. Rotational closure on repumping lines through this state is achieved by driving $P_1(J''=\frac{3}{2})$ and $^PQ_{12}(J''=\frac{1}{2})$ transitions to the $\tilde{B}^2\Sigma^+(N'=0, J' = \frac{1}{2},+)$ state. The level diagrams are not to scale.} \par\end{centering} \end{figure} \begin{figure*}[ht] \begin{centering} \includegraphics[scale = 1]{setup_v7.pdf} \caption{\label{fig:Experimental_Diagram} A rendering of the experimental apparatus. On the far left is the two-stage buffer-gas beam source, depicted in cut-away view for clarity. 35.5 cm from the exit of the buffer-gas cell, the molecular beam is collimated by a 3 mm square beam aperture. 39 cm from the cell, the molecules enter the interaction region where they are addressed with light from the main MO cooling beams in the vertical direction. Co-propagating vertically are the $(100), (200)$ and $(02^00)$ repumping lasers. 
The $(02^20)$ and $(01^10)$ repumping light is multipassed in the horizontal direction and extends beyond the MO region. A separate vertically multipassed region containing $(100)$ and $(02^00)$ repumping light lies after the magnetic field coils and serves to recover population from excited vibrational states. Finally, the molecules encounter a detection beam of smaller cross-section than the cooling and repumping light, and the resulting laser-induced fluorescence is collected and imaged onto an EMCCD.} \par\end{centering} \end{figure*}

In this Letter, we demonstrate radio frequency (RF) magneto-optical (MO) cooling and compression (1D MOT) of a beam of the polyatomic molecule $^{40}$Ca$^{16}$OH, an archetypal example of the broader class of MOR molecules. In doing so, we realize a cycling scheme capable of scattering $\sim$ 10$^3$ photons. We characterize the MO forces applied here by extracting force constants and damping rates. A concomitant on-axis increase in molecular density is observed. This demonstration of MO cooling establishes a route towards deep laser cooling and optical trapping for numerous species of polyatomic molecules.

Effective MO cooling and compression requires scattering many photons without losing population to states that do not couple to the laser light (``dark states''). Establishing such a cycling transition in molecules requires closing both vibrational and rotational degrees of freedom, as depicted in Fig.~\ref{fig:CaOH_Levels}. Vibrational decay is not governed by rigorous selection rules but instead by wavefunction overlap, which is quantified by Franck-Condon factors (FCFs). CaOH is an example of a broad class of polyatomic molecules that have been identified as promising candidates for laser cooling due to their diagonal FCFs and strong electronic transitions \cite{kozyryev2016proposal,kozyryev2019determination}.
The main laser cooling transition in CaOH is the $\tilde{X}^2\Sigma^+\left(000\right)\rightarrow\tilde{A}^2\Pi_{1/2}\left(000\right)$ transition with a natural linewidth of 2$\pi$ $\times$ 6.4 MHz at 626 nm \cite{CaOHAlifetime}. The highly diagonal FCFs of the $\tilde{A}^2\Pi_{1/2}\left(000\right)$ state suppress spontaneous decay to higher vibrational states during a single scattering event; nonetheless, significant optical pumping into excited vibrational states can occur when many photons are scattered. CaOH has three vibrational modes: a symmetric stretch, a doubly degenerate bend, and an antisymmetric stretch. These vibrational modes are labeled with four quantum numbers $\left(v_1,{v_2}^l,v_3\right)$, where $v_1$, $v_2$, and $v_3$ indicate the number of quanta in the symmetric stretching mode, the bending mode, and the antisymmetric stretching mode, respectively. $l$ labels the nuclear orbital angular momentum in the bending mode and takes values of $l=-v_2,-v_2+2,...,v_2$ \cite{herzberg1966molecular}. Five repumping lasers, listed in Table \ref{tab:Transitions}, are used to establish a quasi-closed cycling scheme and recover population in these states, as depicted in Fig. \ref{fig:CaOH_Levels}. Branching ratios within this cycling scheme are reported in the Supplemental Material. Notably, both the $\tilde{X}^2\Sigma^+\left(01^10\right)$ and $\tilde{X}^2\Sigma^+\left(02^20\right)$ states need to be repumped. Decays to these states are nominally forbidden by an approximate $\Delta l = 0$ selection rule that originates from the separation of electronic and vibrational degrees of freedom in the Born-Oppenheimer approximation. 
The breakdown of this selection rule has been observed previously for $\Delta l = 1$ transitions in CaOH (and other similar systems) and is attributed to a second-order process involving Renner-Teller mixing and spin-orbit coupling leading to intensity borrowing via the $\tilde{B}^2\Sigma^+\left(01^10\right)$ state \cite{brazier1985laser,kozyryev2019determination}. Decay to the $\tilde{X}^2\Sigma^+\left(02^20\right)$ state was previously unobserved. We attribute the magnitude of this decay to a similar mechanism that relies on the mixing of vibrational states within the $\tilde{A}^2\Pi_{1/2}$ manifold (see Supplemental Material). We measure the branching ratio out of this cycling scheme to be $4.5(\red{7}) \times 10^{-4}$, which is predicted to be dominated by decay to the $\tilde{X}^2\Sigma^+\left(12^00\right)$, $\tilde{X}^2\Sigma^+\left(12^20\right)$, and $\tilde{X}^2\Sigma^+\left(300\right)$ vibrational states. Details of this measurement will be the subject of a subsequent publication.

To avoid populating rotational dark states, each laser beam (main and all repumpers) contains two frequency components separated by the spin-rotation (SR) splitting of 52 MHz depicted in Fig.~\ref{fig:CaOH_Levels}(b). The hyperfine splitting is below the natural linewidth of the main cooling transition and does not require additional frequency sidebands \cite{scurlock1993hyperfine}. This type of transition ($J \rightarrow J' = J-1$) causes rapid optical pumping into magnetic dark states, significantly reducing the cooling and confining forces in molecular MOTs \cite{tarbutt2015magneto}. We address this by simultaneously switching both the laser polarization and the sign of the magnetic field gradient during cooling, which evolves magnetic dark states into bright states, as previously demonstrated in diatomic systems \cite{anderegg2017radio,norrgard2016submillikelvin,hummon20132d}.
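The quoted branching ratio and linewidth can be turned into a rough photon budget. The sketch below assumes that every scattering event is independent and carries the same loss probability $p = 4.5\times 10^{-4}$, so the number of photons scattered before loss is geometrically distributed with mean $1/p$; this simplified model and the function names are illustrative assumptions, not part of the source. It also converts the natural linewidth $\Gamma = 2\pi \times 6.4$ MHz into an excited-state lifetime $\tau = 1/\Gamma$.

```python
import math

# Values quoted in the text.
LOSS_PER_PHOTON = 4.5e-4   # measured branching ratio out of the cycling scheme
LINEWIDTH_MHZ = 6.4        # natural linewidth / (2*pi), in MHz


def mean_photons_before_loss(p: float) -> float:
    """Mean number of scattered photons before leaving the cycle (geometric model)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return 1.0 / p


def surviving_fraction(p: float, n_photons: int) -> float:
    """Fraction of molecules still in the cycle after n_photons scattering events."""
    if n_photons < 0:
        raise ValueError("n_photons must be non-negative")
    return (1.0 - p) ** n_photons


def lifetime_ns(linewidth_mhz: float) -> float:
    """Excited-state lifetime tau = 1 / Gamma with Gamma = 2*pi*linewidth, in ns."""
    return 1e3 / (2.0 * math.pi * linewidth_mhz)


print(f"mean photons before loss: {mean_photons_before_loss(LOSS_PER_PHOTON):.0f}")
print(f"fraction left after 1000 photons: {surviving_fraction(LOSS_PER_PHOTON, 1000):.2f}")
print(f"excited-state lifetime: {lifetime_ns(LINEWIDTH_MHZ):.1f} ns")
```

Under this model the mean is roughly 2200 photons, which is consistent in order of magnitude with the $\sim 10^3$ photons quoted for the cycling scheme.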
\begin{table}
\centering
\begin{tabular}{| r c l | c |}
\hline
\multicolumn{3}{|c|}{Transition} & Wavelength (nm) \\
\hline
$\tilde{X}^2\Sigma^+\left(000\right)$ & $\rightarrow$ & $\tilde{A}^2\Pi_{1/2}\left(000\right)$ & 626.4 \\
$\tilde{X}^2\Sigma^+\left(100\right)$ & $\rightarrow$ & $\tilde{B}^2\Sigma^+\left(000\right)$ & 574.3 \\
$\tilde{X}^2\Sigma^+\left(200\right)$ & $\rightarrow$ & $\tilde{A}^2\Pi_{1/2}\left(100\right)$ & 650.4 \\
$\tilde{X}^2\Sigma^+\left(02^00\right)$ & $\rightarrow$ & $\tilde{A}^2\Pi_{1/2}\left(100\right)$ & 629.0 \\
$\tilde{X}^2\Sigma^+\left(02^20\right)$ & $\rightarrow$ & $\tilde{A}^2\Pi_{1/2}\left(100\right)$ & 630.0 \\
$\tilde{X}^2\Sigma^+\left(01^10\right)$ & $\rightarrow$ & $\tilde{B}^2\Sigma^+\left(000\right)$ & 566.0 \\
\hline
\end{tabular}
\caption{Optical transitions and corresponding wavelengths driven to form a quasi-closed cycling transition in CaOH. The $\tilde{X}^2\Sigma^+\left(000\right)\rightarrow \tilde{A}^2\Pi_{1/2}\left(000\right)$ transition is the main cooling line while the other five frequencies correspond to vibrational repumping lasers.}

61 — 2001.10878

\caption{\textbf{(a)}: Relation between amplitude and period for stars with periods greater than 6 days, thus dominated by SRs and Mira variables. The dashed line marks the approximate period threshold, 60 days, above which substantial pulsation-triggered mass loss is expected. \textbf{(b)}: Relation between amplitude and \pextreme\ for the same stars. All the long secondary period variables and Miras in the entire sample are highlighted, while only two representative SRs are shown.}

\caption{\textbf{(a)} Period-$M_K$ diagram of OGLE LPVs (black points) in the LMC \citep[dominant mode only,][]{soszynski2009a}. For OGLE small amplitude red giants, sequences $a_2$, $a_3$, and $a_4$ denoting AGB stars are shown in blue lines, while sequences $b_2$ and $b_3$ denoting red-giant-branch (RGB) stars are shown in red lines (the line parameters were adopted from Table 1 of \citealt{soszynski2007a}). \textbf{(b)} Similar to panel \textbf{a}, now including the \kep\ LPVs (note the difference in the scale of the vertical and horizontal axes). Symbol colors have the same meaning as in Figure~\ref{fig:periodamplitude}b. Miras with Gaia DR2 parallaxes better than 30\% are highlighted in the dark blue circles. The red line denotes the Period-$M_K$ relation for Miras \citep{feast1996a}. \textbf{(c)} Uncertainties of $M_K$ for the OGLE LPVs. \textbf{(d)} Uncertainties of $M_K$ for the \kep\ LPVs.}

\caption{Comparison of the dominant mode amplitudes measured from full-length and 1/3-length \kep\ light curves. The top panel displays the amplitude ratio as a function of the amplitude measured from full-length light curves, while the bottom panel shows the ratio against period, also determined from full-length light curves. The running median values are shown in black and their 3-$\sigma$ uncertainties are shown in the cyan filled region. Green and red symbols have the same meaning as in Figure~\ref{fig:periodamplitude}a. Miras are highlighted with the pink asterisks.}

\caption{\redbf{Similar to Figure \ref{fig:modelifttime}, now using the $I$-band light curves of 3383 SRs (red diamonds) and 499 Miras (pink asterisks) in the OGLE-III catalog \citep{soszynski2009a}. These stars are selected to have light curve coverage longer than 10 years and a duty cycle greater than 0.4.}}

\caption{Stacked power spectra of high-luminosity red giants with $0.14\ \muHz\leq\numax\leq10.54\ \muHz$. The stacked spectra are shown in four panels so as to highlight, in different \numax\ ranges, clear ridges associated with multiple angular degrees over several radial orders. Each horizontal band represents one power spectrum with the power color-coded. The ordinate axis is not linear in \numax, hence the different ridge curvatures in the different panels. For each radial order $n\geq3$, as indicated at the top of each panel, $l=1,2,0$ modes lie along the left, middle, and right ridge, respectively.}

62 — 2001.10929

\caption{\textit{A cat drinks water.} Simplified AMR graph and underlying deep form with \textit{is-instance} relations (\protect\includegraphics[scale=0.5]{pics/insancerel-crop.pdf}) from variables (solid) to concepts (dashed).}

\caption{Examples where \textsc{S$^2$match} assigns a higher score, accounting for the similarity of \colorbox{yellow}{aligned concepts}.}

\caption{\textit{`6 Abu Sayyaf suspects were captured last week in a raid in Metro Manila.'} \textcolor{darkgoldenrod}{gold} (top) vs.\ parsed AMR (bottom). \textsc{Smatch} aligns \textit{criminal-organization} to \textit{city} (\textcolor{red}{red}); \textsc{S$^2$match} aligns \textit{criminal-organization} to \textit{suspect-01}, \textit{city} to \textit{country-region} (\textcolor{blue}{blue}).}

63 — 2001.11180

\caption{The overall view of Flow-Fuse-Tracker~(FFT) for multiple object tracking. FlowTracker and FuseTracker (in bold grey boxes) are the two DNN modules of our FFT network. In the FlowTracker, the optical flow is generated from two sequential frames, and the target bboxes $\{b^k_{t-1}\}$~(\textcolor{green}{green} dashed bboxes) at frame $t-1$ are regressed to the bboxes $\{b^k_t\}$ at frame $t$ through the optical flow. In the FuseTracker, the bboxes from both $\{b^k_t\}$ and public detections $D_t$~(\textcolor{red}{red} bboxes) at frame $t$ are refined and fused. The FuseTracker outputs the final tracking results~(\textcolor{green}{green} bboxes with shadow). (\textbf{Best viewed in color})}

\caption{The jittering of the bounding boxes. The \textcolor{red}{red} bboxes are the ground truth bboxes. We apply the random jittering to them to slightly shift their positions and change their shapes, and we use the jittered bboxes (\textcolor{green}{green} dashed bboxes) in training the FlowTracker.}

64 — 2001.11263

\caption[]{Normalized mean squared error (NMSE) estimated from simulated data. The results are reported for different numbers of microphone observations $n_{mic}$, i.e., (\bluelegend): $n_{mic} = 5$, (\orangelegend): $n_{mic} = 15$, (\redlegend): $n_{mic} = 35$, and (\brownlegend): $n_{mic} = 55$. (Color online)}

\caption[]{Mean structural similarity index (MSSIM) estimated from simulated data. The results are reported for different numbers of microphone observations $n_{mic}$, i.e., (\bluelegend): $n_{mic} = 5$, (\orangelegend): $n_{mic} = 15$, (\redlegend): $n_{mic} = 35$, and (\brownlegend): $n_{mic} = 55$. (Color online)}

\caption[]{Normalized mean squared error (NMSE) in dB estimated from experimental data. Top and bottom plots correspond to different source locations. The results are reported for different numbers of microphone observations $n_{mic}$, i.e., (\bluelegend): $n_{mic} = 5$, (\orangelegend): $n_{mic} = 15$, (\redlegend): $n_{mic} = 35$, and (\brownlegend): $n_{mic} = 55$. (Color online)}

\caption[]{Mean structural similarity index (MSSIM) estimated from experimental data. Top and bottom plots correspond to different source locations. The results are reported for different numbers of microphone observations $n_{mic}$, i.e., (\bluelegend): $n_{mic} = 5$, (\orangelegend): $n_{mic} = 15$, (\redlegend): $n_{mic} = 35$, and (\brownlegend): $n_{mic} = 55$. (Color online)}

\caption[]{Best and worst performing sampling distributions for 6 microphones in terms of NMSE performance. The results are shown for different frequencies in a real room where the source location is the same as the top plots in Figures \ref{fig:NMSE_real} and \ref{fig:SSIM_real}. Symbol (\redlegendcirc) represents the microphone locations. (Color online)}

65 — 2001.11490

\caption{\label{fig:sk} Two schematic views: (left) the Schwinger-Keldysh contour in the complex time plane with the $\tau$ path integral~(\protect \redfcir) discretized with lattice spacing $\Delta \tau$ and the $t$ path integral~(\protect \bluefcir) trotterized with step $\Delta t$.~(\protect \blackcir) are the matching points between the two path integrals, while~(\protect \blackecir) correspond to inserting $\mathcal{O}$; (right) cartoon of how the StN can be improved through smearing, $\Box[\Psi]=\tilde{\Psi}$, and how interpolators $P$ permit the preparation of configurations overlapping with non-thermal states.}

66 — 2001.11846

\caption{The network topology of quaternionic recurrent {\color{blue} correlation} and {\color{red} projection} neural networks.}

67 — 2001.11889

\caption{Root-mean-square horizontal (circles) and vertical (up triangles) velocities at mid-height \red{($z=0.5$)} at $Pr=0.1$ (red symbols) and $5$ (green symbols) as a function of $R\red{=} Ra/Ra_c$. Inset: kurtosis \red{of vertical velocity $K_w$} at mid-height (\red{$z=0.5$;} squares) and at the Ekman BL \red{thickness} (\red{$z=\delta_\nu$;} right triangles). The \red{horizontal} dashed line at $K_w=3$ indicates Gaussian kurtosis. \red{Vertical d}ash-dotted and dashed green lines are the predicted transitions from convective columns (T) to plumes (P) in Refs. \cite{cheng2015laboratory} and \cite{nieves2014statistical}, respectively, and the \red{vertical} dotted red lines are our estimated transitions to LSVs. Symbols with black edges \red{represent LSV flow states}. The orange stars are the cases \red{selected for further analysis and comparison.}}

\caption{Snapshots of the horizontal kinetic energy \red{for four different cases in terms of Prandtl number, supercriticality and boundary condition.} Cases (a-c) are indicated in \cref{fig:velrms_wkurt} with orange stars. \red{Case} (d) is at the same $Pr$, $Ra$ and $Ek$ (and same $R$) as (a), but with SF \red{BCs}. For clarity, the domains are stretched horizontally by a factor of (a,d) $3.1$ and (b,c) $4.5$.}

\caption{\red{S}hell-to-shell energy transfer function $T(Q,K)$ \red{for the four selected cases.} \red{Note} the \red{different} limits of the color bar for each instance\red{; t}he color scale is chosen to highlight the main energy transfers.}

\caption{\red{Vertical profiles of k}inetic energy budget \red{terms (Eq. \ref{eq:budget})} (a-d) \red{in} the bulk and (e-h) close to the bottom wall. \red{Note} the change in scale of the horizontal axes for the different panels. The vertical coordinate is rescaled by the \red{corresponding} viscous BL thickness $\delta_\nu=\mathcal{O}(Ek^{1/2})$, except for the SF case where this BL is absent and we use the $\delta_\nu$ of its NS counterpart. All profiles are symmetric about mid-height ($z/\delta_\nu\sim400$). The blue and red \red{horizontal} lines depict the viscous and thermal BLs, respectively. The latter is outside the plotting interval in (e) at $z/\delta_\nu=8$ and in (h) at $z/\delta_\nu=6$.}

68 — 2002.00162

\caption{\textcolor{blue}{The system output $y=x_1=z_1$ and virtual error $z_2$.}}

\caption{\textcolor{blue}{The estimates $\hat{\theta}_{v,1}$ and $\hat{\theta}_{v,2}$.}}

\caption{\textcolor{blue}{The estimates $\hat{D}$ and $\hat{\rho}_1$.}}

\caption{\textcolor{blue}{The control input $u$.}}

\caption{\textcolor{blue}{The control input $u$ with control scheme involving $\mathrm{sign}(\cdot)$ in \cite{liu2017adaptive} and with scheme in this paper.}}

\caption{\textcolor{blue}{The output tracking error $z_1$ with control scheme involving $\mathrm{sign}(\cdot)$ in \cite{liu2017adaptive} and with scheme in this paper.}}

\caption{\textcolor{blue}{The virtual error $z_2$ with control scheme involving $\mathrm{sign}(\cdot)$ in \cite{liu2017adaptive} and with scheme in this paper.}}

\caption{\textcolor{blue}{The control input $u$ with control scheme involving $\mathrm{arctan}(10\cdot)$ in \cite{liu2017adaptive} and with scheme in this paper.}}

\caption{\textcolor{blue}{The output tracking error $z_1$ with control scheme involving $\mathrm{arctan}(10\cdot)$ in \cite{liu2017adaptive} and with scheme in this paper.}}

\caption{\textcolor{blue}{The virtual error $z_2$ with control scheme involving $\mathrm{arctan}(10\cdot)$ in \cite{liu2017adaptive} and with scheme in this paper.}}

\caption{\textcolor{blue}{The state variables of uncontrolled system (\ref{eq:eq3nd}) with \emph{green dot} and \emph{red dot} representing initial condition and the origin respectively.}}

\caption{\textcolor{blue}{The state variables of (\ref{eq:eq3nd}) under proposed control scheme with \emph{green dot} and \emph{red dot} representing initial condition and the origin respectively.}}

\caption{\textcolor{blue}{The virtual errors $z_2$ and $z_3$.}}

69 — 2002.00196

\caption{16-QAM transmit beampattern \textcolor{blue}{is given} for a DFRC system with a $16$-element transmit antenna array. Communication is realized at a $43^{\circ}$ azimuth angle while radar operation occurs around $0^{\circ}$.}

\caption{Possible topologies for the \textcolor{blue}{IoR}.}

\caption{\textcolor{blue}{Possible JRC Solutions for IoR Application Areas}}

\caption{Observational Learning in \textcolor{blue}{IoR}}

70 — 2002.00460

\caption{\small The reason and contribution factor analysis on some outfits via a model with cross-entropy reason-regularization. At the top of each mini-table, \emph{good/good} means ground-truth/predicted judgment, and the ground-truth reason is also shown. In each mini-table, the three columns represent the components of the factors \rcolor, \rprint and \rattribute, respectively. Rows marked \emph{C} give the contribution (see Equation~\eqref{eq:contrib}) of the predicted judgment, rows marked \emph{G} give the reasons for \good~(see Equation~\eqref{eq:reason_good}), and rows marked \emph{B} give the reasons for \bad~(see Equation~\eqref{eq:reason_bad}). }

71 — 2002.00632

\caption{Adaptive optimization with a surrogate objective. \textcolor{red}{change name.}}

72 — 2002.00785

\caption{\color{red}$1/\omega_g$ \color{black}and \color{blue}$\overline P_\pm$ \color{black}plotted against $b[V]$, a monotonic function of $V$.}

73 — 2002.00867

\caption{Comparison on retrieval task with state-of-the-art deep learning based unsupervised methods. ``R'' denotes ``rotation''. ``\&'' means that two deformations are applied simultaneously. The $1^{st}$/$2^{nd}$~best results on column basis are indicated in \textcolor{red}{red}/\textcolor{blue}{blue}.}

\caption{Sketch retrieval ablation study on our proposed self-supervised representation learning framework. ``\&'' means that two deformations are applied simultaneously. The $1^{st}$/$2^{nd}$~best results on column basis are indicated in \textcolor{red}{red}/\textcolor{blue}{blue}.}

\caption{Sketch retrieval ablation study on the contribution of dual-branch CNN-TCN to rotation-based self-supervised learning. ``R'' denotes ``rotation''. The $1^{st}$/$2^{nd}$~best results on column basis are indicated in \textcolor{red}{red}/\textcolor{blue}{blue}.}

\caption{Comparison on sketch recognition with the state-of-the-art deep learning based unsupervised methods. ``R'' denotes ``rotation''. ``\&'' means that two deformations are applied simultaneously. The $1^{st}$/$2^{nd}$~best results are indicated in \textcolor{red}{red}/\textcolor{blue}{blue}.}

74 — 2002.00917

\caption{Eigenvalues (\textcolor{blue}{'+'}) of $C_{0}^{-1}E_{s}$ for a 3D Laplacian matrix discretized on a $20^3$ grid with the zero Dirichlet boundary condition where the number of subdomains $s =5$.}

75 — 2002.01461

\caption{An example of user detection and mapping results on the Dequindre Cut. A {\it pedestrian} is denoted as {\color{red} $\bullet$}, a {\it cyclist} is denoted as {\color{blue}x}, and top and bottom pixels are denoted by {\color{magenta} $\boldsymbol{\cdot}$}.}

\caption{Map of Cullen Plaza and camera calibration with reference points. {\color{red} $\bullet$}: selected pixels on image, {\color{blue} $\circ$}: projection of the world locations onto pixel coordinates using camera parameters.}

76 — 2002.01553

\caption{ \newtexthighlight{ (a) A schematic overview of CPH, as explained in Section \ref{sec:MCKP}: From each group, CPH selects one configuration~(represented by its utility $U$ and resource consumption $r$) while ensuring that the resource constraint is not violated. To reduce the size of each partial configuration, at each step, CPH removes configurations that are Pareto-dominated or violate the resource constraints. (b) CPH adapted to our quality selection problem: The clients that request the same content are merged without Pareto elimination. This variant of CPH keeps the shaded configurations in Step 1 because these configurations might later yield the maximum utility when another client requests the same content. In this example, the maximum utility includes the partial configuration~(20, 1700), which standard CPH would eliminate, resulting in a lower utility of 27 instead of 35. } \label{fig:cph-all} }

77 — 2002.01609

\caption{Qualitative comparison between the vanilla model and the proposed model (end-to-end). A {\color{green} green} box represents a true positive, whereas a {\color{red} red} box indicates a false positive. (G$^*$ stands for GMAC$^*$ and includes the computations of both the OMG and OD networks.)}

78 — 2002.01674

\caption{Primary beam corrected $30.72$\,MHz bandwidth difference image of ALOS centered at $87.675$\,MHz. ALOS is a remote sensing satellite orbiting at an altitude of about $690$\,km and has an RCS of $13.6$\,m$^2$. The satellite also has large solar panels, which, when fully deployed, have an RCS of $66.0$\,m$^2$.}

\caption{Primary beam corrected $30.72$\,MHz bandwidth difference image of UKube-1 centered at $87.675$\,MHz. UKube-1 is a 3-unit CubeSat. The figure also shows the box made by the automated DSNRS script used for integrating flux density in the head and the tail of the streak.}

79 — 2002.01782

\caption[excess]{Average noise magnitude in the frequency domain for 1 $\mu$s and 2 $\mu$s is shown. Three sources of excess noise are illustrated on the left panel: Low voltage regulator noise, cathode HV harmonic noise, and 900 kHz burst noise. \textcolor{blue}{Add label for 36 and 108 kHz noise as well. And signal is already removed from this.}}

80 — 2002.01956

\caption{Additional \aastex\symbols}

81 — 2002.02161

\caption{Bulk modulus vs temperature: $\bigcirc$ QSA; {\color{red}$\blacksquare$} ITE; {\color{green}$\boldsymbol{-\!\!-}$} EOS;{\color{blue} $\meddiamond\!\!\!\!\!\!=$} experiment}

82 — 2002.02274

\caption{\error and \imbalancecolor in the $C=2$ color case for various datasets and different thresholds $\theta$ for the quantile used for positive edges. Notice how our algorithm \match + \localsearch has cost comparable to \pivot and not much higher than \localsearch while reducing the imbalance from up to $65\%$ for the unfair algorithms to 0.}

83 — 2002.02315

\caption{Diagram of a communication system. Our proposed Graph Permutation Selection (GPS) block and the permutation function $\pi$ are colored in \textcolor{green}{green}.}

84 — 2002.02362

\caption{Selective patch level performances \textcolor{red}{will add CNN}.}

\caption{Model level performance with different conditions \textcolor{red}{TBA}.}

85 — 2002.02487

\caption{Tags selected by our algorithm for the harmful cluster in the Threat dataset. \red{Red} is intrinsic threat. \blue{Blue} is suggestive. Black is due to a lack of sufficient background.}

86 — 2002.02651

\caption{Direct comparison with and without regularization block. Models that include \textit{Class Regularization} are in \textcolor{orange}{orange}. Reported accuracy rates (top-1 \%) achieved on Kinetics-400, UCF-101 and HMDB-51 datasets on the validation sets (split 1 for UCF-101 and HMDB-51), with all networks being re-trained with the same settings. All models use inputs of size $16 \times 112 \times 112$ for Kinetics and $16 \times 224 \times 224$ for UCF-101 and HMDB-51. Initially, all networks are trained for 170 epochs. During fine-tuning, we trained for 100 epochs.\label{table:table2}}

87 — 2002.03206

\caption{Regularities and exceptions in (a) a two-class input space and (b) in a latent space with chairs and non-chairs. (c) Regularities (\color{figorange}high \color{black} \cscores) and exceptions (\color{figblue}low \color{black} \cscores) in ImageNet.}

\caption{The joint distribution of \cscore per-class means and standard deviations on ImageNet. Image samples from representative classes (indicated by \textcolor{red}{$\star$}'s) are shown in Figure~\ref{fig:imagenet-per-class-egs}.}

88 — 2002.03500

\caption{Two adversarial examples of MIFGSM \protect\cite{DongCVPR2018}, Gaussian Blur \protect\cite{rauber2017foolbox} and our AB$\kern-3pt$\textcolor{mygray}{B}A. MIFGSM produces apparent noise on all 3 cases. Gaussian blur-based method loses most of the details carrying out the attack. Our AB$\kern-3pt$\textcolor{mygray}{B}A generates visually natural motion blur with high attack success rate.}

\caption{From left to right: original image, adversarial examples generated by our kernel-prediction-based attack and motion-based adversarial blur attack~(AB$\kern-3pt$\textcolor{mygray}{B}A).}

\caption{The success rate of AB$\kern-3pt$\textcolor{mygray}{B}A \wrt the variation of both $\epsilon$ and $\epsilon_\theta$ in Eq.~(\ref{eq:motion_adv_obj}) where $\epsilon$ is within $[5,50]$ with step size $5$ and $\epsilon_\theta$ is in $[0,1]$ with step size $0.1$.}

\caption{Up: two examples of AB$\kern-3pt$\textcolor{mygray}{B}A$_\text{pixel}$, AB$\kern-3pt$\textcolor{mygray}{B}A$_\text{obj}$, AB$\kern-3pt$\textcolor{mygray}{B}A$_\text{bg}$, AB$\kern-3pt$\textcolor{mygray}{B}A$_\text{image}$, and AB$\kern-3pt$\textcolor{mygray}{B}A. Bottom: Success rates of our method with respect to the object motion directions. }

\caption{Adversarial comparison results on NeurIPS'17 adversarial competition dataset according to the success rate. There are two comparison groups. For the first one, we compare the effects of attacking different regions, \ie, object or background regions, of inputs for FGSM, MIFGSM, GaussBlur, DefocusBlur, and our {\bf AB$\kern-3pt$\textcolor{mygray}{B}A}. In addition to above methods, the second group comparison contains Interpretation-based blur~($\mathrm{Interp}_{blur}$), Interpretation-based noise~($\mathrm{Interp}_{noise}$)~\protect\cite{FongICCV2017}, DIM~\protect\cite{XieCVPR2019}, and TIMIFGSM~\protect\cite{DongCVPR2019}. The results of DIM and TIMIFGSM are cited from \protect\cite{DongCVPR2019} where the Xception model is not included. We highlight the top three results with \first{red}, \second{green}, and \third{blue}, respectively.}

\caption{The left subfigure shows the interpretable maps of six adversarial examples generated by FGSM, MIFGSM, and AB$\kern-3pt$\textcolor{mygray}{B}A, respectively, with four models. The right subfigure shows the transferability \& consistency distributions of adversarial examples generated by the three attacks.}

89 — 2002.03528

\caption{ An illustration of the multi-body pose graph structure under our setting between a pair of consecutive frames. Nodes in {\color{blue} blue} correspond to the primary pose graph i.e., pose graph for the ego-motion. Nodes in {\color{green} green} correspond to the secondary pose graph i.e., pose graph for the dynamic objects in the scene. }

\caption{Dynamic multibody SLAM results on KITTI-Tracking sequences. \emph{Col 1} illustrates the input image stream with bounding boxes to specify the vehicles being mapped in \emph{Col 2} and \emph{Col 3}. While \emph{Row 1} and \emph{Row 3} illustrate our performance on multi-vehicle road plane scenarios, \emph{Row 2} shows our results for a vehicle far away from the camera over a long sequence. The black vehicle represents the ego-vehicle and the {\color{red}red}, {\color{blue}blue} and {\color{green}green} plots represent the unique vehicle instances in the scene. The dotted plots in the same colors represent the corresponding ground truths. }

\caption{Odometry estimations for ORB scaled with our method ({\color{blue}blue}). The GPS/IMU trajectory is indicated in {\color{red}red} and ORB at its own scale is indicated in {\color{yellow}yellow}. The figure illustrates that our method for estimating odometry is proficient on sharp turns and long sequences.}

\caption{ \textbf{Pipeline}: We obtain dynamic-vehicle localizations via the modules explained in the {\color{blue}blue} section. The corresponding mathematical representations can be found in \ref{subsec:pspipeline} and \ref{subsec:mobili}. The {\color{green}green} section illustrates our approach to obtain accurate odometry estimations in metric scale, as explained in \ref{subsec:odometry}. The {\color{orange}orange} section illustrates a part of the pose-graph structure where the {\color{gray}gray}, the {\color{orange}orange} and the {\color{purple}purple} nodes represent the nodes for the ego-car and two dynamic vehicles in the scene, respectively. Moreover, the {\color{black}black}, the {\color{blue}blue} and the {\color{red}red} edges represent the camera-camera, vehicle-vehicle and camera-vehicle edges, respectively.}

90 — 2002.03646

\caption{In these electrocardiogram data, it is important for models (\textcolor{blue}{blue}) to accurately detect the QRS complex (Q is before the peak, R is the peak marked in \textcolor{red}{red}, S is the local minimum after the peak, other states o1--o6). \textbf{Top:} Previous model of \citet{PanTompkins1985} mistakenly predicts S at the peak. \textbf{Bottom:} proposed constrained change-point model accurately predicts R at each peak.}

91 — 2002.03824

\caption{(a) Three traps create three rings of magnetic nanoparticles. (b) The rings interact with one another (see \textcolor{urlblue}{Visualization 1}, \cite{Masajada:13}).}

92 — 2002.04328

\caption{ P-values of the DM test between TAR\{1; [1, 1, 1, 1]\} and the VAR(1). Yellow rectangles denote p-values lower than 0.05. Red crosses (\textcolor{red}{$\mathbf{\times}$}) refer to cases in favour of the TAR model while black circles (\tikzcircle{3pt}) refer to cases in favour of the VAR.}

\caption{ P-values of the DM test between TAR\{1; [1, 1, 1, 1]\} and the VAR(1). Yellow rectangles denote p-values lower than 0.05. Red crosses (\textcolor{red}{$\mathbf{\times}$}) refer to cases in favour of the TAR model while black circles (\tikzcircle{3pt}) refer to cases in favour of the VAR.}

93 — 2002.04479

\caption{Single image results obtained on test images from the Make3D dataset. Each result contains the following four images (from left to right): original photograph, ground truth depth from the dataset, our inferred depth, and our synthesized anaglyph \protect\includegraphics[height=5pt]{fig/anaglyph.jpg} image. The depth images are shown in log scale. Darker pixels indicate nearby objects (black is roughly 1m away) and lighter pixels indicate objects farther away (white is roughly 80m away). Each pair of ground truth and inferred depths are displayed at the same scale. }

\caption{Video results obtained on test images for each building in our stereo-RGBD dataset (buildings 1-4, from left to right and top to bottom). For each result (from left to right): original photograph, ground truth depth, our inferred depth, ground truth anaglyph \protect\includegraphics[height=5pt]{fig/anaglyph.jpg} image, and our synthesized anaglyph image. Because the ground truth 3D images were recorded with a fixed interocular distance (roughly 5cm), we cannot control the amount of ``pop-out,'' and the 3D effect is subtle. However, this is a parameter we can set using our automatic approach to achieve a desired effect, which allows for an enhanced 3D experience. Note also that our algorithm can handle multiple moving objects (\emph{top}). Additional results are shown in the supplemental files. \vspace{2mm} }

\caption{Several clips from the feature film {\it Charade}. Each result contains (from top to bottom): the original frames, estimated depth, and estimated anaglyph \protect\includegraphics[height=5pt]{fig/anaglyph.jpg} automatically generated by our algorithm. Some imperfections in depth are conveniently masked in the 3D image due to textureless or less salient regions. }

94 — 2002.04678

\caption{ Ratings of \textsc{NLU-refer}~(\textcolor{RoyalBlue}{Blue}), \textsc{Vision Engine}~(\textcolor{Maroon}{Red}), and \textsc{NLU-attribute/value}~(\textcolor{YellowOrange}{Yellow}). (Strongly) Agree/Disagree were chosen according to a statement ``I found it difficult for the chatbot to use \texttt{<feature>}''. \texttt{<feature>} is the system feature description. }

\caption{Number of turns to perform the 1st edit (\textcolor{RoyalBlue}{Blue}) and the 2nd edit (\textcolor{Maroon}{Red}).}

\caption{ User feedback on what they liked about our system. We divided the feedback into 5 categories: (i) Easy to use (\textcolor{RoyalBlue}{Blue}), (ii) Quick (\textcolor{Maroon}{Red}), (iii) Capable (\textcolor{YellowOrange}{Yellow}), (iv) Experience (\textcolor{OliveGreen}{Green}), and (v) Other~(\textcolor{Orange}{Orange}).}

95 — 2002.04724

\caption{Illustrations comparing our methods to the baseline. (1) CR-GAN~\citep{CRGAN} is the baseline, with consistency regularization applied only between real images and their augmentations. (2) In Balanced Consistency Regularization (bCR-GAN), we also introduce consistency regularization between generated fake images and their augmentations. With consistency regularization on both real and fake images, the discriminator is trained in a balanced way and fewer augmentation artifacts are generated. (3) Furthermore, we propose Latent Consistency Regularization (zCR-GAN), where latent $z$ is augmented with noise of small magnitude. Then, for the discriminator, we regularize the consistency between corresponding pairs, while for the generator we encourage the corresponding generated images to be more diverse. Note that \textcolor{blue}{\{$\rightarrow\leftarrow$\}} indicates a loss term encouraging pairs to be closer together, while \textcolor{red}{\{$\leftarrow\rightarrow$\}} indicates a loss term pushing pairs apart. }

96 — 2002.04898

\caption{Left: scatter plot of benzene versus temperature. Right: scatter plot of nitrogen oxide versus temperature. The symbols \textcolor{gray}{$\bullet$} $\scriptstyle{\textcolor{gdg}{\blacklozenge}}$ \textcolor{darkgreen}{$\blacktriangle$} denote weights in $(0, 0.33],\ (0.33, 0.66]\ \text{and}\ (0.66, 1]$ respectively. }

97 — 2002.04942

\caption{(a) Typical dimple shapes for different impact conditions in the multi-dimple regime corresponding to the circled red numbers in (b). Multi-pinch-off dimples: \raisebox{.5pt}{\textcircled{\raisebox{-.9pt}{1}}} $D=1.16$ mm, $U=1.7$ m/s, $Fr=259$, $We= 493$; \raisebox{.5pt}{\textcircled{\raisebox{-.9pt}{2}}} $D=1.02$ mm, $U=2.1$ m/s, $Fr=421$, $We= 617$; \raisebox{.5pt}{\textcircled{\raisebox{-.9pt}{3}}} $D=0.93$ mm, $U=2.05$ m/s, $Fr=463$, $We= 560$ and singular telescopic dimple: \raisebox{.5pt}{\textcircled{\raisebox{-.9pt}{4}}} $D=0.73$ mm, $U=2.38$ m/s, $Fr=792$, $We= 593$. The scale bars are 100 $\mu$m long. (b) Characterization of the dimples and jets in {\it Fr-We} space for drop impacts of immiscible liquids. The two dashed curves are the bounds of the regular bubble entrapment measured by \cite{pumphrey1990entrainment,oguz1990bubble}. The two solid curves mark the bubble entrapment region based on our study. (c) Enlarged region corresponding to the rectangular dashed box in (b).
The symbols correspond to different dimple shapes: (\textcolor{mypink1}{\textbigcircle}) no pinch-off shallow dimple; (\textcolor{red}{$\times$}) first critical pinch-off (first singular jet) at the boundary between no and one bubble pinch-off; (\textcolor{red}{$\triangle$}) tiny bubble pinched off near first critical pinch off; (\textcolor{mypink2}{$\triangle$}) dimple pinch-off with bubble going out with jet; (\textcolor{red}{$\plus$}) secondary critical pinch-off between bubble going out with jet and bubble entrapped in PP1 drop; (\textcolor{red}{$\triangledown$}) tiny bubble pinched off near secondary critical pinch-off; (\textcolor{red}{$\largewhitestar$}) singular telescopic dimple; (\textcolor{black}{$\triangledown$}) pinched-off bubble entrapped in PP1 drop; (\textcolor{blue}{$\square$}) liquid column break-up without dimple pinch-off; (\textcolor{green}{$\Diamond$}) water entrapped in PP1 drop without pinch-off. The dashed cyan lines mark the region of multi-dimples.}

\caption{Scaling of the dimple radius vs time before pinch-off. There is a transition of power-law exponents from 2/3 to 0.55 closest to the pinch-off. The background shading marks the validity of each, with the arrow indicating the approximate cross-over time $t_c$. The data are taken from two video clips spanning time-scales from 100 ns to 200 $\mu$s before pinch-off. The corresponding log-log plots are included in Suppl. Materials. The inset shows how $t_c$, normalized by the impact time $D/U$, changes with $We$ for dimple pinch-off (\textcolor{mypink2}{$\triangle$} \& $\triangledown$) and singular jets (\textcolor{red}{$\times$}, \textcolor{red}{$\plus$} \& \textcolor{red}{$\largewhitestar$}). The vertical arrows indicate that these are lower bounds.}

98 — 2002.04943

\caption{\red{Phase ($\Phi$) spectra as a function of the RF frequency collected for a typical image. A complete image acquisition is constituted by X, Y, R, and $\Phi$ spectra per each pixel of the 2D imaging area.}}

\caption{Imaging of low-conductivity \SI{5}{\milli\litre} calibrated solutions. \textbf{(a)} Phase image of the \SI{9.1\pm0.1}{\siemens\per\meter} sample. \textbf{(b)} Phase image of the \SI{4.5\pm0.1}{\siemens\per\meter} sample. \textbf{(c)} Phase image of the \SI{0.91\pm0.01}{\siemens\per\meter} sample. \red{The red circles mark the position and the extent of the Petri dishes.}}

\caption{\red{Stability of sub-\si{\siemens\per\meter} imaging.} \textbf{(a)} Distribution of the maximum recorded phase change ($\Delta \Phi_{\text{max}}$) across 51 images of the \SI{0.91}{\siemens\per\meter} sample. The red line is the fitted normal distribution. \textbf{(b)} Corresponding scatter plot. Size and color are proportional to the weight used for averaging. The colorbar indicates the weights $[0,1]$ attributed to the images.}

99 — 2002.04998

\caption{(a) The phases of classical electron trajectories from two neighboring laser half cycles with $|\mathbf{p}|<0.4$ a.u. (b) The phase of the electron trajectories forms concentric rings whose centers lie at different positions on the $\mathrm{p}_{\mathrm{z}}$ axis. (c) The phase difference between electron trajectories from two neighboring half cycles, obtained from $\cos \left[S_{1}(\mathbf{p})-S_{2}(\mathbf{p})\right]$, suggests that the straight-line fringes perpendicular to the laser polarization axis result from the interference between direct electron trajectories; the Coulomb potential is neglected in the simple-man model.}
\end{figure*}

The phases of these two types of quantum trajectories, which are obtained from the classical actions along their paths in Feynman's path-integral formulation, are presented in Figs. 3(b) and (c). Additionally, the radial fringes produced by the phase difference $\cos(\Delta S)$ are given in Fig. 3(d); they agree with the measurements (Fig. 3(f)) and the numerical calculations (Fig. 2(d)). Thus, it is the interference between direct and rescattered EWPs launched from the same attosecond temporal window around the peak of the laser electric field that induces the LEIS (see Fig. 3(e)). The fringes in the range of $\textbf{p}_z>0$ come from the same interference mechanism, except that the laser electric field points in the opposite direction. This further explains the observation in Ref. \cite{Gopal2009} that signatures of radial structures appear only for $\textbf{p}_z<0$ with a carrier-envelope-phase (CEP) stabilized few-cycle laser pulse.

Similar to strong-field photoelectron holography \cite{Huismans2011}, the interference of the LEIS can be viewed as a pump-probe experiment but with higher time resolution.
The birth of the direct EWP at $t_0^{ref}$ starts this experiment, while at a later time $t_c$ the scattering potential is probed by the rescattered EWP. Any change of the scattering potential between $t_0^{ref}$ and $t_c$ can be encoded in the LEIS. A possible application of the LEIS is to retrieve structural information in complex molecules, such as changes in bond length. Figure 4(a) shows the relation between the final momentum and the time difference $t_c-t_0^{ref}$ between the collision of the rescattered EWP with the ionic core and the birth of the direct EWP; sub-femtosecond time resolution can be achieved. Moreover, the initial tunneling time of the EWPs can be recorded in the LEIS. In Fig. 4(b), the time difference $t_0-t_0^{ref}$ between the births of the direct and rescattered EWPs is shown with a \emph{\textbf{10-as}} time resolution, since the EWPs born in a time window of 240 as produce the fringe pattern of the LEIS.

The two-dimensional PEMDs from both the experiments and the GQTMC simulations shown in Fig. 1 reveal oblong fringes perpendicular to the laser polarization in the higher-energy part. The interference of electron trajectories emitted within a single time window around one laser crest cannot explain these oblong interference structures.

\hangafter=-1\hangindent=0pt\noindent\textbf{Interplay between the EWPs from different temporal windows.} In the following, we extend our GQTMC analysis to the interference of EWPs from different temporal windows. The initial conditions and phases of EWPs in two neighboring temporal windows for photoelectrons in the range of $|\textbf{p}|<0.4$ a.u. are presented in Figs. 4(a), (b), and (c), respectively.
The phase of the EWPs emitted from each temporal window around the laser peak forms a concentric-ring structure with a non-zero center on the $p_z$ axis, and the phases of EWPs from two neighboring temporal windows are symmetric to each other with respect to $p_{z}=0$, as shown in Figs. 4(b) and (c). On the other hand, the phase difference of the EWPs from two neighboring temporal windows is shown in Fig. 4(d), where straight lines almost perpendicular to the laser polarization axis can be identified. For comparison, the reconstructed photoelectron momentum distribution of EWPs released from the two temporal windows is given in Fig. 4(e), where straight fringes similar to those in Fig. 4(d) appear.

\begin{figure}[h]
\centering
\includegraphics[width=0.4\textwidth,angle=0]{figure6}
\caption{(a) The initial tunneling coordinates of the photoelectrons from three neighboring temporal windows in $|\textbf{p}|<0.4$ a.u. (b) The reconstructed momentum distribution with the EWPs.}
\end{figure}

The calculations above indicate that these fringes can be attributed to the interference between direct electron trajectories from neighboring temporal windows. This is verified by calculations with the simple-man model \cite{Corkum1993}, in which the Coulomb potential is neglected. In this model, the classical actions \cite{Lewenstein1994} are
\begin{equation}
S_{j}(\mathbf{p},t_0 )=\int_{t_{0}}^{T_{p}}\left\{\frac{1}{2}[\mathbf{p}+\mathbf{A}(\tau)]^{2}+I_{p}\right\} d \tau ,
\end{equation}
where $\mathbf{p}=\mathbf{p}_{\mathbf{r}} \overrightarrow{\mathbf{e}}_{\mathbf{r}}+\mathbf{p}_{\mathbf{z}} \overrightarrow{\mathbf{e}}_{\mathbf{z}}$ denotes the asymptotic momentum, and $t_{0}$ and $T_{p}$ are the ionization time and the end of the pulse, respectively.
We can rewrite the classical action as
\begin{equation}
S_{j}(\mathbf{p},t_0)=\int_{t_{0}}^{T_{p}} \frac{1}{2}\left[\mathbf{p}_{\mathbf{z}}+\mathbf{A}(\tau)\right]^{2} d \tau+\left(\frac{1}{2} \mathbf{p}_{\mathbf{r}}^{2}+I_{p}\right)\left(T_{p}-t_{0}\right) .
\end{equation}
In the simple-man model \cite{Corkum1993}, the electron is born at time $t_{0}$ with zero velocity along the laser field direction, i.e., $\mathbf{v}\left(t_{0}\right)=\mathbf{p}_{\mathbf{z}}+\mathbf{A}\left(t_{0}\right)=0$. From this equation we obtain the relation between $t_{0}$ and $\mathrm{p}_{\mathrm{z}}$ numerically. Figures 5(a) and (b) show the phases of EWPs from two neighboring temporal windows calculated with the simple-man model; concentric rings similar to those in Figs. 4(b) and (c) are found. The phase difference between two EWPs ionized in neighboring half cycles is obtained from $\cos \left[S_{1}(\mathbf{p})-S_{2}(\mathbf{p})\right]$ \cite{Huismans2011}, as shown in Fig. 5(c). Straight-line fringes almost perpendicular to the $\mathrm{p}_{\mathrm{z}}$ axis appear from $-0.4$ to $0.4$. They are very close to those in the reconstructed momentum distribution with EWPs from neighboring half cycles (see Fig. 4(d)) and to the previous experimental observation \cite{Gopal2009}, where the number of relevant temporal windows was selected by compressing the pulse duration to 5 fs and stabilizing the carrier-envelope phase.

To further demonstrate the validity of the interference mechanism discussed above, in Fig. 6(b), the momentum distribution for the EWPs from three neighboring temporal windows (Fig. 6(a)) has been calculated.
As shown in this figure, distinct ATI rings arising from intercycle interference can be identified, in good agreement with the measurements.

\section{Conclusion}
In summary, we investigate the low-energy interference structure in above-threshold ionization in mid-infrared laser fields both experimentally and theoretically. Our analysis clarifies that the LEIS arises from interference between direct electrons and electrons experiencing soft recollision, both ionized within a small temporal window near the crest of the laser amplitude. The LEIS is universal in atoms and molecules and provides a potential way to retrieve the structure and dynamics of EWPs of atoms and molecules with attosecond time resolution.

\section{Acknowledgements}
The work was supported by the National Key Research and Development Program of China (No. 2019YFA0307700 and No. 2016YFA0401100), the NNSF of China (Grant No. 11674209, No. 11774215, No. 91950101, No. 11947243, No. 11774387, No. 11834015, and No. 11974383), Sino-German Mobility Programme (Grant No. M-0031), the Department of Education of Guangdong Province (Grant No. 2018KCXTD011), the Open Fund of the State Key Laboratory of High Field Laser Physics (SIOM), and the Science and Technology Department of Hubei Province (No. 2019CFA035).

\begin{thebibliography}{xx}
\bibitem{Agostini1979} P. Agostini, F. Fabre, G. Mainfray, G. Petite, and N. K. Rahman, Phys. Rev. Lett. \textbf{42}, 1127 (1979).
\bibitem{Gopal2009} R. Gopal, K. Simeonidis, R. Moshammer, Th. Ergler, M. D\"{u}rr, M. Kurka, K.-U. K\"{u}hnel, S. Tschuch, C.-D. Schr\"{o}ter, D. Bauer, J. Ullrich, A. Rudenko, O. Herrwerth, Th. Uphues, M. Schultze, E. Goulielmakis, M. Uiberacker, M. Lezius, and M. F. Kling, Phys. Rev. Lett. \textbf{103}, 053001 (2009).
\bibitem{Lindner2005} F. Lindner, M. G. Sch\"{a}tzer, H. Walther, A. Baltu\v{s}ka, E. Goulielmakis, F. Krausz, D. B. Milo\v{s}evi\'{c}, D. Bauer, W. Becker, and G. G. Paulus, Phys. Rev.
Lett. \textbf{95}, 040401 (2005).
\bibitem{Huismans2011} Y. Huismans \emph{et al.}, Science \textbf{331}, 61 (2011).
\bibitem{keldysh} L. V. Keldysh, Sov. Phys. JETP \textbf{20}, 1307 (1965).
\bibitem{faisal73} F. M. H. Faisal, J. Phys. B \textbf{6}, L89 (1973).
\bibitem{reiss80} H. R. Reiss, Phys. Rev. A \textbf{22}, 1786 (1980).
\bibitem{simpleman} H. B. van Linden van den Heuvell and H. G. Muller, in \emph{Multiphoton Processes}, edited by S. J. Smith and P. L. Knight (Cambridge University Press, Cambridge, 1988).
\bibitem{Corkum1993} P. B. Corkum, Phys. Rev. Lett. \textbf{71}, 1994 (1993).
\bibitem{Schafer1993} K. J. Schafer, B. Yang, L. F. DiMauro, and K. C. Kulander, Phys. Rev. Lett. \textbf{70}, 1599 (1993).
\bibitem{Blaga2009} C. I. Blaga \emph{et al.}, Nat. Phys. \textbf{5}, 335 (2009).
\bibitem{Quan2009} W. Quan \emph{et al.}, Phys. Rev. Lett. \textbf{103}, 093001 (2009).
\bibitem{Wu2012} C. Y. Wu, Y. D. Yang, Y. Q. Liu, Q. H. Gong, M. Wu, X. Liu, X. L. Hao, W. D. Li, X. T. He, and J. Chen, Phys. Rev. Lett. \textbf{109}, 043001 (2012).
\bibitem{Faisal2009} F. H. M. Faisal, Nat. Phys. \textbf{5}, 319 (2009).
\bibitem{Liu2010} C. P. Liu and K. Z. Hatsagortsyan, Phys. Rev. Lett. \textbf{105}, 113003 (2010).
\bibitem{Kastner2012} A. K\"{a}stner, U. Saalmann, and J. M. Rost, Phys. Rev. Lett. \textbf{108}, 033201 (2012).
\bibitem{Yan2010} T. Yan, S. V. Popruzhenko, M. J. J. Vrakking, and D. Bauer, Phys. Rev. Lett. \textbf{105}, 253002 (2010).
\bibitem{Guo2013} L. Guo, S. S. Han, X. Liu, Y. Cheng, Z. Z. Xu, J. Fan, J. Chen, S. G. Chen, W. Becker, C. I. Blaga, A. D. DiChiara, E. Sistrunk, P. Agostini, and L. F. DiMauro, Phys. Rev. Lett. \textbf{110}, 013001 (2013).
\bibitem{WB14} W. Becker, S. P. Goreslavski, D. B. Milo\v{s}evi\'{c}, and G. G. Paulus, J. Phys. B \textbf{47}, 204022 (2014).
\bibitem{chen2002} J. Chen and C. H. Nam, Phys. Rev. A \textbf{66}, 053415 (2002).
\bibitem{Rudenko2004} A. Rudenko, K. Zrost, C. D. Schr\"{o}ter, V. L. B. de Jesus, B. Feuerstein, R. Moshammer, and J. Ullrich, J. Phys. B \textbf{37}, L407 (2004).
\bibitem{Arbo2006} D. G. Arb\'{o}, S. Yoshida, E. Persson, K. I. Dimitriou, and J. Burgd\"{o}rfer, Phys. Rev. Lett. \textbf{96}, 143003 (2006).
\bibitem{Arbo2008} D. G. Arb\'{o}, K. I. Dimitriou, E. Persson, and J. Burgd\"{o}rfer, Phys. Rev. A \textbf{78}, 013406 (2008).
\bibitem{ZJChen2006} Z. J. Chen, T. Morishita, A. T. Le, M. Wickenhauser, X. M. Tong, and C. D. Lin, Phys. Rev. A \textbf{74}, 053405 (2006).
\bibitem{Lai2017} X. Lai \emph{et al.}, Phys. Rev. A \textbf{96}, 013414 (2017).
\bibitem{Liu2012} H. Liu, Y. Liu, L. Fu, G. Xin, D. Ye, J. Liu, X. T. He, Y. Yang, X. Liu, Y. Deng, C. Wu, and Q. Gong, Phys. Rev. Lett. \textbf{109}, 093001 (2012).
\bibitem{Ullrich2003} J. Ullrich \emph{et al.}, Rep. Prog. Phys. \textbf{66}, 1463 (2003).
\bibitem{Jahnke2004} T. Jahnke \emph{et al.}, J. Electron. Spectrosc. Relat. Phenom. \textbf{141}, 229 (2004).
\bibitem{Boge2013} R. Boge, C. Cirelli, A. S. Landsman \emph{et al.}, Phys. Rev. Lett. \textbf{111}, 103003 (2013).
\bibitem{Yu2005} M. Yu. Ivanov, M. Spanner, and O. Smirnova, J. Mod. Opt. \textbf{52}, 165 (2005).
\bibitem{Yudin2001} G. L. Yudin and M. Yu. Ivanov, Phys. Rev. A \textbf{64}, 013409 (2001).
\bibitem{Perelomov1966} A. M. Perelomov, V. S. Popov, and M. V. Terent'ev, Zh. \'{E}ksp. Teor. Fiz. \textbf{50}, 1393 (1966) [Sov. Phys. JETP \textbf{23}, 924 (1966)].
\bibitem{Brabec1996} T. Brabec, M. Y. Ivanov, and P. B. Corkum, Phys. Rev. A \textbf{54}, R2551 (1996).
\bibitem{Hu1997} B. Hu, J. Liu, and S. G. Chen, Phys. Lett. A \textbf{236}, 533 (1997).
\bibitem{Chen2000} J. Chen, J. Liu, and S. G. Chen, Phys. Rev. A \textbf{61}, 033402 (2000).
\bibitem{Salieres2001} P. Sali\`{e}res, B. Carr\'{e}, L. Le D\'{e}roff \emph{et al.}, Science \textbf{292}, 902 (2001).
\bibitem{MinLi2014} M. Li \emph{et al.}, Phys. Rev. Lett. \textbf{112}, 113002 (2014).
\bibitem{Song2016} X. Song, C. Lin, Z. Sheng, P. Liu, Z. Chen, W. Yang, S. Hu, C. D. Lin, and J. Chen, Sci. Rep. \textbf{6}, 28392 (2016).
\bibitem{Yang2016} W. Yang, H. Zhang, C. Lin, J. Xu, Z. Sheng, X. Song, S. Hu, and J. Chen, Phys. Rev. A \textbf{94}, 043419 (2016).
\bibitem{Lin2016} C. Lin \emph{et al.}, Acta Physica Sinica \textbf{65}, 223207 (2016).
\bibitem{Song2017} X. Song \emph{et al.}, Phys. Rev. A \textbf{95}, 033426 (2017).
\bibitem{Gong2017} X. Gong \emph{et al.}, Phys. Rev. Lett. \textbf{118}, 143203 (2017).
\bibitem{Song2018} X. Song \emph{et al.}, Phys. Rev. Lett. \textbf{121}, 103201 (2018).
\bibitem{Lewenstein1994} M. Lewenstein, Ph. Balcou, M. Yu. Ivanov, Anne L'Huillier, and P. B. Corkum, Phys. Rev. A \textbf{49}, 2117 (1994).
\bibitem{Diego2010} D. G. Arb\'{o}, K. L. Ishikawa, K. Schiessl, E. Persson, and J. Burgd\"{o}rfer, Phys. Rev. A \textbf{81}, 021403(R) (2010).
\end{thebibliography}
\end{document} }
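
The simple-man phase difference $\cos[S_{1}(\mathbf{p})-S_{2}(\mathbf{p})]$ discussed in the text above can be sketched numerically. This is only an illustration under stated assumptions, not the authors' calculation: it assumes a monochromatic field with vector potential $A(t)=-(E_0/\omega)\sin\omega t$ along $z$, atomic units, and placeholder values for `E0`, `OMEGA`, `IP`, and the pulse end `T_END`, none of which are taken from the paper. All function names are hypothetical.

```python
import numpy as np

# Placeholder parameters in atomic units (assumptions, not from the paper).
E0 = 0.05                        # peak electric field
OMEGA = 0.0142                   # laser angular frequency (mid-infrared)
IP = 0.5                         # ionization potential
T_END = 4 * 2 * np.pi / OMEGA    # assumed end of the pulse, T_p


def vector_potential(t):
    """Assumed vector potential A(t) = -(E0/omega) sin(omega t)."""
    return -(E0 / OMEGA) * np.sin(OMEGA * t)


def birth_times(pz):
    """Solve p_z + A(t0) = 0 near two neighboring field crests.

    The field E(t) = E0 cos(omega t) has crests at omega*t = 0 and pi,
    so one solution is returned for each of these half cycles.
    """
    phi = np.arcsin(pz * OMEGA / E0)
    return phi / OMEGA, (np.pi - phi) / OMEGA


def action(pz, pr, t0, n=4001):
    """Simple-man action S(p, t0) for scalar pz, pr (Coulomb field ignored)."""
    tau = np.linspace(t0, T_END, n)
    kinetic = 0.5 * (pz + vector_potential(tau)) ** 2
    # Trapezoidal rule for the longitudinal kinetic term.
    integral = np.sum(0.5 * (kinetic[1:] + kinetic[:-1]) * np.diff(tau))
    return integral + (0.5 * pr ** 2 + IP) * (T_END - t0)


def interference(pz, pr):
    """cos[S_1(p) - S_2(p)] for births in two neighboring half cycles."""
    t1, t2 = birth_times(pz)
    return np.cos(action(pz, pr, t1) - action(pz, pr, t2))
```

Evaluating `interference(pz, pr)` on a grid of `pz` in $[-0.4, 0.4]$ and small `pr` gives a fringe map of the kind described for the phase-difference panel; the fringe spacing depends on the placeholder parameters chosen here.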

100 — 2002.05201

\caption{We augment a sampling-based planner, RRT, with a hierarchical recurrent network that encodes the meaning of a natural-language command the robot must follow. Just as with a traditional planner, the robot mentally explores the space around a {\red start location} building a {\blue search tree} to find a good path in its configuration space. Unlike a traditional planner, we do not specify a goal as a location, but instead rely on a neural network to score how likely any position in the configuration space is to be an end state while considering the past history of the robot's actions and its observation of the environment. The structure of the RNNs mirrors that of the search tree, with each splitting off as different decisions are considered. At each time step, the RNNs {\grey observe the environment}, and can adjust the sampling process of the planner to avoid moving in undesirable locations (in this case, the tree is not expanded toward the red circle, and instead adjusted to go down the passageway through the green circle). See~\cref{fig:rnn-model} for details on the structured RNNs and how they encode the structure of sentences as relationships between recurrent models.}

\caption{The structure of the model interpreting and following the command \emph{Pick up the orange ball from below black triangle}. As the search tree described in \cref{fig:model-overview} is constructed, this model interprets the state of each tree node being expanded. It predicts the direction in which to expand the node and whether the node completes the plan being followed. Each {\green word is a module} in the network, and each module contains two neural networks (shown in black; the module for \emph{below} is expanded). Each word updates its associated {\orange hidden state} at each time step using an RNN. The structure of the network is derived automatically from a parse produced by the NLTK CoreNLP parser~\citep{bird2006nltk}. {\blue Visual features} are extracted and provided to each word model. {\red Attention maps} are predicted by each word from a combination of visual features, the attention maps of any words directly below in the hierarchy, and the state of that word. The attention maps indicate which objects should be manipulated and how they should be manipulated. The attention map of the final word and the output of its RNN are used to predict the direction of movement and the success probability. Using attention maps as the mechanism to forward information in the network provides a level of interpretability.}

\caption{Examples of the (a) training set and of the (b) test set. The {\orange robot} is shown in orange as a pair of L-shaped grippers. Objects are randomly positioned, with random properties and orientations. The training set is considerably simpler, with fewer objects on average, without cups that have lids, without the need to traverse doors or channels as all objects are inside the room, and without immovable obstacles (grey rectangles).}

101 — 2002.05217

\caption{Learning the causal graph by actively interacting with the environment. Given a high-level set of {\color{violet} features $f_i$} and an {\color{red} environment $\mu$}, we collect data using a policy $\pi$ to learn an initial {\color{green} causal model $G$ (hypothesis)} from the feature time series. Then, we {\color{yellow} design an {\em intervention} (an experiment),} a new policy aimed at disproving the current causal graph to learn the true one. The intervention is then executed in the environment as a standard {\color{blue} agent-environment interaction with intrinsic reward}, and the process is repeated.}

\caption{Left \ref{fig:envs_causal}: The causal diagram of the environment which the agent should learn. {\color{teal} The player} needs to collect {\color{green} food} and {\color{blue} keys}. Keys are used to open {\color{yellow} chests} and the {\color{blue} number of keys} is displayed above the first black line. {\color{green} Top row with health} decreases at every time-step, and the episode ends if it is 0. {\color{gray} The button} toggles the lamp (\underline{black}/white) which gives no reward. Right two \ref{fig:envs_b}, \ref{fig:envs_c}: layouts of environments B and C. }

\caption{Experimental results on environment B for predicting the true causal graph. Horizontal axis shows the number of episodes (in 1000s), plots are arranged by intervention method (Loss, Node, Edge). Vertical axis represents the number of runs (out of 10) which have converged to the true graph $G^*$. Plots are arranged by the number of interventions (0, 5, 20). {\color{green} Green line} represents the median. 0 interventions corresponds to training with reward. The random policy is evaluated in a separate experiment with spurious correlations as a result. $\infty$ means that the algorithm didn't find the correct graph during training.}

102 — 2002.05322

\caption{Accuracy results for identified phases in the $512\times512\times768$ testing sample using different segmentation neural networks trained with and without weighting (W = weighted, UW = unweighted). The best and second-best accuracy results are highlighted in \textcolor{red}{red} and \textcolor{orange}{orange}, respectively. Overall, the U-ResNet architecture performed best in 2D, and 3D testing further improves the voxelwise accuracy.}

\caption{Bar charts of the calculated Euler number of each segmented phase for each of the 10 tested networks, compared to the ground truth result. The best and second-best accuracy results are highlighted in \textcolor{red}{red} and \textcolor{orange}{orange}, respectively. The connectivity is erratic compared to the previously calculated pixelwise accuracies, and no single network is able to reliably generate the same connectivity.}

103 — 2002.05386

\caption{Left: CMB spectral $\mu$-distortion with $\mathcal{N}_{\rm unknown} = {\color{orange}0}$, $\color{brown}-13$, $- \infty$ (Credit: \cite{Cho:2017zkj}). Right: The 21-cm power spectra of thermal inflation ($k_{\rm b} =$ {\color{red} $1\invMpc$ (red)}, {\color{Goldenrod} $3\invMpc$ (yellow)}, {\color{cadmiumgreen} $5\invMpc$ (green)}), warm dark matter ($m_{\rm FD} =$ {\color{brown} $1 \keV$ (brown)}, {\color{magenta} $2 \keV$ (magenta)}), and $\Lambda$CDM (black) scenarios just before the epoch of reionization. The shaded regions indicate where the power spectra lie above the thermal noise from the modified SKA configuration with $100~{\rm deg^2}$ sky coverage. Exposure times are ${\color{cyan}10^3}$, ${\color{orange}10^4}$, and ${\color{gray}10^5}$ hours in SKA1-LOW, and ${\color{cyan}10^2}$, ${\color{orange}10^3}$, and ${\color{gray}10^4}$ hours in SKA2-LOW \citep{Greig:2015zra} (Credit: \cite{Hong:2017knn}). }

104 — 2002.05676

\caption{\color{Gray} Time series data (green), fitted values (red), and step-ahead forecasts (blue) for poliomyelitis case counts. Poisson GARNN (A); Negative Binomial (B).}

105 — 2002.05677

\caption{\textbf{{\color{changes}Local} intraspecific competition stabilises spatially uniform solutions and patterns at lower migration speeds.} Onset, existence and stability parameter regions of patterned solutions of \eqref{eq: Multispecies pattern: single-species model} are shown in the $(A,c)$ parameter plane. Onset at high precipitation values occurs at a Hopf bifurcation, while onset at low values occurs at a homoclinic solution. The existence region of patterns is bounded below by the homoclinic solution and bounded above by either the Hopf bifurcation or a fold in the solution branch, if it exists. Part (a) corresponds to strong {\color{changes}local} intraspecific competition, (b) to weak {\color{changes}local} intraspecific competition. The loci of both the Hopf bifurcation and the fold in the patterned solution branches are shifted to lower precipitation volumes if {\color{changes}local} intraspecific competition is strong, while the homoclinic solution occurs at higher rainfall levels. {\color{changes}Hence, the length of the rainfall interval in which patterns exist decreases with increasing local intraspecific competition. Shown in (c), the relative difference in the size of the pattern existence rainfall interval is given by $(\overline{A}_\infty-\overline{A}_k)/\overline{A}_\infty$, where $\overline{A}_\infty$ and $\overline{A}_k$ are the lengths of the pattern existence rainfall interval in the absence of local intraspecific competition and for local intraspecific competition dynamics with carrying capacity $k$, respectively. Moreover, strong local intraspecific competition stabilises patterns at lower migration speeds.}}

\caption{\textbf{Linear stability of spatially uniform equilibria.} The spatially uniform equilibria of \eqref{eq: Multispecies pattern: Model: nondimensional model} and their stability under changes to the precipitation volume $A$ are shown. Solid lines indicate stable states, dashed lines unstable states. For high precipitation values, the coexistence equilibrium $\overline{\bm{{v_m^{c,+}}}}$ is stable because interspecific competition for water is sufficiently lower than intraspecific competition. A decrease in $A$ causes $\overline{\bm{{v_m^{c,+}}}}$ to lose stability to the single-species tree equilibrium $\overline{\bm{{v_m^{t,+}}}}$. For the parameters used in the visualisation the stability change occurs where both equilibria intersect, but this need not be the case. {\color{changes}Also note that at the intersection of equilibria, the coexistence steady state becomes ecologically irrelevant, as one of the plant densities becomes negative. Nevertheless, this steady state can be instructive for mathematical understanding of the dynamics.} The grass equilibrium $\overline{\bm{{v_m^{g,+}}}}$ is unstable for all $A$, because changes in rainfall cannot change which species is of higher local average fitness. Here $k_1=k_2 =1000$ to keep {\color{changes}local} intraspecific competition sufficiently weak. For significantly smaller values of $k_1=k_2$ only the coexistence equilibrium is stable. }

\caption{\textbf{Introduction of a second species affects stability of single-species patterns.} A comparison of the essential spectra of a single-species pattern in the single-species model \eqref{eq: Multispecies pattern: single-species model} (a) and the multispecies model \eqref{eq: Multispecies pattern: Model: nondimensional model} (b) is shown. The spectrum in the single-species model is a subset of the spectrum in the multispecies model. The additional elements of the spectrum correspond to the leading order behaviour of perturbations {\color{changes}in the density of the second species.} Note that the spectra show that the corresponding single-species pattern is stable in the single-species model, but unstable in the multispecies model due to the introduction of {\color{changes} the competitor species}. The vertical lines visualise the imaginary axis. The parameter values are $A=2$ and $c=0.25$. {\color{changes}For this visualisation, a pattern of species $u_1$ was chosen, but identical considerations hold for single-species patterns of species $u_2$.}}

\caption{\textbf{Strong {\color{changes}local} intraspecific competition facilitates spatially uniform coexistence and causes coexistence pattern onset at a Turing-Hopf bifurcation.} Bifurcation diagrams for different values of the carrying capacities $k_1=k_2$ are shown for $c=0.25$. A decrease in {\color{changes}local} intraspecific competition increases the size of the precipitation interval in which coexistence patterns exist and simultaneously inhibits spatially uniform coexistence. Under strong {\color{changes}local} intraspecific competition, two Hopf bifurcations along the spatially uniform coexistence equilibrium exist and cause the onset of patterns. Typically, patterns originating from the lower branch are of large wavelength and are thus omitted from the bifurcation diagram in (a). Both Hopf bifurcation loci meet in a fold as {\color{changes}local} intraspecific competition is increased to a critical threshold beyond which coexistence patterns connect both single-species pattern branches ((b) and (c)). Patterned states are only shown for one value of the uphill migration speed and no stability information is provided. In (b) and (c), $\|u_1\|$ is multiplied by $\operatorname{sign}(u_1)$ to visualise the occurrence of $u_1<0$. }

\caption{\textbf{{\color{changes}Local} intraspecific competition facilitates species coexistence in vegetation patterns.} Two coexistence solutions are shown. In (a), {\color{changes}local} intraspecific competition is strong and the solution represents a vegetation pattern, while in (b) a solution corresponding to a savanna state is visualised, which occurs due to weak {\color{changes}local} intraspecific competition. Note the different values of the precipitation parameter. A decrease in {\color{changes}local} intraspecific competition destabilises the coexistence state at lower rainfall volumes. The species difference parameter is $\chi=0.3$.}

\caption{\textbf{Strong {\color{changes}local} intraspecific competition of the coloniser species and weak {\color{changes}local} intraspecific competition of the locally superior species promote patterned coexistence.} Bifurcation diagrams under changing {\color{changes}local} intraspecific competition of one species only are shown. Both strong {\color{changes}local} intraspecific competition among the coloniser species $u_1$ and weak {\color{changes}local} intraspecific competition among the locally superior species $u_2$ increase the size of the parameter region in which coexistence patterns exist. The insets in (a) and (b) (axes limits: $A\in [6.75,7.75]$, $\pm\|u_1\| \in [-0.1,0.1]$) show the onset of coexistence patterns close to $u_1=0$ to highlight the transition from onset at the spatially uniform coexistence equilibrium to onset at the single-species $u_2$ pattern as {\color{changes}local} intraspecific competition among $u_2$ decreases. The inset in (c) (axes limits: $A\in [3.2,3.5]$, $\pm\|u_1\| \in [7.1,7.3]$) shows a blow-up of the parameter region in which coexistence patterns exist. The pattern migration speed is $c=0.25$. In (a) and (b), $\|u_1\|$ is multiplied by $\operatorname{sign}(u_1)$ to visualise the occurrence of $u_1<0$. For an interpretation of colours and linestyles used in the visualisation, see the legend of Fig. \ref{fig: Multispecies pattern: bifurcation diag k}. }

\caption{\textbf{Plant dispersal influences spatial species distribution and enables coexistence of a spatially uniform fast disperser with a patterned slow disperser.} The spatial correlation between plant species is shown in (b) and some example solutions are displayed in {\color{changes}(a)}. Note that the spatial correlation peaks close to $D=1$ {\color{changes}but does not reach unity due to the plant species differing in other parameters. No other parameters have any qualitative impact on correlation. In particular, species correlation is unaffected by changes in the strengths of local intraspecific competition, which are set to $k_1=k_2=10$ for visualisation purposes. For $D>1$}, coexistence of the locally superior species (which also disperses faster) in a spatially uniform state with a patterned state of the superior coloniser (but slower disperser) is possible. The species difference is set to $\chi=0.3$ and the wavelength $L$ is fixed to $L=25$ in the numerical continuation with the uphill migration speed allowed to vary.}

106 — 2002.05950

\caption{Backtracking to spit out the hole \protect\includegraphics[scale=0.1]{hole.pdf} in reverse. The transitions of the atomic hole \protect\includegraphics[scale=0.1]{ah1.pdf} are first written in the reverse order, followed by those of \protect\includegraphics[scale=0.1]{ah2.pdf} in reverse, and then of \protect\includegraphics[scale=0.1]{ah3.pdf} in reverse. }

107 — 2002.06195

\caption{ Panels (a), (b), and (c) show the performance of the \textcolor{red}{\textbf{Implicit (red)}} and $l_2$ regression \textcolor{black}{\textbf{L2 (black)}} objectives as we increase the Gaussian noise variance. We show the testing error measured by RMSE on the entire testing set (\textbf{solid line}), on the high-frequency region (i.e. $x \in [-2.5, 0.0)$, \textbf{dashed line}), and on the low-frequency region ($x \in [0.0, 2.5]$, \textbf{dotted line}). The results are averaged over $30$ random seeds. }

\caption{ The trained function $f_\theta(x=0, y)$. The \textcolor{blue}{\textbf{blue point}} is $(0, 0)$, and the other two \textcolor{black}{\textbf{black points}} are the points predicted by $\argmin_y f_\theta(0, y)^2 + (\frac{\partial f_\theta(0, y)}{\partial y} + 1)^2$. }

108 — 2002.06261

\caption{ An example of an \textit{AddOneSent} adversarial sample. This example was taken from \newcite{jia-liang-2017-adversarial}. In this case we can see that the model correctly answered the original question, but after the inclusion of the adversarial sentence (in \textcolor{blue}{\textit{italic blue}}), the model fails (answer in \textcolor{red}{red}). }

109 — 2002.06303

\caption{T-3 sample results (Rank 10). For each query (row) one or more faces of the probe returned the corresponding samples of gallery as top 10. Here, \textcolor{red}{x} (red) depicts false predictions, while true predictions display the relationship type (in green): \textcolor{green}{P} for parent; \textcolor{green}{C} for child; \textcolor{green}{S} for sibling.}

110 — 2002.06388

\caption{\red{(Alternative plot of Fig.\,\ref{fig_fit})} Top: the best-fit model for the low flux state spectrum (red solid line). The red dashed line and the red dotted line show the absorbed disk reflection model and the power-law component respectively. Bottom: the data/model ratio plot for the low flux state spectrum.}

\caption{\red{Alternative of Fig.\,\ref{fig_low_flux}} Top left: the best-fit absorption model for the low flux state spectrum applied to a power law with $\Gamma=2$ to show the shape of the absorption lines. The normalization of the model is arbitrary in this figure. Bottom left: the best-fit model for the low flux state spectrum (black). The unfolded low flux state spectrum is shown in red as reference. The dashed grey line shows the best-fit continuum model after removing absorption. The grey shaded region shows the model difference. Right: $\chi^{2}$ as a function of the maximum disk radius of the absorption zone. The inner radius of the annular absorber is fixed at 2\,$r_{\rm g}$.}

111 — 2002.06406

\caption{Example of citation recommendation from the online evaluation. The table shows recommendations from our hybrid recommender as well as its component algorithms. Note: \textcolor{blue}{blue} = ground truth; \textcolor{red}{red} = other possibly valid predictions; unhighlighted = invalid or general recommendations related to the context. It becomes apparent that the actual ground truth papers are included at different ranks in the hybrid and the component algorithms. Other similar papers are recommended as well.}

112 — 2002.06429

\caption{Pose estimation outputs of the proposed framework for \textcolor{cyan}{occluded} and \textcolor{blue}{completely visible} pedestrian samples appearing in the CityPersons dataset.}

\caption{Qualitative results of detection and pose estimation on CityPersons dataset. The \textcolor{red}{ground truth} annotations are shown in red, the \textcolor{green}{detection results} are shown in green and the \textcolor{blue}{predicted pose} is shown in blue.}

113 — 2002.06659

\caption{Examples of landform combinations, where \protect\includegraphics[height=0.7em]{\fighome/sand.pdf} stands for sand, \protect\includegraphics[height=0.7em]{\fighome/marble.pdf} stands for marble and \protect\includegraphics[height=0.7em]{\fighome/ice.pdf} stands for ice. }

\caption{An example of \patnames in a $5\times 5$ slippery gridworld with no reward and slipping probability$=$0.4. The template at all \protect\includegraphics[height=0.7em]{\fighome/pat1.pdf}s is $\pattern_1=([0.8,0.2,0,\cdots,0],0)$, and the template at all \protect\includegraphics[height=0.7em]{\fighome/pat2.pdf}s is $\pattern_2=([0.6,0.2,0.2,0,\cdots,0], 0)$.}

114 — 2002.06684

\caption{Illustration of the three recurrent models described in Section 4. \ref{fig:r_a} is the recurrent actor (actors maintain state \textcolor{green}{$h^p$} over time), \ref{fig:r_c} is the recurrent critic (Q maintains \textcolor{yellow}{$h^q$} over time), and \ref{fig:r_ac} is the recurrent actor critic models used in the experiments. The top row shows the models during training, and the bottom row shows the models during execution. Actors communicate with each other and share information (\textcolor{blue}{$m$}). If they decide not to communicate or have no communication budget left, an empty message is sent.}

\caption{Simultaneous arrival task with $N=2$ agents. The agents (\textcolor{blue}{blue}, \textcolor{red}{red}) start at different distances from the goal (black), and their task is to arrive at the goal location simultaneously. A video can be found \href{https://sites.google.com/view/rmaddpg/home\#h.p_Iin8bLPKVOhT}{here}.}

\caption[ ]{\small Reward performance in observability experiments. Under fully observable settings (top row), both MADDPG (\textcolor{red}{red}) and recurrent variants (\textcolor{green}{green}, \textcolor{blue}{blue}, \textcolor{orange}{orange}) perform similarly. Under partially observable (bottom row) settings, the recurrent actor (\textcolor{orange}{orange}) and MADDPG (\textcolor{red}{red}) are unable to learn how to simultaneously arrive (d), and even how to move towards the goal (c). This demonstrates the importance of the recurrent critic in partially observable settings. For partial observability, the communication budget is set to 20 messages, shared between 2 agents over $\sim$ 100 timesteps per episode. }

115 — 2002.06744

\caption{Most unstable wavelength $\lambda_c$ of the fingering patterns versus (a) the initial gap thickness $h_0$ and (b) the radial velocity $v_r$. In (a), the solid line corresponds to a power-law exponent of $3/2$, the dashed line to $\lambda_c^*=1.76$~mm. In (b), the line corresponds to a power-law exponent of $-1/2$. Data obtained with an 8\% wt.~carbon black gel and two different plate radii $R=20$~mm ($\circ$) and $30$~mm (\textcolor{blue}{$\triangle$}). \label{fig4}}

116 — 2002.06749

\caption{Additional \aastex\symbols}

\caption{ From top to bottom, multi-epoch H$\alpha$ images (left and middle panels) of T\,Aur, V476\,Cyg, DQ\,Her, V533\,Her, and FH\,Ser (see Table~\ref{tab:obs} for details) and \emph{RGB} composite colour pictures (right panel) combining NOT ALFOSC images in the broadband $g'$ SDSS $\lambda$4800 (blue) and narrowband H$\alpha$ $\lambda$6563 (green) and [N~{\sc ii}] $\lambda$6583 (red) filters, except for V476\,Cyg, whose colour picture was obtained using an $r'$ SDSS $\lambda$6180 filter for the red colour. \label{fig:fig1}}

117 — 2002.07032

\caption{Adopted random generation rule for the parameters $\boldsymbol{\eta}^{1sh}_l$ and $\boldsymbol{\eta}^{1ax}_l$ tuning the frequency and the magnitude of the applied sinusoidal load components in the $x$ \protect\subref{tab:load_x_param} and $z$ \protect\subref{tab:load_z_param} directions respectively. Here, we indicate with \texttt{randn}($0,\sigma$) the sampling from a Gaussian probability distribution $\mathcal{N}\left(0,\sigma^2\right)$, where $\sigma^2$ is its variance, and with \texttt{takerand}$\left(\left[\boldsymbol{v}\right]\right)$ the uniform sampling from the discrete set of values $\left[ \boldsymbol{v} \right]$. \label{tab:load_composition}} \end{table} The two sets of values adopted for the generation rule of $f^{sh}$ and $f^{ax}$ are chosen on the basis of the structural frequencies that could be excited both in the horizontal and vertical directions. At the same time, thanks to the adopted sampling rule, $f^{sh}$ and $f^{ax}$ may exceed these frequency ranges, producing instances in which the shear frequencies and/or the axial frequencies of the structure are not excited. Regarding the generation rule of the scaling parameter $\gamma^{sh}$, its dependency on the dofs of the structure through the factor $\gamma^{dof}$ has been introduced in Table \ref{tab:load_composition} in order to mimic the load distribution usually considered in a preliminary design process, when the shear behaviour of a regular building is evaluated. Keeping in mind that our principal interest here is to assess the prediction capacities of the NN architecture, this choice has enabled us to obtain displacement time series similar to the ones expected during the monitoring of the structure, although adopting a very simple generation rule for the applied lateral loads. Some examples of the time evolutions of the generated loads, applied to the first floor of the structure (hence of $l_1^{sh}$ and $l_1^{ax}$), are shown in Fig.~\ref{fig:load_cases}. \begin{figure}[h!] 
\captionsetup[subfigure]{justification=centering} %\begin{framed} \centering \subfloat[$f_{1,2}^{sh}=\left(21.1, 69.2 \right)\gamma_{1,2}^{sh}=\left(-0.058,-0.199\right)$][\label{fig:load_flex_43}$f_{1,2}^{sh}=\left(21.1, 69.2 \right)$ \\ $\gamma_{1,2}^{sh}=\left(-0.058,-0.199\right)$]{\includegraphics[scale=0.3]{load_flex_43.pdf}} $~$ \subfloat[$f_{1,2}^{ax}=\left(32.8, 28.2 \right) \gamma_{1,2}^{ax}=\left(1.38,1.38\right)$][\label{fig:load_axial_43}$f_{1,2}^{ax}=\left(32.8, 28.2 \right)$ \\ $\gamma_{1,2}^{ax}=\left(1.38,1.38\right)$]{\includegraphics[scale=0.3]{load_axial_43.pdf}}\\ ~\subfloat[$f_{1,2}^{sh}=\left(14.5, 2.36 \right)\gamma_{1,2}^{sh}=\left(0.025,-0.159\right)$][\label{fig:load_flex_44}$f_{1,2}^{sh}=\left(14.5, 2.36 \right)$ \\ $\gamma_{1,2}^{sh}=\left(0.025,-0.159\right)$]{\includegraphics[scale=0.3]{load_flex_44.pdf}} $~$ \subfloat[$f_{1,2}^{ax}=\left(15.5, 22.0 \right) \gamma_{1,2}^{ax}=\left(1.133,-1.140\right)$][\label{fig:load_axial_44}$f_{1,2}^{ax}=\left(15.5, 22.0 \right)$ \\ $\gamma_{1,2}^{ax}=\left(1.133,-1.140\right)$]{\includegraphics[scale=0.3]{load_axial_44.pdf}} %\parbox{12cm}{ \caption{Examples of time evolutions of the loads (case 1) applied to the first floor of the building in the $x$ (left column) and $z$ (right column) directions. For the sake of visualisation, the sketched time interval for the loads applied in the $x$ direction has been restricted to $I=\left[0,2.5\right]s$.\label{fig:load_cases}}%} %\end{framed} \end{figure} Through Eq.~\eqref{eq:add_noise}, we have added a measurement noise to mimic the output of a real monitoring system. For the sake of simplicity, the covariance matrix $\boldsymbol{\Sigma}_{\epsilon} \in \mathbb{R}^{16 \times 16}$ of such noise has been assumed to be diagonal, i.e. 
$\boldsymbol{\Sigma}_{\epsilon} = \sigma^2 \mathbb{I}$, where $\sigma^2$ is the variance of the measurement error $\epsilon$ in horizontal and vertical directions for each floor, and $\mathbb{I} \in \mathbb{R}^{16 \times 16}$ is the identity matrix. Two sources of randomness have been assumed for the noise, due to environmental effects and to the transmission of the electrical signal. Their effects are superimposed in the covariance matrix with diagonal entries respectively amounting to ${\sigma}_{env}^2$ and ${\sigma}_{el}^2$. The environmental noise has been assumed to induce vibrations of the same amplitude and/or to affect in the same way the converted electrical signals, independently of the building floor. Given that horizontal motions at the top of the building are in general greater than displacements at the lower levels, this assumption leads small-amplitude signals to be more affected, in relative terms, by the environmental noise. This is reasonable if we assume that the localised disturbances that arise because of the surrounding environment have the same magnitude independently of the building level. Regarding the electrical disturbance, the same noise level has been assumed in both directions $x$ and $z$, despite the usually different technical specifications for sensors measuring displacements of different magnitude. This means that the electrical disturbances have the same effect, in statistical terms, on the measurement outcomes in horizontal direction ${u}^{sh}_i$ and in vertical direction ${u}^{ax}_i$. Fig.~\ref{fig:signal_flex} and Fig.~\ref{fig:signal_axial} respectively show examples of time evolutions of horizontal and vertical displacements, to highlight the effects of the above assumptions on the structural signals. These displacement components always refer to the undamaged case, and to the load conditions specified in the captions. 
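The noise model described above can be summarised in a compact form. The following is only a sketch under assumptions that are not stated explicitly here: that Eq.~\eqref{eq:add_noise} is additive, that the noise is zero-mean Gaussian, and that the environmental and electrical sources are statistically independent, so that their variances add up:
\[
\boldsymbol{r} = \boldsymbol{u} + \boldsymbol{\epsilon}, \qquad
\boldsymbol{\epsilon} \sim \mathcal{N}\left(\boldsymbol{0}, \boldsymbol{\Sigma}_{\epsilon}\right), \qquad
\boldsymbol{\Sigma}_{\epsilon} = \left({\sigma}_{env}^2 + {\sigma}_{el}^2\right) \mathbb{I} .
\]
Under these assumptions $\sigma^2 = {\sigma}_{env}^2 + {\sigma}_{el}^2$ is the same for all $16$ channels, which is why channels with smaller displacement amplitudes are more corrupted in relative terms.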
Consistently with the above, the displacements of the $8$-th story are less affected by noise than those of the $1$-st story. Due to the random generation of the applied load, different structural frequencies are excited in each simulation. To provide different scenarios also in terms of sensor accuracy (see also \cite{art:Giovanni_Capellari_2}), two levels of Signal to Noise Ratio (SNR), $15$ dB and $10$ dB, have been adopted. The SNR is a summary indicator, referring to the overall level of noise corruption for the displacements in one direction. Still referring to Fig.~\ref{fig:signal_flex} and Fig.~\ref{fig:signal_axial}, differences in terms of corruption levels between the two sensor accuracy scenarios can be appreciated. \begin{figure}[h!] \captionsetup[subfigure]{justification=centering} \centering \vspace{-0.5 cm} \subfloat[{1-st floor\label{fig:signal_43_0_1_flex}}]{\includegraphics[scale=0.325]{43_0_1_flex.pdf}} $~$ \subfloat[{1-st floor\label{fig:signal_43_0_1_flex_enlarg}}]{\includegraphics[scale=0.325]{43_0_1_flex_enlarg.pdf}} \\ \vspace{-0.25 cm} \subfloat[{4-th floor\label{fig:signal_43_0_4_flex}}]{\includegraphics[scale=0.325]{43_0_4_flex.pdf}} $~$ \subfloat[{4-th floor\label{fig:signal_43_0_4_flex_enlarg}}]{\includegraphics[scale=0.325]{43_0_4_flex_enlarg.pdf}} \\ \vspace{-0.25 cm} \subfloat[{8-th floor\label{fig:signal_43_0_8_flex}}]{\includegraphics[scale=0.325]{43_0_8_flex.pdf}} $~$ \subfloat[{8-th floor\label{fig:signal_43_0_8_flex_enlarg}}]{\includegraphics[scale=0.325]{43_0_8_flex_enlarg.pdf}} \\ \vspace{-0.1 cm} \subfloat[{1-st floor\label{fig:signal_44_0_1_flex}}]{\includegraphics[scale=0.325]{44_0_1_flex.pdf}} $~$ \subfloat[{1-st floor\label{fig:44_0_1_flex_enlarg}}]{\includegraphics[scale=0.325]{44_0_1_flex_enlarg.pdf}} \\ \vspace{-0.25 cm} \subfloat[{4-th floor\label{fig:signal_44_0_4_flex}}]{\includegraphics[scale=0.325]{44_0_4_flex.pdf}} $~$ \subfloat[{4-th 
floor\label{fig:signal_44_0_4_flex_enlarg}}]{\includegraphics[scale=0.325]{44_0_4_flex_enlarg.pdf}} \\ \vspace{-0.25 cm} \subfloat[{8-th floor\label{fig:signal_44_0_8_flex}}]{\includegraphics[scale=0.35]{44_0_8_flex.pdf}} $~$ \subfloat[{8-th floor\label{fig:signal_44_0_8_flex_enlarg}}]{\includegraphics[scale=0.35]{44_0_8_flex_enlarg.pdf}} \caption{Example of time evolutions of $x$ displacements for stories $1, 4, 8$ with SNR$=15$ dB (from \ref{fig:signal_43_0_1_flex} to \ref{fig:signal_43_0_8_flex_enlarg}) and SNR$=10$ dB (from \ref{fig:signal_44_0_1_flex} to \ref{fig:signal_44_0_8_flex_enlarg}), undamaged state. Low-noise case: $f_{1,2}^{sh}=\left(21.1, 69.2 \right)$, $\gamma_{1,2}^{sh}=\left(-0.058,-0.199\right)$. High-noise case: $f_{1,2}^{sh}=\left(14.5, 2.36 \right)$, $\gamma_{1,2}^{sh}=\left(0.025,-0.159\right)$. Orange lines represent $\boldsymbol{u}$, whereas black lines stand for $\boldsymbol{r}$, according to Eq.~\eqref{eq:add_noise}. On the right side, a closer view for each left side plot is reported.\label{fig:signal_flex}} \end{figure} \newpage \begin{figure}[h!] 
\captionsetup[subfigure]{justification=centering} \centering \vspace{-0.25 cm} \subfloat[{1-st floor\label{fig:signal_43_0_1_axial}}]{\includegraphics[scale=0.325]{43_0_1_axial.pdf}} $~$ \subfloat[{1-st floor\label{fig:signal_43_0_1_axial_enlarg}}]{\includegraphics[scale=0.325]{43_0_1_axial_enlarg.pdf}} \\ \vspace{-0.25 cm} \subfloat[{4-th floor\label{fig:signal_43_0_4_axial}}]{\includegraphics[scale=0.325]{43_0_4_axial.pdf}} $~$ \subfloat[{4-th floor\label{fig:signal_43_0_4_axial_enlarg}}]{\includegraphics[scale=0.325]{43_0_4_axial_enlarg.pdf}} \\ \vspace{-0.25 cm} \subfloat[{8-th floor\label{fig:signal_43_0_8_axial}}]{\includegraphics[scale=0.325]{43_0_8_axial.pdf}} $~$ \subfloat[{8-th floor\label{fig:signal_43_0_8_axial_enlarg}}]{\includegraphics[scale=0.325]{43_0_8_axial_enlarg.pdf}} \\ \vspace{-0.1 cm} \subfloat[{1-st floor\label{fig:signal_44_0_1_axial}}]{\includegraphics[scale=0.325]{44_0_1_axial.pdf}} $~$ \subfloat[{1-st floor\label{fig:44_0_1_axial_enlarg}}]{\includegraphics[scale=0.325]{44_0_1_axial_enlarg.pdf}} \\ \vspace{-0.25 cm} \subfloat[{4-th floor\label{fig:signal_44_0_4_axial}}]{\includegraphics[scale=0.325]{44_0_4_axial.pdf}} $~$ \subfloat[{4-th floor\label{fig:signal_44_0_4_axial_enlarg}}]{\includegraphics[scale=0.325]{44_0_4_axial_enlarg.pdf}} \\ \vspace{-0.25 cm} \subfloat[{8-th floor\label{fig:signal_44_0_8_axial}}]{\includegraphics[scale=0.325]{44_0_8_axial.pdf}} $~$ \subfloat[{8-th floor\label{fig:signal_44_0_8_axial_enlarg}}]{\includegraphics[scale=0.325]{44_0_8_axial_enlarg.pdf}} \caption{Example of time evolutions of $z$ displacements for stories $1, 4, 8$ with SNR$=15$ dB (from \ref{fig:signal_43_0_1_axial} to \ref{fig:signal_43_0_8_axial_enlarg}) and SNR$=10$ dB (from \ref{fig:signal_44_0_1_axial} to \ref{fig:signal_44_0_8_axial_enlarg}), undamaged state. Low-noise case: $f_{1,2}^{ax}=\left(32.8, 28.2 \right)$, $\gamma_{1,2}^{ax}=\left(1.38,1.38\right)$. 
High-noise case: $f_{1,2}^{ax}=\left(15.5, 22.0 \right)$, $\gamma_{1,2}^{ax}=\left(1.133,-1.140\right)$. Orange lines represent $\boldsymbol{u}$, whereas black lines stand for $\boldsymbol{r}$, according to Eq.~\eqref{eq:add_noise}. On the right side, a closer view for each left side plot is reported.\label{fig:signal_axial}} \end{figure} \newpage \begin{figure}[h!] \captionsetup[subfigure]{justification=centering} \centering \vspace{-0.1cm} \subfloat[{undamaged scenario\label{fig:signal_44_0_8_flex_enlarg_2}}]{\includegraphics[scale=0.325]{44_0_8_flex_enlarg.pdf}} \vspace{-0.15cm}\\ \subfloat[{damaged scenario 1\label{fig:comp_signal_44_1_8_flex_enlarg}}]{\includegraphics[scale=0.325]{comp_44_1_8_flex_enlarg.pdf}} $~$ \subfloat[{damaged scenario 2\label{fig:comp_signal_44_2_8_flex_enlarg}}]{\includegraphics[scale=0.325]{comp_44_2_8_flex_enlarg.pdf}} \vspace{-0.15cm}\\ \subfloat[{damaged scenario 3\label{fig:comp_signal_44_3_8_flex_enlarg}}]{\includegraphics[scale=0.325]{comp_44_3_8_flex_enlarg.pdf}} $~$ \subfloat[{damaged scenario 4\label{fig:comp_44_4_8_flex_enlarg}}]{\includegraphics[scale=0.325]{comp_44_4_8_flex_enlarg.pdf}}\vspace{-0.15cm} \\ \subfloat[{damaged scenario 5\label{fig:comp_signal_44_5_8_flex_enlarg}}]{\includegraphics[scale=0.325]{comp_44_5_8_flex_enlarg.pdf}} $~$ \subfloat[{damaged scenario 6\label{fig:comp_signal_44_6_8_flex_enlarg}}]{\includegraphics[scale=0.325]{comp_44_6_8_flex_enlarg.pdf}}\vspace{-0.15cm} \\ \subfloat[{damaged scenario 7\label{fig:comp_signal_44_7_8_flex_enlarg}}]{\includegraphics[scale=0.325]{comp_44_7_8_flex_enlarg.pdf}} $~$ \subfloat[{damaged scenario 8\label{fig:comp_signal_44_8_8_flex_enlarg}}]{\includegraphics[scale=0.325]{comp_44_8_8_flex_enlarg.pdf}} \caption{Example of time evolutions of displacements in the $x$ direction of the $8$-th story for SNR$=10$ dB, with $f_{1,2}^{sh}=\left(14.5, 2.36 \right)$, $\gamma_{1,2}^{sh}=\left(0.025,-0.159\right)$, in the undamaged scenario (\ref{fig:signal_44_0_8_flex_enlarg_2}) 
and all possible damage scenarios (\ref{fig:comp_signal_44_1_8_flex_enlarg}-\ref{fig:comp_signal_44_8_8_flex_enlarg}). Orange lines represent $\boldsymbol{u}$, whereas black lines stand for $\boldsymbol{r}$, according to Eq.~\eqref{eq:add_noise}. To show the effects of damage on the structural dynamics, the black dotted lines in \ref{fig:comp_signal_44_1_8_flex_enlarg}-\ref{fig:comp_signal_44_8_8_flex_enlarg} report the noise-free structural dynamics related to the undamaged scenario.\label{fig:damaged_signal_flex}} \end{figure} To build the dataset required for the NN training, the procedure described so far has been adopted for all the damage scenarios. Fig.~\ref{fig:damaged_signal_flex} and Fig.~\ref{fig:damaged_signal_axial} respectively show the effects of damage on ${u}^{sh}_8$ and ${u}^{ax}_8$, highlighting the sensitivity of these outputs to the damage state. To better highlight this sensitivity, the time evolutions in Fig.~\ref{fig:damaged_signal_flex} and Fig.~\ref{fig:damaged_signal_axial} are provided for $I= \left[0,2.5\right]$s and $I= \left[0,0.25\right]$s only, even though $I= \left[0,10\right]$s and $I=\left[0,1\right]$s are used for the NN training. Drifts from the responses relevant to the undamaged case can be observed when the damage scenarios refer to the stiffness reduction of the lowest stories; however, it looks nearly impossible, in general, to perform any classification of the damage scenarios without an effectively trained classifier. \begin{figure}[h!] 
\captionsetup[subfigure]{justification=centering} \centering \vspace{-0.185cm} \subfloat[{undamaged scenario\label{fig:signal_44_0_8_axial_enlarg_2}}]{\includegraphics[scale=0.32]{44_0_8_axial_enlarg.pdf}} \vspace{-0.185cm} \\ \subfloat[{damaged scenario 1\label{fig:comp_signal_44_1_8_axial_enlarg}}]{\includegraphics[scale=0.32]{comp_44_1_8_axial_enlarg.pdf}} $~$ \subfloat[{damaged scenario 2\label{fig:comp_signal_44_1_8_axial_enlarg}}]{\includegraphics[scale=0.32]{comp_44_2_8_axial_enlarg.pdf}} \vspace{-0.185cm} \\ \subfloat[{damaged scenario 3\label{fig:comp_signal_44_2_8_axial_enlarg}}]{\includegraphics[scale=0.32]{comp_44_3_8_axial_enlarg.pdf}} $~$ \subfloat[{damaged scenario 4\label{fig:comp_44_0_8_axial_enlarg}}]{\includegraphics[scale=0.32]{comp_44_4_8_axial_enlarg.pdf}}\vspace{-0.185cm} \\ \subfloat[{damaged scenario 5\label{fig:comp_signal_44_1_8_axial_enlarg}}]{\includegraphics[scale=0.32]{comp_44_5_8_axial_enlarg.pdf}} $~$ \subfloat[{damaged scenario 6\label{fig:comp_signal_44_1_8_axial_enlarg}}]{\includegraphics[scale=0.32]{comp_44_6_8_axial_enlarg.pdf}} \vspace{-0.185cm} \\ \subfloat[{damaged scenario 7\label{fig:comp_signal_44_1_8_axial_enlarg}}]{\includegraphics[scale=0.32]{comp_44_7_8_axial_enlarg.pdf}} $~$ \subfloat[{damaged scenario 8\label{fig:comp_signal_44_8_8_axial_enlarg}}]{\includegraphics[scale=0.32]{comp_44_8_8_axial_enlarg.pdf}} \vspace{-0.15cm} \caption{Examples of time evolutions of displacements in the $z$ direction of the $8$-th story for SNR$=10$ dB, with $f_{1,2}^{ax}=\left(15.5, 22.0 \right)$, $\gamma_{1,2}^{ax}=\left(1.133,-1.140\right)$, in the undamaged scenario (\ref{fig:signal_44_0_8_axial_enlarg_2}) and all possible damage scenarios (\ref{fig:comp_signal_44_1_8_axial_enlarg}-\ref{fig:comp_signal_44_8_8_axial_enlarg}). Orange lines represent $\boldsymbol{u}$, whereas black lines stand for $\boldsymbol{r}$, according to Eq.~\eqref{eq:add_noise}. 
To show the effects of damage on the structural dynamics, the black dotted lines in \ref{fig:comp_signal_44_1_8_axial_enlarg}-\ref{fig:comp_signal_44_8_8_axial_enlarg} report the noise-free structural dynamics related to the undamaged scenario.\label{fig:damaged_signal_axial}} \vspace{-0.25cm} \end{figure} \subsubsection{Case 2 (white noise load case)} \label{sec:load_case_2} In the second load case we have accounted for random vibrations caused e.g. by low-energy seismicity \cite{art:AVT}. The loads $\boldsymbol{l} = [\boldsymbol{l}^{sh}, \boldsymbol{l}^{ax}]$ applied at each floor $i=1,\ldots,8$ and at each time instant are obtained by first sampling values from a normal distribution $\mathcal{N}\left(0, 10^4 \right)$ and then low-pass filtering them with a ``roll-off'' set between frequencies $f_{min}$ and $f_{max}$. Two different scenarios have been considered for the frequency range of the applied excitations: $f_{min}=15$ and $f_{max}=17$ Hz; $f_{min}=5$ and $f_{max}=7$ Hz. In the first case all the shear modes and the first axial mode have been excited; in the second case, just the first three shear modes and no axial frequencies have been excited, see Tab.~\ref{tab:eigen}. Fig.~\ref{fig:vibrations_15_17} and Fig.~\ref{fig:vibrations_5_7} respectively provide an overview of the simulated forces for the two cases. \begin{figure}[h!] \captionsetup[subfigure]{justification=centering} \centering \subfloat[{\label{fig:signal_15_17_time_flex}}]{\includegraphics[scale=0.3]{15_17_time_flex.pdf}} $~$ \subfloat[{\label{fig:signal_15_17_power_flex}}]{\includegraphics[scale=0.3]{15_17_power_flex.pdf}} \\
%\vspace{-0.35cm}
\subfloat[{\label{fig:signal_15_17_time_axial}}]{\includegraphics[scale=0.3]{15_17_time_axial.pdf}} $~$ \subfloat[{\label{fig:15_17_power_axial}}]{\includegraphics[scale=0.3]{15_17_power_axial.pdf}} \\ \caption{White noise load case, $f_{min}=15$ and $f_{max}=17$ Hz. 
Time evolutions (left column) and Power Spectral Density (right column) of the forces applied to all the building stories in $x$ (first row) and $z$ direction (second row).\label{fig:vibrations_15_17}} \end{figure} \begin{figure}[h!] \captionsetup[subfigure]{justification=centering} \centering \subfloat[{\label{fig:signal_5_7_time_flex}}]{\includegraphics[scale=0.3]{5_7_time_flex.pdf}} $~$ \subfloat[{\label{fig:signal_5_7_power_flex}}]{\includegraphics[scale=0.3]{5_7_power_flex.pdf}} \\
%\vspace{-0.35cm}
\subfloat[{\label{fig:signal_5_7_time_axial}}]{\includegraphics[scale=0.3]{5_7_time_axial.pdf}} $~$ \subfloat[{\label{fig:5_7_power_axial}}]{\includegraphics[scale=0.3]{5_7_power_axial.pdf}} \\ \caption{White noise load case, $f_{min} = 5$ and $f_{max} = 7$ Hz. Time evolutions (left column) and Power Spectral Density (right column) of the forces applied to all the building stories in $x$ (first row) and $z$ direction (second row).\label{fig:vibrations_5_7}} \end{figure} \subsubsection{Dataset composition and NN training} \label{sec:dataset_composition} We now detail the construction of the employed datasets and the NN training phase. Each of the two classifiers has been trained on a different dataset, made up of instances generated by evaluating the physics-based model for different loading and damage conditions. Each instance is made up of $N_0 = 16$ time series recordings of displacements (in two directions, for each of the $8$ floors) of length $L_0 = 667$. Two global datasets $\mathbb{D}^d$ and $\mathbb{D}^l$, made up of $V=4608$ instances each, have been generated and then split into a training, a validation and a testing set, thus yielding $\mathbb{D}^d = \mathbb{D}^d_{train} \cup \mathbb{D}^d_{val} \cup \mathbb{D}^d_{test}$ and $\mathbb{D}^l = \mathbb{D}^l_{train} \cup \mathbb{D}^l_{val} \cup \mathbb{D}^l_{test}$, with $V = V^{train} + V^{val} + V^{test}$ in both cases. 
%Both $\mathbb{D}^d$ and $\mathbb{D}^l$ are made by $V=4608$ instances, but we have not considered $\mathbb{D}_d \equiv \mathbb{D}_l$, according to each of them a different composition. As previously stated, numerically simulated signals will be used also in the online phase.}
For the splitting of the dataset $\mathbb{D}^d$ into training $\mathbb{D}^d_{train}$, validation $\mathbb{D}^d_{val}$ and test $\mathbb{D}^d_{test}$ sets, no specific rules are available, and only some heuristics can be used -- see, e.g., \cite{book:Haykin}. We have thus employed $75\%$ of $V$ to train and validate the NN ($V^{train}$ and $V^{val}$), and the remaining $25\%$ ($V^{test}$) to test it. Within the first subset, $75\%$ of the instances have been in turn allocated for training, and the remaining $25\%$ for validation. The final dataset subdivision then reads: $V^{train}=56.25\% V$, $V^{val}=18.75\% V$, and $V^{test}=25\%V$. The splitting of $\mathbb{D}^l$ has been done identically. The large number of instances employed for validation and test has allowed us to perform a robust assessment of the NN generalization capabilities. This has been done without limiting the information content that can be employed for the NN training; in fact, the dataset dimensions can be arbitrarily enlarged, if necessary, through a synthetic generation of new instances, still keeping the same subdivision. During the training, an equal number of instances $V_g^{train} = V^{train} / G$ related to each damage scenario $g=0,\ldots, 8$ (the undamaged case has been considered, too, in addition to the $ G=8$ possible cases of damage) has been provided to the NN, to avoid the construction of a biased dataset $\mathbb{D}^d_{train}$; the same has been done for $\mathbb{D}^l_{train}$. In this way, we prevent the NN from being biased towards the class labels that have been presented more frequently in the training stage. 
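As a worked check of the subdivision described above, and taking the value $V=4608$ quoted for the two global datasets, the nested $75\%/25\%$ split gives
\[
V^{train} = 0.75 \times 0.75 \times 4608 = 2592, \qquad
V^{val} = 0.75 \times 0.25 \times 4608 = 864, \qquad
V^{test} = 0.25 \times 4608 = 1152,
\]
which sum to $2592 + 864 + 1152 = 4608 = V$, as required. These instance counts are derived here for illustration only and are not reported explicitly in the text; for other dataset sizes they follow from the same percentages.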
There are no specific rules to set $V_g^{train}$ (and, therefore, the overall dimension $V_g = V/G$ of simulated cases for each damage scenario) a priori. Only a few theoretical studies provide some recommendations for specific cases, see, e.g., \cite{art:dataset_size}; however, they are not applicable to FCNs. In general, the problem complexity and the employed NN architecture must be taken into account on a case-by-case basis. For this reason, we have evaluated the accuracies $A_d$ and $A_{l}$ of the classifiers $\mathcal{G}_{d}$ and $\mathcal{G}_{l}$ on the validation sets $\mathbb{D}^d_{val}$ and $\mathbb{D}^l_{val}$, and the training time at varying $V^{train}_g$. We have then chosen the best dataset size according to a tradeoff between the two aforementioned indicators, keeping in mind that the time required to generate a dataset and the time required to train the NN both scale linearly with $V^{train}_g$. The $\mathcal{G}_{d}$ classifier accuracy is defined as the ratio $A_d = {V_{\star}^{val}}/{V^{val}}$, where $V_{\star}^{val}$ is the number of instances of $\mathbb{D}^d_{val}$ which are correctly classified by $\mathcal{G}_{d}$; the $\mathcal{G}_{l}$ classifier accuracy $A_l$ is defined in a similar way. 
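The accuracy definition above can equivalently be written as an average of correct-classification indicators. In the sketch below, $y_v$ and $\hat{y}_v$ denote the true and the predicted label of the $v$-th validation instance, and $\delta$ is the Kronecker delta; this notation is introduced here for illustration only and is not taken from the rest of the paper:
\[
A_d = \frac{V_{\star}^{val}}{V^{val}} = \frac{1}{V^{val}} \sum_{v=1}^{V^{val}} \delta_{\hat{y}_v,\, y_v} .
\]
The same expression, evaluated with the labels and predictions of $\mathcal{G}_{l}$, gives $A_l$.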
Let us now see how we have determined the overall dataset size $V$ by applying the heuristic approach previously discussed. In Fig.~\ref{fig:param_dataset}, the accuracy $A_{l}$ at varying values of $V_g$ is reported for load case 1.
%To increase the dimension of the dataset, we have doubled $V_g$ (from $256$ to $512$, from $512$ to $1024$), and, at a later time, we checked the observed trend by evaluating intermediate cases ($V_g=384$ and $V_g=720$).
\begin{figure}[h!] \centerline{ \includegraphics[scale=0.3]{param_dataset.pdf} } \caption{Damage localization, case 1. Dependence on $V_g$ of the accuracy $A_l$ of the classifier $\mathcal{G}_l$.\label{fig:param_dataset}} \vspace{-0.15cm} \end{figure} By increasing $V_g$ from $256$ to $384$, $A_{l}$ increases markedly, while a further increase yields a smaller gain in accuracy. The non-monotonic variation of $A_{l}$ with respect to $V_g$ is due to the randomness of the procedure, and in particular to the initialization of the weights of the convolutional filters. 
For the above reasons, we have adopted $V_g = 512$ during the training phase. For the damage detection task in case 1, a total of $V = 9216$ instances have been generated. Half of the instances refer to undamaged conditions, half to damaged conditions. Each damage scenario is equally represented ($V_g = 512$ instances each). For the damage localization task, $V = 4608$ and $V_g = 512$ (including the undamaged case $g=0$). Still adopting the discussed heuristic criterion for the determination of the overall dataset dimension, $V = 4096$ has been used for the damage detection task when the white noise load case is treated. Once again, half of the instances refer to undamaged conditions, half to damaged conditions. Each damage scenario is equally represented ($V_g = 128$ instances each). For the damage localization task, $V = 4608$ and $V_g = 128$ (including the undamaged case $g=0$).
\subsection{Classification outcomes}
We now report the numerical results obtained for the two load cases, and for the two required tasks of damage detection and damage localization. The classification outcomes are affected by the NN architecture, which has either one or two convolutional branches depending on whether the horizontal and vertical sensing are both considered or not. In particular, when treating the damage localization task in the presence of the white noise load condition, we also assess the impact of each input channel $\mathcal{F}^{n}_0$ on the overall NN accuracy.
Useful indications about the quality of the training can be derived from the behavior of the loss functions $J_d\left(\boldsymbol{Y},\boldsymbol{p}\right)$ and $J_l\left(\boldsymbol{Y},\boldsymbol{p}\right)$ -- see Eq.~\eqref{eq:cross_entropy} -- of $\mathcal{G}_d$ and $\mathcal{G}_l$, and of the accuracies $A_d$ and $A_l$ on the training and validation sets ($\mathbb{D}^d_{train}$ and $\mathbb{D}^d_{val}$ for $\mathcal{G}_d$; $\mathbb{D}^l_{train}$ and $\mathbb{D}^l_{val}$ for $\mathcal{G}_l$) as a function of the number of iterations. The latter depends on both the number of epochs and the minibatch size chosen for the training\footnote{\footnotesize In other words, if the dataset is composed of $100$ instances and a minibatch size of $10$ instances is adopted, after the first epoch the iteration number is equal to $10$.}. To evaluate the NN performance, the adopted indices are still $A_d$ and $A_l$, yet evaluated on $\mathbb{D}_{test}^d$ and $\mathbb{D}^l_{test}$. These indices are always compared against the ones produced by a random guess, equal to $0.5$ for $\mathcal{G}_d$, and to $1/9\approx 0.111$ for $\mathcal{G}_{l}$. For the damage localization case, the misclassification is measured by a confusion matrix in which the rows correspond to the target classes and the columns to the NN predictions.
\subsubsection{Damage detection and localization in case 1 - sinusoidal load case}
\label{sec:dam_det_sin_load}
In Tab.~\ref{tab:dam_det_sin_load_tab} the accuracies $A_d$ of $\mathcal{G}_d$ on $\mathbb{D}^d_{test}$ for the two considered noise levels (SNR$=15$ dB and SNR$=10$ dB) are reported. NN architectures with both one and two convolutional branches have been tested.
\begin{table}[h!]
\centering
\begin{tabular}{*3c}
\hline
\parbox{2 cm}{SNR (dB)} & \parbox{3cm}{\centering $\lbrace\mathcal{F}_{*} \rbrace$} & $A_{d}$ \\
\hline
$15$ & $\{\boldsymbol{u}^{sh}_i\}_{i=1}^8$ & $0.814$ \\
$15$ & $\{\boldsymbol{u}^{ax}_i\}_{i=1}^8$ & $0.850$ \\
$15$ & $\{\boldsymbol{u}^{sh}_i\}_{i=1}^8$ and $\{\boldsymbol{u}^{ax}_i\}_{i=1}^8$ & $0.879$ \\
$10$ & $\{\boldsymbol{u}^{sh}_i\}_{i=1}^8$ & $0.768$ \\
$10$ & $\{\boldsymbol{u}^{ax}_i \}_{i=1}^8$ & $0.775$ \\
$10$ & $\{\boldsymbol{u}^{sh}_i\}_{i=1}^8$ and $\{\boldsymbol{u}^{ax}_i\}_{i=1}^8$ & $0.765$ \\
\bottomrule
\end{tabular}
\caption{Damage detection, case 1. Accuracy $A_{d}$ of the classifier $\mathcal{G}_d$ evaluated on $\mathbb{D}^d_{test}$. \label{tab:dam_det_sin_load_tab}}
\end{table}
The classifier $\mathcal{G}_d$ reaches $A_d = 0.879$ for SNR$=15$ dB and $A_d = 0.775$ for SNR$=10$ dB. These outcomes, obtained on high-noise datasets, show the potential of the proposed approach for real engineering applications. Indeed, noise is a principal concern, especially when pervasive and low-cost microelectromechanical systems (MEMS) sensor networks are employed \cite{art:low-cost_MEMS}, so that the possibility of handling it through FCNs may foster the application of MEMS networks. Moreover, thanks to our procedure, we have been able to avoid the data pre-processing required by any ML approach based on problem-specific features. Fig.~\ref{fig:sin_det} reports the evolution of the training and validation loss for the datasets with SNR$=15$ dB and SNR$=10$ dB. The iteration number accounts for the number of times the NN weights %$\boldsymbol{w}^{\left(i,j \right)}$ and $\boldsymbol{\theta}_g$
are modified during the training process. The depicted training and validation loss functions refer to the case in which a two-branch convolutional architecture has been employed to detect damage.
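As a sketch of how the iteration number relates to the training settings, assume (the symbols $n_{it}$, $n_{ep}$ and $B$ are introduced here only for illustration) that $n_{ep}$ epochs are run with minibatches of $B$ instances, and that a final, possibly incomplete, minibatch still triggers a weight update. Under these assumptions, the number of iterations reads
\[
n_{it} = n_{ep} \left\lceil \frac{V^{train}}{B} \right\rceil ,
\]
so that a dataset of $100$ training instances with $B=10$ and $n_{ep}=1$ gives $n_{it}=10$.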
The several spikes observed in both the loss and accuracy graphs are due to the stochastic nature of the training algorithm. During the early stages of the training, the NN displays the most significant gains in terms of classification accuracy, while further increasing the number of iterations only yields a limited effect on the generalization capabilities of the NN. Due to the lack of improvements, the early-stopping criterion has finally stopped the training.
\begin{figure}[h!]
\captionsetup[subfigure]{justification=centering}
%\begin{framed}
\centering
\subfloat[SNR=15 dB\label{fig:loss_sin_det_15_dB_34_1}]{\includegraphics[scale=0.5]{loss_sin_det_15_dB_34_1.pdf}}
\subfloat[SNR=15 dB\label{fig:acc_sin_det_15_dB_34_1}]{\includegraphics[scale=0.5]{acc_sin_det_15_dB_34_1.pdf}}
\vspace{-0.2cm} \\
\subfloat[SNR=10 dB\label{fig:loss_sin_det_10_dB_33_1}]{\includegraphics[scale=0.5]{loss_sin_det_10_dB_33_1.pdf}}
\subfloat[SNR=10 dB\label{fig:acc_sin_det_10_dB_33_1}]{\includegraphics[scale=0.5]{acc_sin_det_10_dB_33_1.pdf}}
\vspace{-0.1cm}
\caption{Damage detection, case 1. Training and validation of the two-branch convolutional architecture: evolution of the loss $J_d \left(\boldsymbol{Y}, \boldsymbol{p} \right)$ on $\mathbb{D}^d_{train}$ and $\mathbb{D}^d_{val}$ (left column), and of $\mathcal{G}_d$ accuracy $A_d$ (right column) on $\mathbb{D}^d_{train}$ and on $\mathbb{D}^d_{val}$, both for the SNR$=15$ dB case (top row) and for the SNR$=10$ dB case (bottom row). \label{fig:sin_det}}
\vspace{-0.1cm}
%\end{framed}
\end{figure}
Moving to the damage localization task, Tab.~\ref{tab:dam_id_sin_load_tab} collects the accuracies of $\mathcal{G}_l$ on $\mathbb{D}^l_{test}$ obtained for two different noise levels.
\begin{table}[h!]
\centering
\begin{tabular}{*3c}
\hline
\parbox{2 cm}{SNR (dB)} & \parbox{3cm}{\centering $\lbrace\mathcal{F}_{*} \rbrace$} & $A_{l}$ \\
\hline
$15$ & $\{\boldsymbol{u}^{sh}_i\}_{i=1}^8$ & $0.768$ \\
$15$ & $\{\boldsymbol{u}^{ax}_i\}_{i=1}^8$ & $0.769$ \\
$15$ & $\{\boldsymbol{u}^{sh}_i\}_{i=1}^8$ and $\{\boldsymbol{u}^{ax}_i\}_{i=1}^8$ & $0.812$ \\
$10$ & $\{\boldsymbol{u}^{sh}_i\}_{i=1}^8$ & $0.654$ \\
$10$ & $\{\boldsymbol{u}^{ax}_i\}_{i=1}^8 $ & $0.642$ \\
$10$ & $\{\boldsymbol{u}^{sh}_i\}_{i=1}^8$ and $\{\boldsymbol{u}^{ax}_i\}_{i=1}^8$ & $0.707$ \\
\bottomrule
\end{tabular}
\caption{Damage localization, case 1. Accuracy $A_{l}$ of the classifier $\mathcal{G}_l$ evaluated on $\mathbb{D}^l_{test}$. \label{tab:dam_id_sin_load_tab}}
\vspace{-0.25cm}
\end{table}
The results show that the NN performance benefits from the use of a two-branch architecture: $A_{l}$ increases, compared to the best outcome of the single-branch architecture, from $0.769$ to $0.812$ for the SNR$=15$ dB case, and from $0.654$ to $0.707$ for the SNR$=10$ dB case. This means that the NN has succeeded in performing a data fusion of the extracted information for the sake of classification. The values of $A_{d}$ and $A_{l}$ are quite close, despite the greater complexity of the damage localization problem; this might be due to the intrinsic capability of the FCN to detect correlations between different sensor recordings, allowing us to perform a correct damage localization. Fig.~\ref{fig:sin_id} reports the evolution of the training and validation loss functions on $\mathbb{D}^l_{train}$ and $\mathbb{D}^l_{val}$ for the datasets with SNR$=15$ dB and SNR$=10$ dB, in the case where a two-branch convolutional architecture has been employed. Compared with Fig.~\ref{fig:sin_det}, a smaller difference between training and validation, in terms of loss and accuracy, can be observed.
This is due to the greater complexity of the damage localization task, which requires exploiting the computational resources of the NN entirely. Indeed, the same numbers of filters $N_1$, $N_2$ and $N_3$ have been used for both classification tasks, despite their different complexity. On the other hand, we expect that $A_d$ on $\mathbb{D}^d_{test}$, reported in Tab.~\ref{tab:dam_det_sin_load_tab}, would not be affected by reducing the number of filters. This conclusion can be reached by looking at Fig.~\ref{fig:sin_det} and observing that, during the last stages of the training, $A_d$ on $\mathbb{D}^d_{train}$ is always greater than the one obtained on $\mathbb{D}^d_{val}$.
\begin{figure}[h!]
\captionsetup[subfigure]{justification=centering}\centering
\subfloat[SNR=15 dB\label{fig:loss_sin_id_15_dB_49_1}]{\includegraphics[scale=0.5]{loss_sin_id_15_dB_49_1.pdf}}
\subfloat[SNR=15 dB\label{fig:acc_sin_id_15_dB_49_1}]{\includegraphics[scale=0.5]{acc_sin_id_15_dB_49_1.pdf}}
\vspace{-0.25cm} \\
\subfloat[SNR=10 dB\label{fig:loss_sin_id_10_dB_50_1}]{\includegraphics[scale=0.5]{loss_sin_id_10_dB_50_1.pdf}}
\subfloat[SNR=10 dB\label{fig:acc_sin_id_10_dB_50_1}]{\includegraphics[scale=0.5]{acc_sin_id_10_dB_50_1.pdf}}
\vspace{-0.1cm}
\caption{Damage localization, case 1. Training and validation of the two-branch convolutional architecture: evolution of the loss $J_l \left(\boldsymbol{Y}, \boldsymbol{p} \right)$ on $\mathbb{D}^l_{train}$ and on $\mathbb{D}^l_{val}$ (left column), and of $\mathcal{G}_l$ accuracy $A_l$ (right column) on $\mathbb{D}^l_{train}$ and on $\mathbb{D}^l_{val}$, both for the SNR$=15$ dB case (top row) and for the SNR$=10$ dB case (bottom row).\label{fig:sin_id}}\vspace{-0.25cm}
\end{figure}
In Fig.~\ref{fig:conf_sin_id} the confusion matrices related to the two datasets (SNR$=15$ dB and SNR$=10$ dB) are reported.
\begin{figure}[t!]
\vspace{-0.25cm}
\captionsetup[subfigure]{justification=centering}
\centering
\subfloat[SNR = 15 dB\label{fig:conf_sin_id_15_dB_49_1}]{\includegraphics[scale=0.4]{conf_sin_id_15_dB_49_1.pdf}}
\subfloat[SNR = 10 dB\label{fig:conf_sin_id_10_dB_50_1}]{\includegraphics[scale=0.4]{conf_sin_id_10_dB_50_1.pdf}}
\vspace{-0.2cm}
\caption{Damage localization, case 1. Confusion matrices for the $15$ dB (left picture) and $10$ dB (right picture) SNR datasets.\label{fig:conf_sin_id}}
\vspace{-0.1cm}
\end{figure}
Most of the errors concern the classification of the damage scenarios in which the inter-story stiffness of the highest floors has been reduced, as shown by the entries of the $7$-th and $8$-th rows and columns of the matrices. This outcome is not surprising if we consider that these damage scenarios only induce small variations in the shear frequencies. Moreover, by looking at Figs.~\ref{fig:damaged_signal_flex} and \ref{fig:damaged_signal_axial}, we can remark that the time evolution of the structural motions under these damage scenarios cannot be easily distinguished from the undamaged case.
\subsubsection{Damage detection and localization in case 2}
We now consider the outcomes of the trained classifiers in the case where a random disturbance is applied to the structural system. Regarding the damage detection task, with this type of excitation the NN is able to distinguish between undamaged and damaged instances almost perfectly (see Tab.~\ref{tab:dam_det_noise_load_tab}). Indeed, $A_d=0.999$ and $A_d=0.998$ have been reached by the two-branch convolutional architecture when $f_{min}=15$ and $f_{max}=17$ Hz, or $f_{min}=5$ and $f_{max}=7$ Hz, respectively, have been selected as frequency ranges for the applied lateral and vertical forces.
\begin{table}[h!]
\centering
\begin{tabular}{*3c}
\hline
\parbox{2cm}{\centering $f_{min}$ - $f_{max}$ (Hz)} & \parbox{3cm}{\centering $\lbrace\mathcal{F}_{*} \rbrace$} & $A_d$ \\
\hline
$15-17$ & $\{\boldsymbol{u}^{sh}_i\}_{i=1}^8$ & $0.998$ \\
$15-17$ & $\{\boldsymbol{u}^{ax}_i\}_{i=1}^8$ & $0.997$ \\
$15-17$ & $\{\boldsymbol{u}^{sh}_i\}_{i=1}^8$ and $\{\boldsymbol{u}^{ax}_i\}_{i=1}^8$ & $0.999$ \\
$5-7$ & $\{\boldsymbol{u}^{sh}_i\}_{i=1}^8$ & $0.996$ \\
$5-7$ & $\{\boldsymbol{u}^{ax}_i\}_{i=1}^8$ & $0.892$ \\
$5-7$ & $\{\boldsymbol{u}^{sh}_i\}_{i=1}^8$ and $\{\boldsymbol{u}^{ax}_i\}_{i=1}^8$ & $0.998$ \\
\bottomrule
\end{tabular}
\caption{Damage detection, case 2. Accuracy $A_{d}$ of the classifier $\mathcal{G}_d$ evaluated on $\mathbb{D}^d_{test}$. \label{tab:dam_det_noise_load_tab}}
\end{table}
We next consider the NN outcomes for the damage localization task. With this type of excitation, the NN is able to accomplish an extremely accurate classification of the damaged scenarios, reaching $A_{l}=0.986$ and $A_{l}=0.993$ when $f_{min}=15$ and $f_{max}=17$ Hz or $f_{min}=5$ and $f_{max}=7$ Hz have been used, respectively. In the former case, the best classification performance has been obtained by the two-branch convolutional architecture, as shown in Tab.~\ref{tab:dam_id_noise_load_tab}. For the latter case, the NN employing as input $\mathcal{F}_{*} = \{{u}^{sh}_i\}_{i=1}^8$ provides the best classification result. The better performance of the NN employing $\mathcal{F}_{*} = \{{u}^{sh}_i\}_{i=1}^8$ rather than $\mathcal{F}_{*} = \{{u}^{ax}_i\}_{i=1}^8$ is likely due to the fact that, in the latter case, no axial frequencies have been excited by the applied load, as remarked in \nameref{sec:load_case_2}. However, this fact also shows that the data fusion operated by the two-branch convolutional architecture has been only partially able to select the most important information required for the damage localization task.
Nevertheless, very good results have also been reached by employing $\mathcal{F}_{*} = \{{u}^{ax}_i\}_{i=1}^8$ (see Tab.~\ref{tab:dam_id_noise_load_tab}).
\begin{figure}[h!]
\captionsetup[subfigure]{justification=centering}
\centering
\vspace{-0.2 cm}
\subfloat[$N_0=1$; $i=8$\label{fig:conf_noise_id_5_7_42_5_flex_1_ch}]{\includegraphics[scale=0.4]{conf_noise_id_5_7_42_5_flex_1_ch.pdf}}
\subfloat[$N_0=2$; $i=$($4,8$)\label{fig:conf_noise_id_5_7_42_6_flex_2_ch}]{\includegraphics[scale=0.4]{conf_noise_id_5_7_42_6_flex_2_ch.pdf}} \\
\vspace{-0.4cm}
\subfloat[$N_0=3$; $i=$($2,4,8$)\label{fig:conf_noise_id_5_7_42_7_flex_3_ch}]{\includegraphics[scale=0.4]{conf_noise_id_5_7_42_7_flex_3_ch.pdf}}
\subfloat[$N_0=4$; $i=$($2,4,6,8$)\label{fig:conf_noise_id_5_7_42_4_flex_4_ch}]{\includegraphics[scale=0.4]{conf_noise_id_5_7_42_4_flex_4_ch.pdf}} \\
\vspace{-0.4cm}
\subfloat[$N_0=5$; $i=$($2,4,6,7,8$)\label{fig:conf_noise_id_5_7_42_8_flex_5_ch}]{\includegraphics[scale=0.4]{conf_noise_id_5_7_42_8_flex_5_ch.pdf}}
\subfloat[$N_0=6$; $i=$($2,4,5,6,7,8$)\label{fig:conf_noise_id_5_7_42_9_flex_6_ch}]{\includegraphics[scale=0.4]{conf_noise_id_5_7_42_9_flex_6_ch.pdf}} \\
\vspace{-0.4cm}
\subfloat[$N_0=7$; $i=$($2,3,4,5,6,7,8$)\label{fig:conf_noise_id_5_7_42_10_flex_7_ch}]{\includegraphics[scale=0.4]{conf_noise_id_5_7_42_10_flex_7_ch.pdf}}
\subfloat[$N_0=8$; $i=$($1,2,3,4,5,6,7,8$)\label{fig:conf_noise_id_5_7_42_3_flex}]{\includegraphics[scale=0.4]{conf_noise_id_5_7_42_3_flex.pdf}} \\
\caption{$\mathcal{G}_{l}$ confusion matrices for different numbers $N_0$ of input channels $\mathcal{F}^{i}_{*}$. Case 2, $f_{min}=5$ and $f_{max}=7$ Hz.\label{fig:conf_noise_id}}
\end{figure}
\begin{table}[h!]
\centering
\vspace{-0.3cm}
\begin{tabular}{*3c}
\hline
\parbox{2cm}{\centering $f_{min}$ - $f_{max}$ (Hz)} & \parbox{3cm}{\centering $\lbrace\mathcal{F}_{*} \rbrace$} & $A_{l}$ \\
\hline
$15-17$ & $\{\boldsymbol{u}^{sh}_i\}_{i=1}^8$ & $0.972$ \\
$15-17$ & $\{\boldsymbol{u}^{ax}_i\}_{i=1}^8$ & $0.972$ \\
$15-17$ & $\{\boldsymbol{u}^{sh}_i\}_{i=1}^8$ and $\{\boldsymbol{u}^{ax}_i\}_{i=1}^8$ & $0.986$ \\
$5-7$ & $\{\boldsymbol{u}^{sh}_i\}_{i=1}^8$ & $0.993$ \\
$5-7$ & $\{\boldsymbol{u}^{ax}_i\}_{i=1}^8$ & $0.892$ \\
$5-7$ & $\{\boldsymbol{u}^{sh}_i\}_{i=1}^8$ and $\{\boldsymbol{u}^{ax}_i\}_{i=1}^8$ & $0.972$ \\
\bottomrule
\end{tabular}
\vspace{-0.1cm}
\caption{Damage localization, case 2. Accuracy $A_{l}$ of the classifier $\mathcal{G}_l$ evaluated on $\mathbb{D}^l_{test}$. \label{tab:dam_id_noise_load_tab}}
\end{table}
We highlight the effect of each incoming signal %$\mathcal{F}^{i}_0$
on the classification outcomes (see Tab.~\ref{tab:dam_id_noise_load_ch_tab}), since the accuracy $A_{l}$ on $\mathbb{D}^l_{test}$ changes for different numbers of input signals $N_0$. The results refer to the case in which only some of the displacements ${u}^{sh}_i$, $i=1,\ldots,8$ have been considered, and $f_{min}=5$ and $f_{max}=7$ Hz. The corresponding confusion matrices are shown in Fig.~\ref{fig:conf_noise_id}; they indicate that the classification error related to a damage scenario $g$ is reduced when the corresponding ${u}^{sh}_g$, that is, the signal acquired on the floor whose inter-story stiffness has been reduced, is used as input for the NN.
\begin{table}[h!]
\centering
\begin{tabular}{*3c}
\hline
\parbox{1cm}{\centering $N_0$} & \parbox{3cm}{\centering $\lbrace\mathcal{F}_{*} \rbrace$} & $A_l$ \\
\hline
$1$ & $i=8$ & $0.226$ \\
$2$ & $i=$($4,8$) & $0.722$ \\
$3$ & $i=$($2,4,8$) & $0.774$ \\
$4$ & $i=$($2,4,6,8$) & $0.906$ \\
$5$ & $i=$($2,4,6,7,8$) & $0.865$ \\
$6$ & $i=$($2,4,5,6,7,8$) & $0.937$ \\
$7$ & $i=$($2,3,4,5,6,7,8$) & $0.899$ \\
$8$ & $i=$($1,2,3,4,5,6,7,8$) & $0.993$ \\
\bottomrule
\end{tabular}
\vspace{-0.1cm}
\caption{Damage localization, case 2. Accuracy $A_{l}$ of the classifier $\mathcal{G}_l$ evaluated on $\mathbb{D}^l_{test}$. Different numbers $N_0$ of input channels $\mathcal{F}_{*}$, related to ${u}^{sh}_i$, are employed. Here, $f_{min}=5$ and $f_{max}=7$ Hz.\label{tab:dam_id_noise_load_ch_tab}}
\vspace{-0.1cm}
\end{table}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Conclusions}
\label{sec:conclusions}
In this paper we have investigated a new strategy for real-time structural health monitoring, treating damage detection and localization as classification tasks \cite{art:Farrar_1}, and framing the proposed procedure in the family of SBC approaches \cite{art:Taddei_Patera}. For the first time in this field, we have proposed to employ fully convolutional networks to analyse time series coming from a set of sensors. Fully convolutional network architectures differing in the number of convolutional branches have been exploited to deal with datasets including time signals of different length and sampling rate. Convolutional layers have been shown to enable the automatic extraction of features to be used for the classification task at hand. The neural network architecture has been trained in a supervised manner on data generated through the numerical solution of a physics-based model of the monitored structure under different damage scenarios.
In the considered numerical benchmarks, we have obtained extremely good performance concerning both damage detection and damage localization, even in the presence of noise, when the applied loads can be characterized either $(i)$ in terms of a few (a priori, random) frequencies, or $(ii)$ by a higher number of frequencies, within a given range. Especially in the second case, the outcomes of the NN classifier have shown the potential of the proposed procedure in view of its application to real-life cases. In future work we aim to employ the proposed architecture to deal with data coming from real monitoring systems, tackling the main limitation of the proposed procedure, which concerns the adherence of the simulated dataset to the real structural response. This is a well-known problem in the machine learning community \cite{art:domain_adaptation}. By coupling branches of recurrent layers to the proposed convolutional ones, we expect to further increase the NN performance. As further steps, we will try to exploit model order reduction techniques for the dataset construction, to extend the proposed methodology to more complex structural configurations and damage scenarios, and to design the set of sensors according to a Bayesian optimization technique \cite{art:Capellari_1,proc:Capellari_1,art:Capellari_3}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Backmatter begins here
%\section*{Competing interests}
% The authors declare that they have no competing interests.
%
%\section*{Author Contributions}
% The authors contributed equally to this work.
%
%\section*{Availability of data and materials}
%Both the numerical benchmark and the neural network architecture have been exhaustively described. The reader can verify the performance of the proposed method by running analogous numerical experiments.
\section*{Acknowledgments}
The authors thank Andrea Opreni (Politecnico di Milano) for fruitful discussions about DL architectures. LR, SM and AC gratefully acknowledge the financial support from MIUR Project PRIN 15-2015LYYXA 8 ``Multi-scale mechanical models for the design and optimization of microstructured smart materials and metamaterials''.
%\bibliographystyle{vancouver}
\bibliography{biblio}
\end{document} }


118 — 2002.07233

\caption{Visualization of the latent space with 1K samples from the prior ({\color{ForestGreen}green plus sign}), the approximate posterior ({\color{blue}blue circle}) and the delta posterior ({\color{red}red cross}) of Gauss-base (top) and Flow-small (bottom) on a {\iwsltdeen} test example.}

119 — 2002.07553

\caption{\label{fig:bsp}Example MapReduce problem with {\color{green}$m=146$}, {\color{green}$\hat{m}=12$}, {\color{magenta}$w=101$}, and {\color{magenta}$\hat{w}=11$}. Using the BSP algorithm on $p=4$ PEs, the bottleneck work for mapping is {\color{magenta}27} units and {\color{magenta}15} units for reducing. Bottleneck communication volume is {\color{green}21} units for the first superstep and {\color{green}2} for the second one.}

120 — 2002.07843

\caption{ Geometric representation of complex light fields on a High Order Poincar\'e Sphere (HOPS). The top right insets show experimental intensity profiles of a set of modes generated along the green dashed line (see also\blue{Media 1}). The bottom right insets show the intensity profile of beams generated along the yellow solid line (see also \blue{Media 2}). In all cases we show the shape of the vector field after passing through a linear polariser. $\hat{h}$, $\hat{v}$, $\hat{d}$, $\hat{a}$, $\hat{r}$ and $\hat{l}$ represent the horizontal, vertical, diagonal, antidiagonal, right- and left-handed unitary polarisation vectors.}

121 — 2002.07875

\caption{Lake ice segmentation results (IoU) on the \textit{Photi-LakeIce} dataset. For comparison, we also show results of \cite{prs_report} for comparison, in \textcolor{gray}{grey}. We outperform them in all instances.}

122 — 2002.07953

\caption{ We propose \textit{DANCE}, which combines a self-supervised clustering loss \textcolor{red}{(red)} to cluster neighboring target examples and an entropy separation loss \textcolor{gray}{(gray)} to consider alignment with source (best viewed in color).}

123 — 2002.08373

\caption{In each flux \textcolor{referee}{density} bin, we average the star/galaxy estimator from SExtractor for three different J - K colour cuts from \citet{Fleuren2012}: J - K$_{\rm{S}}$ $>$ -0.1, $>$ 0 and $>$ 0.21. Low values of the star/galaxy estimator suggest it is more likely to be a galaxy, while values close to one suggest it is more likely to be a star. All orange lines refer to the greater-than colour cut ($>$), while all blue lines correspond to the less-than ($<$) colour cut. The black line looks at all sources. The thin lines correspond to the analysis on the individual fields (GAMA-9, -12, -15 and SGP). While the J - K $>$ 0.21 shows the most reliable cut for a galaxy selection cut, we will be removing a subset of galaxies. Therefore, we choose the J - K$_{\rm{S}}$ $>$ 0 selection. All colour cuts converge to a similar value for the star/galaxy estimator at high magnitudes, potentially because the star/galaxy classifier fails for faint sources.}

\caption{B(r) (the 'blanks'), the fraction of positions without a VIKING galaxy within a radius r, plotted against radius. Random positions are shown with \textit{blue plusses} and the positions of HerBS sources are shown by \textit{red squares}. The \textit{black circles} show the result of dividing the two distributions, which corrects B(r) for the HerBS sources for unrelated VIKING galaxies falling within $r$ arcsec of the \textit{Herschel} position. The \textit{black line} shows the best-fitting Gaussian to this corrected distribution and the \textit{black dash-dotted line} shows the Gaussian fitted to the corrected distribution of \citet{Fleuren2012}. The poor fit-quality leads us to take $Q_0$ from the \textit{black circles} directly, at $\theta$ = 10, giving $Q_0$ = 0.82. The (\textit{grey line} shows the form expected if lensing is not important and if the distribution is caused by astrometric errors with a FWHM of 1 arcsec, the value found by \citet{Bourne2016}. The \textit{orange line} shows the distribution expected if we include both astrometric errors and the lensing offsets measured by \citet{Aris2018}. The large disagreement between the \textit{grey line} and the empirical results (\textit{black points}) shows that lensing is important. The large disagreement between the \textit{orange line} and the \textit{black points} shows that the lensing is occurring on a large angular scale and suggest it is not just the result of lensing by individual galaxies, \textcolor{referee}{since the \textit{orange line} shows the predicted behaviour of lensing by individual galaxies according to our Toy Model (Sec. \ref{sec:toymodel})}.}

\caption{The photometric redshift estimates from \textit{Herschel}/SPIRE-based redshifts are shown against the photometric redshifts from VIKING flux \textcolor{referee}{densities} for all sources with a reliability greater than 0.8. We use the sub-mm redshifts from \protect\cite{Bakx2018}. The VIKING-based photometric redshifts are, where available, extracted from \protect\cite{Wright2018}, which uses VIKING and KiDS photometry. If this is not available, we use the Eazy package to calculate the photometric redshifts using the VIKING flux \textcolor{referee}{densities} extracted in this paper. A single source is located close to the y = x line, and has a 10\% chance to be at the same redshift, but all other sources are less than 1\% likely to be the same source.}

\caption{The lensing fraction, corrected for false-positives, as a function of their 500$\mu$m selection flux \textcolor{referee}{density} shown for the entire H-ATLAS catalogue. The behaviour of the entire sample agrees with the HerBS sources. ALMA observations of the 500$\mu$m risers suggest 40\% are gravitationally-lensed, although our models suggest this is not true in general for \textit{Herschel} sources.}

124 — 2002.08568

\caption{The table shows the number of branches covered by Vuzzer and T-Fuzz. \textcolor{red}{\ding{55}} means the fuzzer crashed on the program.}

125 — 2002.08709

\caption{Results with benchmark datasets. We report classification accuracy for all combinations of weight decay ($\checkmark$ and $\times$), early stopping ($\checkmark$ and $\times$) and flooding ($\checkmark$ and $\times$). The second column shows the training/validation split used for the experiment. W stands for weight decay, E stands for early stopping, and F stands for flooding. ``---'' means that a flooding level of zero was optimal. ``N/A'' means that we skipped the experiments because zero weight decay was optimal in the case without flooding. The best and equivalent are shown in \textbf{bold} by comparing ``with flooding'' and ``without flooding'' for two columns with the same setting for W and E, e.g., the first and fifth columns out of the 8 columns. The best performing combination is \colorbox{pink}{highlighted}. }

126 — 2002.08852

\caption{ \label{fig:mean_std} ($a$) Mean and ($b$) standard deviation of $\Delta \tau$ versus time, normalised with the mean skin friction of the base flow. Lines and symbols indicate the characteristics of the forcing: $\opentriangle$ for $f_0 = +8 u_\tau^2/h$, $\opentriangledown$ for $f_0 = -8 u_\tau^2/h$; solid lines for $L_f^+ = 50$, dashed for $L_f^+ = 100$; {\color{bblue} blue} for $T_f^+=25$, {\color{rred} red} for $T_f^+ = 50$ and {\color{ggreen} green} for $T_f^+ = 100$. In both panels, the dotted vertical lines indicate $t=T_f$. }

127 — 2002.09309

\caption{Empirical estimates of 2-Wasserstein distances between true posteriors and empirical distributions of $100,000$ samples at 1024 test locations $\m{X}_{*}$ given varying amounts of training data, shown in terms of quartiles measured over 64 independent trials. Weight-space (orange) and decoupled (blue) sampling utilized a total of $\numBasisTotal = n + \ell$ basis functions. Results using $\ell \in \{1024, 4096, 16384\}$ initial bases correspond with $\{\text{light}, \text{medium}, \text{dark}\}$ tones and $\{ \bigtriangleup, \hbox{\scalebox{1.5}{$\diamond$}}, \hbox{{\FiveStarOpen}} \}$ markers. }

128 — 2002.09383

\caption{ Snapshots of the vertical velocity in the vertical (left column) and horizontal mid-plane (center column) cross-sections, and time signals of $u_z$ (right column) at $\theta = 0$, $z = 0.25$ (\textcolor{red}{---$\!$---}), $0$ (\textcolor{green}{---$\!$---}), $-0.25$ (\textcolor{blue}{---$\!$---}), and $r = 0.42$ ({\it e}), $0.44$ ({\it a, c, d, f}), $0.46$ ({\it b}) are shown. The parameters (${\it Ra}$, ${\it Ha}$ and the ratio between ${\it Ra}$ and ${\it Ra}_c$ of the \citet{Chandrasekhar61} stability limit) are indicated in the right column. }

129 — 2002.09452

\caption{Map of considered residential area of dimension $\unit[600]{m}\times\unit[800]{m}$; basestation location marked with a \textcolor{red}{red ``X''}\label{fig:Residential-area}}

130 — 2002.09616

\caption{Generation performance (in BLEU score) of the different imaginators and accuracy scores on the same TextCNN-based arbitrator. Better results between imaginators are in \textbf{BOLD} and best results on datasets are in {\color[HTML]{CB0000} \textbf{RED}}.}

\caption{Accuracy results on two datasets. Better results between baselines and corresponding ITA models are in \textbf{BOLD} and best results on datasets are in \textcolor{red}{RED}. The random result is the accuracy of a script that makes random decisions.}

131 — 2002.09860

\caption{\label{fig:var loss}Relation between mean squared error and variance loss. The different colors refer to different neural architectures: {\color{blue} $\bullet$} (blue) Dense Networks; {\color{red}$\bullet$} (red) ResNet-like; {\color{green} $\bullet$} (green) Convolutional Networks; {\color{yellow} $\bullet$} Iterative Networks (DRAW-GQN-like) }

132 — 2002.10111

\caption{\textbf{Network Structure of SMOKE.} We leverage DLA-34 \cite{dla_2018} to extract features from images. The size of the feature map is 1:4 due to downsampling by 4 of the original image. Two separate branches are attached to the feature map to perform keypoint classification (\textcolor{pink}{pink}) and 3D box regression (\textcolor{darkgreen}{green}) jointly. The 3D bounding box is obtained by combining information from two branches.}

\caption{Visualization of difference between 2D center points (\textcolor{red}{red}) and 3D projected points (\textcolor{orange}{orange}). Best viewed in color.}

133 — 2002.10174

\caption{The Comparisons of FID, KID Score and IS. RelationGAN represents Relation GAN with objective function in equation~\eqref{relu-mean} and RelationGAN$^*$ represents Relation GAN with objective function in equation~\eqref{mean-relu}. The best two scores are shown in {\color{red}{red}} and {\color{green}{green}}, respectively.}

134 — 2002.10373

\caption{A conceptual illustration of the internal data structure that constitutes a single anchor, and which is first initiated by a percept $\pi$ from a raw image. The volatile and static attributes are derived from this percept, while predicates such as \predicate{red}, are derived from static attributes (which are not indexed by time), e.g. the static color histogram attribute.}

\caption{ Depicted are two training points in the data set that were used to learn the transition rule of an object to another object. The panels on the left show a \predicate{ball} that is being occluded by a \predicate{box}, and on the right, the same \predicate{ball} that is being grabbed by a hand (or a \predicate{skin} object, as we have only trained our used GoogLeNet model to recognize general human skin objects instead of particular human body parts, cf. Section~\ref{section:requirements_of_framework}). The plotted dots on top of the occluding object represent samples drawn from the probability distribution of the occluded object, in other words the object that is labeled in the data set to transition into the occluding counterpart. }

\caption{ Screenshots captured during the execution of a scenario where the stream of sensor data is obscured. Visually perceived anchored objects are symbolized by unique anchor identifiers (e.g., \predicate{mug-1}), while occluded hidden objects are depicted by plotted particles that represent possible positions of the occluded object in the inference system. The screenshots illustrate a scenario where the RGB-D sensor is covered and a \predicate{ball} is hidden under either one of three larger objects. These larger objects are subsequently shuffled around before the whereabouts of the hidden \predicate{ball} is revealed. }

\caption{ The two scenarios show how a learned ToO is used to perform semantic relational object tracking. In both scenarios, an object is occluded by a \predicate{box} and successfully tracked before the occluded object is revealed and again \textit{re-acquired} as the same initial object. }

\caption{ A scenario that demonstrates transitive occlusions based on learned rules for handling the theory of occlusions. First the \predicate{ball} is occluded by the \predicate{mug} (indicated by the yellow dots) and subsequently the \predicate{mug} is occluded in turn by the \predicate{box} (indicated by the black dots). Once the \predicate{mug} is observed again the \predicate{ball} is still believed to be occluded by the \predicate{mug}. }

135 — 2002.10638

\caption{ Comparison with the state-of-the-art methods on R2R. \textcolor{blue}{Blue} indicates best value in a given setting. {\textbf{S}} indicates the single-instruction setting, and {\textbf{M}} indicates the multiple-instruction setting. }

\caption{Results on CVDN measured by Goal Progress. \textcolor{blue}{Blue} indicates best value in a given setting. }

\caption{ Results on test splits of HANNA. The agent with ``perfect assistance'' uses the teacher navigation policy to make decisions when executing a subtask from the assistant. \textcolor{blue}{Blue} indicates the best value. }

\caption{Ablation study of the pre-training objectives on CVDN measured by Goal Progress. \textcolor{blue}{Blue} indicates the best value. }

\caption{Ablation study on R2R: feature-based vs fine-tuning. \textcolor{blue}{Blue} indicates the better value.}

136 — 2002.10648

\caption{Overview of the MAD competition procedure. \textbf{(a)}: A large unlabeled image set of web scale. \textbf{(b)}: The subset of natural images selected from (a) on which two classifiers (VGG16BN and ResNet34 in this case) make different predictions. Note that collecting the class label for each image in this subset may still be prohibitive because of its gigantic size. \textbf{(c)}: Representative examples sampled from top-$k$ images on which VGG16BN's and ResNet34's predictions differ the most, quantified by Eq.~(\ref{eq:wd}). Although the two classifiers have nearly identical accuracies on the ImageNet validation set, the proposed MAD competition successfully distinguishes them by finding their respective counterexamples. This sheds light on potential ways to improve the two classifiers or combine them into a better one. The model predictions are shown along with the images, where \textcolor{green}{\underline{green underlined}} and \textcolor{red}{\textit{red italic}} texts indicate correct and incorrect predictions, respectively. }

137 — 2002.10770

\caption{\textbf{Occlusion comparison over Sintel final pass}. Comparison of occlusion estimations created by: (a) FlowNet-CSSR-ft-sd~\cite{ISKB18}, (b) IRR-PWC~\cite{Hur2019CVPR} baseline, and (c) ScopeFlow (ours). First frame on the left column and ground truth flow on the right column. For each occlusion map: {\color{blue}false positives} are in blue, {\color{red}false negatives} in red, and true positives in white. All occlusion maps were estimated using Sintel Final samples and the original models published by the authors. Our improvements are mainly for foreground objects on the image margins.}

138 — 2002.10826

\caption{Overview of our proposed LEAP framework. The head data and tail data are fed into the deep network to obtain the features. We calculate the distribution of angles between the features and the class center for head class and tail class, respectively. Subsequently, we transfer the angular variance of head class ({\color{red}{red curve}}) to tail class ({\color{green}{green curve}}). In other words, based on the original distribution of tail class, we add an additional distribution ({\color{yellow}{yellow curve}}). Then we get a new distribution of tail class ({\color{blue}{blue curve}}). Finally, we use the head data and the new tail data to calculate the loss.}

139 — 2002.10857

\caption{The change of $s_p$ and $s_n$ values during training. We linearly lengthen the curves within the first 2$k$ iterations to highlight the initial training process (in the \textcolor{green}{green} zone). During the early training stage, Circle loss rapidly increases $s_p$, because $s_p$ deviates far from the optimum at the initialization and thus attracts higher optimization priority.}

\caption{Visualization of the similarity distribution after convergence. The \textcolor{blue}{blue} dots mark the similarity pairs crossing the decision boundary during the whole training process. The \textcolor{green}{green} dots mark the similarity pairs after convergence. (a) AMSoftmax seeks to minimize $(s_n-s_p)$. During training, the similarity pairs cross the decision boundary through a wide passage. After convergence, the similarity pairs scatter in a relatively large region in the $(s_n, s_p)$ space. In (b) and (c), Circle loss has a circular decision boundary. The similarity pairs cross the decision boundary through a narrow passage and gather into a relatively concentrated region. }

140 — 2002.10864

\caption{Ablation analysis w.r.t. effectiveness of CFA/CFD. Res50 is the ResNet-50 backbone. CFA and CFD in our method are important for improving performance. Best and second best results are shown in \textbf{black} and \textcolor{aa}{\textbf{red}}, respectively.}

\caption{Ablation analysis w.r.t. different configurations of CFA. Design of CFA achieves better performance than other settings. Best results are shown in \textcolor{aa}{\textbf{red}}.}

\caption{Ablation analysis of CFD with different distribution configurations. \textbf{(D)} refers to w/o CFD module defined in Tab.~\ref{weight analyse}. Each level feature in CFD contributes a lot to the progressive fusion. Best results are highlighted in \textcolor{aa}{\textbf{red}}.}

\caption{Comparisons of max F-measure and MAE values on VGG \cite{VGG} and ResNet~\cite{resnet:He2015Deep} backbones are reported. Results of our method are shown in \textcolor{bb}{\textbf{blue}}, \textbf{black}, and \textcolor{aa}{\textbf{red}}, respectively. With different backbones, the proposed method consistently achieves better performance than the previous state-of-the-arts. Best viewed in color.}

141 — 2002.10893

\caption{\textbf{3D LIDAR semantic segmentation accuracy vs speed} on the SemanticKITTI \cite{behley2019semantickitti} benchmark (test set). Point-based methods are drawn as \textcolor{Green}{green} circles and projection-based methods as \textcolor{Red}{red} squares. Areas of squares and circles depict the number of parameters used in each method. Our proposed 3D-MiniNet outperforms previous methods while being more parameter efficient and faster. Best viewed in color. }

142 — 2002.10917

\caption{Improvement in makespan when solving a problem using qubit initialization provided by a classical planner (Section~\ref{sec:initialization}). Example: entry 8.1\%(8/10) for cell IBM-16, G2R4, OPTIC means: among 10 randomly generated QIs for solving G2R4 on IBM-16, OPTIC can solve 8 with random QI while it can solve all 10 with QIs produced by the Fast Downward classical planner solving the same random problems. Across the 8 problems that are solved with both QI setups, the average makespan improvement is 8.1\%. When the number of solved problems is not listed, it means all 10 are solved with both QI options. \textcolor{red}{RED} indicates entries where Fast Downward's QI performs worse than random QI.}

\caption{Comparing different initialization strategies. \textbf{M}: \textbf{M}anual initialization (results from Table~\ref{tab:result1}); \textbf{I}: Solving the combined QI + Routing (Routing-I) problem in one temporal-planning run (see Section~\ref{sec:initialization}); \textbf{Random} and Fast Downward(\textbf{FD}) are initialization setups explained in Table~\ref{tab:qi_result1} with ``AVG'' and ``Best'' representing the average and best values across 10 random QIs. Values in \textcolor{red}{RED} show the best value across all setups; values in \textbf{bold} are best for a given planner. Random vs FD qubit initialization: Light Yellow background indicates better makespan by FD while Light Cyan indicates better makespan for random QI. }

143 — 2002.10937

\caption{Tri-training \cite{ruder2018strong} - {\color{blue}Modified}}

144 — 2002.10981

\caption{\textcolor{Black}{Sound Quality Matrix Analysis Results: Average normalized cross-correlation value obtained from comparing the original and generated audio signals for model 1 and 2 in all sound classes}}

\caption{\textcolor{Black}{Comparison of the average accuracy (\%) among the Ablation Models and our proposed method 1 (Frame Sequence Network) on AutoFoley Dataset.}}

\caption{\textcolor{Black}{Comparison of the average accuracy (\%) among the Ablation Models and our proposed method 2 (Frame Relation Network) on AutoFoley Dataset.}}

\caption{\textcolor{Black}{Effect of Interpolation technique: Performance of the Frame-Sequence and Frame-Relation Networks with frame replication (k factor) instead of Interpolation. Comparing accuracy with proposed system with the Interpolation method}}

\caption{\textcolor{Black}{Human Evaluation Results: Selection percentage of each sound category for the first and second human survey questions}}

\caption{\textcolor{Black}{Human Evaluation Results: Selection percentage of each sound category for the third and fourth human survey questions}}

145 — 2002.11049

\caption{APFD (higher the better) results for different models with or without the \textbf{Hard} strategy on the ``hard to find'' SATDs. Medians and IQRs (lower the better) are calculated for easy comparisons. If \textbf{Hard} = no, human oracles on the target project are not utilized, the model is just a one-time trained supervised learning model. On the other hand, if \textbf{Hard} = yes, human oracles on the queried comments are utilized to update the model before it is applied to find its next highest predictions for humans to verify. {\IT} utilizes \textbf{Hard} = yes. A threshold of Cohen's $d$ small effect size (0.02) is applied to determine which treatment performs best in each target project and color them in \colorbox{gray!40}{gray}. The column \textbf{\#Best} shows the number of projects each treatment performs the best in.}

\caption{APFD (higher the better) results for different treatments on finding all the SATDs. Medians and IQRs (lower the better) are calculated for easy comparisons. The proposed treatment \textbf{{\IT}=Easy+Hard}. A threshold of Cohen's $d$ small effect size (0.01) is applied to determine which treatment performs best in each target project and color them in \colorbox{gray!40}{gray}. The column \textbf{\#Best} shows the number of projects each treatment performs best in.}

146 — 2002.11244

\caption{Average PSNR of the denoised images, where the inputs are corrupted by AWGN with $\sigma = 15, 25,$ and $50$, for the images from Set12 and BSD68 datasets. (\color{red}{red}: \color{black} the best result, \color{blue}{blue}: \color{black} the second best)}

\caption{Average PSNR of the denoised images on the DND benchmark, we denote the environment of training, {\em i.e.}, training with SN data only, RN data only, and both. $^*$ denotes geometric self-ensemble~\cite{timofte2016seven} result. (\color{red}{red}: \color{black} the best result, \color{blue}{blue}: \color{black} the second best)}

\caption{Average PSNR of the denoised images on the SIDD benchmark, we denote the environment of training, {\em i.e.}, training with SN data only, RN data only, and both. $^*$ denotes geometric self-ensemble~\cite{timofte2016seven} result. (\color{red}{red}: \color{black} the best result, \color{blue}{blue}: \color{black} the second best)}

147 — 2002.11267

\caption{\color{red}{Numerical solutions of the delayed model (Eqs. \ref{modseir-retHs} - \ref{modseir-retVi}) with $\epsilon =0.5$ (red) and the corresponding ABM considering fixed periods (curves in other colours), for $b=0.3$ (left panel) and $b=0.5$ (right panel). The black line shows the solution of the delayed model (Eqs. \ref{modseir-retHs} - \ref{modseir-retVi}) without seasonality ($\epsilon=0$).}}

148 — 2002.11397

\caption{{\bf Numerical comparison with state-of-the-art blind methods on DIV2K realistic-wild validation set (SR scale ${\bf \times 4}$).} The best and second-best results are highlighted in {\color{red} red} and {\color{blue} blue}, respectively. We use the officially provided evaluation script\protect\footnotemark (validation stage setting). Throughout this paper, the real configuration is used for ZSSR, and the Inception backbone model is used for DeblurGAN-v2.}

149 — 2002.11616

\caption{Quantitative comparison of our results and two-stage VFI and VSR methods on testsets. The best two results are highlighted in \textcolor{red}{red} and \textcolor{blue}{blue} colors, respectively. The total runtime is measured on the entire Vid4 dataset \cite{liu2011bayesian}. Note that we omit the baseline models with Bicubic when comparing in terms of runtime.}

150 — 2002.11852

\caption{Schematic of the microscale detail of a single patch (\textcolor{mtxt}{coloured teal in the pdf version}). Short vertical lines show the microscale mesh. Three mesh-points on the patch connect to the macroscale (\textcolor{Mtxt}{orange in the pdf version}): the centre of the patch, and the two edge locations. As shown in \cref{fig:pats} the simulation at each patch centre is coupled between patches to provide edge values to each patch.}

\caption{This illustrates both a double patch (centre), with index \(j=s\), and the inter-patch coupling to obtain the edge value on the left side of the double patch. The two `shock' nodes are labeled \textcolor{Mtxt}{$X^l_s$} and~\textcolor{Mtxt}{$X^r_s$}. Inter-patch coupling is by the usual interpolation~\eqref{patCoup} (\cref{fig:pats}) except that the interpolation is adjusted so that it does not cross the double patch.}

151 — 2002.11927

\caption{Parameter size and inference time of different models compared to ours. The lower the better. Models were benchmarked using an Nvidia GTX1080Ti GPU. The inference time is the average of several single inference steps. We notice that \ours has the smallest parameter size and the lowest inference time compared to the others. The text in \textcolor{blue}{blue} shows how many times faster our model is than the others. }

\caption{The first column is the ground truth, while the other columns illustrate samples from our model. The first two rows show two different scenarios where pedestrians merge into a direction or meet from opposite directions. The second and third columns show changes in \textcolor{green}{speed} or \textcolor{blue}{direction} in samples from our model. The last column shows \textcolor{orange}{undesired} behaviors. The last row shows \textcolor{pink}{failed} samples.}

152 — 2002.11936

\caption{CON \textcolor{cyan}{$\blacksquare$}}

\caption{GGO \textcolor{yellow}{$\blacksquare$}}

\caption{HCM \textcolor{red}{$\blacksquare$}}

\caption{EMP \textcolor{green}{$\blacksquare$}}

\caption{NOR \textcolor{brown}{$\blacksquare$}}

153 — 2002.12017

\caption{\textbf{Ablation study on distance metric.} We do not consider uncertainties here for clear comparison. ECE and mean Entropy of confidence scores are computed just before taking the initial transduction step. Red color: \textcolor{red}{overconfident} and blue color: \textcolor{blue}{underconfident}. Inductive*: the results are from inductive inference with the transductively trained model. $d(\cdot,\cdot)$ denotes Euclidean distance, and we let $\widebar{\mathbf{a}} := \mathbf{a}/\|\mathbf{a}\|_2$. $s \in \mathbb{R}$ is a learnable parameter initialized to $10$, following \citet{gidaris2018dynamic} and \citet{lifchitz2019dense}.}

\caption{Reliability plots. For \textbf{(a)} and \textbf{(b)}, c=1, s=10, c=10 and $d_{\phi}$ denotes $d(\widebar{\mathbf{a}}_1,\widebar{\mathbf{a}}_2)$, $s \cdot d(\widebar{\mathbf{a}}_1, \widebar{\mathbf{a}}_2)$, $10 \cdot d(\widebar{\mathbf{a}}_1, \widebar{\mathbf{a}}_2)$ and $d_\phi(\mathbf{a}_1, \mathbf{a}_2)$ respectively. If the plot is above (under) the dotted line, which denotes perfect calibration, the model is \textcolor{blue}{underconfident} (\textcolor{red}{overconfident}). All results are obtained with ResNet-12.}

154 — 2002.12106

\caption{Prototypes of our hybrid \highlighttext{imaging} system. a) The smartphone rig (left) is a simple setup with Moto G6 (main camera) and Nokia 6.1 (auxiliary camera). b) The digital camera rig (right) is designed using two Nikon S1 and a 50:50 beam-splitter to emulate a small baseline setup.}

\caption{\highlighttext{This scene is captured under low light condition and, thus, the auxiliary frames are noisy. EDVR and DUF are unable to reconstruct the details (left arrow) and remove the input noise (right arrow). Due to the rolling shutter effect, the light torch is off but the reflection is visible in the left keyframe, which is problematic for Super SloMo and DAIN. Our method, however, is able to handle this challenging scene.}}

\caption{\highlighttext{Inference performance on GTX 1080 Ti GPU}}

\caption{\highlighttext{Evaluating output quality by changing gamma and hue of the auxiliary video frames}}

\caption{\highlighttext{Example of auxiliary video inputs for the experiments in Sec.~\ref{sec:gammahue} (Gamma and Hue row) and Sec.~\ref{sec:noisy} (Noisy and Denoised row). The extent of perturbation is specified by the numbers in the lower right corner of images.}}

\caption{\highlighttext{Evaluating the effect of noise in auxiliary video on the result's quality. We show the results using both noisy and denoised (using VBM4D~\cite{vbm4d}) auxiliary videos.}}

\caption{\highlighttext{Evaluating the effect of temporal desynchronization between main and auxiliary videos}}

\caption{\highlighttext{We show two consecutive main keyframes and their corresponding auxiliary keyframes. Here, since the flag is close to the camera rig, there is a large disparity between the main and auxiliary keyframes, as indicated by the arrows. Our flow estimation network cannot handle such a large misalignment and, thus, our system produces warping artifacts around the motion boundaries (see also supplementary video).}}

155 — 2002.12213

\caption{The average PSNR/SSIM results on various kernels with $\times 2$ on benchmarks. The numbers in parenthesis in our methods stand for the number of gradient updates. The best results are highlighted in \textcolor{red}{red} and the second best are in \textcolor{blue}{blue}.}

\caption{Average PSNR/SSIM results on the scaling factor $\times 4$ on benchmarks. The numbers in parenthesis in our methods stand for the number of gradient updates. The best and the second best are highlighted in \textcolor{red}{red} and \textcolor{blue}{blue}, respectively.}

156 — 2002.12324

\caption{\textbf{Top.} Our system accurately re-localizes within a known environment given a single image. We show estimated camera positions in {\color{violet}purple} and ground truth in {\color{cyan}cyan}. In this instance, the system was trained using RGB images and associated ground truth poses only ({\color{gray}gray} trajectory). In particular, the scene geometry, displayed as a 3D model, was discovered by the system automatically. \textbf{Bottom.} To visualize the re-localization quality, we render the learned 3D geometry using estimated poses over gray-scale input images.}

\caption{\textbf{Results For Indoor Scenes. First Row:} Camera positions of training frames in {\color{gray}gray} and of test frames in {\color{cyan}cyan} for all scenes of the 7Scenes \cite{shotton13scorf} dataset. \textbf{Remaining Rows:} Estimated camera positions of test frames, color coded by position error. We also state the percentage of test frames with a pose error below 5cm and 5$^\circ$. Each row represents a different training setup. For a more informative visualization, we show the ground truth 3D scene model as a faint backdrop, and we connect consecutive frames within 50cm tolerance.}

\caption{\textbf{Results For Outdoor Scenes. First Row:} Camera positions of training frames in {\color{gray}gray} and of test frames in {\color{cyan}cyan} for scenes of the Cambridge Landmarks \cite{kendall2015convolutional} dataset. \textbf{Remaining Rows:} Estimated camera positions of test frames, color coded by position error. We also state the percentage of test frames with a position error below 0.5\% of the scene size. We derive the threshold for each scene from the scene extent given in \cite{kendall2015convolutional}. In particular, we use 35cm for St. Mary's Church, 45cm for Great Court, 22cm for Old Hospital, 38cm for King's College and 15cm for Shop Facade. Each row represents a different training setup. For a more informative visualization, we show the ground truth 3D scene model as a faint backdrop, and we connect consecutive frames within 5m tolerance.}

157 — 2002.12328

\caption{Examples of generated utterances from different models, along with its corresponding dialog acts (DAs) and references. The first two examples are sampled from \data{} and the last one is from MultiWOZ. Each generated utterance is followed by a brief description explaining the errors (starting with ``\%''). (Better viewed in color. \colorbox{red!30}{wrong}, \colorbox{mygreen!30}{redundant}, \colorbox{blue!30}{missing} information)}

\caption{Examples of generated utterances with novel dialog acts. SC-GPT produces better utterances than GPT-2 for edited dialog acts. Since both models produce similar responses to references for the original dialog act, the results are not shown here. (Better viewed in color. \colorbox{mygreen!30}{insert a slot}, \colorbox{blue!30}{substitute a slot value}, \colorbox{red!30}{delete a slot}).}

158 — 2002.12430

\caption{Total energy ($H_M$) plotted as a function of charge ($q$) and current ($\dot{q}$) for an applied voltage $V_M = 18~\mathrm{V}$. Projection on the plane $\dot{q} = 0$ gives the potential energy ($U_M$) - charge ($q$) plot, as shown in Fig. \textcolor{blue}{\ref{fig:Static_EQ}}. Projection on the plane $H_M$ = constant gives the phase plane plot, as shown in Fig. \ref{fig:Dynamic_EQ_PP}.}

159 — 2002.12502

\caption{Mode structure for \textcolor{black}{the Kerr} resonator, showing the cold- (cross) and hot-cavity (open circle) resonances, pump laser (dashed line), and light in each mode (solid circle). \textcolor{black}{The left panels show the } (\textbf{A}) Turing pattern, and (\textbf{B}) DKS state in the \protect\includegraphics[width=0.75\baselineskip]{B_logo.png} resonator, with the \textcolor{black}{DKS} Kerr mismatch (dashed red arrow). (\textbf{C}) and (\textbf{D}) show the \protect\includegraphics[width=0.75\baselineskip]{P_logo.png} resonator with photonic crystal shift (dashed blue arrow) at the corresponding Kerr shift, with (\textbf{D}) in the pulse state. (\textbf{E}) Illustration of optical pulse formation in a photonic ring resonator. (\textbf{F}) Simulated peak power versus pump laser detuning for the \protect\includegraphics[width=0.75\baselineskip]{B_logo.png} (green) and \protect\includegraphics[width=0.75\baselineskip]{P_logo.png} (blue) resonators, with the analytic flat amplitude (dashed gray) for reference. The corresponding intensity profiles are shown in the right panels.}

\caption{ The balancing of Kerr shift (red) and dispersion (black) for (\textbf{A}) the \protect\includegraphics[width=0.75\baselineskip]{B_logo.png} and (\textbf{B}) \protect\includegraphics[width=0.75\baselineskip]{P_logo.png} soliton pulses, where the sum of the two is magnified and shown in blue. (\textbf{C}) Calculated Kerr mismatch for various mode shift values and pump detuning. (\textbf{D}) Simulated time traces and (\textbf{E}) intensity plots during the spontaneous pulse generation process.}

160 — 2002.12585

\caption{Comparisons of model complexity and speed. \#Parameters are estimated. Time and speed are measured on a single NVIDIA GeForce GTX 1080 Ti. ips stands for images per second. The symbol\ssymbol{2} denotes results reported in the original papers. \label{tab:time}}

161 — 2002.12641

\caption{Illustration of weight distribution on the three branches \textbf{b, c, d} of different GCN layers obtained by our adaptive aggregation module over mini-ImageNet. The \textcolor{red}{red}, \textcolor{green}{green}, and \textcolor{blue}{blue} points denote the weights of GCN layer \textcolor{red}{1}, \textcolor{green}{2}, and \textcolor{blue}{3}, respectively.}

162 — 2002.12674

\caption{Equivalent to Table 2 in the main paper. Ablation results without discriminator output matching (DOM) when training on \textcolor{red}{chairs}/\textcolor{blue}{couches} ``one per model'' datasets. We either fix the pre-trained neural renderer (``Fixed''), or continuing to train it during GAN training (``Retrained''). The generator samples fed to the discriminator are rendered using either OpenGL or the neural renderer. For reference, our model is equivalent to the Retrained OpenGL setup with the addition of the DOM loss and achieves FID scores \textcolor{red}{32.1}/\textcolor{blue}{36.5}. FID scores calculated using an Inception network trained on ImageNet.}

163 — 2002.12697

\caption{\label{fig:rsGF_DW_HI_SF} SF Green's function for different phases. Because of periodic boundary conditions, $G(r)=G(L-r)$. {\color{red} Red squares}: The DW phase for $t=0.2$. We see an exponential decay (note the logarithmic scale) for all system sizes. $G(r)$ approaches zero for larger distances, implying zero winding and absence of superfluid density. {\color{blue}Blue circles}: The HI phase with $t=0.23$. While $G(r)$ for small chain lengths looks similar to $G(r)$ for the SF phase, large sizes display an exponential decay like in the DW phase. {\color{green}Green triangles}: In the SF phase ($t=0.34$) the Green's function is size-independent and $G(r)$ stays nearly constant for all distances.}

164 — 2002.12785

\caption{Cost of Lee-Brickell's ISD algorithms in the Hamming and Lee metric, for fixed $p,n,k$ and choosing $t$ according to the respective GV bounds.}

165 — 2002.12796

\caption{Energetics of a randomly flashing ratchet. (a) Load $F$ dependence of efficiency $\eta$ (thick lines, left axis) and output power $\dot{W}$ (thin lines, right axis), with $U_{\max}=20$, $w_{12}=100$ and $w_{21}=200$. Dashed line: $\alpha =0.8$. Dash-dotted line: $\alpha=0.9$. Solid line: $\alpha=0.98$. The maximum of efficiency is marked by \textcolor[RGB]{0,115,189}{$\blacksquare$} and the efficiency at maximum power is marked by \textcolor[RGB]{163,20,46}{$\bigstar$}. (b) Maximum power $\dot{W}_{\max}$ as a function of transition rates $w_{12}$ and $w_{21}$ with $U_{\max}=20$, $\alpha=0.8$. (c) Comparison of external load at maximum efficiency ($F_{\eta_{\max}}$, lines) to that at maximum power ($F_{\dot{W}_{\max}}$, lines with $\bullet$). For all lines $w_{12} =100$, while $\alpha$ and $w_{21}$ are given in the legend. (d) Maximum efficiency ($\eta_{\max}$, lines) and efficiency at maximum power ($\eta_{\dot{W}_{\max}}$, lines with $\bullet$) as functions of the potential depth $U_{\max}$. The corresponding output powers are presented in the inset. Parameters $\alpha$, $w_{12}$ and $w_{21}$ for each line are the same as those in (c). Other parameters: $k_BT=1$, $\gamma=1$ and $D=k_BT/\gamma=1$.}

166 — 2002.12802

\caption{Summary of the runs performed for this study (full symbols \textcolor{red}{$\bullet$}), along with past numerical \citep{scheel2003,sanchez2005square,horn2017prograde,kunnen,bodenschatz} and experimental \citep{zhongEK,aurnou2018rotating,kunnen} studies. The lower black line corresponds to the onset of wall modes as predicted by linear theory while the upper corresponds to the onset of bulk modes. Our study focuses on the gray region in between where only the wall mode is unstable. Note that the studies reported on this plot (symbols) do not all have the same Prandtl number $Pr$ nor the same aspect ratio $\Gamma$.}

\caption{(a) Drift frequency $-\omega_d$ as a function of $Ra$ for $\Gamma=1.5$, $E=10^{-6}$ and $Pr=1$. The results for the full cylinder (\textcolor{blue}{$\square$}) and the cylinder with a barrier (\textcolor{orange}{$\triangle$}, see section~\ref{sec:barrier} below) coincide. The theoretical value $\omega_c\approx-59 E/Pr$ predicted by \cite{busse1993} for the onset of the instability in the presence of a planar wall is also reported (open circle). The dot-dash line corresponds to the linear scaling $|\omega_d|=|\omega_c|+c(Ra-Ra_c^{\textrm{wall}})$ with $c$ an arbitrary constant. (b) Corresponding volume- and time-averaged zonal velocity $\left<u_{\phi}\right>_{V,t}$. The dot-dash line corresponds to the scaling $\left<u_{\phi}\right>_{V,t}\sim Ra-Ra_c^{\textrm{wall}}$. The volume integration is only performed over the left half of the cylinder for the cases with barrier (see section~\ref{sec:barrier}). (c) Azimuthally, vertically and temporally averaged zonal velocity $\left<u_{\phi}\right>_{\phi,z,t}$ as a function of the radial coordinate $r$. Positive values correspond to cyclonic motions while negative values correspond to anticyclonic motions. The two vertical lines indicate the Stewartson layer scales $E^{1/3}$ and $E^{1/4}$ \citep{stewartson_1957}. (d) Divergence of the Reynolds stress in the azimuthal direction for the case $Ra=5\times10^8$. Results are computed at a given time during the exponential phase of the instability (arbitrarily rescaled) and time-averaged during the nonlinear saturated phase.}

167 — 2002.12877

\caption{\textcolor{red}{fix table}}

168 — 2003.00174

\caption{Time evolution of mobility for ten different runs (different seeds) of a system with dimensions of $L_{x}=256$ and $L_{y}=64$ and with $\protect\alpha =20$. The dynamics relaxes to a state of optimal flow when $\protect\rho =0.5$ (a) and relaxes to an immobile state when $\protect\rho =1.5$ (c). For $\protect\rho =1.0$ (b), the system presents a metastable steady state where either the mobile or the immobile phase can arise, depending sensitively on the initial conditions.}

169 — 2003.00197

\caption{The proposed video semi-supervised learning framework, which can be trained with both labeled and unlabeled data. The network is optimized with three loss functions: 1) the video cross entropy loss on the labeled data, which is paired with human-annotated labels, 2) the pseudo cross entropy loss on the unlabeled data, where the label is the soft score generated by the 3D video classification network, and 3) the log softmax loss on both unlabeled and labeled data to teach the video classification network to capture the appearance information.}

170 — 2003.00266

\caption{\label{fig:epsart}Geometry of the two-view X-ray fluoroscopy imaging system SyncTraX FX4\textregistered{} with a radiotherapy system. Two views cross at the isocentre without being blocked by the gantry head. Source-object distance (SOD) = 2353~$\mathrm{mm}$. Source-image distance (SID) = 4172~$\mathrm{mm}$.}

171 — 2003.00321

\caption{Data split of digits classification task. \textcolor{cyan}{Blue} test data is $T_{Source}$, \textcolor{green}{green} test data is $T_{Target}$ and \textcolor{red}{red} test data is $T_{Target}^{New}$. In this case, all labeled data is $\sim30\%$ of all training data. Train:validation is about 8:2.}

\caption{Data split of fetal US standard plane classification (\textbf{anatomies vs. artifacts}). \textcolor{cyan}{Blue} test data is $T_{Source}$, \textcolor{green}{green} test data is $T_{Target}$ and \textcolor{red}{red} test data is $T_{Target}^{New}$. In this case, all labeled data is $\sim30\%$ of all training data. Train:validation is about 8:2.}

\caption{Data split of fetal US standard plane classification (\textbf{anatomies vs. imaging devices}). \textcolor{cyan}{Blue} test data is $T_{Source}$, \textcolor{green}{green} test data is $T_{Target}$ and \textcolor{red}{red} test data is $T_{Target}^{New}$. In this case, all labeled data is $\sim30\%$ of all training data. Train:validation is about 8:2.}

172 — 2003.00482

\caption{An overview of our video object segmentation pipeline. SAT can be divided into three parts by the dotted line in {\color{gray}gray}: Joint Segmentation Network, State Estimator, and Feedback. \textbf{Joint Segmentation Network} fuses the feature of the saliency encoder~(in {\color{orange}orange}), the similarity encoder~(in {\color{yellow}yellow}), and the global feature~(in {\color{green}green}), and then decodes the fused feature to predict a mask. Afterward, \textbf{State Estimator} evaluates the prediction result and calculates a state score to represent the current state. Finally, based on the state estimation result, \textbf{Cropping Strategy Loop} switches the cropping strategy to keep a more stable tracklet. \textbf{Global Modeling Loop} constructs a global representation to enhance the feature of the segmentation network. }

\caption{Switches between the mask-box~(in white) and the regression-box~(in color). The first column shows that the mask-box is more robust to distractors. When the two players are twisted together~(second row), regression-box fails, and State Estimator chooses mask-box. The second column shows the regression-box provides a complete representation when the object is truncated or partially occluded. The third column shows that the regression-box can retrieve the target object in case of fast motion. The dotted line in {\color{cyan}cyan} represents the search region of the similarity encoder; the one in {\color{red}red} indicates the input region of the saliency encoder. }

\caption{Quantitative results on DAVIS2017 validation set. OL denotes online fine-tuning. FPS denotes frames per second. The best two results among offline methods are marked in {\color{red}red} and {\color{blue}blue} respectively. *: STM requires more training data and longer training time than other works.}

\caption{Quantitative results on YouTube-VOS benchmark. OL denotes online fine-tuning. The subscript $s$ denotes seen categories while $u$ denotes unseen categories. The best two results among offline methods are marked in {\color{red}red} and {\color{blue}blue} respectively. *: STM requires more training data and longer training time than other works. }

\caption{Quantitative results on DAVIS2016 validation set. OL denotes online fine-tuning. FPS denotes frames per second. The best two results among offline methods are marked in {\color{red}red} and {\color{blue}blue} respectively. *: STM requires more training data and longer training time than other works. }

173 — 2003.00944

\caption{ \label{fig:GOTOskeleton} (L) The first of 55 (out of 20000) realizations of a control flow graph for a program skeleton generated through 16 uniformly random conditional \texttt{goto}s that result in $\tilde \beta_2 > 0$. {\color{blue}Blue} (resp., {\color{red}red}) arcs indicate branches where a Boolean predicate (placeholder) \texttt{b} evaluates to {\color{blue}$\top$} (resp., {\color{red}$\bot$}). This particular example has $\tilde \beta_\bullet = (0,11,1,0,\dots)$. (R) The first of the remaining 19945 realizations that \emph{do not} result in $\tilde \beta_2 > 0$. This particular example has $\tilde \beta_\bullet = (0,13,0,\dots)$. }

174 — 2003.00976

\caption{ \label{fig:toyComplex} Weighted Dowker graph for the toy data using ``reverse'' order ($\{\bullet\}$). Faces of the Dowker complex (see \S \ref{sec:Theory}) correspond to graph nodes, which are marked with tuples of rows followed by face counts. {\color{red}Inconsistent edges are red.} }

\caption{ \label{fig:Govdocs1Complex} Weighted Dowker graph for the \texttt{Govdocs1} corpus. Inconsistencies are labeled in {\color{red}red}. Graph nodes correspond to the subsets of $\{0,\dots,4\}$ indicated below the nodes, where $0,\dots,4$ respectively correspond to \texttt{caradoc}, \texttt{mutool}, \texttt{pdfminer\_dumppdf}, \texttt{pdftools\_pdfparser}, and \texttt{poppler\_pdfinfo}. The corresponding weights are shown above the nodes. While this visual representation is more convenient than those in Figures \ref{fig:polyvenn5}, \ref{fig:Govdocs1Relation}, and \ref{fig:Govdocs1RelationSorted}, the algorithmic approach of \S \ref{sec:Theory} to identifying consistent subsets of the power set of programs is ultimately necessary. }

\caption{ \label{fig:venn} (Left) Venn diagram for the projected relation. There is a ``gap'' between {\color{green}the set of 3 files accepted by program $B$ and rejected by programs $A$ and $C$} and the sets of {\color{red}2 files accepted by program $A$ and rejected by programs $B$ and $C$}, of {\color{purple}3 files accepted by programs $A$ and $C$ and rejected by program $B$}, and of {\color{blue}2 files accepted by program $C$ and rejected by programs $A$ and $B$}. (Right) Another representation of the same diagram as a $(1,2)$ polyVenn. }

175 — 2003.01018

\caption{Schematic diagram of identification of the primary and collateral tracks of communication. In black the primary track (fluent), in other colors the words in the collateral track, in \textcolor{blue}{blue} a filler, in \textcolor{darkgreen}{green} a phrase repetition. }

176 — 2003.01062

\caption{ \textbf{Variational Comfort Space}: We consider a varying comfort space $c$ around a person based on their position (defined by the view-group $g$) in front of the robot. In scenario 1, the pedestrian approaches the robot from the front. Here, as the pedestrian is aware of the robot's presence, the robot needs to be more respectful of the proxemic comfort space and thus takes action $V_{comfort}$ represented by the \textcolor{OliveGreen}{green} arrow. In scenario 2, the robot is approaching the person from behind. An unaware pedestrian need not be disturbed by the robot, so the robot can be more liberal with its actions. The \textcolor{violet}{violet} arrow representing the safe action $V_{safe}$ coincides with $V_{comfort}$ in this case. \vspace{-0.7cm} }

\caption{ \textbf{Emotionally-Guided Navigation}: We use the emotions detected by {\algoname} along with the LIDAR data to perform \textit{Proxemic Fusion}. This gives us a comfort distance $c$ around a pedestrian for \textit{emotionally-guided} navigation. The \textcolor{OliveGreen}{green} arrows represent the path after accounting for $c$ while the \textcolor{violet}{violet} arrows indicate the path without considering this distance. Observe the significant change in the path taken, especially in the \emph{sad} case. Note that the overhead image is representational, and {\algoname} works entirely from an egocentric camera on a robot. \vspace{-0.4cm}}

\caption{\textbf{{\algoname} Network Architecture}: The network is trained on image embeddings of the 5D gait set G, which are scaled up to $244\times244$. The architecture consists of four group convolution (GC) layers. Each GC layer consists of four groups that have been stacked together. This represents the four group convolution outcomes for each of the four emotion labels. The group convolutions are stacked in two stages represented by \textcolor{blue}{\textbf{Stage 1}} and \textcolor{Dandelion}{\textbf{Stage 2}}. The output of the network has a dimension of $4\times4$ after passing through a $softmax$ layer. The final predicted emotion is given by the maximum of this $4\times4$ output. \vspace{-0.3cm}}

\caption{\textbf{ProxEmo}: We present a gait-based emotion and proxemics learning algorithm to perform socially-aware robot navigation. The \textcolor{red}{\textbf{red}} arrow indicates the path of the robot without social awareness. The \textcolor{OliveGreen}{\textbf{green}} arrow indicates the new path after an \textit{angry} emotion is detected. Observe the significant shift away from the pedestrian when an \textit{angry} gait is detected. This form of navigation is especially useful when the robot is expected to navigate safely through crowds without causing discomfort to nearby pedestrians. \vspace{-0.5cm}}

177 — 2003.01282

\caption{1-Nearest neighbor graph classification performance on 4 datasets with VNGE and NetLSD. Exact computation results are in bold. Approximations that are close to or better than the exact metric computation are highlighted \colorbox{cycle4!25}{in green.} }

178 — 2003.01314

\caption{Examples chosen from experimental grasp attempts. {\color{red}{$\mathbf{\times}$}} indicates grasp failure and {\color{ForestGreen}{\textbf{\checkmark}}} indicates success. Refer to \secn{discussion} for details.}

179 — 2003.01464

\caption{Variation of final free energy with the input state parameter is depicted for channel parameters $p=q=0.8$. The \textcolor{blue}{dashed} curve denotes the free energy of the final output under the causally inseparable combinations of the channels $\mathcal{N}_{PF}$ and $\mathcal{N}_{GAD}$, whereas the \textcolor{red}{thick} line stands for the final free energy under causally separable combinations of these two channels.}

\caption{The free energy difference between the final and the input state with respect to the input bath is plotted for $p=q=0.8$. The potential work content of the final output in causally inseparable and separable combinations of two thermal channels are denoted by the \textcolor{blue}{dashed} and \textcolor{red}{thick} curves respectively.}

\caption{Channel parameter is chosen as $p=q=0.8$ and input state probability $r=0.5 + 0.01 s$ as mentioned earlier. (a) {\it Fixed bath scenario:} The \textcolor{blue}{thick} straight line at the bottom denotes the free energy of the final output under the causally separable combination of the channels $\mathcal{N}_{PF}$ and $\mathcal{N}_{GAD}$, whereas the \textcolor{brown}{dashed} line stands for the final free energy under causally inseparable combination of these two channels using the free thermal state as the controlling qubit. (b) {\it Varying bath scenario:} The \textcolor{red}{thick} curve denotes the free energy difference between the final output (under the causally separable combination of the channels) and the initial state. The \textcolor{blue}{dashed} line stands for the same under causally inseparable combination of these two channels with thermal controller.}

\caption{\textbf{Thermodynamic advancement using resource in the controller qubit.} For both plots, the \textcolor{yellow}{yellow} region denotes the variation of final free energy under causally separable order of occurrence for $\mathcal{N}_{GAD}$ and $\mathcal{N}_{PF}$ and the \textcolor{blue}{blue} region stands for their inseparable causal combination. The phase flip parameter $q$ is chosen as $0.3$. In (a) the final free energy is calculated with respect to the bath fixed by the $\mathcal{N}_{GAD}$ channel parameter $p$, whereas in (b) the same is calculated by assuming that the receiver has access to the same bath as that of the input qubit.}

\caption{\textbf{Thermodynamic advancement using resource free controller qubit.} The phase flip parameter $q$ is chosen as earlier. (a) The variation of the final free energy for causally definite sequence of these two channels is depicted by the \textcolor{yellow}{yellow} region, whereas the \textcolor{blue}{blue} region stands for causally inseparable order of occurrence. (b) Here the \textcolor{green}{green} region depicts the causally inseparable case and the \textcolor{red}{red} region depicts causally separable case.}

180 — 2003.01473

\caption{Comparison with the previous state-of-the-art methods. \textcolor{blue}{\textbf{Bold}} indicates best value overall. $\text{Unified VLP}^\star$ and ${\text{\modelshort}}^\star$ perform \lpretrain. The former is initialized from UniLM, while the latter is pre-trained from scratch with less text data, which is detailed in Section~\ref{sec:exp_settings}. Both Unified VLP and \modelshort\ perform \vlpretrain\ (see Section~\ref{sec:exp_settings}) where the weights are initialized from Text Pre-training and pre-trained on different tasks, respectively.}

181 — 2003.01517

\caption{Features during transition for images for Asymmetric Mutation (\protect\plotblack), Uniform Random Walk (\protect\plotnavy), Biased Random Walk (\protect\plotmagenta), EA-UniformWalk (\protect\plotcyan), EA-BiasedWalk (\protect\plotlime), EA-AsymUniformWalk (\protect\plotyellow) and EA-AsymBiasedWalk (\protect\plotrot) for images from Figure 1 (left), Black-White (middle), Figure~\ref{fig:color} (right). Generation number is shown on the $x$-axis and feature values on the $y$-axis.}

182 — 2003.01711

\caption{Comparison between the convolutional operations used in the DARTS search space (\ref{fig:real_sep_conv} and~\ref{fig:real_dil_conv}) and the proposed ones (\ref{fig:bin_sep_conv} and~\ref{fig:bin_dil_conv}). $k \times k$ denotes the kernel size, \protect\tikz \protect\node[circle, draw=black, fill=white, inner sep=0cm, minimum size=0.35cm] (out) {$+$}; is the element-wise summation operation while each rectangle represents a given operation defined by the inner text.}

183 — 2003.01821

\caption{Performance of all classifiers for the {\color{red} NAME} dataset. }

184 — 2003.02541

\caption{Mitigating the effects of uncertainty propagation from source. [\textcolor{blue}{blue}: source domain, \textcolor{red}{red}: target domain, \textcolor{hui}{gray}: adversarial alignment.]}

\caption{t-SNE visualizations for two transfer tasks A$\to$D (upper row) and Ar$\to$Cl (bottom row). ({\color{blue}blue: source data}, {\color{red}red: target data}).}

185 — 2003.02546

\caption{Comparison of Euclidean distance heatmaps between triplet and EE + triplet loss during training on the CARS196 dataset. In each heatmap, given two samples from each class, the rows and columns are the first and the second samples from each class, respectively. The main diagonal is the distance of positive pairs, whereas the entries outside the diagonal are the distances of negative pairs. A smaller distance for a negative pair ({\color{yellow}yellow} and {\color{red}red}) indicates a harder negative pair; all distances are normalized between 0 and 1.}

186 — 2003.02938

\caption{Visualization of the distribution of exposure variable (left) and relationship between exposure and outcome (right). In the left figure, the high-density region of $A$ is highlighted. In the right figure, the true marginal relationship is shown in {\color{red} \textbf{red}}, a simple unweighted linear smoother is shown in {\color{blue} \textbf{blue}}, and an unweighted quadratic estimation is provided in {\color{green} \textbf{green}}.}

\caption{Performance estimating the dose-response curve across repeated simulated samples: Local Linear Regression. The {\color{red} \textbf{red}} line is the true population dose-response curve. The {\color{blue} \textbf{blue}} lines represent the estimated curves; the solid line is the mean across replications and the dotted lines represent a 95\% equal-tail interval of the density of estimates. }

\caption{Performance estimating the dose-response curve across repeated simulated samples: Linear Regression - second order polynomial in treatment. The {\color{red} \textbf{red}} line is the true population dose-response curve. The {\color{blue} \textbf{blue}} lines represent the estimated curves; the solid line is the mean across replications and the dotted lines represent a 95\% equal-tail interval of the density of estimates. }

\caption{Simulation results for bootstrap confidence intervals. Upper left panel contains a single visualization of a bootstrapped confidence interval for the curve. Throughout the {\color{red} \textbf{red}} curves represent the ``truth'' and the {\color{blue} \textbf{blue}} solid line represents the estimated curve; the dotted blue lines represent estimated 95\% confidence intervals. Upper right panel contains the point-wise coverage of the 95\% bootstrapped confidence intervals. Lower panels contain the magnitude (left) and ratio (right) of the average point-wise bootstrap standard error across the bootstrap simulations as compared to the standard error of the estimated curves obtained in this simulation. In all figures the vertical lines represent the $1^{st}, 5^{th}, 95^{th}, 99^{th}$ quantiles of the distribution of the exposure variable $A$ in the high-density region.}

\caption{Visualization of the distribution of treatment variable (left) and relationship between treatment and outcome (right). In the left figure, the high-density region of $A$ is highlighted. In the right figure, the true marginal relationship is shown in {\color{red} \textbf{red}}, a simple unweighted linear smoother is shown in {\color{blue} \textbf{blue}}, and an unweighted linear estimation is provided in {\color{green} \textbf{green}}.}

187 — 2003.02953

\caption{Receptive field centers of the output neurons in a strided FCN on a 256x256 px input image ({\color{blue} +}: stride 32, {\color{darkgreen}$\times$}: stride 16). \textit{Left:} Normal striding logic, where the top left result is kept per 2x2 block. Consequently, the receptive field centers are not symmetrically distributed and dense prediction introduces bias. \textit{Right:} We use centered striding by reversing the stride logic in the last strided layer (\ie, bottom right result taken, instead of top left). This way the receptive fields are symmetrically distributed over the image and dense prediction at test-time introduces new bins in a proportional manner around each training-time bin. }

188 — 2003.03014

\caption{Example paragraphs with extremely high and low valence scores, along with an interpretation of the patterns we find. Words with extremely high valence scores (greater than 0.85) appear in {\color{blue}blue}, and somewhat high-valence words (scores between 0.7 and 0.85) appear in {\color{RoyalBlue}light blue}. Words with extremely low valence scores (less than 0.15) appear in {\color{red}red}, and somewhat low-valence words (scores between 0.15 and 0.3) appear in {\color{Rhodamine} pink}.}

\caption{Examples mischaracterized by paragraph-level valence analysis. Words with extremely high valence scores (greater than 0.85) appear in {\color{blue}blue}, and somewhat high-valence words (scores between 0.7 and 0.85) appear in {\color{RoyalBlue}light blue}. Words with extremely low valence scores (less than 0.15) appear in {\color{red}red}, and somewhat low-valence words (scores between 0.15 and 0.3) appear in {\color{Rhodamine} pink}.}

189 — 2003.03031

\caption{Out-of-plane XRD results for Cu-MNN films on MgO (001) substrates. (a) XRD profiles vs.\ $T_s$ around (002) peaks under N$_2=4.0\%$. (b) Lattice constant $c$ vs.\ substrate temperature. The dashed line indicates the bulk lattice constant.\cite{Cu-MNN_AHE} (c) XRD profiles vs.\ N$_2\%$ around (002) peaks at $T_s=375\,^{\circ}$C. (d) Lattice constant $c$ vs.\ N$_2\%$. }

\caption{ Dependence of the Hall resistivity of (111) films on the annealing conditions: (a) without annealing, with annealing (b) under a vacuum (vacuum anneal), and (c) under the same atmosphere used for growing the film (gas anneal). \red{The enlarged Hall resistivity of films without and with annealing is shown in panels (d) and (e).} The gas annealing data for the (111) films are adapted from Ref.~\onlinecite{Cu-MNN_AHE}. }

\caption{ \red{Anomalous Hall resistivity and out-of-plane magnetization at 7~T as a function of temperature of gas annealed Cu-MNN (a) (001) and (b) (111) films. For $\rho_{xy}$ of (001) films, the contribution of the ordinary Hall effect is subtracted. The gas annealing data for the (111) films are adapted from Ref.~\onlinecite{Cu-MNN_AHE}.}}

\caption{ \red{Comparison of the AHC normalized by magnetization with that of similar Mn-based magnetic films.} }

190 — 2003.03103

\caption{Total (\CIRCLE), parallel (\textcolor{red}{$\blacksquare$}) and perpendicular (\textcolor{blue}{$\blacktriangle$}) diffusion coefficients of oblate and prolate HBPs in nematic LCs. The vertical dashed line at $W^*=\sqrt{L^*} \approx 3.46$ indicates the transition from prolate to oblate particle shapes. The solid lines are a guide for the eyes.}

\caption{Total ($\Circle$), parallel (\textcolor{red}{$\square$}) and perpendicular (\textcolor{blue}{$\triangle$}) self-part of the van Hove correlation function of prolate HBPs with $W^*=1$. Left, middle and right frames refer, respectively, to $t/\tau=0.1$, 3.3 and 2400. Dashed lines are Gaussian distributions obtained from Eqs.\ (6) and (7).}

\caption{Total ($\Circle$), parallel (\textcolor{red}{$\square$}) and perpendicular (\textcolor{blue}{$\triangle$}) self-part of the van Hove correlation function of oblate HBPs with $W^*=12$. Left, middle and right frames refer, respectively, to $t/\tau=0.1$, 3.3 and 2400. Dashed lines are Gaussian distributions obtained from Eqs.\ (6) and (8).}

191 — 2003.03309

\caption{Results for $m=5$ diffusively coupled map lattices (CMLs): (a) {clustering coefficient} $\mathcal{C}$ and (b) average path length $\mathcal{L}$ of \textcolor{blue}{the (regular)} recurrence network.}

192 — 2003.03389

\caption{Comparison of mean-vorticity change $\overline{\zeta}-\zeta_0$ (\textcolor{red}{$\circ$}) with $\Dlt \mathcal{A}/2$ (solid line) along a radial line and for $t=12.7$ (left), 30 (middle) and 43.2 (right) inertial periods.}

193 — 2003.03468

\caption{Comparison of two different FDTD simulations at 30 kHz, one in which the lunar surface is assumed to be smooth (blue) and one in which elevation data from the LOLA instrument was used to model the topography of the lunar surface (orange). As in \textcolor{blue}{Figure~\ref{fig:RFI_surf}}, the grey region indicates the geometric quiet region without diffraction. The results show that craters and mountains on the Moon create both constructive and destructive interference, leading to differences in the intensity of the RFI. These small variations are on the order of a few dB.}

194 — 2003.03771

\caption{Landmarks predicted (\textcolor{red}{red dots}) on images without human faces. Heatmaps of four landmarks are also presented to show the positions of responses. (a) Training and testing on plain black images. (b)-(d) Training on face images, and testing on CIFAR-10 images that do not contain human faces.\label{cnn-position}}

\caption{An example of image translation based data distillation. (a) Predicted landmarks (\textcolor{red}{red dots}) as well as ground-truths (\textcolor{green}{green dots}) on a cross-domain image. Heatmaps of four landmarks are also visualized for better understanding. (b) The ensembled predictions and heatmaps from four translated images. (c)-(f) Predictions and heatmaps of four translated images (10 pixels up, down, left and right) respectively\label{false-positive}}

195 — 2003.04012

\caption[test]{ \textbf{Universal scaling of the rheotactic velocity.} \textbf{(\textit{A})} Dependence of the scaled mean rheotactic velocity $v_y/v_0$ on the shear rate $\dot{\gamma}$ for non-tumbling bacteria in simple shear (simulations) for different parameter sets (rotational diffusion $D_r$, bacterium aspect ratio $\alpha$, chiral strength $\nu$). \textbf{(\textit{B})} Results as shown in \textbf{(\textit{A})} but plotted against the chirality number $\mathcal{C}$. \textbf{(\textit{C})} Data in \textbf{(\textit{B})} for $\alpha=5$ compared to tumbling bacteria in simple shear flow and Poiseuille flow. \textbf{(\textit{D})} Slow algebraic saturation at high shear rates. Color code indicated in \textbf{(\textit{D})}. Symbol code used in all subfigures: \newline \includegraphics[height=7pt]{symb1.eps} $D_r=0.057$ $\alpha=5$ $\nu=0.06$; \includegraphics[height=7pt]{symb4.eps} $D_r=0.057$ $\alpha=5$ $\nu=0.006$; \includegraphics[height=7pt]{symb7.eps} $D_r=0.2$ $\alpha=5$ $\nu=0.06$; \includegraphics[height=7pt]{symb8.eps} $D_r=0.057$ $\alpha=5$ $\nu=0.02$; \includegraphics[height=7pt]{symb9.eps} $D_r=0.057$ $\alpha=5$ $\nu=0.1$; \includegraphics[height=7pt]{symb10.eps} $D_r=0.1$ $\alpha=5$ $\nu=0.06$; \includegraphics[height=7pt]{symb5.eps} $D_r=0.057$ $\alpha=3$ $\nu=0.06$; \includegraphics[height=7pt]{symb2.eps} $D_r=0.057$ $\alpha=10$ $\nu=0.06$. }

196 — 2003.04151

\caption{Comparison of test accuracy against state-of-the-art methods for 1-shot and 5-shot classification using \textit{mini}Imagenet and \textit{tiered}Imagenet. The second column shows the number of parameters of each model in thousands (K). $^*Robust-20$ uses an 18-layer residual network. \textcolor{gray}{Gray colored results are obtained using $224\times 224$ pixels instead of the standard $84 \times 84$ pixel images.}}

\caption{Comparison with the state of the art on CUB-200-2011. $^*Robust-20++$ uses an 18-layer residual network, and \textcolor{gray}{accuracies obtained with $224\times 224$ images appear in gray.}}

197 — 2003.04249

\caption{Different regimes of Compton-plasma interaction. Left column: electrostatic field. Middle column: electron phase space. Right column: photon phase space. a) \textcolor{blue}{incoherent wake: the space charge force is too weak to pull back most of the scattered electrons}, $\lambda>\lambda_C$ and \textcolor{blue}{$\mathcal{E}_0 \sim 0.01~\mathcal{E}_{\mathrm{min}}$}; b) coherent wake: the space charge is strong enough to pull back most of the electrons, $\lambda>\lambda_C$ and \textcolor{blue}{$\mathcal{E}_0 \sim \mathcal{E}_{\mathrm{min}}$}; c) beam driven wake: the scattered electrons are relativistically kicked forward and form, on top of the photon burst, a dense beam that contributes to driving the wake, $\lambda < \lambda_C$.}

\caption{Amplitude of the electrostatic field as a function of the energy density of a resonant photon burst. We simulated both a burst of mono-energetic photons (circles) and a burst with an energy spread of \textcolor{blue}{$50\%$ (crosses)}. The linear theory of Eq.(\ref{Eq:Efield_scaling}) is displayed by the solid line.}

198 — 2003.04260

\caption{(a) and (c) are an RGB image and the corresponding 3D point cloud acquired by a camera and a LiDAR sensor. (b) and (d) are semantic segmentation results of (a) and (c). {\textcolor[rgb]{0.8,.26,.29}{Red}}, {\textcolor[rgb]{0.29, 0.52, 0.25}{green}}, and {\textcolor[rgb]{0.16, 0.14, 0.47}{blue}} represent the pedestrian, vehicle, and bicycle classes, respectively. The three filled circles in (b) and (d) indicate the semantic centroids (SC) of each class. }

\caption{Cost change of the designed cost function along with the (\protect\subref{loss_fig_rot_ang}) angular and (\protect\subref{loss_fig_trans}) translation displacement of \textcolor[HTML]{e74c3c}{$\textbf{x}-$}, \textcolor[HTML]{2ecc71}{$\textbf{y}-$}, \textcolor[HTML]{3498db}{$\textbf{z}-$}axis respectively. The cost is calculated with 20 pairs. Solid line \protect\tikz[baseline=-0.5ex]{\protect\draw[thick] (0,0) -- (0.5,0);} : Vehicles; dash-dot line \protect\tikz[baseline=-0.5ex]{\protect\draw[thick,dash dot] (0,0) -- (0.5,0);} : Pedestrians; dashed line \protect\tikz[baseline=-0.5ex]{\protect\draw[thick,dashed] (0,0) -- (0.5,0);} : Cyclists. The interval is 0.01$^{\circ}$ for angular displacement and 5 mm for translation displacement. }

\caption{Correspondence of semantic centroids with the estimated initial parameters from 50 pairs. {\color{green}Green} numbers indicate semantic centroids from images and {\color{blue}blue} numbers show projected point cloud semantic centroids. The numbers indicate the index of the image-pointcloud pair.}

199 — 2003.04262

\caption{\small\textbf{Visual results for relation segmentation, on PIC \texttt{test} set in PIC$_{19}$ Challenge} (\S\ref{sec:41}). First column: Instance segmentation results. Last five columns: Top ranked $_{\!}\left\langle\textit{{human}, {verb}, {object}}\right\rangle_{\!}$ triplets. For each triplet, the \textit{human} and \textit{object} are shown in {\color{red}red} and {\color{green}green}. }

200 — 2003.04422

\caption{\textcolor{\mycol}{Accuracy on CIFAR-10 depending on the dataset size}}

\caption{\textcolor{\mycol}{Accuracy on CIFAR-100 depending on filter size}}

\caption{\textcolor{\mycol}{Accuracy on CIFAR-100 for VGG-10 without BN for epochs$\leq$30}}

\caption{\textcolor{\mycol}{Accuracy on CIFAR-100 for VGG-10 without BN}}

\caption{\textcolor{\mycol}{Accuracy on CIFAR-100 for VGG-10 with BN}}

\caption{\textcolor{\mycol}{Accuracy on CIFAR-100 depending on initial learning rates }}

201 — 2003.04547

\caption{The overall architecture of our residual encoder-decoder QRNN3D. The network contains layers of symmetric QRU3D with convolution and deconvolution for encoder \textcolor{blue}{(blue)} and decoder \textcolor{orange}{(orange)} respectively. Symmetric skip connections are added in each layer. Besides, alternating directional structure is equipped in all layers except the top and bottom ones with bidirectional structure to avoid bias. }

202 — 2003.04716

\caption{Quantitative evaluations on the REDS dataset~\cite{REDS} in terms of PSNR and SSIM. All the results are generated according to the published models for fair comparisons. The best two results are shown in \textcolor[rgb]{1.00,0.00,0.00}{\textbf{red}} and \textcolor[rgb]{0.00,0.00,1.00}{\underline{blue}}. }

\caption{Quantitative evaluations on the Vid4 dataset~\cite{Bayesian/vsr/tpami14} and SPMCS dataset~\cite{xintao/iccv17} in terms of PSNR and SSIM. All the results are generated according to the published models for fair comparisons. * means the values from the reported results~\cite{tof}. The best two results are shown in \textcolor[rgb]{1.00,0.00,0.00}{\textbf{red}} and \textcolor[rgb]{0.00,0.00,1.00}{\underline{blue}}. }

203 — 2003.04784

\caption{A UAV is tracked using multiple unsynchronized cameras (bottom) with unknown poses. Our method robustly retrieves the 3D trajectory (\textcolor{black}{black}), camera poses and camera synchronization. The trajectory has a mean error of 7.6 cm compared to the ground truth (\textcolor{red}{red}).}

204 — 2003.04813

\caption{(\textcolor{red}{$\CIRCLE$}) In-plane and (\textcolor{blue}{\textbf{$\Circle$}}) out-of-plane magnetization loop for the CMO thin film at 10 K.}

205 — 2003.04979

\caption{\it Model of the wetting of a droplet on a planar surface. The points \large $\bullet$ are the first fluid and at the interface, where $\xi =$ {\em 1} and $\kappa^* = \kappa$, the points \textcolor{blue}{\large $\bullet$} are in the second fluid, where $\xi =$ {\em 0}, and the point \large $\bullet$ is on the triple line, where $\xi =$ {\em 1} and $\kappa_{\hbox{\scriptsize c}} = \kappa$.}

206 — 2003.05046

\caption[Comparison of theory and experiment]{Summary of cases where theory fits the experiments, in terms of the position or width as a function of distance or the trend in $\Delta$position or $\Delta$width as a function of corner angle or print speed. Experiments match the smoothing (\textcolor{Maroon}{sm}), ringing (\textbf{\textcolor{Gray}{ri}}), and/or swelling (\textbf{sw}) theory, exhibit trends that could come from a combination of theories or from the absence of all effects (none), or exhibit trends that occur in none of the theories (?).}

207 — 2003.05059

\caption{\textcolor{blue}{Vehicle trajectory violating rear-end safety constraint.}}

\caption{\textcolor{blue}{Vehicle trajectory subject to the safety constraint.}}

\caption{\textcolor{blue}{Accumulated fuel consumption over time for the baseline and optimal controlled vehicles.}}

208 — 2003.05078

\caption{Across the world, there are many different ways to refer to \protect\inlinegraphics{im/small_dog.png}. But in the visual domain, a \protect\inlinegraphics{im/small_dog.png} is simply a \protect\inlinegraphics{im/small_dog.png} everywhere on Earth. In this work, we leverage this observation to learn to translate words in different languages without \emph{any} paired bilingual data.}

209 — 2003.05257

\caption{Example results on \textit{synthetic-3D-lanes}. Our detected lanes \textcolor{blue}{(blue)}, the ground truth \textcolor{red}{(red)} and 3D-LaneNet lanes \textcolor{cyan}{(cyan)}. It is visible that our method is less constrained and detects all lanes in the scene.}

\caption{Example results on \textit{3D-lanes}. (a) Depicts examples from the training set, and (b) examples from the test set. It is clear that the surface geometries and curvatures that appear in the test set differ from those in the training set. Our detected lanes are shown in \textcolor{blue}{blue} and 3D-LaneNet lanes in \textcolor{cyan}{cyan}.}

210 — 2003.05326

\caption{Frames per second (FPS) and milliseconds per frame (MSPF) of real-time trackers using a single CPU reported on UAV123@10fps. {\color{red}Red}, {\color{green}green}, and {\color{blue}blue} fonts indicate the first, second and third place, respectively.}

211 — 2003.05865

\caption{Non-energy weighted ($m_0$), energy weighted ($m_1$), and inverse energy weighted ($m_{-1}$) sum rules for monopole, dipole, and quadrupole transitions in $^4$He. NCSM and SA-NCSM calculations are performed for the N3LO-EM and NNLO$_{\mathrm{opt}}$ interactions \blue{(NN only)}; NCSM and SA-NCSM results are the extrapolated values and include estimated uncertainties $\sigma$ based on small variations in $\hw$. }

212 — 2003.05922

\caption{Data distributions from 1300 SE projects (shown in \colorbox{cadetblue}{ \textcolor{white}{teal}}) \& 59 CS projects (shown in \colorbox{amethyst}{ \textcolor{white}{purple}}).}

213 — 2003.05925

\caption{ Decay of the number of atoms in the \Fo{} and \Ft{} states. Results are shown for clouds held in a single-frequency, linearly polarised \rf{}-dressed potential, for Rabi frequencies of \SI{290}{\milli\gauss} (\textcolor[rgb]{0.0859,0.1758,0.3125}{\rule{7pt}{1.5pt}}), \SI{570}{\milli\gauss} (\textcolor[rgb]{0.4144,0,0.5179}{\rule{7pt}{1.5pt}}) and \SI{940}{\milli\gauss} (\textcolor[rgb]{0.1,0.6,0.6}{\rule{7pt}{1.5pt}}). a) Measurements of \Fo{} atom number against hold time, in contact with atoms of \Ft. The lines indicate best-fit curves for exponential decay. Inset: exponential fits of \Fo{} atom number vs. hold time for three Rabi frequency shells for atoms with \Fo{} alone. b) Measurements of \Ft{} atom number against hold time, in contact with atoms of \Fo. c) Measured 1/e lifetimes for \Fo{} as a function of the initial number density of \Ft. Shaded regions represent the lifetime of \Fo{} atoms alone when trapped in an identical potential. d) Measured 1/e lifetimes for \Ft{} as a function of the initial number density of \Fo. The vertical error bars in both plots correspond to the uncertainty in the fitted rate coefficient, while horizontal error bars indicate the uncertainty in the initial number density.}

214 — 2003.05943

\caption{GCM simulation of \wasp \citep{Parmentier2018} without \hh\ dissociation. Temperature (top) and the water abundance (bottom) for equatorial cut (left), limb cut (middle), and pole cut (right). From center outward, the 5 solid lines are respectively the $1.434 \times 10^7$, $10^3$, 1, $10^{-2}$, and $10^{-4}$ Pa pressure levels. The colormap for water abundance maps goes from $5 \times 10^{-4}$ to $10^{-7}$. Note that the radius of the planet and the atmosphere are shown to scale.}

\caption{(Left): Transmission spectra of \wasp at resolution of R = 100 for \gcm simulations assuming a constant \hh abundance in the whole atmosphere. When water dissociation is taken into account (light blue line), the water features become shallower compared to when we assume no water dissociation (blue line). (Right): Transmission spectra of \wasp at resolution of R = 100 for \gcm simulations taking into account \hho dissociation in the atmosphere. When \hh\ dissociation is considered (blue line), the \co\ features appear more clearly compared to when we neglect \hh\ dissociation (light blue line). The black and grey lines correspond to the transmission spectra for an atmosphere without \co\ for the water-constant and water-dissociated cases, respectively, to highlight the features of \co\ in the other curves.}

\caption{Summary of all the retrieval results for \Tm, \Tp and \gcm simulation considering all cases. We show the \COratio ratio (top left), the temperature (top right), the log abundances of \co\ (middle left) and \hho\ (middle right), the \redchi\ (bottom left) and the planetary radius (bottom right). These retrievals have been calculated with shot noise, assuming a noise floor of 30 ppm throughout the whole spectral domain. The red line represents the input value from our simulations and the black dotted line shows where \redchi=1.}

215 — 2003.05950

\caption{Additional \aastex\ symbols}

216 — 2003.06068

\caption{As we take longer samples, the network grows. Nodes \textcolor{blue}{(BLUE)} indicate new addresses, and edges \textcolor{red}{(RED)} indicate new transactions. R$^{2}$ values for best fit lines for nodes and edges are 0.997 and 0.996 respectively.}

217 — 2003.06125

\caption{Comparison of our DTMNet with the state of the art on DAVIS 2016 val. \textcolor[rgb]{1.00,0.00,0.00}{\textbf{Red}} and \textcolor[rgb]{0.00,0.00,1.00}{{\textbf{blue}}} bold fonts indicate the best and second-best performance, respectively. }

218 — 2003.06167

\caption{Statistical comparisons of our GCAGC with other state-of-the-art methods. \textcolor[rgb]{1.00,0.00,0.00}{\textbf{Red}} and \textcolor[rgb]{0.00,0.00,1.00}{{\textbf{blue}}} bold fonts indicate the best and second best performance, respectively. }

\caption{Ablative studies of our model on iCoseg and Cosal2015. Here GCAGC-N, GCAGC-M, GCAGC-P denote our GCAGC in absence of AGCN, AGCM and the projection matrices $\textbf{P}$ in (\ref{eq:Ak}), respectively. \textcolor[rgb]{1.00,0.00,0.00}{\textbf{Red}} bold font indicates the best performance.}

219 — 2003.06364

\caption{The dependence of the increments of the mean particle density on time for ellipses, spherocylinders, rectangles and dimers of width-to-height ratio $x=2.0$. The inset shows the dependence of the exponent $d$ from Eq. \ref{eq:fl} on the dimensionless time $t$ \red{(\ref{t})}. The value of parameter $d$ for a given time $t$ was estimated as a best fit of Eq. \ref{eq:fl} to numerical data in the range $[10^{-2}t, t]$. Ends of the lines correspond to the time $t_{min}$ for which the first of $100$ generated packings saturates. }

220 — 2003.06576

\caption{(a) \textbf{visual-explainable ability}: The \textcolor{green}{\textbf{green}} boxes denote scores $s(\hat{a}, \bm{v}) \textgreater 0$, \ie, positive contributions to final predictions; the \textcolor{red}{\textbf{red}} boxes denote scores $s(\hat{a}, \bm{v}) \textless 0$, \ie, negative contributions to final predictions. Only objects which are highly related to the QA pair are shown (\ie, $\mathcal{SIM} \geq 0.6$). (b) \textbf{question-sensitive ability}: The different shades of green in the question denote the relative values of $s(\hat{a}, \bm{w})$. Thus, a darker green indicates a word with a larger contribution to final predictions.}

221 — 2003.06613

\caption{\textcolor{blue}{show correlation or mutual information instead of reduction in parameter and response variance} }

222 — 2003.06651

\caption{Top nearest neighbours of the fastText vector of the word \textit{Ruby} are clustered according to various senses of this word: {\color{myred} programming language}, {\color{myblue} gem}, {\color{myorange} first name}, {\color{mygreen} color}, but also its spelling variations (typeset in black color).}

223 — 2003.06700

\caption{\blue{Speedups by composability-based pruning with different subspace sizes.}}

224 — 2003.06761

\caption{Detailed comparisons on VOT2018. The best two results are highlighted in {\color{red} red} and {\color{blue} blue} fonts. DiMP is the ResNet-50 version (DiMP-50), the same below.}

225 — 2003.06906

\caption{\small Top-down view of two robots ({\color{red}top}, {\color{blue}bottom}), separated by obstacles ({\color{green}center}), that must meet each other. Both robots are decentrally controlled and there's no communication. How should they move in order to meet? Example trajectories are illustrated in dashed arrows in the robots' corresponding colors.}

\caption{\small Training environment with randomly filled obstacles used for training the dynamics prediction models $\f_i, \f_{-i}$. All agents ({\color{blue}{left}}, {\color{red}{upper right}}) are given the same random goal ({\color{green}{center}}) and move with their own P2P policies towards it. \label{fig:training_env}}

\caption{\small Goal ($\goal_0$ and $\goal_G$) evaluation in the high-level policy $\hp$. At the end of a simulated trajectory, the agents (\textcolor{red}{left} and \textcolor{blue}{right}) are either a) far or b) close to each other. A goal reward is based on the negative final distance among agents. $\goal_G$ is a better goal than $\goal_0$ because agents end up closer to each other. \label{fig:goal_evaluation_examples}}

\caption{\small Performance of \alg\ on wall world varying the prediction model. Lower is better. \texttt{delta-pose-lidar} (blue) is ours. \label{fig:prediction_perf}}

226 — 2003.06951

\caption{Calculating one-to-one operation in a parallel computing friendly way. Given $G$ images in a group ($G=4$ in this example), the number of sequential executions can be reduced from (a) $\mathrm{P}_G^2$ (or $\mathrm{C}_G^2$ when an operation is commutative) to (b) $G-1$. With the cyclic shift $S(\cdot,k)$, the operations colored in \textcolor{red}{red} arrows can be calculated in parallel. For each execution, the step size of cyclic shift $k$ traverses from $1$ to $G-1$.}

227 — 2003.06977

\caption{Sequence of snapshots taken at varying steps where the robot executes an entire clean-up task. A motion planner is used to ground the high-level commands ``remove object'', ``add object'', ``do nothing'' obtained from the policy $\pi(x,w,u)$. The borders of the area $\mathcal{C}$ are only shown in the third snapshot and are not visible to the robot's vision sensor. The rendering of the floor was omitted for clarity but it is visible to the robot. Experiments can be watched in the following {\color{blue} \href{https://gjmaeda.github.io/research/invariant_task_progress_estimation/task_progress_estimation_1min.mp4}{video}}. }

228 — 2003.07064

\caption{\textcolor{red}{change the labels of plot, new image is needed!} }

\caption{ \textcolor{red}{change the labels of plot. maybe we can put them next to each other}}

\caption{\textcolor{red}{change the labels of plot} }

\caption{\textcolor{red}{Person re-identification experiment. Resnet-18, 34 and 50 are trained from scratch. The baseline model loses performance as the model complexity rises. For the F-Conv model, the highest performance is obtained with the Resnet-50. The ranking scores follow the mAP scores.} }

229 — 2003.07233

\caption{Model Yield and Neural Cleanse Detection for Random-Rectangular trigger. \\ \\ {\scriptsize In each cell, the denominator indicates the number of models which passed the 95\% threshold for clean and triggered data performance, and the numerator indicates how many of those were detected to be anomalous by Neural Cleanse. For each trigger configuration, cells colored in \textcolor{green}{green} indicate the best Neural Cleanse performance, and cells colored in \textcolor{red}{red} indicate the worst Neural Cleanse performance.}}

230 — 2003.07245

\caption{Wrinkles are not expressed in equi-biaxially pre-stretched charge-controlled plates. Here the solid curves are the loading curves for the neo-Hookean dielectric model with pre-stresses $s=0, 0.8, 1.5, 2.5, 4.5$. The dashed curve is the thick-plate limit \eqref{ideal-shortwave}. None of the pre-stretched curves cross the greyed zone where wrinkling occurs, between the thick-plate (dashed curve) and thin-plate ($s=0$ loading curve) limits, so wrinkling does not take place. The dots are the result of Finite Element calculations using COMSOL Multiphysics{\small \textregistered} (Section \ref{Finite Element simulations}), which turn out to be very stable numerically. We conducted the same calculations for the Gent dielectric with $J_m = 97.2$ and found almost identical plots (not shown here).}

\caption{Stretches in the dielectric plate after uni-axial loading by a weight, prior to activation, as computed by FE analysis using COMSOL Multiphysics{\small \textregistered}. We used the same physical characteristics as those in the experiments by Keplinger et al. \cite{Kepl10}. Dimensions: length 100 mm; width 50 mm; thickness 1 mm. Attached mass: 150 g. Constitutive model: neo-Hookean dielectric with $\mu =$ 9833.07 Pa.}

231 — 2003.07268

\caption{Left: Measured average forecasting performance in terms of sMAPE using the predicted forecasting error to perform model selection in the \textsc{weekly} dataset. Results with Bayes-LeNet and GPs as \monitoring{}, and using fixed forecasting models over the whole horizon. Error bars denote standard deviation. Right: Worst (top) and best (bottom) model selection performances in comparison with \tsensembler{} and \hyndmeta{}. GPs is used as \textit{monitoring model} with six (GP-6) and ten \monitored{} (GP-10).}

232 — 2003.07496

\caption{Visualization of some examples of the nodes and the edges of DEPARA. For the nodes, we visualize three examples from taskonomy data, Indoor Scene and COCO, respectively. For the edges, we randomly sample $30$ nodes from taskonomy data and show their interconnections. Note that some weak connections are omitted for better visualization. Here we select two {\color{green}3D} tasks, three {\color{blue}2D} tasks, two {\color{red}geometric} tasks, and two {\color{magenta}semantic} tasks for visualization. The task similarity tree derived from taskonomy is depicted above task names. }

233 — 2003.07618

\caption{Performance on the Intel\textregistered Core\texttrademark i7-6700K 4.00GHz CPU in OpenVINO\texttrademark R3 2019 Toolkit. Batch Size is Set to 1, Input Resolution is $256\times128$, Inference Precision is FP32}

234 — 2003.07640

\caption{Data source used for training EventSR. (R/S for real/synthetic, P1/P2/P3 for phase 1/2/3, Eval for numerical evaluation, Gen. for generalization to real data, \checkmark/ \xmark for yes/no, and \redcheck~ indicates very crucial for training EventSR.)}

235 — 2003.07668

\caption{The first (top left) panel shows the averages $\langle x\rangle$ (solid lines) and $\langle v\rangle$ (dashed lines) against time. The second (top right) panel shows the corresponding standard deviations $\sigma_x$ (solid) and $\sigma_v$ (dashed). The third (bottom left) panel shows the cross-correlation. The fourth (bottom right) panel is the phase portrait in $(\langle x\rangle,\langle v\rangle)$. {The dots with associated numbers correspond to $t=0, 15, 40$.} For all panels, [black,{\color{blue}blue},{\color{red}red}] correspond to $D_x=[1,4,16]\cdot10^{-4}$, respectively.}

236 — 2003.07758

\caption{Another example of the qualitative results for a video in the validation set. In the video, a lady is shown speaking twice (in $p_2$ and $p_{10}$). Since MDVC is conditioned not only on visual (V) but also speech (S) and audio (A) modalities, it managed to hallucinate a caption containing a ``\textcolor{blue}{woman}'' instead of a ``\textcolor{red}{man}''. We invite a reader to watch it on YouTube for a better impression (\texttt{\href{https://www.youtube.com/embed/EGrXaq213Oc?rel=0}{EGrXaq213Oc}}). Note: the frame size mimics the MDVC input; the scale of temporal segments is not precise. Best viewed in color.}

237 — 2003.07820

\caption{Judging statistics for the Document Ranking and Passage Ranking tasks. Given are the number of documents judged (any variant of) relevant, the total number of documents judged, and the fraction of judged documents that are relevant (Relevant Ratio). Topics were excluded from the evaluation set if they had fewer than 3 relevant documents or if the fraction of judged documents that are relevant was greater than 0.6. Data for excluded topics are given \textcolor{gray}{in gray}. The final rows give the total number of documents judged and the number of documents judged when not counting excluded topics.}

238 — 2003.07847

\caption{\textbf{(Top)} Previous work has studied 3D MOT and trajectory forecasting separately. The entire pipeline is in a cascaded manner where the tracking outputs are fed to the forecasting module. \textbf{(Bottom)} Our proposed model jointly achieves the tracking and forecasting. Also, we propose two innovations: (1) a feature interaction using GNNs (shown as \textcolor{blue}{blue}) to improve the tracking association and trajectory forecasting in the presence of multiple agents; (2) a diversity sampling (shown as \textcolor{orange}{orange}) to improve the sample efficiency and produce diverse and accurate trajectory samples.}

239 — 2003.08237

\caption{\label{fig:FixEfficienNet} Improvement brought by FixRes (in \textbf{bold}) to several popular architectures from the literature. Our FixEfficientNet (\textcolor{orange}{orange curve}) surpasses all EfficientNet models, including the models trained with Noisy student (\textcolor{red}{red curve}) and adversarial examples (\textcolor{blue}{blue curve}). The sws models are from~\cite{Yalniz2019BillionscaleSL}. Tables~\ref{tab:sota_extra_data} and~\ref{tab:sota} report results on larger models. }

240 — 2003.08290

\caption{\textcolor{r3}{Wiedemann 1974 model and car-following state \citep{ptv2018ptv}}}

\caption{\textcolor{re}{Local coordination strategy}}

\caption{\textcolor{re}{Time-to-collision (TTC) CDF}}

\caption{\textcolor{r2}{List of Abbreviations}}

\caption{\textcolor{re}{Flowchart for CACC local coordination} \citep{lee2014mobility, NAP25366}}

241 — 2003.08333

\caption{An \textbf{overview} of CFBI. F-G denotes Foreground-Background. We use \textcolor{red}{red} and \textcolor{blue}{blue} to indicate foreground and background, respectively. The deeper the red or blue color, the higher the confidence. Given the first frame ($t=1$), previous frame ($t=T-1$), and current frame ($t=T$), we first extract their pixel-wise embeddings by using a backbone network. Second, we separate the first and previous frame embeddings into the foreground and background pixels based on their masks. After that, we use foreground-background pixel-level matching and instance-level attention to guide our collaborative ensembler network to generate an accurate prediction.}

242 — 2003.08436

\caption{Illustration of the proposed Collaborative Distillation framework (best viewed in color). (a) and (b) depict two kinds of the encoder-decoder collaborative relationship for universal neural style transfer: image reconstruction for WCT~\cite{li2017universal} and style transfer for AdaIN~\cite{Huang-2017-arbitrary}, respectively. {\color{blue}{Blue}} arrows show the forward path when training the collaborator network (namely, the decoder). {\color{green}{Green}} arrows show the forward path when the small encoder (``SEncoder'') is trained to functionally replace the original encoder (``Encoder''). (c) shows the proposed linear embedding scheme to resolve the feature size mismatch problem and infuse more supervision into the middle layers of the small encoder.}

243 — 2003.08437

\caption{Top 10 most frequent scene roles in \textcolor{myblue}{Chinese} versus \textcolor{myorange}{English}.}

\caption{Top 10 most frequent functions in \textcolor{myblue}{Chinese} versus \textcolor{myorange}{English}.}

\caption{Top 10 Construals where scene$\neq$function in \textcolor{myblue}{Chinese} versus \textcolor{myorange}{English}.}

244 — 2003.08760

\caption{Two question-image examples that ultimately fail in current approaches, but succeed in the proposed method. ``+Att'' denotes a method with the attention scheme. $\relu$ and ``tanh'' denote the rectified linear unit and hyperbolic tangent activation functions, respectively. \textcolor{wrongColor}{Red} and \textcolor{rightColor}{Green} denote the wrong and correct predictions, respectively.}

\caption{Example of medical and natural images and \gls{gradcam} maps from different models on the VQA-Med and VQA datasets. The vertical text on the left shows the pairs of \gls{qa} ground truths used in each row and the predictions of \gls{qcmlb} models without (No Att.) and with attention (Att.) mechanism, respectively. Note that in the last row, although the \gls{qcmlb} model with attention mechanism successfully highlighted the hydrant region, it failed to answer the question correctly (the answer is ``Red'' while the system's answer was ``Red and blue''). \textcolor{wrongColor}{Red} and \textcolor{rightColor}{Green} denote the wrong and correct predictions, respectively.}

\caption{Example images and \gls{gradcam} maps from different models on the IDRiD, BACH and Tools datasets. The vertical text on the left shows the pairs of \gls{qa} ground truths used in each row and the predictions of \gls{qcmlb} models without (No Att.) and with attention (Att.) mechanism, respectively. Notice that in the last row, \gls{qcmlb} with attention mechanism put a focus on the top left corner where a location question about ``(0,~0) to (32,~32)'' is asked. Here, (0, 0) and (32, 32) denote the $(x, y)$ coordinates of the corners of the region of interest. Note that in the penultimate row, both networks (with and without the attention mechanism) failed to answer the question (the answer is ``\textit{n.a.}'' while the systems' answers were ``Hook''). \textcolor{wrongColor}{Red} and \textcolor{rightColor}{Green} denote the wrong and correct predictions, respectively.}

245 — 2003.08798

\caption{\textit{(Best viewed in color)} Plots of detection accuracies of three class-incremental settings on PASCAL VOC [7] dataset. The classes that are introduced in the incremental step are colored \textcolor{magenta}{magenta} in the x-axis for improved readability. Each bar group contains accuracies when trained on \textcolor{NavyBlue}{all 20 classes} (upper-bound), using \textcolor{BurntOrange}{standard training on the new class data}, \textcolor{ForestGreen}{results from Shmelkov \etal}~[41] and \textcolor{Mahogany}{our approach}. The numbers in the legend are the mAP at an IoU threshold of 0.5 (mAP@50).}

246 — 2003.08821

\caption{DHOG architecture. The skeleton is a ResNet18~\cite{he2016identity}. The final ResNet block is repeated $k-3$ times ($k=8$ here). \yellowcircle{1} Augmentations of each image, $\xRV^{a\ldots d}$, are separately processed by the network. \yellowcircle{2} Each shallow ResNet block ($1\ldots3$) constitutes shared computation for deeper blocks, while also computing separate probability vectors, $\zRV_1 \ldots \zRV_3$. Each $\zRV_i$ is viewed as the probability for each outcome of the random variable $c_i$ that makes a discrete labelling choice. \yellowcircle{3} The deepest ResNet blocks compute further $\zRV_{>3}$. \yellowcircle{4} The network is trained by maximising the MI between allocations $c_i$ from \emph{all data augmentations}, and \yellowcircle{5} separately for each node $i$, minimising the MI between $c_i$ and $c_{< i}$ for the \emph{same data augmentation}. \yellowcircle{6} This is implemented as a global optimisation by stopping gradients such that they are \emph{not back-propagated} for later computation paths. \label{fig:architecture}}

247 — 2003.09168

\caption{Accuracy per class in CCT20 test datasets. Total training samples 13k including CCT20+ 1180 samples with keypoint annotation and 2122 empty class samples. Numbers below animal classes indicate training samples with keypoints per class. Methods with PrPool (ours) marked in \textcolor{PineGreen}{green}. Solid lines denote avg-pooling, dashed lines cov-pooling.}

\caption{Comparison of learning strategies. In the standard network with parameters $\theta$, an input $\v{x}$ is mapped to a latent encoding $\v{F}$ and on to a prediction $\hat{y}$. Distillation first learns a teacher network with parameters $\phi$ using also privileged information $\xstar$, then learns the weights $\theta$ to approximate that teacher network. Multi-task learning jointly learns to predict also $\xstar$ with a decoder with parameters $\phi$. The proposed framework adds an attention mechanism with parameters $\theta_3$ and supervises it with $\xstar$. \textcolor{PineGreen}{Green} denotes quantities used only during training. }

\caption{Accuracy per class in CCT20 dataset. Training uses only the 1,180 samples with keypoint annotation. Methods with PrPool (ours) are marked in \textcolor{PineGreen}{green}. Average pooling is shown with solid lines and cov-pooling with dotted lines. Methods with $\mathbf{(-)}$ denote no test-cropping.}

248 — 2003.09222

\caption{(Color online) Minimized free energy $ \mathcal{F} = F_{\mathrm{T}} - 4 \pi \eta $: We set the reference of the total free energy to the condensation energy $ 4 \pi \eta $ of the entire sphere. The dots indicate the $ \eta $-values beyond which wall defects are unstable ($ \zeta(\eta) = 0 $), see Fig.\textcolor{blue}{(\ref{Zeta})}. }

249 — 2003.09294

\caption{Disparity estimation, minimum and average per-view PSNR results (in dB) for the performance evaluation of different light field reconstruction methods on ED1. The best two results are highlighted in \textcolor{red}{red} and \textcolor{blue}{blue} colors. }

\caption{Disparity estimation, minimum and average per-view PSNR results (in dB) for the performance evaluation of different light field reconstruction methods on ED2. The best two results are highlighted in \textcolor{red}{red} and \textcolor{blue}{blue} colors. }

250 — 2003.09405

\caption{Top: while autonomous vehicles face complex scenes, composed of many objects, only a few of these are action-inducing. Bottom: each action-inducing object has an associated explanation for the related action. The arrows represent actions ``move forward'', ``turn left'', ``stop/slow down'', and ``turn right'' (counter-clockwise order). {\color{green} Green} identifies the acceptable action.}

\caption{Examples of network predictions, objects selected as action-inducing, and explanations. {\color{amber}Yellow bounding boxes} identify the objects detected by the Faster R-CNN, while {\color{red}red bounding boxes} identify the objects selected as action-inducing by the proposed network. ``G'' stands for ground truth and ``P'' for prediction. For explanations, {\color{green} green} indicates true positives, {\color{red}red} false positives, and {\color{gray}gray} false negatives (\ie valid explanations not predicted). }

251 — 2003.09565

\caption{\textbf{Generating videos by exchanging unseen actions between identities.} Each cell in this table indicates a video in the dataset. Only cells containing the symbol \textcolor{black}{$\bullet$} correspond to videos that were part of the training set. We randomly generated videos corresponding to the rest of the cells, indicated by the symbols \textcolor{red}{$\bullet$}, \textcolor{green(ryb)}{$\bullet$}, \textcolor{yellow}{$\bullet$}, and \textcolor{blue}{$\bullet$}, visualized in Fig.~\ref{fig:motion_exchange}.}

\caption{\textbf{Examples of action exchange to generate unseen videos.} This figure shows generated videos that were unseen during training of the model, with colored bounding boxes matching the colored dots (\textcolor{red}{ $\bullet$}, \textcolor{yellow}{ $\bullet$}, \textcolor{blue}{ $\bullet$}, \textcolor{green(ryb)}{ $\bullet$}) referred to in Tab.~\ref{tab:exchange}. This demonstrates the effectiveness of our method in disentangling the static and transient portions of videos.}

252 — 2003.09773

\caption{Classification accuracy (\%) of the state-of-the-art methods and our proposed method \textcolor{blue}{on the testing sets of three datasets}. The best accuracy is in bold and the second-best accuracy is underlined. An asterisk (*) indicates that no published results are available on the corresponding dataset. }

\caption{Classification accuracy (\%) of each individual type of features ($OP$, $OW$, $SP$, and $SW$) \textcolor{blue}{on the testing sets of three datasets}. }

\caption{Classification accuracy (\%) of our hybrid deep features ($HDF$) achieved by four different aggregation methods (Max, Mean, Min, and Concatenate) \textcolor{blue}{on the testing sets of three datasets}. }

253 — 2003.09833

\caption{An illustration of the proposed Sparse Adaptive Connection. (a) shows the process of SAC to construct edges and then perform self-attention on these edges ({\color{red}{Red}} is for text and {\color{green}{green}} is for graphs). (b) shows the edge prediction process of (a, {\color{red}{red}}) with distance encodings.}

254 — 2003.10027

\caption{Plots of input and output values of DY-ReLU in a well trained model (using MobileNetV2 $\times 0.35$) over 50,000 validation images in ImageNet \cite{deng2009imagenet}. We choose the dynamic ReLU after the depthwise convolution in every other mobile block. Block 1 (the top-left plot) corresponds to the lowest block, and Block 17 (the bottom-right plot) corresponds to the highest block. The two \textcolor{red}{red lines} correspond to $y=x$ and $y=-x$, respectively. Best viewed in color.}

255 — 2003.10176

\caption{The deep soft Procrustes analysis enables end-to-end geometric supervision for a semantic segmentation model. On the first row, the corresponding tensor operations are depicted. Starting from a \textcolor{cyan}{light blue} $W \times H \times K$ tensor $P$ containing each of the $K$ classes' probabilities and the \textcolor{gray}{gray} $3 \times W \times H$ vertices tensor $V$ obtained by de-projecting the input depthmap, we establish soft correspondences as follows: \textbf{i)} we multiply ($\oast$) the tensors $P$ and $V$ after expanding ($\sqsupset$) -- or otherwise, broadcasting -- $V$ to $3 \times W \times H \times K$; \textbf{ii)} the resulting $3 \times W \times H \times K$ \textcolor{green}{light green} tensor $P \oast (V\sqsupset)$ is reduced via a mean operation across the spatial dimensions $W$ and $H$, resulting in the \textcolor{orange}{orange} $3 \times K$ tensor $C$ containing the soft correspondences' 3D coordinates; \textbf{iii)} after masking with the ground truth labels and performing an SVD operation ($\ocirc$), the remaining correspondences in the \textcolor{yellow}{yellow} tensor $C^{\prime}$ are now aligned and any error function between them can be back-propagated to the semantic segmentation network. The bottom row visualizes the result of each operation. }

256 — 2003.10178

\caption{Coverage simulation with comparison between: \\ \color{color1}\ding{110} \color{black}$h(x)$ and \color{color6}\ding{110} \color{black}$h_{local}$.}

257 — 2003.10401

\caption{Given inputs with different scale distributions, the proposed dynamic routing will choose corresponding forward paths. For example, the architecture of {\em large-scale} instances~\ref{fig:arch_intro_large} could ignore low-level features. The {\em small-scale} objects~\ref{fig:arch_intro_small} may depend on low-level details as well as higher resolution. And the {\em mixed-scale} things~\ref{fig:arch_intro_mix} would enjoy both connection patterns. \textcolor{red}{Red} lines in diagrams denote the difference among them. }

258 — 2003.10420

\caption{Sky distribution (Mollweide projection, equatorial coordinates) of the 4 subsamples of the JLA catalogue: low $z$ (red dots), SDSS (green dots), HST (black dots), clusters of many SNe~Ia from SNLS (blue dots). The directions of the CMB dipole (star), the SMAC bulk flow (triangle) (\protect \cite{Neill:2007fh}), and the 2M++ bulk flow (inverted triangle) (\protect \cite{Carrick:2015xza}) are shown in grey.}

\caption{{\it The colours have a different meaning from the previous sky map.} Red corresponds to low $z$ SNe with a positive line-of-sight (LOS) velocity (back calculated) and blue to those with a negative LOS velocity. Green corresponds to SMAC clusters with positive line-of-sight velocities, and yellow to those with negative line-of-sight velocities. The size of the markers corresponds to the magnitude of the velocity. }

259 — 2003.10428

\caption{Average PSNR(dB) results of different methods for different combinations of scale factors, blur kernels and noise levels. The best two results are highlighted in \textcolor[rgb]{1.00,0.00,0.00}{red} and \textcolor[rgb]{0.00,0.00,1.00}{blue} colors, respectively.}

260 — 2003.10491

\caption{Components of a 4-exponential fit (1 rise, 3 decay, plus constant) to the response of the pure LS precursor (90 g/L PPO in LAB) to a pulsed X-ray source. The data are normalized by integration. Decay lifetimes and weight fractions for the component fitting of all samples are displayed in Tables \ref{t:decayComponentPPO} and \ref{t:decayComponent}. Data collected out to +550 ns were used to fit these decay components.}

261 — 2003.10580

\caption{\label{fig:reduced_mpl}The \reducedmpl~training procedure has 3 steps: (1) a large teacher $q_\text{large}$ (red box) is pre-trained; (2) $q_\text{large}$ assigns class distributions to the student's training data; (3) A small multi-layered perceptron $q_\Psi$ calibrates the distributions computed by $q_\text{large}$ to train the student. $q_\Psi$ is trained along with the student, like the teacher in normal \mpl.}

\caption{\label{fig:reduced_mpl_labels}Target distributions that \reducedmpl~computes throughout the course of training the student. For each image, the first column shows the distribution computed by a pre-trained model, while other columns show distribution computed by the teacher in \reducedmpl~every quarter of the student's training process. Images are taken from the TinyImages dataset. The general pattern is that the distributions become more flat as the student is trained further, and we suspect this prevents overfitting in the student. However, there are exceptions, such as in the last row, where the distribution stays relatively sharp at the end.}

262 — 2003.10608

\caption{ In the first row (1)-(4), we illustrate the \textit{physically-constrained 3D random walk}. For better visualization, we use a camera object to represent the viewpoint (marked with \textcolor{green}{green boxes and arrows}). In the second row, we compare viewpoints from the proposed method with randomly sampled viewpoints. }

\caption{ Illustration of the refinement of initial proposals. We draw \textcolor{green}{green bounding boxes} to represent proposals in 2D screen space, and use planar meshes to represent proposals in 3D space. (1) Initial proposals are made in 2D space. (2) When we project them into 3D world and inspect them from the front view, they are in distorted forms. (3) Based on the sizes of the distorted proposals and the positions of the center points, we re-initialize orthogonal squares on the same surfaces with horizontal sides orthogonal to the gravity direction. (5) Then we expand the squares. (6) Finally, we obtain text regions in 2D screen space with natural perspective distortion. }

263 — 2003.10780

\caption{The training set of iNaturalist 2018 exhibits a long-tailed class distribution~\cite{inaturalist}. We connect domain adaptation with the mismatch between the long-tailed training set and our expectation of the trained classifier to perform equally well in all classes. We also view the prevalent class-balanced methods in long-tailed classification as the target shift in domain adaptation, i.e., $P_s(y)\neq P_t(y)$ and \textcolor{blue}{$P_s(x|y)=P_t(x|y)$}, where $P_s$ and $P_t$ are respectively the distributions of the source domain and the target domain, and $x$ and $y$ respectively stand for the input and output of a classifier. We contend that the second part of the target shift assumption does not hold for tail classes, e.g., \textcolor{blue}{$P_s(x|\textit{King Eider})\neq P_t(x|\textit{King Eider})$}, because the limited training images of \textit{King Eider} cannot well represent the data at inference time.}

264 — 2003.11236

\caption{Diagram of supernet training for our proposed GreedyNAS. The supernet greedily shrinks its training space from all paths (\red{red} and \blue{blue} dots) into potentially-good paths (\red{red} dots), and further into candidate pool.}

265 — 2003.11291

\caption{\small \label{tb:mot16}\textbf{Quantitative results on MOT16.} The best scores of online and offline MOT methods are marked in {\color{red} \textbf{red}} and {\color{blue} \textbf{blue}}, respectively. }

\caption{\small \textbf{Quantitative results on MOT17.} The best scores of online and offline MOT methods are marked in {\color{red} \textbf{red}} and {\color{blue} \textbf{blue}}, respectively. }

266 — 2003.11305

\caption{Eigenfrequencies of the resonator shown in Fig.~\ref{fig:fig02}\textcolor{blue}{(a)}. The eigenfrequencies $\tilde{\omega}_k$ are contained in the circular contour $C_\mathrm{r}$, which is centered at $1.41 \times 10^{15}\,\mathrm{s}^{-1}$ and has a radius of $6.8\times 10^{13}\,\mathrm{s}^{-1}$.}

\caption{Modal expansions of Purcell enhancement and PCE for the resonator with a localized light source shown in Fig.~\ref{fig:fig02}\textcolor{blue}{(a)}. Eigenfrequencies $\tilde{\omega}_1,\dots,\tilde{\omega}_9$ are considered, see Tab.~\ref{tab:table1}. (a)~Modal expansion of the Purcell enhancement. The contributions $\tilde{\Gamma}_1(\lambda_0),\dots, \tilde{\Gamma}_4(\lambda_0)$ correspond to the eigenfrequencies $\tilde{\omega}_1,\dots,\tilde{\omega}_4$, respectively. The remaining modal contributions are added to the remainder of the expansion, $\sum_{k=5}^9 \tilde{\Gamma}_k(\lambda_0) + \Gamma_\mathrm{r}(\lambda_0)$. The term $\Gamma_\mathrm{r}(\lambda_0)$ includes also modal contributions corresponding to eigenfrequencies outside the integration contour $C_\mathrm{r}$. (b)~Modal expansion of the PCE. Total modal expansion, $\eta_\mathrm{tot}(\lambda_0) = \sum_{k=1}^9 \tilde{\eta}_k(\lambda_0) + {\eta}_\mathrm{r}(\lambda_0)$, single modal contributions, $\tilde{\eta}_1(\lambda_0),\dots, \tilde{\eta}_4(\lambda_0)$, and the sum of other contributions, $\sum_{k=5}^9 \tilde{\eta}_k(\lambda_0) + \eta_\mathrm{r}(\lambda_0)$.}

267 — 2003.11555

\caption{{\bf Left:} Fraction of galaxies $\fbnd$ dynamically associated with the galaxy clusters as a function of cluster radius. The points with error bars correspond to the measurements in each individual radial bin. The blue band shows the 68\% region of the posterior from our final model detailed in section~\ref{threemodel}. The edge radius, marked by the vertical dashed line, is taken from the fit of our final model. {\bf Right:} The velocity dispersion $\sigmabnd$ of the galaxies dynamically associated with \redmapper\ clusters as a function of radius. Remarkably, the velocity dispersion appears to be constant beyond the edge radius $R/\rlambda\approx 2.2$.}

\caption{The distribution of line-of-sight velocities of galaxies around \redmapper\ clusters. The points with error bars correspond to the velocity histogram measurements, while the orange solid line is our best fit model. The remaining three lines correspond to the orbiting galaxy contribution (purple dot-dash), the infalling galaxy contribution (dark blue dotted), and the line-of-sight contribution (light blue dashed). Each panel is a slice of $R/\rt$, as illustrated by the inset panel.}

\caption{Model parameters describing the radius-dependent distribution of line-of-sight velocities of galaxies in the vicinity of SDSS \redmapper\ clusters. The reported values with errors are the posteriors from our analysis, in the units described below (where appropriate). In all cases, the subscripts ``orb'', ``inf'', and ``los'' refer to orbiting, infalling, and line-of-sight galaxies. $\fbnd$ is the fraction of galaxies dynamically associated with a galaxy cluster, and $\fvir$ is the fraction of orbiting galaxies. All parameters had flat priors, except for $a_1$ and $c_1$, the linear terms of $f_{\rm da}$ and $f_{\rm orb}$, which we required to be negative, i.e. the fractions of dynamically associated and orbiting galaxies decrease with radius at zero radius. }

268 — 2003.12056

\caption{\textbf{ImageNet-1K classification results of the architectures searched by NAS and \unnas{} algorithms}. \grayfy{Rows in gray} correspond to invalid \unnas{} configurations where the search and evaluation datasets are the same. $\dagger$ is our training result of the DARTS architecture released in \cite{liu2018darts}.}

\caption{\textbf{Cityscapes semantic segmentation results of the architectures searched by NAS and \unnas{} algorithms}. These are trained from scratch: there is no fine-tuning from ImageNet checkpoint. \grayfy{Rows in gray} correspond to an illegitimate setup where the search dataset is the same as the evaluation dataset. $\dagger$ is our training result of the DARTS architecture released in \cite{liu2018darts}. }

269 — 2003.12230

\caption{Frame-Frame tracking results. \red{$^1$} Meshes are constructed from depth images. Depth images are preprocessed by the bilateral filter to reduce observation noise. \red{$^2$} Initial alignment is done by simply setting the camera poses of both frames to identity. \red{$^3$} The alignment error (hotter means larger) measures the point to point distance between target mesh and the transformed source mesh. }

270 — 2003.12633

\caption{Description generator and discriminator. $G$ can compute the likelihood for a given change caption $w$ for a given pair of images $(I_t,I_{t'})$, as well as generate (sample; dashed arrow) a caption $\widehat{w}_G$ for that pair. $D$ can estimate the probability that a given caption $w$ is a valid description of change between $I_t$ and $I_{t'}$. During the final phase of training, the generator receives feedback, illustrated in \textcolor{red}{red}, from two sources: ground truth captions, which should have high likelihood under $G$, and the discriminator $D$, which outputs a probability that the sampled captions from $G$ are valid for the pair. }

\caption{Visual representation of the graph given by a time series. Selecting a changepoint $\tau$ (denoted by the dotted line) partitions the edges into $E_\tau$, in black, and \textcolor{Green}{$E^c_\tau$, in green}.}

\caption{Precision-recall curve for (a) CLEVR-Sequence, and (b) Street Change. Methods compared are \textcolor{Plum}{Stepwise with Language (Step)}, \textcolor{Plum}{Stepwise with Images Only (Step-IO)}, \textcolor{Emerald!70!black}{Regularized Cut with $\lambda=0$}, \textcolor{Emerald!70!black}{Regularized Cut with Images Only with $\lambda=0$ (RC-IO $\lambda=0$)}, \textcolor{blue}{Graph Cut (GC)}, \textcolor{blue}{Graph Cut with Images Only (GC-IO)}, \textcolor{red}{Regularized Cut (RC)}, and \textcolor{red}{Regularized Cut with Images Only (RC-IO)}. }

\caption{A sample sequence from ``Street Change.'' Ground Truth Annotations: ``Construction signs are gone from the street,'' ``the road sign is gone,'' ``the construction sign is gone,'' ``the construction signs are gone.'' \textcolor{red}{Generated Annotations}: \textcolor{red}{``the street sign is missing,'' ``the construction work is done,'' ``the signs were placed.''} Note the visual distortions and occlusion in the initial half of the sequence, which are present in several other sequences in the dataset.}

\caption{A sample sequence from ``Street Change.'' Ground Truth Annotations: ``The trash is gone,'' ``the garbage can is gone,'' ``the bush is no longer there,'' ``the garbage can has been removed,'' ``the yard now has grass.'' \textcolor{red}{Generated Annotations}: \textcolor{red}{``The garbage can was removed,'' ``the trash can is gone,'' ``garbage is gone.''} Shorter sequences tend to contain larger viewpoint changes between frames.}

\caption{A sample sequence from ``Street Change.'' Ground Truth Annotations: ``There is no more sign on sidewalk,'' ``the wooden barrier is gone,'' ``the barricade is gone,'' ``the wooden barricade on the sidewalk disappeared,'' ``the construction barrier is gone,'' ``there is no longer a wooden barrier on the sidewalk.'' \textcolor{red}{Generated Annotations}: \textcolor{red}{``the saw horse is gone,'' ``the construction barricade is gone,'' ``the construction barrier on the sidewalk is no longer there.''} While visually finding the changepoint is straightforward in some sequences, there are many examples like this sequence, where the visual distinction between the first and second halves of the sequence is subtle.}

271 — 2003.12751

\caption{Quantitative results on the Sony set of the SID dataset. The noise models are indicated as follows. $G$: the Gaussian model for read noise $N_{read}$; $G^*$: the Tukey lambda model for $N_{read}$; $P$: the Gaussian approximation for photon shot noise $N_p$; $P^*$: the true Poisson model for $N_p$; $R$: the Gaussian model for row noise $N_r$; $U$: the uniform distribution model for quantization noise $N_q$. The best results are indicated in \textcolor{red}{red} and the second-best results in \textcolor{blue}{blue}. }

272 — 2003.12754

\caption{The results predicted by BERT-RE and HIN-BERT. The reasoning type of each example is different and the first row for each example is the input document. The {\color{blue}{\textbf{\emph{head}}}}, {\color{cyan}{\textbf{\emph{tail}}}}, {\color{magenta}{\textbf{\emph{relation}}}} and \textcolor[rgb]{1.0, 0.44, 0.37}{\textbf{supporting sentences}} are colored accordingly.}

273 — 2003.12949

\caption{Precision and speed comparison between AutoTrack and deep trackers on UAVDT~\cite{Du2018ECCV}. * means GPU speed. \textcolor[rgb]{ 1, 0, 0}{Red}, \textcolor[rgb]{ 0, 1, 0}{green} and \textcolor[rgb]{ 0, 0, 1}{blue} respectively mean the first, second and third place.}

\caption{Average speed (fps) and precision of the top ten CPU-based trackers on four benchmarks. \textcolor[rgb]{ 1, 0, 0}{Red}, \textcolor[rgb]{ 0, 1, 0}{green} and \textcolor[rgb]{ 0, 0, 1}{blue} respectively mean the first, second and third place. All reported speeds were measured on a single CPU. Note that AutoTrack is the best real-time tracker on CPU.}

\caption{Estimation of camera position and the respective errors on six datasets. Lines in \textcolor[rgb]{ 1, 0, 0}{red}, \textcolor[rgb]{ 0, 1, 0}{green} and \textcolor[rgb]{ 0, 0, 1}{blue} denote x, y and z positions, respectively. The ground truth is not displayed because there are no noticeable differences from our results at this scale.}

274 — 2003.12971

\caption{\small Comparisons of the \emph{single-view} classification accuracy (\%) of our method against the state-of-the-art \textbf{supervised} point cloud models on \textbf{ModelNet40}. We also list results that use more points, normal information (``nor'') and/or the multi-view voting trick (``vote'') in {\color{gray} gray} as references. In addition, we show the supervised baselines of our models. }

275 — 2003.13063

\caption{Comparison of PSNR and SSIM performance with state-of-the-art FSR methods. The best and second best performance is \textbf{highlighted} in \textcolor{red}{\textbf{red}} and \textcolor{blue}{\textbf{blue}}, respectively. }

\caption{Comparison of NRMSE performance with state-of-the-art FSR methods. The best and second best performance is \textbf{highlighted} in \textcolor{red}{\textbf{red}} and \textcolor{blue}{\textbf{blue}}, respectively.}

276 — 2003.13170

\caption{Comparison of SR methods. White and gray rectangles indicate input and output frames, respectively. Small and large rectangles indicate S-LR and S-HR frames, respectively. We omit the feature extraction steps from images to features. (a) and (b) are original {\color{Red}S-SR} and {\color{aqua}T-SR} methods, respectively. For ST-SR, (c) performs {\color{aqua}T-SR} to produce in-between frames and then enlarges the frames using {\color{Red}S-SR} (e.g., DAIN~\cite{DAIN}$\rightarrow$RBPN~\cite{RBPN2019}). The other way around, (d) performs {\color{Red}S-SR} and then uses the SR frames to produce in-between frames with {\color{aqua}T-SR} (e.g., RBPN~\cite{RBPN2019}$\rightarrow$DAIN~\cite{DAIN}). Our STARnet (e) jointly optimizes all tasks ({\color{Red}S-SR}, {\color{aqua}T-SR}, and {\color{violet}ST-SR}) for augmenting space and time features mutually in multiple resolutions. The purple arrows present direct connections from LR to HR for {\color{violet}ST-SR}. In addition to upsampling, {\color{green}down-sampling} is used to transform S-HR features back to S-LR features for the mutual connection in multiple resolutions. }

\caption{Baseline comparison of STAR with DBPN~\cite{DBPN2019} and $L_f$. {\color{red}Red} in all tables indicates the best performance. }

\caption{Comparison on ST-SR ($I^{sr}_{t\Plus}$) using $L_{r}$. $\alpha \rightarrow \beta$ indicates the output of $\alpha$ is the input of $\beta$. {\color{red}Red} indicates the best and {\color{blue}blue} indicates the second best performance in all tables in Section \ref{subsection:comparison}. * indicates a joint learning of RBPN and DAIN methods to perform ST-SR. }

277 — 2003.13312

\caption{Exemplary non-linear function and the respective piecewise linear {\color{blue}upper} and {\color{red}lower} bounds.}

\caption{Vehicle model with wheelbase $l$ and {\color{cyan} disk-based collision-shape} of radius $r$. The variables in red are {\color{red}unavailable} in the MIQP model formulation. The orientation $\orientation$ is defined clockwise.}

278 — 2003.13606

\caption{Summary of our achieved performance and efficiency on Reddit. \textbf{The lower left corner indicates the desired lowest complexity in time (training time) and memory consumption (GPU memory usage).} The size of markers represents F1 scores. Blue circles (\textcolor{blue}{$\bullet$}) are state-of-the-art mini-batch training algorithms, red circle (\textcolor{red}{$\bullet$}) is L-GCN, and red star (\textcolor{red}{\ding{72}}) is L$^2$-GCN. The corresponding F1 scores are: GraphSAGE (\textcolor{blue}{$\bullet$}, 93.4), FastGCN (\textcolor{blue}{$\bullet$}, 92.6), VRGCN (\textcolor{blue}{$\bullet$}, 96.0), L-GCN (\textcolor{red}{$\bullet$}, 94.2) and L$^2$-GCN (\textcolor{red}{\ding{72}}, 94.0).}

\caption{Comparison with state-of-the-art on performance, training time, and GPU memory usage during training. The best results for each row / dataset are highlighted in \textcolor{red}{red}.}

279 — 2003.13886

\caption{Example scenarios of the TITAN Dataset: a pedestrian bounding box with tracking ID is shown in \protect\inlinegraphics{figures1/markers/theme/ped_w_id.png}, vehicle bounding box with ID is shown in \protect\inlinegraphics{figures1/markers/theme/veh_w_id.png}, future locations are displayed in \protect\inlinegraphics{figures1/markers/theme/future_location.png}. Action labels are shown in different colors following Figure~\ref{fig:titan_dataset}.}

\caption{Qualitative evaluation on the TITAN dataset: ground truth future trajectory \protect\inlinegraphics{figures1/markers/ours/gt.png}, TITAN prediction \protect\inlinegraphics{figures1/markers/ours/pred.png}, last observation bounding box \protect\inlinegraphics{figures1/markers/ours/last.png}. The color of detected action labels indicates each action set described in Figure~\ref{fig:titan_dataset}. Images are cropped for better visibility. }

\caption{Comparison with others: ground truth \protect\inlinegraphics{figures1/markers/comp/gt.png}, Titan\_EP+IP+AP (ours)\protect\inlinegraphics{figures1/markers/comp/w_action.png}, Titan\_EP+IP (w/o action)\protect\inlinegraphics{figures1/markers/comp/wo_action.png}, Social-LSTM~\cite{social_lstm} \protect\inlinegraphics{figures1/markers/comp/slstm.png}, Social-GAN~\cite{social_gan} \protect\inlinegraphics{figures1/markers/comp/sgan.png}, Const-Vel~\cite{constvel} \protect\inlinegraphics{figures1/markers/comp/const_vel.png}, bounding box at $T_{obs}$ \protect\inlinegraphics{figures1/markers/comp/last.png}. Images are cropped for better visibility. }

\caption{Comparison with others: ground truth \protect\inlinegraphics{figures2/markers/gt.png}, Titan\_EP+IP+AP (ours)\protect\inlinegraphics{figures2/markers/w_action.png}, Titan\_EP+IP (w/o action)\protect\inlinegraphics{figures2/markers/wo_action.png}, Social-LSTM~\cite{social_lstm} \protect\inlinegraphics{figures2/markers/slstm.png}, Social-GAN~\cite{social_gan} \protect\inlinegraphics{figures2/markers/sgan.png}, Const-Vel~\cite{constvel} \protect\inlinegraphics{figures2/markers/const_vel.png}, bounding box at $T_{obs}$ \protect\inlinegraphics{figures2/markers/last.png}. Images are cropped for better visibility.}

280 — 2003.14248

\caption{Schematic of the LAr scintillation detector (not to scale). The detector, including the PMTs, is immersed in LAr. Oxygen-free copper (OFC) roughly $2~\cm$ thick and lead $10~\cm$ thick surround the cryostat and act as a passive shield against ambient \grays. An \riAm source is installed at the outer surface of the PTFE bulk, and the other sources (\riCs, \riNa, \riBa, and \riCf) are placed on the outside surface of the cryostat wall.}

\caption{(Top) The \gray cross sections for argon provided by XCOM \cite{xcom}. (Bottom) Average number of interaction points for the full-absorption peaks calculated by the Geant4 MC simulation.}

281 — 2003.14398

\caption{Simulated robotic table tennis system. Our coordinate system places $(0,0,0)$ at the table center, and the axes are color-coded as $x=$ \textcolor{red}{red}, $y=$ \textcolor{green}{green}, $z=$ \textcolor{blue}{blue}.}

282 — 2004.00007

\caption{$f_S = 67 \, \rm kHz$, validation of the use of reversed contrast low frequency power Doppler to measure pulsatile blood flow. (a) RC power Doppler on the 0-4 kHz range. (b) Power Doppler on the 6-33 kHz range. (c) Variations measured in an artery and a vein. Corresponding movies in \textcolor{blue}{\href{https://youtu.be/HFphDhQ13tY}{Visualization 1}}. }

\caption{$f_S = 8 \, \rm kHz$ LDH measurement. (a) RC power Doppler image on the frequency range 0-4 kHz. (b) The local power Doppler coefficient of variation differentiates arteries and veins. (c) The RC power Doppler variations in arteries and veins exhibit the waveforms typically observed with high frequency LDH measurements. Movie in \textcolor{blue}{\href{https://youtu.be/hOyqRTL7Tfg}{Visualization 2}}. }

\caption{Making low/high flow composite images with high and low frame rates. (a), (b), and (c) show the usual composite images previously demonstrated~\cite{Puyo2019}. (d), (e), and (f) show the result with an 8 kHz frame rate: the high flow image is obtained from the RC low frequency power Doppler. The composite movies are shown in \textcolor{blue}{\href{https://youtu.be/VCUJbFDZr3Q}{Visualization 3}} and \textcolor{blue}{\href{https://youtu.be/I4kSQBgbKAQ}{Visualization 4}}. }

\caption{$f_S = 4 \, \rm kHz$ LDH measurement. (a) RC power Doppler image on the 0-2 kHz range, the corresponding movie is shown in \textcolor{blue}{\href{https://youtu.be/D2TNOsrAP-k}{Visualization 5}}. (b) The high dynamic range composite color image reveals both slow and fast blood flow in the ONH microvasculature and in large vessels. }

283 — 2004.00061

\caption{ The graphs show the contribution of each component to the robustness of the joint model.\\ RS + US ({\color{blue}Blue Straight}), RS ({\color{ForestGreen}Green Dotted}), US ({\color{red}Red Dashed}).}

284 — 2004.00230

\caption{ Performance comparisons with the holistic and occluded methods on the three reported datasets. The 1\textsuperscript{st}/2\textsuperscript{nd} best results are in \textcolor{red}{red} and \textcolor{blue}{blue}.}

285 — 2004.00312

\caption{Typical airway pressure for two breathing cycles of pressure controlled ventilation, showing the set-point (\protect \blackline) and the typical response (\protect \redline).}

286 — 2004.00527

\caption{Averages of estimates of $L(r)-r$ obtained from simulations in case of the waves intensity function with 400 simulated points on average. Left to right: DPP, Poisson, LGCP. The estimates are obtained using $\hat K_\mathrm{global}^\mathrm{iso}$ with or without the leave-out approach (\protect\includegraphics[width=1.2cm]{caption/t5w1-5b.png}, \protect\includegraphics[width=1.2cm]{caption/t3w1-5b.png}, respectively) or $\hat K_\mathrm{local}$ with or without the leave-out approach (\protect\includegraphics[width=1.2cm]{caption/t6w1-5g.png}, \protect\includegraphics[width=1.2cm]{caption/t2w1-5g.png}, respectively) for kernel estimation of $\gamma$ or the intensity function. True values of $L(r) - r$ are shown for comparison (\protect\includegraphics[width=1.2cm]{caption/t1w1b.png}).}

\caption{Averages of estimates of $L(r)-r$ obtained from simulations in case of the waves intensity function with 400 simulated points on average. Left to right: DPP, Poisson, LGCP. The estimates are obtained using the global (\protect\includegraphics[width=1.2cm]{caption/t5w1-5b.png}~CVL, \protect \includegraphics[width=1.2cm]{caption/t3w1-5b.png}~LCV) or local (\protect\includegraphics[width=1.2cm]{caption/t6w1-5g.png}~CVL, \protect \includegraphics[width=1.2cm]{caption/t2w1-5g.png}~LCV) estimators of the $K$-function with either CVL or LCV for selecting the bandwidth (in all cases the leave-out approach is used). True values of $L(r) - r$ are shown for comparison (\protect\includegraphics[width=1.2cm]{caption/t1w1b.png}).}

\caption{Averages and 95\% pointwise probability intervals for estimates of $L(r)-r$ in case of the waves intensity function with 400 simulated points on average. Left to right: DPP, Poisson, LGCP. The estimators used are the leave-out global estimator using CVL (\protect\includegraphics[width=1.2cm]{caption/t5w1-5b.png}) and the leave-out local estimator using LCV (\protect\includegraphics[width=1.2cm]{caption/t6w1-5g.png}), with pointwise probability intervals shown in like shade. True values of $L(r) - r$ are also shown (\protect\includegraphics[width=1.2cm]{caption/t1w1b.png}).}

\caption{Averages of estimates of $g_1(r)$ obtained from simulations in case of the waves intensity function with 400 simulated points on average. Left to right: DPP, Poisson, LGCP. The estimates are obtained using the global (\protect\includegraphics[width=1.2cm]{caption/t5w1-5b.png}~CVL, \protect \includegraphics[width=1.2cm]{caption/t3w1-5b.png}~LCV) or local (\protect\includegraphics[width=1.2cm]{caption/t6w1-5g.png}~CVL, \protect \includegraphics[width=1.2cm]{caption/t2w1-5g.png}~LCV) estimators of the pair correlation function with either CVL or LCV bandwidth selection. (In each case, the leave-out approach is used.) True values of $g(r)$ are shown for comparison (\protect\includegraphics[width=1.2cm]{caption/t1w1b.png}).}

\caption{Averages of estimates of cross-$L(r)-r$ in case of the waves intensity function with 400 simulated points on average. Left to right: segregation, independence, co-clustering. The estimators used are the standard global (\protect\includegraphics[width=1.2cm]{caption/t5w1-5b.png}~CVL, \protect \includegraphics[width=1.2cm]{caption/t3w1-5b.png}~LCV) and local (\protect\includegraphics[width=1.2cm]{caption/t6w1-5g.png}~CVL, \protect \includegraphics[width=1.2cm]{caption/t2w1-5g.png}~LCV) leave-out estimators of $K_{12}$ combined with the CVL and LCV methods for the bandwidth selection. True values of $L_{12}(r)-r$ are shown for comparison (\protect\includegraphics[width=1.2cm]{caption/t1w1b.png}).}

\caption{Averages and 95\% pointwise probability intervals for estimates of $L_{12}(r)-r$ in case of the waves intensity function with 400 simulated points on average. Left to right: segregation, independence, co-clustering. The estimators used are the leave-out global estimator (\protect\includegraphics[width=1.2cm]{caption/t5w1-5b.png}) and the leave-out local estimator (\protect\includegraphics[width=1.2cm]{caption/t6w1-5g.png}), with pointwise probability intervals shown in like shade. In each case, the bandwidth selection method was chosen to produce the least bias: LCV for the local estimator on the independent process, and CVL for all the other cases. True values of $L_{12}(r) - r$ are also shown (\protect\includegraphics[width=1.2cm]{caption/t1w1b.png}). }

\caption{Averages of estimates of $c(r)$ in case of the waves intensity function with 400 simulated points on average. Left to right: segregation, independence, co-clustering. The estimators used are the leave-out global (\protect\includegraphics[width=1.2cm]{caption/t5w1-5b.png}~CVL, \protect \includegraphics[width=1.2cm]{caption/t3w1-5b.png}~LCV) and local (\protect\includegraphics[width=1.2cm]{caption/t6w1-5g.png}~CVL, \protect \includegraphics[width=1.2cm]{caption/t2w1-5g.png}~LCV) estimators combined with the CVL and LCV methods for bandwidth selection. True values of $L_{12}(r)-r$ are shown for comparison (\protect\includegraphics[width=1.2cm]{caption/t1w1b.png}).}

\caption{Averages and 95\% pointwise probability intervals for estimates of $L(r)-r$ in case of the `waves' (top row) or `deep waves' (bottom row) intensity function with 400 simulated points on average. Left to right: DPP, Poisson, LGCP. The estimators used are the global (\protect\includegraphics[width=1.2cm]{caption/t5w1-5b.png}) and local (\protect\includegraphics[width=1.2cm]{caption/t6w1-5g.png}) estimators using the parametric intensity estimator \eqref{e:par-rho}. Pointwise probability intervals are shown in like shade. True values of $L(r) - r$ are also shown (\protect\includegraphics[width=1.2cm]{caption/t1w1b.png}).}

287 — 2004.00786

\caption{Change map detected with respect to missed alarms (\textcolor{blue}{MA}), false alarms (\textcolor{red}{FA}) and correct changed pixels (\textcolor{green}{C}).}

288 — 2004.00830

\caption{Comparison with SOTA trackers on OTB dataset. Trackers are grouped into CF-based methods, siamese-network-based methods, meta-learning-based methods, and miscellaneous. Numbers in \textcolor{red}{red} and \textcolor{blue}{\underline{blue}} are the best and the second best results, respectively.}

\caption{Comparison with SOTA trackers on VOT-2019. Numbers in \textbf{\textcolor{red}{red}} and \textcolor{blue}{\underline{blue}} are the best and the second best results, respectively.}

289 — 2004.00900

\caption{Comparison between networks with knowledge distillation (shadow-filled bars) and without it (solid-filled bars). Results are reported for old classes (\textcolor{yellow}{yellow} bars), new classes (\textcolor{blue}{blue} bars) and old\&new classes (\textcolor{green}{green} bars).}

\caption{Performance comparison for the models trained with and without our meta weight generator. For every incremental phase, the instance segmentation performance was evaluated on the whole new\&old classes, and we only report the results on the new classes to highlight the few-shot learning performances.}

290 — 2004.00994

\caption{Demonstration of the proposed approach on the MNIST handwritten digit dataset. Unmasked features appear in black. The order in which the features were selected is indicated in {\color{blue} blue}.}

291 — 2004.01051

\caption{Drag ($f/f_0$) corresponding to the friction factor ($f$) normalized by that of the no-slip surface ($f_0$) as a function of slip length $L_s$ normalized by the channel half-height, $h$: \SmallSquare, present study; \FilledSmallTriangleUp, \citep{Choi2006b}; \FilledSmallSquare, \citep{Maynes2007b}; \FilledSmallDiamondshape, \citep{Jung2010b}; \FilledSmallCircle, \citep{Park2013d}; \solidrule, theory for superhydrophobic surfaces on both walls \citep{Choi2006b}.}

292 — 2004.01130

\caption{\small \textbf{Additional qualitative results on Cityscapes}. The first column shows the input image and the second the semantic segmentation ground truth. The third shows the segmentation produced by ZS3Net and the last the BudaNet result. Private target classes: \setlength{\fboxsep}{1pt}\colorbox{col_terrain}{\textcolor{white}{terrain}}, \setlength{\fboxsep}{1pt}\colorbox{col_truck}{\textcolor{white}{truck}}, \setlength{\fboxsep}{1pt}\colorbox{col_train}{\textcolor{white}{train}}. Some shared classes: \setlength{\fboxsep}{1pt}\colorbox{col_road}{\textcolor{white}{road}}, \setlength{\fboxsep}{1pt}\colorbox{col_sidewalk}{\textcolor{white}{side walk}}, \setlength{\fboxsep}{1pt}\colorbox{col_car}{\textcolor{white}{car}}, \setlength{\fboxsep}{1pt}\colorbox{col_person}{\textcolor{white}{person}}, \setlength{\fboxsep}{1pt}\colorbox{col_moto}{\textcolor{white}{motorbike}}, \setlength{\fboxsep}{1pt}\colorbox{col_tree}{\textcolor{white}{tree}}, \setlength{\fboxsep}{1pt}\colorbox{col_building}{\textcolor{white}{building}}.}

\caption{\small \textbf{Additional qualitative results on IDD}. Similar arrangement as in Figure~\ref{fig:qual_supmat_cityscapes}. Private target classes: \setlength{\fboxsep}{1pt}\colorbox{col_tuk}{\textcolor{white}{tuk-tuk}}, \setlength{\fboxsep}{1pt}\colorbox{col_animal}{\textcolor{white}{animal}}. Some shared classes: \setlength{\fboxsep}{1pt}\colorbox{col_truck}{\textcolor{white}{truck}}, \setlength{\fboxsep}{1pt}\colorbox{col_road}{\textcolor{white}{road}}, \setlength{\fboxsep}{1pt}\colorbox{col_sidewalk}{\textcolor{white}{side walk}}, \setlength{\fboxsep}{1pt}\colorbox{col_car}{\textcolor{white}{car}}, \setlength{\fboxsep}{1pt}\colorbox{col_person}{\textcolor{white}{person}}, \setlength{\fboxsep}{1pt}\colorbox{col_moto}{\textcolor{white}{motorbike}}, \setlength{\fboxsep}{1pt}\colorbox{col_tree}{\textcolor{white}{tree}}, \setlength{\fboxsep}{1pt}\colorbox{col_building}{\textcolor{white}{building}}.}

\caption{\small \textbf{Additional qualitative results on MS-COCO}. Similar arrangement as in Figure~\ref{fig:qual_supmat_cityscapes}. Private target classes: \setlength{\fboxsep}{1pt}\colorbox{col_bench_coco}{\textcolor{white}{bench}}, \setlength{\fboxsep}{1pt}\colorbox{col_giraffe_coco}{\textcolor{white}{giraffe}}, \setlength{\fboxsep}{1pt}\colorbox{col_zebra_coco}{\textcolor{white}{zebra}}, \setlength{\fboxsep}{1pt}\colorbox{col_truck_coco}{\textcolor{white}{truck}}. Shared classes: \setlength{\fboxsep}{1pt}\colorbox{col_person_coco}{\textcolor{white}{person}}, \setlength{\fboxsep}{1pt}\colorbox{col_cow_coco}{\textcolor{white}{cow}}, \setlength{\fboxsep}{1pt}\colorbox{col_sheep_coco}{\textcolor{white}{sheep}}, \setlength{\fboxsep}{1pt}\colorbox{col_background_coco}{\textcolor{white}{background}}.}

\caption{\small \textbf{Illustration of BudaNet, the proposed approach to the new problem of Boundless Unsupervised Domain Adaptation}. At test time, for an input image (left), we display the closed-set UDA segmentation result (middle) as well as the BudaNet segmentation result (right). Both ``\setlength{\fboxsep}{1pt}\colorbox{col_tuk}{\textcolor{white}{tuk-tuk}}'' vehicles, which belong to an unseen class appearing only in the target domain, have been correctly identified by our model. Our approach is able to deal with new classes for which no images have been provided during training. The model trained following the closed-set UDA setting wrongly predicts these new vehicles as a mix of \setlength{\fboxsep}{1pt}\colorbox{col_car}{\textcolor{white}{car}} and \setlength{\fboxsep}{1pt}\colorbox{col_truck}{\textcolor{white}{truck}}.}

\caption{\small \textbf{Qualitative results of the three set-ups}. The first and second columns show input images and corresponding segmentation ground truth. The third and fourth columns visualize results produced by ZS3Net and BudaNet. From top to bottom: \textbf{SYNTHIA$\rightarrow$Cityscapes} private classes \setlength{\fboxsep}{1pt}\colorbox{col_terrain}{\textcolor{white}{terrain}}, \setlength{\fboxsep}{1pt}\colorbox{col_truck}{\textcolor{white}{truck}}, \setlength{\fboxsep}{1pt}\colorbox{col_train}{\textcolor{white}{train}}; \textbf{Cityscapes$\rightarrow$IDD} private class \setlength{\fboxsep}{1pt}\colorbox{col_tuk}{\textcolor{white}{tuk-tuk}}; and \textbf{Pascal-VOC$\rightarrow$MS-COCO} private class \setlength{\fboxsep}{1pt}\colorbox{col_laptop_coco}{\textcolor{white}{laptop}}. More results are given in Appendix~\ref{app:qual_res}.}

293 — 2004.01168

\caption{Examples of ground-truth $(h, r, t) \in G$ triples and their corresponding $(h, \hat{r}, t) \not\in G$ predictions in the \fb{} calibrated sample from \emph{True or False Facts}. Each group of triples is explained in \S~\ref{sec:tf-discussion}. \textcolor{blue}{Blue ($\star$)}: A factual relation. \textcolor{red}{Red ($\dagger$)}: An erroneous relation. Best viewed in color.}

294 — 2004.01340

\caption{Thumbnail images ($4\farcs5\times4\farcs5$) of UDGs and LSB dwarfs denoted in {\color{blue}\bf{\bf Figure \ref{fig_finding}}}. The first column shows RGB color images of each sample (Blue : $F435W+F606W$ images, Green : $F814W$ images, and Red : $F105W$ images). The second column shows $F814W$ images, which were used as input images for GALFIT. The last two columns show galaxy model images and their residual images from GALFIT measurements. Derived effective radii and surface brightness are marked. \label{fig_thumb}}

\caption{Color-magnitude diagrams (CMDs) of galaxies in the Abell 370 central field (a), the parallel field (b), and the XDF (c). Symbols are the same as {\color{blue}\bf{\bf Figure \ref{fig_selection}}}. Gray error bars on the right side of each panel indicate the mean errors of colors and magnitudes for given magnitudes. Yellow shaded regions denote the red sequence of galaxies in the cluster central field. White open symbols mark galaxies excluded from our final samples, because they are redder than the red sequence. Cyan star (UDG-C02) denotes the bluest UDG ($F814W-F105W=-0.07$), and yellow star (UDG-C22) is the largest UDG ($R_{\rm eff,c}=6.16$ \kpc) in our UDG sample. \label{fig_cmd}}

\caption{CMDs of galaxies in three HFF clusters: Abell 370 ($z=0.375$), Abell S1063 ($z=0.348$), and Abell 2744 ($z=0.308$). Symbols are the same as {\color{blue}\bf{\bf Figure \ref{fig_selection}}}. Data for Abell S1063 and Abell 2744 are from \citet{Lee17}. Black solid lines denote linear fits to the red sequences derived from median colors and magnitudes of galaxies brighter than $F814W=23.5$ mag. \label{fig_cmd3}}

\caption{Histograms of color differences from the red sequences of the three HFF clusters (see {\color{blue}\bf{\bf Figure \ref{fig_cmd3}}}). Gray dashed lines denote the red sequence. The upper panels show the color distributions of bright galaxies (blue histograms), and the lower panels show those of UDGs (red histograms) and LSB dwarfs (yellow histograms). We select each galaxy population with the same absolute magnitude criteria: bright galaxies with $M_{F814W}<-18.0$ mag and LSB galaxies (UDGs and LSB dwarfs) with $-18.0<M_{F814W}<-14.0$ mag. \label{fig_chist}}

\caption{(a) Spatial distribution of galaxies in the central field of Abell 370. Symbols of galaxies are the same as in {\color{blue}\bf{\bf Figure \ref{fig_selection}}}, but here we plot only bright red sequence galaxies. Black crossmark denotes the center of Abell 370, and two large blue circles denote BCG-N and BCG-S. Gray circle with radius of $0\farcm5$ ($\sim$150 kpc) represents the boundary we used to divide the cluster central field. (b) Radial number density profiles (RDPs) of galaxies in the Abell 370 cluster. Blue circles, yellow stars, and red stars are the RDPs of the bright galaxies, LSB dwarfs, and UDGs. Yellow and cyan shaded regions represent the cluster central field and the parallel field. \label{fig_radial}}

295 — 2004.01382

\caption{Performance comparison of the L3 representations of ResNet-based models [\textcolor{green}{first}, \textcolor{blue}{second}, and \textcolor{red}{third} FENs are shown in color].}

296 — 2004.01573

\caption{An overview of the different components of our proposed DFNet. (a) Multi-scale Attention Guided Module. This module performs convolutions with kernels of multiple sizes. Then, after concatenation, we use the Channel Attention Block to weight the multi-scale features. (b) Attention-based Multi-level Integrator Module. This module first concatenates high stage features with low stage features. Then the Channel Attention Block is used to assign different weights to multi-level features. Finally, a $3\times3$ convolutional layer is used to refine the features. (c) Channel Attention Block. This block computes a weight vector to re-weight the input feature maps. Note that in all figures, the `\#' symbol denotes the number of layer filters.}

\caption{The avgF, wF, maxF, and MAE scores of different saliency detection methods on five datasets. The best score under each setting is shown in {\color[HTML]{FE0000}{\textbf{red}}}, the second best score under each setting is shown in {\color[HTML]{3166FF}{\textbf{blue}}}, and the best score under all settings is underlined. DFNet with VGG-16, ResNet50, NASNet-Mobile, and NASNet-Large backbones are denoted as DFNet-V, DFNet-R, DFNet-M, and DFNet-L, respectively. The total number of parameters (denoted as \#Par) is given in millions. Note that the authors of~\cite{zhang2018progressive} did not release their code and provided only the saliency maps, so the total number of parameters cannot be reported for this method.}

\caption{Ablation analysis. The performance of different settings of our model (The best score is shown in {\color[HTML]{FE0000} \textbf{red}}).}

297 — 2004.01643

\caption{Results of our extensive augmentation study on the KITTI~\cite{KITTI} validation set. Most~significant~improvements~in~\textbf{bold}. Augmentation policy of PointPillars~\cite{PointPillars} in \color{magenta}magenta (\#36)\color{black}. Our improved augmentation policies in \color{cyan}cyan (\#39-42)\color{black}.}

298 — 2004.01703

\caption{Example Evolved Mario Levels. \normalfont Each level is shown with \textcolor{blue}{blue X} marks at positions checked by A* search, and a trail of \textcolor{red}{red X} marks along the solution path. Both (\subref{fig:marioLevelDirectToGAN}) Direct2GAN and (\subref{fig:marioLevelCPPNtoGAN}) CPPN2GAN levels are shown. These levels are in the same bin, but from different runs. The CPPN2GAN level begins with one repeating pattern of segments, but then switches to another in the latter half.}

\caption{Example Evolved Dungeons. \normalfont The (\subref{fig:dungeonDirectToGAN}) Direct2GAN and (\subref{fig:dungeonCPPNtoGAN}) CPPN2GAN dungeons with 50 reachable rooms each have the same fitness (38\% of reachable rooms traversed in solution path). There is also a (\subref{fig:dungeonCPPNtoGAN100}) CPPN2GAN dungeon with 100 reachable rooms and a comparable fitness of 34\% reachable rooms. Direct2GAN could not produce dungeons with so many reachable rooms. Larger versions of these figures are included in supplementary material.}

\caption{Direct2GAN Dungeon: 50 Reachable Rooms (large version of Fig.~\ref{fig:dungeonDirectToGAN}). \normalfont The @ symbol is the start point, and the triangle is the goal. The red X marks from the start to the goal mark the solution path, and the white X marks correspond to locations checked by A* search in order to find the solution. The Direct2GAN dungeon is more sprawling, and has several rooms that are not reachable (\textcolor{magenta}{magenta X}). CPPN2GAN dungeons are more cohesive, and themes can be noticed in different regions of the dungeons.}

299 — 2004.01712

\caption{Notion of File Recovery using Linux \texttt{mlock()}. \textbf{(a)} Suppose 4 files are opened within a specific time quantum. \textbf{(b)} Back up these files with the Linux \texttt{mlock()} system call (marked in \textcolor{OliveGreen}{green}). \textbf{(c)} Suppose ransomware encrypts 3 files before being detected by RAPPER (marked in \textcolor{red}{red}). \textbf{(d)} We can easily retrieve the encrypted files from the backup.\label{fig:file_recover}\vspace{-0.2cm}}

300 — 2004.01786

\caption{Molecular gas properties: (1) galaxy name; (2) spectroscopic redshift as in Table~\ref{tab:BCG_properties}; (3-4) CO(J$\rightarrow$J-1) transition and observer frame frequency; (5) CO(J$\rightarrow$J-1) velocity integrated flux; (6) molecular gas mass obtained with $\alpha_{\rm CO}=4.36~M_\odot\,({\rm K~km~s}^{-1}~{\rm pc}^2)^{-1}$; (7) depletion timescale $\tau_{\rm dep}=M_{H_2}/{\rm SFR}$; (8) molecular gas-to-stellar mass ratio; (9-10) depletion timescale and molecular gas-to-stellar mass ratio predicted for MS field galaxies with redshift and stellar mass of our targets, following \citet{Tacconi2018}. Upper limits are at 3$\sigma$. We refer to the text for further details. \\ \crosssymbol~The reported $M_{H_2}$ of M2129 is estimated from the CO(2$\rightarrow$1) flux and has been increased by a factor of two to take into account the possibility that the fit misses a substantial part of the CO(2$\rightarrow$1) emission.}

\caption{Summary of our IRAM 30m results for the sources with secure or tentative CO detections. Column description: (1) BCG name; (2) CO transition; (3) integrated CO line flux; (4) signal-to-noise ratio of the CO(J$\rightarrow$J-1) detection; (5) full width at half maximum of the CO(J$\rightarrow$J-1) line; (6) CO(J$\rightarrow$J-1) velocity integrated luminosity; (7) redshift derived from the CO(J$\rightarrow$J-1) line.\\ \crosssymbol~The reported $L^\prime_{\rm CO(2\rightarrow 1)}$ of M2129 is estimated from the CO(2$\rightarrow$1) flux and has been increased by a factor of two to take into account the possibility that the fit misses a substantial part of the CO(2$\rightarrow$1) emission. }

301 — 2004.01808

\caption{Overview of the proposed model, TimeGate, with two stages. The first stage is the timestep selector, left. Based on a lightweight CNN, \textit{LightNet}, the model learns to select the most relevant timesteps for classifying the video. This selection is conditioned on both the features of the timestep and its context. The second stage is the video classifier, right, in which only the selected timesteps (\textcolor{ForestGreen}{\CheckBoxCustom}) are considered, while the unselected timesteps (\textcolor{BrickRed}{\CrossedBox}) are completely ignored. In this stage, a heavyweight CNN, \textit{HeavyNet}, is used for feature representation of only the selected timesteps, followed by an MLP for classification.}

302 — 2004.01915

\caption{\label{fig:bl_} Black soliton. Normalized axial density profile $\rho(z)=h^2(z) =(f(z)/f_{\infty})^2$ and transverse width profile $\sigma(z)$ {\cblue vs axial coordinate $z$, for three values of the parameter $\delta=\gamma |f_{\infty}|^{4/5}$, with $\gamma$ given by equation (\ref{gammaio}) and $f_{\infty}$ the bulk value of the soliton wavefunction.} }

\caption{\label{fig:gr_} Gray soliton. {\cblue Upper panel: Scaled axial density profile $\rho(\tilde{\zeta}) = h(\tilde{\zeta})^2=(f(z)/f_{\infty})^2$ vs scaled comoving axial coordinate $\tilde{\zeta}=(z-vt)\sqrt{\delta}$. Lower panel: Scaled transverse width $\tilde{\sigma}(\tilde{\zeta})$ vs $\tilde{\zeta}$. Notice that $\delta=\gamma |f_{\infty}|^{4/5}$, with $\gamma$ given by equation (\ref{gammaio}) and $f_{\infty}$ the bulk value of the soliton wavefunction. $\tilde{v}=v/v_s$ is the velocity rescaled by the sound velocity $v_s$.} }

\caption{\label{fig:th_til_}{\cblue Scaled} phase $\tilde{\theta}(\tilde{\zeta})$ of gray ($\tilde{v}>0$) and black ($\tilde{v}=0$) solitons {\cblue vs the scaled comoving axial coordinate $\tilde{\zeta}$.}}

303 — 2004.02009

\caption{The ablation analysis of our proposed network. Models are trained on the first fold and the Axial view. The best results are shown in {\color[HTML]{FE0000} \textbf{red}}.}

\caption{The 5-fold cross-validation ensemble results on Axial and Coronal views along with the results of the Multi-view Fusion technique. The best results are shown in {\color[HTML]{FE0000} \textbf{red}}.}

\caption{Comparison of our method and the methods of BRATS 2018 on the validation set. {\color[HTML]{FE0000} \textbf{Red}} and {\color[HTML]{3166FF} \textbf{blue}} indicate the best and second-best results, respectively.}

\caption{Comparison of our method and the methods of BRATS 2017 on the validation set. {\color[HTML]{FE0000} \textbf{Red}} and {\color[HTML]{3166FF} \textbf{blue}} indicate the best and second-best results, respectively.}

304 — 2004.02046

\caption{A high-level overview of our methodology. {\bf (a)} Given as input: nodes with associated attribute and label data. For example, attributes are some arbitrary distribution, and labels are a categorical value (Purple, Orange). {\bf (b)} On the input data, we define networks according to a collection of network models and their hyper-parameters. {\bf (c)} We generate node subsets based on various node weighting functions. Illustrated is the node subset corresponding to network adjacency ({\color{moss}green}) and nodes within the same community as node $i$ ({\color{midnight}blue}). {\bf (d)} We train predictors on the node subsets generated by our node weighting function, for all nodes. We evaluate this set of predictors and select the network and node weighting function which maximizes our selection criteria. In this particular example, our framework selects the Threshold network, using adjacency weighting.}

305 — 2004.02047

\caption{A schematic of the network prediction \textbf{interval}. We observe the local neighborhood of node $i$ at time $t$. Attribute distributions are represented by the call-out. Node $i$ is removed from the graph at time $t+1$. We then observe only the changing attribute distributions of the remaining nodes. At time $t + \Delta$ we receive the graph structure observed at time $t$, and construct a model $\mathcal{M}$ on the \textit{current} attributes in the \textcolor{macorchid}{\textbf{neighborhood}} of $i$ (excluding $i$). The dashed edges indicate this may not be the \textit{current} graph structure. Node $i$ is the \textcolor{macgrey}{\textbf{target}} for label inference.}

\caption{An overview of interval alignment. (Top) A toy neighborhood network of $i$ at each time step. (Middle) An \textit{interval} is measured on the graph starting at time $t$ (\textcolor{red}{red}), and measures the predictability of node $i$ using this fixed network at each available time step $>t$ (\textcolor{macpurple}{purple}). (Bottom) Finally, we align the intervals at their starting point and measure expectations on a particular time shift $\Delta$ of this alignment (e.g. across red values), where black indicates null fill-values.}

306 — 2004.02143

\caption{An example of a single-hop question (SHQ) from the \texttt{SQuAD} dataset and a multi-hop question (MHQ) from the \texttt{HotPotQA} dataset. The relevant sentences and the answer required to form the question are highlighted in \textcolor{blue}{blue} and \textcolor{red}{red}, respectively.}

307 — 2004.02147

\caption{\textbf{Overview of the Bilateral Segmentation Network.} There are mainly three components: two-pathway backbone in the {\color{Purple}{purple}} dashed box, the aggregation layer in the {\color{orange}{orange}} dashed box, and the booster part in the {\color{Gold}{yellow}} dashed box. The two-pathway backbone has a Detail Branch (the {\color{Cerulean}{blue}} cubes) and a Semantic Branch (the {\color{LightGreen}{green}} cubes). The three stages in Detail Branch have $C_1, C_2, C_3$ channels respectively. The channels of corresponding stages in Semantic Branch can be made lightweight by the factor $\lambda(\lambda<1)$. The last stage of the Semantic Branch is the output of the Context Embedding Block. Meanwhile, numbers in the cubes are the feature map size ratios to the resolution of the input. In the Aggregation Layer part, we adopt the bilateral aggregation layer. $Down$ indicates the downsampling operation, $Up$ represents the upsampling operation, $\varphi$ is the Sigmoid function, and $\bigotimes$ means element-wise product. Besides, in the booster part, we design some auxiliary segmentation heads to improve the segmentation performance without any extra inference cost. }

308 — 2004.02182

\caption{Comparing the performance of resampling techniques with various evaluation metrics. Note that, in the table, imbalanced refers to the standard GAN methods without using any resampling technique. The best results are \textcolor{blue}{highlighted}.}

309 — 2004.02428

\caption{Quantitative evaluation. The mean F-measure, S-measure and MAE of different saliency detection methods on 2RSOD and three benchmark datasets. The best four results are highlighted in {\color{red}{red}}, {\color{blue}{blue}}, {\color{green}{green}} and {\color{violet}{purple}}. $ \dag $ \&$ \ddag $ denote methods based on SR and deep learning. ``PAS-S'' \& ``DUT-O'' represent datasets PASCAL-S and DUT-OMRON.}

310 — 2004.02493

\caption[]{Remaining height error in the Munich test set for stereo DSM (b) and prediction results (c). Comparing the error maps to a building mask \tikz{\fill[yellow] (0,0) rectangle (1.5ex,1.5ex);} and stereo DSM (a) reveals the distinct influence of newly built and demolished buildings, especially in the northwestern and eastern part of the study area.}

\caption[]{Location of the considered study areas \tikz{\fill[orange, opacity=0.5] (0,0) rectangle (1.5ex,1.5ex);} within the densely built-up \tikz{\fill[black, opacity=.3] (0,0) rectangle (1.5ex,1.5ex);} cities of Berlin (a) and Munich (b) \tikz{\draw[thick] (0,0) rectangle (1.5ex,1.5ex);} that are located in the northern (Berlin) and southern (Munich) part of Germany (c).}

311 — 2004.02647

\caption{The concept of depletion is sketched by the example of two hard-core spherical colloids (left), three hard-core spherical colloids (centre) and two hard-core pear-shaped colloids (right) dissolved in a liquid of smaller hard spheres (indicated in light blue). The system is driven mainly by the entropy of the solvent particles and minimises the free energy by minimising the excluded volume of the bigger colloidal particles. The excluded volume (\orangeline[dashed]) cannot be penetrated by the depletant due to the presence of the colloid. Thus, the larger objects pack together such that their excluded volumes maximally overlap (indicated in orange) and more space is provided for the depletants. Overall this mechanism can be interpreted as an effective, entropically driven attraction between the colloids.}

\caption{The relative orientation (a \& b) and lateral distance distribution (c \& d) of two HPR/PHGO particles surrounded by 1498 hard spheres, acting as a solvent at global density $\rho_g {=} 0.45$, on the left. The particle parameters are set to $k{=}3$, $\theta_k{=}15^{\circ}$ and $r_\text{depl}{=}0.31\sigma_w$ ($\frac{V_\text{depl}}{V_\text{pear}} {=} 0.08$). Pair-configurations are considered only if the pear-shaped particles are close enough to each other that their excluded volumes overlap. Positive angles $\alpha$ indicate V-configurations (blunt ends together), whereas negative $\alpha$ values describe A-configurations (pointy ends together). On the bottom, typical arrangements of the HPR (I+II) and PHGO (III+IV) depletion systems, extracted from both types of simulations, are shown. The left snapshot (dashed line, (I)) corresponds to the indicated peak in (a) and coincides with the parallel solution for maximal excluded volume overlap. The centre left configuration (dash-dotted line, (II)) contributes to the second peak of (a) and matches the anti-parallel solution in terms of minimised excluded volume. The centre right snapshot (dotted line, (III)) shows a V-configuration, which corresponds to the indicated peak in (b). This configuration does not coincide with the parallel solution for maximal excluded volume overlap of B\'ezier pears. The right configuration (dash-dotted line, (IV)) contributes to the second peak in (b) and matches the anti-parallel solution in terms of minimised excluded volume.}

312 — 2004.02669

\caption{Top: The contact profiles according to the PHGO model (\orangeline[dashed]) and the HPR model (\blackline[dotted]) for identical pear-shaped particles with $k=3$ and $\theta_k=15^\circ$ at different angles between the molecules $\phi=\arccos(\mathbf{u}_i{\cdot}\mathbf{u}_j)$ in the xz-plane. The surrounding pears are positioned in contact according to the PHGO model. The arrows showcase the different contact between blunt (red) and pointy (blue) ends depending on $\phi$. Bottom: The maximal overlap volume $V_\text{overlap}$ between two PHGO particles with different tapering parameters $k_\theta$ when in contact. The volume is given in comparison to the volume of the B\'ezier pear $V_\text{pear}$.}

313 — 2004.02677

\caption{Results on the BMAX500 using standard evaluation (black) and our proposed {\color{blue}single-annotation protocol (blue)}. Gains for the ligature-weighted version of BMAX500 are denoted by *. Timings are averages over the BMAX500 test set. Cost function computation times are excluded from the runtime measurements to compare the two algorithms head to head.}

314 — 2004.02990

\caption{Our diversity metric evaluation framework checks the capability of metrics to capture different aspects of diversity. Presented are two sets of responses to the same question, generated by crowdsourcing workers. While both sets are diverse in terms of the \emph{form} of the sentences, only set A is diverse in terms of \emph{content}. Each graph presents the distribution over a diversity metric for sets with high content diversity (\textcolor{blue}{blue}) and low content diversity (\textcolor{orange}{orange}). Distributions are approximated over $200$ sets such as the two presented. We observe that the human score metric (absHDS) separates the two distributions, while an n-gram based metric (distinct-n) fails, illustrating that n-gram metrics do not capture content diversity. The dotted lines correspond to the specific sets A and B presented above.}

315 — 2004.03003

\caption{Mean velocity profiles for: reference LES ({\color{CornflowerBlue}{\textemdash}})~\citep{Breuer2009}; PIV ($\circ$)~\citep{Rapp2009}, our LES ({\color{red}- - - -}) and RANS with k-$\omega$ SST turbulence modelling ({\color{YellowOrange}-$\cdot$-$\cdot$-}).}

\caption{Mean velocity profiles for PH cases: reference LES ({\color{CornflowerBlue}{\textemdash}})(INCA), Frozen RANS ({\color{red}- - - -}), SpaRTA RANS ({\color{green}-$\cdot\cdot$-$\cdot\cdot$-}) and RANS with k-$\omega$ SST turbulence modelling ({\color{YellowOrange}-$\cdot$-$\cdot$-}).}

\caption{Three iterations of the proposed method: reference LES ({\color{CornflowerBlue}- - - -}), HK ({\color{black}{\textemdash}}), EI ({\color{YellowOrange}-$\cdot$-$\cdot$-}), high-fidelity LES samples ({\color{red}{$\bullet$}}), and low-fidelity custom RANS ({\color{red}{$\circ$}}).}

\caption{Convergence of bi-fidelity optimization (custom RANS as low-fidelity model ({\color{red}{\textemdash}}), baseline RANS as low-fidelity model ({\color{blue}- - - -})), and single-fidelity optimization using LES only ({\color{black}-$\cdot$-$\cdot$-}).}

316 — 2004.03066

\caption{Illustration of key properties of classical entailments, implicatures, and presuppositions. Solid arrows indicate valid commonsense entailments, and arrows with X's indicate lack of entailment. Dashed arrows indicate follow-up statements with the addition of \emph{in fact}, which can either be acceptable (marked with `{\color{green}\cmark}') or unacceptable (marked with `{\color{red}\xmark}').}

317 — 2004.03080

\caption{\small \textbf{3D object detection results on the KITTI validation set.} We report \APBEV ~/ \AP (in \%) of the \textbf{car} category, corresponding to average precision of the bird's-eye view and 3D object detection. We arrange methods according to the input signals: S for stereo images, L for 64-beam LiDAR, M for monocular images. PL stands for \PL. \emph{Results of our end-to-end \PL are in {\color{blue} blue}.} Methods with 64-beam LiDAR are in {\color{gray} gray}. Best viewed in color.}

\caption{\small \textbf{3D object (car) detection results on the KITTI test set.} We compare \ETE ({\color{blue}blue}) with existing results retrieved from the KITTI leaderboard, and report \APBEV~/ \AP at IoU=0.7.}

\caption{\small \textbf{Ablation studies on the point-cloud-based pipeline with \PRCNN.} We report \APBEV ~/ \AP (in \%) of the \textbf{car} category, corresponding to average precision of the bird's-eye view and 3D detection. We divide our pipeline with \PRCNN into three sub networks: Depth, RPN and RCNN. $\surd$ means that we set the sub network trainable and use its corresponding loss in joint training. We note that the gradients of the later sub network would also back-propagate to the previous sub network. For example, if we choose Depth and RPN, the gradients of RPN would also be back-propagated to the Depth network. The best result per column is in {\color{blue} blue}. Best viewed in color.}

\caption{\small \textbf{Ablation studies on the quantization-based pipeline with \vPIXOR.} We report \APBEV at IoU $=0.5$ / $0.7$ (in \%) of the \textbf{car} category. We divide our pipeline into two sub networks: Depth and Detector. $\surd$ means we set the sub network trainable and use its corresponding loss in joint training. The best result per column is in {\color{blue} blue}. Best viewed in color.}

\caption{\small\textbf{Qualitative results from the bird's-eye view.} The {\color{red}red} bounding boxes are the ground truth and the {\color{green}green} bounding boxes are the detection results. PL++ (image-only) misses many far-away cars and has poor bounding box localization. By applying end-to-end training, we get much more accurate predictions (first and second columns) and reduce the false positive predictions (the third column).}

\caption{\textbf{3D object detection via the point-cloud-based pipeline with \PRCNN on Argoverse dataset.} We report \APBEV ~/ \AP (in \%) of the \textbf{car} category, using \PRCNN for detection. We arrange methods according to the input signals: S for stereo images, L for 64-beam LiDAR. PL stands for \PL. \emph{Results of our end-to-end \PL are in {\color{blue} blue}.} Methods with 64-beam LiDAR are in {\color{gray} gray}. Best viewed in color. \label{tb::argo}}

\caption{\small\textbf{End-to-end image-based 3D object detection:} We introduce a \emph{change of representation (CoR)} layer to connect the output of the depth estimation network as the input to the 3D object detection network. The result is an end-to-end pipeline that yields object bounding boxes directly from stereo images and allows back-propagation throughout all layers. Black solid arrows represent the forward pass; {\color{blue}blue} and {\color{red}red} dashed arrows represent the backward pass for the object detection loss and depth loss, respectively. The * denotes that our \emph{CoR} layer is able to back-propagate the gradients between different representations.}

318 — 2004.03090

\caption{Sample generated response on Syrian air strikes. \textbf{Bold} emphasizes specificity and topicality. \textcolor{red}{Red} denotes factually incorrect or inconsistent segments.}

\caption{Sample generated response. \textbf{Bold} emphasizes specificity and topicality. \textcolor{red}{Red} denotes factually incorrect or inconsistent segments.}

319 — 2004.03143

\caption{Distribution of camera viewpoints relative to the human subject. We show the distribution of camera azimuth $(-180^\circ,180^\circ)$ and elevation $(-90^\circ,90^\circ)$ for 50k poses sampled from each representative dataset (\textcolor{blue}{\textbf{H36M}}, \textcolor{red}{\textbf{GPA}}, \textcolor{OliveGreen}{\textbf{SURREAL}}, \textcolor{Goldenrod}{\textbf{3DPW}}, \textcolor{black}{\textbf{3DHP}}).}

\caption{We visualize viewpoint distributions for train (3DHP) and test (\textcolor{blue}{H36M}) overlaid with the \textcolor{red}{reduction} in pose prediction error relative to baseline.}

\caption{Predictions on 5 datasets from a model trained on the Human3.6M dataset. The 2d joints are overlaid with the original image, while the \textcolor{red}{3d prediction (red)} is overlaid with \textcolor{blue}{3d ground truth (blue)}. 3D prediction is \textbf{visualized in body-centered coordinate} rotated by the relative rotation between ground truth camera-centered coordinate and body-centered coordinate. From top to bottom are \textsc{H36M, GPA, SURREAL, 3DPW} and \textsc{3DHP} datasets. We rank the images from left to right in order of increasing MPJPE.}

\caption{Our predictions on 5 diverse datasets with a model trained on the GPA dataset. The 2d joints are overlaid with the original image, while the \textcolor{red}{3d prediction (red)} is overlaid with \textcolor{blue}{3d ground truth (blue)}. 3D prediction is \textbf{visualized in body-centered coordinate} rotated by the relative rotation between ground truth root-relative coordinate and body-centered coordinate. From top to bottom are H36M, GPA, SURREAL, 3DPW and 3DHP datasets. We rank the images from left to right in order of increasing MPJPE.}

320 — 2004.03160

\caption{Predictions in broken $ \mu-\tau $ symmetry model for Normal ordering. The left panel: predicted favoured values in the $({m_{1}, \theta_{13}})$ plane for best fit values of $ \delta_{CP} = 222^{0}$ with $\Delta \chi^{2} =6.2$ w/o SK-ATM {\color{blue}\cite{pdg}} (allowed by updated values of correct baryon asymmetry of the Universe) as a result of contribution of type I Seesaw mechanism to neutrino mass matrix. Similarly, the right panel: predicted favoured values in the $({m_{1}, \delta_{CP}})$ plane, for best fit value of $ \theta_{13} = 8.41 $ with $\Delta \chi^{2}=9.5$ {\color{blue}\cite{pdg}}.}

\caption{Predictions in broken $ \mu-\tau $ symmetry model for Normal ordering: Contour plot for predicted favoured values in the $({m_{1}, \theta_{13}})$ plane for best fit values of $ \delta_{CP} = 222^{0}$ of $\Delta \chi^{2}=6.2$ w/o SK-ATM {\color{blue}\cite{pdg}} (allowed by updated values of correct baryon asymmetry of the Universe) as a result of contribution of type I Seesaw mechanism to neutrino mass matrix.}

\caption{Predictions in broken $ \mu-\tau $ symmetry model for Normal ordering: The left panel: three dimensional plot of preferred values in the $({m_{1}, \delta_{CP}})$ plane for best fit values of $ \theta_{13} = 8.41 $ of $\Delta \chi^{2}=9.5 $ w/o SK-ATM {\color{blue}\cite{pdg}} (in the light of recent ratio of the baryon to photon density bounds $5.8\times 10^{-10} < \eta < 6.6 \times 10^{-10} $) as a result of contribution of type I Seesaw mechanism to neutrino mass matrix. Similarly, the right panel: three dimensional plot of favourable values in the $({m_{1}, \theta_{13}})$ plane, for best fit value of $ \delta_{CP} = 222^{0}$ of $\Delta \chi^{2}=6.2$ {\color{blue}\cite{pdg}} (in the light of recent ratio of the baryon to photon density bounds $5.8\times 10^{-10} < \eta < 6.6 \times 10^{-10} $).}

\caption{Predictions in broken $ \mu-\tau $ symmetry model for Normal ordering: The left panel: preferred values of $\delta_{CP}$ for best fit values of $ \theta_{13} = 8.41 $ of $\Delta \chi^{2} = 9.5 $ w/o SK-ATM {\color{blue}\cite{pdg}} (in the light of recent ratio of the baryon to photon density bounds $ 5.8 \times 10^{-10} < \eta < 6.6 \times 10^{-10} $) as a result of contribution of type I Seesaw mechanism to neutrino mass matrix. Similarly, the right panel: favoured values of lightest neutrino mass, $m_{1}$, for best fit value of $ \delta_{CP} = 222^{0}$ of $\Delta \chi^{2} = 6.2$ {\color{blue}\cite{pdg}} (in the light of recent ratio of the baryon to photon density bounds $ 5.8 \times 10^{-10} < \eta < 6.6 \times 10^{-10} $).}

\caption{Predictions in broken $ \mu-\tau $ symmetry model for Normal ordering: The right panel: preferred three dimensional regions of ($ \delta_{CP} $, $ \theta_{23} $, $ J_{CP} $) plane for best fit values of $ \theta_{13} = 8.41 $ of $\Delta \chi^{2} = 9.5 $ w/o SK-ATM {\color{blue}\cite{pdg}}. The left panel: allowed two dimensional space of ($ \delta_{CP} $, $ J_{CP} $) plane for best fit values of $ \theta_{13} = 8.41 $ of $\Delta \chi^{2} = 9.5 $ w/o SK-ATM {\color{blue}\cite{pdg}}.}

\caption{Predictions in broken $ \mu-\tau $ symmetry model for Normal ordering: The right panel: preferred three dimensional regions of ($ \delta_{CP} $, $ \theta_{23} $, $ J_{CP} $) plane for favoured values of $ \delta_{CP} \in [303, 308]$ (in the light of recent ratio of the baryon to photon density bounds $ 5.8 \times 10^{-10} < \eta < 6.6 \times 10^{-10} $) for best fit values of $ \theta_{13} = 8.41 $ of $\Delta \chi^{2} = 9.5 $ w/o SK-ATM {\color{blue}\cite{pdg}}. The left panel: allowed two dimensional space of ($ \delta_{CP} $, $ J_{CP} $) plane for favoured values of $ \delta_{CP} \in [303, 308]$ (in the light of recent ratio of the baryon to photon density bounds $ 5.8 \times 10^{-10} < \eta < 6.6 \times 10^{-10} $) for best fit values of $ \theta_{13} = 8.41 $ of $\Delta \chi^{2} = 9.5 $ w/o SK-ATM {\color{blue}\cite{pdg}}.}

\caption{Predictions in broken $ \mu-\tau $ symmetry model for Normal ordering: The right panel: preferred three dimensional regions of ($ \theta_{13} $, $ \theta_{23} $, $ J_{CP} $) plane for best fit values of $ \delta_{CP} = 222^{0} $ of $\Delta \chi^{2} = 6.2 $ w/o SK-ATM {\color{blue}\cite{pdg}}. The left panel: allowed two dimensional space of ($ \theta_{13} $, $ J_{CP} $) plane for best fit values of $ \delta_{CP} = 222^{0} $ of $\Delta \chi^{2} = 6.2 $ w/o SK-ATM {\color{blue}\cite{pdg}}}

\caption{Predictions in broken $ \mu-\tau $ symmetry model for Normal ordering: The left panel depicts predicted three dimensional space of $(m_{1},\delta_{CP}, \theta_{13})$ for $ m_{ee} $ [eV], $ 0\nu\beta\beta $ decay for favoured values of $m_{1}$,$\delta_{CP}, \theta_{13}$ (in the light of recent ratio of the baryon to photon density bounds, $ 5.8 \times 10^{-10} < \eta < 6.6 \times 10^{-10} $). The right panel depicts predicted three dimensional space of ($m_{1},\delta_{CP}, \theta_{13}$) for $ m_{ee} $ [eV], $ 0\nu\beta\beta $ decay for values of $m_{1}$,$\delta_{CP}, \theta_{13}$ in the given three $ \sigma $ range, corresponding to $\Delta \chi^{2} = 6.2 $ and $\Delta \chi^{2} =9.5 $ w/o SK-ATM {\color{blue}\cite{pdg}}.}

\caption{Predictions in broken $ \mu-\tau $ symmetry model for Normal ordering. Left panel depicts predicted three dimensional space of $(m_{ee},\delta_{CP}, m_{1})$ for $ m_{ee} $ [eV], $ 0\nu\beta\beta $ decay for favoured values of $m_{1}$, $\delta_{CP}, \theta_{13}$ (in the light of recent ratio of the baryon to photon density bounds, $ 5.8 \times 10^{-10} < \eta < 6.6 \times 10^{-10} $). Right panel depicts predicted three dimensional space of $(m_{ee},\delta_{CP}, m_{1})$ for $ m_{ee} $ [eV], $ 0\nu\beta\beta $ decay for favoured values of $m_{1}$, $\delta_{CP}, m_{ee}$ for values of $\delta_{CP}$ in the given three $ \sigma $ range, corresponding to $\Delta \chi^{2} =9.5 $ w/o SK-ATM {\color{blue}\cite{pdg}}.}

\caption{Predictions in broken $ \mu-\tau $ symmetry model for Inverted ordering. The left panel: predicted favoured values of $ \delta_{CP}$ (in the light of recent ratio of the baryon to photon density bounds, $ 5.8 \times 10^{-10} < \eta < 6.6 \times 10^{-10} $) for lightest $ \nu $ mass $ m_{3} = 0.0657 $ eV, as a result of contribution of type I Seesaw mechanism to neutrino mass matrix. Similarly, the right panel: predicted allowed three dimensional space of $(\delta_{CP}, \theta_{13},J_{CP})$ plane for allowed regions of Jarlskog invariant, $ J_{CP} $ values for best fit value of $ \theta_{23} = 48.6^{0} $ of $\Delta \chi^{2} = 6.2$ {\color{blue}\cite{pdg}} as a result of contribution of type I Seesaw mechanism to neutrino mass matrix.}

\caption{Predictions in broken $ \mu-\tau $ symmetry model for Inverted ordering: Three dimensional plot for predicted favoured values of $(m_{3}, \delta_{CP}, \eta )$ plane for best fit values of $ \theta_{13} = 8.49^{0}$ of $\Delta \chi^{2}= 9.5$ w/o SK-ATM {\color{blue}\cite{pdg}} (in the light of recent ratio of the baryon to photon density bounds, $ 5.8 \times 10^{-10} < \eta < 6.6 \times 10^{-10} $) as a result of contribution of type I Seesaw mechanism to neutrino mass matrix.}

\caption{Predictions in broken $ \mu-\tau $ symmetry model for Inverted ordering. The left panel: density plot of predicted favoured values of $({m_{3}, \theta_{13}})$ plane for best fit values of $ \delta_{CP} = 285^{0}$ of $\Delta \chi^{2}=6.2$ w/o SK-ATM {\color{blue}\cite{pdg}} (allowed by updated values of correct baryon asymmetry of the Universe) as a result of contribution of type I Seesaw mechanism to neutrino mass matrix. Similarly, the right panel: three dimensional plot for predicted favoured values of $(\theta_{13}, \delta_{CP}, \eta )$ plane for lightest $ \nu $ mass, $ m_{3} = 0.0657$ (in the light of recent ratio of the baryon to photon density bounds, $ 5.8 \times 10^{-10} < \eta < 6.6 \times 10^{-10} $) as a result of contribution of type I Seesaw mechanism to neutrino mass matrix.}

\caption{Predictions in broken $ \mu-\tau $ symmetry model for Inverted ordering. The left panel: density plot of predicted favoured values of $({m_{3}, \theta_{13}})$ plane for best fit values of $ \delta_{CP} = 285^{0}$ of $\Delta \chi^{2}=6.2$ w/o SK-ATM {\color{blue}\cite{pdg}} (allowed by updated values of correct baryon asymmetry of the Universe) as a result of contribution of type I Seesaw mechanism to neutrino mass matrix. Similarly, the right panel: the predicted two dimensional space of $(m_{3}, \theta_{13})$ for $ m_{ee} $ [eV], $ 0\nu\beta\beta $ decay, for best fit values of $ \delta_{CP} = 285^{0}$ of $\Delta \chi^{2}= 6.2$ w/o SK-ATM {\color{blue}\cite{pdg}}.}

\caption{Predictions in broken $ \mu-\tau $ symmetry model for Inverted ordering: predicted allowed three dimensional space of $(\delta_{CP}, \theta_{23},J_{CP})$ plane for allowed regions of Jarlskog invariant, $ J_{CP} $ values for best fit value of $ \theta_{13} = 8.49 $ of $\Delta \chi^{2} =9.5$ {\color{blue}\cite{pdg}} as a result of contribution of type I Seesaw mechanism to neutrino mass matrix.}

321 — 2004.03306

\caption{ \textsc{Left panel}: Differential energy spectrum of the VHE \gray\ emission of \kuv. \textsc{Right panel}: Differential energy spectrum of the VHE \gray\ emission of \pks. \textit{Upper plot}: Time-averaged VHE spectrum measured from the two sources. Overlaid spectral points were rebinned, requiring a minimum point significance of 2$\sigma$ per bin. The butterfly represents the $1 \sigma$ confidence level error band of the fitted spectrum using a power-law hypothesis. \textit{Lower plot}: Residuals of the reconstructed data points compared to the model. }

\caption{ \textsc{Top Panel}: Averaged multi-wavelength spectral energy distribution of \kuv. \textsc{Bottom Panel}: Contemporaneous multi-wavelength spectral energy distribution of \pks. In both plots the \hess\ spectrum is represented by the filled butterfly and purple points at the highest energies, indicating 1$\sigma$ statistical errors. The measured \fermi\ spectrum is represented by the blue dotted lines and points (see Sec.~\ref{fermianalysis} for details). Upper limits are calculated if the significance of the energy bin is less than 3$\sigma$. The \fermi\ butterflies include only statistical errors.}

\caption{ \textsc{ Left panel}: Optical depth as a function of energy for \kuv. The red line represents the upper limit at the 95\% CL as derived from the combined \fermi\ and \hess\ data using \refeq{eq:tau2} which includes the \fermi\ errors. Additionally shown by dashed and dot-dashed lines are the EBL model predictions from \citet{dom11} and \citet{fra08} that correspond to the upper limits on the redshift. \newline \textsc{ Right panel}: Optical depth as a function of energy for \pks. The red line represents the upper limit at the 95\% CL as derived from the combined \fermi\ and \hess\ data using \refeq{eq:tau2}. Additionally shown by dashed and dot-dashed lines are the EBL model predictions for the two upper limits on the redshift. }

322 — 2004.03313

\caption{a) Top and side view of the relaxed BP monolayer. Blue arrows mark the armchair and the zigzag axis. b) Top view of the hBN monolayer. The non-unitary cell of hBN is drawn with a dashed line. c) Top and side view of the \Pblue{} monolayer. All structures are visualized with the VESTA package~\cite{vesta3}.}

\caption{Stability curve of the monolayers. Black circles: BP strained along the armchair axis; Black squares: BP strained along the zigzag axis; Black upward-pointing triangles: sheared BP biaxially strained; Blue downward-pointing triangles: \Pblue{} biaxially strained; Red diamonds: hBN biaxially strained. Dashed colored lines: fit of Eq.~\ref{eq:energy-curve}.}

\caption{Band structure of the isolated, biaxially strained \Pblue{} monolayer.}

\caption{Density of states of the isolated hBN and \Pblue{} monolayers (top panel) and the H-cell. In the bilayer (bottom panel), the \Pblue{} and the hBN curves refer to the monolayer contributions to the total DOS. The red solid line shows the deformed \Pblue{} monolayer contribution. The valence band tops of all materials are aligned at 0. All data have been broadened with a Gaussian of $\sigma=0.05$ eV.}

323 — 2004.03354

\caption{\best{Top:} Examples of within-space and cross-space nearest neighbors (NNs) by cosine similarity in GreenBioBERT's wordpiece embedding layer. \textcolor{blue}{Blue}: Original wordpiece space. \textcolor{mygreen}{Green}: Aligned Word2Vec space. \best{Bottom:} Biomedical NER test set precision / recall / F1 (\%) measured with the CoNLL NER scorer. \best{Boldface}: Best model in row. \underline{Underlined}: Best inexpensive model (without target-domain pretraining) in row.}

324 — 2004.03548

\caption{ \textbf{Information Flow:} \textbf{Black arrows} illustrate the aggregation directions while the \textcolor{orange}{orange arrows} denote the IO stream from \emph{Temporal Modulation} to \emph{Final Prediction} of Figure \ref{fig:framework}. The channel dimensions and up/downsample operations are omitted. }

325 — 2004.03597

\caption{Results on JHU-CROWD++ dataset (\textbf{``Val Set''}). {\color[HTML]{CB0000}{\ul \textbf{RED}}} indicates best error and {\color[HTML]{3531FF} \textbf{BLUE}} indicates second-best error.}

\caption{Results on JHU-CROWD++ dataset (\textbf{``Test Set''}). {\color[HTML]{CB0000}{\ul \textbf{RED}}} indicates best error and {\color[HTML]{3531FF} \textbf{BLUE}} indicates second-best error.}

326 — 2004.03708

\caption{Analysis of contrastive representation. Column \texttt{Contrastive + Group} is the prediction of our full model. Column \texttt{Group} and column \texttt{Contrastive} are the predictions when only the group or only the contrastive representation is fed into the decoder respectively. \textcolor{blue}{Blue} text denotes the common part while \textcolor{red}{red} text denotes the contrastive part.}

\caption{Analysis of contrastive representation. Column \texttt{Contrastive + Group} is the prediction of our full model. Column \texttt{Group} and column \texttt{Contrastive} are the predictions when only the group or only the contrastive representation is fed into the decoder respectively. \textcolor{blue}{Blue} text denotes the common part while \textcolor{red}{red} text denotes the contrastive part.}

\caption{Examples of only using group representation or only using contrastive representation (Corresponding to Table 4 in the main paper). As shown, common information in both image groups (\textcolor{blue}{blue} text) is encoded in the group representation, while the difference between two groups (\textcolor{red}{red} or \textcolor{orange}{orange} text) is in contrastive representation. The first four examples are good cases while the last two examples are failure cases.}

327 — 2004.03742

\caption{{\small Two adversarial examples generated by \texttt{AdvChar} for Chinese BERT classifiers on the THUCTC and Wechat Finance datasets. Simply replacing \textcolor{blue}{one character} with \textcolor{red}{another} can lead the \textcolor{blue}{correct prediction} to a \textcolor{red}{wrong one}.}}

328 — 2004.03774

\caption{Inputs of recommendation systems.}

329 — 2004.03828

\caption{Illustration of how the feature statistics of the feature maps are affected by the regions over which they are computed. (a) Generation result. (b-d) Learned attention maps of our method on ImageNet dataset \cite{deng2009imagenet}. The tuples above them indicate the computed \textcolor{blue}{mean} and \textcolor{red}{standard deviation} on the corresponding $32 \times 32$ feature maps. The statistics are calculated on the whole region of (a) and only on the highlighted regions of (b-d).}

330 — 2004.03839

\caption{The subplots of Figure (a) show, from top to bottom, five component signals, the mixed simulated signals divided into training ({\color{blue}blue}) and testing ({\color{red}red}) parts, and the supervised signals. Figure (b) shows how the testing MSE of FT0 with a variety of activations ($sigmoid$, $tanh$, $modReLU$, $zReLU$, and $PReLU$) evolves as the number of training iterations increases. Figure (c) shows the corresponding comparison for FT1. Figure (d) displays the effect of the number of neurons on the performance of our FTNet.}

\caption{Figure (a) plots the simulated signals, that is, from top to bottom, the cos function with a period of 3, the sin function with a period of 3, and the mixture functions over 300 timestamps. Figures (b)-(d) display the relation of supervised signals ({\color{red}red}), memory ({\color{magenta}magenta}), and prediction ({\color{blue}blue}), respectively.}

331 — 2004.03915

\caption{\footnotesize Quantitative results in comparison with the state-of-the-art methods. Best three methods are highlighted by {\color{red}red}, {\color{blue}blue} and {\color{green}green}, respectively.}

332 — 2004.03931

\caption{Schematic picture of the simplex volume minimization, {\color{red} which is based on Figure 1 in \cite{2015ITGRS..53.5530L} \citep[see also][]{2017AJ....154..189F}}. The black dots represent the observed data and the three triangles indicate a simplex that encloses all of the data points. The dashed triangle is the simplex whose volume is minimized. The end members are defined by three vertices of the dashed triangle. \label{fig:volmin}}

\caption{Left: Input map of a toy model. The three colors indicate the different surface types, land, vegetation, and ocean, corresponding to white, gray, and black, respectively. Right: Color composite map for the same model. {\color{red} The color composite is based on the retrieved components in Figure \ref{fig:ref}; the components 0, 1, and 2 correspond to green, orange, and blue, respectively. } \label{fig:map}}

\caption{Retrieved maps for different unmixed components 0, 1, and 2 from {\color{red} left to right}. We adopt L2-VRDet model with $\lambda_A=10^{-1}$ and $\lambda_X = 10^{2}$. \label{fig:C}}

\caption{The residuals, the surrogate of the normalized spectral volume, mean MSRA, and CPR as a function of $\lambda_X$ from top to bottom. {\color{red} We fix $\lambda_A = 10^{-1}$ in these panels.} \label{fig:regx}}

\caption{Unmixed spectra for $\lambda_X=10^{-2},10^2$, and $10^{4}$. The color is the same as Figure \ref{fig:ref}. {\color{red} We fix $\lambda_A = 10^{-1}$ in these panels. } \label{fig:regxf}}

\caption{Mean residual, a surrogate of the normalized spectral volume, mean MSRA and CPR as a function of $\lambda_A$ from top to bottom. {\color{red} We fix $\lambda_X=10^2$ in these panels. } \label{fig:rega}}

\caption{Example of the color composite map for insufficient spatial regularization ($\lambda_A=10^{-3}$). We adopt $\lambda_X = 10^{2}$ {\color{red} to make this figure}. \label{fig:regaf}}

\caption{Mean residual and the surrogate of the normalized spectral volume as functions of $\lambda_A$ (top; $\lambda_X = 10^{-4.5}$) and $\lambda_X$ (bottom; $\lambda_A = 10^{-2}$). We take $10^{-2}$ as the optimal value of $\lambda_A$ because of a significant increase at $\lambda_A=10^{-1.5}$. Also, we take $10^{-4.5}$ as the optimal value of $\lambda_X$ because of a significant increase {\color{red} in the mean residual} at $\lambda_X=10^{-4}$. \label{fig:regdd}}

\caption{{\color{red} Normalized} unmixed spectra (top) and color composite map (bottom) for the DSCOVR data. In the top panel, both 0.688 and 0.764 $\mu$m bands are strongly affected by oxygen absorption (shaded by blue vertical lines). The bottom panel shows a color composite map. We use white, blue, green, and brown for components 0, 1, 2, and 3, respectively. \label{fig:dscovr}}

333 — 2004.04092

\caption{Sentence transfer via arithmetic operation in the latent space. The output sentences are in \textcolor{blue}{blue}. In this example, we see content transition from {\em relaxing} to {\em working}. }

\caption{Sentence transfer via arithmetic operation in the latent space. The output sentences are in \textcolor{blue}{blue}. In this example, we see two types of style transition: (1) from singular to plural subject, and (2) from daily-life activity to sport. }

\caption{Sentence transfer via arithmetic operation in the latent space. The output sentences are in \textcolor{blue}{blue}. In this example, we see two types of style transition: (1) from plural/old to singular/young subject, and (2) sentences are expanded. }

334 — 2004.04315

\caption{{\color{red} Do we really need this Figure?}}

335 — 2004.04494

\caption{Error analysis. {\textcolor[rgb]{0.26,0.45,0.77} \XSolidBrush} indicates RoBERTa-MC's prediction.}

336 — 2004.04725

\caption{Seq-BBP: {\color{blue}blue}, {\color{yellow}yellow}, and {\color{green}green} blobs represent activation, gradients, and the module that is being updated.}

337 — 2004.04730

\caption{{\textbf{{X3D}}} networks progressively {\color{expandcolor}{expand}} a {\color{gray}{2D network}} across the following axes: Temporal duration \gat, frame rate \gatau, spatial resolution \gaxy, width \gaw, bottleneck width \gab, and depth \gad. }

\caption{\textbf{Accuracy/complexity trade-off} on Kinetics-400 for different number of inference clips per video. The top-1 accuracy (vertical axis) is obtained by \clipscolor{$K$}-Center clip testing where the number of temporal clips $\clipscolor{K}\in \{\clipscolor{1,3,5,7,10}\}$ is shown in each curve. The horizontal axis shows the full inference cost per video. }

\caption{\textbf{Accuracy/complexity trade-off} on K400-\textbf{val} (top) \& \textbf{test} (bottom) for varying \# of inference clips per video. The top-1 accuracy (vertical axis) is obtained by \clipscolor{$K$}-Center clip testing where the number of temporal clips $\clipscolor{K}\in \{\clipscolor{1,3,5,7,10}\}$ is shown in each curve. The horizontal axis measures the full inference cost per video. The left-sided plots show a linear and the right plots a logarithmic (\textbf{log}) scale. }

338 — 2004.04750

\caption{Different processes used to constrain $ L _i - L _j $ vectors with diagrams shown as computed in the Goldstone boson equivalent theories. We show semi-leptonic meson decay ({\bf \color{blue!40!black} top-left}), neutrino decay ({\bf \color{red!40!black}bottom-left}), neutrinoless double beta decays ({\bf \color{green!40!black}top-right}), and neutrino annihilations ({\bf \color{orange!60!black}bottom-right}). Black dots indicate the $ L _i - L _j $ Goldstone boson emission point. For neutrinoless double beta decay, the internal neutrinos are in the flavor basis as emphasized by the $ \hat{ {}} $~.}

\caption{Constraints on $ L _\mu - L _\tau $ for Majorana neutrinos. Previous constraints are shown in {\bf \color{cgray} gray} and are from neutron star binaries~\cite{Dror:2019uea}, black hole superradiance~\cite{Baryakhtar:2017ngi}, and $ \Delta N _{ {\rm eff}} $ measured through Big Bang Nucleosynthesis, with the latter depending on whether the universe reheated above the muon mass~\cite{Escudero:2019gzq} (dashed) or below~\cite{Huang:2017egl} (solid). Enhanced constraints come from unitarity ({\bf \color{corange} orange}) meson decays ({\bf \color{cblue} blue}) with $ K \rightarrow \ell \nu X $ being most stringent, $ \Delta N _{ {\rm eff}} $ measured with Big Bang Nucleosynthesis due to enhanced neutrino annihilations ({\bf \color{cgreen} green}), and neutrino decays ({\bf \color{cred} red}). The neutrino decay bounds arise from both terrestrial (solid) and cosmological (dashed) searches, the latter which assumes the neutrinos are present during recombination.}

\caption{Constraints on $ L _e - L _\mu $ in the case of Majorana neutrinos. In {\bf \color{cgray} gray} are previous constraints from fifth forces~\cite{Wise:2018rnb}, deviations to neutrino oscillation data~\cite{Bustamante:2018mzu}, and black hole superradiance~\cite{Baryakhtar:2017ngi}. The enhanced constraints come from unitarity ({\bf \color{corange} orange}), meson decays ({\bf \color{cblue} blue}), dominated by $ K \rightarrow \ell \nu X $, neutrinoless double beta decay searches ({\bf \color{cpink} pink}), Big Bang Nucleosynthesis/supernova ({\bf \color{cgreen}green}), and neutrino decays ({\bf \color{cred} red}). Neutrino decay bounds depend on whether or not neutrinos are present during recombination (dashed vs solid).}

\caption{Constraints on $ L _i - L _j $ for Majorana and Dirac neutrinos. Previous constraints ({\bf \color{cgray} gray}) are from black hole superradiance~\cite{Baryakhtar:2017ngi}, neutron star binaries~\cite{Dror:2019uea}, fifth forces/Equivalence principle tests~\cite{Wise:2018rnb}, deviations to neutrino oscillations~\cite{Bustamante:2018mzu}, and $ \Delta N _{ {\rm eff}} $ measured through Big Bang Nucleosynthesis, with the latter depending on whether the universe reheated above the muon mass~\cite{Escudero:2019gzq} (dashed) or below~\cite{Huang:2017egl} (solid). The enhanced constraints come from unitarity ({\bf \color{corange} orange}), meson decays ({\bf \color{cblue} blue}), dominated by $ K \rightarrow \ell \nu X $, neutrinoless double beta decay searches ({\bf \color{cpink} pink}), Big Bang Nucleosynthesis/supernova ({\bf \color{green} green}), and neutrino decays ({\bf \color{cred} red}).}

339 — 2004.04807

\caption{Ground truth training (\textcolor{blue}{blue}) and test (\textcolor{green}{green}) trajectories of our ambiguous scenes and example RGB images.}

340 — 2004.04849

\caption{Output of two hypothetical algorithms $\algA$ and $\algB$ on the same dataset. \colorbox{lightgreen}{Successes} and \colorbox{lightred}{failures} are color-coded. }

\caption{ Model accuracy ($y$-axis) with a fixed total budget and varying cluster size $c$ ($x$-axis), for two cases: (i) \yellowtext{$r=1.0$} denoting a fixed total number of \emph{questions}, (ii) \redtext{$r=0.0$} denoting a fixed total number of \emph{clusters}. }

\caption{ Model accuracy ($y$-axis) with a fixed total budget $b$ and varying cost ratio $r$ ($x$-axis), in two cases: (i) \bluetext{$c=1$} denoting singleton clusters, (ii) \greentext{$c=4$} denoting cluster size 4. The smaller the cost, the higher the returns from clustered questions. }

341 — 2004.04858

\caption{Real-world datasets used in the experiments. In columns 1 and 2, we report names and descriptions of the hardware designs used to generate the simulation traces. In columns 3 and 4, we give the number of primary inputs resp.\ of primary outputs. In column 5 we report the length of the simulation trace, and in columns 6 and 7 the size of the alphabet and the number of colors, respectively. For each design we fixed a color $y$, and report in col.\ 8 the number $n_y$ of $y$ characters.\label{Table:real data sets}}

342 — 2004.05439

\caption{Classification of research papers according to our taxonomy. We use color to indicate salient meta-objective or application goal focus. We focus on the main goal of each paper for simplicity. The color code is as follows: \red{sample efficiency} (red), \green{learning speed} (green), \purple{asymptotic performance} (purple), \blue{cross-domain} (blue).}

343 — 2004.05615

\caption{Additional \aastex\ symbols}

344 — 2004.05703

\caption{The CPU time of each step of training models or conducting inference on CIFAR-100 and ImageNet Tiny, protecting consecutive last layers using TrustZone (For example: when putting the last layers in the TrustZone, $1$ refers to the cost function and the softmax layer, $2$ includes $1$ and the previous fully-connected layer, $3$ includes $2$ and the previous convolutional layers, etc. Horizontal dashed lines (~\dottedred~and~\dottedblue~) represent the baseline where all layers are out of the TrustZone. 20 times for each trial, and error bars are 95\% CI. Several error bars of data points are invisible as they are too small to be shown in this figure as well as the following figures).}

\caption{The CPU time of each step of training models or conducting inference on CIFAR-100, protecting consecutive last layers using TrustZone (Note: The x-axis corresponds to several last layers included in the TrustZone. \emph{CT}, \emph{SM}, \emph{FC}, \emph{D}, \emph{MP}, and \emph{C} refer to the cost, softmax, fully connected, dropout, maxpooling, convolutional layers. 1, 2, 3, 4, and 5 in the x-axis are corresponding to the x-axis of Figure~\ref{fig:CPU_cost}. Horizontal dashed lines (~\dashedblack~) represent the baseline where all layers are out of the TrustZone. 20 times for each trial, and error bars are 95\% CI).}

\caption{The CPU execution time in user mode and kernel mode of each step of training the model or conducting inference on CIFAR-100, protecting consecutive last layers using TrustZone (Note: Horizontal dot-dashed lines (~\dashedred~) represent the baseline where all layers are out of the TrustZone. 20 times for each trial. CPU time in user mode in Figure \ref{fig:SM_exe_time_inference_details} is too small to be shown).}

\caption{The memory usage and power consumption of training models, while conducting training or inference on CIFAR-100 and ImageNet Tiny, protecting consecutive last layers using TrustZone (Note: Horizontal dashed lines (~\dottedred~and~\dottedblue~) represent the baseline where all layers are outside the TrustZone. 20 times for each trial, error bars are 95\% CI).}

345 — 2004.05919

\caption{Phase diagram of the assembly of ellipses in a diblock copolymer with $f_0=0.3$. The number of particles is explored in the X axis $\phi_p$ and the strength of the interparticle potential is tuned via $U_0$. Markers relate to the value of the orientational order parameter as: blue cross {\color{blue} x} for $S<0.01$; red dots {\color{red} $\boldsymbol{\cdot}$} for $0.01<S<0.3$; and black plus sign {\color{black} +} for $S>0.3$ }

346 — 2004.06038

\caption{State-of-art picture of the canalized and polarization-degenerate surface waves at the self-complementary metasurface under study. The \textcolor{red}{red} and \textcolor{blue}{blue} arrows demonstrate the instantaneous direction of \textcolor{red}{magnetic} and \textcolor{blue}{electric} fields of \textcolor{red}{TM} and \textcolor{blue}{TE} surface plasmons excited by vertical \textcolor{red}{electric (probe)} and \textcolor{blue}{magnetic (loop)} dipole-like sources, respectively. The conical shapes schematically show the field amplitude sharply decreasing with distance from a metasurface. The white arrows correspond to the wave propagation directions emulating the canalization propagation regime of the surface plasmon-polaritons. }

347 — 2004.06305

\caption{Qualitative image search results using the vehicle query images from the CityFlow dataset. We select four query images from different viewpoints, \ie, the front view, the overhead view, the rear view and the side view. The results are sorted from left to right according to the similarity score. The true matches are in \textcolor{ForestGreen}{green}, while the false matches are in \textcolor{red}{red}. }

348 — 2004.06323

\caption{Sample resistance as a function of the source-drain voltage V$_{\text{SD}}$. Plots are given for three samples labeled by their thickness. The nominal field used for ohmic-regime measurements was typically F=1\,V\textperiodcentered m$^{\text{-1}}$. The source-drain separation for all these samples was 1\,mm. The upper plot is taken on the same sample as in Fig.~3 above.}

349 — 2004.06438

\caption{A sketch of the proposed model. {\color{blue}Blue circles} are item keywords, {\color{red}red circles} are query words, {\color{orange}orange circles} are the associated words retrieved by the Association module from the Association Knowledge Graph. The Generation module is designed to generate advertisements based on the extended sub-graph.}

350 — 2004.06465

\caption{\label{tab:lime_examples}Examples showing the most predictive word for both \textcolor{red}{\textbf{mBERT}} and \textcolor{blue}{\textit{LASER + LR}}.}

351 — 2004.06496

\caption{ Illustration of $\bm{\epsilon}$-Ball, $\mathcal{B}_p(\bm{s},\bm{\epsilon})$, for $n=2$. Let $\bm{\epsilon}=[\epsilon_1, \epsilon_2]$. Depending on the choice of $L_p$ norm (e.g., \textcolor{\lonecolor}{\boldmath{$L_1$}}, \textcolor{\ltwocolor}{\boldmath{$L_2$}}, and \textcolor{\linfcolor}{\boldmath{$L_\infty$}}), $\mathcal{B}_p(\bm{s},\bm{\epsilon})$ is the set of points inside the corresponding colored outline. The adversary can perturb nominal observation $\bm{s}$ to any point inside $\mathcal{B}_p(\bm{s},\bm{\epsilon})$. The values of $\{n,\bm{\epsilon},p\}$ are application-specific choices and the components of $\bm{\epsilon}$ need not be equal. }

352 — 2004.06502

\caption{Overview of our proposed \SHORTTITLE{}~model: given an input video sequence, we first decompose it to the content by a Content Encoder and the style by a Style Encoder. Then the content is processed by special RNN units, namely TrajGRUs \cite{shi2017deep} in order to get the content used for translation and interpolation in a recurrent manner. Finally, the translation content and the interpolation content are decoded to the translated video and the interpolated video together with the style latent variable. We also show the video adversarial loss ({\color{orange} orange}), the cycle consistency loss ({\color{violet} violet}), the video interpolation loss ({\color{green!60!black} green}) and the style encoder loss ({\color{blue} blue}) }

353 — 2004.06549

\caption{\gray{} spectral parameters (log-parabola and power-law models) for the three considered sources according to the Fourth Fermi-LAT Catalog \cite{2019arXiv190210045T}. $E_{0}$ is the pivot energy [MeV]; $\alpha$ is the photon index of the log-parabola model; $\beta$ is the curvature; $K_{\rm lp}$ is the normalization of the log-parabola model [$10^{-12}$~ph cm$^{-2}$ s$^{-1}$ MeV$^{-1}$] at the pivot energy; $\Gamma$ is the photon index of the power-law model; $K_{\rm pl}$ is the normalization of the power-law model [$10^{-12}$~ph cm$^{-2}$ s$^{-1}$ MeV$^{-1}$] at the pivot energy. \label{tab:sample}}

\caption{Scatter-plot of the two spectral parameters ($\alpha$ and $\beta$) for the different object categories (see Figure~\ref{Fig:AlphaBetaScatter} for the description of the different symbols). The black filled squares represent the 4FGL values, while the black filled triangles represent the value of the photon index $\Gamma$ for a power-law fit in 4FGL. The black open circles represent the value of the photon index $\Gamma$ for the different sources in the high \gray{} state as reported in~\cite{2020arXiv200211737R}. Blue rectangles represent the intervals, in $\alpha$ and $\beta$, defined by the red and green dashed curves in Figure~\ref{Fig:SedBoettcherLogPar0846}. (\textbf{a}) SBS~0846$+$513. (\textbf{b}) PMN~0948$+$0022. (\textbf{c}) PKS~1502$+$036. \label{Fig:AlphaBetaBoxes}}

\caption{\gray{} spectral parameters and fluxes (log-parabola models) for the three considered sources. {\bf Model} refers to the red or green models in Figures~\ref{Fig:SedBoettcherLogPar0846}, \ref{Fig:SedBoettcherLogPar0948}, and~\ref{Fig:SedBoettcherLogPar1502}. $F_{\rm (140-200)\,GeV}^{\rm mod}$ and $F_{\rm (200-280)\,GeV}^{\rm mod}$ are the fluxes in units of [$10^{-13}$~erg cm$^{-2}$ s$^{-1}$], while $F_{\rm (140-200)\,GeV}^{\rm sim}$ and $F_{\rm (200-280)\,GeV}^{\rm sim}$ are those reported in Tables~4, 5, and 7 of~\cite{2020arXiv200211737R}, same units.\\ Notes: $^{(1)}$: curve with $\alpha=1.95$ and $\beta=0.05$. $^{(2)}$: curve with $\alpha=1.95$ and $\beta=0.07$.}

354 — 2004.06870

\caption{ Examples from QUOREF~\citep{QUOREF}. {\color{blue}\bf \textit{Answers from BERT$_{\texttt{Base}}$}}, {\color{red} \bf Answers from CorefBERT$_{\texttt{Base}}$}, and {\color{rel} \bf \texttt{Clue}} are colored accordingly. }

355 — 2004.06965

\caption{Average PSNR and SSIM values for various UDVD configurations on Set5. Degradation parameters include scaling factor $\times3$, kernel width $1.3$ and noise level $15$. The best results are highlighted in \textcolor{red}{red} color.}

\caption{Average PSNR values on variations of multiple degradations. We use the provided official code to compute the results, except IRCNN. For IRCNN, results are extracted from the publication~\cite{IRCNN}. The best results are highlighted in \textcolor{red}{red} color.}

\caption{Average PSNR values on spatial variations of degradations. We use the official code of SRMD to compute its results. The best results are highlighted in \textcolor{red}{red} color.}

\caption{Average PSNR values on noise-free degradations. We use the official code to compute the results, except SFTMD and IRCNN. For SFTMD and IRCNN, results are extracted from the publications~\cite{SFTMD, IRCNN}. The best two results are highlighted in \textcolor{red}{red} and \textcolor{blue}{blue} colors.}

\caption{Average PSNR/SSIM values on fixed degradations. “$*$” indicates a unified model for \textbf{BI} and \textbf{DN}. We use the provided official code to compute the results, except SRCNN and VDSR. For SRCNN and VDSR, results are extracted from the publications~\cite{SRCNN, VDSR, RDN}. The best two results are highlighted in \textcolor{red}{red} and \textcolor{blue}{blue} colors.}

356 — 2004.07112

\caption{Comparison between experimental ({\color{blue}blue} curve) and numerical simulation ({\color{red}red} curve) $u_\phi$ time series at mid-gap~($r_{in}+d/2$) and mid-height position~($H/2$). Note that the time intervals are different in figures (a) and (b).}

357 — 2004.07195

\caption{\label{fig:schemat} (a) The phase waveform used for time-lensing. One can use either of the two parabolic extrema. Parabola no.\ \textcolor{knof1}{(1)} needs to be combined with normal dispersion, e.g.,\ SMF28e optical fiber. Parabola no.\ \textcolor{knof1}{(2)} requires anomalous dispersion, e.g.,\ a dispersion compensating fiber. The phase waveform was measured using the electro-optic sampling technique~\cite{Wu95,Jachura18}. (b) Experimental setup, see text for details. }

\caption{\label{fig:kwantowe} Spectra of non-modulated (\textcolor{knof2}{orange}), modulated single-photon pulses acquired in 30 minutes (\textcolor{knofgreen}{green}) and 24\,hours (\textcolor{knof1}{blue}), in both cases without any feedback loop. The inset shows the drift of the peak of the spectrum measured at 5\,minute intervals. The maximal observed drift rate was 60\,pm/h. It is associated with environment instabilities, such as temperature, atmospheric pressure or humidity.}

\caption{\label{fig:aperiodic} (a) Oscilloscope trace of the double pulse train monitored with a photodiode. Delay $\Delta \tau$ is defined as the delay between two pulses from a single pair. (b) The enhancement (ratio of spectral intensity peaks with and without modulation, see text) obtained by using the aperiodic time lens (experimental values) as compared to the standard sinusoidal time lens with matched parameters, $f_{\mathrm{m}} = 9\;\mathrm{GHz}$ (simulation). The spikes in enhancement for the sinusoidal time lens are achieved for $\Delta \tau$ being multiples of the sine period $f_{\mathrm{m}}^{-1}$. The blue dashed line and blue shade show the average and standard deviation of the enhancement of the aperiodic time lens. (c) The output spectra with modulation turned off (\textcolor{knof2}{orange}), with modulation applied to both pulses (\textcolor{knof1}{blue}), or to only one of them (\textcolor{knofgreen}{1 -- green}, \textcolor{knofpurple}{2 -- purple}). }

358 — 2004.07283

\caption{Parameter evolution for double-Schechter function fits to the \flares\ composite galaxy stellar mass function (GSMF, blue) and star formation rate function (SFRF, orange). The low (1) and high (2) mass components are shown with solid and dashed lines, respectively. Shaded regions show the $16^{\mathrm{th}}-84^{\mathrm{th}}$ percentile uncertainty obtained from the fit posteriors (see \App{fitting} for details). The low-mass slope of the high-mass component ($\alpha_{2}$) is fixed at $-1$. The characteristic mass of the GSMF ($\mathrm{M_\star}$) and the characteristic SFR of the SFRF ($\SFR_*$) are shown in the bottom panel, labelled $D_*$. $\SFR_*$ is offset by $+10^{8}$ to aid comparison with $M_\star$. The GSMF and SFRF show very similar behaviour; the normalisation of both components and the low-mass slope all increase with decreasing redshift. The characteristic mass increases with decreasing redshift for the GSMF, whereas the characteristic star formation rate of the SFRF shows a flatter redshift relation. }

\caption{Evolution of the \flares\ composite star formation rate distribution function (coloured, solid lines), compared with observational constraints from UV data and other model predictions. \protect\cite{smit_star_2012} derive SFRs from UVLF data, as do \protect\cite{katsianis_evolution_2017} using \protect\cite{bouwens_uv_2015} data. Both are corrected to a Chabrier IMF using the conversion factors quoted in \protect\cite{kennicutt_jr_star_2012}. The Santa-Cruz SAM \protect\citep[][dashed line]{yung_semi-analytic_2019} and \bluetides\ simulation \protect\citep{wilkins_properties_2017} show a different behaviour, with a power law shape at higher redshifts, in contrast to the prominent knee seen in \flares\ up to $z = 10$. Both \lgals\ models also show similar behaviour, though with lower normalisation at the high-SFR end \citep{henriques_galaxy_2015,henriques_l-galaxies_2020}. }

\caption{Cumulative distribution of stellar masses for all \flares\ regions combined (solid) and the fiducial \eagle\ Reference volume (dashed).}

\caption{Dark matter element resolution against simulated volume. The colour of individual points describes the approximate number of resolution elements (dark matter + baryonic gas, excluding stars). We show the following simulation projects: GIMIC \citep{crain_galaxies-intergalactic_2009}, EAGLE \protect\citep{schaye_eagle_2015,crain_eagle_2015}, CROC \protect\citep{gnedin_cosmic_2014}, CoDa \protect\citep{ocvirk_cosmic_2016}, Illustris \protect\citep{vogelsberger_introducing_2014}, Renaissance \protect\citep{barrow_first_2017}, the \protect\cite{katz_interpreting_2017} simulations, SPHINX \protect\citep{rosdahl_sphinx_2018}, and \bluetides\ \protect\citep{feng_bluetides_2016}. We also show \flares\ with the total resimulated high-resolution volume, as well as a vertical line showing the representative volume, given by that of the parent box. There is a strong negative correlation for periodic volumes between the volume that can be simulated and the resolution that can be achieved. The resimulation approach, with appropriate weighting, allows us to extend the volume axis significantly.}

359 — 2004.07443

\caption{Qualitative comparison of segmentation results for six representative test cases. The left three columns show COVID-19 cases, the right three columns show COPD cases. From top to bottom: input image, 3DU-Net baseline, PDV-Net, FRV-Net, the proposed RTSU-Net, and the segmentation reference. \textcolor{RULC}{\rule{.2cm}{.2cm}} right upper, \textcolor{RMLC}{\rule{.2cm}{.2cm}} right middle, \textcolor{RLLC}{\rule{.2cm}{.2cm}} right lower, \textcolor{LULC}{\rule{.2cm}{.2cm}} left upper, \textcolor{LLLC}{\rule{.2cm}{.2cm}} left lower lobes.}

360 — 2004.07453

\caption{\label{fig:test_results} Test accuracy and processing time of our approach ({\color{blue}{blue squares}}, each point representing a different confidence threshold), our standard baseline (std., {\color{DarkGreen}{green diamond}}), efficient baselines (eff., {\color{red}{red dots}}), and oracle baseline ({\color{orange}{orange star}}). Left and higher is better. Our method presents similar or better speed/accuracy tradeoff in almost all cases. }

\caption{\label{fig:distilled-results} Experiments with tinyBERT. Our method (\textcolor{lightblue}{light-blue pentagons}) provides a better speed-accuracy tradeoff compared to the standard (\textcolor{lightgreen}{light-green diamonds}) and efficient (\textcolor{lightred}{small light-red dots}) baselines. For comparison, we also show the results of our method (\textcolor{blue}{blue squares}) and our efficient baselines (\textcolor{red}{large red dots}) with BERT-large. Our method applied to BERT-large provides the overall best tradeoff. }

\caption{\label{fig:val_results} Validation accuracy and processing time of our approach ({\color{blue}{blue line}}) and our standard baseline (std., {\color{DarkGreen}{green diamond}}), our efficient baselines (eff., {\color{red}{red dots}}) and our oracle ({\color{orange}{orange star}}). Left and higher is better.}

361 — 2004.07485

\caption{\textbf{Joint training with memory features is restricted by limited hardware resources.} In this minor experiment, we take a 32-frame video clip with $256\times 340$ resolution as input. The backbone is ResNet-50. During joint training (\textcolor{yellow}{yellow} line), rapidly growing GPU memory and computation time restrict the length of memory features to a very small value (8 in this experiment). With a larger input or a deeper backbone, this problem becomes more serious. Our method (\textcolor{cyan}{cyan} line) does not have this problem. }

362 — 2004.07532

\caption{Fake detection performance results in terms of EER (\%) and AUC (\%) over the final evaluation datasets. Two approaches are considered as input to the fake detection systems: \textit{i)} selecting the entire face (\textit{Face}), and \textit{ii)} selecting specific facial regions (\textit{Eyes}, \textit{Nose}, \textit{Mouth}, \textit{Rest}). \nth{1} generation databases: UADFV and FaceForensics++. \nth{2} generation databases: Celeb-DF and DFDC. For each database, we highlight in \textbf{bold} the best fake detection results, and in {\color[HTML]{00BFFF}blue} and {\color[HTML]{ED872D}orange} the facial regions that provide the {\color[HTML]{00BFFF}best} and {\color[HTML]{ED872D}worst} results.\vspace{3mm}}

363 — 2004.07657

\caption{\blue{Dynamics of AUC performance over training epochs: The baseline shows high fluctuations while our approach not only shows stability across various epochs but also yields higher AUC.}}

\caption{Our proposed \blue{OGNet} framework. Phase one is the baseline training, carried out to obtain a reasonably trained state of $\mathcal{G}$ and $\mathcal{D}$. A frozen low epoch state ($\mathcal{G}^{old}$) of the generator is stored during this training. In phase two, only $\mathcal{D}$ is updated to distinguish between good and bad quality reconstructions. \new{Good quality examples correspond to real training images as well as the images reconstructed using $\mathcal{G}$ while bad quality examples are obtained using $\mathcal{G}^{old}$ as well as the proposed pseudo-anomaly module. This module assists $\mathcal{D}$ to learn the underlying patterns of anomalous input reconstructions. During test, inferences are carried out through $\mathcal{G}$ and $\mathcal{D}$ only and the output of $\mathcal{D}$ is considered as anomaly score. Best viewed in color.}}

\caption{\red{Example images from different stages of our framework. (a) Left to right: Original image ($X$), high quality reconstructed ($\hat{X}$), low quality reconstructed ($\hat{X}^{low}$), pseudo anomaly ($\hat{\bar{X}} $), pseudo anomaly reconstructed ($\hat{X}^{pseudo}$). (b) Left column shows outlier / anomaly examples whereas right column shows respective regenerated outputs $\mathcal{G}$($X$)}.}

\caption{\red{AUC and $F_1$ score performance comparison of our framework on Caltech-256 \cite{griffin2007caltech} with the other state of the art methods. Following the existing work \cite{you2017provable_novelty}, each subgroup of rows from top to bottom shows evaluation scores on inliers coming from 1, 3, and 5 \green{different random} classes respectively (best performance as bold and second best as underlined).}}

\caption{\red{$F_1$ score results on MNIST dataset. Compared to state-of-the-art, our method retains superior performance even with an increased percentage of outliers at test time.} }

364 — 2004.07662

\caption{Role of gender on the battery charging threshold, for user groups with different ages and occupations, respectively, from which we can find that females charge their mobile phones significantly earlier than males (\textcolor{MyBlue}{$p=0.05$}).}

365 — 2004.07676

\caption{Pair-plot showing the score distribution for real (orange \textcolor{mat_orange}{$\bullet$}) and fake (blue \textcolor{mat_blue}{$\bullet$}) samples for each pair of networks on \gls{ff++} (a) and \gls{dfdc} (b) datasets.}

366 — 2004.07703

\caption{We propose a two-step self-supervised domain adaptation technique for semantic segmentation. Previous works solely {\color{blue} adapt} the segmentation model from the source domain to the target domain. Our work also considers {\color{red} adapting} from the clean map to the noisy map within the target domain.}

367 — 2004.07775

\caption{\label{fig:1} Schematic of the system and background of data. a) Diagram of the interfacial stress rheometer including oil/water contact line pinned at the two glass walls and the axially displacing needle. b) A top view schematic with a description of coordinates and the idealized displacement field, $\delta(y,t)$. Also shown is an image of the particle micro-structure representing about 1/24th of a total image. The vertical edge is 250\,$\mu$m long. Crystallized grain clusters may be observed, surrounded by expansive amorphous boundaries. c) Storage modulus, G', and loss modulus, G'', as a function of strain amplitude $\gamma_0$, both showing inflection at the classic yield point of $\sim$3\% (\color{red}\textbf{- - -}\color{black}). d) Characterization of the fraction of particles displaying irreversible and reversible non-affine events. The total numbers of reversible and irreversible events diverge at the yield point.}

\caption{\label{fig:4} Above yield trajectories ($\gamma_0=$6.8\%). Trajectories are black with a red plus, (\color{red}+\color{black}), at the beginning of the cycle. For reference, local displacement is offset above in blue (\color{blue}---\color{black}). a-b) Trajectories dominated by mechanical noise. c) A low area example of a trajectory with arc length equal to the expected displacement ($L_N=0$). d) A high area example of a trajectory with $L_N=1.0$.}

\caption{\label{fig:5} Inverse normalized arc-lengths ($1/L_N$) and enclosed area (color bar) compared with the mean particle position between the needle and the wall. Irreversible particles are shown in black. a) Below yield the system is dominated by reversibly elastic, and irreversibly plastic particle trajectories. All trajectories have $1/L_N<1.0$, indicating that trajectories are long relative to the displacement field. This means they are dominated by mechanical noise. b) Near yield, plastically reversible particles emerge near the needle. Overall the $1/L_N$ shifts nearer to one (especially the plastically reversible particles) indicating a transition to low mechanical noise relative to affine displacements. c) Particles in the middle of the channel are exclusively plastically irreversible. Plastically reversible particles reach $1/L_N \sim 1.0$ indicating that these trajectories are completely dominated by background displacement, while simultaneously enclosing high area. It is worth noting that not a single particle is observed to have $1/L_N \gg 1.0$. }{\includegraphics{5}}

368 — 2004.07817

\caption{\fontsize{9}{9}\selectfont (a) Typical dimple shapes for different impact conditions in the multi-dimple regime corresponding to the circled red numbers in (b). Multi-pinch-off dimples: \raisebox{.5pt}{\textcircled{\raisebox{-.9pt}{1}}} $D=1.16$ mm, $U=1.7$ m/s, $Fr=259$, $We= 493$; \raisebox{.5pt}{\textcircled{\raisebox{-.9pt}{2}}} $D=1.02$ mm, $U=2.1$ m/s, $Fr=421$, $We= 617$; \raisebox{.5pt}{\textcircled{\raisebox{-.9pt}{3}}} $D=0.93$ mm, $U=2.05$ m/s, $Fr=463$, $We= 560$ and singular telescopic dimple: \raisebox{.5pt}{\textcircled{\raisebox{-.9pt}{4}}} $D=0.73$ mm, $U=2.38$ m/s, $Fr=792$, $We= 593$. The scale bars are 100 $\mu$m long. (b) Characterization of the dimples and jets in {\it Fr-We} space for drop impacts of immiscible liquids. The two dashed curves are the bounds of the regular bubble entrapment measured by \cite{Ref8,Ref9}, for identical liquids. The two solid curves mark the bubble entrapment region based on our study. The symbols correspond to different dimple shapes: (\textcolor{mypink1}{$\ovoid$}) no pinch-off shallow dimple; (\textcolor{mypink2}{$\triangle$}) dimple pinch-off with bubble going out with jet; (\textcolor{black}{$\triangledown$}) tiny bubble pinched off near secondary critical pinch-off; (\textcolor{black}{$\largewhitestar$}) singular telescopic dimple; (\textcolor{black}{$\triangledown$}) pinched-off bubble entrapped in PP1 drop; (\textcolor{blue}{$\boxvoid$}) liquid column break-up without dimple pinch-off; (\textcolor{green}{$\Diamond$}) water entrapped in PP1 drop without pinch-off. The dashed cyan lines mark the region of multi-dimples.}

\caption{\fontsize{9}{9}\selectfont (a) Logarithmic scaling of the dimple radius vs time before pinch-off. There is a transition of power-law exponents from 2/3 to 0.55 closest to the pinch-off. The background shading marks the validity of each, with the arrow indicating the approximate cross-over time $t_c$. The data is taken from two video clips spanning time-scales from 100 ns to 200 $\mu$s before pinch-off. (b) Cross-over time $t_c$, normalized by the impact time $D/U$, as a function of $We$, for dimple pinch-off (\textcolor{mypink2}{$\triangle$} \& $\triangledown$) and singular jets (\textcolor{red}{$\times$}, \textcolor{red}{$\plus$} \& \textcolor{red}{$\largewhitestar$}). The vertical arrows indicate these are lower bounds, as for these cases the dynamics remain inertial for the entire video clip.}

\caption{\fontsize{9}{9}\selectfont (a) Typical dimple shapes for different impact conditions in the multi-dimple regime corresponding to the circled red numbers in (b). Multi-pinch-offs dimple: \raisebox{.5pt}{\textcircled{\raisebox{-.9pt}{1}}} $D=1.16$ mm, $U=1.7$ m/s, $Fr=259$, $We= 493$; \raisebox{.5pt}{\textcircled{\raisebox{-.9pt}{2}}} $D=1.02$ mm, $U=2.1$ m/s, $Fr=421$, $We= 617$; \raisebox{.5pt}{\textcircled{\raisebox{-.9pt}{3}}} $D=0.93$ mm, $U=2.05$ m/s, $Fr=463$, $We= 560$ and singular telescopic dimple: \raisebox{.5pt}{\textcircled{\raisebox{-.9pt}{4}}} $D=0.73$ mm, $U=2.38$ m/s, $Fr=792$, $We= 593$. The scale bars are 100 $\mu$m long. (b) Characterization of the dimples and jets in {\it Fr-We} space for drop impacts of immiscible liquids. The two dash curves are the bounds of the regular bubble entrapment measured by \cite{Ref8,Ref9}, for identical liquids. % using the best fits from \cite{oguz1990bubble}. The two solid curves mark the bubble entrapment region based on our study. % The meaning of the large symbols are given graphically in Fig. 5. The symbols correspond to different dimple shapes: (\textcolor{mypink1}{$\ovoid$}) no pinch-off shallow dimple; (\textcolor{mypink2}{$\triangle$}) dimple pinch-off with bubble going out with jet; (\textcolor{black}{$\triangledown$}) tiny bubble pinched off near secondary critical pinch-off; (\textcolor{black}{$\largewhitestar$}) singular telescopic dimple; (\textcolor{black}{$\triangledown$}) pinched-off bubble entrapped in PP1 drop; (\textcolor{blue}{$\boxvoid$}) liquid column break-up without dimple pinch-off; (\textcolor{green}{$\Diamond$}) water entrapped in PP1 drop without pinch-off. The dashed cyan lines mark the region of multi-dimples.}


369 — 2004.07878

\caption{\red Illustration of the contour levels for the implausibility function around a target level (solid red line) of the Branin function. The GP emulator, trained with the samples shown as dots, provides the dashed lines as its prediction. The left panel shows the implausibility measure using a deterministic emulator, the GP mean. The right panel shows the probability of implausibility derived from the stochastic emulator.\nc}

\caption[History matching waves for Franke's function]{\red Results for Franke's function in the history matching setting. In \cref{subfig:franke_cntr}, contour levels of the probability of implausibility are shown with lighter shades. Pink dots represent training runs used for the simulator at wave $t$, and orange diamonds new points identified in NROY space with good predicted improvement performance. Each subpanel in \cref{subfig:franke_nroy} shows the samples in NROY space, with those satisfying a good predicted improvement in darker colours. The points selected to run the simulator to improve the GP emulator are shown as orange diamonds. \nc}

370 — 2004.08110

\caption{Regressions \textcolor{cyan}{The linear regression at 5 Mbps creates a negative value, which does not work well with logarithmic axes. If we use a polynomial regression the line is identical to the average. }}

371 — 2004.08280

\caption{Additional \aastex\symbols}

372 — 2004.08361

\caption{We train a classifier to predict the gender (\OPGENDER) of the person that text is addressed to (\COMMENTTEXT), while demoting features that are predictive of gender but not predictive of bias. Posts with similar content are \textcolor{matchcolor}{matched} through propensity scores, and unmatched posts are discarded. Latent traits of the addressee (e.g., nationality) are \textcolor{demotecolor}{demoted} through an adversarial objective. Overtly gendered language (``Bro'') is \textcolor{subscolor}{substituted}. Comments that are indicative of gender despite these restrictions are likely to contain \textcolor{biascolor}{bias}.}

373 — 2004.08400

\caption{In the top row, from left to right: the HST F814W band of the four arcs, indicated with red boxes; the narrow-band MUSE data at 1914\AA\ and \ciiired\ rest-frame, with the emission from the {\tt Tr} (transient) marked with red circles. Black arrows show the arcs in which such emission should simultaneously lie, if it were not a transient. In the bottom row, the zoomed regions of arcs I, II and III are shown at three different epochs: future, past and present. The transient {\tt Tr} is indicated with red arrows and appears only in II. The three knots {\tt A}, {\tt B} and {\tt C} are also labelled.}

\caption{A collection of the VLT/MUSE, X-Shooter and ESPRESSO spectroscopic observations in the ultraviolet portion of the spectrum. In the main panel, the MUSE spectra at R=2500 of the star cluster (knot A, indicated with the red line) and the transient ({\tt Tr}, indicated with the black line) are shown, with the insets blowing up the regions around the fluorescence lines due to H~\lya\ pumping (see text for details). The corresponding two-dimensional X-Shooter spectra are also shown with resolution R=11400. The one-dimensional spectrum at R=70000 obtained with ESPRESSO at the focus of the 4 VLT/UTs is shown in the top-left (the green/red line corresponds to R=70000/7000). The \feiii\ UV34 1914\AA\ emission is present in all spectra of {\tt Tr}.}

\caption{Additional atomic transitions are shown, extracted from X-Shooter VIS and NIR arms. On the left side the $11$ arcsec X-Shooter slit is shown. For the knots A (the star cluster) and B, elements like Mg, Ne, H$\beta$, O and H$\alpha$ are detected in emission. For the transient ({\tt Tr}) only \neiii\ is clearly identified, despite the seeing limited X-Shooter observations, while the other lines are not identified. \neiii\ is marked with a blue ellipse. On the left the HST/F814W image with the X-Shooter slit is shown.}

374 — 2004.08408

\caption{Gallery of six galaxy cluster volumes from \threehundred project at $z=0$. Shown are projected gas distributions within 5$R_{200}$ for a range of mass and ``relaxedness'', a measure for the dynamical state of the central region of a cluster. It is quantified by combining the fraction of mass in sub-halos, the centre-of-mass offset and the virial ratio (see section \ref{subsec:relaxedness} and figure \ref{fig:relaxedness}). The circles indicate $R_{200}$ of each cluster; the mass and relaxedness values of each example are printed in the upper left corner of its panel. Clusters with high relaxedness values are more relaxed.}

\caption{Galaxy cluster mass as a function of relaxedness for 324 clusters in \threehundred simulations. Clusters are divided into unrelaxed ($R<1$) and relaxed ($R>1$) populations (see text for description). High mass clusters are usually unrelaxed; they are dynamically active, e.g., through accreting matter from their surroundings. Low mass clusters show a wide range of $R$. The black error band shows the average of all simulated clusters in our sample, which is neither dynamically active nor relaxed. The diagonal dashed line marks the approximate location of the envelope of the point distribution discussed in the text. The inset shows the histograms of all clusters (light shade), unrelaxed clusters (medium shade) and relaxed clusters (dark shade). Dashed lines in the inset indicate the median values and show the preference for low (high) mass clusters to be relaxed (unrelaxed).}

\caption{Example filament network (yellow lines) of the central node of cluster 0001 from \threehundred project, based on the geometric three-dimensional ridge extractor \disperse\ (see text for details). The figure demonstrates the filament extraction of our reference filament network using smoothed gas particles. Shown is the projected gas distribution at $z = 0$ within a $15\hMpc$ sphere of the central cluster; $R_{200}$ is shown as a white circle.}

\caption{The figure highlights the impact the mass-weighting of halos has on the extraction of filaments. It shows the Delaunay tessellation, used by \disperse\ to identify filaments, in a slice of thickness 75~kpc around the centre of one cluster. Images are equally scaled. Units of the colour bar are arbitrary, but help to compare the two panels. Left: unweighted tessellation: all halos are equally weighted. Right: mass-weighted tessellation: the halos are weighted by their mass. We do this to achieve a closer resemblance to the gas distribution, our reference in this experiment (see Sec.~\ref{subsec:fil_WEAVE} for details).}

\caption{One (random) example cluster of \threehundred Project depicted at four different angles. Each pair shows the cluster in gas particles (left) and \disperse\ filament network with associated mock galaxies (right). The filament network was extracted from the distribution of mock galaxies with $M_*>3\times 10^{9} \Msun$.}

\caption{Percentage of mock galaxies in gas filaments ($D_{\rm{skel}}<0.7 \hMpc$) as a function of radius for 324 clusters from \threehundred project at $z = 0$ (black lines and solid mean), normalised by $R_{200}$. Grey lines show the percentage of random associations to filaments. Lines converge inside $R_{200}$ where filaments are closer together than they are thick. The corrected percentage of galaxies in filaments is plotted in the lower panel. The percentage of galaxies in filaments increases from $\sim13$\% at the edge of the box to $\sim21$\% at $\sim1.5\,R_{200}$.}

\caption{Distribution of galaxies in simulated cluster of \threehundred. }

375 — 2004.08584

\caption{Illustration of the asymptotic tail behaviour (red line) of the correlation curve and its building blocks under an $m=3$ component mixture model: (a) $\rho(y)$, (b) $\beta(y)$, (c) $\sigma^2(y)$, and (d) $p_k^*(y)$. The mixture model has parameters $\left(\sigma_1, \sigma_2, \sigma_3 \right) = \left(2,4,6 \right)$, $\left(\mu_1, \mu_2, \mu_3 \right)= \left(1,2,4 \right)$, $\left( \rho_1, \rho_2, \rho_3 \right) = \left(0.7, 0.8, 0.6 \right)$ and $\left(p_1, p_2, p_3 \right)= (0.3, 0.3, 0.4)$. \label{Fig:asymptotic}}
\end{figure}

Identical correlations in both tails may seem unmotivated for family data. Still, within the data range the correlation curve will be determined by all of the mixture components, in accordance with \eqref{correlationcurve}, which allows for different behaviour in the tails. Case II, on the other hand, allows for different asymptotic correlations in the left and right tails, the difference being the use of $\rho_n$ versus $\rho_m$ in \eqref{eq:asymrho}. Theorem~\ref{asymptotic} is further illustrated in Figure \ref{Fig:asymptotic}, which shows the limiting behaviour of $\beta(y)$, $\sigma^2(y)$, and $\rho(y)$ for a three-component mixture under Case~I. Note that the limiting correlation satisfies $\tilde\rho_3<\min(\rho_1,\rho_2,\rho_3)$ for the parameter values used in the figure. This is counter-intuitive: the posterior probability $p_3^{*}(y)$ approaches~1 in the tails (panel (d)), yet the limiting correlation is not simply $\rho_3$. The peak in correlation around $\mu_2=2$ is reasonable, as the second component has the highest $\rho$.

\subsubsection{The case of equal $\sigma_k$'s}
It is worth studying the special case that $\sigma_1=\sigma_2=\cdots=\sigma_m$, with their common value denoted by $\sigma_0$. This is Case~II of Theorem~\ref{asymptotic} with $q=1$.
From~\eqref{Globalsigma_mean} we get $\sigma^2=\sigma_0^2+\sigma_\mu^2$, where
\begin{equation}
\sigma_\mu^2 = \sum_{k=1}^{m}p_{k}(\mu_{k}-\mu)^{2},
\label{def:sigma2_mu}
\end{equation}
which is the variance due to differences in locations of mixture components. Recall the convention that the mixture components are ordered such that $\mu_1<\mu_2<\cdots<\mu_m$. We are now ready to state the following corollary to Theorem~\ref{asymptotic}.
\begin{corollary}\label{prop1}
When $\sigma_1=\cdots=\sigma_m$ the asymptotic behavior of $\rho(y)$, given by~\eqref{correlationcurve}, is
\begin{equation}
\lim_{y\rightarrow-\infty}\rho(y) =\rho_1\sqrt{\frac{1+\gamma}{1+\gamma\rho_1^2}}
\quad\hbox{and}\quad
\lim_{y\rightarrow\infty}\rho(y) =\rho_m\sqrt{\frac{1+\gamma}{1+\gamma\rho_m^2}},
\end{equation}
where $\gamma=\sigma_\mu^2/\sigma_0^2$ is the ratio of between-component to within-component variance in the Gaussian mixture.
\end{corollary}
The limiting correlations always exceed (in absolute value) $\rho_1$ and $\rho_m$, respectively. When $\gamma\rightarrow\infty$, i.e.~the mixture components get increasingly spread out, both limits approach 1 in absolute value.

\subsection{Estimation}\label{sec:est}
In this section we explain how to fit Gaussian mixtures to family data. On the one hand, they are fully parametric distributions, which can be exploited in estimation and inference. On the other hand, as the number of mixture components $m$ is allowed to grow, mixtures become increasingly flexible, which allows us to view them also as nonparametric tools. In particular, Gaussian mixtures seem well suited to model small perturbations from Gaussianity.

First, let $\boldsymbol{y}=(y_1, y_2, y_3)$ denote the trait vector for the mother-father-child trio, which is assumed to have the following mixture density:
\begin{equation*}
\sum_{k=1}^m p_k \phi_3\!\left(\boldsymbol{y}; \boldsymbol{\mu}_{k}, \boldsymbol\Sigma_{k}\right).
\end{equation*}
Here $\boldsymbol{\mu}_k$ and $\boldsymbol{\Sigma}_{k}$ are structured in the following way:
\begin{eqnarray}
\boldsymbol{\mu}_k = (\mu_k, \mu_k, \mu_k),\quad
\boldsymbol{\Sigma}_{k} =
\begin{pmatrix}
\sigma^2_{k} & \sigma^2_{k}\rho_{k}^{(MF)} & \sigma^2_{k}\rho_{k}^{(MC)}\\
\sigma^2_{k}\rho_{k}^{(MF)} & \sigma^2_{k} & \sigma^2_{k}\rho_{k}^{(FC)}\\
\sigma^2_{k}\rho_{k}^{(MC)} & \sigma^2_{k}\rho_{k}^{(FC)} & \sigma^2_{k}
\end{pmatrix},
\label{eq:3mixture}
\end{eqnarray}
where we use superscripts on the $\rho$'s to denote relationship. Integrating the above joint density with respect to any one of the three family members ($y_1$, $y_2$, or $y_3$) will result in the bivariate Gaussian mixture (\ref{eq:gaussianmixture}) from which we defined the correlation curve. The reason for performing joint estimation, rather than pairwise, is to optimally utilize the information contained in mother-father-child trios. Note that the three marginals are identical by construction, although the joint distribution is not exchangeable unless $\rho_{k}^{(MF)}=\rho_{k}^{(MC)}=\rho_{k}^{(FC)}$ for $k=1,\ldots,m$. Given $n$ such trios, the parameters ($\mu_k$, $\sigma_k$, $\rho_k$, $p_k$) can be estimated by maximizing the following log-likelihood:
\begin{equation}
\log L = \sum_{i=1}^{n} \log \left[ \sum_{k=1}^m p_k \phi_3\!(\boldsymbol{y}_{i}; \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_{k}) \right].
\label{def:loglik_trio}
\end{equation}
\noindent Once the parameters are estimated, the heritability curve $a^2(y)$ can be obtained via the correlation curves as described in Definition~\ref{def:heritability_curve_trios}.

For twins, consider first a monozygotic pair with trait vector $\boldsymbol y = (y_{1}, y_{2})$.
The likelihood contribution from $n^{(MZ)}$ such pairs is:
\begin{equation}
\log L^{(MZ)} = \sum_{i=1}^{n^{(MZ)}}\log \sum_{k=1}^m p_k \phi_2\!(\boldsymbol{y}_{i}; \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_{k}),
\label{def:loglik_twins}
\end{equation}
where $\boldsymbol{\mu}_k$ and $\boldsymbol{\Sigma}_{k}$ are structured as in~\eqref{def:mu_Sigma}. The likelihood contribution of $n^{(DZ)}$ dizygotic twin pairs, $\log L^{(DZ)}$, is defined analogously using the same number $m$ of mixture components. The only parameters that differ between the MZ and DZ cases are the correlation parameters $\rho_k$ in~\eqref{def:mu_Sigma}. The fact that $p_k$, $\mu_k$, and $\sigma_k$ are shared across the MZ and DZ mixtures calls for using a combined log-likelihood $\log L= \log L^{(MZ)}+\log L^{(DZ)}$. Once the parameters are estimated, the heritability curve $a^2(y)$ can be obtained via the correlation curves as described in Definition~\ref{def:heritability_curve_twins}.

Both of the log-likelihoods~\eqref{def:loglik_trio} and~\eqref{def:loglik_twins} will be maximized using the R-package TMB~\citep{kristensen2016tmb}. In TMB the (negative) log-likelihood is implemented as a C++ function, which is compiled and linked into the R session, where the standard function minimizer {\tt nlminb} is employed. In addition, TMB calculates the gradient and Hessian (1st and 2nd order derivatives) of the log-likelihood by Automatic Differentiation~\citep{kristensen2016tmb}. Such derivative information can substantially speed up the minimizer and make it more robust. Finally, TMB uses derivatives to calculate the approximate standard deviation of any quantity of interest, as a function of the parameters, using the delta method. This feature of TMB will be used to estimate pointwise confidence intervals of correlation and heritability curves.
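
As a reminder of what the delta method delivers in this setting, the following generic first-order sketch may be helpful. The symbols $\boldsymbol\theta$, $g$ and $\widehat{\boldsymbol V}$ are notation introduced only for this illustration, and the display describes the textbook approximation rather than the internals of TMB. Let $\boldsymbol\theta$ collect all mixture parameters, let $\hat{\boldsymbol\theta}$ be the maximum likelihood estimate, and let $\widehat{\boldsymbol V}$ be the inverse of the Hessian of the negative log-likelihood evaluated at $\hat{\boldsymbol\theta}$. For a smooth scalar function $g$ of the parameters, for instance $\rho(y)$ or $a^2(y)$ at a fixed $y$,
\begin{equation*}
\widehat{\mathrm{Var}}\!\left\{g(\hat{\boldsymbol\theta})\right\} \approx \nabla g(\hat{\boldsymbol\theta})^{\top}\, \widehat{\boldsymbol V}\, \nabla g(\hat{\boldsymbol\theta}),
\end{equation*}
so that an approximate pointwise 95\% confidence interval is $g(\hat{\boldsymbol\theta}) \pm 1.96\,\bigl[\widehat{\mathrm{Var}}\{g(\hat{\boldsymbol\theta})\}\bigr]^{1/2}$. Repeating this over a grid of $y$ values would give pointwise bands around a curve.
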
To select the number of mixture components, $m$, we calculate both of the criteria $\text{AIC}=-2\log(L)+2Q$ and $\text{BIC}=-2\log(L)+\log(n)Q$ for each candidate model, where $Q$ is the number of parameters and $\log(L)$ is obtained either from~\eqref{def:loglik_trio} or~\eqref{def:loglik_twins}. Contributing to $Q$ is the total number of $p_k$'s, $\mu_k$'s, $\sigma_k$'s, and $\rho_k$'s, but due to the constraint $\sum_{k=1}^{m}p_k=1$ there are only $m-1$ free $p_k$'s. Hence, for the trio likelihood~\eqref{def:loglik_trio} we have $Q=6m-1$, while for the twin likelihood~\eqref{def:loglik_twins}, with different $\rho_k$ for MZ and DZ twins, we have $Q=5m-1$. It is clear that for $\log(n)>2$, BIC will be more conservative than AIC, in the sense of favoring smaller values of $m$. As will be shown below, the correlation curve tends to be more unstable (fluctuating) for larger values of~$m$. For this reason we will use BIC as our model selection criterion, but we will still report AIC as a comparison.

\section{Applications} \label{section:Appl}
\subsection{BMI of twins}
We use the ``twinData'' dataset found in the R-package ``OpenMx'' \citep{neale2016openmx}. As our response, we take BMI measurements (around age 18) for $n^{(MZ)}=534$ monozygotic and $n^{(DZ)}=328$ dizygotic female-female twin pairs. Table~\ref{Table: twin data} compares models in the range $1\leq m\leq 5$, and it is seen that the pure bivariate Gaussian model ($m=1$) fits considerably worse than any of the mixture models ($m>1$). The lowest AIC and BIC values occur for $m=5$ and $m=2$, respectively, but it is seen that AIC is almost indecisive between models with $m>1$. Due to its heavier penalization of the number of parameters, $\log\left(n^{(MZ)}+n^{(DZ)}\right)=\log(862)\approx 6.8$ per parameter versus $2$ for AIC, BIC more clearly favours $m=2$. According to our decision to base model selection on BIC, we choose the model with $m=2$.
\begin{table}[!htp]
\centering
\begin{tabular}{ccrr}
\hline
$m$ & no. of parameters & $\Delta$AIC & $\Delta$BIC \\
\hline
1 & 4 & 259.4 & 227.6 \\
2 & 9 & 8.0 & 0 \\
3 & 14 & 2.8 & 18.5 \\
4 & 19 & 6.5 & 46.0 \\
5 & 24 & 0 & 63.3 \\
\hline
\end{tabular}
\caption{Model comparison for the twin BMI data, where $m$ is the number of mixture components and $5m-1$ is the number of parameters in the model. AIC and BIC values are relative to the best fitting models (respectively, $m=5$ and $m=2$). \label{Table: twin data}}
\end{table}

Table~\ref{Table:estimation2} shows the parameter estimates. The first mixture component dominates, with $p_1=0.81$. For MZ twins there is high correlation ($\rho_k$) within each of the two components, while for DZ twins $\rho_2$ is close to zero. The (global) correlations for the mixtures as a whole exactly match the empirical Pearson correlations, which are $0.78$ (MZ) and $0.30$ (DZ), respectively.
\begin{table}[!htp]
\centering
\begin{tabular}{crrr}
\hline
Parameters & $k=1$ & $k=2$ & Global \\
\hline
$\mu_k$ & 21.20 & 22.20 & 21.39 \\
$\sigma_k$ & 0.63 & 1.26 & 0.88 \\
$\rho^{(MZ)}_k$ & 0.75 & 0.70 & 0.78 \\
$\rho^{(DZ)}_k$ & 0.28 & $-$0.04 & 0.30 \\
$p_k$ & 0.81 & 0.19 & \\
\hline
\end{tabular}
\caption{Parameter estimates for the chosen Gaussian mixture ($m=2$) for the twin data. The mixture components are ordered according to the value of $\sigma_k$. The global quantities, $\mu$, $\sigma$, $\rho^{(MZ)}$ and $\rho^{(DZ)}$ are calculated from~\eqref{Globalsigma_mean}. }
\label{Table:estimation2}
\end{table}
\begin{figure}[!htp]
\centering
\includegraphics{rho_y_twins.pdf}
\caption{Estimated monozygotic (MZ) and dizygotic (DZ) twins correlation curves for the BMI data, with pointwise 95\% confidence intervals (in grey). The dashed lines display the (overall) Pearson correlation within MZ and DZ twin pairs, respectively. The vertical green lines represent the $0.05$ and $0.95$ quantiles of the data.}
\label{fig:correlation_twins}
\end{figure}

Figure~\ref{fig:correlation_twins} displays the estimated correlation curve for both MZ and DZ twins, using the parameter values from Table~\ref{Table:estimation2}. Also shown are 95\% confidence intervals calculated using the delta method. Both correlation curves are fairly flat within the central 90\% data range (represented by the two vertical green bars), while they both drop for low and high BMI. This yields (Figure~\ref{fig:herandenv_twins}) an estimated heritability curve $a^2(y)$ that does not differ significantly (except maybe around $y=22.3$) from the classical heritability coefficient~\eqref{eq:ADE_moment}. The TMB (R and C++) code used to produce the parameter estimates in Table~\ref{Table:estimation2} and the plots in Figure~\ref{fig:herandenv_twins} is available from \url{https://github.com/skaug/Supplementary}.
\begin{figure}[ht]
\centering
\includegraphics{curves_plot_twins.pdf}
\caption{Estimated dominant genetic component $d^2(y)$, heritability curve $a^2(y)$, and environment curve $c^2(y)$ for the BMI data under the ADE model (Definition~\ref{def:heritability_curve_twins}), with pointwise 95\% confidence intervals (in grey). The red dashed lines display the classical estimates of dominant component, heritability, and environment, given by~\eqref{eq:ADE_moment}. The vertical green lines represent the $0.05$ and $0.95$ quantiles of the data.}
\label{fig:herandenv_twins}
\end{figure}

\subsection{Birth weight of family trios}\label{subsec:BW}
To illustrate the family trio analyses, we used birth weights of $n=81{,}144$ complete mother--father--child trios. The data were originally derived from the Medical Birth Registry of Norway; random noise was added to the birth weight variables, which were then rounded off, to guarantee anonymity. The same data with some additional restrictions on parity, plurality, etc.\ were previously described and analyzed elsewhere~\citep{Magnus01}. The data were restricted to all births (mother, father, and child) taking place within the years 1967--1998. Due to Norwegian ethical and legal restrictions, Norwegian data used in this study are available upon request to the Medical Birth Registry of Norway, the Norwegian Institute of Public Health. URL: \url{https://www.fhi.no/hn/helseregistre-og-registre/mfr}. Requests for data access can be directed to Datatilgang@fhi.no.

We did not have information about the gender of the child; hence, we performed a standardization of the data. We assumed a $50\%$ sex ratio in the offspring, and introduced the quantity $D \triangleq \frac{1}{2} \left(\bar{y}_M - \bar{y}_F \right)$, where $\bar{y}_M$ is the mean of the birth weights of mothers, and $\bar{y}_F$ is the mean of the birth weights of fathers. We then added $D$ to the father's weight and subtracted it from the mother's weight; in this way, the average among mothers and fathers is the same, namely $\frac{1}{2}\left(\bar{y}_M + \bar{y}_F\right)$, and close (25\,g deviation) to the average in the offspring. This standardization is of little consequence to the end result.

Figure~\ref{fig:scatmat} summarizes the marginal and bivariate properties of the data. The marginal distributions are close to a Gaussian shape, but the left tail of the child birth weights is slightly heavier than the right tail. As suggested in the Introduction, this may be indicative of strong but rare factors dominating in producing the lowest birth weights, which is what we will confirm in our analyses of local heritability below. The scatter plots are roughly symmetric around the identity line, which is consistent with the exchangeability assumption made in Section~\ref{section:corr}.
It should be noted, however, that the left hand tail of the marginal distributions is somewhat heavier in the children than in the parents; this is likely because parents are selected by the fact that they have children; it is known that individuals born with low birth weight have somewhat reduced fertility later in life. We have, however, not taken this into consideration in our model. From the non-parametric regression (blue curve), it is clear that there is no association between mother and father, which is reflected in the low Pearson correlation of $0.0209$. For the two relationships involving the child, the non-parametric regression curve indicates a non-linear relationship, particularly for mother-child. For birth weights less than 3000\,g there seems to be a low association, while for larger birth weights the association is increasing.

The Gaussian mixture (\ref{eq:gaussianmixture}) was fit by maximum likelihood for $m=1,\ldots,7$. We computed both AIC and BIC values for each model. According to the BIC criterion, the best fitting mixture has $m=4$ components (see Table~\ref{Table:trios_data}). Parameter estimates for this model are given in Table~\ref{Table:estimation4}. Figure \ref{fig:ellipses} shows the underlying mother-child pairs, overlaid by the four mixture components.
\begin{figure}[!htp]
\centering
\includegraphics{Ellipses_m4_ACE.pdf}
\caption{Birth weight (gram) of a random subset of $5000$ mother-child pairs taken from Figure~\protect\ref{fig:scatmat}. Also shown are 95\% level curves (ellipses) for each of the $m=4$ mixture components in Table~\protect\ref{Table:estimation4}, i.e.~each ellipse includes 95\% of the probability mass for that bivariate normal component. }
\label{fig:ellipses}
\end{figure}
\begin{table}[ht]
\centering
\begin{tabular}{c c c c }
\hline
$m$ & no. parameters & $\Delta$ AIC & $\Delta$ BIC \\
\hline
1 & 5 & 14848 & 14749 \\
2 & 11 & 1148 & 904.4 \\
3 & 17 & 480.4 & 292.5 \\
4 & 23 & 132.1 & 0 \\
5 & 29 & 109.7 & 33.5 \\
6 & 35 & 36.3 & 16.0 \\
7 & 41 & 0 & 35.5 \\
\hline
\end{tabular}
\caption{Model comparison for family trios, where $m$ is the number of mixture components. The total number of (free) parameters is $6m-1$, counting all $p_k$, $\mu_k$, $\sigma_k$, $\rho_k^{(MC)}$, $\rho_k^{(FC)}$ and $\rho_k^{(MF)}$. AIC and BIC values are relative to the lowest one, represented in red. \label{Table:trios_data}}
\end{table}

The mother-child distribution is pear-shaped relative to a bivariate normal distribution, with more spread around the identity line ($y_1=y_2$) for small birth weights. The mixture model adapts to this shape by assigning negative $\rho_k$'s to its two components ($k=3,4$) with the smallest $\mu_k$. The remaining two components ($k=1,2$), which together constitute 87\% of the probability mass, form a bivariate distribution that is hard to distinguish visually from a Gaussian distribution. The estimates of global correlation for the mixture in~Table~\ref{Table:estimation4} closely match the corresponding empirical Pearson correlations given in Figure~\ref{fig:scatmat} for MC, FC and MF pairs.
%The marginal density~\eqref{eq:marginal_mixture} (not shown;xxx should we add it to fig?), which is the same for mother, father, and child, are shown in Figure~\ref{fig:scatmat}. It is seen to fit the empirical marginals fairly well, and to posses a heavier left hand tail.
%characteristic of these data.
\begin{table}[htp]
\centering
\begin{tabular}{crrrrr}
\hline
Parameters & $k=1$ & $k=2$ & $k=3$ & $k=4$ & Global \\
\hline
$\mu_k$ & 3516 & 3687 & 3093 & 2243 & 3493 \\
$\sigma_k$ & 440.5 & 572.9 & 690.5 & 1116 & 555.0 \\
$\rho^{(MC)}_k$ & 0.240 & 0.143 & $-$0.189 & $-$0.826 & 0.123 \\
$\rho^{(FC)}_k$ & 0.134 & 0.053 & $-$0.254 & $-$0.845 & 0.201 \\
$\rho^{(MF)}_k$ & $-$0.011 & $-$0.084 & $-$0.289 & 0.750 & 0.068 \\
$p_k$ & 0.636 & 0.231 & 0.126 & 0.007 & \\
\hline
\end{tabular}
\caption{Parameter estimates and standard deviations for the Gaussian mixture ($m=4$) fit to the mother--father--child trios. The mixture components are ordered according to the value of $\sigma_k$. The global quantities, $\mu$, $\sigma$, $\rho^{(MC)}$, $\rho^{(FC)}$ and $\rho^{(MF)}$ are calculated from~\eqref{Globalsigma_mean}. }
\label{Table:estimation4}
\end{table}

Figure \ref{fig:correlations} shows the two estimated correlation curves $\rho^{(FC)}(y)$ and $\rho^{(MC)}(y)$, which are the components going into $a^2(y)$, $c^2(y)$, and $e^2(y)$, given respectively by \eqref{eq:a(y)_triplets}--\eqref{eq:tm(y)_triplets}. Overall, the Pearson correlation and the correlation curves for MC exceed those for FC. Both curves exceed their respective Pearson correlations in the center of the data, while they decrease for both low and high birth weights. The FC curve has its maximum somewhat to the left of the maximum of the MC curve. As a robustness check, we also computed the local Gaussian correlations~\citep{Tjostheim2013} between mother and child, as displayed in Figure \ref{fig:locgauss}. These exhibit the same behaviour as the correlation curve: large values in the center of the data that decrease towards both tails. Figure \ref{fig:herandenv} shows heritability and environment curves.
The overall conclusion is that variation in birth weight is mostly attributable to environment, which was also seen in previous publications~\citep{Magnus01,lunde_genetic_2007,Gjessing08}, and is reflected in the classical measures of heritability $a^2=0.246$ and environment $c^2=0.754$, and the variation in the corresponding curves.
\begin{figure}[H]
\centering
\includegraphics{rho_y_m4}
\caption{Estimated mother-child (MC) and father-child (FC) correlation curves for the Norwegian Birth Registry data, with pointwise 95\% confidence intervals (in grey). The dashed lines display the (overall) Pearson correlation within MC and FC pairs, respectively. The vertical green lines represent the $0.05$ and $0.95$ quantiles of the data.}
\label{fig:correlations}
\end{figure}
\begin{figure}[H]
\centering
\includegraphics[scale=1]{mcxy300.pdf}
\caption{Estimated local Gaussian correlation between mother and child. Note that this correlation measure has two location arguments ($y_1$ and $y_2$). }
\label{fig:locgauss}
\end{figure}

Recall that, under the assumed model \eqref{eq:ACE_MFC}, the heritability curve $a^2(y)$ is completely determined by the FC correlation curve $\rho^{(FC)}(y)$. Since the FC correlation curve exceeds the Pearson FC correlation in the center of the data, the heritability curve also exceeds the classical heritability measure in the same region.
\begin{figure}[ht]
\centering
\includegraphics{curves_plot.pdf}
\caption{Estimated heritability curve $a^2(y)$, environment curve $c^2(y)$, and residual environment $e^2(y)$ for the Norwegian Birth Registry data under the ACE model (Definition~\ref{def:heritability_curve_trios}), with pointwise 95\% confidence intervals (in grey). The red dashed lines display the classical estimates of heritability and environment, i.e.~empirical versions of~\eqref{eq:ACE_MFC}. The vertical green lines represent the $0.05$ and $0.95$ quantiles of the data.}
\label{fig:herandenv}
\end{figure}

\section{Discussion and conclusion}
We have provided closed-form expressions for the correlation curve for exchangeable bivariate Gaussian mixtures. To our knowledge, this result is new and should be useful generally in situations where exchangeability can be assumed. Since differences in mean values may be accounted for using a linear predictor like~\eqref{eq:multmean}, it is only exchangeability of the residuals, or the weaker condition~\eqref{def:exchangability}, that is required.

In the context of our family data, the exchangeability assumption is rather reasonable for twin data. In nuclear families, it is less obvious that parents and children have the exact same marginal distribution even when using covariates to adjust for systematic generational differences. With our generational birth weight data, we observe that the left hand tail in the parental distribution is smaller than among the children. As discussed in Subsection~\ref{subsec:BW}, this may well be a selection phenomenon; somebody born with a very low birth weight is less likely to become a parent, and is thus possibly under-represented in our data file. For instance, increased mortality among the smallest newborns is thought to lead to a selection pressure on the birth weight distribution over generations~\citep{in_cavalli-sforza_genetics_1999}.

A restriction of our model is that we have applied it only in situations with simple family structures where moment estimators of the heritability are explicit. In larger family structures, several pairwise relationships may provide information about the same heritability parameters. For instance, family trios with sibling data add the sibling correlation as a source of information~\citep{lunde_genetic_2007}.
We will not discuss that issue further, but note that if pairwise correlation curves are estimated from larger data structures, weighted least squares estimation may provide a way of combining them into a common estimate of heritability curves~\citep{Gjessing08}. In our twin BMI example, we chose the ADE model for the estimation since for the estimated overall correlations, $\rho^{(MZ)}>2\rho^{(DZ)}$. However, as seen in Figure~\ref{fig:herandenv_twins}, there are values of $y$ (the BMI) where the estimated $d^2(y)$ drops below zero. This indicates that in this region, the ACE model might be more appropriate. Note that there is no difficulty in letting the local heritability curves switch from an ADE model to an ACE model locally. In particular, we see that when $\rho^{(MZ)}=2\rho^{(DZ)}$, both (\ref{eq:Falconer}) and (\ref{eq:ADE_moment}) provide the same estimates for $a^2$ and $e^2$, and both $c^2$ and $d^2$ are estimated as zero. The estimated heritability curves would thus still be continuous if switching from one model to another. The choice of Gaussian mixtures was made due to their flexibility, in the spirit of non-parametric estimation. Our approach is pragmatic in the sense that we have not attempted to interpret individual mixture components as sub-populations. One reason for this is the negative estimates for some of the $\rho_k$ seen in both Table~\ref{Table:estimation2} and~\ref{Table:estimation4}, which would be hard to interpret biologically. On the other hand, Gaussian mixtures are fully parametric models, which allows us to use the standard parametric toolbox. For instance, covariates can easily enter the mean, as in~\eqref{eq:multmean}, and it would also be straightforward to formulate a model in which the $\sigma_k$ were affected by family-level covariates. 
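The continuity of the heritability curves under a switch between the ADE and ACE models, noted above, can be checked directly. The following is only a sketch: it assumes that (\ref{eq:Falconer}) and (\ref{eq:ADE_moment}) take the standard moment forms written below, which should be compared with the definitions given earlier in the paper.

```latex
\begin{align*}
\text{ACE:}\quad a^2 &= 2\left(\rho^{(MZ)}-\rho^{(DZ)}\right), &
c^2 &= 2\rho^{(DZ)}-\rho^{(MZ)}, &
e^2 &= 1-\rho^{(MZ)},\\
\text{ADE:}\quad a^2 &= 4\rho^{(DZ)}-\rho^{(MZ)}, &
d^2 &= 2\rho^{(MZ)}-4\rho^{(DZ)}, &
e^2 &= 1-\rho^{(MZ)}.
\end{align*}
% Boundary case rho^(MZ) = 2 rho^(DZ): both models give the same a^2 and e^2.
\[
\rho^{(MZ)}=2\rho^{(DZ)}
\quad\Longrightarrow\quad
a^2=2\rho^{(DZ)}=\rho^{(MZ)},\qquad c^2=d^2=0,\qquad e^2=1-\rho^{(MZ)} .
\]
```

Under these assumed forms, an illustrative pair $\rho^{(MZ)}=0.6$, $\rho^{(DZ)}=0.3$ gives $a^2=0.6$ and $e^2=0.4$ in both models, so the curves join without a jump at the point where the model is switched.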
%In our analysis of the mother-father-child trios we could have replaced the preprocessing step, involving the quantity $D$, by a model with different means for males ($\mu^{(M)}$) and females ($\mu^{(F)}$), so that the joint mother-father-child mean vector would be $\left(\mu^{(F)},\mu^{(M)},\frac12(\mu^{(M)}+\mu^{(F)})\right)$.
A further benefit of having a parametric model is that we can select model complexity ($m$) based on standard AIC or BIC criteria. The parametric structure is also the basis for the results about the tail behaviour of the correlation curve in~Theorem~\ref{asymptotic}. While the center of the distribution may have sufficient data to allow stable non-parametric estimation of the heritability, the estimates in the tails are more dependent on the model structure. This is both a strength and a weakness of the mixture model. The heritability curves converge to constant values in the tails, which makes the estimates more stable; on the other hand, those estimates depend on the dominant mixture components in the tails, and the number and placement of mixture components may not always be clear cut.
%In Section~\ref{sec:asym}, we studied the asymptotic behaviour of the correlation curve. We did not write explicitly the computations for the heritability curve, but what follows from Theorem~\ref{asymptotic} is that it has a finite asymptote, given by $4\tilde{\rho}_K^{(DZ)}-\tilde{\rho}_K^{(MZ)}$ for twins and $\tilde{\rho}_K^{(FC)}$ for family trios, where $\tilde{\rho}_K$ is defined in equation~\eqref{eq:asymrho}.
%A finite asymptote is in general preferable to an infinite one, since we have more control on the tail behaviour. At the same time, as we see in Figure~\ref{Fig:asymptotic}, the curves tend to stabilize on the asymptote relatively fast. This tendency must be taken into consideration while studying the estimated heritability curves, especially while examining the tail behaviour.
There are also well known problems with Gaussian mixtures. 
Among these are local maxima on the likelihood surface~\citep{baudry2015mixtures}, which can be explored by using different initial values for the numerical optimization. We avoided the classical ``label switching'' problem by constraining the parameters of the mixture ($\sigma$'s and $\mu$'s), but have nevertheless observed some sensitivity of the parameter estimates in Table~\ref{Table:estimation4}. Although we cannot guarantee that we have found the global optimum of the likelihood surface, the choice of model complexity ($m$) seems to be robust to the choice of initial values. Similarly, the shape of the correlation curves (and consequently of the heritability and environment curves) is quite stable. A related problem is the singularity of the Fisher information matrix, which can occur for mixture models~\citep{drton2017bayesian}. This could potentially affect the validity of the AIC and BIC criteria, as well as the standard deviations based on the observed Fisher information that have been used throughout this paper. Such standard deviations are produced automatically by TMB, and are very convenient in an exploratory phase, but we recommend that they be validated by simulation (parametric bootstrap). \section{Acknowledgements} This research was supported by Research Council of Norway grant 225912/F50 ``Health Registries for Research'' and the Centres of Excellence funding scheme (Grant 262700). \clearpage \bibliographystyle{apalike} \bibliography{heritability_curves} \clearpage \appendix \section{Proofs \label{app1}} \begin{proof} [Proof of Proposition \ref{derivative}] Let $g(y)$, $g_{k}(y)$, $p_{k}^{*}(y)$ etc.~be defined as in Section~\ref{section:Gauss}. First, note that \[ \frac{g_{k}'(y)}{g_{k}(y)}=d_{k}(y). \] Furthermore, define \[ d(y):=\sum_{i=1}^{m}p_{i}^{*}(y)d_{i}(y), \] i.e.~the weighted average of the $d_{i}(y)$'s. Then \[ \frac{g'(y)}{g(y)}=\frac{\sum_{i=1}^{m}d_{i}(y)g_{i}(y)}{g(y)}=d(y). 
\] For any fraction $s(y)=a(y)/b(y)$ of differentiable functions, note that the quotient rule can be written as $\frac{s'(y)}{s(y)}=\frac{a'(y)}{a(y)}-\frac{b'(y)}{b(y)}$. Thus, \[ \frac{p_{k}^{*'}(y)}{p_{k}^{*}(y)}=\frac{g_{k}'(y)}{g_{k}(y)}-\frac{g'(y)}{g(y)}=d_{k}(y)-d(y). \] Recall from (\ref{expectation}) that $\mu(y)=\E\left[Y_{1}\mid Y_{2}=y\right]=\sum_{i=1}^{m}p_{i}^{*}(y)\mu_{i}(y)$ is the conditional expectation. Differentiating term by term, with $\mu_{i}'(y)=\rho_{i}$, gives \begin{align*} \beta(y)=\mu'(y) & =\sum_{i=1}^{m}\left(p_{i}^{*}(y)\mu_{i}'(y)+p_{i}^{*'}(y)\mu_{i}(y)\right)\\& =\sum_{i=1}^{m}p_{i}^{*}(y)\left(\rho_{i}+\mu_{i}(y)\left(d_{i}(y)-d(y)\right)\right)\\& =\sum_{i=1}^{m}p_{i}^{*}(y)\left(\rho_{i}+\left(\mu_{i}(y)-\mu(y)\right)\left(d_{i}(y)-d(y)\right)\right)\\& =\sum_{i=1}^{m}p_{i}^{*}(y)\left(\rho_{i}+\left(\mu_{i}(y)-\mu(y)\right)d_{i}(y)\right), \end{align*} where the third and fourth equalities make use of $\sum_{i=1}^{m}p_{i}^{*}(y)\left(d_{i}(y)-d(y)\right)=0$ and $\sum_{i=1}^{m}p_{i}^{*}(y)\left(\mu_{i}(y)-\mu(y)\right)=0$, respectively. \end{proof} \subsubsection{Proof of Theorem \ref{asymptotic} - asymptotic behavior of $\beta(y)$, $\sigma^2(y)$, and $\rho(y)$} \label{app:beta} For two functions $a(y)$ and $b(y)$, as $y\to\infty$ (or $-\infty$), we use the standard notation that $a(y)\sim b(y)$ means $\lim_{y\to\infty}a(y)/b(y)=1$, and $a(y)\ll b(y)$ means $\lim_{y\to\infty}a(y)/b(y)=0$. Our proofs below follow mostly from standard theory on asymptotic behavior of real functions~\citep{bender_advanced_2013}. \paragraph{Asymptotic behavior of mixture components} For one mixture component $g_{k}(y)$, the asymptotic behavior when $y\to\pm\infty$ is \[ g_{k}(y)\sim C_{k}\exp\left(\frac{\mu_{k}}{\sigma_{k}^{2}}y-\frac{1}{2\sigma_{k}^{2}}y^{2}\right), \] for a constant $C_{k}$. Comparing two components $g_{k}(y)$ and $g_{l}(y)$ with $\sigma_{k}^{2}<\sigma_{l}^{2}$, we clearly have \begin{equation} g_{k}(y)\ll g_{l}(y)\quad\text{as}\quad y\to\pm\infty\label{eq:pm} \end{equation} since the $y^{2}$-term dominates the asymptotics. 
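The ordering in (\ref{eq:pm}) can be made explicit by writing out the ratio of two components. The display below is a sketch based on the exponential form above; the symbol $b_{k}$, denoting the coefficient of $y$ in the exponent of $g_{k}(y)$, is notation introduced here only for illustration.

```latex
\[
\frac{g_{k}(y)}{g_{l}(y)}\sim\frac{C_{k}}{C_{l}}
\exp\left(\left(b_{k}-b_{l}\right)y
-\frac{1}{2}\left(\frac{1}{\sigma_{k}^{2}}-\frac{1}{\sigma_{l}^{2}}\right)y^{2}\right).
\]
% If sigma_k^2 < sigma_l^2, the coefficient of y^2 in the exponent is strictly negative.
```

When $\sigma_{k}^{2}<\sigma_{l}^{2}$, the factor $\frac{1}{\sigma_{k}^{2}}-\frac{1}{\sigma_{l}^{2}}$ is positive, so the exponent tends to $-\infty$ and the ratio tends to zero in both tails, whatever the value of $b_{k}-b_{l}$. When the variances are equal, the quadratic term cancels and only the sign of $\left(b_{k}-b_{l}\right)y$ decides the ordering.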
If $\sigma_{k}^{2}=\sigma_{l}^{2}$, assume that $\mu_{k}<\mu_{l}$. Then \begin{equation} g_{k}(y)\ll g_{l}(y)\quad\text{as}\quad y\to+\infty,\label{eq:p} \end{equation} and \begin{equation} g_{l}(y)\ll g_{k}(y)\quad\text{as}\quad y\to-\infty.\label{eq:m} \end{equation} Let $a_{k}(y)$ be non-zero polynomial functions in $y$ for $k=1,\ldots,m$. Since polynomials are asymptotically dominated by exponentials of polynomials, the products $g_{k}(y)a_{k}(y)$ are asymptotically ordered in the same way as in (\ref{eq:pm}), (\ref{eq:p}), and (\ref{eq:m}) above. \paragraph{Asymptotic behavior of mixtures} Recall the definition of $K$ in Theorem~\ref{asymptotic}. The results above apply directly to the sum $\sum_{k=1}^{m}g_{k}(y)a_{k}(y)$, which will asymptotically follow the dominant term with $k=K$. That is, \[ \sum_{k=1}^{m}g_{k}(y)a_{k}(y)\sim g_{K}(y)a_{K}(y). \] In particular, for the full density we get \[ g(y)=\sum_{i=1}^{m}g_{i}(y)\sim g_{K}(y). \] Similarly, if $k\neq K$, \begin{equation} p_{k}^{*}(y)a_{k}(y)=\frac{g_{k}(y)a_{k}(y)}{g(y)}\to0,\label{eq:k.neq.K} \end{equation} and \[ p_{K}^{*}(y)a_{K}(y)\sim a_{K}(y). \] \paragraph{Conditional mean $\mu(y)$} Applying the above results to $\mu$, we obtain \[ \mu(y)=\sum_{k=1}^{m}p_{k}^{*}(y)\mu_{k}(y)\sim\mu_{K}(y)\sim\rho_{K}\cdot y. \] Furthermore, letting $a_{k}(y)=\rho_{k}+\left(\mu_{k}(y)-\mu(y)\right)d_{k}(y)$, we get \[ \beta(y)=\sum_{k=1}^{m}p_{k}^{*}(y)a_{k}(y)\sim a_{K}(y). \] However, by~\eqref{eq:k.neq.K}, \[ \left(\mu_{K}(y)-\mu(y)\right)d_{K}(y) =\sum_{k=1}^{m}p_{k}^{*}(y)(\mu_{K}(y)-\mu_{k}(y))d_{K}(y)\to 0 \] since the $K$'th term vanishes. It follows that \[ \beta(y)\sim a_{K}(y)\to\rho_{K}. \] \paragraph{Conditional variance $\sigma^{2}(y)$} For the conditional variance, \begin{align*} \sigma^{2}(y) & =\sum_{k=1}^{m}p_{k}^{*}(y)\left[\sigma_{k}^{2}(1-\rho_{k}^{2})+\left[\mu_{k}(y)-\mu(y)\right]^{2}\right]\\& \sim\sigma_{K}^{2}(1-\rho_{K}^{2}). 
\end{align*} \paragraph{Correlation curve $\rho(y)$} Finally, the result for the correlation curve $\rho(y)$ follows directly from the results for $\sigma^{2}(y)$ and $\beta(y)$. \end{document} }

376 — 2004.08787

\caption{AD-Cluster alternately trains an image generator and a feature encoder, which respectively {\color{red}Max}imizes intra-cluster distance (\textit{i.e.}, increase the diversity of sample space) and {\color{red}Min}imizes intra-cluster distance in feature space (\textit{i.e.}, decrease the distance in new feature space). It enforces the discrimination ability of re-ID models in an adversarial min-max manner. (Best viewed in color)}

377 — 2004.09015

\caption{Examples from Python API documentation and pre-processed code snippets, including class constructors, methods, and top-level functions. We use \textcolor{red}{red}, \textcolor{blue}{blue}, and \textcolor{green}{green} to denote required, optional positional, and optional keyword arguments respectively.}

\caption{Examples, where \refmark is the ground-truth code snippet, \xmark is the original output, and \cmark is the output with our proposed methods. Correct and erroneous function calls are marked in {\color{deepblue}blue} and {\color{deepred}red} respectively.}

378 — 2004.09269

\caption{Nusselt number as a function of the normalised positions on the plane. \textcolor{red}{$\circ$}~:~Experimental data points. \protect\rule[0.5ex]{0.25cm}{0.4mm}\hspace{0.25cm}\protect\rule[0.5ex]{0.25cm}{0.4mm} : Theoretical curve derived from the theoretical water temperature field; \protect\rule[0.5ex]{0.1cm}{0.2mm}\hspace{0.1cm}\protect\rule[0.5ex]{0.1cm}{0.2mm}\hspace{0.1cm}\protect\rule[0.5ex]{0.1cm}{0.2mm} : Asymptotic value of $Nu$ found for a rectangular duct of aspect ratio 1/20; \protect\rule[0.5ex]{0.5mm}{0.2mm}\hspace{0.5mm}\protect\rule[0.5ex]{0.5mm}{0.2mm}\hspace{0.5mm}\protect\rule[0.5ex]{0.5mm}{0.2mm}\hspace{0.5mm}\protect\rule[0.5ex]{0.5mm}{0.2mm}\hspace{0.5mm}\protect\rule[0.5ex]{0.5mm}{0.2mm} : Asymptotic value of $Nu$ found for a rectangular duct of aspect ratio 1/10. }

379 — 2004.09401

\caption{Additional \aastex\symbols}

380 — 2004.09594

\caption{\label{fig:DWScorr} Diffusive light scattering from sedimented emulsion. (A) 7.2\,$\mu$m diameter droplets are sealed in a thermostatted glass tube, where they settle and consolidate for several months. The vertical position of the tube is adjusted so that the laser illuminates the sediment at a specific vertical distance below the top, $d$. A camera and optical fiber collect cross-polarized, backscattered light. (B) Scattered light intensity autocorrelations, $g_2(\tau)$-1, measured at distances $d$\,= 0.5\,cm (\textcolor[RGB]{213, 94, 0}{{\raisebox{0.1em}{\fontsize{8}{12}\selectfont $\bigodot$}}}), 1.3\,cm (\textcolor[RGB]{230,159,0}{$\circledcirc$}), 4.0\,cm (\textcolor[RGB]{0,158,115}{{\raisebox{-0.2em}{\fontsize{20}{22}\selectfont $\circ$}}}), 4.9\,cm (\textcolor[RGB]{86,180,233}{{\raisebox{0.1em}{\fontsize{8}{12}\selectfont $\bigoplus$}}}), and 8.5\,cm (\textcolor[RGB]{0,114,178}{{\raisebox{0.1em}{\fontsize{8}{12}\selectfont $\bigotimes$}}}) below the top of a sedimented emulsion held at 31.5$^{\circ}$C show clear separation between solid-like and fluid-like behaviors. (Inset) Droplets closer to the bottom of the sediment reach a stable plateau MSD, while droplets closer to the top are slowed by the crowding of their neighbors but continue to move.}

381 — 2004.09619

\caption{Solutions for DAX NVM \redundancy and their trade-offs.}

\caption{Throughput for a PMDK key-value store when using three \redundancy options, as a function of the number of threads performing PMDK's insert-only benchmark workload. (Details in \cref{sec:kvs}; RBtree results shown here.)
%Each line shows throughput
%Multi-threaded red-black tree insert-only workload.
}

382 — 2004.09713

\caption{Disassembly of \texttt{main} before \& after increasing \texttt{digest} and \texttt{hexdigest} buffers by 16 and 32 bytes, respectively. Lines containing rewritten instructions are highlighted in \textcolor{green}{green} and changes are in \textcolor{red}{red}.
%Debugging information is {\it NOT} used by \acron but only here for the sake of presentation.
}

383 — 2004.09715

\caption{(Best viewed in color) List of companies in each rank-shift profile category. \textcolor{green}{Green}, \textcolor{red}{red} and \textcolor{blue}{blue} color represent buckets I, II and III respectively. }

384 — 2004.09822

\caption{(color online) Spherical coordinates decomposition of the Néel vector (upper panel). Angular dynamics {\color{ao(english)} $\theta$(t)} and {\color{blue} $\phi$(t)}, of the Néel vector of the \NiO antiferromagnetic relaxation, starting from a tiny tilt away from equilibrium (middle panel). Fourier transform of the angular dynamics, revealing resonances at $1\THz$ and $0.2\THz$ at low damping $\alpha=2.1\times 10^{-4}$. For practical spintronic devices, $\alpha\approx 5\times 10^{-3}$ is expected and also computed, causing the resonance peaks to flatten and shift (lower panel).\label{fig:FFT}}

385 — 2004.09853

\caption{Top 3 distractors from different rankers running with Probase CSG (- denotes sole Probase CSG) given the stem ``The main source of energy for your body is \underline{\hbox to8mm{}}.'' and the key ``{\color{blue}carbohydrate}''. Red colored distractors are the ground truth, bold distractors are unreliable distractors.}

386 — 2004.09974

\caption{\label{example-table} An example from a novel called ``Fights Break Sphere''. The relations between \textcolor{y}{Yanxiao}, \textcolor{x}{Xuner} and \textcolor{n}{Nalanyanran} are evolutionary. And the characteristic of \textcolor{y}{Yanxiao} changes over time.}

\caption{Relation Network with a reconstruction loss. The edge embedding is shown in \textcolor{red}{red color}.}

\caption{\label{case-table}Comments generated by Trans.+CTX~(\textbf{T}), Graph2Seq++~(\textbf{G}) and our EKG+GAT(V+E)~(\textbf{E}). The passages (i.e., P1, P2, P3, P4) are extracted from the same novel called \emph{Zeng Guofan}. We highlight the passage corresponding to the generated comment from our model~\textbf{E} with \textcolor{blue}{blue color}. Moreover, the relevant fragments are marked with the same color.}

387 — 2004.09984

\caption{Some generated adversarial samples. The original label is the correct prediction, while the \textcolor{red}{label} in red is the adversarial prediction. Only the parts in red are perturbed. We only attack premises in the MNLI task. Text in the FAKE and IMDB datasets is cut to fit in the table; the original text contains more than 200 words. }

388 — 2004.10166

\caption{\textbf{How informative are line representations?} We set up three categories of synthetic \solidity programs containing 50 programs each. Categories \modd and \nomod modify unique programs in category \base in a controlled and specific manner (details in Section \ref{resultsline}). We compare the representations of specific lines of interest in programs from each of these categories as computed by a trained \vulcan. We compare the average $L^2$-distances of these representations among the three categories (right). Larger values indicate the representations are farther apart.}

389 — 2004.10376

\caption{We frame the SEIR modeling as a recurrent neural network (RNN) architecture, introducing the SEIR-cell which encodes susceptible, exposed, infected, and recovered proportions in its hidden states. Open sourced on \textit{GitHub} -- download \textcolor{blue}{\href{https://github.com/Nu-AI/Livid-About-COVID}{here}}.}

390 — 2004.10396

\caption{Predicted epidemic time series. The upper panel is daily new infected individuals (i.e. $-\Delta (S(t)+E(t))$), lower panel is total number infected ($S(0)-S(t)$). For each network configuration, results show mean and distribution of $100$ simulations over $240$ days. In black $A=B(4)$ for all time. In other simulations \red{$A=B(4)$} for all $t$ until $I(t)>150$, otherwise $A=L(s)$ with values of $s$ from $0$ to $1$. The shaded envelopes are $90\%$ confidence intervals. To compute the fraction of population compliant with social isolation measures $d$ we compute $d=P({\rm no\ rewired\ links})=(1-s)^k$ (here, the number of neighbours $k=4$). \red{Epidemic parameters follow the values established for our later simulations in Table \ref{paramtable}. For the purposes of this plot, we vary only $s$ --- \red{the rewiring probability from $s=0$ to $s=0.15$}.} }

\caption{\red{Model flow chart. A graphical representation of the model state transition process. Each node can be in one of four states $S$, $E$, $I$, or $R$ with transition between them determined by probabilities $p$, $q$ and $r$ and the contact process of elements $a_{ij}$ of the network adjacency matrix $A$. Hence node-$i$ has probability $pa_{ij}$ of being infected through contact with node-$j$.}}

\caption{Epidemic simulation parameters. The simulation size $N$ is chosen to be a square number to make the construction of $L(s)$ simpler. \red{The latency period of $q=\frac{1}{7}$ is comparable to observation; the other parameters are derived from the values used in \cite{fS20,fM20} for Australian populations. These parameter values ensure growth in infection for $t<t^*$ but barely endemic otherwise (for $A \neq B$). That is, these parameters are selected to match the observed data for our principal region of interest. Subsequent parameter sensitivity computation will indicate that variation of these parameters does} not change the qualitative features, only the scale of the observed simulations.}

\caption{Parameter sensitivity. The \red{four surfaces} explore the expected total number of infections (population $N=1450^2$) for various parameter values $p$ and $r$ (for $t>t^*$) and different control strategies (i.e. $L(s)$ for different $s$). The four surfaces depicted here correspond to (a) $s=0.0025$; (b) $s=0.026$; (c) $s=0.054$; (d) $s=0.065$ \red{(that is, $99\%$, $90\%$, $80\%$ and $70\%$ observance of physical distancing measures).} The three coordinates are (x) $r$; (y) $p$; and (z) $\log(\max_t (S(0)-S(t)))$ (the logarithm base-10 of the total number of infections). In each case we computed $80$ simulations of $300$ days. Other parameters are as reported in Table \ref{paramtable}. Surfaces (a) and (b) exhibit linear scaling with changing parameter values $p(t>t^*)$ and $r(t>t^*)$, while for (c) and (d) that growth is exponential. That is, when compliance with isolation measures drops below $90\%$ there is an explosive growth in the level of infection with $p(t>t^*)$ and $r(t>t^*)$. }

\caption{Parameter sensitivity. The three panels explore the expected total number of infections (population $N=1450^2$) for various parameter values $p(t>t^*)$ and $r(t>t^*)$ (i.e. $p$ and $r$ for $t>t^*$) and different control strategies (i.e. $L(s)$ for different $s$). The three panels depicted here correspond to (a) $s=0.013$; (b) $s=0.026$; (c) $s=0.054$ \red{($90\%$, $80\%$, $70\%$ physical distancing as reported in the panel headings)}. In each case we computed $80$ simulations of $300$ days. Other parameters are as reported in Table \ref{paramtable}. Note that panel (a) has a linear ordinate, whereas panels (b) and (c) are depicted with a logarithmic scale. As in Fig. \ref{parameters} we observe explosive growth in impact with lower levels of compliance. }

\caption{Control evaluation. We depict the effectiveness of control measures for \red{each Australian state and internal territory (excluding Jervis Bay)}. In each case the epidemic diffusion is fitted to data up to the end of the exponential growth phase (that is, the point of inflexion on curves $S(0)-S(t)$). Simulations up to this time point $t^*$ effectively seed the network and provide a distribution of infectious and exposed individuals within the community. Beyond this point we simulate the application of small-world control network structure $L(s)$ for various values of $s$. Here we illustrate $s=0.013$, $s=0.026$ and $s=0.054$ corresponding to $95\%$, $90\%$ and $80\%$ control. Actual observed time series data is also shown and illustrates the exceptional effectiveness of control measures for various Australian states.}

\caption{\red{Recovery and return. Here we depict the effect of various palliative control measures in the event of a reemergence of infection (modelled here by a population seeded with $5$ exposed (infected but asymptomatic) individuals). The four solid lines represent a return to mass gatherings (black), a 50 person limit on gatherings (red), no mass gatherings (blue), and continued physical distancing (green). The dashed lines model the same scenarios with the addition of $50\%$ of the population adopting and using contact tracing software (CT). Note that the red (second solid) line grows exponentially, the black line (top) is faster than exponential and the blue and green (bottom) lines are significantly below exponential. In all cases these lines represent the median of $100$ simulations.}}

391 — 2004.10450

\caption{Examples of sentences at various model likelihoods. Sentences with very low $\log \pmd$ generate \textcolor{blue}{nonsense}, while sentences that have high likelihood under the model often devolve into extreme \textcolor{red}{repetition}. Nonsense and repetition classifications shown here are only for illustrative purposes. Crowdworkers simply rated sentences for overall quality. See Appendix for more details.}

392 — 2004.10458

\caption{(a) The absolute value of the maximum wave steepness at the shoreline (\red\L) as a function of $\mathcal{M}$ for case \textbf{a}. Clearly in this case $|\eta_x|$ grows exponentially with $\mathcal{M}$ and does not converge. (b) The relative error of the maximum wave runup $e_R$ (\L), and the relative error of the maximum wave steepness $e_{\eta_x}$ (\dashL) at the shoreline are shown as functions of the number of grid points $\mathcal{M}$ for the numerical simulation of case \textbf{a'} reported in table \ref{Carrier}. These relative errors are defined with respect to the values obtained for $\mathcal{M}=2^{15}$ (i.e. the finest grid simulated). The relative error of the maximum wave runup for case \textbf{a} (\red\dashdot) is also shown in this plot. }

\caption[]{(a) Time history of nonlinear wave runup height at $y=1.02$ on $S_s$ for a slide entering: (1) a finite rectangular lake ($L_a=1$, $L_b=1.02$, \red\L), (2) a long and narrow lake ($L_a=\infty$, $L_b=1.02$, \blue\dashL), (3) a short and wide lake ($L_a=1$, $L_b=\infty$, {\color{orange}\dotL}), (4) an open coast ($L_a=\infty$, $L_b=\infty$, {\color{green}\dashdot}). (b) Inundation maps up to $t=8$ along $S_s$ for $y\geq0$ for the four cases displayed in figure (a). Physical and simulation parameters other than $L_a$ and $L_b$ are the same as in figure \ref{Snapshots}.}

\caption[]{Inundation maps of the (a) shorelines $S_s$, $S_o$ (for $y \geq 0$), and, (b) side-walls and centerline as predicted by nonlinear (\red\L ; {\color{orange}\dotL}) and linear theories (\blue\dashL ; {\color{green}\dashdot}). Physical and simulation parameters are the same as in figure \ref{Snapshots}. Simulations are performed from $t$=0 to $t$=8 beyond which no further significant changes occur.}

\caption[]{Effect of the slide's width on the difference between nonlinear and linear maximum waveheights along the lake's centerline. The predictions are plotted for a finite width landslide ($w<L_b$, {\color{orange}\L}) and an infinitely-wide landslide ($w\rightarrow \infty$,\blue\dashL). In the latter case the problem is two-dimensional, and therefore this figure also highlights effects of three dimensionality on the significance of nonlinearity. Figures a-e show the difference ($\mathcal N$, c.f. \eqref{51}) for time periods from zero to $t_f$ = 1.64, 3.30, 4.97, 6.63, 8.00 (i.e. same as in figure \ref{Nonlin}). Note that $\mathcal N$ is much higher for the 3D case, and that $\mathcal N$ becomes negative at some time in the 2D case. Physical parameters (other than $w$) and simulation parameters are the same as in figure \ref{Snapshots}.}

\caption[]{Wave energy in the lake ($\mathcal{E}$, c.f. \eqref{B104}) normalized by steady-state wave energy ($\mathcal{E}_{\infty}$) is shown for a landslide with: ($a$) $s=0.07$, $v=0.12$ ($\mathcal{E}_{\infty}=0.967\times 10^{-6}$); ($b$) $s=0.035$, $v=0.12$ ($\mathcal{E}_{\infty}=0.229 \times10^{-6}$); ($c$) $s=0.035$, $v=0.24$ ($\mathcal{E}_{\infty}=2.220\times10^{-6}$). The total wave energy in the lake (\L) is divided to energy in the near $S_s$ area ($0<a<L_a/10$, \red\cdotL); and the rest of the lake ($L_a/10<a<L_a$, \blue\dashL). Other physical and simulation parameters are given in figure \ref{Snapshots}. }

393 — 2004.10643

\caption{Map of the world with language coverage of UD. Locations are approximate. Languages released in v1.0 of the collection (2015) are in \textcolor{MapGreen}{green {\small $\blacksquare$}}, those released in v2.0 (2017) are in \textcolor{MapBlue}{blue $\bigcdot$}~, and those released in v2.5 (2019) are in \textcolor{MapRed}{red $\blacktriangle$}. Coordinates are approximate based on the capital city or centre of the country where either the largest population of speakers lives, or where the treebank was created.}

394 — 2004.10644

\caption{The $\tau(E_{\gamma},z) = 1$ \gray "horizon" plot showing our results with and without including LyC photons (see text) compared with the Fermi plot of their highest energy photons from FSRQs (red), BL Lac objects (black) and GRBs (blue) vs. redshift \citep[from][]{abdo2010,fermi2018}.}

\caption{An intrinsic power-law spectrum of $\sim E^{-2}$ (blue) adjusted for \gray absorption for our best-fit no LyC (orange) and instantaneous reionization (green) cases for redshifts of $z=$ 1.0, 2.0, 3.0, and 5.0}

395 — 2004.10645

\caption{ Word cloud of the edits made in questions; {\protect\color{purple}{$\blacksquare$}} and {\protect\color{orange}{$\blacksquare$}} indicate added and deleted unigrams, respectively.
%Numeric values are grouped by the number of digits.
}

\caption{ Exact Match (EM) on \nqopen\ of different models, counting a prediction as correct if it matches \textit{Any} gold reference, or only the \textit{First} non-null one. }

\caption{ Ablations on \answerprediction\ (\dev\ data). {\em all} and {\em multi} indicate all examples and examples with multiple question-answer pairs only, respectively. }

\caption{ Ablations on \qd\ (\dev\ data, multiple answers only). QD model refers to the \qd\ model described in Section~\ref{sec:model}. For \answerprediction, we use \modelname$^\dagger$ with co-training ({\em Full task}) or the gold answers ({\em Gold answers given}). }

\caption{ Zero-shot performance on \answerprediction\ of the models trained on \nqopen. We report Exact Match (EM) on \nqopen\ and \Fanswer\ on \dataname. }

396 — 2004.10963

\caption{{\color{blue}{Ablation experiments on Office-31 for unsupervised domain adaptation (ResNet-50)}}}

397 — 2004.11198

\caption{Micro-averaged F1 score average and standard deviation over 100 train/val/test splits for different models, and a different model initialization for each split. The top three performance scores are highlighted as: {\bf \bf \color{red} First}, {\bf \bf \color{violet} Second}, {\bf Third}. }

\caption{Micro-averaged F1 score average and standard deviation over 10 runs with the same train/val/test split but different random model initialization. The top three performance scores are highlighted as: {\bf \bf \color{red} First}, {\bf \bf \color{violet} Second}, {\bf Third}. }

398 — 2004.11207

\caption{Examples of attribution graphs. (a) and (c) are from MNLI, whose BERT predictions are \texttt{entailment} and \texttt{contradiction}, respectively. (b) and (d) are from SST-2, which are both predicted as \texttt{positive} by BERT. The \textcolor{gray}{grey} words from the inputs do not appear in the attribution graphs. }

\caption{Effectiveness analysis of \ours{}. The \textcolor{MidnightBlue}{blue} and \textcolor{Brown}{red} lines represent pruning attention heads according to attribution scores, and attention scores, respectively. The solid lines mean the attention heads with the smallest values are pruned first, while the dashed lines mean the largest values are pruned first. The results show that \ours{} better indicates the importance of attention heads.}

\caption{ Evaluation accuracy as a function of head pruning proportion. The attention heads are pruned according to the accuracy difference (baseline; dashed \textcolor{Dandelion}{yellow}), the Taylor expansion method (\citealt{are16headbetterthan1}; solid \textcolor{Brown}{red}), and \ours{} (this work; solid \textcolor{MidnightBlue}{blue}). }

399 — 2004.11507

\caption{Temperature angular power spectra of one example of simulated maps from test set. In the top panel, the spectrum calculated from the reference map is shown in \colorindicator{plotgray}{gray}{\HalfCircleLeft}, the spectrum calculated from the reconstructed map is shown in \colorindicator{plotblue}{blue}{\HalfCircleRight}, and the theory spectrum used to create the simulations is shown in \colorindicator{plotgreen}{green}{}. The bottom panel shows the mean and standard deviation of the difference between the map-derived spectra and the theory spectrum, using a bin width of 33, for both the reference map and the reconstructed map.}

\caption{Planck temperature angular power spectra. In the top panel, the published \emph{Planck} spectrum is shown in \colorindicator{plotgray}{gray}{\HalfCircleLeft}, the spectrum calculated from the neural network reconstructed map is shown in \colorindicator{plotblue}{blue}{\HalfCircleRight}, and the published \emph{Planck} best-fit theory spectrum is shown in \colorindicator{plotgreen}{green}{}. The bottom panel shows the mean and standard deviation of the difference between the map-derived spectrum and the published \emph{Planck} best-fit theory spectrum, as well as the difference between the published \emph{Planck} spectrum and the published \emph{Planck} best-fit theory spectrum, using a bin width of 33.}

400 — 2004.11526

\caption{Analysis of the errors, in $\mu$-strain, given by the existing approaches and our proposed approach from the simulated random trials. The best value of each metric for each noise level is coloured \blue{blue} and the worst \red{red}. It should be noted that for the cross-correlation method it was necessary to manually tune the smoothing parameters for each noise level to achieve the best results.}

401 — 2004.11598

\caption{\textcolor[rgb]{0,0,0}{3D head reconstruction result of our method with different settings.}}

402 — 2004.11795

\caption{ While lattice LSTM indicates lattice structure by dynamically adjusting its structure, FLAT only needs to leverage the span position encoding. In \ref{flat lattice}, \includegraphics[width=.25cm]{img/color_token-crop.pdf}, \includegraphics[width=.25cm]{img/color_head-crop.pdf}, and \includegraphics[width=.25cm]{img/color_tail-crop.pdf} denote tokens, heads, and tails, respectively. }

403 — 2004.11814

\caption{Benchmark results of different image SR methods. Average PSNR/SSIM values for scaling factor $\times2$, $\times3$, and $\times4$. The best performance is shown in {\color{red}{red}} and the second best performance is shown in {\color{blue}{blue}}.}

404 — 2004.11867

\caption{\label{tb_off_target_issue} Illustration of the off-target translation issue with French$\to$German zero-shot translations with a multilingual NMT model. Our baseline multilingual NMT model often translates into the wrong language for zero-shot language pairs, such as \hlred{copying} the source sentence or translating into \hbleu{English} rather than German.}

405 — 2004.11992

\caption{ Samples from all datasets. Top rows: Daimler Pedestrians, CIFAR100, FGVC-Aircraft, CU Birds, VGG-Flowers, UCF101, BACH, Protein Atlas. Bottom rows: GTSRB, SVHN, Omniglot, UC Merced Land Use, Describable Textures, Indoor Scenes, Kather, ISIC. Color coding is by group: \textcolor{blue}{\Natural} \textcolor{mygray}{\Symbolic} \textcolor{red}{\Scenes} \textcolor{green}{\Biological} }

406 — 2004.11995

\caption{Depiction of the correspondence mapping $f$ that we use and its relation to triplet loss. For the triplet loss, an anchor point (A) is assigned a positive sample (\textcolor{tblue}{P}) and contrasted to a negative one (\textcolor{tred}{N}). In our case, $f$, and thus the model trained with it, is a generative way of converting anchor point A from one domain to another (ideally, resulting in the mean of the positive correspondence points).}

\caption{Visualization of a simulated lane change to the left. The distance to the lane's center line is plotted on the y-axis, time in seconds on the x-axis. The ``noisy'' lane change from domain B is drawn in \textcolor{tred}{red}, a corresponding one from domain A in \textcolor{tblue}{blue} (for simplicity, just one of the $n$ is shown). The output of the Converter (with $T2$) is drawn in \textcolor{tgreen}{green}, and shows a very plausible converted lane change.}

\caption{Visualization of a lane change to the left, with time in seconds plotted on the x-axis. The top plot shows $m$, both as raw data from domain $B$ (\textcolor{tred}{red}) and as the Converter's output (\textcolor{tgreen}{green}); the same holds for $v$, indicated by the dashed lines. The bottom plot shows the labels of the corresponding time steps: the ground truth label is drawn in \textcolor{tyellow}{yellow}, the prediction of the fine-tuning approach in \textcolor{tred}{red}, and the output of our model ($T2$) in \textcolor{tgreen}{green}. Labels 1 / -1 denote lane changes to the left / right, 0 denotes follow periods, and -2 denotes ignore labels, which are inserted between follow and lane change labels and after lane changes to give the models time to reset. The effects of the Converter can clearly be seen: it smooths out fluctuations and scales down extreme values of the input features, especially during follow periods. Our model outperforms standard fine-tuning, producing far fewer false predictions while predicting virtually identically during lane changes.}

407 — 2004.12006

\caption{Pretraining (left) and QA finetuning (right) examples which encode contexts with background sentences from Wikipedia. The input is minimally structured by including the source page of each background sentence, and separating the sentences using special \sep~tokens. Background is shown in \textcolor{blue}{blue} and entities are indicated in \tf{bold}.}

408 — 2004.12069

\caption{Quantitative RMSE evaluation. We test our two networks ($\PhotonNN=50$ and $\PhotonNN=500$) on four novel scenes with different numbers of emitted photons, and compare RMSE against standard photon mapping \cite{jensen1996global} under the same conditions. Photon mapping's performance varies widely based on the settings but the best results (shown in \blue{blue}) are always with high photon count and $K=50$ nearest neighbors. On the other hand, our results are more consistent with different settings, demonstrating our networks' ability to adapt to different numbers of traced photons. Moreover, our worst results (shown in \red{red}) are usually with low photon counts and are similar to, if not better than, the best photon mapping results (with $10\times$ more photons). Meanwhile, our best results (shown in \blue{blue}) are at higher photon counts and have substantially lower errors than photon mapping with the same photon counts.}

409 — 2004.12349

\caption{Average accuracy comparison of our approach with the related methods on Washington RGB-D Object dataset (\%). \textcolor{red}{Red:} Best result, \textcolor{blue}{Blue:} Second best result, \textcolor{darkgreen}{Green:} Third best result.}

\caption{Accuracy comparison of our approach with the related methods on SUN RGB-D Scene dataset (\%). \textcolor{red}{Red:} Best result, \textcolor{blue}{Blue:} Second best result, \textcolor{darkgreen}{Green:} Third best result.}

410 — 2004.12440

\caption{Case study on why teacher-student learning works. The \colorbox[rgb]{0.80, 0.98, 0.85}{GREEN} (\colorbox[rgb]{0.97, 0.82, 0.80}{RED}) highlight indicates a correct (incorrect) label. The real-valued numbers indicate the predicted probability corresponding to the entity label.}

411 — 2004.12441

\caption{AUC scores (\%) of the receiver operating characteristic (ROC) for the detection task. The best-performing DBN configuration is marked in \textcolor{blue}{blue}. The methods with the best performance are in \textbf{bold}.}

412 — 2004.12811

\caption{Quantitative comparison of different networks for image denoising. {\color{red}Red} indicates the best results.}

\caption{Quantitative comparison of different networks for 4$\times$ image super-resolution on Set5 and Urban100. {\color{red}Red} indicates the best results.}

413 — 2004.12989

\caption{mIoU \vs object \textcolor{blue}{occlusion} and \textcolor{OliveGreen}{depth}.}

414 — 2004.13195

\caption[ The average and standard deviation of critical parameters ]{\small We have highlighted rule boundaries {\color{red}$\alpha$} and {\color{red}$\omega$} in red, and conduit {\color{ForestGreen}$q$} $ \in Q_k$ in green.}

415 — 2004.13200

\caption{Section of M83 SW of the nucleus centered at RA(J2000) 13:36:55.7, Dec(J2000) -29:53:00.1, showing a complicated \hii\ region complex. The three panels are a) the continuum-subtracted \ha\ data from Magellan \citep{blair12}; b) the ATCA 5.5 GHz data, and c) the combined radio detection image (5.5 GHz + 9 GHz) as described in the text. The region shown is 30\arcsec\ in the N-S dimension and red circles are 2\arcsec\ in diameter, indicating optical SNR positions. The green regions in the right panel indicate the radio source islands discussed in the text.\label{fig_rad_islands}}

\caption{A 1\arcmin\ region centered on RA(J2000) 13:37:00.7, Dec(J2000) -29:51:54.8, showing the crowded and complex M83 nuclear region. The three panels are a) the ATCA 5.5 GHz data; b) the ATCA 9 GHz data, and c) the combined radio detection image (5.5 GHz + 9 GHz) as described in the text. In this case, the islands abut one another and individual sources are not confidently resolved in most cases. The improved resolution in the 9 GHz data still does not resolve many of the structures in the nuclear region, although the microquasar MQ1 just outside the bright, confused nucleus is indicated in the middle panel. The red squares are the median positions of each island, as reported in the catalogue.\label{fig_nuc_islands}}

\caption{ A 45\arcsec\ region of M83 centered on the nuclear region. {\bf Top left:} stacked {\it Chandra} image in the 2.6--8 keV band. The circles and labels correspond to \protect\citet{long14} sources (but omitting sources not detected in the hard band); the diameter of each circle is $1\farcs0$. L14-233 corresponds to the optical nucleus; L14-237 to the microquasar MQ1 \protect\citep{soria14}. Dashed circles correspond to sources whose positions (derived from the hard band only) appear slightly different ($0\farcs4-0\farcs8$) from what was reported in the L14 catalogue (based on the full band). The white circle labelled C is the location of the photometric center \protect\citep{knapen10}. {\bf Top right:} adaptively smoothed {\it Chandra} image, with red = 0.35--1.1 keV, green = 1.1--2.6 keV, blue = 2.6--8 keV; circles and labels are the same as in the top left panel. {\bf Left middle:} ATCA 9 GHz flux density map; the synthesized beam is overplotted on the bottom left of this panel. The unresolved sources A and B near the nucleus correspond to two candidate SNRs identified by \protect\citet{piqueras12} and discussed in Section 4.5. {\bf Right middle:} {\it HST} broad-band image; red = F814W, green = F555W, blue = F438W. {\bf Bottom left:} continuum-subtracted {\it HST} image in the F657N filter. {\bf Bottom right:} continuum-subtracted {\it HST} image in the F164N filter; notice the prominent \feii\ emission from the two candidate SNRs A and B.\label{fig_nucleus_6pan}}

\caption{ Left: H$\alpha$ versus radio flux density for a catalogue of \hii\ regions taken from \citet{rumstay83}. The line shows the expected relationship between H$\alpha$ and free-free emission at 5.5~GHz from \citet{caplan86}. Right: H$\alpha$ versus radio flux density for all the sources in our M83 radio catalogue. There is a population of sources distributed near the line. SNRs in unconfused regions (red triangles) generally have excess radio emission compared with the \hii\ regions. SNRs in confused regions (green inverted triangles) often have radio emission consistent with that expected for \hii\ regions. Open symbols are sources located within the confused nuclear region.}

\caption{Four-panel figure centered at RA(J2000) 13:36:54.8, Dec(J2000) -29:52:54.3, showing a region to the SW of the M83 nucleus. For scale, the region shown is 1\arcmin\ in the N-S dimension. Panel a shows the radio detection image with the radio islands overplotted in green. Panel c shows the Magellan continuum-subtracted \ha\ image smoothed to the radio beam size. In panel b, we have scaled and subtracted the \ha\ image from the radio image such that regions of primarily photoionized emission should disappear or be greatly reduced. Thus, the radio emission that remains in this panel is stronger than expected from typical photoionized emission. Some but not all of the sources that remain in panel b align with SNRs (red circles) and/or X-ray sources (yellow circles). The ones that do not (irregular magenta regions) project almost exclusively onto dark, dusty regions, as shown in panel d, which is a visual continuum image of the region from Magellan. Hence, these radio sources are either background sources or, more likely, radio \hii\ regions or SNRs with no detected optical or X-ray emission.\label{fig_subtract_ha} }

\caption{Three 4-panel figures showing multiwavelength imagery for the three historical M83 SNe that show radio emission, SN1957D (top), B12-174a (middle) and SN1950B (bottom). The four panels show (from left to right) {\it HST} WFC3 subtracted emission lines (\ha\ red, \sii\ green, \oiii\ blue); {\it HST} WFC3 continuum (I band red, V band green, B band blue), {\em Chandra} X-ray (soft red, medium green, hard blue), and the ATCA detection image. The yellow circles are 4\arcsec\ in diameter.\label{fig_hist_sne}}

416 — 2004.13370

\caption{The sets $\hat{\m{S}}$ (\legendsquare{fill=mgreen,fill opacity = 0.2,draw = mgreen}), {$\bar{\Xi}$} (\legendsquare{fill=mblue,fill opacity = 0.2,draw = mblue}), and $\m{R}$ (\legendsquare{fill=mpurple,fill opacity = 0.2,draw = mpurple}), considering $\hat{W}_\alpha$ for different values of $\alpha$. \vspace{-2em}}

417 — 2004.13388

\caption{ \textbf{Quantitative evaluations on the benchmark dehazing datasets.} {\color{red}\textbf{Red text}} and {\color{blue}blue text} indicate the best and the second-best performance, respectively. $\uparrow$ and $\downarrow$ indicate that better methods achieve a higher or lower score on the metric, respectively. }

\caption{\textbf{Detection results on the KITTI Haze dataset.} We apply dehazing methods trained on the RESIDE dataset~\cite{RESIDE} to restore clean images and evaluate their perceptual quality for the object detection task. We use YOLOv3~\cite{yolov3} as the default detection algorithm. mAP denotes mean average precision. {\color{red}\textbf{Red text}} indicates the best detection precision. }

\caption{\textbf{Effect of the number of feature levels and ResBlocks.} $L$ denotes the number of feature levels and $B$ denotes the number of ResBlocks~\cite{resnet} in $G_{Res}$. All the experiments are conducted on the SOTS dataset~\cite{RESIDE}. {\color{red}\textbf{Red text}} indicates the best performance. }

\caption{\textbf{Analysis on each component of the MSBDN-DFF.} All the methods are evaluated on the SOTS dataset \cite{RESIDE} using the same training setting as the proposed algorithm. {\color{red}\textbf{Red text}} indicates the best performance of each part. }

418 — 2004.13609

\caption{We ran surveys on conversations containing \emph{reciprocity}: an \textcolor{ForestGreen}{initiator (green)} makes a comment, a \textcolor{violet}{replier (purple)} replies to the initiator, and the initiator follows up with another comment or a reaction. Surveys asked about facts and opinions in the initiator's opening comment, though for context the survey participant was additionally shown the reply and the Page post on which the exchange took place.}

419 — 2004.13780

\caption{Audio-visual information samples selected from the proposed dataset. The visual data exhibits variations in pose, lighting conditions, and motion. The~\textcolor{green}{\textbf{green}} block contains information of celebrities speaking English and the~\textcolor{red}{\textbf{red}} block presents data of the same celebrity in Urdu.}

\caption{Evaluation protocol to analyze the impact of multiple languages on association between faces and voices.~\textcolor{green}{\textbf{Green}} and~\textcolor{red}{\textbf{red}} blocks represent training and testing strategies, respectively. At test time, the network is evaluated on an \textit{unseen-unheard} configuration from the same language (English) \textit{heard} during training along with a completely \textit{unheard} language (Urdu).}

\caption{Evaluation protocol to analyze the impact of multiple languages on speaker recognition.~\textcolor{green}{\textbf{Green}} and~\textcolor{red}{\textbf{red}} blocks represent training and testing strategies, respectively. At test time, the network is evaluated on the same language \textit{heard} during training along with a completely \textit{unheard} language of the same identities.}

420 — 2004.13873

\caption{Example \highlightPreSSA{\mbox{expression}} and its \highlightPostSSA{\mbox{SSA-form code sequence}}.}

\caption{Linear Kalman filter equations. The method we present in Section~\ref{section:core} extracts the information in the terms related to the \highlightPhysics{\mbox{physics}} of the system as well as properties of the \highlightNoise{\mbox{noise}} when provided in the input to the method.}

421 — 2004.13897

\caption{\label{table:cases} Expanded entity sets for two sample queries, with erroneous entities colored {\color[HTML]{FE0000} red} and marked with a ``*''.}

422 — 2004.14096

\caption{Simplified \textcolor{blue}{UD} and \textcolor{red}{SUD} annotation for an English sentence.}

423 — 2004.14120

\caption{Example of a small post-edit from the training set. Each action is represented by three features: its type (I for insert and D for delete), its position in the sentence and the token to insert/delete. In this example, the token marked \textcolor{red}{\textit{in red}} needs to be removed since it is incorrectly placed. The \textcolor{blue}{\textbf{blue}} token is inserted to obtain the correct \texttt{pe}.}

\caption{Example of a sentence and its minimum-edit actions ordered in three different ways: left-to-right (\textit{l2r}), randomly shuffled (\textit{shuff}) and following human order (\textit{h-ord}). The unfiltered human actions are also presented (\textit{human}). We can see that the human chose to first insert the two words marked \textcolor{blue}{\textbf{in blue}}, later moving back in the sentence to edit the leftmost mistakes.}

424 — 2004.14166

\caption{A CSC data sample from SIGHAN 2014~\cite{DBLP:conf/acl-sighan/YuLTC14} with ID B1-3440-2; the {\color{orange}incorrect}/{\color{blue}correct} characters are in {\color{orange}orange}/{\color{blue}blue}. A BERT model modifies the text into a sentence that is semantically reasonable but dissimilar in pronunciation. By incorporating both phonological and visual similarities, our new method SpellGCN can generate a sentence that is both semantically sensible and phonetically similar to the original sentence. The sentence output from SpellGCN means ``this restaurant is very suitable for dating''. }

\caption{Several prediction results on the test set. The first line in each block is the input sentence. The second line is corrected by BERT without SpellGCN, and the last line is the result from SpellGCN. We highlight the {\color{orange}incorrect}/{\color{blue}correct} characters in {\color{orange}orange}/{\color{blue}blue}. }

425 — 2004.14200

\caption{Computing the probability of selecting words. The last two lines show the results of the original data augmentation method, $Blanking$, and of ours. $Blanking$ replaces a word with a special placeholder \textcolor{blue}{\textit{BLANK}}.}

426 — 2004.14299

\caption{Task-guided pre-training accuracies (abbreviations defined in Table \ref{tab:binary-task-results}). Displayed in order of supervised (middle) and unsupervised (bottom) pre-training. Results are highlighted with \colorbox{cyan!50}{blue} ($\uparrow$) and \colorbox{red!50}{red} ($\downarrow$) with respect to \textsc{No-Pretrain}. Best viewed in color.}

\caption{Unsupervised domain adaptation accuracies (abbreviations defined in Table \ref{tab:binary-task-results}). Results are highlighted with \colorbox{cyan!50}{blue} ($\uparrow$) and \colorbox{red!50}{red} ($\downarrow$) with respect to \textsc{Src-Only}. Best viewed in color.}

427 — 2004.14325

\caption{t-SNE comparison of synset embeddings for the whole of WordNet learned from SC+UWA10 (left), or just SC (right). Colors represent the source of annotations for embeddings (\tikz\draw[fill=sc_color] (0,0) circle (.7ex); SC \tikz\draw[fill=uwa_color] (0,0) circle (.7ex); UWA \tikz\draw[fill=propagated_color] (0,0) circle (.7ex); Propagation).}

428 — 2004.14326

\caption{Sampling strategy for self-supervised training. The \textcolor{red}{red} lines represent the pairs for cross-modal biometrics, the \textcolor{blue}{blue} lines represent the pairs for lip synchronisation.}

429 — 2004.14375

\caption{Comparisons of times, in seconds, for each tool when $\ge71\%$ of the total targets for each \texttt{space} commit are reached. Each blue ``\textcolor{blue}{\texttimes}'' represents one pair of times. The diagonal red ``\textcolor{red}{$y = x$}'' line shows where equal execution times would appear. Thus, each \textcolor{blue}{\texttimes} above the red line is a faster result for TOFU\@; each \textcolor{blue}{\texttimes} below the red line is a faster result for that plot's other tool.\label{fig:space}}

\caption{Comparisons of times, in seconds, for each tool when $\ge40\%$ of the total targets for each \texttt{space} commit are reached. Each blue ``\textcolor{blue}{\texttimes}'' represents one pair of times. The diagonal red ``\textcolor{red}{$y = x$}'' line shows where equal execution times would appear. Thus, each \textcolor{blue}{\texttimes} above the red line is a faster result for TOFU\@; each \textcolor{blue}{\texttimes} below the red line is a faster result for that plot's other tool.\label{fig:diff}}

\caption{Time comparisons (in seconds) for each tool when $\ge21\%$ of the uncovered targets for each \texttt{xmllint} commit are reached. Each blue ``\textcolor{blue}{\texttimes}'' represents one pair of times. The diagonal red ``\textcolor{red}{$y = x$}'' line shows where equal execution times would appear. Thus, each \textcolor{blue}{\texttimes} above the red line is a faster result for TOFU\@; each \textcolor{blue}{\texttimes} below the red line is a faster result for that plot's other tool.\label{fig:xmllint}}

430 — 2004.14444

\caption{ Model and human F1 scores on the original \squad v1.1 test set compared to our new test sets. Each point corresponds to a model evaluation, shown with 95\% Student's t-confidence intervals (mostly covered by the point markers). The plots reveal three main phenomena: (i) There is no evidence of adaptive overfitting on \squad, (ii) all of the models suffer F1 drops on the new datasets, with the magnitude of the drop strongly depending on the corpus, and (iii) humans are substantially more robust to natural distribution shifts than the models. The slopes of the linear fits are \wikislope, \nytslope, \redditslope, and \amazonslope, respectively, and the $R^2$ statistics for the linear fits are 0.99, 0.97, 0.9, and 0.89, respectively. This means that every point of F1 improvement on the original dataset translates into roughly 1 point of improvement on our new datasets. }

\caption{Changes in answer type distributions introduced in Section~\ref{sec:analysis} explain little of the observed performance differences across our new datasets. For each model, we compute the F1 score on each of the answer types on the \squad v1.1 dev set, and then we predict the F1 score on the new test set by reweighing these F1 scores based on the frequency of answer types in the new test set. Concretely, if \squad v1.1 was 50\% {\tt NP} answers and 50\% {\tt Places} answers, and a model has average F1 scores of 100 for {\tt NP} and 75 for {\tt Places}, then if a new dataset had 30\% {\tt NP} answers and 70\% {\tt Places} answers, the predicted F1 score would be 82.5 (versus 87.5 for the original). The $y=x$ line represents the trivial model that predicts the same F1 score on the new test sets as the original. For each of the distribution shift datasets, predictions based on answer category shifts are exceedingly optimistic and explain little of the observed drops. For instance, on the Reddit dataset, answer category shifts suggest models would lose, on average, 2-3 F1 points. However, the average observed shift is \redditdrop F1 points. }

\caption{Changes in syntactic distributions introduced in Section~\ref{sec:analysis} explain only a small amount of the observed performance differences across our new datasets. As in the previous plot, for each model, we compute the F1 score for each observed value of syntactic divergence on the \squad v1.1 dev set, and then we predict the F1 score on the new test set by reweighing these F1 scores based on the frequency of examples with a given syntactic divergence in the new test set. For each of the distribution shift datasets, predictions based on syntactic divergence shifts are optimistic. For instance, on the Reddit dataset, syntactic divergence shifts suggest models would lose, on average, 1.9 F1 points, while the average observed shift is \redditdrop F1 points. }

431 — 2004.14500

\caption{\label{tab:example_app} Predicted $\hat{p}$(\%) of the true label from \textbf{MLE} and \textbf{\method} with corresponding sentences in the RTE (top) and Stanford's politeness (bottom) datasets. The true label is either entail or not entail for RTE, and polite or impolite for SPolite. We show the cases where the two methods predict the label differently. The case with \textcolor{blue}{INCOR} $\rightarrow$ \textcolor{blue}{COR} means only \method predicts the true label correctly, while the case with \textcolor{red}{COR} $\rightarrow$ \textcolor{red}{INCOR} means only MLE predicts the true label correctly. }

\caption{\label{fig:cal_example} Histogram of predicted probabilities (top) and their calibration histograms (bottom) for \textbf{MLE} (\colorbox{blue!20}{blue-shaded}) and \textbf{\method} (\colorbox{red!20}{red-shaded}) on RTE in GLUE and SPoliteness in xSLUE. The overlap is \colorbox{purple!40}{purple-shaded}. The X-axis is the predicted posterior, and the Y-axis is its frequency (top) and empirical posterior probability (bottom). The \udensdash{diagonal, linear line} in (c,d) marks the expected (or perfectly calibrated) case. We observe that \colorbox{red!20}{\method} moves the posterior probabilities of the small predictions toward \udensdash{the expected calibration}. Best viewed in color. }

\caption{\label{tab:example} Predicted $\hat{p}$(\%) of the true label from \textbf{MLE} and \textbf{\method} with corresponding sentences in the RTE and SPolite datasets. The true label is either entail or not entail for RTE, and polite or impolite for SPolite. The examples provided are cases that only \method predicts correctly, which correspond to \textcolor{blue}{INCOR} $\rightarrow$ \textcolor{blue}{COR} in Table~\ref{tab:anal_error}. }

432 — 2004.14524

\caption{Some \textcolor{darkgrey}{\dotuline{possible paraphrases}} of `the turtle beat a hare' including a \textcolor{darkblue}{\uline{\textbf{sampled path}}} and some of the other \textcolor{darkgreen}{\textbf{\textit{\dashuline{tokens also considered in the training objective}}}}}

433 — 2004.14564

\caption{Sentences generated via beam search (beamwidth 5) for the multilingual model presented in this work vs parabank2. We note that our model tends to produce copies or near copies of the input, which is the desired behavior for our application. Changes are emphasized with \textbf{bold} or \textbf{\st{strikethrough}}. The parabank2 model tends to produce output with lexical/syntactic changes, which occasionally also significantly change the meaning of the sentence (denoted in \textcolor{red}{red}). References (paraphraser inputs) are the first ten sentences of wmt17 zh-en.}

434 — 2004.14602

\caption{Visualization of position bias with BERT trained on \squad\train~(\textcolor{green1}{\textsc{Orig}}), \squad\trainone~(\textcolor{orange}{\textsc{First}}), and BERT without fine-tuning (\textcolor{purple1}{\textsc{Pre}}). See Section~\ref{sec:2_2} for more details.}

\caption{Predictions of \textcolor{red}{\textbf{standard BERT}} and \textcolor{blue}{\textbf{de-biased BERT}} trained on \squad\train. }

\caption{Visualization of each layer of BERT trained on \squad\train~(\textcolor{green1}{\textsc{Orig}}), \squad\trainone~(\textcolor{orange}{\textsc{First}}), and BERT without fine-tuning (\textcolor{purple1}{\textsc{Pre}}). As the input passes each layer, position bias becomes more problematic.}

\caption{Visualization of each layer of de-biased BERT. BERT trained on \squad\trainone~without any de-biasing methods (\textcolor{red}{\textsc{None}}), with sentence-level prior bias product (\textcolor{purple2}{\textsc{Product}}), with learned-mixin (\textcolor{blue}{\textsc{Mixin}}). \textcolor{blue}{\textsc{Mixin}} preserves consistent information compared with \textcolor{red}{\textsc{None}} and prevents the bias propagation.}

\caption{Visualization of de-biasing models. BERT trained on \squad\trainone~without any de-biasing methods (\textcolor{red}{\textsc{None}}), with sentence-level prior bias product (\textcolor{purple2}{\textsc{Product}}) and with learned-mixin (\textcolor{blue}{\textsc{Mixin}}) are plotted. (a) The amount of preserved word information at the last layer averaged over \squad\valid. (b) Spearman's rank correlation coefficient between the prediction logits and the amount of word information at each layer.}

\caption{Visualization of position bias with BERT trained on \squad\train~(\textcolor{green1}{\textsc{Orig}}), \squad\trainone~(\textcolor{orange}{\textsc{First}}), and BERT without fine-tuning (\textcolor{purple1}{\textsc{Pre}}). (a) The amount of preserved position information. (b) Spearman's rank correlation coefficient between the start logits and the amount of position information at each layer. The plots follow a trend similar to that in Figure~\ref{fig:cos_sim_bias}. The amount of position information preserved in \textcolor{orange}{\textsc{First}} is greater than in \textcolor{green1}{\textsc{Orig}} before the 30th word. Spearman's rank correlation coefficient consistently increases as inputs pass each layer.}

\caption{Visualization of de-biased models. BERT trained on \squad\trainone~without any de-biasing methods (\textcolor{red}{\textsc{None}}), with sentence-level prior bias product (\textcolor{purple2}{\textsc{Product}}) and with learned-mixin (\textcolor{blue}{\textsc{Mixin}}) are plotted. (a) The amount of preserved position information. (b) Spearman's rank correlation coefficient between the start logits and the amount of position information at each layer.}

435 — 2004.14620

\caption{Comparison of original Universal Dependencies annotations (\textbf{edges above}) and our modification (\textcolor{blue}{edges below}).}

436 — 2004.14905

\caption{Development set results for WritingPrompts for generated (Gen) or corpus sampled (Cor) alternative continuations; $\alpha$ indicates sentiment weighting. \textbf{Bold:} best model in a given category; {\color[HTML]{9A0000}\textbf{red:}} best model overall.}

\caption{The film \textit{15 Minutes}, \textbf{\textcolor{surpriseentropycolor}{$S^{\text{Hale}}$}}, \textbf{\textcolor{surprisecolor}{$S^{\text{Ely}}$}}, \textbf{\textcolor{suspensecolor}{$U^{\text{Ely}}$}}, \textbf{\textcolor{suspensestatecolor}{$U^{\alpha\text{Ely}}$}}, $\medblackdiamond$ theory baseline, {\color{yellow} $\medstar$} TP annotations, triangles are predicted TPs.}

\caption{The film \href{https://www.imdb.com/title/tt0100405/}{Pretty Woman}, \textbf{\textcolor{surpriseentropycolor}{$S^{\text{Hale}}$}}, \textbf{\textcolor{surprisecolor}{$S^{\text{Ely}}$}}, \textbf{\textcolor{suspensecolor}{$U^{\text{Ely}}$}}, \textbf{\textcolor{suspensestatecolor}{$U^{\alpha\text{Ely}}$}}, $\medblackdiamond$ theory baseline, {\color{yellow} $\medstar$} TP annotations, triangles are predicted TPs.}

\caption{\href{https://www.imdb.com/title/tt1010048/}{Slumdog Millionaire}, \textbf{\textcolor{surpriseentropycolor}{$S^{\text{Hale}}$}}, \textbf{\textcolor{surprisecolor}{$S^{\text{Ely}}$}}, \textbf{\textcolor{suspensecolor}{$U^{\text{Ely}}$}}, \textbf{\textcolor{suspensestatecolor}{$U^{\alpha\text{Ely}}$}}, $\medblackdiamond$ theory baseline, {\color{yellow} $\medstar$} TP annotations, triangles are predicted TPs.}

437 — 2005.00054

\caption{Model framework of the proposed \model~({\color{red}red} is prior and {\color{blue}blue} is posterior). $\bm x=[x_1,...,x_T]$ is text sequential data, and $\bm s_k=[s_{k,1},...,s_{k,T}]$ is the pseudo-input. The posterior ({\color{blue}blue}) is obtained by \eqref{eq:reparametrization}, and VampPrior ({\color{red}red}) is achieved by \eqref{eq:vamp_prior}. $\nu_{\bm \psi}(\bm x, \bm z)$ is the dual function.}

\caption{Visualization of the hyperbolic latent space of 5,000 randomly sampled sentences from the test set of PTB. The lengths of the samples are color-coded ({\color{red}red} for short ones and {\color{blue}blue} for longer ones). The four listed sentences of different lengths are created by modifying a single test sample, each of which is encoded to the latent space with the corresponding color.}

438 — 2005.00084

\caption{Synonyms added to the topic query to gather initial training documents from ElasticSearch. For combinations of topics and data sources (\reddit and \cc) that are not listed, we only used the topic as search query.}

\caption{Generated arguments with \reddit as data source. Text in bold shows the given control code; the text afterwards is the generated argument. Numbers in brackets after the text show the quality score as predicted by the argument quality model.}

439 — 2005.00136

\caption{Examples from the two datasets, where {\color{orange}orange} denotes the sentence to be transferred, and {\color{blue}blue} denotes content that also appears in the context. \textbf{C-Seq2Seq}: Contextual Seq2Seq; \textbf{H-Seq2Seq}: Hybrid Seq2Seq.}

440 — 2005.00170

\caption{\small $\sigma_r$--$\varepsilon_r$ phase plane depicting regions of prolate `A', prolate `B', and oblate drop shapes. The drops on the right-hand side depict the expected circulation for each shape: counterclockwise (in the first quadrant) for the prolate `A', and clockwise for the prolate `B' and oblate shapes. (a) {\it Solubility effect on deformation}: ($\blacksquare$) indicates instability due to solubility, and the colored ($\bigcirc$) represent the relative change in deformation between clean and soluble drops. Larger circles indicate a greater solubility effect on deformation. (b) {\it Solubility effect on flow}: The symbols denote the effects of surfactant solubility for a given $(\sigma_r,\varepsilon_r)$ pair on the flow in and around the drop: (\tikzcircle{2.5pt}) denotes a qualitative change in flow (reversal or stagnation point), and ($\MyDiamond$) represents no change in flow compared to the clean case. The electric capillary number is $\text{Ca}_E = 0.25$, and the transfer parameter is $J=10$.}

441 — 2005.00187

\caption{Gains (positive, \textcolor{blue}{blue}) and losses (negative, \textcolor{red}{red}) in LSTM LM accuracies on CLAMS after capitalizing the first character of each evaluation example. Differences are relative to the results in Table~\ref{tab:lstm_lower_results}. Results are averaged across five random initializations.}

442 — 2005.00247

\caption{Performance changes of AdapterFusion compared to ST-Adapters and MT-Adapters. Arrows indicate whether there has been an improvement \greenarrowup \, ($>0.3$), decrease \redarrowdown \, ($<-0.3$), or whether the results have stayed the same \orangearrowright \, ($[-0.3,0.3]$).}

443 — 2005.00330

\caption{\textcolor{red}{DVLQA Statistics and diversity based on answer choice types and image types (THIS TABLE WILL BE REMOVED)}}

\caption{\textcolor{red}{Performance benchmarks over testset (853 samples) and hard testset (732 samples) of DVLQA task (THIS TABLE WILL BE REMOVED)}}

444 — 2005.00336

\caption[]{Pitch, roll, and yaw of the quadcopter after a broken propeller \textcolor{blue}{TODO: increase the font and generate the fig again}}

\caption[]{Acceleration on three axes after a broken propeller \textcolor{blue}{TODO: increase the font and generate the fig again}}

\caption[]{Pulling the quadcopter down from external source \textcolor{blue}{TODO: increase the font and generate the fig again}}

\caption[]{Pulling the quadcopter down from external source \textcolor{blue}{TODO: increase the font and generate the fig again}}

\caption[]{Pulling the quadcopter down from external source \textcolor{blue}{TODO: increase the font and generate the fig again}}

\caption[]{Pulling the quadcopter down from external source \textcolor{blue}{TODO: increase the font and generate the fig again}}

\caption[]{Pulling the quadcopter down from external source \textcolor{blue}{TODO: increase the font and generate the fig again}}

445 — 2005.00512

\caption{An example showing annotations for entity mentions (\colorbox{dataset}{Dataset}, \colorbox{metric}{Metric}, \colorbox{task}{Task}, \colorbox{method}{Method}), coreferences (indicated by arrows), salient entities (bold), and an $N$-ary relation (SQuAD, Machine Comprehension, BiDAF (ensemble), EM/F1) that can only be extracted by aggregating information across sections.}

446 — 2005.00525

\caption{PDF of $V_{n+1} - V_{n}$ in 3D with moving average filter in log-lin scale; this is clearly not a parabola, so it is not a normal distribution. In 1D, I checked and we have, analytically and experimentally, a normal distribution. {\color{blue} (Kai): if you have it, I'd like to see it. Is it a clear parabola (in the numerics)? Perfect, a nice parabola (left). For the right figure, is it a logistic pdf, cf. \texttt{https://en.wikipedia.org/wiki/Logistic\_distribution}?}}

\caption{PDF of the divergence of the random velocity using Voronoi analysis in 3D. Left: Monte--Carlo simulation (red), the ratio of a normal distribution by a Gamma distribution (green). Right: Monte--Carlo simulation of divergence of Voronoi cells shifted by a velocity having a normal law. {\color{blue} Thibault: I think this new figure gives more information. The linear scale shows that for 99\% of the mass (= 15.32) the curves of the model and St=0 are superimposed, but with Logistic/Gamma we have a better result. With the log scale we observe that, for a negligible quantity, the distributions are not the same.} {\color{red} Keigo: The agreement with the logistic distribution is interesting. I think we need some theoretical explanation for that to describe in the paper. This requires further research.} {\color{red} Thibault: The volume change is not a normal distribution. The explanation will be given in the next paper. Maybe. This requires further research.} }

\caption{Randomly distributed particles. Joint PDF of the exact divergence and the Voronoi divergence, the histogram of the Voronoi divergence {\color{red} Kai: could you add the pdf of the exact divergence to the histogram? }, visualization of the Voronoi divergence, the difference between the exact divergence and the Voronoi divergence in log-scale {\color{red}(most of the divergence error is at saddle points)}, Pearson coefficient (limit value $pearsonr = 0.936$) and $L^2$ error as a function of the particle number $N_p$.}

\caption{Mean value $\langle \mathcal D_p \rangle_{Vp}$ (left) and skewness (right) of the divergence ${\cal D}_p$ as a function of the Voronoi volume for $St=1$ (top) and $St=0$ (bottom) for three time steps, $\Delta t = 10^{-3}$, $2 \, \Delta t$ and $\Delta t /2$, and two numbers of particles, $N$ and $N/2$. \\
% {\color{red} Thibault: Left, all curves cross zero at the same time instant, and $\langle {\cal D}_p \rangle_{Vp}^{\Delta t} = 2 \langle {\cal D}_p \rangle_{Vp}^{\Delta t / 2} = \langle {\cal D}_p \rangle_{Vp}^{2 \times \Delta t}/2 $. }
% {\color{blue} Kai: For the mean value it is interesting to see that the zero crossing point does not change and that the tendencies are the same. For very large values we find strong oscillations, for very small volume likewise. Is the scaling with $\Delta t$ you write above exact? If yes, we have to comment on that. Can this be explained directly? Computing the mean value is a linear operation. \\ For the skewness, the first observation is that we have negative values for St=0 for all volumes. That's surprising to me. For St=1 the results seem robust; the only unpleasant point is the negative peak in the red curve.}
{\color{red} Thibault: Strong oscillations are due to the lack of data at these points, as we can see in figure 3. "Is the scaling with $\Delta t$ you write above exact?" I didn't understand. %Kai: there is an equation relating the mean of the divergence for $\Delta t$ to the one computed with $2\,\Delta t$ and $\Delta t/2$. Is this equation exact, or is it an approximation?
I myself am puzzled by the result of the skewness. For the peak we have only one extreme value in one bin, but with the average smoothing, it becomes larger. Is it appropriate to remove this data manually? %Kai: maybe not. Perhaps we do not show figure 11. Let's discuss it tomorrow; I will send an email shortly. There is a seminar by Dmitry at 10am in Rostock, and I have time afterwards. It would be good to wrap up.
I have verified and the skewness seems to be right. } }

447 — 2005.00547

\caption{Top 5 words associated with each emotion (\colorbox[HTML]{BEECAF}{positive}, \colorbox[HTML]{A6CBF7}{negative}, \colorbox[HTML]{FFFC9E}{ambiguous}). The rounded $z$-scored log odds ratios in the parentheses, with the threshold set at 3, indicate significance of association.}

448 — 2005.00559

\caption{ Given a 3D character mesh, RigNet produces an animation skeleton and skin weights tailored to the articulation structure of the input character. From left to right: input examples of test 3D meshes, predicted skeletons for each of them (joints are shown in green and bones in blue), and resulting skin deformations under different skeletal poses. Please see also our supplementary video: \textcolor{blue}{https://youtu.be/J90VETgWIDg} }

449 — 2005.00577

\caption{%{\bf [MAKE CLEAR THIS IS HOW WE CONFIRM OUR CUTOFFS FOR RGB AND RC ARE REASONABLE.]}
Observational HR histogram of APOKASC catalog red giant stars (blue symbols), red clump stars (red), and secondary clump stars (orange). Rectangular blue and red lines indicate our choice of cutoffs for RGB and RC evolutionary states, respectively. This demonstrates that our choice of cutoffs \red{effectively captures RC stars across the width of the entire giant branch, while avoiding contamination from secondary clump stars.}
%\textbf{what is stretches across the lower RGB indicating? do you mean capturing the entire width of the giant branch? maybe rephrase}. %JT
%\red{how hard is it to make the background white for printing purposes?}}
%Giant branch parameter cutoffs plotted against APOKASC temperature and surface gravity distributions. Blue cutoffs filter for lower branch giants and red cutoffs filter for red clump stars. Plot shows cutoffs accurately filter both RGB and RC stars solidifying their use for our sample. {\bf [Don, it's not clear how the previous sentence can be gleaned from the figure... I think we need a few more words to explain what the bumps and tails in the distributions represent so that it's clear why the vertical lines are proving that the selection is working. And are the cutoffs actually cutoffs or are they ranges? In any case, ``Logg" needs to be ``log g", and ``Temp" should be ``Teff". Alternatively, could we not just show this with a single H-R diagram of log g vs Teff? Something like Figure 2 that shows the APOKASC stars in the H-R diagram with our selection ranges?]}
}

\caption{Top: Observational HR diagram of our Field (single) and RVvar/Binary (binary) samples using APOGEE-derived $\log g$ and effective temperature. The colored rectangles and data points are our applied filters for RGB (blue) and RC (red) evolutionary states. Triangle points are stars classified in Simbad \red{(110)} and x points are stars not classified in Simbad \red{(41)}. The orange region to the right of the giant branch is labeled as {beneath branch} %forbidden
to highlight potential contaminating sources. Bottom: Bar chart of Simbad stellar classification matches to our giants. For quality control we remove classifications outside of star, red giant branch (RGB*) and eclipsing binary (EB*).
%{\bf [Don, ``Logg" should be ``log g" and ``Temperature" should be ``Effective Temperature".]}
}

\caption{\red{$NUV$ excess boxplot distributions between evolutionary states and subsamples.} {\bf Boxes represent the inter-quartile range, with the median indicated. Bars represent $\pm$1.5 times the inter-quartile range, or the full spread of the data, whichever is smaller.} Outliers are represented as open circles. }

\caption{Top: Cumulative distribution of the \red{133} giants used to calibrate our $NUV$ excess relations. Black, orange and blue colors represent all giants, Field, and RVvar/Binary, respectively; the dots and dashed lines mark the $95^{th}$ percentile for each sample. Bottom: Histogram of $NUV$ excess separated by sample membership in bins of 0.5~mag. }

\caption{$NUV$ excess versus log $v\sin i$ for the RVvar/Binary (top), Field (middle) and combined samples (bottom). %along with $NUV$ excess versus $log v\sin i$ for the (bottom) combined sample.
Best-fit lines for RGB (blue) and RC (red) evolutionary states are shown, with the black lines depicting the best fit for the combined sample. \red{Each subplot reports the sample name along with the Pearson correlation coefficient and significance. Red edges identify stars in the RC phase.}
%{\bf [Don, I think we can do away with the top three panels and just do the log vsini one. However, it would be good to have a panel that just shows the best fit lines for the Field and RVvar samples in NUV excess vs log vsini, as you do in Figure 13. Also, remove the ``"log" in front of ``km/s".]}
}

\caption{$NUV$ excess versus $\log(P_{\rm rot}/\sin i)$ (top) and Rossby number (bottom), for giants in our sample with calculated stellar radii. \red{Solid lines represent a fit at a fixed saturation period, and dashed lines represent a fit at a fixed Rossby number threshold, based on the corresponding M-dwarf saturation threshold and estimated convective overturn timescale. Black symbol edges are for RGB, red edges are for RC and diamonds indicate binary sample membership.}}

\caption{$NUV$ excess versus $v_{rot}$ (top), $\log P_{\rm rot}$ (middle) and Rossby number (bottom) for 18 M~dwarfs. \red{Gold} lines are the fit relations for the M~dwarfs. \red{For comparison, black solid and dashed lines represent the fits to the giants from Figure~\ref{fig:nuv_period}.} %The NUV excess versus $v_{rot}$ linear fit for M~dwarfs takes the form of $y=(-0.095 \pm 0.024)x + (0.595 \mp 0.213)$.
Points with blue edges represent stars with $P_{\rm rot} < 10$~d.
%{\bf [Don, remove the ``log" in front of ``km/s".]}
}

\caption{$NUV$ excess versus $\log(P_{\rm rot}/\sin i)$, as in the top panel of Figure~\ref{fig:nuv_period}, but now also including a third linear fit to the putative supersaturation regime ($\log P_{\rm rot}<0.6$~d). \red{Black symbol edges are for RGB, red edges are for RC and diamonds indicate Binary sample membership.}
%{\bf [Don, I think you can remove the vertical line and the ``Super Saturation Line" annotation.]}
}

\caption{Fraction of rotational breakup versus $\log (P_{\rm rot}/\sin i)$ for all giants with calculated $P_{\rm rot}/\sin i$ values. The vertical dashed line marks the threshold for supersaturation. \red{Black symbol edges are for RGB, red edges are for RC and diamonds indicate Binary sample membership.}
%{\bf [Don, I think you can remove the ``Super Saturation Line" annotation; it would be better to write ``Super-saturation region" within the space to the left of the line.]}
}

\caption{Giant/Black Hole binary system plotted in $NUV$ excess versus $\log v\sin i$ along with our \red{fitted rotational NUV excess relation}. The plot shows that the ultraviolet emission from the system is consistent with expectations from the giant.
%{\bf [Don, remove the ``log" before ``km/s".]}
}

450 — 2005.00619

\caption{Since both images and natural text are deeply connected to the physical world, it is not unreasonable that \textit{some} structural similarities emerge from independent visual and linguistic representations. We illustrate this intuition using t-SNE projections \cite{maaten2008visualizing} from representations from a vision (Faster R-CNN) and a textual model (BERT), extracted from 10 object categories in images and captions from MS-COCO. As shown, some similarities can be found in both projections, for instance in \textcolor{red}{fruits} or \textcolor{blue}{round objects}. In this work, we quantify this intuition through probing text models for common ground with visual representations. }

451 — 2005.00636

\caption{Data splitting strategies. Each ball corresponds to a sentence represented in (two-dimensional) feature space. \textcolor{myblue}{Blue} (dark)/\textcolor{myorange}{orange} (bright) balls represent examples for \textcolor{myblue}{training}/\textcolor{myorange}{test}. Numbers represent sentence length. Heuristic splits can, e.g., be based on sentence length; adversarial splits maximize divergence.}

452 — 2005.00661

\caption{Hallucinations in extreme document summarization: the abbreviated article, its gold summary and the abstractive model generated summaries (\ptgen, \citeauthor{see-acl17} \citeyear{see-acl17}; \tconv, \citeauthor{narayan18:xsum} \citeyear{narayan18:xsum};
% \gptzero, \citeauthor{gpt2} \citeyear{gpt2};
and, \gpttune, \tencdec and \bencdec, \citeauthor{berts2s} \citeyear{berts2s}) for a news article from the extreme summarization dataset \cite{narayan18:xsum}. The dataset and the abstractive models are described in Section~\ref{sec:xsum} and~\ref{sec:abssys}. We also present the [\textsc{rouge-1}, \textsc{rouge-2}, \textsc{rouge-l}] F$_1$ scores relative to the reference gold summary. Words in \textcolor{orangered}{red} correspond to hallucinated information whilst words in \textcolor{midnightblue}{blue} correspond to faithful information.}

\caption{Sample of question-answer pairs generated from hallucinated summaries that are correctly answered by their source articles. Highlighted \textcolor{orangered}{spans} in the summaries are marked as extrinsic or intrinsic hallucinations by our annotators.}

453 — 2005.00662

\caption{\baselineskip=10pt Estimation results for the final epidemic size for 40 countries. Grey dots (\textcolor{gray}{$\bullet$}) represent the cumulative numbers of infected cases for 40 countries on May 14th; red dots (\textcolor{red}{$\bullet$}) and horizontal bars (\textcolor{red}{$-$}) represent the posterior means and $95\%$ credible intervals for the $\theta_1$ of the 40 countries. Vertical red dotted line indicates the $1,760,569$ cases, the posterior mean for the US.}

454 — 2005.00700

\caption{ Four formats (color-coded throughout the paper) commonly used for posing questions and answering them: \bluetext{Extractive (EX)}, \redtext{Abstractive (AB)}, \purpletext{Multiple-Choice (MC)}, and \greentext{Yes/No (YN)}. Sample dataset names are shown in square brackets. We study generalization and transfer across these formats. }

\caption{ The results of a leave-one-out ablation. The first row indicates the performance of \unifiedQA on each dataset it was trained on. Each of the remaining rows excludes one dataset at a time. The rows are sorted based on the last column: the datasets with the biggest contribution appear first. The \redtext{red highlights} indicate the top 3 performance drops in each column. }

455 — 2005.00728

\caption{Gameplay results on CVDN evaluated when agent voluntarily stops or at 80 steps. Full evaluations are highlighted in \colorbox{gray!20}{gray} with the best results in \Be{blue}; the remaining white columns are ablation results.
%\JTi{Note full eval versus ablation demarcations.}
}

456 — 2005.00789

\caption{Model scores of the XLNet-Base (Single-fact) model on original vs.\ transformed datasets. The single-fact model does reasonably well on the original dataset, but its performance on our transformed dataset clearly highlights its poor multi-fact nature.}

\caption{Non-multifact Reasoning: Models can find the answer and the necessary supporting facts without using any meaningful synthesis of information of the supporting facts, i.e., without any interaction between facts. E.g., a \nmf model could identify the blue supporting fact (\bluefact) since it is the only fact mentioning cold war. \textit{Independently}, it could find the correct answer by selecting the only country getting independence with associated time (India) and hence find the red supporting fact (\redfact).}

\caption{Transformation of a question for contrastive support sufficiency evaluation. Top-Left: An original instance, with annotation on the right denoting red (\redfact) and blue (\bluefact) supporting facts. Bottom-Left: Its transformation into a group of 3 instances, one with sufficient context and two with insufficient context, with annotation on the right denoting context sufficiency. Right: Behavior of good vs bad models on original vs transformed dataset. A good \mf model would realize that the potentially relevant facts are not sufficient (do not connect) whereas a bad model would find potentially relevant facts and assume that there is sufficient information.}

457 — 2005.00922

\caption{\textbf{Illustration of Optimization.} {\color[rgb]{0,0,1}Blue}: Stereo depth observations $\mathcal{X}$ of a tracked vehicle observed at different time steps $t$. {\color[rgb]{1,0,0}Red}: Motion potentials enforcing consistent poses $\pose$ between successive frames. {\color[rgb]{0,0.6,0}Green}: Shape potentials ensuring a constant shape $\shape$ of the tracked object along the track.}

458 — 2005.00983

\caption{Comparative detection performance in terms of mean average precision (mAP) and F1-score of the proposed network and existing state-of-the-art approaches. \textcolor{red}{Red} bold indicates the optimal performance using actual HR imagery and \textcolor{blue}{blue} bold indicates the second optimal performance using SR images generated by our proposed network.}

\caption{Super-resolution results of our proposed model using different hyperparameter settings for upscale factor 4x on the aerial test datasets. \textcolor{magenta}{Magenta} bold indicates the optimal SR results generated by our proposed network.}

\caption{Vehicle detection results in terms of mean average precision (mAP) and F1-score of our proposed model using different hyperparameter settings on the aerial test datasets. \textcolor{cyan}{Cyan} bold indicates the second optimal performance using SR images generated by our proposed network.}

459 — 2005.00994

\caption{Schematic of GRAPES-3 air shower array shows placement of single-PMT scintillators (\textcolor[rgb]{0,0,1}{$\blacktriangle$}), double-PMT scintillators (\textcolor[rgb]{1,0,0}{$\blacktriangle$}), and muon telescope modules (\textcolor[rgb]{0,0.6,0}{$\square$}). The area marked by dotted lines represents fiducial area under which the showers are selected for analysis.}

\caption{A sample EAS (Event time: 20140829-00:11:55.3700155\,IST) recorded at GRAPES-3 with 365 triggered detectors showing (a) TOF of particles in the shower disc, and (b) lateral density profile of secondary particles. The shower parameters are estimated to be $\theta$=37.3$^\circ$, $\phi$=61.3$^\circ$, $X_c$=22.3\,m, $Y_c$=5.5\,m, $N_e$=1.2$\times$10$^6$, and $s$=1.4.}

460 — 2005.01016

\caption{Top-level view of Lupulus, with details of a single PE in the top-right corner. The inputs, weights, and outputs are stored in the input buffers (\tikzcircle[vlightblue, fill=vlightblue]{3pt}), in the SPMs of the PEs (\tikzcircle[vlightgreen, fill=vlightgreen]{3pt}), and in the accumulators (\tikzcircle[vlightyellow, fill=vlightyellow]{3pt}), respectively.}

461 — 2005.01220

\caption{\baselineskip 20.5pt %\footnotesize
{ { Cartoon of a close binary of supermassive black holes maintaining one circumbinary disc (CBD) composed of the prograde ($R\lesssim 1.5\,$pc) and the retrograde ($3\,{\rm pc\lesssim {\it R}\lesssim 7\,pc}$) parts in NGC\,1068 observed by ALMA \citep{Impellizzeri2019} and MIDI/VLTI \citep{Raban2009}.}} {\cblue The heights measured by MIDI are $H_{0}/R\sim 0.3$ (inner part) and $\sim 0.6$ (outer part), respectively (see details in the text). $R_{\rm CBD}\sim 0.24\,$pc} is the inner edge of the CBD measured by GRAVITY \citep{Gravity2020}, which is consistent with the dust sublimation radius. The prograde part is the NIR emission region where water maser clouds (green circles) are co-spaced. The retrograde part is the FIR emission region. The red color represents the dusty molecular disc (HCN, HCO$^+$ and CO) radiating in the infrared. The interface regions in purple between the prograde and the retrograde parts are undergoing the Kelvin-Helmholtz instability (KHI), which forms shocks and turbulence and drives the formation of a gap with a width of $\deltaR\approx 0.82\calM H_{0}$. The KHI dissipates the kinetic energy of the counter-rotating disc, giving rise to bremsstrahlung emission observable in radio with a morphology tightly related to the disc shapes. Moreover, shocks in the KHI layer accelerate some electrons to relativistic energies, producing the $\gamma$-ray emission observed by {\it Fermi}-LAT. }

462 — 2005.01308

\caption{ CPU time to obtain four submatrices of the Green's function matrix for B-doped (001)~Si bulk models. The time ratio is evaluated as [calculating time + combining time]/[calculating time for $P=1$]. The calculations were performed on Intel\textregistered~Xeon\textregistered~CPU E5-2680. }

463 — 2005.01322

\caption{We propose that the next generation of voice assistants should be capable of proactive-initiated interactions. We discuss design principles and validate them using a (a) new hardware platform equipped with (b) multi-modal semantic scene understanding and (c) decision making modules. Using the proposed design assumptions, hardware and software, we demonstrate how users can benefit from transforming (d) reactive devices, which typically \textcolor{orange}{only respond to requests} \textcolor{blue}{initiated by the users}, into (e) \textcolor{green}{proactive ones} offering the user the \emph{right information} in the \emph{right way} at the \emph{right time} without being asked.}

464 — 2005.01348

\caption{An example pair from the Winograd Schema Challenge (a) and its perturbation (b). The \textcolor{red}{pronoun} resolves to one of the two referents, depending on the choice of the \underline{discriminatory segment}. The perturbation in (b) pluralizes the referents and the antecedents.}

465 — 2005.01424

\caption{\small $L^2$ errors of FEM ({\protect \tikz{ \draw[line width=1.pt, blue] circle (0.6ex);}}) and LOD ({\protect \tikz{ \draw[line width=1.pt, red] (0,0) rectangle (0.18,0.18);}})}

\caption[]{Left: Diffusion coefficient in Example~1. Right: Values of $\mathcal{J}_H$ in the first 20 iterations of the inversion algorithm, using sparsity patterns based on local matrices ({\protect \tikz{ \draw node[thick, cross, blue, fill]{};}}, dotted) and quasi-local matrices with $\ell = 1$ ({\protect \tikz{ \draw[line width=1.pt, color1] (0,0) rectangle (0.18,0.18);}}), $\ell = 2$ ({\protect \tikz{ \draw node[thick, cross, rotate=45, color2, fill]{};}}), $\ell = 3$ ({\protect \tikz{ \draw[line width=1.pt, color3] circle (0.6ex);}}).}

\caption[]{Cross sections of reconstructed functions with boundary condition $u^0(x_1,x_2) = x_1$ based on local stiffness matrices ({\protect \tikz{ \draw[line width=1.pt, blue, fill] circle (0.6ex);}}, dotted) and quasi-local ones with $\ell = 1$ ({\protect \tikz{ \draw[line width=1.pt, color1, fill] circle (0.6ex);}}), $\ell = 2$ ({\protect \tikz{ \draw[line width=1.pt, color2, fill] circle (0.6ex);}}), $\ell = 3$ ({\protect \tikz{ \draw[line width=1.pt, color3, fill] circle (0.6ex);}}) for Example~1 obtained from full boundary data. The corresponding microscopic FE function ({\protect \tikz{ \draw[line width=1.pt, black, fill] circle (0.6ex);}}, dashed) is depicted as a reference. Left: Cross section at $x_2 = 0.5$. Right: Cross section at $x_1 = 0.5$.}

\caption[]{Cross sections of reconstructed functions with random boundary condition $u^0\in\XH$ based on local stiffness matrices ({\protect \tikz{ \draw[line width=1.pt, blue, fill] circle (0.6ex);}}, dotted) and quasi-local ones with $\ell = 1$ ({\protect \tikz{ \draw[line width=1.pt, color1, fill] circle (0.6ex);}}), $\ell = 2$ ({\protect \tikz{ \draw[line width=1.pt, color2, fill] circle (0.6ex);}}), $\ell = 3$ ({\protect \tikz{ \draw[line width=1.pt, color3, fill] circle (0.6ex);}}) for Example~1 obtained from full boundary data. The corresponding microscopic FE function ({\protect \tikz{ \draw[line width=1.pt, black, fill] circle (0.6ex);}}, dashed) is depicted as a reference. Left: Cross section at $x_2 = 0.5$. Right: Cross section at $x_1 = 0.5$.}

\caption[]{\small Cross sections at $x_2 = 0.5$ of reconstructed functions with homogeneous Dirichlet boundary conditions based on local stiffness matrices ({\protect \tikz{ \draw[line width=1.pt, blue, fill] circle (0.6ex);}}, dotted) and quasi-local ones with $\ell = 1$ ({\protect \tikz{ \draw[line width=1.pt, color1, fill] circle (0.6ex);}}), $\ell = 2$ ({\protect \tikz{ \draw[line width=1.pt, color2, fill] circle (0.6ex);}}), $\ell = 3$ ({\protect \tikz{ \draw[line width=1.pt, color3, fill] circle (0.6ex);}}). The corresponding microscopic FE functions ({\protect \tikz{ \draw[line width=1.pt, black, fill] circle (0.6ex);}}, dashed) are given as a reference but were not part of the input data. Left: Right-hand side $g_1$. Right: Right-hand side $g_2$.}

\caption[]{Left: Diffusion coefficient in Example~2. Right: Values of $\mathcal{J}_H$ in the first 20 iterations of the inversion algorithm based on local matrices ({\protect \tikz{ \draw node[thick, cross, blue, fill]{};}}, dotted) and quasi-local matrices with $\ell = 1$ ({\protect \tikz{ \draw[line width=1.pt, color1] (0,0) rectangle (0.18,0.18);}}), $\ell = 2$ ({\protect \tikz{ \draw node[thick, cross, rotate=45, color2, fill]{};}}), $\ell = 3$ ({\protect \tikz{ \draw[line width=1.pt, color3] circle (0.6ex);}}).}

\caption[]{Cross sections of reconstructed functions with boundary condition $u^0(x_1,x_2) = x_1$ based on local stiffness matrices ({\protect \tikz{ \draw[line width=1.5pt, blue, fill] circle (0.6ex);}}, dotted) and quasi-local ones with $\ell = 1$ ({\protect \tikz{ \draw[line width=1.5pt, color1, fill] circle (0.6ex);}}), $\ell = 2$ ({\protect \tikz{ \draw[line width=1.pt, color2, fill] circle (0.6ex);}}), $\ell = 3$ ({\protect \tikz{ \draw[line width=1.pt, color3, fill] circle (0.6ex);}}) for Example~2 obtained from incomplete boundary data and the \textit{randomized approach}. The corresponding microscopic FE function ({\protect \tikz{ \draw[line width=1.pt, black, fill] circle (0.6ex);}}, dashed) is depicted as a reference but was not part of the input data. Left: Cross section at $x_2 = 0.5$. Right: Cross section at $x_1 = 0.5$.}

\caption[]{Cross sections of reconstructed functions with boundary condition $u^0(x_1,x_2) = x_1$ based on local stiffness matrices ({\protect \tikz{ \draw[line width=1.pt, blue, fill] circle (0.6ex);}}, dotted) and quasi-local ones with $\ell = 1$ ({\protect \tikz{ \draw[line width=1.pt, color1, fill] circle (0.6ex);}}), $\ell = 2$ ({\protect \tikz{ \draw[line width=1.pt, color2, fill] circle (0.6ex);}}), $\ell = 3$ ({\protect \tikz{ \draw[line width=1.pt, color3, fill] circle (0.6ex);}}) for Example~2 obtained from incomplete boundary data and the full-data approach. The corresponding microscopic FE function ({\protect \tikz{ \draw[line width=1.pt, black, fill] circle (0.6ex);}}, dashed) is depicted as a reference but was not part of the input data. Left: Cross section at $x_2 = 0.5$. Right: Cross section at $x_1 = 0.5$.}

466 — 2005.01451

\caption{(Color online) (a) Local, \emph{$\vartriangle$R}$^{L}_{14,65}$(\emph{V}$^{eff}_{g}$), and (b)\,nonlocal,\emph{$\vartriangle$R}$^{nL}_{62,53}$(\emph{V}$^{eff}_{g}$), photoresistances versus the effective gate voltage in comparison with the heating addition to the resistance \emph{$\vartriangle$R}$^\emph{$\vartriangle$T}$(\emph{V}$^{eff}_{g}$). The inset illustrates (\emph{1}) direct transitions and (\emph{2}) transitions caused by Drude absorption.}

\caption{ (Color online) (a) Local, \emph{R}$^{L}_{14,65}$(\emph{V}$^{eff}_{g}$), and nonlocal, \emph{R}$^{nL}_{62,53}$(\emph{V}$^{eff}_{g}$), resistances versus the effective gate voltage. (b)\,Local,\emph{$\vartriangle$R}$^{L}_{14,65}$(\emph{V}$^{eff}_{g}$), and nonlocal, \emph{$\vartriangle$R}$^{nL}_{62,53}$(\emph{V}$^{eff}_{g}$), photoresistances versus the effective gate voltage in comparison with the heating addition to the nonlocal resistance \emph{$\vartriangle$R}$^\emph{$\vartriangle$T}_{62,53}$(\emph{V}$^{eff}_{g}$). (c, d)\,Current distributions in the sample with bulk leakage at measurements in local and nonlocal geometries, respectively. The current lines through the bulk are given in black, whereas the current of edge states is shown in red. We note that the current in the absence of bulk leakage flows only through edge states.}

467 — 2005.01624

\caption{$f$ chooses through \textcolor{red}{$F$} and \textcolor{red}{$F$} tightens \textcolor{blue}{$G$}, i.e. $\textcolor{red}{F} \prec \textcolor{blue}{G}$.\newline Thus $f$ also chooses through \textcolor{blue}{$G$}.}

468 — 2005.01840

\caption{Two generated summaries. Extracted segments are highlighted in \textcolor{teal}{teal}, and delineated with \textbar. Constituents are presented with context, whereas sentences extract all text.}

\caption{System generated summary, extracted constituents in \textcolor{teal}{teal}, and separated by \textbar. }

469 — 2005.01842

\caption{\texttt{JetVLAD} classification performance in purity and rejection for different jet $p_{T}$ ranges for the cross-section weighted ({\color{red} balanced}) datasets with two working points based on efficiencies of $81\%$ and $50\%$, respectively.}

470 — 2005.01932

\caption{Results on relation extraction datasets. For \spouse and \disease, we report 95\% confidence intervals and for \tacred, we follow the evaluation protocol from \citet{zhang2017tacred}. More details in Appendix \ref{sec:appendix}.}

\caption{$\exbert$ matches the performance of the $\noexp$ baseline with 20x less data on \spouse (Left), and with 3x less data on \tacred (Right).}

471 — 2005.02016

\caption{\label{Fig:PCL} Pipe centerline velocities minus $\kappa^{-1}\ln(\Reytau)+C$ versus $\Reytau$ for (a) $\kappa=.42$ , $C=6.84$ and (b) $\kappa=.436$ , $C=7.65$. $\bullet$, Superpipe data corrected according to McKeon; $\circ$, same data without roughness correction; \textcolor{green}{$\circ$}, Superpipe data of \citet{ZS97} with same roughness correction; $\times$, Superpipe NSTAP data of \citet{Hultetal12}; $\blacklozenge$, \citet{PA77}; $\blacktriangle$, \citet{zanoun2007}; $\blacksquare$, \citet{Monty_thesis}; \textcolor{blue}{$\triangle \triangle \triangle$}, CICLoPE data of \citet{FioriniPhD}; \textcolor{blue}{$\blacktriangle \blacktriangle \blacktriangle$}, new CICLoPE data of \citet{NagibAPS,NagibCICLoPE}; \textcolor{Yellow}{$\blacksquare \blacksquare \blacksquare$}, fig. 6 of \citet{Furuichi18}; \textcolor{red}{$\blacksquare$}, the three DNS of \citet{ElKhoury2013} ($\Reytau=999$), \citet{WuMoin08} ($\Reytau=1142$) and \citet{ChinPipe14} ($\Reytau=2003$). \textcolor{red}{$\cdot - \cdot$}, $\pm 0.5\%$ of reference $\hat{U}^+_{\mathrm{CL}}$; - - -, $\pm 10^3/\Reytau$; $\cdot\cdot\cdot$, slope corresponding to $\kappa =0.40$.}

\caption{(color online)\ Model pipe flow profiles $\hat{U}^+ - \hat{U}^+_{\mathrm{CP}}$ from \citet{Monk17} with $\hat{U}^+_{\mathrm{CP}} = (1/0.42) \ln(y^+) + 5.604$. \textcolor{Red}{---}, $\Reytau = 1, 2, 5, 100, 300, 1000 \times 10^3$. \textcolor{LimeGreen}{---, $\blacklozenge \blacklozenge$}, \textcolor{SkyBlue}{---, $\blacktriangle \blacktriangle$}, \textcolor{Violet}{---, $\blacksquare \blacksquare$}, model profiles and hotwire data of \citet{FioriniPhD} for $\Reytau = 14.3, 22.2, 31.0 \times 10^3$. $- \cdot -$ , $\hat{U}^+_{\mathrm{CL}} - \hat{U}^+_{\mathrm{CP}} = 1.24$; - - - , asymptote $[(1/0.384)-(1/0.42)]\,\ln(y^+/500)$ for the deviation of the inner logarithmic part of the profile from $\hat{U}^+_{\mathrm{CP}}$. }

\caption{(color online)\ (a) Higher order term $U^+_{\mathrm{out}, 1}(Y)$ of the outer expansion for the optimal $\kappa = 0.42$, obtained with \textbf{pairs} of DNS from table \ref{TableDNS} : ---, (\#1,\#3); - - -, (\#1,\#4); $-\cdot -$, (\#1,\#2); $-\cdot\cdot -$, (\#2,\#3); \textcolor{Magenta}{$\bullet \bullet \bullet$}, fit by equ. (\ref{Uout1}); Gray : $U^+_{\mathrm{out}, 1}(Y)$ with same profile pairs, but $\kappa = 0.41$. \quad (b) Derivative $\dd U^+_{\mathrm{out}, 1}(Y)/\dd Y$ obtained from the same DNS pairs as in (a); \textcolor{Magenta}{$\bullet \bullet \bullet$}, derivative of equ. (\ref{Uout1}). }

\caption{(color online)\ (a) Higher order term $U^+_{\mathrm{out}, 1}(Y)$ of the outer expansion obtained from three DNS : ---, (\#1,\#2,\#3); \textcolor{Green}{- - -}, (\#1,\#2,\#4); \textcolor{Gray}{$-\cdot -$}, (\#2,\#3,$\Reytau=3000$ of \citealt{TMG14}); \textcolor{Gray}{$-\cdot\cdot -$}, (\#2,\#3,$\Reytau=4179$ of \citealt{LJ14}); \textcolor{Magenta}{$\bullet \bullet \bullet$}, fit by equ. (\ref{Uout1}). \quad (b) $\kappa$ from the same triplets as in fig. (a); \textcolor{Magenta}{$\bullet \bullet \bullet$}, $\kappa = 0.42$. }

\caption{(color online)\ Various channel/duct centerline velocities minus $U^+_{\mathrm{CL}}$ (equ. \ref{UCL}) versus $\Reytau$. \textcolor{Purple}{$\blacksquare$} \textcolor{SkyBlue}{$\blacksquare$} \textcolor{Green}{$\blacksquare$} \textcolor{SpringGreen}{$\blacksquare$}, DNS of table \ref{TableDNS}; \textcolor{Gray}{$\blacklozenge$}, other DNS used in \citet{Monk17}; $\times$, \citet{SchultzFlack2013}; $+$, \citet{ZDN03}. \textcolor{red}{$\cdot - \cdot$}, $\pm 0.2\%$ of $U^+_{\mathrm{CL}}$. $\cdot\cdot\cdot$, slope corresponding to $\kappa =0.396$.}

\caption{(color online)\ $W_0(Y)$ obtained from equ. (\ref{Uouttot}), with $U^+_{\mathrm{out}}(Y)$ approximated by the four DNS profiles of table \ref{TableDNS} (colors as in the table). \textcolor{Magenta}{$\bullet \bullet \bullet$}, fit by equ. (\ref{W0def}); \textcolor{Magenta}{$\cdots$}, leading term $-4.87\,(1-Y)^2$ of Taylor expansion around $Y=1$. }

\caption{(color online)\ (a) The effect of subtracting the fit $\hat{U}^+_{\mathrm{out}, 1}$ (equ. \ref{Uout1}) from the four DNS profiles of table \ref{TableDNS}. $\quad$ (b) solid lines, $Y\,\dd U^+_{\mathrm{DNS}}/\dd Y$ versus $y^+$ for the three highest $\Reytau$ ; $\bullet \bullet \bullet$, solid lines minus $Y$ times derivative of $\hat{U}^+_{\mathrm{out}, 1}$ (equ. \ref{Uout1}) ; \textcolor{Magenta}{---} , $Y$ times derivative of leading order outer velocity (equ. \ref{Uouttot}); \textcolor{Magenta}{- - -}, small-$Y$ contribution $(1/0.42)+1.47\,Y$ to $Y\,\dd U^+/\dd Y$ from equ. (\ref{Uoutexp}); \textcolor{Magenta}{$\cdots$}, the apparent plateau (1/0.384) in fig. 3a of \citet{LM14}; $-\cdot - \cdot -$, $Y$ times derivative of logarithm in equ. (\ref{Uouttot}). }

\caption{(color online)\ First order term $U^+_{\mathrm{in}, 1}(y^+)$ , obtained from differences (\ref{diffin}) of $U^+$-profiles in table \ref{TableDNS}. Profile pairs and line styles as in figure \ref{figUout}a (Green lines up to $Y=0.25$ for the lower $\Reytau$ of the pair, gray lines beyond). \textcolor{Magenta}{$\bullet \bullet \bullet$} , complete fit $\hat{U}^+_{\mathrm{in,1}}$ by equ. \ref{Uin1}; \textcolor{Magenta}{- - -}, linear function $1.47\,y^+ - 340$ matching the linear part of $\hat{U}^+_{\mathrm{out}}(Y \ll 1)$ (equ. \ref{Uoutexp}). Insert : blowup of the origin with \textcolor{Violet}{$\cdots$} , $- (1/2) (y^+)^2$.}

\caption{(color online) Left axis and solid lines : leading order inner velocity $U^+_{\mathrm{in, 0}}$ minus $U^+_{\mathrm{cp, 0}}(y^+)$ obtained from equ. (\ref{Uin0}) and (\ref{Ucpyp}) for the 4 profiles of table \ref{TableDNS}. Right axis and broken lines : leading order inner velocity minus common part equal to $U^+_{\mathrm{DNS}} - \hat{U}^+_{\mathrm{out,0}}$, determined without the $\mathcal{O}(\Reytau^{-1})$ terms in equ. (\ref{Uin0}). \textcolor{LimeGreen}{$\cdots$}, improved Musker profile $\hat{U}^+_{\mathrm{mM}}$ (equ. \ref{mMusker}) without ``hump'', for $\kappa_{\mathrm{M}}=0.398$ and $B_{\mathrm{M}}=4.784$, minus $\hat{U}^+_{\mathrm{cp},0}(y^+)$ (equ. \ref{Ucpyp}); \textcolor{Magenta}{$\cdots$}, $\hat{U}^+_{\mathrm{mM}}-\hat{U}^+_{\mathrm{cp},0}(y^+) - \hat{\Delta}_{\mathrm{log,Ch}}$, including the change in logarithmic slope (equ. \ref{dellog}), and \textcolor{Magenta}{$\bigcirc$}, the breakpoint at $y^+_{\mathrm{break}}=624$.}

\caption{(color online) (a) $U^+_{\mathrm{in, 0}} - \hat{U}^+_{\mathrm{mM}} + \hat{\Delta}_{\mathrm{log,Ch}}$ (equ. \ref{mMusker} and \ref{dellog}) for the 4 profiles of table \ref{TableDNS} (same color scheme); \textcolor{Magenta}{$\bullet \bullet \bullet$}, fit by equ. (\ref{Uin1}) (b) DNS profiles $U^+_{\mathrm{DNS}}$ minus complete composite fit $\hat{U}^+_{\mathrm{comp}}$ up to and including $\mathcal{O}(\Reytau^{-1})$ terms. }

\caption{(color online)\ Various Couette centerline velocities minus $\hat{U}^+_{\mathrm{CL}}$ (equ. \ref{CUCL}) versus $\Reytau$ from different DNS. $\circ$, \citet{Tsukahara2006}; \textcolor{Red}{$\blacksquare$}, \citet{lee_moser_2018}; \textcolor{Orange}{$\blacktriangle$}, \citet{avsarkisov_etal_2014}; \textcolor{Green}{$\blacklozenge$}, \citet{pirozzoli2014}; \textcolor{Blue}{$\bullet$}, \citet{krah2018}. - - -, $(1/0.384)\ln(\Reytau)+3.75$ minus equ. (\ref{CUCL}); $\cdot\cdot\cdot$, $(1/0.481)\ln(\Reytau)+7.01$ minus equ. (\ref{CUCL}).}

\caption{(color online)\, (a)\textcolor{Green}{\textbf{---}}, difference between the $U^+$-profile of \citet{krah2018} for $\Reytau =1026$ and $\hat{U}^+_{\mathrm{CP}}$ (equ. \ref{CUCP}); \textcolor{Magenta}{$\bullet\bullet\bullet$}, outer fit $\hat{U}^+_{\mathrm{out}}(Y)$ (equ. \ref{CUout}) minus $\hat{U}^+_{\mathrm{CP}}$ (equ. \ref{CUCP}). \quad (b) \textcolor{Green}{\textbf{---}}, $U^+_{\mathrm{DNS}}(y^+)$ minus the outer fit $\hat{U}^+_{\mathrm{out}}$ (equ. \ref{CUout}); \textcolor{Aquamarine}{\textbf{- - -}}, $\hat{U}^+_{\mathrm{mM}}(y^+; 0.367, 3.30) + \hat{H}_{\mathrm{NC}}(y^+; 0.38, 1, 34)$ (equs. \ref{mMusker} and \ref{Hump}); \textcolor{Magenta}{$\bullet\bullet\bullet$}, $\hat{U}^+_{\mathrm{in}} - \hat{U}^+_{\mathrm{cp}}$ (equs. \ref{fitUin02} and \ref{CUCP}); - - -, asymptote of $\hat{U}^+_{\mathrm{mM}}(y^+; 0.40, 4.64)$. \newline (c) \textcolor{Green}{\textbf{---}}, $U^+_{\mathrm{DNS}}(y^+)$ minus the composite fit $\hat{U}^+_{\mathrm{in}} + \hat{U}^+_{\mathrm{out}} - \hat{U}^+_{\mathrm{cp}}$.}

\caption{\label{Fig:PipeW0} Pipe analogue to figure \ref{figUoutW} with $W_0$ of equ. (\ref{W0pipe}), for the three pipe DNS of figure \ref{Fig:PCL}: \textcolor{Lavender}{---}, $\Reytau=999$, \textcolor{Aquamarine}{---}, $\Reytau=1142$ and \textcolor{Green}{---}, $\Reytau=2003$. ---, - - -, $- \cdot -$, corresponding tentative linear fits with slopes 4.0, 4.9 and 3.3, respectively.}

\caption{(color online) $\hat{U}^+_{\mathrm{M}} - [\kappa_\mathrm{M}^{-1}\ln(y^+)+B_{\mathrm{M}}]$ for $\kappa_\mathrm{M}=0.396$ and $B_{\mathrm{M}}=4.717$ ($S=905.86$), with (\textcolor{red}{---}) and without (- - -) subtracting the corrective term in (\ref{mMusker}) (\textcolor{red}{$\cdot \cdot \cdot$}) from $\hat{U}^+_{\mathrm{M}}$. $-\cdot -\cdot$, asymptotic approach of uncorrected $\hat{U}^+_{\mathrm{M}}$ to the log-law.}

472 — 2005.02082

\caption{Illustration for Theorem~\ref{thm:level}. (a) A proper level drawing $\Gamma$ of a graph $G$; the dashed lines are the levels of $\ell$. (b) A \drawing obtained from~$\Gamma$.}

\caption{Illustration for Theorem~\ref{thm:outerplanar}. (a) Combining $\Gamma_{u_1}$, $\Gamma_{u_2}$, $\Gamma_{u_3}$ such that P.\ref{prp:3} is satisfied. (b) A \drawing of the graph $G$ in Fig.~\ref{fi:outerplanar-1} computed by applying the described algorithm.\label{fi:outerplanar-2}}

473 — 2005.02084

\caption{Phase diagram in the (\textit{a}) $Ra-\Gamma$ parameter space for $Pr=10$ and in the (\textit{b}) $Pr-\Gamma$ parameter space for $Ra=10^8$. Black circles (\large$\bullet$) correspond to only zonal flow, red squares (\textcolor{red}{$\blacksquare$}) denote coexistence of zonal flow and convection rolls, and blue diamonds (\textcolor{blue}{$\blacklozenge$}) indicate that only convection roll states are stable. The black hollow circles mark the cases shown in figure \ref{r8ar64}.}

474 — 2005.02093

\caption{\red{Fidelity relative to success probability for preparation of Fock states $|3\rangle$ (a), and $|5\rangle$ (b). Dashed lines represent PNRD detectors with colors ranging from green ($\eta_M = 1$) to blue ($\eta_M = 0.5$). Solid lines represent MSPD detectors with $M=n$ with colors ranging from yellow ($\eta_M = 1$) to red ($\eta_M = 0.5$). The respective efficiencies for all lines are also marked in the figures. }}

\caption{\red{Genuine quantum nongaussianity witnesses $W_{NG}(3)$ for Fock states $|3\rangle$ (a) and $W_{NG}(5)$ for $|5\rangle$ (b). Dashed lines represent PNRD detectors with colors ranging from green ($\eta_M = 1$) to blue ($\eta_M = 0.5$). Solid lines represent MSPD detectors with $M=n$ with colors ranging from yellow ($\eta_M = 1$) to red ($\eta_M = 0.5$). The respective efficiencies for all lines are also marked in the figures. Genuine quantum non-Gaussian states have values of the witness above zero, which is marked by the black dotted line.} }

475 — 2005.02541

\caption{ Parameters of the inner binary in HR\,6819 compared to those of the inner binary in LB-1 \citep{2019Natur.575..618L}. \textcolor{black}{The minimum BH mass for HR\,6819 was derived with $f_{\rm M}-2\sigma = 0.90$ instead of $f_{\rm M}$ itself to obtain a true lower limit.} }

\caption{\label{appfig_ionbalance} \ion{Fe}{iii}$\lambda$\,5156 to \ion{Fe}{ii}$\lambda$\,5169 equivalent-width ratio vs.\ effective temperature ($T_{\rm eff}$). Measurements in theoretical spectra \textcolor{black}{\citep[the nonrotating input grid used by][]{2018A&A...609A.108S}} are shown as lines for various values of $\log g$ (as labeled). The range observed in the inner (narrow line) B star of HR\,6819 is shaded in gray. This diagnostic diagram suggests that $T_{\rm eff}$ lies between 16 and 18\,kK. A more precise value is difficult to give owing to blending with the \ion{Fe}{ii}$\lambda$\,5169 emission line of the Be star (see Fig.~\ref{fig_HR6819_trails}). }

\caption{\label{appfig_IUE} Binned and dereddened IUE spectra (black) of HR\,6819. They are scaled to 55\% to represent the flux contribution from the inner B3\,III component to the total flux; spectra plotted in gray indicate 35-75\% ranges (Sect.\,\ref{subapp_distance}). The theoretical SED for $T_{\rm eff}=16$\,kK and $\log g=3.5$ \citep{2003IAUS..210P.A20C} is overplotted in red, multiplied by $6.2\times10^{-18}$ to account for distance and stellar radius. }

\caption{\label{appfig_HRD}Evolutionary tracks (solid) and isochrones (dashed) from \citet{2012A&A...537A.146E}. Three potential locations of B3\,III stars are marked by filled shapes: 18\,Peg as modeled by \citet[blue filled circle]{2014A&A...566A...7N} as well as by \citet[gray triangle]{2016A&A...591L...6I}, and the generic calibration by \citet[dark gray square]{2010AN....331..349H}, all discussed in Appendix~\ref{app_distance}. The horizontal dash-dotted lines give the respective implied ranges of the intrinsic brightness of the Be star in HR\,6819. The blue and dark gray lines assume that the apparent magnitude of the Be star is the same as that of a B3\,III star as determined by \citet{2014A&A...566A...7N} and \citet{2010AN....331..349H}, respectively. Each pair of lines with the same color indicates a disk excess $\Delta V$ of $-0.3$ and $-0.7$\,mag, respectively.}

476 — 2005.02589

\caption{Illustration of the \emph{x}-coordinate accelerometer signal of each joint for four random Parkinson's disease (\textcolor{red}{red}) and healthy (\textcolor{blue}{blue}) subject trials.}

477 — 2005.02930

\caption{The average bias and coverage for Experiment 1 (heavy-tailed random effects) and Experiment 2 (standard normal random effects) for the methods: RBC ({\color{Orchid}$\bm{\bigcirc}$}), RBC-conv ({\color{blue}$\bm{\bigtriangleup}$}), Copas ({\color{orange}$\bm{\Diamond}$}), CLS ({\color{OliveGreen}$\bm{\Box}$}), and SMA ({\color{red}$\bm{\bigtriangledown}$}). In Experiment 1, RBC had the lowest bias and the highest CP\@. In Experiment 2, RBC-conv had the lowest bias and the highest CP\@.}

\caption{The average bias and coverage for Experiment 3 (several outliers) and Experiment 4 (skewed right) for the methods: RBC ({\color{Orchid}$\bm{\bigcirc}$}), RBC-conv ({\color{blue}$\bm{\bigtriangleup}$}), Copas ({\color{orange}$\bm{\Diamond}$}), CLS ({\color{OliveGreen}$\bm{\Box}$}), and SMA ({\color{red}$\bm{\bigtriangledown}$}). In these experiments, RBC had lower bias and much higher CP than the other methods. This demonstrates that RBC is the most robust to departures from normality. }

478 — 2005.03404

\caption[]{Generation of the interval-map representation according to \citet{Matthaei_Grid_Road_Detection_2013}. Given an extraction path, rectangular intervals of size $w \times l$ are defined (Road elements, REs), each one with an anchor~$\vec{p}_i$ along the path. The underlying grid cells are evaluated by accumulating their states along several extraction lines parallel to the interval direction. Colors correspond to different cell features, e.\,g. freespace (green), lane markings (beige), obstacle types (yellow, orange), or unknown regions (gray).}

\caption[]{Illustration of single- and double-sided occlusion due to a limited field of view or occlusion by another object. Half-circled arrowheads (\begin{tikzpicture}[baseline=-0.25em]\draw[-), thick, tubsGreenMedium100] (0,0)--(0.25,0); \end{tikzpicture}) mark the visible proportions of the occluded sides of the bounding box. The resulting reference point is drawn by a blue circle. The sensor's field of view is indicated in shaded gray.}

479 — 2005.03539

\caption{Upper panel: Phase diagram of the $t-V-V'$ model at half-filling. The phase lines separating the charge density wave (CDW), Luttinger liquid (LL), bond-order (BO), and second charge density wave (CDW-2) phases were determined by Mishra {\it et al.} (Ref. \onlinecite{Mishra11}). The thick dashed line inside the LL phase indicates the main finding of this paper, where the polarization undergoes a discrete change. Along this line the polarization distribution is flat. The maximum of the polarization shifts on either side. Upper panel, upper left inset: polarization distribution for the systems marked by asterisks in the main figure of the upper panel, $V=6$; $V'=2.0, 2.4, 2.59, 2.8, 3.6$. The points are in the phases CDW, LL (below polarization switch), LL (where polarization switch occurs), LL (above polarization switch), BO, respectively. Exact diagonalization calculations with periodic boundary conditions. \textcolor{red}{Upper panel, lower right inset: size scaling exponent of the variance of the polarization. Arrows indicate the four cases shown in Fig. \ref{fig:finite_tvvp}.} Lower panel: heat map of the polarization distribution $P(x)$ as a function of $V'/t$, $V=6.0$. Exact diagonalization calculations with periodic boundary conditions. }

\caption{Polarization distributions as a function of the rescaled coordinate $x/L$ for systems with $t=1,V=6$ and different values of $V'$. For $V'=1.0$ and $V'=6.0$ (CDW-1 and CDW-2 phases, respectively) the polarization distributions exhibit sharp peaks, which sharpen with system size. For the cases $V'=2.4$ and $V'=2.8$, on either side of the polarization switch, there are no sharp peaks, and the distributions exhibit negligible size dependence. \textcolor{red}{The scaling exponent of the variation of the polarization is indicated by arrows in the lower right inset of the upper panel of Fig. \ref{fig:pd_tvvp}.} Exact diagonalization calculations with periodic boundary conditions. }

480 — 2005.03635

\caption{Escape analysis of Jovian Trojans in the L$_4$ and L$_5$ swarms simulated over 4.5 Gyr. Proper elements, semi-major axis ($\Delta a_p$), eccentricity ($e_p$) and sine inclination ($\sin I_p$), are taken from the AstDys database \citep{Knezevic2017AstDysTrojans}. \textcolor{gray}{o} indicates objects that are stable over the simulated time frame. X indicates objects that have at least one particle escaping the population, with their mean respective escape times indicated by colour.}

\caption{Escape analysis of collisional family members located in the L$_4$ Jovian Trojan swarm simulated for $4.5\e{9}$ years. Shown are the instabilities of the reference object. Proper elements, semi-major axis ($\Delta a_p$), eccentricity ($e_p$) and sine inclination ($\sin I_p$), are taken from the AstDys database \citep{Knezevic2017AstDysTrojans}. \textcolor{gray}{o} indicates objects that are stable over the simulated time frame. \textcolor{gray}{x} are unstable background objects. Family membership: Eurybates (1), Hektor (2), 1996 RJ (3), Arkesilaos (4). Black numbers are stable, with colours showing mean particle escape time.}

\caption{Escape analysis of the canonical Eurybates collisional family members identified in \citet{Nesvorny2015AsteroidFamsAIV}, simulated for $4.5\e{9}$ years. Shown is the mean escape time of 126 particles for the object (coloured x). Proper elements, semi-major axis ($\Delta a_p$), eccentricity ($e_p$) and sine inclination ($\sin I_p$), are taken from the AstDys database \citep{Knezevic2017AstDysTrojans}. \textcolor{gray}{o} indicates objects that are stable over the simulated time frame.}

\caption{Escape analysis of collisional family members located in the L$_5$ Jovian Trojan swarm simulated for $4.5\e{9}$ years. Proper elements, delta semi-major axis ($\Delta a_p$), eccentricity ($e_p$) and sine inclination ($\sin I_p$), are taken from the AstDys database \citep{Knezevic2017AstDysTrojans}. \textcolor{gray}{o} indicates objects that are stable over the simulated time frame. \textcolor{gray}{x} are unstable background objects. Numbers indicate collisional family membership: Ennomos (5), 2001 UV$_{209}$ (6). Black numbers are stable, with colours showing mean escape time of 126 particles for the object.}

481 — 2005.03648

\caption{Visual navigation environments: \textit{Open}, \textit{Table}, and \textit{C-Maze}. Agent in {\color{es-blue}blue}. {\color{red}Red sphere} indicates the desired goal.}

482 — 2005.03754

\caption{An example of \textit{unfaithful} output (highlighted in \textcolor{red}{red}); generated by \citet{gehrmann-etal-2018-bottom}.}

483 — 2005.03819

\caption{ \red{Illustration of Query-Support Feature Similarity Mining. }}

\caption{ Comparison of \red{mAP\textsubscript{50}} with LSTD~\cite{lstd} and Repmet~\cite{repmet} under 1-shot setting on task~\RomanNumeralCaps{1} and~\RomanNumeralCaps{2}.}

\caption{ Comparison of \red{AP\textsubscript{50}} with CoAE~\cite{co-ae} on Task~\RomanNumeralCaps{3}.}

484 — 2005.04058

\caption{\label{fig:survey} An overview of the survey that we deployed. The survey is divided into three sections, shown here as a flow diagram. The first section (A) includes consent forms, contact settings, an introduction to the innovations in the survey, and a summary of responses that redirect to the other two survey portions. The main ``Describe a new dataset'' portion of the survey (B) invites participants to describe a real or imagined dataset, and asks them to reflect upon the extent to which they think about the dataset in terms of the six dataset types that we identified. Where participants reply that they at least ``rarely'' think of their data in terms of a given type, they are asked for more details in a specialized Details section of the survey. The final ``Explore alternative'' portion of the survey (C) invites participants to imagine their dataset as the type that they initially thought about the \emph{least}, and fill in the associated Details portion of the survey with this new perspective. As an example, the Tabular Details interface is shown \textcolor{cbSurveyBlue}{(D)}. Participants are encouraged throughout the survey to look up terminology highlighted in red, where participants can edit the terms and suggest alternative definitions in the glossary \textcolor{cbSurveyRed}{(E)}. In some Details sections, participants are asked for a small sample of what they imagine the data to look like, to help ground their thinking \textcolor{cbSurveyPurple}{(F)}. At any point in a Details section \textcolor{cbSurveyGreen}{(G)}, or at the end of most other sections \textcolor{cbSurveyOrange}{(H)}, participants can choose to skip the section to provide targeted critique on the survey itself if the questions have strayed far enough from the participant's mental model.}

485 — 2005.04075

\caption{Additional \aastex\ symbols}

486 — 2005.04096

\caption{Reconfiguration of capability registers (e.g., $c_1$ in the \bluedot of tile $A$) is always consensual (requiring agreement of a majority of tiles $A$, $B$ and $C$). The voter installs the majority decision (here, the read-only region $[p, p+s]$).}

\caption{FPGA resources required by \bluedot (without / with AXI interface).}

487 — 2005.04112

\caption{The basis of the neural network structures.}

488 — 2005.04270

\caption{Comparison of local (\solidrule) and weighted variables (\dashedrule) for (a) charge density (b) electrostatic potential, and (c) filling fraction. The parameters are identical to Fig. 2, and the surface charge density is fixed for (a-c) at 60 $\mu$C/cm$^2$.}

489 — 2005.04383

\caption{\red{For data set \#3 (Khan {\it et al.}), the figure displays the heat map of (a) 115 randomly selected genes and (b) the 115 genes that were selected by the CRDA1 classifier. We note that the CRDA1 classifier obtained a 0\% error rate on all 10 Monte Carlo splits of the data into training and test sets, and outperformed all other classifiers.}}

\caption{Classification results of CRDA variants and their competitors (see \autoref{tab:meth}) for data sets \#1-\#3 in terms of test error rate (TER) and feature selection rates (FSR) \red{reported in percentages}. }

\caption{\red{Test error rates (TER) and feature selection rates (FSR) in percentages, and average computational times (ACT) in seconds are reported for the last six real data sets given in \autoref{table:summary}. Results are averages over $L = 10$ random splits of data into training and test sets.}}

490 — 2005.04400

\caption{Performance results of various VQA algorithms on KoNViD-1k. The data is taken from the references listed in the second column. The last two columns designate whether fine-tuning (column `ft') was performed correctly (green checkmark), or with data leakage (red cross), and whether the test set (column `test') was independent (green checkmark) or tainted (red cross). The approach indicated by \ssymbol{1} was published after the referenced publication and is current state-of-the-art. --.--{}-- indicates unreported values. The numbers in bold font in the last line give the true performance of the method in \varga, much below 0.85 PLCC and SROCC as claimed.}

491 — 2005.04543

\caption{Overview of the replication market we built.\\ {\color{blue}\url{https://www.replicationmarkets.com/}}}

492 — 2005.04611

\caption{\red{Pairwise statistical significance for the results presented in Table \ref{tab:mainresults}, using the sign test across relations. Each cell reports the p-value of the corresponding pair. The improvements achieved by \ret{} and \ora{} are statistically significant (p-value smaller than the alpha level of 0.05).}}

493 — 2005.04737

\caption{Resilience behavior of the reduced-voltage MLP on four studied FPGAs (x-axis: $V_{CCBRAM}$ (V), left y-axis: MLP inference error rate (percentage), right y-axis: BRAM fault rate (per 1Mb)), shown for the \textbf{Masked} [$V_{1st-fault}, V_{min}$) and \textbf{Critical} [$V_{min}, V_{crash}$) regions. \\ + \textcolor{green}{$V_{1st-fault}$}, \textcolor{orange}{$V_{min}$}, and \textcolor{red}{$V_{crash}$} are highlighted. \\ + Among different platforms, a slight variation of the voltage regions and the subsequent significant impact on the fault rate and MLP accuracy in the \textbf{Critical} region can be seen.}

494 — 2005.05000

\caption{Additional \aastex\ symbols}

495 — 2005.05125

\caption{Comparison of different approaches to object shape reconstruction on some examples from {\redwood} dataset. \label{fig:supp_redwood_result}}

\caption{Quantitative evaluation on 86 sequences of \redwood{}. We compare state of the art competitors Pix2Vox~\cite{xie2019pix2vox} and PMO~\cite{2019-lin} with the results at different stages of our multi-view pipeline (code fusion $\xrightarrow{}$ sparse optimization $\xrightarrow{}$ dense optimization). Average code outperforms majority voting. FroDo outperforms all methods when 35 input images are used.}

\caption{Ablation study of estimates after sparse and dense optimization stages on the \redwood dataset. We compare the effect of different energy terms in Eq.~(\ref{eq:total_energy}).}

\caption{Example 3D reconstructions achieved with different approaches on three sample sequences from {\redwood}. In all cases 35 input views were used. \label{fig:redwood_result}}

496 — 2005.05227

\caption{ \textbf{Example {\ObjTables}-formatted spreadsheet for a dataset of human genes and their splice variants.} The first worksheet (\textbf{a}) contains a table of contents for the spreadsheet. The second worksheet (\textbf{b}) describes the schema for the spreadsheet. The schema includes three classes (`Gene,' `Transcript,' and `Location') that interact via three relationships (the gene that codes for each splice variant and the location of each gene and transcript). The classes are encoded into two worksheets -- one for genes and one for splice variants -- by (a) using gene ids to represent the gene that codes for each splice variant and (b) embedding groups of columns for representing the location of each gene and splice variant into the worksheets for genes and transcripts. The data worksheets (\textbf{c}, \textbf{d}) describe the genes and splice variants in the dataset. \textcolor{linkcolor}{Supplementary File~1} is an XLSX version of this example.}

497 — 2005.05233

\caption{(a) Renormalised bands in the electronic band dispersions of Sr$_4$Ru$_3$O$_{10}$ measured with a photon energy of 60 eV in LHP, in the direction parallel to the $\Gamma-M$ line. The blue vertical lines indicate the position and direction in which the cuts have been acquired with respect to the first BZ (orange square in inset). The black arrows point to the band of interest. (b) The orange open circles are peak maxima obtained by fitting MDCs extracted from the corresponding ARPES cut in (a). The black straight dashed line represents the band dispersion of a non-interacting system (the bare band dispersion). (c) A representative momentum distribution curve (MDC) full width at half maximum (FWHM) for the band of interest (the kink in Figure~\textcolor{blue}{3b} in the Main Text) plotted versus the binding energy of the peak. The dashed red curve represents a quadratic fit to the low-energy data. The arrow marks the position of the kink.}

498 — 2005.05276

\caption{Typical cup geometries, each consisting of $m$ points. For reasons of symmetry, it is sufficient to simulate the deformation of a quarter cup segment instead of the full cup. The colors indicate the distance of each point to the reference mesh, consisting of the average coordinates of all good cups. We use a different color scale for each subfigure: 0 \protect\rhotcolorbar \subref{fig:cup:before} 70, \subref{fig:cup:ok} 2, \subref{fig:cup:damaged} 6, \subref{fig:cup:broken} 49.}

\caption{Network architectures used to model \cref{eqn:model}. The $k$-dimensional simulation parameters $\color{inpcol}\mathbf{p}$ constitute the input, whereas the output corresponds to the $d$-dimensional mesh ${\color{outcol}\mathbf{\hat{x}}}({\color{inpcol}\mathbf{p}})$. Both architectures consist of a sequence of layers (number of units in brackets), which are fully connected (F) or partially connected, \ie, pruned ({\color{prucol}P}). We apply a dropout after every inner layer for regularization.}

499 — 2005.05414

\caption{An abstract from the \textit{cs.TLT} dataset, with \colorbox{backgG}{golden} and predicted (\colorbox{backgPc}{correct} and \colorbox{backgPw}{erroneous}) labels.}

500 — 2005.05529

\caption{\emph{Left panel:} Comparison of a sample line profile of the Ca II 8542 \AA~line from the FISS dataset described in Section~\ref{sec:deconvolve_SPSF} before (\textcolor{orange}{orange}) and after (\textcolor{blue}{blue}) convolution with the sPSF (\textcolor{magenta}{magenta}). The \textcolor{magenta}{magenta} curve is the transmission profile used for convolving the orange profile to get the blue one (corresponding to FP 2 of IBIS with R$\sim$45,000); \emph{Central and right panels:} Comparison of a chromospheric quiet Sun region observed with IBIS with low spectral resolution (R$\sim$50,000) in the central panel and with high spectral resolution (R$\sim$200,000) in the right panel.}

\caption{\emph{Top left:} Continuum image/ Line core intensity of Ca II 8542 {\AA} of the FISS dataset used for the experiment in Sections \ref{subsec:deconvolve_FISS} and \ref{sec:recover_multiplex_profiles}; \emph{Bottom left:} A sample profile (coordinates [4, 21] in our datacube, green cross in top left panel) shown before convolution with wide SPSF (\textcolor{blue}{blue line}), after the convolution (\textcolor{green}{green}) and after the deconvolution with the CNN (black); \emph{Top right:} Comparison of the line core intensity recovered with the algorithm -- true line profile (black line is the one-to-one line). The approach for measuring line intensity and width are described in section~\ref{subsec:deconvolve_FISS}. \emph{Bottom right:} Same as the previous panel but for the line core width of the Ca II 8542 \AA~line.}

\caption{\emph{Left panel}: Transmission profile of the components of the IBIS Instrument centered around 8542 {\AA}. The transmission profiles of the two Fabry-P\'erot etalons are in \textcolor{red}{red} and \textcolor{blue}{blue}; the 8542 {\AA} prefilter profile is the black line; the effective transmission profile of the instrument is presented in \textcolor{olive}{green}; \emph{Central panel}: Histogram of the wing intensity in the data sets with single FP (low R) and both FPs (high R) in the optical system; \emph{Right panel}: The result of the deconvolution algorithm applied to real IBIS data (same as top right panel in Figure~\ref{fig:SPSF-deconvolve-FISS}) for the line core intensity.}

\caption{\emph{Top left}: The spectral transmission profile (\textcolor{magenta}{magenta}, scaled by 9 for best representation) applied to the spectral profiles, overlaid on the average Ca II 8542~{\AA} line. \emph{Top right}: Sample spectral profile from the multiplexing (\textcolor{red}{red}), the retrieved spectral profile (\textcolor{green}{green}) and the original spectral profile (\textcolor{blue}{blue}); \emph{Bottom left}: Retrieved line core intensity from this approach vs the original line core intensity; \emph{Bottom right}: Maximum Likelihood Intrinsic Dimensionality Estimate for the original FISS data, the multiplexed FISS data and the data convolved with the wide sPSF in Section~\ref{subsec:deconvolve_FISS}.}

501 — 2005.05531

\caption{Time for $\mathcal{D}$ to pre-process 1 GB data with quad-core CPUs. When $s$ is tuned to around 50, $\mathcal{D}$ achieves the optimal time. Note this pre-processing time is proportional to the file size. \red{In the case of $s = 50$, pre-processing speed is around 35.31 MB/s}.}

\caption{Time for $\mathcal{S}$ to generate a proof, $k$ = 300 (i.e., 95\% conf. when 1\% data corrupted). Time for $\mathbb{Z}_{p}$ operations peaks when $s$ is around 50. \red{Yet this amount of time for $\mathbb{Z}_{p}$ still plays only a minor role.} By and large, ECC operations dominate the running time.}

\caption{\red{Time for generating a proof increases as the probability of storage guarantees goes up (when there is 1\% data corruption). The dotted line shows the proving process without any on-chain privacy guarantees whilst the solid line shows the opposite.}}

\caption{Annual growth of blockchain size and time for processing all contracts on each $\mathcal{D}$. \red{The major contributing factor for both is the user base, which could be in the thousands}. \red{Note that the latter is directly affected by the \# of $\mathcal{D}$ (typically dozens) with data on each $\mathcal{S}$.}}

502 — 2005.05623

\caption{Examples of multi-labeled images of our custom dataset. The first row shows the original image. Second row shows the CAMs for the original label whereas third row shows correct (\color{green}green\color{black}) and incorrect (\color{red}red\color{black}) generated labels with their CAMs}

\caption{Examples of the Places365 dataset. The first row shows the original image. Second row shows the CAMs for the original label whereas third row shows correct (\color{green}green\color{black}) and incorrect (\color{red}red\color{black}) generated labels with their CAMs}

503 — 2005.05725

\caption{\textbf{(A-C) Problem identification, and related research question.} The limited \textcolor{orange}{nerve conduction velocity in organic tissue} \citep{more_scaling_2010} \ballnumber{2} presents a significant hazard in legged locomotion. \textcolor{OliveGreen}{Local neuromuscular strategies} \ballnumber{6} provide an \textcolor{OliveGreen}{alternative means} of \textbf{timely and tunable force and power production}. Actuators like the indicated \textcolor{red}{knee extensor muscle} keep the leg extended during stance phase (muscle length L\textsubscript{muscle}) by producing the appropriate amount of muscle force (F\textsubscript{muscle}), correctly \textbf{timed}. \textcolor{orange}{Neuromuscular control} \ballnumber{1} plays a major role in initiating and producing these active muscle forces, but works best only during unperturbed locomotion. \textcolor{orange}{Sensor information} from foot contact travels via \textcolor{orange}{nerve bundles} \ballnumber{2} to the spinal cord, but with \textbf{significant time delays} in the range of $t=\SI{40}{ms}$ \citep[for \SI{1}{m} leg length]{more_scaling_2018} and more. Hence, the locomotion control system can become \textbf{`sensor blind' due to conduction delays, for half a stance phase}, and can miss unexpected perturbations like the depicted step-down. During step-down perturbations \ballnumber{3} additional energy \ballnumber{4} is inserted into the system. Viscous damper-like mechanisms \textbf{produce velocity dependent counter-forces, and can dissipate kinetic energy}. \textcolor{OliveGreen}{Local neuromuscular strategies} \ballnumber{6} producing tunable, viscous damping forces would \textbf{act instantaneously and adaptively}.
Such strategies \ballnumber{6} could also be robust to uncontrolled and harsh impacts of the foot after perturbations \ballnumber{5}, better than \textcolor{orange}{sensor-based strategies.} In this work \textbf{(D)}, we are testing and characterizing spring-damper configurations mounted to a two-segment leg structure, during rapid- and slow-drop experiments, for their feasibility to \textcolor{OliveGreen}{physically} and instantaneously produce tunable, speed-dependent forces extending the leg. Work loops \textbf{(E)} will indicate how much effective negative work is dissipated, between touch-down and mid-stance. Prior to impact \ballnumber{7} and during the leg loading \ballnumber{8} the spring-damper's tendons act equally. \mrka{Starting at mid-stance, the main spring extends the knee, leading to leg extension and leaving the damper's tendon slack \ballnumber{9}}.}

\caption{\textbf{Higher energy dissipation with a different model of the hydraulic damper (1210M):} Vertical GRF vs.\,leg length change, a 2-DOF leg with a parallel damper and spring drops onto the force sensor. Two damper orifice settings were tested (blue, red curves). The two resulting curves are compared with the spring-only configuration, provided as reference.}

504 — 2005.05822

\caption{Time evolution of the Lyapunov function $V(x(k))$ (solid {\color{red} red} line) and its components $V_1$ (dashed {\color{blue} blue} line), and $V_2$ (black dotted line), along the trajectories of the LPV discrete-time switched system}

505 — 2005.05859

\caption{\textbf{MAdds vs. Accuracy} trade-off curves\protect\footnotemark comparing \ourmethod{} with existing architectures on a diverse set of datasets. The datasets are arranged in ascending order of training set size. Methods shown in the legend pre-train on ImageNet and fine-tune the weights on the target dataset. Methods in {\color{blue}blue} train from scratch or use external training data.\label{fig:dataset_anno}}

506 — 2005.05887

\caption{2D sound maps as the length is varied, at $Re_c$ = 350,000 and $\alpha_g=10^\circ$. (a--d) show the frequency band $f_c$ = 1~kHz and (e--h) show the frequency band $f_c$ = 8~kHz. (\protect\begin{tikzpicture} \protect\tikz[baseline=-2pt] \protect\draw[dash dot,very thick,color=lightgray] (0.0,0.1) -- (0.575,0.1); \protect\end{tikzpicture}) indicates the jet nozzle, (\protect\begin{tikzpicture} \protect\tikz[baseline=-2pt] \protect\draw[dashed,very thick,color=black] (0.0,0.1) -- (0.575,0.1); \protect\end{tikzpicture}) indicates the aerofoil, (\protect\begin{tikzpicture} \protect\tikz[baseline=-2pt] \protect\draw[solid,very thick,color=red] (0.0,0.1) -- (0.575,0.1); \protect\end{tikzpicture}) indicates the interrogation region for acoustic spectra.}

\caption{2D sound maps as the width is varied, at $Re_c$ = 350,000 and $\alpha_g=10^\circ$. (a--d) show the frequency band $f_c$ = 1~kHz and (e--h) show the frequency band $f_c$ = 8~kHz. (\protect\begin{tikzpicture} \protect\tikz[baseline=-2pt] \protect\draw[dash dot,very thick,color=lightgray] (0.0,0.1) -- (0.575,0.1); \protect\end{tikzpicture}) indicates the jet nozzle, (\protect\begin{tikzpicture} \protect\tikz[baseline=-2pt] \protect\draw[dashed,very thick,color=black] (0.0,0.1) -- (0.575,0.1); \protect\end{tikzpicture}) indicates the aerofoil, (\protect\begin{tikzpicture} \protect\tikz[baseline=-2pt] \protect\draw[solid,very thick,color=red] (0.0,0.1) -- (0.575,0.1); \protect\end{tikzpicture}) indicates the interrogation region for acoustic spectra.}

\caption{2D sound maps as the inter-spacing is varied, at $Re_c$ = 350,000 and $\alpha_g=10^\circ$. (a--d) show the frequency band $f_c$ = 1~kHz and (e--h) show the frequency band $f_c$ = 8~kHz. (\protect\begin{tikzpicture} \protect\tikz[baseline=-2pt] \protect\draw[dash dot,very thick,color=lightgray] (0.0,0.1) -- (0.575,0.1); \protect\end{tikzpicture}) indicates the jet nozzle, (\protect\begin{tikzpicture} \protect\tikz[baseline=-2pt] \protect\draw[dashed,very thick,color=black] (0.0,0.1) -- (0.575,0.1); \protect\end{tikzpicture}) indicates the aerofoil, (\protect\begin{tikzpicture} \protect\tikz[baseline=-2pt] \protect\draw[solid,very thick,color=red] (0.0,0.1) -- (0.575,0.1); \protect\end{tikzpicture}) indicates the interrogation region for acoustic spectra.}

507 — 2005.05936

\caption{Air quality nodes at IIIT-H main gate: Airveda on the top and IIIT node below.}

508 — 2005.06114

\caption{(i) is a conversation sampled from our \decModel(8.3B) model with \nt{(ii)} corresponding to the reference material from speaker B (`\emph{n/a}' indicates no parent comment for the given reference comment). Reference material for speaker A was not included for brevity. The first three turns in (i) are from a real conversation had within the validation set. The last three turns (\emph{italicized}) were generated by sampling turns sequentially, alternating the target speaker between B and A. \colorbox{yellow}{Yellow highlights} indicate where the model appropriately transferred a part of style between references and generations for speaker B (e.g. `i' v. `I'), \cfbox{blue}{boxed words} indicate a transfer of content for speaker B, and \colorbox{lightgreen}{green highlights} indicate consistent style for turns from speaker A (e.g. `3DS' v. `3ds').}

509 — 2005.06221

\caption{Multi-wavelength view of active region NOAA 12371 on 2015 June 22. (a) White light image of the active region showing configurations of leading and trailing sunspot groups, which are shown by dotted boxes. (b) HMI line-of-sight (LOS) magnetogram showing the photospheric magnetic structure of the active region. The flare under investigation primarily originated in the trailing part of the active region. (c) AIA 94 \AA\ image of the pre-flare phase showing the hot core region where the M6.6 class flare occurred. (d) AIA 171 \AA\ image showing high coronal loops that lie over the sunspot groups. (e) AIA 304 \AA\ image showing faint filament structure in the chromospheric level. (f) BBSO H$\alpha$ image showing clear filament channel above polarity inversion line (PIL). Comparison of panels (c), (e), and (f) reveals that a filament exists in the chromosphere underneath the hot EUV channel. \label{fig:overview}}

\caption{Panel (a): GOES soft X-ray flux in 1--8 \AA\ and 0.5--4 \AA\ channels from 16:00 UT to 23:00 UT on 2015 June 22. We find two stages in the pre-flare phase that peak at 16:45 UT (marked as P1) and 17:26 UT (marked as P2), respectively. We also observe dual flare-peak structure in the main phase of the M6.6 flare, indicated as F1 and F2 at 18:00 UT and 18:13 UT, respectively. Panel (b): AIA light curves normalized by peak intensity of respective AIA filters. For clear view, light curves have been scaled by factors of 0.55 and 0.8 for 94 \AA\ and 304 \AA\ channels, respectively. The peak P2 in GOES soft X-ray light curves in the pre-flare phase corresponds to a peak in AIA light curves, which is shown by dotted line. We readily observe that the structure of the active region shows significant changes during the course of the flare. As a comparison between pre- and post-flare phases, we plot the active region corona in AIA 94 \AA\ (cf. panels (c) and (d)), 304 \AA\ (cf. panels (e) and (f)), and in 171 \AA\ (cf. panels (g) and (h)).\\ (An animation of this figure showing the temporal evolution of the flare is available in the online material.)}

\caption{Temporal evolution of X-ray count rates observed by RHESSI from 16:29 UT to 18:35 UT in energy bands of 3--6, 6--12, 12--25, and 25--50 keV with a time cadence of 4 s. GOES SXR light curves in 1--8 \AA\ and 0.5--4 \AA\ channels are also shown by dashed and solid lines, respectively. The hatched regions denote unavailability of solar X-ray data due to RHESSI night (N) and South Atlantic Anomaly (SAA). Different attenuator states (A0, A1, and A3) are shown by horizontal bars at the top.}

\caption{Running difference images of LASCO C2 (panels (a) and (b)) and C3 (panel (c)) coronagraph. Panel (a) shows first detection of CME in C2 coronagraph. A full disk image of the Sun in AIA 193 \AA\ is overplotted on the coronagraph occulter. The CME was first detected in C3 coronagraph at $\approx$18:54 UT\hspace{0.05cm}\textcolor{blue}{$^{3}$}.}

\caption{Pre-flare phase of M6.6 flare shown in AIA 94 \AA\ and 304 \AA\ image sequences. Panels (a)--(f): Sequence of AIA 94 \AA\ images showing activation and pre-eruption stages of the hot channel (marked by yellow arrows in panels (a), (c), (e), and (f)) and overlying coronal loops (marked by black arrow in panel (e)). Panel (b) shows overplotted co-temporal HMI LOS magnetogram. The positive and negative polarities are shown by red and yellow contours respectively, with contour levels set as $\pm$[500, 800, 1000, 2000] G. The box in panel (e) indicates the field-of-view of the images plotted in Figure \ref{fig:RHESSI_AIA94_pre-flare}. Panels (g)--(l): Simultaneous imaging in the AIA 304 \AA\ channel. A filament structure is shown by white arrow in panel (g). White arrows in panel (j) show the appearance of two brightenings on the two sides of the filament channel. The white arrow in panel (l) shows enhanced brightening from the filament channel.}

\caption{Sequence of RHESSI X-ray images in 5--10 keV (red contours), 10--15 keV (blue contours), and 15--25 keV (yellow contours) overplotted on co-temporal AIA 94 \AA\ images. Panels (a)--(d): sequence of images for first stage (peaked at P1) of the pre-flare phase, where the X-ray sources are observed to be emitted from the overlying coronal loops. Panels (e)--(h): sequence of images for second stage (peaked at P2) of the pre-flare phase. In this period X-ray emissions are observed from the low-lying hot EUV channel below the coronal loops. The X-ray images are reconstructed by the CLEAN algorithm with integration time of 40 seconds. The contours drawn are at 70\%, 80\%, and 90\% of the peak flux in each image.}

\caption{Sequence of AIA 94 \AA\ images showing evolutionary phases of the eruption of hot channel and associated M6.6 flare. Panel (a) shows hot EUV channel (marked by yellow arrow) and overlying coronal loops (marked by white arrow). The erupting front of the hot channel is shown by white arrows in panels (c) and (d). Red arrow in panel (c) shows start of formation of post-flare loops. RHESSI images in 5--10 keV (red contours), 10--15 keV (blue contours), 15--25 keV (yellow contours), 25--50 keV (green contours), and 50--100 keV (black contours) are reconstructed by CLEAN algorithm with integration time of 32 seconds. The contour levels are set as 70\%, 80\%, and 90\% of the peak flux in each image. Panels (g)--(i) show formation of post-flare loop arcades. The red arrow in panel (g) shows the post-flare loops in the northern part of the flaring region, which ultimately converts into dense post-flare loop arcades. Red arrow in panel (h) shows the start of formation of post-flare loops in the southern part of the flaring region. In panel (i), dense post-flare loop arcades in both northern and southern part of the flaring region are indicated by red arrows.}

\caption{Structure of the solar corona and associated active region in EUV (AIA 131 \AA) and UV (AIA 1600 \AA) channels during the peak of the flare. In panel (a), we indicate the erupting hot flux rope structure in 131 \AA\ image by white arrow, while the co-temporal observation in AIA 1600 \AA\ shows conjugate and sheared flare ribbon brightenings, which are shown by red arrows in panel (b).\label{fig:AIA_131_1600}}

\caption{Sequence of AIA 94 \AA\ running difference images showing the directions of eruption of the hot channel (shown by yellow arrows in panels (b) and (e)). The arrow in panel (d) shows the erupting front. We plot a time-slice diagram of the erupting hot channel in panel (g). The direction, from B$_{1}$ to B$_{2}$, through which the time-slice plot is drawn, is shown as a yellow slit in panel (a).}

\caption{Coronal magnetic field lines obtained using NLFFF model of extrapolation are shown in panels (a), (b), and (c). The lower boundary of the extrapolation is the photospheric LOS magnetic field. Panels (a) and (b) show the top and side view of the extrapolated field lines, respectively. The magnetic flux rope (MFR), low lying coronal loops (LLCLs), and high coronal loops (HCLs) are clearly indicated in panel (b). The position of the MFR along the PIL of the active region and the LLCLs are shown in panel (c). AIA 171 \AA\ image of the active region is shown in panel (d) in pre-flare phase (at $\approx$17:25 UT), overplotted with HMI LOS magnetogram. The positive and negative polarities of magnetogram are shown by red and blue contours respectively, with contour levels set as $\pm$[400, 800, 1000, 2000] G. High and low coronal loops (HCL and LLCL, respectively) are shown by white arrows in panel (d). The rectangular box indicates a hot core region, whose enlarged view is shown in AIA 94 \AA\ channel in panel (e) overplotted with RHESSI contours in 5--10 keV (red), 10--15 keV (blue), and 15--25 keV (yellow). The X-ray contours are reconstructed by CLEAN algorithm with integration time of 40 seconds. The contours denote 70\%, 80\%, and 90\% of peak flux in each image. LLCLs are also clearly visible above the hot channel/MFR, which is co-spatial with X-ray sources.}

510 — 2005.06249

\caption{The possible transformation of MLM and PLM, where $w_i$ and $p_i$ represent token and position embeddings. $[M]$ is the special mask token used in MLM. The left side of MLM (a) can be seen as bidirectional AR streams (in \textcolor[RGB]{91,155,213}{blue} and \textcolor[RGB]{255,230,153}{yellow}, respectively) at the right side. For MLM (b) and PLM (c), the left sides are in original order, and the right sides are in permuted order, which are regarded as a unified view.}

511 — 2005.06335

\caption{Additional \aastex\ symbols}

512 — 2005.06377

\caption{Training sample generation by mutation. Mutated text is shown in dark blocks \rule{1.7em}{0.7em}, while original text from the summary is shown in gray blocks \textcolor{gray}{\rule{1.6em}{.7em}}. Sizes are not to scale.}

513 — 2005.06382

\caption{Overview of our Super-Resolution Domain Adaptation Network (SRDA-Net). On the top, the asymmetric multi-task model is depicted, which consists of a Super-Resolution model and a Segmentation model (SRS). During the training phase, a source domain image and a downsampled target domain image are fed to the SRS model. The \textcolor[RGB]{128,0,128}{purple} and \textcolor{red}{red} curved arrows represent the input/output of the source and target domains, respectively. Further, the two-way arrow indicates the data flow involved in the training process. As the figure shows, source images take part in both super-resolution and segmentation training in a supervised manner, while target images participate only in super-resolution training in a supervised manner. On the bottom, the Pixel-level Domain Classifier (PDC) and Output-space Domain Classifier (ODC) are shown. The super-resolved images and the predicted label distributions from SRS are passed to PDC and ODC, respectively. The final SRS is obtained by adversarially training SRS and the two classifiers. During the testing stage, the downsampled test images are fed to the SRS to predict the segmentation maps.}

514 — 2005.06423

\caption{Examples of main species in CNH-98. The {\color[RGB]{24,116,205}{blue}} box represents an example of the \textit{Species} and the {\color{red}{red}} box indicates an example of the \textit{Class}.}

\caption{Evaluation results of models with different components on the CNH-98 dataset. The best records of comparative groups are in \textbf{bold} and the best records of all models with the same depth are in \textbf{bold} and {\color{red}red}. The variants of our methods are also shown in \textbf{bold}.}

\caption{Visualization of attention activation for example. In \textit{Masks of SCA (left)}, the color of regions tends to be {\color{red}red}, indicating the higher activation values, and vice versa, to be {\color{blue}blue}. In \textit{Activation of CA (right)}, the {\color[RGB]{30,144,255}blue} lines refer to the activation of spatial flow via lateral connection $\mathbf{s}_l^{spa}$, while the {\color[RGB]{255,127,36}orange} lines mean the other ones $\mathbf{s}_l^{sem}$.}

515 — 2005.06536

\caption{Illustration of the limitations of appearance-based visual tracking. The first column shows the tracking results, where the \textcolor{green}{green}, \textcolor{blue}{blue}, and \textcolor{red}{red} bounding boxes represent the ground truth, the DiMP tracker, and our TS-RCN tracker, respectively. The second column illustrates the HSV-color visualization of the optical flow. In all three cases, the target optical flow has a different pattern than that of its local background. Row (a) shows dense similar objects (i.e., crabs) as distractors; Row (b) shows confusing background textures as distractors (i.e., the flying drone blends with background buildings); Row (c) shows a target (i.e., soccer ball) with motion blur. This figure is best viewed in PDF format.}

\caption{Optical flow visualization. (a--b): Consecutive video frames of a targeted soccer ball. (c): Color visualization based on the displacement vector's magnitude and direction, using the HSV color space. (d--e): Horizontal and vertical displacement vector fields $d_{u}^{t}$ and $d_{v}^{t}$, respectively, with higher intensity representing positive values.}

516 — 2005.06557

\caption{Confusion matrix for the test dataset. The bulk of the misclassification happens within the regions (marked with a thick border). Outliers, where the classification falls outside the region, are marked in \textcolor{red}{red}.}

517 — 2005.06727

\caption{Profile of the Python code on an Intel\textregistered{} Xeon\textregistered{} system.}

518 — 2005.06728

\caption{Overview of the three stages involved in OD-SGD. Weights ({\color{blue}{$w_{t+1}^{'}$}}) copied at iteration {\color{red}{\textit{i-1}}} are updated at iteration {\color{red}{\textit{i}}} ({\color{blue}{update: ($w_{t+1}^{'}$, $\triangledown{w_{t+1}}$)}}) and used for training at iteration {\color{red}{\textit{i+1}}} with the OD-SGD mechanism. The \textbf{pull} and \textbf{copy} operations in iteration {\color{red}{\textit{i}}} are conducted while the training task of iteration {\color{red}{\textit{i+1}}} is executing, so they can be parallelized.}

519 — 2005.06901

\caption{A full DRTS tree for document: ``\textcolor{blue}{$k_1$:} At least 27 wives of Israeli rabbis have signed a letter urging Jewish women to avoid dating Arab men. \textcolor{blue}{$k_4$:} The letter warns Jewish women that they will suffer if they date Arab men." Red numbers indicate top-down depth-first order traversal of the DRTS skeleton.}

\caption{Discourse representation tree structure examples generated by different models: ``\textcolor{blue}{$k_1$:} At least 27 wives of Israeli rabbis have signed a letter urging Jewish women to avoid dating Arab men. \textcolor{blue}{$k_4$:} The letter warns Jewish women that they will suffer \textcolor{red}{\bf if} they date Arab men." }

520 — 2005.06915

\caption{From left to right: \three, \six, \onehundred. From top to bottom: individual scores distribution (first row), gold scores distribution (second row), and aggregated scores distribution (third row).}

\caption{From left to right: \three, \six, \onehundred; agreement with \politifact and \abc, separated by the vertical dashed line. \label{fig:scale-comparison-ground}}

\caption{Agreement between scales with a breakdown on \politifact statements (first row), and agreement between scales with a breakdown on \abc statements (second row). From left to right: \six vs. \three, \onehundred vs. \three, and \onehundred vs. \six. \label{fig:scale-comparisons}}

\caption{From left to right: \six cut into \three, \onehundred cut into \three, and \onehundred cut into \six (1\% stratified sampling), cuts sorted by decreasing $\alpha$ values. \label{fig:alpha-cuts}}

\caption{From left to right: \three, \six, \onehundred; agreement with ground truth. Aggregation function: median (highlighted by the red diamond). Compare with Figure~\ref{fig:scale-comparison-ground}. \label{fig:alternative-aggregation-median}}

\caption{Agreement with ground truth for merged categories for \politifact. From the left: mean for the three scales \three, \six, \onehundred and then median for the same scales. From top to bottom: three and two resulting categories. The median is highlighted by the red diamond. \label{fig:alternative-aggregation-binned-3}}

\caption{Websites from which workers chose URLs to justify their judgments without considering gold questions for \three, \six, and \onehundred (percentage). Only websites with percentage $\geq 1\%$ are shown. \label{tab:justification-distribution-sheet}}

\caption{Distribution of the ranks in search results for the URLs chosen by workers in \three, \six, and \onehundred (percentage).}

521 — 2005.07327

\caption{Examples of person search results on CUHK-PEDES. We indicate the true/false matching results in \textcolor{green}{green}/\textcolor{red}{red} boxes.}

522 — 2005.07335

\caption{\mycolor{On the left, we show the input image and the corresponding mask. On the right, we visualize a few masks at different layers of the network. Note that, as we move deeper through the network, the masks become blurrier and more uniform. This is expected since the receptive field of the features becomes larger in the deeper layers.}}

\caption{\mycolor{We evaluate the effectiveness of our masking and pre-training strategies by comparing against other alternatives in terms of MSE and HDR-VDP-2 \cite{mantiuk2011hdr}. Here, SConv, GConv, IMask, and FMask refer to standard convolution, gated convolution~\cite{yu2018free}, only masking the input image, and our full feature masking approach, respectively. Moreover, Inp. pre-training and HDR pre-training correspond to our proposed pre-training on inpainting and HDR reconstruction tasks, respectively. }}

\caption{\mycolor{In regions with both saturated and well-exposed content (boundaries of sky and mountain and bright building lights), the response of the invalid saturated areas in standard convolution dominates the feature maps. Therefore, the network cannot properly utilize the content of the valid regions, introducing high frequency checkerboard artifacts (top row) and blurriness and halo (bottom row). Our approach suppresses the features from the saturated content and allows the network to synthesize the image using the well-exposed information.} }

\caption{\mycolor{From left to right, we compare our method against two other masking strategies as well as a pre-training method, and evaluate the effect of patch sampling. Here, GConv, IMask, and FMask refer to gated convolution~\cite{yu2018free}, only masking the input image, and our full feature masking method, respectively. Moreover, Inp. pre-training refers to our proposed pre-training on inpainting task.}}

\caption{\mycolor{Failure cases of our approach. From top to bottom, our method fails to reconstruct the wrinkles on the curtain, introduces textures that are not in the ground truth, and incorrectly reconstructs the building with sky color. Note that, the top two examples are synthetic, but the bottom one is real for which we do not have access to the ground truth image.}}

523 — 2005.07343

\caption{The enhanced results with the number of cycles $K$=1, 2, 3, 4. The \emph{optimal} results selected by the comparator are {\color{red}{d.1}}, {\color{red}{c.2}}, and {\color{red}{b.3}}.}

524 — 2005.07374

\caption{Behavior of an inverted cantilever plate at zero angle of attack, obtained experimentally. Maximum (\textcolor{blue}{$\circ$}), minimum (\textcolor{red}{$\circ$}) and mean ($\bullet$) deflection angle, $\Phi$, for a plate of (a) AR=5 and $\mu=3.03$, (c) AR=2 and $\mu=3.11$ and (d) AR=2 and $\mu=2.62$. (b) Frequency of motion of the plate of AR=5 and $\mu=3.03$. Experimentally measured divergence flow speed (---) and theoretical prediction of divergence flow speed, using \cite{Sader2016b}(- -).}

\caption{Behavior of an inverted cantilever plate at finite angle of attack ($\alpha=\ang{10}$). (a) Superimposed snapshots of the plate throughout its motion, depicting the four dynamical regimes. (b, c) Maximum (\textcolor{blue}{$\circ$}), minimum (\textcolor{red}{$\circ$}) and mean ($\bullet$) deflection angle, $\Phi$, as a function of non-dimensional flow speed, $\kappa$. (d, e) Non-dimensional frequency of motion, $\Tilde{f}$, as a function of non-dimensional flow speed, $\kappa$. The results were obtained experimentally for a plate of AR=5, $\mu=3.0$ and $Re\sim\mathcal{O}(4)$ (a, b and d) and numerically for a two-dimensional inverted cantilever at $\mu=0.5$, $Re=200$ (c and e). The nomenclature employed for the different dynamical regimes is specified in (a, b).}

\caption{Maximum (\textcolor{blue}{$\circ$}), minimum (\textcolor{red}{$\circ$}) and mean ($\bullet$) deflection angle, $\Phi$, measured experimentally for an inverted cantilever plate of $\mathrm{AR}=5$ and $\mu=3.03$ as a function of non-dimensional flow speed, $\kappa$, and angle of attack, $\alpha$ (deg).}

\caption{Critical non-dimensional flow speeds measured experimentally as a function of angle of attack. Flow speed for flow separation, $\kappa_{sep}$ ($\nabla$), start of small-amplitude flapping regime (as defined by equation (\ref{flapcondition})), $\kappa_{upper}$ (\textcolor{brown}{$\triangle$}), end of resonance (as defined by the same equation), $\kappa_{lower}$ (\textcolor{red}{$\square$}), start of deflected regime, $\kappa_{def}$ ($\diamond$) and start of large-amplitude flapping regime (\textcolor{blue}{$\circ$}). }

\caption{Maximum (\textcolor{blue}{$\diamond$}), minimum (\textcolor{red}{$\diamond$}) and mean ($\bullet$) deflection angle, $\Phi$, measured experimentally for an inverted cantilever plate of AR=2 and $\mu=3.11$ as a function of non-dimensional flow speed, $\kappa$, and angle of attack, $\alpha$. Maximum and minimum deflection angle for an inverted cantilever plate of AR=5 ($\circ$).}

525 — 2005.07491

\caption{Schematic view of the $B^{3+}(1s^2)$ + $He$ collision. The \textcolor{red}{$2p_\sigma$} state populated at the first crossing \textcolor{red}{\textbf{A}} tends to remain fixed in space so that it hardly interacts at the second crossing \textcolor{blue}{\textbf{B}}. In this example, a phase difference of $\pi/2$ gives a circular state \cite{Ostrovsky_1991}. \label{fig:B3_model}}

526 — 2005.07522

\caption{An example showing that \textbf{F}ormality \textbf{S}tyle \textbf{T}ransfer (FST) benefits from data augmented via \textbf{f}ormality \textbf{dis}crimination (\textbf{F-Dis}) and \textbf{m}ulti-\textbf{task} transfer (\textbf{M-Task}). The mapping knowledge indicated by the color (\textcolor{newblue}{blue}$\to$\textcolor{newpink}{pink}) in the FST test instance occurs in the pairs augmented by F-Dis and M-Task. F-Dis identifies useful sentence pairs from paraphrased sentence pairs generated by cross-lingual MT, while M-Task utilizes training data from GEC to help formality improvement.}

527 — 2005.07886

\caption{A controversial post $P$ about whether Xiaomi's Mimoji copies Apple's Memoji. These {\color[rgb]{0.275,0.510,0.231} \textbf{Supports}} and {\color[rgb]{0.788,0.008,0.075} \textbf{Refutations}} are to either their respective parent comments or $P$.}

528 — 2005.08081

\caption{Experimental results of machine translation in terms of BLEU. \ssymbol{2} denotes statistically significant results (t-test with $p<0.01$). As a whole, the proposed cross-view decoding with granularity consistent attention significantly improves the baselines. }

\caption{Results of text summarization and image captioning. The \ssymbol{2} is defined similarly. \label{tab:abstractive+imagecaption}}

529 — 2005.08189

\caption{Visualization of proteins: Alzheimer's positives (\protect\marksymbol{square*}{blue}) and negatives (\protect\marksymbol{triangle*}{red}). Best viewed in color.}

530 — 2005.08314

\caption{Content snapshots generated by two models for a \wtq/ \textsc{Dev.}~example. Matched tokens between the question and content snapshots are \textcolor{amaranth}{\underline{underlined}}. }

531 — 2005.08403

\caption{\bf \textcolor{SSECOL}{#1}}

532 — 2005.08455

\caption{ Comparison of different loss functions. Models are trained on {\it \textcolor{blue}{mini-train}} and evaluated on {\it \textcolor{blue}{mini-val}}.}

\caption{ The effectiveness of concurrent softmax during testing. Models are trained on {\it \textcolor{blue}{mini-train}} and evaluated on {\it \textcolor{blue}{mini-val}}.}

\caption{ Comparison of different sampling methods. Models are trained on {\it \textcolor{blue}{mini-train}} and evaluated on {\it \textcolor{blue}{mini-val}}.}

\caption{ The effect of the training scheduler. The $\lambda$ of the soft-balance is set to 0.7. {\it Non-balance I14} denotes the model of epoch 14 trained with the non-balance strategy from ImageNet pretraining. {\it Non-balance S20} denotes the model of epoch 20 trained with the non-balance strategy from scratch. Soft-balance$^*$ means that concurrent softmax is adopted in both the training and testing stages. Models are trained on {\it \textcolor{blue}{full-train}} and evaluated on {\it \textcolor{blue}{full-val}}. }

533 — 2005.08559

\caption[]{Diphoton signals; invariant-mass distributions of $\tau\tau$ in $e^{+}e^{-}\to\tau\tau (\gamma )$ ({\definecolor{tmpclr}{rgb}{0.500,0.000,1.000}{\color{tmpclr}$\diamond$}}), and $\mu\mu$ in $e^{+}e^{-}\to\mu\mu (\gamma )$ ({$\bullet$}) (references in Refs.~\cite{threshold,Z057}). \\[-5mm] }

534 — 2005.08630

\caption{The \textbf{\algorithmname} architecture for lane marker detection. We extend general encoder-decoder architectures by adding successive horizontal reduction modules for end-to-end lane marker detection. Numbers under each block denote spatial resolution and channels. \textbf{(a)} Arrows with \reductionname\ denote a horizontal reduction module of (b). Arrows with \textit{Conv} are output convolutions with $1\times1$ kernels. Dashed arrows denote the global average pooling with a fully connected layer. \textbf{(b)} \reductionname\ is utilized to compress the horizontal representation. $r$ denotes the pooling ratio for the width. The Conv kernel size $k$ is set to 3, except for the last \reductionname\ layer, where it is set to 1.}

\caption{\textbf{Learned representations on decoder and shared \reductionname\ layers$_{1,2,3}$:} We visualize how features are encoded at different depths of our shared \reductionname\ layers after the decoder. For each layer (row), we visualize the first three principal components as RGB values at each spatial location. We observe that the features become more distinctive, adapted to specific locations and disentangled in the later layers.}

535 — 2005.08995

\caption{Stronger positive correlations between accretion rates and star formation rates lead to steeper neighbour galaxy profiles around star-forming galaxies in the \UM{} simulations. We measure the shapes of the neighbour density distributions using a shape parameter to compare the inner ($0.05 < r < 0.316$ Mpc) and outer neighbour counts ($0.316 < r < 2.0$ Mpc; Eq.~\ref{eq:shape_ratio}). The \textcolor{blue}{blue lines} represent the analogues to the star-forming galaxies from the SDSS, and the \textcolor{red}{red lines} represent the analogues to the quiescent SDSS galaxies. The error bars represent the scatter across jackknife samples, and the \textbf{dashed vertical lines} represent $r_\mathrm{split}=0.316$ Mpc used in the shape parameter calculations. In these plots, the neighbour number density includes neighbours with $\log_{10}(M_*/M_\odot) > 9.0$. The top three panels depict different correlation strengths between dark matter accretion rates and SSFR (0\%, 50\%, and 100\% from left to right), and the bottom two panels depict negative correlation strengths (-50\% and -100\% from left to right). The inset table indicates the shape ratio (\S\ref{sec:shape_ratio}) for each panel, which compares the shape parameters (Eq.~\ref{eq:shape_ratio}) for the distributions. In the $\rho=0.0$ case (no correlation), the offset in the neighbour density distributions between star-forming and quiescent hosts is due to the quiescent sample having larger host halo masses.}

536 — 2005.09024

\caption{Distribution of the global lagged coefficients for the one factor model for workload (left) and recovery (right). {\color{myblue}$\boldsymbol{\circ}$} indicates 95\% credible interval. \label{Fig:AlphaGlobU}}

\caption{Distribution of the individual specific lagged coefficients for the workload covariate in the univariate latent factor model. {\color{myblue}$\boldsymbol{\circ}$} indicates 95\% credible interval.\label{Fig:AlphaIndUW}}

\caption{Distribution of the individual specific lagged coefficients for the recovery covariate in the univariate latent factor model. {\color{myblue}$\boldsymbol{\circ}$} indicates 95\% credible interval.\label{Fig:AlphaIndUR}}

\caption{Distribution of the global lagged coefficients for the two factor model for workload (left) and recovery (right). {\color{myblue}$\boldsymbol{\circ}$} indicates 95\% credible interval. \label{Fig:M1}}

\caption{Distribution of the individual specific lagged coefficients for the workload latent factor. {\color{myblue}$\boldsymbol{\circ}$} indicates 95\% credible interval. \label{Fig:M2}}

\caption{Distribution of the individual specific lagged coefficients for the recovery latent factor. {\color{myblue}$\boldsymbol{\circ}$} indicates 95\% credible interval. \label{Fig:M3}}

537 — 2005.09104

\caption{Node/element ratio after the first coarsening, given the average actual agglomerate sizes produced after the first coarsening. The black curve denotes the Greedy algorithm, the green {\textcolor{foliagegreen}{METIS \cite{karypis_metis-unstructured_1995}}} and the blue {\textcolor{matlabblue}{MGridGen \cite{moulitsas_multilevel_2001}}}}

\caption{Average connectivity after the first coarsening, given the average actual agglomerate sizes produced after the first coarsening. The black curve denotes the Greedy algorithm, the green {\textcolor{foliagegreen}{METIS \cite{karypis_metis-unstructured_1995}}} and the blue {\textcolor{matlabblue}{MGridGen \cite{moulitsas_multilevel_2001}}}}

\caption{Solve time given different spatial grid complexities for the size-based coarsening algorithms in 2D, for the problem shown in \fref{fig:2D_eg} meshed with 59k triangles. The different spatial grid complexities were produced by changing the desired agglomerate size for the first coarsening only (some of these desired agglomerate sizes and resulting spatial grid complexities are given in \tref{tab:2D_size}). On the lower grids, the desired agglomerate size was set to 4. The black curve denotes the Greedy algorithm, the green {\textcolor{foliagegreen}{METIS \cite{karypis_metis-unstructured_1995}}} and the blue {\textcolor{matlabblue}{MGridGen \cite{moulitsas_multilevel_2001}}}}

\caption{Solve time given different spatial grid complexities for the size-based coarsening algorithms in 3D, for the problem shown in \fref{fig:3D_eg} meshed with 223k tetrahedrons. The different spatial grid complexities were produced by changing the desired agglomerate size for the first coarsening only (some of these desired agglomerate sizes and resulting spatial grid complexities are given in \tref{tab:3D_size}). On the lower grids, the desired agglomerate size was set to 8. The black curve denotes the Greedy algorithm, the green {\textcolor{foliagegreen}{METIS \cite{karypis_metis-unstructured_1995}}} and the blue {\textcolor{matlabblue}{MGridGen \cite{moulitsas_multilevel_2001}}}}

538 — 2005.09133

\caption{Example of a failure to break two sentences due to a citation (\textcolor{red}{red text}).}

\caption{An exemplar 1-to-2 alignment for clause breaking. The \textcolor{red}{red text} denotes the English clause corresponding to the first Chinese sentence. \textcolor{purple}{Sotagliflozin} is cited once in the English sentence, but repeated in two Chinese sentences.}

\caption{\textcolor{blue}{铂类-紫杉类} and \textcolor{blue}{贝伐珠单抗} were never seen by the baseline model and were translated incorrectly (\textcolor{red}{red text}).}

\caption{\textcolor{blue}{Olaparib} and \textcolor{blue}{bevacizumab} were not seen by the baseline model and were translated incorrectly (\textcolor{red}{red text}).}

539 — 2005.09241

\caption{Histograms of the most frequent answers for examples of question prefixes in \textcolor{blue}{VQA~v2} and \textcolor{green}{VQA-CP}. The training/test distributions in VQA-CP are approximately inverses of one another. This artefact is often exploited to obtain artificially-strong performance, explicitly (\textbf{issue 1}) or implicitly (\textbf{issue 2}). Unfortunately, {in-domain evaluation on VQA~v2} does not reveal this behaviour (\textbf{issue 3}) because it displays a more uniform distribution of answers (additional examples in supp. mat.).\label{figDistribs}}

\caption{Accuracy of the random-image regularizer. A higher weight ($\lambda$) seemingly improves the \textcolor{red}{accuracy on the OOD test set} but the \textcolor{blue}{in-domain accuracy} simultaneously drops. This can be assessed with the proposed held-out validation set (left), while the common practice of retraining on VQA~v2 (right) makes the effect far less obvious. See the supp. material for a breakdown by answer type.\label{figPlotReg}}

\caption{Histograms of the ten most frequent answers for every question prefix in \textcolor{blue}{VQA~v2} and \textcolor{green}{VQA-CP}. The last prefix is empty and is the ``catch all'' default. Stop words are omitted in the figure, making some prefixes appear identical, \eg \textit{What is} and \textit{What is the}. Best viewed electronically with magnification.\label{figDistribsFull}}

540 — 2005.09342

\caption{An example of two strings, \textcolor{red}{\texttt{GCG}} and \textcolor{yellow!70!blue}{\texttt{ATTCGATA}}, occurring in $G(S)$.}

541 — 2005.09387

\caption{ Top: de-reddened MS \kepler\ stars with \mct\ rotation periods, plotted on a \gaia\ CMD. We removed photometric binaries and subgiants from the sample by excluding stars above the dashed lines. Bottom: a zoom-in of the top panel, with stars colored by their gyrochronal age \citep{angus2019}, instead of their rotation period. A general age gradient is visible across the main sequence. Since the \citet{angus2019} relation predicts that the oldest stars in the \mct\ sample are late-G and early-K dwarfs, it is probably under-predicting the ages of late-K and early-M dwarfs.}

\caption{ Top: Rotation period vs effective temperature for stars in the \mct\ sample, colored by the velocity dispersions of stars calculated over a grid in $\log_{10}$(period) and \teff\ (this grid causes the quantized appearance). Black lines show gyrochrones from a gyrochronology model that projects the rotation-color relation of Praesepe to longer rotation periods over time \citep{angus2019}. These gyrochrones do not appear to reflect the evolution of field stars at long rotation periods/old ages because they do not trace lines of constant velocity dispersion. Gyrochrones are plotted at 0.5, 1, 1.5, 2, 2.5, 4 and 4.57 Gyr (Solar age) in both top and bottom panels. Bottom: Same as top panel with rotation period vs {\it mass} \citep[from][]{berger2020}. White lines show gyrochrones from a model that includes mass and age-dependent angular momentum transport between the core and envelope \citep{spada2019}. Qualitatively, these gyrochrones reflect the evolution of field stars at long rotation periods/old ages: they trace lines of constant velocity dispersion by reproducing periods of `stalled' surface rotational evolution for K-dwarfs. The data used to create this figure is available in table \ref{tab:data}. }

\caption{ Velocity dispersions for the entire \kepler\ field divided by the velocity dispersions of stars with measured rotation periods in \mct, as a function of effective temperature. A larger ratio indicates that the overall \kepler\ field is older, on average, than stars in the \mct\ catalog. As this ratio approaches unity the two populations have similar kinematic ages. The large ratio for the hottest stars indicates that G dwarfs become inactive at young ages. This ratio approaches unity at low temperatures, showing that K and early M dwarf rotation periods are measurable over a large range of ages.}

\caption{ Top: rotation period vs. effective temperature for stars in the \mct\ sample, separated into three groups. Blue circles show stars with rotation periods longer than the period gap, orange squares show stars with rotation periods shorter than the gap, but longer than the lower edge of the main rotation period distribution, and green triangles show stars with rotation periods shorter than this lower edge. Stars were separated into these three groups using \citet{angus2019} gyrochronology models, with the scheme shown in the legend. Bottom: the velocities of these groups of stars (in the direction of Galactic latitude, $b$) are shown as a function of rotation period. Only stars cooler than 5000 K are plotted in the bottom panel in order to isolate populations above and below the period gap, which only extends up to temperatures of $\sim$4600 K. The black line indicates the velocity standard deviation as a function of period. }

\caption{ This figure demonstrates the variance in the relationship between \vb\ and \vz\ for stars in the \kepler\ field, based on the GUMS simulation. The panels show a kernel density estimator (KDE) (black solid line) for the \vz\ -- \vb\ residuals of stars in the GUMS simulation at four different Galactic latitudes. Blue dashed lines show Gaussian fits to these KDEs. The distributions are close to Gaussian, with slightly heavy tails. The standard deviations of the Gaussian fits increase with Galactic latitude. This figure illustrates how using \vb\ instead of \vz\ artificially increases velocity dispersion, especially at high latitudes.}

\caption{ Coefficient values for the 7th-order polynomial used to estimate \teff\ from \Gaia\ \gcolor\ color, calibrated in Curtis \etal\ (2020, in prep).}

542 — 2005.09633

\caption{ The color map of the $12'\times 12'$ SDSS image of M64. North is up and east to the left. The image scale bar is shown on the bottom right. The large ellipse represents $a_{25,B}=5\farcm0$ where $a_{25,B}$ is the projected galactocentric distance at which the $B$-band surface brightness is 25 mag arcsec$^{-2}$. The location of the {\it HST}/ACS field is marked in rectangles. The small circle on the {\it HST}/ACS field marks the new globular cluster (M64-GC1) found in this study (see {\color{blue}\bf Figure \ref{fig_fig3}}). }

543 — 2005.09737

\caption{Representative wakes of swept wings. $(a)$ Steady flow with tip vortex \protect\markertwo; $(b)$ unsteady shedding near midspan $\MyDiamond[draw={rgb,255:red,217; green,83; blue,25},line width=0.3mm, fill=white]$; $(c)$ steady flow with midspan structures \protect\markerthree ; $(d)$ and $(e)$ unsteady shedding near wing tip \protect\markerfive; $(f)$ steady flow with streamwise vortices \protect\markerfour. The figures are scaled for visual clarity. }

544 — 2005.09913

\caption{Realtime factor of \gls{SAD} on an [Intel\textsuperscript \textregistered Xeon\textsuperscript \textregistered CPU E3-1240 v6 @ 3.70GHz, 8GB RAM]. }

545 — 2005.10040

\caption{For uniform $p_\mathbf{x}$, performance of \bluedashedline~US; \bluesolidline~US-LW; \reddashedline~IVR; \redsolidline~IVR-LW.}

\caption{For Gaussian $p_\mathbf{x}$, performance of \bluedashedline~US-IW; \bluesolidline~US-LW; \reddashedline~IVR-IW; \redsolidline~IVR-LW.}

\caption{For uniform $p_\mathbf{x}$, performance of \bluedashedline~US; \bluesolidline~US-LW; \reddashedline~IVR; \redsolidline~IVR-LW.}

\caption{For Gaussian $p_\mathbf{x}$, performance of \bluedashedline~US-IW; \bluesolidline~US-LW; \reddashedline~IVR-IW; \redsolidline~IVR-LW.}

546 — 2005.10091

\caption[]{\textbf{Identifying most relevant classes.} {\color{darkgreen}Success}/{\color{red}collision}/{\color{brown}timeout} percentages on the test environment (Town 02 Test Weather) of the CARLA NoCrash benchmark. For this ablation study, we use ground truth segmentation as inputs to the behavior cloning agent. Reduction from fourteen to seven or six classes leads to a slight increase in success rate, but further reduction to five classes leads to a large number of failures.}

\caption[]{\textbf{Comparing visual abstractions as annotation quantity is reduced.} {\color{darkgreen}Success}/{\color{red}collision}/{\color{brown}timeout} percentages on the test environment (Town 02 Test Weather). Mean over 5 random training seeds. Performance remains consistent with 6400 or 1600 annotated images, with a slight drop as the training dataset for the visual abstraction is reduced to 400 images.}

547 — 2005.10173

\caption{Observed ECG segments (black lines), $FMM_{ecg}$ fits (blue lines) and fiducial marks for $R$ wave ($\bullet$), $T$ wave ({\color{green}$\star$}), $P$ wave ({\color{red}$+$}); for (a) NORMAL, (b) PACE, (c) RBBB, (d) APC, (e) PVC and (f) NOISY patterns.}

548 — 2005.10192

\caption{Illustration of predictors for different scenarios along the equilibrium path. $\circ$ and \textbullet \; represent the predicted and actual solutions, respectively. The direction of the predicted solution is denoted with a thick red dashed line.}

\caption{Hinged-clamped 215$^\circ$ arch: deformed shapes at eight different load steps. \mydashedline \, : original configuration and \mysolidline \, : deformed configuration.}

\caption{Semi-circular arch: deformed shapes at different load steps for symmetric loading. \mydashedline \, : original configuration and \mysolidline \, : deformed configuration.}

\caption{Semi-circular arch: deformed shapes at different load steps for asymmetric loading. \mydashedline \, : original configuration and \mysolidline \, : deformed configuration.}

549 — 2005.10413

\caption[]{Illustration of the five SFCs on a 3-D \texttt{mesh} topology. The curves start on the bottom left corner of the topology and proceed along the lines in the \textcolor{red}{red}-\textcolor{orange}{orange}-\textcolor{yellow}{yellow}-\textcolor{olive}{olive}-\textcolor{green}{green}-\textcolor{blue}{blue} order.}

\caption{Rectangles represent actions, while parallelograms represent information that actions need as input or produce as output. The shape colors represent different types of workflow steps: \textcolor{flowred}{red} steps relate to applications; \textcolor{flowblue}{blue} steps denote mapping-related activities; the \textcolor{floworange}{orange} step concerns machine topologies; and \textcolor{flowgreen}{green} steps indicate simulation, performance evaluation, and analysis.}

\caption[]{ Dilation for all applications, mappings, and topologies. Topologies are shown on the $X$ axis. The dilation values are shown on the $Y$ axis. Each mapping is represented with a distinct color. The \textcolor{sweepgreen}{green} horizontal line denotes the dilation of the \textcolor{sweepgreen}{\texttt{sweep}} mapping, which we consider as the default mapping on a given topology. Circles $\circ$ denote mappings based on \cmcount, while triangles $\triangledown$ indicate mappings based on \cmsize.}

\caption[]{ Simulated parallel execution cost and parallel communication cost for all applications, mappings, and topologies. Topologies are shown on the $X$ axis. The lower range on the $Y$ axis shows the MPI point-to-point cost while the upper range on the $Y$ axis shows the parallel cost. Each mapping is represented with a distinct color. Circles $\circ$ denote mappings based on \cmcount, while triangles $\triangledown$ indicate mappings based on \cmsize. The \textcolor{sweepgreen}{green} horizontal lines delineate the performance achieved by \textcolor{sweepgreen}{\texttt{sweep}} for both cost metrics. }

\caption[]{ Communication model time for all applications, mappings, and topologies. Topologies are shown on the $X$ axis. The $Y$ axis shows the sum of the transmission time over the transport layer according to the \ncdr model. The distinct colors represent the mappings. Circles $\circ$ denote mappings based on \cmcount, while triangles $\triangledown$ indicate mappings based on \cmsize. The \textcolor{sweepgreen}{green} horizontal lines delineate the performance achieved by \textcolor{sweepgreen}{\texttt{sweep}}. }

550 — 2005.10612

\caption{On the left, a user with a HoloLens navigating a network shown on a shared display, moving their head from left to right. On the right, their personal view in the HoloLens at the start (top) and end (bottom) of their movement. The augmented content consists only of the white visuals connecting the headset center of view (cursor) to a link of the network on the shared display (\tslide\ shown, red marks added for illustration).}

\caption{Schematic representations of the two interactive metaphors and their visual variations tested in our experiments (the BaseLine technique is not shown). Visuals in {\bf \textcolor[RGB]{89, 114, 215}{blue}} indicate content rendered inside the AR headset and {\bf black} visuals indicate content on the shared display.}

551 — 2005.10722

\caption{Gas morphology for the simulation when it reaches a quasi-steady state at \textcolor{red}{T=XX} orbits. The left panel shows the entire gas disc, highlighting the structures of the outer region ($r>90$ au), while the right panel shows a zoom-in of the inner region ($r<90$ au). The disc is seen face-on while the binary, located in the cavity, is perpendicular to the disc.}

552 — 2005.10737

\caption{Instantaneous coupling efficiency of three observations: HD~113496 (\textcolor{dred}{\textit{Red}}) with 400 AO modes correction, HD~89758 with 300 AO modes (\textcolor{dgreen}{\textit{Green}}), and HD~89758 with 153 AO modes (\textcolor{dpurp}{\textit{Purple}}). Shaded regions indicate uncertainty in the instantaneous measurement.}

553 — 2005.10857

\caption{Left: Summary of all absolute frequency measurements of the 5s$^2$~$^1$S$_0$ to 5s5p~$^3$P$_0$ clock transition in $^{87}$Sr since the CIPM first recommended its value in 2006. Measurements were recorded at JILA (\protect\tikz\fill[color={rgb:red,.36;green,1;blue,0.0}] (0,0) circle (.6ex);) \cite{Boyd2007b, Campbell2008}, University of Tokyo (\protect\tikz\fill[color={rgb:red,.5;green,0;blue,0}] (0,0) circle (.6ex);) \cite{Hong2009}, SYRTE (\protect\tikz\fill[color=yellow] (0,0) circle (.6ex);) \cite{Baillard2007, LeTargat2013, Lodewyck2016}, PTB (\protect\tikz\fill[color={rgb:red,0;green,.73;blue,0}] (0,0) circle (.6ex);) \cite{Falke2011b, Falke2014, Grotti2018, Koller2017}, NICT \cite{Matsubara2009, Yamaguchi2011, Hachisu2016, Hachisu2017} (\protect\tikz\fill[color={rgb:red,.27;green,0;blue,0.63}] (0,0) circle (.6ex);), NMIJ \cite{Akamatsu2014, Tanabe2015} (\protect\tikz\fill[color={rgb:red,0;green,.67;blue,0.6}] (0,0) circle (.6ex);), NIM \cite{Lin2015} (\protect\tikz\fill[color={rgb:red,0;green,.46;blue,0.87}] (0,0) circle (.6ex);), and NPL (\protect\tikz\fill[color=black] (0,0) circle (.6ex);). Also shown is the updated value for the transition frequency as recommended by the CIPM in 2017 (blue-shaded region) \cite{Riehle2018}. Right: Contribution from the systematic uncertainty of the strontium clocks -- neglecting gravitational redshift -- to the total uncertainty of each absolute frequency measurement. \label{fig:history}}

554 — 2005.10964

\caption{ Mean-flow profiles of both the $M_j = 0.4$ LES and RANS, where the RANS simulation was tuned to best match the LES mean flow. (a) presents the streamwise mean velocity at three radial locations, $r/D =$ {\solidrule} 0.25, {\dashedrule} 0.5, {\dottedrule} 1, versus streamwise distance from the nozzle, while (b) gives the streamwise mean velocity at three streamwise locations, $x/D =$ {\solidrule} 0.5, {\dashedrule} 5, {\dottedrule} 10, versus radial distance.}

555 — 2005.11362

\caption{Recurrent CNNs trained with backpropagation through time (\textcolor{BPTT}{BPTT}) have unstable dynamics and forget task information. This pathology is corrected by our \textit{Lipschitz Coefficient Penalty} (LCP). \textbf{(a)} Visualization of horizontal gated unit (hGRU) state spaces. Models were trained on Pathfinder-14, and state spaces were visualized by projecting hidden states onto each model's top-two eigenvectors. Grey dots are the 2D-histogram of projected hidden states, red contours are hidden state densities up to the task-optimized $N$ steps, and blue contours are hidden state densities beyond that point ($t>N$, for 40 steps). Exemplar dynamics for a single image are plotted in yellow. While dynamics of the \textcolor{BPTT}{BPTT}-trained model diverge when $t>N$, models trained with LCP did not. We refer to the learning algorithms of LCP-trained models as ``\textcolor{CBPTT}{contractor-BPTT}'' (\textcolor{CBPTT}{C-BPTT}) and ``\textcolor{CRBP}{contractor-RBP}'' (\textcolor{CRBP}{C-RBP}). \textbf{(b)} Model dynamics are reflected in their performance on Pathfinder-14. Segmentations evolve over time, as depicted by the colormap. While the \textcolor{BPTT}{BPTT}-trained hGRU is accurate at $N$ steps (red box), it fails when asked to process for longer ($t=T=40$, blue box). \textbf{(c)} Two-sample KS-tests indicate that the distance in state space between $t=N$ and $t=T$ hidden states is significantly greater for an hGRU trained with \textcolor{BPTT}{BPTT} than an hGRU trained with \textcolor{CBPTT}{C-BPTT} or \textcolor{CRBP}{C-RBP} (n.s.$\ =\ $ not significant).}

\caption{Enforcing contraction in recurrent CNNs improves their performance and parameter efficiency, and enables our constant-memory \textcolor{CRBP}{C-RBP} learning algorithm. \textbf{(a)} hGRU models were trained and tested on different versions of \textit{Pathfinder}. Only the version trained with \textcolor{CRBP}{C-RBP}, trained for 20 steps, maintained high performance across the three datasets. \textbf{(b)} \textcolor{CRBP}{C-RBP} models can rely on recurrent processing rather than spatially broad kernels to solve long-range spatial dependencies. \textcolor{BPTT}{BPTT}-trained models cannot practically do this due to their linear memory complexity. \textbf{(c)} LCP improves the stability of hGRU dynamics and, as a result, the generalization of learned visual routines for contour integration. Models were trained on Pathfinder-14, and tested on all three Pathfinder datasets. hGRUs trained with \textcolor{CRBP}{C-RBP} and \textcolor{CBPTT}{C-BPTT} generalized far better than a version trained with \textcolor{BPTT}{BPTT} or a 6-layer CNN control. Numbers above each curve denote the max-performing step.}

\caption{\textcolor{CRBP}{C-RBP} trained recurrent vision models outperform the feedforward standard on MS-COCO Panoptic Segmentation despite using nearly 800K fewer parameters. \textbf{(a)} Performance of our recurrent FPN-ResNet 50 trained with \textcolor{CRBP}{C-RBP} improves when trained with more steps of processing, despite remaining constant in its memory footprint. \textbf{(b)} Recurrent processing refines instance segmentations and controls false detections of the standard feedforward architecture (additional examples in SI). \textbf{(c)} Panoptic segmentation timecourses for an FPN-ResNet 50 trained with \textcolor{CRBP}{C-RBP} for 20 steps.}

\caption{Convolutional LSTMs trained with \textcolor{BPTT}{BPTT} exhibit unstable dynamics, like the \textcolor{BPTT}{BPTT}-trained hGRUs examined in the main text. Once again, LCP corrects this pathology. \textbf{(a)} Visualization of convLSTM and hGRU state spaces following the state space method described in Section~\ref{sec:si_state_space}. Here, the BPTT-LSTM was trained for 6 steps, the \textcolor{CBPTT}{C-RBP LSTM} for 60 steps, and the \textcolor{CRBP}{C-RBP hGRU} for 40 steps. Grey dots are the 2D-histogram of projected hidden states, red contours are hidden state densities up to the task-optimized $N$ steps, and blue contours are hidden state densities beyond that point ($t>N$). Exemplar dynamics for a single image are plotted in yellow. While dynamics of the \textcolor{BPTT}{BPTT}-trained model diverge when $t>N$, models trained with LCP did not. \textbf{(b)} Model dynamics are reflected in their performance on Pathfinder-14 at $t=N$ and $t=T$ steps of processing. \textbf{(c)} Two-sample KS-tests indicate that the distance in state space between $t=N$ and $t=T$ hidden states is significantly greater for the BPTT-trained convLSTM than for either of the models trained with \textcolor{CRBP}{C-RBP} (n.s.$\ =\ $ not significant).}

\caption{Additional state space analyses showed that alternatives to BPTT do not resolve the unstable dynamics we observed for recurrent CNNs. Here, \textcolor{BPTT}{BPTT per-step supervision} refers to a model which was optimized with a loss evaluated on each of its 6 steps of processing. \textcolor{CBPTT}{T-BPTT} refers to a model trained with truncated backprop, for which gradients were accumulated over 3 steps of its 6 steps of processing. \textbf{(a,b,c)} The alternatives to \textcolor{BPTT}{BPTT} train models with unstable dynamics, which causes task information to be forgotten after the optimized $t=N$ steps of processing. The distances between $t=N$ and $t=T$ hidden states are significantly greater for hGRUs trained with these algorithms than for an hGRU trained with \textcolor{CRBP}{C-RBP} (n.s.$\ =\ $ not significant).}

\caption{Performance of hGRUs during training on \textit{Pathfinder} challenge datasets. The \textcolor{RBP}{RBP}-trained model struggles to fit any dataset, unlike the models trained with \textcolor{BPTT}{BPTT}, \textcolor{CBPTT}{C-BPTT}, or \textcolor{CRBP}{C-RBP}.}

\caption{The value of our LCP (computed with Eq.~6 in the main text) over the course of training for models that minimize it (\textcolor{CRBP}{C-RBP}, \textcolor{CBPTT}{C-BPTT}) and models that do not (\textcolor{RBP}{RBP}, \textcolor{BPTT}{BPTT}). In other words, the magnitude of the LCP correlates with the stability/instability of model dynamics.}

\caption{Generalization performance for hGRUs and convLSTMs trained with BPTT and alternatives to BPTT. Models were trained on Pathfinder 14 and tested on Pathfinder 14/20/25. For reference, performance of the hGRU trained with C-RBP is plotted in both rows. BPTT per-step loss means that a loss was computed on each of the 6 steps of hGRU training, and weights were optimized with BPTT. In contrast, a loss was only calculated on the final step for \textcolor{BPTT}{BPTT}. T-BPTT is truncated backprop through time, where gradients were computed over 3 steps of the 6-step dynamics. The LSTM trained with BPTT was trained for 6 steps, whereas the LSTM trained with C-RBP was trained for 60 steps.}

556 — 2005.11527

\caption{Using advanced search with \red{forward object taint analysis} to uncover a caller chain of an interface method, \texttt{NetcastTVService\$1.run()}. Note that step 1 uses the basic signature search in \mysec\ref{sec:basicSearch}, the process of which is thus skipped.}

\caption{\blue{FlowDroid's call graph generation time for a set of 144 modern apps (under a timeout of 5 hours each).}}

557 — 2005.11728

\caption{Violin charts of the number of SQLi vulnerabilities identified by \texttt{DeepSQLi} (black lines) and \texttt{SQLmap} (\textcolor{gray}{gray lines}) on six SUT with \textit{essential} input validation across 20 runs.}

\caption{Violin charts of the number of SQLi vulnerabilities identified by \texttt{DeepSQLi} (black lines) and \texttt{SQLmap} (\textcolor{gray}{gray lines}) on SUT with \textit{advanced} input validation across 20 runs.}

\caption{Violin charts of the number of reductions of SQLi vulnerabilities identified by \texttt{DeepSQLi} (black lines) and \texttt{SQLmap} (\textcolor{gray}{gray lines}) on SUT with \textit{advanced} input validation across 20 runs.}

\caption{An illustrative example of \texttt{Beam search} where the beam width is 2 and the corpus size is 5, i.e., the possible SQL tokens to be chosen are \textcolor{blue}{\texttt{\itshape "1" }}, \textcolor{blue}{\texttt{\itshape "2" }}, \textcolor{blue}{\texttt{\itshape ">" }}, \textcolor{blue}{\texttt{\itshape "=" }} and \textcolor{blue}{\texttt{\itshape "OR" }}.}

558 — 2005.12189

\caption{Number of distinct crystals as a function of dimensionless time. Separate experiments are denoted with different symbols. The dashed line (\protect\dashedblackline) indicates the scaling $ \propto \left(t/{\tau_c}\right)^\frac{5}{2}$ (equation~(\ref{eq:nucleationfull})). The capillary timescale is $\tau_c=\left(\rho R_0^3 /\sigma\right)^{1/2}$. The colorbar shows the temperature difference $\Delta T=T_f - T_c$.}

559 — 2005.12227

\caption{Implementing \Det~using keying. \textcolor{red}{Red}: unknown to the opponent. \textcolor{orange}{Orange}: known by the opponent. \textcolor{blue}{Blue}: controlled by the opponent.}

560 — 2005.12383

\caption{Additional \aastex\symbols}

561 — 2005.12398

\caption{\label{man} A random sample of sentences from the WMT test sets and our proposed variations shown with `unexpected change' annotations ($\Delta Translation$). The cases where the unexpected change leads to a change in translation quality are marked in column $\Delta Quality$. \textcolor{colorzero}{\textbf{[$w_i$\textbackslash$w_j$]}} indicates that $w_i$ in the original sentence is replaced by $w_j$. $S$ is the original and modified source sentence, $R$ is the original and modified reference translation, $T$ is the translation of the original sentence, and $T_m$ is the translation of the modified sentence. Differences in translations related to annotations in the original and the modified translations are in \textcolor{colorone}{\textbf{red}} and \textcolor{colortwo}{\textbf{orange}}, respectively. Note that we are interested in \textit{unexpected} changes and do not highlight the changes that are a direct consequence of the modifications.}

562 — 2005.12402

\caption{{\bf Typical parameters of the immune model in Table \ref{tab:model_eqns}.} The parameters listed in this table are used to generate Fig.~\ref{fig:multi_infec}, while the other figures are created with slightly modified parameters as detailed in the Supplementary Information. \blue{ Most innate parameters were originally described in the Reynolds {\em et al.} model \cite{ReynoldsErmentrout2006}, while most adaptive parameters were originally described in the Stromberg and Carlson model \cite{StrombergCarlson2006}. Parameter values that are the same as those used in the original models are bold-faced.} \label{tab:Params}}

\caption{{\bf Transition to chronic inflammation (CI) is driven by depletion of naive cells and lack of protection from memory cells.} \blue{(a) The number of cognate T cells specific to a novel pathogen shape (equal to the sum of naive and memory cells) is the key indicator for whether an infection event will trigger the chronic inflammation steady state. Here, cognate T cell counts specific to an encountered pathogen $P_\ell$ are plotted for each infection event $\ell$ across 20 infection sequences sampled from Eq.~(\ref{eq:Ps}). The color of each point indicates the number of times that the encountered pathogen $P_\ell$ has been previously encountered. The colored bands are generated from 1000 infection sequences, and envelop the observed cognate cell counts. The large red circles in the lower-right corner mark the infection events that trigger chronic inflammation across all 1000 infection sequences, which occur when a novel pathogen is encountered after naive cells have been depleted below some threshold. A shorter time interval between pathogen encounters of the same shape results in less memory cell decay and hence more cognate T cells, and this effect causes the shape of the colored bands.}
(b) We consider three synthetic reorderings of each ``authentic'' randomly generated pathogen sequence: the clustered sequence orders pathogens according to their prevalence; the cyclic sequence orders them to ensure immediate exposure to all pathogen types; and the incomplete cyclic sequence induces fragility by quickly depleting naive cells and then introducing a novel pathogen. (c) The authentic sequence and three synthetic sequences transition to chronic inflammation (CI) at different times (black crosses). The pathogen shape distribution for this infection history (right histogram) is drawn from the theoretical shape distribution (black line overlaid) given by Eq.~(\ref{eq:Ps}). (d) The naive cell pool is depleted at different rates depending on how infection events are ordered. Naive cell counts and their variation across the 50 authentic sequences considered in panel (a) are shown for the three synthetic sequences. Error bars for the timing of chronic inflammation are 50\% confidence intervals. Authentic sequences are from 50 simulations and three reordered sequences are generated for each authentic sequence. \label{fig:ASD_mechanism}}

563 — 2005.12606

\caption{The stability analysis of the whistler temperature anisotropy instability (WTAI) and the whistler heat flux instability (WHFI) of a plasma with cold ions and electron population consisting of a dense thermal core and a tenuous suprathermal halo described by Maxwell and $\kappa-$distributions, respectively. Parameters $n_{\alpha}$, $u_{\alpha}$, $T_{\alpha}$ and $A_{\alpha}$ are density, parallel drift velocity (in the plasma rest frame), parallel temperature and temperature anisotropy (perpendicular to parallel temperature ratio) of the core ($\alpha=c$) and halo ($\alpha=h$) populations; $\beta_{c}=8\pi n_{c}T_{c}/B_0^2$ is the core parallel beta. The upper panels present unstable electron VDFs with corresponding parameters indicated in the panels; velocities parallel and perpendicular to a uniform quasi-static magnetic field, $v_{||}$ and $v_{\perp}$, are normalized to $v_{Ae}=B_0/(4\pi n_0 m_{e})^{1/2}$. The middle and bottom panels present whistler wave growth rates, $\gamma/\omega_{ce}$ versus $kc/\omega_{pe}$, for several values of halo temperature anisotropy $A_{h}$ and core drift velocity $u_{c}/v_{A}$, where $\omega_{ce}$ and $\omega_{pe}$ are electron cyclotron and plasma frequencies, and $v_{A}=B_0/(4\pi n_0 m_{i})^{1/2}$ is the Alfv\'{e}n velocity. The growth rates are computed for several values of power-law index $\kappa$ including \blue{$\kappa\rightarrow \infty$}, which corresponds to a \blue{Maxwellian halo population}. The panels also \blue{present exact whistler wave dispersion curves} (dashed grey), $\omega/\omega_{ce}$ versus $kc/\omega_{pe}$, \blue{as well as whistler wave dispersion curves (dashed black) valid in a cold plasma at $\omega\gg \omega_{ci}$: $\omega=\omega_{ce} k^2c^2/(k^2c^2+\omega_{pe}^2)$ \cite{Mikhailovskii74,Shklyar04}}.\label{fig1}}

\caption{The maximum growth rate $\gamma_{\rm max}/\omega_{ce}$ of the WHFI in dependence on core drift velocity $u_{c}/v_{A}$ and electron heat flux $q_{e}/q_0$ (upper horizontal axes), where $v_{A}$ is the Alfv\'{e}n velocity and $q_0$ is \blue{the free-streaming heat flux value \cite{Cowie77,Gary1999b} defined as $q_{0}=1.5 \;n_0\;T_{e}\;(2T_{e}/m_{e})^{1/2}$, where $n_0=n_c+n_h$ is the total electron density, $T_{e}=(n_cT_c+n_hT_h)/n_0$}. The growth rates were computed for various values of power-law index $\kappa$ of the halo $\kappa-$distribution (indicated by color). The panels correspond to maximum growth rates computed at various $(\beta_{c},T_{h}/T_{c})$. In all computations the core and halo populations were isotropic ($A_{c}=1$ and $A_{h}=1$) and the density of the core population is $n_{c}=0.95\;n_0$. The shaded regions indicate the range of electron heat flux values typical of the solar wind, i.e. $q_{e}\lesssim q_0$ \blue{\cite{Tong19b:apj,Wilson19b}}.\label{fig2}}

\caption{The frequency $\omega_{\rm max}/\omega_{ce}$ and wave number $k_{\rm max}c/\omega_{pe}$ of the fastest growing whistler waves in dependence on $u_{c}/v_{A}$ and $q_{e}/q_0$ (upper horizontal axes) at $T_{h}/T_{c}=4$ and various values of \blue{core electron beta $\beta_{c}$} and \blue{power-law index} $\kappa$. The corresponding growth rates are shown in the first column of Figure \ref{fig2}. Only panels corresponding to $T_{h}/T_{c}=4$ are shown, because that parameter does not critically affect the frequency and wave number of the fastest growing whistler waves (see Section \ref{sec3} for details). The shaded regions indicate the range of electron heat flux values typical of the solar wind, i.e. $q_{e}\lesssim q_0$ \blue{\cite{Tong19b:apj,Wilson19b}}. \label{fig3}}

\caption{The stability analysis of parallel and anti-parallel whistler waves at a fixed value of the halo temperature anisotropy, $A_{h}=1.3$, and various values of core drift velocity $|u_{c}|/v_{A}$: (a) dispersion curves of parallel ($kc/\omega_{pe}>0$) and anti-parallel ($kc/\omega_{pe}<0$) whistler waves; dashed black curves represent the whistler wave dispersion curves in a cold plasma \blue{at $\omega\gg \omega_{ci}$} \blue{\cite{Mikhailovskii74,Shklyar04}}: $\omega=\omega_{ce} k^2c^2/(k^2c^2+\omega_{pe}^2)$; (b) the growth rates of parallel and anti-parallel whistler waves; the growth rate computed at $u_{c}=0$ corresponds to the whistler temperature anisotropy instability (WTAI), which produces identical parallel and anti-parallel whistler waves. The stability analysis was performed at $\beta_{c}=1$, $T_{h}/T_{c}=6$ and \blue{$\kappa\rightarrow\infty$}.\label{fig4}}

\caption{The \blue{properties} of the fastest growing parallel and anti-parallel whistler waves computed at a fixed value of the halo temperature anisotropy, $A_{h}=1.3$, and various values of power-law index $\kappa$ and core drift velocity $|u_{c}|/v_{A}$: (a) the growth rate $\gamma_{\rm max}/\omega_{ce}$, (b) frequency $\omega_{\rm max}/\omega_{ce}$ and (c) wave number $|k_{\rm max}|c/\omega_{pe}$. The parameters of parallel and anti-parallel whistler waves are shown by solid and dashed curves, respectively. The other parameters are $\beta_{c}=1$ and $T_{h}/T_{c}=6$.\label{fig5}}

\caption{The properties of the fastest growing parallel and anti-parallel whistler waves computed at a fixed \blue{value} of the halo temperature anisotropy, $A_{h}=1.3$, \blue{and various values of core drift velocity $|u_{c}|/v_{A}$ and halo to core parallel temperature ratio $T_{h}/T_{c}$.} The format of the figure is identical to that of Figure \ref{fig5}. The other parameters are $\beta_{c}=1$ and \blue{$\kappa\rightarrow \infty$}.\label{fig6}}

\caption{The properties of the fastest growing parallel and anti-parallel whistler waves computed at a fixed value of the halo temperature anisotropy, $A_{h}=1.3$, and various values of core parallel beta parameter $\beta_{c}$ and core drift velocity $|u_{c}|/v_{A}$. The format of the figure is identical to that of Figure \ref{fig5}. The other parameters are $T_{h}/T_{c}=6$ and \blue{$\kappa\rightarrow \infty$}.\label{fig7}}

\caption{\blue{The properties of the fastest growing parallel and anti-parallel whistler waves computed at various values of the halo temperature anisotropy $A_{h}=T_{h\perp}/T_{h||}$ and core drift velocity $|u_{c}|/v_{A}$. The properties of parallel whistler waves in panels (e)-(d) are demonstrated for $0.8 \leq A_{h}\leq 1.5$, while the properties of anti-parallel whistler waves are shown for $1.1\leq A_{h}\leq 1.5$, because anti-parallel whistler waves are stable at $A_h\leq 1$. The format of panels (a)-(c) and (e)-(d) is identical} to that of Figure \ref{fig5}. The other parameters are $\beta_{c}=1$, $T_{h}/T_{c}=6$ and \blue{$\kappa\rightarrow\infty$}.\label{fig8}}

564 — 2005.12712

\caption{A walking agent (red-encircled) starts walking in the green source area and tries to reach the brown target area while the agent is blocked by a waiting crowd consisting of 13 agents. The colours represent the current behaviour of an agent: \textcolor{TargetOrientedColor}{Blue} is \textcolor{TargetOrientedColor}{target-oriented behaviour} and \textcolor{CooperativeColor}{green} is \textcolor{CooperativeColor}{cooperative behaviour}. \ii{Time step 1} When the simulation starts, all agents are target-oriented. While the walking agent is attracted by the brown target, the waiting crowd does not have a target and waits. \ii{Time step 4} The agents of the waiting crowd become cooperative because their speed falls below a certain threshold. \ii{Time step 29} The walking agent reaches the waiting crowd and cannot move anymore. Thus, the walking agent also becomes cooperative. The walking agent searches for a swap candidate (orange-encircled) and the two swap positions.}

\caption{Cooperative behaviour of agents inside the waiting crowd. The colours represent the current behaviour of an agent: \textcolor{TargetOrientedColor}{Blue} is \textcolor{TargetOrientedColor}{target-oriented behaviour} and \textcolor{CooperativeColor}{green} is \textcolor{CooperativeColor}{cooperative behaviour}. \ii{Time step 31} After swapping positions, the walking agent (red-encircled) and the swap candidate (orange-encircled) become target-oriented again because their speed is above a certain threshold. \ii{Time step 36} The walking agent becomes cooperative again and swaps position with another cooperative agent which is closer to the target. \ii{Time step 51} The walking agent has found its way through the dense crowd by using cooperative behaviour.}

565 — 2005.12869

\caption{Additional \aastex\symbols}

566 — 2005.13032

\caption{Resilience of {\em SeqL} for Pipelined Combinational Benchmarks for $5\%$ logic locking. {\em '\bluecheck' $ $ is secure and '\redx' $ $ is insecure.}}

\caption{Key Assignment Graph (KAG) for circuit in Figure~\ref{fig:eff-seql-scansat}(c). $KAG$ is a binary tree with dummy root node, the leaves of which correspond to the rows in Table~\ref{tab:truth-table-sequential-lock} whose scan-correctness column is \textcolor{blue}{TRUE}.}

567 — 2005.13109

\caption{Legend: \protect\tikz{\protect\node[fill=mygrey2,draw=black]{};}\; EDD\protect\tikz{\protect\node[fill=mygrey3,draw=black]{};}\; Hungarian\protect\tikz{\protect\node[fill=mygrey4,draw=black]{};}\; MCTS\protect\tikz{\protect\node[fill=mygrey5,draw=black]{};}\; Q-Learning\protect\tikz{\protect\node[fill=myorange,draw=black]{};}\; SCoBA. On the metric of the fraction of unsuccessful tasks, i.e. objects missed, SCoBA consistently outperforms all other baselines. All results are averaged over $100$ trials, with $\horizon = 500$ time-steps per trial.}

\caption{Legend: \protect\tikz{\protect\node[fill=mygrey2,draw=black]{};}\; EDD\protect\tikz{\protect\node[fill=mygrey3,draw=black]{};}\; Hungarian\protect\tikz{\protect\node[fill=mygrey4,draw=black]{};}\; MCTS\protect\tikz{\protect\node[fill=myorange,draw=black]{};}\; SCoBA. For the drone delivery domain, on the primary metric of the fraction of late package deliveries, SCoBA outperforms the baselines on all but one setting. Results are averaged over $100$ trials, each of $\horizon = 100$ time-steps.}

568 — 2005.13119

\caption{Accuracy Results on Three Datasets. Better results between baselines and corresponding ITA models are in \textbf{BOLD} and the best results on datasets are in \textcolor{red}{RED}. \textit{Random} is a script that makes random decisions according to the ratio of positive to negative samples.}

\caption{Multiple Metrics Results on Three Datasets. Better results between baselines and corresponding ITA models are in \textbf{BOLD} and best results on datasets are in \color[HTML]{CB0000} \textbf{RED}.}

\caption{The effects of different types of imaginators in the ITA framework. All models adopt TextCNN as the classification model. The baseline is the TextCNN arbitration model without the imaginator models. The \textit{Agent} and \textit{User} columns are the BLEU scores of the imaginator-generated queries or answers, and the \textit{Wait-or-Answer} columns are the ITA models' accuracy scores. Better results between imaginators are in \textbf{BOLD} and best results on datasets are in \color[HTML]{CB0000} \textbf{RED}.}

569 — 2005.13180

\caption{Types of annotation corrections performed by the ACN when trained with 800 images. \textcolor{green}{Green shows corrected annotations}. \textcolor{blue}{Blue shows misaligned annotations.}}

\caption{Sample images showing PSN performance when trained with corrected annotations. \textcolor{blue}{Blue footprints} show ACN-corrected annotations. \textcolor{green}{Green footprints} show PSN-predicted annotations trained with ${\alpha=Het.}$ and 400 ACN-corrected labels. PSN performance is dependent on the quality of corrected annotations.}

\caption{\textcolor{blue}{Hand-labelled annotations}, \textcolor{red}{OSM annotations} and \textcolor{green}{ACN-corrected annotations}. The ACN is trained on 400 images from Western Kenya and Nairobi, and improves label quality despite the noisier training data.}

\caption{Sample images and ground truth labels showing cropland extent in California. \textcolor{green}{PSN and lightUNet predicted footprints} with $\alpha = 0.75$ (green) are overlaid on \textcolor{blue}{true cropland polygons} (blue). PSN predictions remain highly accurate. In comparison, the lightUNet predicts only a portion of the crop extents correctly.}

570 — 2005.13265

\caption{\add Examples of {\color{comment}two} different kinds of samples input to the CNN and the prediction results. On the left are negative samples that do not precede a T1 event within 0.1 seconds and on the right are positive samples which precede a T1 event. The sample types are, starting from the first row: vertex-centered grayscale image and bubble-centered grayscale image. The texts ``Correct'' and ``Incorrect'' in the figure indicate the success of the CNN prediction.}

571 — 2005.13346

\caption{The IRX of ASAGAO sources as a function of their stellar mass (red circles). We also show the ALMA\textcolor{blue}{-}non-detected ZFOURGE sources \citep{straatman2016} within the ASAGAO field, ALESS sources \citep{dacunha2015}, and ALMA\textcolor{blue}{-}selected sources by \citet{dunlop2017}. The thick shaded blue line shows the {consensus relation compiled} from UV-selected galaxies at $z\sim$ 2--3 \citep{bouwens2016}. \label{fig:M_vs_IRX}}

\caption{The IRX of ASAGAO sources as a function of their SFRs (red circles). We also show the ALMA\textcolor{blue}{-}non-detected ZFOURGE sources within the ASAGAO field \citep{straatman2016}, ALESS sources \citep{dacunha2015}, and ALMA\textcolor{blue}{-}selected sources by \citet{dunlop2017}. \label{fig:SFR_vs_IRX}}

\caption{The IRX of the ASAGAO sources as a function of $\beta_\mathrm{UV}$ (red circles). Black crosses indicate ALMA\textcolor{blue}{-}non-detected ZFOURGE sources \citep{straatman2016} within the ASAGAO field. {The blue solid and dashed lines are the IRX-$\beta_\mathrm{UV}$ relations of \citet{meuer1999} and \citet{takeuchi2012}, respectively.} \label{fig:IRX-beta}}

572 — 2005.13610

\caption{Rate Constants for Top and Bottom MUXes for Two 8-Stage PUFs that Generate Different Responses\textcolor{red}{$^*$}}

\caption{Rate Constants for Top and Bottom MUXes for Two 16-Stage PUFs that Generate Different Responses\textcolor{red}{$^*$}}

573 — 2005.13681

\caption{Performance of all models relative to `\textit{Baseline Cascade}' ($\Delta=0$) across our 3 resource conditions. \\ \textbf{\orange{Cascaded}} models in \orange{\textbf{orange}}, \textbf{\purple{end-to-end}} models in \purple{\textbf{purple}}. Our proposed models yield improvements across all three conditions, with a widening margin under low-resource conditions for the phone cascade.}

\caption{\textbf{Baseline results} for end-to-end and cascaded speech translation models, with component ASR and MT model performance for cascades (\blue{blue}). ASR results in WER$\downarrow$ and translation results in BLEU$\uparrow$.}

574 — 2005.13708

\caption{Tracking results on VOT2018. (The best three results are highlighted by {\color{red}{red}}, {\color{blue}{blue}} and {\color{brown}{brown}}.)}

\caption{Tracking results on VOT2019. (The best three results are highlighted by {\color{red}{red}}, {\color{blue}{blue}} and {\color{brown}{brown}}.)}

575 — 2005.14341

\caption{Comparison of the five forums in our dataset. \reddit does not provide viewership information (marked by dashes). $^\alpha$\,Calculated via qualitative analysis of random samples of 250 threads per non-spam forum, see Section~\ref{sec:qual-methods}.}

576 — 2005.14609

\caption{$(a)$ Bifurcation of steady solutions with $Ri$ for given $Q \in [-2.5,2.5]$. Here, each contour line in the $U(0)-Ri$ plane indicates the value of $Q$: black, $Q=0$; blue to green, $Q \in [0.1,2.5]$ with $0.2$ increment; red to yellow, $Q \in [-2.5,-0.1]$ with $-0.2$ increment, while the line type represents the stability of the solution: \protect \solid, stable; \protect \dashed, unstable. Meanwhile, the dotted black lines (\protect \dotted) show the bifurcation by varying $Q$ for given $G=0$. The bifurcation point ($Ri_c$) from (\ref{eq:Ri_c}) is denoted by magenta crosses (x), while the slope of the bifurcation curve computed from (\ref{eq:amp_eq}) is indicated by the short magenta dot-dash segment (\protect \dashdot). ($b$-$i$) Comparison of the nonlinear steady solution (blue lines) with the corresponding (appropriately normalised) linear instability mode $u_1(r)$ for each branch of $Q=0$ with ($b$-$e$) $U(0)=1$ and ($f$-$i$) $U(0)=-1$. }

\caption{Continuations of the steady solution emerging from the first bifurcation point with $U(0)<0$. $(a)$ $U(0)$ (\protect \dashdot), $Ri$ (\protect \dotted) and $N(1)$ (\protect \solid) for several $G$ on increasing $Q$ from $0$. $(b)$ The relation between $G$ and $Q$ for several fixed $Ri$. Here, in the inset, the maximum achievable flow rate $Q_{\max}$ is plotted for each $Ri$. \label{fig:contG}}

577 — 2005.14684

\caption{The performance (\%) comparison on VeRi776. The {\color{red}{red}}, {\color{green}{green}} and {\color{blue}{blue}} rows respectively represent the {\color{red}{$1$st}}, {\color{green}{$2$nd}} and {\color{blue}{$3$rd}} places, according to the R1 comparison. }

\caption{The performance (\%) comparison on VehicleID. The {\color{red}{red}}, {\color{green}{green}} and {\color{blue}{blue}} rows respectively represent the {\color{red}{$1$st}}, {\color{green}{$2$nd}} and {\color{blue}{$3$rd}} places, according to the R1 comparison.}

\caption{The performance (\%) comparison on VERI-Wild. The {\color{red}{red}}, {\color{green}{green}} and {\color{blue}{blue}} rows respectively represent the {\color{red}{$1$st}}, {\color{green}{$2$nd}} and {\color{blue}{$3$rd}} places, according to the R1 comparison.}

578 — 2005.14709

\caption{Dataset by publication year with {\color{red}no} or {\color{blue}any} spurious correlations detection methods applied; applied in a {\color{orange}later} publication; created using {\color{green}adversarial} filtering, or {\color{yellow}both}.}

579 — 2006.00044

\caption{Plant and plant-controller interaction modeling: (a) $\mathbf{TPN}$ model of the pick~\&~place station; (b) extended model of $LC_2$ from~Fig.~\ref{fig:exampleCIPN}(c) with TPN-compatible sensing/actuation -- the model is not a $\mathbf{TPN}$ as it still relies on the communication API (in {\color{red}red}) for interaction with other LCs; (c) model of incoming workpieces with a lower bound on the workpiece interarrival.}

\caption{Sensing/actuation signal timings for a nominal pick~\&~place run (a), a run where a signal injection is performed resulting in a dropped workpiece (b), and a run where progress is inhibited due to a DoS attack (c). Messages exchanged by LCs are marked with blue arrows. The $X$ axis is unlabeled because the speed of the workcycle can be controlled by regulating air pressure in the system and is thus not crucial.}

580 — 2006.00171

\caption{Sparse-view CT image reconstruction on the AAPM-8 and Pancreas-210 test sets. Gaussian noise at different levels is added to the sinogram. \textcolor{red!50}{Red} and \textcolor{blue!50}{blue} indicate the best and the second-best performance, respectively.}

581 — 2006.00182

\caption{Comparison of the mean velocity profile obtained using DaVis (\bcircle) and the single pixel routine (\bsquare) for flow at station 5 over the smooth wall.}

\caption{The mean velocity profile acquired using the single pixel method is presented in deficit form in panel (a) and in inner-normalized units in panel (b). Black squares (\bsquare) show measurements made over the smooth wall at station 5 while white squares (\wsquare) show measurements made over the porous substrate at the same location.}

\caption{Inner-normalized mean velocity profiles measured at station 1 upstream of the cutout (a) and at station 5 ($x/h = 44$) where the flow is fully developed over the porous substrate (b). In both plots, white squares (\wsquare) show measurements from the porous substrate experiments while black squares (\bsquare) show measurements made with the smooth wall insert in place. The solid lines ($-$) show mean profiles obtained in DNS by \citet{schlatter2010assessment} at $\Ret \approx 250$ (a) and at $\Ret \approx 360$ (b). The dashed line (- -) in panel (b) shows a shifted linear profile of the form $U^+ = \kpxx + y^+$.}

\caption{Mean turbulence statistics for smooth and porous cases for station 5 at $x/h = 44$. Mean velocity profiles are shown in (a), profiles of the root-mean-square streamwise and wall-normal velocity fluctuations are shown in (b) and (c), respectively. The Reynolds shear stress profile is shown in (d). Statistics for the smooth wall and porous substrates are shown as black circles (\bcircle) and white circles (\wcircle) respectively. The black (\bsquare) and white squares (\wsquare) in (a) show the single-pixel mean profile estimates.}

582 — 2006.00444

\caption{Intrinsic dimensionality is the maximum slope of the smoothed \textcolor{blue}{blue} curve of $\ln(r)$ vs $\ln(C(r))$ (see the \textcolor{orange}{orange} line).}

583 — 2006.00678

\caption{ Incident pattern in the disk restframe, in terms of luminosity $L = \int F_E {\rm d}E$, for the case of the truncated disk plus precessing corona (i.e., Fig. \ref{schematic}). The truncation radius $R_{\rm tr} = 90$. The BH spin $a=0.3$ is assumed. The colorbar represents the scaled photon counts (in logarithm) intercepted by the disk. The difference in intercepted luminosity could be up to 1.8 orders of magnitude. The four panels (a,b,c,d) correspond to the four precession angles $\gamma/2\pi = 0$, $1/4$, $1/2$, and $3/4$, respectively. In each panel, the blue arrow at the center represents the projection of the corona axis $\bm{J_{\rm C}}$ on the disk plane. Animated versions of these plots can be viewed at and downloaded from \color{blue}\smash{http://202.127.29.4/AGN/beiyou/QPO/index\_QPO.html}\color{black} \label{irradiation_pattern_r90}}

\caption{ Incident pattern in the disk restframe, in terms of luminosity $L = \int F_E {\rm d}E$, for the case of the truncated disk plus precessing corona (i.e., Fig. \ref{schematic}). The truncation radius $R_{\rm tr} = 10$. The BH spin $a=0.3$ is assumed. The colorbar represents the scaled photon counts (in logarithm) intercepted by the disk. The difference in intercepted luminosity could be up to one order of magnitude. The four panels (a,b,c,d) correspond to the four precession angles $\gamma/2\pi = 0$, $1/4$, $1/2$, and $3/4$, respectively. In each panel, the blue arrow at the center represents the projection of the corona axis $\bm{J_{\rm C}}$ on the disk plane. Animated versions of these plots can be viewed at and downloaded from \color{blue}\smash{http://202.127.29.4/AGN/beiyou/QPO/index\_QPO.html}\color{black} \label{irradiation_pattern_r10}}

\caption{ The reflection pattern in the observer rest frame, in terms of luminosity $L = \int F_E {\rm d}E$, for the truncation radius $R_{\rm tr} = 90$, observed at an intermediate inclination angle $\cos \theta \sim 0.5$. The BH spin $a=0.3$ is assumed. The colorbar represents the scaled luminosity (in logarithm) intercepted by the disk. The four panels (a,b,c,d) correspond to the four precession angles $\gamma/2\pi = 0$, $1/4$, $1/2$, and $3/4$, respectively. In each panel, the blue arrow at the center represents the projection of the corona axis $\bm{J_{\rm C}}$ on the disk plane. The eye cartoon indicates the azimuthal position of the observer, $\varphi = 0$, at infinity. Animated versions of these plots can be viewed at and downloaded from \color{blue}\smash{http://202.127.29.4/AGN/beiyou/QPO/index\_QPO.html}\color{black}. \label{mid_refl_pattern_r90}}

\caption{ The observed Fe K$\alpha$ line for large truncation radius $R_{\rm tr} = 90$ and intermediate inclination angle $\cos \theta \sim 0.5$. The black, blue, green and red profiles correspond to the four specific precession angles $\gamma/2\pi = 0$, $1/4$, $1/2$, and $3/4$, respectively. The BH spin $a=0.3$ is assumed. Animated versions of these plots can be viewed at and downloaded from \color{blue}\smash{http://202.127.29.4/AGN/beiyou/QPO/index\_QPO.html}\color{black}. \label{line_shape_r90}}

\caption{ The reflection pattern in the observer rest frame, in terms of luminosity $L = \int F_E {\rm d}E$, for the truncation radius $R_{\rm tr} = 10$, observed at an intermediate inclination angle $\cos \theta \sim 0.5$. The BH spin $a=0.3$ is assumed. The colorbar represents the scaled luminosity (in logarithm) intercepted by the disk. The four panels (a,b,c,d) correspond to the four precession angles $\gamma/2\pi = 0$, $1/4$, $1/2$, and $3/4$, respectively. In each panel, the blue arrow at the center represents the projection of the corona axis $\bm{J_{\rm C}}$ on the disk plane. The eye cartoon indicates the azimuthal position of the observer, $\varphi = 0$, at infinity. Animated versions of these plots can be viewed at and downloaded from \color{blue}\smash{http://202.127.29.4/AGN/beiyou/QPO/index\_QPO.html}\color{black}. \label{mid_refl_pattern_r10}}

\caption{ The observed Fe K$\alpha$ line for small truncation radius $R_{\rm tr} = 10$ and intermediate inclination angle $\cos \theta \sim 0.5$. The black, blue, green and red profiles correspond to the four specific precession angles $\gamma/2\pi = 0$, $1/4$, $1/2$, and $3/4$, respectively. The BH spin $a=0.3$ is assumed. Animated versions of these plots can be viewed at and downloaded from \color{blue}\smash{http://202.127.29.4/AGN/beiyou/QPO/index\_QPO.html}\color{black}. \label{line_shape_r10}}

584 — 2006.00836

\caption{\textcolor{orange}{ Metrics for upsampled results of coarse classifications alone with no fine-grained component (see Table \ref{exp1_table}). Complete images of training and test data are downsampled, and resulting patch classifications are upsampled to full resolution. } }

585 — 2006.00838

\caption{Relative clause example. Pre-existing edges in the graph are in magenta and blue. The algorithm observes an \texttt{acl:relcl} relation (highlighted in blue) which causes it to generate two new relations (highlighted in green). A \texttt{ref} relation is created between \textcolor{depred}{\textbf{{who}}} and its antecedent, \textcolor{deporange}{\textbf{{jesters}}}. Then an \texttt{nsubj} is propagated from the head of \textcolor{depred}{\textbf{{who}}}, \textit{ruined}, to \textcolor{deporange}{\textbf{{jesters}}}.}

\caption{Conjunction example. Magenta and blue edges are those existing in the graph after one pass. During the first pass \textcolor{deporange}{\textbf{{elves}}} is stored as it is the dependent of a \texttt{conj} relation (highlighted in blue). On the second pass the \texttt{obj} relation of \textcolor{depred}{\textbf{{dwarves}}}, the head of this \texttt{conj} relation, propagates to \textcolor{deporange}{\textbf{{elves}}} generating a new \texttt{obj} relation (highlighted in green) from \textit{angered}.}

\caption{Control example. The edges of the graph after two passes are in magenta and blue. During the first pass \textcolor{deporange}{\textbf{{weep}}} is stored as it is a dependent of an \texttt{xcomp} relation (highlighted in blue) but it cannot be resolved until \textcolor{depred}{\textbf{{wanted}}} is. \textcolor{depred}{\textbf{{wanted}}} is resolved in the second pass and an \texttt{nsubj} relation (shown in blue) is propagated from the head, \textit{gnomes}, of its conjunct, \textcolor{depred}{\textbf{{quailed}}}. In the third pass this is further propagated to \textcolor{deporange}{\textbf{{weep}}}, generating an \texttt{nsubj:xsubj} relation (highlighted in green).}

586 — 2006.00879

\caption{ SED of the \gray emission toward W40 for a spatially uniform disk model with a radius of $0.46^{\circ}$. Both the statistical and systematic errors for the six low-energy bins are considered. The solid curve represents the spectrum of \grays from interactions of relativistic protons with the ambient gas, assuming a power-law distribution of protons (see Sect.~\ref{sec:cr}). The dashed curve represents the predicted fluxes of \gray emission derived from the \ion{H}{ii} column density map; the CRs are assumed to have the same spectra as measured in the solar neighborhood \citep{Aguilar15} (see Sect.~\ref{sec:Gas} for details). The gray data points are the fluxes of NGC 3603 taken from \citet{Yang17}. }

\caption{ Light curve of \gray emission towards W40 from August 4, 2008 (MJD 54683) until April 30, 2019 (MJD 58604). }

\caption{Total mass of the hydrogen atom derived from different tracers within the \gray emission region.}

\caption{Derived CR density profile near W40. The data points are the \gray emission above 1 GeV of W40. The upper limits are derived for the rings defined in Fig.~\ref{fig:ring}. The black and red curves are the projected $1/r$ and $1/r^2$ profiles, respectively. }

587 — 2006.00988

\caption{Interpretation of attention. Attention weight between the masked word and the context word is given by $e^{(\boldsymbol{k}_{\text{masked word}})^T\boldsymbol{q}_{\text{context word}}}$, while the word vector similarity between the masked word and the context word is given by $e^{(\boldsymbol{u}_{\text{masked word}})^T\boldsymbol{u}_{\text{context word}}}$. Highly frequent words are highlighted with a \colorbox{gray!55}{dark gray background}. For each masked word, the attention weights corresponding to context words that are most attended to (excluding highly frequent words) are highlighted with \textbf{bold} numerals and a \colorbox{gray!20}{light gray background}.}

\caption{More examples of attention weights between the masked word and the context words. Attention weight (att. wt.) between the masked word and context word is given by $e^{(\boldsymbol{k}_{\text{masked word}})^T\boldsymbol{q}_{\text{context word}}}$, while the word vector similarity (sim.) between the masked word and context word is given by $e^{(\boldsymbol{u}_{\text{masked word}})^T\boldsymbol{u}_{\text{context word}}}$. Highly frequent words are highlighted with a \colorbox{gray!55}{dark gray background}. For each masked word, the attention weights corresponding to context words that are most attended to (excluding highly frequent words) are highlighted with \textbf{bold} numerals and a \colorbox{gray!20}{light gray background}.}

588 — 2006.00996

\caption{Left:~memory requirements of ResNet26 (\bluebullet), ResNet56 (\greenbullet), and 9x ResNet26 (\redbullet). For our dynamic residual adapters (\orangebullet), only a small portion of parameters need to be learned (\purplebullet). Right:~Activations of the gating mechanism $g$ for samples from each domain distribution $\mathbb P_d$ at different layers of the network. The domains that activate each gate most strongly are highlighted.}

\caption{Left:~PCA of samples represented by their $L$-dimensional activation paths. Gate paths are semantically meaningful: visually similar domains \textit{art painting} and \textit{photo} (\bluebullet,\redbullet) cluster together, \textit{cartoon} (\greenbullet) resides between real world imagery and \textit{sketches} (\orangebullet). A sample with an erroneous ground-truth domain label is highlighted. Right:~a group of samples (coloring corresponds to left-hand side) that share similar gate activation paths.}

589 — 2006.01030

\caption[LoF entry]{keypoint target estimation \tikz\draw[green,fill=white, very thick] (0,0) circle (0.7ex); - $K_{proj}$, \tikz\draw[cyan,fill=yellow, very thick] (0,0) circle (0.7ex); - $K'_{h}$ ,$K'$ \tikz\draw[yellow,fill=blue, very thick] (0,0) circle (0.7ex); - $K_{h}$ \protect \thicklines \textcolor{yellow}{ \line(2, 1){10} } - geometric match \hspace{1cm} \textcolor{green}{ \line(2, 1){10} } - projection from $I$ to $I_h$ \protect \thicklines \textcolor{magenta}{ \line(2, 1){10} } - descriptor match \hspace{1cm} \textcolor{cyan}{ \line(2, 1){10} } - projection from $I_h$ to $I$ }

590 — 2006.01038

\caption{Example annotations of the DocBank. The colors of semantic structure labels are: \colorbox{titlecolor}{Title}, \colorbox{abstractcolor}{Abstract}, \colorbox{authorcolor}{\textcolor{white}{Author}}, \colorbox{sectioncolor}{\textcolor{white}{Section}}, \colorbox{footercolor}{Footer}, \colorbox{equationcolor}{\textcolor{white}{Equation}}, \colorbox{figurecolor}{\textcolor{white}{Figure}}, \colorbox{captioncolor}{Caption}, \colorbox{tablecolor}{Table}, \colorbox{paracolor}{Paragraph}}

591 — 2006.01067

\caption{Examples of contrastive highlights (\S~\ref{sec:contrastive}) of instances from the AG News corpus. The model used for $m_p$ is fine-tuned bert-base-cased. \hl{Yellow highlight} refers to $h$ and \textcolor{red}{\hl{\textbf{yellow-and-red}}} refers to $\hat{h}$.}

592 — 2006.01110

\caption{A structured compositional deep network encoding one formula, $\protect\ltlglobally\textsc{gem}\wedge\protect\ltlfinally\textsc{factory}$. This formula corresponds to a command such as ``Hold the gem and get to the factory''. Each operator and predicate in the formula is represented by a network, shown in black, selected from a trained collection of sub-networks. Sub-networks are {\orange RNNs} which maintain state over time, shown in {\orange orange}. Each sub-network takes as input {\purple features} extracted from the surroundings of the robot using a co-trained network, shown in {\purple dotted purple}. The {\green next state} of each sub-network is decoded by a {\green linear layer} and passed to its parents, shown in {\green green}. The {\blue previous state} of each sub-network is decoded using a {\blue linear layer} and passed to its children, shown in {\blue blue}. Finally, the state of the root node is decoded into a distribution over the value of the actions the robot can take at the current time step. Crucially, due to the compositional representation employed, novel formulas that were never seen at training time can be encoded and followed on novel, never-before-seen maps. This couples the power of deep networks to learn to extract features and engage in complex behaviors with knowledge about the structure of formulas, to perform zero-shot execution. See \protect\cref{fig:model-execution} for an example of the model in action.}

\caption{Performance of the model as the number of training and test formulas increases. The x-axis is the number of model updates performed; each unit is 200 updates for \texttt{Craft} and 100 updates for \texttt{Symbol}. Going from {\blue 1,000} to {\red 10,000} formulas in the \texttt{Symbol} domain yields an in-domain performance improvement of about 10\% and an out-of-domain jump of 30\%, from 55\% to 85\% accuracy. The \texttt{Craft} domain on {\blue 1,000} to {\orange 4,000} formulas shows a similar performance improvement. Seeing on the order of 10,000 formulas allows good generalization to new formulas. Note that any minor error made while executing the formulas was considered a failure.}

593 — 2006.01378

\caption{Predictions for the NW resolvent mode. (a) Comparison between normalized mode gain ($\sigma_{\bkv,p}/\sigma_{\bkv,s}$, black symbols) and drag reduction observed in DNS ($\Delta \mnU^+$, light gray symbols) by \citet{gomez-de-segura_garcia-mayoral_2019}, plotted as a function of $\protect\kpyy$. The resolvent-based predictions shown in this panel are obtained using the velocity profiles from DNS. (b) Comparison between normalized gains predicted using the DNS mean profiles (filled symbols) and the synthetic mean profiles (open symbols) plotted as a function of $\protect\kpyy$. (c) Comparison between normalized gains predicted using the DNS mean profiles (filled symbols) and the synthetic mean profiles (open symbols) plotted as a function of $\protect\kpxx - \protect\kpyy$. The black and red dashed lines show linear fits to the initial decrease in normalized gain for the DNS- and synthetic-mean predictions, respectively. For all panels, the \protect\bsquare symbols represent substrates with $\phi_{xy} = 3.6$, \protect\btriangle symbols represent substrates with $\phi_{xy}= 5.5$, and \protect\bcircle symbols represent substrates with $\phi_{xy}= 11.4$. }

\caption{Comparison of resolvent-based gain predictions for spanwise-coherent structures with anisotropy ratios $\phi_{xy}=3.6$ (\,\,\protect\bsquare), $5.5$ (\,\,\protect\btriangle), and $11.4$ (\,\,\protect\bcircle). The maximum normalized gain obtained for resolvent modes with $\protect\llx\in[50,500]$ and $c^+ \in [4,10]$ is shown as a function of wall-normal permeability for (a) $\kzz \approx 2.3$ ($\llz = 500$) and for (b) $\kzz = 0$ ($\llz = \infty$). All of these predictions were obtained using the synthetic mean profiles.}

\caption{Predictions for the NW resolvent mode with a modified phase speed of $c^+ \approx 10 + \protect\kpxx - \protect\kpzz$. (a) Comparison between normalized mode gain ($\sigma_{\bkv,p}/\sigma_{\bkv,s}$, black symbols) and drag reduction observed in DNS ($\Delta \mnU^+$, light gray symbols) by \citet{gomez-de-segura_garcia-mayoral_2019}, plotted as a function of $\protect\kpyy$. The resolvent-based predictions shown in this panel are obtained using the velocity profiles from DNS. (b) Comparison between normalized gains predicted using the DNS mean profiles (filled symbols) and the synthetic mean profiles (open symbols) plotted as a function of $\protect\kpyy$. (c) Comparison between normalized gains predicted using the DNS mean profiles (filled symbols) and the synthetic mean profiles (open symbols) plotted as a function of $\protect\kpxx - \protect\kpyy$. The black and red dashed lines show linear fits to the initial decrease in normalized gain for the DNS- and synthetic-mean predictions, respectively. For all panels, the \protect\bsquare symbols represent substrates with $\phi_{xy} = 3.6$, \protect\btriangle symbols represent substrates with $\phi_{xy}= 5.5$, and \protect\bcircle symbols represent substrates with $\phi_{xy}= 11.4$.}

594 — 2006.01772

\caption{List of the target VOCs with class labels in the elution order. Blue frames indicate pairs of overlapping compounds. Compounds of relatively low concentrations are marked with colours: \aka{Mean EIC-Area} $\leq 10^3 \leq$ \navyblue{Mean EIC-Area} $\leq Q_1$ (first quartile); see Supplementary Table 1.}

595 — 2006.01930

\caption{\makered{Parameters and settings used for the experiments. $a=-\omega_o/\omega_i$ is the rotation ratio, the condition of the inner cylinder (IC) is given, while the outer cylinder is kept smooth, $\tilde s$ is the dimensionless patch size, and $\alpha$ is the void fraction of air. Ranges of values (indicated by --) mean that either the rotation rates or the rotation ratio is changed quasi-statically during the experiment.}}

\caption{a) Confocal microscopy image of a sample of the sandpaper roughness \makered{using the data from~\cite{Bakhuis2019}}. b) Height distribution (PDF) of the surface determined using the data from~\cite{Bakhuis2019}. The roughness is determined as $k \approx \SI{695}{\micro\metre}$ using the peak-to-trough distance, with the peak and trough at $+2\sigma$ and $-2\sigma$ from the mean as described in~\citep{Bakhuis2019}.}

\caption{Digitally enhanced photographs of the set~up, taken at $\Rey = 0.8\e{6}$ and $\alpha = \perc{1} $. From visual inspection it is clear that most bubbles reside \makered{at} the rough patches, where the turbulent intensity of the flow is higher compared to the smooth patches. The effect is therefore two-fold: 1) compared to a completely smooth inner cylinder, for the same $\Rey$ there will be stronger turbulent mixing in the flow, resulting in a better axial distribution of the air bubbles; and 2) at the rough-wall regions, the transfer of energy to the flow (or drag) is higher compared to the smooth-wall regions. The drag-reducing effect of the bubbles therefore has the largest influence on the total drag at these positions. Hence, the bubbles move to the locations in the flow where they are needed most.}

\caption{Results of torque measurements for $a=0$, plotted as skin friction coefficient $C_f$ versus (shear) Reynolds number $\Rey$ for a single phase flow with no air bubbles ($\alpha = \perc{0}$) in the working liquid. Shown in a) are the individual results. We also include a linear interpolation between the fully rough and fully smooth data to arrive at a 56/44 rough/smooth distribution similar to the patched roughness data of certain $\tilde{s}$ (dashed line). Also included are the results of the counter-rotation measurements at $a = 0.36$, where the value of the torque is maximum (black pluses). Shown in b) are the differences in skin friction coefficient $\Delta C_f$ between the different rough surfaces and the smooth surface. Error bars are shown in both graphs, based on the error in the torque sensor and measurement repeatability. \makered{The data for $\alpha = 0$ is the same as used in~\citep{Bakhuis2019}.}}

\caption{Results of torque measurements for $a=0$, plotted as skin friction coefficient $C_f$ versus shear Reynolds number $\Rey$ for a two-phase flow with 2 volume percent air bubbles ($\alpha = \perc{2}$) in the working liquid. Shown in a) are the individual results. Shown in b) are the differences in skin friction coefficient $\Delta C_f$ between the flow with $\alpha = \perc{2}$ and with $\alpha = \perc{0}$ over the same surface. Also included are the differences in $C_f$ for $\alpha = \perc{2}$ and $\alpha = \perc{0}$ of the counter-rotation measurements at $a = 0.36$ (black pluses). Error bars are shown in both graphs, based on the error in the torque sensor and measurement repeatability. \makered{The data for $\alpha = 0$ is the same as used in~\citep{Bakhuis2019}}.}

\caption{Digitally enhanced photographs of the set~up \makered{for the case of smooth walls}, taken at $\Rey = 0.8\e{6}$ and $\alpha = \perc{1} $. The position of the air bubbles strongly depends on the rotation ratio $a = -\omega_o / \omega_i$. When $a = 0$, the mixing and axial distribution of bubbles is much better compared to when $a > 0$. For $a>0$, stable roll structures are present in the flow that trap the bubbles, which is especially visible for the strongest roll structures at $a=0.36$.}

\caption{Results of the counter-rotation torque measurements, plotted as skin friction coefficient $C_f$ versus the rotation ratio $a = -\omega_o / \omega_i$ for three different Reynolds numbers \makered{for the case of smooth walls}. Shown are the results of measurements without air bubbles $\alpha = \perc{0}$ and with two volume percent of air bubbles $\alpha = \perc{2}$ in the working liquid. \makered{These continuous measurements are performed over \SI{48}{\minute} ($\Rey=0.8\times10^6$), \SI{72}{\minute} ($\Rey=1.2\times10^6$), and \SI{96}{\minute} ($\Rey=1.6\times10^6$).} For the two highest Reynolds numbers, the rolls are so strong when $a>0.2$ that they transport the air bubbles away from the inner cylinder and the \makered{DR} is lost. For the smallest Reynolds number, buoyancy effects are still too strong to allow an even axial distribution of air, resulting in only minor \makered{DR}.}

596 — 2006.01943

\caption{The training and test sets generated from the Multi-PIE and the FERET datasets. The training set contains 80\% of the images from 240 training subjects for the Multi-PIE dataset and 80\% of the images from 504 training subjects for the FERET dataset. The remaining subjects for both datasets are selected to generate subject independent test sets. For subject dependent test set 1, we used the remaining 20\% of the images from 240 subjects for the Multi-PIE and 20\% of the images from 504 subjects for the FERET dataset. In order to have a fair comparison between the subject dependent and subject independent setups, we also randomly selected the same number of subjects as in the subject independent test set to create subject dependent test set 2. The experimental setup and codes will be available at \textcolor{magenta}{\url{https://github.com/yamand16/ear2face}}. }

597 — 2006.01964

\caption{The solvers that do not consider a baseline ({\color{red} interpolation}, {\color{green} txy}, {\color{blue} txyz}, {\color{cyan} $\omega$}, {\color{magenta} $\omega$t}) gracefully degrade in performance as the actual baseline in the data increases, whereas the 6DOF solver that considers a baseline (black) provides stable performance in terms of the pixel error of the undistorted correspondences with respect to the global shutter equivalent correspondences.}

598 — 2006.02049

\caption{\protect\centering Architecture of the \predictor.}

\caption{\protect\centering \Predictor's performance on both proxy metrics and accuracy.}

599 — 2006.02052

\caption{SEDs of GeV emission in region A for a uniform elliptical spatial model with semimajor and semiminor axes of $0.49\deg$ and $0.25\deg$. The data in the MAGIC J1835-069 region (black) are taken from \protect \cite{MAGIC19}. The dashed line represents the predicted \gray\ emissions assuming the CR density in this region is the same as that measured locally by AMS-02 \protect \citep{ams02}. See Sect.~\ref{sec:Gas} for details.}

\caption{ SED of \gray\ emission around RSGC 1 (B region) for a uniform elliptical spatial model with semimajor and semiminor axes of $0.87\deg$ and $0.5\deg$. The dashed line represents the predicted \gray\ emissions assuming the CR density in this region is the same as that measured locally by AMS-02 \citep{ams02}.}

\caption{ Light curves of the \gray\ emission from regions A, B, and HESS J1837-069 from August 4, 2008 (MJD 54683) until August 3, 2019 (MJD 58698). The horizontal lines are the corresponding best-fit lines.}

\caption{ Left: map of the \ion{H}{i} column density derived from the 21-cm all-sky survey. Right: H$_{2}$ column density derived from the CO data. The gas is integrated within the velocity range of $38 - 50\ \rm km~s^{-1}$. The red ellipse represents region A and the yellow circle marks the GeV \gray\ emission of the SNR G24.7+0.6 listed in 4FGL. See Sect.~\ref{sec:Gas} for details. }

600 — 2006.02158

\caption{Detection results for the PASCAL VOC2007 test set under the semi-supervised training setting. The following experiments use VOC07 (labeled) and VOC12 (unlabeled) data. \textcolor{blue}{Blue} and \textcolor{red}{red} indicate the baseline score and the best score, respectively.}

601 — 2006.02220

\caption{ Dust stream-function profiles with different values of magnetic field $B_{0}$ and {\color{red}{$\alpha$} = 1}.}

\caption{ Dust stream-function profiles with different values of $\alpha$ and constant magnetic field {\color{red}{$B_0$} = 6.4$\times{10^{-10}}$}.}

\caption{x-component of dust velocity profiles with different values of $\alpha$ and constant magnetic field {\color{red}{$B_0$} = 6.4$\times{10^{-10}}$}.}

\caption{ Functions for the dust flow profiles with different driver mode numbers: (a) driver profile $\omega_a$, (b) dust stream function $\Psi(z,x_0)$, and (c) x-component of the dust velocity profile for mode $m=1$; similarly, panels (d),(e),(f) and (g),(h),(i) for $m=2$ and $m=3$, respectively, for {\color{red}{$\alpha$} = 1} and {\color{red}{$B_0$} = 6.4$\times{10^{-10}}$.}}

\caption{ Streamlines for the dust fluid flow with different driver mode numbers: (a) $m=1$, (b) $m=2$, (c) $m=3$, using parameters {\color{red}{$\alpha$} = 1} and {\color{red}{$B_0$} = 6.4$\times{10^{-10}}$.}}

602 — 2006.02635

\caption{\label{retrieval-on-coco} Multilingual image-text retrieval results on Multi30K and MSCOCO. The metric is the mean Recall (mR). Each \textbf{bold number} indicates the best mR score in that column. As MULE and SMALR use different dev/test splits of MSCOCO compared with all the other models, we highlight these numbers in \textcolor{blue}{blue color}. We report the mR results of Unicoder-VL on the en datasets, as it is pre-trained on the same image-caption corpus (i.e. Conceptual Captions) as \modelshort\ is.}

\caption{Pre-training tasks used in \modelshort. (Top) Four understanding tasks. (Bottom) Three generation tasks. \modelshort\ employs two separate shared weights for multiple understanding or generation pre-training objectives. {\color{blue} Blue color} denotes text-based inputs/outputs. {\color{yellow} Yellow color} denotes image-based inputs/outputs. For understanding (top row), the text stream is either used as a standalone input or concatenated with the image stream. For generation (bottom row), the text-based inputs and image-based inputs are fed into the encoder individually.}

603 — 2006.02843

\caption{\protect\includegraphics[scale=0.6]{Fig01}}

604 — 2006.03132

\caption{ Illustration of the time series model for predicting the earnings of a company with quarterly reports $q_t$ at time step $t$. We seek a mapping \predAbbildung from a pattern $x$ of past earnings data to the label $y$ of the predicted earnings at the future time $t + \predHorizont$. The window size \predFeaturewindow describes the time span of considered past earnings. }

605 — 2006.03232

\caption{The performance of the systemic error estimation algorithm. The discrepancy for each object in the Monte Carlo simulation is given as $\Delta z$, which is defined as $z_\mathrm{LASD} - z_\mathrm{sys}$, where \zsys\ is derived from optical emission lines. The difference in velocities is given on the upper axis. The median value is shown with the dashed black line, and the interquartile range as the orange region. The median and IQR are also given as inset text, as velocities in \kms.}

\caption{The performance of the \zsys\ identification with redshift. Redshift is shown on the abscissa, while $\Delta z$ is shown on the ordinate. Each point illustrates the median value from the simulated sample, and the error bar shows the interquartile range. Our best fit is a linear function with the coefficients shown in the Figure.}

\caption{The range of \llya\ and \ewlya\ spanned by the COS sample (grey stars) and MUSE samples (colored circles). The MUSE data are color-coded by redshift, with the colorbar shown to the upper right. A histogram showing the \llya\ distribution is shown above, and histograms showing \ewlya\ are shown to the right: the first includes the total COS sample, while the second shows the COS sample after retaining only galaxies more luminous than the cutoff of \llya$>10^{42}$~\ergsec; this luminosity cutoff is illustrated by the dashed black line.}

\caption{The relative contribution of the blue \lya\ peak, expressed as the \Lbluered\ ratio as a function of \llya\ and \ewlya. It is similar to Figure~\ref{fig:cos_prop_stack} but on a galaxy-by-galaxy basis. The blue and red \lya\ peaks are measured using the LASD software. Colored points are medians with interquartile ranges, and match the bins in Figure~\ref{fig:cos_prop_stack}.}

\caption{The evolution of the \lya\ profile with redshift. The individual panels are the same as in Figures~\ref{fig:cos_prop_stack} and \ref{fig:mw_prop_stack}, but the sample is now binned by redshift; each bin includes 45 or 46 galaxies. The MUSE-WIDE sub-samples are shown in colors and the redshifts are displayed in the caption. The low-$z$ sample observed with HST is shown in black. Error regions are shaded, and represent the interquartile range estimated from bootstrap resampling. The evolution of the blue wing, decreasing with increasing $z$, is obvious.}

\caption{Simulating IGM absorption of LAEs at various redshifts. Each column shows a different redshift bin: the median $z$, \ewlya, and \llya\ are listed above, together with the half-quartile range of each quantity, and the number of galaxies in each bin. The top rows show the median stack of the \lya\ luminosity density in absolute units (erg~s$^{-1}$~\AA$^{-1}$) in green, with the median IGM absorption overlaid, scaled so that 1 matches the red peak of \lya. Shaded regions represent the interquartile range. The second row shows the same \lya\ profiles normalized to an intensity of 1 at the peak, and zoomed in around the wings of the line: the IGM absorption is again shown on an arbitrary scale of $0.15\times$. The third row shows the stacked spectrum of the $z\approx 0$ sample in pink, always normalized to a peak intensity of 1 (same in every panel). The IGM absorption is again shown in each case, now scaled to the absolute throughput. The average optical depth, \tauigmlya, is computed in the $\lambda_\mathrm{rest}=1200-1210$~\AA\ range, and is shown at the top of each column. The lowest row shows the profile of the COS spectrum, absorbed by the IGM for each redshift bin, together with the stacked observed MUSE spectra for the same redshift. The overlap between the two lines is often striking.}

606 — 2006.03511

\caption{\small \textbf{Example of function tokenization.} We show two versions of the same Python function and their common tokenization. These function versions differ by extra spaces and one extra new line. Our Python tokenizer is robust to extra spaces and extra new lines except in strings. In strings, spaces are tokenized as \pmboxdrawuni{2581} (U+2581). Indentation is meaningful in Python: indented blocks are surrounded by INDENT DEDENT tokens.}

607 — 2006.03559

\caption{\textcolor{m_blue}{(a) Concept of point-of-load voltage control (PVC); (b) power electronic compensator (PEC) and control scheme for PVC}}

\caption{\textcolor{m_blue}{Operation cost and payback period for (a) 2030 GnW NSC; (b) 2030 GnW SC; (c) 2030 SwP NSC; (d) 2030 SwP SC}}

\caption{\textcolor{m_blue}{24-hour dispatch profile for different types of generation and frequency response for a day with low net-demand in 2030 GnW SC}}

\caption{\textcolor{m_blue}{24-hour dispatch profile for different types of generation and frequency response for a day with high net-demand in 2030 GnW SC}}

\caption{\textcolor{m_blue}{Operation cost and payback period for 2030 GnW SC under normal (Case A) and fully controllable cases (Case B)}}

\caption{\textcolor{m_blue}{Impact of BESS on the economic value of EFR provision from PVC, for two different ratings of the BESS: 0.5GW and 1GW}}

608 — 2006.03656

\caption{ The {\NAME} framework. {{Architecture encoding ($P_{1}^{\alpha}, ..., P_{n}^{\alpha}$)}} and {{hyper-parameter (HP) encoding ($P_{1}^{h}, ..., P_{m}^{h}$)}} represent the distribution of possible choices. Similar to \cite{pham2018efficient,liu2019darts}, we use a {\sm} to share the weights among all candidate architectures. {\NAME} alternates between {\textcolor{cyan}{updating the shared weights $\gW$}} and {\textcolor{pink}{updating the encoding ($P_{i}^{\alpha}$ and $P_{i}^{h}$)}}. {\textcolor{cyan}{When updating encoding}}, each HP basis combination will result in a separate copy of the model weights ($\gW_{1}$, ..., $\gW_{m}$). These copies are weighted by the HP encoding to compute the final weights $\gW^{'}$. Encoding is updated by back-propagation to minimize validation loss. {\textcolor{pink}{When updating the shared weights}}, we first forward the {\sm} to compute the training loss. Then, the different HP bases are weighted by the HP encoding to compute one set of hyper-parameters, which will be used to back-propagate the gradients from the training loss to update the shared weights $\gW$. After this searching procedure, {\NAME} will derive the final architecture and hyper-parameters from the learned architecture and HP encodings. }

609 — 2006.03701

\caption{ATIS performance of multi-task models compressed with structured pruning (ours) and knowledge distillation \cite{Hinton2015DistillingTK} as the compression rate (CR; \%) increases. We report intent accuracy and slot F1. Darker shades of \colorbox{red!50}{red} indicate higher absolute performance drops with respect to 100\%.}

610 — 2006.03796

\caption{Comparison between images and label distribution from NIH and CheXpert. Note that the CheXpert dataset not only differs from the NIH dataset in pixel-wise appearance but also includes more views (about 35\% of the images in CheXpert are lateral views). Right: Histogram of the label distribution of the NIH and CheXpert training sets. CheXpert contains much more common pathologies, as well as many other positive findings. The 20 findings for the two datasets are Emphysema (Emph), Fibrosis (Fibr), Hernia (Hern), Infiltration (Infi), Pleural Thickening (P\_T), Mass, Nodule (Nodu), Atelectasis (Atel), Cardiomegaly (Card), Consolidation (Cons), Edema (Edem), Effusion (Effu), Pneumonia (Pne1), Pneumothorax (Pne2), Enlarged Cardiomediastinum (E\_C), Fracture (Frac), Pleural Other (P\_O), Lung Opacity (L\_O), Lung Lesion (L\_L), and Support Devices (S\_D). The specific number for each category can be found in Table~\ref{Datasets Comparison Table}.}

\caption{Top-8 predicted findings and the corresponding prediction scores of the jointly trained DenseIBN-121 and our method. The ground truth labels are highlighted in \textcolor{red}{red}. The number at the top of each image is its file name. Best viewed in color.}

\caption{Average AUC scores for disease classification on the NIH dataset with different values of the hyperparameters $\lambda_{tat}$ and $\lambda_{ute}$. The \textcolor{red}{red} curve represents the mean AUC of all 14 categories, the \textcolor{blue}{blue} curve the average performance on common labels, and the \textcolor{green}{green} curve the average score on NIH-only classes. Best viewed in color.}

611 — 2006.03983

\caption{\label{fig:specificity} Specificity of targeted ad campaigns. $\alpha$ is the fraction of the targeted demographic in the population, $\beta$ is the mean specificity for that demographic via the campaign, and the $95\%$ CI assumes the counts follow a Binomial distribution. The high specificity voter-list based custom campaigns ($\beta \ge \alpha$) are in {\color{blue} \bf blue} and the low specificity campaigns ($\beta < \alpha$) are in {\color{red} \bf red}.}

612 — 2006.04009

\caption{Additional \aastex\symbols}

613 — 2006.04050

\caption{ Translations proposed by English language learners at various levels of fluency, from diverse backgrounds. Our multi-checkpoint ensemble models mimic learner fluency.\textcolor{blue}{$^4$}%~\footnote{\url{http://sharedtask.duolingo.com/#task-definition-data}}
}

614 — 2006.04093

\caption{Comparison of ensemble error rate (Top-1, \%) among OKD methods including branch-based (B) and network-based (N) methods on CIFAR-100. \textcolor{blue}{Blue}/\textcolor{red}{Red}: Best and second best results.}

615 — 2006.04150

\caption{Comparisons with the state-of-the-art unsupervised and generalised Re-ID methods on iLIDS, VIPeR, 3DPeS, CAVIAR, PRID and GRID. Rank-1 accuracies are reported.
% The best results are shown in {\bf \textcolor[rgb]{1,0,0}{red bold}}, while the second-best are in {\bf \textcolor[rgb]{0,0,1}{blue bold}}.
% $^{\dagger}$: Re-ID domain generalisation results. Note that FedReID doesn't use any training data in new testing domains.
%
}

616 — 2006.04224

\caption{Average number of objects missed across clusters for each parent-level class. The colored bars in each subplot, from left to right, are: \textcolor{green}{Ours (wet season)}, \textcolor{brown}{Ours (dry season)}, \textcolor{blue}{Nightlight}, \textcolor{red}{Fixed-18}, \textcolor{purple}{Random-25}, \textcolor{orange}{Stochastic-25}.}

617 — 2006.04489

\caption{\it This table shows how performance evolves with the number of temporal pyramids (TPs) per stream. To combine the outputs of these multiple pyramids (when using concatenation), we add a succession of FC+ReLU+BatchNorm to reduce the dimensionality {\bf from} ``63 (number of nodes in a TP of 6 levels) $\times$ 128 (node dimension) $\times$ \# TPs'' {\bf to} ``128''. All these results correspond to temporal pyramids of 6 levels.}

618 — 2006.04509

\caption{Ablation study for performance without an ontology subclass in the KG refinement task for \typeecomplex models. For brevity, we show results for fb15k-237 at the end of the second epoch and NELL at the end of the third epoch. Size is normalised by the number of triples in the original KG. \textcolor{blue}{The differences between the ablated results and the overall (All) results are given in brackets alongside the actual numbers.}}

\caption{\color{blue} Ablation study for performance with only one ontology subclass in the KG refinement task for \typeecomplex models. For brevity, we show results for \fb at the end of the second epoch and \nell at the end of the third epoch. \textcolor{blue}{The differences between the ablated results and the overall (All) results are given in brackets alongside the actual numbers.}}

\caption{\# of instances for each ontological component required by \pslkgi}

619 — 2006.04569

\caption{ Visualization of Retrieval Results. \textbf{(a)} Given one 3D query, we show the original 2D images and the top-5 retrieval results. \textbf{(b)} We also show two challenging cases: an occluded query and a partially detected query. A \textcolor{OliveGreen}{green} index indicates a true match, while a \textcolor{red}{red} index denotes a false match. }

620 — 2006.04648

\caption{Visualisation of the attribute word vector outcomes in the CZSL task on AwA2 and CUB. The left side of each sub-figure shows example images: the upper one is the input image, and the lower one is from the ambiguity class. The top of each sub-figure shows the classification probabilities of the compared methods; the \textcolor{blue}{\textbf{blue}} bar refers to the target class, and the \textcolor{magenta}{\textbf{pink}} bar indicates the ambiguity class. The symbols `\textcolor{blue}{+}', `\textcolor{green}{o}', and `\textcolor{red}{x}' in the scatter plot represent the word vector projections of the target class, the ambiguity class, and the sub-output of the SGV, respectively. PCA dimensionality reduction~\cite{tipping1999probabilistic} is used.}

\caption{The Grad-CAM visualization~\cite{selvaraju2017grad} and classification probabilities of SGV$^{LT}$-101 (w/ SGV) and LFGAA (w/o SGV). The \textcolor{red}{\textbf{red}} bar denotes the classification probability of the target class, and the other bars indicate the top-4 ambiguous classes.}

621 — 2006.04700

\caption{An example from the nuScenes dataset~\cite{nuscenes}. Given the past observations of pedestrians (colored bounding boxes (top)) and the egomotion of the car (red arrow), our framework predicts multiple modes of their future visualized by a set of bounding boxes and their distribution as an overlaid heatmap. Prediction covers possible options for \textcolor{magenta}{(2nd row)} turning left/right, \textcolor{blue}{(3rd row)} slowing down/accelerating, \textcolor{orange}{(4th row)} being on the sidewalk. }

622 — 2006.04738

\caption{{\color{blue} \textbf{The different behaviors of $R^{-1}(x)$}}. The scaling of $Z_1$ and $Z_2$ in (\ref{Z1}) and (\ref{Z2}) depends on the limit $\lim_{x \rightarrow 0} R^{-1}(x)$. Using power series analysis we write $R^{-1}(x) = C_0 x^{\Omega(0)} + C_1 x^{\Omega(1)} + \dots$, and take the limit $x \rightarrow 0$, such that only the leading powers are significant. We distinguish between three different scenarios: {\color{blue} \textbf{(a)} $\Omega(0) > 0$}.\ Here we have $R^{-1}(x \rightarrow 0) \sim x^{\Omega(0)} \rightarrow 0$ (blue). This, by inversion, provides $R(x) \sim x^{1/\Omega(0)}$ (red), both limits relevant in the $x \rightarrow 0$ regime.\ {\color{blue} \textbf{(b)} $\Omega(0) < 0$}.\ Under a negative leading power we have $R^{-1}(x \rightarrow 0) \sim x^{\Omega(0)} \rightarrow \infty$. This provides $R(x) \sim x^{1/\Omega(0)}$, this time in the limit $x \rightarrow \infty$, instead of $x \rightarrow 0$.\ {\color{blue} \textbf{(c)} $\Omega(0) = 0$}.\ Here $R^{-1}(x \rightarrow 0)$ approaches a constant $C_0$ - hence, inversion helps characterize the original function $R(x)$ around $x \rightarrow C_0$, as $R(x) \sim [(x - C_0)/C_1]^{1/\Omega(1)}$, a shifted polynomial. In display we set both coefficients $C_0, C_1$ to unity, for simplicity. In all panels the dashed line represents $y = x$. }

623 — 2006.04889

\caption{Illustration of our proposed time-varying RI-MTS meta-atom. (a) Isometric and (b) Top-down view of the unit cell. Here, $H$ = 0.2 mm, $L_{1}$ = 2.8 mm ($\lambda_{c}/3.83$), $L_{2}$ = 2.8 mm ($\lambda_{0}/3.83$), $L_{3}$ = 0.1 mm, $L_{4}$ = 0.1 mm, $L_{5}$ = 2.0 mm, $L_{6}$ = 1.3 mm, $L_{7}$ = 0.1 mm, and $L_{8}$ = 0.1 mm. The metal traces (top layer) and ground-plane (bottom layer) are copper. The dielectric layer of thickness $H$ is a RT/Duroid{\textregistered} 5880 ($\epsilon_{r} = 2.2$, $\tan \delta = 0.0009 $) substrate. (c) Equivalent circuit model representation of the meta-atom. The complex impedance of the MTS is $Z(p,q,t) = R(t) + jX(t)$. (d) Circuit model of our tunable diode comprises a tunable series resistance ($R$) and capacitance ($C$).}

624 — 2006.04947

\caption{\textcolor{blue}{{\em well-controlled} and {\em poorly controlled} subject counts}}

625 — 2006.04984

\caption{Trade-offs between the FC, IC, and the FIC techniques. Entries marked Yes/Offline and \textcolor{red}{No}/\textcolor{red}{Online} are favorable and unfavorable, respectively.}

626 — 2006.04996

\caption{Illustration of the domain discriminator shortcut. The domain discriminator aims to distinguish between different domains~({\color{red}{red}} and {\color{blue}{blue}}), where the decision boundary is represented by dashed lines. But misaligned samples create a shortcut where the domain labels can be directly determined by the misaligned class labels~(3 and 6). The decision boundary of the resulting shortcut is independent of the covariate that causes the domain difference, which does not contribute to adversarial domain-invariant learning. }

627 — 2006.05042

\caption{Categorization of DM Security studies. ``DoS'', ``Rev. Engg.'', ``Tamper'', ``Unreliable'', ``Cov. channel'' stand for ``Denial of Service'', ``Reverse Engineering'', ``Tampering data'', ``Reduce reliability'', and ``covert channel'', respectively. \textcolor{red}{Red rows are attack case studies in section \ref{sec:threats}}. \textcolor{blue}{Blue rows are defense case studies in section \ref{sec:defense}}.}

628 — 2006.05124

\caption{Additional \aastex\symbols}

629 — 2006.05136

\caption{Analytical ({\protect\redline}) and numerical ({\protect\blackdashedline}, particle-in-cell) velocity distribution function, at various locations along the Hall thruster discharge. The conditions are those of test case A.}

\caption{Comparison of PIC and polynomial VDFs with same density, average velocity and temperature, for three selected locations of test case A. PIC VDF ({\protect\blacklinelongdash}), triangular ({\protect\bluedashdottedline}), parabolic ({\protect\orangedashedline}) and cubic function ({\protect\greenline}) approximations.}

\caption{Application of triangular heat flux to the lower moments of the PIC simulation of test case A. PIC simulation ({\protect\blackcircle \ \protect\blackcircle \ \protect\blackcircle}); non-limited triangular ({\protect\bluedashdottedline}), parabolic ({\protect\orangedashedline}) and cubic ({\protect\purpledottedline}) VDF heat fluxes $Q_x$. Cubic closure with $\mathrm{erf}()$ limiting ($Q_x^*$, {\protect\greenline}). }

\caption{Linear ({\protect\blackdashedline}) and $\mathrm{erf}()$({\protect\greenline}) corrections to the polynomial heat flux.}

\caption{Magnification of Fig.~\ref{fig:heat-flux-comparison} on the positive heat flux region. PIC simulation ({\protect\blackcircle \ \protect\blackcircle \ \protect\blackcircle}); non-limited triangular ({\protect\bluedashdottedline}), parabolic ({\protect\orangedashedline}) and cubic ({\protect\purpledottedline}) VDF heat fluxes $Q_x$. Cubic closure with $\mathrm{erf}()$ limiting ($Q_x^*$, {\protect\greenline}). }

\caption{First moments of the ion distribution function for the PIC test cases. Analytical solution ({\protect\redline}); PIC simulation ({\protect\blackcircle \ \protect\blackcircle \ \protect\blackcircle}). The symbol \textcolor{red}{$*$} denotes the starting point for the integration, taken where $E = 0$.}

\caption{Solution of anisotropic fluid equations for the PIC test cases. PIC simulation ({\protect\blackcircle \ \protect\blackcircle \ \protect\blackcircle}); anisotropic fluid equations with zero heat flux ({\protect\bluedashedline}); and cubic-VDF heat flux ({\protect\greenlinedark}).}

\caption{Test case D - Axial VDF. Analytical ({\protect\redline}) vs experimental ({\protect\bluetriangle \ \protect\bluetriangle \ \protect\bluetriangle}). Top: 2 mm inside the thruster, from the exit plane; Bottom: 8 mm after the exit (in the plume). Vertical lines: 15\% oscillation.}

\caption{Test case D - Average velocity and velocity dispersion. Analytical (\protect\redline) vs experimental ({\protect\bluetriangle \ \protect\bluetriangle \ \protect\bluetriangle}).}

\caption{Test case D - Analytical heat flux (\protect\redline) and values reconstructed from experimental VDFs ({\protect\bluetriangle}) of Fig.~\ref{fig:experimental-analytic-VDF}.}

630 — 2006.05265

\caption{Summarized accuracy results on the POJ-104 test set for {\color{pltorange} code2vec, NCC, and Aroma} and {\color{pltblue} MISIM}. Bar heights are the averages of the measurements over 3 runs, and error bars are bounded by the minimum and the maximum of measured values.}

631 — 2006.05728

\caption{Example human-object detector \textcolor{green}{success} and \textcolor{red}{failure} cases.}

632 — 2006.05796

\caption{Reconnection of colliding vortex rings: evolution of $\delta^2(t)$ at $Re_\Gamma=2000$ (\protect\marksymbol{square}{black}) and $4000$ (\protect\marksymbol{o}{red}) for (a) pre- and (b) post-reconnection phases. The blue dashed lines indicate the linear scaling. The insets are flow structures represented by the vorticity isosurface at 5\% of maximum initial vorticity $|\boldsymbol{\omega}| = 0.05\omega_0$ for $Re_\Gamma=2000$; and $\delta$ as a function of $|t-t_0|$ for $Re_\Gamma=4000$ with the dashed line indicating the $t^{1/2}$ scaling. }

\caption{Reconnection of orthogonal vortex tubes: time evolution of $\delta^2(t)$ at $Re_\Gamma=2000$ (\protect\marksymbol{square}{black}) and $4000$ (\protect\marksymbol{o}{red}) for (a) the pre- and (b) post-reconnection phases, with the dashed lines indicating linear scaling. The insets are flow structures represented by vorticity isosurface $|\boldsymbol{\omega}| = 0.05\omega_0$; the bottom inset in (b) is $\delta$ as a function of $|t-t_0|$ for $Re_\Gamma=4000$ with the dashed line indicating the $t^{1/2}$ scaling. }

633 — 2006.05877

\caption{Mean squared error (MSE) analysis for the 4-THz reconstruction results (see \textcolor{urlblue}{Visualization 1}). (a) 320 $\times$ 320 pixel object and its SPI reconstructed image using (b) $M$ = 300, (c) 1200, (d) 1800, (e) 3600, (f) 5400, and (g) 9000. (h) MSE values versus the number of masks used (MSE = 0.0535, 0.0194, 0.0057, 0.0045, 0.0037, and 0.0034 for 300, 1200, 1800, 3600, 5400, and 9000 masks, respectively).}

634 — 2006.05992

\caption{ {\color{referee}Unconfirmed spectral lines}}

\caption{The observed frequency against the covered redshift for all sources, separated between robustly-detected redshifts (\textit{left-hand side}) and ambiguous redshifts (\textit{right-hand side}). For each potentially-observed spectral line (CO: \textit{black line}, H$_2$O and [C\,{\sc i}]: \textit{dashed blue line}), we show their observed frequency as a function of redshift. On the left-hand-side, we show the observed bandwidth, and indicate at which redshifts we would have expected to detect 1 CO-line (\textit{orange fill}) or multiple CO-lines (\textit{blue fill}). We indicate the frequencies of the observed spectral lines with horizontal lines, and indicate the corresponding potential redshifts for each line with an arrow in the top of the graph, {\color{referee} with the photometric redshift estimate at the bottom of each graph}. For the robust detections, we note that these arrows line up at the robust redshift identification, encircled with a \textit{black line}. If the redshift is not robustly-identified, the redshift possibilities within the \textit{blue fill}, e.g. expected multiple CO-line detection, can be excluded, as this would have resulted in multiple line detections. This graphical method removes the need for unwieldy and error-prone comparisons of spectral lines, and graphically shows the frequencies that need to be probed in future missions. }

635 — 2006.06069

\caption{\footnotesize \textcolor{Periwinkle}{(a):} A vulnerable spam detection pipeline, with steps numbered as (1), (2), etc. Accuracy-based detectors can be misled to detect numerous insignificant spams from new accounts, leaving behind the more manipulative elite spams. We define a zero-sum game to find a robust defender against unknown and evolving spamming strategies $A(p)$. \textcolor{Periwinkle}{(b):} The Practical Effect vs. Recall of individual detectors (shown in legend) against a mixed spamming strategy. The curve is obtained by sweeping the detection thresholds. For most detectors, the attack could attain high practical effects even with high detection recall scores. \textcolor{Periwinkle}{(c):} For a fixed spam detector (Fraudar), a spammer can choose the best out of five attack strategies to maximize the practical effect.}

636 — 2006.06469

\caption{Comparison of the classification accuracy convergence curves of \elco-enhanced (\textcolor{blue}{blue}) and original (\textcolor{red}{red}) models on the three tested datasets. X-axes are for \# of epochs.}

\caption{The prediction accuracy growth of GBDT with different training node compositions. \textit{Elector nodes only}, \textit{voter nodes only} and \textit{mixed nodes} are denoted using \textcolor{red}{red}, \textcolor{dark_green}{green} and \textcolor{blue}{blue} lines, respectively.}

637 — 2006.06506

\caption[Column Density and Star Positions of Numerical Simulation]{Figure\,1 from \citetalias{2018MNRAS.477.5422A}: Positions of stars at the onset of feedback, with stellar mass in colour scale, overlaid on column density in greyscale (both are logarithmic). The most massive star is \SI{33.7}\,\msun\ in red. The second highest is \SI{11.3}\,\msun. The third is \SI{5.7}\,\msun. The least massive is \SI{0.82}\,\msun.}

\caption[Dendrogram Comparing Shapes of Example SOs with Added Gaussian Noise to the MAGPIS Sample]{Dendrogram of the MAGPIS sample of \hii regions from \paperIt~ with the 12 example SOs with added Gaussian noise (those shown in Fig.\,\ref{fig:gauss_so}). The dendrogram represents the results from applying hierarchical clustering of the shape data of each \hii region. The branches of the \hii regions are labelled by their Galactic longitude (for the MAGPIS sample) or an ID number (for the SO sample). The branches of the SOs are coloured by their age: 0.1\,Myr in \textcolor{red}{red}, 0.2\,Myr in \textcolor{magenta}{pink}, 0.4\,Myr in \textcolor{blue}{blue} and 0.6\,Myr in \textcolor{green}{green}. The horizontal axis represents the height computed from the agglomerative clustering method.}

\caption[Dendrogram Comparing Shapes of Example SOs inserted into different MAGPIS NPs to the MAGPIS Sample]{Dendrogram of the MAGPIS sample of \hii regions from \paperIt~ with the 20 example SOs that were inserted into the MAGPIS NPs (shown in Fig.\,\ref{fig:magpis_so}, excluding NP 5). The dendrogram represents the results from applying hierarchical clustering of the shape data of each \hii region. The branches of the 20 SOs are coloured by their age: 0.1\,Myr in \textcolor{red}{red}, 0.2\,Myr in \textcolor{magenta}{pink}, 0.4\,Myr in \textcolor{blue}{blue} and 0.6\,Myr in \textcolor{green}{green}.}

\caption[Dendrogram of the Shapes of the Full SOs Sample Inserted into Different MAGPIS NPs]{Dendrogram resulting from applying hierarchical clustering to the shape data of the sample of 385 SO \hii regions inserted into MAGPIS NPs. As with the previous dendrograms, the branches are coloured by their age: 0.1\,Myr in \textcolor{red}{red}, 0.2\,Myr in \textcolor{magenta}{pink}, 0.4\,Myr in \textcolor{blue}{blue} and 0.6\,Myr in \textcolor{green}{green}. The three groups displayed in Fig.\,\ref{fig:age_so_5np_3groups} are indicated by the respective group numbers at the first split into three. Six groups are delineated by the dashed red boxes and labelled 1 through 6. 20 further groups as seen in Fig.\,\ref{fig:radius_so_5np} are delineated by the green boxes with the first and last labelled 1 and 20, respectively.}

\caption[Overview of NP and Projection Influence on Group]{Bar charts showing the respective influence of changing the NP or the projection angle of the SO on the resulting group. Ages are as in previous plots: 0.1\,Myr in \textcolor{red}{red}, 0.2\,Myr in \textcolor{magenta}{pink}, 0.4\,Myr in \textcolor{blue}{blue} and 0.6\,Myr in \textcolor{green}{green}. Top: Distribution of different NPs for a fixed age and projection. Each bar represents a given SO projection angle and age, with the fraction of NPs belonging to the respective group shown. Bottom: Distribution of different projection angles for a fixed age and NP. Each bar represents a given SO age and NP, with the fraction of projection angles belonging to the respective group shown. Mean values in each group are shown by the dashed black lines.}

\caption[Overview Dendrogram of SOs Sample as a Training Set for MAGPIS Subsample]{Dendrogram resulting from applying hierarchical clustering to the shape data of the sample of 385 SO \hii regions along with 26 of the MAGPIS \hii regions. The branches are coloured by their age for the SO data: 0.1\,Myr in \textcolor{red}{red}, 0.2\,Myr in \textcolor{magenta}{pink}, 0.4\,Myr in \textcolor{blue}{blue} and 0.6\,Myr in \textcolor{green}{green}; and the MAGPIS \hii regions are in \textcolor{cyan}{cyan}.}

638 — 2006.06516

\caption{A diagonal path {\blue $\A{1}{3}{10}$}, from $i=1$ to $\ell=3$ consisting of 6 NE-steps and 4 SE-steps, with bound $h=4$. The target region $J$ is $\br{2}{1}=[1\dotdot 4]$.}

\caption{A right-left version of the constrained path {$\A{1}{3}{10}$}, consisting of 6 \textsf{R}-steps (colored {\blue blue}) and 4 \textsf{L}-steps ({\red red}) along a 5-vertex point graph ($h=4$), starting at vertex $i=1$. The path in this representation is \textsf{RRLRRLLLRR}, based at $1$. It is an accordion fold of the blue path in Figure~\ref{fig:path}.}

\caption{An orthogonal path, starting at origin and ending at $\langle a , b\rangle=\langle 4,6\rangle$, consisting of 6 N-steps and 4 E-steps, staying strictly within bounds $s=4$ (below {\red $y=x+4$}) and $t=2$ (above {\red $y=x-2$}). This path and its constraints are analogues of those in Figure~\ref{fig:path}; see Section \ref{sec:comb}.}

\caption{The number of paths $\AH{i}{\ell}{n}{4}$ for $i,\ell\in[0\dotdot 4]$, $n\in[0\dotdot 16]$. For example, $\A{2}{2}{16}\strutx=\A{0}{[0\dotdot 4]}{16}=4374$ and $\A{3}{3}{16}=\A{4}{[2\dotdot 4]}{16}\strutx=3281$. Like for a bishop on a chessboard, half the squares are unreachable from any given starting point. The few squares that require backward steps are likewise inaccessible. The particular path of Figures~\ref{fig:path}--\ref{fig:Mohanty} is highlighted in {\blue blue \textbf{boldface}}.}

639 — 2006.06787

\caption{Comparison of the face verification and identification performance of different methods on the IJB-C dataset. Top performance is marked with \textbf{black} and second-best with \textcolor{Blue}{\textbf{blue}}. \oreo significantly improves the face verification and identification performance compared to the baseline, and achieves state-of-the-art results in terms of retrieval rate.}

640 — 2006.06830

\caption{\methodshared improves performance under weak supervision with each GNN (\gcn, \gsage, \gat and \jknet, left to right) and across augmentation settings (\methodtwo on top, \method on bottom). Relative improvement is clear even with many training nodes, but is larger with few training nodes.}

\caption{\methodshared performance across GNN architectures and six benchmark datasets. }

\caption{\methodshared augmentation especially improves performance under weak supervision.}

\caption{\label{fig:tiny} \gcn performance (test micro-F1) on the original Zachary's Karate Club graph in (a), and three augmented graph variants in (b-d), evaluated on both original ($O$) and modified ($M$) graph settings. Black, solid-blue, dashed-blue edges denote original graph connectivity, newly added, and removed edges respectively. While random graph modification (b) hurts performance, our proposed \methodshared augmentation approaches (c) demonstrate significant relative performance improvements, narrowing the gap to omniscient, class-aware modifications (d).}

\caption{Proposed \methodshared mod.\\ $M: 95.7$, $O: 94.3$ F1}

641 — 2006.06841

\caption{ Results on the seq2seq model.
% , across two datasets, and 2 poisoning levels for each backdoor scheme.
The number of singular vectors used $k=10$. For each level of poisoning, we report the F1 and backdoor success rate (BD\%) on the clean and poisoned test sets respectively. We compare
% the performance
our method with two different input representations, (1) Encoder Output and (2) Context Vectors. For each, we report the \emph{recall}, i.e., the percentage of poisoned points eliminated using our algorithm. In parentheses, we report \pickcolor{\emph{Post-BD\%}}, the backdoor success rate of a model trained after removing the poisoned points (top 1.5$\epsilon$\%). }

642 — 2006.06900

\caption{Model architecture from \cite{tikhonov2019style}, where the style discriminator ($D$) is a structured constraint that the generator optimizes against. A latent code discriminator ensures independence between the semantic part of the latent representation and the style of the text. \textcolor{blue}{Blue} dashed arrows denote additional independence constraints between the latent representation and the controlled attribute; see \cite{tikhonov2019style} for details.}

643 — 2006.07006

\caption{Observation on dynamism and inconsistency of background frames from THUMOS'14. It should be noted that none of them contain any action instance, \ie, they are all background frames. (a) The frames in the \textcolor{red}{red box} showing soccer players celebrating are very dynamic, even though they are background frames. (b) There are two types of background frames: black scenes with subtitles (\textcolor{ForestGreen}{green box}) and a golfer preparing to shoot (\textcolor{blue}{blue box}). These two types have very inconsistent appearances. }

644 — 2006.07021

\caption{The histograms of \textcolor{blue}{true positive (TP)}, \textcolor{orange}{false positive (FP)}, \textcolor{green}{true negative (TN)}, and \textcolor{red}{false negative (FN)} predictions from the GIN trained with MAP (top) and SWAG (bottom).}

\caption{The histogram of \textcolor{blue}{true positive (TP)}, \textcolor{orange}{false positive (FP)}, \textcolor{green}{true negative (TN)}, and \textcolor{red}{false negative (FN)} results for the BACE prediction task.}

\caption{The histogram of \textcolor{blue}{true positive (TP)}, \textcolor{orange}{false positive (FP)}, \textcolor{green}{true negative (TN)}, and \textcolor{red}{false negative (FN)} results for the BBBP prediction task.}

\caption{The histogram of \textcolor{blue}{true positive (TP)}, \textcolor{orange}{false positive (FP)}, \textcolor{green}{true negative (TN)}, and \textcolor{red}{false negative (FN)} results for the HIV prediction task.}

645 — 2006.07139

\caption{Attribute analysis with style representation on VGG-19 when trained on synthetic GPR-800 and tested on \textbf{Market-1501}. The most critical factor in each dataset corresponds to the item with the minimum loss in each attribute. \textcolor[rgb]{1.00,0.50,0.00}{\textbf{Orange}} in the bar chart indicates the most important factor in attribute analysis when performing the re-ID task GPR-1000 $\rightarrow$ Market-1501.}

\caption{Attribute analysis with style representation on VGG-19 when trained on synthetic GPR-800 and tested on \textbf{DukeMTMC-reID}. The most critical factor in each dataset corresponds to the item with the minimum loss in each attribute. \textcolor[rgb]{1.00,0.50,0.00}{\textbf{Orange}} in the bar chart indicates the most important factor in attribute analysis when performing the re-ID task GPR-1000 $\rightarrow$ DukeMTMC-reID.}

646 — 2006.07237

\caption{ReLU in x86-like code, with EAX holding a 32-bit float on entry. No floating point stack required; the function is applied bitwise with no branching. \textcolor{gray}{Grey} instructions take one micro-op. Timings from~\cite{agner}.}

\caption{tanh in x86-like code; floating-point operations here begin with `{\tt f}'; they need FPUs and have higher execution times. \textcolor{red}{Red} instructions take more than ten micro-ops.}

647 — 2006.07388

\caption{Our best-fit 1L1S model with the lowest BIC, ($+,-$), and \textcolor{blue}{Poleski et al. (2016)}'s 1L1S model with the lowest $\chi ^2$, ($-,+$), are shown in the left and right panels, respectively. In the top panels, the pink and blue dots represent our PLD-corrected lightcurve and the photometry obtained from the current \textit{Spitzer} microlensing pipeline described in \textcolor{blue}{Calchi Novati et al. (2015b)}, respectively. The orange dots represent the OGLE photometry. In the bottom panels, the pink and blue dots represent the residuals from our PLD decorrelation and the current \textit{Spitzer} microlensing pipeline, respectively. Note that PLD removes the correlated residuals in the \textcolor{blue}{Poleski et al. (2016)} data that could be mistaken for a planetary anomaly.}

\caption{The standard deviations of our PLD-corrected 1L1S ($+,-$) residuals and \textcolor{blue}{Poleski et al. (2016)}'s 1L1S ($-,+$) residuals are represented by the pink and blue dots, respectively. The dashed pink line represents the standard deviations expected if our residuals were white noise. The orange-shaded area represents the timescales of interest for microlensing anomalies in the \textit{Spitzer} data. \label{fig:ob150448_Allan}}

648 — 2006.07416

\caption{Plan generation with XTREE. From Krishna et al.~\cite{krishna2017learning}. An example has fallen down a learned decision tree to the \textcolor{aoenglish}{{\bf current branch}} where the probability of defects is 1.00. The nearby \textcolor{orange}{{\bf desired branch}} predicts a 0.00 probability of defects. XTREE generates a plan that is the delta $\Delta$ between these two branches.}

\caption{ An example of output generated by Table~\ref{inside} when applied to the data sets of the form of Table~\ref{ck}. The y-axis shows the feature name and the confidence interval during which the explanation stays effective. The x-axis indicates the importance weight of each attribute. The prediction label of this instance is 1 (defective), and the weights show how each feature contributes to the prediction. A \textcolor{aoenglish}{{\bf positive}} weight means the feature encourages the classifier to predict the instance as a positive label (defective), and vice versa for the \textcolor{red}{{\bf negative}} weight. Larger weights indicate greater feature importance in terms of the prediction value based on that feature weighted by a similarity kernel.}

649 — 2006.07560

\caption{Results of the state-of-the-art trackers on VOT2015. \textcolor{red}{Red}, \textcolor{blue}{blue} and \textcolor{green}{green} represent the $1$st, $2$nd and $3$rd best results, respectively.}

\caption{Results of the published state-of-the-art trackers on VOT2016. \textcolor{red}{Red}, \textcolor{blue}{blue} and \textcolor{green}{green} represent the $1$st, $2$nd and $3$rd best results, respectively.}

\caption{Results of the published state-of-the-art trackers on VOT2018. \textcolor{red}{Red}, \textcolor{blue}{blue} and \textcolor{green}{green} represent the $1$st, $2$nd and $3$rd best results, respectively.}

\caption{Comparison on the TrackingNet test set with the state-of-the-art trackers. \textcolor{red}{Red}, \textcolor{blue}{blue} and \textcolor{green}{green} represent the $1$st, $2$nd and $3$rd, respectively.}

650 — 2006.07838

\caption{Achievable uplink sum-rate performance in bps/Hz versus the SNR in dB for a $160$-antenna mMIMO BS with $16$ RF chains simultaneously serving $64$ users within a cell of $400$\,m radius \textcolor{NewColor}{based on \cite{shlezinger2019dynamic}}. For the DMA architecture, each of the $M=16$ microstrips includes $L=10$ unit metamaterial elements. In the fully connected hybrid A/D beamforming architecture, the phase-shifter network interconnects each antenna element to all RF chains.}

\caption{\textcolor{NewColor}{An experimental implementation of a dynamic 1D waveguide-fed metasurface \cite{sleasman2017experimental,sleasman2016design}:} a) Detailed circuitry of a reconfigurable metamaterial element. b) The simulated response of the metamaterial element, showing the impact of the PIN diodes in rendering the element in radiating ($e_{1,\rm on}$ and $e_{2,\rm on}$) and non-radiating ($e_{1,\rm off}$ and $e_{2,\rm off}$) states. c) A close-up view of a sample 1D DMA with metamaterial elements having two different resonance frequencies. d) Beamforming capability of the fabricated 1D DMA.}

651 — 2006.08247

\caption{(a) SRTG gate states. The gates can be \textcolor{red}{inactive} or \textcolor{green}{active}. When inactive, the main stream and the LSTM stream are fused. When active, the output is determined by the Temporal Gate and is either the fused result (open gate) or only the main stream (closed gate). (b) SRTG configuration options described in Section~\ref{sec:Variants}. Similar to Residual Networks, we distinguish between \textit{Simple} blocks with two conv operations and \textit{Bottleneck} blocks with three conv operations.}

652 — 2006.08305

\caption{Error rate mean and std of IEN, maxout and the original model design on different deep model architectures. The lower, the better. The subscript $\widetilde{\textbf{w}}$ indicates results using the weight downsizing method from section~\ref{sec:downsize}. \textit{+FC} stands for IEN applied on both FC and CNN layers. Maxout results are only on CNN layers. The \textcolor{blue}{blue} colored models have the exact same number of parameters. The rest have the same number of parameters that scales with $m$. IEN and maxout have $m=4$.}

\caption{Model parameter count / average training time per epoch in seconds. M stands for million, K stands for thousand. All models were trained on an Nvidia V100 GPU. Models in \textcolor{blue}{blue} have the same size at test time. \textit{+FC} stands for IEN applied on both FC and CNN layers. Maxout results are only on CNN layers. All models have $m=4$.}

653 — 2006.08335

\caption{T-statistics \textcolor{red}{using eval losses from each of the experiments}}

\caption{P-values for T-statistics \textcolor{red}{using eval losses from each of the experiments}}

654 — 2006.09136

\caption{Experiments for GCN through M3S. \textcolor{gray}{Gray} numbers are from \cite{sun2019multi}.}

\caption{Node classification performances (accuracy; unit: \%) when incorporating three self-supervision tasks (Node \textbf{Clu}stering, Graph \textbf{Par}titioning, and Graph \textbf{Comp}letion) into GCNs through various schemes: pretraining \& finetuning (abbr. P\&T), self-training (M3S~\cite{sun2019multi}), and multi-task learning (abbr. MTL). \darkred{Red} numbers indicate the best two performances with a mean improvement of at least 0.8 (where 0.8 is comparable to or less than the observed standard deviations). In the case of GCN without self-supervision, \textcolor{gray}{gray} numbers indicate the published results.}

\caption{Experiments on SOTAs (GCN, GAT, GIN, GMNN, and GraphMix) with multi-task self-supervision.
% Red numbers indicate the best several results for the corresponding SOTA.
% xxx-1, -2, -3 indicates the best, the second and the third best hyper-parameter configurations of the corresponding self-supervised task xxx in validation performance.
\darkred{Red} numbers indicate the best two performances for each SOTA.}

\caption{Adversarial defense performances on Cora using adversarial training (abbr. AdvT) without or with graph self-supervision. Attacks include those on links, features (abbr. Feats), and both. \darkred{Red} numbers indicate the best two performances in each attack scenario (node classification accuracy; unit: \%).}

655 — 2006.09348

\caption{Segmentation on real LiDAR point clouds. \textbf{Left}: LiDARsim trained; \textbf{Right}: real trained. {\color{blue} Road}, {\color{red}Car}, {\color{brown}Background}}

\caption{BEV Detection on real LiDAR point clouds. \textbf{Left:} LiDARsim trained; \textbf{Right}: real trained. {\color{cyan} Blue: Predictions}, {\color{red}Red: Groundtruth}}

656 — 2006.09396

\caption{Tested values for hyperparameter tuning on UCI datasets. The chosen values for each dataset are marked as: (\textasteriskcentered) White wine, (\textdagger) Red wine, (\textdollar) Boston housing.}

657 — 2006.09879

\caption{Angular distance (absolute value) between singular values of classes Ship and Truck, when labels are correct (\textcolor{blue}{blue}) and when labels are random (\textcolor{red}{red}). Randomizing the labels leads to collapse of angular distances and makes the patterns disassociated from the classes.}

658 — 2006.09881

\caption{Determination of the CuI ionization \blue{energy} by \blue{ambient pressure} photoemission spectroscopy 5 min after CuI film growth (a), 65 min after CuI film growth (b), and 48 h after CuI film growth (c). \blue{Under the assumption of a nondegenerate (degenerate) semiconductor, the ionization energy can be extracted by linear regression and extrapolation of the $Y^{\frac{1}{3}}$ spectrum ($Y^{\frac{1}{2}}$ spectrum), where $Y$ is the photoelectron yield.~\cite{Baikie2014,Harwell2016} The extrapolated values (in eV) are shown. Spectra are normalized.}}

\caption{(a): Conductivity as a function of time for a CuI film stored in air at 35\%~RH, and for a nominally identical film stored in a glove box with water content below 1~ppm. (b): Work function (as measured by KP) and ionization \blue{energy} (as measured by PES) of a CuI film held at (35$\pm$5)\%~RH. \blue{The shaded region is a guide to the eye. For reasons explained in the main text, the two initial ionization energy data points are determined using the $Y^{\frac{1}{2}}$ method in Fig.~\ref{fig:pes}(a,b), whereas the final data point is determined using the $Y^{\frac{1}{3}}$ method in Fig.~\ref{fig:pes}(c). Error bars are $\pm 30$~meV for both the ionization energy and the work function.} (c): Compilation of work function values measured in previous studies. Star markers are used for KP measurements.~\cite{Rojas2016,Kaushik2017,Das2015} Triangle markers are used for UV photoemission spectroscopy (UPS) measurements.~\cite{Yoon2016,Shao2012,Sun2014,Jeon2018}}

\caption{(a) Simplified sketch of the mechanism generating a surface dipole by water adsorption \blue{on} CuI. The average perpendicular component of the dipole moment of a water molecule ($\mu_{\perp}$) is given by the vector average of the individual water dipoles $\vec{\mu}_\mathrm{i}$. \blue{The surface is drawn as Cu-terminated due to the high volatility of I.} (b) Band diagram of a CuI surface \blue{shortly after growth ($t = 0$, left side) and after exposure to ambient humidity for 48~hours ($t = 48$~h, right side). VBM and CBM are the valence band maximum and conduction band minimum respectively, and $E_\mathrm{vac}$ is the vacuum level. Ionization energy and work function data are taken from Fig.~\ref{fig:time_dependent}(b).}
%The ionization \blue{energy} is $\mathrm{IE} = \mathrm{VBM} - E_\mathrm{vac} = 5.72$~eV.
%The work function of the moisture-exposed surface is $\Phi = E_\mathrm{F} - E_\mathrm{vac} = 4.80$~eV on the right side of the surface dipole.
The band gap $E_\mathrm{g}$ is estimated as 3.01~eV using a Tauc plot for direct gap materials (Fig.~S4, Supporting Information).~\blue{Based on this band gap value, the electron affinity of the as-grown surface (2.71~eV) is derived.}}

\caption{Sum $\Sigma_\mu$ of the electron and hole mobilities in a CuI film deposited on fused silica glass, as measured by THz \blue{transmission spectroscopy using a 400~nm laser pump to generate free carriers.} The dashed lines are fits to the real and imaginary parts of $\Sigma_\mu$ using the Drude model with a carrier effective mass of 0.3~$m_\mathrm{e}$ and a carrier scattering time of 20~fs. The extrapolated value of $\Sigma_\mu$ at \blue{zero frequency} (DC electric field) is 120~cm$^2$/Vs.}

659 — 2006.09930

\caption{\textbf{tSNE Embedding} -- \textcolor{gtemb}{Blue points correspond to embeddings computed from the original data}, \textcolor{predemb}{yellow points correspond to embeddings predicted by our relational model $\relational$}. (Left) Our model with KL-divergence regularization on the latent space (i.e., \modelemb-$\encoder$/$\decoder$+VAE in \Table{embedding_table}), (middle) our model trained in a sequence-based fashion (i.e., \modelemb-$\relational$~(Ord.) in \Tab{res_table}), (right) our model (i.e., \modelemb-$\relational$ in \Tab{res_table}). }

\caption{\textbf{Failure cases} -- Given the \textcolor{firststroke}{\textbf{first}} and \textcolor{secondstroke}{\textbf{second}} strokes, failed predictions of our model. (Left) A problem with connecting distant shapes via a long arrow. (Middle-Right) With increasing number of predictions, our model may predict overlapping arrows. }

\caption{Given the \textcolor{firststroke}{\textbf{first}} and \textcolor{secondstroke}{\textbf{second}} strokes, random predictions of our model. }

\caption{\textbf{Auto-regressive completion.} Performed by CoSE trained on \didi and \quickdraw datasets.}

\caption{{\bf Qualitative examples from \modelemb} -- Drawings were sampled from the model given the \textcolor{firststroke}{\textbf{first stroke}}. (left) trained on \didi, (right) trained on \quickdraw cats \& elephants, respectively.}

660 — 2006.10019

\caption{\small Spectral flow w.r.t. $\xi_{1,2}$ of the dynamical matrix in the presence of a boundary for (a-d) \textcolor{orange}{\large $\bullet$}-gap and (e-h) \textcolor{violet}{\large $\bullet$}-gap from Fig.~\ref{Fig:SpecIDS3}(b). The simulations were performed on a $21 \times 21$ lattice and $\theta = 1.55$ for \textcolor{orange}{\large $\bullet$}-gap, and on a $26 \times 26$ lattice and $\theta = 2.75$ for \textcolor{violet}{\large $\bullet$}-gap. The spectra computed with periodic boundary conditions (black curves) have been overlaid on top such that the boundary spectra (red curves) can be easily identified.}

\caption{\small Same as Fig.~\ref{Fig:BB1}(a-d) for \textcolor{red}{\large $\bullet$}-gap from Fig.~\ref{Fig:SpecIDS3}(b).}

661 — 2006.10042

\caption[]{Illustration of the coarse-to-fine inference process of the symmetry detection and feature warping module. (a): The sampled normal directions in a 4-round coarse-to-fine inference. The color of the points represents the scores from the symmetry confidence network. (b): The feature map $\mathbf{F}$ is warped to $\mathbf{F}_i$ according to the transformation $\x' = \mathbf{C}\x$ for various depths $d_i$ in $\x$. Here, the \textcolor{cyan}{input feature} (cyan dot) corresponds to warped features at \textcolor{Green}{correct depth} $d_1$ (green dot) rather than the warped features at \textcolor{orange}{incorrect depth} $d_2$ (orange dot).}

662 — 2006.10183

\caption{Normalized dimensions of strict diagrams in a greedy sequence (\protect\includegraphics{line_red.png}) and in an improved sequence (\protect\includegraphics{line_green.png}).}

\caption{The differences of normalized dimensions multiplied by $\sqrt{n}$ when a box is added to the 6th row (\protect\includegraphics{line_red.png}) and the 9th row (\protect\includegraphics{line_green.png}).}

\caption{Normalized dimension of strict diagrams in the greedy sequence (\protect\includegraphics{line_green.png}) and in a typical Plancherel sequence (\protect\includegraphics{line_red.png}).}

663 — 2006.10369

\caption{WMT14 EN$\rightarrow$DE test results over varying depths of the encoder under the \smin latency constraint of \textcolor{nred}{AT 12-1} \textcolor{nred}{$\blacksquare$}.}

664 — 2006.10373

\caption{Identification result under transient conditions. The left figure shows the magnitude of the identified frequency response function and the right figure shows its estimation error when compared to the model. The results show that the LPM \tikzdashedline{mycolor2}, described in Sec. \ref{sec:LPM}, outperforms the spectral analysis method using both a rectangular \tikzdottedline{mycolor4} and a Hann \tikzdashdottedline{mycolor5} window. The LPM is invariant to the transient contribution \tikzline{mycolor3} that dominates the response at lower frequencies when compared to the plant \tikzline{mycolor1}.}

\caption{Comparison of the estimated FRF of the true system \tikzdashedline{mycolor1} using the direct method \tikzdashdottedline{mycolor5} and the indirect method \tikzline{mycolor3}. It is shown that applying the direct method in a closed-loop setting yields a significantly biased result.}

\caption{Experimental estimation of the MIMO transfer function matrix, shown as a magnitude [dB] plot, using either matrix-wise division \tikzline{mycolor1}, with corresponding variance \tikzline{mycolor2}, or element-wise division \tikzline{mycolor3}, shown without variance. The two operators yield significantly different models; depending on the desired model, a specific operator should be used.}

\caption{Estimating the FRF of the sensitivity function $S = \frac{u}{d}$ using $2$ periods of a $5$ [s] multisine, compared to using $6$ periods as a baseline reference \tikzline{mycolor1}. Results show that the estimation error using spectral analysis \tikzdottedline{mycolor3} is significantly higher than when using the LPM \tikzdashedline{mycolor2}; this is caused by the transient contribution \tikzdashdottedline{mycolor4}.}

665 — 2006.10379

\caption{Identification experiment used to identify the temperature dependent electrical resistance $R_m(T)$ and Seebeck coefficient $S_m(T)$. The different sub-plots show the Current, Voltage and Temperature respectively. The experiment is repeated for $3$ modules, TEM 1 \tikzline{mycolor1}, TEM 2 \tikzline{mycolor2} and TEM 3 \tikzline{mycolor3}. }

\caption{Identifying the temperature-dependent electrical resistance $R_m(T)$ for different Peltier modules. TEM 1 \tikzmarkline{mycolor1}, TEM 2 \tikzmarkline{mycolor2} and TEM 3 \tikzmarkline{mycolor3} all show a similar linear relation with the average temperature $T_{avg}$, leading to an average $R_m(T_{avg})$ \tikzdashedline{black}.}

\caption{Identifying the temperature dependent Seebeck coefficient $S_m(T)$ for different TEMs. It shows that for TEM 1 \tikzmarkline{mycolor1} and TEM 2 \tikzmarkline{mycolor2} the result is quite similar, and TEM 3 \tikzmarkline{mycolor3} deviates from the rest. This yields a slightly shifted average linear relation for $S_m(T_{avg})$ \tikzdashedline{black}.}

\caption{Simulation results (dashed) compared to experimental measurements (solid) using constant parameters at $T_{avg} = 35^\circ C$ for $T_1$ \tikzline{mycolor2}, $T_2$ \tikzline{mycolor5}, $T_3$ \tikzline{mycolor1}, $T_4$ \tikzline{mycolor6}.
% The hot side of the peltier is connected to a waterchiller \tikzline{mycolor3} that maintains a constant temperature.
The results show that at temperatures significantly different from $T_{avg} = 35^\circ C$ the model is inaccurate since temperature dependency must be taken into account.}

\caption{Simulation results (dashed) compared to experimental measurements (solid) using temperature-dependent parameters for $T_1$ \tikzline{mycolor2}, $T_2$ \tikzline{mycolor5}, $T_3$ \tikzline{mycolor1}, $T_4$ \tikzline{mycolor6}. The model prediction error is significantly reduced compared to the results in Fig. \ref{fig:Sim_Peltier_P2_Tavg_35} by taking into account temperature-dependent parameters.}

666 — 2006.10463

\caption{Geometry and dimensions of the SIS-100 superconducting dipole magnet. This worksheet enables the~students to build up the model themselves. Nonetheless, a ready-made model can be offered to reduce the extent of the~exercise or to serve as a backup. The steps to be carried out by the students are marked by \protect\tikz[baseline=-0.5ex]{ \protect\node[star,star points=7,star point ratio=0.6,draw=black,fill=black!30] at (0,0) {1}; } up to \protect\tikz[baseline=-0.5ex]{ \protect\node[star,star points=7,star point ratio=0.6,draw=black,fill=black!30] at (0,0) {41}; }. This allows the tutor to monitor the progress of the students or the student groups. }

\caption{Worksheet for calculating the coefficients of the algebraic system of equations. Here, a bit of calculus is required to come up with expressions for the matrix coefficients and right-hand-side contributions. Task \protect\tikz[baseline=-0.5ex]{ \protect\node[star,star points=7,star point ratio=0.6,draw=black,fill=black!30] at (0,0) {28}; } requires a first implementation action at the implementation points in three routines and thereby marks the start of the third part of the exercise. Here, the students may need some support when coding the first lines.}

667 — 2006.10480

\caption{ ROC curves for the $H \to b\bar{b}$, $H \to gg$, and $Z \to q\bar{q}$ analyses described in Section~\ref{sec:dist}. These curves characterise the trade-off between the (desirable) true-positive event identification rate (aka efficiency), and the (undesirable) false-positive identification rate. Curves are shown for selections defined by sliding $x < C$ cuts, for all possible values of the cut $C$ and observables $x$ from the set of jet color ring \colorring, dipolarity \dipolarity, pull angle \pullangle, and \Dtwo.}

\caption{Illustrations of complementarity of \Dtwo with the color ring observables. The first two figures show the 2D distributions of \Dtwo with the color ring \colorring and its $\colorring'$ variant, with contour overlays at 50\% and 75\% of the maximum values for signal (solid) and background (dashed) densities separately, to indicate their main concentrations. Some orthogonality is seen between the observables, suggesting additional separation power from a 2D cut. The third plot shows the previous ROC curves for \Dtwo and \colorring, compared to the performance of a simple 2D $\Dtwo + \colorring' < C$ cut, an example of which is indicated by a dotted line in the middle plot. Mild performance improvements over \Dtwo alone are seen for all values of signal efficiency.}

668 — 2006.10721

\caption{(a) Regression: the pixels in the groundtruth box, \ie the \textcolor{red}{red} region, are labeled as the positive samples in training. (b) Regular-region classification: the pixels close to the target's center, \ie the \textcolor{red}{red} region, are labeled as the positive samples. The \textcolor[RGB]{204,0,153}{purple} points indicate the sampled positions of a location in the score map. (c) Object-aware classification: the \textit{IoU} of the predicted box and the groundtruth box, \ie, the region with \textcolor{red}{red} slash lines, is used as the label during training. The \textcolor[RGB]{102,255,255}{cyan} points represent the sampling positions for extracting object-aware features. The \textcolor{yellow}{yellow} arrows indicate the offsets induced by spatial transformation. Best viewed in color.}

\caption{Performance comparisons on the VOT-2018 benchmark. {\color{red}{Red}}, {\color{green}green} and {\color{blue}{blue}} fonts indicate the top-3 trackers. ``Ocean'' denotes our proposed model.}

669 — 2006.10724

\caption{Comparisons of prior DARTS and our CDARTS. In prior DARTS~\cite{DARTS} and PDARTS~\cite{PDARTS}, the target evaluation network is not involved in the architecture search process. In contrast, our CDARTS combines the search and evaluation networks into a joint optimization framework. The \textcolor[rgb]{0.67,0.847,0.92}{\bf blue} and \textcolor[rgb]{1,0.8,0.2}{\bf gold} boxes indicate the search and evaluation networks, respectively.
%(Best viewed in color).
}

670 — 2006.10742

\caption{Train Policy (changes to SAC in \blue{blue})}

671 — 2006.10839

\caption{%
(a) Definition of the enstrophy and energy peak wavelengths. \solid, Premultiplied enstrophy spectrum; \dashed, energy spectrum. Case T768 at $\omega_0' t =6.3$.
%
(b) Evolution of the enstrophy and energy peak wavelengths; normalized with the enstrophy wavelength at $t=0$. Symbols as in table \ref{tab:cases}. Open symbols are enstrophy, and closed black ones are energy. The two polynomial fits are used as reference in later figures. The red closed symbols are the Taylor scale, stretched for clarity to $10\lambda_\tau/\lambda_{\omega 0}$.
%
(c) Logarithmic slope of the energy spectrum at the end of each simulation. From left to right, T1024 to T256. The two horizontal lines mark slopes $-1$ and $-3$, which respectively correspond to the energy and enstrophy peak wavelengths in (a).
%
}

\caption{%
(a,b) P.d.f. of the template approximation error. Energy error norm. Case T512, $\omega'_0 t=9.2$. Template size, increasing from red to blue: $L_T/L=0.045$, 0.08, 0.11, 0.22, 0.34, 0.45, 0.56. (a) Template is a vortex. (b) Template is a dipole. The arrow is in the direction of increasing $L_T$.
%
(c,d) Approximation error as a function of case and of template size. Cases are plotted for different times as grey lines without labels, except for the final time of each simulation, which is highlighted and labelled as in table \ref{tab:cases}. \solid, Template is a vortex; \dashed, template is a dipole. The dashed vertical line is a representative value of $\lambda_\omega/L$, from table \ref{tab:cases}.
%
(c) Error is averaged over all template positions.
%
(d) Error measured as the fraction of relative local errors larger than unity.
%
(e) Template size for optimum $P_q$. Lines are the polynomial fits to $\lambda_\omega$ and $\lambda_q$ in figure \ref{fig:specs}(b). Closed symbols are dipole templates; open symbols are vortex templates. (f) As in (e), for the optimum $P_q$.
%
}

\caption{%
Properties of the largest thresholded structure of low $P_q$ in each snapshot.
%
(a) Temporal evolution of the geometry of the largest structure. \solid, Inner scale normalised with the initial enstrophy scale, $\rho_1/\lambda_{\omega 0}$; \dashed, area of the largest structure normalised with its initial value, $S/S_0$; symbols without lines are the integral length $L_{int}/\lambda_{\omega 0}$ for the correlations of $\Phi_q$, defined in \r{eq:Lint}. Symbols as in table \ref{tab:cases}. Open symbols are vortex templates; closed symbols are dipoles.
%
(b) Flow properties within the largest structures. \solid, kinetic-energy density; \dashed, enstrophy density.
%
(c) As in (a), but unnormalised, versus the kinetic-energy wavelength.
%
}

\caption{%
Properties of the vortex pairs.
%
(a) Joint p.d.f. of the velocity of the centre of gravity of a vortex pair versus the averaged velocity magnitude of its two component vortices. In all the panels in this figure: \dotted, T256; \dashed, T512; \chndot, T768; \solid, T1024. Red lines are dipoles, and black ones are corotating pairs. The two probability contours in each case enclose 50\% and 95\% of the probability mass.
%
(b) Mean circulation magnitude of the vortex components of the pair, versus the inter-component distance.
%
(c) Vortex circulation versus vortex mobility.
%
}

672 — 2006.10915

\caption{Data collection based on our proposed system \label{fig: IHSCHASH}}

673 — 2006.11001

\caption{QC success rate ($\%$) of the Doppler spectra for short ($N=1024$ time steps) synthetic HFR time series, when computed with {\bfseries\color{blue} ------}~AR-MEM and experimental noise; {\bfseries\color{red} ------}~FFT method and experimental noise; {\bfseries\color{blue} -\,-\,-\,-}~AR-MEM and Gaussian white noise; {\bfseries\color{red} -\,-\,-\,-}~FFT method and Gaussian white noise.}

\caption{Fraction of radar cells (covered area, in $\%$) passing the quality test when the radial current is computed with short (266~s, 1024 time steps) noisy HFR time series and processed with: {\bfseries\color{blue} ------}~AR-MEM; {\bfseries\color{red} ------}~FFT method. The dataset was acquired with the Cap B\'enat WERA HFR on Feb. 4th,~2019.}

\caption{Temporal fluctuations of the radial surface current $U_r$ estimated with a short integration time of $T=266$~s (1024 time steps): {\color{blue}{$\circ$}}~MEM and {\color{green}{$\diamond$}}~FFT; long integration time $T=1065$~s (4096 time steps): {{\bfseries\color{red} ------}\,/\,\color{red}{$\diamond$}}~FFT. The data were acquired on Feb. 4th, 2019 with the Cap B\'enat WERA HFR and the radar cell is chosen in the middle of the radar coverage. Brief fluctuations are highlighted at around 21:00~UTC.}

\caption{Same as Figure \ref{fig:timeseries} for the radial surface current magnitude $|U_r|$ estimated with a short integration time of $T=266$~s ($1024$ time steps): {\color{blue}{$\circ$}}~AR-MEM; {\color{cyan}{$\square$}}~MLE.}

\caption{Experimental spectrum of the radial surface current $U_r$ at a given radar cell. $U_r$ estimated with: {\bfseries\color{blue} ------}~MEM, short integration time ($N=1024$~time steps); {\bfseries\color{red} ------}~FFT, long integration time ($N=4096$~time steps). Data have been acquired on Feb. 6th to 10th, 2019, with the Cap B\'enat WERA HFR; the radar cell is in the middle of the coverage.}

674 — 2006.11007

\caption{A conceptual illustration of the effect of adversarial distribution shift on BatchNorm. In the plot, the {\color{blue}blue line} represents the ideal distribution that BatchNorm assumes and the {\color{orange}orange line} shows the inference data distribution. The input distribution is a good approximation of the ideal distribution for clean images, but the distribution shifts when adversarial noise is added to the input image. This invalidates the implicit assumption of BatchNorm that the train and validation data will be from the same distribution. As a result, the population statistics estimated during training (with the clean distribution) become inaccurate, which causes adversarial vulnerability.}

675 — 2006.11132

\caption{\textbf{Overview.} \textbf{(a)} Given an image \red{$x_i$} and prototypes \blue{$c_1$} and \green{$c_2$}, standard clustering such as K-means assigns the sample to the closest prototype. Our DTI clustering first aligns prototypes to the sample using a family of parametric transformations - here rotations - then picks the prototype whose alignment yields the smallest distance. \textbf{(b)} We predict alignment with deep learning. Given an image \red{$x_i$}, each parameter predictor~\orange{$f_k$} predicts parameters for a sequence of transformations - here affine \orange{$\mathcal{T}^{\,\textrm{aff}}_{\beta_{\textrm{aff}}}$}, morphological \orange{$ \mathcal{T}^{\,\textrm{mor}}_{\beta_{\textrm{mor}}}$}, and thin plate spline \orange{$\mathcal{T}^{\,\textrm{tps}}_{\beta_{\textrm{tps}}}$} - to align prototype {$c_k$} to \red{$x_i$}. \textbf{(c)} Examples of interpretable prototypes discovered from large image sets (15k each) associated with hashtags on Instagram using our DTI clustering with 40 clusters. Each cluster contains from 200 to 800 images.}

\caption{\textbf{Qualitative results.} \textbf{(a)} compares prototypes learned from GMM and our DTI GMM, \textbf{(b)} shows transformed prototypes given \tsfblue{query samples} from MNIST and highlights the \tsfgreen{closest prototype}.}

676 — 2006.11163

\caption{(a) Geometry of the microchannel and sharp edge. (b) Trajectories of individual particles (diameter 4.9 $\mu$m), over several periods. \textcolor{red}{For the left-hand-side zoom-in image, the frame rate is $4 f$ = 10000 fps, while for the right-hand-side one it is $10 f$ = 25000 fps; the two images have the same exposure time $1/(10f) = 1/25000$~s.} Far from the tip, the flow oscillates at frequency $f$ and amplitude $A$, as evidenced by the segment described by each particle. Close to the tip, the trajectories of the particles show a superposition of oscillations with higher amplitude due to the sharp edge and advection due to the intense streaming flow.}

677 — 2006.11271

\caption{Chiralities (cw: clockwise, ccw: counterclockwise) of the central VW after the first pulse (pw2, +y direction) and after the second pulse (pw1, \textcolor{gray}{greyed out}); all measurements have been confirmed by micromagnetic simulations. $\vec{B}\uparrow$/$\vec{B}\downarrow$ indicates the field direction in relation to the curvature of the wire; $\curvearrowleft$/$\curvearrowright$ indicate the motion of the vortex in the HR element. U and L indicate the upper and lower HR element. Results obtained by SEMPA measurements and micromagnetic simulations ($B=$\unit[15]{mT}) are set in normal font; italics indicate simulation only, with $B=$\unit[15]{mT}; an asterisk indicates additional confirmation by a simulation with $B=$\unit[24]{mT}. “H2H” stands for head-to-head DW; “T2T” for tail-to-tail DW.}

678 — 2006.11325

\caption{$5$-way $K$-shot accuracies with 95\% confidence intervals on mini-ImageNet for a varying number of training images and classes. Methods: ProtoTransfer (\solidrule), transfer learning baseline Pre+Linear (\protect\dashedrule). Note the logarithmic scale. Detailed results available in Table \ref{tab:results_n_classes_or_images_full} in the appendix.}

\caption{$5$-way $K$-shot accuracies with 95\% confidence intervals on mini-ImageNet as a function of training images and classes. Methods: ProtoTransfer (\solidrule), transfer learning baseline Pre+Linear (\protect\dashedrule). Note the logarithmic scale. Detailed results available in Table \ref{tab:results_n_classes_or_images_full} in the appendix.}

679 — 2006.11438

\caption{Network architecture of DICG. It can be used for either a centralized-training-centralized-execution (CTCE) approach or a centralized-training-decentralized-execution (CTDE) approach. The \textcolor{nice-blue}{blue} arrows indicate the CTCE approach. The DICG module serves as a joint observation encoder. We use the integrated observations $\Tilde{E}$ to directly obtain actions for agents through a parameter sharing policy. The baselines in CTCE are estimated by a concatenation of raw observations. The \textcolor{nice-red}{red} arrows indicate the CTDE approach. We pass the integrated observations $\Tilde{E}$ through an aggregator network to estimate a centralized baseline. We then use the baseline to compute the advantage to guide policy optimization.}

680 — 2006.11530

\caption{(a) Intrinsic HWHM plot of magnon modes ($\Gamma(\q,E_{\q})$) along the [-2H H 0] direction with calculated two-magnon (red solid line) and Stoner (blue dashed line) continuum DOS at $(\q,E_\q)$. $\Gamma(\q,E_\q)$ was extracted from the fitting (\figref{fig:Gamma}) with the instrumental resolution excluded. \tred{Data points in a shaded region may not be reliable due to the overlap with the continuum-like signal.} Error bars indicate the standard deviations of the fitted HWHM. (b) INS data, (c) calculated two-magnon continuum DOS, and (d) calculated Stoner continuum DOS along the same direction as in (a). Red dotted lines indicate the magnon dispersion $E_{\q}$ from the LSWT calculations. White rectangles indicate the region where the continuum-like excitations appear. (e)-(h) are shown along the [-K K 0] direction. For a better presentation, a logarithmic scale was used in (c)-(d) and (g)-(h). }

681 — 2006.11575

\caption{{\protect\small The behavior of the effective potential in terms of $r$ for $M=1,\protect\theta=\frac{\protect\pi}{2},b=\frac{E}{L}=0.1,q=0.5,a=0.5$ (\textcolor{red}{dashed line} for metric (\protect\ref{metric}) and \textcolor{blue}{solid line} for metric (\protect\ref{eq22})).}}

682 — 2006.11628

\caption{Causal tree approach from \textcolor{blue}{\cite{athey2016recursive}} on Study 2 data}

683 — 2006.11687

\caption{$B_{\BWT}$ for our example, with the BWT of $S$ and the suffixes of $S$ in lexicographic order. We have highlighted in red the unique proper phrase suffix of length at least $w$ following each character, to clarify how $B_{\BWT}$ is defined. (We show $S [n - 1] = \mathtt{\#}$ and the empty suffix as {\tt \textcolor{red}{\#GATTAC}AT\#GATACAT\#GATTAGATA\#\#} and {\tt \textcolor{red}{GATTAC}AT\#GATACAT\#GATTAGATA\#\#} instead, because we consider $S$ to be cyclic and this should make clearer how the characters in the BWT are sorted.)}

684 — 2006.11710

\caption{\label{fig:fig4}Angular frequency variation against magnetic field for the discharge conditions of Fig.~\ref{fig:fig3}: {\color{black}$\blacksquare$} $V_{up}$ = 60 V, $V_{down}$ = 50 V, and annulus width of dusty plasma $\sim$ 5 mm; {\color{red}$\bullet$} $V_{up}$ = 60 V, $V_{down}$ = 50 V, and annulus width $\sim$ 2.5 mm; {\color{blue}$\blacktriangle$} $V_{up}$ = 50 V, $V_{down}$ = 50 V, and annulus width $\sim$ 4 mm.}

\caption{\label{fig:fig8}Angular frequency variation against magnetic field for the discharge conditions of Fig.~\ref{fig:fig7}: {\color{black}$\blacksquare$} $V_{up}$ = 55 V, $V_{down}$ = 50 V, and annulus width of dusty plasma $\sim$ 4 mm; {\color{red}$\bullet$} $V_{up}$ = 55 V, $V_{down}$ = 55 V, and annulus width $\sim$ 9 mm; {\color{blue}$\blacktriangle$} $V_{up}$ = 45 V, $V_{down}$ = 45 V, and annulus width $\sim$ 3 mm.}

685 — 2006.12100

\caption{Ablation results on the {validation} sets in the transductive setting. (i) Without using the feed-forward sub-layer: $\boldsymbol{\mathsf{u}}^{(k)}_{\mathsf{v}} = \textsc{LayerNorm}\left(\boldsymbol{\mathsf{u}}^{(k-1)}_{\mathsf{v}} + \textsc{Att}\left(\boldsymbol{\mathsf{u}}^{(k-1)}_{\mathsf{v}}\right)\right)$. (ii) Without using the multi-head self-attention sub-layer: $\boldsymbol{\mathsf{u}}^{(k)}_{\mathsf{v}} = \textsc{LayerNorm}\left(\boldsymbol{\mathsf{u}}^{(k-1)}_{\mathsf{v}} + \textsc{FF}\left(\boldsymbol{\mathsf{u}}^{(k-1)}_{\mathsf{v}}\right)\right)$. $\ast$ denotes statistically significant differences at $p < 0.05$ (using the two-tailed {paired t-test}).}

686 — 2006.12179

\caption{Example of a molecular graph and its cluster assignment for obtaining junction trees. Cluster colors refer to \textcolor{orange}{$\blacksquare$} singletons, \textcolor{blue}{$\blacksquare$} bonds and \textcolor{green}{$\blacksquare$} rings.}

687 — 2006.12249

\caption{Ratio $\kappa(\alpha)$ of transition rates from minima of the potential (\ref{eq:potential}) to the barrier top as a function of the stability index $\alpha$. Points represent results of computer simulations, while lines show the scalings given by Eq.~(\ref{eq:transitionRate}) (green solid line) and Eq.~(\ref{eq:ratioReturn}) (orange dashed line). Black dots (\textcolor{black}{$\bullet$}) represent unrestricted motion, while red squares (\textcolor{red}{$\blacksquare$}) correspond to motion restricted by reflecting boundaries placed at the minima of the potential. Simulation parameters: $h_1=8$, $h_2=12$, $L_1=1$, $L_2=1$, $\gamma=1$ and $\sigma=0.2$.}

688 — 2006.12394

\caption{For the stochastic oscillator \cref{eq:43} with $m=2$ and $\sigma_\varepsilon^2 = 10^{-3}$, (a) performance of US (\solidbrown), US-LW\textsubscript{raw} (\solidcyan), and US-LW with $n_\textit{GMM}=1$ (\dashedred), $n_\textit{GMM}=2$ (\solidred), $n_\textit{GMM}=4$ (\dashdottedred), and $n_\textit{GMM}=6$ (\dottedred); and (b) performance of IVR (\solidbrown), IVR-IW (\solidyellow), and IVR-LW with $n_\textit{GMM}=1$ (\dashedred), $n_\textit{GMM}=2$ (\solidred), $n_\textit{GMM}=4$ (\dashdottedred), and $n_\textit{GMM}=6$ (\dottedred). The error bands indicate one half of the median absolute deviation.}

\caption{For the stochastic oscillator \cref{eq:43} with $m=10$, performance of US-LW (\uslwline), IVR-LW (\ivrlwline), and LHS of the active subspace with $k=2$ and $q=2$ (\lhsline), $k=4$ and $q=2$ (\lhslinedashed) and $k=4$ and $q=4$ (\lhslinedotted) for (a) $\sigma_\varepsilon^2 = 0$ and (b) $\sigma_\varepsilon^2 = 10^{-3}$. The error bands indicate one half of the median absolute deviation. }

\caption{Progression of the sequential sampling algorithm at iteration 20 (left panel), 40 (center panel), and 80 (right panel): (\textit{a}--\textit{c}) danger map computed on test data, with the $z$-axis ranging from 0 to 0.14 (as opposed to 1.1 in \cref{fig:2a}) to facilitate visualization of the attractor core where the algorithm operates; and (\textit{d}--\textit{f}) for the same test data, time series for the observable (\blackline), the indicator $\mu$ trained with the sequential algorithm (\orangeline), and the indicator $\mu$ trained with LHS points (\thinblueline).}

689 — 2006.12411

\caption{Component functions of each feature in the generalized additive model. Shown are the model estimates (solid black line) along with the 95\% confidence interval (dashed lines). {\color{red} TODO: make graphs more readable. pick which ones to show: only significant ones}}

690 — 2006.12429

\caption{Numerical validation of the behavior of two spins with the coupling constant $ J = 2 $ and the local quenched disorder fields $ b_1 = 2 $, $ b_2 = 1 $. The colored lines show the energy of the different internal states of the system $ H_i $. It can be seen that for single-spin-flip dynamics only three states of the system can be reached (\textcolor{lachs}{\linie}, \textcolor{rot}{\linie}, \textcolor{blau}{\linie}), whereas one state is impossible to reach (\textcolor{gruen}{\linie}). The related threshold values $ \beta = -3 $ and $ \alpha = 0 $ are given by the change of the energetically favorable states, indicated by the corresponding intersection points.}

\caption{For two coupled spins ($ N = 2 $), the system shows the bifurcation scenarios expected from piecewise-smooth square-root maps, illustrated by the different colored boxes: immediate jump to robust chaos with a positive largest Lyapunov exponent $\lambda_\text{max}$ (\textcolor{rot}{\linie}), period-adding with chaotic windows (\textcolor{blau}{\linie}) and overlapping period-adding cascade (\textcolor{gruen}{\linie}). The local disorder fields are $ b_1 = 1 $ and $ b_2 = -1 $.}

\caption{Comparison of the chaotic attractor (\textcolor{grau}{\linie}), the Poincar\'{e} section (\textcolor{blau}{\linie}) and the magnetization (\textcolor{rot}{\linie}). (a): The continuous system in its thermodynamic limit. (b): The piecewise-smooth system with $N/2 = 10\,000$ spins. For both systems we chose $C=4.0$ and one specific disorder realization $ b_i $.}

\caption{Calculation of the mean of the box counting and the Kaplan-Yorke dimension over $500$ disorder realizations for the piecewise-smooth system (squares and crosses). The dashed horizontal lines illustrate the corresponding fractal dimensions for the system in its thermodynamic limit. (a): The red and the blue curves are determined for $ C = 4.0 $ (\textcolor{rot}{\linie},\textcolor{blau}{\linie}). (b): The green and violet curves are determined for $ C = 2.9 $ (\textcolor{gruen}{\linie},\textcolor{lila}{\linie}).}

\caption{(a): For $ C = 4.0 $ (\textcolor{rot}{\linie},\textcolor{blau}{\linie}) and for $ C = 2.9 $ (\textcolor{gruen}{\linie},\textcolor{lila}{\linie}) we determined the \ac{SAP} of the box counting and the Kaplan-Yorke dimension over $500$ disorder realizations for the piecewise-smooth system. The fractal dimensions show self-averaging behavior for $ C = 4.0 $. For $ C = 2.9 $ the box counting and the Kaplan-Yorke dimension do not show self-averaging properties. (b): For $ C = 2.9 $ the number of realizations with $ D_\text{KY} = 0 $ converges to a constant value near $100$. This indicates the non-self-averaging property of the Kaplan-Yorke dimension.}

\caption{Histograms of the empirical \ac{PDF} for $ N = 10\,000 $, $ 15\,000 $ and $ 20\,000 $ for $ C = 4.0 $ (\textcolor{blau}{\linie}), $ C = 2.9 $ (\textcolor{gruen}{\linie}) and for \ac{iid} input (\textcolor{black}{\linie}). In contrast to the magnetization for \ac{iid} input, in the system with feedback for both values of $ C $ the \ac{PDF} does not become narrower for increasing $ N $, which indicates the non self-averaging property of the magnetization.}

691 — 2006.12483

\caption{Validation (\full) and test (\dashed) loss in the FCN prediction at (from left to right, top to bottom): $y^+=15$, 30, 50 and 100. Orange represents the models trained with the full dataset and random initialization, grey the models trained with the full dataset and initialized with previously-trained networks, pink and brown represent models initialized with the parameters from the $Re_{\tau}=180$ network, trained with 50\% and 25\% of the original dataset, respectively.}

692 — 2006.12486

\caption{\textsc{LMConv} with {\color{ForestGreen} mask conditioning}}

693 — 2006.12588

\caption{Example of a projection of a state space trajectory of the harmonic oscillator coupled to $N=3$ spins. There are four smooth regions $S_i$ with piecewise constant magnetization (\textcolor{rot}{\linie}) corresponding to four different spin configurations ($\uparrow\downarrow\uparrow$). At each of the three boundaries a jump discontinuity appears in the acceleration $\ddot{q}$. The values of the local disorder are $b_1 = -1.0$, $b_2 = 1.5$ and $ b_3 = -3.0 $.}

\caption{Bifurcation diagram for the totally symmetric system with $ N = 1$ and $ b_1 = 0 $. The system shows the bifurcation scenarios expected from piecewise-smooth square-root maps, illustrated by the different colored boxes (from left to right): immediate jump to robust chaos (\textcolor{rot}{\linie}), period-adding with chaos (\textcolor{blau}{\linie}) and overlapping period-adding cascade (\textcolor{gruen}{\linie}).}

\caption{Comparison of the chaotic attractor (\textcolor{grau}{\linie}), Poincar\'{e} section (\textcolor{blau}{\linie}) and magnetization (\textcolor{rot}{\linie}). (a): The system in its thermodynamic limit $ N \to \infty $. (b): The piecewise-smooth system with $N=20\,000$ spins and one specific disorder realization $ \{b_i\} $. Both systems are evaluated for fixed coupling constant $C=3.5$ and with the randomness set to $ R = 1.7 $.}

\caption{(a): For $ C = 2.9 $ and $ N = \num{20000} $ there exist two different typical attractors of the system, illustrated by the red (\textcolor{rot}{\linie}) and blue (\textcolor{blau}{\linie}) dots. This explains the non-self-averaging behavior of the magnetization. (b): The non-self-averaging behavior of the magnetization is also reflected in the corresponding histogram by the two main bars at $ M = \pm 0.2 $.}

694 — 2006.12890

\caption{A self-generated label of a training image during training. \textcolor{yellow}{Yellow}: scribble for background, \textcolor{blue}{Blue}: scribble for cell, \textcolor{red}{Red}: the pixels below the consistency threshold $\uptau$; these pixels are ignored when calculating the unscribbled loss. White and black: the cell or background pixels above $\uptau$. The scores show IoU compared with the ground truth.}

695 — 2006.13084

\caption{Detailed architecture overview of \netname. Each object represents either a direct network prediction ({\setlength{\fboxsep}{0.5pt}\colorbox{c_yellow}{yellow}}, e.g. Bounding Box predictions and classification parameters), a calculated item ({\setlength{\fboxsep}{1pt}\colorbox{c_green}{green}}, e.g. scaling, translation and rotation matrix $\mathcal S, \mathcal T$ and $ \mathcal R$) or a constant ({\setlength{\fboxsep}{0.5pt}\colorbox{c_purple}{purple}}, e.g. the projection matrix $\mathcal P$ or the camera extrinsics). \textbf{Left:} Different types of explicit network predictions like 2D Bounding Boxes, Regression and Classification parameters. A description of these predictions can be found in \autoref{sub:prediction}. \textbf{Middle:} Graph of the calculation steps performed in the 3D Box Generator. The 3D Box Generator calculates a scaling, translation and rotation matrix $\mathcal S, \mathcal T$ and $ \mathcal R$ for each object. Using this set of transformations, the final 3D bounding box is calculated and back-projected into the image plane. See \autoref{sub:3dboxcalc} for more details. \textbf{Right:} The losses that are used in \netname. Different types of losses are applied both explicitly on network predictions and implicitly on calculated objects, as described in \autoref{sub:losses}. Best viewed in color. \label{fig:netoverview}}

696 — 2006.13133

\caption{\textcolor{r1}{Vehicle-level conflict separation}}

\caption{\textcolor{r1}{List of Abbreviations}}

697 — 2006.13164

\caption{\textbf{Proposed Network.} (a) Appearance and motion feature extractor: we extract the motion feature $f_m$ using LSTM and MLP shown in {\color{red} \textbf{red}} from tracklets $T_{t-1}$ and anchors $A_t$ respectively. A shared DarkNet53 shown in {\color{blue} \textbf{blue}} is used to extract the appearance feature $f_a$ from $T_{t-1}$ and $A_t$ as well. (b) Graph Neural Network: we design a GNN module shown in {\color{antiquefuchsia} \textbf{purple}} to model the object-object interactions through feature aggregation. The detection head uses the node feature from anchors to predict bounding boxes, while the data association head learns to regress the similarity matrix based on the edge features.}

698 — 2006.13419

\caption{ \protect\subref{fig:vel_dissip_input} Parameterized jet centerline velocity (\protect\blueline) and dissipation (\protect\redline), used as an input to the 1D ODE model. \protect\subref{fig:conc_cent_output} Scaled centerline number concentration, $n_2$ ; $d = 18.5\;\mu \mbox{m}$ (\protect\magline), $5\times n_7$ ; $d = 76\;\mu \mbox{m}$ (\protect\blueline), $10\times n_9$ ; $d = 134\;\mu \mbox{m}$ (\protect\blackline) and $10\times n_{12}$ ; $d =313\;\mu \mbox{m}$ (\protect\redline) as a function of downstream distance. The initial conditions for LES are determined by the concentration values at $z=10 D_J$ depicted by the dashed line (\protect\blackdline).}

\caption{(a) Comparison of centerline droplet size distribution from experimental data (\protect\bsq, right axis) at $z/D_J=666$ with extended LES results (also right axis, \protect\reddline). The latter is obtained by solving Eq. (\ref{eqn:centerline_conc}) using the LES data at $z/D_J=333$ (left axis) as the initial condition (these LES data at $z/D_J=333$ are shown by the top \protect\reddline line). Error bars display the r.m.s. at $z/D_J=333$ due to turbulence. The 1D ODE model applied from $z/D_J=2$ to 333 is depicted by (\protect\kcircle, left axis). (b) Comparison of droplet size distribution from SIM 2 (\protect\blueddline) with 1D ODE model (\protect\kcircle) and SIM 1 size distribution (\protect\reddline) at $z/D_J = 333$.}

\caption{The top panel depicts the radial distribution of the averaged Sauter mean diameter, $D_{32}$, normalized by its centerline value, while the bottom panel depicts the normalized standard deviation at $z/D_{J}= 135$ (\protect\redline), $z/D_{J}= 168$ (\protect\greendline), $z/D_{J}= 211$ (\protect\blueddline), $z/D_{J}= 243$ (\protect\magdline) for (a) SIM 1 and (b) SIM 2.}

\caption{The top panel depicts the radial distribution of the averaged total surface area, $\widetilde{A}$, normalized by its centerline value, while the bottom panel depicts the normalized standard deviation at $z/D_{J}= 135$ (\protect\redline), $z/D_{J}= 168$ (\protect\greendline), $z/D_{J}= 211$ (\protect\blueddline), $z/D_{J}= 243$ (\protect\magdline) for (a) SIM 1 and (b) SIM 2.}

\caption{ Evolution of the inverse breakup time scale with downstream distance for (a) SIM 1 and (b) SIM 2. The lines are $d = 14\;\mu \mbox{m}$ (\protect\redline), $d = 100\;\mu \mbox{m}$ (\protect\greendline), $d = 550\;\mu \mbox{m}$ (\protect\blueddline), $d = 2261\;\mu \mbox{m}$ (\protect\magdline) and $d = 3000\;\mu \mbox{m}$ (\protect\blackldline). }

\caption{The top panel depicts the radial distribution of the averaged inverse breakup timescale, $\tilde{t}_i = \widetilde{S}_{b,i}/\tilde{n_i}$, normalized by its centerline value, while the bottom panel depicts the normalized standard deviation for (a) SIM 1 and (b) SIM 2. The lines are $d = 14\;\mu \mbox{m}$ (\protect\redline), $d = 100\;\mu \mbox{m}$ (\protect\greendline), $d = 550\;\mu \mbox{m}$ (\protect\blueddline) and $d = 3000\;\mu \mbox{m}$ (\protect\magdline) at $z/D_{J}= 70$.}

699 — 2006.13461

\caption{Examples of lazy learning and a simple method to alleviate it \textit{(best viewed in color)}. The leftmost column shows two unlabeled images in the reference set, and the right columns show the segmentation results when the full reference set has been used for self-learning and when half of the reference set has been used. Segmentation accuracy is significantly improved when the reference set does not contain the test case. The \textcolor{red}{red}, \textcolor{green}{green}, and \textcolor{yellow}{yellow} masks indicate the true label, prediction, and overlapping region, respectively.}

\caption{Visualization of the improvement on the reference and test sets (best viewed in color). For each case, the second through last rows show the results produced by the self-learning baseline, STSO, and ATSO, respectively. In each pair, the left and right sides of the arrow are the outputs of an intermediate generation and the final generation. We show typical 2D slices that reflect the difference, while the DSC numbers in the bottom-right corner are computed over the entire 3D volume. The \textcolor{red}{red}, \textcolor{green}{green}, and \textcolor{yellow}{yellow} masks indicate the true label, prediction, and overlapping region, respectively. Please also zoom in to see the white dashed circles that mark the regions with significant accuracy gain.}

700 — 2006.13463

\caption{The RL-based framework for active learning on GNNs. \textcolor{blue}{Blue} and \textcolor{orange}{orange} nodes represent unlabeled and labeled nodes in the training set respectively. For simplicity, we omit the validation nodes and test nodes. In the policy network $ \pi $, each column represents a layer of GNN, and the graphs in each column correspond to the feature aggregation on different nodes.}

701 — 2006.13465

\caption{ Radio spectra of V404 Cygni in the quiescent (phase 1; solid line) and outburst phases (phases 2 and 3; dotted and dashed lines, respectively) at 1.4\,GHz. Each color indicates the radio spectrum obtained in a different phase. The filled green triangle shows an upper limit for the 1.4\,GHz radio flux density in phase 1. Additionally, the filled \textcolor{black}{red} diamond shows the 0.34\,GHz radio flux density, which was obtained more than 10 hours later than the other radio flux densities in phase 2.}

702 — 2006.13572

\caption{Schematic representation of a piezo-stepper actuator showing the clamp (`C') and shear (`S') elements of the first (\protect\redlinet, \protect\orangeline) and second (\protect\blueline, \protect\purpleline) group.}

\caption{The waveforms for clamps 1 (\protect\yelline) and 2 (\protect\purline) contain regions where both clamps could be in contact with the mover, indicated in gray. In these regions the inputs for the shear elements 1 (\protect\redline) and 2 (\protect\blueline) have equal derivatives.}

\caption{Disturbances for a piezo-stepper during open-loop walking with drive frequencies \SI{20}{\hertz} (\protect\blueline), \SI{25}{\hertz} (\protect\redline), \SI{30}{\hertz} (\protect\yelline) and \SI{40}{\hertz} (\protect\purline). In the temporal domain (a) the sampling is equidistant (see zoom plot) but the disturbance is not repeating for different drive frequencies. In the $\alpha$-domain (b) the sampling is non-equidistant for varying drive frequencies, but the disturbances are similar.}

\caption{The position of the mover with standard shear waveforms (\protect\blueline) and $f_\alpha = 30 \, \si{\hertz}$ deviates from the reference (\protect\blackdash). The enhanced shear waveforms of iteration 17 compensate the disturbance such that the position (\protect\redline) approximates the reference. }

\caption{Comparison between the sampled error signal (\protect\blackdash) and fits using 30 inverse quadratic radial basis functions and 1000 (\protect\blueline) or 50 (\protect\redline) samples.}

\caption{Waveform enhancement using a learned disturbance-compensating input signal. Regions where both groups could be in contact with the mover are indicated in gray. A compensating input signal (\protect\blackline) is divided into two inputs for the shear elements 1 (\protect\reddash) and 2 (\protect\bluedash). These inputs are added to the standard waveforms, resulting in enhanced waveforms for shear elements 1 (\protect\redline) and 2 (\protect\blueline).}

\caption{Error signal during a step for iterations 0 (\SI{30}{\hertz}, \protect\bluedashdot), 3 (\SI{30}{\hertz}, \protect\blueline), 7 (\SI{35}{\hertz}, \protect\redline), 11 (\SI{25}{\hertz}, \protect\yelline) and 15 (\SI{28}{\hertz}, \protect\purline). Between iterations 0 and 15 the RMS value of the error is reduced by a factor of 35.}

\caption{Convergence of the RMS value of the error during an open-loop walking experiment with $W_e = 1$, $W_u = 0$ and $W_{\Delta u} = 4.7\times 10^{-17}$. Subsequent drive frequencies: \SI{30}{\hertz} (\protect\bluedot), \SI{35}{\hertz} (\protect\reddot), \SI{25}{\hertz} (\protect\yeldot), \SI{28}{\hertz} (\protect\purdot), \SI{22}{\hertz} (\protect\gredot), \SI{20}{\hertz} (\protect\bluedott).}

703 — 2006.13607

\caption{Three paths, two (\textcolor{cyan}{\rule{0.7em}{.7em}--\rule{0.7em}{.7em}} and \textcolor{green}{$\blacktriangle-\blacktriangle$}) of which connect a net (two pins in small red squares \textcolor{red}{\rule{0.5em}{.5em}}) while one ($\blacklozenge-\blacklozenge$) does not.}

704 — 2006.13707

\caption{\footnotesize WBCC predictions issued by a standard RNN (\rxdash) for one patient over time, along with the BJ-based confidence intervals (dashed limits \rxdashed). An antibiotic is safely administered~(red~circle) on a given day if the upper confidence limit crosses the Leucopenic range, and the lower limit is above the Leucopenic range.}

705 — 2006.13856

\caption{Horizontal flow \\ \colorbar{-50}{50}}

\caption{Vertical flow \\ \colorbar{-50}{50}}

\caption{Uncertainty \#1\\ \colorbar{0}{1.5}}

\caption{Uncertainty \#2\\ \colorbar{0}{15}}

\caption{Fusion network architecture for inferring the optical flow (from FlowNet~2.0), showing the two different approaches for applying Monte Carlo dropout for quantifying the variability/uncertainty in the output. The network layers that these dropout options affect are shown with dashed black lines. The fusion network is only the final part of FlowNet~2.0 as seen in Fig.~2 in \citep{Ilg+Mayer+Saikia:2016}. The input to the fusion network has a size of $11 \times N_1 \times N_2$, where $11$ is the number of channels and $N_1 \times N_2$ is the native input image resolution. The 11 channels in the input of the fusion network contain one channel of both large and small displacement flow magnitudes, two channels of both large and small displacement flows, one channel of both large and small displacement brightness errors and three channels of original input frame~1. The numbers inside rectangular blocks represent the number of channels and pixel dimensions of intermediate results within the fusion network. Green (\protect\tikz[anchor=base, baseline]\protect\draw[->, draw=mycolor1, line width=2, yshift=2pt] (0,0) -- (0.7,0);) and red (\protect\tikz[anchor=base, baseline]\protect\draw[->, draw=mycolor2, line width=2, yshift=2pt] (0,0) -- (0.7,0);) arrows represent convolution and de-convolution layers in the network. Purple arrows (\protect\tikz[anchor=base, baseline]\protect\draw[->, draw=mycolor3, line width=2, yshift=2pt] (0,0) -- (0.7,0);) and circles represent channel-wise concatenation. The output has two channels in the native input image resolution, representing the horizontal and vertical optical flows. \cref{fig:flow} shows an example pair of input frames, example horizontal and vertical flow outputs and example uncertainty estimates for both the Monte Carlo dropout options.}

706 — 2006.13877

\caption{Results of 5-fold cross validation of different transfer learning strategies under different non-COVID19 lung lesion pre-trained models. The best results are shown in \textcolor{red}{\textbf{red}} font.}

\caption{Results of Average Sensitivity, F1-score and Accuracy of different transfer learning strategies under different non-COVID19 lung lesion pre-trained models. The best results are shown in \textcolor{red}{\textbf{red}} font.}

\caption{Results of 5-fold cross validation of different transfer learning strategies based on Multi-lesion pre-trained model. \textbf{Time} represents the training time for an epoch. The best results are shown in \textcolor{red}{\textbf{red}} font.}

707 — 2006.14007

\caption{t-SNE~\cite{MaatenHinton08a} visualizations of Cantonese phone embeddings from \textbf{unseen} models supervised with distinctive features (left) and phones (right). \textcolor[rgb]{0,0.5,0.7}{Blue} phones appear in other languages; \textcolor[rgb]{0.87,0.56,0.02}{orange} phones are unique to Cantonese.}

708 — 2006.14032

\caption{NetDissect \cite{bau2017network} assigns unit 106 the label {\color{darkred} \texttt{bullring}}, but in reality it detects general sports fields, except football fields, as revealed by the {\color{darkblue} \textbf{length 3}} and {\color{darkgreen} \textbf{length 10}} explanations.}

\caption{Image classification explanations categorized by {\color{darkgreen} \textbf{semantically coherent}} abstraction (a--b) and specialization (c), and {\color{darkred} \textbf{unrelated}} polysemanticity (d). For clarity, logical forms are length $N = 3$.}

\caption{``copy-paste'' adversarial examples for vision. For each {\textcolor{darkred}{\textbf{scene}}}, the units that contribute most (by \textcolor{darkblue}{\textbf{connection weight}}) are shown, along with their explanations. We target the \textbf{bold} explanations to crudely modify an input image and change the prediction towards/away from the scene.}

\caption{``copy-paste'' adversarial examples for NLI. Taking an example from SNLI, we construct an {\color{darkpurple} \textbf{adversarial (adv)}} premise or hypothesis which changes the true label and results in an \emph{incorrect} model prediction (original label/prediction {\color{darkpurple} $\advarrow$} adversarial label/prediction).}

709 — 2006.14158

\caption{\color{Gray} Comparison of estimated lag time and pre- and post-intervention daily growth rates in different German states. Their equivalent formulation as doubling times can be found in the Supplementary Material. Note that the pre-response growth rate in Baden-W\"{u}rttemberg is influenced by a strong weekend effect. If the corresponding data points from the 22\textsuperscript{nd} and 23\textsuperscript{rd} of March are omitted from the fitting process, then the pre-response growth rate is 0.196 (0.179 - 0.213).\newline Similarly, a weekend effect is observed in North Rhine-Westphalia on the 21\textsuperscript{st} and 22\textsuperscript{nd} of March. If these data points are omitted from the fit, then the observed post-response growth rate is 0.115 (0.106 - 0.124). This yields a relative reduction in the post-intervention growth rate of 44\%.}

\caption{\color{Gray}Modelled and observed cases in Baden-W{\"u}rttemberg. Note that the strong weekend effect 5 and 6 days after school closure leads to artificially deflated values. As a result, it was necessary to fit the GP model to 6 days after closure.}

\caption{\color{Gray}Modelled and observed cases in Hesse.}

\caption{\color{Gray}Modelled and observed cases in Lower Saxony.}

\caption{\color{Gray}Modelled and observed cases in Rhineland-Palatinate.}

\caption{\color{Gray}Modelled and observed cases in Sweden.}

\caption{\color{Gray} Confirmed cases of COVID-19 in staff (red) and students (blue) in schools, kindergartens, holiday camps, and other educational venues or institutions for under-18s. The exact age distribution of those tested is not known. Left shows daily new confirmed cases, and right shows the instantaneous growth rate (shaded regions are 95\% confidence intervals). Solid vertical lines indicate when students returned to school, and dashed lines indicate other loosened measures. In April and early May with small numbers of primary school or exam students returning, there was no notable difference between the incidence among students and staff. Accounting for the detection delay, the incidence among students was higher than that of staff following the return of more (and older) students on May 18\textsuperscript{th}.}

\caption{\color{Gray} Reported daily hospital admissions in Germany, excluding those working in education, front-line healthcare workers, carers, catering, and hospitality. These numbers indicate transmission in a general, average-exposure population. Left shows daily admissions, and right shows the instantaneous growth rate (shaded regions are 95\% confidence intervals). The large confidence intervals on the instantaneous growth rates do not allow one to conclude if, following the reopening of schools, the growth rate has continued to be negative, or whether it is approximately zero. This suggests that the return of younger (and exam) students did not significantly impact the general hospitalised population.}

\caption{\color{Gray} Reported daily hospitalisations in Denmark. New admissions are reported left, and right shows the instantaneous growth rate (shaded regions are 95\% confidence intervals). Given the small numbers of children with COVID-19 admitted to hospital for treatment, the return to schools will be seen in the following generations, implying longer delays until an effect might be observed. Note that large confidence intervals on the growth rate are a result of the small number of hospitalisations at the end of May. Solid vertical lines indicate when students returned to school, and dashed lines indicate other loosened measures. }

\caption{\color{Gray} Reported daily confirmed cases in Norway. New cases are reported left, and right shows the instantaneous growth rate (shaded regions are 95\% confidence intervals). Solid vertical lines indicate when students returned to school, and dashed lines indicate other loosened measures. These numbers are obtained over a time period of increased testing, so there is little reason to believe the case numbers to be higher than reported. Norway has a high testing rate per capita. }

\caption{\color{Gray} The epidemic model used to simulate cases. The model uses multiple exposed compartments to account for an Erlang-distributed incubation period.}

\caption{\color{Gray} Daily testing from a subset of German testing laboratories during March. Weekends are highlighted in grey. There is a periodic drop in testing occurring on weekends, particularly evident on Sundays. These drops do not coincide with any changes to the positive test ratio. }

\caption{\color{Gray} Weekly testing in Germany remained consistent from March 18\textsuperscript{th}; however, the weekend effect (see Figure \ref{DE_daily_tests}) was likely present across the entire period. There were no abrupt changes in the positive test ratio.}

\caption{\color{Gray} Weekly testing in Denmark was not consistent across the period of this investigation, and so confirmed cases up to April 20\textsuperscript{th} cannot be relied upon to represent the underlying epidemic.}

\caption{\color{Gray} Norway saw inconsistent testing during March, making confirmed cases an inappropriate metric for assessing school closures. More consistent testing was apparent in April and May.}

\caption{\color{Gray} Reported weekly tests carried out in Sweden.}

\caption{\color{Gray} %A comparison of changes in estimated epidemic growth rate near the onset of school closures. A change in growth rate has been considered as a response to school closures if (a) it occurs more than 5 days from the intervention date and (b) the effect persists for at least 5 days.
Comparison of estimated lag time and pre- and post-intervention doubling times in different German states. Note that the pre-response doubling time in Baden-W\"{u}rttemberg is influenced by a strong weekend effect. If the corresponding data points from the 22\textsuperscript{nd} and 23\textsuperscript{rd} of March are omitted from the fitting process, then the pre-response doubling time is 2.9 (2.7--3.3) days.\newline Similarly, a weekend effect is observed in North Rhine-Westphalia on the 21\textsuperscript{st} and 22\textsuperscript{nd} of March. If these data points are omitted from the fit, then the observed post-response doubling time is 6.0 (5.6--6.5) days.}

\caption{\color{Gray} Cumulative cases for Baden-W\"{u}rttemberg and North Rhine-Westphalia when corrected for the three-day shift in school closure between the two. The effective day of school closure in both states is shown in red, with the timings of other interventions which took place in North Rhine-Westphalia included for reference.\newline There is very good agreement between the two data streams despite the time difference in the school closure, suggesting comparable underlying transmission in the two states following school closure. Additionally, when considering the weekend effect occurring in Baden-W\"{u}rttemberg, the lag times are comparable between the two states. }

\caption{\color{Gray}Modelled and observed cases in Bavaria.}

\caption{\color{Gray}Modelled and observed cases in Berlin.}

\caption{\color{Gray}Modelled and observed cases in North Rhine-Westphalia.}

\caption{\color{Gray}Modelled and observed cumulative hospitalisations in Denmark. The GP model has been fitted 5 days after school closures, but is unable to adequately predict the trajectory of cases. As such, it is not possible to estimate any lag period for the response.}

\caption{\color{Gray}Modelled and observed incidence of hospitalisations in Denmark. The GP model incorrectly predicts the exponential growth seen in the period prior to closures to continue, making it difficult to identify a response to school closures.}

\caption{\color{Gray}Modelled and observed cumulative hospitalisations in Norway. The model is able to reasonably predict the trend in cases for around 10 days after school closures; however, the confidence in this prediction is very low.}

\caption{\color{Gray}Modelled and observed hospitalisations in Norway. The GP model is less effective when dealing with the incidence data, failing to account for any points after the assumed lag period of 5 days. As such, it is impossible to estimate post-response growth rates in Norway.}

\caption{\color{Gray} Reported daily confirmed cases in Denmark. New confirmed cases are reported left, and right shows the instantaneous growth rate (shaded regions are 95\% confidence intervals). Solid vertical lines indicate when students returned to school, and dashed lines indicate other loosened measures. We present these numbers in support of the observations made for daily hospital admissions due to the larger numbers of cases recorded here. These results are not qualitatively different from those obtained from hospitalisation data, but support the conclusions which are harder to draw from that data set due to the longer delay from infection to hospitalisation. }

\caption{\color{Gray} A comparison of tests carried out among staff working in different stages of the Danish educational and childcare sector dated June 2\textsuperscript{nd}. We indicate the proportion of tested staff relative to estimated employee numbers in each group, and the percentage of those tested who test positive. For reference, the absolute numbers of tests are also shown.}

710 — 2006.14462

\caption{Design of a phononic topological insulator using smooth topological indicators. (a) Material design made by repeating a 3-material unit cell. (b) Dispersion relation for the system in a. The red region denotes the gap selected for optimization. (c) Multiband Berry phase as a function of the unit cell parameters $E_b$ and $E_c$. The red line shows a gradient-descent optimization trajectory starting from a symmetric, trivial phase and ending in a symmetric, topological configuration. (d) Finite Element Method simulation (COMSOL\textregistered{} Multiphysics) of a localized mode at the boundary between a trivial and a topological material \cite{SupMat}, corresponding respectively to the start point and end point in c. The domain wall is placed at $x=0$. The inset shows the localized mode at the two unit cells adjacent to the domain wall. The moduli $E_A$, $E_B$ and $E_C$ corresponding to the trivial phase have been scaled by a proportionality constant to ensure overlapping band gaps. Except when otherwise indicated, the system parameters are $\rho=2704\,\mathrm{kg/m^3}$, $E_A=2\,\mathrm{GPa}$, $E_B = E_C = 6.5\,\mathrm{GPa}$, $W=10\,\mathrm{mm}$ and $W_A=0.6759\,W$.}

711 — 2006.14796

\caption{Sample rollouts with proxy human \protect\tikz\protect\draw[blue,fill=blue] (0,0) circle (.5ex);, immovable blocks \protect\tikz\protect\draw[gray,fill=gray] (0,0) rectangle (1ex, 1ex);, and goal at the star. Each frame consists of an action taken by the agent. Left column shows two cases of goal inference: ideal case (top) and failure mode with misspecified case (bottom) where the goal is blocked.}

712 — 2006.14856

\caption{Fooling ratio ($\%$) is shown on the vertical axis for the perturbations crafted on an ordinary classifier (source in \textcolor{red}{red}) and transferred to a retrained ordinary target model (black) and an orthogonal target model (\textcolor{green}{green}). The models shown are trained on the CIFAR-10 dataset. The horizontal axes show the step size ($\epsilon$) for perturbation computation, varied from 0 to 8$\%$ of the image dynamic range.}

\caption{Fooling ratio ($\%$) is shown on the vertical axis for the perturbations crafted on an ordinary classifier (source in \textcolor{red}{red}) and transferred to an ordinary VGG-16 model (black) and an orthogonal VGG-16 model (\textcolor{green}{green}). The horizontal axis shows the step size ($\epsilon$), which is varied from 0 to 8$\%$ of the image dynamic range. Each row reports the results of perturbations crafted on a pretrained ImageNet model, as shown in blue font on the left-hand side. }

713 — 2006.14911

\caption{ Uncertainty estimators as indicators of catastrophes on \texttt{CARNOVEL}. We collect 50 scenes for each model that led to a crash, record the uncertainty 4 seconds~\citep{taoka1989brake} before the accident and assess whether the uncertainties can be used for detection. RIP's (ours) predictive variance (\textcolor{matlab-blue}{in blue}, cf. Eqn.~(\ref{eq:var})) serves as a useful detector, while DIM's~\citep{rhinehart2020deep} negative log-likelihood (\textcolor{matlab-orange}{in orange}) cannot be used for detecting catastrophes. }

\caption{ We evaluate different autonomous driving prediction methods in terms of their robustness to distribution shifts, in the nuScenes ICRA 2020 challenge~\citep{phan2019covernet}. We use the provided train--val--test splits and report performance on the test (i.e., out-of-sample) scenarios. A ``$\clubsuit$'' indicates methods that use LIDAR observation, as in~\citep{rhinehart2019precog}, and a ``$\diamondsuit$'' methods that use bird-view privileged information, as in~\citep{phan2019covernet}. A ``$\bigstar$'' indicates that we used the results from the original paper, otherwise we used our implementation. \textcolor{black!50}{Standard errors} are in gray (via bootstrap sampling). The \textbf{outperforming} method is in bold. }

\caption{ We evaluate different autonomous driving methods in terms of their robustness to distribution shifts, in our new benchmark, \texttt{CARNOVEL}. All methods are trained on CARLA \texttt{Town01} using imitation learning on expert demonstrations from the autopilot~\citep{dosovitskiy2017carla}. A ``$\dagger$'' indicates methods that use first-person camera view, as in~\citep{chen2019learning}, a ``$\clubsuit$'' methods that use LIDAR observation, as in~\citep{rhinehart2020deep} and a ``$\diamondsuit$'' methods that use the ground truth game engine state, as in~\citep{chen2019learning}. A ``$\bigstar$'' indicates that we used the reference implementation from the original paper, otherwise we used our implementation. For all the scenes we chose pairs of start-destination locations and ran $10$ trials with randomised initial simulator state for each pair. \textcolor{black!50}{Standard errors} are in gray (via bootstrap sampling). The \textbf{outperforming} method is in bold. The complete \texttt{CARNOVEL} benchmark results are in Appendix~\ref{app:experimental-results-on-carnovel}. }

\caption{{\color{matlab-blue}Adaptive} Robust Imitative Planning}

\caption{ We evaluate different autonomous driving methods in terms of their robustness to distribution shifts, in our new benchmark, \texttt{CARNOVEL}. All methods are trained on CARLA \texttt{Town01} using imitation learning on expert demonstrations from the autopilot~\citep{dosovitskiy2017carla}. A ``$\dagger$'' indicates methods that use first-person camera view, as in~\citep{chen2019learning}, a ``$\clubsuit$'' methods that use LIDAR observation, as in~\citep{rhinehart2020deep} and a ``$\diamondsuit$'' methods that use the ground truth game engine state, as in~\citep{chen2019learning}. A ``$\bigstar$'' indicates that we used the reference implementation from the original paper, otherwise we used our implementation. For all the scenes we chose pairs of start-destination locations and ran $10$ trials with randomized initial simulator state for each pair. \textcolor{black!50}{Standard errors} are in gray (via bootstrap sampling). The \textbf{outperforming} method is in bold. }

714 — 2006.15067

\caption{Graph bipartitioning results for bespoke-QUBO and constrained-optimization formulations sampled by Mukai samplers compared to best previously known results. The results are black where equal to best results prior to this work, \textcolor{teal}{green} where Mukai results (cut size or diversity) are better, and \textcolor{red}{red} where Mukai results are worse.}

715 — 2006.15373

\caption{Time (in seconds) to process image pairs normalized by their size in megapixels. {\color{red} this is from the Middlebury and KITTI training sets} }

716 — 2006.15762

\caption{Example of a ``Crafting'' world. The agent verifies a hypothesis (provided as text) about a causal relationship. Acting according to a \textcolor{blue2}{learned policy}, the agent manipulates the observation to one that allows a learned \textcolor{orange}{predictor} to determine if the hypothesis is true. The learning of policy and predictor is aided by a pretraining phase, during which an intermediate reward signal is provided by utilizing hypotheses that factor into \{{\em pre-condition state}, {\em action sequence}, {\em post-condition state}\}. }

717 — 2006.16030

\caption{Multiwavelength light curve of \es during 2008--2020. {\it a)} and {\it b)} \gray flux and photon index computed with normal and adaptive time binning. {\it c)} and {\it d)} {\it Swift XRT} measured X-ray flux and photon index variation in time. {\it e)} {\it Swift UVOT} measured UV/optical fluxes in V, B, U, W1, M2, and W2 bands. {\it f)} The arrival time of HE photons from the direction of \es.}

\caption{Multiwavelength SED of \es for different periods. The red data correspond to the \fermi spectrum averaged over 11.7 years, and the blue bowtie shows the spectrum during the hard emission period. The {\it UVOT} data in light blue correspond to the highest flux in the U, W1, M2 and W2 filters observed around MJD $58490$. The archival data from SSDC are in gray. VHE \gray data from the {\it VERITAS} observation are shown as light blue squares.}

718 — 2006.16143

\caption{A cut through the HDNN-PES in the vicinity of the minimum energy path to chemisorption. The H atom is constrained to lie directly above a C-atom. H$_z$ and C$_z$ indicate the distance of H and C, respectively, from the plane of the graphene sheet. The physisorption (\textcolor{red}{+}) and chemisorption (+) wells have depths of 9 and 657\,meV, respectively. The barrier to chemisorption (\textbf{$\times$}) has a height of 172\,meV.}

719 — 2006.16202

\caption{Plot of the behavior of the two proposed algorithms on the Limpet dataset. \algoalt has been repeated 100 times following a multi-start strategy and in two settings (\textcolor{plotblue}{$T$=20} and \textcolor{plotorange}{$T$=100}). Each point on the orange and blue lines reports the cumulative time and best objective found during these 100 restarts. Note that the objective values are to be multiplied by $10^{-13}$.}

\caption{Plot of the behavior of the two proposed algorithms on the Facebook Comment Volume dataset. \algoalt has been repeated 100 times following a multi-start strategy and in two settings (\textcolor{plotblue}{$T$=20} and \textcolor{plotorange}{$T$=100}). Each point on the orange and blue lines reports the cumulative time and best objective found during these 100 restarts.}

\caption{Plot of the behavior of the two proposed algorithms on the Superconductivity dataset. \algoalt has been repeated 100 times following a multi-start strategy and in two settings (\textcolor{plotblue}{$T$=20} and \textcolor{plotorange}{$T$=100}). Each point on the orange and blue lines reports the cumulative time and best objective found during these 100 restarts.}

\caption{Plot of the behavior of the two proposed algorithms on the YearPredictionMSD dataset. \algoalt has been repeated 100 times following a multi-start strategy and in two settings (\textcolor{plotblue}{$T$=20} and \textcolor{plotorange}{$T$=100}). Each point on the orange and blue lines reports the cumulative time and best objective found during these 100 restarts.}

720 — 2006.16300

\caption{Additional \aastex\symbols}

721 — 2006.16362

\caption{Comparison of the BLEU score on the WMT17 English--German translation task for an encoder-decoder transformer \citep{vaswani2017attention} using collaborative vs. concatenated heads with key/query dimension $D_k$. With collaborative heads, $D_k$ can be decreased by a factor of {\color{black!70} $\times 8$} without any drop in performance. }

\caption{ Performance on MRPC, STS-B and CoLA datasets of a \protect\includegraphics[height=\myheight]{figures/plot_legend_bert_cropped}~fine-tuned BERT-base model, \protect\includegraphics[height=\myheight]{figures/plot_legend_compressed_cropped}~decomposed with collaborative heads of compressed dimension $\tilde D_k$ (\emph{horizontal axis}). \protect\includegraphics[height=\myheight]{figures/plot_legend_finetuned_cropped}~Repeating fine-tuning after compression can make the model recover the original performance when compression was drastic. The \protect\includegraphics[height=\myheight]{figures/plot_legend_baseline_cropped}~GLUE baseline gives a reference for catastrophic failure. }

722 — 2006.16535

\caption{\red{Merge and Purge}}

\caption{\red{SpecBuffer}}

723 — 2006.16616

\caption{\highlightForReview{Long-running applications hardly complete} when MTBF is too small.}

\caption{Comparison between \highlightForReview{a} CP-dedicated threads scheme and a traditional scheme.}

\caption{\highlightForReview{Overhead reduction} with differential checkpoint for a certain scenario (2400 processes write 1 GB per process to the PFS). $n_d$ corresponds to the ratio of dirty data blocks to protected data blocks.}

\caption{Overhead introduced by \texttt{OpenCHK} \highlightForReview{compared to} using native FTI/SCR/VeloC}

724 — 2006.16644

\caption{PanColorGAN model: Architecture details for its Generator and Discriminator networks are depicted. Two modes exist for the Generator network: In the training phase, \textcolor{blue}{$X_{GMS}$} is provided along with $X_{MS}$ to generate \textcolor{blue}{$\hat{Y}_G$}. In the testing phase, \textcolor{red}{$X_{PAN}$} is provided along with $X_{MS}$ to generate \textcolor{red}{$\hat{Y}_P$}. Also during the training phase, the Discriminator network gets two different types of batches. A real batch consists of a concatenated set of $X_{GMS}$, $X_{MS}$ and $Y_{MS}$. A fake batch consists of a concatenated set of $X_{GMS}$, $X_{MS}$ and $\hat{Y}_G$, as shown on the bottom right. }

725 — 2006.16676

\caption{Left: Separating isosurfaces of volumetric distributed triples ``UC all'' (behind white), ``DC spheres'' (between white and beige), ``DC pipes'' (between beige and green), and ``DC all'' (remaining volume on top). Right: The respective triple groups from the parameter set, giving rise to the separating isosurfaces, i.e.\@ ``DC all''~$\blacksquare$, ``DC pipes''~\textcolor{tube1}{$\blacksquare$}, ``DC spheres''~\textcolor{sphere1}{$\blacksquare$}, and ``UC all'' in white.}

726 — 2006.16719

\caption[]{Top: Resulting positioning error for the traditional RC (\tikz{\draw[mblue,line width=1.2pt] (0,2.5pt)--(10pt,2.5pt); \draw (0,0)}) and the spatial RC (\tikz{\draw[mred,line width=1.2pt] (0,2.5pt)--(10pt,2.5pt); \draw (0,0)}). Middle: corresponding disturbance signal. Bottom: positioning signal generating the disturbance. The individual periods are indicated with the white and gray areas.}

\caption[]{$2$-norm of the error normalized with the period length, for the traditional RC (\textcolor{mblue}{$\mathbf{\times}$}), and the spatial RC (\textcolor{mred}{$\mathbf{\times}$}) as function of the repetition number.}

727 — 2006.16806

\caption{ Overall framework of \textbf{uncertainty-aware multi-view co-training} (UMCT), best viewed in color. \textcolor{revision}{UMCT can be applied to either the semi-supervised learning (SSL) task or the unsupervised domain adaptation (UDA) task, both of which include an unlabeled and a labeled subset of data. The overall pipeline is described as follows.} The $n$ multi-view inputs of $\mathbf{X}$ are first generated through different transforms $\mathbf{T}$, like rotations and permutations, before being fed into $n$ deep networks with asymmetrical 3D kernels. A confidence score $c$ is computed for each view by uncertainty estimation and acts as the weights to compute the pseudo labels $\hat{Y}$ of other views (Eq. \ref{Eqn:pseudo-label}) after inverse transform $\mathbf{T}^{-1}$ of the predictions. The pseudo labels $\hat{Y}$ for unlabeled data and ground truth $Y$ for labeled data are used as supervisions during training. %These pseudo labels are used as supervision for unlabeled data to train these deep networks, while the ground truth $Y$ is used for labeled data.
}

\caption{ 2D visualizations for one example from the NIH pancreas segmentation dataset under the 10\% labeled data setting. \textcolor{revision}{The first row is the supervised baseline and the second row is the prediction after our 3-view co-training.} DSC scores are largely improved. Best viewed in color. }

\caption{ Ablation studies on backbone structures (3 views UMCT). ``Params'' is short for parameters and ``MACs'' is short for multiply--accumulate operations. ``10\% Sup'' means supervised training with 10\% labeled data. \textcolor{revision}{A Wilcoxon signed-rank test reveals significant improvements ($p \ll 0.01$) of our 3D ResNets over V-Net in the last column, illustrating that our asymmetrical design is beneficial for our co-training method.}}

\caption{ Experimental results for semi-supervised learning on a multi-organ dataset \textcolor{revision}{under four-fold cross-validation}. ``lab'' is short for ``labeled'' and ``unlab'' is short for ``unlabeled''. \textcolor{revision}{Supervised results (first row) use 100\% labeled training data in the training set, which is the upper bound but requires 100\% annotation. 10\% lab means we only use 10\% training data with annotation for supervised training. 10\% lab + 90\% unlab (ours) means we use 10\% labeled data and 90\% unlabeled data for our co-training method.} \textcolor{revision}{Results are reported via 4-fold cross-validation. Numbers in \textbf{bold} indicate significant improvement over supervised counterparts by Wilcoxon signed-rank tests ($p \ll 0.01$).} }

728 — 2006.16830

\caption{ A sample beacon RSSI elaborated by argmax (\textbf{a.}\ \&\ \textbf{b.}), sliding window (\textbf{c.}\ \&\ \textbf{d.}), and machine learning (\textbf{e.}\ \&\ \textbf{f.}) approaches. The left column reports the max of RSSI for the argmax and sliding window approaches (antennas located in the same room are labeled with the same color but different markers), and the maximum among the room probabilities for the machine learning approach. The right column reports the corresponding reconstructed trajectories as a sequence of rooms. Not-detected statuses are marked by green crosses ({\color{ForestGreen}$\times$}). }

\caption{ Distribution of mutual distances between trajectories. The $x$-axis is normalized w.r.t.\ the longest measured distance. The mean pairwise distance $\mu$ is reported in red while the shaded area denotes the range $\mu \pm \sigma$ ($\sigma$ being the standard deviation of the distribution). }

\caption{ \textbf{a.} Most common trajectory and, \textbf{b.}, distribution of the distances between such trajectory and all the others. The visitor performs a circular visit following the room numbering in the main floor, then they reach the Pinacoteque upstairs. \textbf{c.}\ \&\ \textbf{d.} Analogous plot for the least common trajectory in our dataset. The visitor enters the museum via the Pinacoteque, then they visit the main floor twice, once clockwise and once counterclockwise. The $x$-axis in \textbf{b.}\ \&\ \textbf{d.}\ is normalized w.r.t.\ the longest measured distance among all the trajectories.}

\caption{ \textbf{a.} Sample of measured trajectories. \textbf{b.} Distribution of the distances between the trajectory marked by ``blue plus signs'' ({\color{Cerulean}$+$}) in \textbf{a.} and all the others. Distances are reported in percentage w.r.t.\ the longest distance measured. Trajectories closer than $0.025\%$ (yellow bin) likely belong to the same group of visitors; trajectories closer than $0.15\%$ (red bins) are slightly time shifted; in trajectories closer than $0.30\%$ (purple bins) relations are still identifiable; trajectories farther away than $0.30\%$ (green ones) are completely unrelated. We report in \textbf{a.} one random trajectory from each percentile set (color area) in \textbf{b.} }

\caption{ Cumulative hazard function associated with the Weibull distribution of the whole museum. Empirical values are calculated with the Kaplan--Meier method. \textbf{a.} Without censoring, cf.\ Figure~\ref{F:wei}\textbf{d.}, and \textbf{b.} after censoring the last 5 minutes of visit (new parameters are $k_{*}=3.5$ and $\lambda_{*}=596$). This method allows us to get a better fit of the real distribution between 0 and 2h, i.e.\ the visit interval. The uncensored fit, instead, is negatively influenced by the forced exit.}