[Bug]: Searching hyphenated words or phrases continued in the next line doesn't work #700

jdujava · 2024-12-10T21:01:19Z

Is there an existing issue for this?

I have searched the existing issues

Problem

Searching hyphenated words or phrases continued in the next line doesn't work.

Steps to reproduce

See the attached file: zathura-search-hyphenated.pdf

Try to search for the suggested words/phrases. Occurrences that are hyphenated, broken, or continued onto the next line are not found.

Expected behavior

Zathura should find all occurrences. It should understand that a word ending in a hyphen/dash followed by a newline continues on the next line as one word. Similarly, for the purposes of searching, a newline at the end of a line should be equivalent to a space.
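The expected behavior can be illustrated with a small sketch. This is not Zathura or poppler code; it assumes the page text is already available as a plain string with `\n` at line ends, and the helper names `normalize_for_search` and `contains` are hypothetical.

```python
import re


def normalize_for_search(text: str) -> str:
    """Hypothetical helper: make line breaks invisible to a substring search."""
    # Join a word split by a hyphen at a line end: "hyphen-\nated" -> "hyphenated".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Treat any remaining line break (or run of whitespace) as a single space.
    return re.sub(r"\s+", " ", text).strip()


def contains(page_text: str, query: str) -> bool:
    """Search after normalizing both the page text and the query."""
    return normalize_for_search(query) in normalize_for_search(page_text)
```

One known weakness of this sketch: a genuinely hyphenated compound that happens to break at its hyphen (for example "well-" / "known") would be joined into "wellknown", so a real implementation would probably need to try both forms.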

zathura version (zathura --version)

zathura 0.5.9

girara version (zathura --version)

girara 0.4.5

zathura backend

poppler

alerque · 2024-12-11T08:13:39Z

Given the way PDFs are constructed (including lots of possible variance), it is not always possible to deduce this information. It is possible for the PDF creator to embed information that could be used for this purpose, but most PDFs are constructed in a way that doesn't make it as simple as your issue report suggests. "Newline at the end of a line" is just not a thing, nor is "hyphen followed by a newline", at least not in any universal sort of way. The shaping and positioning of each letter or batch of letters is all done in advance, and absolute positions or relative offsets on the page are recorded, but there is no concept of a "new line". The code for a subscript (that happens to be offset below the previously output characters) is going to look similar to the code to go to a new line: it is just a new x/y position. One can try to guess based on whether both the y position goes lower and the x position is reduced, but this guessing can and does also go very badly with some PDF constructions.
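A minimal sketch of the guessing described above, not taken from poppler or Zathura. It assumes each batch of letters is available as a `Run` with a start position, and that y grows downward on the page; `Run`, `looks_like_line_break`, `join_runs`, and the `min_drop` parameter are all hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical batch of letters with its start position on the page."""
    text: str
    x: float
    y: float  # assumption: y grows downward in this sketch


def looks_like_line_break(prev: Run, cur: Run, min_drop: float = 0.0) -> bool:
    """Guess a line break when y goes lower AND x is reduced."""
    moved_down = (cur.y - prev.y) > min_drop
    moved_left = cur.x < prev.x
    return moved_down and moved_left


def join_runs(runs: list[Run]) -> str:
    """Concatenate runs, inserting a newline wherever a break is guessed."""
    parts = []
    for i, run in enumerate(runs):
        if i > 0 and looks_like_line_break(runs[i - 1], run):
            parts.append("\n")
        parts.append(run.text)
    return "".join(parts)
```

A subscript moves down but continues to the right, so this check does not flag it. Layouts where the next run legitimately starts lower and further left without being a continuation of the same line of text (multiple columns, for instance) are where a guess like this would be expected to go wrong.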

jdujava · 2024-12-11T08:57:29Z

Sure, I agree, in complete generality it is probably more difficult than I made it sound.

However, with the attached PDF (and with virtually every other PDF I have tried), selecting and copying text across a line break in Zathura also includes the newline (when I paste it somewhere, the newline appears at the "correct" position).

When I paste it in Zathura search box, it looks like

but it still can't find the text I copied.

Both browser PDF viewers and, for example, Evince generally handle this correctly (though I am not saying that some special PDFs can't be weird).

jdujava added the bug Something isn't working label Dec 10, 2024
