
I used a text-from-image generator to extract the text from an image of a glossary that looks like this:

word1      :meaning1
word2      :meaning2
word3      :meaning3
word4      :meaning4

Which resulted in a text file looking like this:

word1
word2
word3
word4
:meaning1
:meaning2
:meaning3
:meaning4

I feel like this is a common occurrence when copying glossary-like text from PDF documents as well. Is there a handy way to recreate the original layout? Preferably as columns where each word stays linked to its meaning, and even better if it works without table cells.

I guess I'm looking for a way to paste/attach the content of several rows to the ends of several existing rows.

The only solution I can think of is to paste everything into LibreOffice Writer and choose a column layout, but that would only recreate the look of the source document, which is of no real use.

The question is:

How can I turn the extracted text back into two columns, so that it looks the way it did in the picture, i.e. like this:

word1      :meaning1
word2      :meaning2
word3      :meaning3
word4      :meaning4

I would prefer GUI tools, but non-advanced CLI solutions are also appreciated.

3 Answers

1

Non-advanced CLI solutions with the pr command:

$ pr -T2 < file.txt
word1                               :meaning1
word2                               :meaning2
word3                               :meaning3
word4                               :meaning4

or with the rs command:

$ rs -t 0 2 < file.txt
word1      :meaning1
word2      :meaning2
word3      :meaning3
word4      :meaning4
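
If neither pr nor rs is at hand, the same pairing can be sketched with head, tail and paste. This is only an illustration, not part of the commands above: it recreates the sample file from the question in a temporary directory, and it assumes the file holds all words first and all meanings second, with an even number of lines.

```shell
# Recreate the sample input from the question in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' word1 word2 word3 word4 \
    :meaning1 :meaning2 :meaning3 :meaning4 > "$tmpdir/file.txt"

# Number of word/meaning pairs (assumes an even line count).
n=$(( $(wc -l < "$tmpdir/file.txt") / 2 ))

# paste joins line i of the file with line i of the last n lines,
# separated by a tab; head then keeps only the first n joined lines.
result=$(tail -n "$n" "$tmpdir/file.txt" | paste "$tmpdir/file.txt" - | head -n "$n")
printf '%s\n' "$result"

rm -r "$tmpdir"
```

Because paste works on whole lines, spaces inside a word or a meaning do not split an entry, and there is no page length to take into account.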
  • The pr command had trouble with a large number of lines: it divided the output into separate "pages", which jumbled the text. rs -t 0 2 < file.txt worked once I added the -e flag, which reads every row as an entry instead of treating whitespace-separated strings as entries. Apparently my text contained whitespace that I did not show in my example.
    – Kris
    Commented Sep 30, 2023 at 7:55
  • Excellent answer. Some explanations of what the options do would make the answer even better. -s option in pr to adjust the spacing is worth mentioning too.
    – dhm
    Commented Jun 21 at 7:21
0

Use an advanced text editor or IDE such as Geany.

E.g. with Geany you can select text vertically by holding Ctrl while selecting it with your mouse.

Once the meanings are selected "vertically", cut them and paste them at the end of your first line, then press Tab as many times as you want, depending on the space you need between the columns.

0

Mid-advanced CLI solution with awk:

$ cat file
word1
word2
word3
word4
:meaning1
:meaning2
:meaning3
:meaning4

Either based on regex patterns, i.e. a leading colon (^:) or no leading colon (^[^:]), assuming that is consistent, like so:

$ awk '/^[^:]/ {
        wrd[i++] = $0
}

/^:/ {
        def[j++] = $0
}

END {
        for (k = 0; k < i; k++) {
                printf "%s\t%s\n", wrd[k], def[k]
        }
}' file
word1   :meaning1
word2   :meaning2
word3   :meaning3
word4   :meaning4

Or based on line numbers, i.e. splitting the file into a first half and a second half, like so:

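$ awk '{
        lines[NR] = $0
}

END {
        # Number of pairs; int() keeps the indexes whole if NR is odd,
        # in which case the last line is left unpaired and dropped.
        half = int(NR / 2)
        for (j = 1; j <= half; j++) {
                printf "%s\t%s\n", lines[j], lines[half + j]
        }
}' file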
word1   :meaning1
word2   :meaning2
word3   :meaning3
word4   :meaning4
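
Tab-separated output only lines up while every word is shorter than a tab stop. As a rough sketch of one way around that, the pattern-based variant can track the longest word and pad each word to that width plus two spaces. The sample file, the `max` and `fmt` variables and the two-space gap are assumptions made for this illustration.

```shell
# Sample input with words of different lengths (made up for this sketch).
tmpdir=$(mktemp -d)
printf '%s\n' word1 a-longer-word2 :meaning1 :meaning2 > "$tmpdir/file.txt"

aligned=$(awk '
/^[^:]/ {                       # no leading colon: a word
        wrd[i++] = $0
        if (length($0) > max) max = length($0)
}

/^:/ {                          # leading colon: a meaning
        def[j++] = $0
}

END {
        # Left-justify each word in a field two characters wider
        # than the longest word, then append its meaning.
        fmt = "%-" (max + 2) "s%s\n"
        for (k = 0; k < i; k++) {
                printf fmt, wrd[k], def[k]
        }
}' "$tmpdir/file.txt")
printf '%s\n' "$aligned"

rm -r "$tmpdir"
```

With this sample the longest word has 14 characters, so every word is padded to 16 characters and the meanings start in the same column.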
