4

I have two huge files (150 GB each), and I need to pass them to a tool that accepts only a single input file. I do not want to merge them, for several reasons, and I cannot use something like <(cat file1 file2) or myfile=$(cat file1 file2), because the script uses the path of the input file, not its content.

So I would need something like the following:

alias myfile = "cat file1 file2"

So that using the following command would work:

tool_x --file /path/myfile 

I already tried the alias above, but it didn't work.

I just need to be able to treat the result of a "cat" command as an actual file that can be accessed through a path.

Is it possible to achieve something like that?

7
  • 1
    That looks like an XY problem: meta.stackexchange.com/questions/66377/what-is-the-xy-problem
    – FedKad
    Commented Dec 19, 2022 at 12:43
  • 3
    The answer depends on what tool_x is. That <(cat file1 file2) is a named pipe and sometimes works. Did it give an error? Commented Dec 19, 2022 at 12:46
  • 1
    Hello @Raffa, thank you for your answer. Unfortunately, I would not know how to explain it better, or which sample to bring. I would just need to be able to treat the result of a "cat" command as an actual file, with the possibility of accessing this file using a path. Commented Dec 19, 2022 at 12:49
  • 6
    What was the error, exactly? What else does the first tool do with the file? What is the second tool called? These all affect the answer. Not everything can use named pipes. You may just have to create a temporary file. Commented Dec 19, 2022 at 13:10
  • 2
    <(...) is a process substitution, not (necessarily) a named pipe. It's implemented using either /dev/fd or a named pipe, and I seem to recall named pipes are used only if /dev/fd is not available.
    – chepner
    Commented Dec 20, 2022 at 13:29

2 Answers

17

You could use a named pipe:

mkfifo /path/myfile
cat file1 file2 > /path/myfile &

Here, either the cat command has to be sent to the background, or you can run tool_x in another terminal, as cat will block until something starts reading from the pipe:

tool_x --file /path/myfile

This is essentially what process substitution is doing automatically for you.
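
The steps above can be sketched as one script that also cleans up after itself. This is only a minimal illustration under stated assumptions: the two tiny generated files stand in for file1 and file2, and a plain `cat` that opens the pipe by path stands in for tool_x (whose real behavior is not known here). The names `workdir`, `writer`, and `combined` are hypothetical.

```shell
#!/bin/sh
# Sketch only: small stand-in inputs instead of the real 150 GB files.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT   # remove the FIFO and inputs on exit

printf 'AAAA\n' > "$workdir/file1"
printf 'BBBB\n' > "$workdir/file2"

# Create the named pipe and start the writer in the background.
# The writer blocks until a reader opens the pipe.
mkfifo "$workdir/myfile"
cat "$workdir/file1" "$workdir/file2" > "$workdir/myfile" &
writer=$!

# Stand-in for `tool_x --file /path/myfile`: the reader opens the FIFO
# by path and reads it sequentially, start to end.
combined=$(cat "$workdir/myfile")

# Reap the background writer once the reader is done.
wait "$writer"
```

The pipe is consumed once: a second run of the tool needs the `cat ... > myfile &` writer started again, and a tool that seeks within its input will not work with this approach.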

9
  • Thank you, this also works perfectly. Commented Dec 19, 2022 at 15:18
  • 2
    I do not want to merge them mainly because, except for the tool that I am using right now, the other tools that I use on the same data need the files to be supplied separately. So merging those files into one would force me to separate them again later, which is time-consuming. If you are familiar with bioinformatics, I am working with [paired-end reads](https://thesequencingcenter.com/knowledge-base/what-are-paired-end-reads/). Commented Dec 19, 2022 at 19:22
  • 2
    @EricDuminil effectively, yes. There's a buffer associated with pipe in memory. The writer can write until the buffer is full, and then it blocks. Something else reading from the buffer causes it to empty, and unblocks the writer of the pipe. Lines don't really come into play.
    – muru
    Commented Dec 20, 2022 at 9:02
  • 4
    @EricDuminil yes, just enough for the directory entry. Well, you can write to the pipe as long as someone's reading from it, but the problem is that you can't seek in the file (read to one position and jump to another one). The reading has to be sequential.
    – muru
    Commented Dec 20, 2022 at 9:56
  • 1
    @EricDuminil: It's exactly like a regular pipe that you'd get from cat | tool_x (except for which file descriptors it's opened as), with the named pipe acting as a rendezvous for two unrelated processes to get file descriptors to the pipe (a buffer in the kernel). They get them via open(2) system calls, instead of the pipe(2) system call that gets fds for both ends in one process, which then typically forks and execs. Commented Dec 21, 2022 at 9:27
4

You can use a temporary file with mktemp like so:

myfile="$(mktemp)"
cat file1 file2 > "$myfile"
tool_x --file "$myfile"

Here, $myfile will expand to an actual path such as /tmp/tmp.Tg9Epuetsr.
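
A slightly fuller sketch of the same idea, with cleanup on exit, might look like the following. It rests on assumptions: the generated inputs are small stand-ins for file1 and file2, `wc -l` called with a path stands in for tool_x, and the names `src` and `line_count` are hypothetical. Unlike a named pipe, the temporary file is a real copy, so with the real data it needs as much free space as both inputs combined.

```shell
#!/bin/sh
# Sketch only: small stand-in inputs instead of the real 150 GB files.
src=$(mktemp -d)
printf 'AAAA\n' > "$src/file1"
printf 'BBBB\n' > "$src/file2"

# Create the temporary file and make sure it is removed on exit.
myfile=$(mktemp)
trap 'rm -rf "$src" "$myfile"' EXIT

# Concatenate both inputs into the temporary file.
cat "$src/file1" "$src/file2" > "$myfile"

# Stand-in for `tool_x --file "$myfile"`: a command that takes a path.
# awk keeps only the count from wc's "count path" output.
line_count=$(wc -l "$myfile" | awk '{print $1}')
```

Because this is a regular file, a tool that seeks within its input or reads it more than once will still work, which is not the case for a pipe.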
