Tee verb outputs the end of the chain (or at least some of it) #1671

holmescharles · 2024-10-04T17:46:04Z

I am having a little bit of trouble nailing down the exact issue here, but the upshot is that tee is misbehaving. I will try my best to provide examples that illustrate the issues.

The tee verb does not have a description of what it is supposed to do, but my intuition is that it should emulate GNU tee and write a file with the state of the data at the point where it's called. Instead, it seems to be adding data from later in the chain.
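For reference, the GNU tee behavior described above can be sketched with a plain shell pipeline and no mlr at all. This is only an illustration of the expected semantics, not of Miller's implementation; the scratch directory and the file names snapshot.txt and final.txt are made up for the example.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)

# Stage 1 emits three records, tee snapshots them to a file,
# and a later stage (awk) prepends n=<line number> to each record.
printf 'a=1,b=2\na=3,b=4\na=5,b=6\n' \
  | tee "$workdir/snapshot.txt" \
  | awk '{ print "n=" NR "," $0 }' > "$workdir/final.txt"

cat "$workdir/snapshot.txt"   # records as they were at the tee point
cat "$workdir/final.txt"      # records after the later stage
```

With GNU tee, snapshot.txt contains only the a and b fields; the n field added by the later stage appears only in final.txt.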

For this call...

mlr tee -p cat then cat -n then nothing <<EOF
a=1,b=2
a=3,b=4
a=5,b=6
EOF

...the output is...

a=1,b=2
a=3,b=4
n=3,a=5,b=6

It looks like an n from the cat verb has been added to the tee-ed output.

Other calls can add even more of the later data, e.g.:

mlr -o pprint tee -p cat then cat -n then nothing <<EOF
a=1,b=2
a=3,b=4
a=5,b=6
EOF

# output:
n a b
1 1 2
2 3 4

n a b
3 5 6

Not only does each record have the n added from cat, but there's a gap for some reason, too.

I don't know enough about mlr's implementation to hypothesize what the error is. Any help would be appreciated.

Thanks!


aborruso · 2024-10-05T17:59:02Z

Hi @holmescharles, what's your Miller version? Using the latest one, if I run

mlr -o pprint tee -p cat then cat -n then nothing <<EOF
a=1,b=2
a=3,b=4
a=5,b=6
EOF

I have

a=1,b=2
a=3,b=4
a=5,b=6

aborruso · 2024-10-05T18:08:51Z

As for tee, it emulates GNU tee.

For example, I have this input CSV

txt,value
Andy,45
Tom,87
Anna,8
Ralph,15

and I run this to exclude all rows where the txt field begins with "A"

mlr --from input.csv --csv filter '$txt=~"^[^A]"' | tee output.csv

I get this both on stdout and in output.csv

txt,value
Tom,87
Ralph,15

And then I could add a standard grep

mlr --from input.csv --csv filter '$txt=~"^[^A]"' | tee output.csv | grep 'Tom'

to have

Tom,87

The output file remains unchanged.

holmescharles · 2024-10-07T18:02:57Z

I had 6.12, though I see 6.13 was released a day after I made this post. I just tried both versions and I get the same output as I reported in my original post.

Regarding your post, are you saying that tee is meant to be called at the end of a chain and never in the middle of a chain?

aborruso · 2024-10-07T19:11:49Z

> Regarding your post, are you saying that tee is meant to be called at the end of a chain and never in the middle of a chain?

Wherever you want. If you have

txt,value
Andy,45
Tom,87
Anna,8
Ralph,15

and run

mlr --csv put '$s=1' then tee --ojson ./out.json then stats1 -a mean -f value input.csv

you get in stdout

value_mean
38.75

and you get the out.json file

[
{
  "txt": "Andy",
  "value": 45,
  "s": 1
},
{
  "txt": "Tom",
  "value": 87,
  "s": 1
},
{
  "txt": "Anna",
  "value": 8,
  "s": 1
},
{
  "txt": "Ralph",
  "value": 15,
  "s": 1
}
]

holmescharles · 2024-10-07T21:10:52Z

When I run mlr --csv put '$s=1' then tee --ojson ./out.json then stats1 -a mean -f value input.csv I get the same outputs as you, but look at the following:

mlr --csv put '$s=1' then tee --ojson ./out.json then put '$n=$s' then stats1 -a mean -f value input.csv

Standard out:

value_mean
38.75

out.json:

[
{
  "txt": "Andy",
  "value": 45,
  "s": 1
},
{
  "txt": "Tom",
  "value": 87,
  "s": 1
},
{
  "txt": "Anna",
  "value": 8,
  "s": 1,
  "n": 1
},
{
  "txt": "Ralph",
  "value": 15,
  "s": 1,
  "n": 1
}
]

It is still not clear to me if the addition of the "n" fields is expected or not.

aborruso · 2024-10-07T21:48:34Z

> It is still not clear to me if the addition of the "n" fields is expected or not.

It seems like a bug to me. You should not have the n field in the JSON. What do you think, @johnkerl?

If you change the output format (e.g. --otsv), you get the correct output.

aborruso · 2024-10-07T21:50:56Z

It seems to work properly with rectangular output formats (CSV, TSV).

holmescharles · 2024-10-07T21:53:48Z

No, I don't. The header lacks the erroneous column, but the last two rows each have the extra value.

mlr --csv put '$s=1' then tee --ocsv ./out.csv then put '$n=$s' then stats1 -a mean -f value input.csv

out.csv:

txt,value,s
Andy,45,1
Tom,87,1
Anna,8,1,1
Ralph,15,1,1
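
A quick way to confirm that a file like this is ragged is to compare each row's field count against the header's. The sketch below is a minimal check using awk; the function name check_ragged is hypothetical, and splitting on a bare comma is a simplification that ignores quoted fields containing commas.

```shell
# Report every data row whose field count differs from the header's.
# Assumption: no quoted fields containing commas (plain split on ",").
check_ragged() {
  awk -F, '
    NR == 1 { want = NF; next }
    NF != want {
      printf "line %d: expected %d fields, got %d\n", NR, want, NF
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}

# Recreate the out.csv shown above in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/out.csv" <<'EOF'
txt,value,s
Andy,45,1
Tom,87,1
Anna,8,1,1
Ralph,15,1,1
EOF

check_ragged "$workdir/out.csv" || echo "out.csv is ragged"
```

For the file above, this flags lines 4 and 5 (Anna and Ralph), each with four fields against a three-field header.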

aborruso · 2024-10-07T21:59:22Z

You are right

aborruso · 2024-10-07T22:04:30Z

Now that I see a CSV with a wrong structure I'm sure it's a bug

aborruso · 2024-10-27T18:02:12Z

@johnkerl, when you get a chance, could you let us know whether this looks like a bug?

Thank you

johnkerl self-assigned this Oct 27, 2024
