Below is a short script that downloads the image files and makes a PDF from them. No browser required.
The script uses an HTTP/1.1 feature called pipelining; proponents of HTTP/2 and HTTP/3 want people to believe pipelining has problems because it does not fit their commercialised web business model.
As the script below demonstrates, it has no problems.
It is simply a feature that does not suit the online ad industry-funded business model, with its gigantic corporate browsers, bloated, conglomerated web pages and incessant data collection.
Here, only 2 TCP connections are used to retrieve 141 images.
Most servers are less restrictive and allow more than 100 requests per TCP connection.
Pipelining works great. It is much more efficient than browsers, which open hundreds of connections.
IMHO.
# yy025 and yy056 are the author's own stdin filters (not shown here)
(export Connection=keep-alive
x1=http://www.minimizedistraction.com/img/vrg_google_doc_final_vrs03-
# x2: print URLs numbered $1 through $2, one per line
x2(){ seq -f "$x1%g.jpg" $1 $2;};
# x3: yy025 apparently turns the URL list into pipelined HTTP/1.1
# requests; nc sends them all down a single TCP connection
x3(){ yy025|nc -vvn 173.236.175.199 80;};
x2 1 100|x3;
x2 101 200|x3;
# yy056 apparently strips the response headers; od hex-dumps the JPEG
# stream, sed breaks the line between one image's end marker (ffd9)
# and the next one's start marker (ffd8), and split writes one line
# (one image) per file: xaa, xab, ...
)|exec yy056|exec od -An -tx1 -vw99999|exec tr -d '\40'|exec sed 's/ffd9ffd8/ffd9\
ffd8/g'|exec sed -n /ffd8/p|exec split -l1;
# hex back to binary, then join the JPEGs into a single PDF
for x in x??;do xxd -p -r < $x > $x.jpg;rm $x;done;
convert x??.jpg 1.pdf 2>/dev/null;rm x??.jpg
ls -l ./1.pdf
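Pipelining itself needs no special tooling; yy025 presumably just writes the requests back-to-back before any response is read. A minimal sketch with a made-up host and paths:

# several HTTP/1.1 requests go out on one TCP connection before any
# response is read back; example.com and the paths are placeholders
{
printf 'GET /img/1.jpg HTTP/1.1\r\nHost: example.com\r\n\r\n'
printf 'GET /img/2.jpg HTTP/1.1\r\nHost: example.com\r\n\r\n'
printf 'GET /img/3.jpg HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n'
} | nc example.com 80 > responses.raw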
I make most HTTP requests using netcat or similar TCP clients, so I write filters that read from stdin. Reading text files with the chunk sizes in hex interspersed is generally easy; sometimes I do not even bother to remove the chunk sizes. Where it becomes an issue is when a chunk boundary breaks a URL. Here is a simple chunked transfer decoder that reads from stdin and removes the chunk sizes.
flex -8iCrfa <<eof
 int fileno (FILE *);
xa "\15"|"\12"
xb "\15\12"
%option noyywrap nounput noinput
%%
^[A-Fa-f0-9]+{xa}
{xa}+[A-Fa-f0-9]+{xa}
{xb}[A-Fa-f0-9]+{xb}
%%
int main(){ yylex();exit(0);}
eof
cc -std=c89 -Wall -pipe lex.yy.c -static -o yy045
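A quick sanity check is to feed the compiled filter a hand-made chunked body (the input below is made up; chunk sizes 4 and 5, then the terminating 0 chunk):

printf '4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n' | ./yy045

It should print "Wikipedia"; only the size lines are removed, so the bare CR/LFs that framed the chunks still pass through.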
I tried this but ended up with gibberish in my terminal. Also, I couldn't find an explanation for -a in flex's man page. I've never used the thing before.
The extra "a" is a typo but would have no effect. The "i" is also superfluous but harmless. Without more details on the "gibberish" it is difficult to guess what happened. The space before "int fileno (FILE *);" is required. All the other lines must be left-justified, no leading spaces, except the line with "int main()" which can be indented if desired.
The "gibberish" is GZIP compressed data. "yy054" is a simple filter I wrote to extract a GZIP file from stdin, i.e., discard leading and trailing garabage. As far as I can tell, the compressed file "ee.txt" is not chunked transfer encoded. If it was chunked we would first extract the GZIP, then decompress and finally process the chunks (e.g., filter out the chunk sizes with the filter submitted in the OP).
In this case all we need to do is extract the GZIP file "ee.txt" from stdin, then decompress it:
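A sketch, assuming the raw response was saved to a file named "dump" (a made-up name) and yy054 behaves as described:

./yy054 < dump > ee.txt   # extract the GZIP member, dropping the garbage
gzip -dc < ee.txt         # decompress to stdout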
Hope this helps. Apologies, I initially guessed wrong about the here doc; I was not sure what was meant by "gibberish". Looks like the here doc is working fine.
Need to get rid of the leading spaces on all lines except the "int fileno" line. You can also forgo the here doc and just save the lines between "flex" and "eof" to a file, run flex on that file to create lex.yy.c, then compile lex.yy.c, as sketched below.
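Something like this, with "chunks.l" as a made-up name for the saved rules, dropping the superfluous flags discussed above:

flex -8Crf chunks.l
cc -std=c89 -Wall -pipe lex.yy.c -static -o yy045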
The compiled program is only useful for filtering chunked transfer encoding on stdin. Most "HTTP clients" like wget or curl already take care of processing chunked transfer encoding. It is when working with something like netcat that chunked transfer encoding becomes "DIY". This is a simple program that attempts to solve that problem; it could be written by hand without using flex. For example:
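(host and path below are placeholders; whether the body actually comes back chunked depends on the server)

printf 'GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n' |
nc example.com 80 | ./yy045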
Okay, I'll give up for now. There are really no spaces in front of the lines. In pastebin, if you check the raw version, you'll see they are tabs, which get stripped out because I added a `-` before eof. Providing the file to flex manually also produces the same gibberish for me.
# yy092 is presumably another stdin filter: it appears to turn the line
# numbers printed by the first sed into a sed script, which the second
# sed then executes via -f /dev/stdin
sed -n '/pattern/=' file|yy092|sed -nf/dev/stdin file