<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Tech Notes]]></title><description><![CDATA[Tech Notes]]></description><link>https://mevtho.com/</link><image><url>https://mevtho.com/favicon.png</url><title>Tech Notes</title><link>https://mevtho.com/</link></image><generator>Ghost 5.35</generator><lastBuildDate>Mon, 06 Apr 2026 12:11:58 GMT</lastBuildDate><atom:link href="https://mevtho.com/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Nginx Too Big Header]]></title><description><![CDATA[<p>I recently encountered an issue where one of the pages for <a href="https://allwin.co">allwin.co</a> resulted in a </p><blockquote class="kg-blockquote-alt">502 Bad Gateway error</blockquote><p>After looking at the nginx logs, I could see that it was due to that specific page processing ending in <strong>upstream sent too big header while reading response header from</strong></p>]]></description><link>https://mevtho.com/nginx-buffer-size/</link><guid isPermaLink="false">648144a08547f2529ed7469d</guid><dc:creator><![CDATA[Thomas]]></dc:creator><pubDate>Thu, 08 Jun 2023 03:24:24 GMT</pubDate><content:encoded><![CDATA[<p>I recently encountered an issue where one of the pages for <a href="https://allwin.co">allwin.co</a> resulted in a </p><blockquote class="kg-blockquote-alt">502 Bad Gateway error</blockquote><p>After looking at the nginx logs, I could see that it was due to that specific page processing ending in <strong>upstream sent too big header while reading response header from upstream</strong>.</p><p>I understood the issue, but I couldn&apos;t figure out why it happened on that page and not on others. 
Especially since things were simple: I didn&apos;t manipulate the HTTP headers of the response at all.</p><p>I quickly found an answer on how to fix it: add the following lines to the nginx configuration:</p><pre><code class="language-nginx">location ~ \.php$ { 
	# ...
	fastcgi_buffers 16 16k;
	fastcgi_buffer_size 32k;
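	# Per the nginx docs: fastcgi_buffer_size must be large enough to hold
	# the entire response header block (its default is one memory page,
	# 4k or 8k), which is the part that overflows here; fastcgi_buffers
	# covers the rest of the response.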
    #...
}</code></pre><p>But still, why? Why is it happening?</p><p>I could have updated the configuration blindly but, when acting on the production environment, I like to understand all configuration changes. So, back to the question: why is it happening?</p><p>I found my answer looking at the headers from a different page, one that worked. The <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Link">Link header</a>... I started the project using <strong>Laravel Breeze</strong> with <strong>Inertia</strong>, <strong>React</strong> and <strong>SSR enabled</strong>. By default, all JS assets are broken down into their own files, and a link to each component file was present in the header... And for that specific page, it meant a long list bringing the header length over the limit.</p><p>There it is, the reason why... so, configuration updated, and I know why.</p><p>If you are in the same context, you may now know why as well. If you have the issue, look at the headers of a page that works to spot what may make them too big in a different context.</p>]]></content:encoded></item><item><title><![CDATA[Hacking Mobile App APIs for Automation]]></title><description><![CDATA[Mobile apps / games are nice, but I often wish the experience would be better, more advanced. They are one-size-fits-all and could be improved. Time to hack a better way... 
]]></description><link>https://mevtho.com/hacking-mobile-app-apis-for-automation/</link><guid isPermaLink="false">63f47b2223263dd710c86ae6</guid><dc:creator><![CDATA[Thomas]]></dc:creator><pubDate>Tue, 21 Feb 2023 10:51:37 GMT</pubDate><media:content url="https://mevtho.com/content/images/2023/02/Screenshot-2023-02-21-at-16.23.36.png" medium="image"/><content:encoded><![CDATA[<img src="https://mevtho.com/content/images/2023/02/Screenshot-2023-02-21-at-16.23.36.png" alt="Hacking Mobile App APIs for Automation"><p>As mentioned in another article, <a href="https://mevtho.com/website-as-list/">Website As List</a>, I like to improve some of my activities by building automations.</p><p>Recently, I have looked into hacking mobile apps and APIs for fun; let&apos;s dig deeper.</p><p><u>Disclaimer</u>: As with everything I do, I do it the &apos;nice&apos; way. I am not trying to bring anything down or cheat the games or ... What I want to do is reuse some existing things to build tools that enhance my personal experience. All that in an ethical way.</p><h2 id="context">Context</h2><blockquote class="kg-blockquote-alt">While the context is specific to what I want to do, the techniques used can be applied to other contexts too.</blockquote><p>During my spare time, for fun, I like to use the Panini Dunk mobile app. It is basically collecting basketball cards in an app. It is not really useful in any way, but it helps me spend time on things I enjoy.</p><p>However, there are some problems with it... Or things I don&apos;t necessarily like.</p><p>The main one is the UX. It is too slow, requires too many taps, isn&apos;t flexible enough to make it nice, and there is no web app to make the experience better outside of the phone.</p><p>Also, I am collecting / trying to gather Jason Kidd cards there, as I do with my physical cards, and ... I would rather not lose them if the app disappears at some point... 
(<a href="https://saasify.work/platform-risk/">https://saasify.work/platform-risk/</a>).</p><p>So, I started looking into it: could I access my account data and build a personal page and tools from it? Turns out, it is possible.</p><h2 id="tools-needed">Tools needed</h2><p>I strongly recommend using Postman (<a href="https://www.postman.com/">https://www.postman.com/</a>), an API client that helps with most of the work.</p><h2 id="execution">Execution</h2><h3 id="what-do-i-want-accomplished">What do I want accomplished</h3><p>I want two basic things:</p><ul><li>A page that can show my collection</li><li>A script that can alert me when a new Jason Kidd card is put up for auction</li></ul><h3 id="understanding-how-it-works">Understanding how it works</h3><p>The very first step in any hacking work is to understand what the various components are and how they work together.</p><p>In the case of the Panini app, it is rather simple. There is the mobile app on one side, which is what the user interacts with and seems to be built in Unity, and then there is a server-side component that hosts the data. The mobile app &lt;&gt; server exchanges are done through a REST API using JSON, with different calls going to different hosts. This understanding comes from the captures described in the following section.</p><h3 id="capturing-the-calls">Capturing the calls</h3><p>First, given the context and the mechanics of the game, it was safe to assume that there were calls made between the app and servers somewhere. And given the tech context these days, it was also safe to assume that it was likely done through standard HTTP calls.</p><p>The first step to understanding how it works was to see what gets exchanged between the mobile app and the server. 
The easiest way to do that is to use Postman&apos;s proxy and capture the requests.</p><p>This involves the following steps:</p><ul><li><a href="https://learning.postman.com/docs/sending-requests/capturing-request-data/capturing-http-requests/">Setting up and enabling the proxy in Postman</a></li><li>Configuring my phone&apos;s wifi connection to use Postman&apos;s proxy (and accepting the SSL certificates)</li><li>Starting the capture in Postman</li><li>Using the app to get to the content I want to see/access</li></ul><p>When you do that, if everything is set up as expected, and the app indeed uses HTTP calls, you should be able to use the app as if there was no proxy, and all the requests should start to appear in Postman.</p><p>Thankfully, my assumptions were correct and I could see all the calls I needed.</p><h3 id="understanding-calls-and-what-you-can-do">Understanding calls and what you can do</h3><p>The next step, then, is to understand what happens, when, and how.</p><p>One aspect is to go step by step through the mobile app and see what calls get made. It becomes easy, as it is as if you were using any other REST API, and it gives you the context in which things happen. With that and the API requests in Postman (URL, body, ...) you should start having a good idea of how it works.</p><p>Another aspect is to start analyzing and tweaking the calls you are interested in in Postman and see how the server responds. A few things of interest are: the URLs and request methods used (GET, POST, ...), how authentication is done (if at all), whether a request signature is involved, any pagination, &#xA0;... 
Anything that can get you from the request the app made to what you wish the API to give you.</p><p>When you understand the API, you will have an idea of what&apos;s possible: whether you can achieve what you want and how, and possibly other ideas.</p><h3 id="building-pages-and-tools">Building pages and tools</h3><p>Unfortunately, the Panini API is quite secure: it uses JWT tokens with encryption, but also requires a valid request signature to return data. That&apos;s quite annoying, as it is hard to guess the key used.</p><p>For now, that reduces my ambitions: my tools can&apos;t be too generic, and I can&apos;t share or make them available to others even if I wanted to. It&apos;s fine, I will stick with having those for myself for now.</p><p>To build the tools, you can take advantage of Postman&apos;s API-call-to-source-code feature. It can give a starting point. And then, it becomes like any other programming script: get the data, and do something with it. </p><p>Here is my auction alert script, using <a href="https://pushover.net">pushover</a> for notifications.</p><figure class="kg-card kg-code-card"><pre><code class="language-php">&lt;?php

// Replace XXXXX with relevant values
$pushoverAppToken = &apos;XXXXX&apos;;
$pushoverKey = &apos;XXXXX&apos;;

$paniniAppId = &apos;XXXXX&apos;;
$paniniBearerToken = &quot;XXXXX&quot;;
$paniniNonce = &quot;XXXXX&quot;;
$paniniSignature = &quot;XXXXX&quot;;
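// Note (assumption): the bearer token, nonce and signature above come from
// a request captured through the Postman proxy; they are tied to that
// capture and will likely stop working once the token expires or the
// request they were generated for changes.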

$cacheKnownAuctionsFile = &apos;auctions-jason-kidd.txt&apos;;

//region Helpers and Grouping
function pushoverNotify(array $notificationData = []) {
    global $pushoverAppToken, $pushoverKey;

    $curl = curl_init();
    curl_setopt_array($curl, [
        CURLOPT_URL =&gt; &quot;https://api.pushover.net/1/messages.json&quot;,
        CURLOPT_POSTFIELDS =&gt; [
            &quot;token&quot; =&gt; $pushoverAppToken,
            &quot;user&quot; =&gt; $pushoverKey,
            ...$notificationData
        ],
        CURLOPT_RETURNTRANSFER =&gt; true
    ]);
    curl_exec($curl);
    curl_close($curl);
}

function auctionDescription($anAuction)
{
    return join(&quot; &quot;, [
        $anAuction-&gt;card-&gt;collection_name,
        $anAuction-&gt;card-&gt;group_name,
        empty($anAuction-&gt;card-&gt;card_limit) ? &quot;&quot; : &quot;/&quot;. $anAuction-&gt;card-&gt;card_limit
    ]);
}

function callPaniniAPI() {
    global $paniniAppId, $paniniBearerToken, $paniniNonce, $paniniSignature;

    $body = (object) [
        &quot;attributes&quot; =&gt; (object) [
            &quot;level&quot; =&gt; 0,
            &quot;keyword&quot; =&gt; &quot;Jason Kidd&quot;,
            &quot;albumNames&quot; =&gt; [],
            &quot;teams&quot; =&gt; [],
            &quot;collections&quot; =&gt; &quot;&quot;,
            &quot;collectionIds&quot; =&gt; [],
            &quot;groups&quot; =&gt; [],
            &quot;positions&quot; =&gt; [],
            &quot;sortBy&quot; =&gt; 1,
            &quot;sortOrder&quot; =&gt; 1,
            &quot;is_featured&quot; =&gt; false,
            &quot;is_L_N&quot; =&gt; false,
            &quot;filter_card_type&quot; =&gt; 0,
            &quot;usr_crds_cnt&quot; =&gt; 0,
            &quot;lock_type&quot; =&gt; [],
            &quot;exclude_card_ids&quot; =&gt; [],
            &quot;src_typ&quot; =&gt; &quot;&quot;,
            &quot;filterBy&quot; =&gt; 0,
            &quot;safe_cards&quot; =&gt; true,
            &quot;from_trade&quot; =&gt; false
        ]
    ];

    $bodyStr = json_encode($body);

    $headers = [
        &apos;host&apos; =&gt; &apos;userinventory.panininba.com&apos;,
        &apos;content-type&apos; =&gt; &apos;application/json&apos;,
        &apos;x-unity-version&apos; =&gt; &apos;2020.3.12f1&apos;,
        &apos;accept&apos; =&gt; &apos;*/*&apos;,
        &apos;authorization&apos; =&gt; &quot;Bearer $paniniBearerToken&quot;,
        &apos;nonce&apos; =&gt; $paniniNonce,
        &apos;app_version&apos; =&gt; &apos;2.3.0&apos;,
        &apos;accept-language&apos; =&gt; &apos;en-US,en;q=0.9&apos;,
        &apos;accept-encoding&apos; =&gt; &apos;gzip, deflate, br&apos;,
        &apos;signature&apos; =&gt; $paniniSignature,
        &apos;appid&apos; =&gt; $paniniAppId,
        &apos;content-length&apos; =&gt; strlen($bodyStr),
        &apos;user-agent&apos; =&gt; &apos;NBADunk/262 CFNetwork/1402.0.8 Darwin/22.2.0&apos;,
        &apos;connection&apos; =&gt; &apos;keep-alive&apos;,
        &apos;os_version&apos; =&gt; &apos;iOS 16.2&apos;,
        &apos;os_type&apos; =&gt; &apos;iOS&apos;
    ];

    $curl = curl_init();

    curl_setopt_array($curl, array(
        CURLOPT_URL =&gt; &apos;https://userinventory.panininba.com/v5/auction?featured=false&amp;l=30&amp;t=0&apos;,
        CURLOPT_RETURNTRANSFER =&gt; true,
        CURLOPT_ENCODING =&gt; &apos;&apos;,
        CURLOPT_MAXREDIRS =&gt; 10,
        CURLOPT_TIMEOUT =&gt; 0,
        CURLOPT_FOLLOWLOCATION =&gt; true,
        CURLOPT_HTTP_VERSION =&gt; CURL_HTTP_VERSION_1_1,
        CURLOPT_CUSTOMREQUEST =&gt; &apos;PUT&apos;,
        CURLOPT_POSTFIELDS =&gt; json_encode($body),
        CURLOPT_HTTPHEADER =&gt; array_map(fn($key, $value) =&gt; $key . &apos;:&apos; . $value, array_keys($headers), $headers)
    ));

    $response = curl_exec($curl);

    curl_close($curl);

    return $response;
}

//endregion

$response = callPaniniAPI();

// === Process the response
if ($response === false) {
    pushoverNotify([&quot;message&quot; =&gt; &quot;DUNK - Failed to retrieve auction data&quot;, &quot;priority&quot; =&gt; 1]);
    die();
}

$responseObj = json_decode($response);
if ($responseObj-&gt;status !== 200) {
    pushoverNotify([&quot;message&quot; =&gt; &quot;DUNK - Failed to retrieve auction data : {$responseObj-&gt;message}&quot;, &quot;priority&quot; =&gt; 1]);
    die();
}

$priorAuctions = [];
if (file_exists($cacheKnownAuctionsFile)) {
    $priorAuctions = json_decode(file_get_contents($cacheKnownAuctionsFile));
    unlink($cacheKnownAuctionsFile);
}

$allAuctions = $responseObj-&gt;data;
$currentAuctions = array_map(fn($anAuction) =&gt; $anAuction-&gt;_id, $allAuctions ?? []);
file_put_contents($cacheKnownAuctionsFile, json_encode($currentAuctions));


$newAuctions = array_diff($currentAuctions, $priorAuctions);
if (count($newAuctions) &gt; 0) {
    printf(&quot;Notifying of %d new auctions\n&quot;, count($newAuctions));
    pushoverNotify([
        &quot;title&quot; =&gt; &quot;DUNK - New auctions for Jason Kidd&quot;,
        &quot;html&quot; =&gt; 1,
        &quot;message&quot; =&gt; implode(&quot;\n&quot;, [
            count($newAuctions) . &quot; new auctions&quot;,
            ... array_map(fn($anAuction) =&gt; &quot;&lt;a href=&apos;{$anAuction-&gt;card-&gt;image_url}&apos;&gt;&quot;.auctionDescription($anAuction).&quot;&lt;/a&gt;&quot;,
                array_filter($allAuctions, fn($anAuction) =&gt; array_search($anAuction-&gt;_id, $newAuctions) !== false)
            )
        ])
    ]);
} else {
    printf(&quot;No new auction\n&quot;);
}</code></pre><figcaption>Send pushover notification on new panini dunk auction</figcaption></figure><!--kg-card-begin: html--><iframe src="https://dunk.mevtho.com/" height="360" title="Main Collection">

</iframe><!--kg-card-end: html--><div class="kg-card kg-button-card kg-align-center"><a href="https://dunk.mevtho.com" class="kg-btn kg-btn-accent">Show Page</a></div><h2 id="going-further">Going further</h2><p>One step that I am still struggling with, and I will make an update when I solve the issue, is to generate the required signature for the request. I think I have identified how it is calculated, but there may be an encryption key that I am still missing and would have to crack... </p><p>In order to do so, my first step was to get the Android app .apk file and see whether I could reverse engineer it back to something as close to the source code as possible, hoping to find anything that looked like an encryption key, an algorithm, ... anything to help. No success so far, but I am also in unknown territory and I need to learn how everything works there.</p><p>Hopefully I can find a way soon...</p><p>That would allow more advanced tools and automations to be built, especially to parameterize the API calls. I wouldn&apos;t have to rely on the proxy and making manual calls through the app anymore.</p><p>Ideally though, Panini could either build a web app for it or give access to some API to make it easier. Unfortunately, with Fanatics getting the exclusive license for NBA cards (<a href="https://www.forbes.com/sites/tommybeer/2021/08/23/report-fanatics-strikes-again-set-to-become-exclusive-licensee-for-nba-trading-cards">read more</a>), I don&apos;t see it happening anytime soon.</p>]]></content:encoded></item><item><title><![CDATA[Don't start a search for my local tld]]></title><description><![CDATA[For local web development purposes, I use the Firefox web browser and Laravel Valet. 
Here's how to make sure your local domain opens your website in Firefox instead of searching for it on the internet.]]></description><link>https://mevtho.com/dont-search-my-local-domain/</link><guid isPermaLink="false">63d21c5c23263dd710c86ac0</guid><category><![CDATA[How-To]]></category><dc:creator><![CDATA[Thomas]]></dc:creator><pubDate>Thu, 26 Jan 2023 06:33:20 GMT</pubDate><content:encoded><![CDATA[<p>For local web development purposes, I use the Firefox web browser and Laravel Valet. Valet easily gives you a local URL to use for your project.</p><p>I configured the URL to use the <code>.loc</code> TLD, mainly because I prefer it to the other options: it reminds me of localhost while being shorter.</p><p>One of my issues with Firefox has been that for those domains, unless I use valet secure to enable HTTPS, instead of opening my website, it would start a Google search for the text.</p><p>Typing &apos;mevtho.loc&apos; would result in a new Google search for &apos;mevtho.loc&apos;. That forced me to manually type the full URL, including the http:// prefix. So, &apos;http://mevtho.loc&apos; would bring me to the website.</p><p>I have finally taken the time and figured out how to prevent this behaviour:</p><ol><li>Head to the Firefox settings, using <code>about:config</code> in the URL bar</li><li>Type the following setting string: <code>browser.fixup.domainsuffixwhitelist.&lt;tld&gt;</code>, replacing &lt;tld&gt; with what you want to use. In my case: <code>browser.fixup.domainsuffixwhitelist.loc</code>. Set the value to boolean <code>true</code>.</li><li>Open a new tab and try your URL; it should now work as expected, not starting any Google search.</li></ol>]]></content:encoded></item><item><title><![CDATA[Website as list]]></title><description><![CDATA[<p>&#x2003;Weird design, data spread around, cumbersome display, annoying pagination with few items and lots of pages, frequency of access, ease of filtering, ... 
are some of the many reasons why nowadays I like to view web content not as the website intends, but</p>]]></description><link>https://mevtho.com/website-as-list/</link><guid isPermaLink="false">6335314f8528f79c823a8df3</guid><category><![CDATA[How-To]]></category><category><![CDATA[php]]></category><dc:creator><![CDATA[Thomas]]></dc:creator><pubDate>Thu, 29 Sep 2022 08:24:32 GMT</pubDate><media:content url="https://images.unsplash.com/photo-1579532582937-16c108930bf6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDExfHxzcHJhdHQlMjBzaGFyZXN8ZW58MHx8fHwxNjY0NDM5Nzcw&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=2000" medium="image"/><content:encoded><![CDATA[<img src="https://images.unsplash.com/photo-1579532582937-16c108930bf6?crop=entropy&amp;cs=tinysrgb&amp;fit=max&amp;fm=jpg&amp;ixid=MnwxMTc3M3wwfDF8c2VhcmNofDExfHxzcHJhdHQlMjBzaGFyZXN8ZW58MHx8fHwxNjY0NDM5Nzcw&amp;ixlib=rb-1.2.1&amp;q=80&amp;w=2000" alt="Website as list"><p>&#x2003;Weird design, data spread around, cumbersome display, annoying pagination with few items and lots of pages, frequency of access, ease of filtering, ... are some of the many reasons why nowadays I like to view web content not as the website intends, but as a simple list.</p><p>&#x2003;That&apos;s why I have ended up building a collection of quick scripts and snippets that get me what I want. Here are the strategies used.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text">The code snippets come from various scripts and/or have been created only to illustrate my explanations. Because they are for private use, I build them as quickly as I can without necessarily caring much about improving them. The end result is what matters. 
Let me know if you would like to see a more formalized / standardized script.</div></div><h2 id="retrieving-the-data">Retrieving the data</h2><h3 id="how-the-website-gets-its-content">How the website gets its content</h3><p>&#x2003;The first step in building your list is to understand how the data gets from the server to your browser to be rendered. Nowadays, this usually happens in one of two ways: either as the initial web page request, or as an asynchronous call later.</p><p>&#x2003;To determine how it is done, head to the network tab in your web browser and refresh the page.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://mevtho.com/content/images/2022/09/Screenshot-2022-09-29-at-13.06.00.png" class="kg-image" alt="Website as list" loading="lazy" width="1024" height="281" srcset="https://mevtho.com/content/images/size/w600/2022/09/Screenshot-2022-09-29-at-13.06.00.png 600w, https://mevtho.com/content/images/size/w1000/2022/09/Screenshot-2022-09-29-at-13.06.00.png 1000w, https://mevtho.com/content/images/2022/09/Screenshot-2022-09-29-at-13.06.00.png 1024w" sizes="(min-width: 720px) 720px"><figcaption>The browser developer tools show all the calls between your browser and the internet</figcaption></figure><p>&#x2003;Then head to the HTML and XHR tabs, looking for the request that sends you the data you need. The developer tools allow you to preview the content of each response. Often, you can expect it to be an XHR call when you see a small delay between when the page gets displayed on your screen and when the data gets rendered.</p><h3 id="reproducing-the-call">Reproducing the call</h3><p>&#x2003;Once you know which call gets you the data you want, the next step is to be able to reproduce it manually, outside of the web browsing environment. For that, I use Postman (<a href="https://www.postman.com">https://www.postman.com</a>), which helps test requests. 
</p><p>&#x2003;I won&apos;t get into much detail here on how to reproduce the call, as it is very dependent on the website. The main thing is to be able to reproduce the call outside of the web browser so that the request is as standalone as possible. The browser may automatically set headers / cookies / ... that you want to identify and reproduce later on.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text">If the request is too complicated to reproduce, because it depends on some complicated authentication process, CSRF checks, cookies, ... for example, you can look into browser automation with tools like Selenium, PhantomJS, ... that allow reproducing what the website does. If that&apos;s not enough, tools like <a href="https://scrapingbee.com">ScrapingBee</a> offer a way to do it more efficiently and reliably (but are not free). Once you get the response, you can follow the next steps below.</div></div><p>&#x2003;The good thing with Postman is that once you get the call working, it is very easy to extract working code from it. Click on the &lt;/&gt; button in the sidebar to view code you can reuse. I usually use &quot;PHP - curl&quot; as I am familiar with it; it works and is independent from any package.</p><p>&#x2003;The code looks more or less like this, depending on what&apos;s needed to retrieve the page.</p><figure class="kg-card kg-code-card"><pre><code class="language-php">$curl = curl_init();

curl_setopt_array($curl, array(
  CURLOPT_URL =&gt; &apos;&lt;THE URL&gt;&apos;,
  CURLOPT_RETURNTRANSFER =&gt; true,
  CURLOPT_ENCODING =&gt; &apos;&apos;,
  CURLOPT_MAXREDIRS =&gt; 10,
  CURLOPT_TIMEOUT =&gt; 0,
  CURLOPT_FOLLOWLOCATION =&gt; true,
  CURLOPT_HTTP_VERSION =&gt; CURL_HTTP_VERSION_1_1,
  CURLOPT_CUSTOMREQUEST =&gt; &apos;GET&apos;,
));

$response = curl_exec($curl);

curl_close($curl);
echo $response;</code></pre><figcaption>Postman gives working code serving as the base of the script</figcaption></figure><h3 id="setting-up-a-caching-strategy">Setting up a caching strategy</h3><p>&#x2003;Once I get that code, the first thing I do is set up some caching; I want to be as respectful of the website as possible. </p><div class="kg-card kg-callout-card kg-callout-card-red"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text">Depending on what you do, you may be hitting it with a load of requests in a very short time, so please be considerate. Caching doesn&apos;t need much.</div></div><p>&#x2003;Here is what I use. It is not perfect, but it works and ensures I am not asking too much from the server, especially when testing the building of the list.</p><pre><code class="language-php">&lt;?php

// Extracting the URL from the curl array
$url = &quot;&lt;THE_URL&gt;&quot;;

// Building a file to store the response content from the URL name, here also using the date if I am expecting daily changes to the data.
$file = &quot;cache/&quot; . md5(date(&quot;Ymd&quot;) . $url). &quot;.json&quot;;

// Then it is simply a matter of getting the content from the file 
// instead of calling the server if the file exists.
if (file_exists($file)) {
    $response = file_get_contents($file);
} else {
    $response = &quot;&quot;; // &lt;- replace with the Postman code calling the server using $url
    
    // We create the file with the response we got from the server
    file_put_contents($file, $response);
}</code></pre><h2 id="getting-what-you-need">Getting what you need</h2><h3 id="from-a-json-response">From a json response</h3><p>&#x2003;That is the ideal scenario: the server gives you something you can directly interact with in a convenient way.</p><pre><code class="language-php">$content = json_decode($response);</code></pre><p>&#x2003;And you are ready to interact with the data as you would with a normal PHP object.</p><h3 id="from-html">From html</h3><p>&#x2003;That is usually more annoying, the main reason being that the HTML contains a lot of useless elements about the rendering of the page. The response was made to be processed by a web browser, not by your code. You&apos;ll have to look into inspecting and interacting with the DOM to build your own objects containing the data.</p><p>&#x2003;You may get issues with namespaces. I managed to work around those in the past with:</p><pre><code class="language-php">function applyNamespace($expression)
{
    return join(&apos;/ns:&apos;, explode(&quot;/&quot;, $expression));
}</code></pre><p>Then wrap each XPath query (which can be obtained through the developer tools) in a call to <code>applyNamespace</code>:</p><pre><code class="language-php">$doc = new DOMDocument();
$doc-&gt;loadHTML($response);

$xpath = new \DOMXpath($doc);
$ns = $doc-&gt;documentElement-&gt;namespaceURI;
$xpath-&gt;registerNamespace(&quot;ns&quot;, $ns);

$element = $xpath-&gt;query(applyNamespace(&quot;&lt;SOME_XPATH_QUERY&gt;&quot;)); </code></pre><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text"><strong>Tip: </strong>Add <code>libxml_use_internal_errors(true);</code> to the top of your script to avoid random warnings when loading the HTML.&#xA0;</div></div><h2 id="retrieving-multiple-pages">Retrieving multiple pages</h2><p>&#x2003;One of the main reasons I build the list pages for myself is to not have to go through multiple pages manually to find the right one. So, in most cases, I also need to automate that pagination process. It is easy enough.</p><p>&#x2003;The first step is to identify how the pagination is done on the website.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text"><strong>Tip: </strong>When doing the step of reproducing the call above in Postman, do it on page 2. It is very likely that once you have identified how the second page gets retrieved, getting the first one becomes a matter of replacing the values with 0 or 1.</div></div><p>&#x2003;Then move your calls into a loop. </p><div class="kg-card kg-callout-card kg-callout-card-red"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text">Same as with caching, be considerate of the server: if you need to retrieve more than a couple of pages, add some backoff when retrieving the data. A simple <code>sleep(random_int(2, 12))</code> before <code>curl_exec</code> above won&apos;t slow you down much but goes a long way to ensure the server stays available, while possibly also helping you avoid detection mechanisms that would block your access.</div></div><p>&#x2003;For the loop, you&apos;ll need an ending condition. There are multiple ways of handling it depending on what you want. 
Typically one of these:</p><ul><li>End after a specific number of records or pages have been returned</li><li>End when the data returned doesn&apos;t match what you want anymore</li><li>End when a 404 Not found error code has been received</li><li>End when there is no more record to get</li></ul><p>&#x2003;Most of the time I go with the last one, as it is the most reliable and gives all of the data. The first one is great if there are way too many records and you can sort the data returned to what you need.</p><p>&#x2003;It usually gives something along the lines of:</p><pre><code class="language-php">$allRecords = [];
$page = 1;
do {
    $url = &quot;&lt;SOME_URL&gt;?page=$page&quot;;
    
    // Retrieve the data for $url as above (with the caching)
    
    // Whatever is needed to get the records into $recordsForCall
    $recordsForCall = [];
    
    $allRecords = array_merge($allRecords, $recordsForCall);
    $page++;
} while (count($recordsForCall) &gt; 0);</code></pre><h2 id="viewing-the-list">Viewing the list</h2><p>&#x2003;At this stage, most of the work is done; all you need is a way to display that list. I use two different ways depending on what I want to do with it:</p><ul><li>Display as HTML in the browser, if only to see the listing or if there isn&apos;t that much content</li><li>Export to CSV when I need to play with the data, such as doing some filtering, or when there are a lot of records / datapoints that make it inconvenient to show in the browser.</li></ul><h3 id="as-html">As html</h3><p>The goal being to just dump the data, it is usually quite raw, like:</p><pre><code class="language-html">&lt;html&gt;
&lt;head&gt;&lt;/head&gt;
&lt;body&gt;
&lt;table&gt;
    &lt;thead&gt;
    &lt;tr&gt;
        &lt;th&gt;Column 1&lt;/th&gt;
        &lt;th&gt;Column 2&lt;/th&gt;
        &lt;th&gt;Column 3&lt;/th&gt;
    &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
    &lt;?php
        foreach ($rows as $row) {
            // printf (not sprintf) so the row is actually output
            printf(
                &quot;&lt;tr&gt;&lt;td&gt;%s&lt;/td&gt;&lt;td&gt;%s&lt;/td&gt;&lt;td&gt;%s&lt;/td&gt;&lt;/tr&gt;&quot;,
                htmlspecialchars($row-&gt;column1),
                htmlspecialchars($row-&gt;column2),
                htmlspecialchars($row-&gt;column3)
            );
        }
    ?&gt;
    &lt;/tbody&gt;
&lt;/table&gt;
&lt;/body&gt;
&lt;/html&gt;</code></pre><p>&#x2003;Then, once it is put on an accessible web server, I just open the page in the browser to access the list.</p><h3 id="as-spreadsheet">As spreadsheet</h3><p>The easiest is to build the file directly from PHP, in the simplest case where you can assume all records have the same data. For example:</p><figure class="kg-card kg-code-card"><pre><code class="language-php">$output = &quot;some_file.csv&quot;;
$columns = array_keys((array)($rows[0] ?? []));

// Open stream
$o = fopen($output, &quot;w&quot;);

// Add header line
fputcsv($o, $columns);
foreach ($rows as $row) {
    // Add row
    fputcsv($o, lineCsv((array)$row, $columns));
}

// Close stream
fclose($o);</code></pre><figcaption>You can adjust $columns to match what you want</figcaption></figure><p>Where <code>lineCsv</code> is </p><figure class="kg-card kg-code-card"><pre><code class="language-php">function lineCsv($row, $columns)
{
    $return = [];
    foreach ($columns as $column) {
        $return[$column] = $row[$column] ?? &quot;&quot;;
    }
    return $return;
}</code></pre><figcaption>lineCsv ensures that the data always comes in the same order, provided $columns stays constant</figcaption></figure><p>I then run the script using <code>php -f script.php</code>, which generates a file that can be opened in any spreadsheet software afterwards.</p><h2 id="going-further">Going further</h2><p>&#x2003;Using these techniques, you can easily add your own filters to the code to match exactly what you want, or build your own aggregator from multiple websites. These steps are intended to give you an idea of how to proceed; they won&apos;t be one-size-fits-all. If you need something reliable, consider building on tools like ScrapingBee, which offer both reliability in retrieving the data and a common interface you can build on.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">&#x1F4A1;</div><div class="kg-callout-text">If you use PHP and Laravel to write your script, there is now a faster way: copy the curl command from the developer tools, then use <a href="https://laravelshift.com/convert-curl-to-http">https://laravelshift.com/convert-curl-to-http</a> to convert that command line into code directly usable in Laravel.</div></div>]]></content:encoded></item><item><title><![CDATA[Quickly add a favicon to your website]]></title><description><![CDATA[Make your website memorable with a dedicated icon.]]></description><link>https://mevtho.com/favicon/</link><guid isPermaLink="false">62960903a509b79113535cb6</guid><category><![CDATA[How-To]]></category><dc:creator><![CDATA[Thomas]]></dc:creator><pubDate>Tue, 31 May 2022 12:41:24 GMT</pubDate><content:encoded><![CDATA[<p>	Favicons... in my opinion, they are something that often gets overlooked, but they are an easy and quick way to make your website stand out and be more memorable.</p><p>	What are they? A favicon is the icon that represents your website. 
It shows up in your browser&apos;s tab bar, helping users quickly locate the tab they want to get to.</p><p> &#xA0; &#xA0;They can also show up on iPhone bookmark shortcuts on your users&apos; home screens.</p><p>	Here are two ways to set an icon for your website. </p><p><strong>	1 - Favicon.ico</strong></p><p>	That&apos;s probably the easiest way, assuming you have a .ico file available. By default, browsers will try to access a file at the root of your domain named favicon.ico. If it exists, that will be your icon.</p><p><strong>	2 - Page tags</strong></p><p>	That&apos;s where my preference is. Using tags allows for more flexibility in terms of files and options.</p><p>	To achieve that, my preference is to use the website <a href="https://favicon.io/">https://favicon.io/</a>, a favicon generator. You first select the file that you want to use. It also offers generating the icon from an emoji (as below). From that file, it is going to generate an archive that contains your icon in the various sizes and formats to add to your website.</p><p>	The second step is to add the generated code to the header of your page or template.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://mevtho.com/content/images/2022/05/IMG_3DDA20253E99-1-3.jpeg" class="kg-image" alt loading="lazy" width="100" height="84"><figcaption>The Smooth Commute icon</figcaption></figure><p></p><p>	You can also play with favicons, changing them based on events happening on your website. 
A notification or a message for your user could trigger a favicon change to alert them of the update.</p>]]></content:encoded></item><item><title><![CDATA[Podyt]]></title><description><![CDATA[Because all YouTube videos aren't worth watching, I send them straight to my podcast player.]]></description><link>https://mevtho.com/podyt/</link><guid isPermaLink="false">62943fefa509b79113535b53</guid><category><![CDATA[Freeing Code]]></category><dc:creator><![CDATA[Thomas]]></dc:creator><pubDate>Tue, 31 May 2022 03:30:18 GMT</pubDate><media:content url="https://mevtho.com/content/images/2022/05/github-cover.png" medium="image"/><content:encoded><![CDATA[<img src="https://mevtho.com/content/images/2022/05/github-cover.png" alt="Podyt"><p>==&gt; <a href="https://github.com/mevtho/podyt">https://github.com/mevtho/podyt</a></p><p>	The latest release is a full SaaS-style application, written with Laravel / PHP. </p><p>	It was inspired by having to &quot;watch&quot; YouTube videos that really didn&apos;t have anything visual to them. In short, YouTube videos that had no point being watched.</p><p>	Being an avid podcast listener, I quickly realised it would be nice to have those videos delivered straight to my phone as a podcast, instead of having to deal with the YouTube hassle.</p><p>	It works by following YouTube playlists and creating an RSS feed that can be added to any podcast player. Any new video added to the playlist gets added to the feed.</p><p>	The app allows managing multiple feeds and is still missing basic features like deleting / managing the feed itself, but it wasn&apos;t really needed. 
I manage the feed data from the podcast app.</p>]]></content:encoded></item><item><title><![CDATA[HK Commute]]></title><description><![CDATA[One-tap quick glance at the daily commute schedule]]></description><link>https://mevtho.com/hk-commute/</link><guid isPermaLink="false">62944008a509b79113535b57</guid><category><![CDATA[Freeing Code]]></category><dc:creator><![CDATA[Thomas]]></dc:creator><pubDate>Mon, 30 May 2022 11:02:18 GMT</pubDate><media:content url="https://mevtho.com/content/images/2022/05/jeremy-bezanger-KvCmgqqkbro-unsplash.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://mevtho.com/content/images/2022/05/jeremy-bezanger-KvCmgqqkbro-unsplash.jpg" alt="HK Commute"><p><a href="https://github.com/mevtho/commutehk/">https://github.com/mevtho/commutehk/</a></p><p>	Commuting ... For the past few months, one of my main activities has been checking when the next / best bus to commute with is coming. </p><p>	The frequency is quite good, as several routes work. The bus stop is right below the building. Reaching it takes 2 to 3 minutes.</p><p>	However, it is nice to know when a bus is coming. It makes it easier to adjust the timing and know whether it is necessary to rush or not. </p><p>	For a long time, I had been using the default application provided by the Hong Kong transportation system. It works, kind of: it shows the next buses around, across all the routes, and only the next buses. In short, finding the relevant information involves a lot of scrolling, clicking, ... through a non-intuitive interface. To add to the matter, it displays some ads and forces a click to access the list. Small annoyances, but, over time, they all add up...</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://mevtho.com/content/images/2022/05/IMG_B60AF4FAE5F4-1-1.jpeg" class="kg-image" alt="HK Commute" loading="lazy" width="277" height="600"><figcaption>The app I meant to replace</figcaption></figure><p>	Couldn&apos;t I use Google or any other app? ... 
Possibly, but I am not convinced, nor did I find them more useful, to be honest. I still preferred that initial app.</p><p>	This changed when I learnt about the Open Data published by the Hong Kong government (<a href="https://data.gov.hk/en/">https://data.gov.hk/en/</a>). It is possible to get the same data, the ETA of buses and metro at a specific stop, through an API.</p><p>	So of course I wrote my own web page that focuses only on my route. It gathers all the information I need in seconds. With as little effort as possible I can determine which bus I should target. It also gives me more information. Despite knowing that the predicted time won&apos;t be accurate down to the second, I find it useful to know whether a bus is expected at the beginning of the minute or at the end of it. 8:45:03 is quite different from 8:45:54 in this context.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://mevtho.com/content/images/2022/05/Screenshot-2022-05-30-at-18.46.58-2.png" class="kg-image" alt="HK Commute" loading="lazy" width="353" height="602"><figcaption>The mobile view interface</figcaption></figure><p>	The great thing is, now I can glance at all the information I need in a matter of seconds. </p><p>	In two or three taps, I get access to either direction, as well as either the arrival time of the bus or how long to wait. No need to do the calculation anymore ...</p><p>	I later created a bookmark to the page, and with a single tap on the screen I am able to get what I need.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://mevtho.com/content/images/2022/05/IMG_3DDA20253E99-1-1.jpeg" class="kg-image" alt="HK Commute" loading="lazy" width="100" height="84"><figcaption>A shortcut for quick access</figcaption></figure><p>	Important note: if you are looking to make a bookmark shortcut, make sure to create an icon for your page, especially if it takes some time to load. By default, on iPhone, a screenshot of the page will be displayed. 
If for any reason your page doesn&apos;t show anything within the expected time frame, you&apos;ll end up with a blank square... If your page often changes, the display will also often change... Make it more static with a dedicated icon. It&apos;s easy ... </p><p>	Now, the display is still not the best. After a few days of usage, I realized that I didn&apos;t need to have the data broken down by bus line... All I need to know is when the next bus that I can realistically use is coming... A one-column display, gathering all the data as one line per incoming bus, would be much more efficient. Well, that&apos;s next on the list, for when it becomes too annoying as is.</p>]]></content:encoded></item><item><title><![CDATA[Freeing my code]]></title><description><![CDATA[Most of my code sits on hard drives, either unused or used in a very limited capacity. I am freeing what I can for all to consume.]]></description><link>https://mevtho.com/freeing-my-code/</link><guid isPermaLink="false">62943fe1a509b79113535b4f</guid><category><![CDATA[Freeing Code]]></category><dc:creator><![CDATA[Thomas]]></dc:creator><pubDate>Mon, 30 May 2022 07:50:00 GMT</pubDate><media:content url="https://mevtho.com/content/images/2022/05/source-g87fbb59eb_640.jpg" medium="image"/><content:encoded><![CDATA[<img src="https://mevtho.com/content/images/2022/05/source-g87fbb59eb_640.jpg" alt="Freeing my code"><p>From most of the 2000s until today, I have written code almost every day in various capacities, first as a hobby, then as part of my education to become a software engineer, and then professionally. Most of that code is currently sitting on hard drives, not being used in any form.</p><p>I have decided to slowly release most of it, making it available for anyone to view or use. Some with the story behind it, the context, and maybe some tutorial / explanations; some without.</p><p>While I make it available, I don&apos;t intend to maintain or support any of this. 
So, the code will be provided as-is, maybe with a few clean-ups and enhancements as I see fit, but not much more. Feel free to use it, reference it, learn from my mistakes, ... Hopefully it can be useful to some of you.</p><p>Note that most of the code will be old and will use syntax, techniques and ideas that are outdated or wouldn&apos;t be suitable today. Also, it worked in the context it was used in. For example, don&apos;t expect a piece of code I wrote for some personal automation to be usable in a widely available production environment without doing your own testing and due diligence.</p>]]></content:encoded></item></channel></rss>