# gooddata-platform
j
Hi GD team, could I clarify how the datetime periods for the "Previous period" comparison are derived? Right now, my assumption is that the number of days between the two selected dates is calculated, and the previous dates are then derived by subtracting that number of days from the two selected dates. For example, for a static period from "1/27/2024 00:00" to "8/27/2024 23:59", there are 214 days in between, so the previous period would run from "6/27/2023 00:00" to "1/26/2024 23:59". However, the numbers seem to be slightly off; would appreciate any help to correct the logic used 🙏 cc @Alson Yap
i
Hello Jennifer, may I get the link to the report where you are comparing the dates, please? I believe this happens because, when an absolute date filter is used, the previous period is determined by taking the exact number of days in the selected range and shifting backwards by the same number of days. Thank you.
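To make the shifting explicit, here is a minimal Python sketch of the logic Ismail describes, assuming whole-day boundaries (times dropped for clarity; this is an illustration, not GD's actual implementation):

```python
from datetime import date, timedelta

def previous_period(start: date, end: date) -> tuple[date, date]:
    """Shift the selected range back by its own inclusive length in days."""
    length = (end - start).days + 1        # 1/27/2024 -> 8/27/2024 is 214 days inclusive
    prev_end = start - timedelta(days=1)   # previous period ends the day before the current start
    prev_start = prev_end - timedelta(days=length - 1)
    return prev_start, prev_end

print(previous_period(date(2024, 1, 27), date(2024, 8, 27)))
# (datetime.date(2023, 6, 27), datetime.date(2024, 1, 26))
```

This reproduces the dates from the example above, so the shifting logic itself matches the stated assumption.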
j
Here's the link to the test dashboard I just made: https://terrascope.cloud.gooddata.com/dashboards/#/workspace/a24a8c0081854783985fb2d26da1aa4d/dashboard/f5d4e27c-9cf3-44b7-9cd2-2df2181da2c1. Seems like the date range is applied universally to all visualisations, so the selection will need to be changed respectively. I have also noticed that the numbers shown in the "Previous period" column are not consistent...? Sometimes it shows .88217 and other times .88214, for example.
Based on your explanation of how the previous period is derived, it seems to be aligned with how I've calculated it?
j
Hi Jennifer, thanks for the link. We are reviewing this now in the support team. I can confirm that what Ismail said is true. However, I'm not sure exactly what I'm looking for in the test dashboard you provided. Can you explain exactly which values are off, and what numbers you are expecting to see there?
j
Hi @Joseph Heun, I've been digging into this further, and before we get into the "Previous period" discussion, I'd like to better understand how GD executes these calculations. For some context, we have two workspaces: `internal-mtdev` (which returns unfiltered data, i.e., all records, leaving the filtering to GD) and `internal-test` (which returns data filtered by date before it is sent to GD). However, the numbers shown in the two workspaces do not correspond to the actual sum in MongoDB (nor, strangely, to each other). For example, for the period from "7/1/2024 00:00" to "12/31/2024 23:59", the sum should be 2717147324996.3745. This value remains unchanged throughout all conversion and processing steps before it is returned to GD (per the logs from `internal-test`). I've also created dashboards for `internal-mtdev` and `internal-test` for testing. Would really appreciate any help or insight into what might be causing this discrepancy, thank you! 🙏
m
Hi Jennifer, thanks for sharing the dashboards and details regarding the issue. For now, I believe we should focus on resolving the fact-value discrepancy before exploring the "Previous period" one. We did some checks and noticed that the DEV and TEST values for `Activity_Emission_Volume_CO2e` are slightly different. We found that in DEV there are multiple datasets with the same fact ID, while TEST only has one, so we forced the use of the corresponding dataset by creating a metric with a USING clause (SELECT SUM(fact) USING `EmissionActivities`), but we obtained the same results. To troubleshoot further, could you help with a few things?
1. Try running the same aggregation over a small range (5–10 days) and, if possible, share the raw MongoDB values for that period.
2. Share the exact MongoDB query you're using for the sum.
3. You mentioned TEST filters by date before sending to GD; could you clarify how that filtering is applied? I ask because when I use ALL dates in the filter, I would expect the same period of time (07.01.2024 to 12.31.2024), but that is not the case.
4. Have you seen similar discrepancies on other facts, or only on `Activity_Emission_Volume_CO2e`?
5. For DEV, please double-check that the MongoDB query is pulling from the same dataset (`EmissionActivities`), since multiple datasets there share the same fact ID.
This should help us pinpoint whether the difference starts in MongoDB, FluxConnect, or within GD itself. Thank you for your cooperation and patience.
j
Hi @Mauricio Cabezas, thank you for sharing your findings thus far. Please find my responses below:
1. Will share separately the raw MongoDB values for one particular day (12/27/2024).
2. MongoDB query:
```javascript
[
  {
    $match: {
      activityDate: {
        $gte: ISODate("2024-12-27T00:00:00Z"),
        $lte: ISODate("2024-12-27T23:59:00Z")
      }
    }
  },
  {
    $group: {
      _id: null,
      totalEmissionVolume: {
        $sum: "$emissionVolume"
      }
    }
  }
]
```
3. The query object below is passed to the `find_arrow_all` method via the query parameter to apply the filtering (a usage sketch follows this message):
```python
{'activityDate': {'$gte': datetime.datetime(2024, 12, 27, 0, 0), '$lte': datetime.datetime(2024, 12, 27, 23, 59)}}
```
4. Yes, I've tested with other facts and the decimal places are slightly off for them too.
5. I've double-checked and the MongoDB query is indeed pulling from `EmissionActivities`. Here are the corresponding numbers (I have also edited the dashboards linked previously to reflect this date):
   - Dev: 12,257,277,645.4401245
   - Test: 12,257,277,645.4401646
   - MongoDB: 12,257,277,645.440123
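For context, a sketch of how a filter dict like the one in point 3 plugs into `pymongoarrow`'s `find_arrow_all`; the connection string, database name, and the float64 schema are assumptions for illustration, not details taken from the thread:

```python
import datetime

import pyarrow as pa
from pymongo import MongoClient
from pymongoarrow.api import Schema, find_arrow_all

# Hypothetical connection details; collection and field names follow the thread.
client = MongoClient("mongodb://localhost:27017")
collection = client["reporting"]["EmissionActivities"]

# The filter dict from point 3, passed via the query parameter.
query = {
    "activityDate": {
        "$gte": datetime.datetime(2024, 12, 27, 0, 0),
        "$lte": datetime.datetime(2024, 12, 27, 23, 59),
    }
}

# Assuming emissionVolume is read as float64: BSON Int32/Double values are
# coerced to IEEE-754 doubles at this point.
schema = Schema({"emissionVolume": pa.float64()})

table = find_arrow_all(collection, query, schema=schema)
print(table.column("emissionVolume").to_pylist()[:5])
```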
j
Hi Jennifer, thank you for this information. I can see that this wasn't asked yet in this thread: can you please confirm what data type you are using, for example, for `EmissionActivities`?
j
Hi @Julius Kos, in the schema for the data returned by the MongoDB query, the datatype used is `pyarrow.float64()`, and the datatype in MongoDB is `Int32` or `Double`. In the LDM, the source type is Numeric.
@Mauricio Cabezas I have just dropped you an email with the raw MongoDB values; appreciate the help on this matter!
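A quick illustration of why these types matter: a 64-bit float (BSON `Double`, `pyarrow.float64()`) cannot hold the reported MongoDB total exactly, as this minimal snippet shows:

```python
from decimal import Decimal

# The MongoDB sum for 12/27/2024 reported above.
exact = Decimal("12257277645.440123")

# A BSON Double / pyarrow.float64() stores the nearest representable
# binary64 value, not the decimal itself.
as_double = float(exact)
print(Decimal(as_double))  # prints the exact decimal expansion of the stored double
```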
f
Hi @Jennifer Chue, thank you very much for the additional details! I'm Francisco, with the L2 Support Team, and I have taken over the investigation of this issue. Mauricio sent me the raw values, and I'll run some tests on my end to try to understand what's going on. I'll update you as soon as I have progress to share! 🙂
🙏 1
Hi @Jennifer Chue, thanks for hanging in there while I checked this over. My findings indicate that the difference we are seeing between the values in MongoDB and in GoodData is the result of rounding errors, caused by the use of the `Double` data type (a floating-point type). Floating-point representation cannot exactly represent all decimal numbers, and when aggregating values with many decimal digits (as in this case) there is some loss of precision. Accumulated across thousands of values, this can add up.
I ran some tests with the raw data you sent me, and running a plain SUM on the values (I used `awk`) returns the exact value we see in the `mtdev` workspace, which confirms that the imprecision of the floats is the problem. I also tried running more precise calculations and got results much closer to what you got in MongoDB.
This subject is explained much better in the MongoDB article Quick Start: BSON Data Types - Decimal128, which also proposes a solution for these very precise values: casting them to the `decimal128` data type instead. This ensures the data isn't processed as floats (MongoDB's default for numbers with decimal digits) and helps resolve the imprecision problem. Incidentally, I believe it might also resolve the different values across workspaces: when performing floating-point arithmetic, even differences in the order of the sums can lead to (very) small differences in results, such as the ones you noticed beyond the 4th decimal digit.
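Francisco's note about summation order is easy to reproduce. A small sketch with synthetic values (not the real emission data) showing both the order dependence and an order-independent alternative:

```python
import math
import random

# Synthetic stand-ins for the emissionVolume rows; the effect only depends on
# summing many floats with long fractional parts.
random.seed(42)
values = [random.uniform(1.0, 1e6) for _ in range(100_000)]

forward = sum(values)
backward = sum(reversed(values))
exact = math.fsum(values)          # correctly rounded, order-independent

print(repr(forward))
print(repr(backward))              # typically differs in the last digits
print(repr(exact))
```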
j
Hi @Francisco Antunes, apologies for the delayed response. Thank you for the detailed explanation and for investigating this thoroughly. I’ll share your findings with the team and discuss how we would like to proceed from here.
✅ 1
f
Sure thing! Feel free to let me know if you have any more questions or comments.
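For reference, a hedged sketch of the query-time version of the `decimal128` approach, using MongoDB's `$toDecimal` operator (available since MongoDB 4.0); the connection details are placeholders, as before:

```python
import datetime

from pymongo import MongoClient

# Hypothetical connection details; collection and field names follow the thread.
client = MongoClient("mongodb://localhost:27017")
collection = client["reporting"]["EmissionActivities"]

# $toDecimal casts each Double to decimal128 before summing, so the aggregation
# itself no longer accumulates binary rounding error. Storing the field as
# Decimal128 at ingest, as Francisco suggests, is still the full fix, since
# precision already lost when a value was written as Double cannot be recovered.
pipeline = [
    {"$match": {
        "activityDate": {
            "$gte": datetime.datetime(2024, 12, 27, 0, 0),
            "$lte": datetime.datetime(2024, 12, 27, 23, 59),
        }
    }},
    {"$group": {
        "_id": None,
        "totalEmissionVolume": {"$sum": {"$toDecimal": "$emissionVolume"}},
    }},
]

result = list(collection.aggregate(pipeline))
print(result[0]["totalEmissionVolume"])  # a bson Decimal128 value
```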