How did we do it? Well, let me tell you!
- Google for postal codes and (major) cities per country
- Resolve the geolocation (latitude & longitude) of each geographical point
- Now generate new coordinates at a distance of 100 meters up to 4,000 meters from each point
- Reverse lookup these new coordinates and check whether it’s an actual address
- Continue generating new coordinates and reverse-looking them up until you have at least 10,000 unique (real) addresses including geolocations per country
- Google for popular baby names (per gender) in the past few years per country
- Google for surnames / last names per country
- Generate a combined list of first and last names (per gender)
- Use these full names to generate e-mail addresses with suffixes @hotmail/gmail/outlook.com
- Generate fully random birth dates with ages between 21 and 65 (working population)
- Google for (mobile) telephone number prefixes per country
- Generate random values with at least 13 figures to use as suffixes for the telephone numbers
- Google for the average or median income (or family income) per person per country
- Use these values (with a certain margin) to generate [YearlyIncomeInUSD]
- Select values between 3% and 12% of the [YearlyIncomeInUSD] to generate [CustomerLifetimeValueInUSD]
- Google for credit card number specifications for issuers MasterCard, Maestro and Visa
- Generate credit card numbers and only use them when they’re valid according to the Luhn algorithm
- Also generate random CVV numbers and expiry dates between next month and 10 years from now
- Generate four random numbers up to 254 and concatenate these into IP Addresses
- Generate random values for [NumberOfOrders]
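
To make the credit card step a bit more concrete, here is a minimal Python sketch of the "generate, then keep only Luhn-valid numbers" idea. The function names, the Visa-style prefix `4` and the length of 16 digits are assumptions for illustration only; it is not the exact code we used.

```python
import random


def luhn_checksum_ok(number: str) -> bool:
    """Return True when the digit string passes the Luhn check."""
    total = 0
    # Walk the digits from right to left; double every second digit.
    for i, ch in enumerate(reversed(number)):
        digit = int(ch)
        if i % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


def generate_card_number(prefix: str = "4", length: int = 16, rng=random) -> str:
    """Generate random digit strings until one passes the Luhn check.

    prefix and length are illustrative assumptions; look up the real
    issuer specifications before relying on them.
    """
    while True:
        tail = "".join(str(rng.randint(0, 9)) for _ in range(length - len(prefix)))
        candidate = prefix + tail
        if luhn_checksum_ok(candidate):
            return candidate


if __name__ == "__main__":
    print(generate_card_number())
```

On average roughly one in ten random candidates passes the check, so the retry loop is cheap. Computing the check digit directly would be faster, but the loop mirrors the step as described above.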
And that’s it!
It really is a lot of work, though, and it might take you a few weeks to gather enough geolocations for a decent-sized sample dataset with real addresses.
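
The coordinate-generation step (new points 100 to 4,000 meters away from a known location) could look roughly like the Python sketch below. It assumes a spherical Earth and uses the standard "destination point from distance and bearing" formula; the function names and the sample coordinates are made up for illustration, and the reverse lookup itself is left out because it depends on whichever geocoding service you choose.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation (assumption)


def random_nearby_point(lat, lon, min_m=100.0, max_m=4000.0, rng=random):
    """Return (lat, lon) in degrees, between min_m and max_m meters from the input."""
    distance = rng.uniform(min_m, max_m)
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    lat1, lon1 = math.radians(lat), math.radians(lon)
    ang = distance / EARTH_RADIUS_M  # angular distance in radians
    lat2 = math.asin(
        math.sin(lat1) * math.cos(ang)
        + math.cos(lat1) * math.sin(ang) * math.cos(bearing)
    )
    lon2 = lon1 + math.atan2(
        math.sin(bearing) * math.sin(ang) * math.cos(lat1),
        math.cos(ang) - math.sin(lat1) * math.sin(lat2),
    )
    # Normalise longitude to the range [-180, 180).
    lon_deg = (math.degrees(lon2) + 540.0) % 360.0 - 180.0
    return math.degrees(lat2), lon_deg


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters, handy for sanity-checking the result."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


if __name__ == "__main__":
    base = (52.3702, 4.8952)  # example starting point (illustrative values)
    candidate = random_nearby_point(*base)
    print(candidate, round(haversine_m(*base, *candidate)), "m from base")
```

Each candidate would then go through a reverse lookup, and only the ones that come back as a real address are kept.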
